feat: improve object system

2026-01-01 20:18:18 +08:00
parent 94839c6369
commit 9b32a01a10
39 changed files with 682 additions and 180 deletions
--- a/examples/repo.ipynb
+++ b/examples/repo.ipynb
@@ -0,0 +1,366 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "51b89355",
+   "metadata": {},
+   "source": [
+    "# Playground\n",
+    "This notebook walks you through operations on repomgr and particles objects."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f5c49014",
+   "metadata": {},
+   "source": [
+    "# Starting with an example\n",
+    "## Understanding the file structure\n",
+    "First, take a look at the file structure."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a5ed9864",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!tree # show the file structure"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4e10922b",
+   "metadata": {},
+   "source": [
+    "If you have run these cells before, run the next cell to clean up."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9777730e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!rm -rf test_new_repo\n",
+    "!rm -rf heurams.log*"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "058c098f",
+   "metadata": {},
+   "source": [
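+    "## Importing modules\n",
+    "Import the required modules. You will see a welcome message showing the configuration the library is using.  \n",
+    "HeurAMS also uses configuration files in its infrastructure to implement implicit dependency injection.  "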
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "bf1b00c8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import heurams.kernel.repolib as repolib # RepoLib submodule: manages and structures the link between repo (i.e. repository) data structures and local files\n",
+    "import heurams.kernel.particles as pt # Particles submodule: runtime memorization-management operations\n",
+    "from pathlib import Path # Python's pathlib module for file paths; the whole project uses it to represent paths"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ea1f68bb",
+   "metadata": {},
+   "source": [
+    "## Runtime check\n",
+    "As you can see, a repo is stored on the file system as a folder.  \n",
+    "So before loading it, first check whether it is a well-formed repo folder.  "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "897b62d7",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "is_valid = repolib.Repo.check_repodir(Path(\"./test_repo\"))\n",
+    "print(f\"This is {'a valid' if is_valid else 'an invalid'} repo!\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "24a19991",
+   "metadata": {},
+   "source": [
+    "## Loading the repo\n",
+    "Next, actually load the repo."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "708ae7e4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "test_repo = repolib.Repo.create_from_repodir(Path(\"./test_repo\"))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "474f8eb7",
+   "metadata": {},
+   "source": [
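+    "## Exporting to a dict\n",
+    "As a data container, a repo provides matching import and export functions.  \n",
+    "We just imported a repo from a local folder.  \n",
+    "Now try exporting it to a dict."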
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a11115fb",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from pprint import pprint\n",
+    "test_repo_dic = test_repo.export_to_single_dict()\n",
+    "pprint(test_repo_dic)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "35a2e06f",
+   "metadata": {},
+   "source": [
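+    "## Persistence and partial saves\n",
+    "As you can see, all the content has been output in structured form!  \n",
+    "\n",
+    "Now write it back to a folder!  \n",
+    "\n",
+    "Note that not everything needs to be modified.  \n",
+    "We can save only the parts that change; by default, that is the iteratively updated memorization data (algodata).  \n",
+    "This is why we generally do not store a repo as a single JSON or TOML file.\n",
+    "\n",
+    "persist_to_repodir takes two optional parameters: \n",
+    "- save_list: the parts to persist; defaults to [\"algodata\"].\n",
+    "- source: the target folder (a Path); defaults to the original directory, but you can point it at another folder.\n",
+    "\n",
+    "Now for some practice: we will create a \"clone\" at test_new_repo, so this time every part goes into save_list!\n",
+    "If the folder does not exist yet, the Repo object creates it for you automatically."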
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "05eeaacc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "test_repo.persist_to_repodir(save_list=[\"schedule\", \"payload\", \"manifest\", \"typedef\", \"algodata\"], source=Path(\"test_new_repo\"))\n",
+    "!tree"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "059d7bdf",
+   "metadata": {},
+   "source": [
+    "As you can see, test_new_repo has been created!"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4ef8925c",
+   "metadata": {},
+   "source": [
+    "# Data structures\n",
+    "Now let's go through the data structures behind a repo."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c19fed95",
+   "metadata": {},
+   "source": [
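+    "## The Lict object\n",
+    "A Lict combines parts of the list and dict interfaces: the data is available through both styles of API, and changes made through one are kept in sync with the other.  \n",
+    "By default, Lict does not preserve insertion order; in list form, items are automatically sorted by key in string order. See the source code for details.  \n",
+    "Now import and initialize a Lict object:"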
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7e88bd7c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from heurams.utils.lict import Lict\n",
+    "lct = Lict() # empty\n",
+    "lct = Lict(initlist=[(\"name\", \"tom\"), (\"age\", 12), (\"enemy\", \"jerry\")]) # from a list of (key, value) pairs\n",
+    "print(lct)\n",
+    "lct = Lict(initdict={\"name\": \"tom\", \"age\": 12, \"enemy\": \"jerry\"}) # from a dict\n",
+    "print(lct)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4d760bf9",
+   "metadata": {},
+   "source": [
+    "### Output forms\n",
+    "The \"official\" output form of lct is the list form.\n",
+    "You can also output it in dict form."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "248f6cba",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(lct.dicted_data)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "29dce184",
+   "metadata": {},
+   "source": [
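+    "### The dicted_data attribute and how to modify it\n",
+    "The dicted_data attribute is a dict that automatically reflects changes made through the Lict object's operations.\n",
+    "One caveat: do not modify dicted_data directly, because that does not trigger the sync hook.\n",
+    "If you really must, run the sync hook manually afterwards.\n",
+    "The recommended way to modify the data is to treat lct itself as a dict."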
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a0eb07a7",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Jupyter keeps state between runs, so do not re-run this cell; to see it again, restart the kernel and run all cells\n",
+    "\n",
+    "# Wrong way\n",
+    "lct.dicted_data[\"type\"] = \"cat\"\n",
+    "print(lct) # the change is not synced\n",
+    "\n",
+    "# Not recommended, but works\n",
+    "lct.dicted_data[\"type\"] = \"cat\"\n",
+    "lct._sync_based_on_dict()\n",
+    "print(lct)\n",
+    "\n",
+    "# Recommended way\n",
+    "lct['is_human'] = False\n",
+    "print(lct)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2337d113",
+   "metadata": {},
+   "source": [
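+    "### The data attribute and how to modify it\n",
+    "The data attribute is a list that automatically reflects changes made through the Lict object's operations.\n",
+    "One caveat: do not modify data directly, because that does not trigger the sync hook and may break the ordering.\n",
+    "If you really must, run the sync hook and sort manually afterwards (not shown here).\n",
+    "The recommended way to modify the data is to treat lct itself as a list, and to avoid assigning by index."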
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0ab442d4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Jupyter keeps state between runs, so do not re-run this cell; to see it again, restart the kernel and run all cells\n",
+    "\n",
+    "# The only recommended way\n",
+    "lct.append(('enemy_2', 'spike'))\n",
+    "print(lct.dicted_data)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a3383f59",
+   "metadata": {},
+   "source": [
+    "### A jack of all trades\n",
+    "Lict has some handy features.\n",
+    "See the source file for details.\n",
+    "Here are a few examples:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f3ca752f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "lct = Lict(initdict={'age': 12, 'enemy': 'jerry', 'is_human': False, 'name': 'tom', 'type': 'cat', 'enemy_2': 'spike'})\n",
+    "print(lct)\n",
+    "print(lct.dicted_data)\n",
+    "print(\"------\")\n",
+    "for i in lct:\n",
+    "    print(i)\n",
+    "print(len(lct))\n",
+    "while len(lct) > 0:\n",
+    "    print(lct.pop())\n",
+    "    print(lct)\n",
+    "lct = Lict(initdict={'age': 12, 'enemy': 'jerry', 'is_human': False, 'name': 'tom', 'type': 'cat', 'enemy_2': 'spike'})\n",
+    "..."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2d6d3483",
+   "metadata": {},
+   "source": [
+    "Clean up after yourself: caring for the environment starts with you and me."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "773bf99c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!rm -rf test_new_repo\n",
+    "!rm -rf heurams.log*"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.13.11"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
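The Lict cells above describe a container whose list view (`data`) and dict view (`dicted_data`) stay in sync, with the list view ordered by key. To illustrate that idea only, here is a hypothetical `MiniLict`. It is not heurams' `Lict`, whose source the notebook points to, and its class name and method set are assumptions. In this sketch the dict is the source of truth, and a key-sorted list is rebuilt after every write.

```python
from __future__ import annotations

from collections.abc import Iterator
from typing import Any


class MiniLict:
    """Hypothetical list/dict hybrid for illustration; NOT heurams' Lict."""

    def __init__(self, initdict: dict[str, Any] | None = None) -> None:
        # The dict view is the source of truth in this sketch.
        self.dicted_data: dict[str, Any] = dict(initdict or {})
        self.data: list[tuple[str, Any]] = []
        self._sync_from_dict()

    def _sync_from_dict(self) -> None:
        # Rebuild the list view from the dict view, ordered by key (string order).
        self.data = sorted(self.dicted_data.items(), key=lambda kv: kv[0])

    # Dict-style API
    def __getitem__(self, key: str) -> Any:
        return self.dicted_data[key]

    def __setitem__(self, key: str, value: Any) -> None:
        self.dicted_data[key] = value
        self._sync_from_dict()

    # List-style API
    def append(self, item: tuple[str, Any]) -> None:
        key, value = item
        self[key] = value  # routed through __setitem__, so both views stay in sync

    def pop(self) -> tuple[str, Any]:
        # Removes the last item in key order; raises IndexError when empty.
        key, value = self.data[-1]
        del self.dicted_data[key]
        self._sync_from_dict()
        return key, value

    def __iter__(self) -> Iterator[tuple[str, Any]]:
        return iter(self.data)

    def __len__(self) -> int:
        return len(self.data)

    def __repr__(self) -> str:
        return repr(self.data)


lct = MiniLict({"name": "tom", "age": 12})
lct["type"] = "cat"             # dict-style write
lct.append(("enemy", "jerry"))  # list-style write
print(lct)        # [('age', 12), ('enemy', 'jerry'), ('name', 'tom'), ('type', 'cat')]
print(lct.pop())  # ('type', 'cat')
```

As with the notebook's warning, writing to `lct.dicted_data` directly in this sketch leaves `lct.data` stale until `_sync_from_dict()` is called.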
--- a/examples/test_repo/algodata.json
+++ b/examples/test_repo/algodata.json
@@ -0,0 +1 @@
+{}
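The notebook calls `repolib.Repo.check_repodir` before loading, but this diff does not show the rules it applies. The sketch below is a stand-in that assumes a valid repo folder only has to contain the five files this commit adds under `examples/test_repo`, one per name in the notebook's `save_list`. The names `missing_repo_files` and `looks_like_repodir` are hypothetical and not part of repolib.

```python
from pathlib import Path

# Hypothetical mapping: part names from the notebook's save_list -> file names
# that this commit adds under examples/test_repo.
EXPECTED_FILES = {
    "manifest": "manifest.toml",
    "payload": "payload.toml",
    "schedule": "schedule.toml",
    "typedef": "typedef.toml",
    "algodata": "algodata.json",
}


def missing_repo_files(repodir: Path) -> list[str]:
    """Return the expected file names absent from repodir, sorted (empty if none)."""
    if not repodir.is_dir():
        return sorted(EXPECTED_FILES.values())
    return sorted(
        name for name in EXPECTED_FILES.values() if not (repodir / name).is_file()
    )


def looks_like_repodir(repodir: Path) -> bool:
    """True when every expected file is present (a presence check only)."""
    return not missing_repo_files(repodir)


print(missing_repo_files(Path("./test_repo")))
```

This only checks that the files exist. The real check may also validate their contents.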
--- a/examples/test_repo/manifest.toml
+++ b/examples/test_repo/manifest.toml
@@ -0,0 +1,3 @@
+title = "测试单元: 过秦论"
+author = "__heurams__"
+desc = "高考古诗文: 过秦论"
--- a/examples/test_repo/payload.toml
+++ b/examples/test_repo/payload.toml
@@ -0,0 +1,11 @@
+["秦孝公据崤函之固, 拥雍州之地,"]
+note = []
+content = "秦孝公/据/崤函/之固/, 拥/雍州/之地,/"
+translation = "秦孝公占据着崤山和函谷关的险固地势，拥有雍州的土地，"
+keyword_note = {"据"="占据", "崤函"="崤山和函谷关", "雍州"="古代九州之一"}
+
+["君臣固守以窥周室,"]
+note = []
+content = "君臣/固守/以窥/周室,/"
+translation = "君臣牢固地守卫着，借以窥视周王室的权力，"
+keyword_note = {"窥"="窥视"}
--- a/examples/test_repo/schedule.toml
+++ b/examples/test_repo/schedule.toml
@@ -0,0 +1,5 @@
+schedule = ["quick_review", "recognition", "final_review"]
+[phases]
+quick_review = [["FillBlank", "1.0"], ["SelectMeaning", "0.5"], ["Recognition", "1.0"]]
+recognition = [["Recognition", "1.0"]]
+final_review = [["FillBlank", "0.7"], ["SelectMeaning", "0.7"], ["Recognition", "1.0"]]
--- a/examples/test_repo/typedef.toml
+++ b/examples/test_repo/typedef.toml
@@ -0,0 +1,19 @@
+["古文句"]
+
+[annotation]
+note = "笔记"
+keyword_note = "关键词翻译"
+translation = "语句翻译"
+delimiter = "分隔符"
+content = "内容"
+tts_text = "文本转语音文本"
+
+["common"]
+delimiter = "/"
+tts_text = "eval:payload['content'].replace('/', '')"
+
+["puzzles"] # puzzle definitions
+# We call "Recognition" an alias of the recognition puzzle
+"Recognition" = { __origin__ = "recognition", __hint__ = "", primary = "eval:payload['content']", secondary = ["eval:payload['keyword_note']", "eval:payload['note']"], top_dim = ["eval:payload['translation']"] }
+"SelectMeaning" = { __origin__ = "mcq", __hint__ = "eval:payload['content']", primary = "eval:payload['content']", mapping = "eval:payload['keyword_note']", jammer = "eval:list(payload['keyword_note'].values())", max_riddles_num = "eval:default['mcq']['max_riddles_num']", prefix = "选择正确项: " }
+"FillBlank" = { __origin__ = "cloze", __hint__ = "", text = "eval:payload['content']", delimiter = "eval:metadata['formation']['delimiter']", min_denominator = "eval:default['cloze']['min_denominator']"}
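`typedef.toml` marks computed fields with an `eval:` prefix followed by a Python expression over names such as `payload`, `default` and `metadata`. This commit does not show how heurams evaluates them. The sketch below is a hypothetical resolver. It assumes the prefix means "evaluate the rest as a Python expression against a namespace" and walks a puzzle definition, evaluating each prefixed string. `resolve` and `SAFE_BUILTINS` are invented names. Only `payload` is supplied, because the contents of `default` and `metadata` are not shown here.

```python
from typing import Any

EVAL_PREFIX = "eval:"
# Builtins exposed to expressions; the SelectMeaning jammer needs list().
SAFE_BUILTINS = {"list": list, "len": len, "str": str}


def resolve(value: Any, namespace: dict[str, Any]) -> Any:
    """Recursively evaluate "eval:"-prefixed strings; return anything else unchanged.

    Restricting builtins is NOT a sandbox: only resolve trusted typedef files.
    """
    if isinstance(value, str) and value.startswith(EVAL_PREFIX):
        expr = value[len(EVAL_PREFIX):]
        return eval(expr, {"__builtins__": SAFE_BUILTINS}, dict(namespace))
    if isinstance(value, list):
        return [resolve(item, namespace) for item in value]
    if isinstance(value, dict):
        return {key: resolve(item, namespace) for key, item in value.items()}
    return value


# One payload entry from payload.toml (translation omitted).
namespace = {
    "payload": {
        "note": [],
        "content": "君臣/固守/以窥/周室,/",
        "keyword_note": {"窥": "窥视"},
    }
}
# Part of the "Recognition" definition (top_dim omitted: it needs translation).
recognition = {
    "__origin__": "recognition",
    "__hint__": "",
    "primary": "eval:payload['content']",
    "secondary": ["eval:payload['keyword_note']", "eval:payload['note']"],
}
print(resolve(recognition, namespace))
print(resolve("eval:payload['content'].replace('/', '')", namespace))  # 君臣固守以窥周室,
```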