{ "cells": [ { "cell_type": "markdown", "id": "51b89355", "metadata": {}, "source": [ "# Playground\n", "This notebook walks you through the operations on repo (repolib) and particles objects." ] }, { "cell_type": "markdown", "id": "f5c49014", "metadata": {}, "source": [ "# Starting with an example\n", "## Understanding the file layout\n", "First, list the files in the working directory." ] }, { "cell_type": "code", "execution_count": 1, "id": "a5ed9864", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[01;34m.\u001b[0m\n", "├── \u001b[00mrepo.ipynb\u001b[0m\n", "└── \u001b[01;34mtest_repo\u001b[0m\n", " ├── \u001b[00malgodata.json\u001b[0m\n", " ├── \u001b[00mmanifest.toml\u001b[0m\n", " ├── \u001b[00mpayload.toml\u001b[0m\n", " ├── \u001b[00mschedule.toml\u001b[0m\n", " └── \u001b[00mtypedef.toml\u001b[0m\n", "\n", "2 directories, 6 files\n" ] } ], "source": [ "!tree # show the file layout" ] }, { "cell_type": "markdown", "id": "4e10922b", "metadata": {}, "source": [ "If you have run the cells in this notebook before, run the next cell to clean up first. If there is nothing to remove, zsh reports `no matches found`, which is harmless." ] }, { "cell_type": "code", "execution_count": 2, "id": "9777730e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "zsh:1: no matches found: heurams.log*\n" ] } ], "source": [ "!rm -rf test_new_repo\n", "!rm -rf heurams.log*" ] }, { "cell_type": "markdown", "id": "058c098f", "metadata": {}, "source": [ "## Importing the modules\n", "Import the required modules. You will see a welcome message that reports the configuration the library is using. \n", "HeurAMS also uses configuration files at the infrastructure level to implement implicit dependency injection. 
" ] }, { "cell_type": "code", "execution_count": 3, "id": "bf1b00c8", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "欢迎使用 HeurAMS 及其组件!\n", "rootdir: /mnt/data/Devel/HeurAMS/HeurAMS/src/heurams\n", "workdir: /mnt/data/Devel/HeurAMS/HeurAMS/examples\n", "未能加载自定义用户配置\n" ] } ], "source": [ "import heurams.kernel.repolib as repolib # 这是 RepoLib 子模块, 用于管理和结构化 repo(中文含义: 仓库) 数据结构与本地文件间的联系\n", "import heurams.kernel.particles as pt # 这是 Particles(中文含义: 粒子) 子模块, 用于运行时的记忆管理操作\n", "from pathlib import (\n", " Path,\n", ") # 这是 Python 的 Pathlib 模块, 用于表示文件路径, 在整个项目中, 都使用此模块表示路径" ] }, { "cell_type": "markdown", "id": "ea1f68bb", "metadata": {}, "source": [ "## 运行时检查\n", "如你所见, repo 在文件系统内存储为一个文件夹. \n", "因此在载入之前, 首先要检查这是否是一个合乎标准的 repo 文件夹. " ] }, { "cell_type": "code", "execution_count": 4, "id": "897b62d7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "这是一个 合规 的 repo!\n" ] } ], "source": [ "is_vaild = repolib.Repo.check_repodir(Path(\"./test_repo\"))\n", "print(f\"这是一个 {'合规' if is_vaild else '不合规'} 的 repo!\")" ] }, { "cell_type": "markdown", "id": "24a19991", "metadata": {}, "source": [ "## 加载仓库\n", "接下来, 正式加载 repo." ] }, { "cell_type": "code", "execution_count": 5, "id": "708ae7e4", "metadata": {}, "outputs": [], "source": [ "test_repo = repolib.Repo.create_from_repodir(Path(\"./test_repo\"))" ] }, { "cell_type": "markdown", "id": "474f8eb7", "metadata": {}, "source": [ "## 导出为字典\n", "作为一个数据容器, repo 相应地建立了导入和导出的功能. \n", "我们刚刚从本地文件夹导入了一个 repo. \n", "现在试试导出为一个字典." 
] }, { "cell_type": "code", "execution_count": 6, "id": "a11115fb", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'algodata': [('君臣固守以窥周室,', {}), ('秦孝公据崤函之固, 拥雍州之地,', {})],\n", " 'manifest': {'author': '__heurams__',\n", " 'desc': '高考古诗文: 过秦论',\n", " 'title': '测试单元: 过秦论'},\n", " 'payload': [('君臣固守以窥周室,',\n", " {'content': '君臣/固守/以窥/周室,/',\n", " 'keyword_note': {'窥': '窥视'},\n", " 'note': [],\n", " 'translation': '君臣牢固地守卫着,借以窥视周王室的权力,'}),\n", " ('秦孝公据崤函之固, 拥雍州之地,',\n", " {'content': '秦孝公/据/崤函/之固/, 拥/雍州/之地,/',\n", " 'keyword_note': {'崤函': '崤山和函谷关', '据': '占据', '雍州': '古代九州之一'},\n", " 'note': [],\n", " 'translation': '秦孝公占据着崤山和函谷关的险固地势,拥有雍州的土地,'})],\n", " 'schedule': {'phases': {'final_review': [['FillBlank', '0.7'],\n", " ['SelectMeaning', '0.7'],\n", " ['Recognition', '1.0']],\n", " 'quick_review': [['FillBlank', '1.0'],\n", " ['SelectMeaning', '0.5'],\n", " ['Recognition', '1.0']],\n", " 'recognition': [['Recognition', '1.0']]},\n", " 'schedule': ['quick_review', 'recognition', 'final_review']},\n", " 'source': PosixPath('test_repo'),\n", " 'typedef': {'annotation': {'content': '内容',\n", " 'delimiter': '分隔符',\n", " 'keyword_note': '关键词翻译',\n", " 'note': '笔记',\n", " 'translation': '语句翻译',\n", " 'tts_text': '文本转语音文本'},\n", " 'common': {'delimiter': '/',\n", " 'tts_text': \"eval:payload['content'].replace('/', '')\"},\n", " 'puzzles': {'FillBlank': {'__hint__': '',\n", " '__origin__': 'cloze',\n", " 'delimiter': \"eval:metadata['formation']['delimiter']\",\n", " 'min_denominator': \"eval:default['cloze']['min_denominator']\",\n", " 'text': \"eval:payload['content']\"},\n", " 'Recognition': {'__hint__': '',\n", " '__origin__': 'recognition',\n", " 'primary': \"eval:payload['content']\",\n", " 'secondary': [\"eval:payload['keyword_note']\",\n", " \"eval:payload['note']\"],\n", " 'top_dim': [\"eval:payload['translation']\"]},\n", " 'SelectMeaning': {'__hint__': \"eval:payload['content']\",\n", " '__origin__': 'mcq',\n", " 'jammer': 
\"eval:list(payload['keyword_note'].values())\",\n", " 'mapping': \"eval:payload['keyword_note']\",\n", " 'max_riddles_num': \"eval:default['mcq']['max_riddles_num']\",\n", " 'prefix': '选择正确项: ',\n", " 'primary': \"eval:payload['content']\"}},\n", " '古文句': {}}}\n" ] } ], "source": [ "test_repo_dic = test_repo.export_to_single_dict()\n", "from pprint import pprint\n", "\n", "pprint(test_repo_dic)" ] }, { "cell_type": "markdown", "id": "35a2e06f", "metadata": {}, "source": [ "## 持久化与部分保存\n", "如你所见, 所有内容被结构化地输出了! \n", "\n", "现在写回到文件夹! \n", "\n", "我们注意到, 并非所有的内容都要被修改. \n", "我们可以只保存接受修改的一部分, 默认情况下, 是迭代的记忆数据(algodata). \n", "这就是为什么我们一般不使用单个 json 或 toml 来存储 repo.\n", "\n", "persist_to_repodir 接受两个可选参数: \n", "- save_list: 默认为 [\"algodata\"], 是要持久化的数据.\n", "- source: 默认为原目录, 你也可以手动指定为其他文件夹(通过 Path)\n", "\n", "现在做一些演练, 我们将创建一个位于 test_new_repo 的\"克隆\", 此时我们!\n", "除非文件夹已经存在, Repo 对象将会为你自动创建新文件夹." ] }, { "cell_type": "code", "execution_count": 7, "id": "05eeaacc", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[01;34m.\u001b[0m\n", "├── \u001b[00mheurams.log\u001b[0m\n", "├── \u001b[00mrepo.ipynb\u001b[0m\n", "├── \u001b[01;34mtest_new_repo\u001b[0m\n", "│   ├── \u001b[00malgodata.json\u001b[0m\n", "│   ├── \u001b[00mmanifest.toml\u001b[0m\n", "│   ├── \u001b[00mpayload.toml\u001b[0m\n", "│   ├── \u001b[00mschedule.toml\u001b[0m\n", "│   └── \u001b[00mtypedef.toml\u001b[0m\n", "└── \u001b[01;34mtest_repo\u001b[0m\n", " ├── \u001b[00malgodata.json\u001b[0m\n", " ├── \u001b[00mmanifest.toml\u001b[0m\n", " ├── \u001b[00mpayload.toml\u001b[0m\n", " ├── \u001b[00mschedule.toml\u001b[0m\n", " └── \u001b[00mtypedef.toml\u001b[0m\n", "\n", "3 directories, 12 files\n" ] } ], "source": [ "test_repo.persist_to_repodir(\n", " save_list=[\"schedule\", \"payload\", \"manifest\", \"typedef\", \"algodata\"],\n", " source=Path(\"test_new_repo\"),\n", ")\n", "!tree" ] }, { "cell_type": "markdown", "id": "059d7bdf", "metadata": {}, "source": [ "如你所见, 
test_new_repo has been generated!" ] }, { "cell_type": "markdown", "id": "4ef8925c", "metadata": {}, "source": [ "# Data structures\n", "This part explains the data structures a repo is built on." ] }, { "cell_type": "markdown", "id": "c19fed95", "metadata": {}, "source": [ "## The Lict object\n", "A Lict object combines part of the functionality of a list and a dictionary. The data is available through both API styles, and modifications are kept in sync. \n", "By default a Lict does not preserve insertion order; in its list form the items are automatically sorted by key in character order. See the source code for details. \n", "Now import and initialize a Lict object:" ] }, { "cell_type": "code", "execution_count": 8, "id": "7e88bd7c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[('age', 12), ('enemy', 'jerry'), ('name', 'tom')]\n", "[('age', 12), ('enemy', 'jerry'), ('name', 'tom')]\n" ] } ], "source": [ "from heurams.utils.lict import Lict\n", "\n", "lct = Lict() # empty\n", "lct = Lict(initlist=[(\"name\", \"tom\"), (\"age\", 12), (\"enemy\", \"jerry\")]) # from a list of (key, value) pairs\n", "print(lct)\n", "lct = Lict(initdict={\"name\": \"tom\", \"age\": 12, \"enemy\": \"jerry\"}) # from a dictionary\n", "print(lct)" ] }, { "cell_type": "markdown", "id": "4d760bf9", "metadata": {}, "source": [ "### Output forms\n", "The \"official\" output form of lct is the list form.\n", "You can also output the dictionary form:" ] }, { "cell_type": "code", "execution_count": 9, "id": "248f6cba", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'name': 'tom', 'age': 12, 'enemy': 'jerry'}\n" ] } ], "source": [ "print(lct.dicted_data)" ] }, { "cell_type": "markdown", "id": "29dce184", "metadata": {}, "source": [ "### The dicted_data attribute and how to modify it\n", "The dicted_data attribute is a dictionary; it automatically picks up modifications made through Lict operations.\n", "One caveat: do not modify dicted_data directly, because that does not trigger the sync hook.\n", "If you really must, run the sync hook manually afterwards.\n", "The recommended way is to treat lct itself as a dictionary." ] }, { "cell_type": "code", "execution_count": 10, "id": "a0eb07a7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[('age', 12), ('enemy', 'jerry'), ('name', 'tom')]\n", "[('age', 12), ('enemy', 'jerry'), ('name', 'tom'), ('type', 'cat')]\n", "[('age', 12), ('enemy', 'jerry'), ('is_human', False), ('name', 'tom'), ('type', 'cat')]\n" ] } ], "source": [ "# Because of how Jupyter keeps state, do not re-run this cell; to see it again, 
restart Jupyter and run all cells\n", "\n", "# Wrong way\n", "lct.dicted_data[\"type\"] = \"cat\"\n", "print(lct) # the change is not synced to the list form\n", "\n", "# Not recommended, but works\n", "lct.dicted_data[\"type\"] = \"cat\"\n", "lct._sync_based_on_dict()\n", "print(lct)\n", "\n", "# Recommended way\n", "lct[\"is_human\"] = False\n", "print(lct)" ] }, { "cell_type": "markdown", "id": "2337d113", "metadata": {}, "source": [ "### The data attribute and how to modify it\n", "The data attribute is a list; it automatically picks up modifications made through Lict operations.\n", "One caveat: do not modify data directly, because that does not trigger the sync hook and may break the sort order.\n", "If you really must, run the sync hook and sort manually afterwards (not demonstrated here).\n", "The recommended way is to treat lct itself as a list, and to avoid modifying it by index." ] }, { "cell_type": "code", "execution_count": 11, "id": "0ab442d4", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'age': 12, 'enemy': 'jerry', 'is_human': False, 'name': 'tom', 'type': 'cat', 'enemy_2': 'spike'}\n" ] } ], "source": [ "# Because of how Jupyter keeps state, do not re-run this cell; to see it again, restart Jupyter and run all cells\n", "\n", "# The only recommended way\n", "lct.append((\"enemy_2\", \"spike\"))\n", "print(lct.dicted_data)" ] }, { "cell_type": "markdown", "id": "a3383f59", "metadata": {}, "source": [ "### A versatile container\n", "Lict has a few more handy features.\n", "See the source file for details.\n", "Here are some examples:" ] }, { "cell_type": "code", "execution_count": 12, "id": "f3ca752f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[('age', 12), ('enemy', 'jerry'), ('enemy_2', 'spike'), ('is_human', False), ('name', 'tom'), ('type', 'cat')]\n", "{'age': 12, 'enemy': 'jerry', 'is_human': False, 'name': 'tom', 'type': 'cat', 'enemy_2': 'spike'}\n", "------\n", "('age', 12)\n", "('enemy', 'jerry')\n", "('enemy_2', 'spike')\n", "('is_human', False)\n", "('name', 'tom')\n", "('type', 'cat')\n", "6\n", "('type', 'cat')\n", "[('age', 12), ('enemy', 'jerry'), ('enemy_2', 'spike'), ('is_human', False), ('name', 'tom')]\n", "('name', 'tom')\n", "[('age', 12), ('enemy', 'jerry'), ('enemy_2', 'spike'), ('is_human', False)]\n", "('is_human', False)\n", "[('age', 12), ('enemy', 'jerry'), ('enemy_2', 'spike')]\n", "('enemy_2', 'spike')\n", "[('age', 12), 
('enemy', 'jerry')]\n", "('enemy', 'jerry')\n", "[('age', 12)]\n", "('age', 12)\n", "[]\n" ] }, { "data": { "text/plain": [ "Ellipsis" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lct = Lict(\n", " initdict={\n", " \"age\": 12,\n", " \"enemy\": \"jerry\",\n", " \"is_human\": False,\n", " \"name\": \"tom\",\n", " \"type\": \"cat\",\n", " \"enemy_2\": \"spike\",\n", " }\n", ")\n", "print(lct)\n", "print(lct.dicted_data)\n", "print(\"------\")\n", "for i in lct:\n", " print(i)\n", "print(len(lct))\n", "while len(lct) > 0:\n", " print(lct.pop())\n", " print(lct)\n", "lct = Lict(\n", " initdict={\n", " \"age\": 12,\n", " \"enemy\": \"jerry\",\n", " \"is_human\": False,\n", " \"name\": \"tom\",\n", " \"type\": \"cat\",\n", " \"enemy_2\": \"spike\",\n", " }\n", ")\n", "..." ] }, { "cell_type": "markdown", "id": "2d6d3483", "metadata": {}, "source": [ "Keep the environment clean: remove the generated files." ] }, { "cell_type": "code", "execution_count": 13, "id": "773bf99c", "metadata": {}, "outputs": [], "source": [ "!rm -rf test_new_repo\n", "!rm -rf heurams.log*" ] }, { "cell_type": "code", "execution_count": 14, "id": "8645c5a2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{ 'content': '君臣/固守/以窥/周室,/',\n", " 'delimiter': '/',\n", " 'keyword_note': {'窥': '窥视'},\n", " 'note': [],\n", " 'translation': '君臣牢固地守卫着,借以窥视周王室的权力,',\n", " 'tts_text': '君臣固守以窥周室,'}\n", "{ 'SM-2': { 'efactor': 2.5,\n", " 'interval': 1,\n", " 'is_activated': 1,\n", " 'last_date': 20454,\n", " 'last_modify': 1767274438.752494,\n", " 'next_date': 20455,\n", " 'real_rept': 1,\n", " 'rept': 0}}\n", "{ 'content': '秦孝公/据/崤函/之固/, 拥/雍州/之地,/',\n", " 'delimiter': '/',\n", " 'keyword_note': {'崤函': '崤山和函谷关', '据': '占据', '雍州': '古代九州之一'},\n", " 'note': [],\n", " 'translation': '秦孝公占据着崤山和函谷关的险固地势,拥有雍州的土地,',\n", " 'tts_text': '秦孝公据崤函之固, 拥雍州之地,'}\n", "{ 'SM-2': { 'efactor': 2.5,\n", " 'interval': 1,\n", " 'is_activated': 1,\n", " 'last_date': 20454,\n", " 'last_modify': 
1767274438.7534873,\n", " 'next_date': 20455,\n", " 'real_rept': 1,\n", " 'rept': 0}}\n", "{ 'algodata': [ ( '君臣固守以窥周室,',\n", " { 'SM-2': { 'efactor': 2.5,\n", " 'interval': 1,\n", " 'is_activated': 1,\n", " 'last_date': 20454,\n", " 'last_modify': 1767274438.752494,\n", " 'next_date': 20455,\n", " 'real_rept': 1,\n", " 'rept': 0}}),\n", " ( '秦孝公据崤函之固, 拥雍州之地,',\n", " { 'SM-2': { 'efactor': 2.5,\n", " 'interval': 1,\n", " 'is_activated': 1,\n", " 'last_date': 20454,\n", " 'last_modify': 1767274438.7534873,\n", " 'next_date': 20455,\n", " 'real_rept': 1,\n", " 'rept': 0}})],\n", " 'manifest': { 'author': '__heurams__',\n", " 'desc': '高考古诗文: 过秦论',\n", " 'title': '测试单元: 过秦论'},\n", " 'payload': [ ( '君臣固守以窥周室,',\n", " { 'content': '君臣/固守/以窥/周室,/',\n", " 'keyword_note': {'窥': '窥视'},\n", " 'note': [],\n", " 'translation': '君臣牢固地守卫着,借以窥视周王室的权力,'}),\n", " ( '秦孝公据崤函之固, 拥雍州之地,',\n", " { 'content': '秦孝公/据/崤函/之固/, 拥/雍州/之地,/',\n", " 'keyword_note': { '崤函': '崤山和函谷关',\n", " '据': '占据',\n", " '雍州': '古代九州之一'},\n", " 'note': [],\n", " 'translation': '秦孝公占据着崤山和函谷关的险固地势,拥有雍州的土地,'})],\n", " 'schedule': { 'phases': { 'final_review': [ ['FillBlank', '0.7'],\n", " ['SelectMeaning', '0.7'],\n", " ['Recognition', '1.0']],\n", " 'quick_review': [ ['FillBlank', '1.0'],\n", " ['SelectMeaning', '0.5'],\n", " ['Recognition', '1.0']],\n", " 'recognition': [['Recognition', '1.0']]},\n", " 'schedule': [ 'quick_review',\n", " 'recognition',\n", " 'final_review']},\n", " 'source': PosixPath('test_repo'),\n", " 'typedef': { 'annotation': { 'content': '内容',\n", " 'delimiter': '分隔符',\n", " 'keyword_note': '关键词翻译',\n", " 'note': '笔记',\n", " 'translation': '语句翻译',\n", " 'tts_text': '文本转语音文本'},\n", " 'common': { 'delimiter': '/',\n", " 'tts_text': \"eval:payload['content'].replace('/', \"\n", " \"'')\"},\n", " 'puzzles': { 'FillBlank': { '__hint__': '',\n", " '__origin__': 'cloze',\n", " 'delimiter': \"eval:metadata['formation']['delimiter']\",\n", " 'min_denominator': 
\"eval:default['cloze']['min_denominator']\",\n", " 'text': \"eval:payload['content']\"},\n", " 'Recognition': { '__hint__': '',\n", " '__origin__': 'recognition',\n", " 'primary': \"eval:payload['content']\",\n", " 'secondary': [ \"eval:payload['keyword_note']\",\n", " \"eval:payload['note']\"],\n", " 'top_dim': [ \"eval:payload['translation']\"]},\n", " 'SelectMeaning': { '__hint__': \"eval:payload['content']\",\n", " '__origin__': 'mcq',\n", " 'jammer': \"eval:list(payload['keyword_note'].values())\",\n", " 'mapping': \"eval:payload['keyword_note']\",\n", " 'max_riddles_num': \"eval:default['mcq']['max_riddles_num']\",\n", " 'prefix': '选择正确项: ',\n", " 'primary': \"eval:payload['content']\"}},\n", " '古文句': {}}}\n" ] } ], "source": [ "repo = repolib.Repo.create_from_repodir(Path(\"./test_repo\"))\n", "for i in repo.ident_index:\n", " n = pt.Nucleon.create_on_nucleonic_data(\n", " nucleonic_data=repo.nucleonic_data_lict.get_itemic_unit(i)\n", " )\n", " e = pt.Electron.create_on_electonic_data(\n", " electronic_data=repo.electronic_data_lict.get_itemic_unit(i)\n", " )\n", " e.activate()\n", " e.revisor(5, True)\n", " print(repr(n))\n", " print(repr(e))\n", "print(repo)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.11" } }, "nbformat": 4, "nbformat_minor": 5 }