
refactor: v5.1 hybrid storage architecture - SQLite for large data

Solve state.json token explosion problem after 20 chapters.

Changes:
- Move entities/aliases/state_changes/relationships to SQLite (index.db)
- Keep state.json slim (<5KB): progress, protagonist_state, strand_tracker
- Add SQLStateManager for high-level SQLite operations
- Add migrate_state_to_sqlite.py for existing projects
- Update Context Agent to use SQL on-demand queries
- Update Data Agent to use SQLite incremental writes
lingfengQAQ, 5 months ago
parent
commit
e7cd24fa96
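
The hybrid layout described in the commit message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's code: the one-table schema, the `build_demo_store` and `fetch_entity` helpers, and the sample entity IDs are hypothetical. Only the idea (a small JSON state for progress, SQLite for bulk entity data queried on demand) comes from the commit.

```python
import json
import sqlite3


def build_demo_store(num_entities: int = 200):
    """Create an in-memory SQLite store plus a slim state dict (both hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute(
        "CREATE TABLE entities (id TEXT PRIMARY KEY, tier TEXT, current_json TEXT)"
    )
    # Bulk entity data goes to SQLite instead of growing the JSON file.
    rows = [
        (f"npc_{i}", "装饰", json.dumps({"location": f"town_{i}"}))
        for i in range(num_entities)
    ]
    conn.executemany("INSERT INTO entities VALUES (?, ?, ?)", rows)
    conn.commit()

    # Slim state keeps only the three keys named in the commit message.
    slim_state = {
        "progress": {"current_chapter": 20},
        "protagonist_state": {"realm": "斗师"},
        "strand_tracker": {},
    }
    return conn, slim_state


def fetch_entity(conn, entity_id):
    """Load a single entity on demand instead of reading every entity."""
    row = conn.execute(
        "SELECT * FROM entities WHERE id = ?", (entity_id,)
    ).fetchone()
    if row is None:
        return None
    entity = dict(row)
    entity["current"] = json.loads(entity.pop("current_json"))
    return entity


conn, slim_state = build_demo_store()
state_bytes = len(json.dumps(slim_state, ensure_ascii=False).encode("utf-8"))
one_entity = fetch_entity(conn, "npc_42")
```

In this sketch the size of the JSON state does not depend on how many entities exist, which is the property the commit is after.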

+ 46 - 13
.claude/agents/context-agent.md

@@ -1,15 +1,21 @@
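 ---
 name: context-agent
-description: Intelligent context-gathering agent that prepares a complete context package for chapter writing. Invoked automatically before writing; reads the outline, state, index, RAG retrieval results, and setting documents, then filters and assembles the context.
+description: Intelligent context-gathering agent (v5.1) that prepares a complete context package for chapter writing. Invoked automatically before writing; reads the outline, state, index, RAG retrieval results, and setting documents, then filters and assembles the context. Supports on-demand SQL queries.
 tools: Read, Grep, Bash
 ---
 
-# context-agent (context-gathering agent)
+# context-agent (context-gathering agent v5.1)
 
 > **Role**: Context engineer responsible for preparing a precise, complete context package for chapter writing.
 >
 > **Philosophy**: Recall on demand, filter intelligently. The goal is not to pile up information but to supply the context the writing actually needs.
 
+**v5.1 changes**:
+- Use on-demand SQL queries instead of reading all of state.json
+- Load core entities (protagonist + tier=核心/重要, i.e. core/important) in full
+- Query other entities from index.db on demand
+- Reduce token consumption and improve response time
+
 ## Input
 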
 ```json
@@ -22,9 +28,9 @@ tools: Read, Grep, Bash
 ```
 
 **Important**: all data is read from the `{project_root}/.webnovel/` directory:
-- state.json → `{project_root}/.webnovel/state.json`
-- vectors.db → `{project_root}/.webnovel/vectors.db`
-- index.db → `{project_root}/.webnovel/index.db`
+- state.json → `{project_root}/.webnovel/state.json` (slim version: progress and configuration only)
+- index.db → `{project_root}/.webnovel/index.db` (entities, aliases, relationships, state changes)
+- vectors.db → `{project_root}/.webnovel/vectors.db` (RAG vectors)
 
 ## Output
 
@@ -99,17 +105,34 @@ tools: Read, Grep, Bash
 - Where does it take place?
 - Does it involve combat, a breakthrough, or an important conversation?
 
-### Step 2: Get the protagonist's state
+### Step 2: Get the protagonist's state (v5.1 SQL queries)
+
+**v5.1 optimization**: use SQL queries instead of reading all of state.json
+
+```bash
+# Get the protagonist entity
+python -m data_modules.index_manager get-protagonist --project-root "."
+
+# Get core entities (protagonist + tier=核心/重要)
+python -m data_modules.index_manager get-core-entities --project-root "."
+
+# Get recent state changes
+python -m data_modules.index_manager get-state-changes --entity "xiaoyan" --limit 10 --project-root "."
 
-Use the Read tool to read `.webnovel/state.json` and extract:
+# Get entity relationships
+python -m data_modules.index_manager get-relationships --entity "xiaoyan" --project-root "."
+```
+
+**Read the slim state.json** (with the Read tool):
 - `progress.current_chapter` - progress
-- `entities_v3.角色` - protagonist entity attributes (realm/location/items)
-- `relationships` - important relationships
-- `state_changes` - recent change records
-- `disambiguation_warnings` - disambiguation warnings (0.5-0.8)
-- `disambiguation_pending` - disambiguation pending confirmation (<0.5)
+- `protagonist_state` - snapshot of the protagonist's state
+- `strand_tracker` - pacing tracker
+- `disambiguation_warnings` - disambiguation warnings
+- `disambiguation_pending` - disambiguation pending confirmation
+
+**Note**: in v5.1, entities_v3, alias_index, state_changes, and structured_relationships have moved to index.db and are no longer read from state.json.
 
-### Step 3: Query related entities
+### Step 3: Query related entities (v5.1 on-demand SQL queries)
 
 ```bash
 # Query scenes related to this chapter's location
@@ -120,12 +143,22 @@ python -m data_modules.index_manager entity-appearances --entity "yaolao" --proj
 
 # Query recently appearing entities
 python -m data_modules.index_manager recent-appearances --limit 20 --project-root "."
+
+# New in v5.1: fetch details for a specific entity on demand
+python -m data_modules.index_manager get-entity --id "yaolao" --project-root "."
+
+# New in v5.1: look up entities by alias (one-to-many)
+python -m data_modules.index_manager get-by-alias --alias "药老" --project-root "."
+
+# New in v5.1: get entities by type
+python -m data_modules.index_manager get-entities-by-type --type "角色" --project-root "."
 ```
 
 **Processing logic**:
 - Location-related: recall the 3 most recent scenes at that location
 - Character-related: recall each character's state at their most recent appearance
 - Foreshadowing: keep items with urgency >= medium
+- **v5.1 optimization**: non-core entities are queried on demand rather than loaded in full
 
 ### Step 4: Semantic retrieval (RAG)
 

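
The on-demand loading strategy in the context-agent changes above can be sketched as follows. This is an illustration under assumptions, not the project's code: the trimmed schema, the `open_demo_db`, `load_core_entities`, and `lookup_by_alias` helpers, and the sample rows are hypothetical. The core-entity filter mirrors the `get_core_entities` query that appears later in this commit, and the alias lookup adds an `ORDER BY` only to make the demo deterministic.

```python
import sqlite3

SCHEMA = """
CREATE TABLE entities (
    id TEXT PRIMARY KEY,
    type TEXT NOT NULL,
    canonical_name TEXT NOT NULL,
    tier TEXT DEFAULT '装饰',
    current_json TEXT,
    last_appearance INTEGER DEFAULT 0,
    is_protagonist INTEGER DEFAULT 0,
    is_archived INTEGER DEFAULT 0
);
CREATE TABLE aliases (
    alias TEXT NOT NULL,
    entity_id TEXT NOT NULL,
    entity_type TEXT NOT NULL,
    PRIMARY KEY (alias, entity_id, entity_type)
);
"""


def open_demo_db():
    """Build an in-memory database with a few sample rows (all hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO entities VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        [
            ("xiaoyan", "角色", "萧炎", "核心", '{"realm": "斗师"}', 100, 1, 0),
            ("yaolao", "角色", "药老", "重要", "{}", 90, 0, 0),
            ("hongyi_girl", "角色", "红衣女子", "装饰", "{}", 100, 0, 0),
            ("old_rival", "角色", "旧敌", "核心", "{}", 10, 0, 1),  # archived
            ("tianyun_place", "地点", "天云宗", "次要", "{}", 95, 0, 0),
            ("tianyun_faction", "势力", "天云宗", "次要", "{}", 95, 0, 0),
        ],
    )
    conn.executemany(
        "INSERT INTO aliases VALUES (?, ?, ?)",
        [
            ("天云宗", "tianyun_place", "地点"),
            ("天云宗", "tianyun_faction", "势力"),
            ("药老", "yaolao", "角色"),
        ],
    )
    conn.commit()
    return conn


def load_core_entities(conn):
    """Always-loaded set: protagonist plus tier 核心/重要, excluding archived rows."""
    rows = conn.execute(
        """
        SELECT * FROM entities
        WHERE (tier IN ('核心', '重要') OR is_protagonist = 1) AND is_archived = 0
        ORDER BY is_protagonist DESC, tier, last_appearance DESC
        """
    ).fetchall()
    return [dict(r) for r in rows]


def lookup_by_alias(conn, alias):
    """One alias may resolve to several entities of different types."""
    rows = conn.execute(
        """
        SELECT e.*, a.entity_type AS alias_type
        FROM entities e JOIN aliases a ON e.id = a.entity_id
        WHERE a.alias = ?
        ORDER BY e.id
        """,
        (alias,),
    ).fetchall()
    return [dict(r) for r in rows]


conn = open_demo_db()
core_ids = [e["id"] for e in load_core_entities(conn)]
alias_hits = lookup_by_alias(conn, "天云宗")
```

Here the decorative entity `hongyi_girl` and the archived `old_rival` stay out of the always-loaded set and would only be fetched if a chapter actually needs them.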
+ 51 - 39
.claude/agents/data-agent.md

@@ -1,19 +1,20 @@
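 ---
 name: data-agent
-description: Data-processing agent (v5.0) responsible for AI entity extraction, scene slicing, and index building. Uses the entities_v3 format and one-to-many aliases. Invoked automatically after a chapter is finished to handle writes along the data chain.
+description: Data-processing agent (v5.1) responsible for AI entity extraction, scene slicing, and index building. Uses the entities_v3 format and one-to-many aliases. Invoked automatically after a chapter is finished to handle writes along the data chain. Supports incremental SQLite writes.
 tools: Read, Write, Bash
 ---
 
-# data-agent (data-processing agent v5.0)
+# data-agent (data-processing agent v5.1)
 
 > **Role**: Data engineer responsible for extracting structured information from chapter text and writing it to the data chain.
 >
 > **Philosophy**: AI-driven extraction with intelligent disambiguation. Semantic understanding replaces regex matching, and confidence scores control quality.
 
-**v5.0 changes**:
-- Use the grouped `entities_v3` format (grouped by type: 角色/地点/物品/势力/招式, i.e. character/location/item/faction/technique)
-- The alias index supports one-to-many mappings (one alias can map to several entities)
-- `alias_index` is embedded in `state.json`
+**v5.1 changes**:
+- Use incremental SQLite writes instead of appending to JSON
+- Write entities, aliases, state changes, and relationships directly to index.db
+- Keep only slim data in state.json (progress, configuration, pacing tracker)
+- Fix state.json bloat (token explosion after 20 chapters)
 
 ## Input
 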
@@ -28,10 +29,10 @@ tools: Read, Write, Bash
 }
 ```
 
-**Important**: all data must be written to the `{project_root}/.webnovel/` directory, including
-- state.json → `{project_root}/.webnovel/state.json`
-- vectors.db → `{project_root}/.webnovel/vectors.db`
-- index.db → `{project_root}/.webnovel/index.db`
+**Important**: all data is written to the `{project_root}/.webnovel/` directory:
+- index.db → entities, aliases, state changes, relationships, chapter index (SQLite)
+- state.json → progress, configuration, pacing tracker (slim JSON < 5KB)
+- vectors.db → RAG vectors (SQLite)
 
 ## Output
 
@@ -59,24 +60,29 @@ tools: Read, Write, Bash
 
 
 ## Execution flow
 
-### Step A: Load context
+### Step A: Load context (v5.1 SQL queries)
 
-Use the Read tool to read the chapter text and the existing entity library:
+Use the Read tool to read the chapter text:
 - Chapter text: `正文/第0100章.md`
-- Entity library: `.webnovel/state.json` → entities
 
-Use the Bash tool to query:
+Use the Bash tool to query existing entities from index.db:
 ```bash
-# Query entity aliases
-python -m data_modules.entity_linker list-aliases --entity "xiaoyan" --project-root "."
+# v5.1: get core entities from SQLite
+python -m data_modules.index_manager get-core-entities --project-root "."
+
+# v5.1: get entity aliases
+python -m data_modules.index_manager get-aliases --entity "xiaoyan" --project-root "."
 
 # Query recent appearance records
 python -m data_modules.index_manager recent-appearances --limit 20 --project-root "."
+
+# v5.1: look up entities by alias (one-to-many)
+python -m data_modules.index_manager get-by-alias --alias "萧炎" --project-root "."
 ```
 
 **Prepared data**:
-- Existing entity list (id, name, aliases, type)
-- Alias mapping table (alias → entity_id)
+- Existing entity list (from index.db)
+- Alias mapping table (from the index.db aliases table)
 - Recently appearing entities (used for contextual inference)
 
 ### Step B: AI entity extraction
@@ -146,39 +152,45 @@ for uncertain_item in uncertain:
 → The pronoun "他" (he) must be resolved from context
 ```
 
-### Step D: Write to storage
+### Step D: Write to storage (v5.1 incremental SQLite writes)
 
-**Update state.json (v5.0 entities_v3 format)**:
+**v5.1 optimization**: use incremental SQLite writes instead of appending to JSON
+
+**Write to index.db (entities/aliases/state changes/relationships)**:
 ```bash
-python -m data_modules.state_manager process-chapter --chapter 100 --data '{...}' --project-root "."
-```
+# v5.1: insert/update an entity
+python -m data_modules.index_manager upsert-entity --data '{"id":"hongyi_girl","type":"角色","canonical_name":"红衣女子","tier":"装饰","current":{},"first_appearance":100,"last_appearance":100}' --project-root "."
 
-Written content:
-- New entities are added to `entities_v3.{type}.{entity_id}`
-- State changes update the `current` field of the corresponding entity
-- New relationships are added to `relationships`
-- New aliases are registered in `alias_index` (one-to-many format)
-- Update `progress.current_chapter`
-- **Automatic protagonist state sync**: `entities_v3.角色.{protagonist ID}.current` → `protagonist_state`
+# v5.1: register an alias (one-to-many)
+python -m data_modules.index_manager register-alias --alias "红衣女子" --entity "hongyi_girl" --type "角色" --project-root "."
 
-> **Protagonist sync note**: to avoid inconsistency between two sources, `process_chapter_result()` automatically calls `sync_protagonist_from_entity()`, syncing the protagonist entity's realm/location into `protagonist_state` so that consistency-checker and other components that depend on `protagonist_state` get the latest data.
+# v5.1: record a state change
+python -m data_modules.index_manager record-state-change --data '{"entity_id":"xiaoyan","field":"realm","old_value":"斗者","new_value":"斗师","reason":"突破","chapter":100}' --project-root "."
 
-**Update index.db**:
+# v5.1: insert/update a relationship
+python -m data_modules.index_manager upsert-relationship --data '{"from_entity":"xiaoyan","to_entity":"hongyi_girl","type":"相识","description":"初次见面","chapter":100}' --project-root "."
+```
+
+**Write to index.db (chapters/scenes/appearances)**:
 ```bash
 python -m data_modules.index_manager process-chapter --chapter 100 --title "突破" --location "天云宗" --word-count 3500 --entities '[...]' --scenes '[...]' --project-root "."
 ```
 
-Written content:
-- Chapter metadata (location, characters, word_count)
-- Entity appearance records
-- Scene index
-
-**Register new aliases (v5.0 one-to-many)**:
+**Update the slim state.json**:
 ```bash
-python -m data_modules.entity_linker register-alias --entity "hongyi_girl" --alias "红衣女子" --type "角色" --project-root "."
+# Still uses state_manager, but writes only slim data
+python -m data_modules.state_manager process-chapter --chapter 100 --data '{...}' --project-root "."
 ```
 
-> Note: the v5.0 alias index supports one-to-many mappings; the same alias (e.g. "天云宗") can map to both a location and a faction.
+Written content (v5.1 slim):
+- Update `progress.current_chapter`
+- Update `protagonist_state` (snapshot of the protagonist's state)
+- Update `strand_tracker` (pacing tracker)
+- Update `disambiguation_warnings/pending`
+
+> **v5.1 change**: entities_v3, alias_index, state_changes, and structured_relationships are no longer written to state.json; they are written to index.db instead. state.json stays < 5KB.
+
+> **Protagonist sync note**: `process_chapter_result()` automatically calls `sync_protagonist_from_entity()`, syncing the protagonist entity's realm/location into `protagonist_state`.
 
 ### Step E: AI scene slicing
 

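
The incremental-write behaviour that Step D relies on can be sketched as a read-merge-write inside one transaction. This is a minimal sketch under assumptions, not the commit's implementation: the trimmed schema, the `open_db` helper, the standalone `upsert_entity` signature, and the sample values are hypothetical. The merge rule (new keys override old ones, untouched keys survive) follows the `upsert_entity` method that appears later in this commit.

```python
import json
import sqlite3


def open_db():
    """In-memory database with a trimmed, hypothetical entities table."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute(
        """
        CREATE TABLE entities (
            id TEXT PRIMARY KEY,
            current_json TEXT,
            first_appearance INTEGER DEFAULT 0,
            last_appearance INTEGER DEFAULT 0
        )
        """
    )
    return conn


def upsert_entity(conn, entity_id, current, chapter):
    """Insert a new entity or merge `current` into an existing one.

    Returns True when a new row was created, False when an existing row
    was updated.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        row = conn.execute(
            "SELECT current_json FROM entities WHERE id = ?", (entity_id,)
        ).fetchone()
        if row is None:
            conn.execute(
                "INSERT INTO entities (id, current_json, first_appearance, last_appearance) "
                "VALUES (?, ?, ?, ?)",
                (entity_id, json.dumps(current, ensure_ascii=False), chapter, chapter),
            )
            return True
        # Merge: keys in `current` override stored keys, other keys are kept.
        merged = {**json.loads(row["current_json"] or "{}"), **current}
        conn.execute(
            "UPDATE entities SET current_json = ?, last_appearance = ? WHERE id = ?",
            (json.dumps(merged, ensure_ascii=False), chapter, entity_id),
        )
        return False


conn = open_db()
created = upsert_entity(conn, "xiaoyan", {"realm": "斗者", "location": "乌坦城"}, 1)
updated = upsert_entity(conn, "xiaoyan", {"realm": "斗师"}, 100)
row = conn.execute("SELECT * FROM entities WHERE id = ?", ("xiaoyan",)).fetchone()
final_current = json.loads(row["current_json"])
```

Each chapter touches only the rows it changes, so the cost of a write does not grow with the total number of stored entities the way rewriting one large JSON file does.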
+ 662 - 5
.claude/scripts/data_modules/index_manager.py

@@ -1,21 +1,32 @@
 #!/usr/bin/env python3
 # -*- coding: utf-8 -*-
 """
-Index Manager - index management module
+Index Manager - index management module (v5.1)
 
 Manages reads and writes to index.db (SQLite):
 - Chapter metadata index
 - Entity appearance records
 - Scene index
+- Entity storage (migrated from state.json)
+- Alias index (one-to-many)
+- State change records
+- Relationship storage
 - Fast query interface
+
+v5.1 changes:
+- Add the entities table, replacing entities_v3 in state.json
+- Add the aliases table, replacing alias_index in state.json (supports one-to-many)
+- Add the state_changes table, replacing state_changes in state.json
+- Add the relationships table, replacing structured_relationships in state.json
 """
 
 import sqlite3
 import json
 from pathlib import Path
 from typing import Dict, List, Optional, Any, Tuple
-from dataclasses import dataclass
+from dataclasses import dataclass, field
 from contextlib import contextmanager
+from datetime import datetime
 
 from .config import get_config
 
 
@@ -43,6 +54,42 @@ class SceneMeta:
     characters: List[str]
 
 
+@dataclass
+class EntityMeta:
+    """Entity metadata (new in v5.1)"""
+    id: str
+    type: str  # 角色/地点/物品/势力/招式 (character/location/item/faction/technique)
+    canonical_name: str
+    tier: str = "装饰"  # 核心/重要/次要/装饰 (core/important/minor/decorative)
+    desc: str = ""
+    current: Dict = field(default_factory=dict)  # current state (realm/location/items, etc.)
+    first_appearance: int = 0
+    last_appearance: int = 0
+    is_protagonist: bool = False
+    is_archived: bool = False
+
+
+@dataclass
+class StateChangeMeta:
+    """State change record (new in v5.1)"""
+    entity_id: str
+    field: str
+    old_value: str
+    new_value: str
+    reason: str
+    chapter: int
+
+
+@dataclass
+class RelationshipMeta:
+    """Relationship record (new in v5.1)"""
+    from_entity: str
+    to_entity: str
+    type: str
+    description: str
+    chapter: int
+
+
 class IndexManager:
     """Index manager"""
 
 
@@ -102,6 +149,77 @@ class IndexManager:
             cursor.execute("CREATE INDEX IF NOT EXISTS idx_appearances_entity ON appearances(entity_id)")
             cursor.execute("CREATE INDEX IF NOT EXISTS idx_appearances_chapter ON appearances(chapter)")
 
+            # ==================== Tables new in v5.1 ====================
+
+            # Entities table (replaces entities_v3 in state.json)
+            cursor.execute("""
+                CREATE TABLE IF NOT EXISTS entities (
+                    id TEXT PRIMARY KEY,
+                    type TEXT NOT NULL,
+                    canonical_name TEXT NOT NULL,
+                    tier TEXT DEFAULT '装饰',
+                    desc TEXT,
+                    current_json TEXT,
+                    first_appearance INTEGER DEFAULT 0,
+                    last_appearance INTEGER DEFAULT 0,
+                    is_protagonist INTEGER DEFAULT 0,
+                    is_archived INTEGER DEFAULT 0,
+                    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+                    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+                )
+            """)
+
+            # Aliases table (replaces alias_index in state.json; supports one-to-many)
+            cursor.execute("""
+                CREATE TABLE IF NOT EXISTS aliases (
+                    alias TEXT NOT NULL,
+                    entity_id TEXT NOT NULL,
+                    entity_type TEXT NOT NULL,
+                    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+                    PRIMARY KEY (alias, entity_id, entity_type)
+                )
+            """)
+
+            # State changes table (replaces state_changes in state.json)
+            cursor.execute("""
+                CREATE TABLE IF NOT EXISTS state_changes (
+                    id INTEGER PRIMARY KEY AUTOINCREMENT,
+                    entity_id TEXT NOT NULL,
+                    field TEXT NOT NULL,
+                    old_value TEXT,
+                    new_value TEXT,
+                    reason TEXT,
+                    chapter INTEGER NOT NULL,
+                    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+                )
+            """)
+
+            # Relationships table (replaces structured_relationships in state.json)
+            cursor.execute("""
+                CREATE TABLE IF NOT EXISTS relationships (
+                    id INTEGER PRIMARY KEY AUTOINCREMENT,
+                    from_entity TEXT NOT NULL,
+                    to_entity TEXT NOT NULL,
+                    type TEXT NOT NULL,
+                    description TEXT,
+                    chapter INTEGER NOT NULL,
+                    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+                    UNIQUE(from_entity, to_entity, type)
+                )
+            """)
+
+            # Indexes new in v5.1
+            cursor.execute("CREATE INDEX IF NOT EXISTS idx_entities_type ON entities(type)")
+            cursor.execute("CREATE INDEX IF NOT EXISTS idx_entities_tier ON entities(tier)")
+            cursor.execute("CREATE INDEX IF NOT EXISTS idx_entities_protagonist ON entities(is_protagonist)")
+            cursor.execute("CREATE INDEX IF NOT EXISTS idx_aliases_entity ON aliases(entity_id)")
+            cursor.execute("CREATE INDEX IF NOT EXISTS idx_aliases_alias ON aliases(alias)")
+            cursor.execute("CREATE INDEX IF NOT EXISTS idx_state_changes_entity ON state_changes(entity_id)")
+            cursor.execute("CREATE INDEX IF NOT EXISTS idx_state_changes_chapter ON state_changes(chapter)")
+            cursor.execute("CREATE INDEX IF NOT EXISTS idx_relationships_from ON relationships(from_entity)")
+            cursor.execute("CREATE INDEX IF NOT EXISTS idx_relationships_to ON relationships(to_entity)")
+            cursor.execute("CREATE INDEX IF NOT EXISTS idx_relationships_chapter ON relationships(chapter)")
+
             conn.commit()
 
     @contextmanager
@@ -274,6 +392,375 @@ class IndexManager:
             """, (chapter,))
             return [self._row_to_dict(row, parse_json=["mentions"]) for row in cursor.fetchall()]
 
+    # ==================== v5.1 entity operations ====================
+
+    def upsert_entity(self, entity: EntityMeta) -> bool:
+        """
+        Insert or update an entity (smart merge)
+
+        - New entity: insert directly
+        - Existing entity: update current_json, last_appearance, updated_at
+
+        Returns whether the entity is new
+        """
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+
+            # Check whether the entity already exists
+            cursor.execute("SELECT id, current_json FROM entities WHERE id = ?", (entity.id,))
+            existing = cursor.fetchone()
+
+            if existing:
+                # Existing entity: merge current_json
+                old_current = {}
+                if existing["current_json"]:
+                    try:
+                        old_current = json.loads(existing["current_json"])
+                    except json.JSONDecodeError:
+                        pass
+
+                # Merge current (new values override old values)
+                merged_current = {**old_current, **entity.current}
+
+                cursor.execute("""
+                    UPDATE entities SET
+                        current_json = ?,
+                        last_appearance = ?,
+                        updated_at = CURRENT_TIMESTAMP
+                    WHERE id = ?
+                """, (
+                    json.dumps(merged_current, ensure_ascii=False),
+                    entity.last_appearance,
+                    entity.id
+                ))
+                conn.commit()
+                return False
+            else:
+                # New entity: insert
+                cursor.execute("""
+                    INSERT INTO entities
+                    (id, type, canonical_name, tier, desc, current_json,
+                     first_appearance, last_appearance, is_protagonist, is_archived)
+                    VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
+                """, (
+                    entity.id,
+                    entity.type,
+                    entity.canonical_name,
+                    entity.tier,
+                    entity.desc,
+                    json.dumps(entity.current, ensure_ascii=False),
+                    entity.first_appearance,
+                    entity.last_appearance,
+                    1 if entity.is_protagonist else 0,
+                    1 if entity.is_archived else 0
+                ))
+                conn.commit()
+                return True
+
+    def get_entity(self, entity_id: str) -> Optional[Dict]:
+        """Get a single entity"""
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+            cursor.execute("SELECT * FROM entities WHERE id = ?", (entity_id,))
+            row = cursor.fetchone()
+            if row:
+                return self._row_to_dict(row, parse_json=["current_json"])
+            return None
+
+    def get_entities_by_type(self, entity_type: str, include_archived: bool = False) -> List[Dict]:
+        """Get entities by type"""
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+            if include_archived:
+                cursor.execute("""
+                    SELECT * FROM entities WHERE type = ?
+                    ORDER BY last_appearance DESC
+                """, (entity_type,))
+            else:
+                cursor.execute("""
+                    SELECT * FROM entities WHERE type = ? AND is_archived = 0
+                    ORDER BY last_appearance DESC
+                """, (entity_type,))
+            return [self._row_to_dict(row, parse_json=["current_json"]) for row in cursor.fetchall()]
+
+    def get_entities_by_tier(self, tier: str) -> List[Dict]:
+        """Get entities by importance tier (核心/重要/次要/装饰)"""
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+            cursor.execute("""
+                SELECT * FROM entities WHERE tier = ? AND is_archived = 0
+                ORDER BY last_appearance DESC
+            """, (tier,))
+            return [self._row_to_dict(row, parse_json=["current_json"]) for row in cursor.fetchall()]
+
+    def get_core_entities(self) -> List[Dict]:
+        """Get all core entities: protagonist plus tier 核心/重要 (loaded in full by the Context Agent)"""
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+            cursor.execute("""
+                SELECT * FROM entities
+                WHERE (tier IN ('核心', '重要') OR is_protagonist = 1) AND is_archived = 0
+                ORDER BY is_protagonist DESC, tier, last_appearance DESC
+            """)
+            return [self._row_to_dict(row, parse_json=["current_json"]) for row in cursor.fetchall()]
+
+    def get_protagonist(self) -> Optional[Dict]:
+        """Get the protagonist entity"""
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+            cursor.execute("SELECT * FROM entities WHERE is_protagonist = 1 LIMIT 1")
+            row = cursor.fetchone()
+            if row:
+                return self._row_to_dict(row, parse_json=["current_json"])
+            return None
+
+    def update_entity_current(self, entity_id: str, updates: Dict) -> bool:
+        """
+        Incrementally update an entity's current field (other fields are left untouched)
+
+        Example: update_entity_current("xiaoyan", {"realm": "斗师"})
+        """
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+
+            cursor.execute("SELECT current_json FROM entities WHERE id = ?", (entity_id,))
+            row = cursor.fetchone()
+            if not row:
+                return False
+
+            current = {}
+            if row["current_json"]:
+                try:
+                    current = json.loads(row["current_json"])
+                except json.JSONDecodeError:
+                    pass
+
+            current.update(updates)
+
+            cursor.execute("""
+                UPDATE entities SET
+                    current_json = ?,
+                    updated_at = CURRENT_TIMESTAMP
+                WHERE id = ?
+            """, (json.dumps(current, ensure_ascii=False), entity_id))
+            conn.commit()
+            return True
+
+    def archive_entity(self, entity_id: str) -> bool:
+        """Archive an entity (marks it as archived without deleting it)"""
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+            cursor.execute("""
+                UPDATE entities SET is_archived = 1, updated_at = CURRENT_TIMESTAMP
+                WHERE id = ?
+            """, (entity_id,))
+            conn.commit()
+            return cursor.rowcount > 0
+
+    # ==================== v5.1 alias operations ====================
+
+    def register_alias(self, alias: str, entity_id: str, entity_type: str) -> bool:
+        """
+        Register an alias (supports one-to-many)
+
+        One alias can map to several entities (e.g. "天云宗" → location + faction)
+        """
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+            try:
+                cursor.execute("""
+                    INSERT OR IGNORE INTO aliases (alias, entity_id, entity_type)
+                    VALUES (?, ?, ?)
+                """, (alias, entity_id, entity_type))
+                conn.commit()
+                return cursor.rowcount > 0
+            except sqlite3.IntegrityError:
+                return False
+
+    def get_entities_by_alias(self, alias: str) -> List[Dict]:
+        """
+        Look up entities by alias (one-to-many)
+
+        Returns all matching entities (possibly several, of different types)
+        """
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+            cursor.execute("""
+                SELECT e.*, a.entity_type as alias_type
+                FROM entities e
+                JOIN aliases a ON e.id = a.entity_id
+                WHERE a.alias = ?
+            """, (alias,))
+            return [self._row_to_dict(row, parse_json=["current_json"]) for row in cursor.fetchall()]
+
+    def get_entity_aliases(self, entity_id: str) -> List[str]:
+        """Get all aliases of an entity"""
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+            cursor.execute("SELECT alias FROM aliases WHERE entity_id = ?", (entity_id,))
+            return [row["alias"] for row in cursor.fetchall()]
+
+    def remove_alias(self, alias: str, entity_id: str) -> bool:
+        """Remove an alias"""
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+            cursor.execute("DELETE FROM aliases WHERE alias = ? AND entity_id = ?", (alias, entity_id))
+            conn.commit()
+            return cursor.rowcount > 0
+
+    # ==================== v5.1 state change operations ====================
+
+    def record_state_change(self, change: StateChangeMeta) -> int:
+        """
+        Record a state change
+
+        Returns the record ID
+        """
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+            cursor.execute("""
+                INSERT INTO state_changes
+                (entity_id, field, old_value, new_value, reason, chapter)
+                VALUES (?, ?, ?, ?, ?, ?)
+            """, (
+                change.entity_id,
+                change.field,
+                change.old_value,
+                change.new_value,
+                change.reason,
+                change.chapter
+            ))
+            conn.commit()
+            return cursor.lastrowid
+
+    def get_entity_state_changes(self, entity_id: str, limit: int = 20) -> List[Dict]:
+        """Get an entity's state change history"""
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+            cursor.execute("""
+                SELECT * FROM state_changes
+                WHERE entity_id = ?
+                ORDER BY chapter DESC, id DESC
+                LIMIT ?
+            """, (entity_id, limit))
+            return [dict(row) for row in cursor.fetchall()]
+
+    def get_recent_state_changes(self, limit: int = 50) -> List[Dict]:
+        """Get the most recent state changes"""
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+            cursor.execute("""
+                SELECT * FROM state_changes
+                ORDER BY chapter DESC, id DESC
+                LIMIT ?
+            """, (limit,))
+            return [dict(row) for row in cursor.fetchall()]
+
+    def get_chapter_state_changes(self, chapter: int) -> List[Dict]:
+        """Get all state changes in a given chapter"""
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+            cursor.execute("""
+                SELECT * FROM state_changes
+                WHERE chapter = ?
+                ORDER BY id
+            """, (chapter,))
+            return [dict(row) for row in cursor.fetchall()]
+
+    # ==================== v5.1 relationship operations ====================
+
+    def upsert_relationship(self, rel: RelationshipMeta) -> bool:
+        """
+        Insert or update a relationship
+
+        A matching (from, to, type) triple updates description and chapter
+        Returns whether the relationship is new
+        """
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+
+            # Check whether the relationship already exists
+            cursor.execute("""
+                SELECT id FROM relationships
+                WHERE from_entity = ? AND to_entity = ? AND type = ?
+            """, (rel.from_entity, rel.to_entity, rel.type))
+            existing = cursor.fetchone()
+
+            if existing:
+                cursor.execute("""
+                    UPDATE relationships SET
+                        description = ?,
+                        chapter = ?
+                    WHERE id = ?
+                """, (rel.description, rel.chapter, existing["id"]))
+                conn.commit()
+                return False
+            else:
+                cursor.execute("""
+                    INSERT INTO relationships
+                    (from_entity, to_entity, type, description, chapter)
+                    VALUES (?, ?, ?, ?, ?)
+                """, (
+                    rel.from_entity,
+                    rel.to_entity,
+                    rel.type,
+                    rel.description,
+                    rel.chapter
+                ))
+                conn.commit()
+                return True
+
+    def get_entity_relationships(self, entity_id: str, direction: str = "both") -> List[Dict]:
+        """
+        Get an entity's relationships
+
+        direction: "from" | "to" | "both"
+        """
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+
+            if direction == "from":
+                cursor.execute("""
+                    SELECT * FROM relationships WHERE from_entity = ?
+                    ORDER BY chapter DESC
+                """, (entity_id,))
+            elif direction == "to":
+                cursor.execute("""
+                    SELECT * FROM relationships WHERE to_entity = ?
+                    ORDER BY chapter DESC
+                """, (entity_id,))
+            else:  # both
+                cursor.execute("""
+                    SELECT * FROM relationships
+                    WHERE from_entity = ? OR to_entity = ?
+                    ORDER BY chapter DESC
+                """, (entity_id, entity_id))
+
+            return [dict(row) for row in cursor.fetchall()]
+
+    def get_relationship_between(self, entity1: str, entity2: str) -> List[Dict]:
+        """Get all relationships between two entities (in either direction)"""
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+            cursor.execute("""
+                SELECT * FROM relationships
+                WHERE (from_entity = ? AND to_entity = ?)
+                   OR (from_entity = ? AND to_entity = ?)
+                ORDER BY chapter DESC
+            """, (entity1, entity2, entity2, entity1))
+            return [dict(row) for row in cursor.fetchall()]
+
+    def get_recent_relationships(self, limit: int = 30) -> List[Dict]:
+        """Get the most recently established relationships"""
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+            cursor.execute("""
+                SELECT * FROM relationships
+                ORDER BY chapter DESC, id DESC
+                LIMIT ?
+            """, (limit,))
+            return [dict(row) for row in cursor.fetchall()]
+
     # ==================== Batch operations ====================
 
     def process_chapter_data(
@@ -361,16 +848,38 @@ class IndexManager:
             scenes = cursor.fetchone()[0]
 
             cursor.execute("SELECT COUNT(DISTINCT entity_id) FROM appearances")
-            entities = cursor.fetchone()[0]
+            appearances = cursor.fetchone()[0]
 
             cursor.execute("SELECT MAX(chapter) FROM chapters")
             max_chapter = cursor.fetchone()[0] or 0
 
+            # Statistics new in v5.1
+            cursor.execute("SELECT COUNT(*) FROM entities")
+            entities = cursor.fetchone()[0]
+
+            cursor.execute("SELECT COUNT(*) FROM entities WHERE is_archived = 0")
+            active_entities = cursor.fetchone()[0]
+
+            cursor.execute("SELECT COUNT(*) FROM aliases")
+            aliases = cursor.fetchone()[0]
+
+            cursor.execute("SELECT COUNT(*) FROM state_changes")
+            state_changes = cursor.fetchone()[0]
+
+            cursor.execute("SELECT COUNT(*) FROM relationships")
+            relationships = cursor.fetchone()[0]
+
             return {
                 "chapters": chapters,
                 "scenes": scenes,
+                "appearances": appearances,
+                "max_chapter": max_chapter,
+                # New in v5.1
                 "entities": entities,
-                "max_chapter": max_chapter
+                "active_entities": active_entities,
+                "aliases": aliases,
+                "state_changes": state_changes,
+                "relationships": relationships
             }
 
 
@@ -379,7 +888,7 @@ class IndexManager:
 def main():
     import argparse
 
-    parser = argparse.ArgumentParser(description="Index Manager CLI")
+    parser = argparse.ArgumentParser(description="Index Manager CLI (v5.1)")
     parser.add_argument("--project-root", type=str, help="项目根目录")
 
     subparsers = parser.add_subparsers(dest="command")
@@ -414,6 +923,59 @@ def main():
     process_parser.add_argument("--entities", required=True, help="JSON 格式的实体列表")
     process_parser.add_argument("--scenes", required=True, help="JSON 格式的场景列表")
 
+    # ==================== Commands added in v5.1 ====================
+
+    # Get an entity
+    get_entity_parser = subparsers.add_parser("get-entity")
+    get_entity_parser.add_argument("--id", required=True, help="实体 ID")
+
+    # Get the core entities
+    subparsers.add_parser("get-core-entities")
+
+    # Get the protagonist
+    subparsers.add_parser("get-protagonist")
+
+    # Get entities by type
+    type_parser = subparsers.add_parser("get-entities-by-type")
+    type_parser.add_argument("--type", required=True, help="实体类型 (角色/地点/物品/势力/招式)")
+    type_parser.add_argument("--include-archived", action="store_true")
+
+    # Look up entities by alias
+    alias_parser = subparsers.add_parser("get-by-alias")
+    alias_parser.add_argument("--alias", required=True, help="别名")
+
+    # Get an entity's aliases
+    aliases_parser = subparsers.add_parser("get-aliases")
+    aliases_parser.add_argument("--entity", required=True, help="实体 ID")
+
+    # Register an alias
+    reg_alias_parser = subparsers.add_parser("register-alias")
+    reg_alias_parser.add_argument("--alias", required=True)
+    reg_alias_parser.add_argument("--entity", required=True)
+    reg_alias_parser.add_argument("--type", required=True, help="实体类型")
+
+    # Get an entity's relationships
+    rel_parser = subparsers.add_parser("get-relationships")
+    rel_parser.add_argument("--entity", required=True)
+    rel_parser.add_argument("--direction", choices=["from", "to", "both"], default="both")
+
+    # Get state changes
+    changes_parser = subparsers.add_parser("get-state-changes")
+    changes_parser.add_argument("--entity", required=True)
+    changes_parser.add_argument("--limit", type=int, default=20)
+
+    # Write an entity
+    upsert_entity_parser = subparsers.add_parser("upsert-entity")
+    upsert_entity_parser.add_argument("--data", required=True, help="JSON 格式的实体数据")
+
+    # Write a relationship
+    upsert_rel_parser = subparsers.add_parser("upsert-relationship")
+    upsert_rel_parser.add_argument("--data", required=True, help="JSON 格式的关系数据")
+
+    # Write a state change
+    state_change_parser = subparsers.add_parser("record-state-change")
+    state_change_parser.add_argument("--data", required=True, help="JSON 格式的状态变化数据")
+
+
     args = parser.parse_args()
 
     # 初始化
@@ -466,6 +1028,101 @@ def main():
         print(f"✓ 已处理第 {args.chapter} 章")
         print(f"  章节: {stats['chapters']}, 场景: {stats['scenes']}, 出场记录: {stats['appearances']}")
 
+    # ==================== Handlers for the commands added in v5.1 ====================
+
+    elif args.command == "get-entity":
+        entity = manager.get_entity(args.id)
+        if entity:
+            print(json.dumps(entity, ensure_ascii=False, indent=2))
+        else:
+            print(f"未找到实体: {args.id}")
+
+    elif args.command == "get-core-entities":
+        entities = manager.get_core_entities()
+        print(json.dumps(entities, ensure_ascii=False, indent=2))
+
+    elif args.command == "get-protagonist":
+        protagonist = manager.get_protagonist()
+        if protagonist:
+            print(json.dumps(protagonist, ensure_ascii=False, indent=2))
+        else:
+            print("未设置主角")
+
+    elif args.command == "get-entities-by-type":
+        entities = manager.get_entities_by_type(args.type, args.include_archived)
+        print(json.dumps(entities, ensure_ascii=False, indent=2))
+
+    elif args.command == "get-by-alias":
+        entities = manager.get_entities_by_alias(args.alias)
+        if entities:
+            print(json.dumps(entities, ensure_ascii=False, indent=2))
+        else:
+            print(f"未找到别名: {args.alias}")
+
+    elif args.command == "get-aliases":
+        aliases = manager.get_entity_aliases(args.entity)
+        if aliases:
+            print(f"{args.entity} 的别名: {', '.join(aliases)}")
+        else:
+            print(f"{args.entity} 没有别名")
+
+    elif args.command == "register-alias":
+        success = manager.register_alias(args.alias, args.entity, args.type)
+        if success:
+            print(f"✓ 已注册别名: {args.alias} → {args.entity} ({args.type})")
+        else:
+            print(f"别名已存在或注册失败: {args.alias}")
+
+    elif args.command == "get-relationships":
+        rels = manager.get_entity_relationships(args.entity, args.direction)
+        print(json.dumps(rels, ensure_ascii=False, indent=2))
+
+    elif args.command == "get-state-changes":
+        changes = manager.get_entity_state_changes(args.entity, args.limit)
+        print(json.dumps(changes, ensure_ascii=False, indent=2))
+
+    elif args.command == "upsert-entity":
+        data = json.loads(args.data)
+        entity = EntityMeta(
+            id=data["id"],
+            type=data["type"],
+            canonical_name=data["canonical_name"],
+            tier=data.get("tier", "装饰"),
+            desc=data.get("desc", ""),
+            current=data.get("current", {}),
+            first_appearance=data.get("first_appearance", 0),
+            last_appearance=data.get("last_appearance", 0),
+            is_protagonist=data.get("is_protagonist", False),
+            is_archived=data.get("is_archived", False)
+        )
+        is_new = manager.upsert_entity(entity)
+        print(f"✓ {'新建' if is_new else '更新'}实体: {entity.id}")
+
+    elif args.command == "upsert-relationship":
+        data = json.loads(args.data)
+        rel = RelationshipMeta(
+            from_entity=data["from_entity"],
+            to_entity=data["to_entity"],
+            type=data["type"],
+            description=data.get("description", ""),
+            chapter=data["chapter"]
+        )
+        is_new = manager.upsert_relationship(rel)
+        print(f"✓ {'新建' if is_new else '更新'}关系: {rel.from_entity} → {rel.to_entity} ({rel.type})")
+
+    elif args.command == "record-state-change":
+        data = json.loads(args.data)
+        change = StateChangeMeta(
+            entity_id=data["entity_id"],
+            field=data["field"],
+            old_value=data.get("old_value", ""),
+            new_value=data["new_value"],
+            reason=data.get("reason", ""),
+            chapter=data["chapter"]
+        )
+        record_id = manager.record_state_change(change)
+        print(f"✓ 已记录状态变化 #{record_id}: {change.entity_id}.{change.field}")
+
 
 
 if __name__ == "__main__":
     main()
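
The counters that `get_stats` gains in this hunk can be tried out against a scratch database. The following is a minimal sketch, not the project's code: the table names and the `is_archived` column come from the diff, while the remaining columns, the helper names `collect_v51_stats` and `build_demo_db`, and the sample rows are assumptions made for illustration.

```python
import sqlite3


def collect_v51_stats(conn: sqlite3.Connection) -> dict:
    """Run the v5.1 counting queries shown in the diff and return them as a dict."""
    queries = {
        "entities": "SELECT COUNT(*) FROM entities",
        "active_entities": "SELECT COUNT(*) FROM entities WHERE is_archived = 0",
        "aliases": "SELECT COUNT(*) FROM aliases",
        "state_changes": "SELECT COUNT(*) FROM state_changes",
        "relationships": "SELECT COUNT(*) FROM relationships",
    }
    return {name: conn.execute(sql).fetchone()[0] for name, sql in queries.items()}


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical, minimal schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE entities (id TEXT PRIMARY KEY, is_archived INTEGER DEFAULT 0);
        CREATE TABLE aliases (alias TEXT, entity_id TEXT, entity_type TEXT);
        CREATE TABLE state_changes (id INTEGER PRIMARY KEY, entity_id TEXT);
        CREATE TABLE relationships (from_entity TEXT, to_entity TEXT, type TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO entities (id, is_archived) VALUES (?, ?)",
        [("xiaoyan", 0), ("yaolao", 0), ("retired_npc", 1)],
    )
    conn.executemany(
        "INSERT INTO aliases (alias, entity_id, entity_type) VALUES (?, ?, ?)",
        [("萧炎", "xiaoyan", "角色"), ("小炎子", "xiaoyan", "角色")],
    )
    conn.execute("INSERT INTO state_changes (entity_id) VALUES ('xiaoyan')")
    conn.execute("INSERT INTO relationships VALUES ('xiaoyan', 'yaolao', '师徒')")
    conn.commit()
    return conn


stats = collect_v51_stats(build_demo_db())
print(stats)
```

Archived rows still count toward `entities` but not toward `active_entities`, so the two counters diverge as soon as anything is archived.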

+ 358 - 0
.claude/scripts/data_modules/migrate_state_to_sqlite.py

@@ -0,0 +1,358 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+migrate_state_to_sqlite.py - data migration script (v5.1)
+
+Moves the large data sets in state.json into SQLite (index.db):
+- entities_v3 → entities table
+- alias_index → aliases table
+- state_changes → state_changes table
+- structured_relationships → relationships table
+
+After migration, state.json keeps only slim data (target: < 5KB):
+- progress
+- protagonist_state
+- strand_tracker
+- disambiguation_warnings/pending
+- project_info
+- world_settings (skeleton only)
+- plot_threads
+- relationships (simplified)
+- review_checkpoints
+
+Usage:
+    python -m data_modules.migrate_state_to_sqlite --project-root "D:/wk/斗破苍穹"
+    python -m data_modules.migrate_state_to_sqlite --project-root "." --dry-run
+    python -m data_modules.migrate_state_to_sqlite --project-root "." --backup
+"""
+
+import json
+import shutil
+from pathlib import Path
+from datetime import datetime
+from typing import Dict
+
+from .config import DataModulesConfig
+from .sql_state_manager import SQLStateManager, EntityData
+
+
+def migrate_state_to_sqlite(
+    config: DataModulesConfig,
+    dry_run: bool = False,
+    backup: bool = True,
+    verbose: bool = True
+) -> Dict[str, int]:
+    """
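+    Run the migration.
+
+    Parameters:
+    - config: configuration object
+    - dry_run: analyze only, write nothing
+    - backup: back up state.json before migrating
+    - verbose: print detailed progress
+
+    Returns: migration statistics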
+    """
+    stats = {
+        "entities": 0,
+        "aliases": 0,
+        "state_changes": 0,
+        "relationships": 0,
+        "skipped": 0,
+        "errors": 0
+    }
+
+    # Read state.json
+    state_file = config.state_file
+    if not state_file.exists():
+        if verbose:
+            print(f"❌ state.json 不存在: {state_file}")
+        return stats
+
+    with open(state_file, 'r', encoding='utf-8') as f:
+        state = json.load(f)
+
+    if verbose:
+        file_size = state_file.stat().st_size / 1024
+        print(f"📄 读取 state.json ({file_size:.1f} KB)")
+
+    # Back up
+    if backup and not dry_run:
+        backup_file = state_file.with_suffix(f".json.backup-{datetime.now().strftime('%Y%m%d_%H%M%S')}")
+        shutil.copy(state_file, backup_file)
+        if verbose:
+            print(f"💾 已备份到: {backup_file}")
+
+    # Initialize SQLStateManager
+    sql_manager = SQLStateManager(config)
+
+    # 1. Migrate entities_v3
+    entities_v3 = state.get("entities_v3", {})
+    if verbose:
+        print(f"\n🔄 迁移 entities_v3...")
+
+    for entity_type, entities in entities_v3.items():
+        if not isinstance(entities, dict):
+            continue
+
+        for entity_id, entity_data in entities.items():
+            if not isinstance(entity_data, dict):
+                stats["skipped"] += 1
+                continue
+
+            try:
+                entity = EntityData(
+                    id=entity_id,
+                    type=entity_type,
+                    name=entity_data.get("canonical_name", entity_data.get("name", entity_id)),
+                    tier=entity_data.get("tier", "装饰"),
+                    desc=entity_data.get("desc", ""),
+                    current=entity_data.get("current", {}),
+                    aliases=[],  # aliases are migrated separately, from alias_index
+                    first_appearance=entity_data.get("first_appearance", 0),
+                    last_appearance=entity_data.get("last_appearance", 0),
+                    is_protagonist=entity_data.get("is_protagonist", False)
+                )
+
+                if not dry_run:
+                    sql_manager.upsert_entity(entity)
+                stats["entities"] += 1
+
+                if verbose and stats["entities"] % 50 == 0:
+                    print(f"  已迁移 {stats['entities']} 个实体...")
+
+            except Exception as e:
+                stats["errors"] += 1
+                if verbose:
+                    print(f"  ⚠️ 实体迁移失败 {entity_id}: {e}")
+
+    if verbose:
+        print(f"  ✅ 实体: {stats['entities']} 个")
+
+    # 2. Migrate alias_index
+    alias_index = state.get("alias_index", {})
+    if verbose:
+        print(f"\n🔄 迁移 alias_index...")
+
+    for alias, entries in alias_index.items():
+        if not isinstance(entries, list):
+            continue
+
+        for entry in entries:
+            if not isinstance(entry, dict):
+                stats["skipped"] += 1
+                continue
+
+            entity_id = entry.get("id")
+            entity_type = entry.get("type")
+            if not entity_id or not entity_type:
+                stats["skipped"] += 1
+                continue
+
+            try:
+                if not dry_run:
+                    sql_manager.register_alias(alias, entity_id, entity_type)
+                stats["aliases"] += 1
+
+            except Exception as e:
+                stats["errors"] += 1
+                if verbose:
+                    print(f"  ⚠️ 别名迁移失败 {alias}: {e}")
+
+    if verbose:
+        print(f"  ✅ 别名: {stats['aliases']} 个")
+
+    # 3. Migrate state_changes
+    state_changes = state.get("state_changes", [])
+    if verbose:
+        print(f"\n🔄 迁移 state_changes...")
+
+    for change in state_changes:
+        if not isinstance(change, dict):
+            stats["skipped"] += 1
+            continue
+
+        try:
+            entity_id = change.get("entity_id", "")
+            if not entity_id:
+                stats["skipped"] += 1
+                continue
+
+            if not dry_run:
+                sql_manager.record_state_change(
+                    entity_id=entity_id,
+                    field=change.get("field", ""),
+                    old_value=change.get("old", change.get("old_value", "")),
+                    new_value=change.get("new", change.get("new_value", "")),
+                    reason=change.get("reason", ""),
+                    chapter=change.get("chapter", 0)
+                )
+            stats["state_changes"] += 1
+
+        except Exception as e:
+            stats["errors"] += 1
+            if verbose:
+                print(f"  ⚠️ 状态变化迁移失败: {e}")
+
+    if verbose:
+        print(f"  ✅ 状态变化: {stats['state_changes']} 条")
+
+    # 4. Migrate structured_relationships
+    relationships = state.get("structured_relationships", [])
+    if verbose:
+        print(f"\n🔄 迁移 structured_relationships...")
+
+    for rel in relationships:
+        if not isinstance(rel, dict):
+            stats["skipped"] += 1
+            continue
+
+        try:
+            from_entity = rel.get("from", rel.get("from_entity", ""))
+            to_entity = rel.get("to", rel.get("to_entity", ""))
+            if not from_entity or not to_entity:
+                stats["skipped"] += 1
+                continue
+
+            if not dry_run:
+                sql_manager.upsert_relationship(
+                    from_entity=from_entity,
+                    to_entity=to_entity,
+                    type=rel.get("type", "相识"),
+                    description=rel.get("description", ""),
+                    chapter=rel.get("chapter", 0)
+                )
+            stats["relationships"] += 1
+
+        except Exception as e:
+            stats["errors"] += 1
+            if verbose:
+                print(f"  ⚠️ 关系迁移失败: {e}")
+
+    if verbose:
+        print(f"  ✅ 关系: {stats['relationships']} 条")
+
+    # 5. Slim down state.json (drop the migrated fields)
+    if not dry_run:
+        if verbose:
+            print(f"\n🔄 精简 state.json...")
+
+        # Fields to keep
+        slim_state = {
+            "project_info": state.get("project_info", {}),
+            "progress": state.get("progress", {}),
+            "protagonist_state": state.get("protagonist_state", {}),
+            "strand_tracker": state.get("strand_tracker", {}),
+            "world_settings": _slim_world_settings(state.get("world_settings", {})),
+            "plot_threads": state.get("plot_threads", {}),
+            "relationships": _slim_relationships(state.get("relationships", {})),
+            "review_checkpoints": state.get("review_checkpoints", [])[-10:],  # keep only the 10 most recent
+            "disambiguation_warnings": state.get("disambiguation_warnings", [])[-20:],
+            "disambiguation_pending": state.get("disambiguation_pending", [])[-10:],
+            # v5.1 markers
+            "_migrated_to_sqlite": True,
+            "_migration_timestamp": datetime.now().isoformat()
+        }
+
+        with open(state_file, 'w', encoding='utf-8') as f:
+            json.dump(slim_state, f, ensure_ascii=False, indent=2)
+
+        new_size = state_file.stat().st_size / 1024
+        if verbose:
+            print(f"  ✅ 精简后: {new_size:.1f} KB")
+
+    # Print the statistics
+    if verbose:
+        print(f"\n" + "=" * 50)
+        print(f"📊 迁移统计:")
+        print(f"  实体: {stats['entities']}")
+        print(f"  别名: {stats['aliases']}")
+        print(f"  状态变化: {stats['state_changes']}")
+        print(f"  关系: {stats['relationships']}")
+        print(f"  跳过: {stats['skipped']}")
+        print(f"  错误: {stats['errors']}")
+        if dry_run:
+            print(f"\n⚠️ 这是 dry-run 模式,实际未写入任何数据")
+
+    return stats
+
+
+def _slim_world_settings(world_settings: Dict) -> Dict:
+    """Slim down world_settings, keeping only the skeleton."""
+    if not isinstance(world_settings, dict):
+        return {}
+
+    slim = {}
+
+    # power_system: keep only the level names
+    power_system = world_settings.get("power_system", [])
+    if isinstance(power_system, list):
+        slim["power_system"] = [
+            p.get("name") if isinstance(p, dict) else p
+            for p in power_system[:20]  # at most 20 levels
+        ]
+
+    # factions: keep only name and type
+    factions = world_settings.get("factions", [])
+    if isinstance(factions, list):
+        slim["factions"] = [
+            {"name": f.get("name"), "type": f.get("type")}
+            if isinstance(f, dict) else f
+            for f in factions[:30]  # at most 30 factions
+        ]
+
+    # locations: keep only the names
+    locations = world_settings.get("locations", [])
+    if isinstance(locations, list):
+        slim["locations"] = [
+            loc.get("name") if isinstance(loc, dict) else loc
+            for loc in locations[:50]  # at most 50 locations
+        ]
+
+    return slim
+
+
+def _slim_relationships(relationships: Dict) -> Dict:
+    """Return relationships unchanged (no slimming is applied yet)."""
+    if not isinstance(relationships, dict):
+        return {}
+
+    # The dict is kept as-is, with no extra slimming,
+    # because this field is expected to stay fairly small
+    return relationships
+
+
+def main():
+    import argparse
+
+    parser = argparse.ArgumentParser(description="迁移 state.json 到 SQLite (v5.1)")
+    parser.add_argument("--project-root", type=str, required=True, help="项目根目录")
+    parser.add_argument("--dry-run", action="store_true", help="只分析不实际写入")
+    parser.add_argument("--backup", action="store_true", default=True, help="迁移前备份")
+    parser.add_argument("--no-backup", action="store_true", help="不备份")
+    parser.add_argument("--quiet", action="store_true", help="安静模式")
+
+    args = parser.parse_args()
+
+    config = DataModulesConfig.from_project_root(args.project_root)
+    backup = not args.no_backup
+
+    print(f"🚀 开始迁移 state.json → SQLite")
+    print(f"   项目: {config.project_root}")
+    print(f"   state.json: {config.state_file}")
+    print(f"   index.db: {config.index_db}")
+    print()
+
+    stats = migrate_state_to_sqlite(
+        config=config,
+        dry_run=args.dry_run,
+        backup=backup,
+        verbose=not args.quiet
+    )
+
+    if stats["errors"] > 0:
+        raise SystemExit(1)
+
+
+if __name__ == "__main__":
+    main()
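
Step 5 of the migration rewrites `state.json` in place with `open(state_file, 'w')`, so an interruption in the middle of the write could leave a truncated file (the default backup covers this, `--no-backup` does not). One possible hardening, which is not what the commit does, is to write to a temporary file in the same directory and swap it in with `os.replace`. The sketch below assumes a hypothetical helper name `write_json_atomically` and a hypothetical `current_chapter` key in the sample payload.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomically(path: Path, payload: dict) -> None:
    """Write payload as JSON to a temp file next to path, then swap it in."""
    path = Path(path)
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(payload, f, ensure_ascii=False, indent=2)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the swap
        os.replace(tmp_name, path)  # replaces the target in one step
    except BaseException:
        # Clean up the temp file if anything failed before the swap
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


with tempfile.TemporaryDirectory() as workdir:
    target = Path(workdir) / "state.json"
    target.write_text('{"entities_v3": {}}', encoding="utf-8")
    write_json_atomically(
        target,
        {"progress": {"current_chapter": 21}, "_migrated_to_sqlite": True},
    )
    reloaded = json.loads(target.read_text(encoding="utf-8"))
    leftover = sorted(p.name for p in Path(workdir).iterdir())
```

The temporary file is created in the target's own directory because `os.replace` is only atomic when source and destination are on the same filesystem.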

+ 532 - 0
.claude/scripts/data_modules/sql_state_manager.py

@@ -0,0 +1,532 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
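+SQL State Manager - SQLite state management module (v5.1)
+
+Built on IndexManager, it provides a high-level interface compatible with StateManager
+and stores large data (entities, aliases, state changes, relationships) in SQLite instead of JSON.
+
+Goals:
+- Replace the large data fields in state.json
+- Stay interface-compatible with the Data Agent / Context Agent
+- Support incremental writes and on-demand queries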
+"""
+
+import json
+from typing import Dict, List, Optional, Any
+from dataclasses import dataclass, field
+from datetime import datetime
+
+from .index_manager import (
+    IndexManager,
+    EntityMeta,
+    StateChangeMeta,
+    RelationshipMeta
+)
+from .config import get_config
+
+
+@dataclass
+class EntityData:
+    """Entity data (used as Data Agent input)."""
+    id: str
+    type: str  # 角色/地点/物品/势力/招式 (character/location/item/faction/technique)
+    name: str
+    tier: str = "装饰"
+    desc: str = ""
+    current: Dict[str, Any] = field(default_factory=dict)
+    aliases: List[str] = field(default_factory=list)
+    first_appearance: int = 0
+    last_appearance: int = 0
+    is_protagonist: bool = False
+
+
+class SQLStateManager:
+    """
+    SQLite state manager (v5.1)
+
+    Offers a StateManager-compatible interface, but stores its data in SQLite (index.db).
+    It replaces the bloated data structures in state.json.
+
+    Usage:
+    ```python
+    manager = SQLStateManager(config)
+
+    # Write an entity
+    manager.upsert_entity(EntityData(
+        id="xiaoyan",
+        type="角色",
+        name="萧炎",
+        tier="核心",
+        current={"realm": "斗师", "location": "天云宗"},
+        aliases=["小炎子", "废柴"],
+        is_protagonist=True
+    ))
+
+    # Record a state change
+    manager.record_state_change(
+        entity_id="xiaoyan",
+        field="realm",
+        old_value="斗者",
+        new_value="斗师",
+        reason="闭关突破",
+        chapter=100
+    )
+
+    # Write a relationship
+    manager.upsert_relationship(
+        from_entity="xiaoyan",
+        to_entity="yaolao",
+        type="师徒",
+        description="药老收萧炎为徒",
+        chapter=5
+    )
+
+    # Read
+    protagonist = manager.get_protagonist()
+    core_entities = manager.get_core_entities()
+    changes = manager.get_recent_state_changes(limit=50)
+    ```
+    """
+
+    # Entity types supported since v5.0
+    ENTITY_TYPES = ["角色", "地点", "物品", "势力", "招式"]
+
+    def __init__(self, config=None):
+        self.config = config or get_config()
+        # Pass the resolved config so both managers use the same settings
+        self._index_manager = IndexManager(self.config)
+
+    # ==================== Entity operations ====================
+
+    def upsert_entity(self, entity: EntityData) -> bool:
+        """
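+        Insert or update an entity.
+
+        Handled automatically:
+        - basic entity data is written to the entities table
+        - aliases are written to the aliases table
+        - canonical_name is also registered as an alias
+
+        Returns: whether the entity is new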
+        """
+        # Build the EntityMeta
+        meta = EntityMeta(
+            id=entity.id,
+            type=entity.type,
+            canonical_name=entity.name,
+            tier=entity.tier,
+            desc=entity.desc,
+            current=entity.current,
+            first_appearance=entity.first_appearance,
+            last_appearance=entity.last_appearance,
+            is_protagonist=entity.is_protagonist,
+            is_archived=False
+        )
+
+        is_new = self._index_manager.upsert_entity(meta)
+
+        # Register aliases
+        # 1. canonical_name itself counts as an alias
+        self._index_manager.register_alias(entity.name, entity.id, entity.type)
+
+        # 2. The remaining aliases
+        for alias in entity.aliases:
+            if alias and alias != entity.name:
+                self._index_manager.register_alias(alias, entity.id, entity.type)
+
+        return is_new
+
+    def get_entity(self, entity_id: str) -> Optional[Dict]:
+        """Get an entity's details."""
+        entity = self._index_manager.get_entity(entity_id)
+        if entity:
+            # Attach the aliases
+            entity["aliases"] = self._index_manager.get_entity_aliases(entity_id)
+        return entity
+
+    def get_entities_by_type(self, entity_type: str, include_archived: bool = False) -> List[Dict]:
+        """Get entities by type."""
+        entities = self._index_manager.get_entities_by_type(entity_type, include_archived)
+        for e in entities:
+            e["aliases"] = self._index_manager.get_entity_aliases(e["id"])
+        return entities
+
+    def get_core_entities(self) -> List[Dict]:
+        """
+        Get the core entities (loaded in full by the Context Agent).
+
+        Returns every entity with tier=核心/重要 or is_protagonist=1.
+        """
+        entities = self._index_manager.get_core_entities()
+        for e in entities:
+            e["aliases"] = self._index_manager.get_entity_aliases(e["id"])
+        return entities
+
+    def get_protagonist(self) -> Optional[Dict]:
+        """Get the protagonist entity."""
+        protagonist = self._index_manager.get_protagonist()
+        if protagonist:
+            protagonist["aliases"] = self._index_manager.get_entity_aliases(protagonist["id"])
+        return protagonist
+
+    def update_entity_current(self, entity_id: str, updates: Dict) -> bool:
+        """Incrementally update an entity's current field."""
+        return self._index_manager.update_entity_current(entity_id, updates)
+
+    def resolve_alias(self, alias: str) -> List[Dict]:
+        """
+        Resolve an alias to entities (one-to-many).
+
+        Returns every matching entity.
+        """
+        return self._index_manager.get_entities_by_alias(alias)
+
+    def register_alias(self, alias: str, entity_id: str, entity_type: str) -> bool:
+        """Register an alias."""
+        return self._index_manager.register_alias(alias, entity_id, entity_type)
+
+    # ==================== State change operations ====================
+
+    def record_state_change(
+        self,
+        entity_id: str,
+        field: str,
+        old_value: Any,
+        new_value: Any,
+        reason: str,
+        chapter: int
+    ) -> int:
+        """
+        Record a state change.
+
+        Returns: the record ID
+        """
+        change = StateChangeMeta(
+            entity_id=entity_id,
+            field=field,
+            old_value=str(old_value) if old_value is not None else "",
+            new_value=str(new_value),
+            reason=reason,
+            chapter=chapter
+        )
+        return self._index_manager.record_state_change(change)
+
+    def get_entity_state_changes(self, entity_id: str, limit: int = 20) -> List[Dict]:
+        """Get an entity's state change history."""
+        return self._index_manager.get_entity_state_changes(entity_id, limit)
+
+    def get_recent_state_changes(self, limit: int = 50) -> List[Dict]:
+        """Get the most recent state changes."""
+        return self._index_manager.get_recent_state_changes(limit)
+
+    def get_chapter_state_changes(self, chapter: int) -> List[Dict]:
+        """Get all state changes recorded for a chapter."""
+        return self._index_manager.get_chapter_state_changes(chapter)
+
+    # ==================== Relationship operations ====================
+
+    def upsert_relationship(
+        self,
+        from_entity: str,
+        to_entity: str,
+        type: str,
+        description: str,
+        chapter: int
+    ) -> bool:
+        """
+        Insert or update a relationship.
+
+        Returns: whether the relationship is new
+        """
+        rel = RelationshipMeta(
+            from_entity=from_entity,
+            to_entity=to_entity,
+            type=type,
+            description=description,
+            chapter=chapter
+        )
+        return self._index_manager.upsert_relationship(rel)
+
+    def get_entity_relationships(self, entity_id: str, direction: str = "both") -> List[Dict]:
+        """Get an entity's relationships."""
+        return self._index_manager.get_entity_relationships(entity_id, direction)
+
+    def get_relationship_between(self, entity1: str, entity2: str) -> List[Dict]:
+        """Get all relationships between two entities."""
+        return self._index_manager.get_relationship_between(entity1, entity2)
+
+    def get_recent_relationships(self, limit: int = 30) -> List[Dict]:
+        """Get the most recently established relationships."""
+        return self._index_manager.get_recent_relationships(limit)
+
+    # ==================== Batch writes (for the Data Agent) ====================
+
+    def process_chapter_entities(
+        self,
+        chapter: int,
+        entities_appeared: List[Dict],
+        entities_new: List[Dict],
+        state_changes: List[Dict],
+        relationships_new: List[Dict]
+    ) -> Dict[str, int]:
+        """
+        Process one chapter's entity data (main entry point for the Data Agent).
+
+        Parameters:
+        - chapter: chapter number
+        - entities_appeared: existing entities that appear in the chapter
+          [{"id": "xiaoyan", "type": "角色", "mentions": ["萧炎", "他"], "confidence": 0.95}]
+        - entities_new: newly discovered entities
+          [{"suggested_id": "hongyi_girl", "name": "红衣女子", "type": "角色", "tier": "装饰"}]
+        - state_changes: state changes
+          [{"entity_id": "xiaoyan", "field": "realm", "old": "斗者", "new": "斗师", "reason": "突破"}]
+        - relationships_new: new relationships
+          [{"from": "xiaoyan", "to": "hongyi_girl", "type": "相识", "description": "初次见面"}]
+
+        Returns: write statistics
+        """
+        stats = {
+            "entities_updated": 0,
+            "entities_created": 0,
+            "state_changes": 0,
+            "relationships": 0,
+            "aliases": 0
+        }
+
+        # 1. Process entities that appear (update last_appearance)
+        for entity in entities_appeared:
+            entity_id = entity.get("id")
+            if not entity_id:
+                continue
+
+            self._index_manager.update_entity_current(entity_id, {})  # touch updated_at
+            # Update last_appearance
+            existing = self._index_manager.get_entity(entity_id)
+            if existing:
+                # Update last_appearance directly in SQL
+                self._update_last_appearance(entity_id, chapter)
+                stats["entities_updated"] += 1
+
+            # Record the appearance (existing logic, kept as-is)
+            self._index_manager.record_appearance(
+                entity_id=entity_id,
+                chapter=chapter,
+                mentions=entity.get("mentions", []),
+                confidence=entity.get("confidence", 1.0)
+            )
+
+        # 2. Process new entities
+        for entity in entities_new:
+            suggested_id = entity.get("suggested_id") or entity.get("id")
+            if not suggested_id:
+                continue
+
+            entity_data = EntityData(
+                id=suggested_id,
+                type=entity.get("type", "角色"),
+                name=entity.get("name", suggested_id),
+                tier=entity.get("tier", "装饰"),
+                desc=entity.get("desc", ""),
+                current=entity.get("current", {}),
+                aliases=entity.get("aliases", []),
+                first_appearance=chapter,
+                last_appearance=chapter,
+                is_protagonist=entity.get("is_protagonist", False)
+            )
+            is_new = self.upsert_entity(entity_data)
+            if is_new:
+                stats["entities_created"] += 1
+            else:
+                stats["entities_updated"] += 1
+
+            # Count aliases (the canonical name plus the listed aliases)
+            stats["aliases"] += 1 + len(entity_data.aliases)
+
+        # 3. Process state changes
+        for change in state_changes:
+            entity_id = change.get("entity_id")
+            if not entity_id:
+                continue
+
+            self.record_state_change(
+                entity_id=entity_id,
+                field=change.get("field", ""),
+                old_value=change.get("old", change.get("old_value", "")),
+                new_value=change.get("new", change.get("new_value", "")),
+                reason=change.get("reason", ""),
+                chapter=chapter
+            )
+            stats["state_changes"] += 1
+
+            # Keep the entity's current in sync (falsy values such as 0 are still written)
+            field_name = change.get("field")
+            new_value = change.get("new", change.get("new_value"))
+            if field_name and new_value is not None:
+                self._index_manager.update_entity_current(entity_id, {field_name: new_value})
+
+        # 4. Process new relationships
+        for rel in relationships_new:
+            from_entity = rel.get("from", rel.get("from_entity"))
+            to_entity = rel.get("to", rel.get("to_entity"))
+            if not from_entity or not to_entity:
+                continue
+
+            self.upsert_relationship(
+                from_entity=from_entity,
+                to_entity=to_entity,
+                type=rel.get("type", "相识"),
+                description=rel.get("description", ""),
+                chapter=chapter
+            )
+            stats["relationships"] += 1
+
+        return stats
+
+    def _update_last_appearance(self, entity_id: str, chapter: int):
+        """Update an entity's last_appearance (it never moves backwards)."""
+        with self._index_manager._get_conn() as conn:
+            cursor = conn.cursor()
+            cursor.execute("""
+                UPDATE entities SET
+                    last_appearance = MAX(last_appearance, ?),
+                    updated_at = CURRENT_TIMESTAMP
+                WHERE id = ?
+            """, (chapter, entity_id))
+            conn.commit()
+
+    # ==================== Statistics ====================
+
+    def get_stats(self) -> Dict[str, int]:
+        """Get statistics."""
+        return self._index_manager.get_stats()
+
+    # ==================== Format conversion (compatibility) ====================
+
+    def export_to_entities_v3_format(self) -> Dict[str, Dict[str, Dict]]:
+        """
+        Export in the entities_v3 format (for compatibility).
+
+        Returns: {"角色": {"xiaoyan": {...}}, "地点": {...}, ...}
+        """
+        result = {t: {} for t in self.ENTITY_TYPES}
+
+        for entity_type in self.ENTITY_TYPES:
+            entities = self.get_entities_by_type(entity_type, include_archived=True)
+            for e in entities:
+                entity_dict = {
+                    "name": e.get("canonical_name"),
+                    "tier": e.get("tier", "装饰"),
+                    "aliases": e.get("aliases", []),
+                    "desc": e.get("desc", ""),
+                    "current": e.get("current_json", {}),
+                    "history": [],  # not exported; query the state_changes table for history
+                    "first_appearance": e.get("first_appearance", 0),
+                    "last_appearance": e.get("last_appearance", 0)
+                }
+                if e.get("is_protagonist"):
+                    entity_dict["is_protagonist"] = True
+                result[entity_type][e["id"]] = entity_dict
+
+        return result
+
+    def export_to_alias_index_format(self) -> Dict[str, List[Dict[str, str]]]:
+        """
+        Export in alias_index format (for backward compatibility).
+
+        Returns: {"萧炎": [{"type": "角色", "id": "xiaoyan"}], ...}
+        """
+        result = {}
+
+        with self._index_manager._get_conn() as conn:
+            cursor = conn.cursor()
+            cursor.execute("SELECT alias, entity_id, entity_type FROM aliases")
+            for row in cursor.fetchall():
+                alias = row["alias"]
+                if alias not in result:
+                    result[alias] = []
+                result[alias].append({
+                    "type": row["entity_type"],
+                    "id": row["entity_id"]
+                })
+
+        return result
+
+
+# ==================== CLI interface ====================
+
+def main():
+    import argparse
+
+    parser = argparse.ArgumentParser(description="SQL State Manager CLI (v5.1)")
+    parser.add_argument("--project-root", type=str, help="Project root directory")
+
+    subparsers = parser.add_subparsers(dest="command")
+
+    # Show statistics
+    subparsers.add_parser("stats")
+
+    # Get the protagonist
+    subparsers.add_parser("get-protagonist")
+
+    # Get the core entities
+    subparsers.add_parser("get-core-entities")
+
+    # Export in entities_v3 format
+    subparsers.add_parser("export-entities-v3")
+
+    # Export in alias_index format
+    subparsers.add_parser("export-alias-index")
+
+    # Process chapter data
+    process_parser = subparsers.add_parser("process-chapter")
+    process_parser.add_argument("--chapter", type=int, required=True)
+    process_parser.add_argument("--data", required=True, help="Chapter data as a JSON string")
+
+    args = parser.parse_args()
+
+    # Initialize
+    config = None
+    if args.project_root:
+        from .config import DataModulesConfig
+        config = DataModulesConfig.from_project_root(args.project_root)
+
+    manager = SQLStateManager(config)
+
+    if args.command == "stats":
+        stats = manager.get_stats()
+        print(json.dumps(stats, ensure_ascii=False, indent=2))
+
+    elif args.command == "get-protagonist":
+        protagonist = manager.get_protagonist()
+        if protagonist:
+            print(json.dumps(protagonist, ensure_ascii=False, indent=2))
+        else:
+            print("No protagonist set")
+
+    elif args.command == "get-core-entities":
+        entities = manager.get_core_entities()
+        print(json.dumps(entities, ensure_ascii=False, indent=2))
+
+    elif args.command == "export-entities-v3":
+        data = manager.export_to_entities_v3_format()
+        print(json.dumps(data, ensure_ascii=False, indent=2))
+
+    elif args.command == "export-alias-index":
+        data = manager.export_to_alias_index_format()
+        print(json.dumps(data, ensure_ascii=False, indent=2))
+
+    elif args.command == "process-chapter":
+        data = json.loads(args.data)
+        stats = manager.process_chapter_entities(
+            chapter=args.chapter,
+            entities_appeared=data.get("entities_appeared", []),
+            entities_new=data.get("entities_new", []),
+            state_changes=data.get("state_changes", []),
+            relationships_new=data.get("relationships_new", [])
+        )
+        print(f"✓ Processed chapter {args.chapter}")
+        print(json.dumps(stats, ensure_ascii=False, indent=2))
+
+
+if __name__ == "__main__":
+    main()
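
A note on the `_update_last_appearance` query in this file: SQLite's multi-argument `MAX()` returns NULL when any argument is NULL, so a "never move backwards" update needs a guard for rows whose column was never set. The following is a minimal sketch using Python's built-in `sqlite3` module. The two-column `entities` table and the helper name `bump_last_appearance` are assumptions made for illustration and are not part of this commit.

```python
import sqlite3


def bump_last_appearance(conn: sqlite3.Connection, entity_id: str, chapter: int) -> None:
    """Raise last_appearance to `chapter`, never lowering an existing value."""
    conn.execute(
        "UPDATE entities "
        "SET last_appearance = MAX(COALESCE(last_appearance, 0), ?) "
        "WHERE id = ?",
        (chapter, entity_id),
    )


conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table: only the columns this sketch needs.
conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, last_appearance INTEGER)")
conn.executemany(
    "INSERT INTO entities VALUES (?, ?)",
    [("xiaoyan", None), ("yaolao", 12)],
)

# Without a guard, a NULL argument makes the whole expression NULL.
unguarded = conn.execute("SELECT MAX(NULL, 5)").fetchone()[0]

bump_last_appearance(conn, "xiaoyan", 5)  # NULL is treated as 0, so this becomes 5
bump_last_appearance(conn, "yaolao", 5)   # 12 is already larger, so it stays 12
rows = dict(conn.execute("SELECT id, last_appearance FROM entities"))
print(unguarded, rows)
```

If the real schema declares `last_appearance` with `DEFAULT 0` and never stores NULL, the guard is redundant but harmless.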

+ 85 - 4
.claude/scripts/data_modules/state_manager.py

@@ -1,12 +1,16 @@
 #!/usr/bin/env python3
 # -*- coding: utf-8 -*-
 """
-State Manager - state management module
+State Manager - state management module (v5.1)
 
 Manages reads and writes of state.json:
 - Entity state management
 - Progress tracking
 - Relationship records
+
+v5.1 changes:
+- Integrates SQLStateManager and writes to SQLite (index.db) in sync
+- state.json keeps only the slim data; large data is migrated to SQLite automatically
 """
 
 import json
@@ -74,17 +78,34 @@ class _EntityPatch:
 
 
 
 
 class StateManager:
-    """State manager (v5.0 entities_v3 format)"""
+    """State manager (v5.1 entities_v3 format + SQLite sync)"""
 
     # Entity types supported in v5.0
     ENTITY_TYPES = ["角色", "地点", "物品", "势力", "招式"]
 
-    def __init__(self, config=None):
+    def __init__(self, config=None, enable_sqlite_sync: bool = True):
+        """
+        Initialize the state manager.
+
+        Args:
+        - config: configuration object
+        - enable_sqlite_sync: whether to enable SQLite sync (default True)
+        """
         self.config = config or get_config()
         self._state: Dict[str, Any] = {}
         # Matches security_utils.atomic_write_json: state.json.lock
         self._lock_path = self.config.state_file.with_suffix(self.config.state_file.suffix + ".lock")
 
+        # v5.1: SQLite sync
+        self._enable_sqlite_sync = enable_sqlite_sync
+        self._sql_state_manager = None
+        if enable_sqlite_sync:
+            try:
+                from .sql_state_manager import SQLStateManager
+                self._sql_state_manager = SQLStateManager(self.config)
+            except ImportError:
+                pass  # Degrade silently when SQLStateManager is unavailable
+
         # Pending deltas (re-read under the lock + merge + write)
         self._pending_entity_patches: Dict[tuple[str, str], _EntityPatch] = {}
         self._pending_alias_entries: Dict[str, List[Dict[str, str]]] = {}
@@ -95,6 +116,15 @@ class StateManager:
         self._pending_progress_chapter: Optional[int] = None
         self._pending_progress_words_delta: int = 0
 
+        # v5.1: buffer of data waiting to be synced to SQLite
+        self._pending_sqlite_data: Dict[str, Any] = {
+            "entities_appeared": [],
+            "entities_new": [],
+            "state_changes": [],
+            "relationships_new": [],
+            "chapter": None
+        }
+
         self._load_state()
 
     def _now_progress_timestamp(self) -> str:
@@ -424,9 +454,49 @@ class StateManager:
                 self._pending_disambiguation_pending.clear()
                 self._pending_progress_chapter = None
                 self._pending_progress_words_delta = 0
+
+                # v5.1: sync to SQLite
+                self._sync_to_sqlite()
+
         except filelock.Timeout:
             raise RuntimeError("Could not acquire the state.json file lock; please retry later")
 
+    def _sync_to_sqlite(self):
+        """v5.1: sync the pending data to SQLite."""
+        if not self._sql_state_manager:
+            return
+
+        sqlite_data = self._pending_sqlite_data
+        chapter = sqlite_data.get("chapter")
+
+        if chapter is None:
+            # Nothing to sync: clear the buffer and return
+            self._clear_pending_sqlite_data()
+            return
+
+        try:
+            self._sql_state_manager.process_chapter_entities(
+                chapter=chapter,
+                entities_appeared=sqlite_data.get("entities_appeared", []),
+                entities_new=sqlite_data.get("entities_new", []),
+                state_changes=sqlite_data.get("state_changes", []),
+                relationships_new=sqlite_data.get("relationships_new", [])
+            )
+        except Exception:
+            pass  # Degrade silently if the SQLite sync fails; the main flow is unaffected
+        finally:
+            self._clear_pending_sqlite_data()
+
+    def _clear_pending_sqlite_data(self):
+        """Clear the data waiting to be synced to SQLite."""
+        self._pending_sqlite_data = {
+            "entities_appeared": [],
+            "entities_new": [],
+            "state_changes": [],
+            "relationships_new": [],
+            "chapter": None
+        }
+
     # ==================== Progress management ====================
 
     def get_current_chapter(self) -> int:
@@ -794,7 +864,7 @@ class StateManager:
 
 
     def process_chapter_result(self, chapter: int, result: Dict) -> List[str]:
         """
-        Process the Data Agent's chapter result (v5.0)
+        Process the Data Agent's chapter result (v5.1)
 
         Input format:
         - entities_appeared: list of entities that appeared
@@ -806,12 +876,17 @@ class StateManager:
         """
         warnings = []
 
+        # v5.1: record the chapter number for the SQLite sync
+        self._pending_sqlite_data["chapter"] = chapter
+
         # Process entities that appeared
         for entity in result.get("entities_appeared", []):
             entity_id = entity.get("id")
             entity_type = entity.get("type")
             if entity_id:
                 self.update_entity_appearance(entity_id, chapter, entity_type)
+                # v5.1: buffer for the SQLite sync
+                self._pending_sqlite_data["entities_appeared"].append(entity)
 
         # Process new entities
         for entity in result.get("entities_new", []):
@@ -828,6 +903,8 @@ class StateManager:
                 )
                 if not self.add_entity(new_entity):
                     warnings.append(f"Entity already exists: {entity_id}")
+                # v5.1: buffer for the SQLite sync
+                self._pending_sqlite_data["entities_new"].append(entity)
 
         # Process state changes
         for change in result.get("state_changes", []):
@@ -839,6 +916,8 @@ class StateManager:
                 reason=change.get("reason", ""),
                 chapter=chapter
             )
+            # v5.1: buffer for the SQLite sync
+            self._pending_sqlite_data["state_changes"].append(change)
 
         # Process relationships
         for rel in result.get("relationships_new", []):
@@ -849,6 +928,8 @@ class StateManager:
                 description=rel.get("description", ""),
                 chapter=chapter
             )
+            # v5.1: buffer for the SQLite sync
+            self._pending_sqlite_data["relationships_new"].append(rel)
 
         # Record uncertain disambiguation items (does not affect entity writes, but must stay visible to the Writer)
         warnings.extend(self._record_disambiguation(chapter, result.get("uncertain", [])))
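
The state_manager.py changes follow a buffer-then-flush pattern: chapter data is collected while state.json is being updated, handed to SQLite in one call, and the buffer is reset whether or not that call succeeds. A minimal standalone sketch of the pattern is below. The `ChapterBuffer` class, its `sink` callback, and the `last_error` attribute are hypothetical names, not part of this commit. Unlike the committed code, which discards sync failures, the sketch keeps the last exception so a caller could surface it.

```python
from typing import Any, Callable, Dict, Optional


def empty_buffer() -> Dict[str, Any]:
    """Fresh buffer with the same keys the diff uses for _pending_sqlite_data."""
    return {
        "entities_appeared": [],
        "entities_new": [],
        "state_changes": [],
        "relationships_new": [],
        "chapter": None,
    }


class ChapterBuffer:
    """Hypothetical helper: collect per-chapter data, then flush it to a sink."""

    def __init__(self, sink: Optional[Callable[..., Any]] = None):
        self._sink = sink
        self._pending = empty_buffer()
        self.last_error: Optional[Exception] = None

    def record(self, chapter: int, key: str, item: Dict[str, Any]) -> None:
        self._pending["chapter"] = chapter
        self._pending[key].append(item)

    def flush(self) -> bool:
        # Swap the buffer out first so it is reset on every code path.
        pending, self._pending = self._pending, empty_buffer()
        if self._sink is None or pending["chapter"] is None:
            return False
        try:
            self._sink(**pending)
            return True
        except Exception as exc:  # degrade, but remember why
            self.last_error = exc
            return False


received = []
buf = ChapterBuffer(sink=lambda **kw: received.append(kw))
buf.record(3, "entities_appeared", {"id": "xiaoyan", "type": "角色"})
ok = buf.flush()


def failing_sink(**kw):
    raise RuntimeError("database is locked")


bad = ChapterBuffer(sink=failing_sink)
bad.record(4, "state_changes", {"entity_id": "xiaoyan", "field": "realm"})
failed = bad.flush()
```

Swapping the buffer before calling the sink also covers the case where no sink is configured, so the buffer cannot grow across chapters when sync is disabled.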

+ 154 - 37
.claude/skills/webnovel-query/references/system-data-flow.md

@@ -1,7 +1,7 @@
 ---
 name: system-data-flow
 purpose: Loaded during project initialization and state queries; explains the data structures
-version: "5.0"
+version: "5.1"
 ---
 
 <context>
@@ -18,51 +18,77 @@ version: "5.0"
 ├── 大纲/           # Outlines: volume / chapter / scene
 ├── 设定集/         # Setting bible: world / power system / character cards / item cards
 └── .webnovel/
-    ├── state.json          # Authoritative state (entities_v3 + alias_index + progress/protagonist/strand_tracker)
+    ├── state.json          # Slim state (< 5KB): progress/protagonist/strand_tracker/disambiguation
+    ├── index.db            # SQLite primary store: entities/aliases/relationships/state changes/chapters/scenes
     ├── workflow_state.json # Workflow checkpoint (used by /webnovel-resume)
-    ├── index.db            # SQLite index (chapters/entities/aliases/relationships/foreshadowing; rebuildable)
+    ├── vectors.db          # RAG vector database
     └── archive/            # Archived data (inactive characters / resolved foreshadowing)
 ```
 
 
-## v5.0 dual-agent architecture
+## v5.1 architecture changes
+
+**Core change**: fixes state.json bloat (token usage explodes after 20 chapters)
+
+| Data | v5.0 location | v5.1 location |
+|----------|--------------|--------------|
+| entities_v3 | state.json | **index.db** (`entities` table) |
+| alias_index | state.json | **index.db** (`aliases` table) |
+| state_changes | state.json | **index.db** (`state_changes` table) |
+| structured_relationships | state.json | **index.db** (`relationships` table) |
+| progress | state.json | state.json (kept) |
+| protagonist_state | state.json | state.json (kept) |
+| strand_tracker | state.json | state.json (kept) |
+| disambiguation_* | state.json | state.json (kept) |
+
+## v5.1 dual-agent architecture
 
 
 ```
 Before writing: Context Agent reads data → assembles the context pack
+        ├── reads the slim data from state.json (progress/config)
+        └── queries index.db on demand via SQL (entities/relationships)
+
 While writing: Writer uses the context pack to produce plain prose (no XML tags)
+
 After writing: Data Agent processes the prose → AI extracts entities → writes the data chain
+        ├── writes index.db (entities/aliases/state changes/relationships)
+        └── updates state.json (progress/protagonist snapshot)
 
-Context Agent (read) ←→ data store ←→ Data Agent (write)
+Context Agent (read) ←→ index.db + state.json ←→ Data Agent (write)
 ```
 
 
-## Script/module responsibility cheat sheet (v5.0)
+## Script/module responsibility cheat sheet (v5.1)
 
 ### Core scripts
 
 | Script | Input | Output |
 |------|------|------|
-| `init_project.py` | Project info | Generates `.webnovel/state.json`  |
+| `init_project.py` | Project info | Generates `.webnovel/state.json` and initializes `index.db` |
 | `update_state.py` | Arguments | Atomically updates `state.json` fields (progress/protagonist/strand_tracker) |
 | `backup_manager.py` | Chapter number | Automatic Git backup |
 | `status_reporter.py` | None | Generates the health report / foreshadowing urgency |
 | `archive_manager.py` | None | Archives inactive data |
+| `migrate_state_to_sqlite.py` | Project path | Migrates a legacy state.json to SQLite (new in v5.1) |
 
 ### data_modules modules
 
 | Module | Responsibility |
 |------|------|
-| `state_manager.py` | Entity state management (reads/writes entities_v3) |
-| `index_manager.py` | SQLite index management (chapter/entity/scene queries) |
-| `entity_linker.py` | Alias registration and disambiguation (manages alias_index) |
+| `state_manager.py` | Entity state management (slim state.json + SQLite sync) |
+| `sql_state_manager.py` | SQLite state management (new in v5.1, replaces JSON writes) |
+| `index_manager.py` | SQLite index management (entities/aliases/relationships/state changes/chapters/scenes) |
+| `entity_linker.py` | Alias registration and disambiguation |
 | `rag_adapter.py` | Vector embedding and semantic retrieval |
 | `style_sampler.py` | Style sample extraction and management |
 | `api_client.py` | LLM API call wrapper |
 | `config.py` | Configuration management |
 
-## Per-chapter data chain (v5.0 order)
+## Per-chapter data chain (v5.1 order)
 
 ```
 1. Context Agent assembles the context pack
-   → reads outline/state.json/index.db/RAG
+   → reads state.json (slim: progress/config)
+   → queries index.db via SQL (core entities / on-demand entities)
+   → RAG retrieval (related scenes)
    → outputs the context pack as JSON
 
 2. Writer generates the chapter content
@@ -80,8 +106,8 @@ Context Agent (read) ←→ data store ←→ Data Agent (write)
 5. Data Agent processes the data chain
    → AI entity extraction (replaces XML tag parsing)
    → Entity disambiguation (confidence-based policy)
-   → updates state.json (entities_v3 + alias_index + progress/disambiguation records)
-   → updates index.db
+   → writes index.db (entities/aliases/state changes/relationships)
+   → updates state.json (progress/protagonist snapshot)
    → Vector embedding (RAG)
    → Style sample evaluation
 
@@ -90,7 +116,7 @@ Context Agent (read) ←→ data store ←→ Data Agent (write)
 
 > `update_state.py` is for manual or scripted updates of fields such as `progress`/`protagonist_state`/`strand_tracker`; in the main flow the Data Agent usually advances progress while it processes the data chain.
 
-## state.json core fields (v5.0)
+## state.json slim structure (v5.1)
 
 ```json
 {
@@ -102,22 +128,6 @@ Context Agent (read) ←→ data store ←→ Data Agent (write)
     "location": {"current": "", "last_chapter": 0},
     "golden_finger": {"name": "", "level": 1, "skills": []}
   },
-  "entities_v3": {
-    "角色": {"entity_id": {"canonical_name": "", "aliases": [], "tier": "", "current": {}, "history": []}},
-    "地点": {},
-    "物品": {},
-    "势力": {},
-    "招式": {}
-  },
-  "alias_index": {
-    "别名": [{"type": "角色", "id": "entity_id"}]
-  },
-  "relationships": {},
-  "structured_relationships": [],
-  "disambiguation_warnings": [],
-  "disambiguation_pending": [],
-  "plot_threads": {"active_threads": [], "foreshadowing": []},
-  "world_settings": {},
   "strand_tracker": {
     "last_quest_chapter": 0,
     "last_fire_chapter": 0,
@@ -126,10 +136,71 @@ Context Agent (read) ←→ data store ←→ Data Agent (write)
     "chapters_since_switch": 0,
     "history": []
   },
-  "review_checkpoints": []
+  "relationships": {},
+  "plot_threads": {"active_threads": [], "foreshadowing": []},
+  "world_settings": {},
+  "disambiguation_warnings": [],
+  "disambiguation_pending": [],
+  "review_checkpoints": [],
+  "_migrated_to_sqlite": true
 }
 ```
 
+> **v5.1 change**: entities_v3, alias_index, state_changes, and structured_relationships have moved to index.db and are no longer stored in state.json.
+
+## index.db table schema (v5.1)
+
+```sql
+-- Entities table
+CREATE TABLE entities (
+    id TEXT PRIMARY KEY,
+    type TEXT NOT NULL,           -- 角色/地点/物品/势力/招式 (character/location/item/faction/technique)
+    canonical_name TEXT NOT NULL,
+    tier TEXT DEFAULT '装饰',     -- 核心/重要/次要/装饰 (core/important/minor/decorative)
+    desc TEXT,
+    current_json TEXT,            -- JSON: {realm, location, ...}
+    first_appearance INTEGER,
+    last_appearance INTEGER,
+    is_protagonist INTEGER DEFAULT 0,
+    is_archived INTEGER DEFAULT 0
+);
+
+-- Aliases table (one-to-many)
+CREATE TABLE aliases (
+    alias TEXT NOT NULL,
+    entity_id TEXT NOT NULL,
+    entity_type TEXT NOT NULL,
+    PRIMARY KEY (alias, entity_id, entity_type)
+);
+
+-- State changes table
+CREATE TABLE state_changes (
+    id INTEGER PRIMARY KEY AUTOINCREMENT,
+    entity_id TEXT NOT NULL,
+    field TEXT NOT NULL,
+    old_value TEXT,
+    new_value TEXT,
+    reason TEXT,
+    chapter INTEGER NOT NULL
+);
+
+-- Relationships table
+CREATE TABLE relationships (
+    id INTEGER PRIMARY KEY AUTOINCREMENT,
+    from_entity TEXT NOT NULL,
+    to_entity TEXT NOT NULL,
+    type TEXT NOT NULL,
+    description TEXT,
+    chapter INTEGER NOT NULL,
+    UNIQUE(from_entity, to_entity, type)
+);
+
+-- Pre-existing tables (kept)
+CREATE TABLE chapters (...);
+CREATE TABLE scenes (...);
+CREATE TABLE appearances (...);
+```
+
 ## Data Agent AI extraction flow
 
 v5.0 no longer requires XML tags; the Data Agent extracts entities itself:
@@ -180,11 +251,44 @@ cat .webnovel/state.json | jq '.progress'
 </example>
 
 <example>
-<input>Look up an entity's aliases</input>
+<input>v5.1: query an entity (SQL)</input>
+<output>
+```bash
+python -m data_modules.index_manager get-entity --id "xiaoyan" --project-root "."
+# Output: {"id": "xiaoyan", "type": "角色", "canonical_name": "萧炎", ...}
+
+python -m data_modules.index_manager get-core-entities --project-root "."
+# Output: all core entities (protagonist + tier=核心/重要)
+```
+</output>
+</example>
+
+<example>
+<input>v5.1: look up entities by alias (one-to-many)</input>
+<output>
+```bash
+python -m data_modules.index_manager get-by-alias --alias "天云宗" --project-root "."
+# Output: [{"id": "loc_tianyunzong", "type": "地点"}, {"id": "faction_tianyunzong", "type": "势力"}]
+```
+</output>
+</example>
+
+<example>
+<input>v5.1: query state changes</input>
+<output>
+```bash
+python -m data_modules.index_manager get-state-changes --entity "xiaoyan" --limit 10 --project-root "."
+# Output: [{entity_id, field, old_value, new_value, reason, chapter}, ...]
+```
+</output>
+</example>
+
+<example>
+<input>v5.1: query relationships</input>
 <output>
 ```bash
-cat .webnovel/state.json | jq '.alias_index["林天"]'
-# Output: [{"type": "角色", "id": "lintian"}]
+python -m data_modules.index_manager get-relationships --entity "xiaoyan" --project-root "."
+# Output: [{from_entity, to_entity, type, description, chapter}, ...]
 ```
 </output>
 </example>
@@ -207,13 +311,26 @@ python -m data_modules.index_manager entity-appearances --entity "lintian" --pro
 </output>
 </example>
 
+<example>
+<input>v5.1: migrate a legacy state.json to SQLite</input>
+<output>
+```bash
+python -m data_modules.migrate_state_to_sqlite --project-root "." --backup
+# Backs up state.json automatically, migrates the data into index.db, then slims down state.json
+```
+</output>
+</example>
+
 </examples>
 
 <errors>
 ❌ Foreshadowing status written as "待回收" → ✅ use the canonical value "未回收"
 ❌ Manual update forgot to add planted_chapter → ✅ the script now fills it in automatically
 ❌ Confusion about the archive path → ✅ fixed at `.webnovel/archive/*.json`
-❌ Expecting alias_index to hold a single object → ✅ v5.0 uses an array (one-to-many)
-❌ Expecting XML tag extraction → ✅ in v5.0 the Data Agent extracts entities automatically via AI
+❌ Expecting alias_index to hold a single object → ✅ v5.0+ uses an array (one-to-many)
+❌ Expecting XML tag extraction → ✅ in v5.0+ the Data Agent extracts entities automatically via AI
 ❌ Using the legacy data_modules.state_manager schema → ✅ standardize on the entities_v3 structure
+❌ v5.1 still reading entities_v3 from state.json → ✅ query index.db via SQL instead
+❌ v5.1 still writing large data to state.json → ✅ use incremental SQLite writes instead
+❌ v5.1 state.json has bloated → ✅ run the migration script: `python -m data_modules.migrate_state_to_sqlite`
 </errors>
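
The one-to-many alias lookup described in system-data-flow.md can also be tried directly with Python's built-in `sqlite3` module. The sketch below recreates only the `aliases` table from the schema listed in that file and runs it against an in-memory database. The `find_by_alias` helper is a hypothetical name, and the real `get-by-alias` command may be implemented differently.

```python
import sqlite3
from typing import Dict, List


def find_by_alias(conn: sqlite3.Connection, alias: str) -> List[Dict[str, str]]:
    """Return every entity registered under `alias` (one alias may map to several)."""
    rows = conn.execute(
        "SELECT entity_id, entity_type FROM aliases WHERE alias = ? ORDER BY entity_id",
        (alias,),
    ).fetchall()
    return [{"id": entity_id, "type": entity_type} for entity_id, entity_type in rows]


conn = sqlite3.connect(":memory:")
# Same columns and primary key as the aliases table in the documented schema.
conn.execute(
    """
    CREATE TABLE aliases (
        alias TEXT NOT NULL,
        entity_id TEXT NOT NULL,
        entity_type TEXT NOT NULL,
        PRIMARY KEY (alias, entity_id, entity_type)
    )
    """
)
conn.executemany(
    "INSERT INTO aliases VALUES (?, ?, ?)",
    [
        ("天云宗", "loc_tianyunzong", "地点"),
        ("天云宗", "faction_tianyunzong", "势力"),
        ("萧炎", "xiaoyan", "角色"),
    ],
)

matches = find_by_alias(conn, "天云宗")
missing = find_by_alias(conn, "unknown")
print(matches)
```

Because the primary key spans all three columns, the same alias can point at a location and a faction at once, which is why the lookup returns a list rather than a single row.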