
refactor: v5.1 hybrid storage architecture - SQLite for large data

Solve state.json token explosion problem after 20 chapters.

Changes:
- Move entities/aliases/state_changes/relationships to SQLite (index.db)
- Keep state.json slim (<5KB): progress, protagonist_state, strand_tracker
- Add SQLStateManager for high-level SQLite operations
- Add migrate_state_to_sqlite.py for existing projects
- Update Context Agent to use SQL on-demand queries
- Update Data Agent to use SQLite incremental writes
lingfengQAQ 5 months ago
Parent
Commit
e7cd24fa96
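
The migration script itself is not shown in this diff, so the following is only a minimal sketch of the idea behind the commit message: copy the bulky `entities_v3` groups out of a state dict into a SQLite table and keep just the slim keys. The `migrate` function, the `SLIM_KEYS` tuple, the reduced table schema, and the sample data are all assumptions made for illustration, not the contents of `migrate_state_to_sqlite.py`.

```python
import json
import sqlite3

# Hypothetical pre-v5.1 state: key names follow the commit notes, values are made up.
old_state = {
    "progress": {"current_chapter": 20},
    "protagonist_state": {"realm": "斗者"},
    "strand_tracker": {},
    "entities_v3": {
        "角色": {
            "xiaoyan": {"canonical_name": "萧炎", "tier": "核心", "current": {"realm": "斗者"}},
        },
        "地点": {
            "tianyunzong": {"canonical_name": "天云宗", "tier": "重要", "current": {}},
        },
    },
}

# Assumption: only these keys stay in the slim state.json.
SLIM_KEYS = ("progress", "protagonist_state", "strand_tracker")


def migrate(state: dict, conn: sqlite3.Connection) -> dict:
    """Copy entities into SQLite and return the slimmed-down state dict."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entities ("
        "id TEXT PRIMARY KEY, type TEXT NOT NULL, canonical_name TEXT NOT NULL, "
        "tier TEXT DEFAULT '装饰', current_json TEXT)"
    )
    for entity_type, group in state.get("entities_v3", {}).items():
        for entity_id, data in group.items():
            conn.execute(
                "INSERT OR REPLACE INTO entities VALUES (?, ?, ?, ?, ?)",
                (
                    entity_id,
                    entity_type,
                    data.get("canonical_name", entity_id),
                    data.get("tier", "装饰"),
                    json.dumps(data.get("current", {}), ensure_ascii=False),
                ),
            )
    conn.commit()
    return {key: state[key] for key in SLIM_KEYS if key in state}


conn = sqlite3.connect(":memory:")
slim_state = migrate(old_state, conn)
entity_count = conn.execute("SELECT COUNT(*) FROM entities").fetchone()[0]
print(sorted(slim_state), entity_count)
```

The slim dict no longer grows with the number of entities, which is the property the commit relies on to keep state.json small.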

+ 46 - 13
.claude/agents/context-agent.md

@@ -1,15 +1,21 @@
 ---
 name: context-agent
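-description: Intelligent context-gathering agent that prepares a complete context package for chapter writing. Invoked automatically before writing; reads the outline, state, index, RAG retrieval results, and setting files, then filters and assembles the context.
+description: Intelligent context-gathering agent (v5.1) that prepares a complete context package for chapter writing. Invoked automatically before writing; reads the outline, state, index, RAG retrieval results, and setting files, then filters and assembles the context. Supports on-demand SQL queries.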
 tools: Read, Grep, Bash
 ---
 
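-# context-agent (Context-Gathering Agent)
+# context-agent (Context-Gathering Agent v5.1)
 
 > **Role**: Context engineer responsible for preparing a precise, complete context package for chapter writing.
 >
 > **Philosophy**: Recall on demand, filter intelligently. The goal is not to pile up information but to supply the context the writing actually needs.
 
+**v5.1 changes**:
+- Use on-demand SQL queries instead of reading all of state.json
+- Load core entities in full (protagonist + tier=核心/重要, i.e. core/important)
+- Query other entities from index.db on demand
+- Reduce token consumption and improve response time
+
 ## Input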
 
 ```json
@@ -22,9 +28,9 @@ tools: Read, Grep, Bash
 ```
 
 **Important**: all data is read from the `{project_root}/.webnovel/` directory:
-- state.json → `{project_root}/.webnovel/state.json`
-- vectors.db → `{project_root}/.webnovel/vectors.db`
-- index.db → `{project_root}/.webnovel/index.db`
+- state.json → `{project_root}/.webnovel/state.json` (slim version: progress and configuration only)
+- index.db → `{project_root}/.webnovel/index.db` (entities, aliases, relationships, state changes)
+- vectors.db → `{project_root}/.webnovel/vectors.db` (RAG vectors)
 
 ## Output
 
@@ -99,17 +105,34 @@ tools: Read, Grep, Bash
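 - Where does it take place?
 - Does it involve combat, a breakthrough, or an important conversation?
 
-### Step 2: Get protagonist state
+### Step 2: Get protagonist state (v5.1 SQL queries)
+
+**v5.1 optimization**: use SQL queries instead of reading all of state.json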
+
+```bash
+# Get the protagonist entity
+python -m data_modules.index_manager get-protagonist --project-root "."
+
+# Get core entities (protagonist + tier=核心/重要)
+python -m data_modules.index_manager get-core-entities --project-root "."
+
+# Get recent state changes
+python -m data_modules.index_manager get-state-changes --entity "xiaoyan" --limit 10 --project-root "."
 
-Use the Read tool to read `.webnovel/state.json` and extract:
+# Get entity relationships
+python -m data_modules.index_manager get-relationships --entity "xiaoyan" --project-root "."
+```
+
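+**Read the slim state.json** (using the Read tool):
 - `progress.current_chapter` - progress
-- `entities_v3.角色` - protagonist entity attributes (realm/location/items)
-- `relationships` - important relationships
-- `state_changes` - recent change records
-- `disambiguation_warnings` - disambiguation warnings (0.5-0.8)
-- `disambiguation_pending` - disambiguation pending confirmation (<0.5)
+- `protagonist_state` - protagonist state snapshot
+- `strand_tracker` - pacing tracker
+- `disambiguation_warnings` - disambiguation warnings
+- `disambiguation_pending` - disambiguation pending confirmation
+
+**Note**: in v5.1, entities_v3, alias_index, state_changes, and structured_relationships have moved to index.db and are no longer read from state.json.
 
-### Step 3: Query related entities
+### Step 3: Query related entities (v5.1 on-demand SQL queries)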
 
 ```bash
 # Query scenes related to this chapter's location
@@ -120,12 +143,22 @@ python -m data_modules.index_manager entity-appearances --entity "yaolao" --proj
 
 # Query recently appearing entities
 python -m data_modules.index_manager recent-appearances --limit 20 --project-root "."
+
+# New in v5.1: fetch details of a specific entity on demand
+python -m data_modules.index_manager get-entity --id "yaolao" --project-root "."
+
+# New in v5.1: look up entities by alias (one-to-many)
+python -m data_modules.index_manager get-by-alias --alias "药老" --project-root "."
+
+# New in v5.1: get entities by type
+python -m data_modules.index_manager get-entities-by-type --type "角色" --project-root "."
 ```
 
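 **Processing logic**:
 - Location-related: recall the 3 most recent scenes at that location
 - Character-related: recall the character's state at their most recent appearance
 - Foreshadowing: keep items with urgency >= medium
+- **v5.1 optimization**: non-core entities are queried on demand rather than loaded in full
 
 ### Step 4: Semantic retrieval (RAG)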
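
Looking back at Steps 2 and 3, the two query patterns (load core entities in full, resolve everything else by alias on demand) can be sketched directly against SQLite. This is a minimal sketch using an in-memory database and a reduced version of the `entities` and `aliases` schema from this commit. The helper names `core_entity_ids` and `ids_by_alias`, the entity IDs `tianyunzong_place` and `tianyunzong_sect`, and the sample rows are assumptions for illustration only.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE entities (
        id TEXT PRIMARY KEY, type TEXT NOT NULL, canonical_name TEXT NOT NULL,
        tier TEXT DEFAULT '装饰', current_json TEXT,
        is_protagonist INTEGER DEFAULT 0, is_archived INTEGER DEFAULT 0
    );
    CREATE TABLE aliases (
        alias TEXT NOT NULL, entity_id TEXT NOT NULL, entity_type TEXT NOT NULL,
        PRIMARY KEY (alias, entity_id, entity_type)
    );
""")

# Made-up sample data: one protagonist, one alias shared by a location and a faction.
conn.executemany(
    "INSERT INTO entities (id, type, canonical_name, tier, current_json, is_protagonist) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("xiaoyan", "角色", "萧炎", "核心", json.dumps({"realm": "斗师"}, ensure_ascii=False), 1),
        ("tianyunzong_place", "地点", "天云宗", "次要", "{}", 0),
        ("tianyunzong_sect", "势力", "天云宗", "重要", "{}", 0),
        ("hongyi_girl", "角色", "红衣女子", "装饰", "{}", 0),
    ],
)
conn.executemany(
    "INSERT INTO aliases VALUES (?, ?, ?)",
    [
        ("天云宗", "tianyunzong_place", "地点"),
        ("天云宗", "tianyunzong_sect", "势力"),
    ],
)


def core_entity_ids(conn):
    """IDs of entities loaded in full: protagonist plus tier 核心/重要, not archived."""
    rows = conn.execute(
        "SELECT id FROM entities "
        "WHERE (tier IN ('核心', '重要') OR is_protagonist = 1) AND is_archived = 0 "
        "ORDER BY is_protagonist DESC, id"
    )
    return [row["id"] for row in rows]


def ids_by_alias(conn, alias):
    """One-to-many lookup: every entity registered under the given alias."""
    rows = conn.execute(
        "SELECT e.id FROM entities e JOIN aliases a ON e.id = a.entity_id "
        "WHERE a.alias = ? ORDER BY e.id",
        (alias,),
    )
    return [row["id"] for row in rows]


core_ids = core_entity_ids(conn)
alias_ids = ids_by_alias(conn, "天云宗")
print(core_ids)
print(alias_ids)
```

The decorative entity `hongyi_girl` never enters the core set, so it costs no tokens unless a chapter actually refers to it.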
 

+ 51 - 39
.claude/agents/data-agent.md

@@ -1,19 +1,20 @@
 ---
 name: data-agent
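-description: Data processing agent (v5.0) responsible for AI entity extraction, scene slicing, and index building. Uses the entities_v3 format and one-to-many aliases. Invoked automatically after a chapter is finished to handle writes to the data chain.
+description: Data processing agent (v5.1) responsible for AI entity extraction, scene slicing, and index building. Uses the entities_v3 format and one-to-many aliases. Invoked automatically after a chapter is finished to handle writes to the data chain. Supports incremental SQLite writes.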
 tools: Read, Write, Bash
 ---
 
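-# data-agent (Data Processing Agent v5.0)
+# data-agent (Data Processing Agent v5.1)
 
 > **Role**: Data engineer responsible for extracting structured information from chapter text and writing it to the data chain.
 >
 > **Philosophy**: AI-driven extraction with intelligent disambiguation. Semantic understanding replaces regex matching, and confidence scores control quality.
 
-**v5.0 changes**:
-- Use the `entities_v3` grouped format (by type: 角色/地点/物品/势力/招式, i.e. character/location/item/faction/technique)
-- Alias index supports one-to-many (one alias can map to multiple entities)
-- `alias_index` is embedded in `state.json`
+**v5.1 changes**:
+- Use incremental SQLite writes instead of JSON appends
+- Entities, aliases, state changes, and relationships are written directly to index.db
+- state.json keeps only slim data (progress, configuration, pacing tracker)
+- Fixes state.json bloat (token explosion after 20 chapters)
 
 ## Input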
 
@@ -28,10 +29,10 @@ tools: Read, Write, Bash
 }
 ```
 
-**Important**: all data must be written to the `{project_root}/.webnovel/` directory, including
-- state.json → `{project_root}/.webnovel/state.json`
-- vectors.db → `{project_root}/.webnovel/vectors.db`
-- index.db → `{project_root}/.webnovel/index.db`
+**Important**: all data is written to the `{project_root}/.webnovel/` directory:
+- index.db → entities, aliases, state changes, relationships, chapter index (SQLite)
+- state.json → progress, configuration, pacing tracker (slim JSON < 5KB)
+- vectors.db → RAG vectors (SQLite)
 
 ## Output
 
@@ -59,24 +60,29 @@ tools: Read, Write, Bash
 
 ## Execution flow
 
-### Step A: Load context
+### Step A: Load context (v5.1 SQL queries)
 
-Use the Read tool to read the chapter text and the existing entity store:
+Use the Read tool to read the chapter text:
 - Chapter text: `正文/第0100章.md`
-- Entity store: `.webnovel/state.json` → entities
 
-Use the Bash tool to query:
+Use the Bash tool to query existing entities from index.db:
 ```bash
-# Query entity aliases
-python -m data_modules.entity_linker list-aliases --entity "xiaoyan" --project-root "."
+# v5.1: get core entities from SQLite
+python -m data_modules.index_manager get-core-entities --project-root "."
+
+# v5.1: get entity aliases
+python -m data_modules.index_manager get-aliases --entity "xiaoyan" --project-root "."
 
 # Query recent appearance records
 python -m data_modules.index_manager recent-appearances --limit 20 --project-root "."
+
+# v5.1: look up entities by alias (one-to-many)
+python -m data_modules.index_manager get-by-alias --alias "萧炎" --project-root "."
 ```
 
 **Prepared data**:
-- Existing entity list (id, name, aliases, type)
-- Alias map (alias → entity_id)
+- Existing entity list (from index.db)
+- Alias map (from the aliases table in index.db)
 - Recently appearing entities (for contextual inference)
 
 ### Step B: AI entity extraction
@@ -146,39 +152,45 @@ for uncertain_item in uncertain:
 → The pronoun "他" ("he") must be resolved from context
 ```
 
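-### Step D: Write to storage
+### Step D: Write to storage (v5.1 incremental SQLite writes)
 
-**Update state.json (v5.0 entities_v3 format)**:
+**v5.1 optimization**: use incremental SQLite writes instead of JSON appends
+
+**Write to index.db (entities/aliases/state changes/relationships)**: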
 ```bash
-python -m data_modules.state_manager process-chapter --chapter 100 --data '{...}' --project-root "."
-```
+# v5.1: insert/update an entity
+python -m data_modules.index_manager upsert-entity --data '{"id":"hongyi_girl","type":"角色","canonical_name":"红衣女子","tier":"装饰","current":{},"first_appearance":100,"last_appearance":100}' --project-root "."
 
-Written content:
-- New entities are added to `entities_v3.{type}.{entity_id}`
-- State changes are applied to the `current` field of the corresponding entity
-- New relationships are added to `relationships`
-- New aliases are registered in `alias_index` (one-to-many format)
-- Update `progress.current_chapter`
-- **Automatic protagonist sync**: `entities_v3.角色.{protagonist_id}.current` → `protagonist_state`
+# v5.1: register an alias (one-to-many)
+python -m data_modules.index_manager register-alias --alias "红衣女子" --entity "hongyi_girl" --type "角色" --project-root "."
 
-> **Protagonist sync note**: to avoid inconsistency between two sources, `process_chapter_result()` automatically calls `sync_protagonist_from_entity()`, syncing the protagonist entity's realm/location to `protagonist_state` so that consistency-checker and other components that depend on `protagonist_state` get up-to-date data.
+# v5.1: record a state change
+python -m data_modules.index_manager record-state-change --data '{"entity_id":"xiaoyan","field":"realm","old_value":"斗者","new_value":"斗师","reason":"突破","chapter":100}' --project-root "."
 
-**Update index.db**:
+# v5.1: insert/update a relationship
+python -m data_modules.index_manager upsert-relationship --data '{"from_entity":"xiaoyan","to_entity":"hongyi_girl","type":"相识","description":"初次见面","chapter":100}' --project-root "."
+```
+
+**Write to index.db (chapters/scenes/appearances)**:
 ```bash
 python -m data_modules.index_manager process-chapter --chapter 100 --title "突破" --location "天云宗" --word-count 3500 --entities '[...]' --scenes '[...]' --project-root "."
 ```
 
-Written content:
-- Chapter metadata (location, characters, word_count)
-- Entity appearance records
-- Scene index
-
-**Register new aliases (v5.0 one-to-many)**:
+**Update the slim state.json**:
 ```bash
-python -m data_modules.entity_linker register-alias --entity "hongyi_girl" --alias "红衣女子" --type "角色" --project-root "."
+# Still uses state_manager, but writes only slim data
+python -m data_modules.state_manager process-chapter --chapter 100 --data '{...}' --project-root "."
 ```
 
-> Note: the v5.0 alias index supports one-to-many; the same alias (e.g. "天云宗") can map to both a location and a faction.
+Written content (v5.1 slim):
+- Update `progress.current_chapter`
+- Update `protagonist_state` (protagonist state snapshot)
+- Update `strand_tracker` (pacing tracker)
+- Update `disambiguation_warnings/pending`
+
+> **v5.1 change**: entities_v3, alias_index, state_changes, and structured_relationships are no longer written to state.json; they go to index.db instead. state.json stays < 5KB.
+
+> **Protagonist sync note**: `process_chapter_result()` automatically calls `sync_protagonist_from_entity()`, syncing the protagonist entity's realm/location to `protagonist_state`.
 
 ### Step E: AI scene slicing
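
Looking back at Step D, the incremental write path can be sketched as two small operations: merge one changed field into an entity's `current_json` while logging the change, and upsert a relationship keyed on `(from_entity, to_entity, type)`. This is a minimal sketch against an in-memory database with a reduced schema. The helper names are hypothetical, the second relationship description is made-up sample data, and the `ON CONFLICT ... DO UPDATE` form is a swapped-in alternative to the select-then-update logic in `index_manager.py`; it assumes SQLite 3.24 or newer.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE entities (
        id TEXT PRIMARY KEY, current_json TEXT, last_appearance INTEGER DEFAULT 0
    );
    CREATE TABLE state_changes (
        id INTEGER PRIMARY KEY AUTOINCREMENT, entity_id TEXT NOT NULL,
        field TEXT NOT NULL, old_value TEXT, new_value TEXT, reason TEXT,
        chapter INTEGER NOT NULL
    );
    CREATE TABLE relationships (
        id INTEGER PRIMARY KEY AUTOINCREMENT, from_entity TEXT NOT NULL,
        to_entity TEXT NOT NULL, type TEXT NOT NULL, description TEXT,
        chapter INTEGER NOT NULL, UNIQUE(from_entity, to_entity, type)
    );
""")
conn.execute(
    "INSERT INTO entities VALUES (?, ?, ?)",
    ("xiaoyan", json.dumps({"realm": "斗者", "location": "天云宗"}, ensure_ascii=False), 99),
)


def apply_state_change(conn, entity_id, field, new_value, reason, chapter):
    """Merge one field into current_json and log the change in one transaction."""
    row = conn.execute(
        "SELECT current_json FROM entities WHERE id = ?", (entity_id,)
    ).fetchone()
    if row is None:
        raise KeyError(entity_id)
    current = json.loads(row["current_json"] or "{}")
    old_value = current.get(field)
    current[field] = new_value
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "UPDATE entities SET current_json = ?, last_appearance = ? WHERE id = ?",
            (json.dumps(current, ensure_ascii=False), chapter, entity_id),
        )
        conn.execute(
            "INSERT INTO state_changes "
            "(entity_id, field, old_value, new_value, reason, chapter) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (entity_id, field, old_value, new_value, reason, chapter),
        )
    return current


def upsert_relationship(conn, from_entity, to_entity, rel_type, description, chapter):
    """Single-statement upsert keyed on UNIQUE(from_entity, to_entity, type)."""
    with conn:
        conn.execute(
            "INSERT INTO relationships (from_entity, to_entity, type, description, chapter) "
            "VALUES (?, ?, ?, ?, ?) "
            "ON CONFLICT(from_entity, to_entity, type) "
            "DO UPDATE SET description = excluded.description, chapter = excluded.chapter",
            (from_entity, to_entity, rel_type, description, chapter),
        )


updated = apply_state_change(conn, "xiaoyan", "realm", "斗师", "突破", 100)
upsert_relationship(conn, "xiaoyan", "hongyi_girl", "相识", "初次见面", 100)
upsert_relationship(conn, "xiaoyan", "hongyi_girl", "相识", "再次相遇", 101)

change = conn.execute("SELECT old_value, new_value FROM state_changes").fetchone()
relationship_rows = conn.execute("SELECT description, chapter FROM relationships").fetchall()
print(updated, dict(change), [dict(r) for r in relationship_rows])
```

Each chapter touches only the rows that changed, so write cost stays roughly constant as the novel grows, unlike rewriting one ever-larger JSON file.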
 

+ 662 - 5
.claude/scripts/data_modules/index_manager.py

@@ -1,21 +1,32 @@
 #!/usr/bin/env python3
 # -*- coding: utf-8 -*-
 """
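-Index Manager - index management module
+Index Manager - index management module (v5.1)
 
 Manages reads and writes for index.db (SQLite):
 - Chapter metadata index
 - Entity appearance records
 - Scene index
+- Entity storage (migrated from state.json)
+- Alias index (one-to-many)
+- State change records
+- Relationship storage
 - Fast query interface
+
+v5.1 changes:
+- Add the entities table, replacing entities_v3 in state.json
+- Add the aliases table, replacing alias_index in state.json (supports one-to-many)
+- Add the state_changes table, replacing state_changes in state.json
+- Add the relationships table, replacing structured_relationships in state.json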
 """
 
 import sqlite3
 import json
 from pathlib import Path
 from typing import Dict, List, Optional, Any, Tuple
-from dataclasses import dataclass
+from dataclasses import dataclass, field
 from contextlib import contextmanager
+from datetime import datetime
 
 from .config import get_config
 
@@ -43,6 +54,42 @@ class SceneMeta:
     characters: List[str]
 
 
+@dataclass
+class EntityMeta:
+    """Entity metadata (new in v5.1)"""
+    id: str
+    type: str  # 角色/地点/物品/势力/招式 (character/location/item/faction/technique)
+    canonical_name: str
+    tier: str = "装饰"  # 核心/重要/次要/装饰 (core/important/minor/decorative)
+    desc: str = ""
+    current: Dict = field(default_factory=dict)  # current state (realm/location/items, etc.)
+    first_appearance: int = 0
+    last_appearance: int = 0
+    is_protagonist: bool = False
+    is_archived: bool = False
+
+
+@dataclass
+class StateChangeMeta:
+    """State change record (new in v5.1)"""
+    entity_id: str
+    field: str
+    old_value: str
+    new_value: str
+    reason: str
+    chapter: int
+
+
+@dataclass
+class RelationshipMeta:
+    """Relationship record (new in v5.1)"""
+    from_entity: str
+    to_entity: str
+    type: str
+    description: str
+    chapter: int
+
+
 class IndexManager:
     """Index manager"""
 
@@ -102,6 +149,77 @@ class IndexManager:
             cursor.execute("CREATE INDEX IF NOT EXISTS idx_appearances_entity ON appearances(entity_id)")
             cursor.execute("CREATE INDEX IF NOT EXISTS idx_appearances_chapter ON appearances(chapter)")
 
+            # ==================== New in v5.1: tables ====================
+
+            # Entities table (replaces entities_v3 in state.json)
+            cursor.execute("""
+                CREATE TABLE IF NOT EXISTS entities (
+                    id TEXT PRIMARY KEY,
+                    type TEXT NOT NULL,
+                    canonical_name TEXT NOT NULL,
+                    tier TEXT DEFAULT '装饰',
+                    desc TEXT,
+                    current_json TEXT,
+                    first_appearance INTEGER DEFAULT 0,
+                    last_appearance INTEGER DEFAULT 0,
+                    is_protagonist INTEGER DEFAULT 0,
+                    is_archived INTEGER DEFAULT 0,
+                    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+                    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+                )
+            """)
+
+            # Aliases table (replaces alias_index in state.json; supports one-to-many)
+            cursor.execute("""
+                CREATE TABLE IF NOT EXISTS aliases (
+                    alias TEXT NOT NULL,
+                    entity_id TEXT NOT NULL,
+                    entity_type TEXT NOT NULL,
+                    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+                    PRIMARY KEY (alias, entity_id, entity_type)
+                )
+            """)
+
+            # State changes table (replaces state_changes in state.json)
+            cursor.execute("""
+                CREATE TABLE IF NOT EXISTS state_changes (
+                    id INTEGER PRIMARY KEY AUTOINCREMENT,
+                    entity_id TEXT NOT NULL,
+                    field TEXT NOT NULL,
+                    old_value TEXT,
+                    new_value TEXT,
+                    reason TEXT,
+                    chapter INTEGER NOT NULL,
+                    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+                )
+            """)
+
+            # Relationships table (replaces structured_relationships in state.json)
+            cursor.execute("""
+                CREATE TABLE IF NOT EXISTS relationships (
+                    id INTEGER PRIMARY KEY AUTOINCREMENT,
+                    from_entity TEXT NOT NULL,
+                    to_entity TEXT NOT NULL,
+                    type TEXT NOT NULL,
+                    description TEXT,
+                    chapter INTEGER NOT NULL,
+                    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+                    UNIQUE(from_entity, to_entity, type)
+                )
+            """)
+
+            # New in v5.1: indexes
+            cursor.execute("CREATE INDEX IF NOT EXISTS idx_entities_type ON entities(type)")
+            cursor.execute("CREATE INDEX IF NOT EXISTS idx_entities_tier ON entities(tier)")
+            cursor.execute("CREATE INDEX IF NOT EXISTS idx_entities_protagonist ON entities(is_protagonist)")
+            cursor.execute("CREATE INDEX IF NOT EXISTS idx_aliases_entity ON aliases(entity_id)")
+            cursor.execute("CREATE INDEX IF NOT EXISTS idx_aliases_alias ON aliases(alias)")
+            cursor.execute("CREATE INDEX IF NOT EXISTS idx_state_changes_entity ON state_changes(entity_id)")
+            cursor.execute("CREATE INDEX IF NOT EXISTS idx_state_changes_chapter ON state_changes(chapter)")
+            cursor.execute("CREATE INDEX IF NOT EXISTS idx_relationships_from ON relationships(from_entity)")
+            cursor.execute("CREATE INDEX IF NOT EXISTS idx_relationships_to ON relationships(to_entity)")
+            cursor.execute("CREATE INDEX IF NOT EXISTS idx_relationships_chapter ON relationships(chapter)")
+
             conn.commit()
 
     @contextmanager
@@ -274,6 +392,375 @@ class IndexManager:
             """, (chapter,))
             return [self._row_to_dict(row, parse_json=["mentions"]) for row in cursor.fetchall()]
 
+    # ==================== v5.1 entity operations ====================
+
+    def upsert_entity(self, entity: EntityMeta) -> bool:
+        """
+        Insert or update an entity (merging state).
+
+        - New entity: insert directly
+        - Existing entity: update current_json, last_appearance, updated_at
+
+        Returns True if the entity is new.
+        """
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+
+            # Check whether the entity already exists
+            cursor.execute("SELECT id, current_json FROM entities WHERE id = ?", (entity.id,))
+            existing = cursor.fetchone()
+
+            if existing:
+                # Exists: merge current_json
+                old_current = {}
+                if existing["current_json"]:
+                    try:
+                        old_current = json.loads(existing["current_json"])
+                    except json.JSONDecodeError:
+                        pass
+
+                # Merge current (new values override old ones)
+                merged_current = {**old_current, **entity.current}
+
+                cursor.execute("""
+                    UPDATE entities SET
+                        current_json = ?,
+                        last_appearance = ?,
+                        updated_at = CURRENT_TIMESTAMP
+                    WHERE id = ?
+                """, (
+                    json.dumps(merged_current, ensure_ascii=False),
+                    entity.last_appearance,
+                    entity.id
+                ))
+                conn.commit()
+                return False
+            else:
+                # New entity: insert
+                cursor.execute("""
+                    INSERT INTO entities
+                    (id, type, canonical_name, tier, desc, current_json,
+                     first_appearance, last_appearance, is_protagonist, is_archived)
+                    VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
+                """, (
+                    entity.id,
+                    entity.type,
+                    entity.canonical_name,
+                    entity.tier,
+                    entity.desc,
+                    json.dumps(entity.current, ensure_ascii=False),
+                    entity.first_appearance,
+                    entity.last_appearance,
+                    1 if entity.is_protagonist else 0,
+                    1 if entity.is_archived else 0
+                ))
+                conn.commit()
+                return True
+
+    def get_entity(self, entity_id: str) -> Optional[Dict]:
+        """Get a single entity"""
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+            cursor.execute("SELECT * FROM entities WHERE id = ?", (entity_id,))
+            row = cursor.fetchone()
+            if row:
+                return self._row_to_dict(row, parse_json=["current_json"])
+            return None
+
+    def get_entities_by_type(self, entity_type: str, include_archived: bool = False) -> List[Dict]:
+        """Get entities by type"""
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+            if include_archived:
+                cursor.execute("""
+                    SELECT * FROM entities WHERE type = ?
+                    ORDER BY last_appearance DESC
+                """, (entity_type,))
+            else:
+                cursor.execute("""
+                    SELECT * FROM entities WHERE type = ? AND is_archived = 0
+                    ORDER BY last_appearance DESC
+                """, (entity_type,))
+            return [self._row_to_dict(row, parse_json=["current_json"]) for row in cursor.fetchall()]
+
+    def get_entities_by_tier(self, tier: str) -> List[Dict]:
+        """Get entities by importance tier (核心/重要/次要/装饰)"""
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+            cursor.execute("""
+                SELECT * FROM entities WHERE tier = ? AND is_archived = 0
+                ORDER BY last_appearance DESC
+            """, (tier,))
+            return [self._row_to_dict(row, parse_json=["current_json"]) for row in cursor.fetchall()]
+
+    def get_core_entities(self) -> List[Dict]:
+        """Get all core entities (loaded in full by the Context Agent)"""
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+            cursor.execute("""
+                SELECT * FROM entities
+                WHERE (tier IN ('核心', '重要') OR is_protagonist = 1) AND is_archived = 0
+                ORDER BY is_protagonist DESC, tier, last_appearance DESC
+            """)
+            return [self._row_to_dict(row, parse_json=["current_json"]) for row in cursor.fetchall()]
+
+    def get_protagonist(self) -> Optional[Dict]:
+        """Get the protagonist entity"""
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+            cursor.execute("SELECT * FROM entities WHERE is_protagonist = 1 LIMIT 1")
+            row = cursor.fetchone()
+            if row:
+                return self._row_to_dict(row, parse_json=["current_json"])
+            return None
+
+    def update_entity_current(self, entity_id: str, updates: Dict) -> bool:
+        """
+        Incrementally update an entity's current field (other fields are left untouched).
+
+        Example: update_entity_current("xiaoyan", {"realm": "斗师"})
+        """
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+
+            cursor.execute("SELECT current_json FROM entities WHERE id = ?", (entity_id,))
+            row = cursor.fetchone()
+            if not row:
+                return False
+
+            current = {}
+            if row["current_json"]:
+                try:
+                    current = json.loads(row["current_json"])
+                except json.JSONDecodeError:
+                    pass
+
+            current.update(updates)
+
+            cursor.execute("""
+                UPDATE entities SET
+                    current_json = ?,
+                    updated_at = CURRENT_TIMESTAMP
+                WHERE id = ?
+            """, (json.dumps(current, ensure_ascii=False), entity_id))
+            conn.commit()
+            return True
+
+    def archive_entity(self, entity_id: str) -> bool:
+        """Archive an entity (marks it without deleting it)"""
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+            cursor.execute("""
+                UPDATE entities SET is_archived = 1, updated_at = CURRENT_TIMESTAMP
+                WHERE id = ?
+            """, (entity_id,))
+            conn.commit()
+            return cursor.rowcount > 0
+
+    # ==================== v5.1 alias operations ====================
+
+    def register_alias(self, alias: str, entity_id: str, entity_type: str) -> bool:
+        """
+        Register an alias (one-to-many).
+
+        One alias can map to multiple entities (e.g. "天云宗" → location + faction).
+        """
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+            try:
+                cursor.execute("""
+                    INSERT OR IGNORE INTO aliases (alias, entity_id, entity_type)
+                    VALUES (?, ?, ?)
+                """, (alias, entity_id, entity_type))
+                conn.commit()
+                return cursor.rowcount > 0
+            except sqlite3.IntegrityError:
+                return False
+
+    def get_entities_by_alias(self, alias: str) -> List[Dict]:
+        """
+        Look up entities by alias (one-to-many).
+
+        Returns every matching entity (there may be several, of different types).
+        """
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+            cursor.execute("""
+                SELECT e.*, a.entity_type as alias_type
+                FROM entities e
+                JOIN aliases a ON e.id = a.entity_id
+                WHERE a.alias = ?
+            """, (alias,))
+            return [self._row_to_dict(row, parse_json=["current_json"]) for row in cursor.fetchall()]
+
+    def get_entity_aliases(self, entity_id: str) -> List[str]:
+        """Get all aliases of an entity"""
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+            cursor.execute("SELECT alias FROM aliases WHERE entity_id = ?", (entity_id,))
+            return [row["alias"] for row in cursor.fetchall()]
+
+    def remove_alias(self, alias: str, entity_id: str) -> bool:
+        """Remove an alias"""
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+            cursor.execute("DELETE FROM aliases WHERE alias = ? AND entity_id = ?", (alias, entity_id))
+            conn.commit()
+            return cursor.rowcount > 0
+
+    # ==================== v5.1 state change operations ====================
+
+    def record_state_change(self, change: StateChangeMeta) -> int:
+        """
+        Record a state change.
+
+        Returns the record ID.
+        """
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+            cursor.execute("""
+                INSERT INTO state_changes
+                (entity_id, field, old_value, new_value, reason, chapter)
+                VALUES (?, ?, ?, ?, ?, ?)
+            """, (
+                change.entity_id,
+                change.field,
+                change.old_value,
+                change.new_value,
+                change.reason,
+                change.chapter
+            ))
+            conn.commit()
+            return cursor.lastrowid
+
+    def get_entity_state_changes(self, entity_id: str, limit: int = 20) -> List[Dict]:
+        """Get an entity's state change history"""
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+            cursor.execute("""
+                SELECT * FROM state_changes
+                WHERE entity_id = ?
+                ORDER BY chapter DESC, id DESC
+                LIMIT ?
+            """, (entity_id, limit))
+            return [dict(row) for row in cursor.fetchall()]
+
+    def get_recent_state_changes(self, limit: int = 50) -> List[Dict]:
+        """Get the most recent state changes"""
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+            cursor.execute("""
+                SELECT * FROM state_changes
+                ORDER BY chapter DESC, id DESC
+                LIMIT ?
+            """, (limit,))
+            return [dict(row) for row in cursor.fetchall()]
+
+    def get_chapter_state_changes(self, chapter: int) -> List[Dict]:
+        """Get all state changes in a given chapter"""
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+            cursor.execute("""
+                SELECT * FROM state_changes
+                WHERE chapter = ?
+                ORDER BY id
+            """, (chapter,))
+            return [dict(row) for row in cursor.fetchall()]
+
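+    # ==================== v5.1 relationship operations ====================
+
+    def upsert_relationship(self, rel: RelationshipMeta) -> bool:
+        """
+        Insert or update a relationship.
+
+        An existing (from, to, type) row has its description and chapter updated.
+        Returns True if the relationship is new.
+        """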
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+
+            # Check whether the relationship already exists
+            cursor.execute("""
+                SELECT id FROM relationships
+                WHERE from_entity = ? AND to_entity = ? AND type = ?
+            """, (rel.from_entity, rel.to_entity, rel.type))
+            existing = cursor.fetchone()
+
+            if existing:
+                cursor.execute("""
+                    UPDATE relationships SET
+                        description = ?,
+                        chapter = ?
+                    WHERE id = ?
+                """, (rel.description, rel.chapter, existing["id"]))
+                conn.commit()
+                return False
+            else:
+                cursor.execute("""
+                    INSERT INTO relationships
+                    (from_entity, to_entity, type, description, chapter)
+                    VALUES (?, ?, ?, ?, ?)
+                """, (
+                    rel.from_entity,
+                    rel.to_entity,
+                    rel.type,
+                    rel.description,
+                    rel.chapter
+                ))
+                conn.commit()
+                return True
+
+    def get_entity_relationships(self, entity_id: str, direction: str = "both") -> List[Dict]:
+        """
+        Get an entity's relationships.
+
+        direction: "from" | "to" | "both"
+        """
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+
+            if direction == "from":
+                cursor.execute("""
+                    SELECT * FROM relationships WHERE from_entity = ?
+                    ORDER BY chapter DESC
+                """, (entity_id,))
+            elif direction == "to":
+                cursor.execute("""
+                    SELECT * FROM relationships WHERE to_entity = ?
+                    ORDER BY chapter DESC
+                """, (entity_id,))
+            else:  # both
+                cursor.execute("""
+                    SELECT * FROM relationships
+                    WHERE from_entity = ? OR to_entity = ?
+                    ORDER BY chapter DESC
+                """, (entity_id, entity_id))
+
+            return [dict(row) for row in cursor.fetchall()]
+
+    def get_relationship_between(self, entity1: str, entity2: str) -> List[Dict]:
+        """Get all relationships between two entities"""
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+            cursor.execute("""
+                SELECT * FROM relationships
+                WHERE (from_entity = ? AND to_entity = ?)
+                   OR (from_entity = ? AND to_entity = ?)
+                ORDER BY chapter DESC
+            """, (entity1, entity2, entity2, entity1))
+            return [dict(row) for row in cursor.fetchall()]
+
+    def get_recent_relationships(self, limit: int = 30) -> List[Dict]:
+        """Get the most recently established relationships"""
+        with self._get_conn() as conn:
+            cursor = conn.cursor()
+            cursor.execute("""
+                SELECT * FROM relationships
+                ORDER BY chapter DESC, id DESC
+                LIMIT ?
+            """, (limit,))
+            return [dict(row) for row in cursor.fetchall()]
+
     # ==================== Batch operations ====================
 
     def process_chapter_data(
@@ -361,16 +848,38 @@ class IndexManager:
             scenes = cursor.fetchone()[0]
 
             cursor.execute("SELECT COUNT(DISTINCT entity_id) FROM appearances")
-            entities = cursor.fetchone()[0]
+            appearances = cursor.fetchone()[0]
 
             cursor.execute("SELECT MAX(chapter) FROM chapters")
             max_chapter = cursor.fetchone()[0] or 0
 
+            # New in v5.1: additional statistics
+            cursor.execute("SELECT COUNT(*) FROM entities")
+            entities = cursor.fetchone()[0]
+
+            cursor.execute("SELECT COUNT(*) FROM entities WHERE is_archived = 0")
+            active_entities = cursor.fetchone()[0]
+
+            cursor.execute("SELECT COUNT(*) FROM aliases")
+            aliases = cursor.fetchone()[0]
+
+            cursor.execute("SELECT COUNT(*) FROM state_changes")
+            state_changes = cursor.fetchone()[0]
+
+            cursor.execute("SELECT COUNT(*) FROM relationships")
+            relationships = cursor.fetchone()[0]
+
             return {
                 "chapters": chapters,
                 "scenes": scenes,
+                "appearances": appearances,
+                "max_chapter": max_chapter,
+                # New in v5.1
                 "entities": entities,
-                "max_chapter": max_chapter
+                "active_entities": active_entities,
+                "aliases": aliases,
+                "state_changes": state_changes,
+                "relationships": relationships
             }
 
 
@@ -379,7 +888,7 @@ class IndexManager:
 def main():
     import argparse
 
-    parser = argparse.ArgumentParser(description="Index Manager CLI")
+    parser = argparse.ArgumentParser(description="Index Manager CLI (v5.1)")
     parser.add_argument("--project-root", type=str, help="项目根目录")
 
     subparsers = parser.add_subparsers(dest="command")
@@ -414,6 +923,59 @@ def main():
     process_parser.add_argument("--entities", required=True, help="JSON 格式的实体列表")
     process_parser.add_argument("--scenes", required=True, help="JSON 格式的场景列表")
 
+    # ==================== New in v5.1: commands ====================
+
+    # Get an entity
+    get_entity_parser = subparsers.add_parser("get-entity")
+    get_entity_parser.add_argument("--id", required=True, help="实体 ID")
+
+    # Get core entities
+    subparsers.add_parser("get-core-entities")
+
+    # Get the protagonist
+    subparsers.add_parser("get-protagonist")
+
+    # Get entities by type
+    type_parser = subparsers.add_parser("get-entities-by-type")
+    type_parser.add_argument("--type", required=True, help="实体类型 (角色/地点/物品/势力/招式)")
+    type_parser.add_argument("--include-archived", action="store_true")
+
+    # Look up entities by alias
+    alias_parser = subparsers.add_parser("get-by-alias")
+    alias_parser.add_argument("--alias", required=True, help="别名")
+
+    # Get an entity's aliases
+    aliases_parser = subparsers.add_parser("get-aliases")
+    aliases_parser.add_argument("--entity", required=True, help="实体 ID")
+
+    # Register an alias
+    reg_alias_parser = subparsers.add_parser("register-alias")
+    reg_alias_parser.add_argument("--alias", required=True)
+    reg_alias_parser.add_argument("--entity", required=True)
+    reg_alias_parser.add_argument("--type", required=True, help="实体类型")
+
+    # Get an entity's relationships
+    rel_parser = subparsers.add_parser("get-relationships")
+    rel_parser.add_argument("--entity", required=True)
+    rel_parser.add_argument("--direction", choices=["from", "to", "both"], default="both")
+
+    # Get state changes
+    changes_parser = subparsers.add_parser("get-state-changes")
+    changes_parser.add_argument("--entity", required=True)
+    changes_parser.add_argument("--limit", type=int, default=20)
+
+    # Write an entity
+    upsert_entity_parser = subparsers.add_parser("upsert-entity")
+    upsert_entity_parser.add_argument("--data", required=True, help="JSON 格式的实体数据")
+
+    # Write a relationship
+    upsert_rel_parser = subparsers.add_parser("upsert-relationship")
+    upsert_rel_parser.add_argument("--data", required=True, help="JSON 格式的关系数据")
+
+    # Write a state change
+    state_change_parser = subparsers.add_parser("record-state-change")
+    state_change_parser.add_argument("--data", required=True, help="JSON 格式的状态变化数据")
+
     args = parser.parse_args()
 
     # Initialize
@@ -466,6 +1028,101 @@ def main():
         print(f"✓ Processed chapter {args.chapter}")
         print(f"  Chapters: {stats['chapters']}, scenes: {stats['scenes']}, appearances: {stats['appearances']}")
 
+    # ==================== Handlers for commands added in v5.1 ====================
+
+    elif args.command == "get-entity":
+        entity = manager.get_entity(args.id)
+        if entity:
+            print(json.dumps(entity, ensure_ascii=False, indent=2))
+        else:
+            print(f"Entity not found: {args.id}")
+
+    elif args.command == "get-core-entities":
+        entities = manager.get_core_entities()
+        print(json.dumps(entities, ensure_ascii=False, indent=2))
+
+    elif args.command == "get-protagonist":
+        protagonist = manager.get_protagonist()
+        if protagonist:
+            print(json.dumps(protagonist, ensure_ascii=False, indent=2))
+        else:
+            print("No protagonist set")
+
+    elif args.command == "get-entities-by-type":
+        entities = manager.get_entities_by_type(args.type, args.include_archived)
+        print(json.dumps(entities, ensure_ascii=False, indent=2))
+
+    elif args.command == "get-by-alias":
+        entities = manager.get_entities_by_alias(args.alias)
+        if entities:
+            print(json.dumps(entities, ensure_ascii=False, indent=2))
+        else:
+            print(f"Alias not found: {args.alias}")
+
+    elif args.command == "get-aliases":
+        aliases = manager.get_entity_aliases(args.entity)
+        if aliases:
+            print(f"Aliases of {args.entity}: {', '.join(aliases)}")
+        else:
+            print(f"{args.entity} has no aliases")
+
+    elif args.command == "register-alias":
+        success = manager.register_alias(args.alias, args.entity, args.type)
+        if success:
+            print(f"✓ Registered alias: {args.alias} → {args.entity} ({args.type})")
+        else:
+            print(f"Alias already exists or registration failed: {args.alias}")
+
+    elif args.command == "get-relationships":
+        rels = manager.get_entity_relationships(args.entity, args.direction)
+        print(json.dumps(rels, ensure_ascii=False, indent=2))
+
+    elif args.command == "get-state-changes":
+        changes = manager.get_entity_state_changes(args.entity, args.limit)
+        print(json.dumps(changes, ensure_ascii=False, indent=2))
+
+    elif args.command == "upsert-entity":
+        data = json.loads(args.data)
+        entity = EntityMeta(
+            id=data["id"],
+            type=data["type"],
+            canonical_name=data["canonical_name"],
+            tier=data.get("tier", "装饰"),
+            desc=data.get("desc", ""),
+            current=data.get("current", {}),
+            first_appearance=data.get("first_appearance", 0),
+            last_appearance=data.get("last_appearance", 0),
+            is_protagonist=data.get("is_protagonist", False),
+            is_archived=data.get("is_archived", False)
+        )
+        is_new = manager.upsert_entity(entity)
+        print(f"✓ {'Created' if is_new else 'Updated'} entity: {entity.id}")
+
+    elif args.command == "upsert-relationship":
+        data = json.loads(args.data)
+        rel = RelationshipMeta(
+            from_entity=data["from_entity"],
+            to_entity=data["to_entity"],
+            type=data["type"],
+            description=data.get("description", ""),
+            chapter=data["chapter"]
+        )
+        is_new = manager.upsert_relationship(rel)
+        print(f"✓ {'Created' if is_new else 'Updated'} relationship: {rel.from_entity} → {rel.to_entity} ({rel.type})")
+
+    elif args.command == "record-state-change":
+        data = json.loads(args.data)
+        change = StateChangeMeta(
+            entity_id=data["entity_id"],
+            field=data["field"],
+            old_value=data.get("old_value", ""),
+            new_value=data["new_value"],
+            reason=data.get("reason", ""),
+            chapter=data["chapter"]
+        )
+        record_id = manager.record_state_change(change)
+        print(f"✓ Recorded state change #{record_id}: {change.entity_id}.{change.field}")
+
 
 if __name__ == "__main__":
     main()
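
The `--data` subcommands above decode a JSON payload and fill in defaults before building the dataclass. A minimal, self-contained sketch of that pattern follows. It is an illustration only: `build_parser` and `parse_entity_payload` are hypothetical helpers, not part of `index_manager.py`, and the explicit required-key check is an assumption added here (the handler above would raise `KeyError` instead).

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    """Build a tiny parser that mirrors two of the v5.1 subcommands."""
    parser = argparse.ArgumentParser(prog="index_manager_sketch")
    subparsers = parser.add_subparsers(dest="command", required=True)

    get_entity = subparsers.add_parser("get-entity")
    get_entity.add_argument("--id", required=True)

    upsert = subparsers.add_parser("upsert-entity")
    upsert.add_argument("--data", required=True, help="entity as a JSON object")
    return parser


def parse_entity_payload(raw: str) -> dict:
    """Decode --data and apply the same defaults the upsert-entity handler uses."""
    data = json.loads(raw)
    # Hypothetical validation step: report all missing keys at once.
    missing = [k for k in ("id", "type", "canonical_name") if k not in data]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    data.setdefault("tier", "装饰")
    data.setdefault("current", {})
    return data


args = build_parser().parse_args(
    ["upsert-entity", "--data", '{"id": "xiaoyan", "type": "角色", "canonical_name": "萧炎"}']
)
payload = parse_entity_payload(args.data)
print(args.command, payload["id"], payload["tier"])
```

Validating the payload before constructing the dataclass gives the caller one readable error message rather than a traceback on the first missing key.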

+ 358 - 0
.claude/scripts/data_modules/migrate_state_to_sqlite.py

@@ -0,0 +1,358 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+migrate_state_to_sqlite.py - data migration script (v5.1)
+
+Migrates the large fields of state.json into SQLite (index.db):
+- entities_v3 → entities table
+- alias_index → aliases table
+- state_changes → state_changes table
+- structured_relationships → relationships table
+
+After migration, state.json keeps only slim data (< 5KB):
+- progress
+- protagonist_state
+- strand_tracker
+- disambiguation_warnings/pending
+- project_info
+- world_settings (skeleton only)
+- plot_threads
+- relationships (simplified)
+- review_checkpoints
+
+Usage:
+    python -m data_modules.migrate_state_to_sqlite --project-root "D:/wk/斗破苍穹"
+    python -m data_modules.migrate_state_to_sqlite --project-root "." --dry-run
+    python -m data_modules.migrate_state_to_sqlite --project-root "." --backup
+"""
+
+import json
+import shutil
+from pathlib import Path
+from datetime import datetime
+from typing import Dict, Any, List
+
+from .config import get_config, DataModulesConfig
+from .sql_state_manager import SQLStateManager, EntityData
+
+
+def migrate_state_to_sqlite(
+    config: DataModulesConfig,
+    dry_run: bool = False,
+    backup: bool = True,
+    verbose: bool = True
+) -> Dict[str, int]:
+    """
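+    Run the migration.
+
+    Args:
+    - config: configuration object
+    - dry_run: analyze only, write nothing
+    - backup: back up state.json before migrating
+    - verbose: print detailed progress
+
+    Returns: migration statistics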
+    """
+    stats = {
+        "entities": 0,
+        "aliases": 0,
+        "state_changes": 0,
+        "relationships": 0,
+        "skipped": 0,
+        "errors": 0
+    }
+
+    # Read state.json
+    state_file = config.state_file
+    if not state_file.exists():
+        if verbose:
+            print(f"❌ state.json not found: {state_file}")
+        return stats
+
+    with open(state_file, 'r', encoding='utf-8') as f:
+        state = json.load(f)
+
+    if verbose:
+        file_size = state_file.stat().st_size / 1024
+        print(f"📄 Read state.json ({file_size:.1f} KB)")
+
+    # Back up
+    if backup and not dry_run:
+        backup_file = state_file.with_suffix(f".json.backup-{datetime.now().strftime('%Y%m%d_%H%M%S')}")
+        shutil.copy(state_file, backup_file)
+        if verbose:
+            print(f"💾 Backed up to: {backup_file}")
+
+    # Initialize SQLStateManager
+    sql_manager = SQLStateManager(config)
+
+    # 1. Migrate entities_v3
+    entities_v3 = state.get("entities_v3", {})
+    if verbose:
+        print(f"\n🔄 Migrating entities_v3...")
+
+    for entity_type, entities in entities_v3.items():
+        if not isinstance(entities, dict):
+            continue
+
+        for entity_id, entity_data in entities.items():
+            if not isinstance(entity_data, dict):
+                stats["skipped"] += 1
+                continue
+
+            try:
+                entity = EntityData(
+                    id=entity_id,
+                    type=entity_type,
+                    name=entity_data.get("canonical_name", entity_data.get("name", entity_id)),
+                    tier=entity_data.get("tier", "装饰"),
+                    desc=entity_data.get("desc", ""),
+                    current=entity_data.get("current", {}),
+                    aliases=[],  # aliases are handled separately
+                    first_appearance=entity_data.get("first_appearance", 0),
+                    last_appearance=entity_data.get("last_appearance", 0),
+                    is_protagonist=entity_data.get("is_protagonist", False)
+                )
+
+                if not dry_run:
+                    sql_manager.upsert_entity(entity)
+                stats["entities"] += 1
+
+                if verbose and stats["entities"] % 50 == 0:
+                    print(f"  Migrated {stats['entities']} entities so far...")
+
+            except Exception as e:
+                stats["errors"] += 1
+                if verbose:
+                    print(f"  ⚠️ Failed to migrate entity {entity_id}: {e}")
+
+    if verbose:
+        print(f"  ✅ Entities: {stats['entities']}")
+
+    # 2. Migrate alias_index
+    alias_index = state.get("alias_index", {})
+    if verbose:
+        print(f"\n🔄 Migrating alias_index...")
+
+    for alias, entries in alias_index.items():
+        if not isinstance(entries, list):
+            continue
+
+        for entry in entries:
+            if not isinstance(entry, dict):
+                stats["skipped"] += 1
+                continue
+
+            entity_id = entry.get("id")
+            entity_type = entry.get("type")
+            if not entity_id or not entity_type:
+                stats["skipped"] += 1
+                continue
+
+            try:
+                if not dry_run:
+                    sql_manager.register_alias(alias, entity_id, entity_type)
+                stats["aliases"] += 1
+
+            except Exception as e:
+                stats["errors"] += 1
+                if verbose:
+                    print(f"  ⚠️ Failed to migrate alias {alias}: {e}")
+
+    if verbose:
+        print(f"  ✅ Aliases: {stats['aliases']}")
+
+    # 3. Migrate state_changes
+    state_changes = state.get("state_changes", [])
+    if verbose:
+        print(f"\n🔄 Migrating state_changes...")
+
+    for change in state_changes:
+        if not isinstance(change, dict):
+            stats["skipped"] += 1
+            continue
+
+        try:
+            entity_id = change.get("entity_id", "")
+            if not entity_id:
+                stats["skipped"] += 1
+                continue
+
+            if not dry_run:
+                sql_manager.record_state_change(
+                    entity_id=entity_id,
+                    field=change.get("field", ""),
+                    old_value=change.get("old", change.get("old_value", "")),
+                    new_value=change.get("new", change.get("new_value", "")),
+                    reason=change.get("reason", ""),
+                    chapter=change.get("chapter", 0)
+                )
+            stats["state_changes"] += 1
+
+        except Exception as e:
+            stats["errors"] += 1
+            if verbose:
+                print(f"  ⚠️ Failed to migrate state change: {e}")
+
+    if verbose:
+        print(f"  ✅ State changes: {stats['state_changes']}")
+
+    # 4. Migrate structured_relationships
+    relationships = state.get("structured_relationships", [])
+    if verbose:
+        print(f"\n🔄 Migrating structured_relationships...")
+
+    for rel in relationships:
+        if not isinstance(rel, dict):
+            stats["skipped"] += 1
+            continue
+
+        try:
+            from_entity = rel.get("from", rel.get("from_entity", ""))
+            to_entity = rel.get("to", rel.get("to_entity", ""))
+            if not from_entity or not to_entity:
+                stats["skipped"] += 1
+                continue
+
+            if not dry_run:
+                sql_manager.upsert_relationship(
+                    from_entity=from_entity,
+                    to_entity=to_entity,
+                    type=rel.get("type", "相识"),
+                    description=rel.get("description", ""),
+                    chapter=rel.get("chapter", 0)
+                )
+            stats["relationships"] += 1
+
+        except Exception as e:
+            stats["errors"] += 1
+            if verbose:
+                print(f"  ⚠️ Failed to migrate relationship: {e}")
+
+    if verbose:
+        print(f"  ✅ Relationships: {stats['relationships']}")
+
+    # 5. Slim down state.json (drop the migrated fields)
+    if not dry_run:
+        if verbose:
+            print(f"\n🔄 Slimming state.json...")
+
+        # Fields to keep
+        slim_state = {
+            "project_info": state.get("project_info", {}),
+            "progress": state.get("progress", {}),
+            "protagonist_state": state.get("protagonist_state", {}),
+            "strand_tracker": state.get("strand_tracker", {}),
+            "world_settings": _slim_world_settings(state.get("world_settings", {})),
+            "plot_threads": state.get("plot_threads", {}),
+            "relationships": _slim_relationships(state.get("relationships", {})),
+            "review_checkpoints": state.get("review_checkpoints", [])[-10:],  # keep only the 10 most recent
+            "disambiguation_warnings": state.get("disambiguation_warnings", [])[-20:],
+            "disambiguation_pending": state.get("disambiguation_pending", [])[-10:],
+            # v5.1 marker
+            "_migrated_to_sqlite": True,
+            "_migration_timestamp": datetime.now().isoformat()
+        }
+
+        with open(state_file, 'w', encoding='utf-8') as f:
+            json.dump(slim_state, f, ensure_ascii=False, indent=2)
+
+        new_size = state_file.stat().st_size / 1024
+        if verbose:
+            print(f"  ✅ After slimming: {new_size:.1f} KB")
+
+    # Print statistics
+    if verbose:
+        print(f"\n" + "=" * 50)
+        print(f"📊 Migration statistics:")
+        print(f"  Entities: {stats['entities']}")
+        print(f"  Aliases: {stats['aliases']}")
+        print(f"  State changes: {stats['state_changes']}")
+        print(f"  Relationships: {stats['relationships']}")
+        print(f"  Skipped: {stats['skipped']}")
+        print(f"  Errors: {stats['errors']}")
+        if dry_run:
+            print(f"\n⚠️ Dry-run mode: no data was written")
+
+    return stats
+
+
+def _slim_world_settings(world_settings: Dict) -> Dict:
+    """Slim down world_settings, keeping only the skeleton."""
+    if not isinstance(world_settings, dict):
+        return {}
+
+    slim = {}
+
+    # power_system: keep only the level names
+    power_system = world_settings.get("power_system", [])
+    if isinstance(power_system, list):
+        slim["power_system"] = [
+            p.get("name") if isinstance(p, dict) else p
+            for p in power_system[:20]  # at most 20 levels
+        ]
+
+    # factions: keep only name and type
+    factions = world_settings.get("factions", [])
+    if isinstance(factions, list):
+        slim["factions"] = [
+            {"name": f.get("name"), "type": f.get("type")}
+            if isinstance(f, dict) else f
+            for f in factions[:30]  # at most 30 factions
+        ]
+
+    # locations: keep only the names
+    locations = world_settings.get("locations", [])
+    if isinstance(locations, list):
+        slim["locations"] = [
+            loc.get("name") if isinstance(loc, dict) else loc
+            for loc in locations[:50]  # at most 50 locations
+        ]
+
+    return slim
+
+
+def _slim_relationships(relationships: Dict) -> Dict:
+    """Slim down relationships (currently a pass-through)."""
+    if not isinstance(relationships, dict):
+        return {}
+
+    # Return the relationships dict as-is, with no extra slimming,
+    # because this field is expected to be small already.
+    return relationships
+
+
+def main():
+    import argparse
+
+    parser = argparse.ArgumentParser(description="Migrate state.json to SQLite (v5.1)")
+    parser.add_argument("--project-root", type=str, required=True, help="Project root directory")
+    parser.add_argument("--dry-run", action="store_true", help="Analyze only, write nothing")
+    parser.add_argument("--backup", action="store_true", default=True, help="Back up state.json before migrating (default; this flag is a no-op)")
+    parser.add_argument("--no-backup", action="store_true", help="Skip the backup")
+    parser.add_argument("--quiet", action="store_true", help="Quiet mode")
+
+    args = parser.parse_args()
+
+    config = DataModulesConfig.from_project_root(args.project_root)
+    backup = not args.no_backup
+
+    print(f"🚀 Starting migration: state.json → SQLite")
+    print(f"   Project: {config.project_root}")
+    print(f"   state.json: {config.state_file}")
+    print(f"   index.db: {config.index_db}")
+    print()
+
+    stats = migrate_state_to_sqlite(
+        config=config,
+        dry_run=args.dry_run,
+        backup=backup,
+        verbose=not args.quiet
+    )
+
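+    if stats["errors"] > 0:
+        # The bare exit() builtin comes from the site module and may be absent; raise SystemExit instead.
+        raise SystemExit(1)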
+
+
+if __name__ == "__main__":
+    main()
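
The migration flow above (walk `entities_v3`, skip malformed records, upsert into SQLite, then keep only the small fields) can be tried in isolation. The sketch below uses only the standard library. The sample `state` dict and the four-column `entities` schema are assumptions made for illustration; the real table in index.db is created by `IndexManager` and has more columns.

```python
import json
import sqlite3

# Hypothetical legacy state; a real state.json carries many more fields.
state = {
    "progress": {"current_chapter": 20},
    "entities_v3": {
        "角色": {"xiaoyan": {"canonical_name": "萧炎", "tier": "核心"}},
        "地点": {"tianyunzong": {"name": "天云宗"}, "broken": "not-a-dict"},
    },
}

conn = sqlite3.connect(":memory:")
# Assumed, simplified schema for the sketch.
conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, type TEXT, name TEXT, tier TEXT)")

migrated, skipped = 0, 0
for entity_type, entities in state.get("entities_v3", {}).items():
    for entity_id, data in entities.items():
        if not isinstance(data, dict):
            skipped += 1  # malformed record: count it and move on
            continue
        # Same fallback chain as the script: canonical_name, then name, then the ID.
        name = data.get("canonical_name", data.get("name", entity_id))
        conn.execute(
            "INSERT OR REPLACE INTO entities (id, type, name, tier) VALUES (?, ?, ?, ?)",
            (entity_id, entity_type, name, data.get("tier", "装饰")),
        )
        migrated += 1
conn.commit()

# Keep only the small fields, as step 5 of the script does.
slim_state = {"progress": state.get("progress", {}), "_migrated_to_sqlite": True}
rows = conn.execute("SELECT id, name, tier FROM entities ORDER BY id").fetchall()
print(migrated, skipped, rows)
print(len(json.dumps(slim_state, ensure_ascii=False)))
```

Because the write is an upsert keyed on the entity ID, re-running this step over the same input leaves the table unchanged.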

+ 532 - 0
.claude/scripts/data_modules/sql_state_manager.py

@@ -0,0 +1,532 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
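+SQL State Manager - SQLite state management module (v5.1)
+
+Built on top of IndexManager, this module provides a high-level interface compatible with StateManager
+and stores large data (entities, aliases, state changes, relationships) in SQLite instead of JSON.
+
+Goals:
+- Replace the large data fields in state.json
+- Stay interface-compatible with the Data Agent / Context Agent
+- Support incremental writes and on-demand queries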
+"""
+
+import json
+from typing import Dict, List, Optional, Any
+from dataclasses import dataclass, field
+from datetime import datetime
+
+from .index_manager import (
+    IndexManager,
+    EntityMeta,
+    StateChangeMeta,
+    RelationshipMeta
+)
+from .config import get_config
+
+
+@dataclass
+class EntityData:
+    """Entity data (input from the Data Agent)."""
+    id: str
+    type: str  # 角色/地点/物品/势力/招式 (character/location/item/faction/technique)
+    name: str
+    tier: str = "装饰"
+    desc: str = ""
+    current: Dict[str, Any] = field(default_factory=dict)
+    aliases: List[str] = field(default_factory=list)
+    first_appearance: int = 0
+    last_appearance: int = 0
+    is_protagonist: bool = False
+
+
+class SQLStateManager:
+    """
+    SQLite state manager (v5.1)
+
+    Exposes a StateManager-compatible interface, but stores data in SQLite (index.db).
+    It replaces the bloated data structures in state.json.
+
+    Usage:
+    ```python
+    manager = SQLStateManager(config)
+
+    # Write an entity
+    manager.upsert_entity(EntityData(
+        id="xiaoyan",
+        type="角色",
+        name="萧炎",
+        tier="核心",
+        current={"realm": "斗师", "location": "天云宗"},
+        aliases=["小炎子", "废柴"],
+        is_protagonist=True
+    ))
+
+    # Write a state change
+    manager.record_state_change(
+        entity_id="xiaoyan",
+        field="realm",
+        old_value="斗者",
+        new_value="斗师",
+        reason="闭关突破",
+        chapter=100
+    )
+
+    # Write a relationship
+    manager.upsert_relationship(
+        from_entity="xiaoyan",
+        to_entity="yaolao",
+        type="师徒",
+        description="药老收萧炎为徒",
+        chapter=5
+    )
+
+    # Read
+    protagonist = manager.get_protagonist()
+    core_entities = manager.get_core_entities()
+    changes = manager.get_recent_state_changes(limit=50)
+    ```
+    """
+
+    # Entity types supported since v5.0
+    ENTITY_TYPES = ["角色", "地点", "物品", "势力", "招式"]
+
+    def __init__(self, config=None):
+        self.config = config or get_config()
+        # Pass the resolved config so both managers share the same settings object.
+        self._index_manager = IndexManager(self.config)
+
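+    # ==================== Entity operations ====================
+
+    def upsert_entity(self, entity: EntityData) -> bool:
+        """
+        Insert or update an entity.
+
+        Handled automatically:
+        - basic entity info is written to the entities table
+        - aliases are written to the aliases table
+        - canonical_name is also registered as an alias
+
+        Returns: True if the entity is new
+        """
+        # Build the EntityMeta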
+        meta = EntityMeta(
+            id=entity.id,
+            type=entity.type,
+            canonical_name=entity.name,
+            tier=entity.tier,
+            desc=entity.desc,
+            current=entity.current,
+            first_appearance=entity.first_appearance,
+            last_appearance=entity.last_appearance,
+            is_protagonist=entity.is_protagonist,
+            is_archived=False
+        )
+
+        is_new = self._index_manager.upsert_entity(meta)
+
+        # Register aliases
+        # 1. canonical_name itself counts as an alias
+        self._index_manager.register_alias(entity.name, entity.id, entity.type)
+
+        # 2. Any other aliases
+        for alias in entity.aliases:
+            if alias and alias != entity.name:
+                self._index_manager.register_alias(alias, entity.id, entity.type)
+
+        return is_new
+
+    def get_entity(self, entity_id: str) -> Optional[Dict]:
+        """Get an entity's details."""
+        entity = self._index_manager.get_entity(entity_id)
+        if entity:
+            # Attach aliases
+            entity["aliases"] = self._index_manager.get_entity_aliases(entity_id)
+        return entity
+
+    def get_entities_by_type(self, entity_type: str, include_archived: bool = False) -> List[Dict]:
+        """Get entities by type."""
+        entities = self._index_manager.get_entities_by_type(entity_type, include_archived)
+        for e in entities:
+            e["aliases"] = self._index_manager.get_entity_aliases(e["id"])
+        return entities
+
+    def get_core_entities(self) -> List[Dict]:
+        """
+        Get core entities (loaded in full by the Context Agent).
+
+        Returns every entity with tier=核心/重要 or is_protagonist=1.
+        """
+        entities = self._index_manager.get_core_entities()
+        for e in entities:
+            e["aliases"] = self._index_manager.get_entity_aliases(e["id"])
+        return entities
+
+    def get_protagonist(self) -> Optional[Dict]:
+        """Get the protagonist entity."""
+        protagonist = self._index_manager.get_protagonist()
+        if protagonist:
+            protagonist["aliases"] = self._index_manager.get_entity_aliases(protagonist["id"])
+        return protagonist
+
+    def update_entity_current(self, entity_id: str, updates: Dict) -> bool:
+        """Incrementally update an entity's current field."""
+        return self._index_manager.update_entity_current(entity_id, updates)
+
+    def resolve_alias(self, alias: str) -> List[Dict]:
+        """
+        Resolve an alias to entities (one-to-many).
+
+        Returns all matching entities.
+        """
+        return self._index_manager.get_entities_by_alias(alias)
+
+    def register_alias(self, alias: str, entity_id: str, entity_type: str) -> bool:
+        """Register an alias."""
+        return self._index_manager.register_alias(alias, entity_id, entity_type)
+
+    # ==================== State change operations ====================
+
+    def record_state_change(
+        self,
+        entity_id: str,
+        field: str,
+        old_value: Any,
+        new_value: Any,
+        reason: str,
+        chapter: int
+    ) -> int:
+        """
+        Record a state change.
+
+        Returns: record ID
+        """
+        change = StateChangeMeta(
+            entity_id=entity_id,
+            field=field,
+            old_value=str(old_value) if old_value is not None else "",
+            new_value=str(new_value),
+            reason=reason,
+            chapter=chapter
+        )
+        return self._index_manager.record_state_change(change)
+
+    def get_entity_state_changes(self, entity_id: str, limit: int = 20) -> List[Dict]:
+        """Get an entity's state change history."""
+        return self._index_manager.get_entity_state_changes(entity_id, limit)
+
+    def get_recent_state_changes(self, limit: int = 50) -> List[Dict]:
+        """Get the most recent state changes."""
+        return self._index_manager.get_recent_state_changes(limit)
+
+    def get_chapter_state_changes(self, chapter: int) -> List[Dict]:
+        """Get all state changes for a chapter."""
+        return self._index_manager.get_chapter_state_changes(chapter)
+
+    # ==================== Relationship operations ====================
+
+    def upsert_relationship(
+        self,
+        from_entity: str,
+        to_entity: str,
+        type: str,
+        description: str,
+        chapter: int
+    ) -> bool:
+        """
+        Insert or update a relationship.
+
+        Returns: True if the relationship is new
+        """
+        rel = RelationshipMeta(
+            from_entity=from_entity,
+            to_entity=to_entity,
+            type=type,
+            description=description,
+            chapter=chapter
+        )
+        return self._index_manager.upsert_relationship(rel)
+
+    def get_entity_relationships(self, entity_id: str, direction: str = "both") -> List[Dict]:
+        """Get an entity's relationships."""
+        return self._index_manager.get_entity_relationships(entity_id, direction)
+
+    def get_relationship_between(self, entity1: str, entity2: str) -> List[Dict]:
+        """Get all relationships between two entities."""
+        return self._index_manager.get_relationship_between(entity1, entity2)
+
+    def get_recent_relationships(self, limit: int = 30) -> List[Dict]:
+        """Get the most recently established relationships."""
+        return self._index_manager.get_recent_relationships(limit)
+
+    # ==================== Batch writes (used by the Data Agent) ====================
+
+    def process_chapter_entities(
+        self,
+        chapter: int,
+        entities_appeared: List[Dict],
+        entities_new: List[Dict],
+        state_changes: List[Dict],
+        relationships_new: List[Dict]
+    ) -> Dict[str, int]:
+        """
+        Process one chapter's entity data (main entry point for the Data Agent).
+
+        Args:
+        - chapter: chapter number
+        - entities_appeared: existing entities that appear in the chapter
+          [{"id": "xiaoyan", "type": "角色", "mentions": ["萧炎", "他"], "confidence": 0.95}]
+        - entities_new: newly discovered entities
+          [{"suggested_id": "hongyi_girl", "name": "红衣女子", "type": "角色", "tier": "装饰"}]
+        - state_changes: state changes
+          [{"entity_id": "xiaoyan", "field": "realm", "old": "斗者", "new": "斗师", "reason": "突破"}]
+        - relationships_new: new relationships
+          [{"from": "xiaoyan", "to": "hongyi_girl", "type": "相识", "description": "初次见面"}]
+
+        Returns: write statistics
+        """
+        stats = {
+            "entities_updated": 0,
+            "entities_created": 0,
+            "state_changes": 0,
+            "relationships": 0,
+            "aliases": 0
+        }
+
+        # 1. Process appearing entities (update last_appearance)
+        for entity in entities_appeared:
+            entity_id = entity.get("id")
+            if not entity_id:
+                continue
+
+            self._index_manager.update_entity_current(entity_id, {})  # touch updated_at
+            # Update last_appearance
+            existing = self._index_manager.get_entity(entity_id)
+            if existing:
+                # Update last_appearance directly via SQL
+                self._update_last_appearance(entity_id, chapter)
+                stats["entities_updated"] += 1
+
+            # Record the appearance (existing logic, unchanged)
+            self._index_manager.record_appearance(
+                entity_id=entity_id,
+                chapter=chapter,
+                mentions=entity.get("mentions", []),
+                confidence=entity.get("confidence", 1.0)
+            )
+
+        # 2. Process new entities
+        for entity in entities_new:
+            suggested_id = entity.get("suggested_id") or entity.get("id")
+            if not suggested_id:
+                continue
+
+            entity_data = EntityData(
+                id=suggested_id,
+                type=entity.get("type", "角色"),
+                name=entity.get("name", suggested_id),
+                tier=entity.get("tier", "装饰"),
+                desc=entity.get("desc", ""),
+                current=entity.get("current", {}),
+                aliases=entity.get("aliases", []),
+                first_appearance=chapter,
+                last_appearance=chapter,
+                is_protagonist=entity.get("is_protagonist", False)
+            )
+            is_new = self.upsert_entity(entity_data)
+            if is_new:
+                stats["entities_created"] += 1
+            else:
+                stats["entities_updated"] += 1
+
+            # Count aliases (canonical name plus the listed aliases)
+            stats["aliases"] += 1 + len(entity_data.aliases)
+
+        # 3. Process state changes
+        for change in state_changes:
+            entity_id = change.get("entity_id")
+            if not entity_id:
+                continue
+
+            self.record_state_change(
+                entity_id=entity_id,
+                field=change.get("field", ""),
+                old_value=change.get("old", change.get("old_value", "")),
+                new_value=change.get("new", change.get("new_value", "")),
+                reason=change.get("reason", ""),
+                chapter=chapter
+            )
+            stats["state_changes"] += 1
+
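+            # Keep the entity's current field in sync
+            field_name = change.get("field")
+            new_value = change.get("new", change.get("new_value"))
+            # Compare against None so falsy values such as 0 or "" are still synced
+            if field_name and new_value is not None:
+                self._index_manager.update_entity_current(entity_id, {field_name: new_value})
+
+        # 4. Process new relationships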
+        for rel in relationships_new:
+            from_entity = rel.get("from", rel.get("from_entity"))
+            to_entity = rel.get("to", rel.get("to_entity"))
+            if not from_entity or not to_entity:
+                continue
+
+            self.upsert_relationship(
+                from_entity=from_entity,
+                to_entity=to_entity,
+                type=rel.get("type", "相识"),
+                description=rel.get("description", ""),
+                chapter=chapter
+            )
+            stats["relationships"] += 1
+
+        return stats
+
+    def _update_last_appearance(self, entity_id: str, chapter: int):
+        """Update an entity's last_appearance (never moves it backwards)."""
+        with self._index_manager._get_conn() as conn:
+            cursor = conn.cursor()
+            cursor.execute("""
+                UPDATE entities SET
+                    last_appearance = MAX(last_appearance, ?),
+                    updated_at = CURRENT_TIMESTAMP
+                WHERE id = ?
+            """, (chapter, entity_id))
+            conn.commit()
+
+    # ==================== Statistics ====================
+
+    def get_stats(self) -> Dict[str, int]:
+        """Get statistics."""
+        return self._index_manager.get_stats()
+
+    # ==================== Format conversion (compatibility) ====================
+
+    def export_to_entities_v3_format(self) -> Dict[str, Dict[str, Dict]]:
+        """
+        Export in entities_v3 format (for compatibility).
+
+        Returns: {"角色": {"xiaoyan": {...}}, "地点": {...}, ...}
+        """
+        result = {t: {} for t in self.ENTITY_TYPES}
+
+        for entity_type in self.ENTITY_TYPES:
+            entities = self.get_entities_by_type(entity_type, include_archived=True)
+            for e in entities:
+                entity_dict = {
+                    "name": e.get("canonical_name"),
+                    "tier": e.get("tier", "装饰"),
+                    "aliases": e.get("aliases", []),
+                    "desc": e.get("desc", ""),
+                    "current": e.get("current_json", {}),
+                    "history": [],  # history must be queried from the state_changes table
+                    "first_appearance": e.get("first_appearance", 0),
+                    "last_appearance": e.get("last_appearance", 0)
+                }
+                if e.get("is_protagonist"):
+                    entity_dict["is_protagonist"] = True
+                result[entity_type][e["id"]] = entity_dict
+
+        return result
+
+    def export_to_alias_index_format(self) -> Dict[str, List[Dict[str, str]]]:
+        """
+        Export in alias_index format (for compatibility).
+
+        Returns: {"萧炎": [{"type": "角色", "id": "xiaoyan"}], ...}
+        """
+        result = {}
+
+        with self._index_manager._get_conn() as conn:
+            cursor = conn.cursor()
+            cursor.execute("SELECT alias, entity_id, entity_type FROM aliases")
+            for row in cursor.fetchall():
+                alias = row["alias"]
+                if alias not in result:
+                    result[alias] = []
+                result[alias].append({
+                    "type": row["entity_type"],
+                    "id": row["entity_id"]
+                })
+
+        return result
+
+
+# ==================== CLI interface ====================
+
+def main():
+    import argparse
+
+    parser = argparse.ArgumentParser(description="SQL State Manager CLI (v5.1)")
+    parser.add_argument("--project-root", type=str, help="Project root directory")
+
+    subparsers = parser.add_subparsers(dest="command")
+
+    # Get statistics
+    subparsers.add_parser("stats")
+
+    # Get the protagonist
+    subparsers.add_parser("get-protagonist")
+
+    # Get core entities
+    subparsers.add_parser("get-core-entities")
+
+    # Export in entities_v3 format
+    subparsers.add_parser("export-entities-v3")
+
+    # Export in alias_index format
+    subparsers.add_parser("export-alias-index")
+
+    # Process chapter data
+    process_parser = subparsers.add_parser("process-chapter")
+    process_parser.add_argument("--chapter", type=int, required=True)
+    process_parser.add_argument("--data", required=True, help="Chapter data in JSON format")
+
+    args = parser.parse_args()
+
+    # Initialize
+    config = None
+    if args.project_root:
+        from .config import DataModulesConfig
+        config = DataModulesConfig.from_project_root(args.project_root)
+
+    manager = SQLStateManager(config)
+
+    if args.command == "stats":
+        stats = manager.get_stats()
+        print(json.dumps(stats, ensure_ascii=False, indent=2))
+
+    elif args.command == "get-protagonist":
+        protagonist = manager.get_protagonist()
+        if protagonist:
+            print(json.dumps(protagonist, ensure_ascii=False, indent=2))
+        else:
+            print("No protagonist set")
+
+    elif args.command == "get-core-entities":
+        entities = manager.get_core_entities()
+        print(json.dumps(entities, ensure_ascii=False, indent=2))
+
+    elif args.command == "export-entities-v3":
+        data = manager.export_to_entities_v3_format()
+        print(json.dumps(data, ensure_ascii=False, indent=2))
+
+    elif args.command == "export-alias-index":
+        data = manager.export_to_alias_index_format()
+        print(json.dumps(data, ensure_ascii=False, indent=2))
+
+    elif args.command == "process-chapter":
+        data = json.loads(args.data)
+        stats = manager.process_chapter_entities(
+            chapter=args.chapter,
+            entities_appeared=data.get("entities_appeared", []),
+            entities_new=data.get("entities_new", []),
+            state_changes=data.get("state_changes", []),
+            relationships_new=data.get("relationships_new", [])
+        )
+        print(f"✓ Processed chapter {args.chapter}")
+        print(json.dumps(stats, ensure_ascii=False, indent=2))
+
+
+if __name__ == "__main__":
+    main()
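
The `process-chapter` subcommand above takes the whole chapter payload as one JSON string. The following is a minimal sketch of building that argument list from Python. The module path `data_modules.sql_state_manager`, the helper name `build_process_chapter_argv`, and the sample entity are assumptions for illustration; only the flags and the four top-level keys come from the CLI code above.

```python
import json
import shlex

# The four top-level keys that the process-chapter branch reads from --data.
PAYLOAD_KEYS = ("entities_appeared", "entities_new", "state_changes", "relationships_new")


def build_process_chapter_argv(chapter, payload, project_root="."):
    """Build an argv list for the process-chapter subcommand (hypothetical helper)."""
    # Missing keys default to empty lists, mirroring data.get(key, []) in main().
    data = {key: payload.get(key, []) for key in PAYLOAD_KEYS}
    return [
        "python", "-m", "data_modules.sql_state_manager",  # assumed module path
        "--project-root", project_root,  # parser-level option, placed before the subcommand
        "process-chapter",
        "--chapter", str(chapter),
        # ensure_ascii=False keeps Chinese names readable on the command line
        "--data", json.dumps(data, ensure_ascii=False),
    ]


# Hypothetical payload: one entity appears in chapter 21.
argv = build_process_chapter_argv(21, {"entities_appeared": [{"id": "xiaoyan", "type": "角色"}]})
print(shlex.join(argv))
```

Because `--project-root` is declared on the top-level parser rather than on each subparser, argparse only accepts it before the subcommand name, which is why the sketch places it there.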

+ 85 - 4
.claude/scripts/data_modules/state_manager.py

@@ -1,12 +1,16 @@
 #!/usr/bin/env python3
 # -*- coding: utf-8 -*-
 """
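-State Manager - state management module
+State Manager - state management module (v5.1)
 
 Manages reads and writes of state.json:
 - Entity state management
 - Progress tracking
 - Relationship records
+
+v5.1 changes:
+- Integrates SQLStateManager and mirrors writes to SQLite (index.db)
+- state.json keeps only slim data; large data is migrated to SQLite automatically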
 """
 
 import json
@@ -74,17 +78,34 @@ class _EntityPatch:
 
 
 class StateManager:
-    """State manager (v5.0 entities_v3 format)"""
+    """State manager (v5.1 entities_v3 format + SQLite sync)"""
 
     # Entity types supported since v5.0
     ENTITY_TYPES = ["角色", "地点", "物品", "势力", "招式"]
 
-    def __init__(self, config=None):
+    def __init__(self, config=None, enable_sqlite_sync: bool = True):
+        """
+        Initialize the state manager.
+
+        Parameters:
+        - config: configuration object
+        - enable_sqlite_sync: whether to enable SQLite sync (default True)
+        """
         self.config = config or get_config()
         self._state: Dict[str, Any] = {}
         # Kept consistent with security_utils.atomic_write_json: state.json.lock
         self._lock_path = self.config.state_file.with_suffix(self.config.state_file.suffix + ".lock")
 
+        # v5.1: SQLite sync
+        self._enable_sqlite_sync = enable_sqlite_sync
+        self._sql_state_manager = None
+        if enable_sqlite_sync:
+            try:
+                from .sql_state_manager import SQLStateManager
+                self._sql_state_manager = SQLStateManager(self.config)
+            except ImportError:
+                pass  # Degrade silently when SQLStateManager is unavailable
+
         # Pending deltas (re-read under lock + merge + write)
         self._pending_entity_patches: Dict[tuple[str, str], _EntityPatch] = {}
         self._pending_alias_entries: Dict[str, List[Dict[str, str]]] = {}
@@ -95,6 +116,15 @@ class StateManager:
         self._pending_progress_chapter: Optional[int] = None
         self._pending_progress_words_delta: int = 0
 
+        # v5.1: buffer for data awaiting sync to SQLite
+        self._pending_sqlite_data: Dict[str, Any] = {
+            "entities_appeared": [],
+            "entities_new": [],
+            "state_changes": [],
+            "relationships_new": [],
+            "chapter": None
+        }
+
         self._load_state()
 
     def _now_progress_timestamp(self) -> str:
@@ -424,9 +454,49 @@ class StateManager:
                 self._pending_disambiguation_pending.clear()
                 self._pending_progress_chapter = None
                 self._pending_progress_words_delta = 0
+
+                # v5.1: sync to SQLite
+                self._sync_to_sqlite()
+
         except filelock.Timeout:
             raise RuntimeError("Could not acquire the state.json file lock; please retry later")
 
+    def _sync_to_sqlite(self):
+        """v5.1: sync pending data to SQLite"""
+        if not self._sql_state_manager:
+            return
+
+        sqlite_data = self._pending_sqlite_data
+        chapter = sqlite_data.get("chapter")
+
+        if chapter is None:
+            # Clear the buffer and return
+            self._clear_pending_sqlite_data()
+            return
+
+        try:
+            self._sql_state_manager.process_chapter_entities(
+                chapter=chapter,
+                entities_appeared=sqlite_data.get("entities_appeared", []),
+                entities_new=sqlite_data.get("entities_new", []),
+                state_changes=sqlite_data.get("state_changes", []),
+                relationships_new=sqlite_data.get("relationships_new", [])
+            )
+        except Exception:
+            pass  # Degrade silently if the SQLite sync fails; the main flow is unaffected
+        finally:
+            self._clear_pending_sqlite_data()
+
+    def _clear_pending_sqlite_data(self):
+        """Clear the data buffered for SQLite sync"""
+        self._pending_sqlite_data = {
+            "entities_appeared": [],
+            "entities_new": [],
+            "state_changes": [],
+            "relationships_new": [],
+            "chapter": None
+        }
+
     # ==================== Progress management ====================
 
     def get_current_chapter(self) -> int:
@@ -794,7 +864,7 @@ class StateManager:
 
     def process_chapter_result(self, chapter: int, result: Dict) -> List[str]:
         """
-        Process the Data Agent's chapter result (v5.0)
+        Process the Data Agent's chapter result (v5.1)
 
         Input format:
         - entities_appeared: list of entities that appear in the chapter
@@ -806,12 +876,17 @@ class StateManager:
         """
         warnings = []
 
+        # v5.1: record the chapter number for the SQLite sync
+        self._pending_sqlite_data["chapter"] = chapter
+
         # Handle entities that appear in the chapter
         for entity in result.get("entities_appeared", []):
             entity_id = entity.get("id")
             entity_type = entity.get("type")
             if entity_id:
                 self.update_entity_appearance(entity_id, chapter, entity_type)
+                # v5.1: buffer for the SQLite sync
+                self._pending_sqlite_data["entities_appeared"].append(entity)
 
         # Handle new entities
         for entity in result.get("entities_new", []):
@@ -828,6 +903,8 @@ class StateManager:
                 )
                 if not self.add_entity(new_entity):
                     warnings.append(f"Entity already exists: {entity_id}")
+                # v5.1: buffer for the SQLite sync
+                self._pending_sqlite_data["entities_new"].append(entity)
 
         # Handle state changes
         for change in result.get("state_changes", []):
@@ -839,6 +916,8 @@ class StateManager:
                 reason=change.get("reason", ""),
                 chapter=chapter
             )
+            # v5.1: buffer for the SQLite sync
+            self._pending_sqlite_data["state_changes"].append(change)
 
         # Handle relationships
         for rel in result.get("relationships_new", []):
@@ -849,6 +928,8 @@ class StateManager:
                 description=rel.get("description", ""),
                 chapter=chapter
             )
+            # v5.1: buffer for the SQLite sync
+            self._pending_sqlite_data["relationships_new"].append(rel)
 
         # Handle uncertain disambiguation items (does not affect entity writes, but must be visible to the Writer)
         warnings.extend(self._record_disambiguation(chapter, result.get("uncertain", [])))
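
The changes above follow a buffer-then-flush pattern: `process_chapter_result` collects items per chapter, and the flush runs once after state.json has been written. The sketch below isolates that pattern as a toy class, under stated assumptions. `BufferedSync`, `empty_buffer`, and `last_error` are hypothetical names that do not exist in the repository, and unlike the diff, which discards a sync exception with `pass`, this version keeps it for inspection.

```python
from typing import Any, Callable, Dict, Optional


def empty_buffer() -> Dict[str, Any]:
    """Return a fresh buffer with the same keys as _pending_sqlite_data."""
    return {
        "entities_appeared": [],
        "entities_new": [],
        "state_changes": [],
        "relationships_new": [],
        "chapter": None,
    }


class BufferedSync:
    """Toy stand-in for the v5.1 buffering in StateManager (not the real class)."""

    def __init__(self, sink: Callable[..., Any]):
        self._sink = sink  # called with chapter=... and the four lists as keyword arguments
        self._pending = empty_buffer()
        self.last_error: Optional[Exception] = None  # hypothetical: the diff drops the error

    def record(self, chapter: int, key: str, item: Dict[str, Any]) -> None:
        """Buffer one item under the given list key for the given chapter."""
        self._pending["chapter"] = chapter
        self._pending[key].append(item)

    def flush(self) -> bool:
        """Send buffered data to the sink; the buffer is reset whether or not it succeeds."""
        pending, self._pending = self._pending, empty_buffer()
        if pending["chapter"] is None:
            return False  # nothing was recorded for any chapter
        try:
            self._sink(**pending)
            return True
        except Exception as exc:  # a failed sync must not break the main flow
            self.last_error = exc
            return False
```

Resetting the buffer before calling the sink gives the same effect as the `finally` clause in `_sync_to_sqlite`: a failed sync never leaves stale items behind to be replayed with the next chapter.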

+ 154 - 37
.claude/skills/webnovel-query/references/system-data-flow.md

@@ -1,7 +1,7 @@
 ---
 name: system-data-flow
 purpose: Loaded during project initialization and state queries to explain the data structures
-version: "5.0"
+version: "5.1"
 ---
 
 <context>
@@ -18,51 +18,77 @@ version: "5.0"
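 ├── 大纲/           # Volume / chapter / scene outlines
 ├── 设定集/         # Worldbuilding / power system / character cards / item cards
 └── .webnovel/
-    ├── state.json          # Authoritative state (entities_v3 + alias_index + progress / protagonist / strand_tracker)
+    ├── state.json          # Slim state (< 5KB): progress / protagonist / strand_tracker / disambiguation
+    ├── index.db            # Primary SQLite store: entities / aliases / relationships / state changes / chapters / scenes
     ├── workflow_state.json # Workflow checkpoint (used by /webnovel-resume)
-    ├── index.db            # SQLite index (chapters / entities / aliases / relationships / foreshadowing; rebuildable)
+    ├── vectors.db          # RAG vector database
     └── archive/            # Archived data (inactive characters / resolved foreshadowing)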
 ```
 
-## v5.0 Dual-Agent Architecture
+## v5.1 Architecture Changes
+
+**Core change**: fixes state.json bloat (token explosion after 20 chapters)
+
+| Data type | v5.0 location | v5.1 location |
+|----------|--------------|--------------|
+| entities_v3 | state.json | **index.db** (entities table) |
+| alias_index | state.json | **index.db** (aliases table) |
+| state_changes | state.json | **index.db** (state_changes table) |
+| structured_relationships | state.json | **index.db** (relationships table) |
+| progress | state.json | state.json (kept) |
+| protagonist_state | state.json | state.json (kept) |
+| strand_tracker | state.json | state.json (kept) |
+| disambiguation_* | state.json | state.json (kept) |
+
+## v5.1 Dual-Agent Architecture
 
 ```
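 Before writing: Context Agent reads data → assembles the context package
+        ├── Reads slim data from state.json (progress/config)
+        └── Queries index.db on demand via SQL (entities/relationships)
+
 While writing: Writer uses the context package to generate plain prose (no XML tags)
+
 After writing: Data Agent processes the prose → AI extracts entities → writes the data chain
+        ├── Writes index.db (entities/aliases/state changes/relationships)
+        └── Updates state.json (progress/protagonist snapshot)
 
-Context Agent (read) ←→ data storage ←→ Data Agent (write)
+Context Agent (read) ←→ index.db + state.json ←→ Data Agent (write)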
 ```
 
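-## Script/Module Responsibility Quick Reference (v5.0)
+## Script/Module Responsibility Quick Reference (v5.1)
 
 ### Core scripts
 
 | Script | Input | Output |
 |------|------|------|
-| `init_project.py` | Project info | Generates `.webnovel/state.json`  |
+| `init_project.py` | Project info | Generates `.webnovel/state.json` + initializes `index.db` |
 | `update_state.py` | Arguments | Atomically updates `state.json` fields (progress/protagonist/strand_tracker) |
 | `backup_manager.py` | Chapter number | Automatic Git backup |
 | `status_reporter.py` | None | Generates the health report / foreshadowing urgency |
 | `archive_manager.py` | None | Archives inactive data |
+| `migrate_state_to_sqlite.py` | Project path | Migrates an old state.json to SQLite (new in v5.1) |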
 
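 ### data_modules modules
 
 | Module | Responsibility |
 |------|------|
-| `state_manager.py` | Entity state management (reads and writes entities_v3) |
-| `index_manager.py` | SQLite index management (chapter/entity/scene queries) |
-| `entity_linker.py` | Alias registration and disambiguation (manages alias_index) |
+| `state_manager.py` | Entity state management (slim state.json + SQLite sync) |
+| `sql_state_manager.py` | SQLite state management (new in v5.1, replaces JSON writes) |
+| `index_manager.py` | SQLite index management (entities/aliases/relationships/state changes/chapters/scenes) |
+| `entity_linker.py` | Alias registration and disambiguation |
 | `rag_adapter.py` | Vector embedding and semantic retrieval |
 | `style_sampler.py` | Style sample extraction and management |
 | `api_client.py` | LLM API call wrapper |
 | `config.py` | Configuration management |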
 
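-## Per-Chapter Data Chain (v5.0 order)
+## Per-Chapter Data Chain (v5.1 order)
 
 ```
 1. Context Agent assembles the context package
-   → Reads outlines/state.json/index.db/RAG
+   → Reads state.json (slim version: progress/config)
+   → Queries index.db via SQL (core entities / on-demand entities)
+   → RAG retrieval (related scenes)
    → Outputs the context package as JSON
 
 2. Writer generates the chapter content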
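@@ -80,8 +106,8 @@ Context Agent (read) ←→ data storage ←→ Data Agent (write)
 5. Data Agent processes the data chain
    → AI entity extraction (replaces XML tag parsing)
    → Entity disambiguation (confidence-based policy)
-   → Updates state.json (entities_v3 + alias_index + progress/disambiguation records)
-   → Updates index.db
+   → Writes index.db (entities/aliases/state changes/relationships)
+   → Updates state.json (progress/protagonist snapshot)
    → Vector embedding (RAG)
    → Style sample evaluation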
 
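@@ -90,7 +116,7 @@ Context Agent (read) ←→ data storage ←→ Data Agent (write)
 
 > `update_state.py` is for manual or scripted updates of fields such as `progress`/`protagonist_state`/`strand_tracker`; in the main flow the Data Agent usually advances progress while it processes the data chain.
 
-## state.json core fields (v5.0)
+## state.json slim structure (v5.1)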
 
 ```json
 {
@@ -102,22 +128,6 @@ Context Agent (read) ←→ data storage ←→ Data Agent (write)
     "location": {"current": "", "last_chapter": 0},
     "golden_finger": {"name": "", "level": 1, "skills": []}
   },
-  "entities_v3": {
-    "角色": {"entity_id": {"canonical_name": "", "aliases": [], "tier": "", "current": {}, "history": []}},
-    "地点": {},
-    "物品": {},
-    "势力": {},
-    "招式": {}
-  },
-  "alias_index": {
-    "别名": [{"type": "角色", "id": "entity_id"}]
-  },
-  "relationships": {},
-  "structured_relationships": [],
-  "disambiguation_warnings": [],
-  "disambiguation_pending": [],
-  "plot_threads": {"active_threads": [], "foreshadowing": []},
-  "world_settings": {},
   "strand_tracker": {
     "last_quest_chapter": 0,
     "last_fire_chapter": 0,
@@ -126,10 +136,71 @@ Context Agent (read) ←→ data storage ←→ Data Agent (write)
     "chapters_since_switch": 0,
     "history": []
   },
-  "review_checkpoints": []
+  "relationships": {},
+  "plot_threads": {"active_threads": [], "foreshadowing": []},
+  "world_settings": {},
+  "disambiguation_warnings": [],
+  "disambiguation_pending": [],
+  "review_checkpoints": [],
+  "_migrated_to_sqlite": true
 }
 ```
 
+> **v5.1 change**: entities_v3, alias_index, state_changes, and structured_relationships have moved to index.db and are no longer stored in state.json.
+
+## index.db table schema (v5.1)
+
+```sql
+-- Entities table
+CREATE TABLE entities (
+    id TEXT PRIMARY KEY,
+    type TEXT NOT NULL,           -- 角色/地点/物品/势力/招式
+    canonical_name TEXT NOT NULL,
+    tier TEXT DEFAULT '装饰',     -- 核心/重要/次要/装饰
+    desc TEXT,
+    current_json TEXT,            -- JSON: {realm, location, ...}
+    first_appearance INTEGER,
+    last_appearance INTEGER,
+    is_protagonist INTEGER DEFAULT 0,
+    is_archived INTEGER DEFAULT 0
+);
+
+-- Aliases table (one-to-many)
+CREATE TABLE aliases (
+    alias TEXT NOT NULL,
+    entity_id TEXT NOT NULL,
+    entity_type TEXT NOT NULL,
+    PRIMARY KEY (alias, entity_id, entity_type)
+);
+
+-- State changes table
+CREATE TABLE state_changes (
+    id INTEGER PRIMARY KEY AUTOINCREMENT,
+    entity_id TEXT NOT NULL,
+    field TEXT NOT NULL,
+    old_value TEXT,
+    new_value TEXT,
+    reason TEXT,
+    chapter INTEGER NOT NULL
+);
+
+-- Relationships table
+CREATE TABLE relationships (
+    id INTEGER PRIMARY KEY AUTOINCREMENT,
+    from_entity TEXT NOT NULL,
+    to_entity TEXT NOT NULL,
+    type TEXT NOT NULL,
+    description TEXT,
+    chapter INTEGER NOT NULL,
+    UNIQUE(from_entity, to_entity, type)
+);
+
+-- Pre-existing tables (kept)
+CREATE TABLE chapters (...);
+CREATE TABLE scenes (...);
+CREATE TABLE appearances (...);
+```
+
 ## Data Agent AI extraction flow
 
 v5.0 no longer requires XML tags; the Data Agent extracts entities automatically:
@@ -180,11 +251,44 @@ cat .webnovel/state.json | jq '.progress'
 </example>
 
 <example>
-<input>Query entity aliases</input>
+<input>v5.1: Query an entity (SQL)</input>
+<output>
+```bash
+python -m data_modules.index_manager get-entity --id "xiaoyan" --project-root "."
+# Output: {"id": "xiaoyan", "type": "角色", "canonical_name": "萧炎", ...}
+
+python -m data_modules.index_manager get-core-entities --project-root "."
+# Output: all core entities (protagonist + tier=核心/重要)
+```
+</output>
+</example>
+
+<example>
+<input>v5.1: Look up entities by alias (one-to-many)</input>
+<output>
+```bash
+python -m data_modules.index_manager get-by-alias --alias "天云宗" --project-root "."
+# Output: [{"id": "loc_tianyunzong", "type": "地点"}, {"id": "faction_tianyunzong", "type": "势力"}]
+```
+</output>
+</example>
+
+<example>
+<input>v5.1: Query state changes</input>
+<output>
+```bash
+python -m data_modules.index_manager get-state-changes --entity "xiaoyan" --limit 10 --project-root "."
+# Output: [{entity_id, field, old_value, new_value, reason, chapter}, ...]
+```
+</output>
+</example>
+
+<example>
+<input>v5.1: Query relationships</input>
 <output>
 ```bash
-cat .webnovel/state.json | jq '.alias_index["林天"]'
-# Output: [{"type": "角色", "id": "lintian"}]
+python -m data_modules.index_manager get-relationships --entity "xiaoyan" --project-root "."
+# Output: [{from_entity, to_entity, type, description, chapter}, ...]
 ```
 </output>
 </example>
@@ -207,13 +311,26 @@ python -m data_modules.index_manager entity-appearances --entity "lintian" --pro
 </output>
 </example>
 
+<example>
+<input>v5.1: Migrate an old state.json to SQLite</input>
+<output>
+```bash
+python -m data_modules.migrate_state_to_sqlite --project-root "." --backup
+# Backs up state.json automatically, migrates the data to index.db, and slims down state.json
+```
+</output>
+</example>
+
 </examples>
 
 <errors>
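 ❌ Foreshadowing status written as "待回收" → ✅ Use the canonical value "未回收"
 ❌ Forgetting planted_chapter in a manual update → ✅ The script now fills it in automatically
 ❌ Confusing the archive path → ✅ It is fixed at `.webnovel/archive/*.json`
-❌ Expecting a single object from alias_index → ✅ v5.0 uses an array format (one-to-many)
-❌ Expecting XML tag extraction → ✅ In v5.0 the Data Agent AI extracts automatically
+❌ Expecting a single object from alias_index → ✅ v5.0+ uses an array format (one-to-many)
+❌ Expecting XML tag extraction → ✅ In v5.0+ the Data Agent AI extracts automatically
 ❌ Using the old data_modules.state_manager schema → ✅ Use the entities_v3 structure throughout
+❌ v5.1 still reading entities_v3 from state.json → ✅ Query index.db via SQL instead
+❌ v5.1 still writing large data to state.json → ✅ Use incremental SQLite writes instead
+❌ v5.1 state.json bloat → ✅ Run the migration script: `python -m data_modules.migrate_state_to_sqlite`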
 </errors>
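
The documented `entities` and `aliases` tables can be exercised directly with Python's built-in `sqlite3` module. The sketch below uses a trimmed copy of the schema in an in-memory database. The two query functions are assumptions about what `get-core-entities` and `get-by-alias` might run (their real SQL is not shown in this commit), the `is_archived = 0` filter is a guess, and the sample rows reuse the IDs from the examples above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows access to columns by name

# Trimmed copy of the documented schema (only the columns the queries need).
conn.executescript("""
CREATE TABLE entities (
    id TEXT PRIMARY KEY,
    type TEXT NOT NULL,
    canonical_name TEXT NOT NULL,
    tier TEXT DEFAULT '装饰',
    is_protagonist INTEGER DEFAULT 0,
    is_archived INTEGER DEFAULT 0
);
CREATE TABLE aliases (
    alias TEXT NOT NULL,
    entity_id TEXT NOT NULL,
    entity_type TEXT NOT NULL,
    PRIMARY KEY (alias, entity_id, entity_type)
);
""")

conn.executemany(
    "INSERT INTO entities (id, type, canonical_name, tier, is_protagonist) VALUES (?, ?, ?, ?, ?)",
    [
        ("xiaoyan", "角色", "萧炎", "核心", 1),
        ("loc_tianyunzong", "地点", "天云宗", "重要", 0),
        ("faction_tianyunzong", "势力", "天云宗", "次要", 0),
    ],
)
conn.executemany(
    "INSERT INTO aliases (alias, entity_id, entity_type) VALUES (?, ?, ?)",
    [
        ("天云宗", "loc_tianyunzong", "地点"),
        ("天云宗", "faction_tianyunzong", "势力"),
    ],
)


def core_entities(db):
    """IDs of the protagonist plus tier 核心/重要 entities (assumed query)."""
    rows = db.execute(
        "SELECT id FROM entities "
        "WHERE is_archived = 0 AND (is_protagonist = 1 OR tier IN ('核心', '重要')) "
        "ORDER BY id"
    )
    return [row["id"] for row in rows]


def by_alias(db, alias):
    """All entities registered under one alias; a single alias may map to several."""
    rows = db.execute(
        "SELECT entity_id, entity_type FROM aliases WHERE alias = ? ORDER BY entity_id",
        (alias,),
    )
    return [{"id": row["entity_id"], "type": row["entity_type"]} for row in rows]


print(core_entities(conn))
print(by_alias(conn, "天云宗"))
```

The composite primary key on `aliases` is what allows one alias string to point at both a location and a faction while still rejecting exact duplicate rows.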