fix(db): chunk deleteResolvedReferences IN-list under the SQLite param limit (#1001) (#1023)

deleteResolvedReferences bound every id into a single unbounded
`IN (...)`, so a list longer than SQLITE_MAX_VARIABLE_NUMBER (32766 on
the bundled node:sqlite) threw "too many SQL variables" — the one IN-list
in queries.ts that #540 missed. It's reachable only through the exported
QueryBuilder (library use): the internal resolution path uses
deleteSpecificResolvedReferences, which binds per-row and is immune, so
the CLI/MCP indexing pipeline was never affected. Wrap it in the same
SQLITE_PARAM_CHUNK_SIZE loop every sibling query uses, and add a
regression test (33k ids, past the real 32766 ceiling) that throws
without the fix.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
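
The chunking approach described in the commit message can be sketched in isolation as below. This is a minimal illustration, not the project's code: `buildChunkedDeletes`, `BoundStatement`, and the `CHUNK_SIZE` value of 500 are hypothetical stand-ins (the real value of `SQLITE_PARAM_CHUNK_SIZE` is not shown in this commit), and the sketch only builds the statements rather than running them against a database.

```typescript
// Hypothetical chunk size for illustration only; the actual value of
// SQLITE_PARAM_CHUNK_SIZE is not shown in this commit.
const CHUNK_SIZE = 500;

// Hypothetical shape: one SQL string plus the parameters to bind to it.
interface BoundStatement {
  sql: string;
  params: string[];
}

// Split an id list into DELETE statements whose IN-lists each hold at most
// `chunkSize` bound parameters, so no single statement can exceed
// SQLITE_MAX_VARIABLE_NUMBER as long as chunkSize is below that limit.
function buildChunkedDeletes(
  ids: string[],
  chunkSize: number = CHUNK_SIZE
): BoundStatement[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  const statements: BoundStatement[] = [];
  for (let i = 0; i < ids.length; i += chunkSize) {
    const chunk = ids.slice(i, i + chunkSize);
    // One "?" placeholder per id in this chunk.
    const placeholders = chunk.map(() => '?').join(',');
    statements.push({
      sql: `DELETE FROM unresolved_refs WHERE from_node_id IN (${placeholders})`,
      params: chunk,
    });
  }
  return statements;
}
```

With the assumed chunk size of 500, the 33,000 ids used by the regression test would be split into 66 statements of 500 parameters each, all far below the 32766 ceiling. An empty input yields no statements at all, which matches the early return in the real method.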
Colby Mchenry, 20 hours ago
Parent
Commit
30dc303f4c
3 changed files with 55 additions and 2 deletions
  1. CHANGELOG.md (+1 -0)
  2. __tests__/db-perf.test.ts (+44 -0)
  3. src/db/queries.ts (+10 -2)

+ 1 - 0
CHANGELOG.md

@@ -22,6 +22,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 - Indexing no longer hangs at "Resolving refs" on a repo that commits a large JavaScript/TypeScript theme or SDK. A vendored admin theme (Metronic is the classic case — ~1,300 committed `.js` files) re-declares the same method names (`init`, `update`, `render`, `destroy`, …) on hundreds of widgets, and resolution used to score *every* same-named definition against *every* call — work that grows with the square of how many times a name repeats. On such a repo it pinned a CPU core for 15–30 minutes and effectively never finished. Resolution now declines to guess when a name is defined more times than any real codebase ever repeats one (the cutoff is generous — normal projects top out far below it and are completely unaffected), since no proximity heuristic can pick the one true target among thousands anyway. Indexing that previously wedged now completes in seconds, and precise resolution (imports, qualified names, class-name matches) is unchanged. This is the same class of slowdown as the 1.1.0 import-name fix, now closed for repeated method/symbol names. Tune the cutoff with `CODEGRAPH_AMBIGUOUS_NAME_CEILING` if you ever need to. Thanks @DANOX2 for the detailed report and repro. (#999)
 - Claude Code's front-load prompt hook now fires for non-English prompts. The optional hook that injects CodeGraph context for structural questions only recognized English keywords, so a structural question written in Chinese — or any non-Latin-script language — silently injected nothing: the hook looked like it wasn't wired up despite a correct setup, with no error to explain why. The gate is now language-aware. It recognizes Chinese structural keywords (如何/流程/调用/依赖/实现/架构…), and — in any language — a prompt that names a real code symbol from your project, such as `getUserId`, `article_publish`, `user.login`, or `parseToken()` (the name is checked against the index, so an ordinary word that merely looks like code doesn't trigger it). Non-structural prompts ("fix this typo", in any language) stay a no-op as before, so nothing fires where there's no structural answer to give. Thanks @whinc for the detailed report and repro. (#994)
 - The background auto-sync server now starts for projects kept on an ExFAT or FAT external drive (and some network mounts). Those filesystems don't support the operations the server relies on to coordinate and to listen locally, so it failed immediately and re-logged the same error on every retry — background indexing was broken, so you had to run `codegraph sync` by hand after changes. (The MCP tools, the prompt hook, and manual `codegraph index`/`sync` were unaffected, since none of them need the server.) The server now works around those limitations automatically — falling back to a different coordination method and relocating its local socket to your system temp directory — so background indexing works there exactly like anywhere else, with no configuration needed. Verified end-to-end on real removable-drive filesystems on macOS, Linux, and Windows. Thanks @zengwenliang416 for the detailed report. (#997)
+- If you use CodeGraph as a library, the `QueryBuilder.deleteResolvedReferences()` helper no longer throws "too many SQL variables" when handed a very large list of ids — it issued one unbounded query, so a list longer than SQLite's parameter limit aborted the call. It now splits the work into batches like every other bulk query in the API. CodeGraph's own indexing and reference resolution never called this method (they use a different, already-batched path), so the CLI and MCP server were unaffected. Thanks @inth3shadows for the static analysis. (#1001)
 
 
 ## [1.1.1] - 2026-06-24

+ 44 - 0
__tests__/db-perf.test.ts

@@ -97,6 +97,50 @@ describe('getNodesByIds (batch lookup)', () => {
   });
 });
 
+describe('deleteResolvedReferences (chunking)', () => {
+  let dir: string;
+  let db: DatabaseConnection;
+  let q: QueryBuilder;
+
+  beforeEach(() => {
+    dir = fs.mkdtempSync(path.join(os.tmpdir(), 'db-perf-delref-'));
+    db = DatabaseConnection.initialize(path.join(dir, 'test.db'));
+    q = new QueryBuilder(db.getDb());
+  });
+
+  afterEach(() => {
+    db.close();
+    if (fs.existsSync(dir)) fs.rmSync(dir, { recursive: true, force: true });
+  });
+
+  it('deletes unresolved refs for more ids than the SQLite parameter limit (#1001)', () => {
+    // Regression: this method bound every id as one parameter in a single
+    // IN (...), so passing more ids than SQLITE_MAX_VARIABLE_NUMBER (32766 on
+    // the bundled node:sqlite) threw "too many SQL variables". Use 33000 to
+    // clear that ceiling. from_node_id has a FK to nodes, so insert nodes first.
+    const nodes = Array.from({ length: 33000 }, (_, i) => makeNode(`n${i}`));
+    q.insertNodes(nodes);
+    q.insertUnresolvedRefsBatch(
+      nodes.map((n) => ({
+        fromNodeId: n.id,
+        referenceName: 'someName',
+        referenceKind: 'calls',
+        line: 1,
+        column: 0,
+      }))
+    );
+    expect(q.getUnresolvedReferencesCount()).toBe(33000);
+
+    const ids = nodes.map((n) => n.id);
+    expect(() => q.deleteResolvedReferences(ids)).not.toThrow();
+    expect(q.getUnresolvedReferencesCount()).toBe(0);
+  });
+
+  it('handles an empty input array', () => {
+    expect(() => q.deleteResolvedReferences([])).not.toThrow();
+  });
+});
+
 describe('insertNode cache invalidation', () => {
   let dir: string;
   let db: DatabaseConnection;

+ 10 - 2
src/db/queries.ts

@@ -1731,8 +1731,16 @@ export class QueryBuilder {
    */
   deleteResolvedReferences(fromNodeIds: string[]): void {
     if (fromNodeIds.length === 0) return;
-    const placeholders = fromNodeIds.map(() => '?').join(',');
-    this.db.prepare(`DELETE FROM unresolved_refs WHERE from_node_id IN (${placeholders})`).run(...fromNodeIds);
+    // Chunk under SQLite's parameter limit, matching every other IN-list in
+    // this file. The internal resolution path uses deleteSpecificResolvedReferences
+    // instead, but QueryBuilder is part of the public API, so a library consumer
+    // passing more ids than SQLITE_MAX_VARIABLE_NUMBER (32766 on the bundled
+    // node:sqlite) would otherwise hit "too many SQL variables". (#540, #1001)
+    for (let i = 0; i < fromNodeIds.length; i += SQLITE_PARAM_CHUNK_SIZE) {
+      const chunk = fromNodeIds.slice(i, i + SQLITE_PARAM_CHUNK_SIZE);
+      const placeholders = chunk.map(() => '?').join(',');
+      this.db.prepare(`DELETE FROM unresolved_refs WHERE from_node_id IN (${placeholders})`).run(...chunk);
+    }
   }
 
   /**