1
0
Эх сурвалжийг харах

fix(mcp): reopen the database when it's replaced on disk instead of serving a deleted inode (#925) (#940)

A long-lived `serve --mcp` process opens `.codegraph/codegraph.db` and holds
the fd for its whole life. If `.codegraph/` is removed and recreated AT THE
SAME PATH while it runs — `git worktree remove <p>` + re-add, or `rm -rf
.codegraph` + `codegraph init` — the held fd points at the now-unlinked inode
and can never see the new index. The server serves the pre-removal snapshot
(renamed/removed symbols still "live", new ones missing); `codegraph sync`
can't refresh it and the CLI (a fresh process) diverges. Only a restart fixed
it — and because the daemon registry is keyed by path, a same-path recreate
routes new clients straight back to the same stale daemon, so the fix has to
self-heal inside the running process.

- DatabaseConnection records the DB file's (dev, ino) at open and exposes
  isReplacedOnDisk() — a different inode now at the same path. POSIX-gated:
  Windows can't unlink an open file and its st_ino is unreliable, so it never
  fires there.
- CodeGraph.reopenIfReplaced() opens the live file first, then swaps the
  connection + query layers IN PLACE (via the new wireLayers() helper), so
  every holder of the instance (the daemon's default project, cached
  projectPath connections) heals without a restart. Closing the dead handle
  also frees the leaked db/-wal/-shm fds pinning the unlinked inode.
- ToolHandler.getCodeGraph calls it (freshen) before serving — one stat() per
  call, a no-op unless the inode actually changed, never throws into a tool.

Tests cover isReplacedOnDisk (unchanged / replaced / absent / Windows-gated)
and an end-to-end reopen that heals a held instance after a same-path recreate
(asserts the pre-heal staleness too). Validated on macOS with a dist probe of
the raw instance and the MCP serving path; full suite green.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Colby Mchenry 1 өдөр өмнө
parent
commit
e43ac82cdf

+ 1 - 0
CHANGELOG.md

@@ -39,6 +39,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 - The file watcher that auto-syncs the graph now fails cleanly when live watching can no longer be trusted, instead of looking healthy while the index quietly goes stale. If the operating system runs out of file-watch resources, or another process holds the write lock far longer than a normal save, CodeGraph now disables auto-sync once — with a single clear message telling you to run `codegraph sync` (or rely on the git sync hooks) to refresh — rather than retrying forever or repeating the same error on a loop. And while auto-sync is disabled, CodeGraph's tool responses (and `codegraph status`) now say so plainly, so your AI agent knows to read files directly instead of trusting a frozen index. This mostly matters for long-running MCP/daemon sessions, which could otherwise keep serving stale results while appearing to work. Thanks @thismilktea. (#876)
 - On Linux, hitting the kernel's inotify watch limit on a large project no longer silently leaves half the tree unwatched. CodeGraph now tells you once — naming the exact setting to raise (`fs.inotify.max_user_watches`, e.g. `sudo sysctl fs.inotify.max_user_watches=1048576`) — and keeps live-watching the directories it could register while `codegraph sync` (or the git sync hooks) covers the rest. (#876)
 - A long-running MCP server now notices when a git worktree gains its own index. Before, if the server (or shared daemon) first saw a worktree before you ran `codegraph init` in it — so the lookup walked up to the main checkout's index — it pinned that decision for its whole life: even after the worktree had its own `.codegraph/`, every query kept hitting the main checkout's index and every result carried a false "this index belongs to a different git working tree" warning, until you restarted the server. The CLI got it right but the MCP server didn't, and re-indexing didn't help. The server now re-checks which index a path belongs to on each call, so the worktree's own index is picked up — and the stale warning drops — without a restart. (#926)
+- A long-running MCP server now recovers when your index is deleted and rebuilt at the same path. If `.codegraph/` was removed and recreated while the server held it open — most easily by recreating a git worktree at the same path, or `rm`-ing `.codegraph/` and running `codegraph init` again — the server kept reading the old, deleted database file and served a frozen snapshot: renamed or removed symbols still showed as live, new ones were missing, and `codegraph sync` couldn't refresh it — only restarting the server fixed it. The server now detects that the database file was swapped out from under it and reopens the live one in place, so results stay correct without a restart. (On Linux and macOS; Windows doesn't allow deleting an open file, so it isn't affected.) (#925)
 
 
 ## [1.0.1] - 2026-06-13

+ 117 - 0
__tests__/db-reopen-on-replace.test.ts

@@ -0,0 +1,117 @@
+/**
+ * Deleted-but-open DB inode self-heal (issue #925).
+ *
+ * A long-lived process (the MCP daemon) opens `.codegraph/codegraph.db` and
+ * holds the file descriptor for its whole life. If `.codegraph/` is removed and
+ * recreated AT THE SAME PATH while it's running — `git worktree remove <p>` then
+ * `git worktree add <p>` + `codegraph init`, or `rm -rf .codegraph` + re-init —
+ * the held fd points at the now-unlinked inode and can never see the new index.
+ * Queries then return the pre-removal snapshot until the process restarts; the
+ * CLI (a fresh process) reads the new inode and diverges.
+ *
+ * The deleted-but-open-inode hazard is POSIX file semantics (an open file can't
+ * be unlinked on Windows, and st_ino is unreliable there), so the recreate
+ * repros are gated to non-Windows; `isReplacedOnDisk` is verified to stay false
+ * on Windows.
+ */
+import { describe, it, expect, beforeEach, afterEach } from 'vitest';
+import * as fs from 'fs';
+import * as os from 'os';
+import * as path from 'path';
+import { DatabaseConnection } from '../src/db';
+import { getCodeGraphDir } from '../src/directory';
+import CodeGraph from '../src/index';
+
+const posixOnly = it.runIf(process.platform !== 'win32');
+const windowsOnly = it.runIf(process.platform === 'win32');
+
+describe('DatabaseConnection.isReplacedOnDisk (issue #925)', () => {
+  let dir: string;
+  let dbPath: string;
+  let conn: DatabaseConnection;
+
+  beforeEach(() => {
+    dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg-925-db-'));
+    dbPath = path.join(dir, 'codegraph.db');
+    conn = DatabaseConnection.initialize(dbPath);
+  });
+
+  afterEach(() => {
+    try { conn.close(); } catch { /* may already be closed */ }
+    fs.rmSync(dir, { recursive: true, force: true });
+  });
+
+  it('is false for the file it opened (any platform)', () => {
+    expect(conn.isReplacedOnDisk()).toBe(false);
+  });
+
+  posixOnly('becomes true once a DIFFERENT inode lives at the same path', () => {
+    // Unlink the file we hold open, then create a fresh file at the same path —
+    // a new inode. The held connection should now report itself replaced.
+    fs.rmSync(dbPath);
+    fs.writeFileSync(dbPath, 'not really a db, but a different inode');
+    expect(conn.isReplacedOnDisk()).toBe(true);
+  });
+
+  posixOnly('is false while the file is momentarily absent (mid-recreate)', () => {
+    // Nothing to reopen onto yet — don't claim "replaced" until a new file lands.
+    fs.rmSync(dbPath);
+    expect(conn.isReplacedOnDisk()).toBe(false);
+  });
+
+  windowsOnly('never fires on Windows (no usable inode / open files cannot be unlinked)', () => {
+    expect(conn.isReplacedOnDisk()).toBe(false);
+  });
+});
+
+describe('CodeGraph.reopenIfReplaced (issue #925)', () => {
+  let root: string;
+
+  beforeEach(() => {
+    root = fs.mkdtempSync(path.join(os.tmpdir(), 'cg-925-cg-'));
+    fs.mkdirSync(path.join(root, 'src'));
+    fs.writeFileSync(path.join(root, 'src', 'a.ts'), 'export function fooOld() { return 1; }\n');
+  });
+
+  afterEach(() => {
+    fs.rmSync(root, { recursive: true, force: true });
+  });
+
+  posixOnly('heals a held connection after the index is removed and recreated at the same path', async () => {
+    // The "server" opens and holds the DB for its lifetime.
+    const server = CodeGraph.initSync(root);
+    await server.indexAll();
+    expect(server.searchNodes('fooOld').length).toBeGreaterThan(0);
+    expect(server.searchNodes('fooNew').length).toBe(0);
+
+    // Simulate `git worktree remove` + re-add (or rm -rf .codegraph + init):
+    // a NEW index inode at the same path, carrying a renamed symbol, written by
+    // a separate instance (mirrors a fresh `codegraph init` process).
+    fs.rmSync(getCodeGraphDir(root), { recursive: true, force: true });
+    fs.writeFileSync(path.join(root, 'src', 'a.ts'), 'export function fooNew() { return 2; }\n');
+    const fresh = CodeGraph.initSync(root);
+    await fresh.indexAll();
+    fresh.destroy();
+
+    // Pre-heal: the held fd still serves the pre-removal snapshot.
+    expect(server.searchNodes('fooNew').length).toBe(0);
+    expect(server.searchNodes('fooOld').length).toBeGreaterThan(0);
+
+    // Heal in place — the SAME instance now reads the live inode.
+    expect(server.reopenIfReplaced()).toBe(true);
+    expect(server.searchNodes('fooNew').length).toBeGreaterThan(0);
+    expect(server.searchNodes('fooOld').length).toBe(0);
+
+    // Idempotent: nothing changed since, so a second call is a no-op.
+    expect(server.reopenIfReplaced()).toBe(false);
+
+    server.destroy();
+  });
+
+  posixOnly('is a no-op (returns false) when the index has not been replaced', async () => {
+    const server = CodeGraph.initSync(root);
+    await server.indexAll();
+    expect(server.reopenIfReplaced()).toBe(false);
+    server.destroy();
+  });
+});

+ 44 - 0
src/db/index.ts

@@ -44,11 +44,21 @@ export class DatabaseConnection {
   private db: SqliteDatabase;
   private dbPath: string;
   private backend: SqliteBackend;
+  /**
+   * `dev:ino` of the DB file at the moment we opened it (or null when the
+   * platform/filesystem reports no usable inode). Lets us notice when the file
+   * we hold open has been unlinked and REPLACED by a new file at the same path
+   * — a git worktree removed and re-added, or `.codegraph/` deleted and
+   * re-`init`ed under a long-lived server — at which point our fd reads a now
+   * dead inode forever (#925). See `isReplacedOnDisk`.
+   */
+  private openedInode: string | null;
 
   private constructor(db: SqliteDatabase, dbPath: string, backend: SqliteBackend) {
     this.db = db;
     this.dbPath = dbPath;
     this.backend = backend;
+    this.openedInode = statInode(dbPath);
   }
 
   /**
@@ -230,6 +240,40 @@ export class DatabaseConnection {
   isOpen(): boolean {
     return this.db.open;
   }
+
+  /**
+   * True when the DB file at our path has been REPLACED on disk since we opened
+   * it — a different inode now lives at the same path, so the fd we still hold
+   * points at a now-unlinked inode that can never receive new writes (#925).
+   * The trigger is removing and recreating `.codegraph/` at the same path under
+   * a long-lived process (`git worktree remove` + re-add, or `rm -rf
+   * .codegraph` + `codegraph init`). Returns false when the inode is unchanged,
+   * when the file is momentarily absent (mid-recreate — nothing to reopen onto
+   * yet), or when the platform doesn't report a usable inode (Windows can't
+   * unlink an open file and its st_ino is unreliable, so this never fires there).
+   */
+  isReplacedOnDisk(): boolean {
+    if (this.openedInode === null) return false;
+    const current = statInode(this.dbPath);
+    return current !== null && current !== this.openedInode;
+  }
+}
+
+/**
+ * `dev:ino` for a path, or null if it can't be stat'd or the platform doesn't
+ * report a usable inode. Windows st_ino is unreliable across handle reopens, so
+ * we deliberately return null there — the deleted-but-open-inode hazard this
+ * guards (#925) is a POSIX file-semantics issue that doesn't arise on Windows
+ * (an open file can't be unlinked).
+ */
+function statInode(p: string): string | null {
+  if (process.platform === 'win32') return null;
+  try {
+    const s = fs.statSync(p);
+    return `${s.dev}:${s.ino}`;
+  } catch {
+    return null;
+  }
 }
 
 /**

+ 59 - 15
src/index.ts

@@ -132,11 +132,14 @@ export class CodeGraph {
   private db: DatabaseConnection;
   private queries: QueryBuilder;
   private projectRoot: string;
-  private orchestrator: ExtractionOrchestrator;
-  private resolver: ReferenceResolver;
-  private graphManager: GraphQueryManager;
-  private traverser: GraphTraverser;
-  private contextBuilder: ContextBuilder;
+  // Assigned via wireLayers() from the constructor (and again on reopen) — the
+  // `!` tells TS these are definitely set even though the assignment is one
+  // method call away from the constructor body.
+  private orchestrator!: ExtractionOrchestrator;
+  private resolver!: ReferenceResolver;
+  private graphManager!: GraphQueryManager;
+  private traverser!: GraphTraverser;
+  private contextBuilder!: ContextBuilder;
 
   // Mutex for preventing concurrent indexing operations (in-process)
   private indexMutex = new Mutex();
@@ -155,27 +158,68 @@ export class CodeGraph {
     this.db = db;
     this.queries = queries;
     this.projectRoot = projectRoot;
+    this.fileLock = new FileLock(
+      path.join(getCodeGraphDir(projectRoot), 'codegraph.lock')
+    );
+    this.wireLayers();
+  }
+
+  /**
+   * (Re)build the query/extraction/graph layers over the current `this.queries`
+   * (which wraps `this.db`). Factored out of the constructor so `reopenIfReplaced`
+   * can rebuild them against a fresh connection without duplicating the wiring.
+   * The path-based `fileLock` is independent of the DB handle, so it stays put.
+   */
+  private wireLayers(): void {
     // Down-weight the project name as a query term in search ranking — it names
     // the whole repo, not a symbol, so it has no discriminative value (#720).
     try {
-      this.queries.setProjectNameTokens(deriveProjectNameTokens(projectRoot));
+      this.queries.setProjectNameTokens(deriveProjectNameTokens(this.projectRoot));
     } catch {
       // Best-effort: ranking still works without it.
     }
-    this.fileLock = new FileLock(
-      path.join(getCodeGraphDir(projectRoot), 'codegraph.lock')
-    );
-    this.orchestrator = new ExtractionOrchestrator(projectRoot, queries);
-    this.resolver = createResolver(projectRoot, queries);
-    this.graphManager = new GraphQueryManager(queries);
-    this.traverser = new GraphTraverser(queries);
+    this.orchestrator = new ExtractionOrchestrator(this.projectRoot, this.queries);
+    this.resolver = createResolver(this.projectRoot, this.queries);
+    this.graphManager = new GraphQueryManager(this.queries);
+    this.traverser = new GraphTraverser(this.queries);
     this.contextBuilder = createContextBuilder(
-      projectRoot,
-      queries,
+      this.projectRoot,
+      this.queries,
       this.traverser
     );
   }
 
+  /**
+   * Heal a stale database handle in place. If `.codegraph/` was removed and
+   * recreated at the SAME path while this instance held the DB open — a git
+   * worktree removed and re-added, or `rm -rf .codegraph` + `codegraph init` —
+   * our open fd points at the now-unlinked inode and can never see the new
+   * index, so every query returns the pre-removal snapshot until the process
+   * restarts (#925). When that's detected, open the live file at the same path,
+   * rebuild the query layers, and swap them IN PLACE, so every holder of this
+   * instance (the MCP daemon's default project, cached projectPath connections)
+   * heals without a restart. Returns true iff it reopened.
+   *
+   * POSIX-only in practice: `isReplacedOnDisk` never fires on Windows (an open
+   * file can't be unlinked there, and st_ino is unreliable).
+   */
+  reopenIfReplaced(): boolean {
+    if (!this.db.isReplacedOnDisk()) return false;
+    const dbPath = this.db.getPath();
+    // Open the live file FIRST — if that throws (e.g. mid-recreate), the old
+    // handle stays in place and the caller retries on the next query, rather
+    // than leaving this instance with no connection at all.
+    const fresh = DatabaseConnection.open(dbPath);
+    const stale = this.db;
+    this.db = fresh;
+    this.queries = new QueryBuilder(fresh.getDb());
+    this.wireLayers();
+    // Releasing the dead handle also frees the leaked db/-wal/-shm fds that were
+    // pinning the unlinked inode (#925).
+    try { stale.close(); } catch { /* the old inode is gone; closing just frees fds */ }
+    return true;
+  }
+
   // ===========================================================================
   // Lifecycle Methods
   // ===========================================================================

+ 28 - 3
src/mcp/tools.ts

@@ -820,7 +820,7 @@ export class ToolHandler {
           "and don't call codegraph again this session — the user can run 'codegraph init' to enable it."
         );
       }
-      return this.cg;
+      return this.freshen(this.cg);
     }
 
     // Reject sensitive system directories before opening. Only validate a
@@ -863,20 +863,45 @@ export class ToolHandler {
     // "database is locked" on concurrent tool calls. See issue #238. The
     // default instance is owned/closed by the server, so it's never cached.
     if (this.cg && this.cg.getProjectRoot() === resolvedRoot) {
-      return this.cg;
+      return this.freshen(this.cg);
     }
 
     // Cache the open DB connection by RESOLVED ROOT only — never by the input
     // path. One key per instance means closeAll() closes each exactly once, and
     // a changed resolution maps to a different entry instead of a stale hit.
     const cached = this.projectCache.get(resolvedRoot);
-    if (cached) return cached;
+    if (cached) return this.freshen(cached);
 
     const cg = loadCodeGraph().openSync(resolvedRoot);
     this.projectCache.set(resolvedRoot, cg);
     return cg;
   }
 
+  /**
+   * Heal a long-lived connection whose `.codegraph/` was removed and recreated
+   * at the same path (a worktree recreated, or `rm -rf .codegraph` + re-init)
+   * before handing it to a tool. Otherwise the daemon keeps serving the
+   * pre-removal snapshot from its now-unlinked file handle until restart — and
+   * because the daemon registry is keyed by path, a same-path recreate routes
+   * new clients straight back to this same stale daemon (#925). The check is one
+   * stat() and a no-op unless the inode actually changed; it never throws into a
+   * tool call.
+   */
+  private freshen(cg: CodeGraph): CodeGraph {
+    try {
+      if (cg.reopenIfReplaced()) {
+        process.stderr.write(
+          '[CodeGraph MCP] The index was replaced on disk (e.g. a git worktree ' +
+          'recreated at the same path); reopened the live database in place.\n'
+        );
+      }
+    } catch {
+      // Best-effort self-heal — a failed reopen must never break the tool call;
+      // the (still stale) handle keeps serving and the next call retries.
+    }
+    return cg;
+  }
+
   /**
    * Close all cached project connections
    */