
fix(db): eliminate concurrent-read "database is locked"; add node:sqlite backend (#238)

WAL + busy_timeout were already enabled, so the issue's suggested fix was a
no-op. The real causes, addressed here:

- busy_timeout is now set first (before journal_mode) and lowered 120s -> 5s,
  so open-time pragmas wait out a lock instead of hanging for two minutes.
- getCodeGraph no longer opens a second connection to the default project when
  a tool passes its own projectPath (the in-process lock amplifier).
- The wasm fallback (no WAL) gets a bounded read-retry on SQLITE_BUSY.
- New: node:sqlite backend, preferred over wasm, so installs whose native
  better-sqlite3 build fails land on a real-WAL backend instead of no-WAL wasm.
- codegraph status / codegraph_status now report the effective journal mode, so
  a lock report is triageable (wal vs delete).
- CLI hard-blocks Node < 20 to actually enforce the engines floor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Colby McHenry 1 month ago
parent
commit
18f7ec02c0

+ 4 - 3
README.md

@@ -456,10 +456,10 @@ The `.codegraph/config.json` file controls indexing:
 
 **Indexing is slow** — Check that `node_modules` and other large directories are excluded. Use `--quiet` to reduce output overhead.
 
-**Indexing is slow / MCP `database is locked` / WASM fallback active** — `codegraph` ships with a WASM SQLite fallback for environments where `better-sqlite3` (a native module, declared as `optionalDependencies`) can't install. The fallback is 5-10x slower than the native backend and uses a journal mode that lets writers block readers, so MCP queries can also hit `database is locked` while indexing runs. Run `codegraph status` and look at the `Backend:` line:
+**Indexing is slow, or MCP hits `database is locked`** — both trace to the SQLite backend. `codegraph` picks the best available, in order: native `better-sqlite3` (fastest; an `optionalDependencies` native module), then Node's built-in `node:sqlite` (Node ≥ 22.5), then a bundled WASM build. Run `codegraph status` and read the **`Backend:`** and **`Journal:`** lines:
 
-- `Backend: native` — you're on the fast path, nothing to do.
-- `Backend: wasm` — you're on the slow fallback. Common causes: missing C build tools, prebuilt binary unavailable for your Node version, or your Node version changed after install. Fix:
+- `Backend: native` or `node:sqlite` with `Journal: wal` — fast path with lock-free concurrent reads; nothing to do.
+- `Backend: wasm` — the native module didn't load *and* `node:sqlite` is unavailable (Node < 22.5). WASM is 5-10x slower and has no WAL, so heavy concurrent use can briefly hit `database is locked`. The simplest fix is Node ≥ 22.5 (you get `node:sqlite` automatically); otherwise restore the native backend:
 
   ```bash
   # macOS
@@ -479,6 +479,7 @@ The `.codegraph/config.json` file controls indexing:
   ```
 
   After the fix, `codegraph status` should show `Backend: native`.
+- `Journal:` shows anything other than `wal` on a `native` / `node:sqlite` backend — WAL couldn't be enabled on this filesystem (common on network shares and WSL2 `/mnt`), so reads can block on writes. Move the project (with its `.codegraph/` folder) onto a local disk.
 
 **MCP server not connecting** — Ensure the project is initialized/indexed, verify the path in your MCP config, and check that `codegraph serve --mcp` works from the command line.
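
The backend order described in the README hunk above (native `better-sqlite3`, then `node:sqlite`, then WASM) can be pictured as a first-loader-that-works loop. This is a minimal sketch under stated assumptions, not the adapter's actual `createDatabase`: `Candidate`, `Selection`, and `selectBackend` are hypothetical names, and the loaders are stubs standing in for the real module loads.

```typescript
// Hypothetical sketch: these names are illustrative, not CodeGraph's real API.
type Backend = 'native' | 'node-sqlite' | 'wasm';

interface Candidate<T> {
  backend: Backend;
  // Throws when this backend cannot be loaded on the current machine.
  load: () => T;
}

interface Selection<T> {
  backend: Backend;
  db: T;
  // Why each earlier candidate was skipped, kept for diagnostics.
  skipped: string[];
}

// Try candidates in priority order and return the first one that loads.
function selectBackend<T>(candidates: Candidate<T>[]): Selection<T> {
  const skipped: string[] = [];
  for (const { backend, load } of candidates) {
    try {
      return { backend, db: load(), skipped };
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      skipped.push(`${backend}: ${reason}`);
    }
  }
  throw new Error(`No SQLite backend available (${skipped.join('; ')})`);
}

// Simulated machine: the native build failed, but node:sqlite is available.
const picked = selectBackend<string>([
  { backend: 'native', load: () => { throw new Error('native module failed to load'); } },
  { backend: 'node-sqlite', load: () => 'node:sqlite handle' },
  { backend: 'wasm', load: () => 'wasm handle' },
]);
console.log(picked.backend, picked.skipped);
```

Returning the backend name alongside the handle, rather than storing it in a process-wide variable, keeps the report accurate when one process opens several project databases.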
 

+ 285 - 0
__tests__/concurrent-locking.test.ts

@@ -0,0 +1,285 @@
+/**
+ * Issue #238 — "database is locked" on concurrent MCP tool calls.
+ *
+ * The reporter's suggested fix (enable WAL / busy_timeout) was already in place,
+ * so these tests pin the ACTUAL fixes:
+ *  1. busy_timeout is a bounded few-second wait (not a 2-minute hang) and WAL is
+ *     active on the native backend — the property concurrent reads rely on.
+ *  2. The MCP ToolHandler reuses the default instance when a tool passes a
+ *     projectPath pointing at the default project, instead of opening a SECOND
+ *     connection to the same DB (the lock amplifier).
+ *  3. The wasm backend (which can't do WAL) retries reads on SQLITE_BUSY.
+ */
+
+import { describe, it, expect, beforeAll, afterAll, beforeEach, afterEach, vi } from 'vitest';
+import * as fs from 'fs';
+import * as path from 'path';
+import * as os from 'os';
+import CodeGraph from '../src';
+import { ToolHandler } from '../src/mcp/tools';
+import { DatabaseConnection } from '../src/db';
+import { withBusyRetry, isDatabaseLockedError } from '../src/db/sqlite-adapter';
+
+// The bundled wasm fallback backend — the one the actual reporters run on and the
+// only one without WAL. Loaded the same way the adapter loads it (CJS require).
+// eslint-disable-next-line @typescript-eslint/no-require-imports
+const { Database: WasmDatabase } = require('node-sqlite3-wasm');
+
+/** Normalize a PRAGMA read across backends (array | object | scalar) to a value. */
+function pragmaValue(raw: unknown, key: string): unknown {
+  const row = Array.isArray(raw) ? raw[0] : raw;
+  if (row !== null && typeof row === 'object') return (row as Record<string, unknown>)[key];
+  return row;
+}
+
+describe('issue #238 — connection PRAGMAs (#1)', () => {
+  let dir: string;
+  let conn: DatabaseConnection;
+
+  beforeAll(() => {
+    dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg238-pragma-'));
+    conn = DatabaseConnection.initialize(path.join(dir, 'codegraph.db'));
+  });
+
+  afterAll(() => {
+    conn.close();
+    fs.rmSync(dir, { recursive: true, force: true });
+  });
+
+  it('uses a bounded busy_timeout, not the old 2-minute hang', () => {
+    const ms = Number(pragmaValue(conn.getDb().pragma('busy_timeout'), 'timeout'));
+    expect(ms).toBeGreaterThan(0);
+    expect(ms).toBeLessThanOrEqual(30000); // far below the old 120000
+  });
+
+  it('runs WAL on native (the mode that lets readers proceed during a write)', () => {
+    const mode = String(pragmaValue(conn.getDb().pragma('journal_mode'), 'journal_mode')).toLowerCase();
+    // Native supports WAL; the wasm fallback is forced to DELETE (no WAL).
+    expect(mode).toBe(conn.getBackend() === 'wasm' ? 'delete' : 'wal');
+  });
+
+  it('getJournalMode() surfaces the effective mode for status triage', () => {
+    // The conclusive data point for triaging "database is locked": 'wal' means
+    // readers can't be blocked by a writer; anything else means they can.
+    const mode = conn.getJournalMode();
+    expect(mode).toBe(conn.getBackend() === 'wasm' ? 'delete' : 'wal');
+  });
+});
+
+describe('issue #238 — native WAL lets a reader proceed during a writer', () => {
+  let dir: string;
+
+  beforeAll(() => {
+    dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg238-wal-'));
+  });
+
+  afterAll(() => {
+    fs.rmSync(dir, { recursive: true, force: true });
+  });
+
+  it('a read on a 2nd connection succeeds while a writer holds the lock', () => {
+    const dbPath = path.join(dir, 'codegraph.db');
+    const writer = DatabaseConnection.initialize(dbPath);
+    // This property only holds under WAL; on the wasm fallback (DELETE) an
+    // EXCLUSIVE writer correctly blocks readers, so the assertion is native-only.
+    if (writer.getBackend() !== 'native') {
+      writer.close();
+      return;
+    }
+    const reader = DatabaseConnection.open(dbPath);
+    try {
+      writer.getDb().prepare('BEGIN EXCLUSIVE').run(); // hard write lock, held open
+      const t0 = Date.now();
+      const row = reader.getDb().prepare('SELECT COUNT(*) AS c FROM nodes').get() as { c: number };
+      const waited = Date.now() - t0;
+      expect(row.c).toBe(0);
+      expect(waited).toBeLessThan(1000); // proceeds immediately, no busy wait
+    } finally {
+      try { writer.getDb().prepare('COMMIT').run(); } catch { /* ignore */ }
+      reader.close();
+      writer.close();
+    }
+  });
+});
+
+describe('issue #238 — ToolHandler reuses the default instance (#2)', () => {
+  let dir: string;
+  let cg: CodeGraph;
+  let root: string;
+  let handler: ToolHandler;
+
+  beforeAll(async () => {
+    dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg238-tools-'));
+    fs.writeFileSync(path.join(dir, 'a.ts'), 'export function helper(): number { return 1; }\n');
+    fs.writeFileSync(
+      path.join(dir, 'b.ts'),
+      "import { helper } from './a';\nexport function main(): number { return helper(); }\n"
+    );
+    cg = await CodeGraph.init(dir, { index: true });
+    root = cg.getProjectRoot();
+    handler = new ToolHandler(cg);
+  });
+
+  afterAll(() => {
+    cg.close();
+    fs.rmSync(dir, { recursive: true, force: true });
+  });
+
+  it('getCodeGraph(defaultRoot) returns the default instance, not a new connection', () => {
+    const openSpy = vi.spyOn(CodeGraph, 'openSync');
+    try {
+      // eslint-disable-next-line @typescript-eslint/no-explicit-any
+      const resolved = (handler as any).getCodeGraph(root);
+      // eslint-disable-next-line @typescript-eslint/no-explicit-any
+      const nested = (handler as any).getCodeGraph(path.join(root, 'does', 'not', 'exist'));
+      expect(resolved).toBe(cg);
+      expect(nested).toBe(cg); // a sub-path resolves up to the same default project
+      expect(openSpy).not.toHaveBeenCalled(); // no second connection opened
+    } finally {
+      openSpy.mockRestore();
+    }
+  });
+
+  it('concurrent read tool calls (mixed projectPath) all succeed without "database is locked"', async () => {
+    const openSpy = vi.spyOn(CodeGraph, 'openSync');
+    try {
+      const calls: Promise<{ content: Array<{ text: string }>; isError?: boolean }>[] = [
+        handler.execute('codegraph_search', { query: 'helper' }),
+        handler.execute('codegraph_search', { query: 'helper', projectPath: root }),
+        handler.execute('codegraph_callers', { symbol: 'helper', projectPath: root }),
+        handler.execute('codegraph_callees', { symbol: 'main' }),
+        handler.execute('codegraph_files', { projectPath: root }),
+        handler.execute('codegraph_status', { projectPath: root }),
+      ];
+      const results = await Promise.all(calls);
+      for (const r of results) {
+        expect(r.isError).not.toBe(true);
+        expect(r.content[0]?.text ?? '').not.toMatch(/database is locked/i);
+      }
+      // Passing the default project's own path must not open a second connection.
+      expect(openSpy).not.toHaveBeenCalled();
+    } finally {
+      openSpy.mockRestore();
+    }
+  });
+});
+
+describe('issue #238 — withBusyRetry / isDatabaseLockedError (#3)', () => {
+  const locked = () => Object.assign(new Error('database is locked'), { code: 'SQLITE_BUSY' });
+
+  it('retries a locked read and then succeeds', () => {
+    const sleeps: number[] = [];
+    let calls = 0;
+    const result = withBusyRetry(
+      () => {
+        calls++;
+        if (calls < 3) throw locked();
+        return 'ok';
+      },
+      { attempts: 5, backoffMs: [10, 20], sleep: (ms) => sleeps.push(ms) }
+    );
+    expect(result).toBe('ok');
+    expect(calls).toBe(3);
+    expect(sleeps).toEqual([10, 20]); // backed off between the two retries
+  });
+
+  it('gives up after the attempt budget and rethrows the lock error', () => {
+    let calls = 0;
+    expect(() =>
+      withBusyRetry(
+        () => { calls++; throw locked(); },
+        { attempts: 3, backoffMs: [0], sleep: () => {} }
+      )
+    ).toThrow(/database is locked/i);
+    expect(calls).toBe(3);
+  });
+
+  it('does not retry a non-lock error', () => {
+    let calls = 0;
+    expect(() =>
+      withBusyRetry(
+        () => { calls++; throw new Error('no such table: nodes'); },
+        { attempts: 5, sleep: () => {} }
+      )
+    ).toThrow(/no such table/);
+    expect(calls).toBe(1);
+  });
+
+  it('isDatabaseLockedError recognizes lock errors across backends', () => {
+    expect(isDatabaseLockedError(Object.assign(new Error('x'), { code: 'SQLITE_BUSY' }))).toBe(true);
+    expect(isDatabaseLockedError(Object.assign(new Error('x'), { code: 'SQLITE_LOCKED' }))).toBe(true);
+    expect(isDatabaseLockedError(new Error('database is locked'))).toBe(true);
+    expect(isDatabaseLockedError(new Error('database is busy'))).toBe(true);
+    expect(isDatabaseLockedError(new Error('SQLITE_BUSY: database is locked'))).toBe(true);
+    expect(isDatabaseLockedError(new Error('no such column'))).toBe(false);
+    expect(isDatabaseLockedError(null)).toBe(false);
+  });
+});
+
+describe('issue #238 — wasm backend rides out a REAL lock via retry (#3, end-to-end)', () => {
+  // Exercises an actual node-sqlite3-wasm connection against a real held write
+  // lock — the path the reporters are on. Native (WAL) never reaches this code,
+  // so it cannot be covered by the native CI backend; we drive wasm directly.
+  let dir: string;
+  let dbPath: string;
+  // eslint-disable-next-line @typescript-eslint/no-explicit-any
+  let writer: any;
+  // eslint-disable-next-line @typescript-eslint/no-explicit-any
+  let reader: any;
+
+  beforeAll(() => {
+    dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg238-wasm-'));
+    dbPath = path.join(dir, 'codegraph.db');
+    const seed = new WasmDatabase(dbPath);
+    seed.exec('PRAGMA journal_mode = DELETE'); // what the adapter forces for wasm (no WAL)
+    seed.exec('CREATE TABLE nodes(id INTEGER PRIMARY KEY, name TEXT)');
+    seed.exec("INSERT INTO nodes(name) VALUES ('seed')");
+    seed.close();
+  });
+
+  afterAll(() => {
+    fs.rmSync(dir, { recursive: true, force: true });
+  });
+
+  beforeEach(() => {
+    writer = new WasmDatabase(dbPath);
+    writer.exec('BEGIN EXCLUSIVE');                       // real, held write lock
+    writer.exec("INSERT INTO nodes(name) VALUES ('writer')");
+    reader = new WasmDatabase(dbPath);                    // separate connection, no busy wait
+  });
+
+  afterEach(() => {
+    try { reader.close(); } catch { /* ignore */ }
+    try { writer.close(); } catch { /* ignore */ }
+  });
+
+  it('precondition: a wasm read hits a real lock while a writer holds EXCLUSIVE', () => {
+    expect(() => reader.get('SELECT COUNT(*) AS c FROM nodes')).toThrow(/lock|busy/i);
+  });
+
+  it('withBusyRetry rides out a writer that clears mid-wait → the read succeeds', () => {
+    let released = false;
+    // The injected sleep stands in for the gap during which a cross-process
+    // writer finishes; we release the held lock on the first retry. This proves
+    // the wasm read path recovers instead of surfacing "database is locked".
+    const row = withBusyRetry(
+      () => reader.get('SELECT COUNT(*) AS c FROM nodes') as { c: number },
+      {
+        attempts: 4,
+        backoffMs: [1],
+        sleep: () => { if (!released) { writer.exec('COMMIT'); released = true; } },
+      }
+    );
+    expect(released).toBe(true);  // the first attempt really did hit the lock and retry
+    expect(row.c).toBe(2);        // seed + writer, visible once the writer committed
+  });
+
+  it('exhausting retries against a writer that never clears still throws a lock error', () => {
+    expect(() =>
+      withBusyRetry(
+        () => reader.get('SELECT COUNT(*) AS c FROM nodes'),
+        { attempts: 3, backoffMs: [1], sleep: () => { /* writer never releases */ } }
+      )
+    ).toThrow(/lock|busy/i);
+  });
+});
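
The "reuse the default instance" behaviour pinned by the ToolHandler tests above can be sketched roughly as follows. This is an illustration only: the real `getCodeGraph` is not shown in this diff, so `InstanceResolver` and `isInsideRoot` are hypothetical names, and plain path containment is an assumption about how a sub-path "resolves up" to the default project.

```typescript
import * as path from 'node:path';

// True when `candidate` is `root` itself or a path nested under it.
function isInsideRoot(root: string, candidate: string): boolean {
  const rel = path.relative(path.resolve(root), path.resolve(candidate));
  if (rel === '') return true;
  return rel !== '..' && !rel.startsWith('..' + path.sep) && !path.isAbsolute(rel);
}

// Hypothetical resolver: hands back the default instance whenever the
// requested path belongs to the default project, so no second connection
// to the same database file is ever opened.
class InstanceResolver<T> {
  private readonly cache = new Map<string, T>();
  private readonly defaultRoot: string;
  private readonly defaultInstance: T;
  private readonly open: (root: string) => T;

  constructor(defaultRoot: string, defaultInstance: T, open: (root: string) => T) {
    this.defaultRoot = defaultRoot;
    this.defaultInstance = defaultInstance;
    this.open = open;
  }

  resolve(projectPath?: string): T {
    if (projectPath === undefined || isInsideRoot(this.defaultRoot, projectPath)) {
      return this.defaultInstance;
    }
    const key = path.resolve(projectPath);
    let instance = this.cache.get(key);
    if (instance === undefined) {
      instance = this.open(key); // only genuinely different projects open a connection
      this.cache.set(key, instance);
    }
    return instance;
  }
}

// Usage: count how often a "connection" is opened.
let opened = 0;
const resolver = new InstanceResolver<string>('/work/app', 'default', (root) => {
  opened++;
  return `conn:${root}`;
});
console.log(resolver.resolve('/work/app/src/deep'), opened);
```

Comparing resolved paths rather than raw strings matters here: a sibling such as `/work/app-two` shares a string prefix with `/work/app` but is a different project.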

+ 76 - 0
__tests__/node-sqlite-backend.test.ts

@@ -0,0 +1,76 @@
+/**
+ * node:sqlite backend (issue #238 follow-up).
+ *
+ * Proves Node's built-in node:sqlite works as a real CodeGraph backend — the
+ * fallback that replaces the no-WAL wasm path when better-sqlite3 can't load.
+ * Forces it via CODEGRAPH_SQLITE_BACKEND and drives a real index + queries, so
+ * WAL, FTS5 search, and @named-param writes are all exercised end-to-end.
+ *
+ * Skipped on Node < 22.5 where node:sqlite doesn't exist.
+ */
+
+import { describe, it, expect, beforeAll, afterAll } from 'vitest';
+import * as fs from 'fs';
+import * as path from 'path';
+import * as os from 'os';
+import CodeGraph from '../src';
+
+let nodeSqliteAvailable = false;
+try {
+  // eslint-disable-next-line @typescript-eslint/no-require-imports
+  require('node:sqlite');
+  nodeSqliteAvailable = true;
+} catch {
+  nodeSqliteAvailable = false;
+}
+
+describe.skipIf(!nodeSqliteAvailable)('node:sqlite backend — real index + queries', () => {
+  let dir: string;
+  let cg: CodeGraph;
+  const prevEnv = process.env.CODEGRAPH_SQLITE_BACKEND;
+
+  beforeAll(async () => {
+    process.env.CODEGRAPH_SQLITE_BACKEND = 'node-sqlite'; // force the backend under test
+    dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg-nodesqlite-'));
+    fs.writeFileSync(path.join(dir, 'a.ts'), 'export function helper(): number { return 1; }\n');
+    fs.writeFileSync(
+      path.join(dir, 'b.ts'),
+      "import { helper } from './a';\nexport function main(): number { return helper(); }\n"
+    );
+    cg = await CodeGraph.init(dir, { index: true });
+  });
+
+  afterAll(() => {
+    cg?.close();
+    if (prevEnv === undefined) delete process.env.CODEGRAPH_SQLITE_BACKEND;
+    else process.env.CODEGRAPH_SQLITE_BACKEND = prevEnv;
+    fs.rmSync(dir, { recursive: true, force: true });
+  });
+
+  it('actually selected the node:sqlite backend (env override took effect)', () => {
+    expect(cg.getBackend()).toBe('node-sqlite');
+  });
+
+  it('runs in WAL mode — the whole reason it beats the wasm fallback', () => {
+    expect(cg.getJournalMode()).toBe('wal');
+  });
+
+  it('indexed the project (write path: @named-param INSERTs via node:sqlite)', () => {
+    const stats = cg.getStats();
+    expect(stats.fileCount).toBe(2);
+    expect(stats.nodeCount).toBeGreaterThan(0);
+  });
+
+  it('FTS5 search returns the indexed symbol (read path)', () => {
+    const results = cg.searchNodes('helper');
+    const names = results.map(r => r.node.name);
+    expect(names).toContain('helper');
+  });
+
+  it('graph traversal resolves the cross-file caller', () => {
+    const helper = cg.searchNodes('helper').find(r => r.node.name === 'helper');
+    expect(helper).toBeTruthy();
+    const callers = cg.getCallers(helper!.node.id);
+    expect(callers.map(c => c.node.name)).toContain('main');
+  });
+});

+ 27 - 1
__tests__/node-version-check.test.ts

@@ -7,7 +7,7 @@
  */
 
 import { describe, it, expect } from 'vitest';
-import { buildNode25BlockBanner } from '../src/bin/node-version-check';
+import { buildNode25BlockBanner, buildNodeTooOldBanner, MIN_NODE_MAJOR } from '../src/bin/node-version-check';
 
 describe('buildNode25BlockBanner', () => {
   it('embeds the reported Node version in the header', () => {
@@ -41,3 +41,29 @@ describe('buildNode25BlockBanner', () => {
     );
   });
 });
+
+describe('buildNodeTooOldBanner', () => {
+  it('embeds the reported Node version in the header', () => {
+    expect(buildNodeTooOldBanner('18.20.0')).toContain(
+      'Unsupported Node.js version: 18.20.0'
+    );
+  });
+
+  it('states the supported floor matching MIN_NODE_MAJOR', () => {
+    expect(MIN_NODE_MAJOR).toBe(20);
+    expect(buildNodeTooOldBanner('18.0.0')).toContain(
+      `requires Node.js ${MIN_NODE_MAJOR} or newer`
+    );
+  });
+
+  it('points users to Node 22 LTS via nvm and Homebrew', () => {
+    const banner = buildNodeTooOldBanner('16.0.0');
+    expect(banner).toContain('Node.js 22 LTS');
+    expect(banner).toContain('nvm install 22');
+    expect(banner).toContain('brew install node@22');
+  });
+
+  it('documents the CODEGRAPH_ALLOW_UNSAFE_NODE override', () => {
+    expect(buildNodeTooOldBanner('18.0.0')).toContain('CODEGRAPH_ALLOW_UNSAFE_NODE=1');
+  });
+});

+ 1 - 1
__tests__/sqlite-backend.test.ts

@@ -70,7 +70,7 @@ describe('DatabaseConnection — per-instance backend reporting', () => {
     const dbPath = path.join(dir, 'test.db');
     const conn = DatabaseConnection.initialize(dbPath);
     const backend = conn.getBackend();
-    expect(['native', 'wasm']).toContain(backend);
+    expect(['native', 'node-sqlite', 'wasm']).toContain(backend);
     conn.close();
   });
 

+ 24 - 3
src/bin/codegraph.ts

@@ -25,7 +25,7 @@ import { getCodeGraphDir, isInitialized } from '../directory';
 import { createShimmerProgress } from '../ui/shimmer-progress';
 import { getGlyphs } from '../ui/glyphs';
 
-import { buildNode25BlockBanner } from './node-version-check';
+import { buildNode25BlockBanner, buildNodeTooOldBanner, MIN_NODE_MAJOR } from './node-version-check';
 
 // Lazy-load heavy modules (CodeGraph, runInstaller) to keep CLI startup fast.
 async function loadCodeGraph(): Promise<typeof import('../index')> {
@@ -63,6 +63,16 @@ if (nodeMajor >= 25) {
   }
   // Override active — banner shown for visibility, continuing.
 }
+// Enforce the supported Node floor. `engines` in package.json only *warns* on
+// install (unless engine-strict), so hard-block here to actually keep users off
+// unsupported versions. Mirrors the 25+ block above. See package.json `engines`.
+if (nodeMajor < MIN_NODE_MAJOR) {
+  process.stderr.write(buildNodeTooOldBanner(nodeVersion) + '\n');
+  if (!process.env.CODEGRAPH_ALLOW_UNSAFE_NODE) {
+    process.exit(1);
+  }
+  // Override active — banner shown for visibility, continuing.
+}
 
 // Check if running with no arguments - run installer
 if (process.argv.length === 2) {
@@ -689,6 +699,7 @@ program
       const stats = cg.getStats();
       const changes = cg.getChangedFiles();
       const backend = cg.getBackend();
+      const journalMode = cg.getJournalMode();
 
       // JSON output mode
       if (options.json) {
@@ -700,6 +711,7 @@ program
           edgeCount: stats.edgeCount,
           dbSizeBytes: stats.dbSizeBytes,
           backend,
+          journalMode,
           nodesByKind: stats.nodesByKind,
           languages: Object.entries(stats.filesByLanguage).filter(([, count]) => count > 0).map(([lang]) => lang),
           pendingChanges: {
@@ -728,10 +740,19 @@ program
       // WASM fallback (5-10x slower). better-sqlite3 is in
       // `optionalDependencies`, so `npm install` succeeds without it
       // when the native build fails.
-      const backendLabel = backend === 'native'
-        ? chalk.green('native')
+      const backendLabel =
+        backend === 'native' ? chalk.green('native')
+        : backend === 'node-sqlite' ? chalk.green(`node:sqlite ${getGlyphs().dash} built-in (full WAL)`)
         : chalk.yellow(`wasm ${getGlyphs().dash} slower fallback; run \`npm rebuild better-sqlite3\``);
       console.log(`  Backend:   ${backendLabel}`);
+      // Effective journal mode: 'wal' means concurrent reads never block on a
+      // writer; anything else means they can ("database is locked"). Native can
+      // silently fall back to DELETE on filesystems without shared-memory
+      // support (network mounts, WSL2 /mnt). See issue #238.
+      const journalLabel = journalMode === 'wal'
+        ? chalk.green('wal')
+        : chalk.yellow(`${journalMode || 'unknown'} ${getGlyphs().dash} WAL inactive; reads can block on writes`);
+      console.log(`  Journal:   ${journalLabel}`);
       console.log();
 
       // Node breakdown
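
The `Backend:` and `Journal:` lines printed by the status command map onto the three README troubleshooting cases. The sketch below restates that mapping as a single function. `triageLockReport` is a hypothetical helper, not part of the CLI, and the advice strings paraphrase the README text.

```typescript
// Hypothetical helper; the backend names match the SqliteBackend union in the diff.
type Backend = 'native' | 'node-sqlite' | 'wasm';

function triageLockReport(backend: Backend, journalMode: string): string {
  if (backend === 'wasm') {
    // No WAL on wasm: readers can be blocked by a writer.
    return 'wasm fallback: use Node >= 22.5 (node:sqlite) or restore better-sqlite3';
  }
  if (journalMode.toLowerCase() === 'wal') {
    return 'wal active: concurrent reads do not block on a writer; nothing to do';
  }
  // native / node:sqlite without WAL, e.g. network share or WSL2 /mnt.
  return `journal mode "${journalMode || 'unknown'}": WAL inactive; move the project to a local disk`;
}

console.log(triageLockReport('native', 'delete'));
```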

+ 37 - 0
src/bin/node-version-check.ts

@@ -37,3 +37,40 @@ export function buildNode25BlockBanner(nodeVersion: string): string {
     sep,
   ].join('\n');
 }
+
+/**
+ * Lowest supported Node.js major version. Matches the `engines` floor in
+ * package.json. Below this, CodeGraph relies on language features / native APIs
+ * that aren't present, and the combination is untested. `engines` alone only
+ * *warns* on install (unless the user set `engine-strict`), so the CLI bootstrap
+ * also hard-blocks here to actually enforce the floor.
+ */
+export const MIN_NODE_MAJOR = 20;
+
+/**
+ * Build the bordered banner shown when CodeGraph detects a Node.js major below
+ * {@link MIN_NODE_MAJOR}. Pinned via unit test so the recovery commands and the
+ * override env var can't be silently stripped by future edits.
+ *
+ * Uses ASCII glyphs to stay readable on Windows OEM-codepage consoles
+ * (see ../ui/glyphs.ts for the rationale).
+ */
+export function buildNodeTooOldBanner(nodeVersion: string): string {
+  const sep = '-'.repeat(72);
+  return [
+    sep,
+    `[CodeGraph] Unsupported Node.js version: ${nodeVersion}`,
+    sep,
+    `CodeGraph requires Node.js ${MIN_NODE_MAJOR} or newer. Older versions lack`,
+    'language features and native APIs CodeGraph depends on, and are not',
+    'tested or supported.',
+    '',
+    'Fix: install Node.js 22 LTS:',
+    '  nvm install 22 && nvm use 22                          # nvm',
+    '  brew install node@22 && brew link --overwrite --force node@22  # Homebrew',
+    '',
+    'To override (NOT recommended - unsupported):',
+    '  CODEGRAPH_ALLOW_UNSAFE_NODE=1 codegraph ...',
+    sep,
+  ].join('\n');
+}
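
The floor check in `src/bin/codegraph.ts` boils down to a three-way decision: run, warn and run under the override, or exit. A minimal sketch of that decision follows, assuming the version string looks like `process.version` with or without the leading `v`. `parseMajor` and `decide` are hypothetical names; only the floor of 20 and the `CODEGRAPH_ALLOW_UNSAFE_NODE` variable come from the diff.

```typescript
// Mirrors MIN_NODE_MAJOR from the diff; everything else here is illustrative.
const FLOOR_MAJOR = 20;

type Decision = 'run' | 'warn-and-run' | 'exit';

// Extract the major version from strings such as "v18.20.0" or "22.5.1".
function parseMajor(version: string): number {
  const match = /^v?(\d+)\./.exec(version);
  if (match === null) throw new Error(`Unrecognized Node.js version: ${version}`);
  return Number(match[1]);
}

function decide(version: string, env: Record<string, string | undefined>): Decision {
  if (parseMajor(version) >= FLOOR_MAJOR) return 'run';
  // Any non-empty value enables the override, matching the truthiness check in the CLI.
  return env['CODEGRAPH_ALLOW_UNSAFE_NODE'] ? 'warn-and-run' : 'exit';
}

console.log(decide('18.20.0', {}), decide('18.20.0', { CODEGRAPH_ALLOW_UNSAFE_NODE: '1' }));
```

Keeping the decision separate from `process.exit` is what makes it unit-testable, in the same spirit as the banner builders being pinned by tests.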

+ 46 - 22
src/db/index.ts

@@ -12,6 +12,31 @@ import { runMigrations, getCurrentVersion, CURRENT_SCHEMA_VERSION } from './migr
 
 export { SqliteDatabase, SqliteBackend, WASM_FALLBACK_FIX_RECIPE } from './sqlite-adapter';
 
+/**
+ * Apply connection-level PRAGMAs. Shared by `initialize` and `open` so the two
+ * paths can't drift.
+ *
+ * `busy_timeout` is set FIRST, before any pragma that might touch the database
+ * file (notably `journal_mode`). If another process holds a write lock at open
+ * time, the later pragmas — and the connection's first query — then wait out
+ * the lock instead of throwing "database is locked" immediately. See issue #238.
+ *
+ * The 5s window (was 120s) rides out a normal incremental sync; the old
+ * 2-minute wait presented as a frozen, hung agent. Reads on the native WAL
+ * backend never wait at all, so this timeout only governs cross-process write
+ * contention and the wasm fallback — which can't do WAL (the adapter downgrades
+ * it to DELETE) and so layers a bounded read retry on top (see sqlite-adapter).
+ */
+function configureConnection(db: SqliteDatabase): void {
+  db.pragma('busy_timeout = 5000');      // MUST be first — see above
+  db.pragma('foreign_keys = ON');
+  db.pragma('journal_mode = WAL');       // downgraded to DELETE on the wasm backend
+  db.pragma('synchronous = NORMAL');     // safe with WAL mode
+  db.pragma('cache_size = -64000');      // 64 MB page cache
+  db.pragma('temp_store = MEMORY');      // temp tables in memory
+  db.pragma('mmap_size = 268435456');    // 256 MB memory-mapped I/O
+}
+
 /**
  * Database connection wrapper with lifecycle management
  */
@@ -39,17 +64,7 @@ export class DatabaseConnection {
     // Create and configure database
     const { db, backend } = createDatabase(dbPath);
 
-    // Enable foreign keys and WAL mode for better performance
-    db.pragma('foreign_keys = ON');
-    db.pragma('journal_mode = WAL');
-    // Wait up to 2 minutes if database is locked by another process
-    // (indexing operations can hold locks for extended periods)
-    db.pragma('busy_timeout = 120000');
-    // Performance tuning
-    db.pragma('synchronous = NORMAL');     // Safe with WAL mode
-    db.pragma('cache_size = -64000');      // 64 MB page cache
-    db.pragma('temp_store = MEMORY');      // Temp tables in memory
-    db.pragma('mmap_size = 268435456');    // 256 MB memory-mapped I/O
+    configureConnection(db);
 
     // Run schema initialization
     const schemaPath = path.join(__dirname, 'schema.sql');
@@ -77,17 +92,7 @@ export class DatabaseConnection {
 
     const { db, backend } = createDatabase(dbPath);
 
-    // Enable foreign keys and WAL mode
-    db.pragma('foreign_keys = ON');
-    db.pragma('journal_mode = WAL');
-    // Wait up to 2 minutes if database is locked by another process
-    // (indexing operations can hold locks for extended periods)
-    db.pragma('busy_timeout = 120000');
-    // Performance tuning
-    db.pragma('synchronous = NORMAL');
-    db.pragma('cache_size = -64000');
-    db.pragma('temp_store = MEMORY');
-    db.pragma('mmap_size = 268435456');
+    configureConnection(db);
 
     // Check and run migrations if needed
     const conn = new DatabaseConnection(db, dbPath, backend);
@@ -123,6 +128,25 @@ export class DatabaseConnection {
     return this.dbPath;
   }
 
+  /**
+   * The journal mode actually in effect (e.g. 'wal', 'delete').
+   *
+   * SQLite silently keeps the prior mode if WAL can't be enabled — e.g. on
+   * filesystems without shared-memory support (some network/virtualized mounts,
+   * WSL2 /mnt), and always on the wasm backend. So the effective mode can differ
+   * from what `configureConnection` requested. Surfaced in `codegraph status` so
+   * a "database is locked" report is triageable: 'wal' ⇒ readers never block on a
+   * writer; anything else ⇒ they can. See issue #238.
+   */
+  getJournalMode(): string {
+    const raw = this.db.pragma('journal_mode');
+    const row = Array.isArray(raw) ? raw[0] : raw;
+    const mode = row && typeof row === 'object'
+      ? (row as Record<string, unknown>).journal_mode
+      : row;
+    return String(mode ?? '').toLowerCase();
+  }
+
   /**
    * Get current schema version
    */

+ 239 - 31
src/db/sqlite-adapter.ts

@@ -20,7 +20,7 @@ export interface SqliteDatabase {
   readonly open: boolean;
 }
 
-export type SqliteBackend = 'native' | 'wasm';
+export type SqliteBackend = 'native' | 'node-sqlite' | 'wasm';
 
 /**
  * One-line summary of the recovery steps shown when WASM fallback is
@@ -110,6 +110,72 @@ function resolveParams(params: any[], paramOrder: string[] | null): any {
   return params;
 }
 
+/**
+ * Whether an error is SQLite's SQLITE_BUSY / SQLITE_LOCKED ("database is
+ * locked"). Checks better-sqlite3's `code` first, then falls back to message
+ * text for the wasm backend (which throws a plain Error). Exported for tests.
+ */
+export function isDatabaseLockedError(err: unknown): boolean {
+  const code = (err as { code?: unknown } | null)?.code;
+  if (code === 'SQLITE_BUSY' || code === 'SQLITE_LOCKED') return true;
+  const msg = (err instanceof Error ? err.message : String(err)).toLowerCase();
+  return (
+    msg.includes('database is locked') ||
+    msg.includes('database is busy') ||
+    msg.includes('database table is locked') ||
+    msg.includes('sqlite_busy') ||
+    msg.includes('sqlite_locked')
+  );
+}
+
+/**
+ * Sleep synchronously for `ms` without spinning the CPU. The wasm backend is
+ * single-threaded and synchronous, so an async sleep is useless at the
+ * (synchronous) query call site — we have to actually block this turn while a
+ * writer in another process clears.
+ */
+function sleepSync(ms: number): void {
+  if (ms <= 0) return;
+  Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
+}
+
+export interface BusyRetryOptions {
+  /** Total attempts, including the first. */
+  attempts?: number;
+  /** Backoff per retry (ms); the last entry repeats if more retries remain. */
+  backoffMs?: number[];
+  /** Sleep implementation — injectable so tests don't actually wait. */
+  sleep?: (ms: number) => void;
+}
+
+/**
+ * Run a read, retrying on SQLITE_BUSY with bounded backoff.
+ *
+ * Used only by the wasm backend: it can't use WAL (downgraded to DELETE), so a
+ * writer in ANOTHER process (e.g. the git-hook `codegraph sync`) briefly blocks
+ * readers. `busy_timeout` helps but can return immediately when SQLite detects a
+ * would-be deadlock; a short retry rides out the writer. Reads only — never wrap
+ * writes, which run inside transactions guarded by the cross-process FileLock.
+ * The native backend doesn't use this: WAL lets readers proceed during a write.
+ * See issue #238.
+ */
+export function withBusyRetry<T>(fn: () => T, opts: BusyRetryOptions = {}): T {
+  // Clamp to at least one attempt so fn() always runs (0 would throw undefined).
+  const attempts = Math.max(1, opts.attempts ?? 3);
+  const backoff = opts.backoffMs ?? [150, 400];
+  const sleep = opts.sleep ?? sleepSync;
+  let lastErr: unknown;
+  for (let i = 0; i < attempts; i++) {
+    try {
+      return fn();
+    } catch (err) {
+      lastErr = err;
+      if (i === attempts - 1 || !isDatabaseLockedError(err)) throw err;
+      sleep(backoff.length > 0 ? backoff[Math.min(i, backoff.length - 1)]! : 0);
+    }
+  }
+  throw lastErr;
+}
+
 /**
  * Wraps node-sqlite3-wasm to match the better-sqlite3 interface.
  *
@@ -151,12 +217,18 @@ class WasmDatabaseAdapter implements SqliteDatabase {
         };
       },
       get(...params: any[]) {
-        const resolved = resolveParams(params, paramOrder);
-        return resolved !== undefined ? stmt.get(resolved) : stmt.get();
+        // Reads retry on SQLITE_BUSY — the wasm backend has no WAL, so a writer
+        // in another process can briefly block this read. See issue #238.
+        return withBusyRetry(() => {
+          const resolved = resolveParams(params, paramOrder);
+          return resolved !== undefined ? stmt.get(resolved) : stmt.get();
+        });
       },
       all(...params: any[]) {
-        const resolved = resolveParams(params, paramOrder);
-        return resolved !== undefined ? stmt.all(resolved) : stmt.all();
+        return withBusyRetry(() => {
+          const resolved = resolveParams(params, paramOrder);
+          return resolved !== undefined ? stmt.all(resolved) : stmt.all();
+        });
       },
     };
   }
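
The backoff rule documented on `BusyRetryOptions` (the last entry repeats if more retries remain) can be checked in isolation. The sketch below is illustrative only: `retryDelays` is a hypothetical helper that is not part of this commit, and it assumes the same indexing as the `sleep(...)` call in `withBusyRetry`.

```typescript
// Hypothetical helper (not in the commit): lists the sleeps a retry loop with
// `attempts` total tries would perform, using the "last entry repeats" rule.
function retryDelays(attempts: number, backoffMs: number[]): number[] {
  const delays: number[] = [];
  // The final attempt is never followed by a sleep, hence attempts - 1.
  for (let i = 0; i < attempts - 1; i++) {
    const idx = Math.min(i, backoffMs.length - 1);
    delays.push(backoffMs.length > 0 ? backoffMs[idx]! : 0);
  }
  return delays;
}

const defaultDelays = retryDelays(3, [150, 400]); // [150, 400]
const longerDelays = retryDelays(5, [150, 400]);  // [150, 400, 400, 400]
const totalSleepMs = defaultDelays.reduce((sum, ms) => sum + ms, 0); // 550
console.log(defaultDelays, longerDelays, totalSleepMs);
```

With the defaults (3 attempts, `[150, 400]`), a read that stays locked sleeps for at most 550 ms in total before the error is rethrown, in addition to the time spent inside the attempts themselves.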
@@ -229,39 +301,175 @@ class WasmDatabaseAdapter implements SqliteDatabase {
 }
 
 /**
- * Create a database connection. Tries native better-sqlite3 first,
- * falls back to node-sqlite3-wasm. Returns the active backend
- * alongside the db so each `DatabaseConnection` can report its own
- * backend per-instance — MCP can open multiple project DBs in one
- * process (`tools.ts` getCodeGraph cache), so a process-global would
- * race / overwrite.
+ * Wraps Node's built-in `node:sqlite` (`DatabaseSync`) to match the
+ * better-sqlite3 interface.
+ *
+ * Unlike the wasm adapter this is REAL SQLite compiled into Node, so it supports
+ * WAL, FTS5, mmap, and `@named` params natively — the only shims needed are the
+ * better-sqlite3 conveniences node:sqlite omits: a `.pragma()` helper, a
+ * `.transaction()` helper, and `open` (node:sqlite exposes `isOpen`). It also
+ * needs no statement finalization on close (node-sqlite3-wasm did).
+ *
+ * Available on Node >= 22.5 (the module is simply absent on older Node, so
+ * `createDatabase` falls through to wasm there). The API is still flagged
+ * experimental; `node:sqlite` emits a one-time ExperimentalWarning to stderr on
+ * load, which is harmless for the MCP stdout protocol.
  */
-export function createDatabase(dbPath: string): { db: SqliteDatabase; backend: SqliteBackend } {
-  let nativeError: string | undefined;
-  let wasmError: string | undefined;
+class NodeSqliteAdapter implements SqliteDatabase {
+  private _db: any;
 
-  // Try native better-sqlite3 first
-  try {
+  constructor(dbPath: string) {
     // eslint-disable-next-line @typescript-eslint/no-require-imports
-    const Database = require('better-sqlite3');
-    const db = new Database(dbPath);
-    return { db: db as SqliteDatabase, backend: 'native' };
-  } catch (error) {
-    nativeError = error instanceof Error ? error.message : String(error);
+    const { DatabaseSync } = require('node:sqlite');
+    this._db = new DatabaseSync(dbPath);
+  }
+
+  get open(): boolean {
+    return this._db.isOpen;
+  }
+
+  prepare(sql: string): SqliteStatement {
+    // node:sqlite matches better-sqlite3's calling convention (variadic
+    // positional args, or a single object for @named params), so params forward
+    // through unchanged — no positional translation like the wasm adapter needs.
+    const stmt = this._db.prepare(sql);
+    return {
+      run(...params: any[]) {
+        const r = stmt.run(...params);
+        return {
+          changes: Number(r?.changes ?? 0),
+          lastInsertRowid: r?.lastInsertRowid ?? 0,
+        };
+      },
+      get(...params: any[]) {
+        return stmt.get(...params);
+      },
+      all(...params: any[]) {
+        return stmt.all(...params);
+      },
+    };
+  }
+
+  exec(sql: string): void {
+    this._db.exec(sql);
+  }
+
+  pragma(str: string): any {
+    const trimmed = str.trim();
+    // Write pragma ("key = value"): node:sqlite is real SQLite, so every pragma
+    // (WAL, mmap, synchronous, …) applies as-is — no special-casing like wasm.
+    if (trimmed.includes('=')) {
+      this._db.exec(`PRAGMA ${trimmed}`);
+      return;
+    }
+    // Read pragma: return the row object (e.g. { journal_mode: 'wal' }).
+    return this._db.prepare(`PRAGMA ${trimmed}`).get();
+  }
+
+  transaction<T>(fn: (...args: any[]) => T): (...args: any[]) => T {
+    return (...args: any[]) => {
+      this._db.exec('BEGIN');
+      try {
+        const result = fn(...args);
+        this._db.exec('COMMIT');
+        return result;
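+      } catch (error) {
+        // ROLLBACK throws if SQLite already rolled back on its own (e.g. after
+        // an I/O or disk-full error); don't let that mask the original error.
+        try { this._db.exec('ROLLBACK'); } catch { /* no active transaction */ }
+        throw error;
+      }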
+    };
   }
 
-  // Fall back to WASM
-  try {
-    const db = new WasmDatabaseAdapter(dbPath);
-    console.warn(buildWasmFallbackBanner(nativeError));
-    return { db, backend: 'wasm' };
-  } catch (error) {
-    wasmError = error instanceof Error ? error.message : String(error);
+  close(): void {
+    this._db.close();
+  }
+}
+
+/**
+ * Concise stderr notice shown when better-sqlite3 is unavailable but Node's
+ * built-in node:sqlite is, so we use that instead of the slow wasm fallback.
+ * Unlike wasm, node:sqlite has full WAL + FTS5 and near-native speed, so this is
+ * informational — not a "fix me" warning. Exported for tests.
+ */
+export function buildNodeSqliteNotice(nativeError?: string): string {
+  const lines = [
+    '[CodeGraph] better-sqlite3 unavailable — using the built-in node:sqlite backend.',
+    'Full WAL + FTS5 support, no native build required. To restore the (fastest)',
+    `native backend: ${WASM_FALLBACK_FIX_RECIPE}`,
+  ];
+  if (nativeError) lines.push(`(better-sqlite3 load error: ${nativeError})`);
+  return lines.join('\n') + '\n';
+}
+
+/**
+ * Create a database connection, trying backends in order of preference:
+ *   1. better-sqlite3 (native)  — fastest, but needs a compiled binding
+ *   2. node:sqlite (Node ≥22.5) — real WAL + FTS5, no native build, no wasm
+ *   3. node-sqlite3-wasm        — last resort (no WAL); normally only reached on Node < 22.5
+ *
+ * node:sqlite sits ahead of wasm so that when the native binding fails to load
+ * (common on Windows / locked-down CI), users land on a backend WITH WAL instead
+ * of the no-WAL wasm path that causes concurrent-read lock errors (issue #238).
+ *
+ * `CODEGRAPH_SQLITE_BACKEND=native|node-sqlite|wasm` forces a single backend
+ * (used for A/B testing and to opt into node:sqlite); a forced backend that
+ * can't load throws rather than silently falling through.
+ *
+ * Returns the active backend alongside the db so each `DatabaseConnection` can
+ * report its own backend per-instance — MCP can open multiple project DBs in one
+ * process, so a process-global would race / overwrite.
+ */
+export function createDatabase(dbPath: string): { db: SqliteDatabase; backend: SqliteBackend } {
+  const forced = (process.env.CODEGRAPH_SQLITE_BACKEND || '').trim().toLowerCase();
+  const errors: { native?: string; nodeSqlite?: string; wasm?: string } = {};
+  const toMsg = (e: unknown) => (e instanceof Error ? e.message : String(e));
+
+  const tryNative = !forced || forced === 'native';
+  const tryNodeSqlite = !forced || forced === 'node-sqlite' || forced === 'node:sqlite';
+  const tryWasm = !forced || forced === 'wasm';
+
+  // 1. Native better-sqlite3
+  if (tryNative) {
+    try {
+      // eslint-disable-next-line @typescript-eslint/no-require-imports
+      const Database = require('better-sqlite3');
+      return { db: new Database(dbPath) as SqliteDatabase, backend: 'native' };
+    } catch (error) {
+      errors.native = toMsg(error);
+    }
+  }
+
+  // 2. Node's built-in node:sqlite (real WAL, no native build)
+  if (tryNodeSqlite) {
+    try {
+      const db = new NodeSqliteAdapter(dbPath);
+      // Announce only when this is a genuine fallback (native was tried & failed),
+      // not when the caller explicitly forced node-sqlite.
+      if (!forced && errors.native) {
+        process.stderr.write(buildNodeSqliteNotice(errors.native));
+      }
+      return { db, backend: 'node-sqlite' };
+    } catch (error) {
+      errors.nodeSqlite = toMsg(error);
+    }
+  }
+
+  // 3. WASM (no WAL) — last resort
+  if (tryWasm) {
+    try {
+      const db = new WasmDatabaseAdapter(dbPath);
+      console.warn(buildWasmFallbackBanner(errors.native));
+      return { db, backend: 'wasm' };
+    } catch (error) {
+      errors.wasm = toMsg(error);
+    }
   }
 
   throw new Error(
-    `Failed to load any SQLite backend.\n` +
-    `  Native (better-sqlite3): ${nativeError}\n` +
-    `  WASM (node-sqlite3-wasm): ${wasmError}`
+    `Failed to load a SQLite backend.\n` +
+    (errors.native ? `  Native (better-sqlite3): ${errors.native}\n` : '') +
+    (errors.nodeSqlite ? `  node:sqlite: ${errors.nodeSqlite}\n` : '') +
+    (errors.wasm ? `  WASM (node-sqlite3-wasm): ${errors.wasm}\n` : '') +
+    (forced ? `  (CODEGRAPH_SQLITE_BACKEND=${forced} restricted which backends were tried)` : '')
   );
 }
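
The fallback order in `createDatabase` reduces to a small pattern: try each loader in turn, return the first that succeeds, and let a forced name disable the others. The following is a simplified sketch under stated assumptions, not the commit's implementation: `Loader`, `pickBackend`, and the string "handles" are hypothetical stand-ins for the real adapters.

```typescript
type Loader<T> = { name: string; load: () => T };

// Hypothetical, simplified stand-in for the fallback chain: try loaders in
// order, return the first that loads, and collect errors from the rest.
function pickBackend<T>(loaders: Loader<T>[], forced = ""): { backend: string; value: T } {
  const errors: string[] = [];
  for (const loader of loaders) {
    if (forced && forced !== loader.name) continue; // a forced name disables the others
    try {
      return { backend: loader.name, value: loader.load() };
    } catch (e) {
      errors.push(`${loader.name}: ${e instanceof Error ? e.message : String(e)}`);
    }
  }
  throw new Error(`Failed to load a SQLite backend.\n  ${errors.join("\n  ")}`);
}

// Simulates an install where the native binding failed to build.
const chain: Loader<string>[] = [
  { name: "native", load: () => { throw new Error("binding not found"); } },
  { name: "node-sqlite", load: () => "node:sqlite handle" },
  { name: "wasm", load: () => "wasm handle" },
];

console.log(pickBackend(chain).backend);         // "node-sqlite"
console.log(pickBackend(chain, "wasm").backend); // "wasm"
```

Forcing a backend that cannot load (here `pickBackend(chain, "native")`) throws instead of falling through, which matches the behavior described for `CODEGRAPH_SQLITE_BACKEND`.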

+ 10 - 0
src/index.ts

@@ -622,6 +622,16 @@ export class CodeGraph {
     return this.db.getBackend();
   }
 
+  /**
+   * The journal mode actually in effect ('wal', 'delete', …). 'wal' means
+   * readers never block on a concurrent writer; anything else means they can,
+   * which is the precondition for the "database is locked" failures in issue
+   * #238. Surfaced via `codegraph status` and the `codegraph_status` MCP tool.
+   */
+  getJournalMode(): string {
+    return this.db.getJournalMode();
+  }
+
   // ===========================================================================
   // Node Operations
   // ===========================================================================
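
The body of `db.getJournalMode()` is not shown in this diff. As a rough sketch of one way a `journal_mode` pragma read could be reduced to a plain string, the hypothetical `normalizeJournalMode` below handles the row-object shape that `NodeSqliteAdapter.pragma` returns as well as the array-of-rows shape that better-sqlite3's `.pragma()` returns by default. Whether the commit does it this way is an assumption.

```typescript
// Hypothetical normalizer (not from the commit): accepts the shapes a
// `journal_mode` pragma read may come back in and returns a lowercase string.
function normalizeJournalMode(result: unknown): string {
  // better-sqlite3's `.pragma()` returns an array of row objects by default.
  const row = Array.isArray(result) ? result[0] : result;
  if (typeof row === "string") return row.toLowerCase();
  if (row && typeof row === "object") {
    const value = (row as Record<string, unknown>)["journal_mode"];
    if (typeof value === "string") return value.toLowerCase();
  }
  return ""; // unknown shape; callers can render this as "unknown"
}

console.log(normalizeJournalMode([{ journal_mode: "wal" }]));  // "wal"
console.log(normalizeJournalMode({ journal_mode: "DELETE" })); // "delete"
```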

+ 34 - 1
src/mcp/tools.ts

@@ -542,6 +542,17 @@ export class ToolHandler {
       throw new Error(`CodeGraph not initialized in ${projectPath}. Run 'codegraph init' in that project first.`);
     }
 
+    // If the path resolves to the default project, reuse the already-open
+    // default instance rather than opening a SECOND connection to the same DB.
+    // A duplicate connection serializes reads against the watcher's auto-sync
+    // writes; on the wasm backend (no WAL) that surfaces as intermittent
+    // "database is locked" on concurrent tool calls. See issue #238. Deliberately
+    // not cached under projectPath — the server owns and closes the default
+    // instance, so routing it through projectCache.closeAll() would double-close it.
+    if (this.cg && this.cg.getProjectRoot() === resolvedRoot) {
+      return this.cg;
+    }
+
     // Check if we already have this resolved root cached (different path, same project)
     if (this.projectCache.has(resolvedRoot)) {
       const cg = this.projectCache.get(resolvedRoot)!;
@@ -1327,10 +1338,32 @@ export class ToolHandler {
     const backend = cg.getBackend();
     if (backend === 'native') {
       lines.push(`**Backend:** native (better-sqlite3)`);
+    } else if (backend === 'node-sqlite') {
+      lines.push(
+        `**Backend:** node:sqlite (Node built-in) — full WAL + FTS5. ` +
+        `For maximum speed, restore native: ${WASM_FALLBACK_FIX_RECIPE}`
+      );
     } else {
       lines.push(
         `**Backend:** ⚠ wasm (better-sqlite3 unavailable) — ` +
-        `5-10x slower than native. Fix: ${WASM_FALLBACK_FIX_RECIPE}`
+        `5-10x slower than native, no WAL. Fix: ${WASM_FALLBACK_FIX_RECIPE}`
+      );
+    }
+
+    // Effective journal mode. 'wal' ⇒ concurrent reads never block on a writer;
+    // anything else ⇒ they can ("database is locked"). The wasm backend can't do
+    // WAL, and even native silently falls back to DELETE on filesystems without
+    // shared-memory (network/virtualized mounts, WSL2 /mnt). See issue #238.
+    const journalMode = cg.getJournalMode();
+    if (journalMode === 'wal') {
+      lines.push(`**Journal mode:** wal (concurrent reads safe)`);
+    } else {
+      lines.push(
+        `**Journal mode:** ⚠ ${journalMode || 'unknown'} — WAL not active, so reads ` +
+        `can block on a concurrent write` +
+        // wasm can't do WAL at all; the real-SQLite backends only lack it when the
+        // filesystem doesn't support shared memory (network mounts, WSL2 /mnt).
+        (backend === 'wasm' ? '' : ' (WAL appears unsupported on this filesystem)')
       );
     }