
fix(resolution): stop "Resolving refs" wedge on theme-vendoring repos; add exclude config + index watchdogs (#999) (#1009)

Three fixes for a repo that commits a large JS/TS theme/SDK (Metronic under
static/, ~1,600 tracked files):

1. A SECOND "Resolving refs" quadratic that #915 didn't cover. #915 capped
   import-name collisions; this caps method-name collisions (init/update/render
   re-declared on every widget), which flow through matchMethodCall Strategy 3
   and findBestMatch instead. New AMBIGUOUS_NAME_CEILING (default 500, env
   CODEGRAPH_AMBIGUOUS_NAME_CEILING): above it the fuzzy strategies decline
   rather than score K candidates — no proximity score can pick the one true
   target among thousands anyway. Resolving drops from O(K^2) to linear in refs
   (e.g. 900-file synthetic: 28.7s -> 3.4s), edge counts unchanged, and the cap
   never fires on normal repos (max real method-collision ~40).

2. A new `exclude` array in codegraph.json keeps git-TRACKED paths out of the
   index, which .gitignore can't do (enumeration is `git ls-files`). Mirrors the
   existing includeIgnored plumbing across the git, sync, and non-git-walk
   paths.

3. `index`/`init` now install the #850 liveness + #277 ppid watchdogs (which
   were serve-only), so a wedged or orphaned indexer self-terminates instead of
   pinning a core. The --liftoff-only relaunch's spawnSync can't forward
   signals, so killing the parent shim used to orphan the worker.
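
The decline-above-ceiling idea from item 1 can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: only the name AMBIGUOUS_NAME_CEILING and its default of 500 come from the message above, while `Candidate`, `parseCeiling`, `sharedDirDepth`, and `pickByProximity` are hypothetical names, and the handling of invalid override values is an assumption.

```typescript
// Hypothetical sketch: only the ceiling name and its default (500) come from
// the commit message; every other name here is illustrative.
const DEFAULT_AMBIGUOUS_NAME_CEILING = 500;

interface Candidate {
  name: string;
  filePath: string;
}

/** Parse an override string (e.g. an env value); assumed to fall back to the default when invalid. */
function parseCeiling(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === '') return DEFAULT_AMBIGUOUS_NAME_CEILING;
  const n = Number(raw);
  return Number.isFinite(n) && n > 0 ? Math.floor(n) : DEFAULT_AMBIGUOUS_NAME_CEILING;
}

/** Stand-in proximity score: how many leading directory segments two files share. */
function sharedDirDepth(a: string, b: string): number {
  const da = a.split('/').slice(0, -1);
  const db = b.split('/').slice(0, -1);
  let depth = 0;
  while (depth < da.length && depth < db.length && da[depth] === db[depth]) depth++;
  return depth;
}

/** Score same-named candidates by proximity, but decline when there are too many. */
function pickByProximity(
  candidates: Candidate[],
  callerFile: string,
  ceiling: number,
): Candidate | null {
  // Above the ceiling, skip the O(K) scoring entirely and return "unresolved".
  if (candidates.length > ceiling) return null;
  let best: Candidate | null = null;
  let bestScore = -1;
  for (const c of candidates) {
    const score = sharedDirDepth(c.filePath, callerFile);
    if (score > bestScore) {
      best = c;
      bestScore = score;
    }
  }
  return best;
}
```

Because the length check runs before any scoring, a name with K definitions costs constant work per ref once K exceeds the ceiling, which is what turns the O(K^2) total into work linear in refs.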

Tests: ubiquitous-name ceiling, exclude (incl. tracked-file exclusion on git +
non-git), orphan self-termination (POSIX), and ppid-parser units. Shared the
ppid parsers out of mcp/index.ts into mcp/ppid-watchdog.ts.
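
For orientation, here is a sketch of parsers that would satisfy the ppid-parser unit tests included in this commit. It is written from the test expectations only, not from mcp/ppid-watchdog.ts, so the bodies are assumptions; in particular the value 5000 for DEFAULT_PPID_POLL_MS is a placeholder, since the real default is not shown in the diff.

```typescript
// Placeholder value (assumption): the real default is not visible in this diff.
const DEFAULT_PPID_POLL_MS = 5000;

/**
 * Parse a poll-interval override.
 * Unset, empty, non-numeric, or negative input yields the default.
 * A positive value is floored; 0 is kept as the explicit "disable" sentinel.
 */
function parsePpidPollMs(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === '') return DEFAULT_PPID_POLL_MS;
  const n = Number(raw);
  if (!Number.isFinite(n) || n < 0) return DEFAULT_PPID_POLL_MS;
  return Math.floor(n);
}

/**
 * Parse a host parent pid.
 * Returns null for unset, empty, or non-integer input, and for the sentinel
 * pids 0 (unknown) and 1 (init, i.e. already orphaned).
 */
function parseHostPpid(raw: string | undefined): number | null {
  if (raw === undefined) return null;
  const text = raw.trim();
  if (!/^\d+$/.test(text)) return null;
  const pid = Number(text);
  return pid > 1 ? pid : null;
}
```

Keeping both parsers pure (string in, number out) is what lets them be unit-tested without spawning processes, which matches how the new tests exercise them.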

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Colby Mchenry 10 hours ago
parent
commit
45d3293c6a

+ 3 - 0
CHANGELOG.md

@@ -11,12 +11,15 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
 ### New Features
 
+- You can now exclude committed directories from the index with an `exclude` list in `codegraph.json` — even when they're git-tracked. `.gitignore` can't drop a directory git already tracks, so a vendored theme or SDK that's checked into your repo (a committed Metronic theme under `static/`, a bundled vendor library) had no supported way to be kept out — it just bloated the graph and slowed indexing. Add a root `codegraph.json` with, e.g., `{ "exclude": ["static/", "**/vendor/**"] }` and those paths are skipped on indexing, sync, and file-watching, on both git and non-git projects. Patterns are gitignore-style and matched against repo-root-relative paths. This complements the existing `includeIgnored` (its opposite — opt *in* to gitignored embedded repos). (#999)
 - CodeGraph now follows C/C++ commands that are dispatched through macro-built function-pointer tables, so the handler functions they reach are no longer dead-ends in the graph. Many C projects register a handler into a struct's function-pointer field through a macro and a generated table — redis is the classic case: every command (`getCommand`, `decrbyCommand`, …) is wired into the command struct's `proc` field by a `MAKE_CMD(…)` table that lives in a generated, `#include`-d file, then invoked as `c->cmd->proc(c)`. CodeGraph now reads those macro-built tables — including ones whose struct type is itself a macro alias, whose table sits in an `#include`-d file that is never indexed on its own, or that are wrapped in conditional compilation (`#ifdef`) and defined inline with the struct. It recognizes function-pointer fields declared through a function typedef, and follows the receiver — a chained access (`c->cmd->proc`) or an array subscript through a file-scope table (`(cmdnames[i].cmd_func)(…)`) — across field types. It also follows dispatch through a bare array of function pointers with no struct wrapper at all — the opcode/handler-table pattern common in interpreters and emulators, where a table like `opcodes[op](…)` invokes one of many registered handler functions by index — linking the dispatcher to every handler in the array. The upshot: asking for the callers or blast radius of a command handler now finds the dispatcher that reaches it. For redis, `call` shows up as a caller of every command; for SQLite, the builtin SQL functions registered through `FUNCTION(...)` link to where they're invoked; for Vim, every `:ex` and normal-mode command links from the dispatcher. (#991, extending #932)
 - CodeGraph no longer times out when many agents query it at once. The shared background server that serves all your editor and agent sessions used to run every query on a single thread, so a burst of concurrent requests — for example a swarm of subagents exploring a large monorepo together — queued up behind one another and, while the heavy ones ran, froze the connection so finished answers couldn't even be sent back until the whole batch drained. Past a handful of simultaneous callers that routinely surfaced as MCP request timeouts. The shared server now answers queries across a pool of worker threads, so concurrent requests run in parallel and the connection stays responsive the whole time; when it's genuinely saturated a call returns a brief "busy, retry shortly" note (not an error) instead of hanging past your client's timeout. The pool sizes itself to your machine — roughly one worker per core, leaving one for coordination — and a single editor session is unaffected (no pool, no overhead). Set `CODEGRAPH_QUERY_POOL_SIZE` to choose a specific number of workers, or `0` to revert to single-threaded in-process queries.
 - When CodeGraph's MCP server runs with no default project of its own — started outside any repository (for example behind an MCP gateway), or at a monorepo root whose indexes live in sub-projects — it now marks `projectPath` as a required argument on every tool call. Before, `projectPath` was always optional, so an agent talking to such a server would often omit it, get back guidance to pass it, and not reliably retry — you had to nudge it by hand every time. Now the requirement is part of the tool definition the agent sees, so it supplies the path to the project it's working on the first time. When the server does have a default project — the normal case, launched inside your repo — `projectPath` stays optional and a call without it falls back to that project exactly as before. Thanks @wauxhall for the report. (#993)
 
 ### Fixes
 
+- A `codegraph index` or `codegraph init` that gets orphaned or wedged now stops itself instead of pinning a CPU core forever. If you killed the command (or the terminal/agent that launched it), the underlying indexer process used to keep running in the background — the parent couldn't pass the signal along — and a genuinely stuck index had nothing watching it either, since the self-recovery watchdogs were wired only into the background MCP server. Both gaps are closed: indexing now self-terminates when its parent goes away, and a main thread that stops making progress is killed so it can't hang indefinitely. Opt out with `CODEGRAPH_NO_WATCHDOG=1` (liveness) or `CODEGRAPH_PPID_POLL_MS=0` (orphan detection), matching the server. (#999)
+- Indexing no longer hangs at "Resolving refs" on a repo that commits a large JavaScript/TypeScript theme or SDK. A vendored admin theme (Metronic is the classic case — ~1,300 committed `.js` files) re-declares the same method names (`init`, `update`, `render`, `destroy`, …) on hundreds of widgets, and resolution used to score *every* same-named definition against *every* call — work that grows with the square of how many times a name repeats. On such a repo it pinned a CPU core for 15–30 minutes and effectively never finished. Resolution now declines to guess when a name is defined more times than any real codebase ever repeats one (the cutoff is generous — normal projects top out far below it and are completely unaffected), since no proximity heuristic can pick the one true target among thousands anyway. Indexing that previously wedged now completes in seconds, and precise resolution (imports, qualified names, class-name matches) is unchanged. This is the same class of slowdown as the 1.1.0 import-name fix, now closed for repeated method/symbol names. Tune the cutoff with `CODEGRAPH_AMBIGUOUS_NAME_CEILING` if you ever need to. Thanks @DANOX2 for the detailed report and repro. (#999)
 - Claude Code's front-load prompt hook now fires for non-English prompts. The optional hook that injects CodeGraph context for structural questions only recognized English keywords, so a structural question written in Chinese — or any non-Latin-script language — silently injected nothing: the hook looked like it wasn't wired up despite a correct setup, with no error to explain why. The gate is now language-aware. It recognizes Chinese structural keywords (如何/流程/调用/依赖/实现/架构…), and — in any language — a prompt that names a real code symbol from your project, such as `getUserId`, `article_publish`, `user.login`, or `parseToken()` (the name is checked against the index, so an ordinary word that merely looks like code doesn't trigger it). Non-structural prompts ("fix this typo", in any language) stay a no-op as before, so nothing fires where there's no structural answer to give. Thanks @whinc for the detailed report and repro. (#994)
 
 

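The `exclude` entry above is described by the loader tests in this commit as tolerant of bad input: a missing key, a non-array value, or malformed JSON all degrade to "exclude nothing". A minimal sketch of that normalization, assuming a hypothetical helper `parseExcludePatterns` that takes the raw text of `codegraph.json` (the real loader also caches and reads the file, which is omitted here):

```typescript
/**
 * Hypothetical helper: extract a clean `exclude` pattern list from the raw
 * text of a codegraph.json file. Every failure mode returns [] rather than
 * throwing, mirroring the invariant stated in the loader tests.
 */
function parseExcludePatterns(configText: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(configText);
  } catch {
    return []; // malformed JSON: behave as if there were no config
  }
  if (typeof parsed !== 'object' || parsed === null) return [];

  const raw = (parsed as { exclude?: unknown }).exclude;
  if (!Array.isArray(raw)) return []; // e.g. "exclude": "static/" is ignored

  const patterns: string[] = [];
  for (const entry of raw) {
    if (typeof entry !== 'string') continue; // drop numbers, null, objects
    const trimmed = entry.trim();
    if (trimmed !== '') patterns.push(trimmed); // drop blank entries
  }
  return patterns;
}
```

Given `{ "exclude": ["  static/  ", "", 42, "vendor/"] }` this keeps only `static/` and `vendor/`. Matching those patterns against paths is a separate step and is not sketched here.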
+ 165 - 0
__tests__/exclude-config.test.ts

@@ -0,0 +1,165 @@
+/**
+ * `codegraph.json` `exclude` — keep paths out of the index even when git-TRACKED
+ * (#999).
+ *
+ * The escape hatch for a committed vendor/theme/SDK directory (a checked-in
+ * Metronic theme under `static/`) that `.gitignore` cannot drop because git
+ * tracks it. Two layers under test:
+ *   1. Loader: parse/validate/cache, mirroring the `includeIgnored` loader.
+ *   2. Behavior: `scanDirectory` drops excluded paths on BOTH the git
+ *      (`git ls-files`) and non-git (filesystem walk) enumeration paths — and
+ *      crucially for TRACKED files, which is the whole point.
+ *
+ * Invariant: every loader failure mode degrades to the zero-config default
+ * (exclude nothing), never a throw.
+ */
+import { describe, it, expect, beforeEach, afterEach } from 'vitest';
+import * as fs from 'node:fs';
+import * as path from 'node:path';
+import * as os from 'node:os';
+import { execFileSync } from 'node:child_process';
+import { loadExcludePatterns, loadExtensionOverrides, loadIncludeIgnoredPatterns, clearProjectConfigCache } from '../src/project-config';
+import { scanDirectory } from '../src/extraction';
+
+describe('exclude loader (codegraph.json)', () => {
+  let dir: string;
+  beforeEach(() => {
+    dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg-exclude-'));
+    clearProjectConfigCache();
+  });
+  afterEach(() => {
+    clearProjectConfigCache();
+    fs.rmSync(dir, { recursive: true, force: true });
+  });
+  const writeConfig = (obj: unknown) =>
+    fs.writeFileSync(
+      path.join(dir, 'codegraph.json'),
+      typeof obj === 'string' ? obj : JSON.stringify(obj)
+    );
+
+  it('returns an empty list when there is no codegraph.json (the default)', () => {
+    expect(loadExcludePatterns(dir)).toEqual([]);
+  });
+
+  it('loads a well-formed pattern array', () => {
+    writeConfig({ exclude: ['static/', '**/vendor/**'] });
+    expect(loadExcludePatterns(dir)).toEqual(['static/', '**/vendor/**']);
+  });
+
+  it('trims whitespace and drops blank / non-string entries', () => {
+    writeConfig({ exclude: ['  static/  ', '', '   ', 42, null, 'vendor/'] });
+    expect(loadExcludePatterns(dir)).toEqual(['static/', 'vendor/']);
+  });
+
+  it('ignores a non-array exclude value without throwing', () => {
+    writeConfig({ exclude: 'static/' });
+    expect(loadExcludePatterns(dir)).toEqual([]);
+  });
+
+  it('ignores malformed JSON without throwing', () => {
+    writeConfig('{ not: valid json ');
+    expect(loadExcludePatterns(dir)).toEqual([]);
+  });
+
+  it('coexists with extensions and includeIgnored in one file (shared single parse)', () => {
+    writeConfig({ extensions: { '.foo': 'typescript' }, includeIgnored: ['pkgs/'], exclude: ['static/'] });
+    expect(loadExtensionOverrides(dir)).toEqual({ '.foo': 'typescript' });
+    expect(loadIncludeIgnoredPatterns(dir)).toEqual(['pkgs/']);
+    expect(loadExcludePatterns(dir)).toEqual(['static/']);
+  });
+
+  it('picks up a changed config (mtime-invalidated cache)', () => {
+    writeConfig({ exclude: ['static/'] });
+    expect(loadExcludePatterns(dir)).toEqual(['static/']);
+
+    writeConfig({ exclude: ['assets/'] });
+    const future = new Date(Date.now() + 2000);
+    fs.utimesSync(path.join(dir, 'codegraph.json'), future, future);
+
+    expect(loadExcludePatterns(dir)).toEqual(['assets/']);
+  });
+
+  it('drops the patterns again when the config file is removed', () => {
+    writeConfig({ exclude: ['static/'] });
+    expect(loadExcludePatterns(dir)).toEqual(['static/']);
+    fs.rmSync(path.join(dir, 'codegraph.json'));
+    expect(loadExcludePatterns(dir)).toEqual([]);
+  });
+});
+
+describe('exclude behavior — scanDirectory drops excluded paths (#999)', () => {
+  let dir: string;
+  const mk = (rel: string, content = 'export const x = 1;\n') => {
+    const p = path.join(dir, rel);
+    fs.mkdirSync(path.dirname(p), { recursive: true });
+    fs.writeFileSync(p, content);
+  };
+  const writeConfig = (obj: unknown) =>
+    fs.writeFileSync(path.join(dir, 'codegraph.json'), JSON.stringify(obj));
+
+  beforeEach(() => {
+    dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg-exclude-scan-'));
+    clearProjectConfigCache();
+  });
+  afterEach(() => {
+    clearProjectConfigCache();
+    fs.rmSync(dir, { recursive: true, force: true });
+  });
+
+  const gitInit = () => {
+    execFileSync('git', ['init', '-q'], { cwd: dir });
+    execFileSync('git', ['add', '-A'], { cwd: dir });
+    execFileSync('git', ['-c', 'user.email=a@b.c', '-c', 'user.name=t', 'commit', '-qm', 'x'], { cwd: dir });
+  };
+
+  it('keeps a TRACKED excluded dir out of the index (git path) — the core fix', () => {
+    mk('app/main.ts');
+    mk('static/theme/widget1.js');
+    mk('static/theme/widget2.js');
+    gitInit(); // static/ is now git-TRACKED — .gitignore could not drop it
+
+    // Sanity: without exclude the tracked theme IS indexed.
+    let files = scanDirectory(dir).map((f) => f.replace(/\\/g, '/'));
+    expect(files).toContain('app/main.ts');
+    expect(files.some((f) => f.startsWith('static/'))).toBe(true);
+
+    // With exclude the tracked theme is gone, app code stays.
+    writeConfig({ exclude: ['static/'] });
+    clearProjectConfigCache();
+    files = scanDirectory(dir).map((f) => f.replace(/\\/g, '/'));
+    expect(files).toContain('app/main.ts');
+    expect(files.some((f) => f.startsWith('static/'))).toBe(false);
+  });
+
+  it('excludes a tracked dir on the non-git filesystem-walk path too', () => {
+    mk('app/main.ts');
+    mk('static/theme/widget1.js');
+    // No git init → scanDirectory falls back to the filesystem walk.
+    writeConfig({ exclude: ['static/'] });
+    clearProjectConfigCache();
+    const files = scanDirectory(dir).map((f) => f.replace(/\\/g, '/'));
+    expect(files).toContain('app/main.ts');
+    expect(files.some((f) => f.startsWith('static/'))).toBe(false);
+  });
+
+  it('supports a double-star glob', () => {
+    mk('src/a.ts');
+    mk('packages/x/vendor/lib1.js');
+    mk('packages/y/vendor/lib2.js');
+    gitInit();
+    writeConfig({ exclude: ['**/vendor/**'] });
+    clearProjectConfigCache();
+    const files = scanDirectory(dir).map((f) => f.replace(/\\/g, '/'));
+    expect(files).toContain('src/a.ts');
+    expect(files.some((f) => f.includes('/vendor/'))).toBe(false);
+  });
+
+  it('is a no-op with no exclude config (everything indexed)', () => {
+    mk('app/main.ts');
+    mk('static/theme/widget1.js');
+    gitInit();
+    const files = scanDirectory(dir).map((f) => f.replace(/\\/g, '/'));
+    expect(files).toContain('app/main.ts');
+    expect(files.some((f) => f.startsWith('static/'))).toBe(true);
+  });
+});

+ 120 - 0
__tests__/index-orphan-watchdog.test.ts

@@ -0,0 +1,120 @@
+/**
+ * `index` / `init` command supervision regression test (#999, secondary issues).
+ *
+ * `codegraph index` runs in a child re-exec'd with `--liftoff-only` whose parent
+ * blocks in `spawnSync` and so cannot forward a signal — when the parent shim is
+ * killed the indexer used to keep running, orphaned, pinning a CPU core. The
+ * `#850` liveness watchdog and `#277` ppid watchdog were also wired only into
+ * `serve`, never `index`/`init`. `installCommandSupervision` (src/bin/
+ * command-supervision.ts) closes both gaps; this proves the orphan half end to
+ * end: a process running it self-terminates once its parent dies.
+ *
+ * Windows is excluded — `process.kill(pid, 'SIGKILL')` doesn't deliver SIGKILL
+ * there and the reparenting semantics the ppid watchdog relies on are POSIX-only
+ * (same exclusion as mcp-ppid-watchdog.test.ts).
+ */
+import { describe, it, expect, afterEach } from 'vitest';
+import { spawn, ChildProcessWithoutNullStreams } from 'child_process';
+import * as fs from 'fs';
+import * as os from 'os';
+import * as path from 'path';
+
+const SUPERVISION = path.resolve(__dirname, '../dist/bin/command-supervision.js');
+
+function isAlive(pid: number): boolean {
+  try { process.kill(pid, 0); return true; } catch { return false; }
+}
+
+function waitForExit(pid: number, timeoutMs: number): Promise<boolean> {
+  return new Promise((resolve) => {
+    const start = Date.now();
+    const tick = () => {
+      if (!isAlive(pid)) return resolve(true);
+      if (Date.now() - start > timeoutMs) return resolve(false);
+      setTimeout(tick, 100);
+    };
+    tick();
+  });
+}
+
+describe.skipIf(process.platform === 'win32')('index/init orphan supervision (#999)', () => {
+  let wrapper: ChildProcessWithoutNullStreams | null = null;
+  let childPid: number | null = null;
+
+  afterEach(() => {
+    if (wrapper && !wrapper.killed) {
+      try { wrapper.kill('SIGKILL'); } catch { /* already gone */ }
+    }
+    if (childPid !== null && isAlive(childPid)) {
+      try { process.kill(childPid, 'SIGKILL'); } catch { /* already gone */ }
+    }
+    wrapper = null;
+    childPid = null;
+  });
+
+  it("self-terminates when its parent is SIGKILL'd mid-index", async () => {
+    const stderrLog = path.join(
+      fs.mkdtempSync(path.join(os.tmpdir(), 'cg-index-orphan-')),
+      'child.stderr.log',
+    );
+    // The child stands in for a running indexer: it installs the SAME command
+    // supervision `index`/`init` install, then idles on a ref'd timer so it
+    // stays alive until the watchdog (not the timer) takes it down.
+    // CODEGRAPH_NO_WATCHDOG=1 isolates the ppid (orphan) path from the liveness
+    // child; CODEGRAPH_PPID_POLL_MS=200 keeps it responsive in test.
+    const childSrc = `
+      const { installCommandSupervision } = require(${JSON.stringify(SUPERVISION)});
+      installCommandSupervision('index');
+      process.stdout.write('UP ' + process.pid + '\\n');
+      setInterval(() => {}, 60000);
+    `;
+    // The wrapper spawns the child detached (so it's reparented to init when the
+    // wrapper dies, not killed with it), waits for it to install the watchdog and
+    // report its pid, relays the pid, then idles until SIGKILL'd.
+    const wrapperSrc = `
+      const { spawn } = require('child_process');
+      const fs = require('fs');
+      const errFd = fs.openSync(${JSON.stringify(stderrLog)}, 'a');
+      const child = spawn(process.execPath, ['-e', ${JSON.stringify(childSrc)}], {
+        stdio: ['ignore', 'pipe', errFd],
+        env: { ...process.env, CODEGRAPH_NO_WATCHDOG: '1', CODEGRAPH_PPID_POLL_MS: '200', CODEGRAPH_WASM_RELAUNCHED: '1' },
+        detached: true,
+      });
+      child.unref();
+      child.stdout.on('data', (d) => {
+        const m = /UP (\\d+)/.exec(d.toString());
+        if (m) process.stdout.write(JSON.stringify({ pid: Number(m[1]) }) + '\\n');
+      });
+      setInterval(() => {}, 60000);
+    `;
+    wrapper = spawn(process.execPath, ['-e', wrapperSrc], {
+      stdio: ['pipe', 'pipe', 'inherit'],
+    }) as ChildProcessWithoutNullStreams;
+
+    const { pid } = await new Promise<{ pid: number }>((resolve, reject) => {
+      let buf = '';
+      const timer = setTimeout(() => reject(new Error('child did not report its pid in time')), 10000);
+      wrapper!.stdout.on('data', (chunk: Buffer) => {
+        buf += chunk.toString('utf8');
+        const m = buf.match(/\{"pid":(\d+)\}/);
+        if (m) { clearTimeout(timer); resolve({ pid: parseInt(m[1], 10) }); }
+      });
+      wrapper!.on('exit', () => { clearTimeout(timer); reject(new Error('wrapper exited before reporting pid')); });
+    });
+    childPid = pid;
+    expect(isAlive(childPid)).toBe(true);
+
+    // SIGKILL the wrapper — no cleanup runs, just like killing the parent shim.
+    // The child is reparented to init; only its ppid watchdog can take it down.
+    wrapper.kill('SIGKILL');
+
+    const exited = await waitForExit(childPid, 5000);
+    const stderr = fs.existsSync(stderrLog) ? fs.readFileSync(stderrLog, 'utf-8') : '<none>';
+    expect(
+      exited,
+      `child (pid=${childPid}) did not self-terminate within 5s after parent SIGKILL.\nstderr:\n${stderr}`,
+    ).toBe(true);
+    // Confirm it died from the parent-death path, not some other cause.
+    expect(stderr).toMatch(/Parent process exited.*aborting/);
+  }, 20000);
+});

+ 30 - 1
__tests__/ppid-watchdog.test.ts

@@ -10,7 +10,7 @@
  * stubbing `isAlive` and `platform`.
  */
 import { describe, it, expect } from 'vitest';
-import { supervisionLostReason } from '../src/mcp/ppid-watchdog';
+import { supervisionLostReason, parsePpidPollMs, parseHostPpid, DEFAULT_PPID_POLL_MS } from '../src/mcp/ppid-watchdog';
 
 const alive = () => true;
 const dead = () => false;
@@ -136,3 +136,32 @@ describe('supervisionLostReason', () => {
     });
   });
 });
+
+describe('parsePpidPollMs', () => {
+  it('defaults when unset / empty / non-numeric / negative', () => {
+    expect(parsePpidPollMs(undefined)).toBe(DEFAULT_PPID_POLL_MS);
+    expect(parsePpidPollMs('')).toBe(DEFAULT_PPID_POLL_MS);
+    expect(parsePpidPollMs('abc')).toBe(DEFAULT_PPID_POLL_MS);
+    expect(parsePpidPollMs('-5')).toBe(DEFAULT_PPID_POLL_MS);
+  });
+  it('honours a positive override and floors it', () => {
+    expect(parsePpidPollMs('200')).toBe(200);
+    expect(parsePpidPollMs('150.9')).toBe(150);
+  });
+  it('treats 0 as the explicit "disable" sentinel (caller skips the timer)', () => {
+    expect(parsePpidPollMs('0')).toBe(0);
+  });
+});
+
+describe('parseHostPpid', () => {
+  it('returns null for unset / empty / non-integer / orphan-sentinel pids', () => {
+    expect(parseHostPpid(undefined)).toBeNull();
+    expect(parseHostPpid('')).toBeNull();
+    expect(parseHostPpid('x')).toBeNull();
+    expect(parseHostPpid('0')).toBeNull();  // unknown
+    expect(parseHostPpid('1')).toBeNull();  // init = already orphaned
+  });
+  it('returns a real positive pid', () => {
+    expect(parseHostPpid('4242')).toBe(4242);
+  });
+});

+ 162 - 0
__tests__/resolution.test.ts

@@ -270,6 +270,168 @@ describe('Resolution Module', () => {
     });
   });
 
+  describe('Ubiquitous-name ceiling (#999)', () => {
+    // A vendored theme/SDK re-declares the same method name across thousands of
+    // files (Metronic's `init`/`update`/… on every widget). The fuzzy strategies
+    // used to score every same-named candidate per ref — O(K) per ref, O(K²)
+    // total — which pinned a core for 15-28 min at "Resolving refs … 94%". Above
+    // the ceiling they must DECLINE instead, since no proximity/word-overlap
+    // score can pick the one true target among thousands anyway.
+    const CEILING = 500;
+
+    // Fixtures: `makeManyMethods` builds N same-named method nodes, one per
+    // widget file; `spyContext` wraps them in a context that counts the nodes
+    // `getNodesByName` returns, so a test can check how many were scanned.
+    const makeManyMethods = (n: number, name: string): Node[] =>
+      Array.from({ length: n }, (_, i) => ({
+        id: `method:widget${i}.js:Widget${i}.${name}:1`,
+        kind: 'method' as const,
+        name,
+        qualifiedName: `widget${i}.js::Widget${i}::${name}`,
+        filePath: `static/theme/widget${i}.js`,
+        language: 'javascript' as const,
+        startLine: 1,
+        endLine: 5,
+        startColumn: 0,
+        endColumn: 0,
+        updatedAt: Date.now(),
+      }));
+
+    const spyContext = (nodes: Node[]): { ctx: ResolutionContext; lookups: () => number } => {
+      let scanned = 0;
+      const ctx: ResolutionContext = {
+        getNodesInFile: () => [],
+        getNodesByName: (name) => {
+          const hit = nodes.filter((n) => n.name === name);
+          scanned += hit.length;
+          return hit;
+        },
+        getNodesByQualifiedName: () => [],
+        getNodesByKind: () => [],
+        fileExists: () => true,
+        readFile: () => null,
+        getProjectRoot: () => '/test',
+        getAllFiles: () => [],
+        getNodesByLowerName: () => [],
+        getImportMappings: () => [],
+      };
+      return { ctx, lookups: () => scanned };
+    };
+
+    it('declines a method call (`obj.init`) above the ceiling instead of scoring K candidates', () => {
+      const { ctx } = spyContext(makeManyMethods(CEILING + 1, 'init'));
+      const ref = {
+        fromNodeId: 'method:caller.js:caller:1',
+        referenceName: 'widget.init',
+        referenceKind: 'calls' as const,
+        line: 2,
+        column: 4,
+        filePath: 'static/theme/caller.js',
+        language: 'javascript' as const,
+      };
+      expect(matchReference(ref, ctx)).toBeNull();
+    });
+
+    it('declines a bare exact-name ref above the ceiling', () => {
+      const { ctx } = spyContext(makeManyMethods(CEILING + 1, 'render'));
+      const ref = {
+        fromNodeId: 'method:caller.js:caller:1',
+        referenceName: 'render',
+        referenceKind: 'calls' as const,
+        line: 2,
+        column: 4,
+        filePath: 'static/theme/caller.js',
+        language: 'javascript' as const,
+      };
+      expect(matchReference(ref, ctx)).toBeNull();
+    });
+
+    it('still resolves a SAME-FILE definition when one exists (precise path unaffected)', () => {
+      // Strategy 1 (class-name) and same-file matching are precise — a ubiquitous
+      // name with an unambiguous local target still resolves.
+      const nodes = makeManyMethods(CEILING + 1, 'init');
+      const local: Node = {
+        id: 'class:static/theme/caller.js:Widgetly:1',
+        kind: 'class',
+        name: 'Widgetly',
+        qualifiedName: 'static/theme/caller.js::Widgetly',
+        filePath: 'static/theme/caller.js',
+        language: 'javascript',
+        startLine: 1, endLine: 9, startColumn: 0, endColumn: 0, updatedAt: Date.now(),
+      };
+      const localMethod: Node = {
+        id: 'method:static/theme/caller.js:Widgetly.init:2',
+        kind: 'method',
+        name: 'init',
+        qualifiedName: 'static/theme/caller.js::Widgetly::init',
+        filePath: 'static/theme/caller.js',
+        language: 'javascript',
+        startLine: 2, endLine: 4, startColumn: 0, endColumn: 0, updatedAt: Date.now(),
+      };
+      const all = [...nodes, local, localMethod];
+      const ctx: ResolutionContext = {
+        getNodesInFile: (fp) => all.filter((n) => n.filePath === fp),
+        getNodesByName: (name) => all.filter((n) => n.name === name),
+        getNodesByQualifiedName: () => [],
+        getNodesByKind: () => [],
+        fileExists: () => true,
+        readFile: () => null,
+        getProjectRoot: () => '/test',
+        getAllFiles: () => [],
+        getNodesByLowerName: () => [],
+        getImportMappings: () => [],
+      };
+      // `Widgetly.init` names the class explicitly → Strategy 1 resolves it.
+      const ref = {
+        fromNodeId: 'method:static/theme/caller.js:caller:6',
+        referenceName: 'Widgetly.init',
+        referenceKind: 'calls' as const,
+        line: 6,
+        column: 4,
+        filePath: 'static/theme/caller.js',
+        language: 'javascript' as const,
+      };
+      const result = matchReference(ref, ctx);
+      expect(result?.targetNodeId).toBe('method:static/theme/caller.js:Widgetly.init:2');
+    });
+
+    it('still scores normally JUST below the ceiling (no behavior change for normal repos)', () => {
+      // Real repos top out near ~40 same-named methods; this proves a sub-ceiling
+      // collision still resolves via proximity, so the cap is invisible to them.
+      const nodes = makeManyMethods(CEILING - 1, 'update');
+      // Make ONE candidate share the caller's directory so proximity picks it.
+      nodes[0] = {
+        ...nodes[0]!,
+        id: 'method:static/theme/app/Widget0.update:1',
+        qualifiedName: 'static/theme/app/widget.js::Widget0::update',
+        filePath: 'static/theme/app/widget.js',
+      };
+      const ctx: ResolutionContext = {
+        getNodesInFile: () => [],
+        getNodesByName: (name) => nodes.filter((n) => n.name === name),
+        getNodesByQualifiedName: () => [],
+        getNodesByKind: () => [],
+        fileExists: () => true,
+        readFile: () => null,
+        getProjectRoot: () => '/test',
+        getAllFiles: () => [],
+        getNodesByLowerName: () => [],
+        getImportMappings: () => [],
+      };
+      const ref = {
+        fromNodeId: 'method:static/theme/app/caller.js:caller:1',
+        referenceName: 'update',
+        referenceKind: 'calls' as const,
+        line: 2,
+        column: 4,
+        filePath: 'static/theme/app/caller.js',
+        language: 'javascript' as const,
+      };
+      // Below the ceiling the fuzzy path runs and resolves SOMETHING (not capped).
+      expect(matchReference(ref, ctx)).not.toBeNull();
+    });
+  });
+
   describe('Import Resolver', () => {
     it('should resolve relative import paths', () => {
       const context: ResolutionContext = {

+ 64 - 50
src/bin/codegraph.ts

@@ -34,6 +34,7 @@ import { getGlyphs } from '../ui/glyphs';
 import { buildNode25BlockBanner, buildNodeTooOldBanner, MIN_NODE_MAJOR } from './node-version-check';
 import { installFatalHandlers } from './fatal-handler';
 import { relaunchWithWasmRuntimeFlagsIfNeeded } from '../extraction/wasm-runtime-flags';
+import { installCommandSupervision } from './command-supervision';
 import { EXTRACTION_VERSION } from '../extraction/extraction-version';
 import { getTelemetry, TELEMETRY_DOCS, recordIndexEvent } from '../telemetry';
 
@@ -506,19 +507,25 @@ program
       // Indexing runs by default now. The legacy -i/--index flag is still
       // accepted (so existing muscle memory and scripts don't break) but is a
       // no-op — initializing always builds the initial index.
+      // Supervise the index: self-terminate if orphaned or wedged (#999).
+      const supervision = installCommandSupervision('init');
       let result: IndexResult;
-      if (options.verbose) {
-        result = await cg.indexAll({
-          onProgress: createVerboseProgress(),
-          verbose: true,
-        });
-      } else {
-        process.stdout.write(`${colors.dim}${getGlyphs().rail}${colors.reset}\n`);
-        const progress = createShimmerProgress();
-        result = await cg.indexAll({
-          onProgress: progress.onProgress,
-        });
-        await progress.stop();
+      try {
+        if (options.verbose) {
+          result = await cg.indexAll({
+            onProgress: createVerboseProgress(),
+            verbose: true,
+          });
+        } else {
+          process.stdout.write(`${colors.dim}${getGlyphs().rail}${colors.reset}\n`);
+          const progress = createShimmerProgress();
+          result = await cg.indexAll({
+            onProgress: progress.onProgress,
+          });
+          await progress.stop();
+        }
+      } finally {
+        supervision.stop();
       }
       printIndexResult(clack, result, projectPath);
       await recordIndexTelemetry(cg, result);
@@ -627,51 +634,58 @@ program
       const { default: CodeGraph } = await loadCodeGraph();
       const cg = await CodeGraph.open(projectPath);
 
-      if (options.quiet) {
-        // Quiet mode: no UI, just run. `index` is a full re-index, so clear the
-        // existing graph and rebuild from scratch (see the note below — #874).
-        cg.clear();
-        const result = await cg.indexAll();
-        if (!result.success) process.exit(1);
-        cg.destroy();
-        return;
-      }
+      // Supervise the indexer: self-terminate if orphaned (parent shim killed)
+      // or if the main thread wedges — neither was guarded on this path (#999).
+      const supervision = installCommandSupervision('index');
+      try {
+        if (options.quiet) {
+          // Quiet mode: no UI, just run. `index` is a full re-index, so clear the
+          // existing graph and rebuild from scratch (see the note below — #874).
+          cg.clear();
+          const result = await cg.indexAll();
+          if (!result.success) process.exit(1);
+          cg.destroy();
+          return;
+        }
 
-      const clack = await importESM('@clack/prompts');
-      clack.intro('Indexing project');
+        const clack = await importESM('@clack/prompts');
+        clack.intro('Indexing project');
 
-      // `index` is a FULL re-index: clear the existing graph and rebuild it from
-      // scratch so the result is identical to a fresh `init`. Without the clear,
-      // indexAll() skips every unchanged file by its content hash and reports
-      // "0 nodes, 0 edges" against the already-populated graph — which reads as
-      // "index wiped my index" (#874). For fast incremental updates use `sync`.
-      cg.clear();
+        // `index` is a FULL re-index: clear the existing graph and rebuild it from
+        // scratch so the result is identical to a fresh `init`. Without the clear,
+        // indexAll() skips every unchanged file by its content hash and reports
+        // "0 nodes, 0 edges" against the already-populated graph — which reads as
+        // "index wiped my index" (#874). For fast incremental updates use `sync`.
+        cg.clear();
 
-      let result: IndexResult;
+        let result: IndexResult;
 
-      if (options.verbose) {
-        result = await cg.indexAll({
-          onProgress: createVerboseProgress(),
-          verbose: true,
-        });
-      } else {
-        process.stdout.write(`${colors.dim}${getGlyphs().rail}${colors.reset}\n`);
-        const progress = createShimmerProgress();
-        result = await cg.indexAll({
-          onProgress: progress.onProgress,
-        });
-        await progress.stop();
-      }
+        if (options.verbose) {
+          result = await cg.indexAll({
+            onProgress: createVerboseProgress(),
+            verbose: true,
+          });
+        } else {
+          process.stdout.write(`${colors.dim}${getGlyphs().rail}${colors.reset}\n`);
+          const progress = createShimmerProgress();
+          result = await cg.indexAll({
+            onProgress: progress.onProgress,
+          });
+          await progress.stop();
+        }
 
-      printIndexResult(clack, result, projectPath);
-      await recordIndexTelemetry(cg, result);
+        printIndexResult(clack, result, projectPath);
+        await recordIndexTelemetry(cg, result);
 
-      if (!result.success) {
-        process.exit(1);
-      }
+        if (!result.success) {
+          process.exit(1);
+        }
 
-      clack.outro('Done');
-      cg.destroy();
+        clack.outro('Done');
+        cg.destroy();
+      } finally {
+        supervision.stop();
+      }
     } catch (err) {
       error(`Failed to index: ${err instanceof Error ? err.message : String(err)}`);
       process.exit(1);

+ 77 - 0
src/bin/command-supervision.ts

@@ -0,0 +1,77 @@
+/**
+ * Process supervision for long-running CLI commands (`index` / `init`).
+ *
+ * Indexing a large repo can run for a while on the main thread, and #999
+ * surfaced two ways that goes wrong when nothing is watching it:
+ *
+ *   1. **Orphaned worker.** `index` runs in a child re-exec'd with
+ *      `--liftoff-only` (the WASM-flag relaunch). Its parent blocks in
+ *      `spawnSync`, so when the parent shim is killed it cannot forward the
+ *      signal — the child keeps running, now orphaned, pinning a core. The PPID
+ *      watchdog (#277) notices the parent/host went away and exits the child.
+ *   2. **Wedged indexer.** The `#850` main-thread liveness watchdog — which
+ *      SIGKILLs a process whose event loop stops turning — was wired only into
+ *      the MCP `serve` path, so a wedged `index`/`init` was never auto-killed.
+ *
+ * Both reuse the exact mechanisms `serve` already uses; this just makes them
+ * available to a one-shot command. Best-effort and self-disabling: a missing
+ * watchdog never blocks the command from running. Both honour the same env
+ * switches as `serve` (`CODEGRAPH_NO_WATCHDOG`, `CODEGRAPH_PPID_POLL_MS=0`).
+ */
+import { installMainThreadWatchdog } from '../mcp/liveness-watchdog';
+import { supervisionLostReason, parsePpidPollMs, parseHostPpid } from '../mcp/ppid-watchdog';
+import { isProcessAlive } from '../mcp/daemon-registry';
+import { HOST_PPID_ENV } from '../extraction/wasm-runtime-flags';
+
+export interface CommandSupervision {
+  /** Tear down both watchdogs. Idempotent; call when the command finishes. */
+  stop(): void;
+}
+
+/**
+ * Install the liveness + PPID watchdogs for the duration of a CLI command.
+ * `label` is used in the shutdown notice (e.g. `"index"`). Returns a handle
+ * whose `stop()` must be called when the command completes so neither watchdog
+ * outlives it.
+ */
+export function installCommandSupervision(label: string): CommandSupervision {
+  // Liveness watchdog: a separate process that SIGKILLs us if our event loop
+  // stops turning for too long (a wedged synchronous loop). Self-disables on
+  // CODEGRAPH_NO_WATCHDOG.
+  const liveness = installMainThreadWatchdog();
+
+  // PPID watchdog: detect that the parent (or the original host, whose PID
+  // crosses the relaunch shim via HOST_PPID_ENV) died; exit instead of leaking.
+  const originalPpid = process.ppid;
+  const hostPpid = parseHostPpid(process.env[HOST_PPID_ENV]);
+  const pollMs = parsePpidPollMs(process.env.CODEGRAPH_PPID_POLL_MS);
+  let ppidTimer: ReturnType<typeof setInterval> | null = null;
+  if (pollMs > 0) {
+    ppidTimer = setInterval(() => {
+      const reason = supervisionLostReason({
+        originalPpid,
+        currentPpid: process.ppid,
+        hostPpid,
+        isAlive: isProcessAlive,
+      });
+      if (reason) {
+        try {
+          process.stderr.write(`[CodeGraph ${label}] Parent process exited (${reason}); aborting.\n`);
+        } catch { /* stderr gone with the parent — exit anyway */ }
+        process.exit(1);
+      }
+    }, pollMs);
+    // Never let the watchdog itself keep the process alive past its real work.
+    ppidTimer.unref();
+  }
+
+  let stopped = false;
+  return {
+    stop(): void {
+      if (stopped) return;
+      stopped = true;
+      if (ppidTimer) clearInterval(ppidTimer);
+      liveness?.stop();
+    },
+  };
+}
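
The orphan check that the PPID timer polls can be sketched as a pure function. This is a rough illustration only: the body of `supervisionLostReason` is not part of this diff, so `sketchLostReason`, its reason strings, and the order of its checks are assumptions based on the comments above (probe the propagated host PID when present, otherwise fall back to `process.ppid` divergence).

```typescript
// Hypothetical stand-in for the state object passed to supervisionLostReason.
interface SketchSupervisionState {
  originalPpid: number;
  currentPpid: number;
  hostPpid: number | null;
  isAlive: (pid: number) => boolean;
}

// Assumed logic, not the project's implementation.
function sketchLostReason(state: SketchSupervisionState): string | null {
  // Relaunched path: a host PID was handed across the shim, so probe it directly.
  if (state.hostPpid !== null) {
    return state.isAlive(state.hostPpid) ? null : `host ${state.hostPpid} exited`;
  }
  // Direct launch: a changed ppid means the process was re-parented (orphaned).
  if (state.currentPpid !== state.originalPpid) {
    return `ppid changed ${state.originalPpid} -> ${state.currentPpid}`;
  }
  return null;
}

// Injected liveness probe so the sketch needs no real processes.
const onlyPid100Alive = (pid: number): boolean => pid === 100;

console.log(sketchLostReason({ originalPpid: 50, currentPpid: 50, hostPpid: null, isAlive: onlyPid100Alive }));
console.log(sketchLostReason({ originalPpid: 50, currentPpid: 1, hostPpid: null, isAlive: onlyPid100Alive }));
console.log(sketchLostReason({ originalPpid: 50, currentPpid: 50, hostPpid: 200, isAlive: onlyPid100Alive }));
```

Passing `isAlive` in as a parameter, as the real call site does with `isProcessAlive`, keeps the decision testable without spawning or killing anything.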

+ 46 - 7
src/extraction/index.ts

@@ -19,7 +19,7 @@ import {
 import { QueryBuilder } from '../db/queries';
 import { extractFromSource } from './tree-sitter';
 import { detectLanguage, isSourceFile, isLanguageSupported, isFileLevelOnlyLanguage, initGrammars, loadGrammarsForLanguages } from './grammars';
-import { loadExtensionOverrides, loadIncludeIgnoredPatterns } from '../project-config';
+import { loadExtensionOverrides, loadIncludeIgnoredPatterns, loadExcludePatterns } from '../project-config';
 import { isCodeGraphDataDir } from '../directory';
 import { logDebug, logWarn } from '../errors';
 import { validatePathWithinRoot, normalizePath } from '../utils';
@@ -283,6 +283,20 @@ function loadIncludeIgnoredMatcher(rootDir: string): Ignore | null {
   return patterns.length > 0 ? ignore().add(patterns) : null;
 }
 
+/**
+ * Matcher for the project's `codegraph.json` `exclude` patterns — paths to keep
+ * OUT of the index even when git-tracked, which `.gitignore` cannot do (#999).
+ * The escape hatch for a committed vendor/theme/SDK directory. Returns `null`
+ * when nothing is excluded (the zero-config default → no overhead). Matched
+ * against project-root-relative paths, so it applies uniformly across the whole
+ * workspace, including inside embedded repos (excluding `static/` means gone
+ * everywhere). Built once per scan/sync/scope operation from the scan root.
+ */
+function loadExcludeMatcher(rootDir: string): Ignore | null {
+  const patterns = loadExcludePatterns(rootDir);
+  return patterns.length > 0 ? ignore().add(patterns) : null;
+}
+
 /**
  * `git ls-files --directory` collapses a wholly-untracked/ignored directory into
  * one entry — and when the command's own cwd is such a directory (the indexed
@@ -421,12 +435,25 @@ function findNestedGitRepos(absDir: string, relPrefix: string): string[] {
 export class ScopeIgnore {
   private embedded: Array<{ root: string; matcher: Ignore }>;
   private defaults: Ignore = defaultsOnlyIgnore();
-  constructor(private rootMatcher: Ignore, embedded: Array<{ root: string; matcher: Ignore }>) {
+  constructor(
+    private rootMatcher: Ignore,
+    embedded: Array<{ root: string; matcher: Ignore }>,
+    /**
+     * Project `codegraph.json` `exclude` patterns (#999), matched against the
+     * full root-relative path. Wins over everything else — an explicit user
+     * exclude applies even to tracked files and even inside embedded repos.
+     */
+    private exclude: Ignore | null = null,
+  ) {
     // Longest root first so paths in nested embedded repos hit the innermost matcher.
     this.embedded = [...embedded].sort((a, b) => b.root.length - a.root.length);
   }
 
   ignores(rel: string): boolean {
+    // User `exclude` (#999) is checked first and against the full root-relative
+    // path: it must drop git-TRACKED paths (which `.gitignore` can't) and apply
+    // everywhere, including ancestors of embedded repos.
+    if (this.exclude && this.exclude.ignores(rel)) return true;
     for (const { root, matcher } of this.embedded) {
       if (rel.startsWith(root)) {
         const inner = rel.slice(root.length);
@@ -455,6 +482,7 @@ export function buildScopeIgnore(rootDir: string, embeddedRoots?: Iterable<strin
   return new ScopeIgnore(
     buildDefaultIgnore(rootDir),
     roots.map((root) => ({ root, matcher: buildDefaultIgnore(path.join(rootDir, root)) })),
+    loadExcludeMatcher(rootDir),
   );
 }
 
@@ -678,14 +706,14 @@ function getGitChangedFiles(rootDir: string): GitChanges | null {
     // Custom extension → language overrides from the project's codegraph.json,
     // so change detection sees the same custom-extension files the full index does.
     const overrides = loadExtensionOverrides(rootDir);
-    collectGitStatus(rootDir, '', changes, overrides, loadIncludeIgnoredMatcher(rootDir));
+    collectGitStatus(rootDir, '', changes, overrides, loadIncludeIgnoredMatcher(rootDir), loadExcludeMatcher(rootDir));
     return changes;
   } catch {
     return null;
   }
 }
 
-function collectGitStatus(repoDir: string, prefix: string, out: GitChanges, overrides?: Record<string, Language>, includeIgnored: Ignore | null = null): void {
+function collectGitStatus(repoDir: string, prefix: string, out: GitChanges, overrides?: Record<string, Language>, includeIgnored: Ignore | null = null, exclude: Ignore | null = null): void {
   const output = execFileSync(
     'git',
     ['status', '--porcelain', '--no-renames'],
@@ -732,6 +760,11 @@ function collectGitStatus(repoDir: string, prefix: string, out: GitChanges, over
     // Added (`??`) / modified files inside an excluded dir must not enter the
     // index — match against the repo-relative path, same as the full scan. (#766)
     if (ig.ignores(rel)) continue;
+    // User `codegraph.json` `exclude` (#999) is project-root-relative, so it's
+    // matched against the full path — sync must not re-add a tracked file the
+    // full index now keeps out. Deletions above stay unfiltered so a file that
+    // WAS indexed before an exclude was added still cleans itself out.
+    if (exclude && exclude.ignores(filePath)) continue;
 
     if (statusCode === '??') {
       out.added.push(filePath);
@@ -747,11 +780,11 @@ function collectGitStatus(repoDir: string, prefix: string, out: GitChanges, over
   // and they are left alone (#970, #976), mirroring the full-index scan.
   for (const rel of untrackedDirs) {
     for (const repoRel of findNestedGitRepos(path.join(repoDir, rel), rel)) {
-      collectGitStatus(path.join(repoDir, repoRel), prefix + repoRel, out, overrides, includeIgnored);
+      collectGitStatus(path.join(repoDir, repoRel), prefix + repoRel, out, overrides, includeIgnored, exclude);
     }
   }
   for (const rel of findIgnoredEmbeddedRepos(repoDir, includeIgnored, prefix)) {
-    collectGitStatus(path.join(repoDir, rel), prefix + rel, out, overrides, includeIgnored);
+    collectGitStatus(path.join(repoDir, rel), prefix + rel, out, overrides, includeIgnored, exclude);
   }
 }
 
@@ -936,7 +969,13 @@ function scanDirectoryWalk(
 
   // Seed a base matcher with the built-in default ignores (merged with the root
   // .gitignore so a negation can override). Nested .gitignores still layer per-dir.
-  walk(rootDir, [{ dir: rootDir, ig: buildDefaultIgnore(rootDir) }]);
+  const baseMatchers: ScopedIgnore[] = [{ dir: rootDir, ig: buildDefaultIgnore(rootDir) }];
+  // Project `codegraph.json` `exclude` patterns (#999), rooted at the project so
+  // `isIgnored` matches them against root-relative paths — same coverage the
+  // git path gets via ScopeIgnore, for non-git projects.
+  const exclude = loadExcludeMatcher(rootDir);
+  if (exclude) baseMatchers.push({ dir: rootDir, ig: exclude });
+  walk(rootDir, baseMatchers);
   return files;
 }
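
The precedence that `ScopeIgnore.ignores` gives the user `exclude` can be sketched without the `ignore` package. Everything below is illustrative: `prefixMatcher` is a hypothetical stand-in that only handles directory prefixes (real gitignore matching covers globs, negation, and any-depth names), and the handling after the embedded-root match is an assumption, since that part of the method is outside the hunk.

```typescript
// Minimal shape of the matcher; only `ignores` is modeled.
interface SketchMatcher {
  ignores(rel: string): boolean;
}

// Hypothetical helper: each pattern is treated as a root-relative directory prefix.
function prefixMatcher(prefixes: string[]): SketchMatcher {
  return { ignores: (rel: string): boolean => prefixes.some((p) => rel.startsWith(p)) };
}

function sketchScopeIgnore(
  rootMatcher: SketchMatcher,
  embedded: Array<{ root: string; matcher: SketchMatcher }>,
  exclude: SketchMatcher | null = null,
): SketchMatcher {
  // Longest root first so nested embedded repos hit the innermost matcher.
  const sorted = [...embedded].sort((a, b) => b.root.length - a.root.length);
  return {
    ignores(rel: string): boolean {
      // User exclude wins and sees the full root-relative path.
      if (exclude && exclude.ignores(rel)) return true;
      for (const { root, matcher } of sorted) {
        // Assumed: an embedded repo judges the path relative to its own root.
        if (rel.startsWith(root)) return matcher.ignores(rel.slice(root.length));
      }
      return rootMatcher.ignores(rel);
    },
  };
}

const scope = sketchScopeIgnore(
  prefixMatcher(['node_modules/']),
  [{ root: 'vendor/lib/', matcher: prefixMatcher(['build/']) }],
  prefixMatcher(['static/', 'vendor/lib/docs/']),
);

console.log(scope.ignores('static/theme/widget.js')); // true: user exclude, even if tracked
console.log(scope.ignores('vendor/lib/docs/a.md'));   // true: exclude reaches into the embedded repo
console.log(scope.ignores('vendor/lib/build/out.js')); // true: embedded repo's own rule
console.log(scope.ignores('src/app.ts'));             // false
```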
 

+ 1 - 36
src/mcp/index.ts

@@ -50,18 +50,11 @@ import {
 import { connectWithHello, runLocalHandshakeProxy } from './proxy';
 import { getDaemonSocketPath } from './daemon-paths';
 import { getTelemetry } from '../telemetry';
-import { supervisionLostReason } from './ppid-watchdog';
+import { supervisionLostReason, parsePpidPollMs, parseHostPpid } from './ppid-watchdog';
 import { installMainThreadWatchdog, WatchdogHandle } from './liveness-watchdog';
 import { treatStdinFailureAsShutdown } from './stdin-teardown';
 import { HOST_PPID_ENV } from '../extraction/wasm-runtime-flags';
 
-/**
- * How often to poll `process.ppid` to detect parent process death (see #277).
- * 5s is a deliberate trade-off: the failure mode being guarded against is rare
- * (parent SIGKILL'd), and longer poll = less wakeup overhead while idle.
- */
-const DEFAULT_PPID_POLL_MS = 5000;
-
 /**
  * Env var that marks a process as the *detached daemon* itself (set by
  * {@link spawnDetachedDaemon} when it re-invokes the CLI). Without it a
@@ -94,34 +87,6 @@ const TAKEOVER_RETRY_DELAY_MS = 100;
 const DAEMON_CONNECT_MAX_RETRIES = 240;
 const DAEMON_CONNECT_RETRY_DELAY_MS = 25;
 
-/**
- * Resolve the PPID watchdog poll interval from an env override. A value of
- * `0` disables the watchdog entirely (escape hatch for embedded scenarios
- * where the parent legitimately re-parents the server on purpose). Anything
- * non-numeric or negative falls back to the default.
- */
-function parsePpidPollMs(raw: string | undefined): number {
-  if (raw === undefined || raw === '') return DEFAULT_PPID_POLL_MS;
-  const parsed = Number(raw);
-  if (!Number.isFinite(parsed)) return DEFAULT_PPID_POLL_MS;
-  if (parsed < 0) return DEFAULT_PPID_POLL_MS;
-  return Math.floor(parsed);
-}
-
-/**
- * Parse the host PID propagated across the `--liftoff-only` re-exec
- * ({@link HOST_PPID_ENV}). Returns a positive integer PID, or null when
- * unset/invalid — the direct-launch path, where the watchdog falls back to
- * `process.ppid` divergence. PIDs of 0/1 are rejected (0 = unknown, 1 = init,
- * i.e. already orphaned), so the watchdog doesn't latch onto init.
- */
-function parseHostPpid(raw: string | undefined): number | null {
-  if (raw === undefined || raw === '') return null;
-  const parsed = Number(raw);
-  if (!Number.isInteger(parsed) || parsed <= 1) return null;
-  return parsed;
-}
-
 /** Whether `CODEGRAPH_NO_DAEMON` was set to a truthy value. */
 function daemonOptOutSet(): boolean {
   const raw = process.env.CODEGRAPH_NO_DAEMON;

+ 32 - 0
src/mcp/ppid-watchdog.ts

@@ -61,3 +61,35 @@ export function supervisionLostReason(state: SupervisionState): string | null {
   }
   return null;
 }
+
+/** Default PPID poll cadence (ms). Shared by the MCP server and CLI commands. */
+export const DEFAULT_PPID_POLL_MS = 5000;
+
+/**
+ * Resolve the PPID watchdog poll interval from an env override
+ * (`CODEGRAPH_PPID_POLL_MS`). A value of `0` disables the watchdog entirely
+ * (escape hatch for embedded scenarios where the parent legitimately re-parents
+ * the process on purpose). Anything non-numeric or negative falls back to the
+ * default.
+ */
+export function parsePpidPollMs(raw: string | undefined): number {
+  if (raw === undefined || raw === '') return DEFAULT_PPID_POLL_MS;
+  const parsed = Number(raw);
+  if (!Number.isFinite(parsed)) return DEFAULT_PPID_POLL_MS;
+  if (parsed < 0) return DEFAULT_PPID_POLL_MS;
+  return Math.floor(parsed);
+}
+
+/**
+ * Parse the host PID propagated across the `--liftoff-only` re-exec
+ * (`CODEGRAPH_HOST_PPID`). Returns a positive integer PID, or null when
+ * unset/invalid — the direct-launch path, where the watchdog falls back to
+ * `process.ppid` divergence. PIDs of 0/1 are rejected (0 = unknown, 1 = init,
+ * i.e. already orphaned), so the watchdog doesn't latch onto init.
+ */
+export function parseHostPpid(raw: string | undefined): number | null {
+  if (raw === undefined || raw === '') return null;
+  const parsed = Number(raw);
+  if (!Number.isInteger(parsed) || parsed <= 1) return null;
+  return parsed;
+}
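
A few worked inputs make the fallback rules of the two parsers concrete. The functions are copied from the hunk above so the block runs on its own; only the sample values are invented for illustration.

```typescript
const DEFAULT_PPID_POLL_MS = 5000;

function parsePpidPollMs(raw: string | undefined): number {
  if (raw === undefined || raw === '') return DEFAULT_PPID_POLL_MS;
  const parsed = Number(raw);
  if (!Number.isFinite(parsed)) return DEFAULT_PPID_POLL_MS;
  if (parsed < 0) return DEFAULT_PPID_POLL_MS;
  return Math.floor(parsed);
}

function parseHostPpid(raw: string | undefined): number | null {
  if (raw === undefined || raw === '') return null;
  const parsed = Number(raw);
  if (!Number.isInteger(parsed) || parsed <= 1) return null;
  return parsed;
}

// Poll interval: unset, non-numeric, and negative values fall back to the default.
console.log(parsePpidPollMs(undefined)); // 5000
console.log(parsePpidPollMs('0'));       // 0 (watchdog disabled)
console.log(parsePpidPollMs('250.9'));   // 250 (floored)
console.log(parsePpidPollMs('-1'));      // 5000
console.log(parsePpidPollMs('soon'));    // 5000

// Host PID: only integers greater than 1 are accepted.
console.log(parseHostPpid('4321')); // 4321
console.log(parseHostPpid('1'));    // null (init, already orphaned)
console.log(parseHostPpid('12.5')); // null
```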

+ 56 - 2
src/project-config.ts

@@ -42,12 +42,24 @@ export interface ProjectConfig {
    * are never discovered or indexed (#970, #976).
    */
   includeIgnored?: string[];
+  /**
+   * Gitignore-style patterns for paths to keep OUT of the index — even when
+   * they are git-TRACKED, which `.gitignore` cannot do (#999). The escape hatch
+   * for a committed vendor/theme/SDK directory (e.g. a checked-in Metronic theme
+   * under `static/`) that bloats the graph and slows indexing but isn't really
+   * your code. Matched against project-root-relative paths, so a directory like
+   * `"static/"`, a double-star vendor glob, or `"assets/theme"` all work.
+   * Absent/empty (the default) excludes nothing beyond the built-in defaults
+   * and your `.gitignore`.
+   */
+  exclude?: string[];
 }
 
 /** Parsed, validated view of a project's `codegraph.json`. */
 interface ParsedConfig {
   extensions: Record<string, Language>;
   includeIgnored: string[];
+  exclude: string[];
 }
 
 interface CacheEntry {
@@ -68,6 +80,7 @@ const EMPTY_EXTENSIONS: Record<string, Language> = Object.freeze({});
 const EMPTY_CONFIG: ParsedConfig = Object.freeze({
   extensions: EMPTY_EXTENSIONS,
   includeIgnored: Object.freeze([]) as unknown as string[],
+  exclude: Object.freeze([]) as unknown as string[],
 });
 
 /**
@@ -118,8 +131,11 @@ function parseConfig(file: string): ParsedConfig {
 
   const extensions = extractExtensions(parsed, file);
   const includeIgnored = extractIncludeIgnored(parsed, file);
-  if (extensions === EMPTY_EXTENSIONS && includeIgnored.length === 0) return EMPTY_CONFIG;
-  return { extensions, includeIgnored };
+  const exclude = extractExclude(parsed, file);
+  if (extensions === EMPTY_EXTENSIONS && includeIgnored.length === 0 && exclude.length === 0) {
+    return EMPTY_CONFIG;
+  }
+  return { extensions, includeIgnored, exclude };
 }
 
 /**
@@ -172,6 +188,32 @@ function extractIncludeIgnored(parsed: object, file: string): string[] {
   return out;
 }
 
+/**
+ * Validate the `exclude` patterns: an array of non-empty gitignore-style
+ * strings naming paths to keep out of the index even when git-tracked (#999). A
+ * non-array value or a non-string/blank entry warns-and-skips; never throws.
+ * Patterns are kept verbatim (trimmed) so they match exactly as a `.gitignore`
+ * line would, against project-root-relative paths.
+ */
+function extractExclude(parsed: object, file: string): string[] {
+  const raw = (parsed as ProjectConfig).exclude;
+  if (raw === undefined) return [];
+  if (!Array.isArray(raw)) {
+    logWarn(`Ignoring "exclude" in ${PROJECT_CONFIG_FILENAME}: must be an array of gitignore-style patterns`, { file });
+    return [];
+  }
+
+  const out: string[] = [];
+  for (const entry of raw) {
+    if (typeof entry !== 'string' || !entry.trim()) {
+      logWarn(`Ignoring an "exclude" entry in ${PROJECT_CONFIG_FILENAME}: every pattern must be a non-empty string`, { file });
+      continue;
+    }
+    out.push(entry.trim());
+  }
+  return out;
+}
+
 /**
  * Load the parsed `codegraph.json` for a project, mtime-cached. A missing or
  * malformed file yields the zero-config default. One `stat` (and at most one
@@ -221,6 +263,18 @@ export function loadIncludeIgnoredPatterns(rootDir: string): string[] {
   return loadParsedConfig(rootDir).includeIgnored;
 }
 
+/**
+ * Load the validated `exclude` patterns for a project, mtime-cached.
+ *
+ * These name paths to keep OUT of the index even when git-tracked — the escape
+ * hatch for a committed vendor/theme/SDK directory `.gitignore` can't drop
+ * (#999). An empty result — the zero-config default — excludes nothing beyond
+ * the built-in defaults and the project's `.gitignore`.
+ */
+export function loadExcludePatterns(rootDir: string): string[] {
+  return loadParsedConfig(rootDir).exclude;
+}
+
 /** Test/maintenance hook: forget cached config (e.g. after rewriting it in a test). */
 export function clearProjectConfigCache(): void {
   cache.clear();
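
The warn-and-skip validation of `exclude` can be exercised in isolation with a small sketch. `sketchExtractExclude` is a hypothetical rewrite of `extractExclude` that takes the raw value and a `warn` callback in place of the parsed config object and `logWarn`; the warning texts are shortened placeholders, not the project's messages.

```typescript
// Hypothetical, dependency-free variant of the validation shown in the hunk above.
function sketchExtractExclude(raw: unknown, warn: (msg: string) => void): string[] {
  if (raw === undefined) return [];
  if (!Array.isArray(raw)) {
    warn('"exclude" must be an array of gitignore-style patterns');
    return [];
  }
  const out: string[] = [];
  for (const entry of raw) {
    // Non-strings and blank strings are skipped with a warning; nothing throws.
    if (typeof entry !== 'string' || !entry.trim()) {
      warn('every "exclude" pattern must be a non-empty string');
      continue;
    }
    out.push(entry.trim());
  }
  return out;
}

const warnings: string[] = [];
const collect = (msg: string): void => {
  warnings.push(msg);
};

// Two valid patterns (one padded with spaces), one blank entry, one non-string.
console.log(sketchExtractExclude(['static/', '  **/vendor/**  ', '', 42], collect));
// A bare string is rejected as a whole: the value must be an array.
console.log(sketchExtractExclude('static/', collect));
console.log(warnings.length);
```

With these inputs the first call keeps `static/` and `**/vendor/**` and records two warnings; the second returns an empty list and records a third.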

+ 45 - 0
src/resolution/name-matcher.ts

@@ -7,6 +7,33 @@
 import { Node } from '../types';
 import { UnresolvedRef, ResolvedRef, ResolutionContext } from './types';
 
+/**
+ * Ceiling on how many same-named definitions a FUZZY name-match strategy will
+ * score. A name defined more times than this is "ubiquitous" — a method/symbol
+ * re-declared across a vendored theme or SDK (e.g. `init`/`update`/`render` on
+ * every widget of a committed Metronic theme — #999). No directory-proximity or
+ * receiver-word-overlap score can reliably pick THE one true target among
+ * thousands, so the fuzzy strategies (matchByExactName's findBestMatch, and
+ * matchMethodCall Strategy 3) decline above the ceiling instead of emitting a
+ * low-confidence, almost-certainly-wrong edge. This also caps their per-ref cost
+ * at O(ceiling): without it, K same-named refs each scored K candidates — the
+ * O(K²) blow-up that pinned a core for 15-28 min at "Resolving refs … 94%" on a
+ * repo vendoring a large JS/TS theme (#999). The PRECISE strategies are
+ * unaffected: qualified-name, import-based, and class-name (Strategy 1/2)
+ * resolution all still run and resolve a ubiquitous name when the context names
+ * its exact target. Real repos top out near ~40 same-named methods, so a normal
+ * codebase never reaches this; only bulk-vendored code does. Tune via
+ * `CODEGRAPH_AMBIGUOUS_NAME_CEILING`.
+ */
+const DEFAULT_AMBIGUOUS_NAME_CEILING = 500;
+function resolveAmbiguousNameCeiling(): number {
+  const raw = process.env.CODEGRAPH_AMBIGUOUS_NAME_CEILING;
+  if (!raw) return DEFAULT_AMBIGUOUS_NAME_CEILING;
+  const parsed = Number.parseInt(raw, 10);
+  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_AMBIGUOUS_NAME_CEILING;
+}
+const AMBIGUOUS_NAME_CEILING = resolveAmbiguousNameCeiling();
+
 /**
  * Try to resolve a path-like reference (e.g., "snippets/drawer-menu.liquid")
  * by matching the filename against file nodes.
@@ -344,6 +371,15 @@ export function matchByExactName(
     };
   }
 
+  // Ubiquitous-name ceiling (#999): above it, picking one target among K
+  // same-named defs by directory proximity is unreliable AND O(K) per ref — the
+  // quadratic behind the "Resolving refs" wedge on theme/SDK-vendoring repos.
+  // Decline; the precise strategies (qualified-name, import, class-name) already
+  // ran. Falls through to fuzzy, which itself only resolves a UNIQUE candidate.
+  if (candidates.length > AMBIGUOUS_NAME_CEILING) {
+    return null;
+  }
+
   // Multiple matches - try to narrow down
   const bestMatch = findBestMatch(ref, candidates, context);
   if (bestMatch) {
@@ -1067,6 +1103,15 @@ export function matchMethodCall(
   // names like permissionEngine → PermissionRuleEngine.
   if (methodName) {
     const methodCandidates = context.getNodesByName(methodName!);
+    // Ubiquitous-method ceiling (#999): a method name re-declared across a
+    // vendored theme/SDK (Metronic's `init`/`update`/… on every widget) yields
+    // K candidates that receiver-word overlap can't reliably disambiguate —
+    // and filtering + scoring all K per call is the O(K²) cost that wedged
+    // "Resolving refs" for 15-28 min. Bail before the O(K) work; Strategy 1/2
+    // (class-name match) already had their precise shot above.
+    if (methodCandidates.length > AMBIGUOUS_NAME_CEILING) {
+      return null;
+    }
     const methods = methodCandidates.filter(
       (n) => n.kind === 'method' && n.name === methodName
     );
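
The effect of the ceiling on fuzzy-scoring work can be shown with simple arithmetic. This is a cost model, not the resolver: `sketchScoringWork` is a hypothetical helper that counts candidate scorings for one name, and `sketchResolveCeiling` repeats the env parsing from the hunk above with the raw value passed in as a parameter.

```typescript
const SKETCH_DEFAULT_CEILING = 500;

// Same parsing rules as resolveAmbiguousNameCeiling, with the env value injected.
function sketchResolveCeiling(raw: string | undefined): number {
  if (!raw) return SKETCH_DEFAULT_CEILING;
  const parsed = Number.parseInt(raw, 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : SKETCH_DEFAULT_CEILING;
}

// Candidate scorings for `refs` references to a name that has `defs` definitions.
function sketchScoringWork(refs: number, defs: number, ceiling: number): number {
  // Above the ceiling the fuzzy strategies decline, so no candidate is scored.
  if (defs > ceiling) return 0;
  return refs * defs;
}

const ceiling = sketchResolveCeiling(undefined); // 500

// A vendored theme: 2000 call sites of `init`, 2000 definitions of `init`.
console.log(sketchScoringWork(2000, 2000, Number.POSITIVE_INFINITY)); // 4000000 without a ceiling
console.log(sketchScoringWork(2000, 2000, ceiling));                  // 0 with the default ceiling

// A normal repo near the observed maximum of about 40 collisions is unaffected.
console.log(sketchScoringWork(40, 40, ceiling)); // 1600

// Invalid overrides fall back to the default.
console.log(sketchResolveCeiling('abc')); // 500
console.log(sketchResolveCeiling('0'));   // 500
console.log(sketchResolveCeiling('1000')); // 1000
```

The comparison is strict, so a name with exactly as many definitions as the ceiling is still scored.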