
feat(mcp): pare default tool surface to codegraph_explore alone + redux-thunk synthesizer

Colby McHenry 4 days ago
parent
commit
f82a662ddb

+ 7 - 22
.cursor/rules/codegraph.mdc

@@ -1,37 +1,22 @@
 ---
-description: CodeGraph MCP usage guide — when to use which tool
+description: CodeGraph MCP usage guide — one tool, codegraph_explore
 alwaysApply: true
 ---
 <!-- CODEGRAPH_START -->
 ## CodeGraph
 
-This project has a CodeGraph MCP server (`codegraph_*` tools) configured. CodeGraph is a tree-sitter-parsed knowledge graph of every symbol, edge, and file. Reads are sub-millisecond and return structural information grep cannot.
+This project has a CodeGraph MCP server configured, exposing a single tool: `codegraph_explore`. CodeGraph is a tree-sitter-parsed knowledge graph of every symbol, edge, and file. Reads are sub-millisecond and return structural information grep cannot.
 
-### When to prefer codegraph over native search
+### Use codegraph_explore instead of reading files
 
-Use codegraph for **structural** questions — what calls what, what would break, where is X defined, what is X's signature. Use native grep/read only for **literal text** queries (string contents, comments, log messages) or after you already have a specific file open.
-
-| Question | Tool |
-|---|---|
-| "Where is X defined?" / "Find symbol named X" | `codegraph_search` |
-| "What calls function Y?" | `codegraph_callers` |
-| "What does Y call?" | `codegraph_callees` |
-| "How does X reach/become Y? / trace the flow from X to Y" | `codegraph_trace` (one call = the whole path, incl. callback/React/JSX dynamic hops) |
-| "What would break if I changed Z?" | `codegraph_impact` |
-| "Show me Y's signature / source / docstring" | `codegraph_node` |
-| "Give me focused context for a task/area" | `codegraph_context` |
-| "See several related symbols' source at once" | `codegraph_explore` |
-| "What files exist under path/" | `codegraph_files` |
-| "Is the index healthy?" | `codegraph_status` |
+Reach for `codegraph_explore` before grep/find or Read for any **structural** question — how does X work, how does X reach Y, what calls what, where is X defined, or surveying an area. It takes a natural-language question or a bag of symbol/file names and returns the relevant symbols' **verbatim, line-numbered source** grouped by file (the same `<n>\t<line>` shape Read gives you, safe to Edit from), plus the call paths between them — including dynamic-dispatch hops (callbacks, React re-render, JSX children) grep can't follow — and a blast-radius summary of what depends on them. Name a file or symbol in the query to read its current source.
 
 ### Rules of thumb
 
-- **Answer directly — don't delegate exploration.** For "how does X work" / architecture questions, answer with 2-3 codegraph calls: `codegraph_context` first, then ONE `codegraph_explore` for the source of the symbols it surfaces. For a specific **flow** ("how does X reach Y") start with `codegraph_trace` from→to — one call returns the whole path with dynamic hops bridged — then ONE `codegraph_explore` for the bodies; don't rebuild the path with `codegraph_search` + `codegraph_callers`. Codegraph IS the pre-built index, so spawning a separate file-reading sub-task/agent — or running a grep + read loop — repeats work codegraph already did and costs more for the same answer.
+- **Answer directly — don't delegate exploration.** ONE `codegraph_explore` usually answers the whole question; follow up with another `codegraph_explore` naming more specific symbols if you need more. Codegraph IS the pre-built index, so spawning a separate file-reading sub-task/agent — or running a grep + read loop — repeats work codegraph already did and costs more for the same answer.
 - **Trust codegraph results.** They come from a full AST parse. Do NOT re-verify them with grep — that's slower, less accurate, and wastes context.
-- **Don't grep first** when looking up a symbol by name. `codegraph_search` is faster and returns kind + location + signature in one call.
-- **Don't chain `codegraph_search` + `codegraph_node`** when you just want context — `codegraph_context` is one call.
-- **Don't loop `codegraph_node` over many symbols** — one `codegraph_explore` call returns several symbols' source grouped in a single capped call, while each separate node/Read call re-reads the whole context and costs far more.
-- **Index lag — check the staleness banner, don't guess a wait.** When a codegraph response starts with "⚠️ Some files referenced below were edited since the last index sync…", the listed files are pending re-index — Read those specific files for accurate content. Files NOT in that banner are fresh and codegraph is authoritative for them. `codegraph_status` also lists pending files under "Pending sync".
+- **Don't grep or Read first** to find or understand indexed code — one `codegraph_explore` returns the relevant source in a single round-trip. Reach for raw Read/Grep only to confirm a specific detail codegraph didn't cover, or for what it doesn't index (configs, docs).
+- **Index lag — check the staleness banner, don't guess a wait.** When a codegraph response starts with "⚠️ Some files referenced below were edited since the last index sync…", the listed files are pending re-index — Read those specific files for accurate content. Files NOT in that banner are fresh and codegraph is authoritative for them.
 
 ### If `.codegraph/` doesn't exist
 

+ 12 - 20
README.md

@@ -262,7 +262,7 @@ agent writes src/Widget.ts
   → next agent query sees it
 ```
 
-**Verify any time** with `codegraph_status` (via MCP) or `codegraph status` (CLI). If anything is pending, you'll see a `### Pending sync:` section naming the files and their edit age.
+**Verify any time** with `codegraph status` (CLI). If anything is pending, you'll see a `### Pending sync:` section naming the files and their edit age.
 
 The handful of cases where manual `codegraph sync` makes sense: the watcher is disabled (sandboxed environments, or `CODEGRAPH_NO_DAEMON=1`), or you're scripting against the index outside an agent session and want a pre-flight sync at the start of your script.
 
@@ -300,7 +300,7 @@ CodeGraph detects web-framework routing files and emits `route` nodes linked by
 
 ## Mixed iOS / React Native / Expo bridging
 
-Real iOS and React Native codebases live across multiple languages — a Swift caller invokes an Objective-C selector that's been auto-bridged, a JS file calls into a native module via the React Native bridge, a JSX component delegates to a native view manager. Static tree-sitter extraction stops at each language boundary. CodeGraph bridges them so `trace`, `callers`, `callees`, and `impact` connect end-to-end across the gap.
+Real iOS and React Native codebases live across multiple languages — a Swift caller invokes an Objective-C selector that's been auto-bridged, a JS file calls into a native module via the React Native bridge, a JSX component delegates to a native view manager. Static tree-sitter extraction stops at each language boundary. CodeGraph bridges them so `codegraph_explore` connects the flow end-to-end across the gap — call paths and blast radius cross the boundary instead of stopping at it.
 
 | Boundary | JS / Swift side | Native side | How |
 |---|---|---|---|
@@ -339,7 +339,7 @@ The installer will:
 - Ask which agent(s) to configure — auto-detects installed ones from: **Claude Code**, **Cursor**, **Codex CLI**, **opencode**, **Hermes Agent**, **Gemini CLI**, **Antigravity IDE**, **Kiro**
 - Prompt to install `codegraph` on your PATH (so agents can launch the MCP server)
 - Ask whether configs apply to all your projects or just this one
-- Write each chosen agent's MCP server config, plus a small marker-fenced CodeGraph section in the agent's instructions file (`CLAUDE.md` / `AGENTS.md` / `GEMINI.md`) — that's how subagents and non-MCP agents learn the `codegraph explore` / `codegraph node` commands, since the MCP server's own guidance only reaches the main agent. Removed cleanly by `codegraph uninstall`.
+- Write each chosen agent's MCP server config, plus a small marker-fenced CodeGraph section in the agent's instructions file (`CLAUDE.md` / `AGENTS.md` / `GEMINI.md`) — that's how subagents and non-MCP agents learn the `codegraph explore` command, since the MCP server's own guidance only reaches the main agent. Removed cleanly by `codegraph uninstall`.
 - Set up auto-allow permissions when Claude Code is one of the targets
 - Initialize your current project (local installs only)
 
@@ -401,19 +401,14 @@ npm install -g @colbymchenry/codegraph
 {
   "permissions": {
     "allow": [
-      "mcp__codegraph__codegraph_search",
-      "mcp__codegraph__codegraph_explore",
-      "mcp__codegraph__codegraph_callers",
-      "mcp__codegraph__codegraph_callees",
-      "mcp__codegraph__codegraph_impact",
-      "mcp__codegraph__codegraph_node",
-      "mcp__codegraph__codegraph_status",
-      "mcp__codegraph__codegraph_files"
+      "mcp__codegraph__*"
     ]
   }
 }
 ```
 
+<sub>One wildcard auto-approves every CodeGraph tool — `codegraph_explore` is the only one listed by default, but if you re-enable others via `CODEGRAPH_MCP_TOOLS` they're already permitted, no prompt.</sub>
+
 </details>
 
 <details>
@@ -422,11 +417,11 @@ npm install -g @colbymchenry/codegraph
 CodeGraph's MCP server delivers its usage guidance to your agent **automatically**, in the MCP `initialize` response. In short, it tells the agent to:
 
 - **Answer structural questions directly with CodeGraph** — it *is* the pre-built index, so a grep/read loop just repeats work it already did. Treat the returned source as already read.
-- **Pick the tool by intent:** `codegraph_explore` for almost anything — "how does X work", a flow/"how does X reach Y", or surveying an area (one call returns the relevant symbols' source grouped by file); `codegraph_search` to just locate a symbol; `codegraph_callers` for every call site (including callback registrations); `codegraph_node` for one symbol's full source + callers, or to read a file like the Read tool.
+- **Reach for `codegraph_explore` for almost anything** — "how does X work", a flow/"how does X reach Y", or surveying an area. One call returns the relevant symbols' verbatim source grouped by file, the call paths between them (dynamic-dispatch hops included), and a blast-radius summary. Name a file or symbol in the query to read its current line-numbered source.
 - **Trust the results — don't re-verify with grep**, and check the staleness banner after edits.
 - In a workspace with no index, CodeGraph announces itself inactive and serves no tools — indexing stays your decision.
 
-The exact text is `src/mcp/server-instructions.ts` — the single source of truth for the main agent. Because subagents and non-MCP harnesses never see the MCP guidance, the installer also writes a four-line marker-fenced section into the agent's instructions file pointing at the `codegraph explore` / `codegraph node` CLI equivalents.
+The exact text is `src/mcp/server-instructions.ts` — the single source of truth for the main agent. Because subagents and non-MCP harnesses never see the MCP guidance, the installer also writes a short marker-fenced section into the agent's instructions file pointing at the `codegraph explore` CLI equivalent.
 
 </details>
 
@@ -447,7 +442,7 @@ The exact text is `src/mcp/server-instructions.ts` — the single source of trut
 ┌───────────────────────────────────────────────────────────────────┐
 │                        CodeGraph MCP Server                       │
 │                                                                   │
-│       explore · search · callers · callees · impact · node       
+│ explore  ·  one call → verbatim source + call flow + blast radius
 │                                 │                                 │
 │                                 ▼                                 │
 │                       SQLite knowledge graph                      │
@@ -524,16 +519,13 @@ fi
 
 ## MCP Tools
 
-When running as an MCP server, CodeGraph exposes a focused set of four tools — measured agent behavior showed a leaner list steers agents to the right tool and saves context every session:
+When running as an MCP server, CodeGraph exposes a **single tool** — `codegraph_explore`. Measured agent behavior showed that one strong tool steers agents better than a menu of narrower ones — fewer mis-picks, and it saves context every session:
 
 | Tool | Purpose |
 |------|---------|
-| `codegraph_explore` | **Primary.** Answer almost any question in one call — "how does X work", a flow ("how does X reach Y"), or surveying an area — returning the relevant symbols' verbatim source grouped by file, plus a relationship map and blast radius. Surfaces dynamic-dispatch hops (callbacks, React re-render, interface→impl) grep can't follow. |
-| `codegraph_node` | One symbol's full source + caller/callee trail (every overload for an ambiguous name) — or pass a file path to **read a whole file like the Read tool** (same line-numbered output, `offset`/`limit`), with its dependents attached. |
-| `codegraph_search` | Find symbols by name across the codebase |
-| `codegraph_callers` | Every call site of a function — including where it's registered as a callback — with one section per definition when several share a name |
+| `codegraph_explore` | Answer almost any question in one call — "how does X work", a flow ("how does X reach Y"), or surveying an area — returning the relevant symbols' verbatim source grouped by file, plus the call paths between them and a blast-radius summary. Surfaces dynamic-dispatch hops (callbacks, React re-render, interface→impl) grep can't follow. Name a file or symbol in the query to read its current line-numbered source, the same shape the Read tool gives you. |
 
-Four more tools (`codegraph_callees`, `codegraph_impact`, `codegraph_files`, `codegraph_status`) stay fully functional but unlisted by default — measured across eval runs, agents never or rarely picked them, and their information already arrives inline on the four above (explore's blast-radius section, node's dependents note, a symbol's body as its callee list). Re-enable any of them with the `CODEGRAPH_MCP_TOOLS` environment variable (e.g. `CODEGRAPH_MCP_TOOLS=explore,node,search,callers,impact`), or use their CLI equivalents (`codegraph callees` / `impact` / `files` / `status`).
+The other tools (`codegraph_node`, `codegraph_search`, `codegraph_callers`, `codegraph_callees`, `codegraph_impact`, `codegraph_files`, `codegraph_status`) stay fully functional but **unlisted by default** — everything they return already arrives inline on `codegraph_explore` (its blast-radius section, the relationship map, a symbol's body as its callee list). Re-enable any of them for the MCP surface with the `CODEGRAPH_MCP_TOOLS` environment variable (e.g. `CODEGRAPH_MCP_TOOLS=explore,node,search,callers`), or use their CLI equivalents (`codegraph node` / `query` / `callers` / `callees` / `impact` / `files` / `status`).
 
 In a workspace with no `.codegraph/` index, the server announces itself inactive and lists **no** tools — agents work normally with their built-in tools, and indexing stays your decision.
 

+ 1 - 1
__tests__/installer-targets.test.ts

@@ -1031,7 +1031,7 @@ describe('Installer targets — partial-state idempotency', () => {
     // The unrelated GitKraken hook survives untouched.
     expect(stopCommands.some((c: string) => c.includes('gk') && c.includes('ai hook run'))).toBe(true);
     // Permissions still written as normal alongside the cleanup.
-    expect(after.permissions?.allow).toContain('mcp__codegraph__codegraph_search');
+    expect(after.permissions?.allow).toContain('mcp__codegraph__*');
   });
 
   it('claude: cleanupLegacyHooks preserves a sibling hook sharing our matcher group', () => {

+ 7 - 13
__tests__/mcp-tool-allowlist.test.ts

@@ -17,18 +17,13 @@ describe('CODEGRAPH_MCP_TOOLS allowlist', () => {
 
   const listed = () => new ToolHandler(null).getTools().map(t => t.name).sort();
 
-  it('exposes the default 4-tool surface when unset', () => {
+  it('exposes ONLY codegraph_explore by default when unset', () => {
     delete process.env[ENV];
-    // The default set (see DEFAULT_MCP_TOOLS): explore + node are the
-    // validated workhorses, search the cheap lookup, callers the one
-    // irreplaceable enumerator. callees/impact/files/status stay defined
-    // and executable but unlisted — impact appeared in ZERO recorded runs.
-    expect(listed()).toEqual([
-      'codegraph_callers',
-      'codegraph_explore',
-      'codegraph_node',
-      'codegraph_search',
-    ]);
+    // The default set (see DEFAULT_MCP_TOOLS) is pared to explore alone — the one
+    // tool that earns its place (verbatim source grouped by file, plus the reasoned
+    // flow map under the offload). node/search/callers/callees/impact/files/status
+    // stay defined and executable but unlisted; CODEGRAPH_MCP_TOOLS re-enables them.
+    expect(listed()).toEqual(['codegraph_explore']);
   });
 
   it('re-enables an unlisted tool via the allowlist (impact)', () => {
@@ -48,8 +43,7 @@ describe('CODEGRAPH_MCP_TOOLS allowlist', () => {
 
   it('treats an empty/whitespace value as unset (default surface)', () => {
     process.env[ENV] = '   ';
-    expect(listed()).toHaveLength(4);
-    expect(listed()).toContain('codegraph_explore');
+    expect(listed()).toEqual(['codegraph_explore']);
   });
 
   it('rejects a disabled tool on execute (defense in depth)', async () => {
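
The allowlist behavior these tests pin down can be sketched roughly as below. This is a minimal illustration, not the project's code: `resolveListedTools`, `ALL_TOOLS`, and `DEFAULT_TOOLS` are hypothetical names, the mapping from short names to the `codegraph_` prefix is inferred from the README example `CODEGRAPH_MCP_TOOLS=explore,node,search,callers`, and silently dropping unknown names is an assumption.

```typescript
// Hypothetical sketch of resolving the listed MCP tools from CODEGRAPH_MCP_TOOLS.
// The tool names come from the README in this commit; everything else is assumed.
const ALL_TOOLS: string[] = [
  'codegraph_explore', 'codegraph_node', 'codegraph_search', 'codegraph_callers',
  'codegraph_callees', 'codegraph_impact', 'codegraph_files', 'codegraph_status',
];
const DEFAULT_TOOLS: string[] = ['codegraph_explore'];

function resolveListedTools(raw: string | undefined): string[] {
  // Unset, empty, or whitespace-only values all mean "use the default surface".
  if (raw === undefined || raw.trim() === '') return [...DEFAULT_TOOLS];
  const wanted = raw
    .split(',')
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    // Accept both "impact" and "codegraph_impact" (assumed convenience).
    .map((s) => (s.startsWith('codegraph_') ? s : `codegraph_${s}`));
  // Keep only known tools (assumption), sorted for stable comparison.
  return ALL_TOOLS.filter((t) => wanted.includes(t)).sort();
}
```

Under these assumptions, `resolveListedTools(process.env.CODEGRAPH_MCP_TOOLS)` yields only `codegraph_explore` when the variable is unset or blank, which matches the two default-surface tests above.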

+ 7 - 7
__tests__/mcp-unindexed.test.ts

@@ -116,7 +116,7 @@ describe('Unindexed-workspace session policy', () => {
     expect(instructions).toMatch(/inactive/i);
     expect(instructions).toMatch(/codegraph init/);
     // The full playbook must NOT be sent into a session where every call fails
-    expect(instructions).not.toMatch(/Tool selection by intent/);
+    expect(instructions).not.toMatch(/How to query/);
     expect(instructions).not.toMatch(/codegraph_explore/);
   });
 
@@ -128,7 +128,7 @@ describe('Unindexed-workspace session policy', () => {
     expect((res.result as { tools: unknown[] }).tools).toEqual([]);
   });
 
-  it('an INDEXED workspace still gets the full playbook and all tools', async () => {
+  it('an INDEXED workspace still gets the full playbook and the explore tool', async () => {
     fs.writeFileSync(path.join(tempDir, 'index.ts'), 'export function hello(): string { return "hi"; }\n');
     const cg = await CodeGraph.init(tempDir, { index: true });
     cg.close();
@@ -136,15 +136,15 @@ describe('Unindexed-workspace session policy', () => {
     child = spawnServer(tempDir);
     const init = await request(child, { id: 0, method: 'initialize', params: initializeParams(tempDir) });
     const instructions = (init.result as { instructions: string }).instructions;
-    expect(instructions).toMatch(/Tool selection by intent/);
+    expect(instructions).toMatch(/How to query/);
     expect(instructions).not.toMatch(/inactive/i);
 
     const list = await request(child, { id: 1, method: 'tools/list' });
     const tools = (list.result as { tools: Array<{ name: string }> }).tools;
-    // A 1-file project triggers the pre-existing tiny-repo tool gating (a
-    // reduced core set) — the contract under test is "indexed → tools are
-    // PRESENT", in contrast to the unindexed empty list above.
-    expect(tools.length).toBeGreaterThanOrEqual(3);
+    // The default surface is pared to explore alone (see DEFAULT_MCP_TOOLS) — the
+    // contract under test is "indexed → tools are PRESENT", in contrast to the
+    // unindexed empty list above.
+    expect(tools.length).toBeGreaterThanOrEqual(1);
     expect(tools.map((t) => t.name)).toContain('codegraph_explore');
   });
 });

+ 82 - 0
__tests__/redux-thunk-synthesizer.test.ts

@@ -0,0 +1,82 @@
+import { describe, it, expect, beforeEach, afterEach } from 'vitest';
+import * as fs from 'node:fs';
+import * as path from 'node:path';
+import * as os from 'node:os';
+import { CodeGraph } from '../src';
+
+/**
+ * End-to-end test for the redux-thunk dispatch-chain synthesizer.
+ *
+ * `createAsyncThunk(prefix, async (a, api) => {...})` passes the async body as an argument, so
+ * tree-sitter never makes it its own function node — the thunk `constant`'s body calls (incl.
+ * `dispatch(nextThunk(...))`) are orphaned and `callees(thunk)` is empty. Verify the synthesizer
+ * body-scans each thunk constant and links it → each dispatched thunk, so the chain
+ * `outer → inner → deep` connects end-to-end; and that a non-thunk constant is skipped.
+ */
+describe('redux-thunk synthesizer', () => {
+  let dir: string;
+  beforeEach(() => {
+    dir = fs.mkdtempSync(path.join(os.tmpdir(), 'redux-thunk-fixture-'));
+  });
+  afterEach(() => {
+    fs.rmSync(dir, { recursive: true, force: true });
+  });
+
+  it('links each thunk constant to the thunks it dispatches, and skips non-thunks', async () => {
+    fs.writeFileSync(
+      path.join(dir, 'package.json'),
+      JSON.stringify({ name: 'app', dependencies: { '@reduxjs/toolkit': '^2' } })
+    );
+    fs.writeFileSync(
+      path.join(dir, 'thunks.ts'),
+      `import { createAsyncThunk } from '@reduxjs/toolkit';
+
+export const deepThunk = createAsyncThunk('app/deep', async (n: number) => {
+  return n * 2;
+});
+
+export const innerThunk = createAsyncThunk('app/inner', async (n: number, { dispatch }) => {
+  return dispatch(deepThunk(n));
+});
+
+export const outerThunk = createAsyncThunk('app/outer', async (n: number, { dispatch }) => {
+  await dispatch(innerThunk(n));
+});
+
+// Non-thunk constant that only MENTIONS dispatch in a string — must be skipped.
+export const notAThunk = 'dispatch(innerThunk())';
+`
+    );
+
+    const cg = await CodeGraph.init(dir, { silent: true });
+    await cg.indexAll();
+
+    const db = (cg as any).db.db;
+    const rows = db
+      .prepare(
+        `SELECT s.name source_name, s.kind source_kind, t.name target_name,
+                json_extract(e.metadata,'$.via') via,
+                json_extract(e.metadata,'$.registeredAt') registeredAt
+         FROM edges e
+         JOIN nodes s ON s.id = e.source
+         JOIN nodes t ON t.id = e.target
+         WHERE json_extract(e.metadata,'$.synthesizedBy') = 'redux-thunk'`
+      )
+      .all();
+    cg.close?.();
+
+    // The dispatch chain connects: outer → inner → deep.
+    const pairs = new Set(rows.map((r: any) => `${r.source_name}>${r.target_name}`));
+    expect(pairs.has('outerThunk>innerThunk')).toBe(true);
+    expect(pairs.has('innerThunk>deepThunk')).toBe(true);
+
+    // Sources are thunk constants; the non-thunk string constant is never a source.
+    expect(rows.every((r: any) => r.source_kind === 'constant')).toBe(true);
+    expect(rows.some((r: any) => r.source_name === 'notAThunk')).toBe(false);
+
+    // Edges are 'calls' with the wiring site surfaced for the agent.
+    const outer = rows.find((r: any) => r.source_name === 'outerThunk');
+    expect(outer.via).toBe('innerThunk');
+    expect(outer.registeredAt).toMatch(/thunks\.ts:\d+/);
+  });
+});
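
The body scan described in the test's doc comment could look something like the sketch below. It is a deliberately naive, regex-based illustration and not the synthesizer shipped in this commit: `findThunkDispatchEdges` and `ThunkEdge` are hypothetical names, the scan works on raw source text rather than graph nodes, and the parenthesis matching ignores parentheses inside strings and comments.

```typescript
// Hypothetical sketch: link each createAsyncThunk constant to the thunks it dispatches.
interface ThunkEdge {
  source: string;
  target: string;
}

function findThunkDispatchEdges(src: string): ThunkEdge[] {
  // 1. Collect every `const X = createAsyncThunk(` declaration and where its arguments start.
  const decl = /const\s+(\w+)\s*=\s*createAsyncThunk\s*\(/g;
  const thunks: Array<{ name: string; argsStart: number }> = [];
  let m: RegExpExecArray | null;
  while ((m = decl.exec(src)) !== null) {
    const name = m[1];
    if (name) thunks.push({ name, argsStart: decl.lastIndex });
  }
  const known = new Set(thunks.map((t) => t.name));

  const edges: ThunkEdge[] = [];
  for (const t of thunks) {
    // 2. Walk forward to the parenthesis that closes the createAsyncThunk(...) call.
    let depth = 1;
    let i = t.argsStart;
    while (i < src.length && depth > 0) {
      const ch = src[i++];
      if (ch === '(') depth++;
      else if (ch === ')') depth--;
    }
    const body = src.slice(t.argsStart, i);

    // 3. Inside that span, record each `dispatch(otherThunk(` whose callee is a known thunk.
    const call = /dispatch\s*\(\s*(\w+)\s*\(/g;
    let c: RegExpExecArray | null;
    while ((c = call.exec(body)) !== null) {
      const target = c[1];
      if (target && known.has(target)) edges.push({ source: t.name, target });
    }
  }
  return edges;
}
```

Because only spans inside a `createAsyncThunk(...)` call are scanned, a plain string constant that merely mentions `dispatch(innerThunk())` never becomes a source, which is the property the last fixture line checks.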

+ 108 - 0
scripts/agent-eval/offload-eval-effort.mjs

@@ -0,0 +1,108 @@
+#!/usr/bin/env node
+// Effort A/B — does CODEGRAPH_OFFLOAD_EFFORT=high improve offload SYNTHESIS FIDELITY vs low?
+// Probe-based (no agent): for each repo × effort × rep, run codegraph_explore with the offload
+// ON on the canonical question, capture the synthesized answer + AI tokens/cost/latency, then
+// Sonnet-judge that answer's fidelity vs source-verified ground truth. Isolates the synthesis
+// from agent/adoption noise. Requires `codegraph login` (managed offload) + indexed repos.
+//
+// Env: REPS (default 3) · CG_ENGINE (engine repo) · AGENT_EVAL_OUT (repos under /repos) · CONC (judge concurrency)
+import { pathToFileURL, fileURLToPath } from 'node:url';
+import { resolve, dirname, join } from 'node:path';
+import { readFileSync, writeFileSync, existsSync, rmSync } from 'node:fs';
+import { execFile } from 'node:child_process';
+import { tmpdir } from 'node:os';
+
+const HERE = dirname(fileURLToPath(import.meta.url));
+const ENGINE = process.env.CG_ENGINE || resolve(HERE, '..', '..');
+const OUT = process.env.AGENT_EVAL_OUT || '/tmp/cg-offload-eval';
+const REPOS = join(OUT, 'repos');
+const GT = JSON.parse(readFileSync(resolve(HERE, 'offload-eval-ground-truth.json'), 'utf8'));
+const REPS = Number(process.env.REPS || 3);
+const CONC = Number(process.env.CONC || 4);
+const EFFORTS = (process.env.EFFORTS_FILTER || 'low,high').split(',');
+const ONLY = process.env.REPOS_FILTER ? new Set(process.env.REPOS_FILTER.split(',')) : null;
+const TIER = { mtkruto: 'small', postybirb: 'medium', shapeshift: 'complex', trezor: 'large' };
+
+const load = async (rel) => import(pathToFileURL(resolve(ENGINE, rel)).href);
+const idx = await load('dist/index.js');
+const toolsMod = await load('dist/mcp/tools.js');
+const CodeGraph = idx.default?.default ?? idx.default ?? idx.CodeGraph;
+const ToolHandler = toolsMod.ToolHandler ?? toolsMod.default?.ToolHandler;
+if (typeof CodeGraph?.openSync !== 'function' || typeof ToolHandler !== 'function') {
+  console.error('could not load engine from', ENGINE); process.exit(2);
+}
+
+const fidPrompt = (gt, ans) => `You are scoring the FIDELITY of a machine-synthesized code-exploration answer against verified ground truth. Do NOT use any tools.
+
+QUESTION: ${gt.question}
+
+VERIFIED GROUND TRUTH (the actual call path + files):
+${gt.truth}
+
+SYNTHESIZED ANSWER (to score):
+${ans || '(empty)'}
+
+Judge: (1) is the traced call path correct vs ground truth? (2) are the cited files/symbols correct (not fabricated)? (3) if it gave a "Coverage:" verdict, was it honest? A confident WRONG trace is the worst outcome — penalize it harder than an honest partial.
+Output ONLY minified JSON: {"verdict":"pass|partial|fail","score":<0-100>,"fabrication":<true|false>,"coverageHonest":<true|false>,"note":"<=20 words"}`;
+
+const askJudge = (prompt) => new Promise((res) => {
+  execFile('claude', ['-p', prompt, '--model', 'sonnet', '--effort', 'high', '--max-budget-usd', '0.5',
+    '--strict-mcp-config', '--mcp-config', '{"mcpServers":{}}'],
+    { cwd: OUT, maxBuffer: 1 << 24, timeout: 120000 }, (err, stdout) => {
+      const m = (stdout || '').match(/\{[\s\S]*\}/);
+      if (!m) return res({ verdict: 'error', score: null, note: (err ? err.message : 'no json').slice(0, 60) });
+      try { res(JSON.parse(m[0])); } catch { res({ verdict: 'error', score: null }); }
+    });
+});
+
+// ---- 1. Probe: collect synthesized answers at each effort -------------------
+const records = [];
+for (const repo of Object.keys(GT)) {
+  if (ONLY && !ONLY.has(repo)) continue;
+  const dir = join(REPOS, repo);
+  if (!existsSync(join(dir, '.codegraph'))) { console.error('skip (not indexed):', repo); continue; }
+  const cg = CodeGraph.openSync(dir);
+  const h = new ToolHandler(cg);
+  for (const effort of EFFORTS) {
+    for (let rep = 1; rep <= REPS; rep++) {
+      process.env.CODEGRAPH_OFFLOAD_EFFORT = effort;
+      const usageLog = join(tmpdir(), `effort-${repo}-${effort}-${rep}.jsonl`);
+      try { rmSync(usageLog); } catch { /* none */ }
+      process.env.CODEGRAPH_OFFLOAD_USAGE_LOG = usageLog;
+      let answer = '';
+      try { answer = (await h.execute('codegraph_explore', { query: GT[repo].question }))?.content?.[0]?.text ?? ''; }
+      catch (e) { console.error(`  ${repo}/${effort}#${rep} explore failed: ${e?.message}`); }
+      const fired = /Synthesized by CodeGraph/.test(answer);
+      const ai = { tokens: 0, cost: 0, ms: 0 };
+      if (existsSync(usageLog)) for (const e of readFileSync(usageLog, 'utf8').split('\n').filter(Boolean).map(JSON.parse)) {
+        ai.tokens += e.totalTokens || 0; ai.cost += e.costUsd || 0; ai.ms += e.ms || 0;
+      }
+      records.push({ repo, tier: TIER[repo], effort, rep, fired, ai, answer });
+      console.error(`  ${repo}/${effort}#${rep}: fired=${fired} ${ai.tokens}tok $${ai.cost.toFixed(4)} ${ai.ms}ms`);
+    }
+  }
+  try { cg.close?.(); } catch { /* none */ }
+}
+
+// ---- 2. Judge fidelity (concurrency) ---------------------------------------
+console.error(`\njudging ${records.length} answers (concurrency ${CONC})...`);
+let done = 0;
+const q = [...records];
+async function worker() { while (q.length) { const r = q.shift(); r.fid = await askJudge(fidPrompt(GT[r.repo], r.answer)); console.error(`  [${++done}/${records.length}] ${r.repo}/${r.effort}#${r.rep}: ${r.fid.verdict} ${r.fid.score ?? ''}`); } }
+await Promise.all(Array.from({ length: CONC }, worker));
+writeFileSync(join(OUT, 'effort-results.jsonl'), records.map((r) => JSON.stringify(r)).join('\n') + '\n');
+
+// ---- 3. Aggregate: low vs high per repo ------------------------------------
+const med = (a) => { a = a.filter((x) => x != null).sort((x, y) => x - y); return a.length ? (a.length % 2 ? a[(a.length - 1) / 2] : (a[a.length / 2 - 1] + a[a.length / 2]) / 2) : null; };
+console.log(`\n${'='.repeat(80)}\nEFFORT A/B — offload synthesis fidelity (probe, n=${REPS}/cell)\n${'='.repeat(80)}`);
+console.log(`${'repo'.padEnd(11)} ${'tier'.padEnd(8)} ${'effort'.padEnd(6)} fired  ${'fid(med)'.padStart(8)} ${'fab%'.padStart(5)} ${'AItok'.padStart(7)} ${'AIcost'.padStart(8)} ${'ms(med)'.padStart(8)}`);
+for (const repo of Object.keys(GT)) {
+  for (const effort of EFFORTS) {
+    const rs = records.filter((r) => r.repo === repo && r.effort === effort);
+    if (!rs.length) continue;
+    const fids = rs.map((r) => r.fid?.score).filter((x) => x != null);
+    const fab = rs.filter((r) => r.fid?.fabrication === true).length;
+    console.log(`${repo.padEnd(11)} ${TIER[repo].padEnd(8)} ${effort.padEnd(6)} ${rs.filter((r) => r.fired).length}/${rs.length}   ${String(med(fids) ?? '—').padStart(8)} ${String(Math.round(100 * fab / rs.length) + '%').padStart(5)} ${String(Math.round(med(rs.map((r) => r.ai.tokens)) / 1000) + 'k').padStart(7)} ${('$' + (med(rs.map((r) => r.ai.cost)) ?? 0).toFixed(4)).padStart(8)} ${String(med(rs.map((r) => r.ai.ms)) ?? '—').padStart(8)}`);
+  }
+}
+console.log('');
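
The one-line `med` helper in the aggregation step is dense, so here is an expanded sketch of the same null-tolerant median. The name `median` and the type annotations are illustrative additions; the behavior is intended to mirror `med`: drop `null`/`undefined`, sort numerically, and return `null` for an empty input.

```typescript
// Expanded, typed sketch of the script's `med` helper (illustrative, not part of the commit).
function median(values: Array<number | null | undefined>): number | null {
  // Drop missing scores (e.g. judge errors) before sorting numerically.
  const xs = values
    .filter((v): v is number => v !== null && v !== undefined)
    .sort((a, b) => a - b);
  if (xs.length === 0) return null;
  const mid = Math.floor(xs.length / 2);
  // Odd length: the middle element. Even length: the mean of the two middle elements.
  return xs.length % 2 === 1 ? xs[mid]! : (xs[mid - 1]! + xs[mid]!) / 2;
}
```

Returning `null` rather than `0` for an empty cell is what lets the report print a placeholder instead of a misleading zero score.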

+ 5 - 1
scripts/agent-eval/offload-eval-metrics.mjs

@@ -43,7 +43,11 @@ for (const line of lines) {
       const text = Array.isArray(b.content)
         ? b.content.map(c => (typeof c === 'string' ? c : c.text || '')).join('')
         : (typeof b.content === 'string' ? b.content : '');
-      if (/Synthesized by CodeGraph/.test(text)) { offloadAnswers.push(text); exploreResults++; }
+      // An offload answer is either the 'plain'/'report' synthesis (carries the
+      // "Synthesized by CodeGraph" footer) or a 'refs' answer (carries the re-expanded
+      // "### Referenced source — verbatim" appendix). A refs call that cited nothing
+      // valid falls back to RAW source, which is correctly counted as a raw explore below.
+      if (/Synthesized by CodeGraph|### Referenced source — verbatim/.test(text)) { offloadAnswers.push(text); exploreResults++; }
       else if (/Found \d+ symbols? across|## Exploration:/.test(text)) exploreResults++;
     }
   }
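
The classification this hunk changes can be read as a small three-way function. The sketch below restates it in TypeScript for clarity; `classifyExploreResult` and `ExploreKind` are hypothetical names, while the two regular expressions are copied from the hunk above.

```typescript
// Illustrative restatement of the metrics script's explore-result classification.
type ExploreKind = 'offload' | 'raw' | 'other';

function classifyExploreResult(text: string): ExploreKind {
  // 'plain'/'report' syntheses carry the footer; 'refs' answers carry the verbatim appendix.
  if (/Synthesized by CodeGraph|### Referenced source — verbatim/.test(text)) return 'offload';
  // A raw explore result (including a refs call that fell back to raw source).
  if (/Found \d+ symbols? across|## Exploration:/.test(text)) return 'raw';
  return 'other';
}
```

In the script itself, both `'offload'` and `'raw'` increment `exploreResults`, and only `'offload'` texts are pushed to `offloadAnswers`.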

+ 50 - 0
scripts/agent-eval/offload-eval-refs1.sh

@@ -0,0 +1,50 @@
+#!/usr/bin/env bash
+# ONE offload run on ONE indexed repo at a given offload STYLE (plain|refs), so we can
+# watch a single agent transcript at a time (the user's one-run-at-a-time methodology).
+# The OFFLOAD reasoning runs in the prewarmed DAEMON process, so the style env must be
+# set on BOTH the daemon and the client MCP config. Writes one metrics line to RESULTS
+# and leaves the raw stream-json at $RUNS/<repo>-<style>-<n>.jsonl for inspection.
+#
+# Usage: offload-eval-refs1.sh <indexed-repo> <style> <n> "<question>"
+set -uo pipefail
+HERE="$(cd "$(dirname "$0")" && pwd)"; ENGINE="$(cd "$HERE/../.." && pwd)"; BIN="$ENGINE/dist/bin/codegraph.js"
+OUT="${AGENT_EVAL_OUT:-/tmp/cg-offload-eval}"; RUNS="$OUT/runs"; EXTRACT="$HERE/offload-eval-metrics.mjs"
+TARGET="${1:?repo}"; STYLE="${2:?style}"; N="${3:?run-tag}"; Q="${4:?question}"
+RESULTS="${RESULTS:-$OUT/results-refs.jsonl}"; REPO=$(basename "$TARGET"); TARGET=$(cd "$TARGET" && pwd -P)
+mkdir -p "$RUNS"; command -v claude >/dev/null || { echo "no claude" >&2; exit 1; }
+USAGE="$RUNS/$REPO-$STYLE-usage.jsonl"; : > "$USAGE"
+CFG="$RUNS/mcp-$REPO-$STYLE.json"
+# `raw` is a pseudo-style: codegraph attached but the offload DISABLED (the ceiling —
+# verbatim source, no reasoning model). Any other value is an offload style (plain|refs).
+if [ "$STYLE" = "raw" ]; then
+  DAEMON_ENV="CODEGRAPH_OFFLOAD_DISABLE=1"
+  printf '{"mcpServers":{"codegraph":{"command":"env","args":["CODEGRAPH_WASM_RELAUNCHED=1","CODEGRAPH_OFFLOAD_DISABLE=1","node","%s","serve","--mcp","--path","%s"]}}}' \
+    "$BIN" "$TARGET" > "$CFG"
+  USAGE="-"
+else
+  DAEMON_ENV="CODEGRAPH_OFFLOAD_STYLE=$STYLE CODEGRAPH_OFFLOAD_USAGE_LOG=$USAGE"
+  printf '{"mcpServers":{"codegraph":{"command":"env","args":["CODEGRAPH_WASM_RELAUNCHED=1","CODEGRAPH_OFFLOAD_STYLE=%s","CODEGRAPH_OFFLOAD_USAGE_LOG=%s","node","%s","serve","--mcp","--path","%s"]}}}' \
+    "$STYLE" "$USAGE" "$BIN" "$TARGET" > "$CFG"
+fi
+
+# Prewarm a persistent daemon carrying the SAME offload config (it does the reasoning).
+pkill -9 -f "serve --mcp --path $TARGET" 2>/dev/null; rm -f "$TARGET/.codegraph/daemon.sock" 2>/dev/null; sleep 0.6
+env $DAEMON_ENV CODEGRAPH_DAEMON_IDLE_TIMEOUT_MS=1800000 \
+  node "$BIN" serve --mcp --path "$TARGET" </dev/null >/dev/null 2>&1 &
+node -e 'const fs=require("fs");let n=0;const t=setInterval(()=>{if(fs.existsSync(process.argv[1]+"/.codegraph/daemon.sock")){clearInterval(t);process.exit(0)}if(n++>150){clearInterval(t);process.exit(1)}},100)' "$TARGET" \
+  && echo "daemon warm ($STYLE)" || echo "WARN daemon never bound"
+
+tag="$REPO-$STYLE-$N"
+echo "== run $tag =="
+# DISALLOW (optional): block tools that confound the offload-sufficiency signal —
+# chiefly "Agent" (sub-agent delegation: the spawned Explore subagent has low MCP
+# salience, ignores codegraph, and thrashes via Bash+Read, making the A/B noisy).
+( cd "$TARGET" && claude -p "$Q" --output-format stream-json --verbose --permission-mode bypassPermissions \
+    --model "${MODEL:-sonnet}" --effort "${EFFORT:-high}" --max-budget-usd 4 \
+    ${DISALLOW:+--disallowedTools "$DISALLOW"} \
+    --strict-mcp-config --mcp-config "$CFG" </dev/null > "$RUNS/$tag.jsonl" 2>"$RUNS/$tag.err" )
+node "$EXTRACT" --run "$RUNS/$tag.jsonl" --usage "$USAGE" --arm "offload-$STYLE" --rep "$N" \
+    --repo "$REPO" --tier "complex" --q "$Q" >> "$RESULTS"
+node -e 'const o=JSON.parse(require("fs").readFileSync(process.argv[1],"utf8").trim().split("\n").pop());console.log(`  [${o.arm} #${o.rep}] ${o.durationSec}s | main $${o.costUsdMain} ${o.tokBillable} tok | read=${o.read} grep=${o.grep} explore=${o.explore} offload=${o.offloadFired} | AI ${o.ai.calls}call/${o.ai.totalTokens}tok/$${o.ai.costUsd.toFixed(4)} | ok=${o.ok}`)' "$RESULTS"
+pkill -9 -f "serve --mcp --path $TARGET" 2>/dev/null; rm -f "$TARGET/.codegraph/daemon.sock" 2>/dev/null
+echo "raw transcript: $RUNS/$tag.jsonl"
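
The script assembles the MCP config with `printf`, which would not escape a quote or backslash inside a path. As a hedged alternative, the same config shape could be built with `JSON.stringify`. `buildMcpConfig` and its parameter names are hypothetical; the environment variables and the argument order are copied from the script above.

```typescript
// Hypothetical builder mirroring the two printf branches of offload-eval-refs1.sh.
interface ServerEntry { command: string; args: string[] }
interface McpConfig { mcpServers: { codegraph: ServerEntry } }

function buildMcpConfig(bin: string, target: string, style: string, usageLog: string): McpConfig {
  const envArgs = ['CODEGRAPH_WASM_RELAUNCHED=1'];
  if (style === 'raw') {
    // Pseudo-style: codegraph attached, offload disabled, no usage log.
    envArgs.push('CODEGRAPH_OFFLOAD_DISABLE=1');
  } else {
    envArgs.push(`CODEGRAPH_OFFLOAD_STYLE=${style}`, `CODEGRAPH_OFFLOAD_USAGE_LOG=${usageLog}`);
  }
  return {
    mcpServers: {
      codegraph: {
        command: 'env',
        args: [...envArgs, 'node', bin, 'serve', '--mcp', '--path', target],
      },
    },
  };
}

// JSON.stringify handles escaping, so unusual paths still yield valid JSON.
console.log(JSON.stringify(buildMcpConfig('/engine/dist/bin/codegraph.js', '/repos/demo', 'refs', '/tmp/usage.jsonl')));
```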

+ 4 - 4
src/installer/instructions-template.ts

@@ -17,8 +17,8 @@
  *    runs without this block, and consistently with it — including runs
  *    with zero Read/grep fallback.
  *  - **Non-MCP harnesses** — agents with no MCP client at all can still
- *    run the `codegraph explore` / `codegraph node` CLI, which prints the
- *    same output as the MCP tools.
+ *    run the `codegraph explore` CLI, which prints the same output as the
+ *    MCP tool.
  *
  * Keep this block SHORT. The main agent reads it every turn on top of the
  * server instructions — the #529 duplication-cost argument still bounds
@@ -44,8 +44,8 @@ export const CODEGRAPH_INSTRUCTIONS_BLOCK = `${CODEGRAPH_SECTION_START}
 
 In repositories indexed by CodeGraph (a \`.codegraph/\` directory exists at the repo root), reach for it BEFORE grep/find or reading files when you need to understand or locate code:
 
-- **MCP tools** (when available): \`codegraph_explore\` answers most code questions in one call — the relevant symbols' verbatim source plus the call paths between them. \`codegraph_node\` returns one symbol's source + callers, or reads a whole file with line numbers. If the tools are listed but deferred, load them by name via tool search.
-- **Shell** (always works): \`codegraph explore "<symbol names or question>"\` and \`codegraph node <symbol-or-file>\` print the same output.
+- **MCP tool** (when available): \`codegraph_explore\` answers most code questions in one call — the relevant symbols' verbatim source plus the call paths between them, including dynamic-dispatch hops grep can't follow. Name a file or symbol in the query to read its current line-numbered source. If it's listed but deferred, load it by name via tool search.
+- **Shell** (always works): \`codegraph explore "<symbol names or question>"\` prints the same output.
 
 If there is no \`.codegraph/\` directory, skip CodeGraph entirely — indexing is the user's decision.
 ${CODEGRAPH_SECTION_END}`;

+ 13 - 12
src/installer/targets/shared.ts

@@ -31,20 +31,21 @@ export function getMcpServerConfig(): { type: string; command: string; args: str
 
 /**
  * Permissions list for Claude `settings.json`. Other targets that
- * have a permissions concept can compose this list directly. The
- * permission strings follow Claude's `mcp__<server>__<tool>` format.
+ * have a permissions concept can compose this list directly.
+ *
+ * One server-scoped wildcard rather than a per-tool list. By default only
+ * `codegraph_explore` is even LISTED to the agent (see DEFAULT_MCP_TOOLS in
+ * mcp/tools.ts), so in practice explore is the only tool this auto-approves —
+ * but the wildcard means that if a user re-enables another tool via
+ * CODEGRAPH_MCP_TOOLS, it's already pre-approved (no permission prompt, no
+ * hand-editing settings.json), and future tools are covered too. Claude only
+ * honors globs after a literal `mcp__<server>__` prefix, so this exact string
+ * is the way to allow-all for one server; a bare `mcp__codegraph` or `*` is
+ * ignored. The allowlist gates PROMPTING, not visibility, so a superset here
+ * never makes a hidden tool appear.
  */
 export function getCodeGraphPermissions(): string[] {
-  return [
-    'mcp__codegraph__codegraph_explore',
-    'mcp__codegraph__codegraph_search',
-    'mcp__codegraph__codegraph_node',
-    'mcp__codegraph__codegraph_callers',
-    'mcp__codegraph__codegraph_callees',
-    'mcp__codegraph__codegraph_impact',
-    'mcp__codegraph__codegraph_files',
-    'mcp__codegraph__codegraph_status',
-  ];
+  return ['mcp__codegraph__*'];
 }
 
 /**
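
The wildcard rule described in the comment above can be modeled with a small matcher. This is only an illustration of the rule as that comment states it, not Claude's actual permission logic: `ruleAllows`, `SERVER_WILDCARD_RE`, and the accepted server-name characters are assumptions made for the sketch.

```typescript
// Illustrative model of the rule in the comment; NOT Claude's real matcher.
// Assumption: server names consist of letters, digits, and hyphens.
const SERVER_WILDCARD_RE = /^(mcp__[A-Za-z0-9-]+__)\*$/;

function ruleAllows(rule: string, tool: string): boolean {
  const wildcard = SERVER_WILDCARD_RE.exec(rule);
  if (wildcard) {
    // A glob is honored only after a literal `mcp__<server>__` prefix.
    const prefix = wildcard[1]!;
    return tool.startsWith(prefix) && tool.length > prefix.length;
  }
  // Any other use of `*` is treated as ignored in this model.
  if (rule.includes('*')) return false;
  // Otherwise a rule names exactly one tool.
  return rule === tool;
}

const tool = 'mcp__codegraph__codegraph_explore';
console.log(ruleAllows('mcp__codegraph__*', tool)); // server-scoped wildcard
console.log(ruleAllows('mcp__codegraph', tool));    // bare server name
console.log(ruleAllows('*', tool));                 // bare glob
```

Under this model the single string returned by `getCodeGraphPermissions()` covers every tool on the `codegraph` server, while the two bare forms cover none.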

+ 29 - 36
src/mcp/server-instructions.ts

@@ -7,13 +7,15 @@
  * before it sees individual tool descriptions.
  *
  * Goals when editing this:
- *   - Tool selection by intent (which tool for which question)
- *   - Common chains (refactor planning = X then Y)
- *   - Anti-patterns (don't grep when codegraph_search is faster)
+ *   - Lead the agent to codegraph_explore for any structural/flow question
+ *   - Reinforce "explore instead of Read/Grep" for indexed code
+ *   - Anti-patterns (don't re-verify with grep; don't hand-reconstruct flows)
  *
  * Keep it tight. The agent reads this every session — long instructions
- * burn tokens. Reference only tools that exist on `main`; gate any
- * conditional tools behind feature checks if/when they ship.
+ * burn tokens. The DEFAULT MCP surface is `codegraph_explore` ALONE (see
+ * DEFAULT_MCP_TOOLS in tools.ts) — reference only that tool here. The other
+ * tools (node/search/callers/…) stay defined and are re-enablable via
+ * CODEGRAPH_MCP_TOOLS, but they are NOT listed to agents, so don't name them.
  */
 export const SERVER_INSTRUCTIONS = `# Codegraph — code intelligence over an indexed knowledge graph
 
@@ -27,45 +29,36 @@ verbatim source PLUS who calls it and what it affects, so you edit with the
 blast radius in view. More accurate context, in far fewer tokens and
 round-trips than reading files yourself.
 
-## Use codegraph instead of reading files — for questions AND edits
+## One tool: codegraph_explore — use it instead of reading files
 
-Whether you're answering "how does X work" or implementing a change (fixing
-a bug, adding a feature), reach for codegraph before you Read. For
-understanding, answer DIRECTLY — usually with ONE \`codegraph_explore\` call.
-\`codegraph_explore\` takes either a natural-language question or a bag of
-symbol/file names and returns the verbatim source of the relevant symbols
-grouped by file, so it is Read-equivalent and most often the ONLY
-codegraph call you need. Codegraph IS the pre-built search index — so
-delegating the lookup to a separate file-reading sub-task/agent, or
-running your own grep + read loop, repeats work codegraph already did and
-costs more for the same answer. Reach for raw Read/Grep only to confirm a
-specific detail codegraph didn't cover. A direct codegraph answer is
-typically one to a few calls; a grep/read exploration is dozens.
+There is a single tool, \`codegraph_explore\`, and it is Read-equivalent. It
+takes either a natural-language question or a bag of symbol/file names and
+returns the **verbatim, line-numbered source** of the relevant symbols
+grouped by file — the same \`<n>\\t<line>\` shape \`Read\` gives you, safe to
+\`Edit\` from — PLUS the call path among them (including dynamic-dispatch hops
+like callbacks, React re-render, and JSX children that grep can't follow) and
+a blast-radius summary of what depends on them.
 
-## Tool selection by intent
+Whether you're answering "how does X work" or implementing a change (fixing a
+bug, adding a feature), call \`codegraph_explore\` before you Read. ONE call
+usually answers the whole question. Codegraph IS the pre-built search index —
+so running your own grep + read loop, or delegating the lookup to a separate
+file-reading sub-task/agent, repeats work codegraph already did and costs more
+for the same answer. A direct codegraph answer is typically one to a few
+calls; a grep/read exploration is dozens.
 
-- **Almost any question — "how does X work", architecture, a bug, "what/where is X", or surveying an area** → \`codegraph_explore\` (PRIMARY — call FIRST; ONE capped call returns the verbatim source of the relevant symbols grouped by file; most often the ONLY call you need)
-- **"How does X reach/become Y? / the flow / the path from X to Y"** → \`codegraph_explore\`, naming the symbols that span the flow (e.g. \`mutateElement renderScene\`) — it surfaces the call path among them, including dynamic-dispatch hops (callbacks, React re-render, JSX children) grep can't follow
-- **"What is the symbol named X?" (just its location)** → \`codegraph_search\`
-- **"What calls this?" / "What would changing this break?"** → \`codegraph_callers\` — EVERY call site with file:line, including where a function is **registered as a callback** (passed as an argument, assigned to a function pointer/field, listed in a handler table) — labeled "via callback registration" — so a function with no direct calls is NOT dead if it's wired up somewhere. When several UNRELATED symbols share a name (one \`UserService\` per monorepo app), it reports **one section per definition** (never a merged list) — pass \`file\` to focus the definition you mean. The wider blast radius arrives automatically on \`codegraph_explore\` (its "Blast radius" section) and \`codegraph_node\` (the dependents note)
-- **"What does this call?"** → \`codegraph_node\` with that symbol and \`includeCode: true\` — the body IS the callee list, and the caller/callee trail comes with it
-- **Reading a source FILE (any time you'd use the \`Read\` tool)** → \`codegraph_node\` with a \`file\` path and no \`symbol\`. It returns the file's **current source with line numbers — the same \`<n>\\t<line>\` shape \`Read\` gives you, safe to \`Edit\` from** — narrowable with \`offset\`/\`limit\` exactly like \`Read\`, PLUS a one-line note of which files depend on it. Same bytes as \`Read\`, faster (served from the index), with the blast radius attached. Use it **instead of \`Read\`** for indexed source files; fall back to \`Read\` only for what codegraph doesn't index (configs, docs). Pass \`symbolsOnly: true\` for just the file's structure.
-- **About to read or edit a symbol you can name** → \`codegraph_node\` with that \`symbol\` (SECONDARY — the after-explore depth tool): the verbatim source (\`includeCode: true\`) PLUS its caller/callee trail, so before changing it you see what calls it and what your edit would break. For an OVERLOADED name it returns EVERY matching definition's body in one call, so you never Read a file to find the right overload
+## How to query
 
-## Common chains
-
-- **Flow / "how does X reach Y"**: ONE \`codegraph_explore\` with the symbol names spanning the flow — it surfaces the call path among them (riding dynamic-dispatch hops) AND returns their source. No need to reconstruct the path with \`codegraph_search\` + \`codegraph_callers\`.
-- **Onboarding / understanding any area**: ONE \`codegraph_explore\` is usually the whole answer. Only follow up — \`codegraph_node\` for a specific symbol — if something is still unclear.
-- **Refactor planning**: \`codegraph_callers\` for the complete call-site list to update; the wider blast radius is already attached to \`codegraph_explore\` / \`codegraph_node\` output.
-- **Debugging a regression**: \`codegraph_callers\` of the suspected symbol; \`codegraph_node\` on anything unexpected that appears.
+- **Almost any question — "how does X work", architecture, a bug, "what/where is X", or surveying an area** → \`codegraph_explore\` with a natural-language question or the relevant names. ONE capped call returns the verbatim source grouped by file; most often the ONLY call you need.
+- **"How does X reach/become Y? / the flow / the path from X to Y"** → \`codegraph_explore\`, naming the symbols that span the flow (e.g. \`mutateElement renderScene\`) — it surfaces the call path among them, riding dynamic-dispatch hops, and returns their source.
+- **Reading or editing a file/symbol you can name** → put its name or file path in the \`codegraph_explore\` query — it returns that current line-numbered source (safe to \`Edit\` from) with the call path and blast radius attached, so you don't Read it separately. For an overloaded name it returns every matching definition's body in one call.
+- **Need more?** Call \`codegraph_explore\` again with more specific names — treat the source it returns as already Read.
 
 ## Anti-patterns
 
 - **Trust codegraph's results — don't re-verify them with grep.** They come from a full AST parse; re-checking with grep is slower, less accurate, and wastes context.
-- **Don't grep first** when looking up a symbol by name — \`codegraph_search\` is faster and returns kind + location + signature.
-- **Don't chain \`codegraph_search\` + \`codegraph_node\`** to understand an area — ONE \`codegraph_explore\` returns the relevant symbols' source together in a single round-trip.
-- **Don't loop \`codegraph_node\` over many symbols** — one \`codegraph_explore\` call returns them all grouped by file, while each separate call re-reads the whole context and costs far more. Use \`codegraph_node\` for a single symbol.
-- **Don't reach for the \`Read\` tool on an indexed source file** — \`codegraph_node\` with a \`file\` reads it for you (same \`<n>\\t<line>\` source, \`offset\`/\`limit\` like Read, faster, with its blast radius), and with a \`symbol\` it returns the source plus the caller/callee trail. Reach for raw \`Read\` only for what codegraph doesn't index (configs, docs) or when the staleness banner flags a file as pending re-index.
+- **Don't grep or Read first** to find or understand indexed code — ONE \`codegraph_explore\` returns the relevant symbols' source together in a single round-trip. Reach for raw \`Read\`/\`Grep\` only to confirm a specific detail codegraph didn't cover, or for what codegraph doesn't index (configs, docs).
+- **Don't reconstruct a flow by hand** — name the endpoints in one \`codegraph_explore\` and it surfaces the path between them, dynamic-dispatch hops included.
 - **After editing, check the staleness banner.** When a tool response starts with "⚠️ Some files referenced below were edited since the last index sync…", the listed files are pending re-index — Read those specific files for accurate content. Every file NOT in that banner is fresh, so still trust codegraph. A different, rarer banner — "⚠️ CodeGraph auto-sync is DISABLED…" — means live watching stopped entirely (the whole index is frozen, not just a few files); until it's resolved, Read files directly to confirm anything that may have changed.
 
 ## Limitations

+ 10 - 20
src/mcp/tools.ts

@@ -633,28 +633,18 @@ export function getStaticTools(): ToolDefinition[] {
 }
 
 /**
- * The MCP tools served by DEFAULT (short names). The other defined tools
- * (callees, impact, files, status) remain fully functional — handlers stay,
- * the library API and CLI are untouched, and `CODEGRAPH_MCP_TOOLS` re-enables
- * any of them — they just aren't LISTED to agents anymore.
+ * The MCP tools served by DEFAULT (short names). Pared to ONLY `codegraph_explore`
+ * — the single tool that reliably earns its place: one capped call returns the
+ * verbatim source of the relevant symbols grouped by file (and, with the offload,
+ * a reasoned flow map over that source). Every other tool is a narrower slice of
+ * what explore already does, and presence itself steers mis-picks, so they are no
+ * longer LISTED to agents.
  *
- * Evidence for the cut (the "adapt the tool to the agent" principle —
- * fewer tools = fewer mis-picks, and presence itself steers):
- * - `codegraph_impact` appears in ZERO recorded eval runs ever — its
- *   blast-radius info already arrives inline on explore (the "Blast radius"
- *   section) and node (the dependents note), so agents never need the
- *   standalone tool.
- * - `codegraph_callees` is redundant by construction: a symbol's body (which
- *   node returns) IS its callee list, plus the caller/callee trail.
- * - `codegraph_files` / `codegraph_status`: the tiny-repo audit (see
- *   getTools) found they "reduce to one grep"; staleness banners already
- *   inline the pending-sync info on every read tool, and the CLI covers
- *   diagnostics.
- * - `codegraph_callers` stays: exhaustive call-site enumeration (every
- *   caller with file:line, callback registrations labeled, one section per
- *   same-named definition) is the one job explore/node don't replicate.
+ * The other defined tools (`node`, `search`, `callers`, plus callees/impact/files/
+ * status) remain fully functional — handlers stay, the library API and CLI are
+ * untouched, and `CODEGRAPH_MCP_TOOLS=explore,node,...` re-enables any of them.
  */
-const DEFAULT_MCP_TOOLS = new Set(['explore', 'node', 'search', 'callers']);
+const DEFAULT_MCP_TOOLS = new Set(['explore']);
 
 /**
  * Tool handler that executes tools against a CodeGraph instance
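
How `CODEGRAPH_MCP_TOOLS` overrides the default set is not shown in this hunk. A minimal sketch, assuming a comma-separated list of short names and a fallback to the default when nothing valid is given, might look like the following. `resolveEnabledTools` and `KNOWN_TOOLS` are hypothetical; the tool names are the ones the comment above lists.

```typescript
// Hypothetical resolver; the real parsing in src/mcp/tools.ts may differ.
const DEFAULT_MCP_TOOLS = new Set(['explore']);
const KNOWN_TOOLS = ['explore', 'node', 'search', 'callers', 'callees', 'impact', 'files', 'status'];

function resolveEnabledTools(envValue: string | undefined): Set<string> {
  if (envValue === undefined || envValue.trim() === '') return new Set(DEFAULT_MCP_TOOLS);
  const requested = envValue
    .split(',')
    .map((name) => name.trim())
    .filter((name) => KNOWN_TOOLS.includes(name));
  // Assumption: an override with no recognized names falls back to the default.
  return requested.length > 0 ? new Set(requested) : new Set(DEFAULT_MCP_TOOLS);
}

console.log([...resolveEnabledTools(undefined)]);
console.log([...resolveEnabledTools('explore,node')]);
```

Keeping the default as a one-element set means the override path is the only way another tool gets listed, which matches the intent stated in the comment.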

+ 61 - 1
src/resolution/callback-synthesizer.ts

@@ -1646,10 +1646,68 @@ function svelteKitLoadEdges(ctx: ResolutionContext): Edge[] {
   return edges;
 }
 
+/**
+ * Redux-thunk dispatch chain. `export const X = createAsyncThunk(prefix, async (a, api) => {...})`
+ * (or a wrapper like trezor's `createThunk(...)`) passes the async body as an ARGUMENT, so
+ * tree-sitter never extracts it as a function node: `X` is a `constant` whose body's calls are
+ * ORPHANED. The `dispatch(nextThunk(...))` calls that drive a thunk chain forward therefore produce
+ * no edges, so `callees(X)` is empty and a flow `dispatch(X(...)) → X → nextThunk` dead-ends at the
+ * constant (validated on trezor-suite: the signXxxThunk constants had ZERO outgoing edges). Bridge
+ * it: body-scan each thunk constant for `dispatch(Y(...))` and link `X → Y`, so the dispatch chain
+ * connects. High-precision: a match needs a literal `dispatch(` call, and `Y` must resolve to a
+ * function/constant/method node. Fan-out is capped per thunk, and only constants whose initializer
+ * is a thunk factory call are scanned, so non-RTK repos produce no edges.
+ * Cross-file by design (a suite thunk dispatches a wallet-core thunk). Provenance `heuristic`,
+ * `synthesizedBy:'redux-thunk'`; `registeredAt` is the dispatch site.
+ */
+const THUNK_DECL_RE = /create(?:Async)?Thunk/;
+const THUNK_DISPATCH_RE = /\bdispatch\s*\(\s*([A-Za-z_]\w*)\s*[(),]/g;
+const THUNK_FANOUT_CAP = 24;
+
+function reduxThunkEdges(queries: QueryBuilder, ctx: ResolutionContext): Edge[] {
+  const edges: Edge[] = [];
+  const seen = new Set<string>();
+  for (const node of queries.iterateNodesByKind('constant')) {
+    // Cheap gate: the initializer (captured in `signature`) must be a create(Async)Thunk call —
+    // avoids reading every constant's body on a large repo.
+    if (!node.signature || !THUNK_DECL_RE.test(node.signature)) continue;
+    const content = ctx.readFile(node.filePath);
+    const src = content && sliceLines(content, node.startLine, node.endLine);
+    if (!src) continue;
+    // Thunks are TS/JS-family (same // and /* */ comment syntax); map to a CommentLang.
+    const safe = stripCommentsForRegex(src, node.language === 'javascript' || node.language === 'jsx' ? 'javascript' : 'typescript');
+    THUNK_DISPATCH_RE.lastIndex = 0;
+    let m: RegExpExecArray | null;
+    let added = 0;
+    while ((m = THUNK_DISPATCH_RE.exec(safe)) && added < THUNK_FANOUT_CAP) {
+      const name = m[1]!;
+      if (name === node.name) continue; // self-dispatch (recursive thunk) — skip
+      const target = ctx
+        .getNodesByName(name)
+        .find((n) => n.kind === 'constant' || n.kind === 'function' || n.kind === 'method');
+      if (!target || target.id === node.id) continue;
+      const key = `${node.id}>${target.id}`;
+      if (seen.has(key)) continue;
+      seen.add(key);
+      const line = node.startLine + safe.slice(0, m.index).split('\n').length - 1;
+      edges.push({
+        source: node.id,
+        target: target.id,
+        kind: 'calls',
+        line,
+        provenance: 'heuristic',
+        metadata: { synthesizedBy: 'redux-thunk', via: name, registeredAt: `${node.filePath}:${line}` },
+      });
+      added++;
+    }
+  }
+  return edges;
+}
+
 /**
  * Synthesize dispatcher→callback edges (field observers + EventEmitters +
  * React re-render + JSX children + Vue templates + SvelteKit load + RN event
- * channel + Fabric native-impl + MyBatis Java↔XML + Gin middleware chain).
+ * channel + Fabric native-impl + MyBatis Java↔XML + Gin middleware chain +
+ * Redux-thunk dispatch chain).
  * Returns the count added. Never throws into indexing — callers wrap in try/catch.
  */
 export function synthesizeCallbackEdges(queries: QueryBuilder, ctx: ResolutionContext): number {
@@ -1687,6 +1745,7 @@ export function synthesizeCallbackEdges(queries: QueryBuilder, ctx: ResolutionCo
   const rnXPlatEdges = rnCrossPlatformEdges(queries);
   const mybatisEdges = mybatisJavaXmlEdges(queries);
   const ginEdges = ginMiddlewareChainEdges(queries, ctx);
+  const thunkEdges = reduxThunkEdges(queries, ctx);
 
   const merged: Edge[] = [];
   const seen = new Set<string>();
@@ -1710,6 +1769,7 @@ export function synthesizeCallbackEdges(queries: QueryBuilder, ctx: ResolutionCo
     ...rnXPlatEdges,
     ...mybatisEdges,
     ...ginEdges,
+    ...thunkEdges,
   ]) {
     const key = `${e.source}>${e.target}`;
     if (seen.has(key)) continue;