
chore(agent-eval): coverage probes, block-read hook, and design docs

Dev-only validation harness for the dynamic-dispatch coverage work:
- probe-{trace,node,context,explore}.mjs: drive MCP tools against a built index
  without a full agent run.
- block-read-hook.sh + hook-settings.json: PreToolUse experiment that denies
  source Reads to measure codegraph sufficiency (forced Read-0).
- docs/design/: callback-edge-synthesis + dynamic-dispatch-coverage playbook.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Colby McHenry 1 month ago
Parent
Commit
25cba9ad7b

+ 179 - 0
docs/design/callback-edge-synthesis.md

@@ -0,0 +1,179 @@
+# Design + status: general callback / observer edge synthesis
+
+**Status:** Phases 1–3 implemented & validated as a **prototype, uncommitted on `main`**
+(as of 2026-05-22). This doc is the handoff for continuing the work.
+**Motivation:** close the dynamic-dispatch hole that static extraction leaves for
+observer / event-emitter / signal patterns, where a *dispatcher* invokes callbacks
+registered elsewhere through a shared store — so flows like "how does an update
+reach the screen" actually exist in the graph.
+
+---
+
+## TL;DR for a new session
+
+We synthesize `dispatcher → callback` edges that static parsing misses. It works:
+
+- **Field observer** (excalidraw `Scene.onUpdate`/`triggerUpdate`): synthesizes
+  `triggerUpdate → triggerRender`. `trace(mutateElement, triggerRender)` now = 3 hops.
+- **EventEmitter** (express `on('mount', …)`/`emit('mount')`): synthesizes `use → onmount`.
+- Precision is high: excalidraw got **1** synthesized edge out of 27,214 (the correct one);
+  node count moved +3 after Phase 3 (no explosion).
+
+**Files touched (all uncommitted on `main`):**
+- `src/resolution/callback-synthesizer.ts` — the whole-graph synthesis pass (Phase 1 + 2).
+- `src/resolution/index.ts` — calls `synthesizeCallbackEdges()` at the end of
+  `resolveAndPersistBatched()` (after base edges are persisted) + the import.
+- `src/extraction/tree-sitter.ts` — `visitFunctionBody` now extracts **named** nested
+  functions (Phase 3), so inline named handlers become linkable nodes.
+
+**How to reproduce / test:**
+```bash
+npm run build
+rm -rf /tmp/codegraph-corpus/excalidraw/.codegraph
+( cd /tmp/codegraph-corpus/excalidraw && codegraph init -i )
+# synthesized edges (provenance='heuristic', metadata.synthesizedBy in {callback,event-emitter}):
+sqlite3 /tmp/codegraph-corpus/excalidraw/.codegraph/codegraph.db \
+  "select s.name||' → '||t.name||'  '||coalesce(e.metadata,'') from edges e \
+   join nodes s on e.source=s.id join nodes t on e.target=t.id where e.provenance='heuristic';"
+# end-to-end trace (uses the dev probes):
+node scripts/agent-eval/probe-trace.mjs /tmp/codegraph-corpus/excalidraw triggerUpdate triggerRender
+```
+Probe scripts (dev-only, in `scripts/agent-eval/`): `probe-node.mjs` (symbol + trail),
+`probe-trace.mjs` (call path), `probe-context.mjs`, `probe-explore.mjs`. EventEmitter
+fixture lives at `/tmp/cb-fixture/bus.js` (ephemeral — recreate or move into `__tests__/`).
+
+---
+
+## The hole
+
+```ts
+class Scene {
+  private callbacks = new Set<Callback>();
+  onUpdate(cb: Callback) { this.callbacks.add(cb); }          // REGISTRAR
+  triggerUpdate() { for (const cb of this.callbacks) cb(); }  // DISPATCHER
+}
+this.scene.onUpdate(this.triggerRender);                      // REGISTRATION SITE
+```
+
+The runtime edge `triggerUpdate → triggerRender` does not exist statically:
+`triggerUpdate`'s only literal call is `cb()` (anonymous). Measured: `triggerUpdate`'s
+only callee was `randomInteger`; `trace(triggerUpdate, triggerRender)` returned no path.
+
+## Why it's a whole-graph pass, not a `FrameworkResolver.resolve()`
+
+`resolve(ref)` answers "what does this **named** ref point to," one ref at a time. The
+callback edge has **no ref to resolve** (`cb()` is anonymous) and needs **cross-file,
+multi-site correlation** (registrar, registration, dispatcher). So it's a whole-graph
+pass after base resolution, language-level (any OO observer), living in
+`src/resolution/callback-synthesizer.ts` — **not** under `frameworks/`.
+
+> Sibling mechanism for the *other* dynamic-dispatch class — **named** attribute/
+> descriptor dispatch (e.g. django `self._iterable_class(...)`) — is the
+> `claimsReference` hook (`resolution/types.ts` + `resolution/index.ts` pre-filter)
+> + a `FrameworkResolver.resolve()` (django ORM resolver in `frameworks/python.ts`).
+> That one *does* fit `resolve()` because the ref is named. Both are part of the same
+> coverage effort; see the "Related work" section.
+
+---
+
+## As-built algorithm (and where it diverged from the original design)
+
+### Field-observer channels (`fieldChannelEdges`, Phase 1)
+1. **Candidates** by method/function **name** — registrar `^(on[A-Z]\w*|subscribe|
+   addListener|addEventListener|register|watch|listen|addCallback)$`; dispatcher
+   contains `(emit|trigger|notify|dispatch|fire|publish|flush)`.
+2. **Confirm by body** (read via `ctx.readFile` + slice node lines): registrar has
+   `this.<F>.add|push|set(`; dispatcher has `for (… of [Array.from(]this.<F>)` + a call,
+   or `this.<F>.forEach(`.
+3. **Pairing — DIVERGENCE:** the design said pair by *class*; the build pairs by
+   **same file + same field `F`** (file as a class proxy — getting the containing class
+   reliably was harder). Works for the common 1-class-per-file case; revisit for
+   multi-class files.
+4. **Registrations:** `queries.getIncomingEdges(registrar.id, ['calls'])` → for each,
+   read the caller's source at the edge line and **regex-recover the arg**
+   (`<registrarName>\s*\(\s*(?:this\.)?(\w+)`). DIVERGENCE: design preferred tree-sitter
+   re-parse; build uses regex (named refs only — arrows/inline args are missed here).
+5. **Synthesize** `dispatcher → fn` (`getNodesByName(arg)` → method|function). Capped at
+   `MAX_CALLBACKS_PER_CHANNEL = 40`.
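The five steps above can be sketched in miniature. This is an illustrative sketch, not the code in `callback-synthesizer.ts`: the `Fn` and `ChannelEdge` shapes, the `callSites` map, and the body regexes `STORE_RE` / `ITERATE_RE` are assumptions made for the example. Only the two name patterns, the argument-recovery regex, and the cap value are taken from the list. Every function passed in is assumed to come from one file, which stands in for the same-file pairing rule.

```typescript
// Hypothetical, simplified stand-ins for graph nodes and edges.
interface Fn { name: string; body: string }
interface ChannelEdge { source: string; target: string; field: string }

// Step 1: candidate name patterns (quoted from the list above).
const REGISTRAR_NAME = /^(on[A-Z]\w*|subscribe|addListener|addEventListener|register|watch|listen|addCallback)$/;
const DISPATCHER_NAME = /(emit|trigger|notify|dispatch|fire|publish|flush)/;

// Step 2: body confirmation. These regexes are assumptions that approximate
// the shapes described above; the real pass may differ.
const STORE_RE = /this\.(\w+)\.(?:add|push|set)\(/;
const ITERATE_RE = /for\s*\([^)]*\bof\s+(?:Array\.from\()?this\.(\w+)|this\.(\w+)\.forEach\(/;

const MAX_CALLBACKS_PER_CHANNEL = 40;

// Field a registrar stores callbacks into, or null if it is not a registrar.
function storedField(fn: Fn): string | null {
  if (!REGISTRAR_NAME.test(fn.name)) return null;
  const m = STORE_RE.exec(fn.body);
  return m?.[1] ?? null;
}

// Field a dispatcher iterates over, or null if it is not a dispatcher.
function iteratedField(fn: Fn): string | null {
  if (!DISPATCHER_NAME.test(fn.name)) return null;
  const m = ITERATE_RE.exec(fn.body);
  return m?.[1] ?? m?.[2] ?? null;
}

// Step 4: regex-recover the first argument at each registration site.
// Named refs only: an arrow or inline argument yields no match.
function registeredCallbacks(registrar: string, sites: string[]): string[] {
  const argRe = new RegExp(registrar + "\\s*\\(\\s*(?:this\\.)?(\\w+)");
  const names: string[] = [];
  for (const line of sites) {
    const name = argRe.exec(line)?.[1];
    if (name && names.indexOf(name) === -1) names.push(name);
  }
  return names;
}

// Steps 3 and 5: pair registrar and dispatcher on the same field, then emit
// dispatcher -> callback for every recovered name that is a known node.
function fieldChannelSketch(
  fns: Fn[],
  callSites: Record<string, string[]>,
  knownNodes: string[],
): ChannelEdge[] {
  const edges: ChannelEdge[] = [];
  for (const reg of fns) {
    const field = storedField(reg);
    if (field === null) continue;
    for (const disp of fns) {
      if (iteratedField(disp) !== field) continue;
      const callbacks = registeredCallbacks(reg.name, callSites[reg.name] ?? [])
        .slice(0, MAX_CALLBACKS_PER_CHANNEL);
      for (const cb of callbacks) {
        if (knownNodes.indexOf(cb) !== -1) {
          edges.push({ source: disp.name, target: cb, field });
        }
      }
    }
  }
  return edges;
}

const demoEdges = fieldChannelSketch(
  [
    { name: "onUpdate", body: "onUpdate(cb) { this.callbacks.add(cb); }" },
    { name: "triggerUpdate", body: "triggerUpdate() { for (const cb of this.callbacks) cb(); }" },
  ],
  { onUpdate: ["this.scene.onUpdate(this.triggerRender);"] },
  ["triggerRender"],
);
console.log(demoEdges);
```

Run against the `Scene` shape from "The hole", the sketch yields the single `triggerUpdate → triggerRender` edge. A registration such as `onUpdate(() => go())` yields nothing, which is the named-refs-only limitation noted in step 4.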
+
+### EventEmitter channels (`eventEmitterEdges`, Phase 2)
+- **File-oriented scan** (`ctx.getAllFiles()` + `readFile`, substring pre-filter on
+  `.emit(`/`.on(`/etc). `ON_RE` = `\.(?:on|once|addListener)\(\s*['"]([^'"]+)['"]\s*,\s*
+  (?:function\s+(\w+)|(?:this\.)?(\w+))`; `EMIT_RE` = `\.(?:emit|fire|dispatchEvent)\(\s*['"]([^'"]+)['"]`.
+- Dispatcher = **enclosing function** of the `emit('e')` call (`enclosingFn` finds the
+  tightest function/method/component node containing the line). Handler = `getNodesByName`
+  of the on-handler name.
+- Correlate by **event-name literal**; synthesize dispatcher → handler.
+- **Precision — DIVERGENCE:** design proposed receiver-type matching; build uses an
+  **event fan-out cap** (`EVENT_FANOUT_CAP = 6`) — skip events with >6 handlers or
+  dispatchers (generic names like `error`/`change` would over-link without type info).
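The event-keyed correlation above can be sketched as follows. The two regexes and the cap value are the ones quoted in this section; everything else is an assumption for illustration. In particular, each `Fn` here already carries its enclosing function's name and body (standing in for `enclosingFn`), and `knownNodes` stands in for `getNodesByName`.

```typescript
// Hypothetical stand-ins: one entry per enclosing function found in the scanned files.
interface Fn { name: string; body: string }
interface EventEdge { source: string; target: string; event: string }

// Quoted from the section above.
const ON_RE = /\.(?:on|once|addListener)\(\s*['"]([^'"]+)['"]\s*,\s*(?:function\s+(\w+)|(?:this\.)?(\w+))/;
const EMIT_RE = /\.(?:emit|fire|dispatchEvent)\(\s*['"]([^'"]+)['"]/;
const EVENT_FANOUT_CAP = 6;

// Collect every match of `re` in `text` using a fresh global copy of the regex.
function allMatches(re: RegExp, text: string): RegExpExecArray[] {
  const out: RegExpExecArray[] = [];
  const global = new RegExp(re.source, "g");
  let m: RegExpExecArray | null;
  while ((m = global.exec(text)) !== null) out.push(m);
  return out;
}

// Append `value` to the list stored under `key`, skipping duplicates.
function addTo(map: Map<string, string[]>, key: string, value: string): void {
  const list = map.get(key) ?? [];
  if (list.indexOf(value) === -1) list.push(value);
  map.set(key, list);
}

function eventEmitterSketch(fns: Fn[], knownNodes: string[]): EventEdge[] {
  const handlers = new Map<string, string[]>();    // event name -> handler names
  const dispatchers = new Map<string, string[]>(); // event name -> emitting functions
  for (const fn of fns) {
    for (const m of allMatches(ON_RE, fn.body)) {
      const handler = m[2] ?? m[3];
      // Keep named handlers that exist as nodes; "function" or "async" captured
      // from an anonymous handler is dropped here because no such node exists.
      if (m[1] && handler && knownNodes.indexOf(handler) !== -1) addTo(handlers, m[1], handler);
    }
    for (const m of allMatches(EMIT_RE, fn.body)) {
      if (m[1]) addTo(dispatchers, m[1], fn.name);
    }
  }
  const edges: EventEdge[] = [];
  handlers.forEach((hs, event) => {
    const ds = dispatchers.get(event) ?? [];
    // Fan-out cap: skip events with too many handlers or dispatchers.
    if (hs.length > EVENT_FANOUT_CAP || ds.length > EVENT_FANOUT_CAP) return;
    for (const d of ds) for (const h of hs) edges.push({ source: d, target: h, event });
  });
  return edges;
}

// Shape modeled on the express case; `setup` is a hypothetical function name.
const demoEventEdges = eventEmitterSketch(
  [
    { name: "setup", body: "this.on('mount', function onmount(parent) { /* ... */ });" },
    { name: "use", body: "fn.emit('mount', this);" },
  ],
  ["setup", "use", "onmount"],
);
console.log(demoEventEdges);
```

The cap is a blunt guard: it drops a whole event once either side exceeds six entries, which trades recall on busy events for protection against generic names linking unrelated emitters.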
+
+### Provenance — DIVERGENCE
+`Edge.provenance` is a fixed enum (`'tree-sitter'|'scip'|'heuristic'`), so synthesized
+edges use **`provenance: 'heuristic'`** + `metadata: { synthesizedBy: 'callback'|
+'event-emitter', via/event/field }`. The design's `'callback-synthesis'` provenance and
+high/medium/low **confidence tiers were NOT implemented** — the fan-out cap +
+registrar-name uniqueness + named-only handlers are the precision guards instead.
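As a small illustration of how tooling could tell synthesized edges from static ones under this scheme, here is a sketch. The `GraphEdge` and `SynthMetadata` shapes are assumptions that mirror the fields named above, not the project's actual `Edge` type.

```typescript
// Assumed shapes mirroring the fields described in this section.
type Provenance = "tree-sitter" | "scip" | "heuristic";
interface SynthMetadata {
  synthesizedBy: "callback" | "event-emitter";
  via?: string;
  event?: string;
  field?: string;
}
interface GraphEdge {
  source: string;
  target: string;
  provenance: Provenance;
  metadata?: SynthMetadata;
}

// A synthesized edge is a heuristic edge that also carries `synthesizedBy`.
function isSynthesized(edge: GraphEdge): boolean {
  return edge.provenance === "heuristic" && edge.metadata?.synthesizedBy !== undefined;
}

const demoGraph: GraphEdge[] = [
  { source: "triggerUpdate", target: "randomInteger", provenance: "tree-sitter" },
  {
    source: "triggerUpdate",
    target: "triggerRender",
    provenance: "heuristic",
    metadata: { synthesizedBy: "callback", field: "callbacks" },
  },
];

const synthesizedOnly = demoGraph.filter(isSynthesized);
const staticOnly = demoGraph.filter((e) => !isSynthesized(e));
console.log(synthesizedOnly.length, staticOnly.length);
```

Checking both fields rather than `provenance` alone keeps the filter correct if other heuristic edges without `synthesizedBy` exist in the graph.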
+
+### Phase 3 — inline callback extraction (`tree-sitter.ts`)
+The real blocker for EventEmitter on real repos: inline handlers
+(`on('mount', function onmount(){})`) weren't **nodes**, so nothing could link to them.
+Root cause: `visitFunctionBody` walked *through* nested functions without extracting them.
+Fix: in `visitForCallsAndStructure`, when a body node is a `functionType` and
+`extractName` returns a real name, call `extractFunction` (which extracts it and walks
+its own body) and return. **Named only** — anonymous arrows fall through to the existing
+recursion (so their inner calls stay attributed to the enclosing fn). This bounded it:
+excalidraw +3 nodes, no explosion, no regression.
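The named-only rule can be shown on a toy tree. This sketch does not use tree-sitter: `AstNode`, `extractFunctionSketch`, and `visitSketch` are hypothetical names, and the tree is hand-built. It only demonstrates the control flow described above, where a named nested function becomes its own node and an anonymous one is walked through.

```typescript
// Toy AST, a hypothetical stand-in for tree-sitter nodes.
interface AstNode {
  kind: "function" | "call" | "block";
  name?: string;
  children: AstNode[];
}
interface Extracted {
  nodes: string[];
  calls: Array<{ from: string; to: string }>;
}

// Extract a function as a node, then walk its body with itself as the enclosing fn.
function extractFunctionSketch(name: string, children: AstNode[], out: Extracted): void {
  out.nodes.push(name);
  for (const child of children) visitSketch(child, name, out);
}

function visitSketch(node: AstNode, enclosing: string, out: Extracted): void {
  // Named nested function: extract it and stop; its body is walked by the extractor.
  if (node.kind === "function" && node.name) {
    extractFunctionSketch(node.name, node.children, out);
    return;
  }
  if (node.kind === "call" && node.name) {
    out.calls.push({ from: enclosing, to: node.name });
  }
  // Anonymous functions fall through: their calls stay attributed to `enclosing`.
  for (const child of node.children) visitSketch(child, enclosing, out);
}

// setup() { on('mount', function onmount() { inherit() }, () => { refresh() }) }
const demoExtracted: Extracted = { nodes: [], calls: [] };
extractFunctionSketch(
  "setup",
  [
    {
      kind: "call",
      name: "on",
      children: [
        { kind: "function", name: "onmount", children: [{ kind: "call", name: "inherit", children: [] }] },
        { kind: "function", children: [{ kind: "call", name: "refresh", children: [] }] },
      ],
    },
  ],
  demoExtracted,
);
console.log(demoExtracted);
```

In the demo, `onmount` becomes a node that owns its `inherit` call, while the arrow's `refresh` call is attributed to `setup`. That attribution is why skipping anonymous functions adds no nodes.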
+
+---
+
+## Validation results (actual)
+
+| Repo | Result |
+|---|---|
+| excalidraw | 1 synthesized edge `triggerUpdate → triggerRender` (of 27,214); `trace(mutateElement, triggerRender)` = 3 hops; nodes 9,286 → 9,289 |
+| express | after Phase 3: `use → onmount` `{event-emitter, event:"mount"}` (`onmount` now extracted at `application.js:109`) |
+| `/tmp/cb-fixture/bus.js` | `tick → handleRefresh`, `persist → handleSave` (named-method EventEmitter handlers) |
+| excalidraw / express | no Phase-1 regression; node counts stable |
+
+---
+
+## Remaining work (prioritized for the next session)
+
+1. **Anonymous-arrow handlers** — `on('e', () => foo())` still produce no edge (no node,
+   intentionally not extracted in Phase 3). The fix is **synthesizer link-through-body**:
+   parse the arrow's body and link `dispatcher → (calls inside the arrow)`. Highest
+   remaining recall win; handles the most common modern callback shape.
+2. **Wire into `resolveAndPersist`** (incremental sync) — synthesis currently runs only
+   in `resolveAndPersistBatched` (full index). Incremental re-index won't refresh
+   synthesized edges.
+3. **Receiver-type matching** for EventEmitter precision (replace/augment the fan-out
+   cap) — use `type_of` edges so `x.emit('change')` only links to `y.on('change', fn)`
+   when `x`,`y` are the same type. Lets the fan-out cap relax.
+4. **Tree-sitter arg recovery** (replace the regex in field-channel step 4): robust for
+   arrows, multi-arg, line-wrapped calls.
+5. **Single-callback fields** (`this.onChange = cb; … this.onChange()`) — scalar-store
+   variant of the field observer; not built.
+6. **Broad precision/recall audit** — run across the full corpus; tally synthesized edges
+   per repo, spot-check, confirm no explosion on EventEmitter-heavy repos.
+7. **Tests + CHANGELOG** — the fixture is a ready vitest case for the synthesizer; add
+   extractor tests for Phase 3 (named-nested-fn extraction; confirm other languages
+   unaffected — the change is in the shared walker), resolver tests for the django side.
+
+## Edge cases / model
+- **Over-approximation across instances** is accepted (reachability, not instance
+  precision). `unregister`/`off` ignored.
+- Synthesized edges are **additive** — never replace static edges; tooling can filter on
+  `provenance='heuristic'` + `metadata.synthesizedBy`.
+
+## Related work (same coverage effort)
+This is one half of closing dynamic-dispatch coverage. The other artifacts on `main`:
+- **Named attribute/descriptor resolver**: `claimsReference` (`resolution/types.ts`,
+  pre-filter in `resolution/index.ts`) + django ORM resolver (`frameworks/python.ts`,
+  `_iterable_class` → `ModelIterable.__iter__`).
+- **Retrieval/UX changes** (separate from coverage): `explore` whole-small-file + glue
+  fixes, `node`-with-trail, `codegraph_trace`, `context` call-paths — all in
+  `src/mcp/tools.ts` / `src/context/index.ts`.
+- **Full investigation context + findings:** auto-memory
+  `project_codegraph_read_displacement` (why coverage — not prompting/hooks/new-tools —
+  is the lever for getting agents to use codegraph over Read).

+ 234 - 0
docs/design/dynamic-dispatch-coverage-playbook.md

@@ -0,0 +1,234 @@
+# Dynamic-Dispatch Coverage Playbook
+
+**Audience:** a Claude agent continuing this work.
+**Mission:** systematically close static-extraction coverage holes for **dynamic
+dispatch** across **every language and framework codegraph supports**, and validate
+each one the same way, so cross-symbol *flows* exist in the graph everywhere.
+
+> This is the top-level playbook. The deep design for one mechanism (the callback
+> synthesizer) is in [`callback-edge-synthesis.md`](./callback-edge-synthesis.md).
+> Full investigation context + findings: auto-memory `project_codegraph_read_displacement`.
+
+---
+
+## 1. The goal (why this matters)
+
+codegraph's value is being **the map** — answering structural/flow questions
+(`trace`, `impact`, callers, "how does X reach Y") that grep/Read cannot. Agents
+will use codegraph instead of Read **only when it is sufficient**. We proved
+empirically (see memory) that the lever for sufficiency is **coverage**, not
+prompting/hooks/new-tools: when a flow is missing from the graph, the agent reads
+the files to reconstruct it; when the flow *is* in the graph, the agent can answer
+completely without reading.
+
+**Validated end-to-end on excalidraw:** after closing the update-flow hole, 2/3
+headless agent runs answered the "how does an update reach the screen" question with
+**Read 0 and a complete answer** — impossible before, because the key edge wasn't in
+the graph. (Caveat: coverage *enables* the no-read path; agent confirm-by-reading
+variance means it doesn't *force* it. Completeness improves unconditionally.)
+
+The mission is to make that true for **all** languages/frameworks.
+
+---
+
+## 2. The problem class: dynamic dispatch
+
+Static tree-sitter extraction captures explicit calls (`foo()`, `this.bar()`). It
+**misses** any call whose target is computed/indirect. Four recurring shapes, with a
+**difficulty gradient** (do the cheap ones first):
+
+| # | Shape | Example | Fix mechanism | Cost |
+|---|---|---|---|---|
+| 1 | **Named attribute / descriptor** | django `self._iterable_class(self)` | framework resolver (`claimsReference` + `resolve()`) | **cheap** |
+| 2 | **Field-backed observer** | `onUpdate(cb)` + `for(cb of cbs)cb()` | callback synthesizer (whole-graph pass) | medium |
+| 3 | **String-keyed EventEmitter** | `on('e',fn)` / `emit('e')` | callback synthesizer (event-keyed) | medium |
+| 4 | **Inline callback handler** | `on('e', function h(){})` / `() => {}` | extraction (named) + synthesizer link-through-body (anon) | named: cheap · anon: hard |
+
+Key distinction driving the mechanism choice:
+- **A named ref exists** to resolve (`_iterable_class` is an attribute name) → **resolver**.
+- **No ref exists** (`cb()` is anonymous; needs registrar↔dispatcher correlation) → **synthesizer**.
+
+---
+
+## 3. Worked examples (the two mechanisms, end to end)
+
+### 3a. Django ORM descriptor — the **resolver** pattern (Python)
+- **Hole:** `QuerySet._fetch_all` calls `self._iterable_class(self)` (a runtime-chosen
+  iterable, default `ModelIterable`), whose `__iter__` runs the SQL compiler. Static
+  parsing can't resolve the attribute-as-callable → `_fetch_all`'s only callee was
+  `_prefetch_related_objects`; `trace(_fetch_all, execute_sql)` returned no path.
+- **Fix:** `djangoResolver` claims the unresolved `_iterable_class` ref through the
+  name-exists pre-filter, then resolves it to `ModelIterable.__iter__`.
+- **Files:** `src/resolution/types.ts` (`claimsReference?` on `FrameworkResolver`),
+  `src/resolution/index.ts` (pre-filter in `resolveOne` consults `claimsReference`),
+  `src/resolution/frameworks/python.ts` (`djangoResolver.resolve` + `claimsReference` +
+  `resolveModelIterableIter`).
+- **Result:** `trace(_fetch_all, execute_sql)` → `_fetch_all → __iter__ → execute_sql` (3 hops).
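A minimal sketch of the claim-then-resolve flow described above. The interface and function signatures are assumptions for illustration, not the project's real `FrameworkResolver` or `resolveOne`. The sketch only shows why the name-exists pre-filter needs a `claimsReference` escape hatch.

```typescript
// Assumed, simplified shapes; the real types live in src/resolution/types.ts.
interface Ref { name: string; fromSymbol: string }
interface ResolverSketch {
  name: string;
  claimsReference?(refName: string): boolean;
  resolve(ref: Ref): string | null;
}

// Modeled on the django case: claim `_iterable_class`, resolve it to the default iterable.
const djangoLikeResolver: ResolverSketch = {
  name: "django-like",
  claimsReference: (refName) => refName === "_iterable_class",
  resolve: (ref) => (ref.name === "_iterable_class" ? "ModelIterable.__iter__" : null),
};

function resolveOneSketch(
  ref: Ref,
  declared: Set<string>,
  resolvers: ResolverSketch[],
): string | null {
  // Pre-filter: a ref whose name matches no declared symbol is dropped early,
  // unless some resolver claims it.
  const claimed = resolvers.some((r) => r.claimsReference?.(ref.name) === true);
  if (!declared.has(ref.name) && !claimed) return null;
  for (const r of resolvers) {
    const target = r.resolve(ref);
    if (target !== null) return target;
  }
  return declared.has(ref.name) ? ref.name : null;
}

const declaredSymbols = new Set(["_fetch_all", "ModelIterable.__iter__", "execute_sql"]);
const iterableRef: Ref = { name: "_iterable_class", fromSymbol: "_fetch_all" };

const withoutResolver = resolveOneSketch(iterableRef, declaredSymbols, []);
const withResolver = resolveOneSketch(iterableRef, declaredSymbols, [djangoLikeResolver]);
console.log(withoutResolver, withResolver);
```

Without the resolver the ref is filtered out because `_iterable_class` is an attribute, not a declared symbol. With it, the ref resolves to `ModelIterable.__iter__`, which is the hop that lets the trace reach `execute_sql`.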
+
+### 3b. Excalidraw observer + EventEmitter — the **synthesizer** (TS)
+- **Hole:** `Scene.triggerUpdate` does `for (cb of this.callbacks) cb()`; `triggerRender`
+  is registered via `scene.onUpdate(this.triggerRender)`. The `triggerUpdate →
+  triggerRender` edge is dynamic → `trace` returned no path; the whole update flow broke.
+- **Fix:** a whole-graph pass that detects registrar/dispatcher channels, correlates
+  registration sites, and synthesizes `dispatcher → callback` edges. Plus extraction of
+  **named** inline callbacks so handlers like express's `function onmount(){}` are nodes.
+- **Files:** `src/resolution/callback-synthesizer.ts` (the pass — field observers +
+  EventEmitter), `src/resolution/index.ts` (calls `synthesizeCallbackEdges()` at the end
+  of `resolveAndPersistBatched`), `src/extraction/tree-sitter.ts` (`visitFunctionBody`
+  extracts named nested functions).
+- **Result:** `trace(mutateElement, triggerRender)` → 3 hops; express `use → onmount`.
+
+---
+
+## 4. The repeatable methodology (run this per language/framework)
+
+### Step 1 — Pick the framework's canonical *flow* question
+Every framework has a signature data/control flow. Pick the "how does X reach/become Y"
+question and a real repo (add to `.claude/skills/agent-eval/corpus.json`). Examples:
+- React state→DOM, Vue reactive→render, Svelte store→update
+- Rails request→controller→view, Spring request→`@Controller`→service
+- Express/Koa request→middleware→handler, FastAPI request→route→dependency
+- Redux action→reducer→store, RxJS subscribe→operator→observer
+- Any ORM: query builder → SQL execution (django pattern)
+
+### Step 2 — Measure the hole (deterministic, no agent)
+```bash
+rm -rf <repo>/.codegraph && ( cd <repo> && codegraph init -i )
+node scripts/agent-eval/probe-trace.mjs <repo> <from-symbol> <to-symbol>   # does the flow break? where?
+node scripts/agent-eval/probe-node.mjs  <repo> <break-symbol>              # trail: is the next hop missing?
+```
+A "No direct call path … breaks at dynamic dispatch" + a sparse trail at the break
+point **locates the hole** (this is exactly how `_iterable_class` and `triggerUpdate`
+were found). Confirm it's dynamic by reading the break symbol's body.
+
+### Step 3 — Classify → choose the mechanism (use the §2 table)
+- `self.<attr>(...)` / descriptor / metaclass → **resolver** (§3a).
+- `for(cb of store)cb()` / `store.forEach(cb=>cb())` → **field-observer synthesizer** (§3b).
+- `on('e',fn)` + `emit('e')` → **EventEmitter synthesizer** (§3b).
+- Inline handler not a node → **named:** extraction (already done generically in
+  `tree-sitter.ts`); **anonymous:** synthesizer link-through-body (not yet built).
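The triage above can be approximated mechanically when scanning many break points. This is a rough helper sketched for illustration, not part of codegraph: the regexes are assumptions, and `declaredNames` stands for the symbols the index already knows. Reading the body remains the real check.

```typescript
type Mechanism =
  | "resolver"
  | "field-observer synthesizer"
  | "EventEmitter synthesizer"
  | "unclassified";

// Classify the body of the symbol where a trace breaks.
function classifyBreak(body: string, declaredNames: string[]): Mechanism {
  // on('e', fn) / emit('e'): string-keyed channel.
  if (/\.(?:on|once|emit)\(\s*['"]/.test(body)) return "EventEmitter synthesizer";

  // for (cb of store) cb()  or  store.forEach(cb => cb()): field-backed observer.
  const loopCall = /for\s*\([^)]*\bof\b[^)]*\)\s*\{?\s*\w+\(/;
  const forEachCall = /\.forEach\(\s*\(?\w+\)?\s*=>\s*\w+\(/;
  if (loopCall.test(body) || forEachCall.test(body)) return "field-observer synthesizer";

  // self.<attr>(...) / this.<attr>(...) where <attr> is not a declared symbol:
  // a named ref exists but points at nothing static, so a resolver can claim it.
  const attrCall = /(?:self|this)\.(\w+)\(/g;
  let m: RegExpExecArray | null;
  while ((m = attrCall.exec(body)) !== null) {
    const attr = m[1];
    if (attr && declaredNames.indexOf(attr) === -1) return "resolver";
  }
  return "unclassified";
}

const fetchAllBody =
  "self._result_cache = list(self._iterable_class(self))\nself._prefetch_related_objects()";
console.log(classifyBreak(fetchAllBody, ["_prefetch_related_objects"]));
console.log(classifyBreak("for (const cb of this.callbacks) cb();", []));
console.log(classifyBreak("this.emit('mount', parent);", []));
```

The order of checks matters: the EventEmitter test runs first so that `this.emit('e')` is not mistaken for an undeclared attribute call.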
+
+### Step 4 — Implement
+- **Resolver:** add to `src/resolution/frameworks/<lang>.ts` — a `resolve()` branch +
+  `claimsReference(name)` if the ref name isn't a declared symbol. Copy `djangoResolver`.
+- **Synthesizer channel:** extend `src/resolution/callback-synthesizer.ts` — add the
+  framework's registrar/dispatcher **name patterns** and **body patterns** (e.g. signals
+  use `.connect()`/`.emit()`; Rx uses `.subscribe()`/`.next()`).
+- Reindex (Step 2 command) and re-run `probe-trace` — the flow should now connect.
+
+### Step 5 — Validate (the same way every time)
+1. **Deterministic:** `probe-trace(from,to)` finds the path; `probe-node` shows the
+   bridged hop. The previously-broken hop is closed.
+2. **Precision:** count + spot-check synthesized/resolved edges — no explosion, correct targets:
+   ```bash
+   sqlite3 <repo>/.codegraph/codegraph.db \
+     "select s.name||' → '||t.name||'  '||coalesce(e.metadata,'') from edges e \
+      join nodes s on e.source=s.id join nodes t on e.target=t.id where e.provenance='heuristic';"
+   ```
+   (Resolver edges aren't `heuristic`; verify via the trace + callees instead.)
+3. **Regression:** node count stable (`select count(*) from nodes;` before/after — a big
+   jump means an extraction change over-fired); existing traces on a control repo intact.
+4. **End-to-end agent eval:** run the flow question with codegraph and measure
+   **reads / answer-completeness / cost** vs a pre-fix baseline:
+   ```bash
+   # headless (exact cost + clean tool sequence)
+   bash scripts/agent-eval/run-agent.sh <repo> with "<flow question>"
+   # or the full A/B + interactive Explore-subagent path:
+   scripts/agent-eval/audit.sh local <name> <url> "<flow question>" all
+   ```
+   Then parse: `Read` count, codegraph-tool count, cost, and whether the answer now
+   contains the glue symbols (the ones that previously required a read).
+
+### Success criteria (per language/framework)
+- `trace` finds the canonical flow end-to-end (no dynamic-dispatch break).
+- The agent can answer the flow question with **Read 0** in at least some runs, and the
+  glue symbols appear in the answer.
+- **No node explosion** and no regression on a control repo.
+- Synthesized edges are precise on a spot-check (no generic-name over-linking).
+
+---
+
+## 5. Validation toolkit (reference)
+
+| Tool | Purpose |
+|---|---|
+| `scripts/agent-eval/probe-trace.mjs <repo> <from> <to>` | call-path between two symbols (the hole detector) |
+| `scripts/agent-eval/probe-node.mjs <repo> <sym> [code]` | symbol + trail (callers/callees); `code` adds the body |
+| `scripts/agent-eval/probe-context.mjs <repo> "<task>"` | context output incl. call-paths |
+| `scripts/agent-eval/probe-explore.mjs <repo> "<query>"` | explore output |
+| `scripts/agent-eval/{audit,run-agent,itrun}.sh` | agent A/B (headless + interactive); also the `/agent-eval` skill |
+| `sqlite3 <repo>/.codegraph/codegraph.db` | direct edge/node inspection (provenance, metadata, counts) |
+
+Probe scripts use the built `dist/` — run `npm run build` first. Reindex after any
+extraction or resolution change (`rm -rf <repo>/.codegraph && codegraph init -i`) — the
+synthesizer/resolvers run at index time. Test fixtures: keep a tiny per-pattern fixture
+(see `/tmp/cb-fixture/bus.js`; **move into `__tests__/`** when shipping).
+
+---
+
+## 6. Coverage matrix (fill in as you go)
+
+Status legend: ✅ done+validated · 🔬 hole identified · ⬜ not started.
+`Mechanism`: R = resolver, S = synthesizer channel, X = extraction.
+
+| Language | Framework(s) | Canonical flow to test | Mechanism | Status |
+|---|---|---|---|---|
+| TypeScript/JS | React / observer / EventEmitter | state→render; dispatch→callback | S + X | ✅ (excalidraw) |
+| TypeScript/JS | Vue / Nuxt | reactive dep → render | ? | ⬜ |
+| TypeScript/JS | Svelte / SvelteKit | store → DOM update | ? | ⬜ |
+| TypeScript/JS | Express / Koa | request → middleware → handler | ? | ⬜ |
+| TypeScript/JS | NestJS | request → controller → provider | ? | ⬜ |
+| TypeScript/JS | RxJS / signals | subscribe → operator → observer | S | ⬜ |
+| Python | Django ORM | QuerySet → SQL compiler | R | ✅ |
+| Python | Django (views/signals) | url → view; signal → receiver | R/S | 🔬 (routes done; signals ⬜) |
+| Python | Flask / FastAPI | request → route → dependency | R | 🔬 (routes done) |
+| Go | Gin / net/http | request → handler chain | ? | ⬜ |
+| Rust | Axum / Cargo workspace | request → handler; trait dispatch | R | 🔬 (workspaces done) |
+| Java | Spring | request → @Controller → service; DI | ? | ⬜ |
+| Kotlin | (coroutines / DI) | flow/callback dispatch | ? | ⬜ |
+| Swift | Vapor | request → route → controller | ? | ⬜ |
+| C# | ASP.NET | request → controller; DI | ? | ⬜ |
+| Ruby | Rails / Sinatra | request → controller → view; callbacks | ? | ⬜ |
+| PHP | Laravel / Drupal | request → controller; events | ? | ⬜ |
+| C/C++ | (callback structs / vtables) | function-pointer dispatch | ? | ⬜ |
+| Dart | Flutter | setState → build | S | ⬜ |
+| Lua / Luau | (Neovim / Roblox) | event/callback dispatch | S | ⬜ |
+| Scala | (Akka / Play) | actor message → handler | ? | ⬜ |
+
+(Verify the exact supported set against `src/extraction/languages/` and
+`src/resolution/frameworks/` before starting — this table is a starting point.)
+
+---
+
+## 7. Known limits & gotchas (from the excalidraw/django work)
+
+- **Coverage enables, doesn't force, the no-read path.** Agents still read to *confirm
+  source* sometimes; cost stays ~flat (codegraph calls trade for reads). The reliable
+  win is **completeness** + making Read-0 *possible*. Don't expect a guaranteed cost drop.
+- **Difficulty gradient is real:** named-ref dispatch (resolver) is cheap; anonymous
+  callback dispatch (synthesizer) is medium; **anonymous-arrow handlers are the hard
+  remaining gap** (no identity → need synthesizer link-through-body, not yet built).
+- **Extraction changes are high blast radius.** The Phase-3 named-inline-callback
+  extraction is in the *shared* `tree-sitter.ts` walker — re-check **node counts across
+  several languages** after any extraction change (it held at +3 on excalidraw because
+  anonymous arrows are skipped).
+- **Synthesizer precision guards:** registrar-name uniqueness, named-only handlers, and
+  an event **fan-out cap** (skip generic events like `error`/`change`). Receiver-type
+  matching (via `type_of` edges) is the planned precision upgrade — deferred.
+- **As-built shortcuts** (callback synthesizer): pairs registrar/dispatcher by *file*+field
+  (class proxy), regex arg-recovery (named refs only), `provenance:'heuristic'` +
+  `metadata.synthesizedBy` (the enum has no `'callback-synthesis'`). See the design doc.
+- **Synthesizer runs only in `resolveAndPersistBatched`** (full index) — wire into
+  `resolveAndPersist` for incremental sync before shipping.
+- **Symbol ambiguity in `trace`:** common names (`render`, `execute_sql`) match many
+  nodes; trace picks among them and may start from the wrong one. Trace from the specific
+  method, not a class name.
+
+---
+
+## 8. Definition of done (the whole mission)
+
+For each language × framework: the canonical flow `trace`s end-to-end; an agent can
+answer the flow question with Read 0 in at least some runs, with the glue symbols present;
+and there is no node explosion and no regression. Record each result in the matrix (§6)
+with the validating repo + numbers. Then ship-prep: tests per mechanism, CHANGELOG, wire incremental, commit.

+ 19 - 0
scripts/agent-eval/block-read-hook.sh

@@ -0,0 +1,19 @@
+#!/usr/bin/env bash
+# PreToolUse hook (experiment): deny Read of codegraph-indexed source files and
+# steer the agent to codegraph_explore/codegraph_node instead. Tests whether
+# codegraph can FULLY replace Read for code-understanding once the escape hatch
+# is removed. Non-source reads (config, .env, markdown, new files) pass through.
+#
+# Wire via:  claude ... --settings scripts/agent-eval/hook-settings.json
+set -uo pipefail
+input="$(cat)"
+fp="$(printf '%s' "$input" | jq -r '.tool_input.file_path // empty' 2>/dev/null)"
+
+case "$fp" in
+  *.ts|*.tsx|*.js|*.jsx|*.mjs|*.cjs|*.py|*.go|*.rs|*.java|*.rb|*.php|*.swift|*.kt|*.kts|*.c|*.cc|*.cpp|*.h|*.hpp|*.cs|*.lua|*.vue|*.svelte)
+    msg="Read is disabled for source files in this session — codegraph already has this file indexed (with line numbers, kept in sync on every change). Use codegraph_explore (several related symbols at once) or codegraph_node (one symbol's full source). If a symbol you need wasn't in a prior explore, run ANOTHER codegraph_explore with its exact name instead of reading the file."
+    jq -n --arg m "$msg" '{reason:$m, hookSpecificOutput:{hookEventName:"PreToolUse",permissionDecision:"deny",permissionDecisionReason:$m}}'
+    exit 0
+    ;;
+esac
+exit 0

+ 15 - 0
scripts/agent-eval/hook-settings.json

@@ -0,0 +1,15 @@
+{
+  "hooks": {
+    "PreToolUse": [
+      {
+        "matcher": "Read",
+        "hooks": [
+          {
+            "type": "command",
+            "command": "bash /Users/colby/Development/Personal/codegraph/scripts/agent-eval/block-read-hook.sh"
+          }
+        ]
+      }
+    ]
+  }
+}

+ 21 - 0
scripts/agent-eval/probe-context.mjs

@@ -0,0 +1,21 @@
+#!/usr/bin/env node
+// Probe codegraph_context (with call-paths) against an index using the built dist.
+// Usage (from the codegraph repo root; dist/ resolves against cwd): node scripts/agent-eval/probe-context.mjs <repo-with-.codegraph> <task words...>
+import { pathToFileURL } from 'node:url';
+import { resolve } from 'node:path';
+
+const [, , repo, ...taskParts] = process.argv;
+const task = taskParts.join(' ');
+if (!repo || !task) { console.error('usage: probe-context.mjs <repo> <task...>'); process.exit(1); }
+
+const load = async (rel) => import(pathToFileURL(resolve(rel)).href);
+const idx = await load('dist/index.js');
+const tools = await load('dist/mcp/tools.js');
+const CodeGraph = idx.default?.default ?? idx.default ?? idx.CodeGraph;
+const ToolHandler = tools.ToolHandler ?? tools.default?.ToolHandler;
+
+const cg = CodeGraph.openSync(repo);
+const h = new ToolHandler(cg);
+const res = await h.execute('codegraph_context', { task });
+console.log(res.content?.[0]?.text ?? '(no text)');
+try { cg.close?.(); } catch {}

+ 40 - 0
scripts/agent-eval/probe-explore.mjs

@@ -0,0 +1,40 @@
+#!/usr/bin/env node
+// One-shot probe: run handleExplore against an existing index using the built
+// dist, print the output + a few stats. Lets us verify explore's coverage fix
+// without a full agent run. Usage (from the codegraph repo root): node scripts/agent-eval/probe-explore.mjs <repo-with-.codegraph> "<query>"
+import { pathToFileURL } from 'node:url';
+import { resolve } from 'node:path';
+
+const [, , repo, query] = process.argv;
+if (!repo || !query) {
+  console.error('usage: probe-explore.mjs <repo> "<query>"');
+  process.exit(1);
+}
+
+const load = async (rel) => import(pathToFileURL(resolve(rel)).href);
+const idx = await load('dist/index.js');
+const tools = await load('dist/mcp/tools.js');
+
+// esModuleInterop: dynamic import of CJS yields { default: module.exports, ...named }
+const CodeGraph = idx.default?.default ?? idx.default ?? idx.CodeGraph;
+const ToolHandler = tools.ToolHandler ?? tools.default?.ToolHandler;
+
+if (typeof CodeGraph?.openSync !== 'function') {
+  console.error('could not resolve CodeGraph.openSync; index keys:', Object.keys(idx), 'default keys:', idx.default && Object.keys(idx.default));
+  process.exit(2);
+}
+if (typeof ToolHandler !== 'function') {
+  console.error('could not resolve ToolHandler; tools keys:', Object.keys(tools));
+  process.exit(2);
+}
+
+const cg = CodeGraph.openSync(repo);
+const h = new ToolHandler(cg);
+const res = await h.execute('codegraph_explore', { query });
+const text = res.content?.[0]?.text ?? '(no text)';
+console.log(text);
+console.error('\n--- PROBE STATS ---');
+console.error('output chars:', text.length);
+console.error('triggerRender body present (-> setState({})):', /triggerRender[\s\S]{0,400}setState\(\{\}\)/.test(text));
+console.error('App.tsx in source section:', /#### .*App\.tsx —/.test(text));
+try { cg.close?.(); } catch {}

+ 20 - 0
scripts/agent-eval/probe-node.mjs

@@ -0,0 +1,20 @@
+#!/usr/bin/env node
+// Probe codegraph_node (with trail) against an index using the built dist.
+// Usage (from the codegraph repo root; dist/ resolves against cwd): node scripts/agent-eval/probe-node.mjs <repo-with-.codegraph> <symbol> [code]
+import { pathToFileURL } from 'node:url';
+import { resolve } from 'node:path';
+
+const [, , repo, symbol, code] = process.argv;
+if (!repo || !symbol) { console.error('usage: probe-node.mjs <repo> <symbol> [code]'); process.exit(1); }
+
+const load = async (rel) => import(pathToFileURL(resolve(rel)).href);
+const idx = await load('dist/index.js');
+const tools = await load('dist/mcp/tools.js');
+const CodeGraph = idx.default?.default ?? idx.default ?? idx.CodeGraph;
+const ToolHandler = tools.ToolHandler ?? tools.default?.ToolHandler;
+
+const cg = CodeGraph.openSync(repo);
+const h = new ToolHandler(cg);
+const res = await h.execute('codegraph_node', { symbol, includeCode: code === 'code' });
+console.log(res.content?.[0]?.text ?? '(no text)');
+try { cg.close?.(); } catch {}

+ 20 - 0
scripts/agent-eval/probe-trace.mjs

@@ -0,0 +1,20 @@
+#!/usr/bin/env node
+// Probe codegraph_trace against an index using the built dist.
+// Usage (from the codegraph repo root; dist/ resolves against cwd): node scripts/agent-eval/probe-trace.mjs <repo-with-.codegraph> <from> <to>
+import { pathToFileURL } from 'node:url';
+import { resolve } from 'node:path';
+
+const [, , repo, from, to] = process.argv;
+if (!repo || !from || !to) { console.error('usage: probe-trace.mjs <repo> <from> <to>'); process.exit(1); }
+
+const load = async (rel) => import(pathToFileURL(resolve(rel)).href);
+const idx = await load('dist/index.js');
+const tools = await load('dist/mcp/tools.js');
+const CodeGraph = idx.default?.default ?? idx.default ?? idx.CodeGraph;
+const ToolHandler = tools.ToolHandler ?? tools.default?.ToolHandler;
+
+const cg = CodeGraph.openSync(repo);
+const h = new ToolHandler(cg);
+const res = await h.execute('codegraph_trace', { from, to });
+console.log(res.content?.[0]?.text ?? '(no text)');
+try { cg.close?.(); } catch {}