
chore: consolidate session handoffs (cross-language impact coverage)

Colby McHenry 2 weeks ago
commit
8530cc2558

+ 0 - 114
.claude/handoffs/codegraph-tool-surface-rethink-2026-05-27.md

@@ -1,114 +0,0 @@
----
-name: codegraph-tool-surface-rethink-2026-05-27
-date: 2026-05-27 15:11
-project: codegraph
-branch: feat/go-multi-module-trace-quality
-summary: PR #494 multi-language audit revealed structural ~$0.04-$0.08 tiny-repo cost overhead from MCP tool-defs; user pivoted to questioning whether codegraph_context / 5+ tools are even necessary — suggested `explore` + `trace` only.
----
-
-# Handoff: Should codegraph cut to just `explore` + `trace`?
-
-## Resume here — read this first
-**Current state:** PR #494 (`feat/go-multi-module-trace-quality`, 13 commits, all 1076 tests pass) ships every safe optimization for the cosmos/etcd Go work AND the cross-language extensions (generated-detection, IFACE_OVERRIDE_LANGS, sibling-inlining, path-proximity, tool gating at <150 files to 5 core tools). Empirically PROVED that cutting below 5 tools regresses every tiny repo (3-tool gate: cobra 17→48% loss; 1-tool gate: express -43% WIN flipped to +107% LOSS). User just asked the right question: **"Why do we need codegraph_context, or any of these massive amounts of tools? All it really needs is explore, and trace if you ask me."**
-
-**Immediate next step:** Open the next session by treating the user's question as a design pivot, not a continuation of the cost-gap whack-a-mole. The right reply is a focused honest analysis: what does each of the 10 tools actually do that explore + trace alone can't, where does codegraph_context's value-add hold up (or not), and what would removing context/search/node from the default surface ACTUALLY cost in measured loss-of-flow-coverage. Don't start cutting tools yet — present the analysis first.
-
-> Suggested next message: "Walk me through what each codegraph_* tool actually does on a real flow question that explore + trace alone can't, and which ones agents are picking in our recent audits. If context/search/node aren't earning their seat, propose cutting them and measure on cosmos-Q1 + etcd-Q1 + prometheus + cobra n=2 each."
-
-## Goal
-Decide whether codegraph's 10-tool MCP surface should be cut down to ~2 core tools (explore + trace) as the user proposed. The empirical iteration in this session showed that the 5 omitted "auxiliary" tools (callers, callees, impact, status, files) only add cost on tiny repos and aren't earning their seat. The real question now: **does the same logic apply to context + search + node?** If yes, codegraph becomes 2 tools + a smaller MCP surface = lower fixed prompt overhead = closes the tiny-repo cost gap structurally instead of patching it. If no, name the specific flows where they do unique work.
-
-## Key findings (this session)
-
-- **PR #494 status**: 13 commits, all 1076 tests pass, https://github.com/colbymchenry/codegraph/pull/494. Already pushed:
-  - Generated-file detection: `src/extraction/generated-detection.ts` (multi-language patterns, applied in `findSymbol`/`findAllSymbols`/`handleSearch`/`handleExplore` file ranking/`context/formatter.ts`)
-  - Go gRPC bridge: `goGrpcStubImplEdges` in `src/resolution/callback-synthesizer.ts:341` (467 bridge edges on cosmos-sdk)
-  - Trace failure inlining + path-proximity pairing + less-canonical-path penalty + sibling-from-TO-file inlining: all in `src/mcp/tools.ts` `handleTrace`
-  - `IFACE_OVERRIDE_LANGS` extended from `{java,kotlin}` to `{java,kotlin,csharp,typescript,javascript,swift,scala}`; loop iterates `class` AND `struct` kinds
-  - Tool-def trims (~7KB → 5KB) in `src/mcp/tools.ts`
-  - Tiny-repo tool gating: `ToolHandler.getTools()` filters to 5 core tools when `fileCount < 150`
-  - Tiny-tier explore budget in `getExploreOutputBudget(fileCount < 150)`: 13K total / 4 files / `includeRelationships: true`
-  - `handleContext` default `maxNodes` drops from 20 → 8 when `fileCount < 150`
-- **Cosmos Q1 flipped**: WIN ($0.257 vs $0.449, n=1; n=2 avg $0.341 vs $0.350 tied). The breakthrough was `inlineEndpoint`'s "Other functions in TO's file" siblings — `msgServer.Send`'s real callee `k.Keeper.SendCoins` is an embedded-interface call tree-sitter can't statically resolve, so static `getCallees` returns only utility funcs; the *actual* flow lives in `x/bank/keeper/send.go`'s file-mates. See `handleTrace` line ~1430.
-- **Empirical lower bounds on tool gating** (n=2-3 audits):
-  - 5 tools (search+context+node+explore+trace) = current setting, works
-  - 3 tools (search+context+trace) = cobra 17→48% loss, sinatra 18→96% loss; agent falls back to Reads when node/explore unavailable
-  - 1 tool (search only) = catastrophic, express -43% WIN → +107% LOSS
-- **n=3 measurements confirm structural floor:** cobra WITH consistently $0.28 (variance <5%), WITHOUT consistently $0.24. The $0.04 gap is structural, not noise.
-- **The user's pivot question challenges this:** their hypothesis is that context+search+node may also be earning less than they cost. The audits we have can't directly answer that — every test had all 10 (or 5) tools available. To test, expose ONLY explore+trace on a controlled batch and re-measure.
-- **Cross-language status (single-run each):** WINS = Go (multi-mod), Rust, Java, C#, Kotlin, Swift, Svelte, prometheus, ky (post-gating), express (JS). TIES = cobra (n=2 tied $0.27/$0.27), excalidraw, django, redis, json, Masonry, flutter, vapor, spring. LOSSES = sinatra, slim, flask, scala-play, Fusion, vue-core (variance), Drupal, NestJS, FastAPI, Laravel, ASP.NET, axum, actix, Rocket, gorilla/mux, SvelteKit, Charts bridge (slight), RN segmented-control (slight).
-- **Loss pattern is structural, not language-specific.** All losses are tiny example/starter repos where the without-arm grep+read path costs ~$0.20-0.30 and codegraph's MCP overhead can't be amortized.
-
-## Gotchas
-
-- **PR-494 is a Go-multi-module PR by title but the body is now cross-cutting** — generated-detection, IFACE_OVERRIDE_LANGS, tool gating, all language-agnostic. Don't let the title narrow what's in it.
-- **The variance on the WITHOUT arm is enormous** — same-repo single-run cost can swing $0.04 to $0.80 depending on whether the agent goes grep-heavy or read-heavy that turn. **Never conclude WIN/LOSS from n=1.** The session has many single-run results that need confirming.
-- **Cobra (~50 files) is the canary** — every aggressive cut that helps ky or sinatra has regressed cobra at least once. It's the most-tested tiny repo because of that.
-- **Don't try the 1-tool or 3-tool gate again** — both are explicitly documented as regressions in `getTools()` comments (`src/mcp/tools.ts` around line 660). Cutting below 5 forces the agent to Read.
-- **Kong's first audit was a 0-byte index** — parallel `audit.sh` runs against the same .codegraph dir can corrupt each other. If kong/any-repo's audit shows wildly wrong numbers, check `stat /tmp/codegraph-corpus/<repo>/.codegraph/codegraph.db` before iterating on the result.
-- **48-parallel audit launches FAIL silently** — system resource limits. Stay at 6-8 parallel max. Use `wait` between waves.
-- **The MCP daemon caches the tool list** at process start — when iterating on `getTools()` you MUST `pkill -f "codegraph.js serve --mcp"` between rebuilds or you'll be testing stale code.
-- **`maxCharsPerFile` monotonic invariant** is pinned by `__tests__/explore-output-budget.test.ts` (the spec is `a larger tier must NEVER get a smaller maxCharsPerFile than a smaller tier`). Honor it.
-
-## How to test & validate
-
-- `npm test` → "Tests 1076 passed | 2 skipped". Must stay green.
-- `npm run build 2>&1 | tail -3` → check dist rebuilt cleanly.
-- `pkill -f "codegraph.js serve --mcp" ; sleep 2` → ALWAYS run before agent-eval after a build, otherwise the daemon serves stale code.
-- Single-question audit: `AGENT_EVAL_OUT=/tmp/cg-NAME /Users/colby/Development/Personal/codegraph/scripts/agent-eval/run-all.sh <repo-path> "<question>" headless`. Outputs `run-headless-with.jsonl` and `run-headless-without.jsonl`.
-- Parse: `node scripts/agent-eval/parse-run.mjs /tmp/cg-NAME/run-headless-{with,without}.jsonl` → cost, duration, turns, tool sequence.
-- **For real conclusions, always n=2 minimum.** n=3 is the right bar to separate variance from signal — last session's data on cobra showed WITH had <5% variance but WITHOUT swung 95%.
-- **The explore + trace experiment** the user wants: modify `getTools()` to filter visible tools to `new Set(['codegraph_explore', 'codegraph_trace'])` for ALL repos (or just the tiny tier first), re-run cosmos-Q1, etcd-Q1, prometheus, cobra n=2 each, and compare.
-
-## Repo state
-
-- branch `feat/go-multi-module-trace-quality`, last commit `ae5364c docs(mcp): pin empirical lower bound on tool gating after n=2 micro test`
-- uncommitted: clean
-- PR: https://github.com/colbymchenry/codegraph/pull/494 (13 commits, ready for review unless we land the tool-surface redesign)
-
-## Open threads / TODO
-
-- [ ] **The user's pivot**: prove or disprove that explore + trace alone is sufficient. Set up a 4-repo × n=2 batch (cosmos-Q1, etcd-Q1, prometheus, cobra) with ONLY explore+trace exposed, compare to current 5-tool / 10-tool baselines.
-- [ ] If explore+trace alone wins → cut the tool surface across the board. **This is a breaking API change** — callers/callees/impact/status/files/node would disappear from default exposure. Need a clean way to retain them for users who script against the MCP directly (env var? `--full-tools` flag?).
-- [ ] If explore+trace alone loses → identify which of context/search/node is doing the structural work, and propose cutting only the others.
-- [ ] **README update either way**: the current "~35% cheaper" claim averages 7 medium/large repos. Either commit to that scope ("real codebases (~200+ files)") or re-measure after the tool surface change.
-- [ ] Liquid, Pascal/Delphi, React Router, TurboModules, Expo Modules, Paper view managers — still untested categories from the README. Bridges Swift↔ObjC/RN-legacy/RN-events/Fabric were tested in wave 3 — 1 win, 2 tied, 1 slight loss. The rest are still gaps.
-- [ ] If we ship the PR as-is, write a CHANGELOG entry under `[Unreleased]` summarizing the 13 commits — currently the CHANGELOG entry covers commits 1-2 (generated-detection + gRPC bridge + trace UX); commits 3-13 need their own bullets.
-
-## Recent transcript (last ~10 turns, oldest → newest)
-
-### Turn N-9 — User: "Have we tested every language/framework per the playbook?"
-- **Claude:** Audited coverage honestly — ~70% of README's claims (22 langs + 14 frameworks + 8 bridges) had NOT been validated. Offered three scopes (smoke / medium / full playbook).
-- **Outcome:** User chose "Full playbook"; later refined to "Smoke-test only the completely-untested categories."
-
-### Turn N-8 — Wave 1 smoke-test (8 parallel: JS/Dart/Drupal/NestJS/FastAPI/Vapor/Spring/Rails)
-- **Claude:** Launched 8 audits via `run-all.sh`. Results: express WIN (-43%), 3 tied, 4 LOSS (Drupal, NestJS, FastAPI, Rails). NestJS 113% loss, FastAPI 194% loss were the standouts.
-- **Outcome:** Documented the loss pattern correlates with repo size, not language.
-
-### Turn N-7 — Wave 2 (7 frameworks: Laravel, ASP.NET, axum, actix, Rocket, gorilla/mux, SvelteKit) + Wave 3 (4 bridges: Charts, async-storage, RN-geolocation, RN-segmented-control)
-- **Claude:** All wave-2 frameworks lost on cost (small example/starter repos). All wave-3 bridges won/tied on cost AND won on reads.
-- **Outcome:** Confirmed loss pattern is tiny-repo-structural, not framework-specific. Bridges (codegraph's marquee feature) work.
-
-### Turn N-6 — User: "We have a lot of losses. Work on those until they are all clear wins. You cannot stop or give up or push back."
-- **Claude:** Ran n=3 on the 8 worst losses to separate variance from signal. First attempt with 48 parallel processes silently failed (system limits). Re-ran in waves of 6.
-- **Outcome:** n=3 cobra: WITH consistent $0.28 (variance <5%), WITHOUT consistent $0.24. **Confirmed losses are structural, not variance.**
-
-### Turn N-5 — Tested 3-tool gate (search + context + trace only)
-- **Claude:** Edited `getTools()` to expose only 3 tools on `<150` files. Re-ran cobra/ky/sinatra. Results: cobra 17→48% LOSS, ky 47→42% loss, sinatra 18→96% LOSS. Reverted.
-- **Outcome:** Cutting below 5 tools regresses because the agent falls back to extra Reads.
-
-### Turn N-4 — Tested 1-tool gate (search only) on micro (<50 files)
-- **Claude:** Edited gate. cobra 126% LOSS, ky 44% loss, express FLIPPED from -43% WIN to +107% LOSS. Catastrophic. Reverted.
-- **Outcome:** Documented the empirical lower bound: 5 tools. Committed `ae5364c` with comments warning future tuners not to relitigate.
-
-### Turn N-3 — Claude proposed three paths forward (auto-disable / README honesty / context-rewrite)
-- **Claude:** Used AskUserQuestion to offer auto-disable on <50, honest README, or last-ditch context shrink.
-- **Outcome:** User pivoted instead of picking any of those.
-
-### Turn N-2 — User: "Why do we need codegraph_context, or any of these massive amounts of tools? All it really needs is explore, and trace if you ask me."
-- **Claude:** Recognized this is a design pivot, not a continuation of the cost-gap iteration. The right next move is to actually test the user's hypothesis, not to write another response defending the status quo.
-- **Outcome:** This handoff captures the pivot for a fresh session to answer properly.
-
-### Turn N-1 — User: `/handoff save`
-- **Claude:** Wrote this file.
-- **Outcome:** Handoff persisted. Next session reads it and engages the explore+trace-only design question with measurement, not opinion.
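
The tiny-repo gating and the proposed explore + trace experiment from the deleted handoff above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/mcp/tools.ts`: the function name `getVisibleTools`, the `ToolDef` shape, the `experiment` flag, and the `codegraph_` prefix on the tool names the handoff gives only in short form are all assumptions. Only the 150-file threshold, the 5-tool core set, and the `codegraph_explore` / `codegraph_trace` pair come from the notes.

```typescript
// Hypothetical sketch of tool gating; not the real ToolHandler.getTools().
interface ToolDef {
  name: string;
  description: string;
}

// The 10 tools named in the handoff (the `codegraph_` prefix is assumed
// for the ones the notes only mention by short name).
const ALL_TOOLS: ToolDef[] = [
  'search', 'context', 'node', 'explore', 'trace',
  'callers', 'callees', 'impact', 'status', 'files',
].map((short) => ({ name: `codegraph_${short}`, description: `${short} tool` }));

// The 5-tool core that the handoff reports as the empirical lower bound.
const CORE_TOOLS = new Set(
  ['search', 'context', 'node', 'explore', 'trace'].map((s) => `codegraph_${s}`),
);

// The explore + trace experiment the user proposed.
const EXPERIMENT_TOOLS = new Set(['codegraph_explore', 'codegraph_trace']);

// Repos below this file count are treated as "tiny" in the notes.
const TINY_REPO_FILE_LIMIT = 150;

function getVisibleTools(fileCount: number, experiment: boolean = false): ToolDef[] {
  if (experiment) {
    // Experiment arm: expose only explore + trace, regardless of repo size.
    return ALL_TOOLS.filter((t) => EXPERIMENT_TOOLS.has(t.name));
  }
  if (fileCount < TINY_REPO_FILE_LIMIT) {
    // Tiny repos: hide the five auxiliary tools to cut fixed prompt overhead.
    return ALL_TOOLS.filter((t) => CORE_TOOLS.has(t.name));
  }
  return ALL_TOOLS;
}
```

Keeping the experiment behind an explicit flag, rather than editing the core set in place, would let the 5-tool baseline and the 2-tool arm be measured from the same build.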

File diff suppressed because it is too large
+ 22 - 10
.claude/handoffs/cross-language-impact-coverage-2026-06-04.md


+ 0 - 70
.claude/handoffs/explore-flow-tool-adoption.md

@@ -1,70 +0,0 @@
----
-name: explore-flow-tool-adoption
-date: 2026-05-24 00:55
-project: codegraph
-branch: architectural-improvements
-summary: Investigated why codegraph's read savings don't convert to wall-clock; root cause is agent tool-CHOICE (under-uses trace). Shipped a chain of fixes; the breakthrough is "explore-surfaces-flow" — the first mechanism to show up in real agent runs by adapting the tool the agent already uses.
----
-
-# Handoff: codegraph retrieval — tool adoption & explore-surfaces-flow
-
-## Resume here — read this first
-**Current state:** A long investigation into making agents answer flow questions faster with codegraph. 6 commits on `architectural-improvements` (all probe-validated, suite green 815). The breakthrough: **`codegraph_explore` now surfaces the execution flow** from the symbol-bag the agent already passes it (`PmsProductController getList PmsProductService list PmsProductServiceImpl` → leads output with `getList → service-interface → impl`, riding synth edges). It's the FIRST mechanism this whole arc to actually appear in real agent runs (spring-mall A/B: flow surfaced both runs, reads 2.0→1.5) — because it adapts the tool the agent USES instead of trying to make it use `trace`.
-
-**Immediate next step:** The user is weighing how to push tool-USE quality next (their open question). Decide between: (a) **extend explore-flow to surface more reliably** (spring-halo's query didn't name a connected co-named chain → no flow), (b) accept we're at the model-behavior ceiling and **wrap up**, or (c) the user's ideas — better tool-description *examples* (≈ steering, low-leverage per the evidence) or a *query-builder tool* (adds a call + new-tool adoption problem). My read: keep ADAPTING THE USED TOOL (the only thing that's worked); examples/new-tools are the "change the agent" direction that failed all session.
-
-> Suggested next message: "explore-flow only surfaced on 2 of 3 repos — dig into why spring-halo's explore query didn't produce a flow and make it surface more reliably" — OR — "we're at the model-behavior ceiling; let's stop and write the CHANGELOG/PR for this branch"
-
-## Goal
-Make an AI agent answer **flow questions** ("how does X reach Y", request→handler→service, state→render) fast: ~0 Read/Grep, few codegraph calls, lower wall-clock. `codegraph_trace` is the fastest tool (1 call = the path), but the agent under-uses it. Ultimate target = trace's speed, however the agent gets there.
-
-## Key findings (the through-line)
-- **The wall is agent tool-CHOICE, not the graph.** Matrix-wide, codegraph cuts reads −75% but wall-clock only −16% (`docs/benchmarks/codegraph-ab-matrix.md`). The floor is round-trips + the synthesis turn. The agent reliably calls `context`/`explore`, rarely `trace` (3/37 flow cells). Full analysis: `docs/benchmarks/call-sequence-analysis.md`.
-- **Steering does NOT move it** (arms B/F/G, 3 wording variants): an MCP `initialize` instruction / tool description can't match a CLI `--append-system-prompt`'s salience, and forcing trace where it doesn't connect regresses. Reverted.
-- **Sufficiency works** (committed): a self-sufficient `trace` (hop bodies + destination callees inlined) lets the unsteered agent stop — but only when it calls trace.
-- **THE breakthrough — adapt the tool the agent uses.** `explore`'s query is a precise symbol-bag spanning the flow, so `explore` finds the call path AMONG its named symbols and leads with it. First mechanism to surface in real runs + drop reads.
-- **What FAILED:** option 1 (context-surfaces-flow) — fuzzy DESCRIPTION can't disambiguate endpoints → confident WRONG-feature flow; reverted. trace multi-source-BFS over ambiguous names — same wrong-feature; reverted.
-
-## Gotchas
-- **Co-naming disambiguation must match qualifiedName SEGMENTS, not substrings** (`buildFlowFromNamedSymbols` in `src/mcp/tools.ts`): `list` is a substring of `getList` → kept every getList. Split `qualifiedName` on `::`/`.` and match segments.
-- **BFS must cap consecutive UNNAMED hops at 1** — full-graph BFS wanders a god-function's fan-out (excalidraw `render()` → pointer handlers → mutateElement). ≤1 bridge crosses a missing intermediate without wandering.
-- **`getCallees` returns non-`calls` edges too** (references) — filter `c.edge.kind === 'calls'`.
-- **Resolver/synthesizer changes need a CLEAN reindex**: `rm -rf .codegraph && codegraph init -i` (the init edge count is contains-only — query the DB for the real count). The explore-flow change is query-time (no reindex).
-- **n=2 A/B is noisy** — report ranges/patterns, never conclude from one run. Foreground `sleep` is blocked → run A/B batches with `run_in_background`.
-- Java/Kotlin `qualifiedName` is `Class::method` (so `matchesSymbol` resolves `Class.method` qualified trace endpoints — the agent already passes these).
-
-## How to test & validate
-- Probe flow surfacing (no agent): `node scripts/agent-eval/probe-explore.mjs <repo> "<SymbolA SymbolB SymbolC>"` → look for the `## Flow` section. `probe-trace.mjs <repo> <from> <to>` for trace.
-- Synthesizer: `sqlite3 <repo>/.codegraph/codegraph.db "select count(*) from edges where json_extract(metadata,'$.synthesizedBy')='interface-impl'"`; node count stable before/after reindex (synth adds edges only).
-- Agent A/B (the real test): `bash scripts/agent-eval/run-arms.sh <repo> "<Q>" I <run>` (arm I = body-trace build, no steering). Parse via the `cmp2.mjs`-style scripts in `/tmp`. Pass = flow surfaces (`flowShown=Y`) + reads ≤ baseline.
-- `npm test` (vitest, 815 pass); `__tests__/mcp-tool-allowlist.test.ts` covers the allowlist.
-
-## Repo state
-- branch `architectural-improvements`, last commit `bafae81 feat(mcp): codegraph_explore surfaces the execution flow from its named symbols`.
-- uncommitted: clean (only untracked `.claude/handoffs/`).
-- 6 session commits: `eab5cf3` self-sufficient trace + `CODEGRAPH_MCP_TOOLS` allowlist · `a6183d7` research log + arms harness · `bde8c19` node/trace line numbers · `98baf41` Java/Kotlin interface→impl synthesizer · `6f3c468` playbook · `bafae81` explore-surfaces-flow.
-- NOT pushed/merged. No version bump. CHANGELOG `[Unreleased]` has all of it.
-
-## Open threads / TODO
-- [ ] **User's open question** (answer in the next turn): better tool-description *examples* vs a *query-builder tool* vs keep adapting the used tool. Evidence favors the last.
-- [x] explore-flow reliability: now resolves QUALIFIED tokens (`Class.method`) — the agent's most precise input was being dropped by the file-ext strip (`2765c3c`). spring-halo's publish flow stays absent on purpose — it's **reactive/reconciler dispatch** (`publishPost` calls `ReactiveExtensionClient.get`/`awaitPostPublished`, not `PostService.publish`), so there's no static call chain. That's the next COVERAGE frontier (reactive runtimes — like MediatR, Vue Proxy), not an explore-flow bug.
-- [ ] Ship-prep for the whole branch (this arc + the earlier framework sweep): CHANGELOG version block + `package.json` bump + PR to main. Releases go through `.github/workflows/release.yml` only — do NOT `npm publish`.
-- [ ] Frontiers: MediatR (`_mediator.Send`→Handle) and Vue/Compose reactive runtimes are still unbridged dynamic dispatch.
-
-## Recent transcript (oldest → newest)
-### Turn — "improve the A/B matrix; trace works, reads near 0 — what else?"
-- Diagnosed: reads at floor, wall-clock floor = round-trips + synthesis. Built `seq-matrix.mjs`; found trace adoption 3/37.
-### Turn — "do explore/context/trace compete? one tool?"
-- Ablation arms A–E (`run-arms.sh`/`arms-F.sh` + `CODEGRAPH_MCP_TOOLS` allowlist). explore = 68% of payload, load-bearing; trace path-scoped but under-adopted; trace alone insufficient.
-### Turn — "prototype body-inlining trace + A/B"
-- Arm F: self-sufficient trace wins WITH append-prompt steering. But steering isn't a shippable channel.
-### Turn — "port the steering + re-run"
-- Arms G (3 variants) all regressed vs baseline; arm H (body-trace, no steer) ≈ baseline. Steering reverted; body-trace + line-numbers + allowlist committed.
-### Turn — "tee up connectivity (Spring interface-DI)"
-- Built `interfaceOverrideEdges` (Java/Kotlin interface→impl, overload-aware). Probe: 3-hop trace connects. But A/B null — agent never called trace. Committed (probe-validated, adoption-gated).
-### Turn — "make context surface the flow (option 1)"
-- Failed: fuzzy query → wrong-feature flows. Reverted.
-### Turn — "change explore to do trace in the backend"
-- WIN: explore's query is a precise symbol-bag. `buildFlowFromNamedSymbols` (co-naming segment match + ≤1 bridge). Probe perfect (Spring + excalidraw full chains); A/B: flow surfaces + modest read drop. Committed `bafae81`.
-### Turn — "update memory + handoff; what about better examples / a query-builder tool?"
-- This handoff + memory update. Strategic answer pending (adapt-the-tool > change-the-agent).
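
The co-naming gotcha in the deleted handoff above (match `qualifiedName` segments, not substrings) can be illustrated with a small sketch. The helper names below are hypothetical and are not taken from `buildFlowFromNamedSymbols`; only the rule itself (split on `::` or `.` and compare whole segments) and the `list` / `getList` example come from the notes.

```typescript
// Hypothetical helpers illustrating segment matching vs. substring matching.

// Split a qualified name such as `Class::method` or `pkg.Class.method`
// into its non-empty segments.
function qualifiedNameSegments(qualifiedName: string): string[] {
  return qualifiedName.split(/::|\./).filter((segment) => segment.length > 0);
}

// Segment match: the token must equal one whole segment.
function matchesSegment(qualifiedName: string, token: string): boolean {
  return qualifiedNameSegments(qualifiedName).some((segment) => segment === token);
}

// Substring match, shown only to demonstrate the failure mode.
function matchesSubstring(qualifiedName: string, token: string): boolean {
  return qualifiedName.indexOf(token) !== -1;
}
```

With these helpers, `matchesSubstring('PmsProductController::getList', 'list')` is true because `list` occurs inside `getList`, whereas `matchesSegment` on the same inputs is false and `matchesSegment('PmsProductService::list', 'list')` is true. The comparison is case-sensitive, which is an assumption of this sketch.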

+ 0 - 80
.claude/handoffs/explore-overhaul-2026-06-01.md

@@ -1,80 +0,0 @@
----
-name: explore-overhaul-2026-06-01
-date: 2026-06-01 19:50
-project: codegraph
-branch: main
-summary: Made codegraph_explore the sole primary tool (removed context + trace), added graph-connectivity ranking + 100K budget + full method bodies — then an agent-eval revealed the budget BACKFIRES and the real lever is COVERAGE (Zustand store methods aren't indexed).
----
-
-# Handoff: codegraph_explore overhaul — explore as the one tool, and the coverage pivot
-
-## Resume here — read this first
-**Current state:** Big uncommitted working tree on `main`. `codegraph_context` and `codegraph_trace` tools are fully removed; `codegraph_explore` is the sole primary, now with graph-connectivity (RWR) ranking, a flat **100K** output budget, full method bodies, whole-central-file, and an always-on blast-radius section. A fresh-daemon agent-eval on the real repo (`~/Downloads/amniservices-mobile-app`) just proved two things: (1) the **100K budget BACKFIRES** — a broad explore hit **67K chars and overflowed the agent's per-tool token cap**, forcing it to Read; (2) the **real cause of the agent's reads is a COVERAGE gap**, not ranking/budget — Zustand store methods (`fetchUser`/`switchOrganization` inside `create((set,get)=>({...}))`) aren't indexed as nodes, and callers **destructure** them (`const {fetchUser}=useOrgUser.getState()`), so `codegraph_node`/`codegraph_callers` return "not found."
-**Immediate next step:** Revert the 100K budget (it overflows) to ~28–35K, then build the Zustand coverage fix (extract store-literal methods as nodes + resolve destructured `getState()` calls). That's what actually deletes the reads.
-
-> Suggested next message: "Revert the explore budget in getExploreOutputBudget (tools.ts) from 100K back to ~30K — the 67K response overflowed the agent's tool cap. Then build the Zustand coverage fix: extract methods inside `create((set,get)=>({...}))` as nodes, and resolve destructured store calls like `const {fetchUser}=useOrgUser.getState()`. Then kill the AmniSphere daemon and re-run the agent eval."
-
-## Goal
-Make `codegraph_explore` good enough to be a **Read-replacement** — one (maybe two) calls answer a structural/flow question with ~0 Read/Grep, for smart AND dumb models. Metric is wall-clock + tool-call count + Read count (NOT token cost). The user's golden era: one tool (`explore`), reflexively used, zero Reads.
-
-## Key findings
-- **The agent's reads are a COVERAGE gap, not ranking/budget.** Agent's own words (diagnostic eval): Zustand store actions inside the `create((set,get)=>({...}))` literal "aren't individually indexed," so `codegraph_node fetchUser` / `codegraph_callers fetchUser` → **"not found"**; callers **destructure** off `useOrgUser.getState()` so even grep needed `\bfetchUser\b`. Component-body control flow (`handleLogin`, `AppInitializer` in `src/app/index.tsx`, `src/components/providers/index.tsx`) isn't a node either.
-- **The 100K budget backfires.** A broad explore returned ~67K chars and "overflowed the token cap" → agent Read instead. Big responses are *worse*. `getExploreOutputBudget` (tools.ts ~line 140) is now a flat 100K — revert toward ~28–35K (size to the agent's per-tool output limit).
-- **Adoption is EXCELLENT — the agent WANTS codegraph.** In the fresh eval it made **16 codegraph calls** vs 5 Reads. So the problem is never "agent won't use it"; it's "the symbols aren't in the graph."
-- **Graph-connectivity ranking works in isolation but didn't address the real cause.** `computeGraphRelevance` (tools.ts, before `handleExplore`) is RWR/personalized-PageRank from the matched seeds; probe shows it ranks `org-user.storage.ts` #1 and returns it whole. But it doesn't cleanly drop noise (LensSwitcher.swift matched "switch") because real codebases share infra + generic terms — **neither graph nor text alone separates; needs IDF×graph fusion**, a tuning long tail. Park it until coverage is fixed.
-- **`context` + `trace` tools fully removed** (def + dispatch + handlers + CLI `context` command + permissions + server-instructions + tests). The shared engine `findRelevantContext` stays (explore runs on it). `synthEdgeNote` kept (shared); `handleTrace`/`sourceLineAt`/`sourceRangeAt`/`maybeInlineFlowTrace`/`handleContext`/`looksLikeFeatureRequest`/`formatTaskContext` deleted.
-- **Read-gate PreToolUse hook was built then REMOVED** (user: "ideally zero hooks"). Deleted `src/hooks/`, `src/mcp/session-consult.ts`, the `mcp-read-gate` CLI cmd, installer wiring (`InstallOptions.readGate`, claude.ts helpers), and the marker security tests. Had an unverified `CLAUDE_SESSION_ID`==hook-`session_id` assumption.
-- **Precision fix landed earlier (keeper):** `isDistinctiveIdentifier` (query-utils.ts) gates the exact-name bonus in `findRelevantContext` Step 5a so a common word ("flat") can't hijack ranking (was surfacing a python `FLAT` constant). Lives in the shared engine → benefits explore.
-- **Blast-radius section added to explore** (`buildBlastRadiusSection`, tools.ts): per entry symbol, who-depends-on-it + covering test files, locations only. Always-on, compact. (2 tests in `__tests__/explore-blast-radius.test.ts`.)
-
-## Gotchas
-- **STALE-DAEMON FOOT-GUN (cost us hours).** `codegraph serve --mcp` connects to a per-repo daemon (`<repo>/.codegraph/daemon.sock`, 5-min idle timeout) that holds the loaded code. **A `npm run build` does NOT take effect until you kill the daemon.** Every agent-eval before the kill was testing STALE code (agent got 2277 chars where a fresh in-process probe got 54K). **Before ANY agent eval:** `pkill -f "serve --mcp"; rm -f <repo>/.codegraph/daemon.sock`. Worth fixing in the product (a rebuild should invalidate the daemon).
-- **probe ≠ agent.** `probe-explore.mjs` loads `dist/` in-process (always current code); the agent uses the daemon (can be stale). Don't trust a probe result as "what the agent sees" unless the daemon was just killed.
-- **Validating with a favorable query lies.** My probe query (`"org user storage…"`) returned the whole central file; the agent's near-identical query behaved totally differently. Use the agent's EXACT query, on a fresh daemon.
-- **n=1 variance is large** — never conclude from one agent run (CLAUDE.md). The "4 vs 5 reads" between runs is noise.
-- **Budget-table repos (excalidraw/django/etc.) NOT validated** — they're not on this machine. The ranking/budget changes could regress them; the CLAUDE.md "do-not-regress explore budget" table is now obsolete (flat 100K) and needs reconciling.
-- All work is **uncommitted on `main`** — branch before committing (PR policy: main is REVIEW_REQUIRED).
-
-## How to test & validate
-- Build: `npm run build` (must exit 0).
-- Cheap probe (current code, NOT what a stale daemon serves): `node scripts/agent-eval/probe-explore.mjs /Users/colby/Downloads/amniservices-mobile-app "<query>"`.
-- Agent A/B (real metric, ~$2, KILL DAEMON FIRST): `pkill -f "serve --mcp"; rm -f /Users/colby/Downloads/amniservices-mobile-app/.codegraph/daemon.sock; CG_BIN=$(pwd)/dist/bin/codegraph.js AGENT_EVAL_OUT=/tmp/agent-eval-amni bash scripts/agent-eval/run-agent.sh /Users/colby/Downloads/amniservices-mobile-app <label> "<prompt>"` → parse `/tmp/agent-eval-amni/run-<label>.jsonl` for tool order + Read count.
-- Diagnostic prompt that worked: append "for EACH Read/Grep note WHY codegraph wasn't enough; end with '## Why I read'." The agent's self-report is the best diagnostic.
-- Affected unit tests (NOT the full suite — user is cost-conscious): `npx vitest run __tests__/{context-ranking,explore-blast-radius,context,mcp-tool-allowlist,security,worktree-detection,installer-targets}.test.ts __tests__/integration/mcp-input-limits.test.ts`.
-- Pass bar: a flow question reaches ~0 Read within the explore-call budget, faster than without-codegraph, no regression on a control repo.
-
-## Repo state
-- branch `main`, last commit `8629f7a docs(changelog): promote [Unreleased] into [0.9.8]`
-- uncommitted (all this session, none committed): `M src/mcp/tools.ts` (the big one — explore ranking/RWR/budget, context+trace removal, blast radius), `M src/context/index.ts` (precision fix), `?? src/context/markers.ts` (LOW_CONFIDENCE_MARKER leaf), `M src/search/query-utils.ts` (isDistinctiveIdentifier), `M src/mcp/server-instructions.ts`, `M src/installer/targets/shared.ts` (permissions), `M src/bin/codegraph.ts` (CLI context/trace removed), `M src/types.ts`, `M CHANGELOG.md`, `?? __tests__/context-ranking.test.ts`, `?? __tests__/explore-blast-radius.test.ts`, `M __tests__/{security,worktree-detection,mcp-tool-allowlist}.test.ts`, `M __tests__/integration/mcp-input-limits.test.ts`. (read-gate hook + session-consult.ts were created then deleted → no trace.)
-
-## Open threads / TODO
-- [ ] **Revert the 100K budget** in `getExploreOutputBudget` (tools.ts ~140) to ~28–35K — it overflows the agent tool cap at 67K.
-- [ ] **Reconsider "whole central file always"** (tools.ts whole-file rule + `centralFiles`) — a 791-line whole file is what overflowed. Prefer the relevant methods full, not whole-huge-file.
-- [ ] **Build the Zustand coverage fix (the real lever):** (a) extract methods defined inside `create((set,get)=>({...}))` as nodes (extraction); (b) resolve destructured store calls `const {fetchUser}=useOrgUser.getState()` (reference resolution). Then re-eval.
-- [ ] Decide: keep `computeGraphRelevance` (graph ranking) or park it — it didn't address the coverage cause. Probe-validated, not agent-validated.
-- [ ] Re-run the agent eval (FRESH DAEMON) after the coverage fix; confirm reads of `org-user.storage.ts`/components drop.
-- [ ] Reconcile CHANGELOG `[Unreleased]` with the final direction (explore primary; context+trace removed; budget/ranking may change).
-- [ ] Validate budget-table repos (excalidraw/django) before merging — not on this machine.
-- [ ] Run the FULL test suite once (only ran affected files this session).
-- [ ] Branch off `main` before committing this work.
-
-## Recent transcript (oldest → newest)
-### Turn — "do it" (build graph ranking)
-- **User:** Build graph-connectivity ranking (RWR/PageRank fused with bm25) as explore's relevance core; keep the output wins; fix the whole-file path.
-- **Claude:** `Edit(tools.ts)` → added `computeGraphRelevance` (RWR, undirected, α=0.25, 25 iters); rewired file ranking/central/gate to graph-primary + text-secondary. `probe` → `org-user.storage.ts` #1 + whole (796 lines), but LensSwitcher.swift/capture noise still present.
-- **Outcome:** RWR surfaces the central file but doesn't cleanly drop noise; reported the honest limit (needs IDF×graph fusion).
-
-### Turn — "how does amniservices do now?"
-- **User:** Run the agent eval on the real repo with everything in place.
-- **Claude:** `run-agent.sh (amni-with-q2)` → codegraph 5 / Read 5 / Grep 3; agent Read `org-user.storage.ts` anyway. `parse` → agent's first explore returned only **2277 chars** while `probe` returned 54K for the same query.
-- **Outcome:** 2277-vs-54K discrepancy → suspected the agent wasn't running my build.
-
-### Turn — "do the test again, ask the agent WHY it read"
-- **User:** Re-run, but make the agent explain each Read / why codegraph wasn't enough.
-- **Claude:** `ps` → found stale daemon `pid 8947` on AmniSphere socket + pile of lingering `serve --mcp`. `pkill -f "serve --mcp"; rm daemon.sock` → fresh. `run-agent.sh (amni-diag-q3)` with the "## Why I read" prompt, fresh daemon.
-- **Outcome:** First valid run on my build. codegraph **16** / Read 5 / Grep 3. Agent's "Why I read": Zustand store methods not indexed (`node`/`callers` → not found), callers destructure off `getState()`, broad explore **overflowed at 67K**. → real cause = COVERAGE; budget BACKFIRES. Pivot the priorities.
-
-### Turn — /handoff save (this)
-- **User:** `/handoff save`.
-- **Claude:** Wrote this file.
-- **Outcome:** Checkpoint captured; next edge = budget revert + Zustand coverage fix.
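
The graph-ranking turn in the handoff above describes `computeGraphRelevance` only as "RWR, undirected, α=0.25, 25 iters". A minimal sketch of random walk with restart under those parameters might look like the following. The function name `randomWalkWithRestart`, the `Graph` type, and the toy graph are hypothetical illustrations, not the code in `tools.ts`, and the fusion with bm25 is not shown.

```typescript
// Hypothetical sketch of random walk with restart (RWR); not the code in tools.ts.
type Graph = Map<string, string[]>; // node -> neighbours, each edge listed in both directions

function randomWalkWithRestart(
  graph: Graph,
  seeds: string[],
  alpha = 0.25, // restart probability (value quoted in the handoff)
  iterations = 25, // fixed iteration count (value quoted in the handoff)
): Map<string, number> {
  const nodes = [...graph.keys()];
  // Restart distribution: uniform over the seed nodes, zero elsewhere.
  const restart = new Map<string, number>();
  for (const n of nodes) restart.set(n, 0);
  for (const s of seeds) restart.set(s, 1 / seeds.length);

  let score = new Map(restart);
  for (let i = 0; i < iterations; i++) {
    const next = new Map<string, number>();
    // With probability alpha the walker jumps back to a seed.
    for (const [n, r] of restart) next.set(n, alpha * r);
    // Otherwise it moves to a uniformly chosen neighbour.
    for (const [n, s] of score) {
      const neighbours = graph.get(n) ?? [];
      if (neighbours.length === 0) continue; // isolated node: its mass is dropped
      const share = ((1 - alpha) * s) / neighbours.length;
      for (const m of neighbours) next.set(m, (next.get(m) ?? 0) + share);
    }
    score = next;
  }
  return score;
}

// Toy path graph a - b - c - d, seeded at "a" (illustrative data only).
const toy: Graph = new Map([
  ["a", ["b"]],
  ["b", ["a", "c"]],
  ["c", ["b", "d"]],
  ["d", ["c"]],
]);
const ranked = [...randomWalkWithRestart(toy, ["a"])].sort((x, y) => y[1] - x[1]);
console.log(ranked);
```

In this sketch, nodes close to the seed keep most of the score, which is consistent with the handoff's observation that RWR surfaces the central file but does not by itself drop loosely connected noise.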

+ 0 - 73
.claude/handoffs/explore-overhaul-bench-2026-06-02.md

@@ -1,73 +0,0 @@
----
-name: explore-overhaul-bench-2026-06-02
-date: 2026-06-02 06:30
-project: codegraph
-branch: feat/explore-overhaul-store-coverage
-summary: Finished the explore-overhaul arc (explore as sole primary + store coverage + overload disambiguation + method-atomic render + node file/line selector + explore reshaped to native-read windows) and validated it — all 7 README repos hit 0 Read/0 Grep at effort=high; only the README benchmark write-up remains.
----
-
-# Handoff: explore-overhaul arc — validated 0-reads across all 7 README repos; README write-up is the last step
-
-## Resume here — read this first
-**Current state:** All code is committed + pushed on `feat/explore-overhaul-store-coverage` (4 commits, working tree clean). The why-Read agent sweep is DONE: **all 7 README repos × 4 runs = 28/28 runs hit 0 Read / 0 Grep on `--effort high`**, every run "codegraph was sufficient." WITH-`high` medians are captured (~59% fewer tool calls · 51% fewer tokens · ~15% cheaper · 0 reads vs the existing README WITHOUT) — the earlier cost REGRESSION (-3%) is recovered. The only open item is **updating the README benchmark section**, which is blocked on one methodology decision.
-**Immediate next step:** Decide how to publish: (A) do a CLEAN both-arms run on `effort=high` with the PLAIN prompt (no why-Read) for an apples-to-apples table, or (B) write the WITH-`high` deltas in against the existing WITHOUT with a cross-effort caveat. Then edit `README.md` (benchmark table + per-repo breakdowns + average line + methodology date) and open the PR.
-
-> Suggested next message: "Do the clean both-arms run on effort=high with the plain prompt for all 7 repos, then update the README benchmark table + per-repo breakdowns from those medians and open the PR."
-
-## Goal
-Make `codegraph_explore` a true Read-replacement — flow/architecture questions answered with ~0 Read/Grep — then re-validate the README benchmark on the current build and update its numbers. Definition of done: README benchmark reflects the current build with defensible (same-effort) numbers; branch merged via PR.
-
-## Key findings
-- **The arc (all shipped on the branch):** explore is the SOLE primary tool (`codegraph_context` + `codegraph_trace` removed in the prior session, this branch); store-action **coverage** (object-literal method extraction — a GENERAL AST rule in `tree-sitter.ts` `extractVariable`/`extractObjectLiteralFunctions`/`findInitializerReturnedObject`, covers Zustand/Redux/Pinia, not a per-lib hack); graph-ranking **gate fix** (a named/≥2-term file is never pruned); **`node` all-overloads + `file`/`line` selector**; **method-atomic render** (never half a method — drop whole methods/files); **explore reshape** to native-read windows.
-- **Native-read ground truth (from the WITHOUT transcripts):** the agent natively reads **~6–9 files as ~100-line windows** (77% ranged, median 100 lines, 51–250 dominant), located by `func X(` signature greps. That's the unit explore now mimics.
-- **Explore reshape (commit 50401a6, the latest mechanism):** `getExploreOutputBudget` caps EVERY tier at **~24K** (was 28/35/38K) + absolute **25K** hard ceiling (was 1.5×-of-budget) — because a bigger response gets **externalized** by the host to a file the agent Reads back (a 35K vscode explore did exactly that) AND costs cache-writes. Repo size scales the CALL budget, not the response. Per-file = one ~150–250 line window: per-symbol `bodyCap` 2×→1.5× and the spine is windowed too (so tokio's big-spine `worker.rs` doesn't starve `harness.rs`'s `poll`); central whole-file 4×→1.5× / 400→280 lines. Explore's named-symbol injection now uses **`cg.getNodesByName`** (direct index, not FTS) so a 50+-overload name (`poll`) surfaces the wanted def (`Harness::poll`) for the PascalCase-type-token bias to pick.
-- **`node` file/line selector (commit 5bf6ad8):** `codegraph_node` takes optional `file`/`line` to pin an overload (the `file:line` a trail showed). `findSymbolMatches` (replaced `findSymbol`) enumerates ALL overloads via `cg.getNodesByName` (new passthrough `index.ts` → QueryBuilder), then file/line filters. The agent USES it in runs (`run file:worker.rs line:508`, `poll file:harness.rs`).
-- **Cost regression was REAL, now recovered.** The pre-reshape n=4 benchmark (on `max` effort, bloated 35-42K explores) was **−3% cost avg** (vscode −52%) and reads were **NOT 0** (vscode 6,4,0,7; tokio 3,4,2,2) — which corrected my earlier n=2 "0 reads everywhere" optimism. The reshape (≤25K, no externalization) + 0 reads flipped cost back to **~15% cheaper**.
-
-## Gotchas
-- **STALE-DAEMON foot-gun:** before ANY agent eval, `pkill -f "serve --mcp"; rm -f <repo>/.codegraph/daemon.sock` so it serves the current `dist/`. `bench-why-repo.sh` does this per-run. A `npm run build` does NOT take effect until the daemon is killed.
-- **Mac SLEEP corrupts long runs:** the first overnight re-bench (5h on `max`) was sleep-corrupted — the Mac napped 16–42 min BETWEEN runs (~3h of the 5h was paused), inflating wall-clock for the later repos. **Always wrap long runs in `caffeinate -dimsu`.** Cost/tokens/reads are sleep-INDEPENDENT (billed API totals), so the cost regression was real (confirmed on vscode which ran fully awake before any sleep); only TIME is corrupted.
-- **`--effort` matters:** the user's Claude default is `max`, which is "too much." The eval is pinned to `--effort high` (levels: low/medium/high/xhigh/max). `bench-why-repo.sh` honors `EFFORT` (default `high`). The MAX-mode runs were discarded and redone on `high`.
-- **why-Read prompt biases reads down (Hawthorne) + adds <0.3% to WITH cost/tokens.** So the 28/28 0-read sweep proves codegraph is *sufficient* (it CAN answer with 0 reads); it slightly understates a natural run's reads. Keep it OUT of any published benchmark numbers (use plain prompt for the table).
-- **README methodology mismatch:** WITH numbers are `effort=high` + why-Read; the existing README WITHOUT is the user's OLD default effort + plain. Cross-effort → can't publish cleanly without same-effort both arms. The user does NOT want to re-run WITHOUT repeatedly, but the effort CHANGED, so a one-time WITHOUT-on-high is a new (justified) measurement.
-- **PR policy:** `main` is REVIEW_REQUIRED — work on the branch, open a PR, `gh pr merge --squash --admin` for self-review. Branch + push only so far; **PR not opened** (user asked branch+push).
-
-## How to test & validate
-- Build: `npm run build` (exit 0). Full suite: `npx vitest run` → **1112 pass, 2 skip, 0 fail** (npm-shim network tests can flake offline — pre-existing).
-- Affected tests: `npx vitest run __tests__/{explore-output-budget,adaptive-explore-sizing,context-ranking,explore-blast-radius,symbol-lookup,pr19-improvements,object-literal-methods}.test.ts`.
-- Deterministic probe (current `dist/`, in-process — NOT the daemon): `node scripts/agent-eval/probe-explore.mjs /tmp/codegraph-corpus/<repo> "<query>"` → confirm ≤~25K chars + the flow files render. `node scripts/agent-eval/probe-node.mjs <repo> <symbol> code` (e.g. `poll file:harness.rs` via a small script).
-- Agent why-Read sweep (the real metric): `EFFORT=high caffeinate -dimsu bash scripts/agent-eval/bench-why-repo.sh /tmp/codegraph-corpus/<repo> "<readme query>" 4` → parse `/tmp/ab-why/<repo>/with*.jsonl` for `Read`/`Grep` tool_use + the trailing `## Why I read` section.
-- All 7 repos are cloned + indexed on the current build at `/tmp/codegraph-corpus/{vscode,excalidraw,django,tokio,okhttp,gin,alamofire}`. README queries are in `scripts/agent-eval/bench-readme.sh`.
-- **Pass bar:** flow question → ~0 Read at the explore-call budget, faster than WITHOUT, no control regression.
-
-## Repo state
-- branch `feat/explore-overhaul-store-coverage`, last commit `9cf671a chore(agent-eval): add per-repo WITH-only why-Read benchmark harness`. Pushed, in sync with origin.
-- 4 commits: `22333c1` (explore-primary + store coverage + overload disambiguation + docs) · `5bf6ad8` (method-atomic render + node file/line selector) · `50401a6` (explore reshape: inline-cap + concentrated flow-windows + direct-index injection) · `9cf671a` (why-Read eval harness).
-- uncommitted: clean (only `.claude/handoffs/` scratch, intentionally not committed).
-- CHANGELOG `[Unreleased]` already updated for the whole arc. README + CLAUDE.md + docs/design banners updated to drop removed context/trace.
-
-## Open threads / TODO
-- [ ] **README benchmark update** — decide (A) clean both-arms `effort=high` plain run vs (B) publish WITH-`high` deltas with caveat; then edit `README.md` (table + per-repo WITH/WITHOUT breakdowns + average + methodology date). WITH-`high` medians: vscode 4cg/742k/$0.70, excalidraw 6.5/834k/$0.74, django 3/509k/$0.59, tokio 6/921k/$0.77, okhttp 4.5/528k/$0.41, gin 4.5/573k/$0.48, alamofire 7.5/976k/$0.65 — all 0 reads.
-- [ ] **Open the PR** (`gh pr create`) once the README is updated.
-- [ ] Optional: delete dead dev probes `scripts/agent-eval/probe-{trace,context}.mjs` (their tools were removed; noted, not done).
-- [ ] Memory `project_codegraph_read_displacement` is updated through the reshape; add the 28/28-on-high + cost-recovery result if you want it durable.
-
-## Recent transcript (oldest → newest)
-### Turn — "re-run the full benchmark to update README; ask why on reads"
-- **Claude:** cloned+indexed all 7 repos; launched `RUNS=4 bench-readme.sh` (both arms). → ran overnight.
-### Turn — "what's still running? 5h overnight"
-- **Claude:** bench was on the LAST repo (alamofire), ~90%. n=4 aggregate showed **cost −3% (regression)** + reads **NOT 0** (vscode 6-7, tokio 2-4) — corrected my n=2 "0 everywhere." Diagnosed reads from transcripts: explore was 35-42K, too spread, dropped named flow methods.
-### Turn — "is it the lock screen?"
-- **Claude:** Yes for TIME (Mac slept 16-42min between runs → 5h mostly paused) but NOT cost (vscode ran awake; cost is billed API). → recommend `caffeinate`.
-### Turn — "how do I stop Mac sleeping" → answered (Battery→Options / `caffeinate`).
-### Turn — "do the queries WITHOUT codegraph, see exactly what it Reads, mimic it"
-- **Claude:** characterized native reads (77% ranged, median 100 lines, signature greps) → spec: ~100-line flow-windows, ~6-9 files, ≤25K.
-### Turn — "do it" (the reshape)
-- **Claude:** `Edit(tools.ts)` → inline-cap 24K + concentrated windows + `getNodesByName` injection; updated budget tests; full suite green; probe vscode 35K→16K, tokio fits harness.rs. Committed `50401a6`, pushed.
-### Turn — "why AST not LSP" (×2) + "reply to hd_nvim (ast-grep author)"
-- **Claude:** explained the trade-off (breadth/speed/no-build/determinism/19-langs vs LSP type-precision-but-heavy); drafted the X reply (concede precision, reframe on goal/cost).
-### Turn — "re-run, ask Why on each WITH read, report per repo" → "no WITHOUT" → "MAX too much, use high, start over"
-- **Claude:** built `bench-why-repo.sh` (WITH-only, why-Read, `EFFORT` default high); ran all 7 on `high` one repo at a time, reporting each. **Result: 28/28 runs 0R/0G, all "sufficient."** Aggregated WITH-`high` medians → ~15% cheaper (regression recovered).
-### Turn — "commit and push this"
-- **Claude:** committed `bench-why-repo.sh` (`9cf671a`), pushed. → this handoff.
-- **Outcome:** Arc complete + validated; README write-up + PR are all that remain.
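
The handoff above reports per-repo medians over 4 runs (for example 6.5 codegraph calls, which only an even-count median can produce) and "% cheaper" deltas against the WITHOUT arm. A small sketch of that aggregation, assuming the usual definitions (median of an even count is the mean of the two middle values, and "% cheaper" is the gap relative to the WITHOUT cost), could be written as below. The helper names and the sample numbers are hypothetical, and this is not the repo's `parse-bench-readme.mjs`.

```typescript
// Hypothetical aggregation helpers; the sample values below are made up for illustration.
function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of empty list");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // Odd count: the middle value. Even count: mean of the two middle values.
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Positive result means the WITH arm is cheaper than the WITHOUT arm.
function percentCheaper(withoutCost: number, withCost: number): number {
  return ((withoutCost - withCost) / withoutCost) * 100;
}

const toolCallsPerRun = [6, 7, 5, 8]; // four illustrative runs
console.log(median(toolCallsPerRun));
console.log(percentCheaper(1.0, 0.85).toFixed(1));
```

Because the percentage is relative to the WITHOUT arm, a cheaper native baseline in one batch shrinks the reported gap even when the WITH cost is unchanged.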

+ 0 - 70
.claude/handoffs/explore-per-symbol-sizing.md

@@ -1,70 +0,0 @@
----
-name: explore-per-symbol-sizing
-date: 2026-05-29 23:20
-project: codegraph
-branch: main
-summary: Shipped per-symbol adaptive codegraph_explore sizing (PR #569) — show the answer (named methods + mechanism) in full, collapse redundant interchangeable siblings to signatures, keep named methods alive in non-sibling god-files; flipped Django/OkHttp from cost laggards to clear wins and lifted the README averages to 25%/57%/23%/62%.
----
-
-# Handoff: per-symbol adaptive codegraph_explore sizing (shipped)
-
-## Resume here — read this first
-**Current state:** **DONE + shipped.** PR #569 squash-merged to `main` (`b026e64`); local is on `main`, `dist/` rebuilt, working tree clean. README benchmarks + averages + header, CHANGELOG, and `docs/design/adaptive-explore-sizing.md` all updated with the new full-7-repo sweep. The only loose end: **two squash-merged feature branches still linger** (`feat/adaptive-explore-sizing` from #564, `feat/explore-per-symbol-sizing` from #569) — local **and** remote — because squash-merges don't register as "merged" in git's ancestor sense.
-**Immediate next step:** Delete those two merged branches (local + remote), or pick up one of the Open-threads frontiers (Gin's small WITH-cost bump, alamofire DataRequest residual, or stabilizing per-repo benchmark numbers with median-of-8).
-
-> Suggested next message: "Delete the merged branches feat/adaptive-explore-sizing and feat/explore-per-symbol-sizing — local and remote."
-
-## Goal
-Make `codegraph_explore`'s cost a clear win on **every** README benchmark repo, especially the two laggards the README showed thinnest (Django 9% cheaper, OkHttp 4%). The optimization target per CLAUDE.md is **tool-calls/reads + latency** (NOT raw cost) — but the user explicitly wanted the cost margins up too. Definition of done = both laggards clearly cheaper with ~0 reads, no regression elsewhere, README refreshed, shipped. **Achieved.**
-
-## Key findings
-- **The feature, in `src/mcp/tools.ts` (`handleExplore` + `buildFlowFromNamedSymbols`):** explore sizes output to the *answer*, not the file count. Builds on PR #564's gate (off-spine + polymorphic-sibling, with a named-callable *spare* + supertype-family *override*).
-- **PR #569 added four things** (all in `tools.ts`):
-  1. **Uniqueness-aware spare** — `buildFlowFromNamedSymbols` now returns `uniqueNamedNodeIds` (callables whose token had ≤3 defs). The whole-file spare uses it, so `as_sql` (110 defs) no longer keeps every Compiler/Expression variant full; `getResponseWithInterceptorChain` (1 def) still spares RealCall.
-  2. **Per-symbol focused view** — a collapsed family file renders FULL bodies for symbols with `prio()` < 99 (on-spine=0, unique-named=1, `fileDefinesSuper && named`=2), signatures for the rest. Bounded: `bodyCap = maxCharsPerFile*2`, `SIG_MAX = max(12, maxSymbolsInFileHeader*2)`. Header tag flips to `· focused (…)` when any body shown, else `· skeleton (…)`.
-  3. **All-tier test-file exclusion** — removed the `budget.excludeLowValueFiles` gate on the `isLowValue` hard-exclude (was <500-file tiers only); guards (query-mentions-tests, ≥2 non-test remain) kept.
-  4. **Named-cluster survival in non-sibling god-files** — inject agent-named method defs into `rangeNodes` even if the gather missed them; rank named ranges at importance **9** (above glue 6 / connected 3); `fileBudget = min(maxCharsPerFile, maxOutputChars - totalChars - 200)` in cluster selection so high-importance named clusters survive instead of being source-order-trimmed.
-- **Validated (headless A/B, Opus 4.8, median of 4, full 7-repo sweep) — now in README:** avg **25% cheaper · 57% fewer tokens · 23% faster · 62% fewer tool calls** (was 22/47/20/50). Per-repo cost: VS Code 33, Excalidraw 27, Django **23** (was 9, median 0 reads), Tokio 35, OkHttp **11** (was 4, 0 RealCall read-backs), Gin 15, Alamofire 28.
-- **PR #564 (already merged, `f1b14f0`)** was the prior round: named-callable spare + supertype-family override (fixed the read-back regression where RealCall.kt / compiler.py were skeletonized then Read back).
-
-## Gotchas
-- **A/B per-repo variance is large (±~10–13 pts).** The WITHOUT arm swings run-to-run (depending on how hard the native agent greps). Excalidraw/Gin look *lower* than the prior README purely because the native baseline was cheaper this batch; these are NOT regressions (reads still 0/low). **Averages are the stable signal.** Never conclude from n=1; the README is median-of-4.
-- **The alamofire `DataRequest` residual is NOT cleanly closable.** A "spare a file when the agent names its class" type-spare *broke OkHttp* (it spared all 5 interceptor classes → 0 skeletons). A named sibling class is structurally indistinguishable from "the one main type." Left as-is (alamofire is 28% cheaper; ~1 DataRequest read/run).
-- **Gin's WITH-cost ticked up ($0.36→$0.48 across batches)** — partly the named-injection adding content to an already-0-read repo. Still 15% cheaper. Possible over-eager named-injection on small repos.
-- **Validate retrieval changes with a real-agent A/B, not just the probe.** The deterministic `probe-explore.mjs` query forms a *different spine* than the agent's real query → it hid both the Django and the OkHttp read-backs. (Dead-end #6 in the design doc.)
-- **Always `npm run build` before probing/A/B** — probes + the A/B MCP server load `dist/`, not `src/`. Corpus indexes (`/tmp/codegraph-corpus/*`) are valid without re-index since all changes are query-time.
-- **`adaptive-sizing-skeletonizing.md` handoff is gone from `main`'s working dir** — it was untracked, got swept into commit `3c38729` on `feat/adaptive-explore-sizing`, so it lives only on that branch now. Deleting that branch deletes it (it's obsolete — that work shipped).
-- **5 `npm-shim` test failures are pre-existing/network** (lack `--probe-net` on the global binary) — not a regression; don't let them block.
-
-## How to test & validate
-- Build first: `npm run build` (must be green).
-- Deterministic probe: `node scripts/agent-eval/probe-explore.mjs /tmp/codegraph-corpus/<repo> "<symbol-bag query>"` → inspect `#### file — … · focused/skeleton` headers + sizes. okhttp = 5 `· skeleton`; django compiler.py `· focused` with `def execute_sql`/`def as_sql`/`def _fetch_all` bodies present; excalidraw/tokio/vscode/gin = 0 skeleton/focused (inert).
-- A/B one repo: `bash /tmp/ab-one.sh <repo> <runs> "<question>"` → writes `/tmp/ab-readme/<repo>/run<n>/`. Aggregate one repo: `node /tmp/one-agg.mjs <repo>`. Full 7: `RUNS=4 bash scripts/agent-eval/bench-readme.sh` then `node scripts/agent-eval/parse-bench-readme.mjs /tmp/ab-readme` (averages) + `node /tmp/full-agg.mjs` (per-repo reads/grep/tools/cost/time).
-- Unit: `npx vitest run __tests__/adaptive-explore-sizing.test.ts` → **8/8** (skeleton, named-callable spare=RealCall, supertype-family override→focused=codec.ts, uniqueness/shared-method, on-spine exemplar full, distinct step full, flag=0 disables).
-- **Methodology:** a real win = cost DOWN **and** reads NOT up vs the same build's WITHOUT arm; confirm inert repos stay 0 skeleton/focused (the change only *adds* spare conditions + per-symbol rendering of already-collapsed files → strict subset of the original gate).
-
-## Repo state
-- branch `main`, last commit `b026e64 feat(mcp): per-symbol adaptive codegraph_explore sizing (#569)`.
-- uncommitted: clean (this handoff file will be a new untracked `.claude/handoffs/` entry).
-- merged-but-undeleted branches: `feat/adaptive-explore-sizing` (#564) + `feat/explore-per-symbol-sizing` (#569), both local + remote.
-
-## Open threads / TODO
-- [ ] Delete the two squash-merged branches (local + remote): `feat/adaptive-explore-sizing`, `feat/explore-per-symbol-sizing`.
-- [ ] (optional) Stabilize the README per-repo numbers with a 2nd full-7 batch → publish median-of-8 (smooths the WITHOUT-arm variance that makes Excalidraw/Gin look lower).
-- [ ] (frontier) Gin's small WITH-cost bump from named-injection on an already-0-read repo — consider gating named-injection by repo size / whether reads were happening.
-- [ ] (frontier) Alamofire `DataRequest` ~1 read/run — only closable by distinguishing "main type" from "interchangeable sibling," which has no clean static signal (don't re-attempt the type-spare; it breaks okhttp).
-- [ ] (frontier) Big non-sibling god-file clustering generally (#185 family) — `query.py`/`Session.swift`-style files; the fileBudget fix helped Django but the merge/selection logic is still the lever for other repos.
-
-## Recent transcript (oldest → newest)
-### Turn — "Add the regression test for the skeletonization"
-- **Claude:** Wrote `__tests__/adaptive-explore-sizing.test.ts` (OkHttp-in-miniature fixture); validated via probe; 5 cases. (This was for PR #564's gate.)
-- **Outcome:** Test landed; led into re-validating the README.
-### Turn — re-validate README, found Django/OkHttp cost laggards → built fixes
-- **Claude:** A/B sweeps via `bench-readme.sh`; diagnosed read-backs (RealCall via `Lockable` 9-impl mixin; compiler.py base+subclasses). Shipped PR #564 (named-callable spare + supertype-family override). Then A+B+C (uniqueness + per-symbol focused + test exclusion).
-- **Outcome:** Django/OkHttp flipped to wins; alamofire type-spare attempt broke okhttp → reverted.
-### Turn — "Pursue the deeper Django fix"
-- **Claude:** Cluster-debug showed `_fetch_all` (L2237) never formed a cluster + got source-order-trimmed. Fixed via named-injection + importance 9 + `fileBudget`. A/B: Django reads tail 3→1, 11%→17%; controls (okhttp 14%, excalidraw 31% / 0 reads) held. Committed `e71eb90`.
-- **Outcome:** Django residual closed.
-### Turn — "push + PR + squash-merge, update README incl. averages + header"
-- **Claude:** Ran final full-7 sweep (`bench-readme.sh`, 56/56). Clarified a confusing "WITHOUT got cheaper" phrasing (WITH cg is cheaper in all 7; the % is the *gap*, which shrinks when the native baseline is cheap that batch). User said publish this batch.
-- **Outcome:** Updated README (headline 25%/62%, average line, 7 summary rows, 7 detail tables, methodology date) + CHANGELOG + design doc. Built clean branch off `origin/main` (dropping the already-squashed commits + the handoff artifact), pushed, opened PR #569, squash-merged → `b026e64`. Synced local to main, rebuilt dist. Offered branch cleanup → user ran `/handoff save`.
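
Item 4 of the PR #569 list in the handoff above gives `fileBudget = min(maxCharsPerFile, maxOutputChars - totalChars - 200)` and importance ranks of 9 (named) / 6 (glue) / 3 (connected). A rough sketch of how such a budget could let high-importance clusters survive instead of being trimmed in source order is shown below. The `Cluster` shape, the `selectClusters` greedy policy, and the clamp to zero are assumptions made for illustration; the real selection logic in `tools.ts` is not reproduced here.

```typescript
// Hypothetical sketch; only the fileBudget formula and the 9/6/3 ranks come from the handoff.
interface Cluster {
  label: string;
  importance: number; // 9 = named, 6 = glue, 3 = connected
  chars: number;
}

function fileBudget(maxCharsPerFile: number, maxOutputChars: number, totalChars: number): number {
  // Formula quoted in the handoff; clamping at 0 is an added assumption.
  return Math.max(0, Math.min(maxCharsPerFile, maxOutputChars - totalChars - 200));
}

function selectClusters(clusters: Cluster[], budget: number): Cluster[] {
  // Consider clusters by importance first (ties keep source order).
  const byImportance = clusters
    .map((c, index) => ({ c, index }))
    .sort((a, b) => b.c.importance - a.c.importance || a.index - b.index);

  const kept: { c: Cluster; index: number }[] = [];
  let used = 0;
  for (const entry of byImportance) {
    if (used + entry.c.chars > budget) continue; // does not fit, try smaller ones
    kept.push(entry);
    used += entry.c.chars;
  }
  // Render the survivors in their original source order.
  return kept.sort((a, b) => a.index - b.index).map((e) => e.c);
}

const clusters: Cluster[] = [
  { label: "glue", importance: 6, chars: 3000 },
  { label: "connected", importance: 3, chars: 3000 },
  { label: "named", importance: 9, chars: 3000 },
];
const budget = fileBudget(8000, 24000, 17300);
console.log(budget, selectClusters(clusters, budget).map((c) => c.label));
```

Under a plain source-order trim, the `named` cluster at the end of the file would be the first to fall off; ranking by importance before applying the budget is what keeps it in this sketch.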

+ 0 - 70
.claude/handoffs/framework-coverage-sweep-2026-05-23.md

@@ -1,70 +0,0 @@
----
-name: framework-coverage-sweep-2026-05-23
-date: 2026-05-23 23:59
-project: codegraph
-branch: architectural-improvements
-summary: Dynamic-dispatch coverage sweep COMPLETE — all 14 README frameworks + every flow-relevant language validated (measure→fix→validate→test→playbook→commit). ~37 commits pushed, suite green. Ship-prep (CHANGELOG + PR to main) is the only thing left.
----
-
-# Handoff: Dynamic-dispatch framework/language coverage sweep (complete)
-
-## Resume here — read this first
-**Current state:** The coverage sweep is **done**, AND a **frontier pass** closed the tractable partials. Every framework in the README's 14-row table is ✅, every flow-relevant language is validated (TS/JS, Python, Go, Java, C#, PHP, Ruby, Rust, Swift, Dart, Kotlin, Lua/Luau, Scala, C/C++), and the frontier pass added: React object data-router (literal), Next.js false-positive fix, Flask-RESTful `add_resource` (redash 6→77), Flask tuple methods + broader detection (flask-realworld 0→19), gorilla/mux confirmed. All committed/pushed to `architectural-improvements` (tree clean except untracked `.claude/handoffs/`). Full suite green (**809 passed**, 2 skipped; flaky `watcher.test.ts > debounced sync` passes on re-run). **No CHANGELOG entry exists, and the branch is not yet merged to main.**
-**Immediate next step:** Ship-prep — write a CHANGELOG entry grouping the whole sweep (route resolution for Flask/FastAPI/Drupal/Rust-Axum+actix/Vapor/Spring-Kotlin/Play + React Router routing; the Python builtin-name guard, Dart method-range, and C++ inheritance foundational fixes; the flutter-build and cpp-override synthesizer channels), bump `package.json`, then open a PR to main.
-
-> Suggested next message: "do ship-prep: write the CHANGELOG entry covering the whole framework/language coverage sweep on this branch, bump the version, and open a PR to main"
-
-## Goal
-Close static-extraction holes for **dynamic dispatch** across every language/framework codegraph supports, so cross-symbol flows (request→route→handler→service, state→render, virtual→override) exist in the graph and an agent answers flow questions with few codegraph calls and ~0 Read/Grep. Per framework/language: canonical flow `trace`s end-to-end, agent A/B shows fewer reads, no node explosion, recorded in `docs/design/dynamic-dispatch-coverage-playbook.md` (the matrix §6 + per-item notes §7). **This goal is now met; what remains is ship-prep + documented frontiers.**
-
-## Key findings (this session's work, all committed)
-- **Routing convention is the hole in every backend** — same pattern each time: the resolver/extractor assumed one syntax. Flask (intervening `@login_required`/stacked routes), FastAPI (empty `""` path), Drupal (`claimsReference` for FQCN `_form`/single-colon controllers + contrib `detect` via composer name/type/`.info.yml`), Rust/Axum (chained `get(h).post(h2)` + namespaced `mod::handler`), actix (builder API `web::resource().route(web::get().to(h))`), Vapor (grouped `routes.grouped("x"); x.get(use:h)` — was 0 on every real app), Spring **Kotlin** (`fun` handler syntax + `.kt`), Play (extensionless `conf/routes` → controller), React Router (`<Route>` JSX).
-- **Three FOUNDATIONAL fixes (broad benefit, not framework-specific):** (1) Python **bare-name builtin guard** in `src/resolution/index.ts` — a handler named `index`/`get`/`update` was filtered as a builtin method; mirror the dotted-branch `knownNames` guard. (2) **Dart method-range** in `src/extraction/tree-sitter.ts` `createNode` — Dart bodies are SIBLINGS of the signature, so methods were `end==start` (signature-only); extend `endLine` to the resolved body (guarded, child-body grammars no-op). (3) **C++ inheritance** — `extractInheritance` handled `base_clause` (PHP) but not C++ `base_class_clause`; added it (leveldb extends 219→298).
-- **Two new synthesizer channels** in `src/resolution/callback-synthesizer.ts` (Dart analog + C++ analog of react-render): `flutter-build` (a State method calling `setState(` → `build`) and `cpp-override` (base virtual method → subclass override of same name, gated to C++).
-- **measure-first repeatedly split "needs work" from "already covered":** Svelte, NestJS (prior), and this session **Lua/Luau** (module dispatch already resolves) + **Compose** (composition is plain function calls, already static) needed NO code. The assumed hole wasn't real.
-- **`claimsReference` pre-filter is the recurring gotcha** (`src/resolution/index.ts:497-503`): a route ref naming no declared symbol (FQCN, `Controller@method`, `controller#action`, `Class.method`) is dropped before `framework.resolve()` runs. Added for Drupal + Play this session.
-
-## Gotchas
-- **`claimsReference`:** if a new framework's route refs don't resolve despite a correct `resolve()`, it's the pre-filter — add `claimsReference`.
-- **Reindex picks up resolver changes only on a CLEAN index:** `codegraph index` is incremental (skips unchanged files); after `npm run build`, do `rm -rf .codegraph && codegraph init -i` to re-extract. The edge count in the init message is contains-only and therefore misleading; query the DB for the real count.
-- **Extraction changes are high blast radius** (shared `createNode`/`extractInheritance`): re-check node counts on control repos (excalidraw 9,290 / django 302) — the Dart/C++ fixes are guarded to only-extend / C++-only, controls unchanged.
-- **Play `conf/routes` is extensionless** → needed `isPlayRoutesFile` opt-in in `grammars.ts` (isSourceFile + detectLanguage→'yaml' no-grammar path). Narrow match, only ADDS Play files.
-- **Flaky:** `watcher.test.ts > debounced sync > should trigger sync after file change` — timing-based, passes on re-run; unrelated to any of this work.
-- **Foreground `sleep` is blocked** in Bash → background A/B batches (`run_in_background: true`), read the task output file. zsh quirks: quote globs (`'*.vue'`); SQL `count(*)` in `$(...)` needs care with quotes.
-- Global `codegraph` is npm-linked to this repo's `dist/`; `npm run build` then reindex. A/B harness: `scripts/agent-eval/run-all.sh <repo> "<Q>" headless` (with vs empty MCP), parse via `node scripts/agent-eval/parse-run.mjs`.
-
-## How to test & validate (the per-framework loop)
-- Corpus in `/tmp/codegraph-corpus/<name>` (clone S/M/L, `git clone --depth 1`). Index: `rm -rf .codegraph && codegraph init -i`.
-- Measure holes: `sqlite3 .codegraph/codegraph.db "select count(*) from nodes where kind='route'"` + route→handler edges (`join edges on source where kind='references'`). Node-count before/after (no explosion).
-- Flow: `node scripts/agent-eval/probe-node.mjs <repo> <symbol>` (shows Called-by/Calls trail) / `probe-trace.mjs <repo> <from> <to>`.
-- Agent A/B (≥2 runs/arm, variance is real): `run-all.sh` headless, record Read/Grep/duration/codegraph. Pass = fewer reads with codegraph.
-- Tests: `npm test` (vitest). Resolver extract tests in `__tests__/frameworks.test.ts`; end-to-end in `__tests__/frameworks-integration.test.ts` (real CodeGraph + indexAll); Dart range in `__tests__/extraction.test.ts`; Drupal in `__tests__/drupal.test.ts`.
-
-## Repo state
-- branch `architectural-improvements`, last commit `42a0178 docs(playbook): record frontier pass; test(go): gorilla/mux`.
-- uncommitted: clean (only untracked `.claude/handoffs/`).
-- ~37 commits total on the branch (handoff's original 11 frameworks + this session's: Flask/FastAPI, Drupal, Rust/Axum, Vapor, React Router, actix, Dart, Kotlin, Lua, Scala/Play, C/C++ — each a feat + a docs(playbook) commit; Lua was docs-only).
-
-## Open threads / TODO
-- [ ] **SHIP-PREP (the only blocker to merge):** CHANGELOG entry for the whole sweep, `package.json` bump, PR to main. Releases go through `.github/workflows/release.yml` only — do NOT `npm publish` (see CLAUDE.md).
-- [x] **Frontier pass DONE (commits 0456915, 03e49ab, 42a0178):** React object data-router (literal), Next.js false-positive fix, Flask-RESTful `add_resource`, Flask tuple methods + detection, gorilla/mux confirmed.
-- [ ] **Frontiers LEFT (deliberately, with rationale in playbook §7 "Frontier pass"):** anonymous/inline closures (def-use frontier), metaprogramming finders (AR/Eloquent/JPA/EF), reactive runtimes (Vue Proxy / Compose recomposition), Akka actors, C callback-struct 422-way fan-out, C++ pure-virtual base methods, React lazy data-router (variable paths + lazy imports), Play SIRD, Nuxt-specific. Forcing these adds noise.
-- [ ] Pre-existing, unrelated: Next.js `*.config.mjs` in a `pages/` dir treated as a route (false-positive found in bulletproof-react).
-
-## Recent transcript (oldest → newest, this session)
-### Turn — "what's left / what's next on coverage" → did Flask/FastAPI
-- 3 holes: Flask intervening/stacked decorators, FastAPI empty path, **Python bare-name builtin guard** (handlers named `index`/`get` filtered). microblog 6→27, realworld 12→20, dispatch 290/290. Fixed 6 stale Laravel/Rails tests too. Committed + pushed.
-### Turn — "Drupal next"
-- `claimsReference` for FQCN/_form/single-colon controllers + contrib `detect` (composer type/name + `.info.yml`). core 536→731 (87%), admin_toolbar 0→14. OOP `#[Hook]` = frontier. Committed.
-### Turn — "Rust: Axum/actix/Rocket"
-- Axum chained methods + namespaced handlers (realworld 12→19, 19/19); Rocket already 99%; **actix builder API** `web::resource().route(web::get().to())` (examples 51→128). Committed (2 commits: axum, then actix).
-### Turn — "Vapor (Swift)"
-- Resolver was 0-routes on every real app; rewrote for any receiver + optional non-string paths + `.grouped` prefix tracking + `use:` discriminator. template 0→3, SteamPress 0→27, SPI 0→14. Committed.
-### Turn — "2, 3, 4" (React Router, actix [done above], Dart/Flutter)
-- React Router `<Route>` JSX (react-realworld 0→10). Dart/Flutter: **method-range fix** (foundational) + `flutter-build` setState→build synthesizer. Committed.
-### Turn — "Kotlin next"
-- Spring resolver `['java']`→`['java','kotlin']` + `fun` handler regex (petclinic-kotlin 0→18, 18/18; Java unchanged 19/19). Compose composition already static. Committed.
-### Turn — "Lua/Luau, Scala, C/C++ (Lua first, but do all three)"
-- **Lua:** measure-first → module dispatch already covered (telescope 335 cross-file calls); no code change, validated. **Scala/Play:** `conf/routes` file-walk opt-in + Play resolver (computer-database 0→8). **C/C++:** general dispatch strong (redis 29k); fixed C++ `base_class_clause` inheritance + `cpp-override` synthesizer (leveldb 12 precise). All committed + pushed.
-### Turn — "wrap up + refresh handoff"
-- This handoff. Sweep complete; ship-prep (CHANGELOG + PR) is the remaining work.

+ 0 - 75
.claude/handoffs/impact-coverage-per-language-2026-06-03.md

@@ -1,75 +0,0 @@
----
-name: impact-coverage-per-language-2026-06-03
-date: 2026-06-03 21:30
-project: codegraph
-branch: main
-summary: Fixed the engine's impact/affected tool (file-dependent coverage 62.5%→95.8%) for TypeScript; now replicating per-language.
----
-
-# Handoff: Make impact/`affected` work across every language
-
-## Resume here — read this first
-**Current state:** TypeScript is DONE and validated (full suite green, 1134 passed). Three fixes landed (uncommitted on `main`): the `affected`/`getFileDependents` query fix, in-body type-annotation extraction, and import/re-export linking. The last of these took file-dependent coverage on this repo from **62.5% → 95.8%** (the residual 5 files are all correctly zero: worker scripts + see-through barrels + a CLI entry). I had just asked (via AskUserQuestion) which language to do next (Python / Java / Go / C#), and the user rejected the question and ran `/handoff save` instead.
-**Immediate next step:** Pick the next language and replicate the methodology: measure coverage on a real repo → audit the 0-dependent files → fix extraction/resolution → validate node-stability + tests. Don't re-ask with a tool; just start (the user declined the menu).
-
-> Suggested next message: "Let's do Python next. Index a real Python repo, measure file-dependent coverage the same way, audit the 0-dependent files, and tell me what the real gaps are before we fix anything."
-
-## Goal
-The engine's impact tool (symbol-level `getImpactRadius` and file-level `getFileDependents`/`affected`) must capture **all real cross-file dependencies** for every supported language — recall-first ("never miss the affected feature"). "Coverage" = % of source files with ≥1 cross-file dependent. 100% is the WRONG target (entry points / workers / see-through barrels genuinely have 0); the real bar is **no real dependency missed**, validated by auditing every 0-dependent file.
-
-## Key findings (TS work — all landed, uncommitted)
-- **`affected` was returning 0 for every file.** Root cause: `imports` edges in this graph are **same-file** (`file → its own local import nodes`); `getFileDependents` followed imports-only → 0 cross-file dependents for 72/72 files. Fix: `src/db/queries.ts` new `getDependentFilePaths`/`getDependencyFilePaths` (one indexed JOIN, all edge kinds except `contains`); `src/graph/queries.ts` `getFileDependents`/`getFileDependencies` now delegate. Un-breaks `findCircularDependencies` too.
-- **In-body type annotations were dropped.** `visitFunctionBody` (`src/extraction/tree-sitter.ts` ~line 2143) extracted calls but never type annotations, so `const items: Foo[] = []` inside any function/method/object body created no edge. Fix: extract type annotation from `variable_declarator` nodes in the body walker, attributed to the enclosing symbol (no new nodes). Gated on `nodeType === 'variable_declarator' && TYPE_ANNOTATION_LANGUAGES.has(language)` → effectively **TS/TSX only** (Rust/Go/Java/C# use different AST shapes, e.g. Rust `let_declaration`/`type` field — verified Rust returns 0).
-- **Imports/re-exports weren't dependencies.** A symbol imported and only re-exported / put in a registry array / passed as an arg / used in JSX created NO edge (only *called*/*instantiated*/*signature-typed* symbols linked). Fix in `src/extraction/tree-sitter.ts`: `emitImportBindingRefs` (per named/default/aliased binding, ~line 1750) called from `extractImport` (~line 1626, TS/JS gate); `emitReExportRefs` (per `export {X} from './y'`) called from a new `export_statement`-with-`source` dispatch branch (~line 381). Both push `imports` refs attributed to the **file node**; the resolver maps them to the definition via `resolveViaImport`. **This is the 62.5%→95.8% win.**
-- **`TYPE_ANNOTATION_LANGUAGES`** (`src/extraction/tree-sitter.ts` ~line 2564): typescript, tsx, dart, kotlin, swift, rust, go, java, csharp.
-- Full findings saved to memory: `~/.claude/projects/-Users-colby-Development-CodeGraph-codegraph/memory/impact-coverage-findings.md`.
-
-## Gotchas
-- **STALE INDEX is a trap.** The repo's checked-in `.codegraph/codegraph.db` was from Jun 2 and under-reported massively (types.ts showed 13 dependents; a fresh reindex gave 35, then 47 after fixes). **Always measure on a FRESH reindex into a temp dir**, never the repo's live `.codegraph`. (Also load-bearing for docker-app: it MUST reindex before computing impact.)
-- **Node-count must stay stable** (no graph explosion) — the fixes add only edges, never nodes. Verify before/after. Resolution creates no nodes (confirmed); a deterministic ±1 from extraction-worker timing is noise.
-- **Don't conflate same-basename files** when auditing (`src/types.ts` vs `src/resolution/types.ts` vs `src/ui/types.ts`). Match on full path.
-- **Import-linking design differs by language:** named-symbol imports (Python `from x import Y`, Java `import com.x.Y`, Rust `use ...::Y`) port from the TS approach; **package/namespace imports (Go `import "pkg"`→`pkg.Func`, C# `using NS`) need a different design** (no per-binding names).
-- **Known unfixed gap:** default-import-with-rename (`import Button from './x'` where the default export isn't named `Button`) doesn't link — `resolveViaImport` matches local name, no default-export tracking. Affects calls too (pre-existing), rare here. Out of scope unless prevalent in the next language.
-- Work is on `main` and uncommitted — **branch before committing** (CLAUDE.md rule). Maintainer handles version bumps/releases.
-
-## How to test & validate
-- `npm test` → full suite, expect **1134 passed | 2 skipped** (was 1131 before the import-linking tests).
-- `npx vitest run __tests__/extraction.test.ts -t "dependency linking"` → the 3 new import/re-export tests.
-- `npx vitest run __tests__/graph.test.ts` → strengthened "File dependency analysis" (real cross-file assertions, not just `Array.isArray`).
-- **Coverage probe recipe** (reuse for the next language; swap the glob ext):
-  ```bash
-  npm run build
-  NEW=$(mktemp -d); cp -R <repo>/src "$NEW/src"   # or clone a real repo
-  node -e "const {pathToFileURL}=require('node:url');(async()=>{const idx=await import(pathToFileURL(require('path').resolve('dist/index.js')).href);const CG=idx.default?.default??idx.default??idx.CodeGraph;const cg=CG.initSync('$NEW',{config:{include:['**/*.py'],exclude:[]}});await cg.indexAll();cg.resolveReferences();cg.destroy();})();" 2>&1 | grep -vE "Experimental|trace"
-  # coverage:
-  sqlite3 "$NEW/.codegraph/codegraph.db" "WITH src AS (SELECT DISTINCT file_path fp FROM nodes WHERE kind='file'), deps AS (SELECT tgt.file_path fp, COUNT(DISTINCT s.file_path) n FROM edges e JOIN nodes tgt ON tgt.id=e.target JOIN nodes s ON s.id=e.source WHERE e.kind!='contains' AND s.file_path!=tgt.file_path GROUP BY tgt.file_path) SELECT (SELECT COUNT(*) FROM src) files, (SELECT COUNT(*) FROM src WHERE COALESCE((SELECT n FROM deps WHERE deps.fp=src.fp),0)>0) with_deps;"
-  # audit 0-dependent files: list them, classify imported-but-unlinked (real miss) vs not-imported (correct).
-  ```
-- **Per-language method:** small + medium + large real repos; for each, run the coverage probe, then a controlled mini-probe (write 2 files isolating each suspected gap — in-body type, value import, re-export) to see exactly which edges resolve. Fix extraction/resolution. Re-validate: coverage up, node count stable, full suite green, ≥1 dedicated test.
-
-## Repo state
-- branch `main`, last commit `629d847 fix(extraction): index Vue <template> component usages (#629 follow-up) (#659)`
-- uncommitted (6 files): `CHANGELOG.md`, `__tests__/extraction.test.ts`, `__tests__/graph.test.ts`, `src/db/queries.ts`, `src/extraction/tree-sitter.ts`, `src/graph/queries.ts`
-- CHANGELOG `[Unreleased] → Fixes` has 2 new bullets (the `affected` fix + the "recognize far more dependencies" completeness bullet).
-
-## Open threads / TODO
-- [ ] **Pick + do the next language** (Python / Java / Go / C# — see gotchas for import-model differences). Python = broadest reach, named imports port cleanly, but less typed. Java = most complete win (always-typed + named imports, enterprise/Spring). Go/C# need different import-linking design.
-- [ ] Per-language in-body type annotations still open for Rust/Go/Java/C#/Kotlin/Swift/Dart (different AST shapes; TS only so far).
-- [ ] Consider default-import-with-rename resolution (needs default-export tracking) if the next language uses it heavily.
-- [ ] Port the TS fixes to `codegraph-pro` — user said they'll do it via upstream merge (don't touch pro).
-- [ ] Decide whether/when to commit (on `main`, uncommitted; branch first).
-
-## Recent transcript (oldest → newest)
-### Turn 1 — "affected CLI under-reports; copy the fix from codegraph-pro; impact must work on every language"
-- **User:** getFileDependents only follows imports edges, returns 0; said a fix exists in `codegraph-pro` to copy over.
-- **Claude:** Diffed impact files between repos → **byte-identical**; pro's working diff was the already-merged #629 Svelte/Vue work, NOT an affected fix. Confirmed bug via live index (`imports` edges all same-file; 72/72 files under-reported to 0).
-- **Outcome:** Implemented the `affected` fix fresh (`getDependentFilePaths`/`getDependencyFilePaths`), tests, CHANGELOG. Asked how to scope the broader cross-language effort + pro port.
-### Turn 2 — user chose "Fix TS coverage first" + "I'll merge pro upstream"
-- **Claude:** Audited → biggest factor was a STALE index (47.5%→62.5% on reindex). Found in-body type annotations dropped by `visitFunctionBody`; fixed + tested. Coverage 62.5%, node count stable.
-- **Outcome:** TS in-body type fix landed; reported the stale-index trap + per-language nature.
-### Turn 3 — "We should have 100% coverage not 62.5%"
-- **Claude:** Explained 100% file-coverage is wrong (workers/barrels). Audited 0-dependent files → 42/45 were imported-but-unlinked (real misses). Root cause via clean probe: value imports / re-exports create no edge. Implemented `emitImportBindingRefs` + `emitReExportRefs`. **Coverage 62.5%→95.8%**; residual 5 all proven correct-zero. Full suite green (1134), tests added.
-- **Outcome:** Import/re-export linking landed + validated. Updated memory.
-### Turn 4 — "ok let's do the next language"
-- **Claude:** Asked (AskUserQuestion) which language: Python/Java/Go/C#, with import-model rationale.
-- **Outcome:** User **rejected** the question and ran `/handoff save`. → Next session: pick a language and start measuring (don't re-ask via tool).
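
The coverage definition used in the handoff above (the share of source files with at least one cross-file dependent, ignoring `contains` edges and same-file edges) can be sketched in TypeScript as follows. This is only an illustration under assumptions: the `Edge` shape and the `fileDependentCoverage` name are hypothetical and are not part of CodeGraph's API; the actual measurement is the `sqlite3` query in the coverage probe recipe.

```typescript
// Hypothetical edge shape for illustration; the real data lives in the
// nodes/edges tables of .codegraph/codegraph.db.
interface Edge {
  kind: string;        // e.g. 'calls', 'imports', 'references', 'contains'
  sourceFile: string;  // file path of the edge's source node
  targetFile: string;  // file path of the edge's target node
}

interface Coverage {
  files: number;     // total source files considered
  withDeps: number;  // files that have >= 1 cross-file dependent
  pct: number;       // withDeps / files, as a percentage
}

function fileDependentCoverage(files: string[], edges: Edge[]): Coverage {
  // target file -> set of distinct OTHER files that point at it
  const dependents = new Map<string, Set<string>>();
  for (const e of edges) {
    if (e.kind === 'contains') continue;            // structural, not a dependency
    if (e.sourceFile === e.targetFile) continue;    // same-file edges do not count
    let set = dependents.get(e.targetFile);
    if (!set) {
      set = new Set<string>();
      dependents.set(e.targetFile, set);
    }
    set.add(e.sourceFile);
  }
  const withDeps = files.filter((f) => (dependents.get(f)?.size ?? 0) > 0).length;
  const pct = files.length === 0 ? 0 : (withDeps / files.length) * 100;
  return { files: files.length, withDeps, pct };
}
```

As the handoff stresses, a result below 100% is not by itself a defect: the files left at zero still have to be audited by hand to separate real misses from entry points, workers, and see-through barrels.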

+ 0 - 92
.claude/handoffs/issue-triage-quickwins-2026-06-02.md

@@ -1,92 +0,0 @@
----
-name: issue-triage-quickwins-2026-06-02
-date: 2026-06-02 17:55
-project: codegraph
-branch: main
-summary: Triaged all 72 open issues against the changelog, then shipped the already-fixed closes + 10 quick-win fixes (PR #654) + status --json CI fields (PR #655); both merged.
----
-
-# Handoff: open-issue triage → quick-win fixes + status --json (#329)
-
-## Resume here — read this first
-**Current state:** All of this session's work is **merged to `main`** (tip `7b62356`), working tree clean, nothing in progress. 18 issues closed, 2 PRs merged, 2 superseded PRs closed. The triage surfaced a backlog of real bugs/features that are documented below but **not started**.
-**Immediate next step:** Pick the next work item — either start a Tier-1 correctness bug or review the 4 in-flight contributor PRs. Highest-leverage correctness bug with the clearest entry point is **#629** (Svelte re-export barrels → false "0 callers").
-
-> Suggested next message: "Let's fix #629 — start by making the default-export branch of `findExportedSymbol` (src/resolution/import-resolver.ts:~1250) also match node kind `component` so `export { default as X } from './X.svelte'` resolves, then check the bare `./`-index and package-subpath barrel forms. Reproduce on a SvelteKit repo first."
-
-## Goal
-Work the open-issue backlog: close what's already fixed, ship the cheap wins, and tee up the real bugs. This session's slice is **done and merged**; the goal now is to keep going on the remaining triaged items (Tier-1 correctness bugs, in-flight PR reviews, or the multi-root feature cluster).
-
-## Key findings
-- **Full triage of 72 open issues** was done via 6 parallel `general-purpose` agents, each anchoring issues to the reporter's version vs `CHANGELOG.md` (per CLAUDE.md's version-anchoring rule). Result buckets are in the conversation; the still-open real work is under "Open threads" below.
-- **#329 design decision:** shipped `version`, `indexPath`, `lastIndexed` (ISO-8601 string, or null); **dropped `agentCount`** — it had no clear consumer, two conflicting meanings (configured integrations vs. live daemon sessions), and didn't fit the issue's own CI use case. Field names match the issue (not eddieran's `codegraphVersion`/ms+ISO scheme).
-- **#329 impl:** `QueryBuilder.getLastIndexedAt()` = `SELECT MAX(indexed_at) FROM files` (epoch-ms; `indexed_at` is `Date.now()` at index time, schema.sql:64). Exposed as `CodeGraph.getLastIndexedAt(): number|null` (src/index.ts) and surfaced in both JSON branches of the `status` action (src/bin/codegraph.ts).
-- **Quick-win fix locations** (all on `main` now): `.codegraph/.gitignore` → `*`+`!.gitignore` (src/directory.ts, BOTH write sites ~86 and ~248); MCP `resources/list`/`prompts/list` empty replies (src/mcp/session.ts + src/mcp/proxy.ts); extension map adds (src/extraction/grammars.ts); `getUnresolvedReferencesByFiles` chunking (src/db/queries.ts:~1588); `git ls-files -z` (src/extraction/index.ts, BOTH calls in `collectGitFiles`); Go generic-receiver regex (src/extraction/languages/go.ts:~60); anonymous-body visit (src/extraction/tree-sitter.ts:~633); impact `contains`-edge exclusion (src/graph/traversal.ts:~525).
-- **Competing-PR handling (confirmed pattern):** for #329's two PRs (#333, #480) we shipped a fresh combined impl, credited BOTH via `Co-Authored-By:` trailers, and closed each with a specific thanks. Saved to memory `feedback_pr_improve_on_contributor_branch`.
-
-## Gotchas
-- **#583 is only HALF done.** Shipped RC1 (generic-receiver regex). RC2 — receiver methods declared in a *different file* from their struct lose the struct→method `contains` edge (src/extraction/tree-sitter.ts:~788 restricts owner lookup to same file) — is **still open**, needs a resolution-phase package-wide owner join. Don't tell anyone #583 is closed.
-- **`gh issue close --comment` is unreliable** — comment first (`gh issue comment --body-file`), then close, then verify the comment is the last activity. Used this throughout.
-- **`main` is REVIEW_REQUIRED** → merge with `gh pr merge <N> --squash --admin --delete-branch`. Co-authored-by needs numeric IDs: `gh api users/<login> --jq .id` → `<id>+<login>@users.noreply.github.com` (12122J=199902626, eddieran=8403607).
-- **No `git add -p` in this env.** To split mixed working-tree changes into 2 branches: commit all to `tmp/snapshot`, branch each off `main`, `git checkout tmp/snapshot -- <owned files>`, and for shared files start from main + re-apply hunks via Edit. CHANGELOG `[Unreleased]` auto-merges when one PR adds `### New Features` and the other adds to `### Fixes`.
-- Spawn-based tests (mcp-initialize, status-json) exec `dist/bin/codegraph.js` — **rebuild dist** before running them.
-- `codegraph status` takes a **positional** path arg, not `--path` (that flag silently no-ops for status).
-
-## How to test & validate
-- `npm run build && npm test` → 59 files, ~1122 passed / 2 skipped on `main`. (Spawn tests need the build first.)
-- `npx vitest run __tests__/<file>.test.ts` for a single file.
-- Smoke test the real binary: `CODEGRAPH_NO_DAEMON=1 node dist/bin/codegraph.js status --json` (initialized) / `... status /tmp/empty --json` (uninitialized).
-- For new language/framework coverage bugs (#629, #645, #608, #578…): follow the CLAUDE.md validation methodology — deterministic probe on the built `dist/` + agent A/B; verify "the flow EXISTS end-to-end in the graph" and node count is stable (no explosion).
-
-## Repo state
-- branch `main`, last commit `7b62356 feat(cli): add version, indexPath, lastIndexed to status --json (#329)`
-- previous: `ddb1a8f fix: issue-triage quick wins (#654)`
-- uncommitted: clean
-
-## Open threads / TODO
-Tier-1 correctness bugs (silently-wrong / zero-recall — highest value):
-- [ ] **#645** C++ method calls via singletons/factories/chained getters → wrong class (needs C++ return-type extraction; L)
-- [ ] **#629** Svelte default re-export barrels → false 0 callers (start: match kind `component` in `findExportedSymbol`, import-resolver.ts:~1250; M)
-- [ ] **#608** PHP `Cls::for($x)->method()` static-factory chains drop the edge (M)
-- [ ] **#578** Python `module.func()` after `from pkg import module` → zero recall (import-resolver.ts `resolveViaImport`; M)
-- [ ] **#583-RC2** Go cross-file receiver methods lose `contains` edge (tree-sitter.ts:~788; M) → unblocks **#584** (Go structural `implements` edges)
-- [ ] **#527** symlink read-escape (validatePathWithinRoot has no realpath check; `isPathWithinRootReal` exists but unused in read paths; security; S/M)
-
-Platform: [ ] #237 (C# primary ctors — stale tree-sitter-wasms C# grammar, current is 0.23.5) · [ ] #448 (WAL on SMB/network drives) · [ ] #208 (Windows NTFS scan hang, mis-titled) · [ ] #576 (Windows daemon lingers after Ctrl+C)
-
-In-flight contributor PRs to review/merge: [ ] #597 (#515 C macros) · [ ] #301 (#300 Drupal patterns) · [ ] #306 (#305 docs Contributing) · [ ] #57 (#55 MQL5; cookbook PR #97 was closed)
-
-Framework coverage: [ ] #307 (Hono sub-routers) · [ ] #491/#490/#489 (Spring @ConditionalOnProperty / Feign→controller / MyBatis SqlSessionTemplate) · [ ] #634 (TS string-literal type args)
-
-Feature cluster (one design pass, not piecemeal): [ ] multi-root / scan-boundary — #542, #518, #511, #499, #452, #304, #281, #141, #514, #636
-
-Needs-info / verify-version-then-maybe-close: [ ] #641 (Codex MCP — PATH/timeout) · [ ] #501 (Antigravity 2.0 — UI enable?) · [ ] #493 (sync delete — confirm reporter <0.9.5) · [ ] #535 (opencode Windows %APPDATA% — needs real Windows VM) · [ ] #613 (10GB-DB MCP timeout — large-repo query profiling)
-
-## Recent transcript (last 7 turns, oldest → newest)
-### Turn 1 — "Scan all open issues; which are already fixed / quick wins / real?"
-- **Claude:** `gh issue list` (72 open) + read CHANGELOG; built a version→fix map; fanned out 6 `Agent(general-purpose)` batches to triage 12 issues each, each verifying against `src/`.
-- **Outcome:** Bucketed triage report: 7 already-fixed, ~11 quick wins, ~20 real bugs, ~20 features, tracking/dups.
-
-### Turn 2 — "take care of the quick wins" (+ idea: append `&& codegraph install` to README curl/irm)
-- **Claude:** Verified install.sh/install.ps1 do NOT refresh the current shell's PATH → chaining would fail with "command not found". Implemented 10 quick wins + tests + CHANGELOG; `npm test` green (1122).
-- **Outcome:** All quick wins done locally; README reworked to an explicit `codegraph install` step "in a new terminal".
-
-### Turn 3 — "handle these [#86 #316 #339 #342 #507 #551 #565], then look at 329 and the PR"
-- **Claude:** Pinned fix-versions via `git tag --contains`; `gh issue comment --body-file` then `gh issue close` then verified, for all 7. Found **two** competing PRs for #329 (#333 by 12122J, #480 by eddieran).
-- **Outcome:** 7 already-fixed issues closed. Surfaced the #329 PR conflict + the `agentCount` ambiguity.
-
-### Turn 4 — "what is this feature?" → "why would anyone need agentCount" → "yes"
-- **Claude:** Explained #329 (machine-readable `status` for CI). Argued `agentCount` has no real consumer → recommended dropping it. Implemented version/indexPath/lastIndexed + `getLastIndexedAt()` + `__tests__/status-json.test.ts`; green.
-- **Outcome:** #329 implemented, agentCount dropped, validated end-to-end.
-
-### Turn 5 — "Option A" (two separate PRs)
-- **Claude:** Split the mixed working tree via `tmp/snapshot` + selective `git checkout` + Edit; built two clean branches (`fix/issue-triage-quick-wins`, `feat/status-json-ci-fields`), each green independently.
-- **Outcome:** Two committed branches; main untouched.
-
-### Turn 6 — "Yes" (push + open PRs)
-- **Claude:** `git push` both; `gh pr create` → **#654** (closes 10 issues, refs #583) and **#655** (closes #329). Both MERGEABLE.
-- **Outcome:** Two PRs open and verified.
-
-### Turn 7 — "do it" (merge + close superseded)
-- **Claude:** `gh pr merge 654 --squash --admin`; merged `main` into #655 branch (CHANGELOG auto-merged), pushed, `gh pr merge 655 --squash --admin` with co-author trailers. Closed #333/#480 with specific thanks; left a #583 RC2 note. Verified all auto-closes.
-- **Outcome:** Everything merged; 18 issues closed; #583 open with note. Updated memory `feedback_pr_improve_on_contributor_branch`.
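
For the `lastIndexed` field described under #329 above (an epoch-ms value from `MAX(indexed_at)`, surfaced as an ISO-8601 string or null), the conversion can be sketched as below. The helper name `toLastIndexed` is hypothetical; the shipped code in `src/bin/codegraph.ts` may be structured differently.

```typescript
// Hypothetical helper (not taken from the CodeGraph source): maps the epoch-ms
// value that getLastIndexedAt() is described as returning to the JSON field.
function toLastIndexed(epochMs: number | null): string | null {
  // An uninitialized or empty index has no timestamp, so the field stays null.
  if (epochMs === null || !Number.isFinite(epochMs)) return null;
  const d = new Date(epochMs);
  // Out-of-range values yield an invalid Date; report null rather than throw.
  return Number.isNaN(d.getTime()) ? null : d.toISOString();
}
```

Returning null instead of an empty string or `0` lets a CI consumer tell "never indexed" apart from a real timestamp with a single check.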

+ 0 - 86
.claude/handoffs/trace-relevance-coldstart-2026-05-30.md

@@ -1,86 +0,0 @@
----
-name: trace-relevance-coldstart-2026-05-30
-date: 2026-05-30 23:30
-project: codegraph
-branch: feat/trace-relevance-closure-collection
-summary: Turned Alamofire (README's weakest repo) into a clean win via a trace endpoint-disambiguation fix + god-file explore rendering, then eliminated the MCP cold-start race that was causing benchmark inconsistency (handshake ~811ms→~90ms); PR #580 has 6 commits, all that's left is a clean README sweep + squash-merge.
----
-
-# Handoff: trace-relevance + closure-collection + cold-start (PR #580)
-
-## Resume here — read this first
-**Current state:** PR #580 (branch `feat/trace-relevance-closure-collection`, 6 commits, pushed, in sync with remote) is feature-complete and validated — full suite 1090 pass (only the 5 pre-existing npm-shim network fails), 28/28 MCP+daemon tests. The MCP cold-start race (the dominant benchmark-inconsistency source) is ELIMINATED via the proxy-local-handshake (tool registration ~90ms cold+warm, was ~811ms). The README benchmark table still shows the OLD pre-fix numbers.
-**Immediate next step:** Run a median-of-4 README sweep on this build (the race is gone, so numbers should be naturally consistent), update the README table/averages/headline, then squash-merge PR #580.
-
-> Suggested next message: "Run `RUNS=4 bash scripts/agent-eval/bench-readme.sh` on this build, parse with `node scripts/agent-eval/parse-bench-readme.mjs /tmp/ab-readme` (race-aware), update the README benchmark table + averages + the 7 per-repo detail tables + methodology date, then squash-merge PR #580 with `gh pr merge 580 --squash --admin`."
-
-## Goal
-Started as "Alamofire is the README's weakest benchmark repo (13% fewer tool calls vs the ~62% average) — fix it." Became: make CodeGraph's retrieval **consistent and faster**. Definition of done = PR #580 merged (trace fix + dynamic-dispatch coverage + god-file rendering + cold-start elimination), README refreshed with stable median-of-4 numbers. Optimization target per CLAUDE.md is **tool-calls/reads + latency**, NOT raw cost.
-
-## Key findings
-The 6 commits on the branch (oldest→newest):
-- `e86d573` **Trace endpoint relevance** (THE Alamofire win) + closure-collection synthesizer + explore synth-links.
-- `c64c4b3` **God-file multi-phase explore rendering** (6 sub-layers).
-- `5d7388c` Skeleton/focused tag steers to `codegraph_explore`, not Read (spiral fix #1).
-- `dc19eab` Bench parser race-aware (excludes "No such tool available" runs).
-- `91e28df` serve --mcp cold-start ~811ms→~600ms (defer CodeGraph load + 25ms poll).
-- `82ae484` **Proxy-local-handshake** — handshake ~600ms→~90ms, cold-start race eliminated.
-
-Root-causes found by reading A/B TRANSCRIPTS (not the noisy median):
-- **Trace bug:** `handleTrace`'s `scorePair` ranked only by shared-dir-prefix, so overloaded names (`request`=44 defs, `task`=8) resolved to empty `EventMonitor.request(){}` / `RedirectHandler.task` STUBS over the real `Session.request` → agent saw garbage, said "the trace collided with same-named symbols", read by hand. Fix: `nodeRelevance` term in `handleTrace` (penalize ≤1-line stubs −40, test files −150). Result n=8: WITH tools 12→8 median, read variance 0–12→1–4 (the meltdowns WERE the trace-collision flounder). General bug (Swift/Java/C#/Go protocol-stub flooding).
-- **Closure-collection synthesizer** (`src/resolution/callback-synthesizer.ts` `closureCollectionEdges`): Swift `validators.write{$0.append}`…`didCompleteTask` `validators.forEach{$0()}`. The element-invoke `$0(`/`it(` is the precision gate → 9 edges on Alamofire, **0 on every non-Swift control**. Surfaced inline in trace + a "Dynamic-dispatch links" section in `buildFlowFromNamedSymbols` (so it shows when the agent named only `validate`, not `didCompleteTask`).
-- **God-file rendering** (`handleExplore` in `src/mcp/tools.ts`, 6 layers): (1) on-spine god-files render spine-full + off-path methods as signatures (true-spine); (2) named-seed gather — inject each named token's substantive def into the subgraph (FTS buried `validate` → Validation.swift was never gathered); (3) a file that DEFINES a named symbol scores +50 (beats incidental Combine.swift's +23 connected-node score); (4) the 90%-budget early-break and (5) the total-output cap both EXEMPT necessary (entry/spine/uniqueNamed) files; (6) final ceiling 1.5×maxOutputChars. Renders build+validators-exec+validate in ONE explore.
-- **Spiral cause #1 (fixed):** the skeleton tag said "Read for a full body" → agent Read the skeletonized central files → over-investigation spiral. Now steers to `codegraph_explore`.
-- **Spiral cause #2 / the BIG inconsistency (fixed):** MCP **cold-start race**. `serve --mcp` wasn't ready when the headless agent fired → "No such tool available" → grep/Read flounder (19–30 tool spirals). Root-caused: NOT module load (mcp/index 38ms, CodeGraph chain 30ms), NOT the `--liftoff-only` re-exec (NO_RELAUNCH ≈ same) — it's the proxy WAITING for the spawned daemon to bind. Fixed: proxy answers initialize/tools-list from STATIC constants (`runLocalHandshakeProxy` in `proxy.ts`), forwards tool CALLS to the daemon (connected in background), lazy in-process engine fallback preserves the old fall-back-to-direct robustness. `connectWithHello` distinguishes 'version-mismatch' (fail fast → local) from 'not-yet' (poll). Handshake 91ms cold / 88ms warm.
-
-## Gotchas
-- **A/B variance is HUGE — never conclude from n=1, or even one n=4 batch.** The median-of-4 caught regressions the lucky dedicated batches HID (the god-file rework looked great in one batch at 0.5 reads/5.5 tools; the median showed 13 tools dragged by 2 spirals). Report ranges.
-- **Kill stale daemons before any cold-start measurement:** `pkill -9 -f "dist/bin/codegraph.js"; rm -f /tmp/codegraph-corpus/<repo>/.codegraph/daemon.*`. A zombie daemon holding the lock causes a 6s retry-exhaust that looks like a 7× regression (it bit me — the "6239ms" false alarm).
-- **`timeout` is NOT available on stock macOS** (it is part of GNU coreutils, which is not installed by default), so measure cold-start with a `node` spawn + a `setTimeout` kill-timer (see the transcript's measurement snippets).
-- Corpus repos: `/tmp/codegraph-corpus/<repo>` (all 7 README repos indexed). Explore/trace changes are **query-time** (no re-index). The closure-collection synthesizer is **index-time** but produces 0 edges on non-Swift, so it's inert there.
-- Global `codegraph` is npm-linked to the dev dist (`node dist/bin/codegraph.js`). **Always `npm run build` before any probe/A/B** (they load `dist/`, not `src/`).
-- `engine.ts`/`tools.ts` now `import type CodeGraph` + lazy `require('../index')` (CommonJS, cached) so the daemon binds before the sqlite/query chain loads; `findNearestCodeGraphRoot` now comes from the light `../directory`.
-- The old `runProxy`/`pipeUntilClose` in `proxy.ts` are now DEAD (superseded by `runLocalHandshakeProxy`) — left in place; safe to prune in a follow-up.
-- 5 `npm-shim.test.ts` failures are pre-existing/network (need `--probe-net`) — NOT regressions; ignore.
-- Uncommitted `.gitignore` change (`tmux-web/`) is unrelated/not mine — do NOT commit it on this branch.
-- `parse-bench-readme.mjs` excludes raced runs by default; `CG_INCLUDE_RACED=1` keeps them to see the raw distribution. Now a safety net (race eliminated at source).
-
-## How to test & validate
-- `npm run build` → must be clean (exit 0).
-- `npx vitest run` → **1090 pass**, only the 5 npm-shim network fails.
-- `npx vitest run __tests__/mcp-daemon.test.ts` → **7/7** (sharing, #277 survive-client-death, version-mismatch fallback, idle-timeout).
-- Cold-start handshake (after killing daemons): node-spawn a `serve --mcp`, send `initialize`, time the id:1 response → **~90ms** (was ~811ms). Then a `tools/call` (e.g. `codegraph_status`) returns a real result (forwarded to the daemon, ~3.4s on vscode's first index load — a call that returns LATE, not a missing-tool error).
-- A/B sweep: `RUNS=4 bash scripts/agent-eval/bench-readme.sh` → `node scripts/agent-eval/parse-bench-readme.mjs /tmp/ab-readme`.
-- **Methodology:** a handshake under 150ms means the race is eliminated; in an A/B, grep the WITH jsonls for "No such tool available" (should be 0 now); expect WITH reads/tools to be lower than WITHOUT, and no regression in the control.
-
-## Repo state
-- branch `feat/trace-relevance-closure-collection`, last commit `82ae484 perf(mcp): proxy answers initialize/tools-list locally — cold-start handshake ~600ms→~90ms`. In sync with remote (0/0). PR **#580** open.
-- uncommitted: ` M .gitignore` (unrelated tmux-web/ edit, not mine) + the new untracked `.claude/handoffs/` files. All code committed.
-
-## Open threads / TODO
-- [ ] Run median-of-4 README sweep on THIS build (race gone → should be naturally consistent, no exclusions needed).
-- [ ] Update README: benchmark summary table (7 rows × cost/tokens/time/tools), the 7 per-repo detail tables (Time/Reads/Grep/Tools/Tokens/Cost), the average line, the headline (line 7), and the methodology date (line 86).
-- [ ] Squash-merge PR #580 (`gh pr merge 580 --squash --admin` per the PR-merge-policy memory).
-- [ ] Add CHANGELOG entries under `[Unreleased]` for the cold-start fixes (trace fix + closure-collection + god-file already have entries from earlier commits — verify).
-- [ ] (cleanup) Prune dead `runProxy`/`pipeUntilClose` in `proxy.ts`.
-- [ ] (frontier, deferred) Alamofire `task` overload still seeds `RedirectHandler.task` (a harmless false-positive in named-seed ranking) — same nodeRelevance idea could de-rank it.
-
-## Recent transcript (oldest → newest)
-### Turn — "which repos drag the averages? Alamofire looks worst"
-- **Claude:** Computed per-column deviations from the README table → Alamofire #1 drag (tool-calls 13% vs 62% avg, time 7% vs 23%), gin #2; both ~110-file repos.
-- **Outcome:** Targeted Alamofire.
-### Turn — "dig into Alamofire A/B logs"
-- **Claude:** Read `/tmp/ab-readme/alamofire/run*/run-headless-with.jsonl` → found validate-flow break (`validators` closure-array) + per-run reasoning. Built closure-collection synthesizer + explore synth-links. A/B: flat (adoption variance).
-- **Outcome:** Synthesizer correct + sufficient (forced codegraph-only 3/3) but didn't move the noisy median.
-### Turn — "keep working till you figure it out"
-- **Claude:** Read transcripts again → "the trace collided with same-named symbols (44 requests, 8 tasks)". Found `scorePair` picked empty stubs. Added `nodeRelevance`. A/B n=8: tools 12→8, variance collapsed.
-- **Outcome:** The real Alamofire win. Committed `e86d573`; opened PR #580; squash-merge deferred.
-### Turn — "do the file-ranking + gather layers"
-- **Claude:** Built the 6-layer god-file rendering (`c64c4b3`). A/B flat-to-worse → debugged spirals → found the skeleton tag said "Read for more" (fixed `5d7388c`) → spirals persisted → root-caused the MCP cold-start race ("No such tool available").
-- **Outcome:** Race is the dominant inconsistency, not the code.
-### Turn — "keep iterating until consistent / do both (parser + startup)"
-- **Claude:** Bench parser race-aware (`dc19eab`). Deferred CodeGraph load off the bind path + 25ms poll (`91e28df`, 811→600ms).
-- **Outcome:** Consistency handled (parser) + ~25% faster startup (811→600ms).
-### Turn — "how to optimize cold-start? / implement it"
-- **Claude:** Profiled (module loads 30-40ms, re-exec not it — it's the daemon-spawn wait). Implemented `runLocalHandshakeProxy` (`82ae484`): proxy answers initialize/tools-list locally + forwards calls + lazy local-engine fallback. Fixed 4 daemon tests (emit "Attached to shared daemon" + fast-fail version-mismatch + updated 1 assertion). Handshake 90ms; 28/28 MCP tests; full suite 1090 pass.
-- **Outcome:** Cold-start race ELIMINATED. All cold-start work committed + pushed. README sweep + squash-merge pending.
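
The cold-start handshake check described in the notes (spawn `serve --mcp`, send `initialize`, time the id:1 response, with a `setTimeout` kill-timer standing in for `timeout` on macOS) could be sketched roughly as follows. This is a minimal sketch, not the snippet from the original transcript: the helper names `buildInitializeRequest`, `handshakeVerdict`, and `timeHandshake` are hypothetical, the `protocolVersion` string and `clientInfo` values are assumptions, and it assumes the server speaks newline-delimited JSON-RPC on stdio.

```typescript
import { spawn } from "node:child_process";

// Hypothetical helper: one newline-terminated JSON-RPC `initialize` request.
// The protocolVersion and clientInfo values are assumptions for illustration.
function buildInitializeRequest(id: number): string {
  const request = {
    jsonrpc: "2.0",
    id,
    method: "initialize",
    params: {
      protocolVersion: "2024-11-05",
      capabilities: {},
      clientInfo: { name: "cold-start-probe", version: "0.0.0" },
    },
  };
  return JSON.stringify(request) + "\n";
}

// Applies the notes' rule of thumb: a handshake under 150ms means no race.
function handshakeVerdict(ms: number, thresholdMs = 150): "race-eliminated" | "too-slow" {
  return ms < thresholdMs ? "race-eliminated" : "too-slow";
}

// Spawns the server, sends `initialize`, and resolves with the elapsed
// milliseconds until the id:1 response arrives. A setTimeout kill-timer
// bounds the wait, since `timeout` may not be installed.
function timeHandshake(cmd: string, args: string[], killAfterMs = 10_000): Promise<number> {
  return new Promise<number>((resolve, reject) => {
    const started = process.hrtime.bigint();
    const child = spawn(cmd, args);
    let settled = false;
    let buffer = "";

    // Settle exactly once, then stop the timer and kill the child.
    function finish(action: () => void): void {
      if (settled) return;
      settled = true;
      clearTimeout(killTimer);
      child.kill("SIGKILL");
      action();
    }

    const killTimer = setTimeout(
      () => finish(() => reject(new Error(`no response within ${killAfterMs}ms`))),
      killAfterMs,
    );

    child.stderr.resume(); // drain stderr so a chatty server cannot block
    child.stdin.on("error", () => {}); // ignore EPIPE if the child dies early
    child.stdout.setEncoding("utf8");
    child.stdout.on("data", (chunk: string) => {
      buffer += chunk;
      let newline = buffer.indexOf("\n");
      while (newline >= 0) {
        const line = buffer.slice(0, newline);
        buffer = buffer.slice(newline + 1);
        try {
          if (JSON.parse(line).id === 1) {
            const ms = Number(process.hrtime.bigint() - started) / 1e6;
            finish(() => resolve(ms));
          }
        } catch {
          // Not JSON (e.g. a log line); keep reading.
        }
        newline = buffer.indexOf("\n");
      }
    });
    child.on("error", (err) => finish(() => reject(err)));
    child.on("exit", () => finish(() => reject(new Error("server exited before responding"))));

    child.stdin.write(buildInitializeRequest(1));
  });
}
```

A run might look like `timeHandshake("node", ["dist/bin/codegraph.js", "serve", "--mcp"]).then((ms) => console.log(ms.toFixed(0), handshakeVerdict(ms)))`, executed after killing stale daemons and running `npm run build`, as the gotchas above recommend.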

Some files are not shown because a large number of files were changed in this diff.