Colby McHenry 1 month ago
parent
commit
932f4cbf1b

+ 70 - 0
.claude/handoffs/explore-flow-tool-adoption.md

@@ -0,0 +1,70 @@
+---
+name: explore-flow-tool-adoption
+date: 2026-05-24 00:55
+project: codegraph
+branch: architectural-improvements
+summary: Investigated why codegraph's read savings don't convert to wall-clock; root cause is agent tool-CHOICE (under-uses trace). Shipped a chain of fixes; the breakthrough is "explore-surfaces-flow" — the first mechanism to show up in real agent runs by adapting the tool the agent already uses.
+---
+
+# Handoff: codegraph retrieval — tool adoption & explore-surfaces-flow
+
+## Resume here — read this first
+**Current state:** A long investigation into making agents answer flow questions faster with codegraph. 6 commits on `architectural-improvements` (all probe-validated, suite green 815). The breakthrough: **`codegraph_explore` now surfaces the execution flow** from the symbol-bag the agent already passes it (`PmsProductController getList PmsProductService list PmsProductServiceImpl` → leads output with `getList → service-interface → impl`, riding synth edges). It's the FIRST mechanism this whole arc to actually appear in real agent runs (spring-mall A/B: flow surfaced both runs, reads 2.0→1.5) — because it adapts the tool the agent USES instead of trying to make it use `trace`.
+
+**Immediate next step:** The user is weighing how to push tool-USE quality next (their open question). Decide between: (a) **extend explore-flow to surface more reliably** (spring-halo's query didn't name a connected co-named chain → no flow), (b) accept we're at the model-behavior ceiling and **wrap up**, or (c) the user's ideas — better tool-description *examples* (≈ steering, low-leverage per the evidence) or a *query-builder tool* (adds a call + new-tool adoption problem). My read: keep ADAPTING THE USED TOOL (the only thing that's worked); examples/new-tools are the "change the agent" direction that failed all session.
+
+> Suggested next message: "explore-flow only surfaced on 2 of 3 repos — dig into why spring-halo's explore query didn't produce a flow and make it surface more reliably" — OR — "we're at the model-behavior ceiling; let's stop and write the CHANGELOG/PR for this branch"
+
+## Goal
+Make an AI agent answer **flow questions** ("how does X reach Y", request→handler→service, state→render) fast: ~0 Read/Grep, few codegraph calls, lower wall-clock. `codegraph_trace` is the fastest tool (1 call = the path), but the agent under-uses it. Ultimate target = trace's speed, however the agent gets there.
+
+## Key findings (the through-line)
+- **The wall is agent tool-CHOICE, not the graph.** Matrix-wide, codegraph cuts reads by 75% but wall-clock by only 16% (`docs/benchmarks/codegraph-ab-matrix.md`). The floor is round-trips + the synthesis turn. The agent reliably calls `context`/`explore` but rarely `trace` (3/37 flow cells). Full analysis: `docs/benchmarks/call-sequence-analysis.md`.
+- **Steering does NOT move it** (arms B/F/G, 3 wording variants): an MCP `initialize` instruction / tool description can't match a CLI `--append-system-prompt`'s salience, and forcing trace where it doesn't connect regresses. Reverted.
+- **Sufficiency works** (committed): a self-sufficient `trace` (hop bodies + destination callees inlined) lets the unsteered agent stop — but only when it calls trace.
+- **THE breakthrough — adapt the tool the agent uses.** `explore`'s query is a precise symbol-bag spanning the flow, so `explore` finds the call path AMONG its named symbols and leads with it. First mechanism to surface in real runs + drop reads.
+- **What FAILED:** option 1 (context-surfaces-flow) — fuzzy DESCRIPTION can't disambiguate endpoints → confident WRONG-feature flow; reverted. trace multi-source-BFS over ambiguous names — same wrong-feature; reverted.
+
+## Gotchas
+- **Co-naming disambiguation must match qualifiedName SEGMENTS, not substrings** (`buildFlowFromNamedSymbols` in `src/mcp/tools.ts`): `list` is a substring of `getList` → kept every getList. Split `qualifiedName` on `::`/`.` and match segments.
+- **BFS must cap consecutive UNNAMED hops at 1** — full-graph BFS wanders a god-function's fan-out (excalidraw `render()` → pointer handlers → mutateElement). ≤1 bridge crosses a missing intermediate without wandering.
+- **`getCallees` returns non-`calls` edges too** (references) — filter `c.edge.kind === 'calls'`.
+- **Resolver/synthesizer changes need a CLEAN reindex**: `rm -rf .codegraph && codegraph init -i` (the init edge count is contains-only — query the DB for the real count). The explore-flow change is query-time (no reindex).
+- **n=2 A/B is noisy** — report ranges/patterns, never conclude from one run. Foreground `sleep` is blocked → run A/B batches with `run_in_background`.
+- Java/Kotlin `qualifiedName` is `Class::method`, so `matchesSymbol` resolves `Class.method`-qualified trace endpoints, which the agent already passes.
+
+## How to test & validate
+- Probe flow surfacing (no agent): `node scripts/agent-eval/probe-explore.mjs <repo> "<SymbolA SymbolB SymbolC>"` → look for the `## Flow` section. `probe-trace.mjs <repo> <from> <to>` for trace.
+- Synthesizer: `sqlite3 <repo>/.codegraph/codegraph.db "select count(*) from edges where json_extract(metadata,'$.synthesizedBy')='interface-impl'"`; node count stable before/after reindex (synth adds edges only).
+- Agent A/B (the real test): `bash scripts/agent-eval/run-arms.sh <repo> "<Q>" I <run>` (arm I = body-trace build, no steering). Parse via the `cmp2.mjs`-style scripts in `/tmp`. Pass = flow surfaces (`flowShown=Y`) + reads ≤ baseline.
+- `npm test` (vitest, 815 pass); `__tests__/mcp-tool-allowlist.test.ts` covers the allowlist.
+
+## Repo state
+- branch `architectural-improvements`, last commit `bafae81 feat(mcp): codegraph_explore surfaces the execution flow from its named symbols`.
+- uncommitted: clean (only untracked `.claude/handoffs/`).
+- 6 session commits: `eab5cf3` self-sufficient trace + `CODEGRAPH_MCP_TOOLS` allowlist · `a6183d7` research log + arms harness · `bde8c19` node/trace line numbers · `98baf41` Java/Kotlin interface→impl synthesizer · `6f3c468` playbook · `bafae81` explore-surfaces-flow.
+- NOT pushed/merged. No version bump. CHANGELOG `[Unreleased]` has all of it.
+
+## Open threads / TODO
+- [ ] **User's open question** (answer in the next turn): better tool-description *examples* vs a *query-builder tool* vs keep adapting the used tool. Evidence favors the last.
+- [x] explore-flow reliability: now resolves QUALIFIED tokens (`Class.method`) — the agent's most precise input was being dropped by the file-ext strip (`2765c3c`). spring-halo's publish flow stays absent on purpose — it's **reactive/reconciler dispatch** (`publishPost` calls `ReactiveExtensionClient.get`/`awaitPostPublished`, not `PostService.publish`), so there's no static call chain. That's the next COVERAGE frontier (reactive runtimes — like MediatR, Vue Proxy), not an explore-flow bug.
+- [ ] Ship-prep for the whole branch (this arc + the earlier framework sweep): CHANGELOG version block + `package.json` bump + PR to main. Releases go through `.github/workflows/release.yml` only — do NOT `npm publish`.
+- [ ] Frontiers: MediatR (`_mediator.Send`→Handle) and Vue/Compose reactive runtimes are still unbridged dynamic dispatch.
+
+## Recent transcript (oldest → newest)
+### Turn — "improve the A/B matrix; trace works, reads near 0 — what else?"
+- Diagnosed: reads at floor, wall-clock floor = round-trips + synthesis. Built `seq-matrix.mjs`; found trace adoption 3/37.
+### Turn — "do explore/context/trace compete? one tool?"
+- Ablation arms A–E (`run-arms.sh`/`arms-F.sh` + `CODEGRAPH_MCP_TOOLS` allowlist). explore = 68% of payload, load-bearing; trace path-scoped but under-adopted; trace alone insufficient.
+### Turn — "prototype body-inlining trace + A/B"
+- Arm F: self-sufficient trace wins WITH append-prompt steering. But steering isn't a shippable channel.
+### Turn — "port the steering + re-run"
+- Arms G (3 variants) all regressed vs baseline; arm H (body-trace, no steer) ≈ baseline. Steering reverted; body-trace + line-numbers + allowlist committed.
+### Turn — "tee up connectivity (Spring interface-DI)"
+- Built `interfaceOverrideEdges` (Java/Kotlin interface→impl, overload-aware). Probe: 3-hop trace connects. But A/B null — agent never called trace. Committed (probe-validated, adoption-gated).
+### Turn — "make context surface the flow (option 1)"
+- Failed: fuzzy query → wrong-feature flows. Reverted.
+### Turn — "change explore to do trace in the backend"
+- WIN: explore's query is a precise symbol-bag. `buildFlowFromNamedSymbols` (co-naming segment match + ≤1 bridge). Probe perfect (Spring + excalidraw full chains); A/B: flow surfaces + modest read drop. Committed `bafae81`.
+### Turn — "update memory + handoff; what about better examples / a query-builder tool?"
+- This handoff + memory update. Strategic answer pending (adapt-the-tool > change-the-agent).
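
The three graph gotchas in the first handoff (segment matching, the one-hop bridge cap, and filtering to `calls` edges) can be sketched together as below. This is a minimal illustration only, not the code of `buildFlowFromNamedSymbols`: the `Edge` shape and the helpers `segmentsOf`, `matchesSegment`, and `findFlow` are hypothetical names introduced here, and the real types in `src/mcp/tools.ts` may differ.

```typescript
// Hypothetical edge shape; the real graph types are not shown in the handoff.
type EdgeKind = "calls" | "references";
interface Edge { from: string; to: string; kind: EdgeKind; }

// Split a qualified name on "::" or "." into its segments.
function segmentsOf(qualifiedName: string): string[] {
  return qualifiedName.split(/::|\./).filter((s) => s.length > 0);
}

// True only when `token` equals a whole segment, never a substring of one.
function matchesSegment(qualifiedName: string, token: string): boolean {
  return segmentsOf(qualifiedName).some((s) => s === token);
}

// Breadth-first search from `start` to `goal` over "calls" edges only.
// At most `maxBridge` consecutive hops may pass through nodes the query
// did not name, so the search cannot wander a large fan-out.
function findFlow(
  edges: Edge[],
  named: Set<string>,
  start: string,
  goal: string,
  maxBridge: number = 1,
): string[] | null {
  const callees = new Map<string, string[]>();
  for (const e of edges) {
    if (e.kind !== "calls") continue; // ignore reference edges
    const list = callees.get(e.from) || [];
    list.push(e.to);
    callees.set(e.from, list);
  }

  type State = { id: string; run: number; path: string[] };
  const queue: State[] = [{ id: start, run: 0, path: [start] }];
  const seen = new Set<string>([start + "|0"]);

  while (queue.length > 0) {
    const cur = queue.shift() as State;
    if (cur.id === goal) return cur.path;
    for (const next of callees.get(cur.id) || []) {
      // A named node resets the run of unnamed hops; an unnamed one extends it.
      const run = named.has(next) ? 0 : cur.run + 1;
      if (run > maxBridge) continue;
      const key = next + "|" + run;
      if (seen.has(key)) continue;
      seen.add(key);
      queue.push({ id: next, run, path: cur.path.concat(next) });
    }
  }
  return null;
}

console.log(matchesSegment("PmsProductController::getList", "list")); // false
console.log(matchesSegment("PmsProductService::list", "list")); // true
```

Tracking the unnamed-hop run as part of the BFS state (rather than as a global depth limit) is what lets a path cross one missing intermediate between each pair of named symbols while still rejecting two unnamed hops in a row.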

+ 70 - 0
.claude/handoffs/framework-coverage-sweep-2026-05-23.md

@@ -0,0 +1,70 @@
+---
+name: framework-coverage-sweep-2026-05-23
+date: 2026-05-23 23:59
+project: codegraph
+branch: architectural-improvements
+summary: Dynamic-dispatch coverage sweep COMPLETE — all 14 README frameworks + every flow-relevant language validated (measure→fix→validate→test→playbook→commit). ~37 commits pushed, suite green. Ship-prep (CHANGELOG + PR to main) is the only thing left.
+---
+
+# Handoff: Dynamic-dispatch framework/language coverage sweep (complete)
+
+## Resume here — read this first
+**Current state:** The coverage sweep is **done**, AND a **frontier pass** closed the tractable partials. Every framework in the README's 14-row table is ✅, every flow-relevant language is validated (TS/JS, Python, Go, Java, C#, PHP, Ruby, Rust, Swift, Dart, Kotlin, Lua/Luau, Scala, C/C++), and the frontier pass added: React object data-router (literal), Next.js false-positive fix, Flask-RESTful `add_resource` (redash 6→77), Flask tuple methods + broader detection (flask-realworld 0→19), gorilla/mux confirmed. All committed/pushed to `architectural-improvements` (tree clean except untracked `.claude/handoffs/`). Full suite green (**809 passed**, 2 skipped; flaky `watcher.test.ts > debounced sync` passes on re-run). **No CHANGELOG entry exists, and the branch is not yet merged to main.**
+**Immediate next step:** Ship-prep — write a CHANGELOG entry grouping the whole sweep (route resolution for Flask/FastAPI/Drupal/Rust-Axum+actix/Vapor/Spring-Kotlin/Play + React Router routing; the Python builtin-name guard, Dart method-range, and C++ inheritance foundational fixes; the flutter-build and cpp-override synthesizer channels), bump `package.json`, then open a PR to main.
+
+> Suggested next message: "do ship-prep: write the CHANGELOG entry covering the whole framework/language coverage sweep on this branch, bump the version, and open a PR to main"
+
+## Goal
+Close static-extraction holes for **dynamic dispatch** across every language/framework codegraph supports, so cross-symbol flows (request→route→handler→service, state→render, virtual→override) exist in the graph and an agent answers flow questions with few codegraph calls and ~0 Read/Grep. Per framework/language: canonical flow `trace`s end-to-end, agent A/B shows fewer reads, no node explosion, recorded in `docs/design/dynamic-dispatch-coverage-playbook.md` (the matrix §6 + per-item notes §7). **This goal is now met; what remains is ship-prep + documented frontiers.**
+
+## Key findings (this session's work, all committed)
+- **Routing convention is the hole in every backend** — same pattern each time: the resolver/extractor assumed one syntax. Flask (intervening `@login_required`/stacked routes), FastAPI (empty `""` path), Drupal (`claimsReference` for FQCN `_form`/single-colon controllers + contrib `detect` via composer name/type/`.info.yml`), Rust/Axum (chained `get(h).post(h2)` + namespaced `mod::handler`), actix (builder API `web::resource().route(web::get().to(h))`), Vapor (grouped `routes.grouped("x"); x.get(use:h)` — was 0 on every real app), Spring **Kotlin** (`fun` handler syntax + `.kt`), Play (extensionless `conf/routes` → controller), React Router (`<Route>` JSX).
+- **Three FOUNDATIONAL fixes (broad benefit, not framework-specific):** (1) Python **bare-name builtin guard** in `src/resolution/index.ts` — a handler named `index`/`get`/`update` was filtered as a builtin method; mirror the dotted-branch `knownNames` guard. (2) **Dart method-range** in `src/extraction/tree-sitter.ts` `createNode` — Dart bodies are SIBLINGS of the signature, so methods were `end==start` (signature-only); extend `endLine` to the resolved body (guarded, child-body grammars no-op). (3) **C++ inheritance** — `extractInheritance` handled `base_clause` (PHP) but not C++ `base_class_clause`; added it (leveldb extends 219→298).
+- **Two new synthesizer channels** in `src/resolution/callback-synthesizer.ts` (Dart analog + C++ analog of react-render): `flutter-build` (a State method calling `setState(` → `build`) and `cpp-override` (base virtual method → subclass override of same name, gated to C++).
+- **measure-first repeatedly split "needs work" from "already covered":** Svelte, NestJS (prior), and this session **Lua/Luau** (module dispatch already resolves) + **Compose** (composition is plain function calls, already static) needed NO code. The assumed hole wasn't real.
+- **`claimsReference` pre-filter is the recurring gotcha** (`src/resolution/index.ts:497-503`): a route ref naming no declared symbol (FQCN, `Controller@method`, `controller#action`, `Class.method`) is dropped before `framework.resolve()` runs. Added for Drupal + Play this session.
+
+## Gotchas
+- **`claimsReference`:** if a new framework's route refs don't resolve despite a correct `resolve()`, it's the pre-filter — add `claimsReference`.
+- **Reindex picks up resolver changes only on a CLEAN index:** `codegraph index` is incremental (skips unchanged files); after `npm run build`, do `rm -rf .codegraph && codegraph init -i` to re-extract. The init message's edge count is contains-only (and therefore misleading); query the DB for the real count.
+- **Extraction changes are high blast radius** (shared `createNode`/`extractInheritance`): re-check node counts on control repos (excalidraw 9,290 / django 302) — the Dart/C++ fixes are guarded to only-extend / C++-only, controls unchanged.
+- **Play `conf/routes` is extensionless** → needed `isPlayRoutesFile` opt-in in `grammars.ts` (isSourceFile + detectLanguage→'yaml' no-grammar path). Narrow match, only ADDS Play files.
+- **Flaky:** `watcher.test.ts > debounced sync > should trigger sync after file change` — timing-based, passes on re-run; unrelated to any of this work.
+- **Foreground `sleep` is blocked** in Bash → background A/B batches (`run_in_background: true`), read the task output file. zsh quirks: quote globs (`'*.vue'`); SQL `count(*)` in `$(...)` needs care with quotes.
+- Global `codegraph` is npm-linked to this repo's `dist/`; `npm run build` then reindex. A/B harness: `scripts/agent-eval/run-all.sh <repo> "<Q>" headless` (with vs empty MCP), parse via `node scripts/agent-eval/parse-run.mjs`.
+
+## How to test & validate (the per-framework loop)
+- Corpus in `/tmp/codegraph-corpus/<name>` (clone S/M/L, `git clone --depth 1`). Index: `rm -rf .codegraph && codegraph init -i`.
+- Measure holes: `sqlite3 .codegraph/codegraph.db "select count(*) from nodes where kind='route'"` + route→handler edges (`join edges on source where kind='references'`). Node-count before/after (no explosion).
+- Flow: `node scripts/agent-eval/probe-node.mjs <repo> <symbol>` (shows Called-by/Calls trail) / `probe-trace.mjs <repo> <from> <to>`.
+- Agent A/B (≥2 runs/arm, variance is real): `run-all.sh` headless, record Read/Grep/duration/codegraph. Pass = fewer reads with codegraph.
+- Tests: `npm test` (vitest). Resolver extract tests in `__tests__/frameworks.test.ts`; end-to-end in `__tests__/frameworks-integration.test.ts` (real CodeGraph + indexAll); Dart range in `__tests__/extraction.test.ts`; Drupal in `__tests__/drupal.test.ts`.
+
+## Repo state
+- branch `architectural-improvements`, last commit `42a0178 docs(playbook): record frontier pass; test(go): gorilla/mux`.
+- uncommitted: clean (only untracked `.claude/handoffs/`).
+- ~37 commits total on the branch (handoff's original 11 frameworks + this session's: Flask/FastAPI, Drupal, Rust/Axum, Vapor, React Router, actix, Dart, Kotlin, Lua, Scala/Play, C/C++ — each a feat + a docs(playbook) commit; Lua was docs-only).
+
+## Open threads / TODO
+- [ ] **SHIP-PREP (the only blocker to merge):** CHANGELOG entry for the whole sweep, `package.json` bump, PR to main. Releases go through `.github/workflows/release.yml` only — do NOT `npm publish` (see CLAUDE.md).
+- [x] **Frontier pass DONE (commits 0456915, 03e49ab, 42a0178):** React object data-router (literal), Next.js false-positive fix, Flask-RESTful `add_resource`, Flask tuple methods + detection, gorilla/mux confirmed.
+- [ ] **Frontiers LEFT (deliberately, with rationale in playbook §7 "Frontier pass"):** anonymous/inline closures (def-use frontier), metaprogramming finders (AR/Eloquent/JPA/EF), reactive runtimes (Vue Proxy / Compose recomposition), Akka actors, C callback-struct 422-way fan-out, C++ pure-virtual base methods, React lazy data-router (variable paths + lazy imports), Play SIRD, Nuxt-specific. Forcing these adds noise.
+- [ ] Pre-existing, unrelated: Next.js `*.config.mjs` in a `pages/` dir treated as a route (false-positive found in bulletproof-react).
+
+## Recent transcript (oldest → newest, this session)
+### Turn — "what's left / what's next on coverage" → did Flask/FastAPI
+- 3 holes: Flask intervening/stacked decorators, FastAPI empty path, **Python bare-name builtin guard** (handlers named `index`/`get` filtered). microblog 6→27, realworld 12→20, dispatch 290/290. Fixed 6 stale Laravel/Rails tests too. Committed + pushed.
+### Turn — "Drupal next"
+- `claimsReference` for FQCN/_form/single-colon controllers + contrib `detect` (composer type/name + `.info.yml`). core 536→731 (87%), admin_toolbar 0→14. OOP `#[Hook]` = frontier. Committed.
+### Turn — "Rust: Axum/actix/Rocket"
+- Axum chained methods + namespaced handlers (realworld 12→19, 19/19); Rocket already 99%; **actix builder API** `web::resource().route(web::get().to())` (examples 51→128). Committed (2 commits: axum, then actix).
+### Turn — "Vapor (Swift)"
+- Resolver was 0-routes on every real app; rewrote for any receiver + optional non-string paths + `.grouped` prefix tracking + `use:` discriminator. template 0→3, SteamPress 0→27, SPI 0→14. Committed.
+### Turn — "2, 3, 4" (React Router, actix [done above], Dart/Flutter)
+- React Router `<Route>` JSX (react-realworld 0→10). Dart/Flutter: **method-range fix** (foundational) + `flutter-build` setState→build synthesizer. Committed.
+### Turn — "Kotlin next"
+- Spring resolver `['java']`→`['java','kotlin']` + `fun` handler regex (petclinic-kotlin 0→18, 18/18; Java unchanged 19/19). Compose composition already static. Committed.
+### Turn — "Lua/Luau, Scala, C/C++ (Lua first, but do all three)"
+- **Lua:** measure-first → module dispatch already covered (telescope 335 cross-file calls); no code change, validated. **Scala/Play:** `conf/routes` file-walk opt-in + Play resolver (computer-database 0→8). **C/C++:** general dispatch strong (redis 29k); fixed C++ `base_class_clause` inheritance + `cpp-override` synthesizer (leveldb 12 precise). All committed + pushed.
+### Turn — "wrap up + refresh handoff"
+- This handoff. Sweep complete; ship-prep (CHANGELOG + PR) is the remaining work.
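
The `claimsReference` gotcha from the second handoff can be illustrated with a small sketch of a pre-filter that drops route refs naming no declared symbol unless some framework resolver claims them. Everything here is an assumption for illustration: the `FrameworkResolver` shape, the `claimsReference` signature, and the `preFilterRefs` helper are hypothetical and are not taken from `src/resolution/index.ts`.

```typescript
// Hypothetical resolver shape; the real interface is not shown in the handoff.
interface FrameworkResolver {
  name: string;
  // Optional opt-in: return true to keep a ref that names no declared symbol.
  claimsReference?: (refName: string) => boolean;
}

// Keep a ref when it names a declared symbol, or when any resolver claims it.
// Refs that survive would then be handed to the framework's resolve step.
function preFilterRefs(
  refNames: string[],
  declared: Set<string>,
  resolvers: FrameworkResolver[],
): string[] {
  return refNames.filter((ref) => {
    if (declared.has(ref)) return true;
    return resolvers.some(
      (r) => r.claimsReference !== undefined && r.claimsReference(ref),
    );
  });
}

// Illustrative resolver that claims `Controller@method` style refs.
const atStyleResolver: FrameworkResolver = {
  name: "at-style (hypothetical)",
  claimsReference: (ref) => ref.indexOf("@") !== -1,
};

const declaredSymbols = new Set<string>(["getList"]);
const routeRefs = ["getList", "UserController@show"];

console.log(preFilterRefs(routeRefs, declaredSymbols, []));
console.log(preFilterRefs(routeRefs, declaredSymbols, [atStyleResolver]));
```

Under these assumptions the first call keeps only `getList`, while the second also keeps `UserController@show`, which mirrors the symptom described above: a correct `resolve()` never runs if the ref is filtered out first.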