name: explore-flow-tool-adoption
date: 2026-05-24 00:55
project: codegraph
branch: architectural-improvements
Current state: A long investigation into making agents answer flow questions faster with codegraph. 6 commits on architectural-improvements (all probe-validated, suite green at 815 tests). The breakthrough: codegraph_explore now surfaces the execution flow from the symbol-bag the agent already passes it. For example, the query "PmsProductController getList PmsProductService list PmsProductServiceImpl" leads the output with getList → service interface → impl, riding the synthesized interface→impl edges. It's the FIRST mechanism in this whole arc to actually appear in real agent runs (spring-mall A/B: flow surfaced in both runs, reads 2.0→1.5), because it adapts the tool the agent USES instead of trying to make it use trace.
Immediate next step: The user is weighing how to push tool-USE quality next (their open question). Decide among: (a) extend explore-flow to surface more reliably (spring-halo's query didn't name a connected co-named chain → no flow); (b) accept we're at the model-behavior ceiling and wrap up; or (c) the user's ideas: better tool-description examples (≈ steering, low-leverage per the evidence) or a query-builder tool (adds a call + a new-tool adoption problem). My read: keep ADAPTING THE USED TOOL (the only thing that's worked); examples and new tools are the "change the agent" direction that failed all session.
Suggested next message: "explore-flow only surfaced on 2 of 3 repos — dig into why spring-halo's explore query didn't produce a flow and make it surface more reliably" — OR — "we're at the model-behavior ceiling; let's stop and write the CHANGELOG/PR for this branch"
Make an AI agent answer flow questions ("how does X reach Y", request→handler→service, state→render) fast: ~0 Read/Grep, few codegraph calls, lower wall-clock. codegraph_trace is the fastest tool (1 call = the path), but the agent under-uses it. Ultimate target = trace's speed, however the agent gets there.
docs/benchmarks/codegraph-ab-matrix.md). The floor is round-trips + the synthesis turn. The agent reliably calls context/explore, rarely trace (3/37 flow cells). Full analysis: docs/benchmarks/call-sequence-analysis.md.
initialize instruction / tool description can't match a CLI --append-system-prompt's salience, and forcing trace where it doesn't connect regresses. Reverted.
trace (hop bodies + destination callees inlined) lets the unsteered agent stop, but only when it calls trace.
explore's query is a precise symbol-bag spanning the flow, so explore finds the call path AMONG its named symbols and leads with it. First mechanism to surface in real runs + drop reads.
buildFlowFromNamedSymbols in src/mcp/tools.ts): list is a substring of getList → kept every getList. Split qualifiedName on ::/. and match segments.
render() → pointer handlers → mutateElement). ≤1 bridge crosses a missing intermediate without wandering.
getCallees returns non-calls edges too (references); filter c.edge.kind === 'calls'.
rm -rf .codegraph && codegraph init -i (the init edge count is contains-only; query the DB for the real count). The explore-flow change is query-time (no reindex).
sleep is blocked → run A/B batches with run_in_background.
qualifiedName is Class::method (so matchesSymbol resolves Class.method qualified trace endpoints; the agent already passes these).
node scripts/agent-eval/probe-explore.mjs <repo> "<SymbolA SymbolB SymbolC>" → look for the ## Flow section. probe-trace.mjs <repo> <from> <to> for trace.
sqlite3 <repo>/.codegraph/codegraph.db "select count(*) from edges where json_extract(metadata,'$.synthesizedBy')='interface-impl'"; node count stable before/after reindex (synth adds edges only).
bash scripts/agent-eval/run-arms.sh <repo> "<Q>" I <run> (arm I = body-trace build, no steering). Parse via the cmp2.mjs-style scripts in /tmp. Pass = flow surfaces (flowShown=Y) + reads ≤ baseline.
npm test (vitest, 815 pass); __tests__/mcp-tool-allowlist.test.ts covers the allowlist.
architectural-improvements, last commit bafae81 feat(mcp): codegraph_explore surfaces the execution flow from its named symbols.
.claude/handoffs/).
eab5cf3 self-sufficient trace + CODEGRAPH_MCP_TOOLS allowlist · a6183d7 research log + arms harness · bde8c19 node/trace line numbers · 98baf41 Java/Kotlin interface→impl synthesizer · 6f3c468 playbook · bafae81 explore-surfaces-flow.
[Unreleased] has all of it.
Class.method): the agent's most precise input was being dropped by the file-ext strip (2765c3c). spring-halo's publish flow stays absent on purpose: it's reactive/reconciler dispatch (publishPost calls ReactiveExtensionClient.get/awaitPostPublished, not PostService.publish), so there's no static call chain. That's the next COVERAGE frontier (reactive runtimes, like MediatR, Vue Proxy), not an explore-flow bug.
package.json bump + PR to main. Releases go through .github/workflows/release.yml only; do NOT npm publish.
_mediator.Send→Handle) and Vue/Compose reactive runtimes are still unbridged dynamic dispatch.
Diagnosed: reads at floor, wall-clock floor = round-trips + synthesis. Built seq-matrix.mjs; found trace adoption 3/37.
Ablation arms A–E (run-arms.sh/arms-F.sh + CODEGRAPH_MCP_TOOLS allowlist). explore = 68% of payload, load-bearing; trace path-scoped but under-adopted; trace alone insufficient.
Arm F: self-sufficient trace wins WITH append-prompt steering. But steering isn't a shippable channel.
Arms G (3 variants) all regressed vs baseline; arm H (body-trace, no steer) ≈ baseline. Steering reverted; body-trace + line-numbers + allowlist committed.
Built interfaceOverrideEdges (Java/Kotlin interface→impl, overload-aware). Probe: 3-hop trace connects. But A/B null — agent never called trace. Committed (probe-validated, adoption-gated).
Failed: fuzzy query → wrong-feature flows. Reverted.
WIN: explore's query is a precise symbol-bag. buildFlowFromNamedSymbols (co-naming segment match + ≤1 bridge). Probe perfect (Spring + excalidraw full chains); A/B: flow surfaces + modest read drop. Committed bafae81.
This handoff + memory update. Strategic answer pending (adapt-the-tool > change-the-agent).
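
For whoever picks this up, here is a minimal sketch of the two gotchas recorded above: matching query tokens against whole qualifiedName segments rather than substrings (so list no longer keeps every getList), and filtering getCallees results down to calls edges. It is not the code in src/mcp/tools.ts. The GraphNode and CalleeEdge shapes and the helper names nameSegments, matchesSegment, and callEdgesOnly are hypothetical, chosen only for illustration; qualifiedName and c.edge.kind are the only names taken from the notes.

```typescript
// Hypothetical shapes for illustration; the real codegraph types may differ.
interface GraphNode {
  name: string;
  qualifiedName: string; // e.g. "PmsProductController::getList"
}

interface CalleeEdge {
  node: GraphNode;
  edge: { kind: string }; // e.g. "calls" or "references"
}

// Split a qualified name into its segments on "::" or ".".
function nameSegments(qualifiedName: string): string[] {
  return qualifiedName.split(/::|\./).filter((s) => s.length > 0);
}

// True only when the token equals a whole segment.
// A substring test would wrongly match "list" inside "getList".
function matchesSegment(qualifiedName: string, token: string): boolean {
  return nameSegments(qualifiedName).includes(token);
}

// Keep only real call edges; callee lookups may also return
// non-call edges such as references.
function callEdgesOnly(callees: CalleeEdge[]): CalleeEdge[] {
  return callees.filter((c) => c.edge.kind === "calls");
}

// Usage with the Spring example from this handoff:
const viaController = matchesSegment("PmsProductController::getList", "list"); // false
const viaService = matchesSegment("PmsProductService::list", "list"); // true
```

Exact segment equality is the simplest rule that removes the substring false positive. Case-insensitive or overload-aware matching would be further assumptions and is left out here.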