
feat(mcp): steer flow questions to codegraph_trace first (tightens variance)

codegraph_trace was absent from every steering intent map — all three guidance
files routed "how does X reach Y" to context+explore, never to the trace tool.
So agents used trace only by chance; when one didn't, it floundered
reconstructing the path with search+callers (an 18-call run vs ~6 for trace-users).

Add codegraph_trace to the intent map + a "flow" common chain (trace from->to
FIRST = the whole path in one call, then ONE explore for bodies) across all three
synced files (server-instructions, instructions-template, .cursor rule).

Validated on excalidraw (hard "to the screen" Q, n=4 before/after):
- call count 3-10 -> 3-4 (over-drill outlier gone)
- duration 64-112s -> 51-74s
- trace adoption 3/4 -> 4/4; search+callers path-reconstruction -> 0
- fully clean runs (0 Read, 0 Grep) 0/4 -> 2/4; best run: 3 codegraph / 0 Read / 0 Grep / 51s

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Colby McHenry, 1 month ago
commit bfcd873993

+ 2 - 1
.cursor/rules/codegraph.mdc

@@ -16,6 +16,7 @@ Use codegraph for **structural** questions — what calls what, what would break
 | "Where is X defined?" / "Find symbol named X" | `codegraph_search` |
 | "What calls function Y?" | `codegraph_callers` |
 | "What does Y call?" | `codegraph_callees` |
+| "How does X reach/become Y? / trace the flow from X to Y" | `codegraph_trace` (one call = the whole path, incl. callback/React/JSX dynamic hops) |
 | "What would break if I changed Z?" | `codegraph_impact` |
 | "Show me Y's signature / source / docstring" | `codegraph_node` |
 | "Give me focused context for a task/area" | `codegraph_context` |
@@ -25,7 +26,7 @@ Use codegraph for **structural** questions — what calls what, what would break
 
 ### Rules of thumb
 
-- **Answer directly — don't delegate exploration.** For "how does X work" / architecture / trace questions, answer with 2-3 codegraph calls: `codegraph_context` first, then ONE `codegraph_explore` for the source of the symbols it surfaces. Codegraph IS the pre-built index, so spawning a separate file-reading sub-task/agent — or running a grep + read loop — repeats work codegraph already did and costs more for the same answer.
+- **Answer directly — don't delegate exploration.** For "how does X work" / architecture questions, answer with 2-3 codegraph calls: `codegraph_context` first, then ONE `codegraph_explore` for the source of the symbols it surfaces. For a specific **flow** ("how does X reach Y") start with `codegraph_trace` from→to — one call returns the whole path with dynamic hops bridged — then ONE `codegraph_explore` for the bodies; don't rebuild the path with `codegraph_search` + `codegraph_callers`. Codegraph IS the pre-built index, so spawning a separate file-reading sub-task/agent — or running a grep + read loop — repeats work codegraph already did and costs more for the same answer.
 - **Trust codegraph results.** They come from a full AST parse. Do NOT re-verify them with grep — that's slower, less accurate, and wastes context.
 - **Don't grep first** when looking up a symbol by name. `codegraph_search` is faster and returns kind + location + signature in one call.
 - **Don't chain `codegraph_search` + `codegraph_node`** when you just want context — `codegraph_context` is one call.
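
The intent-map row added in this hunk can be pictured as a first-match routing table. The sketch below is illustrative only: the real guidance is prose read by the agent, not executable code, and the `IntentRule` type, the regex patterns, and `firstTool` are hypothetical names introduced here. It shows why rule order matters: the flow rule must match before the generic fallback to `codegraph_context`.

```typescript
// Hypothetical sketch: the real intent map is a markdown table, not code.
type Tool =
  | "codegraph_search"
  | "codegraph_callers"
  | "codegraph_callees"
  | "codegraph_trace"
  | "codegraph_impact"
  | "codegraph_context";

interface IntentRule {
  pattern: RegExp; // illustrative matcher, an assumption of this sketch
  tool: Tool;
}

// The flow rule comes first so "how does X reach Y" is routed to trace
// instead of falling through to the codegraph_context default.
const intentRules: IntentRule[] = [
  { pattern: /\b(reach|become)\b|\btrace the flow\b|\bpath from\b/i, tool: "codegraph_trace" },
  { pattern: /\bwhat calls\b/i, tool: "codegraph_callers" },
  { pattern: /\bwhat does .+ call\b/i, tool: "codegraph_callees" },
  { pattern: /\bwould break\b/i, tool: "codegraph_impact" },
  { pattern: /\bwhere is .+ defined\b|\bfind symbol\b/i, tool: "codegraph_search" },
];

// Returns the tool to try first; unmatched questions default to context.
function firstTool(question: string): Tool {
  const rule = intentRules.find((r) => r.pattern.test(question));
  return rule ? rule.tool : "codegraph_context";
}
```

Before this commit the equivalent table had no `codegraph_trace` entry, so a flow question would have fallen through to the default.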

+ 3 - 2
CLAUDE.md

@@ -136,9 +136,10 @@ The template to replicate per language/framework. Question: *"how does updating
 |---|---|---|---|---|
 | Without codegraph | 115–139s | 9–10 | 10–11 | 0 |
 | Broken (explore-budget regression) | 131–139s | 5–10 | 3–5 | 6–14 |
-| Fixed (budget + msgs + full synthesis) | **64–112s** | **0–2** | 2–4 | **3–10** |
+| Fixed (budget + msgs + synthesis) | 64–112s | 0–2 | 2–4 | 3–**10** |
+| + trace-first steering | **51–74s** | **0–2** | 0–4 | **3–4** |
 
-Numbers are n=4 unhooked runs of the same prompt; **best run 0 Read / 3 codegraph / 76s**, typical ~1 Read / ~4 codegraph / ~78s, with an occasional over-drill outlier (10 codegraph / 2 Read / 112s). Run-to-run variance is large — that's expected; report the range, never a single run. Validated: `trace(mutateElement, renderStaticScene)` connects in **6 hops** across all three boundaries (`mutateElement → triggerUpdate → [callback] triggerRender → [react-render] render → [jsx] StaticCanvas → renderStaticScene`), each hop showing inline source + the wiring site; node count stable at 9,289; 1 callback + 46 react-render + 280 jsx-render synthesized edges (no explosion, precision-checked).
+n=4 unhooked runs/stage, same prompt. After steering flow questions to `codegraph_trace` first: **best run 0 Read / 0 Grep / 3 codegraph / 51s**; **2 of 4 fully clean** (0 Read, 0 Grep). Steering eliminated the over-drill variance — call count tightened from 3–10 to 3–4, trace adoption went 3/4 → 4/4, and the `search`+`callers` path-reconstruction floundering dropped to 0. Run-to-run variance is still real; report the range, never a single run. **Residual reads/greps are all the nonce data-flow** (`canvasNonce` — a local prop with no graph edges); that's the def-use/data-flow frontier, left deliberately uncovered (tracking every local would explode the graph). Validated: `trace(mutateElement, renderStaticScene)` connects in **6 hops** across all three boundaries (`mutateElement → triggerUpdate → [callback] triggerRender → [react-render] render → [jsx] StaticCanvas → renderStaticScene`), each hop showing inline source + the wiring site; node count stable at 9,289; 1 callback + 46 react-render + 280 jsx-render synthesized edges (no explosion, precision-checked).
 
 ## Tests
 

+ 2 - 1
src/installer/instructions-template.ts

@@ -34,6 +34,7 @@ Use codegraph for **structural** questions — what calls what, what would break
 | "Where is X defined?" / "Find symbol named X" | \`codegraph_search\` |
 | "What calls function Y?" | \`codegraph_callers\` |
 | "What does Y call?" | \`codegraph_callees\` |
+| "How does X reach/become Y? / trace the flow from X to Y" | \`codegraph_trace\` (one call = the whole path, incl. callback/React/JSX dynamic hops) |
 | "What would break if I changed Z?" | \`codegraph_impact\` |
 | "Show me Y's signature / source / docstring" | \`codegraph_node\` |
 | "Give me focused context for a task/area" | \`codegraph_context\` |
@@ -43,7 +44,7 @@ Use codegraph for **structural** questions — what calls what, what would break
 
 ### Rules of thumb
 
-- **Answer directly — don't delegate exploration.** For "how does X work" / architecture / trace questions, answer with 2-3 codegraph calls: \`codegraph_context\` first, then ONE \`codegraph_explore\` for the source of the symbols it surfaces. Codegraph IS the pre-built index, so spawning a separate file-reading sub-task/agent — or running a grep + read loop — repeats work codegraph already did and costs more for the same answer.
+- **Answer directly — don't delegate exploration.** For "how does X work" / architecture questions, answer with 2-3 codegraph calls: \`codegraph_context\` first, then ONE \`codegraph_explore\` for the source of the symbols it surfaces. For a specific **flow** ("how does X reach Y") start with \`codegraph_trace\` from→to — one call returns the whole path with dynamic hops bridged — then ONE \`codegraph_explore\` for the bodies; don't rebuild the path with \`codegraph_search\` + \`codegraph_callers\`. Codegraph IS the pre-built index, so spawning a separate file-reading sub-task/agent — or running a grep + read loop — repeats work codegraph already did and costs more for the same answer.
 - **Trust codegraph results.** They come from a full AST parse. Do NOT re-verify them with grep — that's slower, less accurate, and wastes context.
 - **Don't grep first** when looking up a symbol by name. \`codegraph_search\` is faster and returns kind + location + signature in one call.
 - **Don't chain \`codegraph_search\` + \`codegraph_node\`** when you just want context — \`codegraph_context\` is one call.

+ 2 - 0
src/mcp/server-instructions.ts

@@ -38,6 +38,7 @@ of calls; a grep/read exploration is dozens.
 
 - **"What is the symbol named X?"** → \`codegraph_search\`
 - **"What's the deal with this task / feature / area?"** → \`codegraph_context\` (PRIMARY — composes search + node + callers + callees in one call)
+- **"How does X reach/become Y? / trace the flow / the path from X to Y"** → \`codegraph_trace\` (ONE call returns the whole call path, including dynamic-dispatch hops — callbacks, React re-render, JSX children — that grep can't follow)
 - **"What calls this?"** → \`codegraph_callers\`
 - **"What does this call?"** → \`codegraph_callees\`
 - **"What would changing this break?"** → \`codegraph_impact\`
@@ -48,6 +49,7 @@ of calls; a grep/read exploration is dozens.
 
 ## Common chains
 
+- **Flow / "how does X reach Y"**: \`codegraph_trace\` from→to FIRST — one call returns the entire path with dynamic-dispatch hops bridged. Then ONE \`codegraph_explore\` for the hop bodies if you need them. Do NOT reconstruct the path with \`codegraph_search\` + \`codegraph_callers\` — that's exactly what trace does in a single call.
 - **Onboarding**: \`codegraph_context\` first. If still unclear, \`codegraph_explore\` for breadth, then \`codegraph_node\` on specific symbols.
 - **Refactor planning**: \`codegraph_search\` → \`codegraph_callers\` → \`codegraph_impact\`. The blast-radius answer comes from impact, not from walking callers manually.
 - **Debugging a regression**: \`codegraph_callers\` of the suspected symbol; widen with \`codegraph_impact\` if an unexpected call appears.
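
The new "flow" chain amounts to two calls at most: one trace, then one explore. A minimal sketch follows, assuming a generic `callTool` function supplied by the caller. The argument names (`from`, `to`, `symbols`) and the `TraceResult` shape are assumptions made for illustration; the actual codegraph tool schemas are not shown in this commit.

```typescript
// Hypothetical sketch of the "flow" chain: trace first, then one explore.
// Argument names and result shapes are assumptions, not the real schemas.
interface TraceResult {
  hops: string[]; // symbol names along the path, in order
}

type CallTool = (tool: string, args: Record<string, unknown>) => Promise<unknown>;

async function answerFlowQuestion(
  callTool: CallTool,
  from: string,
  to: string,
  needBodies: boolean,
): Promise<{ path: string[]; bodies?: unknown }> {
  // Step 1: a single trace call returns the whole path.
  const trace = (await callTool("codegraph_trace", { from, to })) as TraceResult;
  if (!needBodies) return { path: trace.hops };

  // Step 2 (optional): one explore call for the hop bodies.
  // No codegraph_search + codegraph_callers loop to rebuild the path.
  const bodies = await callTool("codegraph_explore", { symbols: trace.hops });
  return { path: trace.hops, bodies };
}
```

Passing a recording stub as `callTool` makes it easy to check that a flow question costs exactly two tool calls, which is the call-count floor this change steers agents toward.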