Question: Does answering a "how does X work?" question directly with CodeGraph in the main session bloat main-session context — and would Claude Code be better off delegating that exploration to a disposable Explore agent (which keeps main context lean by absorbing the file reads in a sub-transcript)? And critically: does the answer change at scale, on a codebase far larger than Excalidraw?
Short answer: No. With CodeGraph, main-session context is roughly scale-invariant (~50k)
because the retrieval is targeted and the explore payload is budget-capped — it does not
balloon on a 16× larger repo. Answering directly wins at every scale: same-or-leaner main
context than the delegation path, zero file reads, and ~28% fewer tokens. The
delegation-for-hygiene advantage stays marginal even on a large codebase.
scripts/agent-eval/itrun.sh (tmux),
not headless claude -p. This matters: headless spawns 0 Explore agents, so it cannot
measure delegation behavior at all; only the interactive TUI does.WITH = CodeGraph in the MCP config; WITHOUT = empty MCP config (--strict-mcp-config).opus. n = 3 runs per arm. Main and sub-agent transcripts parsed
(scripts/agent-eval/parse-session.mjs); reads/bash are summed across main + sub-agents.Context X/Y for the main thread (sub-agent
context does not count against it). "billable tokens" = summed per-turn assistant usage
(input + output + cache read + cache creation).Question: "How does Excalidraw render and update canvas elements?"
| metric | WITH codegraph | WITHOUT |
|---|---|---|
| Explore agents spawned | 0 / 0 / 0 | 0 / 1 / 1 (delegated 2 of 3) |
| main-session context | 51k / 49k / 50k (~50k) | 48k / 34k / 26k (~36k) |
| total tool calls | 4 / 4 / 4 | 16 / 55 / 37 |
| Reads (main+sub) | 0 / 0 / 0 | 6 / 25 / 16 |
| billable tokens | ~127k | ~175k |
Question: "How does the extension host communicate with the main process?"
| metric | WITH codegraph | WITHOUT |
|---|---|---|
| main-session context | 47k / 43k / 50k (~47k) | 54k / 29k / 31k (~38k) |
| Explore agents | 0 / 0 / 0 | 0 / 1 / 1 (delegated 2/3) |
| codegraph calls | ~8 (search + explore×2–3 + context) | 0 |
| Reads (main+sub) | 0 / 1 / 0 | 6 / 26 / 19 |
| billable tokens | ~126k | ~176k |
Main-session context is scale-invariant with CodeGraph. With codegraph, main-session
context was ~47k on VS Code — essentially identical to Excalidraw's ~50k, despite a 16×
bigger repo. It didn't balloon. Reason: codegraph's explore payload is budget-capped and
retrieval is targeted — answering one question pulls in the relevant flow/area, not more
just because the repo is huge. So codegraph makes main-session context roughly scale-invariant
(~50k). The delegation-for-hygiene advantage stays marginal even on a large codebase — exactly
the opposite of "it gets significant at scale."
The thing that would balloon at scale is reading many big files directly into main — and Claude Code avoids that without codegraph by delegating to an Explore agent (29–31k main), but at the cost of 17–26 reads and ~28% more tokens. CodeGraph keeps main lean a better way: a capped, targeted payload — no delegation, 0 reads.
On "the Explore agents use codegraph." I couldn't reproduce it: across 6/6
with-codegraph runs (both repos), Claude Code never delegated — it answered directly every
time. The Explore-agent path only appeared in the without arm (using grep/read, since codegraph
wasn't in that config). So with the current instructions + codegraph present, Claude Code stays
in the main session — the lean-main-via-Explore-agent best case simply isn't what happens;
lean-main-via-capped-codegraph is, and it's cheaper.
"Answer directly with codegraph" wins for Claude Code too — at every scale. No per-agent
split is needed; the unified "answer directly" instruction is right for Claude Code and for
Codex / Cursor / opencode (which have no Explore-agent mechanism and would otherwise read files
directly). This conclusion drove updating the README's ## CodeGraph example block, which
previously told agents to "NEVER call codegraph_explore directly / ALWAYS spawn an Explore
agent" — i.e., it steered Claude Code toward the worse (17–26 read, ~28%-more-token) path.
Caveat / future work (not a blocker): an Explore agent that itself uses codegraph could in principle get lean-main and low-work. But the "answer directly" instruction prevents delegation in practice (0 delegations observed across 6 runs), the main-context gain would be marginal (~50k → ~30k, both a few percent of a 1M window), and it adds a sub-agent round-trip. Worth a future experiment, not a default.