
perf(mcp): answer-directly steering — ~35% cheaper, ~70% fewer tool calls (#224)

* perf(mcp): steer agents to answer directly instead of delegating to subagents

CodeGraph beats native grep/read on cost only when the agent queries it
directly. When the agent delegates to file-reading sub-agents, those
sub-agents read files regardless of the index, so CodeGraph becomes net
overhead on top of the reads. The install templates even told agents to
"spawn a subagent for explore-class questions" — the expensive path.

Changes:
- server-instructions + both install templates: add an "Answer directly —
  don't delegate exploration" directive; reposition codegraph_explore as the
  efficient one-call multi-symbol tool (was: "spawn a subagent for it").
- codegraph_explore: hard-cap output to its adaptive budget (it overran,
  ~30k vs a 28k cap) and tighten the medium tier (28k->13k).
- codegraph_node: return a member outline for container kinds instead of the
  full class body.
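
The hard cap on `codegraph_explore` output can be sketched as a small standalone function. This is a minimal sketch that mirrors the truncation logic added to `src/mcp/tools.ts` in this change; the names `capToBudget` and `TRUNCATION_NOTE` are illustrative and do not exist in the codebase.

```typescript
// Illustrative sketch only: `capToBudget` and `TRUNCATION_NOTE` are hypothetical names.
const TRUNCATION_NOTE =
  "\n\n... (explore output truncated to budget - use codegraph_node or Read for more)";

function capToBudget(output: string, maxOutputChars: number): string {
  // Output already within budget: return it untouched.
  if (output.length <= maxOutputChars) return output;

  const cut = output.slice(0, maxOutputChars);
  const lastNewline = cut.lastIndexOf("\n");

  // Prefer ending on a whole line, but only if that discards less than
  // 20% of the budget; otherwise keep the raw cut.
  const safe = lastNewline > maxOutputChars * 0.8 ? cut.slice(0, lastNewline) : cut;
  return safe + TRUNCATION_NOTE;
}
```

Because the note is appended after the cut, the returned string can exceed `maxOutputChars` by up to the length of the note.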

Rigorous N>=4-per-arm warm-block benchmark (median total_cost_usd):
  excalidraw (~600 files):  WITH $0.54 vs native $1.02  (-47%)
  vscode     (~10k files):  WITH $0.41 vs native $0.72  (-42%)
  ky         (~25 files):   WITH $0.46 vs native $0.44  (wash)
Answers were equal-or-better (correct, file:line-cited) with ~6x fewer tool
calls; the directive drove the direct path on 14/14 codegraph runs.
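
The percentages above follow from a median per arm and a relative difference. A minimal sketch of that arithmetic, assuming each arm is a list of per-run `total_cost_usd` values; the helper names `median` and `savingsPercent` are illustrative and are not taken from the benchmark harness.

```typescript
/** Median of a list (mean of the two middle values when the count is even). */
function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of an empty list");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

/** Percent saved by the WITH arm relative to the native arm. */
function savingsPercent(withCost: number, nativeCost: number): number {
  return ((nativeCost - withCost) / nativeCost) * 100;
}

// Using the Excalidraw medians quoted above ($0.54 vs $1.02):
console.log(Math.round(savingsPercent(0.54, 1.02))); // 47
```

Because the quoted medians are rounded to cents, recomputing a percentage from them can land a point away from a figure derived from unrounded data.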

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(readme): rebuild benchmark with real-world repos + cost/token/time/tool savings

Replace the "Claude Code (Python+Rust/Java)" rows — which benchmarked the
Claude Code CLI repo, not real codebases in those languages — with real
open-source projects per language: Django (Python), Tokio (Rust), OkHttp
(Java), Gin (Go), plus Alamofire (Swift) and the existing TypeScript repos
(VS Code, Excalidraw).

The table now reports all four savings the change targets — cost, tokens,
time, tool calls — as the median of 4 runs per arm (Claude Opus 4.7,
headless claude -p, with vs empty MCP config). Averages across the 7 repos:
35% cheaper, 59% fewer tokens, 49% faster, 70% fewer tool calls. Adds a
methodology note and raw WITH->WITHOUT medians.
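
The headline averages can be reproduced from the per-repo savings in the README table added by this change. A short sketch, assuming a plain unweighted mean rounded to the nearest percent; the helper name `mean` and the `savings` object are illustrative.

```typescript
// Per-repo savings (%) in table order:
// VS Code, Excalidraw, Django, Tokio, OkHttp, Gin, Alamofire.
const savings = {
  cost: [35, 47, 34, 52, 17, 22, 38],
  tokens: [73, 73, 64, 81, 41, 23, 59],
  time: [41, 60, 59, 63, 36, 34, 51],
  toolCalls: [72, 86, 81, 89, 64, 19, 77],
};

function mean(xs: number[]): number {
  if (xs.length === 0) throw new Error("mean of an empty list");
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

const headline = {
  cost: Math.round(mean(savings.cost)),
  tokens: Math.round(mean(savings.tokens)),
  time: Math.round(mean(savings.time)),
  toolCalls: Math.round(mean(savings.toolCalls)),
};

console.log(headline); // { cost: 35, tokens: 59, time: 49, toolCalls: 70 }
```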

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Colby Mchenry 1 month ago
parent
commit
f5bbc26c60
6 changed files with 154 additions and 67 deletions
  1. .cursor/rules/codegraph.mdc (+3 -2)
  2. CHANGELOG.md (+25 -1)
  3. README.md (+35 -46)
  4. src/installer/instructions-template.ts (+3 -2)
  5. src/mcp/server-instructions.ts (+14 -2)
  6. src/mcp/tools.ts (+74 -14)

+ 3 - 2
.cursor/rules/codegraph.mdc

@@ -19,16 +19,17 @@ Use codegraph for **structural** questions — what calls what, what would break
 | "What would break if I changed Z?" | `codegraph_impact` |
 | "Show me Y's signature / source / docstring" | `codegraph_node` |
 | "Give me focused context for a task/area" | `codegraph_context` |
-| "Survey an unfamiliar module/topic" | `codegraph_explore` |
+| "See several related symbols' source at once" | `codegraph_explore` |
 | "What files exist under path/" | `codegraph_files` |
 | "Is the index healthy?" | `codegraph_status` |
 
 ### Rules of thumb
 
+- **Answer directly — don't delegate exploration.** For "how does X work" / architecture / trace questions, answer with 2-3 codegraph calls: `codegraph_context` first, then ONE `codegraph_explore` for the source of the symbols it surfaces. Codegraph IS the pre-built index, so spawning a separate file-reading sub-task/agent — or running a grep + read loop — repeats work codegraph already did and costs more for the same answer.
 - **Trust codegraph results.** They come from a full AST parse. Do NOT re-verify them with grep — that's slower, less accurate, and wastes context.
 - **Don't grep first** when looking up a symbol by name. `codegraph_search` is faster and returns kind + location + signature in one call.
 - **Don't chain `codegraph_search` + `codegraph_node`** when you just want context — `codegraph_context` is one call.
-- **`codegraph_explore` is the heavy hitter** for unfamiliar areas — it returns full source from all relevant files in one call, but is token-heavy. If your harness supports parallel subagents (e.g., Claude Code's Task tool), spawn one for explore-class questions to keep main session context clean.
+- **Don't loop `codegraph_node` over many symbols** — one `codegraph_explore` call returns several symbols' source grouped in a single capped call, while each separate node/Read call re-reads the whole context and costs far more.
 - **Index lag**: the file watcher debounces ~500ms behind writes; don't re-query immediately after editing a file in the same turn.
 
 ### If `.codegraph/` doesn't exist

+ 25 - 1
CHANGELOG.md

@@ -33,6 +33,25 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
   setup is actually fast. `codegraph uninit` removes any hooks it installed.
 
 ### Changed
+- **MCP / agent guidance**: CodeGraph now tells agents to answer "how does X
+  work" / architecture questions *directly* — `codegraph_context`, then one
+  `codegraph_explore` for the surfaced symbols — instead of delegating to a
+  file-reading sub-agent or a grep+read loop. The server instructions and the
+  installed instruction files (`CLAUDE.md`, `.cursor/rules/codegraph.mdc`,
+  `AGENTS.md`) previously suggested *spawning a sub-agent* for explore-class
+  questions, which produced the opposite, more expensive behavior: the
+  sub-agent reads files regardless of the index, so CodeGraph became overhead
+  stacked on top of the reads. In rigorous N≥4-per-arm benchmarks this cut the
+  cost of an architecture question by ~42–47% versus a no-CodeGraph agent on
+  medium and large repos (Excalidraw ~600 files, VS Code ~10k), with
+  equal-or-better, `file:line`-cited answers and ~6× fewer tool calls; on a
+  tiny repo (~25 files) it's a wash, since native grep is already trivially
+  cheap there.
+- **MCP / codegraph_node**: `includeCode=true` on a class/interface/struct/enum
+  now returns a compact member outline (fields + method signatures + line
+  numbers) instead of the entire class body — which could be thousands of
+  characters and was rarely needed in full. Functions and methods still return
+  their full body; request a specific member for its source.
 - **Minimum Node.js is now 20** (was 18). Node 18 is end-of-life and the
   native SQLite binding (`better-sqlite3` 12.x) no longer ships a Node 18
   prebuilt binary. Node 22 LTS and Node 24 get the native backend out of the
@@ -48,7 +67,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
   now scales with indexed file count: small projects (<500 files) cap at
   ~18KB and skip the "Additional relevant files" / completeness / explore-
   budget reminders that earn their keep on bigger codebases; medium
-  (<5,000) caps at ~28KB; large (<15,000) keeps the historical ~35KB; very
+  (<5,000) caps at ~13KB; large (<15,000) keeps the historical ~35KB; very
   large goes up to ~38KB. A new per-file char cap also prevents a single
   file with many adjacent symbols from collapsing into one whole-file dump
   (the Alamofire `Session.swift` case from #185). Per-file cluster
@@ -63,6 +82,11 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
   Thanks to [@essopsp](https://github.com/essopsp) for the repro.
 
 ### Fixed
+- **MCP / explore**: `codegraph_explore` output is now hard-capped to its
+  adaptive size budget. It could previously overrun (e.g. ~30K against a 28K
+  cap) once the relationship map and trailer sections were appended; the
+  oversized payload then sat in the agent's context and was re-read on every
+  later turn.
 - **Sync / status**: git-untracked files are no longer reported as pending
   "Added" forever. After `codegraph sync` indexed a newly-created untracked
   source file, `codegraph status` kept listing it under Pending Changes and

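The adaptive size tiers described in the CHANGELOG hunk above amount to a lookup on indexed file count. This is a simplified sketch: the function name `exploreCapFor` is hypothetical, only the 13,000-character medium cap is an exact value from this change, and the other caps are round numbers assumed from the approximate sizes the changelog quotes. The real `getExploreOutputBudget` returns a richer budget object than a single number.

```typescript
// Hypothetical helper; cap values other than 13000 are assumed from the
// "~18KB / ~35KB / ~38KB" figures in the changelog text.
function exploreCapFor(fileCount: number): number {
  if (fileCount < 500) return 18000; // small projects
  if (fileCount < 5000) return 13000; // medium (was 28000 before this change)
  if (fileCount < 15000) return 35000; // large keeps the historical cap
  return 38000; // very large
}

// Approximate repo sizes from the commit message:
console.log(exploreCapFor(25)); // ky, small tier
console.log(exploreCapFor(600)); // Excalidraw, medium tier
console.log(exploreCapFor(10000)); // VS Code, large tier
```
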
+ 35 - 46
README.md

@@ -4,7 +4,7 @@
 
 ### Supercharge Claude Code, Cursor, Codex, and OpenCode with Semantic Code Intelligence
 
-**94% fewer tool calls · 77% faster exploration · 100% local**
+**~35% cheaper · ~70% fewer tool calls · 100% local**
 
 [![npm version](https://img.shields.io/npm/v/@colbymchenry/codegraph.svg)](https://www.npmjs.com/package/@colbymchenry/codegraph)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
@@ -50,61 +50,50 @@ When Claude Code explores a codebase, it spawns **Explore agents** that scan fil
 
 ### Benchmark Results
 
-Tested across 6 real-world codebases comparing Claude Code's Explore agent **with** and **without** CodeGraph:
+Tested across **7 real-world open-source codebases** spanning 7 languages, comparing an agent (Claude Code, headless) answering one architecture question **with** and **without** CodeGraph. Each cell is the savings at the **median of 4 runs per arm**.
 
-> **Average: 92% fewer tool calls · 71% faster**
+> **Average: 35% cheaper · 59% fewer tokens · 49% faster · 70% fewer tool calls**
 
-| Codebase | With CG | Without CG | Improvement |
-|----------|---------|------------|-------------|
-| **VS Code** · TypeScript | 3 calls, 17s | 52 calls, 1m 37s | **94% fewer · 82% faster** |
-| **Excalidraw** · TypeScript | 3 calls, 29s | 47 calls, 1m 45s | **94% fewer · 72% faster** |
-| **Claude Code** · Python + Rust | 3 calls, 39s | 40 calls, 1m 8s | **93% fewer · 43% faster** |
-| **Claude Code** · Java | 1 call, 19s | 26 calls, 1m 22s | **96% fewer · 77% faster** |
-| **Alamofire** · Swift | 3 calls, 22s | 32 calls, 1m 39s | **91% fewer · 78% faster** |
-| **Swift Compiler** · Swift/C++ | 6 calls, 35s | 37 calls, 2m 8s | **84% fewer · 73% faster** |
+| Codebase | Language | Cost | Tokens | Time | Tool calls |
+|----------|----------|------|--------|------|------------|
+| **VS Code** | TypeScript · ~10k files | 35% cheaper | 73% fewer | 41% faster | 72% fewer |
+| **Excalidraw** | TypeScript · ~600 | 47% cheaper | 73% fewer | 60% faster | 86% fewer |
+| **Django** | Python · ~2.7k | 34% cheaper | 64% fewer | 59% faster | 81% fewer |
+| **Tokio** | Rust · ~700 | 52% cheaper | 81% fewer | 63% faster | 89% fewer |
+| **OkHttp** | Java · ~640 | 17% cheaper | 41% fewer | 36% faster | 64% fewer |
+| **Gin** | Go · ~150 | 22% cheaper | 23% fewer | 34% faster | 19% fewer |
+| **Alamofire** | Swift · ~100 | 38% cheaper | 59% fewer | 51% faster | 77% fewer |
+
+The gains scale with codebase size: on large repos the agent answers from the index in a handful of calls with **zero file reads**, while the no-CodeGraph agent fans out across grep/find/Read (and the sub-agents it spawns). On a small repo like Gin (~150 files) native search is already cheap, so the margin narrows.
 
 <details>
 <details>
 <summary><strong>Full benchmark details</strong></summary>
 
-All tests used Claude Opus 4.6 (1M context) with Claude Code v2.1.91. Each test spawned a single Explore agent with the same question.
+**Methodology.** Each arm is `claude -p` (Claude Opus 4.7, Claude Code v2.1.145) run headlessly against the repo with `--strict-mcp-config`: **WITH** = CodeGraph's MCP server enabled, **WITHOUT** = an empty MCP config. Built-in Read/Grep/Bash stay available to both. Same question per repo, **4 runs per arm, median reported**. Cost = the run's `total_cost_usd`; Tokens = total tokens processed (input incl. cached + output); Time = wall-clock; Tool calls = every tool invocation, including those inside any sub-agents the model spawns. Repos cloned at `--depth 1` and indexed by the same CodeGraph build that served them.
 
-**Queries used:**
+**Queries:**
 | Codebase | Query |
 |----------|-------|
 | VS Code | "How does the extension host communicate with the main process?" |
-| Excalidraw | "How does collaborative editing and real-time sync work?" |
-| Claude Code (Python+Rust) | "How does tool execution work end to end?" |
-| Claude Code (Java) | "How does tool execution work end to end?" |
-| Alamofire | "Trace how a request flows from Session.request() through to the URLSession layer" |
-| Swift Compiler | "How does the Swift compiler handle error diagnostics?" |
-
-**With CodeGraph — the agent uses `codegraph_explore` and stops:**
-| Codebase | Files Indexed | Nodes | Tool Uses | Tokens | Time | File Reads |
-|----------|--------------|-------|-----------|--------|------|------------|
-| VS Code (TypeScript) | 4,002 | 59,377 | 3 | 56.6k | 17s | 0 |
-| Excalidraw (TypeScript) | 626 | 9,859 | 3 | 57.1k | 29s | 0 |
-| Claude Code (Python+Rust) | 115 | 3,080 | 3 | 67.1k | 39s | 0 |
-| Claude Code (Java) | — | — | 1 | 40.8k | 19s | 0 |
-| Alamofire (Swift) | 102 | 2,624 | 3 | 57.3k | 22s | 0 |
-| Swift Compiler (Swift/C++) | 25,874 | 272,898 | 6 | 77.4k | 35s | 0 |
-
-**Without CodeGraph — the agent uses grep, find, ls, and Read extensively:**
-| Codebase | Tool Uses | Tokens | Time | File Reads |
-|----------|-----------|--------|------|------------|
-| VS Code (TypeScript) | 52 | 89.4k | 1m 37s | ~15 |
-| Excalidraw (TypeScript) | 47 | 77.9k | 1m 45s | ~20 |
-| Claude Code (Python+Rust) | 40 | 69.3k | 1m 8s | ~15 |
-| Claude Code (Java) | 26 | 73.3k | 1m 22s | ~15 |
-| Alamofire (Swift) | 32 | 52.4k | 1m 39s | ~10 |
-| Swift Compiler (Swift/C++) | 37 | 99.1k | 2m 8s | ~20 |
-
-**Key observations:**
-- With CodeGraph, the agent **never fell back to reading files** — it trusted the codegraph_explore results completely
-- Without CodeGraph, agents spent most of their time on discovery (find, ls, grep) before they could even start reading relevant code
-- The Java codebase needed only **1 codegraph_explore call** to answer the entire question
-- Cross-language queries (Python+Rust) worked seamlessly — CodeGraph's graph traversal found connections across language boundaries
-- The Swift benchmark (Alamofire) traced a **9-step call chain** from `Session.request()` to `URLSession.dataTask()` — CodeGraph's graph traversal at depth 3 captured the full chain in one explore call
-- The **Swift Compiler** benchmark is the largest codebase tested (**25,874 files, 272,898 nodes**) — CodeGraph indexed it in under 4 minutes and the agent answered a complex cross-cutting question with **6 explore calls and zero file reads** in 35 seconds
+| Excalidraw | "How does Excalidraw render and update canvas elements?" |
+| Django | "How does Django's ORM build and execute a query from a QuerySet?" |
+| Tokio | "How does tokio schedule and run async tasks on its runtime?" |
+| OkHttp | "How does OkHttp process a request through its interceptor chain?" |
+| Gin | "How does gin route requests through its middleware chain?" |
+| Alamofire | "How does Alamofire build, send, and validate a request?" |
+
+**Raw medians — WITH → WITHOUT:**
+| Codebase | Cost | Tokens | Time | Tool calls |
+|----------|------|--------|------|------------|
+| VS Code | $0.42 → $0.64 | 393k → 1.4M | 1m 0s → 1m 43s | 7 → 23 |
+| Excalidraw | $0.54 → $1.02 | 851k → 3.2M | 1m 17s → 3m 14s | 12 → 83 |
+| Django | $0.41 → $0.62 | 499k → 1.4M | 1m 0s → 2m 25s | 9 → 48 |
+| Tokio | $0.50 → $1.04 | 657k → 3.4M | 1m 5s → 2m 56s | 9 → 75 |
+| OkHttp | $0.36 → $0.44 | 352k → 596k | 45s → 1m 11s | 5 → 14 |
+| Gin | $0.36 → $0.46 | 431k → 562k | 47s → 1m 11s | 7 → 8 |
+| Alamofire | $0.61 → $0.99 | 1.1M → 2.6M | 1m 19s → 2m 41s | 15 → 64 |
+
+**Why CodeGraph wins:** with the index available, the agent answers directly — `codegraph_context` to map the area, then one `codegraph_explore` for the relevant source — and stops, usually with zero file reads. Without it, the agent (and the Explore sub-agents it spawns) spends most of its budget on discovery (find/ls/grep) before reading the right code. CodeGraph only helps when queried *directly*, so its instructions steer agents to answer directly rather than delegate exploration to file-reading sub-agents — otherwise a sub-agent reads files regardless and CodeGraph becomes overhead.
 
 </details>
 </details>
 

+ 3 - 2
src/installer/instructions-template.ts

@@ -37,16 +37,17 @@ Use codegraph for **structural** questions — what calls what, what would break
 | "What would break if I changed Z?" | \`codegraph_impact\` |
 | "Show me Y's signature / source / docstring" | \`codegraph_node\` |
 | "Give me focused context for a task/area" | \`codegraph_context\` |
-| "Survey an unfamiliar module/topic" | \`codegraph_explore\` |
+| "See several related symbols' source at once" | \`codegraph_explore\` |
 | "What files exist under path/" | \`codegraph_files\` |
 | "Is the index healthy?" | \`codegraph_status\` |
 
 ### Rules of thumb
 
+- **Answer directly — don't delegate exploration.** For "how does X work" / architecture / trace questions, answer with 2-3 codegraph calls: \`codegraph_context\` first, then ONE \`codegraph_explore\` for the source of the symbols it surfaces. Codegraph IS the pre-built index, so spawning a separate file-reading sub-task/agent — or running a grep + read loop — repeats work codegraph already did and costs more for the same answer.
 - **Trust codegraph results.** They come from a full AST parse. Do NOT re-verify them with grep — that's slower, less accurate, and wastes context.
 - **Don't grep first** when looking up a symbol by name. \`codegraph_search\` is faster and returns kind + location + signature in one call.
 - **Don't chain \`codegraph_search\` + \`codegraph_node\`** when you just want context — \`codegraph_context\` is one call.
-- **\`codegraph_explore\` is the heavy hitter** for unfamiliar areas — it returns full source from all relevant files in one call, but is token-heavy. If your harness supports parallel subagents (e.g., Claude Code's Task tool), spawn one for explore-class questions to keep main session context clean.
+- **Don't loop \`codegraph_node\` over many symbols** — one \`codegraph_explore\` call returns several symbols' source grouped in a single capped call, while each separate node/Read call re-reads the whole context and costs far more.
 - **Index lag**: the file watcher debounces ~500ms behind writes; don't re-query immediately after editing a file in the same turn.
 
 ### If \`.codegraph/\` doesn't exist

+ 14 - 2
src/mcp/server-instructions.ts

@@ -22,6 +22,18 @@ in the workspace. Reads are sub-millisecond; the index lags writes by
 about a second through the file watcher. Consult it BEFORE writing or
 editing code, not during.
 
+## Answer directly — don't delegate exploration
+
+For "how does X work", architecture, trace, or where-is-X questions,
+answer DIRECTLY using 2-3 codegraph calls: \`codegraph_context\` first,
+then ONE \`codegraph_explore\` for the source of the symbols it surfaces.
+Codegraph IS the pre-built search index — so delegating the lookup to a
+separate file-reading sub-task/agent, or running your own grep + read
+loop, repeats work codegraph already did and costs more for the same
+answer. Reach for raw Read/Grep only to confirm a specific detail
+codegraph didn't cover. A direct codegraph answer is typically a handful
+of calls; a grep/read exploration is dozens.
+
 ## Tool selection by intent
 
 - **"What is the symbol named X?"** → \`codegraph_search\`
@@ -30,7 +42,7 @@ editing code, not during.
 - **"What does this call?"** → \`codegraph_callees\`
 - **"What would changing this break?"** → \`codegraph_impact\`
 - **"Show me this symbol's source / signature / docstring."** → \`codegraph_node\`
-- **"Survey an unfamiliar topic / pattern / module."** → \`codegraph_explore\` (heavier; deep dive)
+- **"Show me several related symbols' source / survey an area."** → \`codegraph_explore\` (ONE capped call; prefer over many codegraph_node/Read)
 - **"What's in directory X?"** → \`codegraph_files\`
 - **"Is the index ready / what's its size?"** → \`codegraph_status\`
 
@@ -44,7 +56,7 @@ editing code, not during.
 
 - **Don't grep first** when looking up a symbol by name — \`codegraph_search\` is faster and returns kind + location + signature.
 - **Don't chain \`codegraph_search\` + \`codegraph_node\`** when you just want context — \`codegraph_context\` is one round-trip.
-- **Don't use \`codegraph_explore\` for narrow questions** — it's a multi-call deep dive, expensive in tokens. Save it for genuine "I'm new here" surveys.
+- **Don't loop \`codegraph_node\` over many symbols** — one \`codegraph_explore\` call returns them all grouped by file, while each separate call re-reads the whole context and costs far more. Use \`codegraph_node\` for a single symbol.
 - **Don't query the index immediately after editing a file** — the watcher needs ~500ms to debounce + sync. Wait for the next turn.
 
 ## Limitations

+ 74 - 14
src/mcp/tools.ts

@@ -25,6 +25,16 @@ const MAX_OUTPUT_LENGTH = 15000;
  */
 const RUST_PATH_PREFIXES = new Set(['crate', 'super', 'self']);
 
+/**
+ * Node kinds that contain other symbols. For these, `codegraph_node` with
+ * `includeCode=true` returns a structural outline (member names + signatures
+ * + line numbers) instead of the full body, which for a large class is a
+ * multi-thousand-character wall of source that bloats the agent's context.
+ */
+const CONTAINER_NODE_KINDS = new Set<NodeKind>([
+  'class', 'struct', 'interface', 'trait', 'protocol', 'enum', 'namespace', 'module',
+]);
+
 /** Last `::` / `.` / `/`-separated segment of a qualified symbol. */
 function lastQualifierPart(symbol: string): string {
   const parts = symbol.split(/::|[./]/).filter((p) => p.length > 0);
@@ -102,12 +112,12 @@ export function getExploreOutputBudget(fileCount: number): ExploreOutputBudget {
   }
   if (fileCount < 5000) {
     return {
-      maxOutputChars: 28000,
-      defaultMaxFiles: 9,
-      maxCharsPerFile: 5000,
-      gapThreshold: 12,
-      maxSymbolsInFileHeader: 10,
-      maxEdgesPerRelationshipKind: 10,
+      maxOutputChars: 13000,
+      defaultMaxFiles: 6,
+      maxCharsPerFile: 2500,
+      gapThreshold: 10,
+      maxSymbolsInFileHeader: 8,
+      maxEdgesPerRelationshipKind: 8,
       includeRelationships: true,
       includeAdditionalFiles: true,
       includeCompletenessSignal: true,
@@ -263,7 +273,7 @@ export const tools: ToolDefinition[] = [
   },
   {
     name: 'codegraph_context',
-    description: 'PRIMARY TOOL: Build comprehensive context for a task. Returns entry points, related symbols, and key code - often enough to understand the codebase without additional tool calls. NOTE: This provides CODE context, not product requirements. For new features, still clarify UX/behavior questions with the user before implementing.',
+    description: 'PRIMARY TOOL — call this FIRST for any "how does X work", architecture, feature, or bug-context question. Composes search + node + callers + callees and returns entry points, related symbols, and key code in ONE call — usually enough to answer with no further search/Read/Grep. Prefer this over chaining codegraph_search + codegraph_node, and over codegraph_explore. NOTE: provides CODE context, not product requirements; for new features still clarify UX/edge cases with the user.',
     inputSchema: {
       type: 'object',
       properties: {
@@ -348,7 +358,7 @@ export const tools: ToolDefinition[] = [
   },
   {
     name: 'codegraph_node',
-    description: 'Get detailed information about a specific code symbol. Use includeCode=true only when you need the full source code - otherwise just get location and signature to minimize context usage.',
+    description: 'Get detailed info about ONE symbol (location, signature, docstring). Pass includeCode=true for source: a function/method returns its body; a class/interface/struct/enum returns a compact member OUTLINE (fields + method signatures + line numbers), not every method body — Read or codegraph_node a specific member for its body. Keep includeCode=false to minimize context. For SEVERAL related symbols, make ONE codegraph_explore (or codegraph_context) call instead of many node calls — repeated node calls each re-read the whole context and cost far more.',
     inputSchema: {
       type: 'object',
       properties: {
@@ -368,7 +378,7 @@ export const tools: ToolDefinition[] = [
   },
   {
     name: 'codegraph_explore',
-    description: 'Deep exploration tool — returns comprehensive context for a topic in a SINGLE call. Groups all relevant source code by file (contiguous sections, not snippets), includes a relationship map, and uses deeper graph traversal. Designed to replace multiple codegraph_node + file Read calls. Use this instead of codegraph_context when you need thorough understanding. IMPORTANT: Use specific symbol names, file names, or short code terms in your query — NOT natural language sentences. Before calling this, use codegraph_search to discover relevant symbol names, then include those names in your query. Bad: "how are agent prompts loaded and passed to the CLI". Good: "readAgentsFromDirectory createClaudeSession chat-manager agents.ts".',
+    description: 'Returns source for SEVERAL related symbols grouped by file, plus a relationship map, in ONE capped call. This is the efficient way to inspect many related symbols at once — strongly prefer it over a series of codegraph_node or Read calls (each separate call re-reads the whole context, so 8 node calls cost far more than 1 explore). Use it after codegraph_context when you need to see the actual source of several symbols. Query with specific symbol/file/code terms, NOT natural-language sentences — run codegraph_search first to find names. Bad: "how are agent prompts loaded and passed to the CLI". Good: "renderStaticScene drawElementOnCanvas ShapeCache renderElement.ts".',
     inputSchema: {
       type: 'object',
       properties: {
@@ -1241,7 +1251,20 @@ export class ToolHandler {
       }
     }
 
-    return this.textResult(lines.join('\n'));
+    // Hard-cap to the adaptive budget. The per-file loop bounds the source
+    // sections, but the relationship map, additional-files list, and
+    // completeness/budget notes can still push the assembled output past
+    // maxOutputChars (observed 30k against a 28k tier cap). A fat explore
+    // payload persists in the agent's context and is re-read as cache-input
+    // on every subsequent turn, so the overrun is paid many times over.
+    const output = lines.join('\n');
+    if (output.length > budget.maxOutputChars) {
+      const cut = output.slice(0, budget.maxOutputChars);
+      const lastNewline = cut.lastIndexOf('\n');
+      const safe = lastNewline > budget.maxOutputChars * 0.8 ? cut.slice(0, lastNewline) : cut;
+      return this.textResult(safe + '\n\n... (explore output truncated to budget — use codegraph_node or Read for more)');
+    }
+    return this.textResult(output);
   }
 
   /**
@@ -1261,12 +1284,24 @@ export class ToolHandler {
     }
 
     let code: string | null = null;
+    let outline: string | null = null;
 
     if (includeCode) {
-      code = await cg.getCode(match.node.id);
+      // For container symbols (class/interface/struct/…), the full body is the
+      // sum of every method body — a wall of source (e.g. a 10k-char class)
+      // that bloats context and is rarely needed in full. Return a structural
+      // outline (members + signatures + line numbers) instead; the agent can
+      // Read or codegraph_node a specific method for its body. Leaf symbols
+      // (function/method/etc.) return their full body as before.
+      if (CONTAINER_NODE_KINDS.has(match.node.kind)) {
+        outline = this.buildContainerOutline(cg, match.node);
+      }
+      if (!outline) {
+        code = await cg.getCode(match.node.id);
+      }
     }
 
-    const formatted = this.formatNodeDetails(match.node, code) + match.note;
+    const formatted = this.formatNodeDetails(match.node, code, outline) + match.note;
     return this.textResult(this.truncateOutput(formatted));
   }
 
@@ -1716,7 +1751,29 @@ export class ToolHandler {
     return lines.join('\n');
   }
 
-  private formatNodeDetails(node: Node, code: string | null): string {
+  /**
+   * Build a compact structural outline of a container symbol from its
+   * indexed children (methods, fields, properties, …) — name, kind,
+   * line number, and signature — so the agent gets the shape of a class
+   * without the full source of every method. Returns '' when the container
+   * has no indexed children, so the caller can fall back to full source.
+   */
+  private buildContainerOutline(cg: CodeGraph, node: Node): string {
+    const children = cg.getChildren(node.id)
+      .filter(c => c.kind !== 'import' && c.kind !== 'export')
+      .sort((a, b) => (a.startLine ?? 0) - (b.startLine ?? 0));
+    if (children.length === 0) return '';
+
+    const lines = [`**Members (${children.length}):**`, ''];
+    for (const c of children) {
+      const loc = c.startLine ? `:${c.startLine}` : '';
+      const sig = c.signature ? ` — \`${c.signature}\`` : '';
+      lines.push(`- ${c.name} (${c.kind})${loc}${sig}`);
+    }
+    return lines.join('\n');
+  }
+
+  private formatNodeDetails(node: Node, code: string | null, outline?: string | null): string {
     const location = node.startLine ? `:${node.startLine}` : '';
     const lines: string[] = [
       `## ${node.name} (${node.kind})`,
@@ -1733,7 +1790,10 @@ export class ToolHandler {
       lines.push('', node.docstring);
     }
 
-    if (code) {
+    if (outline) {
+      lines.push('', outline, '',
+        `> Structural outline only. Read \`${node.filePath}\` or call codegraph_node on a specific member for its body.`);
+    } else if (code) {
       lines.push('', '```' + node.language, code, '```');
     }
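
The member outline that `codegraph_node` returns for container kinds can be illustrated with a standalone sketch. It follows the shape of `buildContainerOutline` in the diff above, but the `Member` interface and the function name `buildOutline` are hypothetical stand-ins for the project's own `Node` type and `CodeGraph` lookups, and the separator before the signature is a plain hyphen here.

```typescript
// Hypothetical minimal shape of an indexed child symbol.
interface Member {
  name: string;
  kind: string;
  startLine?: number;
  signature?: string;
}

function buildOutline(children: Member[]): string {
  // Drop import/export pseudo-members and order the rest by source position.
  const members = children
    .filter((c) => c.kind !== "import" && c.kind !== "export")
    .sort((a, b) => (a.startLine ?? 0) - (b.startLine ?? 0));

  // Empty string signals "no outline", so a caller can fall back to full source.
  if (members.length === 0) return "";

  const lines = [`**Members (${members.length}):**`, ""];
  for (const c of members) {
    const loc = c.startLine ? `:${c.startLine}` : "";
    const sig = c.signature ? " - `" + c.signature + "`" : "";
    lines.push(`- ${c.name} (${c.kind})${loc}${sig}`);
  }
  return lines.join("\n");
}

console.log(
  buildOutline([
    { name: "send", kind: "method", startLine: 20, signature: "send(): void" },
    { name: "timeout", kind: "field", startLine: 5 },
  ]),
);
```

Returning an empty string for a container with no indexed children keeps the fallback to full source a single truthiness check in the caller.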