perf(mcp): answer-directly steering — ~35% cheaper, ~70% fewer tool calls (#224)

* perf(mcp): steer agents to answer directly instead of delegating to subagents

CodeGraph beats native grep/read on cost only when the agent queries it
directly. When the agent delegates to file-reading sub-agents, those
sub-agents read files regardless of the index, so CodeGraph becomes net
overhead on top of the reads. The install templates even told agents to
"spawn a subagent for explore-class questions" — the expensive path.

Changes:
- server-instructions + both install templates: add an "Answer directly —
  don't delegate exploration" directive; reposition codegraph_explore as the
  efficient one-call multi-symbol tool (was: "spawn a subagent for it").
- codegraph_explore: hard-cap output to its adaptive budget (it overran,
  ~30k vs a 28k cap) and tighten the medium tier (28k->13k).
- codegraph_node: return a member outline for container kinds instead of the
  full class body.

Rigorous N>=4-per-arm warm-block benchmark (median total_cost_usd):
  excalidraw (~600 files):  WITH $0.54 vs native $1.02  (-47%)
  vscode     (~10k files):  WITH $0.41 vs native $0.72  (-42%)
  ky         (~25 files):   WITH $0.46 vs native $0.44  (wash)
Answers were equal-or-better (correct, file:line-cited) with ~6x fewer tool
calls; the directive drove the direct path on 14/14 codegraph runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(readme): rebuild benchmark with real-world repos + cost/token/time/tool savings

Replace the "Claude Code (Python+Rust/Java)" rows — which benchmarked the
Claude Code CLI repo, not real codebases in those languages — with real
open-source projects per language: Django (Python), Tokio (Rust), OkHttp
(Java), Gin (Go), plus Alamofire (Swift) and the existing TypeScript repos
(VS Code, Excalidraw).

The table now reports all four savings the change targets — cost, tokens,
time, tool calls — as the median of 4 runs per arm (Claude Opus 4.7,
headless claude -p, with vs empty MCP config). Averages across the 7 repos:
35% cheaper, 59% fewer tokens, 49% faster, 70% fewer tool calls. Adds a
methodology note and raw WITH->WITHOUT medians.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Colby Mchenry 1 month ago
parent
commit
f5bbc26c60
6 changed files with 154 additions and 67 deletions
  1. .cursor/rules/codegraph.mdc (+3 -2)
  2. CHANGELOG.md (+25 -1)
  3. README.md (+35 -46)
  4. src/installer/instructions-template.ts (+3 -2)
  5. src/mcp/server-instructions.ts (+14 -2)
  6. src/mcp/tools.ts (+74 -14)

+ 3 - 2
.cursor/rules/codegraph.mdc

@@ -19,16 +19,17 @@ Use codegraph for **structural** questions — what calls what, what would break
 | "What would break if I changed Z?" | `codegraph_impact` |
 | "Show me Y's signature / source / docstring" | `codegraph_node` |
 | "Give me focused context for a task/area" | `codegraph_context` |
-| "Survey an unfamiliar module/topic" | `codegraph_explore` |
+| "See several related symbols' source at once" | `codegraph_explore` |
 | "What files exist under path/" | `codegraph_files` |
 | "Is the index healthy?" | `codegraph_status` |
 
 ### Rules of thumb
 
+- **Answer directly — don't delegate exploration.** For "how does X work" / architecture / trace questions, answer with 2-3 codegraph calls: `codegraph_context` first, then ONE `codegraph_explore` for the source of the symbols it surfaces. Codegraph IS the pre-built index, so spawning a separate file-reading sub-task/agent — or running a grep + read loop — repeats work codegraph already did and costs more for the same answer.
 - **Trust codegraph results.** They come from a full AST parse. Do NOT re-verify them with grep — that's slower, less accurate, and wastes context.
 - **Don't grep first** when looking up a symbol by name. `codegraph_search` is faster and returns kind + location + signature in one call.
 - **Don't chain `codegraph_search` + `codegraph_node`** when you just want context — `codegraph_context` is one call.
-- **`codegraph_explore` is the heavy hitter** for unfamiliar areas — it returns full source from all relevant files in one call, but is token-heavy. If your harness supports parallel subagents (e.g., Claude Code's Task tool), spawn one for explore-class questions to keep main session context clean.
+- **Don't loop `codegraph_node` over many symbols** — one `codegraph_explore` call returns several symbols' source grouped in a single capped call, while each separate node/Read call re-reads the whole context and costs far more.
 - **Index lag**: the file watcher debounces ~500ms behind writes; don't re-query immediately after editing a file in the same turn.
 
 ### If `.codegraph/` doesn't exist

+ 25 - 1
CHANGELOG.md

@@ -33,6 +33,25 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
   setup is actually fast. `codegraph uninit` removes any hooks it installed.
 
 ### Changed
+- **MCP / agent guidance**: CodeGraph now tells agents to answer "how does X
+  work" / architecture questions *directly* — `codegraph_context`, then one
+  `codegraph_explore` for the surfaced symbols — instead of delegating to a
+  file-reading sub-agent or a grep+read loop. The server instructions and the
+  installed instruction files (`CLAUDE.md`, `.cursor/rules/codegraph.mdc`,
+  `AGENTS.md`) previously suggested *spawning a sub-agent* for explore-class
+  questions, which produced the opposite, more expensive behavior: the
+  sub-agent reads files regardless of the index, so CodeGraph became overhead
+  stacked on top of the reads. In rigorous N≥4-per-arm benchmarks this cut the
+  cost of an architecture question by ~42–47% versus a no-CodeGraph agent on
+  medium and large repos (Excalidraw ~600 files, VS Code ~10k), with
+  equal-or-better, `file:line`-cited answers and ~6× fewer tool calls; on a
+  tiny repo (~25 files) it's a wash, since native grep is already trivially
+  cheap there.
+- **MCP / codegraph_node**: `includeCode=true` on a class/interface/struct/enum
+  now returns a compact member outline (fields + method signatures + line
+  numbers) instead of the entire class body — which could be thousands of
+  characters and was rarely needed in full. Functions and methods still return
+  their full body; request a specific member for its source.
 - **Minimum Node.js is now 20** (was 18). Node 18 is end-of-life and the
   native SQLite binding (`better-sqlite3` 12.x) no longer ships a Node 18
   prebuilt binary. Node 22 LTS and Node 24 get the native backend out of the
@@ -48,7 +67,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
   now scales with indexed file count: small projects (<500 files) cap at
   ~18KB and skip the "Additional relevant files" / completeness / explore-
   budget reminders that earn their keep on bigger codebases; medium
-  (<5,000) caps at ~28KB; large (<15,000) keeps the historical ~35KB; very
+  (<5,000) caps at ~13KB; large (<15,000) keeps the historical ~35KB; very
   large goes up to ~38KB. A new per-file char cap also prevents a single
   file with many adjacent symbols from collapsing into one whole-file dump
   (the Alamofire `Session.swift` case from #185). Per-file cluster
@@ -63,6 +82,11 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
   Thanks to [@essopsp](https://github.com/essopsp) for the repro.
 
 ### Fixed
+- **MCP / explore**: `codegraph_explore` output is now hard-capped to its
+  adaptive size budget. It could previously overrun (e.g. ~30K against a 28K
+  cap) once the relationship map and trailer sections were appended; the
+  oversized payload then sat in the agent's context and was re-read on every
+  later turn.
 - **Sync / status**: git-untracked files are no longer reported as pending
   "Added" forever. After `codegraph sync` indexed a newly-created untracked
   source file, `codegraph status` kept listing it under Pending Changes and
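
The hard cap described in the "MCP / explore" entry can be sketched as a standalone function. This is a minimal illustration, not the project's code: `capToBudget` and its `notice` parameter are hypothetical names, and only the cut-at-last-newline rule and the 80% threshold are taken from the `src/mcp/tools.ts` change in this commit.

```typescript
// Hypothetical standalone version of the explore hard cap (names are illustrative).
function capToBudget(
  output: string,
  maxOutputChars: number,
  notice: string = '\n\n... (output truncated to budget)',
): string {
  // Within budget: return the payload untouched.
  if (output.length <= maxOutputChars) return output;

  const cut = output.slice(0, maxOutputChars);
  const lastNewline = cut.lastIndexOf('\n');

  // Prefer ending on a whole line, but only when that keeps more than
  // 80% of the budget; otherwise accept a mid-line cut.
  const safe = lastNewline > maxOutputChars * 0.8 ? cut.slice(0, lastNewline) : cut;

  // The notice is appended after the cut, so the returned string can exceed
  // maxOutputChars by the length of the notice.
  return safe + notice;
}
```

Because the notice is appended after slicing, the cap bounds the source payload rather than the final string. A stricter variant could subtract `notice.length` from the budget before cutting.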

+ 35 - 46
README.md

@@ -4,7 +4,7 @@
 
 ### Supercharge Claude Code, Cursor, Codex, and OpenCode with Semantic Code Intelligence
 
-**94% fewer tool calls · 77% faster exploration · 100% local**
+**~35% cheaper · ~70% fewer tool calls · 100% local**
 
 [![npm version](https://img.shields.io/npm/v/@colbymchenry/codegraph.svg)](https://www.npmjs.com/package/@colbymchenry/codegraph)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
@@ -50,61 +50,50 @@ When Claude Code explores a codebase, it spawns **Explore agents** that scan fil
 
 ### Benchmark Results
 
-Tested across 6 real-world codebases comparing Claude Code's Explore agent **with** and **without** CodeGraph:
+Tested across **7 real-world open-source codebases** spanning 7 languages, comparing an agent (Claude Code, headless) answering one architecture question **with** and **without** CodeGraph. Each cell is the savings at the **median of 4 runs per arm**.
 
-> **Average: 92% fewer tool calls · 71% faster**
+> **Average: 35% cheaper · 59% fewer tokens · 49% faster · 70% fewer tool calls**
 
-| Codebase | With CG | Without CG | Improvement |
-|----------|---------|------------|-------------|
-| **VS Code** · TypeScript | 3 calls, 17s | 52 calls, 1m 37s | **94% fewer · 82% faster** |
-| **Excalidraw** · TypeScript | 3 calls, 29s | 47 calls, 1m 45s | **94% fewer · 72% faster** |
-| **Claude Code** · Python + Rust | 3 calls, 39s | 40 calls, 1m 8s | **93% fewer · 43% faster** |
-| **Claude Code** · Java | 1 call, 19s | 26 calls, 1m 22s | **96% fewer · 77% faster** |
-| **Alamofire** · Swift | 3 calls, 22s | 32 calls, 1m 39s | **91% fewer · 78% faster** |
-| **Swift Compiler** · Swift/C++ | 6 calls, 35s | 37 calls, 2m 8s | **84% fewer · 73% faster** |
+| Codebase | Language | Cost | Tokens | Time | Tool calls |
+|----------|----------|------|--------|------|------------|
+| **VS Code** | TypeScript · ~10k files | 35% cheaper | 73% fewer | 41% faster | 72% fewer |
+| **Excalidraw** | TypeScript · ~600 | 47% cheaper | 73% fewer | 60% faster | 86% fewer |
+| **Django** | Python · ~2.7k | 34% cheaper | 64% fewer | 59% faster | 81% fewer |
+| **Tokio** | Rust · ~700 | 52% cheaper | 81% fewer | 63% faster | 89% fewer |
+| **OkHttp** | Java · ~640 | 17% cheaper | 41% fewer | 36% faster | 64% fewer |
+| **Gin** | Go · ~150 | 22% cheaper | 23% fewer | 34% faster | 19% fewer |
+| **Alamofire** | Swift · ~100 | 38% cheaper | 59% fewer | 51% faster | 77% fewer |
+
+The gains scale with codebase size: on large repos the agent answers from the index in a handful of calls with **zero file reads**, while the no-CodeGraph agent fans out across grep/find/Read (and the sub-agents it spawns). On a small repo like Gin (~150 files) native search is already cheap, so the margin narrows.
 
 <details>
 <summary><strong>Full benchmark details</strong></summary>
 
-All tests used Claude Opus 4.6 (1M context) with Claude Code v2.1.91. Each test spawned a single Explore agent with the same question.
+**Methodology.** Each arm is `claude -p` (Claude Opus 4.7, Claude Code v2.1.145) run headlessly against the repo with `--strict-mcp-config`: **WITH** = CodeGraph's MCP server enabled, **WITHOUT** = an empty MCP config. Built-in Read/Grep/Bash stay available to both. Same question per repo, **4 runs per arm, median reported**. Cost = the run's `total_cost_usd`; Tokens = total tokens processed (input incl. cached + output); Time = wall-clock; Tool calls = every tool invocation, including those inside any sub-agents the model spawns. Repos cloned at `--depth 1` and indexed by the same CodeGraph build that served them.
 
-**Queries used:**
+**Queries:**
 | Codebase | Query |
 |----------|-------|
 | VS Code | "How does the extension host communicate with the main process?" |
-| Excalidraw | "How does collaborative editing and real-time sync work?" |
-| Claude Code (Python+Rust) | "How does tool execution work end to end?" |
-| Claude Code (Java) | "How does tool execution work end to end?" |
-| Alamofire | "Trace how a request flows from Session.request() through to the URLSession layer" |
-| Swift Compiler | "How does the Swift compiler handle error diagnostics?" |
-
-**With CodeGraph — the agent uses `codegraph_explore` and stops:**
-| Codebase | Files Indexed | Nodes | Tool Uses | Tokens | Time | File Reads |
-|----------|--------------|-------|-----------|--------|------|------------|
-| VS Code (TypeScript) | 4,002 | 59,377 | 3 | 56.6k | 17s | 0 |
-| Excalidraw (TypeScript) | 626 | 9,859 | 3 | 57.1k | 29s | 0 |
-| Claude Code (Python+Rust) | 115 | 3,080 | 3 | 67.1k | 39s | 0 |
-| Claude Code (Java) | — | — | 1 | 40.8k | 19s | 0 |
-| Alamofire (Swift) | 102 | 2,624 | 3 | 57.3k | 22s | 0 |
-| Swift Compiler (Swift/C++) | 25,874 | 272,898 | 6 | 77.4k | 35s | 0 |
-
-**Without CodeGraph — the agent uses grep, find, ls, and Read extensively:**
-| Codebase | Tool Uses | Tokens | Time | File Reads |
-|----------|-----------|--------|------|------------|
-| VS Code (TypeScript) | 52 | 89.4k | 1m 37s | ~15 |
-| Excalidraw (TypeScript) | 47 | 77.9k | 1m 45s | ~20 |
-| Claude Code (Python+Rust) | 40 | 69.3k | 1m 8s | ~15 |
-| Claude Code (Java) | 26 | 73.3k | 1m 22s | ~15 |
-| Alamofire (Swift) | 32 | 52.4k | 1m 39s | ~10 |
-| Swift Compiler (Swift/C++) | 37 | 99.1k | 2m 8s | ~20 |
-
-**Key observations:**
-- With CodeGraph, the agent **never fell back to reading files** — it trusted the codegraph_explore results completely
-- Without CodeGraph, agents spent most of their time on discovery (find, ls, grep) before they could even start reading relevant code
-- The Java codebase needed only **1 codegraph_explore call** to answer the entire question
-- Cross-language queries (Python+Rust) worked seamlessly — CodeGraph's graph traversal found connections across language boundaries
-- The Swift benchmark (Alamofire) traced a **9-step call chain** from `Session.request()` to `URLSession.dataTask()` — CodeGraph's graph traversal at depth 3 captured the full chain in one explore call
-- The **Swift Compiler** benchmark is the largest codebase tested (**25,874 files, 272,898 nodes**) — CodeGraph indexed it in under 4 minutes and the agent answered a complex cross-cutting question with **6 explore calls and zero file reads** in 35 seconds
+| Excalidraw | "How does Excalidraw render and update canvas elements?" |
+| Django | "How does Django's ORM build and execute a query from a QuerySet?" |
+| Tokio | "How does tokio schedule and run async tasks on its runtime?" |
+| OkHttp | "How does OkHttp process a request through its interceptor chain?" |
+| Gin | "How does gin route requests through its middleware chain?" |
+| Alamofire | "How does Alamofire build, send, and validate a request?" |
+
+**Raw medians — WITH → WITHOUT:**
+| Codebase | Cost | Tokens | Time | Tool calls |
+|----------|------|--------|------|------------|
+| VS Code | $0.42 → $0.64 | 393k → 1.4M | 1m 0s → 1m 43s | 7 → 23 |
+| Excalidraw | $0.54 → $1.02 | 851k → 3.2M | 1m 17s → 3m 14s | 12 → 83 |
+| Django | $0.41 → $0.62 | 499k → 1.4M | 1m 0s → 2m 25s | 9 → 48 |
+| Tokio | $0.50 → $1.04 | 657k → 3.4M | 1m 5s → 2m 56s | 9 → 75 |
+| OkHttp | $0.36 → $0.44 | 352k → 596k | 45s → 1m 11s | 5 → 14 |
+| Gin | $0.36 → $0.46 | 431k → 562k | 47s → 1m 11s | 7 → 8 |
+| Alamofire | $0.61 → $0.99 | 1.1M → 2.6M | 1m 19s → 2m 41s | 15 → 64 |
+
+**Why CodeGraph wins:** with the index available, the agent answers directly — `codegraph_context` to map the area, then one `codegraph_explore` for the relevant source — and stops, usually with zero file reads. Without it, the agent (and the Explore sub-agents it spawns) spends most of its budget on discovery (find/ls/grep) before reading the right code. CodeGraph only helps when queried *directly*, so its instructions steer agents to answer directly rather than delegate exploration to file-reading sub-agents — otherwise a sub-agent reads files regardless and CodeGraph becomes overhead.
 
 </details>
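
The "cheaper" column can be approximately reproduced from the raw cost medians in the README table. The sketch below assumes savings are computed as `1 - WITH / WITHOUT`, which the README does not state explicitly; `percentSaved` and `meanSaved` are hypothetical names. The published medians are rounded, so a recomputed per-repo figure can differ from the table by about a point (OkHttp comes out near 18% here versus 17% in the table).

```typescript
// Raw cost medians in USD, copied from the "Raw medians" table: [WITH, WITHOUT].
const costMedians: Record<string, [number, number]> = {
  'VS Code': [0.42, 0.64],
  'Excalidraw': [0.54, 1.02],
  'Django': [0.41, 0.62],
  'Tokio': [0.50, 1.04],
  'OkHttp': [0.36, 0.44],
  'Gin': [0.36, 0.46],
  'Alamofire': [0.61, 0.99],
};

// Percent saved relative to the WITHOUT arm (assumed formula).
function percentSaved(withCg: number, withoutCg: number): number {
  return (1 - withCg / withoutCg) * 100;
}

const perRepo = Object.entries(costMedians).map(([repo, [withCg, withoutCg]]) => ({
  repo,
  saved: percentSaved(withCg, withoutCg),
}));

// Unweighted mean across repos, matching the "Averages across the 7 repos" wording.
const meanSaved = perRepo.reduce((sum, r) => sum + r.saved, 0) / perRepo.length;

for (const r of perRepo) {
  console.log(`${r.repo}: ${Math.round(r.saved)}% cheaper`);
}
console.log(`mean: ${Math.round(meanSaved)}% cheaper`);
```

The same function applies to the token, time, and tool-call columns once the values are converted to plain numbers.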
 

+ 3 - 2
src/installer/instructions-template.ts

@@ -37,16 +37,17 @@ Use codegraph for **structural** questions — what calls what, what would break
 | "What would break if I changed Z?" | \`codegraph_impact\` |
 | "Show me Y's signature / source / docstring" | \`codegraph_node\` |
 | "Give me focused context for a task/area" | \`codegraph_context\` |
-| "Survey an unfamiliar module/topic" | \`codegraph_explore\` |
+| "See several related symbols' source at once" | \`codegraph_explore\` |
 | "What files exist under path/" | \`codegraph_files\` |
 | "Is the index healthy?" | \`codegraph_status\` |
 
 ### Rules of thumb
 
+- **Answer directly — don't delegate exploration.** For "how does X work" / architecture / trace questions, answer with 2-3 codegraph calls: \`codegraph_context\` first, then ONE \`codegraph_explore\` for the source of the symbols it surfaces. Codegraph IS the pre-built index, so spawning a separate file-reading sub-task/agent — or running a grep + read loop — repeats work codegraph already did and costs more for the same answer.
 - **Trust codegraph results.** They come from a full AST parse. Do NOT re-verify them with grep — that's slower, less accurate, and wastes context.
 - **Don't grep first** when looking up a symbol by name. \`codegraph_search\` is faster and returns kind + location + signature in one call.
 - **Don't chain \`codegraph_search\` + \`codegraph_node\`** when you just want context — \`codegraph_context\` is one call.
-- **\`codegraph_explore\` is the heavy hitter** for unfamiliar areas — it returns full source from all relevant files in one call, but is token-heavy. If your harness supports parallel subagents (e.g., Claude Code's Task tool), spawn one for explore-class questions to keep main session context clean.
+- **Don't loop \`codegraph_node\` over many symbols** — one \`codegraph_explore\` call returns several symbols' source grouped in a single capped call, while each separate node/Read call re-reads the whole context and costs far more.
 - **Index lag**: the file watcher debounces ~500ms behind writes; don't re-query immediately after editing a file in the same turn.
 
 ### If \`.codegraph/\` doesn't exist

+ 14 - 2
src/mcp/server-instructions.ts

@@ -22,6 +22,18 @@ in the workspace. Reads are sub-millisecond; the index lags writes by
 about a second through the file watcher. Consult it BEFORE writing or
 editing code, not during.
 
+## Answer directly — don't delegate exploration
+
+For "how does X work", architecture, trace, or where-is-X questions,
+answer DIRECTLY using 2-3 codegraph calls: \`codegraph_context\` first,
+then ONE \`codegraph_explore\` for the source of the symbols it surfaces.
+Codegraph IS the pre-built search index — so delegating the lookup to a
+separate file-reading sub-task/agent, or running your own grep + read
+loop, repeats work codegraph already did and costs more for the same
+answer. Reach for raw Read/Grep only to confirm a specific detail
+codegraph didn't cover. A direct codegraph answer is typically a handful
+of calls; a grep/read exploration is dozens.
+
 ## Tool selection by intent
 
 - **"What is the symbol named X?"** → \`codegraph_search\`
@@ -30,7 +42,7 @@ editing code, not during.
 - **"What does this call?"** → \`codegraph_callees\`
 - **"What would changing this break?"** → \`codegraph_impact\`
 - **"Show me this symbol's source / signature / docstring."** → \`codegraph_node\`
-- **"Survey an unfamiliar topic / pattern / module."** → \`codegraph_explore\` (heavier; deep dive)
+- **"Show me several related symbols' source / survey an area."** → \`codegraph_explore\` (ONE capped call; prefer over many codegraph_node/Read)
 - **"What's in directory X?"** → \`codegraph_files\`
 - **"Is the index ready / what's its size?"** → \`codegraph_status\`
 
@@ -44,7 +56,7 @@ editing code, not during.
 
 - **Don't grep first** when looking up a symbol by name — \`codegraph_search\` is faster and returns kind + location + signature.
 - **Don't chain \`codegraph_search\` + \`codegraph_node\`** when you just want context — \`codegraph_context\` is one round-trip.
-- **Don't use \`codegraph_explore\` for narrow questions** — it's a multi-call deep dive, expensive in tokens. Save it for genuine "I'm new here" surveys.
+- **Don't loop \`codegraph_node\` over many symbols** — one \`codegraph_explore\` call returns them all grouped by file, while each separate call re-reads the whole context and costs far more. Use \`codegraph_node\` for a single symbol.
 - **Don't query the index immediately after editing a file** — the watcher needs ~500ms to debounce + sync. Wait for the next turn.
 
 ## Limitations

+ 74 - 14
src/mcp/tools.ts

@@ -25,6 +25,16 @@ const MAX_OUTPUT_LENGTH = 15000;
  */
 const RUST_PATH_PREFIXES = new Set(['crate', 'super', 'self']);
 
+/**
+ * Node kinds that contain other symbols. For these, `codegraph_node` with
+ * `includeCode=true` returns a structural outline (member names + signatures
+ * + line numbers) instead of the full body, which for a large class is a
+ * multi-thousand-character wall of source that bloats the agent's context.
+ */
+const CONTAINER_NODE_KINDS = new Set<NodeKind>([
+  'class', 'struct', 'interface', 'trait', 'protocol', 'enum', 'namespace', 'module',
+]);
+
 /** Last `::` / `.` / `/`-separated segment of a qualified symbol. */
 function lastQualifierPart(symbol: string): string {
   const parts = symbol.split(/::|[./]/).filter((p) => p.length > 0);
@@ -102,12 +112,12 @@ export function getExploreOutputBudget(fileCount: number): ExploreOutputBudget {
   }
   if (fileCount < 5000) {
     return {
-      maxOutputChars: 28000,
-      defaultMaxFiles: 9,
-      maxCharsPerFile: 5000,
-      gapThreshold: 12,
-      maxSymbolsInFileHeader: 10,
-      maxEdgesPerRelationshipKind: 10,
+      maxOutputChars: 13000,
+      defaultMaxFiles: 6,
+      maxCharsPerFile: 2500,
+      gapThreshold: 10,
+      maxSymbolsInFileHeader: 8,
+      maxEdgesPerRelationshipKind: 8,
       includeRelationships: true,
       includeAdditionalFiles: true,
       includeCompletenessSignal: true,
@@ -263,7 +273,7 @@ export const tools: ToolDefinition[] = [
   },
   {
     name: 'codegraph_context',
-    description: 'PRIMARY TOOL: Build comprehensive context for a task. Returns entry points, related symbols, and key code - often enough to understand the codebase without additional tool calls. NOTE: This provides CODE context, not product requirements. For new features, still clarify UX/behavior questions with the user before implementing.',
+    description: 'PRIMARY TOOL — call this FIRST for any "how does X work", architecture, feature, or bug-context question. Composes search + node + callers + callees and returns entry points, related symbols, and key code in ONE call — usually enough to answer with no further search/Read/Grep. Prefer this over chaining codegraph_search + codegraph_node, and over codegraph_explore. NOTE: provides CODE context, not product requirements; for new features still clarify UX/edge cases with the user.',
     inputSchema: {
       type: 'object',
       properties: {
@@ -348,7 +358,7 @@ export const tools: ToolDefinition[] = [
   },
   {
     name: 'codegraph_node',
-    description: 'Get detailed information about a specific code symbol. Use includeCode=true only when you need the full source code - otherwise just get location and signature to minimize context usage.',
+    description: 'Get detailed info about ONE symbol (location, signature, docstring). Pass includeCode=true for source: a function/method returns its body; a class/interface/struct/enum returns a compact member OUTLINE (fields + method signatures + line numbers), not every method body — Read or codegraph_node a specific member for its body. Keep includeCode=false to minimize context. For SEVERAL related symbols, make ONE codegraph_explore (or codegraph_context) call instead of many node calls — repeated node calls each re-read the whole context and cost far more.',
     inputSchema: {
       type: 'object',
       properties: {
@@ -368,7 +378,7 @@ export const tools: ToolDefinition[] = [
   },
   {
     name: 'codegraph_explore',
-    description: 'Deep exploration tool — returns comprehensive context for a topic in a SINGLE call. Groups all relevant source code by file (contiguous sections, not snippets), includes a relationship map, and uses deeper graph traversal. Designed to replace multiple codegraph_node + file Read calls. Use this instead of codegraph_context when you need thorough understanding. IMPORTANT: Use specific symbol names, file names, or short code terms in your query — NOT natural language sentences. Before calling this, use codegraph_search to discover relevant symbol names, then include those names in your query. Bad: "how are agent prompts loaded and passed to the CLI". Good: "readAgentsFromDirectory createClaudeSession chat-manager agents.ts".',
+    description: 'Returns source for SEVERAL related symbols grouped by file, plus a relationship map, in ONE capped call. This is the efficient way to inspect many related symbols at once — strongly prefer it over a series of codegraph_node or Read calls (each separate call re-reads the whole context, so 8 node calls cost far more than 1 explore). Use it after codegraph_context when you need to see the actual source of several symbols. Query with specific symbol/file/code terms, NOT natural-language sentences — run codegraph_search first to find names. Bad: "how are agent prompts loaded and passed to the CLI". Good: "renderStaticScene drawElementOnCanvas ShapeCache renderElement.ts".',
     inputSchema: {
       type: 'object',
       properties: {
@@ -1241,7 +1251,20 @@ export class ToolHandler {
       }
     }
 
-    return this.textResult(lines.join('\n'));
+    // Hard-cap to the adaptive budget. The per-file loop bounds the source
+    // sections, but the relationship map, additional-files list, and
+    // completeness/budget notes can still push the assembled output past
+    // maxOutputChars (observed 30k against a 28k tier cap). A fat explore
+    // payload persists in the agent's context and is re-read as cache-input
+    // on every subsequent turn, so the overrun is paid many times over.
+    const output = lines.join('\n');
+    if (output.length > budget.maxOutputChars) {
+      const cut = output.slice(0, budget.maxOutputChars);
+      const lastNewline = cut.lastIndexOf('\n');
+      const safe = lastNewline > budget.maxOutputChars * 0.8 ? cut.slice(0, lastNewline) : cut;
+      return this.textResult(safe + '\n\n... (explore output truncated to budget — use codegraph_node or Read for more)');
+    }
+    return this.textResult(output);
   }
 
   /**
@@ -1261,12 +1284,24 @@ export class ToolHandler {
     }
 
     let code: string | null = null;
+    let outline: string | null = null;
 
     if (includeCode) {
-      code = await cg.getCode(match.node.id);
+      // For container symbols (class/interface/struct/…), the full body is the
+      // sum of every method body — a wall of source (e.g. a 10k-char class)
+      // that bloats context and is rarely needed in full. Return a structural
+      // outline (members + signatures + line numbers) instead; the agent can
+      // Read or codegraph_node a specific method for its body. Leaf symbols
+      // (function/method/etc.) return their full body as before.
+      if (CONTAINER_NODE_KINDS.has(match.node.kind)) {
+        outline = this.buildContainerOutline(cg, match.node);
+      }
+      if (!outline) {
+        code = await cg.getCode(match.node.id);
+      }
     }
 
-    const formatted = this.formatNodeDetails(match.node, code) + match.note;
+    const formatted = this.formatNodeDetails(match.node, code, outline) + match.note;
     return this.textResult(this.truncateOutput(formatted));
   }
 
@@ -1716,7 +1751,29 @@ export class ToolHandler {
     return lines.join('\n');
   }
 
-  private formatNodeDetails(node: Node, code: string | null): string {
+  /**
+   * Build a compact structural outline of a container symbol from its
+   * indexed children (methods, fields, properties, …) — name, kind,
+   * line number, and signature — so the agent gets the shape of a class
+   * without the full source of every method. Returns '' when the container
+   * has no indexed children, so the caller can fall back to full source.
+   */
+  private buildContainerOutline(cg: CodeGraph, node: Node): string {
+    const children = cg.getChildren(node.id)
+      .filter(c => c.kind !== 'import' && c.kind !== 'export')
+      .sort((a, b) => (a.startLine ?? 0) - (b.startLine ?? 0));
+    if (children.length === 0) return '';
+
+    const lines = [`**Members (${children.length}):**`, ''];
+    for (const c of children) {
+      const loc = c.startLine ? `:${c.startLine}` : '';
+      const sig = c.signature ? ` — \`${c.signature}\`` : '';
+      lines.push(`- ${c.name} (${c.kind})${loc}${sig}`);
+    }
+    return lines.join('\n');
+  }
+
+  private formatNodeDetails(node: Node, code: string | null, outline?: string | null): string {
     const location = node.startLine ? `:${node.startLine}` : '';
     const lines: string[] = [
       `## ${node.name} (${node.kind})`,
@@ -1733,7 +1790,10 @@ export class ToolHandler {
       lines.push('', node.docstring);
     }
 
-    if (code) {
+    if (outline) {
+      lines.push('', outline, '',
+        `> Structural outline only. Read \`${node.filePath}\` or call codegraph_node on a specific member for its body.`);
+    } else if (code) {
       lines.push('', '```' + node.language, code, '```');
     }
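
To make the outline behaviour concrete, here is a self-contained sketch of the same idea outside the `ToolHandler` class. The `Member` interface is a hypothetical stand-in for the indexed child nodes (the real `Node` type has more fields), `buildOutline` is an illustrative name, and the separator before the signature is simplified to a plain hyphen.

```typescript
// Hypothetical minimal shape of an indexed child symbol.
interface Member {
  name: string;
  kind: string;
  startLine?: number;
  signature?: string;
}

// Build a compact member listing: name, kind, line number, and signature.
// Returns '' when nothing is left after filtering, so a caller can fall back
// to returning the full source instead.
function buildOutline(children: Member[]): string {
  const members = children
    .filter((c) => c.kind !== 'import' && c.kind !== 'export')
    .sort((a, b) => (a.startLine ?? 0) - (b.startLine ?? 0));
  if (members.length === 0) return '';

  const lines = [`**Members (${members.length}):**`, ''];
  for (const c of members) {
    const loc = c.startLine ? `:${c.startLine}` : '';
    const sig = c.signature ? ` - \`${c.signature}\`` : '';
    lines.push(`- ${c.name} (${c.kind})${loc}${sig}`);
  }
  return lines.join('\n');
}

// Usage: members are listed in source order regardless of input order.
console.log(
  buildOutline([
    { name: 'send', kind: 'method', startLine: 20, signature: 'send(): void' },
    { name: 'fs', kind: 'import' },
    { name: 'queue', kind: 'field', startLine: 5 },
  ]),
);
```

The empty-string return is what lets the caller treat "no indexed children" and "not a container" the same way: in both cases it falls through to fetching the full body.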