1 month ago · 1f3625a3e9
--- a/README.md
+++ b/README.md
@@ -263,25 +263,21 @@ CodeGraph builds a semantic knowledge graph of codebases for faster, smarter cod
 
				 
			
 
				 ### If `.codegraph/` exists in the project
			
 
				 
			
 
				-**NEVER call `codegraph_explore` or `codegraph_context` directly in the main session.** These tools return large amounts of source code that fills up main session context. Instead, ALWAYS spawn an Explore agent for any exploration question (e.g., "how does X work?", "explain the Y system", "where is Z implemented?").
			
 
				+**Answer directly with CodeGraph — don't delegate exploration to a file-reading sub-agent or a grep/read loop.** CodeGraph *is* the pre-built search index; re-deriving its answers with grep + Read repeats work it already did and costs more for the same result. For "how does X work?", architecture, trace, or where-is-X questions, answer in a handful of CodeGraph calls and stop — typically with **zero file reads**. The returned source is complete and authoritative: treat it as already read and do not re-open those files. Reach for raw Read/Grep only to confirm a specific detail CodeGraph didn't cover.
			
 
				 
			
 
				-**When spawning Explore agents**, include this instruction in the prompt:
			
 
				-
			
 
				-> This project has CodeGraph initialized (.codegraph/ exists). Use `codegraph_explore` as your PRIMARY tool — it returns full source code sections from all relevant files in one call.
			
 
				->
			
 
				-> **Rules:**
			
 
				-> 1. Follow the explore call budget in the `codegraph_explore` tool description — it scales automatically based on project size.
			
 
				-> 2. Do NOT re-read files that codegraph_explore already returned source code for. The source sections are complete and authoritative.
			
 
				-> 3. Only fall back to grep/glob/read for files listed under "Additional relevant files" if you need more detail, or if codegraph returned no results.
			
 
				-
			
 
				-**The main session may only use these lightweight tools directly** (for targeted lookups before making edits, not for exploration):
			
 
				+**Tool selection by intent:**
			
 
				 
			
 
				 | Tool | Use For |
			
 
				 |------|---------|
			
 
				-| `codegraph_search` | Find symbols by name |
			
 
				-| `codegraph_callers` / `codegraph_callees` | Trace call flow |
			
 
				+| `codegraph_context` | Map a task / feature / area first — composes search + node + callers + callees in one call |
			
 
				+| `codegraph_trace` | "How does X reach Y" — the call path, each hop's body inline (follows dynamic-dispatch hops grep can't) |
			
 
				+| `codegraph_explore` | Survey several related symbols' source in ONE budget-capped call |
			
 
				+| `codegraph_search` | Find a symbol by name |
			
 
				+| `codegraph_callers` / `codegraph_callees` | Walk call flow one hop at a time |
			
 
				 | `codegraph_impact` | Check what's affected before editing |
			
 
				-| `codegraph_node` | Get a single symbol's details |
			
 
				+| `codegraph_node` | Get a single symbol's source / signature |
			
 
				+
			
 
				+A direct CodeGraph answer is a handful of calls; a grep/read exploration is dozens.
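The table above can be read as a lookup from question intent to tool. A minimal sketch, assuming made-up intent labels and a hypothetical `pick_tool` helper; only the tool names come from the table, and defaulting to `codegraph_context` is an assumption based on its "map ... first" description:

```python
# Hypothetical intent labels (not part of CodeGraph) mapped to the tool names from the table.
TOOL_BY_INTENT = {
    "map_area": "codegraph_context",
    "trace_path": "codegraph_trace",
    "survey_symbols": "codegraph_explore",
    "find_symbol": "codegraph_search",
    "walk_callers": "codegraph_callers",
    "walk_callees": "codegraph_callees",
    "check_impact": "codegraph_impact",
    "symbol_source": "codegraph_node",
}


def pick_tool(intent: str) -> str:
    """Return the tool for an intent; unknown intents default to mapping the area first (assumption)."""
    return TOOL_BY_INTENT.get(intent, "codegraph_context")


print(pick_tool("trace_path"))  # codegraph_trace
```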
			
 
				 
			
 
				 ### If `.codegraph/` does NOT exist
			
 
				 
			
@@ -297,34 +293,23 @@ At the start of a session, ask the user if they'd like to initialize CodeGraph:
 
				 ## How It Works
			
 
				 
			
 
				 ```
			
 
				-┌─────────────────────────────────────────────────────────────────┐
			
 
				-│                        Claude Code                               │
			
 
				-│                                                                  │
			
 
				-│  "Implement user authentication"                                 │
			
 
				-│           │                                                      │
			
 
				-│           ▼                                                      │
			
 
				-│  ┌─────────────────┐      ┌─────────────────┐                   │
			
 
				-│  │  Explore Agent  │ ──── │  Explore Agent  │                   │
			
 
				-│  └────────┬────────┘      └────────┬────────┘                   │
			
 
				-│           │                        │                             │
			
 
				-└───────────┼────────────────────────┼─────────────────────────────┘
			
 
				-            │                        │
			
 
				-            ▼                        ▼
			
 
				 ┌───────────────────────────────────────────────────────────────────┐
			
 
				-│                     CodeGraph MCP Server                          │
			
 
				-│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐               │
			
 
				-│  │   Search    │  │   Callers   │  │   Context   │               │
			
 
				-│  │  "auth"     │  │  "login()"  │  │  for task   │               │
			
 
				-│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘               │
			
 
				-│         │                │                │                       │
			
 
				-│         └────────────────┼────────────────┘                       │
			
 
				-│                          ▼                                        │
			
 
				-│              ┌───────────────────────┐                            │
			
 
				-│              │   SQLite Graph DB     │                            │
			
 
				-│              │   • 387 symbols       │                            │
			
 
				-│              │   • 1,204 edges       │                            │
			
 
				-│              │   • Instant lookups   │                            │
			
 
				-│              └───────────────────────┘                            │
			
 
				+│                            Claude Code                            │
			
 
				+│                                                                   │
			
 
				+│   "How does a request reach the database?"                        │
			
 
				+│       calls CodeGraph tools directly — no Explore sub-agent       │
			
 
				+│                                 │                                 │
			
 
				+└─────────────────────────────────┬─────────────────────────────────┘
			
 
				+                                  │
			
 
				+                                  ▼
			
 
				+┌───────────────────────────────────────────────────────────────────┐
			
 
				+│                        CodeGraph MCP Server                       │
			
 
				+│                                                                   │
			
 
				+│       context · trace · explore · callers · callees · impact      │
			
 
				+│                                 │                                 │
			
 
				+│                                 ▼                                 │
			
 
				+│                       SQLite knowledge graph                      │
			
 
				+│          symbols · edges · files · FTS5 full-text search          │
			
 
				 └───────────────────────────────────────────────────────────────────┘
			
 
				 ```
			
 
				 
			
--- a/docs/benchmarks/answer-directly-vs-explore-agent.md
+++ b/docs/benchmarks/answer-directly-vs-explore-agent.md
@@ -0,0 +1,88 @@
 
				+# Answer directly vs. delegate to an Explore agent (interactive A/B)
			
 
				+
			
 
				+**Question:** Does answering a "how does X work?" question *directly* with CodeGraph in the
			
 
				+main session bloat main-session context — and would Claude Code be better off delegating that
			
 
				+exploration to a disposable **Explore agent** (which keeps main context lean by absorbing the
			
 
				+file reads in a sub-transcript)? And critically: **does the answer change at scale**, on a
			
 
				+codebase far larger than Excalidraw?
			
 
				+
			
 
				+**Short answer:** No. With CodeGraph, main-session context is roughly **scale-invariant (~50k)**
			
 
				+because the retrieval is targeted and the `explore` payload is budget-capped — it does not
			
 
				+balloon on a 16× larger repo. Answering directly wins at **every** scale: same-or-leaner main
			
 
				+context than the delegation path, **zero file reads**, and ~28% fewer tokens. The
			
 
				+delegation-for-hygiene advantage stays marginal even on a large codebase.
			
 
				+
			
 
				+## Methodology
			
 
				+
			
 
				+- **Harness:** interactive Claude Code TUI driven via `scripts/agent-eval/itrun.sh` (tmux),
			
 
				+  **not** headless `claude -p`. This matters: headless spawns **0** Explore agents, so it cannot
			
 
				+  measure delegation behavior at all; only the interactive TUI does.
			
 
				+- **Arms:** `WITH` = CodeGraph in the MCP config; `WITHOUT` = empty MCP config (`--strict-mcp-config`).
			
 
				+- **Model:** `opus`. **n = 3 runs per arm.** Main **and** sub-agent transcripts parsed
			
 
				+  (`scripts/agent-eval/parse-session.mjs`); reads/bash are summed across main + sub-agents.
			
 
				+- **Repos:** Excalidraw (643 files, medium) and VS Code (~10.7k files, large — ~16× Excalidraw).
			
 
				+- **Build:** 0.9.4. **Date:** 2026-05-24.
			
 
				+- "main-session context" is the TUI's reported `Context X/Y` for the *main* thread (sub-agent
			
 
				+  context does not count against it). "billable tokens" = summed per-turn assistant usage
			
 
				+  (input + output + cache read + cache creation).
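The "billable tokens" definition above can be sketched as a sum over assistant turns. This is not the code in `scripts/agent-eval/parse-session.mjs`; it is an illustrative Python version that assumes a JSONL transcript whose assistant records carry a `message.usage` object with field names modeled on the Anthropic API usage object:

```python
import json

# Assumed usage field names (modeled on the Anthropic API usage object).
USAGE_FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_read_input_tokens",
    "cache_creation_input_tokens",
)


def billable_tokens(jsonl_text: str) -> int:
    """Sum input + output + cache read + cache creation over assistant turns."""
    total = 0
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("type") != "assistant":
            continue  # only assistant turns report usage in this sketch
        usage = record.get("message", {}).get("usage", {})
        total += sum(int(usage.get(field, 0)) for field in USAGE_FIELDS)
    return total


# Tiny made-up transcript: one user turn, two assistant turns.
sample = "\n".join([
    json.dumps({"type": "user", "message": {"content": "hi"}}),
    json.dumps({"type": "assistant", "message": {"usage": {
        "input_tokens": 10, "output_tokens": 20,
        "cache_read_input_tokens": 300, "cache_creation_input_tokens": 40}}}),
    json.dumps({"type": "assistant", "message": {"usage": {
        "input_tokens": 5, "output_tokens": 15}}}),
])
print(billable_tokens(sample))  # 390
```

Summing main and sub-agent transcripts would then be a matter of calling the helper on each file and adding the results.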
			
 
				+
			
 
				+## Excalidraw (643 files, medium)
			
 
				+
			
 
				+Question: *"How does Excalidraw render and update canvas elements?"*
			
 
				+
			
 
				+| metric | WITH codegraph | WITHOUT |
			
 
				+|---|---|---|
			
 
				+| Explore agents spawned | 0 / 0 / 0 | 0 / 1 / 1 (delegated 2 of 3) |
			
 
				+| main-session context | 51k / 49k / 50k (~50k) | 48k / 34k / 26k (~36k) |
			
 
				+| total tool calls | 4 / 4 / 4 | 16 / 55 / 37 |
			
 
				+| Reads (main+sub) | 0 / 0 / 0 | 6 / 25 / 16 |
			
 
				+| billable tokens | ~127k | ~175k |
			
 
				+
			
 
				+## VS Code (~10.7k files, large — ~16× Excalidraw)
			
 
				+
			
 
				+Question: *"How does the extension host communicate with the main process?"*
			
 
				+
			
 
				+| metric | WITH codegraph | WITHOUT |
			
 
				+|---|---|---|
			
 
				+| main-session context | 47k / 43k / 50k (~47k) | 54k / 29k / 31k (~38k) |
			
 
				+| Explore agents spawned | 0 / 0 / 0 | 0 / 1 / 1 (delegated 2 of 3) |
			
 
				+| codegraph calls | ~8 (search + explore×2–3 + context) | 0 |
			
 
				+| Reads (main+sub) | 0 / 1 / 0 | 6 / 26 / 19 |
			
 
				+| billable tokens | ~126k | ~176k |
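As a quick arithmetic check, the per-arm averages and the token gap can be recomputed from the two tables. The sketch below only restates the table values; the helper names `pct_fewer` and `pct_more` are introduced here for illustration:

```python
from statistics import mean

# Main-session context per run, in thousands of tokens (copied from the tables).
main_context_k = {
    ("Excalidraw", "WITH"): [51, 49, 50],
    ("Excalidraw", "WITHOUT"): [48, 34, 26],
    ("VS Code", "WITH"): [47, 43, 50],
    ("VS Code", "WITHOUT"): [54, 29, 31],
}

# Approximate billable tokens per arm, in thousands (copied from the tables).
billable_k = {
    "Excalidraw": {"WITH": 127, "WITHOUT": 175},
    "VS Code": {"WITH": 126, "WITHOUT": 176},
}


def pct_fewer(with_k: float, without_k: float) -> float:
    """Saving of the WITH arm, relative to the WITHOUT arm."""
    return 100 * (without_k - with_k) / without_k


def pct_more(with_k: float, without_k: float) -> float:
    """Extra cost of the WITHOUT arm, relative to the WITH arm."""
    return 100 * (without_k - with_k) / with_k


for (repo, arm), runs in main_context_k.items():
    print(repo, arm, round(mean(runs), 1))  # 50, 36, 46.7, 38

for repo, t in billable_k.items():
    print(repo,
          round(pct_fewer(t["WITH"], t["WITHOUT"]), 1),  # 27.4, then 28.4
          round(pct_more(t["WITH"], t["WITHOUT"]), 1))   # 37.8, then 39.7
```

On these numbers the WITH arm is about 27–28% cheaper, which is the same gap as the WITHOUT arm costing about 38–40% more; the two percentages simply use different baselines.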
			
 
				+
			
 
				+## Findings
			
 
				+
			
 
				+**Main-session context is scale-invariant with CodeGraph.** With codegraph, main-session
			
 
				+context was **~47k on VS Code — essentially identical to Excalidraw's ~50k**, despite a 16×
			
 
				+bigger repo. It didn't balloon. Reason: codegraph's `explore` payload is **budget-capped** and
			
 
				+retrieval is **targeted** — answering one question pulls in the relevant *flow/area*, not more
			
 
				+just because the repo is huge. So codegraph makes main-session context roughly scale-invariant
			
 
				+(~50k). The delegation-for-hygiene advantage stays marginal even on a large codebase — exactly
			
 
				+the opposite of "it gets significant at scale."
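Why a budget cap makes the payload insensitive to repo size can be shown with a toy model. This is not CodeGraph's implementation: the greedy packing, the relevance scores, the 2,000-token snippet size, and the 20,000-token budget are all assumptions chosen for illustration:

```python
def cap_payload(snippets, budget_tokens):
    """Keep the highest-scored snippets whose combined size fits the budget."""
    chosen, used = [], 0
    # Each snippet is (relevance score, token size, name); best score first.
    for score, tokens, name in sorted(snippets, reverse=True):
        if used + tokens <= budget_tokens:
            chosen.append(name)
            used += tokens
    return chosen, used


def fake_repo(n_files):
    """Hypothetical candidates: relevance decays with index, every snippet is 2,000 tokens."""
    return [(1.0 / (i + 1), 2000, f"file_{i}") for i in range(n_files)]


# A 16x larger candidate pool yields the same capped payload.
small = cap_payload(fake_repo(40), budget_tokens=20_000)
large = cap_payload(fake_repo(640), budget_tokens=20_000)
print(small[1], large[1])  # 20000 20000
```

In this model the larger repo adds only low-relevance candidates, so the selected set and its size stay the same once the budget is full.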
			
 
				+
			
 
				+The thing that *would* balloon at scale is reading many big files directly into main — and
			
 
				+Claude Code avoids that **without** codegraph by delegating to an Explore agent (29–31k main),
			
 
				+but at the cost of **17–26 reads** and roughly 38–40% more billable tokens (the same gap as CodeGraph's ~28% saving, measured from the other side). CodeGraph keeps main lean a *better*
			
 
				+way: a capped, targeted payload — no delegation, **0 reads**.
			
 
				+
			
 
				+**On "the Explore agents use codegraph."** I couldn't reproduce it: across **6/6**
			
 
				+with-codegraph runs (both repos), Claude Code **never delegated** — it answered directly every
			
 
				+time. The Explore-agent path only appeared in the `without` arm (using grep/read, since codegraph
			
 
				+wasn't in that config). So with the current instructions + codegraph present, Claude Code stays
			
 
				+in the main session — the lean-main-via-Explore-agent best case simply isn't what happens;
			
 
				+lean-main-via-capped-codegraph is, and it's cheaper.
			
 
				+
			
 
				+## Verdict
			
 
				+
			
 
				+**"Answer directly with codegraph" wins for Claude Code too — at every scale.** No per-agent
			
 
				+split is needed; the unified "answer directly" instruction is right for Claude Code *and* for
			
 
				+Codex / Cursor / opencode (which have no Explore-agent mechanism and would otherwise read files
			
 
				+directly). This conclusion drove updating the README's `## CodeGraph` example block, which
			
 
				+previously told agents to "NEVER call `codegraph_explore` directly / ALWAYS spawn an Explore
			
 
				+agent", i.e., it steered Claude Code toward the *worse* (17–26 read, roughly 38–40%-more-token) path.
			
 
				+
			
 
				+**Caveat / future work (not a blocker):** an Explore agent that *itself uses codegraph* could in
			
 
				+principle get lean-main *and* low-work. But the "answer directly" instruction prevents delegation
			
 
				+in practice (0 delegations observed across 6 runs), the main-context gain would be marginal
			
 
				+(~50k → ~30k, both a few percent of a 1M window), and it adds a sub-agent round-trip. Worth a
			
 
				+future experiment, not a default.