2 weeks ago · 6094ba1e5a
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@
 
 ### Supercharge Claude Code, Cursor, Codex, OpenCode, Hermes Agent, Gemini, Antigravity, and Kiro with Semantic Code Intelligence
 
-**~25% cheaper · ~62% fewer tool calls · 100% local**
+**~16% cheaper · ~58% fewer tool calls · 100% local**
 
 ### [Documentation & Website →](https://colbymchenry.github.io/codegraph/)
 
@@ -83,21 +83,21 @@ When Claude Code explores a codebase, it spawns **Explore agents** that scan fil
 
 ### Benchmark Results
 
-Tested across **7 real-world open-source codebases** spanning 7 languages, comparing an agent (Claude Code, headless) answering one architecture question **with** and **without** CodeGraph. Each cell is the savings at the **median of 4 runs per arm**. _Re-validated on Opus 4.8 (2026-05-29), on the build with per-symbol adaptive `codegraph_explore` sizing._
+Tested across **7 real-world open-source codebases** spanning 7 languages, comparing an agent (Claude Code, headless) answering one architecture question **with** and **without** CodeGraph. Each cell is the savings at the **median of 4 runs per arm**. _Re-validated on Opus 4.8 (2026-06-02), on the current build (`codegraph_explore` as the primary tool)._
 
-> **Average: 25% cheaper · 57% fewer tokens · 23% faster · 62% fewer tool calls**
+> **Average: 16% cheaper · 47% fewer tokens · 22% faster · 58% fewer tool calls**
 
 | Codebase | Language | Cost | Tokens | Time | Tool calls |
 |----------|----------|------|--------|------|------------|
-| **VS Code** | TypeScript · ~10k files | 33% cheaper | 70% fewer | 27% faster | 80% fewer |
-| **Excalidraw** | TypeScript · ~640 | 27% cheaper | 61% fewer | 26% faster | 70% fewer |
-| **Django** | Python · ~3k | 23% cheaper | 70% fewer | 28% faster | 77% fewer |
-| **Tokio** | Rust · ~790 | 35% cheaper | 70% fewer | 37% faster | 79% fewer |
-| **OkHttp** | Java · ~645 | 11% cheaper | 48% fewer | 26% faster | 70% fewer |
-| **Gin** | Go · ~110 | 15% cheaper | 35% fewer | 9% faster | 47% fewer |
-| **Alamofire** | Swift · ~110 | 28% cheaper | 46% fewer | 7% faster | 13% fewer |
+| **VS Code** | TypeScript · ~10k files | 18% cheaper | 64% fewer | 11% faster | 81% fewer |
+| **Excalidraw** | TypeScript · ~640 | even | 25% fewer | 27% faster | 40% fewer |
+| **Django** | Python · ~3k | 8% cheaper | 60% fewer | 13% faster | 77% fewer |
+| **Tokio** | Rust · ~790 | even | 38% fewer | 18% faster | 57% fewer |
+| **OkHttp** | Java · ~645 | 25% cheaper | 54% fewer | 31% faster | 50% fewer |
+| **Gin** | Go · ~110 | 19% cheaper | 23% fewer | 24% faster | 44% fewer |
+| **Alamofire** | Swift · ~110 | 40% cheaper | 64% fewer | 33% faster | 58% fewer |
 
-CodeGraph cuts **cost, tokens, tool calls, and time on every repo** — across small, medium, and large codebases — and answers most of them with **zero file reads**, while the no-CodeGraph agent spends its budget on grep/find/Read discovery. `codegraph_explore` shows the answer in full — the mechanism plus the exact methods you asked about, even when they're buried in a multi-thousand-line file — while collapsing redundant interchangeable implementations to signatures, so the response is sized to the *answer* rather than the file count. The cost margin is narrowest on the smallest repos, where a modern model's native search is already cheap, but it stays solidly positive across the board.
+CodeGraph cuts **tokens, tool calls, and wall-clock time on every repo** — across small, medium, and large codebases — and answers them with **near-zero file reads**, while the no-CodeGraph agent spends its budget on grep/find/Read discovery. `codegraph_explore` shows the answer in full — the mechanism plus the exact methods you asked about, even when they're buried in a multi-thousand-line file — while collapsing redundant interchangeable implementations to signatures, so the response is sized to the *answer* rather than the file count. **Cost stays flat-to-cheaper everywhere** — largest on the small repos (Alamofire, OkHttp), roughly break-even on the most response-heavy ones (Excalidraw, Tokio), where CodeGraph trades the no-CodeGraph agent's many small grep/read round-trips for a few large, cache-heavy tool responses.
 
 <details>
 <summary><strong>Per-repo breakdown — WITH vs WITHOUT (median of 4)</strong></summary>
@@ -105,79 +105,79 @@ CodeGraph cuts **cost, tokens, tool calls, and time on every repo** — across s
 **VS Code** · ~10k files
 | Metric | WITH cg | WITHOUT cg | Δ |
 |---|---|---|---|
-| Time | 1m 37s | 2m 13s | 27% faster |
+| Time | 1m 59s | 2m 13s | 11% faster |
 | File Reads | 0 | 9 | −9 |
 | Grep/Bash | 0 | 11 | −11 |
-| Tool calls | 4 | 21 | 80% fewer |
-| Total tokens | 545k | 1.79M | 70% fewer |
-| Cost | $0.55 | $0.83 | 33% cheaper |
+| Tool calls | 4 | 21 | 81% fewer |
+| Total tokens | 640k | 1.79M | 64% fewer |
+| Cost | $0.68 | $0.83 | 18% cheaper |
 
 **Excalidraw** · ~640 files
 | Metric | WITH cg | WITHOUT cg | Δ |
 |---|---|---|---|
-| Time | 1m 34s | 2m 6s | 26% faster |
+| Time | 1m 32s | 2m 6s | 27% faster |
 | File Reads | 0 | 7 | −7 |
-| Grep/Bash | 0 | 8 | −8 |
-| Tool calls | 5 | 15 | 70% fewer |
-| Total tokens | 651k | 1.69M | 61% fewer |
-| Cost | $0.57 | $0.78 | 27% cheaper |
+| Grep/Bash | 1 | 8 | −7 |
+| Tool calls | 9 | 15 | 40% fewer |
+| Total tokens | 1.27M | 1.69M | 25% fewer |
+| Cost | $0.78 | $0.78 | even |
 
 **Django** · ~3k files
 | Metric | WITH cg | WITHOUT cg | Δ |
 |---|---|---|---|
-| Time | 1m 25s | 1m 58s | 28% faster |
+| Time | 1m 43s | 1m 58s | 13% faster |
 | File Reads | 0 | 9 | −9 |
 | Grep/Bash | 0 | 5 | −5 |
 | Tool calls | 3 | 13 | 77% fewer |
-| Total tokens | 419k | 1.41M | 70% fewer |
-| Cost | $0.48 | $0.62 | 23% cheaper |
+| Total tokens | 559k | 1.41M | 60% fewer |
+| Cost | $0.57 | $0.62 | 8% cheaper |
 
 **Tokio** · ~790 files
 | Metric | WITH cg | WITHOUT cg | Δ |
 |---|---|---|---|
-| Time | 1m 28s | 2m 20s | 37% faster |
+| Time | 1m 55s | 2m 20s | 18% faster |
 | File Reads | 0 | 8 | −8 |
 | Grep/Bash | 0 | 6 | −6 |
-| Tool calls | 3 | 14 | 79% fewer |
-| Total tokens | 522k | 1.73M | 70% fewer |
-| Cost | $0.53 | $0.82 | 35% cheaper |
+| Tool calls | 6 | 14 | 57% fewer |
+| Total tokens | 1.08M | 1.73M | 38% fewer |
+| Cost | $0.82 | $0.82 | even |
 
 **OkHttp** · ~645 files
 | Metric | WITH cg | WITHOUT cg | Δ |
 |---|---|---|---|
-| Time | 1m 6s | 1m 29s | 26% faster |
-| File Reads | 1 | 4 | −3 |
-| Grep/Bash | 0 | 6 | −6 |
-| Tool calls | 3 | 10 | 70% fewer |
-| Total tokens | 572k | 1.10M | 48% fewer |
-| Cost | $0.48 | $0.55 | 11% cheaper |
+| Time | 1m 1s | 1m 29s | 31% faster |
+| File Reads | 0 | 4 | −4 |
+| Grep/Bash | 2 | 6 | −4 |
+| Tool calls | 5 | 10 | 50% fewer |
+| Total tokens | 502k | 1.10M | 54% fewer |
+| Cost | $0.41 | $0.55 | 25% cheaper |
 
 **Gin** · ~110 files
 | Metric | WITH cg | WITHOUT cg | Δ |
 |---|---|---|---|
-| Time | 1m 28s | 1m 37s | 9% faster |
-| File Reads | 0 | 6 | −6 |
-| Grep/Bash | 0 | 2 | −2 |
-| Tool calls | 5 | 9 | 47% fewer |
-| Total tokens | 552k | 847k | 35% fewer |
-| Cost | $0.48 | $0.57 | 15% cheaper |
+| Time | 1m 14s | 1m 37s | 24% faster |
+| File Reads | 1 | 6 | −5 |
+| Grep/Bash | 1 | 2 | −1 |
+| Tool calls | 5 | 9 | 44% fewer |
+| Total tokens | 651k | 847k | 23% fewer |
+| Cost | $0.46 | $0.57 | 19% cheaper |
 
 **Alamofire** · ~110 files
 | Metric | WITH cg | WITHOUT cg | Δ |
 |---|---|---|---|
-| Time | 2m 11s | 2m 21s | 7% faster |
-| File Reads | 3 | 9 | −6 |
-| Grep/Bash | 2 | 4 | −2 |
-| Tool calls | 11 | 12 | 13% fewer |
-| Total tokens | 1.13M | 2.10M | 46% fewer |
-| Cost | $0.69 | $0.95 | 28% cheaper |
+| Time | 1m 35s | 2m 21s | 33% faster |
+| File Reads | 0 | 9 | −9 |
+| Grep/Bash | 0 | 4 | −4 |
+| Tool calls | 5 | 12 | 58% fewer |
+| Total tokens | 766k | 2.10M | 64% fewer |
+| Cost | $0.57 | $0.95 | 40% cheaper |
 
 </details>
 
 <details>
 <summary><strong>Full benchmark details</strong></summary>
 
-**Methodology.** Each arm is `claude -p` (Claude Opus 4.8) run headlessly against the repo with `--strict-mcp-config`: **WITH** = CodeGraph's MCP server enabled, **WITHOUT** = an empty MCP config. Built-in Read/Grep/Bash stay available to both. Same question per repo, **4 runs per arm, median reported**. Cost = the run's `total_cost_usd`; Tokens = total tokens processed (input incl. cached + output); Time = wall-clock; Tool calls = every tool invocation, including those inside any sub-agents the model spawns. Repos cloned at `--depth 1` and indexed by the same CodeGraph build that served them. Re-validated 2026-05-29 on the build with per-symbol adaptive `codegraph_explore` sizing. These numbers are lower than the prior Opus 4.7 validation — not a CodeGraph regression but a stronger native baseline: Opus 4.8 greps/reads efficiently on the main thread instead of fanning out into large Explore-subagent sweeps, so the no-CodeGraph arm is leaner than it used to be. Per-repo numbers move run-to-run with how hard the without-arm thrashes (the median-of-4 smooths it, but tails remain — e.g. Django's without-arm hit $2.71/14m one batch).
+**Methodology.** Each arm is `claude -p` (Claude Opus 4.8) run headlessly against the repo with `--strict-mcp-config`: **WITH** = CodeGraph's MCP server enabled, **WITHOUT** = an empty MCP config. Built-in Read/Grep/Bash stay available to both. Same question per repo, **4 runs per arm, median reported**. Cost = the run's `total_cost_usd`; Tokens = total tokens processed (input incl. cached + output); Time = wall-clock; Tool calls = every tool invocation, including those inside any sub-agents the model spawns. Repos cloned at `--depth 1` and indexed by the same CodeGraph build that served them. Re-validated 2026-06-02 on the current build. These numbers are lower than the prior Opus 4.7 validation — not a CodeGraph regression but a stronger native baseline: Opus 4.8 greps/reads efficiently on the main thread instead of fanning out into large Explore-subagent sweeps, so the no-CodeGraph arm is leaner than it used to be. Per-repo numbers move run-to-run with how hard the without-arm thrashes (the median-of-4 smooths it, but tails remain — e.g. Django's without-arm hit $2.71/14m one batch).
 
 **Queries:**
 | Codebase | Query |