fix(mcp): refine adaptive explore skeletonization — spare named orchestrators, override for family files

The first cut (off-spine + polymorphic-sibling) skeletonized two files the agent
then Read back, defeating the cost win: OkHttp's RealCall (it implements the
9-impl Lockable mixin, so it tripped the sibling signal despite being the
orchestrator) and Django's compiler.py. A real-agent A/B — not the deterministic
probe — exposed it.

Refined gate: a sibling file skeletonizes only if NOT spared, where spared = the
agent NAMED a callable in it (getResponseWithInterceptorChain,
SQLCompiler.execute_sql → keep full) UNLESS the file DEFINES a >=3-impl supertype.
A base+subclasses "family" file (compiler.py) is huge and Read-anyway, so
skeletonizing it frees explore budget for the sibling files the agent would
otherwise Read. buildFlowFromNamedSymbols now also returns namedNodeIds; adds a
definesPolymorphicSupertype helper.
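
The refined gate above can be read as a small predicate. What follows is a minimal TypeScript sketch, not the code in src/mcp/tools.ts: the `FileSignals` shape, its field names, and `shouldSkeletonize` are hypothetical, and the on-spine check is assumed to carry over unchanged from the first cut.

```typescript
// Hypothetical per-file signals; illustrative names, not the real tools.ts types.
interface FileSignals {
  onSpine: boolean;                // any symbol of the file is on the traced flow spine
  isPolymorphicSibling: boolean;   // implements/extends a supertype with >= 3 implementers
  hasNamedCallable: boolean;       // the agent's query named a function/method in this file
  definesFamilySupertype: boolean; // the file itself defines a >= 3-impl supertype
}

// Sketch of the refined gate: skeletonize an off-spine sibling unless it is spared.
function shouldSkeletonize(f: FileSignals, adaptiveEnabled: boolean = true): boolean {
  if (!adaptiveEnabled) return false;                      // escape hatch
  if (f.onSpine || !f.isPolymorphicSibling) return false;  // original two conditions
  // Spared = named callable, UNLESS the file is a base+subclasses "family" file.
  const spared = f.hasNamedCallable && !f.definesFamilySupertype;
  return !spared;
}

// The two files from the commit message, plus a plain sibling, as assumed signal sets.
const realCall: FileSignals = {
  onSpine: false, isPolymorphicSibling: true, hasNamedCallable: true, definesFamilySupertype: false,
};
const compilerPy: FileSignals = {
  onSpine: false, isPolymorphicSibling: true, hasNamedCallable: true, definesFamilySupertype: true,
};
const plainSibling: FileSignals = {
  onSpine: false, isPolymorphicSibling: true, hasNamedCallable: false, definesFamilySupertype: false,
};

console.log(shouldSkeletonize(realCall));     // named orchestrator: kept full
console.log(shouldSkeletonize(compilerPy));   // family file: override wins, skeletonized
console.log(shouldSkeletonize(plainSibling)); // unnamed sibling: skeletonized
```

Because the spare can only turn a "skeletonize" into a "keep full", this predicate never skeletonizes a file the original two-condition gate would have kept, which matches the strict-subset claim below.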

Validated (headless A/B, Opus 4.8): both former README cost outliers flipped —
OkHttp 3% costlier -> cheaper (RealCall full, 0 read-backs), Django 10% costlier
-> cheaper (half the runs read 0). Full 7-repo re-validation: avg 22% cheaper /
47% fewer tokens / 20% faster / 50% fewer tool calls, every repo cost-positive.
Inert repos (excalidraw/tokio/vscode/gin) stay 0 skeletons — the refined gate
skeletonizes a strict subset of the original.
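
For reading the percentages above: the README in this commit reports each cell as the savings at the median of 4 runs per arm. The sketch below shows that arithmetic under two assumptions that the source does not spell out: the saving is taken as (without − with) / without, and the median of an even count is the mean of the two middle values. The function names and the sample run costs are invented for illustration and are not benchmark data.

```typescript
// Median of a non-empty list; for an even count, the mean of the two middle values.
function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of empty list");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Relative saving of the WITH arm against the WITHOUT arm, in percent.
// Positive means the WITH arm is cheaper; negative means it is costlier.
function savingsPercent(withRuns: number[], withoutRuns: number[]): number {
  const w = median(withRuns);
  const wo = median(withoutRuns);
  return ((wo - w) / wo) * 100;
}

// Made-up per-run costs in USD, four runs per arm.
const withCost = [0.5, 0.4, 0.6, 0.5];
const withoutCost = [1.0, 0.9, 1.1, 1.0];
const saved = savingsPercent(withCost, withoutCost);
console.log(`${Math.round(saved)}% cheaper`);
```

Published cells are presumably computed from unrounded medians, so recomputing them from the rounded dollar figures in the per-repo tables can differ by a point.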

Extends the regression test to 7 cases (named-callable spare + supertype-family
override). Updates the design doc, CHANGELOG, and README benchmark numbers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Colby McHenry 3 weeks ago
parent
commit
51f1d93404
5 changed files with 606 additions and 113 deletions
  1. CHANGELOG.md (+1 -1)
  2. README.md (+52 -52)
  3. __tests__/adaptive-explore-sizing.test.ts (+373 -0)
  4. docs/design/adaptive-explore-sizing.md (+128 -49)
  5. src/mcp/tools.ts (+52 -11)

+ 1 - 1
CHANGELOG.md

@@ -13,7 +13,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
 - `codegraph init` now builds the initial index by default — you no longer need the `-i`/`--index` flag (it's still accepted, so existing commands and scripts keep working). (#483)
 - Go: Gin middleware chains now connect end-to-end in `codegraph_trace` and `codegraph_explore` — following a request reaches the middleware and route handlers registered via `.Use()` / `.GET()` instead of dead-ending where the framework dispatches the chain dynamically.
-- `codegraph_explore` is now leaner on interface-heavy flows: when a query spans many interchangeable implementations of one interface (an HTTP interceptor chain, say), it shows one implementation in full and the rest as signatures instead of every full body — fewer tokens for the same answer, so questions like these stop costing more than plain grep/read. Distinct, non-interchangeable code is shown in full as before. Disable with `CODEGRAPH_ADAPTIVE_EXPLORE=0`.
+- `codegraph_explore` is now leaner on interface-heavy flows: when a query spans many interchangeable implementations of one interface (an HTTP interceptor chain, say), it shows the rest as signatures instead of every full body, while keeping the dispatch mechanism and any specific method you asked about in full. Fewer tokens for the same answer, so questions like these stop costing more than plain grep/read — in testing, the two slowest-to-pay-off repos (a Java and a Python framework) went from slightly costlier than native search to clearly cheaper. Distinct, non-interchangeable code is shown in full as before. Disable with `CODEGRAPH_ADAPTIVE_EXPLORE=0`.
 
 ### Fixes
 

+ 52 - 52
README.md

@@ -4,7 +4,7 @@
 
 ### Supercharge Claude Code, Cursor, Codex, OpenCode, Hermes Agent, Gemini, Antigravity, and Kiro with Semantic Code Intelligence
 
-**~18% cheaper · ~57% fewer tool calls · 100% local**
+**~22% cheaper · ~50% fewer tool calls · 100% local**
 
 ### [Documentation & Website →](https://colbymchenry.github.io/codegraph/)
 
@@ -83,21 +83,21 @@ When Claude Code explores a codebase, it spawns **Explore agents** that scan fil
 
 ### Benchmark Results
 
-Tested across **7 real-world open-source codebases** spanning 7 languages, comparing an agent (Claude Code, headless) answering one architecture question **with** and **without** CodeGraph. Each cell is the savings at the **median of 4 runs per arm**. _Re-validated on **v0.9.7** + Opus 4.8 (2026-05-28)._
+Tested across **7 real-world open-source codebases** spanning 7 languages, comparing an agent (Claude Code, headless) answering one architecture question **with** and **without** CodeGraph. Each cell is the savings at the **median of 4 runs per arm**. _Re-validated on Opus 4.8 (2026-05-29), on the build with adaptive `codegraph_explore` sizing._
 
-> **Average: 18% cheaper · 51% fewer tokens · 16% faster · 57% fewer tool calls**
+> **Average: 22% cheaper · 47% fewer tokens · 20% faster · 50% fewer tool calls**
 
 | Codebase | Language | Cost | Tokens | Time | Tool calls |
 |----------|----------|------|--------|------|------------|
-| **VS Code** | TypeScript · ~10k files | 26% cheaper | 63% fewer | 20% faster | 69% fewer |
-| **Excalidraw** | TypeScript · ~640 | 40% cheaper | 71% fewer | 41% faster | 82% fewer |
-| **Django** | Python · ~3k | 10% costlier | 45% fewer | 3% slower | 64% fewer |
-| **Tokio** | Rust · ~790 | 30% cheaper | 69% fewer | 22% faster | 71% fewer |
-| **OkHttp** | Java · ~645 | 3% costlier | 32% fewer | 15% faster | 60% fewer |
-| **Gin** | Go · ~110 | 7% cheaper | 35% fewer | 8% faster | 38% fewer |
-| **Alamofire** | Swift · ~110 | 38% cheaper | 45% fewer | 6% faster | 8% fewer |
+| **VS Code** | TypeScript · ~10k files | 13% cheaper | 63% fewer | 11% faster | 82% fewer |
+| **Excalidraw** | TypeScript · ~640 | 40% cheaper | 71% fewer | 51% faster | 82% fewer |
+| **Django** | Python · ~3k | 9% cheaper | 35% fewer | 7% faster | 38% fewer |
+| **Tokio** | Rust · ~790 | 31% cheaper | 59% fewer | 29% faster | 61% fewer |
+| **OkHttp** | Java · ~645 | 4% cheaper | 16% fewer | 11% faster | 40% fewer |
+| **Gin** | Go · ~110 | 28% cheaper | 40% fewer | 25% faster | 35% fewer |
+| **Alamofire** | Swift · ~110 | 32% cheaper | 43% fewer | 6% faster | 13% fewer |
 
-CodeGraph cuts **tool calls and total tokens on every repo** and answers large repos with **zero file reads**, while the no-CodeGraph agent spends its budget on grep/find/Read discovery. The **cost** margin is narrower — and occasionally negative on smaller repos (Django, OkHttp) — because a modern model's native search is already cheap and CodeGraph's richer responses cost real input tokens; the consistent wins are fewer tool calls and faster answers.
+CodeGraph cuts **tool calls and total tokens on every repo** and answers large repos with **zero file reads**, while the no-CodeGraph agent spends its budget on grep/find/Read discovery. **Every repo is now cheaper, not just faster** — the two former cost outliers (Django and OkHttp, where the answer spans many interchangeable implementations of one interface) flipped from *costlier* than native search to cheaper once adaptive `codegraph_explore` sizing stopped shipping every sibling's full body. The margin is still narrowest on the smallest repos, where a modern model's native search is already cheap, but it stays positive across the board; the largest wins remain fewer tool calls and faster answers.
 
 <details>
 <summary><strong>Per-repo breakdown — WITH vs WITHOUT (median of 4)</strong></summary>
@@ -105,79 +105,79 @@ CodeGraph cuts **tool calls and total tokens on every repo** and answers large r
 **VS Code** · ~10k files
 | Metric | WITH cg | WITHOUT cg | Δ |
 |---|---|---|---|
-| Time | 1m 49s | 2m 16s | 20% faster |
-| File Reads | 0 | 7 | −7 |
+| Time | 1m 58s | 2m 13s | 11% faster |
+| File Reads | 0 | 8 | −8 |
 | Grep/Bash | 0 | 9 | −9 |
-| Tool calls | 5 | 16 | 71% fewer |
-| Total tokens | 672k | 1.81M | 63% fewer |
-| Cost | $0.66 | $0.89 | 26% cheaper |
+| Tool calls | 3 | 17 | 82% fewer |
+| Total tokens | 607k | 1.65M | 63% fewer |
+| Cost | $0.66 | $0.76 | 13% cheaper |
 
 **Excalidraw** · ~640 files
 | Metric | WITH cg | WITHOUT cg | Δ |
 |---|---|---|---|
-| Time | 1m 41s | 2m 51s | 41% faster |
+| Time | 1m 23s | 2m 48s | 51% faster |
 | File Reads | 0 | 11 | −11 |
-| Grep/Bash | 0 | 11 | −11 |
-| Tool calls | 4 | 22 | 82% fewer |
-| Total tokens | 692k | 2.39M | 71% fewer |
-| Cost | $0.63 | $1.04 | 40% cheaper |
+| Grep/Bash | 0 | 9 | −9 |
+| Tool calls | 4 | 20 | 82% fewer |
+| Total tokens | 596k | 2.06M | 71% fewer |
+| Cost | $0.53 | $0.89 | 40% cheaper |
 
 **Django** · ~3k files
 | Metric | WITH cg | WITHOUT cg | Δ |
 |---|---|---|---|
-| Time | 2m 2s | 1m 58s | 4% slower |
-| File Reads | 2 | 10 | −8 |
-| Grep/Bash | 0 | 5 | −5 |
-| Tool calls | 5 | 14 | 64% fewer |
-| Total tokens | 720k | 1.30M | 45% fewer |
-| Cost | $0.70 | $0.64 | 10% costlier |
+| Time | 1m 43s | 1m 51s | 7% faster |
+| File Reads | 5 | 10 | −5 |
+| Grep/Bash | 0 | 4 | −4 |
+| Tool calls | 8 | 13 | 38% fewer |
+| Total tokens | 752k | 1.16M | 35% fewer |
+| Cost | $0.56 | $0.62 | 9% cheaper |
 
 **Tokio** · ~790 files
 | Metric | WITH cg | WITHOUT cg | Δ |
 |---|---|---|---|
-| Time | 1m 54s | 2m 26s | 22% faster |
-| File Reads | 0 | 11 | −11 |
-| Grep/Bash | 0 | 6 | −6 |
-| Tool calls | 5 | 17 | 73% fewer |
-| Total tokens | 657k | 2.10M | 69% fewer |
-| Cost | $0.61 | $0.86 | 30% cheaper |
+| Time | 2m 3s | 2m 53s | 29% faster |
+| File Reads | 3 | 9 | −6 |
+| Grep/Bash | 0 | 7 | −7 |
+| Tool calls | 7 | 17 | 61% fewer |
+| Total tokens | 869k | 2.14M | 59% fewer |
+| Cost | $0.63 | $0.92 | 31% cheaper |
 
 **OkHttp** · ~645 files
 | Metric | WITH cg | WITHOUT cg | Δ |
 |---|---|---|---|
-| Time | 1m 18s | 1m 32s | 15% faster |
-| File Reads | 1 | 5 | −4 |
-| Grep/Bash | 0 | 6 | −6 |
-| Tool calls | 4 | 10 | 58% fewer |
-| Total tokens | 713k | 1.05M | 32% fewer |
-| Cost | $0.59 | $0.57 | 3% costlier |
+| Time | 1m 18s | 1m 27s | 11% faster |
+| File Reads | 2 | 4 | −2 |
+| Grep/Bash | 0 | 4 | −4 |
+| Tool calls | 5 | 8 | 40% fewer |
+| Total tokens | 739k | 883k | 16% fewer |
+| Cost | $0.54 | $0.56 | 4% cheaper |
 
 **Gin** · ~110 files
 | Metric | WITH cg | WITHOUT cg | Δ |
 |---|---|---|---|
-| Time | 1m 12s | 1m 18s | 9% faster |
-| File Reads | 0 | 4 | −4 |
-| Grep/Bash | 0 | 4 | −4 |
-| Tool calls | 5 | 8 | 40% fewer |
-| Total tokens | 533k | 815k | 35% fewer |
-| Cost | $0.44 | $0.47 | 7% cheaper |
+| Time | 1m 8s | 1m 30s | 25% faster |
+| File Reads | 0 | 3 | −3 |
+| Grep/Bash | 0 | 5 | −5 |
+| Tool calls | 6 | 9 | 35% fewer |
+| Total tokens | 532k | 887k | 40% fewer |
+| Cost | $0.36 | $0.50 | 28% cheaper |
 
 **Alamofire** · ~110 files
 | Metric | WITH cg | WITHOUT cg | Δ |
 |---|---|---|---|
-| Time | 2m 0s | 2m 7s | 6% faster |
-| File Reads | 6 | 8 | −2 |
-| Grep/Bash | 2 | 4 | −2 |
-| Tool calls | 11 | 12 | 9% fewer |
-| Total tokens | 1.09M | 1.98M | 45% fewer |
-| Cost | $0.63 | $1.01 | 38% cheaper |
+| Time | 2m 19s | 2m 28s | 6% faster |
+| File Reads | 5 | 9 | −4 |
+| Grep/Bash | 1 | 4 | −3 |
+| Tool calls | 11 | 12 | 13% fewer |
+| Total tokens | 1.22M | 2.14M | 43% fewer |
+| Cost | $0.71 | $1.04 | 32% cheaper |
 
 </details>
 
 <details>
 <summary><strong>Full benchmark details</strong></summary>
 
-**Methodology.** Each arm is `claude -p` (Claude Opus 4.8) run headlessly against the repo with `--strict-mcp-config`: **WITH** = CodeGraph's MCP server enabled, **WITHOUT** = an empty MCP config. Built-in Read/Grep/Bash stay available to both. Same question per repo, **4 runs per arm, median reported**. Cost = the run's `total_cost_usd`; Tokens = total tokens processed (input incl. cached + output); Time = wall-clock; Tool calls = every tool invocation, including those inside any sub-agents the model spawns. Repos cloned at `--depth 1` and indexed by the same CodeGraph build that served them. Re-validated on codegraph **v0.9.7** (2026-05-28). These numbers are lower than the prior Opus 4.7 validation — not a CodeGraph regression but a stronger native baseline: Opus 4.8 greps/reads efficiently on the main thread instead of fanning out into large Explore-subagent sweeps, so the no-CodeGraph arm is leaner than it used to be. Per-repo numbers move run-to-run with how hard the without-arm thrashes (the median-of-4 smooths it, but tails remain — e.g. Django's without-arm hit $2.71/14m one batch).
+**Methodology.** Each arm is `claude -p` (Claude Opus 4.8) run headlessly against the repo with `--strict-mcp-config`: **WITH** = CodeGraph's MCP server enabled, **WITHOUT** = an empty MCP config. Built-in Read/Grep/Bash stay available to both. Same question per repo, **4 runs per arm, median reported**. Cost = the run's `total_cost_usd`; Tokens = total tokens processed (input incl. cached + output); Time = wall-clock; Tool calls = every tool invocation, including those inside any sub-agents the model spawns. Repos cloned at `--depth 1` and indexed by the same CodeGraph build that served them. Re-validated 2026-05-29 on the build with adaptive `codegraph_explore` sizing. These numbers are lower than the prior Opus 4.7 validation — not a CodeGraph regression but a stronger native baseline: Opus 4.8 greps/reads efficiently on the main thread instead of fanning out into large Explore-subagent sweeps, so the no-CodeGraph arm is leaner than it used to be. Per-repo numbers move run-to-run with how hard the without-arm thrashes (the median-of-4 smooths it, but tails remain — e.g. Django's without-arm hit $2.71/14m one batch).
 
 **Queries:**
 | Codebase | Query |

+ 373 - 0
__tests__/adaptive-explore-sizing.test.ts

@@ -0,0 +1,373 @@
+/**
+ * Regression test for adaptive `codegraph_explore` sizing — sibling
+ * skeletonization (branch `feat/adaptive-explore-sizing`, commit d6d059f).
+ *
+ * Feature: when a file is BOTH (1) off the synthesized flow spine AND (2) a
+ * polymorphic sibling — its class implements/extends a supertype shared by
+ * >= MIN_SIBLINGS (3) implementers — `codegraph_explore` renders it as a
+ * class + member *signature* skeleton (bodies elided) instead of full source,
+ * keeping the on-spine exemplar and the mechanism full. This sizes the
+ * response to the answer rather than the budget cap on sibling-heavy flows
+ * (OkHttp's interceptor chain) without starving diffuse ones (distinct
+ * pipeline steps stay full). Default ON; CODEGRAPH_ADAPTIVE_EXPLORE=0 disables.
+ *
+ * The fixture is OkHttp's interceptor chain in miniature:
+ *   - `Interceptor` interface with FOUR implementers (>= 3 => a sibling family)
+ *   - a 3-hop call spine `dispatch -> proceed -> handleLogging` that passes
+ *     THROUGH LoggingInterceptor — so that file is the on-spine exemplar
+ *   - Bridge/Cache/RetryInterceptor: off-spine members of the sibling family
+ *     => skeletonize
+ *   - ResponseFormatter implements `Formatter`, which has only ONE impl (< 3)
+ *     => a distinct step: off-spine but NOT a sibling => stays full
+ *
+ * Guards the two ways the feature can silently regress: skeletonizing too much
+ * (a distinct step or the on-spine exemplar) or too little (the off-spine
+ * siblings), plus the escape hatch.
+ */
+import { describe, it, expect, beforeAll, afterAll, beforeEach } from 'vitest';
+import * as fs from 'fs';
+import * as path from 'path';
+import * as os from 'os';
+import { ToolHandler } from '../src/mcp/tools';
+import CodeGraph from '../src/index';
+
+const SKELETON_MARK = '· skeleton (signatures only; Read for a full body)';
+
+/** Return the `#### <path> ...` section for a file basename, header through the
+ *  line before the next `###`/`####` header (or end of output). */
+function sectionFor(text: string, basename: string): string {
+  const lines = text.split('\n');
+  const start = lines.findIndex((l) => l.startsWith('#### ') && l.includes(basename));
+  if (start < 0) return '';
+  let end = lines.length;
+  for (let i = start + 1; i < lines.length; i++) {
+    if (lines[i].startsWith('### ') || lines[i].startsWith('#### ')) {
+      end = i;
+      break;
+    }
+  }
+  return lines.slice(start, end).join('\n');
+}
+
+describe('adaptive codegraph_explore sizing — sibling skeletonization', () => {
+  let testDir: string;
+  let cg: CodeGraph;
+  let handler: ToolHandler;
+
+  // Names the spine (dispatch/proceed/handleLogging), the on-spine exemplar,
+  // the three off-spine siblings, and the distinct step — so every file we
+  // assert on is gathered as relevant. maxFiles overrides the very-tiny tier's
+  // 4-file default so all of them land in one call.
+  const QUERY =
+    'dispatch proceed handleLogging LoggingInterceptor BridgeInterceptor CacheInterceptor RetryInterceptor ResponseFormatter';
+
+  beforeAll(async () => {
+    testDir = fs.mkdtempSync(path.join(os.tmpdir(), 'codegraph-adaptive-explore-'));
+    const srcDir = path.join(testDir, 'src');
+    fs.mkdirSync(srcDir);
+
+    const write = (name: string, body: string) =>
+      fs.writeFileSync(path.join(srcDir, name), body.trimStart());
+
+    // The interchangeable contract — 4 implementers below => sibling family.
+    write(
+      'interceptor.ts',
+      `
+export interface Interceptor {
+  intercept(request: string): string;
+}
+`
+    );
+
+    // The mechanism + the spine: dispatch -> proceed -> (LoggingInterceptor) handleLogging.
+    // Unique method names so the call edges resolve unambiguously.
+    write(
+      'dispatcher.ts',
+      `
+import { LoggingInterceptor } from './logging-interceptor';
+
+export class RequestDispatcher {
+  dispatch(): string {
+    const chain = new InterceptorChain();
+    return chain.proceed();
+  }
+}
+
+export class InterceptorChain {
+  proceed(): string {
+    const exemplar = new LoggingInterceptor();
+    return exemplar.handleLogging();
+  }
+}
+`
+    );
+
+    // On-spine exemplar: handleLogging is the spine's tail, so this whole file
+    // is on-spine and must stay FULL even though it's a sibling (implements Interceptor).
+    write(
+      'logging-interceptor.ts',
+      `
+import { Interceptor } from './interceptor';
+
+export class LoggingInterceptor implements Interceptor {
+  handleLogging(): string {
+    const tag = 'LOGGING_BODY_MARKER';
+    return this.intercept(tag);
+  }
+  intercept(request: string): string {
+    return 'logged:' + request;
+  }
+}
+`
+    );
+
+    // Off-spine siblings — interchangeable impls of Interceptor => SKELETONIZE.
+    // Each body carries a unique marker that must NOT survive skeletonization.
+    write(
+      'bridge-interceptor.ts',
+      `
+import { Interceptor } from './interceptor';
+
+export class BridgeInterceptor implements Interceptor {
+  intercept(request: string): string {
+    const detail = 'BRIDGE_BODY_MARKER';
+    return 'bridged:' + request + detail;
+  }
+}
+`
+    );
+    write(
+      'cache-interceptor.ts',
+      `
+import { Interceptor } from './interceptor';
+
+export class CacheInterceptor implements Interceptor {
+  intercept(request: string): string {
+    const detail = 'CACHE_BODY_MARKER';
+    return 'cached:' + request + detail;
+  }
+}
+`
+    );
+    write(
+      'retry-interceptor.ts',
+      `
+import { Interceptor } from './interceptor';
+
+export class RetryInterceptor implements Interceptor {
+  intercept(request: string): string {
+    const detail = 'RETRY_BODY_MARKER';
+    return 'retried:' + request + detail;
+  }
+}
+`
+    );
+
+    // A 1:1 interface->impl pair: off-spine, implements something, but the
+    // supertype has only ONE impl (< MIN_SIBLINGS) => a DISTINCT step => FULL.
+    write(
+      'formatter.ts',
+      `
+export interface Formatter {
+  format(input: string): string;
+}
+`
+    );
+    write(
+      'response-formatter.ts',
+      `
+import { Formatter } from './formatter';
+import { JsonCodec } from './codec';
+
+export class ResponseFormatter implements Formatter {
+  format(input: string): string {
+    const detail = 'FORMATTER_BODY_MARKER';
+    // Calls into the Codec family from OFF the dispatch spine, so codec.ts is
+    // gathered as relevant but stays off-spine (mirrors Django: compiler.py is
+    // referenced by the flow yet off the QuerySet-iteration spine).
+    return new JsonCodec().encode(input) + detail;
+  }
+}
+`
+    );
+
+    // An off-spine sibling (implements Interceptor) the agent would otherwise
+    // skeletonize — BUT it owns a uniquely-named method `authenticate` the agent
+    // names in the query. Mirrors OkHttp's RealCall (named getResponseWith-
+    // InterceptorChain): a named callable means "show me this", so it stays full.
+    write(
+      'auth-interceptor.ts',
+      `
+import { Interceptor } from './interceptor';
+
+export class AuthInterceptor implements Interceptor {
+  authenticate(token: string): string {
+    const detail = 'AUTH_BODY_MARKER';
+    return 'auth:' + token + detail;
+  }
+  intercept(request: string): string {
+    return this.authenticate(request);
+  }
+}
+`
+    );
+
+    // A base class that DEFINES a >=3-impl supertype AND co-locates its
+    // subclasses in the same file — mirrors Django's compiler.py (SQLCompiler +
+    // SQLInsertCompiler/SQLUpdateCompiler/...). The subclasses' `extends` edges
+    // make the file look like a sibling, but it's the family's base/mechanism,
+    // so it must stay full.
+    write(
+      'codec.ts',
+      `
+export class Codec {
+  encode(input: string): string {
+    const detail = 'CODEC_BASE_MARKER';
+    return input + detail;
+  }
+}
+export class JsonCodec extends Codec {
+  encode(input: string): string { return '{' + input + '}'; }
+}
+export class XmlCodec extends Codec {
+  encode(input: string): string { return '<' + input + '>'; }
+}
+export class YamlCodec extends Codec {
+  encode(input: string): string { return '- ' + input; }
+}
+`
+    );
+
+    cg = CodeGraph.initSync(testDir, { config: { include: ['**/*.ts'], exclude: [] } });
+    await cg.indexAll();
+    handler = new ToolHandler(cg);
+  });
+
+  afterAll(() => {
+    if (cg) cg.destroy();
+    if (testDir && fs.existsSync(testDir)) {
+      fs.rmSync(testDir, { recursive: true, force: true });
+    }
+  });
+
+  beforeEach(() => {
+    // Each test asserts against the default (ON) behaviour unless it opts out.
+    delete process.env.CODEGRAPH_ADAPTIVE_EXPLORE;
+  });
+
+  it('fixture sanity: Interceptor has >=3 implementers, Formatter has <3', () => {
+    const find = (name: string, kind: string) =>
+      cg.searchNodes(name).map((r) => r.node).find((n) => n.name === name && n.kind === kind);
+
+    const interceptor = find('Interceptor', 'interface');
+    const formatter = find('Formatter', 'interface');
+    expect(interceptor).toBeTruthy();
+    expect(formatter).toBeTruthy();
+
+    const implementers = (id: string) =>
+      cg.getIncomingEdges(id).filter((e) => e.kind === 'implements' || e.kind === 'extends').length;
+
+    // The whole gate hinges on this signal — assert the fixture actually
+    // produces the >=3 / <3 split, so a TS-extraction change fails here loudly
+    // rather than silently flipping the skeletonization downstream.
+    expect(implementers(interceptor!.id)).toBeGreaterThanOrEqual(3);
+    expect(implementers(formatter!.id)).toBeLessThan(3);
+  });
+
+  it('skeletonizes off-spine polymorphic siblings (bodies elided, signatures kept)', async () => {
+    const result = await handler.execute('codegraph_explore', { query: QUERY, maxFiles: 12 });
+    const text = result.content?.[0]?.text ?? '';
+
+    // Precondition: the spine must have formed, or nothing skeletonizes.
+    expect(text).toContain('## Flow (call path among the symbols you queried)');
+
+    for (const [file, marker] of [
+      ['bridge-interceptor.ts', 'BRIDGE_BODY_MARKER'],
+      ['cache-interceptor.ts', 'CACHE_BODY_MARKER'],
+      ['retry-interceptor.ts', 'RETRY_BODY_MARKER'],
+    ] as const) {
+      const section = sectionFor(text, file);
+      expect(section, `${file} should be present in the explore output`).not.toBe('');
+      expect(section, `${file} should be skeletonized`).toContain(SKELETON_MARK);
+      // The signature line survives; the body (with its marker) is elided.
+      expect(section).toContain('intercept(request');
+      expect(section, `${file} body marker must NOT survive skeletonization`).not.toContain(marker);
+    }
+  });
+
+  it('keeps the on-spine exemplar full even though it is a sibling', async () => {
+    const result = await handler.execute('codegraph_explore', { query: QUERY, maxFiles: 12 });
+    const text = result.content?.[0]?.text ?? '';
+
+    const section = sectionFor(text, 'logging-interceptor.ts');
+    expect(section, 'logging-interceptor.ts should be present').not.toBe('');
+    expect(section, 'on-spine exemplar must NOT be skeletonized').not.toContain(SKELETON_MARK);
+    // Full source => the body marker is present.
+    expect(section).toContain('LOGGING_BODY_MARKER');
+  });
+
+  it('keeps a distinct step full (off-spine but supertype has < 3 implementers)', async () => {
+    const result = await handler.execute('codegraph_explore', { query: QUERY, maxFiles: 12 });
+    const text = result.content?.[0]?.text ?? '';
+
+    const section = sectionFor(text, 'response-formatter.ts');
+    expect(section, 'response-formatter.ts should be present').not.toBe('');
+    expect(section, 'a 1:1 interface impl is not a sibling and must stay full').not.toContain(SKELETON_MARK);
+    expect(section).toContain('FORMATTER_BODY_MARKER');
+  });
+
+  it('CODEGRAPH_ADAPTIVE_EXPLORE=0 disables skeletonization (siblings render full)', async () => {
+    process.env.CODEGRAPH_ADAPTIVE_EXPLORE = '0';
+    try {
+      const result = await handler.execute('codegraph_explore', { query: QUERY, maxFiles: 12 });
+      const text = result.content?.[0]?.text ?? '';
+
+      expect(text, 'no file should be skeletonized with the flag off').not.toContain(SKELETON_MARK);
+      // The previously-skeletonized siblings now render their full bodies.
+      const section = sectionFor(text, 'bridge-interceptor.ts');
+      expect(section).not.toBe('');
+      expect(section).toContain('BRIDGE_BODY_MARKER');
+    } finally {
+      delete process.env.CODEGRAPH_ADAPTIVE_EXPLORE;
+    }
+  });
+
+  // Names AuthInterceptor's `authenticate` and Codec's `encode` (both methods),
+  // plus the spine tokens so a spine still forms. Same Interceptor family as the
+  // skeleton test, plus the Codec base+subclasses family.
+  const SPARE_QUERY = `${QUERY} authenticate encode AuthInterceptor Codec JsonCodec`;
+
+  it('spares an off-spine sibling when the agent NAMED a callable in it (RealCall fix)', async () => {
+    const result = await handler.execute('codegraph_explore', { query: SPARE_QUERY, maxFiles: 15 });
+    const text = result.content?.[0]?.text ?? '';
+    expect(text).toContain('## Flow (call path among the symbols you queried)');
+
+    // auth-interceptor.ts is an off-spine Interceptor sibling — would skeletonize —
+    // but the agent named its method `authenticate`, so it stays FULL.
+    const auth = sectionFor(text, 'auth-interceptor.ts');
+    expect(auth, 'auth-interceptor.ts should be present').not.toBe('');
+    expect(auth, 'a file holding an agent-named callable must NOT be skeletonized').not.toContain(SKELETON_MARK);
+    expect(auth).toContain('AUTH_BODY_MARKER');
+
+    // Contrast: bridge-interceptor.ts — same family, named only by TYPE — still skeletonizes.
+    const bridge = sectionFor(text, 'bridge-interceptor.ts');
+    expect(bridge, 'a sibling named only by type still skeletonizes').toContain(SKELETON_MARK);
+    expect(bridge).not.toContain('BRIDGE_BODY_MARKER');
+  });
+
+  it('skeletonizes a base+subclasses family file even when named (compiler.py: family override beats the named spare)', async () => {
+    const result = await handler.execute('codegraph_explore', { query: SPARE_QUERY, maxFiles: 15 });
+    const text = result.content?.[0]?.text ?? '';
+
+    // codec.ts defines the base Codec (>=3 subclasses extend it) and co-locates the
+    // subclasses — a redundant, Read-anyway "family" file (Django's compiler.py). Even
+    // though the agent named `encode`, it STILL skeletonizes: a full one would eat the
+    // explore budget and starve the sibling files. Contrast auth-interceptor.ts above,
+    // which is named AND not a family file → spared. This is the override that keeps
+    // Django from regressing (sparing the family file cost more and Read more).
+    const codec = sectionFor(text, 'codec.ts');
+    expect(codec, 'codec.ts should be present').not.toBe('');
+    expect(codec, 'a named base+subclasses family file still skeletonizes (budget)').toContain(SKELETON_MARK);
+    expect(codec, 'the elided base body marker must NOT survive').not.toContain('CODEC_BASE_MARKER');
+  });
+});

+ 128 - 49
docs/design/adaptive-explore-sizing.md

@@ -1,14 +1,34 @@
 # Design + status: adaptive `codegraph_explore` sizing (sibling skeletonization)
 
 **Status:** Implemented & validated, **default-on**, on branch
-`feat/adaptive-explore-sizing` (commit `d6d059f`, 2026-05-29). Escape hatch:
-`CODEGRAPH_ADAPTIVE_EXPLORE=0`.
+`feat/adaptive-explore-sizing` (initial commit `d6d059f`; **refined 2026-05-29**
+after a real-agent A/B exposed a read-back regression — see
+"Refinement" below). Escape hatch: `CODEGRAPH_ADAPTIVE_EXPLORE=0`.
 **Motivation:** make `codegraph_explore` size its output to the *answer* rather
 than always filling the budget cap — so a "sibling-heavy" flow (many
 interchangeable implementations of one interface) stops costing *more* than
 plain grep/read, without starving "diffuse" flows that genuinely need broad
 source.
 
+> **Refinement (2026-05-29) — the read-back regression.** The first cut gated
+> only on *off-spine + polymorphic-sibling*. A real-agent A/B (not the
+> deterministic probe) showed that this skeletonized two files the agent then
+> **Read back**, defeating the point: OkHttp's `RealCall` (it implements the
+> 9-impl `Lockable` *mixin*, so it tripped the sibling signal even though it's
+> the orchestrator) and Django's `compiler.py` (it *defines* `SQLCompiler` and
+> co-locates its subclasses). Two conditions fixed it — a file skeletonizes only
+> if it is **not spared**, where **spared = the agent NAMED a callable in it**
+> (`getResponseWithInterceptorChain`, `SQLCompiler.execute_sql` → keep it full)
+> **UNLESS the file DEFINES a ≥3-impl supertype** (a base+subclasses "family"
+> file is huge and Read-anyway, so skeletonizing it *frees explore budget* for
+> the sibling files the agent would otherwise Read). Result: OkHttp **3%
+> costlier → ~10% cheaper** (RealCall full, 0 read-backs); Django **10% costlier
+> → ~10% cheaper** (compiler.py skeleton frees ~6.5 KB of the 28 KB budget; half
+> the runs answer with 0 reads). The supertype signal was initially used as a
+> *spare* — that was backwards and regressed Django to 9% costlier by starving
+> its budget; it is now an *override* of the named-callable spare. The
+> single-condition history below is kept for context.
+
 ---
 
 ## TL;DR
@@ -27,12 +47,21 @@ polymorphic sibling**, render it as a **skeleton** (class + member *signatures*,
 bodies elided) instead of full source — keeping the on-spine exemplar and the
 mechanism in full.
 
-- **OkHttp:** explore `28.5k → 16.6k` chars; headless A/B median **$0.413 ON vs
-  $0.462 shipped vs ~$0.57 without-CodeGraph** → flips OkHttp from −3% costlier
-  to **~28% cheaper than native**, with **reads NOT raised** (median 1 vs 3).
-- **Excalidraw / Tokio / Django / VS Code / Gin:** explore output is
-  **byte-identical** with the flag on/off (0 skeletons) → **provably zero
-  regression**. Their flows have no off-spine ≥3-implementer sibling group.
+- **OkHttp:** the interceptor-chain flow skeletonizes the 5 redundant
+  `: Interceptor` impls while keeping `RealInterceptorChain` (the dispatch
+  mechanism) and `RealCall` (the orchestrator the agent named) full → **~10%
+  cheaper than native, 0 RealCall read-backs** (see Refinement for the corrected
+  numbers; the original `28.5k → 16.6k` / "reads 1 vs 3" figures came from a
+  deterministic probe query, not the agent's real query).
+- **Django:** the QuerySet→SQL flow skeletonizes `compiler.py` (a
+  base+subclasses family file), freeing budget → **~10% cheaper**. (The earlier
+  claim that Django was "byte-identical / 0 skeletons" was an artifact of the
+  *probe* query; the agent's real query DOES surface the SQLCompiler family.)
+- **Excalidraw / Tokio / VS Code / Gin:** explore output is **byte-identical**
+  with the flag on/off (0 skeletons) — their flows have no off-spine
+  ≥3-implementer sibling group. The corrected gate only *adds* a spare
+  condition, so it skeletonizes a **strict subset** of the original gate → these
+  repos provably stay at 0 skeletons (verified by probe).
 
 ---
 
@@ -60,26 +89,45 @@ reconstruct them from signatures (more reasoning, net costlier; see "Dead ends")
 So the whole game is: **tell "interchangeable sibling" apart from "distinct
 step," cheaply.**
 
-## The two-condition gate
-
-A file is skeletonized iff **both** hold (and `CODEGRAPH_ADAPTIVE_EXPLORE != 0`):
-
-1. **Off the flow spine.** `buildFlowFromNamedSymbols` now returns its path node
-   set (`pathNodeIds`) in addition to the rendered Flow text. A file with any
-   symbol on that traced chain is "on-spine" and always kept full — that's the
-   mechanism + the exemplar the agent is actually tracing through. (Gated on a
-   spine existing at all; if there's no spine, nothing skeletonizes.)
-
-2. **A polymorphic sibling.** The file's class `implements`/`extends` a supertype
-   that has **≥ 3 implementers** (`MIN_SIBLINGS`). This is the signal that the
-   class is one of many *interchangeable* implementations rather than a unique
-   step. Computed from real `implements`/`extends` edges (see "Why this signal"),
-   cached per-supertype so it stays a handful of edge lookups.
-
-`RealInterceptorChain` *also* implements `Interceptor`, but its `proceed` is
-**on the spine** → kept full (condition 1 fails). `RealCall` is off-spine but
-implements nothing with ≥3 impls → kept full (condition 2 fails). The other
-interceptors are off-spine **and** ≥3-impl siblings → skeletonized. Exactly right.
+## The gate (refined)
+
+A file is skeletonized iff **all** hold (and `CODEGRAPH_ADAPTIVE_EXPLORE != 0`):
+
+1. **A spine exists.** `buildFlowFromNamedSymbols` returns its path node set
+   (`pathNodeIds`) and the full set of agent-named callables (`namedNodeIds`). If
+   no spine forms, nothing skeletonizes.
+
+2. **Off the flow spine.** No symbol in the file is on the traced chain — that
+   chain is the mechanism the agent is walking, always kept full.
+
+3. **A polymorphic sibling.** The file's class `implements`/`extends` a supertype
+   with **≥ 3 implementers** (`MIN_SIBLINGS`) — the signal that it's one of many
+   *interchangeable* impls. From real `implements`/`extends` edges, cached.
+
+4. **Not spared.** A file is **spared** (kept full) iff the agent **named a
+   callable in it** — a named method/function is something the agent asked to
+   *see* (`getResponseWithInterceptorChain`, `SQLCompiler.execute_sql`), not an
+   interchangeable leaf — **UNLESS the file itself DEFINES a ≥3-impl supertype**.
+   That last clause is the override: a base+subclasses "family" file (Django's
+   `compiler.py`) is huge and Read-anyway, so a full copy just eats explore
+   budget; skeletonizing it *frees* that budget for the sibling files the agent
+   would otherwise Read. So: *named ⇒ spare, unless it's a family file ⇒
+   skeletonize anyway.*
+
+Worked through the two repos:
+
+- **`RealInterceptorChain`** — `proceed` is on the spine → kept full (cond. 2).
+- **`RealCall`** — off-spine, and it trips the sibling signal via the **9-impl
+  `Lockable` mixin** (not because it's an interchangeable interceptor). But the
+  agent named `getResponseWithInterceptorChain`/`execute`/`enqueue` in it, and it
+  defines no ≥3-impl supertype → **spared, kept full** (cond. 4). This is the fix
+  for the read-back: before cond. 4 it skeletonized and the agent Read it back.
+- **`BridgeInterceptor` & the other 4** — off-spine, ≥3-impl siblings, named only
+  by *type*, define no supertype → **skeletonized**. The win.
+- **Django `compiler.py`** — off-spine, a sibling (its subclasses extend
+  `SQLCompiler`), the agent named `execute_sql` in it — *but it defines the
+  `SQLCompiler` supertype*, so the override fires → **skeletonized** (frees
+  budget). Sparing it instead (the wrong first attempt) cost MORE and Read MORE.
 
 ## Why "shared supertype with ≥3 implementers" is the signal
 
@@ -121,24 +169,28 @@ that actually *names* the symbol, so the skeleton shows the real signature:
 The header still lists the file's symbols and says `Read for a full body`, so the
 agent can pull one specific implementation if it truly needs it.
 
-## Validation
+## Validation (refined gate)
 
-Headless `claude -p`, Opus 4.8, median of 3, WITH-CodeGraph adaptive **on vs off**
-(isolates the flag). Probe sizes from `scripts/agent-eval/probe-explore.mjs`.
+Headless `claude -p`, Opus 4.8, **WITH vs WITHOUT** CodeGraph (the real benchmark
+arm, not the on/off probe the first cut used). Cost = median `total_cost_usd`.
 
-| Repo | explore OFF→ON | skeletons | A/B cost (ON vs shipped) | reads |
+| Repo | WITH→WITHOUT cost | WITH reads | WITHOUT reads | RealCall/compiler read-back |
 |---|---|---|---|---|
-| **OkHttp** | 28.5k → **16.6k** | 6 | **$0.413 vs $0.462** (~28% < native's $0.57) | flat (1 vs 3) |
-| Excalidraw | 28.6k → 28.6k | 0 | byte-identical → neutral | — |
-| Tokio | identical | 0 | neutral | — |
-| Django | identical | 0 | neutral | — |
-| VS Code | identical | 0 | neutral | — |
-| Gin | identical | 0 | neutral | — |
-
-The decisive check (the open risk of skeletonization) **passed**: skeletonizing
-the off-spine interceptors did **not** push the agent to Read them back — reads
-stayed flat (lower, if anything). And the 5 non-sibling repos are byte-identical
-with the flag toggled, so default-on carries no regression for them.
+| **OkHttp** (n=4) | **$0.45 → $0.50** (~10% cheaper) | 2 | 3.5 | **0 / —** (RealCall full) |
+| **Django** (n=6) | **$0.56 → $0.63** (~10% cheaper) | 2 | 8.5 | half the runs read 0 |
+
+Both were the README's **cost outliers** (OkHttp 3% costlier, Django 10%
+costlier) and both flipped to clear wins. OkHttp WITH was cheaper in all 4 runs;
+Django in 5 of 6 (n=6 to see through its high variance). WITHOUT baselines are
+close to the README's ($0.50/$0.63 here vs $0.57/$0.64 there), so the gain comes
+from the WITH arm improving, not from a weaker baseline.
+
+The **decisive check now passes for the right reason**: with the named-callable
+spare, OkHttp's `RealCall` stays full and is **never** Read back (it was Read
+back in 3/4 runs before the fix). The inert repos (Excalidraw / Tokio / VS Code /
+Gin) stay at **0 skeletons** — verified by probe — because the refined gate
+skeletonizes a strict subset of the original. (The first cut's "on vs off, reads
+flat 1 vs 3" claim came from a deterministic probe query and did **not** hold for
+the agent's real query — that mismatch is what this refinement corrects.)
 
 ## Dead ends (don't re-attempt these)
 
@@ -156,20 +208,47 @@ with the flag toggled, so default-on carries no regression for them.
 4. **A plain "core-floor" gate** (keep first N full, skeletonize the rest) —
    skeletonized Excalidraw's *distinct* steps → **+17% cost regression**. The
    sibling condition is what makes it safe.
+5. **Sparing a file because it DEFINES the supertype** (the first refinement
+   attempt). Backwards: a base+subclasses *family* file (Django's `compiler.py`,
+   2,266 lines) is huge and Read-anyway, so keeping it full just **eats the 28 KB
+   explore budget and starves the sibling files** the agent then Reads — it
+   regressed Django to **9% costlier** ($0.71). Defining a supertype is instead
+   an **override** that lets a named family file skeletonize anyway.
+6. **Validating skeletonization with the deterministic probe query only.** The
+   probe (`probe-explore.mjs "<symbol bag>"`) and the *agent's* real explore
+   query name symbols differently, so they form different spines and skeletonize
+   different files. The probe said "Django: 0 skeletons / reads flat"; the real
+   agent query skeletonized `compiler.py` and Read it back. **Always confirm with
+   a real-agent A/B (`run-all.sh`), not just the probe.**
 
 ## Code
 
 - `src/mcp/tools.ts`
   - `adaptiveExploreEnabled()` — the flag (default on).
-  - `buildFlowFromNamedSymbols()` — now returns `{ text, pathNodeIds }`.
-  - `handleExplore()` — `isPolymorphicSibling()` helper (supertype ≥3-impl
-    detection, cached) + the skeleton branch in the source-section loop.
+  - `buildFlowFromNamedSymbols()` — returns `{ text, pathNodeIds, namedNodeIds }`.
+    `namedNodeIds` is every callable the agent named (a superset of the spine) —
+    the named-callable spare reads it.
+  - `handleExplore()` — two cached helpers: `isPolymorphicSibling()` (a node has
+    an outgoing `implements`/`extends` to a ≥3-impl supertype) and
+    `definesPolymorphicSupertype()` (a node HAS ≥3 incoming `implements`/`extends`
+    — i.e. the file is the family base). The skeleton branch:
+    `off-spine && isPolymorphicSibling && !(namedInFile && !definesSupertype)`.
+- `__tests__/adaptive-explore-sizing.test.ts` — 7 cases incl. the named-callable
+  spare (RealCall) and the supertype-family override (compiler.py).
 
 ## Frontier / future work
 
-- **No regression test yet** for the skeletonization (a fixture with ≥3 interface
-  impls + a flow spine asserting off-spine siblings skeletonize, distinct steps
-  stay full, `=0` disables). Recommended before/with merge.
+- **Per-symbol skeletonization within a family file.** `compiler.py` is
+  skeletonized whole, so `SQLCompiler.execute_sql` (the base mechanism) becomes a
+  signature too and *is* Read back in ~half the Django runs. The ideal is to keep
+  the base class's methods full and elide only the redundant subclass bodies —
+  shrinking the payload without eliding the answer. Whole-file skeletonization
+  can't express that yet.
+- **Big non-sibling files dominate Django's residual reads.** `query.py` (3,040
+  lines) and `sql/query.py` are not polymorphic families, so skeletonization
+  can't touch them; the agent Reads them when the 28 KB clustered view is
+  insufficient. That's the explore-budget / big-file-clustering frontier, not
+  skeletonization.
 - **Non-interface sibling families** (Go `HandlerFunc` slices, function-pointer
   registries) aren't caught — they have no `implements`/`extends` edge. Gin's
   middleware chain, for instance, doesn't trip the gate (its handlers are funcs,

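Reviewer sketch (not part of the commit): the four-condition gate described in the design doc above can be modelled as a pure predicate. This is a minimal sketch under stated assumptions. `FileFacts`, `shouldSkeletonize`, and the sample objects are hypothetical names introduced for illustration; in the real handler each fact comes from graph lookups (`pathNodeIds`, `namedNodeIds`, `isPolymorphicSibling`, `definesPolymorphicSupertype`) rather than from precomputed booleans.

```typescript
// Hypothetical, simplified view of one file in the explore output.
// Field names are illustrative only; they are not the handler's real types.
interface FileFacts {
  onSpine: boolean;            // a symbol in the file is on the traced flow spine
  polymorphicSibling: boolean; // implements/extends a supertype with >= 3 implementers
  namedInFile: boolean;        // the agent named a callable defined in the file
  definesSupertype: boolean;   // the file defines a supertype with >= 3 implementers
}

// Skeletonize iff: a spine exists, the file is off-spine, it is a polymorphic
// sibling, and it is not spared. A named callable spares the file unless the
// file is a "family" file that defines the supertype (the override).
function shouldSkeletonize(spineExists: boolean, f: FileFacts): boolean {
  const spared = f.namedInFile && !f.definesSupertype;
  return spineExists && !f.onSpine && f.polymorphicSibling && !spared;
}

// The worked examples from the design doc, encoded as assumed facts.
const realInterceptorChain: FileFacts = {
  onSpine: true, polymorphicSibling: true, namedInFile: true, definesSupertype: false,
};
const realCall: FileFacts = {
  onSpine: false, polymorphicSibling: true, namedInFile: true, definesSupertype: false,
};
const bridgeInterceptor: FileFacts = {
  onSpine: false, polymorphicSibling: true, namedInFile: false, definesSupertype: false,
};
const djangoCompiler: FileFacts = {
  onSpine: false, polymorphicSibling: true, namedInFile: true, definesSupertype: true,
};
// A distinct pipeline step with no shared supertype (hypothetical example).
const distinctStep: FileFacts = {
  onSpine: false, polymorphicSibling: false, namedInFile: false, definesSupertype: false,
};

const samples: Array<[string, FileFacts]> = [
  ["RealInterceptorChain", realInterceptorChain],
  ["RealCall", realCall],
  ["BridgeInterceptor", bridgeInterceptor],
  ["compiler.py", djangoCompiler],
  ["distinct step", distinctStep],
];

for (const [name, facts] of samples) {
  console.log(`${name}: ${shouldSkeletonize(true, facts) ? "skeleton" : "full"}`);
}
```

Under these assumed inputs, only `BridgeInterceptor` and `compiler.py` come out as skeletons, which matches the worked cases in the doc. Expressing the gate as a pure function of four booleans also makes the override easy to test in isolation from the graph.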
+ 52 - 11
src/mcp/tools.ts

@@ -1924,8 +1924,8 @@ export class ToolHandler {
    * whose qualifiedName contains another named token (`PmsProductServiceImpl::list`),
    * dropping unrelated `OmsOrderService::list`.
    */
-  private buildFlowFromNamedSymbols(cg: CodeGraph, query: string): { text: string; pathNodeIds: Set<string> } {
-    const EMPTY: { text: string; pathNodeIds: Set<string> } = { text: '', pathNodeIds: new Set<string>() };
+  private buildFlowFromNamedSymbols(cg: CodeGraph, query: string): { text: string; pathNodeIds: Set<string>; namedNodeIds: Set<string> } {
+    const EMPTY = { text: '', pathNodeIds: new Set<string>(), namedNodeIds: new Set<string>() };
     try {
       const CALLABLE = new Set(['method', 'function', 'component', 'constructor']);
       // Strip only a REAL file extension (Create.cs → Create); KEEP qualified
@@ -1999,7 +1999,12 @@ export class ToolHandler {
         out.push(`${i + 1}. ${step.node.name} (${step.node.filePath}:${step.node.startLine})`);
       }
       out.push('', '> Full source for these symbols is below; codegraph_trace(from,to) for the exact path between two endpoints.', '');
-      return { text: out.join('\n'), pathNodeIds: new Set(best.map((s) => s.node.id)) };
+      // namedNodeIds = every callable the agent explicitly named (a superset of
+      // the spine). A file holding one is something the agent asked to SEE, so it
+      // must keep full source even if it's an off-spine polymorphic sibling — the
+      // agent named `getResponseWithInterceptorChain` / `SQLCompiler.execute_sql`
+      // as the mechanism, not as an interchangeable leaf. See the skeleton gate.
+      return { text: out.join('\n'), pathNodeIds: new Set(best.map((s) => s.node.id)), namedNodeIds: new Set(named.keys()) };
     } catch {
       return EMPTY;
     }
@@ -2265,6 +2270,32 @@ export class ToolHandler {
       return false;
     };
 
+    // A file that DEFINES a polymorphic supertype (a class/interface with ≥
+    // MIN_SIBLINGS implementers) AND co-locates its subclasses is a redundant
+    // "family" file — Django's compiler.py holds `SQLCompiler` + its 4 subclasses
+    // (SQLInsert/Update/Delete/AggregateCompiler) in 2,266 lines. Such files are
+    // huge and read-anyway, so they should STILL skeletonize even when the agent
+    // named a method in them: a full one eats ~6.5K of the explore budget (Django
+    // is pinned at the 28K cap, truncating), starving the sibling files the agent
+    // then Reads. This flag OVERRIDES the named-callable spare below — it does NOT
+    // by itself spare a file. (OkHttp's RealCall implements the `Lockable` mixin
+    // but defines no ≥3-impl supertype, so the named spare keeps it full.)
+    const superMany = new Map<string, boolean>();
+    const definesPolymorphicSupertype = (nodes: Node[]): boolean => {
+      for (const n of nodes) {
+        if (n.kind !== 'class' && n.kind !== 'interface' && n.kind !== 'struct'
+            && n.kind !== 'trait' && n.kind !== 'protocol' && n.kind !== 'type_alias') continue;
+        let many = superMany.get(n.id);
+        if (many === undefined) {
+          many = cg.getIncomingEdges(n.id)
+            .filter((x) => x.kind === 'implements' || x.kind === 'extends').length >= MIN_SIBLINGS;
+          superMany.set(n.id, many);
+        }
+        if (many) return true;
+      }
+      return false;
+    };
+
     lines.push('### Source Code');
     lines.push('');
     lines.push('> The code below is the **verbatim, current on-disk source** of these files — re-read from disk on this call and line-numbered, byte-for-byte identical to what the Read tool returns. It is NOT a summary, outline, or stale cache. Treat each block as a Read you have already performed: do not Read a file shown here.');
@@ -2292,16 +2323,26 @@ export class ToolHandler {
       const lang = group.nodes[0]?.language || '';
 
       // Adaptive sizing (CODEGRAPH_ADAPTIVE_EXPLORE, default on): skeletonize a file
-      // (member signatures, bodies elided) only when it is BOTH off the flow spine
-      // AND a polymorphic sibling — one of many interchangeable impls of a shared
-      // interface (OkHttp's interceptors). The on-spine exemplar + the rest as
-      // signatures convey the chain without N redundant full bodies. DISTINCT
-      // pipeline steps (no shared supertype, e.g. Excalidraw's renderStaticScene)
-      // are NOT siblings, so they keep full source — the lever helps sibling-heavy
-      // flows without starving diffuse ones.
+      // (member signatures, bodies elided) when it is a redundant member of a
+      // polymorphic family. Skeletonize iff ALL hold:
+      //   1. a flow spine exists,
+      //   2. no symbol in the file is on that spine (it's not the mechanism path),
+      //   3. it IS a polymorphic sibling (≥ MIN_SIBLINGS impls of a shared supertype),
+      //   4. it is NOT SPARED, where a file is spared iff the agent NAMED a callable
+      //      in it (`getResponseWithInterceptorChain` → keep RealCall.kt full so the
+      //      agent doesn't Read it back) UNLESS the file also DEFINES the family's
+      //      supertype — a base+subclasses "family" file (Django's compiler.py) is
+      //      huge and Read-anyway, so skeletonizing it FREES budget for the sibling
+      //      files the agent would otherwise Read (it's the cheaper option, proven by
+      //      A/B: sparing compiler.py cost MORE and Read MORE).
+      // Before condition 4, off-spine + sibling alone skeletonized RealCall.kt (it
+      // implements the 9-impl `Lockable` mixin), which the agent then Read back.
+      const namedInFile = group.nodes.some(n => flow.namedNodeIds.has(n.id));
+      const spared = namedInFile && !definesPolymorphicSupertype(group.nodes);
       if (adaptiveExploreEnabled() && flow.pathNodeIds.size > 0
           && !group.nodes.some(n => flow.pathNodeIds.has(n.id))
-          && isPolymorphicSibling(group.nodes)) {
+          && isPolymorphicSibling(group.nodes)
+          && !spared) {
         const syms = group.nodes
           .filter(n => n.kind !== 'import' && n.kind !== 'export' && n.startLine > 0)
           .sort((a, b) => a.startLine - b.startLine);