
Design + status: adaptive codegraph_explore sizing (sibling skeletonization)

Status: Implemented & validated, default-on, on branch feat/adaptive-explore-sizing (commit d6d059f, 2026-05-29).

Escape hatch: CODEGRAPH_ADAPTIVE_EXPLORE=0.

Motivation: make codegraph_explore size its output to the answer rather than always filling the budget cap, so that a "sibling-heavy" flow (many interchangeable implementations of one interface) stops costing more than plain grep/read, without starving "diffuse" flows that genuinely need broad source.


TL;DR

Before this change, codegraph_explore returned full source for every relevant file up to its char budget. On a question whose answer spans many same-shaped classes (e.g. "how does OkHttp process a request through its interceptor chain?", which touches ~14 class … : Interceptor implementations), that meant ~28 KB of mostly redundant full bodies. Because those bodies ride in the context window for the rest of the session, the WITH-CodeGraph arm cost more than the WITHOUT arm (which answers the well-named interceptor question in ~10 cheap greps). OkHttp was the benchmark's cost outlier (−3%, i.e. costlier than native search).

Fix: when a file is both (a) off the synthesized flow spine and (b) a polymorphic sibling, render it as a skeleton (class + member signatures, bodies elided) instead of full source — keeping the on-spine exemplar and the mechanism in full.

  • OkHttp: explore 28.5k → 16.6k chars; headless A/B median $0.413 ON vs $0.462 shipped vs ~$0.57 without-CodeGraph → flips OkHttp from −3% (costlier than native) to ~28% cheaper than native, without raising reads (median 1 vs 3).
  • Excalidraw / Tokio / Django / VS Code / Gin: explore output is byte-identical with the flag on/off (0 skeletons) → provably zero regression. Their flows have no off-spine ≥3-implementer sibling group.

The problem in one picture

handleExplore gathers relevant files, sorts by relevance, and fills up to maxOutputChars (the "whole-small-file rule" dumps any relevant file ≤220 lines in full). The budget is a target, not a ceiling:

OkHttp explore (shipped):  RealCall (full) + RealInterceptorChain (full)
                         + CallServerInterceptor (full, 8.7k)
                         + Bridge/Connect/Cache/… (full, ~4-5k each)   ← all ~same shape
                         = ~28k, most of it redundant interceptor bodies

The agent only needs the mechanism (RealInterceptorChain.proceed iterating the chain) + the contract every interceptor implements + maybe one concrete example. The other five full bodies are padding — but only because they're interchangeable. On a diffuse question (Excalidraw's render pipeline: mutateElement → … → renderStaticScene), the off-spine files are distinct steps, and their bodies do real work — eliding them just makes the agent reconstruct them from signatures (more reasoning, net costlier; see "Dead ends").

So the whole game is: tell "interchangeable sibling" apart from "distinct step," cheaply.

The two-condition gate

A file is skeletonized iff both hold (and CODEGRAPH_ADAPTIVE_EXPLORE != 0):

  1. Off the flow spine. buildFlowFromNamedSymbols now returns its path node set (pathNodeIds) in addition to the rendered Flow text. A file with any symbol on that traced chain is "on-spine" and always kept full — that's the mechanism + the exemplar the agent is actually tracing through. (Gated on a spine existing at all; if there's no spine, nothing skeletonizes.)

  2. A polymorphic sibling. The file's class implements/extends a supertype that has ≥ 3 implementers (MIN_SIBLINGS). This is the signal that the class is one of many interchangeable implementations rather than a unique step. Computed from real implements/extends edges (see "Why this signal"), cached per-supertype so it stays a handful of edge lookups.

RealInterceptorChain also implements Interceptor, but its proceed is on the spine → kept full (condition 1 fails). RealCall is off-spine but implements nothing with ≥3 impls → kept full (condition 2 fails). The other interceptors are off-spine and ≥3-impl siblings → skeletonized. Exactly right.
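
The gate above can be sketched as a single predicate. This is a minimal illustration rather than the code in src/mcp/tools.ts: the `ExploreFile` shape, the `shouldSkeletonize` name, the `implementerCount` callback, and the toy node ids and counts are all assumptions made for the example. Only `pathNodeIds`, `MIN_SIBLINGS`, and the two conditions come from the text.

```typescript
// Hypothetical shapes for illustration; the real types in src/mcp/tools.ts may differ.
interface ExploreFile {
  path: string;
  symbolIds: string[];  // graph node ids of the symbols defined in this file
  supertypes: string[]; // supertypes the file's class implements/extends
}

const MIN_SIBLINGS = 3;

function shouldSkeletonize(
  file: ExploreFile,
  pathNodeIds: Set<string>,                        // spine nodes from the flow builder
  implementerCount: (supertype: string) => number, // assumed lookup into the graph
  enabled: boolean,                                // the CODEGRAPH_ADAPTIVE_EXPLORE flag
): boolean {
  if (!enabled) return false;
  if (pathNodeIds.size === 0) return false; // no spine at all: nothing skeletonizes
  // Condition 1: any symbol on the traced chain keeps the whole file full.
  const onSpine = file.symbolIds.some((id) => pathNodeIds.has(id));
  if (onSpine) return false;
  // Condition 2: the class belongs to a family of >= MIN_SIBLINGS implementers.
  return file.supertypes.some((s) => implementerCount(s) >= MIN_SIBLINGS);
}

// Toy data loosely modeled on the OkHttp walk-through (ids and counts are made up).
const spine = new Set(["RealInterceptorChain.proceed"]);
const counts: Record<string, number> = { Interceptor: 14, Call: 1 };
const countOf = (s: string): number => counts[s] ?? 0;

const chainFile: ExploreFile = {
  path: "RealInterceptorChain.kt",
  symbolIds: ["RealInterceptorChain.proceed"],
  supertypes: ["Interceptor"],
};
const callFile: ExploreFile = {
  path: "RealCall.kt",
  symbolIds: ["RealCall.execute"],
  supertypes: ["Call"],
};
const bridgeFile: ExploreFile = {
  path: "BridgeInterceptor.kt",
  symbolIds: ["BridgeInterceptor.intercept"],
  supertypes: ["Interceptor"],
};

const gateResults = {
  chain: shouldSkeletonize(chainFile, spine, countOf, true),   // on-spine: kept full
  call: shouldSkeletonize(callFile, spine, countOf, true),     // no >=3-impl supertype: kept full
  bridge: shouldSkeletonize(bridgeFile, spine, countOf, true), // off-spine sibling: skeleton
  flagOff: shouldSkeletonize(bridgeFile, spine, countOf, false),
  noSpine: shouldSkeletonize(bridgeFile, new Set<string>(), countOf, true),
};
console.log(gateResults);
```

Checking the spine first keeps the common case cheap: an on-spine file never triggers a supertype lookup.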

Why "shared supertype with ≥3 implementers" is the signal

The thing that makes OkHttp's interceptors interchangeable is precisely that they're N implementations of one interface, invoked polymorphically. That is a structural property the graph records as implements/extends edges:

14 classes ──implements──▶ Interceptor      (BridgeInterceptor, CacheInterceptor,
                                              CallServerInterceptor, … )

Excalidraw's renderStaticScene, Scene, Collab share no common supertype — the ≥3-implementer query returns nothing for them. So the signal cleanly separates the two repos, and (validated below) leaves every non-sibling flow untouched.

The ≥ 3 threshold matters: 1:1 "service interface → single impl" pairs (the common Spring/Java shape) are not siblings and stay full. Only genuine many-impl families (interceptor chains, strategy/visitor families, codec registries) trip the gate.
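
A rough sketch of the sibling signal with its per-supertype cache follows. The `Edge` shape, the `makeSiblingCheck` factory, and the sample class names other than the OkHttp interceptors are hypothetical; the document only states that the check uses real implements/extends edges, a ≥3 threshold, and a per-supertype cache.

```typescript
// Hypothetical edge shape; the real graph schema may differ.
type EdgeKind = "implements" | "extends" | "calls";
interface Edge {
  from: string; // subtype (or caller)
  to: string;   // supertype (or callee)
  kind: EdgeKind;
}

const MIN_SIBLINGS = 3;

function makeSiblingCheck(edges: Edge[]) {
  // supertype -> number of distinct implementers, filled lazily
  const cache = new Map<string, number>();

  const countImplementers = (supertype: string): number => {
    const hit = cache.get(supertype);
    if (hit !== undefined) return hit;
    const impls = new Set<string>();
    for (const e of edges) {
      const isInheritance = e.kind === "implements" || e.kind === "extends";
      if (isInheritance && e.to === supertype) impls.add(e.from);
    }
    cache.set(supertype, impls.size);
    return impls.size;
  };

  return function isPolymorphicSibling(cls: string): boolean {
    for (const e of edges) {
      if (e.from !== cls) continue;
      if (e.kind !== "implements" && e.kind !== "extends") continue;
      if (countImplementers(e.to) >= MIN_SIBLINGS) return true;
    }
    return false;
  };
}

// Toy graph: one three-member family, one 1:1 interface/impl pair, one call edge.
const sampleEdges: Edge[] = [
  { from: "BridgeInterceptor", to: "Interceptor", kind: "implements" },
  { from: "CacheInterceptor", to: "Interceptor", kind: "implements" },
  { from: "CallServerInterceptor", to: "Interceptor", kind: "implements" },
  { from: "UserServiceImpl", to: "UserService", kind: "implements" },
  { from: "renderStaticScene", to: "Scene", kind: "calls" },
];

const isSibling = makeSiblingCheck(sampleEdges);
const siblingResults = {
  cache: isSibling("CacheInterceptor"),      // member of a 3-impl family
  service: isSibling("UserServiceImpl"),     // 1:1 pair, below the threshold
  render: isSibling("renderStaticScene"),    // only a call edge, no supertype
};
console.log(siblingResults);
```

The linear scan over `edges` stands in for whatever indexed edge lookup the graph store provides; the cache means each supertype is counted at most once per explore call.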

Skeleton rendering

For a skeletonized file we emit the class + member signature lines (not bodies). Because a symbol node's startLine can point at a decorator/annotation (@Throws, @Override, @objc), we scan forward up to 4 lines for the line that actually names the symbol, so the skeleton shows the real signature:

#### …/CallServerInterceptor.kt — CallServerInterceptor, intercept, … · skeleton (signatures only; Read for a full body)

```kotlin
30   object CallServerInterceptor : Interceptor {
32     override fun intercept(chain: Interceptor.Chain): Response {
194    private fun shouldIgnoreAndWaitForRealResponse(code: Int): Boolean =
```

The header still lists the file's symbols and says Read for a full body, so the agent can pull one specific implementation if it truly needs it.
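
The forward scan past decorators might look roughly like this. The function name, the 1-based line convention, the plain substring match, and the reading of "up to 4 lines" as the start line plus the four that follow are assumptions for the sketch, not details confirmed by the implementation.

```typescript
const MAX_SCAN = 4; // how many lines past startLine to look at (from the text above)

// Returns the 1-based line number that actually names the symbol,
// falling back to startLine when nothing in the window matches.
function findSignatureLine(lines: string[], startLine: number, name: string): number {
  const last = Math.min(startLine + MAX_SCAN, lines.length);
  for (let n = startLine; n <= last; n++) {
    // Naive substring match; a real implementation may want word boundaries.
    if ((lines[n - 1] ?? "").includes(name)) return n;
  }
  return startLine;
}

// Render one skeleton entry as "<line number> <trimmed signature>".
function skeletonEntry(lines: string[], startLine: number, name: string): string {
  const n = findSignatureLine(lines, startLine, name);
  return `${n} ${(lines[n - 1] ?? "").trim()}`;
}

// The symbol node starts at the annotation on line 1; the signature is on line 2.
const sampleSource = [
  "@Throws(IOException::class)",
  "override fun intercept(chain: Interceptor.Chain): Response {",
  "  val exchange = chain.exchange",
  "}",
];

const signatureLine = findSignatureLine(sampleSource, 1, "intercept");
const fallbackLine = findSignatureLine(sampleSource, 1, "missingSymbol");
const entry = skeletonEntry(sampleSource, 1, "intercept");
console.log(signatureLine, fallbackLine, entry);
```

Falling back to `startLine` keeps the skeleton non-empty even when the name never appears in the window, at the cost of occasionally showing an annotation line.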

Validation

Headless claude -p, Opus 4.8, median of 3, WITH-CodeGraph adaptive on vs off (isolates the flag). Probe sizes from scripts/agent-eval/probe-explore.mjs.

| Repo | explore OFF→ON | skeletons | A/B cost (ON vs shipped) | reads |
|---|---|---|---|---|
| OkHttp | 28.5k → 16.6k | 6 | $0.413 vs $0.462 (~28% < native's $0.57) | flat (1 vs 3) |
| Excalidraw | 28.6k → 28.6k | 0 | byte-identical → neutral | |
| Tokio | identical | 0 | neutral | |
| Django | identical | 0 | neutral | |
| VS Code | identical | 0 | neutral | |
| Gin | identical | 0 | neutral | |

The decisive check (the open risk of skeletonization) passed: skeletonizing the off-spine interceptors did not push the agent to Read them back — reads stayed flat (lower, if anything). And the 5 non-sibling repos are byte-identical with the flag toggled, so default-on carries no regression for them.

Dead ends (don't re-attempt these)

  1. Demote/rank low-value files (e.g. broaden isLowValuePath to drop *-testing-support/ fixtures). Improves content quality but not size — explore refills the freed budget with other full bodies (28,478 → 28,424). Ranking ≠ shrinking; you must skeletonize to shrink.
  2. Gate on entry-node membership. A precise symbol-bag explore query names every chain participant, so they're all "entry nodes" — no separation, nothing skeletonizes.
  3. Rely on interface-impl synthesizer edges (synthesizedBy:'interface-impl') for the sibling signal. They were not created for OkHttp's Interceptor (a Kotlin fun interface), so the signal must come from the real implements/extends edges, not synth edges.
  4. A plain "core-floor" gate (keep first N full, skeletonize the rest) — skeletonized Excalidraw's distinct steps → +17% cost regression. The sibling condition is what makes it safe.
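
Dead end 1 ("ranking ≠ shrinking") can be reproduced with a toy budget filler. Everything here is illustrative: the `Candidate` shape, the greedy `fillBudget` loop, and the sizes are made up, and the sketch treats the budget as a hard cap for simplicity, which the real handleExplore does not necessarily do.

```typescript
// Hypothetical candidate shape; sizes below are invented, not measurements.
interface Candidate {
  path: string;
  chars: number;     // size of the rendered section for this file
  relevance: number; // higher is more relevant
}

// Greedy fill: take files in relevance order while they still fit the budget.
function fillBudget(files: Candidate[], maxOutputChars: number): Candidate[] {
  const picked: Candidate[] = [];
  let used = 0;
  for (const f of [...files].sort((a, b) => b.relevance - a.relevance)) {
    if (used + f.chars > maxOutputChars) continue;
    picked.push(f);
    used += f.chars;
  }
  return picked;
}

const totalChars = (fs: Candidate[]): number => fs.reduce((sum, f) => sum + f.chars, 0);

const budget = 28000;
const candidates: Candidate[] = [
  { path: "Mechanism.kt", chars: 9000, relevance: 1.0 },
  { path: "Exemplar.kt", chars: 8000, relevance: 0.9 },
  { path: "testing-support/Fixture.kt", chars: 6000, relevance: 0.8 },
  { path: "SiblingA.kt", chars: 5000, relevance: 0.7 },
  { path: "SiblingB.kt", chars: 5500, relevance: 0.6 },
];

// Baseline: the fixture crowds out SiblingB.
const baselineTotal = totalChars(fillBudget(candidates, budget));

// Demote the fixture: SiblingB takes the freed space, output barely shrinks.
const demoted = candidates.filter((f) => !f.path.startsWith("testing-support/"));
const demotedTotal = totalChars(fillBudget(demoted, budget));

// Skeletonize the siblings instead (assume ~800 chars of signatures each).
const skeletonized = demoted.map((f) =>
  f.path.startsWith("Sibling") ? { ...f, chars: 800 } : f,
);
const skeletonTotal = totalChars(fillBudget(skeletonized, budget));

console.log({ baselineTotal, demotedTotal, skeletonTotal });
```

In this toy run, demoting the fixture moves the total from 28,000 to 27,500 chars, while skeletonizing the two siblings brings it to 18,600: only changing how a file is rendered reduces the output.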

Code

  • src/mcp/tools.ts
    • adaptiveExploreEnabled() — the flag (default on).
    • buildFlowFromNamedSymbols() — now returns { text, pathNodeIds }.
    • handleExplore(): the isPolymorphicSibling() helper (supertype ≥3-impl detection, cached) + the skeleton branch in the source-section loop.
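
For orientation, the flag check could be as small as the following. This is a guess at the shape of adaptiveExploreEnabled(), not a copy of it: taking the environment as a parameter is an assumption made to keep the sketch testable, and only the value "0" is treated as "off" because that is the only value the document specifies.

```typescript
// Assumed signature: the real adaptiveExploreEnabled() may read the environment directly.
function adaptiveExploreEnabled(env: Record<string, string | undefined>): boolean {
  // Default on: only an explicit "0" disables adaptive sizing.
  return env["CODEGRAPH_ADAPTIVE_EXPLORE"] !== "0";
}

const flagUnset = adaptiveExploreEnabled({});
const flagZero = adaptiveExploreEnabled({ CODEGRAPH_ADAPTIVE_EXPLORE: "0" });
const flagOne = adaptiveExploreEnabled({ CODEGRAPH_ADAPTIVE_EXPLORE: "1" });
console.log({ flagUnset, flagZero, flagOne });
```

In a Node.js process the caller would pass `process.env`.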

Frontier / future work

  • No regression test yet for the skeletonization. A suitable fixture would have ≥3 interface impls plus a flow spine, and would assert that off-spine siblings skeletonize, distinct steps stay full, and CODEGRAPH_ADAPTIVE_EXPLORE=0 disables the behavior. Recommended before/with merge.
  • Non-interface sibling families (Go HandlerFunc slices, function-pointer registries) aren't caught — they have no implements/extends edge. Gin's middleware chain, for instance, doesn't trip the gate (its handlers are funcs, not interface impls).
  • Exemplar selection when no interceptor is on the spine: today all siblings skeletonize and the agent leans on the interface contract; showing one as a forced exemplar might read slightly better (untested).