codegraph_explore sizing (sibling skeletonization)

Status: Implemented & validated, default-on, on branch
feat/adaptive-explore-sizing (initial commit d6d059f; refined 2026-05-29
after a real-agent A/B exposed a read-back regression; see
"Refinement" below). Escape hatch: CODEGRAPH_ADAPTIVE_EXPLORE=0.

Motivation: make codegraph_explore size its output to the answer rather
than always filling the budget cap, so that a "sibling-heavy" flow (many
interchangeable implementations of one interface) stops costing more than
plain grep/read, without starving "diffuse" flows that genuinely need broad
source.

Refinement (2026-05-29): the read-back regression. The first cut gated only on off-spine + polymorphic-sibling. A real-agent A/B (not the deterministic probe) showed that this skeletonized two files the agent then Read back, defeating the point:

- OkHttp's RealCall: it implements the 9-impl Lockable mixin, so it tripped the sibling signal even though it's the orchestrator.
- Django's compiler.py: it defines SQLCompiler and co-locates its subclasses.

Two conditions fixed it. A file skeletonizes only if it is not spared, where spared = the agent NAMED a callable in it (getResponseWithInterceptorChain, SQLCompiler.execute_sql → keep it full) UNLESS the file DEFINES a ≥3-impl supertype (a base+subclasses "family" file is huge and Read-anyway, so skeletonizing it frees explore budget for the sibling files the agent would otherwise Read).

Result: OkHttp 3% costlier → ~10% cheaper (RealCall full, 0 read-backs); Django 10% costlier → ~10% cheaper (compiler.py skeleton frees ~6.5 KB of the 28 KB budget; half the runs answer with 0 reads). The supertype signal was initially used as a spare; that was backwards and regressed Django to 9% costlier by starving its budget. It is now an override of the named-callable spare. The single-condition history below is kept for context.

codegraph_explore returned full source for every relevant file up to its
char budget. On a question whose answer spans many same-shaped classes — e.g.
"how does OkHttp process a request through its interceptor chain?", which touches
~14 class … : Interceptor implementations — that meant ~28 KB of mostly
redundant full bodies. Because those bodies ride in the context window for
the rest of the session, the WITH-CodeGraph arm cost more than the WITHOUT arm
(which answers the well-named interceptor question in ~10 cheap greps). OkHttp
was the benchmark's cost outlier (−3% — i.e. costlier than native search).
Fix: when a file is both (a) off the synthesized flow spine and (b) a polymorphic sibling, render it as a skeleton (class + member signatures, bodies elided) instead of full source, keeping the on-spine exemplar and the mechanism in full.

- OkHttp: skeletonizes the off-spine `: Interceptor` impls while keeping RealInterceptorChain (the dispatch mechanism) and RealCall (the orchestrator the agent named) full → ~10% cheaper than native, 0 RealCall read-backs (see Refinement for the corrected numbers; the original 28.5k → 16.6k / "reads 1 vs 3" figures came from a deterministic probe query, not the agent's real query).
- Django: skeletonizes compiler.py (a base+subclasses family file), freeing budget → ~10% cheaper. (The earlier claim that Django was "byte-identical / 0 skeletons" was an artifact of the probe query; the agent's real query DOES surface the SQLCompiler family.)

handleExplore gathers relevant files, sorts by relevance, and fills up to
maxOutputChars (the "whole-small-file rule" dumps any relevant file ≤220 lines
in full). The budget is a target, not a ceiling:
OkHttp explore (shipped): RealCall (full) + RealInterceptorChain (full)
+ CallServerInterceptor (full, 8.7k)
+ Bridge/Connect/Cache/… (full, ~4-5k each) ← all ~same shape
= ~28k, most of it redundant interceptor bodies
The agent only needs the mechanism (RealInterceptorChain.proceed iterating
the chain) + the contract every interceptor implements + maybe one concrete
example. The other five full bodies are padding — but only because they're
interchangeable. On a diffuse question (Excalidraw's render pipeline:
mutateElement → … → renderStaticScene), the off-spine files are distinct
steps, and their bodies do real work — eliding them just makes the agent
reconstruct them from signatures (more reasoning, net costlier; see "Dead ends").
So the whole game is: tell "interchangeable sibling" apart from "distinct step," cheaply.
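The fill behavior described above can be sketched as a greedy loop. This is a simplified illustration under stated assumptions, not the real handleExplore: `RelevantFile`, `fillToBudget`, and `totalChars` are hypothetical names, the budget is treated as a hard cap, and the whole-small-file rule is ignored. It shows why removing or re-ranking files does not shrink the output: whatever budget is freed gets refilled by the next full body.

```typescript
// Hypothetical shape for illustration; not the real handleExplore types.
interface RelevantFile {
  path: string;
  relevance: number; // higher means more relevant
  source: string; // full file text
}

// Greedy fill: walk files in relevance order and keep every one that still fits.
function fillToBudget(files: RelevantFile[], maxOutputChars: number): RelevantFile[] {
  const ranked = files.slice().sort((a, b) => b.relevance - a.relevance);
  const picked: RelevantFile[] = [];
  let used = 0;
  for (const file of ranked) {
    if (used + file.source.length > maxOutputChars) continue; // too big for what is left
    picked.push(file);
    used += file.source.length;
  }
  return picked;
}

function totalChars(files: RelevantFile[]): number {
  return files.reduce((sum, f) => sum + f.source.length, 0);
}

// Six same-sized sibling files against a 20,000-char budget (made-up sizes).
const siblings: RelevantFile[] = [];
for (let i = 0; i < 6; i++) {
  siblings.push({ path: `Sibling${i}.kt`, relevance: 6 - i, source: "x".repeat(5000) });
}

const sizeBefore = totalChars(fillToBudget(siblings, 20000));
// Dropping the top-ranked file does not shrink the output: the next file refills the budget.
const sizeAfter = totalChars(fillToBudget(siblings.slice(1), 20000));
console.log(sizeBefore, sizeAfter); // prints 20000 20000
```

Shrinking the payload therefore has to change how a file is rendered, not only which files are chosen.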
A file is skeletonized iff all hold (and CODEGRAPH_ADAPTIVE_EXPLORE != 0):
1. A spine exists. buildFlowFromNamedSymbols returns its path node set (pathNodeIds) and the full set of agent-named callables (namedNodeIds). If no spine forms, nothing skeletonizes.
2. Off the flow spine. No symbol in the file is on the traced chain; that chain is the mechanism the agent is walking, always kept full.
3. A polymorphic sibling. The file's class implements/extends a supertype with ≥ 3 implementers (MIN_SIBLINGS), the signal that it's one of many interchangeable impls. From real implements/extends edges, cached.
4. Not spared. A file is spared (kept full) iff the agent named a callable in it UNLESS the file itself DEFINES a ≥3-impl supertype. A named method/function is something the agent asked to see (getResponseWithInterceptorChain, SQLCompiler.execute_sql), not an interchangeable leaf. The UNLESS clause is the override: a base+subclasses "family" file (Django's compiler.py) is huge and Read-anyway, so a full copy just eats explore budget; skeletonizing it frees that budget for the sibling files the agent would otherwise Read. So: named ⇒ spare, unless it's a family file ⇒ skeletonize anyway.
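The four conditions can be read as one boolean gate. The sketch below is a minimal restatement under assumptions: `FileFacts`, `isAdaptiveEnabled`, and `shouldSkeletonize` are hypothetical names, the per-file facts are taken as precomputed inputs, and the flag is assumed to be off only when the variable is exactly "0".

```typescript
// Hypothetical per-file facts; the real code derives these from the graph.
interface FileFacts {
  onSpine: boolean; // some symbol in the file is in pathNodeIds
  isPolymorphicSibling: boolean; // implements/extends a supertype with >= 3 impls
  namedInFile: boolean; // the agent named a callable in this file (namedNodeIds)
  definesSupertype: boolean; // the file defines a supertype with >= 3 impls
}

// Assumption: the escape hatch disables the feature only when set to "0".
function isAdaptiveEnabled(env: Record<string, string | undefined>): boolean {
  return env["CODEGRAPH_ADAPTIVE_EXPLORE"] !== "0";
}

function shouldSkeletonize(facts: FileFacts, spineExists: boolean, enabled: boolean): boolean {
  if (!enabled || !spineExists) return false; // flag off, or cond. 1 fails
  if (facts.onSpine) return false; // cond. 2: the mechanism stays full
  if (!facts.isPolymorphicSibling) return false; // cond. 3
  const spared = facts.namedInFile && !facts.definesSupertype; // cond. 4 with its override
  return !spared;
}

const on = isAdaptiveEnabled({});
const realCall: FileFacts = { onSpine: false, isPolymorphicSibling: true, namedInFile: true, definesSupertype: false };
const bridge: FileFacts = { onSpine: false, isPolymorphicSibling: true, namedInFile: false, definesSupertype: false };
const compilerPy: FileFacts = { onSpine: false, isPolymorphicSibling: true, namedInFile: true, definesSupertype: true };

console.log(shouldSkeletonize(realCall, true, on)); // false: named, not a family file
console.log(shouldSkeletonize(bridge, true, on)); // true: unnamed sibling
console.log(shouldSkeletonize(compilerPy, true, on)); // true: named, but the override fires
```

Writing the spare as a single expression keeps the override visible: defining a supertype never spares a file, it only cancels the named-callable spare.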

Worked through the two repos:

- RealInterceptorChain: proceed is on the spine → kept full (cond. 2).
- RealCall: off-spine, and it trips the sibling signal via the 9-impl Lockable mixin (not because it's an interchangeable interceptor). But the agent named getResponseWithInterceptorChain/execute/enqueue in it, and it defines no ≥3-impl supertype → spared, kept full (cond. 4). This is the fix for the read-back: before cond. 4 it skeletonized and the agent Read it back.
- BridgeInterceptor & the other 4: off-spine, ≥3-impl siblings, named only by type, define no supertype → skeletonized. The win.
- compiler.py: off-spine, a sibling (its subclasses extend SQLCompiler), the agent named execute_sql in it, but it defines the SQLCompiler supertype, so the override fires → skeletonized (frees budget). Sparing it instead (the wrong first attempt) cost MORE and Read MORE.

The thing that makes OkHttp's interceptors interchangeable is precisely that
they're N implementations of one interface, invoked polymorphically. That is
a structural property the graph records as implements/extends edges:
14 classes ──implements──▶ Interceptor (BridgeInterceptor, CacheInterceptor,
CallServerInterceptor, … )
Excalidraw's renderStaticScene, Scene, Collab share no common
supertype — the ≥3-implementer query returns nothing for them. So the signal
cleanly separates the two repos, and (validated below) leaves every non-sibling
flow untouched.
The ≥ 3 threshold matters: 1:1 "service interface → single impl" pairs (the
common Spring/Java shape) are not siblings and stay full. Only genuine
many-impl families (interceptor chains, strategy/visitor families, codec
registries) trip the gate.
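A minimal sketch of the sibling signal, assuming a flat edge list. The `Edge` shape, `countImplementers`, and the function signatures are assumptions for illustration; only the names isPolymorphicSibling, definesPolymorphicSupertype, and MIN_SIBLINGS come from this document. `UserService` and `UserServiceImpl` are made-up stand-ins for a 1:1 pair.

```typescript
type EdgeKind = "implements" | "extends" | "calls";

interface Edge {
  from: string; // subtype (or caller) node id
  to: string; // supertype (or callee) node id
  kind: EdgeKind;
}

const MIN_SIBLINGS = 3;

function isInheritance(edge: Edge): boolean {
  return edge.kind === "implements" || edge.kind === "extends";
}

// Count distinct implementers/extenders per supertype, computed once and reused.
function countImplementers(edges: Edge[]): Map<string, number> {
  const impls = new Map<string, Set<string>>();
  for (const edge of edges) {
    if (!isInheritance(edge)) continue;
    const seen = impls.get(edge.to) ?? new Set<string>();
    seen.add(edge.from);
    impls.set(edge.to, seen);
  }
  const counts = new Map<string, number>();
  impls.forEach((seen, superType) => counts.set(superType, seen.size));
  return counts;
}

// True when the node implements/extends a supertype with at least MIN_SIBLINGS impls.
function isPolymorphicSibling(node: string, edges: Edge[], counts: Map<string, number>): boolean {
  return edges.some(
    (edge) => edge.from === node && isInheritance(edge) && (counts.get(edge.to) ?? 0) >= MIN_SIBLINGS,
  );
}

// True when the node itself is a supertype with at least MIN_SIBLINGS impls.
function definesPolymorphicSupertype(node: string, counts: Map<string, number>): boolean {
  return (counts.get(node) ?? 0) >= MIN_SIBLINGS;
}

const edges: Edge[] = [
  { from: "BridgeInterceptor", to: "Interceptor", kind: "implements" },
  { from: "CacheInterceptor", to: "Interceptor", kind: "implements" },
  { from: "CallServerInterceptor", to: "Interceptor", kind: "implements" },
  { from: "UserServiceImpl", to: "UserService", kind: "implements" }, // hypothetical 1:1 pair
];
const counts = countImplementers(edges);

console.log(isPolymorphicSibling("BridgeInterceptor", edges, counts)); // true
console.log(isPolymorphicSibling("UserServiceImpl", edges, counts)); // false
```

The 1:1 pair stays below the threshold, which is the behavior the paragraph above describes for single-impl service interfaces.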
For a skeletonized file we emit the class + member signature lines (not
bodies). Because a symbol node's startLine can point at a decorator/annotation
(@Throws, @Override, @objc), we scan forward up to 4 lines for the line
that actually names the symbol, so the skeleton shows the real signature:
#### …/CallServerInterceptor.kt — CallServerInterceptor, intercept, … · skeleton (signatures only; Read for a full body)
    30   object CallServerInterceptor : Interceptor {
    32     override fun intercept(chain: Interceptor.Chain): Response {
    194    private fun shouldIgnoreAndWaitForRealResponse(code: Int): Boolean =
The header still lists the file's symbols and says Read for a full body, so the
agent can pull one specific implementation if it truly needs it.
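The forward scan for the real signature line might look like the following. This is a sketch under assumptions: `findSignatureLine` is a hypothetical name, line numbers are taken to be 1-based, and "names the symbol" is approximated by a plain substring match, which the real implementation may do differently.

```typescript
// Return the 1-based line number and text of the first line that names the symbol,
// looking at the node's startLine and at most `maxLookahead` lines after it.
function findSignatureLine(
  lines: string[],
  startLine: number, // 1-based; may point at a decorator or annotation
  symbolName: string,
  maxLookahead: number = 4,
): { line: number; text: string } {
  const last = Math.min(lines.length, startLine + maxLookahead);
  for (let n = startLine; n <= last; n++) {
    const text = lines[n - 1];
    if (text !== undefined && text.indexOf(symbolName) !== -1) {
      return { line: n, text };
    }
  }
  // No line in the window names the symbol: fall back to the recorded start line.
  return { line: startLine, text: lines[startLine - 1] ?? "" };
}

const sample = [
  "@Throws(IOException::class)",
  "@Override",
  "override fun intercept(chain: Interceptor.Chain): Response {",
  "}",
];

console.log(findSignatureLine(sample, 1, "intercept").line); // prints 3
```

A substring match can be fooled by an annotation that happens to contain the symbol name, so a stricter matcher would be a reasonable refinement.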
Headless claude -p, Opus 4.8, WITH vs WITHOUT CodeGraph (the real benchmark
arm, not the on/off probe the first cut used). Cost = median total_cost_usd.
| Repo | WITH→WITHOUT cost | WITH reads | WITHOUT reads | RealCall/compiler read-back |
|---|---|---|---|---|
| OkHttp (n=4) | $0.45 → $0.50 (~10% cheaper) | 2 | 3.5 | 0 / — (RealCall full) |
| Django (n=6) | $0.56 → $0.63 (~10% cheaper) | 2 | 8.5 | half the runs read 0 |
Both were the README's cost outliers (OkHttp 3% costlier, Django 10% costlier) and both flipped to clear wins. OkHttp WITH was cheaper in all 4 runs; Django in 5 of 6 (n=6 to see through its high variance). WITHOUT baselines match the README ($0.50/$0.63 vs $0.57/$0.64), so the gain is the WITH-arm improving.
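For reference, the "~10% cheaper" figures follow from the table's medians when the WITHOUT arm is taken as the baseline. A small check (`savingsPercent` is a hypothetical helper, not part of the benchmark scripts):

```typescript
// Percent saved by the WITH arm relative to the WITHOUT arm.
function savingsPercent(withCost: number, withoutCost: number): number {
  return ((withoutCost - withCost) / withoutCost) * 100;
}

const okhttpSavings = savingsPercent(0.45, 0.5); // about 10
const djangoSavings = savingsPercent(0.56, 0.63); // about 11

console.log(Math.round(okhttpSavings), Math.round(djangoSavings)); // prints 10 11
```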
The decisive check now passes for the right reason: with the named-callable
spare, OkHttp's RealCall stays full and is never Read back (it was Read
back in 3/4 runs before the fix). The inert repos (Excalidraw / Tokio / VS Code /
Gin) stay at 0 skeletons — verified by probe — because the refined gate
skeletonizes a strict subset of the original. (The first cut's "on vs off, reads
flat 1 vs 3" claim came from a deterministic probe query and did not hold for
the agent's real query — that mismatch is what this refinement corrects.)
Dead ends:

- Ranking alone (isLowValuePath to drop *-testing-support/ fixtures). Improves content quality but not size: explore refills the freed budget with other full bodies (28,478 → 28,424). Ranking ≠ shrinking; you must skeletonize to shrink.
- Synthesized edges (synthesizedBy:'interface-impl') for the sibling signal. They were not created for OkHttp's Interceptor (a Kotlin fun interface), so the signal must come from the real implements/extends edges, not synth edges.
- Sparing files that define a supertype. A family file (compiler.py, 2,266 lines) is huge and Read-anyway, so keeping it full just eats the 28 KB explore budget and starves the sibling files the agent then Reads; it regressed Django to 9% costlier ($0.71). Defining a supertype is instead an override that lets a named family file skeletonize anyway.
- Trusting the probe. The deterministic probe (probe-explore.mjs "<symbol bag>") and the agent's real explore query name symbols differently, so they form different spines and skeletonize different files. The probe said "Django: 0 skeletons / reads flat"; the real agent query skeletonized compiler.py and Read it back. Always confirm with a real-agent A/B (run-all.sh), not just the probe.

src/mcp/tools.ts:
- adaptiveExploreEnabled(): the flag (default on).
- buildFlowFromNamedSymbols(): returns { text, pathNodeIds, namedNodeIds }. namedNodeIds is every callable the agent named (a superset of the spine); the named-callable spare reads it.
- handleExplore(): two cached helpers, isPolymorphicSibling() (a node has an outgoing implements/extends to a ≥3-impl supertype) and definesPolymorphicSupertype() (a node HAS ≥3 incoming implements/extends, i.e. the file is the family base). The skeleton branch: off-spine && isPolymorphicSibling && !(namedInFile && !definesSupertype).

__tests__/adaptive-explore-sizing.test.ts: 7 cases incl. the named-callable spare (RealCall) and the supertype-family override (compiler.py).

Limitations:

- compiler.py is skeletonized whole, so SQLCompiler.execute_sql (the base mechanism) becomes a signature too and is Read back in ~half the Django runs. The ideal is to keep the base class's methods full and elide only the redundant subclass bodies, shrinking the payload without eliding the answer. Whole-file skeletonization can't express that yet.
- query.py (3,040 lines) and sql/query.py are not polymorphic families, so skeletonization can't touch them; the agent Reads them when the 28 KB clustered view is insufficient. That's the explore-budget / big-file-clustering frontier, not skeletonization.
- Function-typed polymorphism (HandlerFunc slices, function-pointer registries) isn't caught: it has no implements/extends edge. Gin's middleware chain, for instance, doesn't trip the gate (its handlers are funcs, not interface impls).