|
@@ -0,0 +1,179 @@
|
|
|
|
|
+# Design + status: adaptive `codegraph_explore` sizing (sibling skeletonization)
|
|
|
|
|
+
|
|
|
|
|
+**Status:** Implemented & validated, **default-on**, on branch
|
|
|
|
|
+`feat/adaptive-explore-sizing` (commit `d6d059f`, 2026-05-29). Escape hatch:
|
|
|
|
|
+`CODEGRAPH_ADAPTIVE_EXPLORE=0`.
|
|
|
|
|
+**Motivation:** make `codegraph_explore` size its output to the *answer* rather
|
|
|
|
|
+than always filling the budget cap — so a "sibling-heavy" flow (many
|
|
|
|
|
+interchangeable implementations of one interface) stops costing *more* than
|
|
|
|
|
+plain grep/read, without starving "diffuse" flows that genuinely need broad
|
|
|
|
|
+source.
|
|
|
|
|
+
|
|
|
|
|
+---
|
|
|
|
|
+
|
|
|
|
|
+## TL;DR
|
|
|
|
|
+
|
|
|
|
|
+Before this change, `codegraph_explore` returned full source for **every** relevant file up to its
|
|
|
|
|
+char budget. On a question whose answer spans many *same-shaped* classes — e.g.
|
|
|
|
|
+"how does OkHttp process a request through its interceptor chain?", which touches
|
|
|
|
|
+~14 `class … : Interceptor` implementations — that meant ~28 KB of mostly
|
|
|
|
|
+**redundant full bodies**. Because those bodies ride in the context window for
|
|
|
|
|
+the rest of the session, the WITH-CodeGraph arm cost *more* than the WITHOUT arm
|
|
|
|
|
+(which answers the well-named interceptor question in ~10 cheap greps). OkHttp
|
|
|
|
|
+was the benchmark's cost outlier (−3% savings, i.e. 3% *costlier* than native search).
|
|
|
|
|
+
|
|
|
|
|
+Fix: when a file is **both (a) off the synthesized flow spine and (b) a
|
|
|
|
|
+polymorphic sibling**, render it as a **skeleton** (class + member *signatures*,
|
|
|
|
|
+bodies elided) instead of full source — keeping the on-spine exemplar and the
|
|
|
|
|
+mechanism in full.
|
|
|
|
|
+
|
|
|
|
|
+- **OkHttp:** explore `28.5k → 16.6k` chars; headless A/B median **$0.413 ON vs
|
|
|
|
|
+ $0.462 shipped vs ~$0.57 without-CodeGraph** → flips OkHttp from 3% costlier
|
|
|
|
|
+ to **~28% cheaper than native**, with **reads NOT raised** (median 1 ON vs 3 shipped).
|
|
|
|
|
+- **Excalidraw / Tokio / Django / VS Code / Gin:** explore output is
|
|
|
|
|
+ **byte-identical** with the flag on/off (0 skeletons) → **provably zero
|
|
|
|
|
+ regression**. Their flows have no off-spine ≥3-implementer sibling group.
|
|
|
|
|
+
|
|
|
|
|
+---
|
|
|
|
|
+
|
|
|
|
|
+## The problem in one picture
|
|
|
|
|
+
|
|
|
|
|
+`handleExplore` gathers relevant files, sorts by relevance, and fills up to
|
|
|
|
|
+`maxOutputChars` (the "whole-small-file rule" dumps any relevant file ≤220 lines
|
|
|
|
|
+in full). The budget is a **target**, not a ceiling:
|
|
|
|
|
+
|
|
|
|
|
+```
+OkHttp explore (shipped):  RealCall (full) + RealInterceptorChain (full)
+                         + CallServerInterceptor (full, 8.7k)
+                         + Bridge/Connect/Cache/… (full, ~4-5k each)   ← all ~same shape
+                         = ~28k, most of it redundant interceptor bodies
+```
|
|
|
|
|
+
|
|
|
|
|
+The agent only needs the **mechanism** (`RealInterceptorChain.proceed` iterating
|
|
|
|
|
+the chain) + the **contract** every interceptor implements + maybe one concrete
|
|
|
|
|
+example. The other five full bodies are padding — but only *because they're
|
|
|
|
|
+interchangeable*. On a diffuse question (Excalidraw's render pipeline:
|
|
|
|
|
+`mutateElement → … → renderStaticScene`), the off-spine files are **distinct
|
|
|
|
|
+steps**, and their bodies do real work — eliding them just makes the agent
|
|
|
|
|
+reconstruct them from signatures (more reasoning, net costlier; see "Dead ends").
|
|
|
|
|
+
|
|
|
|
|
+So the whole game is: **tell "interchangeable sibling" apart from "distinct
|
|
|
|
|
+step," cheaply.**
|
|
|
|
|
+
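The fill behavior described at the top of this section can be sketched as below. This is a minimal illustration, not the code in `handleExplore`: the `RankedFile` type, `fillToBudget`, `totalChars`, and the sample sizes are all hypothetical, and the whole-small-file rule is left out. It shows why dropping one file from the ranking does not shrink the output, since the next file takes the freed room.

```typescript
// Hypothetical shape: a relevant file and its rendered size in chars.
interface RankedFile {
  path: string;
  relevance: number;
  chars: number;
}

// Greedy fill: take files in relevance order while they fit within the budget.
function fillToBudget(files: RankedFile[], maxOutputChars: number): RankedFile[] {
  const picked: RankedFile[] = [];
  let used = 0;
  const byRelevance = [...files].sort((a, b) => b.relevance - a.relevance);
  for (const file of byRelevance) {
    if (used + file.chars > maxOutputChars) continue; // skip what does not fit
    picked.push(file);
    used += file.chars;
  }
  return picked;
}

function totalChars(files: RankedFile[]): number {
  return files.reduce((sum, f) => sum + f.chars, 0);
}

// Illustrative numbers only, not measurements from any repo.
const candidates: RankedFile[] = [
  { path: "A.kt", relevance: 9, chars: 9000 },
  { path: "B.kt", relevance: 8, chars: 8000 },
  { path: "C.kt", relevance: 7, chars: 5000 },
  { path: "D.kt", relevance: 6, chars: 5000 },
  { path: "E.kt", relevance: 5, chars: 5000 },
];

const before = totalChars(fillToBudget(candidates, 28000));
// Drop C.kt from the ranking: E.kt now fits and refills the budget.
const after = totalChars(
  fillToBudget(candidates.filter((f) => f.path !== "C.kt"), 28000),
);
```

With these numbers `before` and `after` are both 27000: removing a file changes which bodies are shown, not how many chars are emitted.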
|
|
|
|
|
+## The two-condition gate
|
|
|
|
|
+
|
|
|
|
|
+A file is skeletonized iff **both** hold (and `CODEGRAPH_ADAPTIVE_EXPLORE != 0`):
|
|
|
|
|
+
|
|
|
|
|
+1. **Off the flow spine.** `buildFlowFromNamedSymbols` now returns its path node
|
|
|
|
|
+ set (`pathNodeIds`) in addition to the rendered Flow text. A file with any
|
|
|
|
|
+ symbol on that traced chain is "on-spine" and always kept full — that's the
|
|
|
|
|
+ mechanism + the exemplar the agent is actually tracing through. (Gated on a
|
|
|
|
|
+ spine existing at all; if there's no spine, nothing skeletonizes.)
|
|
|
|
|
+
|
|
|
|
|
+2. **A polymorphic sibling.** The file's class `implements`/`extends` a supertype
|
|
|
|
|
+ that has **≥ 3 implementers** (`MIN_SIBLINGS`). This is the signal that the
|
|
|
|
|
+ class is one of many *interchangeable* implementations rather than a unique
|
|
|
|
|
+ step. Computed from real `implements`/`extends` edges (see "Why this signal"),
|
|
|
|
|
+ cached per-supertype so it stays a handful of edge lookups.
|
|
|
|
|
+
|
|
|
|
|
+`RealInterceptorChain` *also* implements `Interceptor`, but its `proceed` is
|
|
|
|
|
+**on the spine** → kept full (condition 1 fails). `RealCall` is off-spine but
|
|
|
|
|
+implements nothing with ≥3 impls → kept full (condition 2 fails). The other
|
|
|
|
|
+interceptors are off-spine **and** ≥3-impl siblings → skeletonized. Exactly right.
|
|
|
|
|
+
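As a rough sketch, the gate above reduces to a short predicate. The `CandidateFile` shape, the `Env` alias, and the injected `isPolymorphicSibling` callback are assumptions made for illustration, and reading the flag as "anything other than `0` means on" is inferred from the escape hatch rather than confirmed from `adaptiveExploreEnabled()`.

```typescript
// Hypothetical, simplified view of a file that explore is about to render.
interface CandidateFile {
  path: string;
  symbolIds: string[]; // graph node ids of the symbols defined in the file
}

// The environment is passed in so the sketch does not depend on a runtime.
type Env = Record<string, string | undefined>;

// Assumed flag semantics: on unless the variable is exactly "0".
function adaptiveEnabled(env: Env): boolean {
  return env["CODEGRAPH_ADAPTIVE_EXPLORE"] !== "0";
}

function shouldSkeletonize(
  file: CandidateFile,
  pathNodeIds: Set<string>,
  isPolymorphicSibling: (file: CandidateFile) => boolean,
  env: Env,
): boolean {
  if (!adaptiveEnabled(env)) return false;
  if (pathNodeIds.size === 0) return false; // no spine: nothing skeletonizes
  const onSpine = file.symbolIds.some((id) => pathNodeIds.has(id));
  if (onSpine) return false; // condition 1 fails: keep full
  return isPolymorphicSibling(file); // condition 2 decides
}

// Illustrative data mirroring the OkHttp cases discussed above.
const spine = new Set<string>(["RealInterceptorChain.proceed"]);
const siblingPaths = ["RealInterceptorChain.kt", "BridgeInterceptor.kt"];
const siblingCheck = (f: CandidateFile): boolean =>
  siblingPaths.indexOf(f.path) !== -1;

const chainFile: CandidateFile = {
  path: "RealInterceptorChain.kt",
  symbolIds: ["RealInterceptorChain.proceed"],
};
const realCallFile: CandidateFile = {
  path: "RealCall.kt",
  symbolIds: ["RealCall.execute"],
};
const bridgeFile: CandidateFile = {
  path: "BridgeInterceptor.kt",
  symbolIds: ["BridgeInterceptor.intercept"],
};
```

Only `bridgeFile` passes both conditions here. `chainFile` is kept full because it is on the spine, and `realCallFile` because it is not a sibling.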
|
|
|
|
|
+## Why "shared supertype with ≥3 implementers" is the signal
|
|
|
|
|
+
|
|
|
|
|
+The thing that makes OkHttp's interceptors interchangeable is precisely that
|
|
|
|
|
+they're **N implementations of one interface**, invoked polymorphically. That is
|
|
|
|
|
+a *structural* property the graph records as `implements`/`extends` edges:
|
|
|
|
|
+
|
|
|
|
|
+```
+14 classes ──implements──▶ Interceptor   (BridgeInterceptor, CacheInterceptor,
+                                          CallServerInterceptor, … )
+```
|
|
|
|
|
+
|
|
|
|
|
+Excalidraw's `renderStaticScene`, `Scene`, `Collab` share **no** common
|
|
|
|
|
+supertype — the ≥3-implementer query returns nothing for them. So the signal
|
|
|
|
|
+cleanly separates the two repos, and (validated below) leaves every non-sibling
|
|
|
|
|
+flow untouched.
|
|
|
|
|
+
|
|
|
|
|
+The `≥ 3` threshold matters: 1:1 "service interface → single impl" pairs (the
|
|
|
|
|
+common Spring/Java shape) are **not** siblings and stay full. Only genuine
|
|
|
|
|
+many-impl families (interceptor chains, strategy/visitor families, codec
|
|
|
|
|
+registries) trip the gate.
|
|
|
|
|
+
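One way to picture the sibling signal is a per-supertype implementer count with a small cache. The sketch below assumes a flat edge list with `kind`, `from`, and `to` fields and a factory named `makeSiblingCheck`. Both are hypothetical and may not match the graph API behind `isPolymorphicSibling()`.

```typescript
const MIN_SIBLINGS = 3;

// Hypothetical edge shape: `from` is the subtype, `to` is the supertype.
interface Edge {
  kind: string;
  from: string;
  to: string;
}

function isSubtypeEdge(e: Edge): boolean {
  return e.kind === "implements" || e.kind === "extends";
}

// Returns a checker that counts distinct implementers per supertype once
// and caches the result, so repeated lookups for one supertype are cheap.
function makeSiblingCheck(edges: Edge[]): (classId: string) => boolean {
  const cache = new Map<string, number>();

  const implementerCount = (supertypeId: string): number => {
    const cached = cache.get(supertypeId);
    if (cached !== undefined) return cached;
    const implementers = new Set<string>();
    for (const e of edges) {
      if (isSubtypeEdge(e) && e.to === supertypeId) implementers.add(e.from);
    }
    cache.set(supertypeId, implementers.size);
    return implementers.size;
  };

  // A class is a sibling if any of its supertypes has >= MIN_SIBLINGS implementers.
  return (classId: string): boolean =>
    edges.some(
      (e) =>
        isSubtypeEdge(e) &&
        e.from === classId &&
        implementerCount(e.to) >= MIN_SIBLINGS,
    );
}

// Illustrative edges: a 3-member family, a 1:1 service pair, and a call edge.
const sampleEdges: Edge[] = [
  { kind: "implements", from: "BridgeInterceptor", to: "Interceptor" },
  { kind: "implements", from: "CacheInterceptor", to: "Interceptor" },
  { kind: "implements", from: "CallServerInterceptor", to: "Interceptor" },
  { kind: "implements", from: "UserServiceImpl", to: "UserService" },
  { kind: "calls", from: "RealCall", to: "Interceptor" },
];
const isSibling = makeSiblingCheck(sampleEdges);
```

Here `isSibling("BridgeInterceptor")` is true, while the 1:1 `UserServiceImpl` pair and the call-only `RealCall` are not siblings.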
|
|
|
|
|
+## Skeleton rendering
|
|
|
|
|
+
|
|
|
|
|
+For a skeletonized file we emit the class + member **signature lines** (not
|
|
|
|
|
+bodies). Because a symbol node's `startLine` can point at a decorator/annotation
|
|
|
|
|
+(`@Throws`, `@Override`, `@objc`), we scan forward up to 4 lines for the line
|
|
|
|
|
+that actually *names* the symbol, so the skeleton shows the real signature:
|
|
|
|
|
+
|
|
|
|
|
+````
+#### …/CallServerInterceptor.kt — CallServerInterceptor, intercept, … · skeleton (signatures only; Read for a full body)
+```kotlin
+30 object CallServerInterceptor : Interceptor {
+32 override fun intercept(chain: Interceptor.Chain): Response {
+194 private fun shouldIgnoreAndWaitForRealResponse(code: Int): Boolean =
+```
+````
|
|
|
|
|
+
|
|
|
|
|
+The header still lists the file's symbols and says `Read for a full body`, so the
|
|
|
|
|
+agent can pull one specific implementation if it truly needs it.
|
|
|
|
|
+
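The forward scan for the signature line could look roughly like this. `findSignatureLine` and `MAX_SCAN_LINES` are hypothetical names, and the plain substring match is a simplifying assumption, so the real check for "the line that names the symbol" may be stricter.

```typescript
const MAX_SCAN_LINES = 4;

// `startLine` is 1-based. Returns the 1-based number of the first line, from
// startLine through startLine + MAX_SCAN_LINES, that contains the symbol
// name. Falls back to startLine when no line in that window matches.
function findSignatureLine(lines: string[], startLine: number, name: string): number {
  const last = Math.min(lines.length, startLine + MAX_SCAN_LINES);
  for (let n = startLine; n <= last; n++) {
    const line = lines[n - 1] ?? "";
    if (line.indexOf(name) !== -1) return n;
  }
  return startLine;
}

// Illustrative Kotlin snippet where the node starts at an annotation.
const kotlinSource = [
  "@Throws(IOException::class)",
  "override fun intercept(chain: Interceptor.Chain): Response {",
  "  return chain.proceed(chain.request())",
  "}",
];

const sigLine = findSignatureLine(kotlinSource, 1, "intercept");
const fallbackLine = findSignatureLine(kotlinSource, 1, "missingSymbol");
```

For this snippet `sigLine` is 2 (the annotation on line 1 is skipped) and `fallbackLine` stays at 1.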
|
|
|
|
|
+## Validation
|
|
|
|
|
+
|
|
|
|
|
+Headless `claude -p`, Opus 4.8, median of 3, WITH-CodeGraph adaptive **on vs off**
|
|
|
|
|
+(isolates the flag). Probe sizes from `scripts/agent-eval/probe-explore.mjs`.
|
|
|
|
|
+
|
|
|
|
|
+| Repo | explore OFF→ON | skeletons | A/B cost (ON vs shipped) | reads |
|
|
|
|
|
+|---|---|---|---|---|
|
|
|
|
|
+| **OkHttp** | 28.5k → **16.6k** | 6 | **$0.413 vs $0.462** (~28% below native's ~$0.57) | not raised (1 vs 3) |
|
|
|
|
|
+| Excalidraw | 28.6k → 28.6k | 0 | byte-identical → neutral | — |
|
|
|
|
|
+| Tokio | identical | 0 | neutral | — |
|
|
|
|
|
+| Django | identical | 0 | neutral | — |
|
|
|
|
|
+| VS Code | identical | 0 | neutral | — |
|
|
|
|
|
+| Gin | identical | 0 | neutral | — |
|
|
|
|
|
+
|
|
|
|
|
+The decisive check (the open risk of skeletonization) **passed**: skeletonizing
|
|
|
|
|
+the off-spine interceptors did **not** push the agent to Read them back — reads
|
|
|
|
|
+stayed flat (lower, if anything). And the 5 non-sibling repos are byte-identical
|
|
|
|
|
+with the flag toggled, so default-on carries no regression for them.
|
|
|
|
|
+
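For reference, the OkHttp percentages follow from the table's figures by simple ratios. The helper names below are made up for this check, and the native cost is the approximate ~$0.57 quoted above, so the first result is approximate as well.

```typescript
// Fraction saved when going from `baseline` to `value` (positive = cheaper or smaller).
function savings(value: number, baseline: number): number {
  return 1 - value / baseline;
}

// Convert a fraction to a percentage rounded to one decimal place.
function toPercent(fraction: number): number {
  return Math.round(fraction * 1000) / 10;
}

const onVsNative = toPercent(savings(0.413, 0.57)); // about 27.5, the "~28% cheaper"
const onVsShipped = toPercent(savings(0.413, 0.462)); // about 10.6
const exploreSizeCut = toPercent(savings(16.6, 28.5)); // about 41.8
```

So the flag trims the OkHttp explore output by roughly 42% and the median session cost by roughly 11% relative to the shipped behavior.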
|
|
|
|
|
+## Dead ends (don't re-attempt these)
|
|
|
|
|
+
|
|
|
|
|
+1. **Demote/rank low-value files** (e.g. broaden `isLowValuePath` to drop
|
|
|
|
|
+ `*-testing-support/` fixtures). Improves *content quality* but **not size** —
|
|
|
|
|
+ explore refills the freed budget with other full bodies (28,478 → 28,424 chars).
|
|
|
|
|
+ Ranking ≠ shrinking; you must *skeletonize* to shrink.
|
|
|
|
|
+2. **Gate on entry-node membership.** A precise symbol-bag explore query *names*
|
|
|
|
|
+ every chain participant, so they're all "entry nodes" — no separation, nothing
|
|
|
|
|
+ skeletonizes.
|
|
|
|
|
+3. **Rely on interface-impl synthesizer edges** (`synthesizedBy:'interface-impl'`)
|
|
|
|
|
+ for the sibling signal. They were **not** created for OkHttp's `Interceptor`
|
|
|
|
|
+ (a Kotlin `fun interface`), so the signal must come from the real
|
|
|
|
|
+ `implements`/`extends` edges, not synth edges.
|
|
|
|
|
+4. **A plain "core-floor" gate** (keep first N full, skeletonize the rest) —
|
|
|
|
|
+ skeletonized Excalidraw's *distinct* steps → **+17% cost regression**. The
|
|
|
|
|
+ sibling condition is what makes it safe.
|
|
|
|
|
+
|
|
|
|
|
+## Code
|
|
|
|
|
+
|
|
|
|
|
+- `src/mcp/tools.ts`
|
|
|
|
|
+ - `adaptiveExploreEnabled()` — the flag (default on).
|
|
|
|
|
+ - `buildFlowFromNamedSymbols()` — now returns `{ text, pathNodeIds }`.
|
|
|
|
|
+ - `handleExplore()` — `isPolymorphicSibling()` helper (supertype ≥3-impl
|
|
|
|
|
+ detection, cached) + the skeleton branch in the source-section loop.
|
|
|
|
|
+
|
|
|
|
|
+## Frontier / future work
|
|
|
|
|
+
|
|
|
|
|
+- **No regression test yet** for the skeletonization (a fixture with ≥3 interface
|
|
|
|
|
+ impls + a flow spine asserting off-spine siblings skeletonize, distinct steps
|
|
|
|
|
+ stay full, and `CODEGRAPH_ADAPTIVE_EXPLORE=0` disables it). Recommended before or with merge.
|
|
|
|
|
+- **Non-interface sibling families** (Go `HandlerFunc` slices, function-pointer
|
|
|
|
|
+ registries) aren't caught — they have no `implements`/`extends` edge. Gin's
|
|
|
|
|
+ middleware chain, for instance, doesn't trip the gate (its handlers are funcs,
|
|
|
|
|
+ not interface impls).
|
|
|
|
|
+- **Exemplar selection** when *no* interceptor is on the spine: today all siblings
|
|
|
|
|
+ skeletonize and the agent leans on the interface contract; showing one as a
|
|
|
|
|
+ forced exemplar might read slightly better (untested).
|