
feat(mcp): codegraph_node reads files like the Read tool — offset/limit, byte-parity (#738)

Makes codegraph_node a faster drop-in replacement for Read on indexed source files (file-read mode: `<n>\t<line>` like Read, offset/limit, plus a blast-radius header; symbolsOnly for the structural map). Fixes the old file-view dropping imports and line numbers. #383/#527 protections preserved. Validated by A/B: explore/node already return source with line numbers, so Read=0 in runs where the agent uses them. Includes the A/B eval harness scripts. Full suite green (1270).
Colby Mchenry, 2 weeks ago
parent
commit 1983590533

+ 1 - 1
CHANGELOG.md

@@ -16,7 +16,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
 ### New Features
 
-- The `codegraph_node` MCP tool now accepts a file path on its own (no symbol) and returns that file's symbols plus which files depend on it — and the full source with `includeCode`. It's a drop-in upgrade for reading a source file: the same content, plus the file's blast radius, in one call. The agent-facing guidance was also retuned so assistants reach for codegraph while *implementing* a change (not only when answering questions), since one codegraph call returns more accurate context for fewer tokens than re-reading files.
+- The `codegraph_node` MCP tool can now **read a whole source file like the built-in Read tool — only faster, served from the index**. Pass a file path with no symbol and it returns that file's current source with line numbers (the same `<n>⇥<line>` shape Read produces, so an assistant can edit straight from it), narrowable with `offset`/`limit` exactly like Read, plus a one-line note of which files depend on it (the file's blast radius). Use it anywhere you'd reach for Read on an indexed source file. Pass `symbolsOnly: true` for just the file's structure. Configuration/data files (`.yml` / `.properties`) are summarized by key only, never dumped, so secrets in them are never surfaced. The agent-facing guidance was also retuned so assistants reach for codegraph while *implementing* a change (not only when answering questions), since one codegraph call returns the same bytes plus the blast radius, faster than re-reading the file.
 - New `codegraph upgrade` command updates CodeGraph to the latest release in place — it detects how you installed (the standalone `install.sh` / `install.ps1` bundle, npm, or npx) and does the right thing for each, on macOS, Linux, and Windows. Use `codegraph upgrade --check` to see whether an update is available without installing, or `codegraph upgrade <version>` to move to a specific version. After upgrading it reminds you to re-index your projects so they pick up the newer engine's improvements. (#679)
 - `codegraph status` now flags when a project's index was built by an older engine than the one you're running and recommends re-indexing (also surfaced in `codegraph status --json`), so you know when a `codegraph index -f` or `codegraph sync` will add coverage a newer release introduced.
 - Cross-file impact and blast-radius coverage now spans **all 22 supported languages and 14 web frameworks**, each validated on a real-world repo — see the new coverage table in the README. This release ships the cross-file resolution behind it, including Lua and Luau `require`, Shopify OS 2.0 Liquid section templates, Delphi form code-behind, Rust cross-module calls and Rocket route macros, Swift Fluent relationships, and the SvelteKit / Nuxt / Vapor / Axum route conventions. The residual everywhere is genuine static-analysis frontiers (runtime dispatch, reflection / DI, framework-convention entry points), never hidden.

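The windowing rules this entry and the design notes describe (1-based `offset`, `limit` as a maximum line count, a 2000-line default cap, an explicit window note instead of a silent cut, a message for an offset past EOF) are compact enough to sketch. This is a minimal TypeScript sketch under those assumptions; `formatLikeRead`, `ReadWindow`, and the note wording are hypothetical names for illustration, not the code in `handleFileView`.

```typescript
// Hypothetical sketch: number lines as "<n>\t<line>" (1-based, unpadded) with
// Read-style offset/limit windowing. Names, the default cap and the note text
// are assumptions for illustration, not codegraph's handleFileView.

const DEFAULT_LIMIT = 2000; // assumed cap, mirroring "capped at 2000 lines like Read"

interface ReadWindow {
  text: string;  // numbered lines plus an optional pagination note
  start: number; // first line shown (1-based)
  end: number;   // last line shown (1-based, inclusive)
  total: number; // total lines in the file
}

function formatLikeRead(source: string, offset = 1, limit = DEFAULT_LIMIT): ReadWindow {
  // Split on "\n" without dropping the trailing empty string, so a file that
  // ends in "\n" gets a final numbered empty line ("trailing empty line kept").
  const lines = source.split("\n");
  const total = lines.length;
  const start = Math.max(1, Math.floor(offset));
  if (start > total) {
    // Report an out-of-range offset instead of throwing.
    return { text: `(offset ${start} is past the end of the file; it has ${total} lines)`, start, end: start - 1, total };
  }
  const end = Math.min(total, start + Math.max(1, Math.floor(limit)) - 1);
  const body = lines
    .slice(start - 1, end)
    .map((line, i) => `${start + i}\t${line}`) // unpadded number, a tab, the line verbatim
    .join("\n");
  // Add an explicit note only when the window does not cover the whole file.
  const note = start > 1 || end < total
    ? `\n(lines ${start}–${end} of ${total}; pass offset/limit to read more)`
    : "";
  return { text: body + note, start, end, total };
}
```

For a file shaped like the test's `big.ts`, `formatLikeRead(src, 1000, 3).text` starts with `1000\t  const v998 = 998;` and ends with the window note, which is the contract the offset/limit test pins down.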
+ 66 - 17
__tests__/node-file-view.test.ts

@@ -1,7 +1,9 @@
 /**
- * codegraph_node FILE-VIEW mode: a bare `file` (no `symbol`) returns that file's
- * symbol map + graph role (dependents), and verbatim bodies with includeCode —
- * a Read replacement for a source file that also surfaces the blast radius.
+ * codegraph_node FILE READ mode: a `file` with no `symbol` reads that file like
+ * the Read tool — current source with `<n>\t<line>` numbering (byte-for-byte
+ * Read's shape), narrowable with offset/limit — plus a one-line blast-radius
+ * header. `symbolsOnly` returns the structural map instead. Config/data files
+ * are summarized by key, never dumped (#383).
  */
 import { describe, it, expect, beforeEach, afterEach } from 'vitest';
 import * as fs from 'fs';
@@ -24,9 +26,23 @@ describe('codegraph_node file-view (Read replacement)', () => {
     );
     fs.writeFileSync(
       path.join(dir, 'src', 'b.ts'),
-      "import { helper } from './a';\nexport function useHelper() { return helper(2); }\n",
+      "import { helper } from './a';\n\n// a comment between symbols\nconst SETTING = 7;\nexport function useHelper() { return helper(2) + SETTING; }\n",
     );
-    cg = CodeGraph.initSync(dir, { config: { include: ['**/*.ts'], exclude: [] } });
+    // A config/data file (#383): its values may be secrets and must never be
+    // dumped verbatim by the file-view.
+    fs.writeFileSync(
+      path.join(dir, 'src', 'application.properties'),
+      'spring.datasource.password=SUPERSECRET123\nserver.port=8080\n',
+    );
+    // A large file: exceeds the file-view line budget, so it must be windowed
+    // honestly (not silently truncated).
+    fs.writeFileSync(
+      path.join(dir, 'src', 'big.ts'),
+      'export function big() {\n' +
+        Array.from({ length: 2000 }, (_, i) => `  const v${i} = ${i};`).join('\n') +
+        '\n  return 0;\n}\n',
+    );
+    cg = CodeGraph.initSync(dir, { config: { include: ['**/*.ts', '**/*.properties'], exclude: [] } });
     await cg.indexAll();
     h = new ToolHandler(cg);
   });
@@ -39,21 +55,54 @@ describe('codegraph_node file-view (Read replacement)', () => {
   const text = async (args: Record<string, unknown>): Promise<string> =>
     (await h.execute('codegraph_node', args)).content.map((c) => c.text).join('\n');
 
-  it("a bare file (no symbol) returns the file's symbols + dependents", async () => {
+  it('reads a whole file like Read by default — `<n>\\t<line>` lines (no pad), imports + gaps included', async () => {
+    const out = await text({ file: 'b.ts' }); // no includeCode needed — content is the default
+    // Byte-for-byte Read shape: line 1 is "1<TAB>import …", NOT space-padded.
+    expect(out).toMatch(/^1\timport \{ helper \} from '\.\/a';$/m);
+    expect(out).toContain('// a comment between symbols'); // inter-symbol gap (Read has it; old reconstruction dropped it)
+    expect(out).toContain('const SETTING = 7'); // top-level statement
+    expect(out).toContain('useHelper'); // the symbol body too
+    expect(out).not.toContain('```'); // Read has no code fence; neither do we
+  });
+
+  it('leads with a one-line blast-radius header (the value-add over Read)', async () => {
     const out = await text({ file: 'a.ts' });
-    expect(out).toContain('src/a.ts');
-    expect(out).toContain('helper');
-    expect(out).toContain('Widget');
-    expect(out).toMatch(/depended on by 1 file/i);
-    expect(out).toContain('src/b.ts'); // the dependent file (blast radius)
+    expect(out).toMatch(/used by 1 file: src\/b\.ts/); // a.ts is imported by b.ts
+    expect(out).toContain('return x + 1'); // still returns the source
+  });
+
+  it('offset/limit narrow the window exactly like Read', async () => {
+    const out = await text({ file: 'big.ts', offset: 1000, limit: 3 });
+    // Window starts at the requested line, numbered exactly: "1000<TAB>  const v998 = 998;"
+    expect(out).toMatch(/^1000\t {2}const v998 = 998;$/m);
+    expect(out).not.toMatch(/^1\t/m); // line 1 is NOT shown
+    expect(out).toMatch(/lines 1000[–-]1002 of \d+/); // honest pagination note
+  });
+
+  it('an offset past EOF is reported, not a crash', async () => {
+    const out = await text({ file: 'a.ts', offset: 9999 });
+    expect(out).toMatch(/past the end/i);
+  });
+
+  it('paginates a large file honestly by default — "lines 1–N of TOTAL", never a silent truncate', async () => {
+    const out = await text({ file: 'big.ts' });
+    expect(out).toMatch(/lines 1[–-]\d+ of \d+/); // explicit window note
+    expect(out).not.toContain('(output truncated)'); // not the generic 15k chop
+    expect(out).toMatch(/^1\texport function big/m); // the head of the window is real source
   });
 
-  it('resolves by basename and returns verbatim bodies with includeCode', async () => {
-    const out = await text({ file: 'a.ts', includeCode: true });
-    expect(out).toContain('return x + 1'); // helper body
-    expect(out).toContain('class Widget'); // class body, verbatim
-    // It must NOT steer the agent back to Read — it is the Read replacement.
-    expect(out.toLowerCase()).not.toContain('read `src/a.ts`');
+  it('does NOT dump a config/data file (yaml/properties) — #383 secret safety', async () => {
+    const out = await text({ file: 'application.properties' });
+    expect(out).not.toContain('SUPERSECRET123'); // the value never reaches the agent
+    expect(out.toLowerCase()).toMatch(/config|values withheld/);
+  });
+
+  it('symbolsOnly returns the structural map, not the source', async () => {
+    const out = await text({ file: 'a.ts', symbolsOnly: true });
+    expect(out).toContain('### Symbols');
+    expect(out).toContain('helper');
+    expect(out).toContain('Widget');
+    expect(out).not.toContain('return x + 1'); // bodies are NOT included in the map
   });
 
   it('still works as a normal symbol lookup (no regression)', async () => {

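The #383 test above only pins the outcome: `SUPERSECRET123` never appears, and the output says it is config or that values were withheld. One way to meet that contract is to keep only keys at parse time, sketched below in TypeScript for `.properties`. `summarizePropertiesKeys` and its output format are hypothetical, and the parser deliberately ignores `.properties` escaping and line continuations.

```typescript
// Hypothetical sketch of "summarize by key, never dump" for a .properties file.
// Only keys leave this function; values are discarded while parsing.
function summarizePropertiesKeys(text: string): string {
  const keys: string[] = [];
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    // Skip blank lines and comments ('#' or '!' start a comment in .properties).
    if (line === "" || line.startsWith("#") || line.startsWith("!")) continue;
    // The key ends at the first '=', ':' or whitespace (escapes ignored here).
    const key = /^([^=:\s]+)/.exec(line)?.[1];
    if (key) keys.push(key);
  }
  return `config file: ${keys.length} keys (values withheld)\n` + keys.map((k) => `- ${k}`).join("\n");
}
```

Because values are dropped before any output string is built, a later change to the summary format cannot leak them by accident. A YAML variant would need a real YAML parser to find keys reliably.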
+ 136 - 0
docs/design/agent-codegraph-adoption.md

@@ -0,0 +1,136 @@
+# Getting agents to actually use codegraph (not Read) — design notes & handoff
+
+> Working doc for a fresh session. Two problems to crack:
+> **(P1)** agents still reach for `Read`/`grep` during implementation instead of codegraph;
+> **(P2)** on startup the codegraph MCP server can be `pending` when the agent's first turn fires, so the agent runs with *no* codegraph at all.
+>
+> Read `codegraph/CLAUDE.md` → "Retrieval performance & dynamic-dispatch coverage" first — it's the doctrine these ideas must respect.
+
+---
+
+## Context — what already shipped (so you don't repeat it)
+
+- **#733 (`7175dc4`)** — reframed the agent-facing steering (`src/mcp/server-instructions.ts` + the `codegraph_node`/`codegraph_explore` descriptions in `src/mcp/tools.ts`) to cover *implementation*, not just Q&A; and added **file-view mode**: `codegraph_node` now accepts a bare `file` (no `symbol`) → returns that file's symbol map + its dependents (blast radius) + verbatim bodies (`includeCode`). `handleFileView` in `src/mcp/tools.ts`.
+- **Clean A/B result** (new build vs baseline build, both codegraph-connected, same fully-implemented task — `kindExclude` added to `codegraph_search`):
+  - **baseline:** 0 codegraph calls, 8 Reads (agent *ignored* available codegraph).
+  - **new:** 2 `codegraph_explore` calls, 5 Reads.
+  - So the reframe *did* move tool-choice — but the agent used `codegraph_explore`, **never the file-view**, and still Read 5×. n=1/arm.
+- **Eval harness fix** (`#735`): nested attach is a *startup-latency* problem, not a hard block. `scripts/agent-eval/ab-new-vs-baseline.sh` now pre-warms a daemon + skips the re-exec; use it (run non-nested for cleanest results).
+
+**Doctrine constraints (from CLAUDE.md — do not relitigate):**
+- *Adapt the tool to the agent.* Changing tool descriptions / `server-instructions.ts` is **low-salience** and has *regressed* wall-clock before. Wording alone won't reliably move tool-choice.
+- *New tools fare worse than extending an existing one* (the agent under-picks even `trace`; `codegraph_context` was removed).
+- The real levers that landed historically: **coverage** (more flows connect statically → `explore` surfaces them) and **sufficiency** (output complete enough that the agent *stops* reading).
+- The optimization target is **wall-clock + tool-call count + Read=0**, not token cost (cost is lower as a side effect).
+
+---
+
+## P1 — Agents under-use codegraph during implementation
+
+### STATUS — 2026-06-08 (RESOLVED via Read-parity, not a hook)
+
+**The fix: make `codegraph_node` read a file *exactly like the Read tool*, only
+faster — so the agent reaches for it naturally. No forcing.** The owner's steer
+settled the direction: *"codegraph should be able to Read just like the Read
+tool… make it as good as Read. Read is slow and old; querying the index is fast.
+You keep diverging away from using codegraph rather than pursuing the fix."*
+
+**DONE — `handleFileView` (`src/mcp/tools.ts`) is now full Read parity:**
+- A `file` with no `symbol` returns the file's current source numbered
+  **byte-for-byte the way Read does — `<n>\t<line>`, no padding, trailing empty
+  line kept** (verified by reading the same file with both and diffing). The only
+  addition is a **one-line blast-radius header** (`used by N files: …`).
+- **`offset` / `limit` mean exactly what they do on Read** (1-based start; max
+  lines; default whole file capped at 2000 lines like Read). Large files paginate
+  honestly (`(lines X–Y of N — pass offset/limit…)`), never the 15k `truncateOutput` chop.
+- Content is the **default** (no `includeCode` needed); `symbolsOnly: true` returns
+  the cheap structural map instead. Security preserved: `yaml`/`properties`
+  summarized by key, never dumped (#383); reads via `validatePathWithinRoot` (#527).
+- Tests: `__tests__/node-file-view.test.ts` (9, incl. strict format parity
+  `^1000\t  const v998 = 998;` and unpadded `^1\timport …`). Full suite green
+  (1270). Descriptions / `server-instructions.ts` / CHANGELOG reframed: "read a
+  source file with codegraph_node instead of Read — same bytes, faster."
+
+**The hook (idea 1) — A/B'd and REJECTED. Do not ship.** Kept only as an eval
+artifact (`scripts/agent-eval/redirect-read-hook.sh` + `ab-hook.sh`).
+- Clean A/B (2 runs/arm, devpit "add `dp ping`, build it"; both arms codegraph-attached):
+  - **nohook:** 0 codegraph calls, 1 Read, **5–7 tool calls, 6–8 turns, 55–77s.** (Reproduces P1: agent ignores codegraph — but read-once-and-edit is *efficient* here.)
+  - **hook (deny-redirect):** 0 *successful* Reads + 1 file-view call (parity worked, edit compiled), but **8–9 tool calls, 9–10 turns, 200–239s**, and the agent **fought the deny** — `ToolSearch` to find the tool, reflexive re-Read (denied), then **`Bash python3` to read the file around the block.**
+  - Verdict: a blanket Read-deny **regresses the target metrics (~2× tool calls, more turns) on a simple edit** and the agent routes around it. Forcing is the wrong lever; making the tool genuinely better than Read is the right one.
+- If routing is ever revisited: not a blanket hook. Either a narrow trigger (large
+  files only / after-N-reads) **with a clean A/B on a Read-heavy multi-file task**
+  (the hook's best case, untested), or just keep widening coverage + sufficiency.
+
+---
+
+**Symptom:** even with codegraph attached + the new steering, the agent reflexively `Read`s/`grep`s mid-implementation, and never reaches for the file-view. Descriptions can't fix this (low-salience wall).
+
+### Ideas, ranked by expected leverage
+
+1. **PreToolUse(Read/Grep) hook that redirects to codegraph** — *highest leverage; the only channel that actually changes behavior.*
+   - Claude Code **hooks** can intercept a tool call and inject context or block it — unlike descriptions, this is *not* low-salience. We already have `scripts/agent-eval/block-read-hook.sh` + `hook-settings.json` (used to force Read=0 in evals).
+   - Ship a **recommended (opt-in) hook**: on `Read` (or `Grep`) of a path that's *indexed*, inject "this file is indexed — `codegraph_node {file}` returns it + its blast radius for fewer tokens; treat its output as already-Read." Soft nudge (don't hard-block, or it'll frustrate users on configs/docs codegraph doesn't index).
+   - The installer (`src/installer/targets/claude.ts`) could offer to add this hook (opt-in, like the auto-allow permissions).
+   - **Validate** with `ab-new-vs-baseline.sh` (Read count, with vs without the hook). This is the experiment most likely to move the needle.
+   - Open Qs: how to know a path is indexed from inside a hook (query `codegraph files`/`status`, or a fast local check against `.codegraph`); avoiding noise on non-indexed files; per-language false positives.
+
+2. **Sufficiency: make the file-view the obvious Read replacement so the agent *wants* it.**
+   - The A/B showed the agent never passed a `file` to `codegraph_node`. Why? It doesn't think "Read this file" → "codegraph_node file=X". Investigate: is the file-view's value (symbols + dependents + bodies) actually *better than Read* for the agent's next step (an `Edit`)? It returns bodies — but does it return enough surrounding context to `Edit` confidently? If not, the agent Reads anyway.
+   - Consider: when the agent *does* Read an indexed file, is there a way to make codegraph's prior `explore`/`node` output have *already* given it what it needed? (i.e. fix the upstream sufficiency, not the Read itself.)
+
+3. **Coverage — the durable lever.** Every flow that connects statically is one the agent doesn't Read to reconstruct. Keep closing dynamic-dispatch gaps (`src/resolution/`). Less about "stop Reading," more about "never need to."
+
+4. **Naming / affordance experiments (low confidence, cheap).** The file-view is buried inside `codegraph_node`. A dedicated, obviously-named affordance might get picked more — *but* "new tools fare worse," so this likely loses. If tried, A/B it; don't assume.
+
+**Recommendation:** prototype **idea 1 (the Read-redirect hook)** and A/B it. It's the one lever with a real chance of moving behavior. Everything else is incremental.
+
+---
+
+## P2 — Agent runs without codegraph because the server is `pending` at startup
+
+**Symptom:** `serve --mcp` isn't ready when the agent's first turn fires (the host marks the MCP server `status:"pending"` / 0 tools), so the agent starts with Read/grep and never uses codegraph. We hit this hard in nested evals (~2-3s startup racing the agent's first turn); **real users hit a milder version**: the first query of a session may run without codegraph.
+
+### Root cause
+`serve --mcp` does a `--liftoff-only` **re-exec** (for a node memory flag) **and** spawns/binds a detached **daemon** before tools are usable. Under load, the two together can exceed the host's MCP-startup window. (`CODEGRAPH_WASM_RELAUNCHED=1` skips the re-exec and pre-warming a daemon removes the bind latency; both are proven in `ab-new-vs-baseline.sh`. But a real user can't pre-warm.)
+
+### Ideas, ranked
+
+1. **CODEGRAPH-SIDE — expose the static tool list INSTANTLY, decoupled from the daemon. *Biggest shippable win; helps every user.***
+   - Hypothesis: the host marks codegraph `pending` because `tools/list` (tool exposure) waits on the daemon connect. The local handshake already answers `initialize` fast (~107ms; `runLocalHandshakeProxy` in `src/mcp/proxy.ts`, `getStaticTools` is imported there). **Investigate: does `serve --mcp` answer `tools/list` *locally and instantly* from `getStaticTools`, or does it forward it to the still-connecting daemon?** If the latter, decouple it: advertise the static tools the moment the client asks, mark connected, and resolve the daemon in the background for actual tool *calls*.
+   - Verify with: `printf '<initialize>\n<initialized>\n<tools/list>\n' | node dist/bin/codegraph.js serve --mcp --path <repo>` and time the `tools/list` response, daemon-mode vs in-process. In-process answered in ~165ms; daemon-mode is the suspect.
+   - If this lands, `pending`-at-startup largely disappears without any host change.
+
+2. **CODEGRAPH-SIDE — speed/skip the re-exec on the MCP serve path.** The re-exec exists for a V8 memory flag (`src/extraction/wasm-runtime-flags.ts`, `RELAUNCH_GUARD_ENV = CODEGRAPH_WASM_RELAUNCHED`). For MCP serving on a normal repo the flag may be unnecessary, or settable without a full process re-exec. Removing one process spawn from the cold path shaves the startup window.
+
+3. **CODEGRAPH-SIDE — a SessionStart hook that pre-warms the daemon.** Ship an opt-in Claude Code `SessionStart` hook (installer-added) that spawns/warms the daemon for the project at session start, so it's bound before the first query. Mitigation if (1) is hard.
+
+4. **HOST-SIDE — "wait/retry on pending" — this is what you asked about, but it's a Claude Code (MCP client) behavior, not codegraph's to fix.** codegraph can't make the agent retry. Options: (a) raise it with Anthropic as an MCP-client improvement (don't let the agent's first turn proceed until configured MCP servers finish connecting, or retry `pending` servers); (b) note `MCP_TIMEOUT` exists but did **not** help here, because the problem is *tool exposure timing*, not a connection timeout. Frame this as a request, and lean on (1)–(3) for what we control.
+
+**Recommendation:** chase **idea 1** (decouple `tools/list` from the daemon). It's the fix that makes codegraph "connected" instantly for everyone. Ship **idea 3** (pre-warm SessionStart hook) as a cheap mitigation in parallel. File the host-side request (4) but don't depend on it.
+
+---
+
+## Key files / pointers
+
+- **Steering / tools:** `src/mcp/server-instructions.ts` (the `initialize` instructions — single source of truth), `src/mcp/tools.ts` (tool descriptions + handlers; `handleNode`/`handleFileView`/`handleSearch`, `getStaticTools`).
+- **Startup / daemon / proxy:** `src/mcp/proxy.ts` (`runProxy`, `connectWithHello`, `runLocalHandshakeProxy`, PPID watchdog), `src/mcp/index.ts` (`runProxyWithLocalHandshake`, `spawnDetachedDaemon`), `src/mcp/daemon.ts`.
+- **Runtime flags:** `src/extraction/wasm-runtime-flags.ts` (`RELAUNCH_GUARD_ENV=CODEGRAPH_WASM_RELAUNCHED`, `HOST_PPID_ENV=CODEGRAPH_HOST_PPID`).
+- **Hooks (existing):** `scripts/agent-eval/block-read-hook.sh`, `scripts/agent-eval/hook-settings.json` (the eval's force-Read-0 hook — basis for the P1 redirect hook).
+- **Installer (where to add a recommended hook):** `src/installer/targets/claude.ts`.
+- **Eval harness:** `scripts/agent-eval/ab-new-vs-baseline.sh` (new-vs-baseline, pre-warm baked in), `run-all.sh` (with-vs-without), `parse-run.mjs` (tool-by-type counts; `codegraph tools exposed: 0` + 0 codegraph calls = ran without).
+- **Doctrine:** `CLAUDE.md` → "Retrieval performance & dynamic-dispatch coverage" + the agent-eval note under "Validation methodology".
+
+## How to validate anything here
+- **P1 (Read displacement):** `bash scripts/agent-eval/ab-new-vs-baseline.sh <indexed-repo> "<implementation task>" [baseline-ref]` — compare `Read` vs `mcp__codegraph__*` counts. ≥2 runs/arm (n=1 is noisy). Run non-nested for cleanest results. Use a *genuinely new* feature task (verify it doesn't already exist — the first A/B attempt wasted a run on an already-implemented `--quiet`).
+- **P2 (startup):** time `tools/list` from `serve --mcp` (above); and count cold-start runs where `init` shows `connected` + tools > 0. Don't trust a single `pending` init snapshot — confirm by whether the agent actually called codegraph.
+
+## Constraints / gotchas to remember
+- Descriptions/instructions are low-salience — **A/B every behavioral claim**, don't ship wording on faith.
+- New tools < extending existing ones.
+- The host's `init` snapshot can say `pending` even when the server then connects — judge by actual usage.
+- Don't run evals nested for "clean" numbers unless pre-warmed; even then, a real terminal is better.
+
+## Suggested start order for the fresh session
+1. **P2 idea 1** — verify whether `serve --mcp` answers `tools/list` locally/instantly; if not, decouple it from the daemon. (Highest-value, shippable, helps all users, no behavioral guesswork.)
+2. **P1 idea 1** — prototype the PreToolUse(Read) redirect hook; A/B it. (Highest-value behavioral lever.)
+3. Ship the P2 SessionStart pre-warm hook as a mitigation; file the host-side wait/retry request.

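P2 idea 1 hinges on one property: `tools/list` must be answerable before the daemon is bound. Below is a minimal TypeScript sketch of that routing, assuming a simplified JSON-RPC message shape and hypothetical names (`createLocalFirstRouter`, `connectDaemon`, `Forward`). It is not the code in `src/mcp/proxy.ts`, and the `initialize` result is trimmed (a real MCP result also carries `protocolVersion` and `serverInfo`).

```typescript
// Hypothetical sketch: answer initialize and tools/list from a static tool list
// immediately; only other requests (e.g. tools/call) wait for the daemon.

interface JsonRpcRequest { jsonrpc: "2.0"; id: number | string; method: string; params?: unknown }
interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number | string;
  result?: unknown;
  error?: { code: number; message: string };
}
interface ToolDef { name: string; description: string }
type Forward = (req: JsonRpcRequest) => Promise<JsonRpcResponse>;

const internalError = (id: JsonRpcRequest["id"], message: string): JsonRpcResponse =>
  ({ jsonrpc: "2.0", id, error: { code: -32603, message } });

function createLocalFirstRouter(staticTools: ToolDef[], connectDaemon: () => Promise<Forward>) {
  // Start connecting right away, but never block tool exposure on it.
  const daemon = connectDaemon();
  daemon.catch(() => undefined); // avoid an unhandled rejection before any call arrives

  return function handle(req: JsonRpcRequest): JsonRpcResponse | Promise<JsonRpcResponse> {
    switch (req.method) {
      case "initialize":
        // Trimmed result: advertises the tools capability only.
        return { jsonrpc: "2.0", id: req.id, result: { capabilities: { tools: {} } } };
      case "tools/list":
        // Served synchronously: the host sees the tools before the daemon binds.
        return { jsonrpc: "2.0", id: req.id, result: { tools: staticTools } };
      default:
        // Everything else waits for the daemon connection, then is forwarded.
        return daemon.then(
          (forward) => forward(req),
          (err: unknown) => internalError(req.id, `daemon unavailable: ${String(err)}`),
        );
    }
  };
}
```

The design choice is that the daemon connection starts eagerly but is awaited only by requests that need it, so tool exposure never waits on the bind and a slow daemon delays just the first tool call.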
+ 91 - 0
scripts/agent-eval/ab-adoption.sh

@@ -0,0 +1,91 @@
+#!/usr/bin/env bash
+# Does the agent PICK codegraph_node to read a file, vs the built-in Read tool?
+# Build A/B: NEW build (HEAD, codegraph_node has Read parity) vs BASELINE build
+# (a ref where it doesn't), BOTH codegraph-attached + pre-warmed, same task. The
+# metric is tool CHOICE: Read calls vs codegraph_node[file] calls per run.
+#
+# Usage: ab-adoption.sh <indexed-repo> "<task>" [runs-per-arm] [baseline-ref]
+# Env: AGENT_EVAL_OUT (default: /tmp/ab-adoption)
+set -uo pipefail
+TARGET="${1:?usage: ab-adoption.sh <indexed-repo> \"<task>\" [runs] [baseline-ref]}"
+TASK="${2:?task required}"
+RUNS="${3:-2}"
+BASE_REF="${4:-HEAD~1}"
+ENGINE="$(cd "$(dirname "$0")/../.." && pwd)"
+BIN="$ENGINE/dist/bin/codegraph.js"
+OUT="${AGENT_EVAL_OUT:-/tmp/ab-adoption}"
+
+command -v claude >/dev/null || { echo "claude CLI not on PATH"; exit 1; }
+[ -d "$TARGET/.codegraph" ] || { echo "target not indexed: run 'codegraph init $TARGET' first"; exit 1; }
+git -C "$ENGINE" diff --quiet && git -C "$ENGINE" diff --cached --quiet || { echo "engine has uncommitted changes — commit/stash first"; exit 1; }
+CHANGED=$(git -C "$ENGINE" diff --name-only "$BASE_REF" HEAD -- src 2>/dev/null)
+[ -n "$CHANGED" ] || { echo "no src/ changes between $BASE_REF and HEAD"; exit 1; }
+
+cleanup() {
+  pkill -9 -f "serve --mcp --path $OUT/" 2>/dev/null
+  git -C "$ENGINE" checkout HEAD -- $CHANGED 2>/dev/null
+  ( cd "$ENGINE" && npm run build >/dev/null 2>&1 )
+}
+trap cleanup EXIT
+mkdir -p "$OUT"
+echo "###### target=$TARGET  runs/arm=$RUNS  baseline=$BASE_REF"
+echo "###### changed: $(echo "$CHANGED" | tr '\n' ' ')"
+echo "###### task=$TASK"; echo
+
+prewarm() {
+  pkill -9 -f "serve --mcp --path $1" 2>/dev/null
+  CODEGRAPH_DAEMON_IDLE_TIMEOUT_MS=1800000 node "$BIN" serve --mcp --path "$1" </dev/null >/dev/null 2>&1 &
+  node -e 'const fs=require("fs");let n=0;const t=setInterval(()=>{if(fs.existsSync(process.argv[1]+"/.codegraph/daemon.sock")){clearInterval(t);process.exit(0)}if(n++>150){clearInterval(t);process.exit(1)}},100)' "$1" >/dev/null 2>&1
+}
+
+# Per-run tool-choice counts: Read vs codegraph_node[file] vs [symbol].
+count() {
+  node -e '
+    const fs=require("fs");
+    const lines=fs.readFileSync(process.argv[1],"utf8").split("\n").filter(Boolean);
+    let read=0,cgFile=0,cgSym=0,cgOther=0,exposed="?";
+    for(const l of lines){try{const o=JSON.parse(l);
+      if(o.type==="system"&&o.subtype==="init"){exposed=(o.tools||[]).filter(t=>/codegraph/.test(t)).length;}
+      const blocks=o.message?.content||[];
+      for(const b of (Array.isArray(blocks)?blocks:[])){
+        if(b.type!=="tool_use")continue;
+        if(b.name==="Read")read++;
+        else if(b.name==="mcp__codegraph__codegraph_node"){ if(b.input&&b.input.symbol)cgSym++; else cgFile++; }
+        else if(/mcp__codegraph__/.test(b.name))cgOther++;
+      }
+    }catch{}}
+    console.log(`    Read=${read}  codegraph_node[file]=${cgFile}  codegraph_node[symbol]=${cgSym}  other_cg=${cgOther}  (cg exposed=${exposed})`);
+  ' "$1"
+}
+
+run_arm() { # label, N
+  local label="$1" n="$2"
+  local c="$OUT/mcp-$label.json"
+  for i in $(seq 1 "$n"); do
+    local tgt="$OUT/t-$label-$i"
+    rm -rf "$tgt"
+    rsync -a --exclude node_modules --exclude .git --exclude dist --exclude .codegraph "$TARGET/" "$tgt/"
+    node "$BIN" init "$tgt" >/dev/null 2>&1
+    printf '{"mcpServers":{"codegraph":{"command":"env","args":["CODEGRAPH_WASM_RELAUNCHED=1","node","%s","serve","--mcp","--path","%s"]}}}' "$BIN" "$tgt" > "$c"
+    prewarm "$tgt"
+    echo "----- [$label] run $i -----"
+    ( cd "$tgt" && claude -p "$TASK" \
+        --output-format stream-json --verbose --permission-mode bypassPermissions \
+        --model opus --max-budget-usd 4 --strict-mcp-config --mcp-config "$c" \
+        </dev/null > "$OUT/run-$label-$i.jsonl" 2>"$OUT/run-$label-$i.err" )
+    count "$OUT/run-$label-$i.jsonl"
+    pkill -9 -f "serve --mcp --path $tgt" 2>/dev/null
+  done
+  echo
+}
+
+echo "== NEW build (HEAD: codegraph_node has Read parity) =="
+( cd "$ENGINE" && npm run build >/dev/null 2>&1 ) && echo "built"
+run_arm new "$RUNS"
+
+echo "== BASELINE build ($BASE_REF) =="
+git -C "$ENGINE" checkout "$BASE_REF" -- $CHANGED
+( cd "$ENGINE" && npm run build >/dev/null 2>&1 ) && echo "built"
+run_arm baseline "$RUNS"
+
+echo "###### DONE — compare [new] vs [baseline]: does codegraph_node[file] rise / Read fall? Logs: $OUT"

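The inline `count()` helper above packs its classification into a one-line `node -e` script. The same rules, restated as a typed and testable TypeScript function, look like this (a sketch; `countToolChoice` and `ToolChoice` are names introduced here, and the event fields are exactly the ones the bash helper reads):

```typescript
// Typed restatement of ab-adoption.sh's count(): classify tool_use blocks in a
// stream-json log into Read vs codegraph_node[file] vs [symbol] vs other codegraph.

interface ToolChoice { read: number; cgFile: number; cgSymbol: number; cgOther: number; exposed: number | "?" }

function countToolChoice(jsonl: string): ToolChoice {
  const c: ToolChoice = { read: 0, cgFile: 0, cgSymbol: 0, cgOther: 0, exposed: "?" };
  for (const line of jsonl.split("\n")) {
    if (!line.trim()) continue;
    let ev: any;
    try { ev = JSON.parse(line); } catch { continue; } // tolerate partial / non-JSON lines
    if (ev.type === "system" && ev.subtype === "init") {
      // How many codegraph tools the host exposed at session start.
      c.exposed = (ev.tools ?? []).filter((t: string) => /codegraph/.test(t)).length;
    }
    const blocks = ev.message?.content;
    if (!Array.isArray(blocks)) continue;
    for (const b of blocks) {
      if (b?.type !== "tool_use") continue;
      if (b.name === "Read") c.read++;
      else if (b.name === "mcp__codegraph__codegraph_node") {
        // A call carrying `symbol` is a symbol lookup; one without is the file read.
        if (b.input?.symbol) c.cgSymbol++; else c.cgFile++;
      } else if (/mcp__codegraph__/.test(b.name)) c.cgOther++;
    }
  }
  return c;
}
```

As in the bash version, a run with `exposed` of 0 and no codegraph calls means the agent ran without codegraph, not that it declined to use it.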
+ 86 - 0
scripts/agent-eval/ab-hook.sh

@@ -0,0 +1,86 @@
+#!/usr/bin/env bash
+# A/B the PreToolUse(Read) REDIRECT hook (P1): does steering Read → codegraph_node
+# file-view actually move the agent off Read during implementation? BOTH arms use
+# the CURRENT build with codegraph attached and pre-warmed; the only difference is
+# the hook. Isolates the hook's behavioral effect from the build/file-view change
+# (use ab-new-vs-baseline.sh for the build A/B).
+#
+#   arm [nohook] — codegraph on, no hook   (does the better file-view get picked on its own?)
+#   arm [hook]   — codegraph on, + redirect hook   (does routing close it?)
+#
+# Reliable attach (works nested): each arm pre-warms a persistent daemon and skips
+# the startup re-exec (CODEGRAPH_WASM_RELAUNCHED=1), so claude connects before the
+# agent's first turn. Judge by ACTUAL codegraph usage in parse-run.mjs's "by type",
+# not claude's init snapshot (which can read pending even when it then connects).
+#
+# Usage: ab-hook.sh <indexed-repo> "<implementation task>" [runs-per-arm]
+#   <indexed-repo>  a repo with a .codegraph index (copied per arm; never mutated)
+#   "<task>"        a GENUINELY-NEW implementation task (verify it isn't already done)
+#   [runs-per-arm]  default 2 (n=1 is noisy — the doctrine says >=2)
+# Env: AGENT_EVAL_OUT (default: /tmp/ab-hook)
+set -uo pipefail
+
+TARGET="${1:?usage: ab-hook.sh <indexed-repo> \"<task>\" [runs-per-arm]}"
+TASK="${2:?task required}"
+RUNS="${3:-2}"
+ENGINE="$(cd "$(dirname "$0")/../.." && pwd)"
+BIN="$ENGINE/dist/bin/codegraph.js"
+HOOK="$ENGINE/scripts/agent-eval/redirect-read-hook.sh"
+OUT="${AGENT_EVAL_OUT:-/tmp/ab-hook}"
+PARSE="$ENGINE/scripts/agent-eval/parse-run.mjs"
+
+command -v claude >/dev/null || { echo "claude CLI not on PATH"; exit 1; }
+command -v jq >/dev/null || { echo "jq not on PATH (the hook needs it)"; exit 1; }
+[ -d "$TARGET/.codegraph" ] || { echo "target not indexed: run 'codegraph init $TARGET' first"; exit 1; }
+chmod +x "$HOOK"
+
+cleanup() { pkill -9 -f "serve --mcp --path $OUT/" 2>/dev/null; }
+trap cleanup EXIT
+
+mkdir -p "$OUT"
+echo "###### engine=$ENGINE"
+echo "###### target=$TARGET   runs/arm=$RUNS"
+echo "###### task=$TASK"
+echo
+
+( cd "$ENGINE" && npm run build >/dev/null 2>&1 ) && echo "built"
+
+# A settings file carrying ONLY the PreToolUse(Read) redirect hook.
+HOOK_SETTINGS="$OUT/hook-settings.json"
+jq -n --arg cmd "bash $HOOK" \
+  '{hooks:{PreToolUse:[{matcher:"Read",hooks:[{type:"command",command:$cmd}]}]}}' > "$HOOK_SETTINGS"
+
+prewarm() { # target — spawn a persistent daemon and wait for its socket
+  pkill -9 -f "serve --mcp --path $1" 2>/dev/null
+  CODEGRAPH_DAEMON_IDLE_TIMEOUT_MS=1800000 node "$BIN" serve --mcp --path "$1" </dev/null >/dev/null 2>&1 &
+  node -e 'const fs=require("fs");let n=0;const t=setInterval(()=>{if(fs.existsSync(process.argv[1]+"/.codegraph/daemon.sock")){clearInterval(t);process.exit(0)}if(n++>150){clearInterval(t);process.exit(1)}},100)' "$1" \
+    && echo "  daemon warm: $1" || echo "  WARN: daemon never bound for $1"
+}
+
+run_one() { # arm-label, run-index, use-hook(0|1)
+  local label="$1" idx="$2" hook="$3"
+  local tgt="$OUT/t-$label-$idx" c="$OUT/mcp-$label.json"
+  rm -rf "$tgt"
+  rsync -a --exclude node_modules --exclude .git --exclude dist --exclude .codegraph "$TARGET/" "$tgt/"
+  node "$BIN" init "$tgt" >/dev/null 2>&1
+  printf '{"mcpServers":{"codegraph":{"command":"env","args":["CODEGRAPH_WASM_RELAUNCHED=1","node","%s","serve","--mcp","--path","%s"]}}}' "$BIN" "$tgt" > "$c"
+  prewarm "$tgt"
+  local extra=()
+  [ "$hook" = "1" ] && extra=(--settings "$HOOK_SETTINGS")
+  echo "----- [$label] run $idx -----"
+  # ${extra[@]+...} guard: bash 3.2 (macOS) under `set -u` errors on an empty
+  # array expansion otherwise, which would skip the no-hook arm's claude run.
+  ( cd "$tgt" && claude -p "$TASK" \
+      --output-format stream-json --verbose --permission-mode bypassPermissions \
+      --model opus --max-budget-usd 4 --strict-mcp-config --mcp-config "$c" ${extra[@]+"${extra[@]}"} \
+      </dev/null > "$OUT/run-$label-$idx.jsonl" 2>"$OUT/run-$label-$idx.err" )
+  node "$PARSE" "$OUT/run-$label-$idx.jsonl" 2>&1 | grep -E "by type|Result" || echo "  (parse failed — see $OUT/run-$label-$idx.jsonl)"
+  pkill -9 -f "serve --mcp --path $tgt" 2>/dev/null
+  echo
+}
+
+for i in $(seq 1 "$RUNS"); do run_one nohook "$i" 0; done
+for i in $(seq 1 "$RUNS"); do run_one hook   "$i" 1; done
+
+echo "###### DONE. Compare [nohook] vs [hook] 'by type' — Read should fall and"
+echo "###### mcp__codegraph__codegraph_node should rise in the [hook] arm. Logs: $OUT"
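The harness builds its hook settings with `jq` and its MCP configs with a `printf` template. As a hedged sketch only, here are the same JSON shapes written as typed TypeScript builders. The function names `buildHookSettings` and `buildMcpConfig` are hypothetical and are not part of codegraph.

```typescript
// Sketch only: typed builders for the JSON the A/B scripts write with jq/printf.
// Shapes are copied from the scripts; the function names are hypothetical.

interface HookSettings {
  hooks: { PreToolUse: Array<{ matcher: string; hooks: Array<{ type: "command"; command: string }> }> };
}

interface McpConfig {
  mcpServers: Record<string, { command: string; args: string[] }>;
}

// Same object as: jq -n --arg cmd "bash $HOOK" '{hooks:{PreToolUse:[...]}}'
function buildHookSettings(hookScript: string): HookSettings {
  return {
    hooks: { PreToolUse: [{ matcher: "Read", hooks: [{ type: "command", command: `bash ${hookScript}` }] }] },
  };
}

// Same object as the printf template: start the server through `env` so that
// CODEGRAPH_WASM_RELAUNCHED=1 is in node's environment.
function buildMcpConfig(bin: string, projectPath: string): McpConfig {
  return {
    mcpServers: {
      codegraph: {
        command: "env",
        args: ["CODEGRAPH_WASM_RELAUNCHED=1", "node", bin, "serve", "--mcp", "--path", projectPath],
      },
    },
  };
}

// The WITHOUT arm of ab-impl.sh / ab-sufficiency.sh: no MCP servers at all.
const EMPTY_MCP_CONFIG: McpConfig = { mcpServers: {} };
```

A serializer has one advantage over `printf '%s'`. `JSON.stringify` escapes a path that contains `"` or `\`, whereas the template would splice that path in raw and produce invalid JSON.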

+ 78 - 0
scripts/agent-eval/ab-impl.sh

@@ -0,0 +1,78 @@
+#!/usr/bin/env bash
+# Sufficiency A/B for an IMPLEMENTATION task (the agent edits): when it uses
+# codegraph (explore/node) to understand before editing, does it still Read? Like
+# ab-sufficiency.sh but copies+indexes a FRESH target per run (the agent mutates
+# it), so runs don't see each other's edits.
+#
+# WITH codegraph (pre-warmed) vs WITHOUT (empty MCP), N runs each. Reports
+# explore/node vs Read/Grep + the files Read, and whether the build still passes.
+#
+# Usage: ab-impl.sh <indexed-repo> "<task>" [runs] [build-cmd]
+# Env: AGENT_EVAL_OUT (default: /tmp/ab-impl)
+set -uo pipefail
+REPO="${1:?usage: ab-impl.sh <indexed-repo> \"<task>\" [runs] [build-cmd]}"
+Q="${2:?task required}"
+RUNS="${3:-2}"
+BUILD_CMD="${4:-}"
+ENGINE="$(cd "$(dirname "$0")/../.." && pwd)"
+BIN="$ENGINE/dist/bin/codegraph.js"
+OUT="${AGENT_EVAL_OUT:-/tmp/ab-impl}"
+command -v claude >/dev/null || { echo "claude CLI not on PATH"; exit 1; }
+[ -d "$REPO/.codegraph" ] || { echo "no .codegraph index at $REPO"; exit 1; }
+cleanup(){ pkill -9 -f "serve --mcp --path $OUT/" 2>/dev/null; }
+trap cleanup EXIT
+mkdir -p "$OUT"
+( cd "$ENGINE" && npm run build >/dev/null 2>&1 ) && echo "built engine" || { echo "build failed in $ENGINE"; exit 1; }
+echo "###### repo=$REPO  runs/arm=$RUNS"
+echo "###### task=$Q"; echo
+echo '{"mcpServers":{}}' > "$OUT/mcp-empty.json"
+
+prewarm(){
+  pkill -9 -f "serve --mcp --path $1" 2>/dev/null
+  CODEGRAPH_DAEMON_IDLE_TIMEOUT_MS=1800000 node "$BIN" serve --mcp --path "$1" </dev/null >/dev/null 2>&1 &
+  node -e 'const fs=require("fs");let n=0;const t=setInterval(()=>{if(fs.existsSync(process.argv[1]+"/.codegraph/daemon.sock")){clearInterval(t);process.exit(0)}if(n++>150){clearInterval(t);process.exit(1)}},100)' "$1" >/dev/null 2>&1
+}
+
+analyze(){
+  node -e '
+    const fs=require("fs");
+    const L=fs.readFileSync(process.argv[1],"utf8").split("\n").filter(Boolean);
+    let ex=0,nf=0,ns=0,oc=0,gr=0,ed=0,exposed="?";const reads=[];
+    for(const l of L){try{const o=JSON.parse(l);
+      if(o.type==="system"&&o.subtype==="init")exposed=(o.tools||[]).filter(t=>/codegraph/.test(t)).length;
+      for(const b of (o.message?.content||[])){if(b.type!=="tool_use")continue;
+        if(b.name==="mcp__codegraph__codegraph_explore")ex++;
+        else if(b.name==="mcp__codegraph__codegraph_node"){if(b.input&&b.input.symbol)ns++;else nf++;}
+        else if(/mcp__codegraph__/.test(b.name))oc++;
+        else if(b.name==="Read")reads.push((b.input?.file_path||"").split("/").pop());
+        else if(b.name==="Grep")gr++;
+        else if(b.name==="Edit"||b.name==="Write")ed++;
+      }}catch{}}
+    console.log(`    explore=${ex} node[sym]=${ns} node[file]=${nf} other_cg=${oc} | Read=${reads.length}${reads.length?" ("+reads.join(", ")+")":""} Grep=${gr} Edit=${ed}  [cg exposed=${exposed}]`);
+  ' "$1"
+}
+
+run(){ # label, withCodegraph(0/1)
+  local label="$1" wcg="$2"
+  for i in $(seq 1 "$RUNS"); do
+    local tgt="$OUT/t-$label-$i" cfg="$OUT/mcp-$label.json"
+    rm -rf "$tgt"
+    rsync -a --exclude node_modules --exclude .git --exclude dist --exclude .codegraph "$REPO/" "$tgt/"
+    node "$BIN" init "$tgt" >/dev/null 2>&1
+    if [ "$wcg" = "1" ]; then
+      printf '{"mcpServers":{"codegraph":{"command":"env","args":["CODEGRAPH_WASM_RELAUNCHED=1","node","%s","serve","--mcp","--path","%s"]}}}' "$BIN" "$tgt" > "$cfg"
+      prewarm "$tgt"
+    else cp "$OUT/mcp-empty.json" "$cfg"; fi
+    ( cd "$tgt" && claude -p "$Q" --output-format stream-json --verbose \
+        --permission-mode bypassPermissions --model opus --max-budget-usd 4 \
+        --strict-mcp-config --mcp-config "$cfg" </dev/null > "$OUT/$label-$i.jsonl" 2>"$OUT/$label-$i.err" )
+    echo "[$label] run $i:"; analyze "$OUT/$label-$i.jsonl"
+    if [ -n "$BUILD_CMD" ]; then ( cd "$tgt" && eval "$BUILD_CMD" >/dev/null 2>&1 && echo "      build: PASS" || echo "      build: FAIL" ); fi
+    pkill -9 -f "serve --mcp --path $tgt" 2>/dev/null
+  done
+  echo
+}
+
+echo "== WITH codegraph =="; run with 1
+echo "== WITHOUT (Read/Grep only) =="; run without 0
+echo "###### DONE: $OUT"
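The inline `analyze` one-liner is hard to audit. Below is a hedged, typed port of the same tally. It models only the stream-json fields the script reads: the `init` event's `tools` list and `tool_use` blocks with their `name` and `input`. The names `tallyRun` and `Tally` are illustrative.

```typescript
// Sketch only: a typed port of the inline `analyze` tally in ab-impl.sh.
// Models just the stream-json fields the script reads; names are illustrative.

interface ToolUse { type?: string; name?: string; input?: { symbol?: string; file_path?: string } }
interface StreamEvent { type?: string; subtype?: string; tools?: string[]; message?: { content?: unknown } }

interface Tally {
  explore: number; nodeSymbol: number; nodeFile: number; otherCodegraph: number;
  reads: string[]; grep: number; edits: number; exposed: number | "?";
}

function tallyRun(jsonl: string): Tally {
  const t: Tally = { explore: 0, nodeSymbol: 0, nodeFile: 0, otherCodegraph: 0, reads: [], grep: 0, edits: 0, exposed: "?" };
  for (const line of jsonl.split("\n").filter(Boolean)) {
    let ev: StreamEvent | null = null;
    try { ev = JSON.parse(line); } catch { continue; } // skip non-JSON noise, as the script does
    if (!ev || typeof ev !== "object") continue;
    // The init event lists every tool the session exposes; count codegraph's.
    if (ev.type === "system" && ev.subtype === "init") {
      t.exposed = (ev.tools ?? []).filter((name) => /codegraph/.test(name)).length;
    }
    const content = ev.message?.content;
    if (!Array.isArray(content)) continue;
    for (const b of content as ToolUse[]) {
      if (b.type !== "tool_use" || !b.name) continue;
      if (b.name === "mcp__codegraph__codegraph_explore") t.explore++;
      else if (b.name === "mcp__codegraph__codegraph_node") {
        // A `symbol` argument means symbol mode; without one it is a file read.
        if (b.input?.symbol) t.nodeSymbol++; else t.nodeFile++;
      } else if (/mcp__codegraph__/.test(b.name)) t.otherCodegraph++;
      else if (b.name === "Read") t.reads.push((b.input?.file_path ?? "").split("/").pop() ?? "");
      else if (b.name === "Grep") t.grep++;
      else if (b.name === "Edit" || b.name === "Write") t.edits++;
    }
  }
  return t;
}
```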

+ 78 - 0
scripts/agent-eval/ab-sufficiency.sh

@@ -0,0 +1,78 @@
+#!/usr/bin/env bash
+# Sufficiency A/B: on a real understanding/flow question, WHEN the agent uses
+# codegraph (explore/node), does it still Read? Premise under test: explore/node
+# return source WITH line numbers, so a Read should not be needed.
+#
+# WITH codegraph (pre-warmed daemon, reliable nested attach) vs WITHOUT (empty
+# MCP, Read/Grep only), N runs each, on a throwaway copy of the repo. Reports
+# explore/node vs Read/Grep, and LISTS the files Read in the WITH arm so a true
+# sufficiency gap (an indexed source file) is distinguishable from out-of-scope
+# (configs, docs, a file codegraph didn't index).
+#
+# Usage: ab-sufficiency.sh <indexed-repo> "<question>" [runs-per-arm]
+# Env: AGENT_EVAL_OUT (default: /tmp/ab-sufficiency)
+set -uo pipefail
+REPO="${1:?usage: ab-sufficiency.sh <indexed-repo> \"<question>\" [runs]}"
+Q="${2:?question required}"
+RUNS="${3:-2}"
+ENGINE="$(cd "$(dirname "$0")/../.." && pwd)"
+BIN="$ENGINE/dist/bin/codegraph.js"
+OUT="${AGENT_EVAL_OUT:-/tmp/ab-sufficiency}"
+TGT="$OUT/target"
+command -v claude >/dev/null || { echo "claude CLI not on PATH"; exit 1; }
+[ -d "$REPO/.codegraph" ] || { echo "no .codegraph index at $REPO"; exit 1; }
+cleanup(){ pkill -9 -f "serve --mcp --path $TGT" 2>/dev/null; }
+trap cleanup EXIT
+mkdir -p "$OUT"
+( cd "$ENGINE" && npm run build >/dev/null 2>&1 ) && echo "built" || { echo "build failed in $ENGINE"; exit 1; }
+
+# Throwaway copy + fresh index (the agent works here; a read-only question won't
+# edit, but isolate anyway). Excludes the source repo's index/build/vcs.
+rm -rf "$TGT"
+rsync -a --exclude node_modules --exclude .git --exclude dist --exclude .codegraph "$REPO/" "$TGT/"
+node "$BIN" init "$TGT" >/dev/null 2>&1 && echo "indexed copy ($(cd "$TGT" && node "$BIN" status --json 2>/dev/null | node -e 'let s="";process.stdin.on("data",d=>s+=d).on("end",()=>{try{console.log(JSON.parse(s).fileCount+" files")}catch{console.log("?")}})' 2>/dev/null || echo '?'))"
+
+echo "###### repo=$REPO  runs/arm=$RUNS"
+echo "###### Q=$Q"; echo
+echo '{"mcpServers":{}}' > "$OUT/mcp-empty.json"
+printf '{"mcpServers":{"codegraph":{"command":"env","args":["CODEGRAPH_WASM_RELAUNCHED=1","node","%s","serve","--mcp","--path","%s"]}}}' "$BIN" "$TGT" > "$OUT/mcp-cg.json"
+
+prewarm(){
+  pkill -9 -f "serve --mcp --path $TGT" 2>/dev/null
+  CODEGRAPH_DAEMON_IDLE_TIMEOUT_MS=1800000 node "$BIN" serve --mcp --path "$TGT" </dev/null >/dev/null 2>&1 &
+  node -e 'const fs=require("fs");let n=0;const t=setInterval(()=>{if(fs.existsSync(process.argv[1]+"/.codegraph/daemon.sock")){clearInterval(t);process.exit(0)}if(n++>150){clearInterval(t);process.exit(1)}},100)' "$TGT" >/dev/null 2>&1
+}
+
+analyze(){
+  node -e '
+    const fs=require("fs");
+    const L=fs.readFileSync(process.argv[1],"utf8").split("\n").filter(Boolean);
+    let ex=0,nf=0,ns=0,oc=0,gr=0,exposed="?";const reads=[];
+    for(const l of L){try{const o=JSON.parse(l);
+      if(o.type==="system"&&o.subtype==="init")exposed=(o.tools||[]).filter(t=>/codegraph/.test(t)).length;
+      for(const b of (o.message?.content||[])){if(b.type!=="tool_use")continue;
+        if(b.name==="mcp__codegraph__codegraph_explore")ex++;
+        else if(b.name==="mcp__codegraph__codegraph_node"){if(b.input&&b.input.symbol)ns++;else nf++;}
+        else if(/mcp__codegraph__/.test(b.name))oc++;
+        else if(b.name==="Read")reads.push((b.input?.file_path||"").split("/").pop());
+        else if(b.name==="Grep")gr++;
+      }}catch{}}
+    console.log(`    explore=${ex} node[sym]=${ns} node[file]=${nf} other_cg=${oc} | Read=${reads.length}${reads.length?" ("+reads.join(", ")+")":""} Grep=${gr}  [cg exposed=${exposed}]`);
+  ' "$1"
+}
+
+run(){ # label, cfg, prewarm(0/1)
+  local label="$1" cfg="$2" pw="$3"
+  for i in $(seq 1 "$RUNS"); do
+    [ "$pw" = "1" ] && prewarm
+    ( cd "$TGT" && claude -p "$Q" --output-format stream-json --verbose \
+        --permission-mode bypassPermissions --model opus --max-budget-usd 4 \
+        --strict-mcp-config --mcp-config "$cfg" </dev/null > "$OUT/$label-$i.jsonl" 2>"$OUT/$label-$i.err" )
+    echo "[$label] run $i:"; analyze "$OUT/$label-$i.jsonl"
+  done
+  echo
+}
+
+echo "== WITH codegraph (premise: explore/node used -> Read ~0) =="; run with "$OUT/mcp-cg.json" 1
+echo "== WITHOUT (Read/Grep only — the contrast) =="; run without "$OUT/mcp-empty.json" 0
+echo "###### DONE. In the WITH arm: are explore/node>0 and Read~0? Any Read of an INDEXED source file = sufficiency gap. Logs: $OUT"
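`ab-sufficiency.sh` leaves the verdict to the reader: a Read of an indexed source file counts as a sufficiency gap, and any other Read is out of scope. Here is a sketch of that rule as a function. It assumes you already have a set of indexed basenames, which the script does not export; you would have to collect one yourself, for example from `codegraph_files`. The config/doc extension list is illustrative and is not codegraph's.

```typescript
// Hypothetical post-processing for ab-sufficiency.sh's WITH arm.
// Assumption: `indexedBasenames` comes from elsewhere; the script does not produce it.

// Illustrative only: files whose Read is expected even with codegraph available.
const CONFIG_LIKE = /\.(ya?ml|properties|json|toml|md|env|lock)$/i;

interface ReadVerdict { gaps: string[]; outOfScope: string[] }

function classifyReads(reads: string[], indexedBasenames: Set<string>): ReadVerdict {
  const v: ReadVerdict = { gaps: [], outOfScope: [] };
  for (const base of reads) {
    // Only an indexed, non-config file counts against the premise "Read ~0".
    if (indexedBasenames.has(base) && !CONFIG_LIKE.test(base)) v.gaps.push(base);
    else v.outOfScope.push(base);
  }
  return v;
}
```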

+ 38 - 0
scripts/agent-eval/redirect-read-hook.sh

@@ -0,0 +1,38 @@
+#!/usr/bin/env bash
+# PreToolUse(Read) REDIRECT hook — prototype for A/B (P1: get agents off Read and
+# onto codegraph_node during implementation, not just for Q&A).
+#
+# When the agent Reads a SOURCE file, deny it and steer to codegraph_node's
+# file-view, which (as of the Lever-1 change) returns the file verbatim WITH
+# line numbers (imports, top-level code, comments and all) PLUS the file's
+# blast radius, in one call. Like Read, it pages long files: past 2000 lines or
+# ~38k characters it returns a window and says how to fetch the rest with
+# offset/limit. So the redirect is lossless: the agent loses nothing by taking
+# it, and gains who-depends-on-this for the edit it's about to make.
+#
+# Differs from block-read-hook.sh (which steers to explore/node-by-symbol): this
+# names the FILE-VIEW path explicitly (file:"<base>" with no symbol; includeCode
+# is ignored in file mode), the 1:1 Read replacement we're trying to get picked
+# during implementation.
+#
+# Non-source files (configs, docs, lockfiles, .env) pass through to a real Read.
+# A redirect to a file codegraph hasn't indexed SELF-CORRECTS: the file-view
+# replies "No indexed file matches … Read it directly", so a just-created file
+# never dead-ends — the agent Reads it on the next turn.
+#
+# Wire via:  claude ... --settings <settings-with-this-as-PreToolUse(Read)>
+# Eval artifact only. The production version is an indexed-aware `codegraph`
+# subcommand (cross-platform — no bash/jq — and queries the index so it never
+# bounces a new/un-indexed file), wired opt-in by the installer.
+set -uo pipefail
+input="$(cat)"
+fp="$(printf '%s' "$input" | jq -r '.tool_input.file_path // empty' 2>/dev/null)"
+[ -n "$fp" ] || exit 0
+base="$(basename "$fp")"
+
+case "$fp" in
+  *.ts|*.tsx|*.js|*.jsx|*.mjs|*.cjs|*.py|*.go|*.rs|*.java|*.rb|*.php|*.swift|*.kt|*.kts|*.scala|*.c|*.cc|*.cpp|*.h|*.hpp|*.cs|*.lua|*.vue|*.svelte|*.m|*.mm)
+    msg="codegraph has this file indexed (kept in sync on every edit). Call codegraph_node with file:\"$base\" and no symbol instead of Read: it returns the file verbatim WITH line numbers (imports, top-level code and all; safe to base an Edit on), paged with offset/limit exactly like Read, PLUS which files depend on it, in one call. Treat its output as already-Read; do not Read this file. (If it answers that the file isn't indexed, e.g. you just created it, then Read it directly.)"
+    jq -n --arg m "$msg" '{hookSpecificOutput:{hookEventName:"PreToolUse",permissionDecision:"deny",permissionDecisionReason:$m}}'
+    exit 0
+    ;;
+esac
+exit 0
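The header comment says the production version will be a cross-platform `codegraph` subcommand that needs neither bash nor jq. As a sketch of what that decision could look like, here is the script's extension check and deny payload ported to TypeScript. `decideRead` is a hypothetical name. Unlike the planned production hook, it does not query the index, so an un-indexed file still depends on the self-correcting "not indexed" reply.

```typescript
// Sketch only: redirect-read-hook.sh's decision without bash/jq.
// Payload shape copied from the script's jq output; `decideRead` is hypothetical.

const SOURCE_EXT = new Set([
  "ts", "tsx", "js", "jsx", "mjs", "cjs", "py", "go", "rs", "java", "rb", "php", "swift",
  "kt", "kts", "scala", "c", "cc", "cpp", "h", "hpp", "cs", "lua", "vue", "svelte", "m", "mm",
]);

interface HookOutput {
  hookSpecificOutput: { hookEventName: "PreToolUse"; permissionDecision: "deny"; permissionDecisionReason: string };
}

// Returns the deny payload for a source file, or null to let Read proceed
// (the shell script expresses "proceed" as exit 0 with no output).
function decideRead(stdin: string): HookOutput | null {
  let fp = "";
  try { fp = String(JSON.parse(stdin)?.tool_input?.file_path ?? ""); } catch { return null; }
  if (!fp) return null;
  const base = fp.split(/[\\/]/).pop() ?? fp;
  const dot = base.lastIndexOf(".");
  // Case-sensitive, like the bash `case` globs; configs, docs and .env pass through.
  if (dot < 0 || !SOURCE_EXT.has(base.slice(dot + 1))) return null;
  return {
    hookSpecificOutput: {
      hookEventName: "PreToolUse",
      permissionDecision: "deny",
      permissionDecisionReason: `Call codegraph_node with file:"${base}" (no symbol) instead of Read.`,
    },
  };
}
```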

+ 3 - 2
src/mcp/server-instructions.ts

@@ -48,7 +48,8 @@ typically one to a few calls; a grep/read exploration is dozens.
 - **"How does X reach/become Y? / the flow / the path from X to Y"** → \`codegraph_explore\`, naming the symbols that span the flow (e.g. \`mutateElement renderScene\`) — it surfaces the call path among them, including dynamic-dispatch hops (callbacks, React re-render, JSX children) grep can't follow
 - **"What is the symbol named X?" (just its location)** → \`codegraph_search\`
 - **"What calls this?" / "What does this call?" / "What would changing this break?"** → \`codegraph_callers\` / \`codegraph_callees\` / \`codegraph_impact\`
-- **About to read or edit a symbol you can name** → \`codegraph_node\` (SECONDARY — the after-explore depth tool) instead of \`Read\`: it returns the **verbatim current on-disk source** (safe to base an \`Edit\` on) PLUS its caller/callee trail — the same bytes Read gives you, plus who calls it and what your change would break, for fewer tokens. For an OVERLOADED name it returns EVERY matching definition's body in one call, so you never Read a file to find the right overload. Or pass a FILE PATH alone (no symbol) to get that whole file's symbol map + what depends on it — a Read replacement for a source file
+- **Reading a source FILE (any time you'd use the \`Read\` tool)** → \`codegraph_node\` with a \`file\` path and no \`symbol\`. It returns the file's **current source with line numbers — the same \`<n>\\t<line>\` shape \`Read\` gives you, safe to \`Edit\` from** — narrowable with \`offset\`/\`limit\` exactly like \`Read\`, PLUS a one-line note of which files depend on it. Same bytes as \`Read\`, faster (served from the index), with the blast radius attached. Use it **instead of \`Read\`** for indexed source files; fall back to \`Read\` only for what codegraph doesn't index (configs, docs). Pass \`symbolsOnly: true\` for just the file's structure.
+- **About to read or edit a symbol you can name** → \`codegraph_node\` with that \`symbol\` (SECONDARY — the after-explore depth tool): the verbatim source (\`includeCode: true\`) PLUS its caller/callee trail, so before changing it you see what calls it and what your edit would break. For an OVERLOADED name it returns EVERY matching definition's body in one call, so you never Read a file to find the right overload
 - **"What's in directory X?"** → \`codegraph_files\`
 - **"Is the index ready / what's its size?"** → \`codegraph_status\`
 
@@ -65,7 +66,7 @@ typically one to a few calls; a grep/read exploration is dozens.
 - **Don't grep first** when looking up a symbol by name — \`codegraph_search\` is faster and returns kind + location + signature.
 - **Don't chain \`codegraph_search\` + \`codegraph_node\`** to understand an area — ONE \`codegraph_explore\` returns the relevant symbols' source together in a single round-trip.
 - **Don't loop \`codegraph_node\` over many symbols** — one \`codegraph_explore\` call returns them all grouped by file, while each separate call re-reads the whole context and costs far more. Use \`codegraph_node\` for a single symbol.
-- **Don't \`Read\` a file just to see or edit a symbol you can name** — \`codegraph_node\` returns the same current source plus its caller/callee trail in one call, for fewer tokens. Reach for raw \`Read\` only for what codegraph doesn't index (configs, docs) or when the staleness banner flags a file as pending re-index.
+- **Don't reach for the \`Read\` tool on an indexed source file** — \`codegraph_node\` with a \`file\` reads it for you (same \`<n>\\t<line>\` source, \`offset\`/\`limit\` like Read, faster, with its blast radius), and with a \`symbol\` it returns the source plus the caller/callee trail. Reach for raw \`Read\` only for what codegraph doesn't index (configs, docs) or when the staleness banner flags a file as pending re-index.
 - **After editing, check the staleness banner.** When a tool response starts with "⚠️ Some files referenced below were edited since the last index sync…", the listed files are pending re-index — Read those specific files for accurate content. Every file NOT in that banner is fresh, so still trust codegraph. \`codegraph_status\` also lists pending files under "Pending sync".
 
 ## Limitations
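The retuned guidance reduces to a small routing rule. Use `codegraph_node` with a `file` for indexed source files, and fall back to `Read` for what codegraph does not index or for files the staleness banner lists as pending re-index. Below is a sketch of that rule as an agent-side helper. `isIndexed` and `pendingReindex` are assumed inputs that the agent would learn from codegraph's own responses, for example `codegraph_files` and the banner. `routeRead` is a hypothetical name.

```typescript
// Sketch only: the "codegraph_node vs Read" routing rule from the instructions.
// `isIndexed` / `pendingReindex` are assumed inputs, not codegraph APIs.

type ReadRoute =
  | { tool: "codegraph_node"; args: { file: string; offset?: number; limit?: number } }
  | { tool: "Read"; reason: "not indexed" | "pending re-index" };

function routeRead(
  file: string,
  isIndexed: (f: string) => boolean,
  pendingReindex: ReadonlySet<string>,
  range?: { offset?: number; limit?: number },
): ReadRoute {
  if (!isIndexed(file)) return { tool: "Read", reason: "not indexed" };
  // The guidance says to Read files the staleness banner lists as pending re-index.
  if (pendingReindex.has(file)) return { tool: "Read", reason: "pending re-index" };
  // Otherwise read through codegraph, passing offset/limit through as Read would.
  return { tool: "codegraph_node", args: { file, ...range } };
}
```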

+ 125 - 62
src/mcp/tools.ts

@@ -26,7 +26,7 @@ import {
   existsSync,
   readFileSync,
 } from 'fs';
-import { clamp, validatePathWithinRoot, validateProjectPath, isConfigLeafNode } from '../utils';
+import { clamp, validatePathWithinRoot, validateProjectPath, isConfigLeafNode, CONFIG_LEAF_LANGUAGES } from '../utils';
 import { isGeneratedFile } from '../extraction/generated-detection';
 import { resolve as resolvePath } from 'path';
 
@@ -463,26 +463,39 @@ export const tools: ToolDefinition[] = [
   },
   {
     name: 'codegraph_node',
-    description: 'SECONDARY (after codegraph_explore): the Read upgrade for ONE symbol you can name. Returns its location, signature, the verbatim CURRENT on-disk source (includeCode=true — the same bytes Read would give you, safe to base an Edit on), AND its caller/callee trail in a single call — so before changing a symbol you already see what calls it and what your edit would break, for fewer tokens than reading the file. Prefer it over Read whenever you know the symbol name. Or pass `file` ALONE (no symbol) to get that whole source file\'s symbol map + what depends on it — a Read replacement for a file. When the name is AMBIGUOUS (an overloaded method, or the same name on different types) it returns EVERY matching definition\'s full body in one call — so you never Read a file to find the right overload; pass `file` (and/or `line`) to pin one. Use codegraph_explore for several related symbols or the full flow.',
+    description: 'Two modes. (1) READ A FILE — use INSTEAD of the Read tool: pass `file` (a path or basename) with no `symbol` and it returns that file\'s current on-disk source with line numbers, exactly the shape Read gives you (`<n>\\t<line>`, safe to Edit from), narrowable with `offset`/`limit` just like Read — PLUS a one-line note of which files depend on it. Same bytes as Read, faster (served from the index), with the blast radius attached. Use it whenever you would Read a source file. (2) ONE SYMBOL you can name — its location, signature, verbatim source (includeCode=true) and caller/callee trail in one call, so before changing it you see what calls it and what your edit would break. For an AMBIGUOUS name it returns EVERY matching definition\'s body in one call (so you never Read a file to find the right overload); pass `file`/`line` to pin one. Use codegraph_explore for several related symbols or the full flow.',
     inputSchema: {
       type: 'object',
       properties: {
         symbol: {
           type: 'string',
-          description: 'Name of the symbol to get details for. Omit it and pass `file` alone to get the whole file\'s symbols + dependents (a Read replacement).',
+          description: 'Name of the symbol to read (symbol mode). Omit it and pass `file` alone to read a whole file like Read.',
         },
         includeCode: {
           type: 'boolean',
-          description: 'Include full source bodies (default: false to minimize context). In file mode, returns every symbol\'s body up to a size budget.',
+          description: 'Symbol mode: include the symbol\'s full body (default: false). Ignored in file mode, which always returns source unless `symbolsOnly` is set.',
           default: false,
         },
         file: {
           type: 'string',
-          description: 'A file path or basename (e.g. "harness.rs", "src/auth/session.ts"). Pass it ALONE (no symbol) to get that whole file\'s symbol map + dependents — a Read replacement. Or pass it WITH a symbol to disambiguate an overloaded name to the definition in this file.',
+          description: 'A file path or basename (e.g. "harness.rs", "src/auth/session.ts"). Pass it ALONE (no symbol) to READ the file like the Read tool — its full source with line numbers + which files depend on it. Or pass it WITH a symbol to disambiguate an overloaded name to the definition in this file.',
+        },
+        offset: {
+          type: 'number',
+          description: 'File mode: 1-based line to start reading from, exactly like Read\'s offset. Defaults to the start of the file.',
+        },
+        limit: {
+          type: 'number',
+          description: 'File mode: maximum number of lines to return, exactly like Read\'s limit. Defaults to the whole file, capped at 2000 lines (like Read) and at a ~38k-character response budget; when lines remain, the output says which range was shown.',
+        },
+        symbolsOnly: {
+          type: 'boolean',
+          description: 'File mode: return just the file\'s symbol map + dependents (a cheap structural overview) instead of its source.',
+          default: false,
         },
         line: {
           type: 'number',
-          description: 'Optional: disambiguate to the definition at/around this line (use with the file:line a trail showed you).',
+          description: 'Symbol mode only: disambiguate to the definition at/around this line (use with the file:line a trail showed you).',
         },
         projectPath: projectPathProperty,
       },
@@ -2527,14 +2540,18 @@ export class ToolHandler {
     const includeCode = args.includeCode === true;
     const fileHint = typeof args.file === 'string' && args.file.trim() ? args.file.trim() : undefined;
     const lineHint = typeof args.line === 'number' && args.line > 0 ? args.line : undefined;
+    const offset = typeof args.offset === 'number' && args.offset > 0 ? Math.floor(args.offset) : undefined;
+    const limit = typeof args.limit === 'number' && args.limit > 0 ? Math.floor(args.limit) : undefined;
+    const symbolsOnly = args.symbolsOnly === true;
     const symbolRaw = typeof args.symbol === 'string' ? args.symbol.trim() : '';
 
-    // FILE-VIEW MODE: a bare `file` with no `symbol` returns that file's symbol
-    // map + graph role (which files depend on it) — and, with includeCode, the
-    // bodies. A Read replacement for "show me file X" that also surfaces the
-    // blast radius, so an edit is made with impact in view.
+    // FILE READ MODE: a `file` with no `symbol` reads that file like the Read
+    // tool — its current on-disk source with line numbers, narrowable with
+    // `offset`/`limit` exactly as Read does — PLUS a one-line blast-radius
+    // header (which files depend on it). `symbolsOnly` returns just the
+    // structural map instead. Backed by the index: same bytes Read gives you.
     if (!symbolRaw && fileHint) {
-      return this.handleFileView(cg, fileHint, includeCode);
+      return this.handleFileView(cg, fileHint, { offset, limit, symbolsOnly });
     }
 
     const symbol = this.validateString(args.symbol, 'symbol');
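`handleNode` picks between the two modes through several `typeof` checks. Here is a standalone sketch of that normalization and dispatch. The checks are copied from the hunk above, while `NodeArgs` and `normalizeNodeArgs` are invented for illustration and are not exports of `tools.ts`.

```typescript
// Sketch only: argument normalization + mode dispatch, copied from handleNode.

interface NodeArgs {
  symbol?: unknown; file?: unknown; line?: unknown;
  offset?: unknown; limit?: unknown; includeCode?: unknown; symbolsOnly?: unknown;
}

// offset/limit: positive numbers are floored; 0, negatives and non-numbers are dropped.
const positiveInt = (v: unknown): number | undefined =>
  typeof v === "number" && v > 0 ? Math.floor(v) : undefined;

function normalizeNodeArgs(args: NodeArgs) {
  const fileHint = typeof args.file === "string" && args.file.trim() ? args.file.trim() : undefined;
  const symbol = typeof args.symbol === "string" ? args.symbol.trim() : "";
  // File-read mode needs a file and no (non-blank) symbol; everything else goes
  // to symbol mode, where handleNode passes the symbol to validateString.
  const mode: "file" | "symbol" = !symbol && fileHint ? "file" : "symbol";
  return {
    mode,
    symbol,
    fileHint,
    lineHint: typeof args.line === "number" && args.line > 0 ? args.line : undefined, // not floored
    offset: positiveInt(args.offset),
    limit: positiveInt(args.limit),
    includeCode: args.includeCode === true, // ignored in file mode
    symbolsOnly: args.symbolsOnly === true, // only meaningful in file mode
  };
}
```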
@@ -2634,11 +2651,23 @@ export class ToolHandler {
   }
 
   /**
-   * FILE-VIEW: resolve `fileArg` (path or basename) to an indexed file and
-   * return its symbol map + graph role (which files depend on it), plus bodies
-   * when `includeCode`. A Read replacement that also surfaces the blast radius.
+   * FILE READ MODE: resolve `fileArg` (path or basename) to an indexed file and
+   * read it like the Read tool — its current on-disk source with line numbers,
+   * narrowable with `offset`/`limit` exactly as Read's are — preceded by a
+   * one-line blast-radius header (which files depend on it). `symbolsOnly`
+   * returns just the structural map (symbols + dependents) instead of source.
+   *
+   * Parity goal: the numbered source block is byte-for-byte the shape Read
+   * returns (`<n>\t<line>`, no padding), so the agent treats it as a Read — only
+   * faster (served from the index) and with the blast radius attached. Security:
+   * yaml/properties files are summarized by key, never dumped (#383); reads go
+   * through validatePathWithinRoot (#527).
    */
-  private async handleFileView(cg: CodeGraph, fileArg: string, includeCode: boolean): Promise<ToolResult> {
+  private async handleFileView(
+    cg: CodeGraph,
+    fileArg: string,
+    opts: { offset?: number; limit?: number; symbolsOnly?: boolean } = {},
+  ): Promise<ToolResult> {
     const normalize = (p: string) => p.replace(/\\/g, '/').replace(/^(?:\.?\/+)+/, '').replace(/\/+$/, '');
     const wantLower = normalize(fileArg).toLowerCase();
     const allFiles = cg.getFiles();
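This hunk shows only the `normalize` helper of the file resolution; the matching lines are elided from the diff. Below is a hedged sketch of one plausible matcher built around that helper. `normalize` is copied verbatim. The exact-then-unique-suffix rule and the `ambiguous` result are assumptions, not the shipped logic.

```typescript
// `normalize` is copied from handleFileView; `resolveFile` is a hypothetical
// matcher: exact path first, then a unique "/<hint>" suffix, case-insensitive.

const normalize = (p: string) =>
  p.replace(/\\/g, "/").replace(/^(?:\.?\/+)+/, "").replace(/\/+$/, "");

type Resolution =
  | { kind: "found"; path: string }
  | { kind: "none" }
  | { kind: "ambiguous"; candidates: string[] };

function resolveFile(hint: string, indexed: string[]): Resolution {
  const want = normalize(hint).toLowerCase();
  const exact = indexed.find((f) => normalize(f).toLowerCase() === want);
  if (exact) return { kind: "found", path: exact };
  // A basename like "index.ts" may match several files; report that rather than guess.
  const hits = indexed.filter((f) => normalize(f).toLowerCase().endsWith("/" + want));
  if (hits.length === 1) return { kind: "found", path: hits[0] };
  return hits.length ? { kind: "ambiguous", candidates: hits } : { kind: "none" };
}
```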
@@ -2672,62 +2701,96 @@ export class ToolHandler {
       .sort((a, b) => a.startLine - b.startLine);
     const dependents = cg.getFileDependents(filePath);
 
-    const out: string[] = [`**${filePath}** — ${nodes.length} symbol${nodes.length === 1 ? '' : 's'}`];
-    if (dependents.length) {
-      out.push(
-        `Depended on by ${dependents.length} file${dependents.length === 1 ? '' : 's'}` +
-          `${dependents.length > 8 ? ' (first 8)' : ''}: ${dependents.slice(0, 8).join(', ')}${dependents.length > 8 ? ', …' : ''}`,
-        '> Editing a symbol here can affect those files — run codegraph_impact on the specific symbol for its exact blast radius.',
-      );
-    } else {
-      out.push('No other indexed file depends on this one.');
-    }
-    out.push('');
+    // Compact, one-line blast radius (codegraph's value-add over a plain Read).
+    const depSummary = dependents.length
+      ? `used by ${dependents.length} file${dependents.length === 1 ? '' : 's'}: ${dependents.slice(0, 8).join(', ')}${dependents.length > 8 ? `, +${dependents.length - 8} more` : ''}`
+      : 'no other indexed file depends on it';
+
+    // Symbol-map renderer — for symbolsOnly, the config fallback, and read errors.
+    const symbolMap = (heading: string, limit = 200): string[] => {
+      const lines: string[] = [heading];
+      for (const n of nodes.slice(0, limit)) {
+        const sig = n.signature ? ` ${n.signature.replace(/\s+/g, ' ').trim()}` : '';
+        lines.push(`- \`${n.name}\` (${n.kind})${sig} — :${n.startLine}`);
+      }
+      if (nodes.length > limit) lines.push(`- … +${nodes.length - limit} more`);
+      return lines;
+    };
 
-    if (nodes.length === 0) {
-      out.push('_No indexed symbols in this file (codegraph may track it but not parse it for symbols)._');
+    // symbolsOnly → the cheap structural overview, no source.
+    if (opts.symbolsOnly) {
+      const out = [`**${filePath}** — ${nodes.length} symbol${nodes.length === 1 ? '' : 's'}, ${depSummary}`, ''];
+      if (nodes.length) out.push(...symbolMap('### Symbols'));
+      else out.push('_No indexed symbols in this file._');
+      out.push('', '> Call again without `symbolsOnly` (optionally with `offset`/`limit`) to read the source, like Read.');
       return this.textResult(this.truncateOutput(out.join('\n')));
     }
 
-    if (!includeCode) {
-      out.push('### Symbols');
-      for (const n of nodes) {
-        const sig = n.signature ? ` ${n.signature.replace(/\s+/g, ' ').trim()}` : '';
-        out.push(`- \`${n.name}\` (${n.kind})${sig} — :${n.startLine}`);
-      }
-      out.push('', '> Call again with `includeCode:true` for the bodies, or `codegraph_node <name>` for one symbol in full.');
+    // SECURITY (#383): never dump a raw config/data file — a yaml/properties
+    // line is `key: <secret>`. Summarize by key and point to a real Read.
+    if (CONFIG_LEAF_LANGUAGES.has(resolved.language)) {
+      const out = [`**${filePath}** — configuration/data file, ${depSummary}`, ''];
+      if (nodes.length) out.push(...symbolMap('### Keys (values withheld for safety)'));
+      out.push('', '> Values may be secrets, so codegraph indexes keys only. Read the file directly if you need a value.');
       return this.textResult(this.truncateOutput(out.join('\n')));
     }
 
-    // Render each OUTERMOST symbol's verbatim body (a container's body already
-    // includes its members, so skip anything covered) — no duplication, and no
-    // "read the file" container outline. Budget-capped.
-    out.push('### Source (verbatim — treat as already Read)');
-    const BODY_BUDGET = 14000;
-    const outermost = [...nodes].sort((a, b) =>
-      a.startLine - b.startLine || (b.endLine ?? b.startLine) - (a.endLine ?? a.startLine));
-    const covered: Array<[number, number]> = [];
-    let used = out.join('\n').length;
-    const listed: Node[] = [];
-    for (const n of outermost) {
-      const end = n.endLine ?? n.startLine;
-      if (covered.some(([s, e]) => s <= n.startLine && e >= end)) continue;
-      const code = await cg.getCode(n.id);
-      if (!code) continue;
-      const section = `#### \`${n.name}\` (${n.kind}) — :${n.startLine}\n\`\`\`\n${code}\n\`\`\``;
-      if (used + section.length <= BODY_BUDGET || used < 1500) {
-        out.push('', section);
-        used += section.length;
-        covered.push([n.startLine, end]);
-      } else {
-        listed.push(n);
-      }
+    // Read the current bytes from disk through the security chokepoint
+    // (validatePathWithinRoot: blocks `../` traversal and symlink escapes, #527).
+    const abs = validatePathWithinRoot(cg.getProjectRoot(), filePath);
+    let content: string | null = null;
+    if (abs) {
+      try { content = readFileSync(abs, 'utf-8'); } catch { content = null; }
     }
-    if (listed.length) {
-      out.push('', `### ${listed.length} more symbol${listed.length === 1 ? '' : 's'} (over the size budget — fetch with codegraph_node <name>)`,
-        ...listed.slice(0, 30).map((n) => `- \`${n.name}\` (${n.kind}) — :${n.startLine}`));
+    if (content === null) {
+      const out = [`**${filePath}** — could not read from disk (it may have moved since indexing) · ${depSummary}`, ''];
+      if (nodes.length) out.push(...symbolMap('### Symbols'));
+      out.push('', `> Read \`${filePath}\` directly for its current content.`);
+      return this.textResult(this.truncateOutput(out.join('\n')));
     }
-    return this.textResult(this.truncateOutput(out.join('\n')));
+
+    // Split exactly as Read does — keep the trailing empty line a final newline
+    // produces (Read numbers it too), so line numbers line up byte-for-byte.
+    const fileLines = content.split('\n');
+    const total = fileLines.length;
+
+    // Read-parity windowing: `offset`/`limit` mean exactly what they do on Read
+    // (1-based start line; max line count). Default: the whole file, capped like
+    // Read at 2000 lines and bounded by a char budget that tracks explore's
+    // proven-safe ~38k response ceiling. Overflow is stated explicitly (Read
+    // paginates too) — never the silent 15k truncateOutput chop.
+    const CHAR_BUDGET = 38000;
+    const DEFAULT_LIMIT = 2000;
+    const offset = Math.max(1, opts.offset ?? 1);
+    if (offset > total) {
+      return this.textResult(`**${filePath}** has ${total} line${total === 1 ? '' : 's'}; offset ${offset} is past the end · ${depSummary}`);
+    }
+    const maxLines = Math.max(1, opts.limit ?? DEFAULT_LIMIT);
+    const start = offset - 1; // 0-based
+    const header = `**${filePath}** — ${total} lines, ${nodes.length} symbol${nodes.length === 1 ? '' : 's'} · ${depSummary}`;
+
+    // Numbered lines, byte-for-byte Read's shape: `<n>\t<line>`, no left-pad.
+    const numbered: string[] = [];
+    let used = header.length + 8;
+    let i = start;
+    for (; i < total && numbered.length < maxLines; i++) {
+      const ln = `${i + 1}\t${fileLines[i]}`;
+      if (used + ln.length + 1 > CHAR_BUDGET && numbered.length > 0) break;
+      numbered.push(ln);
+      used += ln.length + 1;
+    }
+    const shownEnd = start + numbered.length;
+    const complete = offset === 1 && shownEnd >= total;
+
+    const out: string[] = [header, '', ...numbered];
+    if (!complete) {
+      out.push(
+        '',
+        `(lines ${offset}–${shownEnd} of ${total} — pass \`offset\`/\`limit\` for another range, or \`codegraph_node <symbol>\` for one symbol in full)`,
+      );
+    }
+    // Self-bounded to CHAR_BUDGET (except that a single over-long first line is still
+    // shown whole, so the window is never empty). Do NOT route through truncateOutput (15k).
+    return this.textResult(out.join('\n'));
   }
 
   /** Render one symbol: details + (optional) body/outline + its caller/callee trail. */
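The windowing loop in `handleFileView` is easiest to check in isolation. What follows is a hedged, standalone restatement of it, not code exported by `tools.ts`. It keeps the handler's 2000-line cap, 38000-character budget and `+8` header slack, and makes two edge cases explicit: a final newline produces one more (empty) numbered line, and an over-budget first line is still returned.

```typescript
// Sketch only: the Read-parity windowing from handleFileView as a pure function.
// Constants mirror the diff; `windowLines` and `ReadWindow` are illustrative names.

interface ReadWindow { lines: string[]; shownEnd: number; total: number; complete: boolean }

function windowLines(
  content: string,
  opts: { offset?: number; limit?: number } = {},
  headerLen = 0,
  charBudget = 38000,
  defaultLimit = 2000,
): ReadWindow | { pastEnd: true; total: number } {
  const fileLines = content.split("\n"); // a trailing newline yields a final "" line
  const total = fileLines.length;
  const offset = Math.max(1, opts.offset ?? 1); // 1-based, like Read's offset
  if (offset > total) return { pastEnd: true, total };
  const maxLines = Math.max(1, opts.limit ?? defaultLimit);
  const start = offset - 1;
  const lines: string[] = [];
  let used = headerLen + 8; // same slack the handler adds to the header length
  for (let i = start; i < total && lines.length < maxLines; i++) {
    const ln = `${i + 1}\t${fileLines[i]}`; // `<n>\t<line>`, no left-padding
    // Stop before the char budget overflows, but never return an empty window.
    if (used + ln.length + 1 > charBudget && lines.length > 0) break;
    lines.push(ln);
    used += ln.length + 1;
  }
  const shownEnd = start + lines.length;
  return { lines, shownEnd, total, complete: offset === 1 && shownEnd >= total };
}
```

Note that `complete` is false whenever an `offset` other than 1 is passed, even if the window reaches the end. This matches the handler, which then prints the "lines X–Y of N" footer.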