Przeglądaj źródła

feat(mcp): default tool surface trimmed to 4 — explore, node, search, callers (#818)

callees/impact/files/status stay fully functional (handlers, CLI, library
API untouched; CODEGRAPH_MCP_TOOLS re-enables any) but are no longer
LISTED by default. Evidence: codegraph_impact appears in zero recorded
eval runs ever; its blast-radius info already arrives inline on explore
(Blast radius section) and node (dependents note). callees is redundant
by construction (a symbol's body IS its callee list). files/status
"reduce to one grep" per the tiny-repo audit, and staleness banners
already inline pending-sync. callers stays: exhaustive call-site
enumeration (incl. callback registrations, per-definition sections) is
the one job explore/node don't replicate. Fewer tools = fewer mis-picks
+ ~300 schema tokens saved per session; presence itself steers.

server-instructions rewritten around the 4-tool surface ("what does X
call" → node body+trail; "what breaks" → callers + inline blast radius).
Tiny-repo gate unchanged (its trio ⊆ the default set); stale gate
comment corrected (context/trace are long gone — its "5 core tools" are
today's trio).

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Colby Mchenry 1 tydzień temu
rodzic
commit
c450fd95b7

+ 1 - 0
CHANGELOG.md

@@ -16,6 +16,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
 ### New Features
 
+- **The MCP tool list is now a focused default of four** — `codegraph_explore`, `codegraph_node`, `codegraph_search`, and `codegraph_callers`. The other four (`codegraph_callees`, `codegraph_impact`, `codegraph_files`, `codegraph_status`) remain fully functional — the CLI and library API are unchanged, and `CODEGRAPH_MCP_TOOLS` re-enables any of them — but they're no longer listed to agents by default: measured agent behavior shows they're never or rarely picked, and the information they carry already arrives inline on the tools agents do use (explore's blast-radius section, node's dependents note, a symbol's own body as its callee list). A leaner list saves context tokens every session and steers agents to the right tool by presence alone.
 - **CodeGraph now goes quiet instead of failing loudly in unindexed projects.** When an AI agent's session starts in a workspace that has no CodeGraph index, the MCP server now announces itself as inactive with a short note and lists no tools at all — instead of presenting the full toolset and erroring on every call, which taught agents to distrust CodeGraph even where it works. Querying another project that isn't indexed likewise returns clear guidance (use your regular tools for that codebase; the user can run `codegraph init` there to enable CodeGraph) instead of an error, and genuine internal errors now tell the agent to retry once rather than give up on CodeGraph entirely. Indexing stays your decision — agents are told not to run it themselves. (#769)
 - **Astro projects are now indexed.** `.astro` files previously weren't parsed at all — on a typical Astro site that left most of the codebase invisible to search, impact, and `codegraph_explore`. CodeGraph now extracts the TypeScript frontmatter (functions, imports, `getStaticPaths`, …) and client-side `<script>` blocks, captures function calls and `<Component>` usages in template markup so cross-component dependencies trace end-to-end, resolves the `Astro` global and `astro:*` module imports as framework-provided, and maps `src/pages/` file-based routing to route nodes (`.astro` pages and `.ts` endpoints, including `[param]` and `[...rest]` dynamic segments, with underscore-prefixed files correctly excluded). Validated on two real-world Astro sites with 93% measured cross-file coverage and every page mapping to its route. Thanks @xingwangzhe. (#768) (Astro)
 - Same-named symbols across a monorepo's apps are no longer conflated. In a NestJS-style workspace with one `UserService` per app, `codegraph_callers`, `codegraph_callees`, and `codegraph_impact` now report **one section per distinct definition** — each app's callers and blast radius under its own file-labeled heading — instead of a single merged list, and accept a `file` argument to focus exactly the definition you mean (like `codegraph_node` already did). Impact in particular no longer overstates a change's blast radius by merging unrelated same-named classes. Thanks @Igorgro. (#764)

+ 19 - 8
__tests__/mcp-tool-allowlist.test.ts

@@ -17,13 +17,23 @@ describe('CODEGRAPH_MCP_TOOLS allowlist', () => {
 
   const listed = () => new ToolHandler(null).getTools().map(t => t.name).sort();
 
-  it('exposes the full tool surface when unset', () => {
+  it('exposes the default 4-tool surface when unset', () => {
     delete process.env[ENV];
-    const all = listed();
-    expect(all).toContain('codegraph_explore');
-    expect(all).not.toContain('codegraph_context');
-    expect(all).not.toContain('codegraph_trace');
-    expect(all.length).toBeGreaterThanOrEqual(8);
+    // The default set (see DEFAULT_MCP_TOOLS): explore + node are the
+    // validated workhorses, search the cheap lookup, callers the one
+    // irreplaceable enumerator. callees/impact/files/status stay defined
+    // and executable but unlisted — impact appeared in ZERO recorded runs.
+    expect(listed()).toEqual([
+      'codegraph_callers',
+      'codegraph_explore',
+      'codegraph_node',
+      'codegraph_search',
+    ]);
+  });
+
+  it('re-enables an unlisted tool via the allowlist (impact)', () => {
+    process.env[ENV] = 'explore,impact';
+    expect(listed()).toEqual(['codegraph_explore', 'codegraph_impact']);
   });
 
   it('filters ListTools to the allowlisted short names', () => {
@@ -36,9 +46,10 @@ describe('CODEGRAPH_MCP_TOOLS allowlist', () => {
     expect(listed()).toEqual(['codegraph_explore', 'codegraph_search']);
   });
 
-  it('treats an empty/whitespace value as unset (full surface)', () => {
+  it('treats an empty/whitespace value as unset (default surface)', () => {
     process.env[ENV] = '   ';
-    expect(listed().length).toBeGreaterThanOrEqual(8);
+    expect(listed()).toHaveLength(4);
+    expect(listed()).toContain('codegraph_explore');
   });
 
   it('rejects a disabled tool on execute (defense in depth)', async () => {

+ 5 - 6
src/mcp/server-instructions.ts

@@ -47,18 +47,17 @@ typically one to a few calls; a grep/read exploration is dozens.
 - **Almost any question — "how does X work", architecture, a bug, "what/where is X", or surveying an area** → \`codegraph_explore\` (PRIMARY — call FIRST; ONE capped call returns the verbatim source of the relevant symbols grouped by file; most often the ONLY call you need)
 - **"How does X reach/become Y? / the flow / the path from X to Y"** → \`codegraph_explore\`, naming the symbols that span the flow (e.g. \`mutateElement renderScene\`) — it surfaces the call path among them, including dynamic-dispatch hops (callbacks, React re-render, JSX children) grep can't follow
 - **"What is the symbol named X?" (just its location)** → \`codegraph_search\`
-- **"What calls this?" / "What does this call?" / "What would changing this break?"** → \`codegraph_callers\` / \`codegraph_callees\` / \`codegraph_impact\`. Callers includes where a function is **registered as a callback** (passed as an argument, assigned to a function pointer/field, listed in a handler table) — labeled "via callback registration" — so a function with no direct calls is NOT dead if it's wired up somewhere. When several UNRELATED symbols share a name (one \`UserService\` per monorepo app), these tools report **one section per definition** (never a merged list) — pass \`file\` to focus the definition you mean
+- **"What calls this?" / "What would changing this break?"** → \`codegraph_callers\` — EVERY call site with file:line, including where a function is **registered as a callback** (passed as an argument, assigned to a function pointer/field, listed in a handler table) — labeled "via callback registration" — so a function with no direct calls is NOT dead if it's wired up somewhere. When several UNRELATED symbols share a name (one \`UserService\` per monorepo app), it reports **one section per definition** (never a merged list) — pass \`file\` to focus the definition you mean. The wider blast radius arrives automatically on \`codegraph_explore\` (its "Blast radius" section) and \`codegraph_node\` (the dependents note)
+- **"What does this call?"** → \`codegraph_node\` with that symbol and \`includeCode: true\` — the body IS the callee list, and the caller/callee trail comes with it
 - **Reading a source FILE (any time you'd use the \`Read\` tool)** → \`codegraph_node\` with a \`file\` path and no \`symbol\`. It returns the file's **current source with line numbers — the same \`<n>\\t<line>\` shape \`Read\` gives you, safe to \`Edit\` from** — narrowable with \`offset\`/\`limit\` exactly like \`Read\`, PLUS a one-line note of which files depend on it. Same bytes as \`Read\`, faster (served from the index), with the blast radius attached. Use it **instead of \`Read\`** for indexed source files; fall back to \`Read\` only for what codegraph doesn't index (configs, docs). Pass \`symbolsOnly: true\` for just the file's structure.
 - **About to read or edit a symbol you can name** → \`codegraph_node\` with that \`symbol\` (SECONDARY — the after-explore depth tool): the verbatim source (\`includeCode: true\`) PLUS its caller/callee trail, so before changing it you see what calls it and what your edit would break. For an OVERLOADED name it returns EVERY matching definition's body in one call, so you never Read a file to find the right overload
-- **"What's in directory X?"** → \`codegraph_files\`
-- **"Is the index ready / what's its size?"** → \`codegraph_status\`
 
 ## Common chains
 
 - **Flow / "how does X reach Y"**: ONE \`codegraph_explore\` with the symbol names spanning the flow — it surfaces the call path among them (riding dynamic-dispatch hops) AND returns their source. No need to reconstruct the path with \`codegraph_search\` + \`codegraph_callers\`.
 - **Onboarding / understanding any area**: ONE \`codegraph_explore\` is usually the whole answer. Only follow up — \`codegraph_node\` for a specific symbol — if something is still unclear.
-- **Refactor planning**: \`codegraph_search\` → \`codegraph_callers\` → \`codegraph_impact\`. The blast-radius answer comes from impact, not from walking callers manually.
-- **Debugging a regression**: \`codegraph_callers\` of the suspected symbol; widen with \`codegraph_impact\` if an unexpected call appears.
+- **Refactor planning**: \`codegraph_callers\` for the complete call-site list to update; the wider blast radius is already attached to \`codegraph_explore\` / \`codegraph_node\` output.
+- **Debugging a regression**: \`codegraph_callers\` of the suspected symbol; \`codegraph_node\` on anything unexpected that appears.
 
 ## Anti-patterns
 
@@ -67,7 +66,7 @@ typically one to a few calls; a grep/read exploration is dozens.
 - **Don't chain \`codegraph_search\` + \`codegraph_node\`** to understand an area — ONE \`codegraph_explore\` returns the relevant symbols' source together in a single round-trip.
 - **Don't loop \`codegraph_node\` over many symbols** — one \`codegraph_explore\` call returns them all grouped by file, while each separate call re-reads the whole context and costs far more. Use \`codegraph_node\` for a single symbol.
 - **Don't reach for the \`Read\` tool on an indexed source file** — \`codegraph_node\` with a \`file\` reads it for you (same \`<n>\\t<line>\` source, \`offset\`/\`limit\` like Read, faster, with its blast radius), and with a \`symbol\` it returns the source plus the caller/callee trail. Reach for raw \`Read\` only for what codegraph doesn't index (configs, docs) or when the staleness banner flags a file as pending re-index.
-- **After editing, check the staleness banner.** When a tool response starts with "⚠️ Some files referenced below were edited since the last index sync…", the listed files are pending re-index — Read those specific files for accurate content. Every file NOT in that banner is fresh, so still trust codegraph. \`codegraph_status\` also lists pending files under "Pending sync".
+- **After editing, check the staleness banner.** When a tool response starts with "⚠️ Some files referenced below were edited since the last index sync…", the listed files are pending re-index — Read those specific files for accurate content. Every file NOT in that banner is fresh, so still trust codegraph.
 
 ## Limitations
 

+ 35 - 5
src/mcp/tools.ts

@@ -606,11 +606,37 @@ export const tools: ToolDefinition[] = [
  */
 export function getStaticTools(): ToolDefinition[] {
   const raw = process.env.CODEGRAPH_MCP_TOOLS;
-  if (!raw || !raw.trim()) return tools;
+  if (!raw || !raw.trim()) {
+    return tools.filter(t => DEFAULT_MCP_TOOLS.has(t.name.replace(/^codegraph_/, '')));
+  }
   const allow = new Set(raw.split(',').map(s => s.trim().replace(/^codegraph_/, '')).filter(Boolean));
   return allow.size ? tools.filter(t => allow.has(t.name.replace(/^codegraph_/, ''))) : tools;
 }
 
+/**
+ * The MCP tools served by DEFAULT (short names). The other defined tools
+ * (callees, impact, files, status) remain fully functional — handlers stay,
+ * the library API and CLI are untouched, and `CODEGRAPH_MCP_TOOLS` re-enables
+ * any of them — they just aren't LISTED to agents anymore.
+ *
+ * Evidence for the cut (the "adapt the tool to the agent" principle —
+ * fewer tools = fewer mis-picks, and presence itself steers):
+ * - `codegraph_impact` appears in ZERO recorded eval runs ever — its
+ *   blast-radius info already arrives inline on explore (the "Blast radius"
+ *   section) and node (the dependents note), so agents never need the
+ *   standalone tool.
+ * - `codegraph_callees` is redundant by construction: a symbol's body (which
+ *   node returns) IS its callee list, plus the caller/callee trail.
+ * - `codegraph_files` / `codegraph_status`: the tiny-repo audit (see
+ *   getTools) found they "reduce to one grep"; staleness banners already
+ *   inline the pending-sync info on every read tool, and the CLI covers
+ *   diagnostics.
+ * - `codegraph_callers` stays: exhaustive call-site enumeration (every
+ *   caller with file:line, callback registrations labeled, one section per
+ *   same-named definition) is the one job explore/node don't replicate.
+ */
+const DEFAULT_MCP_TOOLS = new Set(['explore', 'node', 'search', 'callers']);
+
 /**
  * Tool handler that executes tools against a CodeGraph instance
  *
@@ -703,9 +729,12 @@ export class ToolHandler {
    */
   getTools(): ToolDefinition[] {
     const allow = this.toolAllowlist();
+    // No explicit allowlist → the default 4-tool surface (see
+    // DEFAULT_MCP_TOOLS for the evidence). An allowlist replaces the
+    // default entirely, so any defined tool can be re-enabled.
     let visible = allow
       ? tools.filter(t => allow.has(t.name.replace(/^codegraph_/, '')))
-      : tools;
+      : tools.filter(t => DEFAULT_MCP_TOOLS.has(t.name.replace(/^codegraph_/, '')));
     if (!this.cg) return visible;
 
     try {
@@ -713,9 +742,10 @@ export class ToolHandler {
       const budget = getExploreBudget(stats.fileCount);
 
       // Tiny-repo tool gating: on projects under TINY_REPO_FILE_THRESHOLD
-      // files, only expose the 5 core tools (search, context, node,
-      // explore, trace). The 5 omitted tools (callers, callees, impact,
-      // status, files) reduce to one grep at this scale.
+      // files, only expose the core trio (search, node, explore) — one
+      // below even the 4-tool default: at this scale callers, too, reduces
+      // to one grep. (Historical note: the audit below ran when context and
+      // trace still existed; its "5 core tools" are today's trio.)
       //
       // n=2 audits ruled out cutting below 5 tools:
       // - 3-tool gate (search + context + trace): cost regressed on