fix(explore): report the curated result count, not the raw candidate gather (#1046) (#1053)

codegraph_explore's "Found N symbols across M files." header reported
`subgraph.nodes.size` / `fileGroups.size` — the raw FTS gather. A broad
natural-language query ("publish status to the API") matches a huge pool
(260 symbols / 124 files on a 636-file repo) while only a handful clear
the relevance gate + budget and render, so the header read as "260
results to wade through" even though the correctly-ranked answer was the
few files shown.

Report instead the files whose source actually SURVIVES in the final
output (after the hard-ceiling truncation that can drop trailing
sections), summing their relevant symbols. Gather, ranking, gate, budget,
and rendering are untouched — only the header string changes. Overflow
relevant files are still named under "Not shown above", so nothing is
hidden. Adds a regression test locking header-count == rendered-sections.
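
A minimal sketch of the counting rule described above, assuming a simplified data shape. `RenderedFile`, `fillSummary`, `plural`, and the `[[summary]]` token are hypothetical names for illustration; only the `**\`` section prefix and the header wording come from this commit.

```typescript
// Illustrative only: hypothetical types and helpers, not the code in src/mcp/tools.ts.
const FILE_SECTION_PREFIX = '**`';
const SUMMARY_SENTINEL = '[[summary]]'; // stand-in placeholder token

interface RenderedFile {
  path: string;
  symbolCount: number; // relevant symbols rendered for this file
}

function plural(n: number, word: string): string {
  return `${n} ${word}${n === 1 ? '' : 's'}`;
}

// Replace the placeholder with counts taken only from sections still present
// in the final (possibly truncated) text.
function fillSummary(finalText: string, rendered: RenderedFile[]): string {
  const survivors = rendered.filter((f) =>
    finalText.includes(`${FILE_SECTION_PREFIX}${f.path}\`**`));
  const symbols = survivors.reduce((sum, f) => sum + f.symbolCount, 0);
  const line = `Found ${plural(symbols, 'symbol')} across ${plural(survivors.length, 'file')}.`;
  // A replacer function avoids any special handling of `$` in the replacement.
  return finalText.replace(SUMMARY_SENTINEL, () => line);
}

const rendered: RenderedFile[] = [
  { path: 'flow.ts', symbolCount: 3 },
  { path: 'late.ts', symbolCount: 2 },
];
// 'late.ts' was rendered but later dropped by truncation, so its header is absent.
const truncated = [SUMMARY_SENTINEL, '', '**`flow.ts`**', '...'].join('\n');
const filled = fillSummary(truncated, rendered);
console.log(filled.split('\n')[0]);
```

In this sketch a file that was rendered and then cut by truncation does not contribute to either number, which is the difference between counting survivors and counting everything the render loop touched.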

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Colby Mchenry 11 hours ago
commit
9bce41b858
3 files changed, 148 additions, 3 deletions
  1. CHANGELOG.md (+1 -0)
  2. __tests__/explore-result-count.test.ts (+94 -0)
  3. src/mcp/tools.ts (+53 -3)

+ 1 - 0
CHANGELOG.md

@@ -19,6 +19,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 - The graph no longer stores duplicate copies of the same relationship. The same dependency between the same two symbols at the same spot could be recorded more than once, which inflated edge counts and let callers/impact results list a relationship twice. Each relationship is now stored exactly once, and existing projects are de-duplicated automatically the next time CodeGraph opens them. Thanks @inth3shadows for the detailed report. (#1034)
 - `codegraph node` can now read a file from the command line. File-read mode — pass `-f`/`--file` to get a file's source with line numbers plus the files that depend on it, the same output as the `codegraph_node` MCP tool — was rejected with "missing required argument 'name'", because the command always demanded a symbol name even though file mode has none, leaving the feature unreachable from the CLI. The symbol name is now optional: `codegraph node -f src/auth.ts` (or `codegraph node src/auth.ts`) reads the file, `codegraph node parseToken` looks up a symbol, and running it with neither prints a short usage hint instead of a cryptic error. Thanks @jcrabapple for the report. (#1044)
 - `codegraph query` no longer prints meaningless relevance percentages like "12042%" next to each result. The number was a raw full-text search score — useful only for ordering the results, not as a real 0–100% figure — so multiplying it by 100 produced wild values that made the output look broken. Results are already listed best-match first, so the CLI now just shows them in that order with no score, matching what the search tool reports to AI agents. If you script against `codegraph query --json`, the raw `score` is still included for sorting or thresholding. Thanks @jcrabapple for the report. (#1045)
+- `codegraph explore` no longer reports an alarming, inflated result count on broad natural-language queries. The "Found N symbols across M files" summary used to count every symbol the search swept in while ranking, so a broad query (for example "publish status to the API") on a large project could announce hundreds of symbols across a big fraction of the codebase — reading as if you had to wade through all of them — even though only the most relevant handful are actually shown with their source. The summary now counts just the files explore returns source for, so the number matches what you see. Ranking and results are unchanged: the right symbols still come first, and any further relevant files are still listed by name under "Not shown above" so nothing is hidden. Thanks @jcrabapple for the report. (#1046)
 
 
 ## [1.1.2] - 2026-06-28

+ 94 - 0
__tests__/explore-result-count.test.ts

@@ -0,0 +1,94 @@
+/**
+ * codegraph_explore — the "Found N symbols across M files." header reflects the
+ * CURATED answer actually rendered, not the raw candidate gather (#1046).
+ *
+ * A broad natural-language query FTS-matches a huge pool of symbols ("status",
+ * "publish", "api" hit a large fraction of any API-heavy repo), but only a
+ * handful of files clear the relevance gate + budget and render with source.
+ * The header used to report `subgraph.nodes.size` / `fileGroups.size` — the raw
+ * pool (260 symbols / 124 files on a 636-file repo) — which read as "wade
+ * through 260 results" even though the correctly-ranked answer was the few files
+ * below. It now reports only the files whose source survives in the output.
+ *
+ * The locked invariant: the header's file count EQUALS the number of rendered
+ * `**`<path>`**` source sections. Pre-fix that failed whenever the gather
+ * exceeded what rendered (here: 8 disconnected "noise" files are gathered but
+ * gated out), so this fixture discriminates the fix from the old behaviour.
+ */
+import { describe, it, expect, beforeEach, afterEach } from 'vitest';
+import * as fs from 'fs';
+import * as path from 'path';
+import * as os from 'os';
+import CodeGraph from '../src/index';
+import { ToolHandler } from '../src/mcp/tools';
+
+/** Files explore rendered as ``**`<path>`**`` source sections (issue #778: bold
+ *  labels, not ATX headings). */
+function renderedSourceFiles(text: string): string[] {
+  const out: string[] = [];
+  for (const line of text.split('\n')) {
+    const m = line.match(/^\*\*`(.+?)`\*\*/);
+    if (m) out.push(m[1].trim());
+  }
+  return out;
+}
+
+function headerFileCount(text: string): number | null {
+  const m = text.match(/Found \d+ symbols? across (\d+) files?\./);
+  return m ? parseInt(m[1], 10) : null;
+}
+
+describe('codegraph_explore — curated result count (#1046)', () => {
+  let testDir: string;
+  let cg: CodeGraph;
+  let handler: ToolHandler;
+
+  beforeEach(async () => {
+    testDir = fs.mkdtempSync(path.join(os.tmpdir(), 'codegraph-count-'));
+
+    // The real, connected flow — its symbols call each other, so it clears the
+    // relevance gate and renders. snake_case so FTS tokenizes "status" out of
+    // the names (camelCase would leave one unmatchable token).
+    fs.writeFileSync(path.join(testDir, 'flow.ts'),
+      `export function publish_status() { return build_status(); }\n` +
+      `export function build_status() { return send_status(); }\n` +
+      `export function send_status() { return 'ok'; }\n`);
+
+    // Disconnected "noise" files: each defines ONE symbol that text-matches the
+    // query word "status" but calls nothing in the flow. They ARE gathered into
+    // the subgraph by FTS (so the OLD header counted them), but score too low to
+    // render — exactly the breadth that inflated the count.
+    for (let i = 0; i < 8; i++) {
+      fs.writeFileSync(path.join(testDir, `status_widget_${i}.ts`),
+        `export function status_widget_${i}() { return ${i}; }\n`);
+    }
+
+    cg = CodeGraph.initSync(testDir, { config: { include: ['**/*.ts'], exclude: [] } });
+    await cg.indexAll();
+    handler = new ToolHandler(cg);
+  });
+
+  afterEach(() => {
+    if (cg) cg.destroy();
+    if (fs.existsSync(testDir)) fs.rmSync(testDir, { recursive: true, force: true });
+  });
+
+  it('header file count equals the number of rendered source sections', async () => {
+    const res = await handler.execute('codegraph_explore', { query: 'publish status' });
+    const text = res.content[0].text;
+
+    const headerFiles = headerFileCount(text);
+    const rendered = renderedSourceFiles(text);
+
+    expect(headerFiles).not.toBeNull();
+    // The core honesty invariant — the header counts what's shown, not the gather.
+    expect(headerFiles).toBe(rendered.length);
+    // The flow file is the answer and must be among the rendered files.
+    expect(rendered).toContain('flow.ts');
+    // Curation actually happened: far fewer than the 9 gathered files (1 flow +
+    // 8 noise) are reported. Pre-fix this was the inflated gather count.
+    expect(headerFiles!).toBeLessThan(5);
+    // And the sentinel placeholder never leaks into the rendered header.
+    expect(text).not.toContain('codegraph-explore-summary');
+  });
+});
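
To make the locked invariant concrete, here is a small standalone sketch that applies the same two patterns as the test helpers to a hand-written sample. The sample text is an assumption about the general output shape, not output captured from a real `codegraph_explore` run.

```typescript
// Hand-written sample for illustration; not captured from a real run.
const sample = [
  '**Exploration: publish status**',
  '',
  'Found 3 symbols across 1 file.',
  '',
  '**`flow.ts`**',
  '',
  'export function publish_status() { return build_status(); }',
].join('\n');

// Collect paths from lines that start with a bold, backtick-quoted label.
const sectionPaths = sample
  .split('\n')
  .map((line) => line.match(/^\*\*`(.+?)`\*\*/))
  .filter((m): m is RegExpMatchArray => m !== null)
  .map((m) => m[1].trim());

// Pull the file count (M) out of the "Found N symbols across M files." header.
const headerMatch = sample.match(/Found \d+ symbols? across (\d+) files?\./);
const headerFiles = headerMatch ? parseInt(headerMatch[1], 10) : null;

console.log(headerFiles === sectionPaths.length);
```

The `**Exploration: ...**` title line is not mistaken for a source section because the section pattern requires a backtick immediately after the opening `**`.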

+ 53 - 3
src/mcp/tools.ts

@@ -338,6 +338,12 @@ function numberSourceLines(slice: string, firstLineNumber: number): string {
  * (`reasoning/reasoner.ts`) both key off to cut on whole file sections.
  */
 const FILE_SECTION_PREFIX = '**`';
+// Placeholder for codegraph_explore's "Found N symbols across M files." line.
+// The honest N/M can only be known after the final truncation drops trailing
+// sections (#1046), so the header is emitted as this sentinel and substituted
+// at the very end. This bracketed token never occurs in rendered source or a
+// file path, so the final string-replace can't collide.
+const SUMMARY_SENTINEL = '[[codegraph-explore-summary]]';
 function fileSectionHeader(filePath: string, suffix: string): string {
   return suffix
     ? `${FILE_SECTION_PREFIX}${filePath}\`** — ${suffix}`
@@ -2833,9 +2839,16 @@ export class ToolHandler {
     const lines: string[] = [
       `**Exploration: ${query}**`,
       '',
-      `Found ${subgraph.nodes.size} symbols across ${fileGroups.size} files.`,
+      // Curated summary — filled in after the source loop (see below). We do NOT
+      // report `subgraph.nodes.size` / `fileGroups.size` here: that's the raw
+      // candidate gather, which a broad natural-language query inflates wildly
+      // (260 symbols / 124 files on a 636-file repo) even though only a handful
+      // render. Reporting the pool read as "260 results to wade through" when the
+      // real, correctly-ranked answer is the few files below (#1046).
+      '',
       '',
     ];
+    const summaryLineIdx = 2;
 
     // Blast radius (always-on, compact): for the entry symbols, who depends on
     // them + which tests cover them — locations only, no source — so the agent
@@ -2943,6 +2956,9 @@ export class ToolHandler {
 
     let totalChars = lines.join('\n').length;
     let filesIncluded = 0;
+    // Paths we actually render source for below. Drives the curated header count
+    // (#1046) — it must reflect what we show, not the raw candidate gather.
+    const renderedFilePaths: string[] = [];
     let anyFileTrimmed = false;
 
     for (const [filePath, group] of sortedFiles) {
@@ -3082,6 +3098,7 @@ export class ToolHandler {
             : 'skeleton (signatures only — codegraph_explore a name for its full body; do NOT Read)';
           lines.push(fileSectionHeader(filePath, `${names} · ${tag}`), '', '```' + lang, skel.join('\n'), '```', '');
           totalChars += skel.join('\n').length + 120;
+          renderedFilePaths.push(filePath);
           filesIncluded++;
           continue;
         }
@@ -3132,6 +3149,7 @@ export class ToolHandler {
         }
         lines.push(wholeHeader, '', '```' + lang, wholeSection, '```', '');
         totalChars += wholeSection.length + 200;
+        renderedFilePaths.push(filePath);
         filesIncluded++;
         continue;
       }
@@ -3416,9 +3434,16 @@ export class ToolHandler {
       lines.push('');
 
       totalChars += fileSection.length + 200;
+      renderedFilePaths.push(filePath);
       filesIncluded++;
     }
 
+    // The curated header count is computed from the files that SURVIVE the final
+    // truncation (see end of method) — `filesIncluded` can over-count when the
+    // hard ceiling drops trailing sections — so leave a sentinel here and fill it
+    // in once the output is final.
+    lines[summaryLineIdx] = SUMMARY_SENTINEL;
+
     // Add remaining files as references (from both relevant and peripheral files).
     // Small projects (per budget) skip this — the relevant story already fits
     // in the source section, and a trailing pointer list is pure overhead.
@@ -3477,6 +3502,7 @@ export class ToolHandler {
     const output = flow.text + lines.join('\n');
 
     const hardCeiling = Math.min(Math.round(budget.maxOutputChars * 1.5), 25000);
+    let finalText: string;
     if (output.length > hardCeiling) {
       // Cut at a FILE-SECTION boundary (the last ``**` `` file header before the
       // ceiling) so we drop whole trailing file-sections rather than slicing
@@ -3487,9 +3513,33 @@ export class ToolHandler {
       const lastSection = cut.lastIndexOf('\n' + FILE_SECTION_PREFIX);
       const boundary = lastSection > hardCeiling * 0.5 ? lastSection : cut.lastIndexOf('\n');
       const safe = boundary > 0 ? cut.slice(0, boundary) : cut;
-      return this.textResult(safe + '\n\n... (output truncated to budget; the source above is complete and verbatim — treat it as already Read. For any area not covered, run another codegraph_explore with the specific names — do NOT Read these files.)');
+      finalText = safe + '\n\n... (output truncated to budget; the source above is complete and verbatim — treat it as already Read. For any area not covered, run another codegraph_explore with the specific names — do NOT Read these files.)';
+    } else {
+      finalText = output;
     }
-    return this.textResult(output);
+
+    // Curated header (#1046): substitute the sentinel with the count of files
+    // whose source SURVIVES in the final text — not `subgraph`/`fileGroups` (the
+    // raw gather a broad query inflates) and not `filesIncluded` (which can
+    // over-count when the ceiling above drops trailing sections). A file counts
+    // only if its section header is still present; its relevant (non-import,
+    // non-export) symbols are summed for N. Files we couldn't fit are still named
+    // under "Not shown above" + the budget note, so nothing is silently dropped.
+    const survivors = renderedFilePaths.filter((fp) =>
+      finalText.includes(`${FILE_SECTION_PREFIX}${fp}\``));
+    const shownSymbols = survivors.reduce((sum, fp) => {
+      const g = fileGroups.get(fp);
+      if (!g) return sum;
+      return sum + new Set(
+        g.nodes.filter((n) => n.kind !== 'import' && n.kind !== 'export').map((n) => n.id),
+      ).size;
+    }, 0);
+    const summaryLine = survivors.length > 0
+      ? `Found ${shownSymbols} symbol${shownSymbols === 1 ? '' : 's'} across ${survivors.length} file${survivors.length === 1 ? '' : 's'}.`
+      : `Found ${subgraph.nodes.size} symbol${subgraph.nodes.size === 1 ? '' : 's'} across ${fileGroups.size} file${fileGroups.size === 1 ? '' : 's'}.`;
+    finalText = finalText.replace(SUMMARY_SENTINEL, summaryLine);
+
+    return this.textResult(finalText);
   }
 
   /**
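
The hard-ceiling cut is the reason `filesIncluded` can over-count, so a standalone sketch of that step may help. It mirrors the boundary logic visible in the hunk above, but `cutAtSectionBoundary` is a hypothetical helper name, and the `output.slice(0, ceiling)` step is an assumption because the line defining `cut` is not shown in this diff.

```typescript
// Illustrative helper; the name and the slice step are assumptions (see lead-in).
const FILE_SECTION_PREFIX = '**`';

// Trim `output` to at most `ceiling` characters, preferring to cut just before
// the last file-section header so whole trailing sections are dropped.
function cutAtSectionBoundary(output: string, ceiling: number): string {
  if (output.length <= ceiling) return output;
  const cut = output.slice(0, ceiling);
  const lastSection = cut.lastIndexOf('\n' + FILE_SECTION_PREFIX);
  // Only use the section boundary if it keeps more than half of the budget;
  // otherwise fall back to the last line break.
  const boundary = lastSection > ceiling * 0.5 ? lastSection : cut.lastIndexOf('\n');
  return boundary > 0 ? cut.slice(0, boundary) : cut;
}

const output = [
  'header',
  '**`a.ts`**',
  'aaaaaaaaaa',
  '**`b.ts`**',
  'bbbbbbbbbbbbbbbbbbbb',
].join('\n');

// The ceiling lands inside b.ts's body, so the whole b.ts section is dropped.
const trimmed = cutAtSectionBoundary(output, 45);
console.log(trimmed);
```

Here both `a.ts` and `b.ts` were rendered before the cut, yet only `a.ts` remains afterwards, which is why the header count is derived from the final text and not from a counter incremented during rendering.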