fix(mcp): normalize root-ish path filters in codegraph_files (#426) (#466)

The agent (opencode/Gemini Flash on Windows) called codegraph_files with
path="/" and got "No files found matching the criteria.", which pushed it
straight back to Read/Glob. Indexed file paths are stored as
project-relative POSIX (e.g. "src/foo.py"), and the old startsWith filter
matched nothing for any of the root-ish or platform-flavored shapes an
agent might guess: "/", ".", "./", "", "\\", leading-slash and
leading-./ subpaths, or Windows backslash subpaths.

Normalize the filter (convert "\" to "/"; strip leading "/" and "./"
runs and a bare "."; trim trailing "/"), then match by exact equality or
a "<filter>/" boundary, which also kills a sibling-prefix bleed where
filter "src" used to match "src-utils/...".
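
The normalization and boundary match described above can be sketched as a standalone pair of functions. This is a minimal illustration, not the shipped code: the commit inlines the logic in `ToolHandler`, and the names `normalizePathFilter`, `matchesFilter`, `stored`, and `list` are hypothetical, introduced here only for the example.

```typescript
// Hypothetical helper: mirrors the four replace() steps from src/mcp/tools.ts.
function normalizePathFilter(pathFilter: string | undefined): string {
  if (!pathFilter) return '';
  return pathFilter
    .replace(/\\/g, '/')          // Windows backslashes -> POSIX slashes
    .replace(/^(?:\.?\/+)+/, '')  // leading "/", "./", "//", ".//" runs
    .replace(/^\.$/, '')          // a bare "." means the project root
    .replace(/\/+$/, '');         // trailing slashes
}

// Hypothetical helper: an empty filter matches everything; otherwise require
// an exact match or a "<filter>/" boundary so "src" cannot match "src-utils".
function matchesFilter(storedPath: string, normalizedFilter: string): boolean {
  if (normalizedFilter === '') return true;
  return (
    storedPath === normalizedFilter ||
    storedPath.startsWith(normalizedFilter + '/')
  );
}

// Sample project-relative POSIX paths, shaped like the ones in the test file.
const stored: string[] = [
  'src/index.ts',
  'src/components/Button.ts',
  'src-utils/helper.ts',
  'tests/a.test.ts',
];

function list(pathFilter: string | undefined): string[] {
  const normalized = normalizePathFilter(pathFilter);
  return stored.filter(p => matchesFilter(p, normalized));
}

console.log(list('/'));               // all four paths
console.log(list('.\\src'));          // the two paths under src/
console.log(list('src\\components')); // only src/components/Button.ts
```

Converting backslashes first lets the later steps reason about "/" only, so a Windows-style "\" root needs no special case.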

Validated on macOS + Linux (Docker) + Windows (Parallels) with 13 new
unit tests plus the existing mcp-input-limits/concurrent-locking
suites, and end-to-end through opencode in tmux (Big Pickle/OpenCode
Zen): codegraph_files [path=/] now returns the project tree and the
agent answers directly instead of falling back to Read.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Colby Mchenry 3 weeks ago
parent
commit
e1eb13cf9b
3 changed files with 128 additions and 3 deletions
  1. CHANGELOG.md (+1, -0)
  2. __tests__/mcp-files-path-normalization.test.ts (+113, -0)
  3. src/mcp/tools.ts (+14, -3)

+ 1 - 0
CHANGELOG.md

@@ -10,6 +10,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 ## [Unreleased]
 
 ### Fixed
+- **`codegraph_files` now returns the whole project when an agent passes `path="/"`, `"."`, `"./"`, `""`, or a Windows-style `"\\"` — instead of "No files found matching the criteria."** Indexed file paths are stored as project-relative POSIX (e.g. `src/foo.ts`), but the path filter used a plain `startsWith`, so a leading slash or any of the other root-ish shapes an agent might guess matched nothing and pushed the agent back to Read/Glob — the exact opencode + Gemini Flash regression reported on Windows 11. Subdirectory filters are now equally forgiving: `"/src"`, `"./src"`, `"src/"`, `"src\\components"`, etc. all resolve correctly. Sibling-prefix bleed (`"src"` was previously matching `src-utils/...`) is also fixed — the filter now requires either an exact match or a `<filter>/` boundary. Closes #426.
 - **File watcher no longer marks edited files as fresh when another process holds the index lock.** When a second writer (concurrent `codegraph index`, a git hook, another MCP daemon) held `.codegraph/codegraph.lock`, `CodeGraph.sync()` returned a zero-shape no-op instead of throwing. The file watcher took that as a successful sync and cleared `pendingFiles` — so the per-file staleness signal MCP tools surface to agents (issue #403) dropped immediately, even though the edit was never indexed. `CodeGraph.watch()` now converts that no-op into a typed `LockUnavailableError` thrown into the watcher; the existing retry path preserves `pendingFiles` and reschedules until the lock becomes available. The error is logged at debug only (no `onSyncError` callback) so a long-running external indexer doesn't spam stderr every debounce cycle. Closes #449.
 - **Watch sync no longer aborts with `FOREIGN KEY constraint failed`.** PR #62 plugged this FK violation at the extraction layer (empty-named nodes whose containment edges had no target), but the same violation kept reappearing on v0.9.5 during the daemon's *watch sync* — not on initial index. Once an agent's daemon had been running long enough to accumulate edits, a resolver lookup that crossed a framework-specific cache could hand back a node whose row had been removed by a recent file rewrite, and the FK check then aborted the entire resolution batch, leaving the user's daemon log filling with `Watch sync failed { error: 'FOREIGN KEY constraint failed' }`. `QueryBuilder.insertEdges` now validates every batch's endpoints against the `nodes` table directly (one fresh `SELECT id IN (...)` per batch, no cache) and silently skips edges with missing source or target — so a stale lookup result drops one edge instead of aborting the whole sync. Surfaces as a fresh `codegraph init`/`index` cycle now surviving its first watch-sync cycle without the FK error, and the daemon recovering naturally instead of compounding into further failures. Closes #455.
 - **Hermes Agent: `codegraph install --target hermes` no longer corrupts `~/.hermes/config.yaml`.** Hermes serializes its config with PyYAML's default block style, which writes list items at the *same* indent as the parent mapping key (`cli:` and `- hermes-cli` both at column 2). The previous line-based YAML patcher mistook that first `  - hermes-cli` for the next sibling key, truncated the `cli:` block, and then spliced `- mcp-codegraph` at indent 4 *before* the existing items — leaving subsequent entries (`- browser`, `- clarify`, …) and even other platforms (`telegram:`, `discord:`) appearing at the `platform_toolsets:` level, which is no longer parseable YAML. The installer now recognizes the same-indent list style, finds the real end of the block at the next sibling key, and appends `- mcp-codegraph` at whatever indent the existing items already use. Re-installing on an already-corrupted file (or a 4-space-nested config that worked before) still produces a clean, parseable result. Closes #456.

+ 113 - 0
__tests__/mcp-files-path-normalization.test.ts

@@ -0,0 +1,113 @@
+/**
+ * codegraph_files path-filter normalization (#426)
+ *
+ * Stored file paths are project-relative POSIX (e.g. "src/foo.ts"). Some
+ * agents pass project-root variants like "/", ".", "./" or "" when they want
+ * "the whole project", and Windows-style backslashes or leading "/" / "./"
+ * prefixes when they want a subtree. The old filter used a plain
+ * `startsWith(pathFilter)`, so any of those buried the agent at "no files
+ * found" and pushed it back to Read/Glob — the exact opencode regression in
+ * #426. These tests pin every branch of the normalization.
+ */
+
+import { describe, it, expect, beforeEach, afterEach } from 'vitest';
+import * as fs from 'fs';
+import * as path from 'path';
+import * as os from 'os';
+import CodeGraph from '../src/index';
+import { ToolHandler } from '../src/mcp/tools';
+
+describe('codegraph_files path normalization', () => {
+  let tempDir: string;
+  let cg: CodeGraph;
+  let handler: ToolHandler;
+
+  beforeEach(async () => {
+    tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'codegraph-files-paths-'));
+    fs.mkdirSync(path.join(tempDir, 'src', 'components'), { recursive: true });
+    fs.mkdirSync(path.join(tempDir, 'tests'), { recursive: true });
+    fs.writeFileSync(path.join(tempDir, 'src', 'index.ts'), `export const x = 1;\n`);
+    fs.writeFileSync(
+      path.join(tempDir, 'src', 'components', 'Button.ts'),
+      `export const Button = () => 1;\n`
+    );
+    fs.writeFileSync(path.join(tempDir, 'tests', 'a.test.ts'), `export const t = 1;\n`);
+    cg = await CodeGraph.init(tempDir, {
+      config: { include: ['**/*.ts'], exclude: [] },
+    });
+    await cg.indexAll();
+    handler = new ToolHandler(cg);
+  });
+
+  afterEach(() => {
+    if (cg) cg.destroy();
+    if (fs.existsSync(tempDir)) {
+      fs.rmSync(tempDir, { recursive: true, force: true });
+    }
+  });
+
+  async function listed(pathFilter: string | undefined): Promise<string> {
+    const result = await handler.execute('codegraph_files', {
+      ...(pathFilter !== undefined ? { path: pathFilter } : {}),
+      format: 'flat',
+      includeMetadata: false,
+    });
+    expect(result.isError).toBeFalsy();
+    return result.content[0]!.text as string;
+  }
+
+  // Root-ish filters: every shape an agent might guess for "whole project"
+  // must list the same files as no filter at all.
+  for (const rootish of ['/', '.', './', '', '\\', '//', './/']) {
+    it(`treats path=${JSON.stringify(rootish)} as project root`, async () => {
+      const output = await listed(rootish);
+      expect(output).toContain('src/index.ts');
+      expect(output).toContain('src/components/Button.ts');
+      expect(output).toContain('tests/a.test.ts');
+    });
+  }
+
+  it('matches a real subdirectory prefix', async () => {
+    const output = await listed('src');
+    expect(output).toContain('src/index.ts');
+    expect(output).toContain('src/components/Button.ts');
+    expect(output).not.toContain('tests/a.test.ts');
+  });
+
+  it('tolerates a leading slash on a real subdirectory', async () => {
+    const output = await listed('/src');
+    expect(output).toContain('src/index.ts');
+    expect(output).not.toContain('tests/a.test.ts');
+  });
+
+  it('tolerates a leading "./" on a real subdirectory', async () => {
+    const output = await listed('./src');
+    expect(output).toContain('src/index.ts');
+    expect(output).not.toContain('tests/a.test.ts');
+  });
+
+  it('tolerates a trailing slash on a real subdirectory', async () => {
+    const output = await listed('src/');
+    expect(output).toContain('src/index.ts');
+    expect(output).not.toContain('tests/a.test.ts');
+  });
+
+  it('normalizes Windows backslashes', async () => {
+    const output = await listed('src\\components');
+    expect(output).toContain('src/components/Button.ts');
+    expect(output).not.toContain('src/index.ts');
+  });
+
+  // Old code matched on raw `startsWith`, so a filter "src" would also
+  // return a sibling like "src-utils/...". The new code requires either an
+  // exact match or a "<filter>/" boundary, so prefixes don't bleed.
+  it('does not match sibling directories that share a prefix', async () => {
+    fs.mkdirSync(path.join(tempDir, 'src-utils'), { recursive: true });
+    fs.writeFileSync(path.join(tempDir, 'src-utils', 'helper.ts'), `export const h = 1;\n`);
+    await cg.indexAll();
+
+    const output = await listed('src');
+    expect(output).toContain('src/index.ts');
+    expect(output).not.toContain('src-utils/helper.ts');
+  });
+});

+ 14 - 3
src/mcp/tools.ts

@@ -2248,9 +2248,20 @@ export class ToolHandler {
       return this.textResult('No files indexed. Run `codegraph index` first.');
     }
 
-    // Filter by path prefix
-    let files = pathFilter
-      ? allFiles.filter(f => f.path.startsWith(pathFilter) || f.path.startsWith('./' + pathFilter))
+    // Filter by path prefix. Stored paths are project-relative POSIX (e.g.
+    // "src/foo.ts"), but agents commonly pass project-root variants like "/",
+    // ".", "./", "" or Windows-style "src\foo" — and prefixes with leading
+    // "/", "./" or "\". Normalize all of those before matching so the agent
+    // gets results instead of falling back to Read/Glob (see #426).
+    const normalizedFilter = pathFilter
+      ? pathFilter
+          .replace(/\\/g, '/')
+          .replace(/^(?:\.?\/+)+/, '')
+          .replace(/^\.$/, '')
+          .replace(/\/+$/, '')
+      : '';
+    let files = normalizedFilter
+      ? allFiles.filter(f => f.path === normalizedFilter || f.path.startsWith(normalizedFilter + '/'))
       : allFiles;
 
     // Filter by glob pattern
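
To make the behavioral change in this hunk concrete, the removed and added predicates can be compared side by side on a few sample paths. This is a sketch under stated assumptions: `storedPaths`, `oldMatch`, and `newMatch` are hypothetical names, the sample paths are invented for illustration, and `newMatch` expects a filter that has already been normalized.

```typescript
// Sample project-relative POSIX paths (illustrative only).
const storedPaths: string[] = [
  'src/index.ts',
  'src-utils/helper.ts',
  'tests/a.test.ts',
];

// The predicate removed by this hunk: a raw prefix test on the unmodified filter.
const oldMatch = (p: string, filter: string): boolean =>
  p.startsWith(filter) || p.startsWith('./' + filter);

// The predicate added by this hunk, applied to an already-normalized filter.
const newMatch = (p: string, normalized: string): boolean =>
  p === normalized || p.startsWith(normalized + '/');

// "/" matched nothing, because no stored path begins with a slash.
const oldRoot = storedPaths.filter(p => oldMatch(p, '/'));

// "src" bled into the sibling directory "src-utils".
const oldSrc = storedPaths.filter(p => oldMatch(p, 'src'));

// The boundary check keeps only paths inside src/.
const newSrc = storedPaths.filter(p => newMatch(p, 'src'));

console.log({ oldRoot, oldSrc, newSrc });
```

The `normalized + '/'` boundary fixes the sibling-prefix case on its own. The root-ish cases are handled by normalization reducing the filter to an empty string, which skips filtering entirely.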