Преглед изворни кода

fix(sync): apply the ignore matcher to the git change-detection fast path (#766) (#949)

Change detection's git fast path (collectGitStatus) consumed `git status`
output with only an isSourceFile filter, on the assumption that git already
omits ignored paths. It doesn't: gitignore is a no-op for *tracked* files, and
the built-in default excludes (vendor/, node_modules/) aren't gitignore at all.
So a tracked file inside a committed dependency dir, or under a .gitignored
dir, surfaced as a change the full index never tracks — `codegraph status`
reported phantom pending changes that `sync` (a filtered filesystem reconcile)
never cleared, and the public getChangedFiles() API returned the same wrong
list.

Apply buildDefaultIgnore(repoDir) per recursion level, matching repo-relative
paths — structurally equivalent to the full-index path's ScopeIgnore (each
embedded repo judged by its own rules) with no extra git subprocess calls.
Deletions stay unfiltered: getChangedFiles acts on one only when the path is
already tracked in the DB, where removal is always correct, and that lets a
newly-excluded dir's stale rows clean themselves up.

Unblocks #699 (an .ignore overlay inherits this leak unless change detection
consults the same matcher as enumeration).

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Colby Mchenry пре 1 дан
родитељ
комит
2010c2d2b5
3 измењених фајлова са 150 додато и 3 уклоњено
  1. 1 0
      CHANGELOG.md
  2. 125 0
      __tests__/sync.test.ts
  3. 24 3
      src/extraction/index.ts

+ 1 - 0
CHANGELOG.md

@@ -43,6 +43,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 - The MCP server now opens and auto-syncs a project that lives inside a folder an enclosing git repository ignores. Before, if the directory you indexed sat within a larger repo that gitignored it, the shared MCP daemon failed to open the project — its log repeated `Failed to open project … path should be a` `path.relative()` `d string, but got "./"` — so the file watcher never started and the index silently went stale until you ran `codegraph sync` by hand (setting `CODEGRAPH_NO_DAEMON=1` was the only workaround). The daemon now opens the project and starts watching as expected. Most visible with Codex on Windows, but the cause wasn't platform-specific. (#936)
 - A git worktree of a submodule is no longer indexed as a duplicate copy of that submodule's code. CodeGraph already skips ordinary worktrees (a second working view of a repo it indexes), but a worktree created *from a submodule* — common in monorepos that check submodules out into worktrees for parallel feature work — was mistaken for a genuine embedded repo and swept in, duplicating every symbol it shared with the real submodule checkout (one report had ~28% of its index as duplicates, inflating both query results and the database). These submodule worktrees are now recognized and skipped, while the submodule's own checkout stays indexed as distinct code. Thanks @charlesxu2026-ship-it. (#945)
 - A C++ class or struct annotated with an export/visibility macro — `class MYLIB_EXPORT Foo : public Bar { … }`, the common DLL-export / visibility pattern in headers — is no longer mis-indexed as a function spanning the whole class body. Not knowing the macro is a macro, the parser read it as a type and the whole declaration as a function, so the class surfaced as a phantom `function` that showed up as a false caller in `codegraph callers`, `codegraph impact`, and blast-radius analysis, and skewed symbol counts. CodeGraph now recognizes this misparse and drops the bogus node. Thanks @spwlyzx. (#946)
+- `codegraph status` no longer reports phantom pending changes for files CodeGraph deliberately keeps out of the index — a tracked file inside a committed dependency dir (a checked-in `vendor/` or `node_modules/`), or a tracked file under a `.gitignore`d directory. A full index correctly excludes these, and `codegraph sync` never indexes them, but the fast change-detector behind `codegraph status` flagged every edit to such a file as "modified" (and a new one as "added") — so the pending-changes count never cleared no matter how many times you synced. Change detection now applies the same ignore rules the full index does, so `status` agrees with `sync`, and tools built on CodeGraph's change-detection API get the same corrected list. (#766)
 
 
 ## [1.0.1] - 2026-06-13

+ 125 - 0
__tests__/sync.test.ts

@@ -304,6 +304,131 @@ describe('Sync Module', () => {
     });
   });
 
+  // Incremental sync's git fast path used to consume `git status` output without
+  // the ignore matcher the full index applies — so a committed dependency dir
+  // (built-in default exclude) or a tracked file under a .gitignored dir would
+  // leak into the index via `sync`, then vanish on the next `index --force`. The
+  // git fast path must exclude exactly what the full scan does. (#766)
+  describe('Incremental sync honors the ignore matcher (#766)', () => {
+    let testDir: string;
+    let cg: CodeGraph;
+
+    function git(...args: string[]) {
+      execFileSync('git', args, { cwd: testDir, stdio: 'pipe' });
+    }
+
+    beforeEach(async () => {
+      testDir = fs.mkdtempSync(path.join(os.tmpdir(), 'codegraph-766-'));
+
+      git('init');
+      git('config', 'user.email', 'test@test.com');
+      git('config', 'user.name', 'Test');
+
+      // Real project source — must keep flowing through sync untouched.
+      fs.mkdirSync(path.join(testDir, 'src'));
+      fs.writeFileSync(
+        path.join(testDir, 'src', 'index.ts'),
+        `export function hello() { return 'world'; }`
+      );
+
+      // A COMMITTED vendor/ dir: tracked in git, but a built-in default exclude
+      // git knows nothing about. git status happily reports edits to it.
+      fs.mkdirSync(path.join(testDir, 'vendor'));
+      fs.writeFileSync(
+        path.join(testDir, 'vendor', 'lib.ts'),
+        `export function vendoredHelper() { return 1; }`
+      );
+
+      // A tracked file inside a .gitignored dir: gitignore is a no-op for files
+      // already committed, so git status still reports modifications to it.
+      fs.writeFileSync(path.join(testDir, '.gitignore'), 'generated/\n');
+      fs.mkdirSync(path.join(testDir, 'generated'));
+      fs.writeFileSync(
+        path.join(testDir, 'generated', 'out.ts'),
+        `export function generatedThing() { return 2; }`
+      );
+
+      git('add', '-A'); // .gitignore + src/ + vendor/ (generated/ is now ignored)
+      git('add', '-f', 'generated/out.ts'); // force the ignored-but-tracked file in
+      git('commit', '-m', 'initial');
+
+      cg = CodeGraph.initSync(testDir, {
+        config: { include: ['**/*.ts'], exclude: [] },
+      });
+      await cg.indexAll();
+    });
+
+    afterEach(() => {
+      if (cg) cg.destroy();
+      if (fs.existsSync(testDir)) fs.rmSync(testDir, { recursive: true, force: true });
+    });
+
+    it('the full index excludes both (baseline the sync path must match)', () => {
+      expect(cg.searchNodes('hello').length).toBeGreaterThan(0);
+      expect(cg.searchNodes('vendoredHelper')).toHaveLength(0);
+      expect(cg.searchNodes('generatedThing')).toHaveLength(0);
+    });
+
+    it('does not re-index a modified tracked file in a built-in excluded dir (vendor/)', () => {
+      fs.writeFileSync(
+        path.join(testDir, 'vendor', 'lib.ts'),
+        `export function vendoredHelper() { return 999; }`
+      );
+      const changes = cg.getChangedFiles();
+      expect(changes.modified).not.toContain('vendor/lib.ts');
+      expect(changes.added).not.toContain('vendor/lib.ts');
+    });
+
+    it('does not re-index a modified tracked file in a .gitignored dir', () => {
+      fs.writeFileSync(
+        path.join(testDir, 'generated', 'out.ts'),
+        `export function generatedThing() { return 999; }`
+      );
+      const changes = cg.getChangedFiles();
+      expect(changes.modified).not.toContain('generated/out.ts');
+      expect(changes.added).not.toContain('generated/out.ts');
+    });
+
+    it('does not index a new untracked file in an excluded dir', () => {
+      // vendor/ isn't in .gitignore, so an untracked file there surfaces as `??`
+      // in git status — it must still be filtered to match the full index.
+      fs.writeFileSync(
+        path.join(testDir, 'vendor', 'extra.ts'),
+        `export function vendoredExtra() { return 3; }`
+      );
+      const changes = cg.getChangedFiles();
+      expect(changes.added).not.toContain('vendor/extra.ts');
+    });
+
+    it('status (getChangedFiles) agrees with sync — no phantom pending changes', async () => {
+      // The user-visible symptom today: `codegraph status` reads getChangedFiles
+      // and reports a vendor edit as a pending change that `sync` (a filesystem
+      // reconcile) then never indexes — so the count never clears. Both must now
+      // agree that nothing happened.
+      fs.writeFileSync(
+        path.join(testDir, 'vendor', 'lib.ts'),
+        `export function vendoredHelper() { return 999; }`
+      );
+      const changes = cg.getChangedFiles();
+      expect(changes.added).toHaveLength(0);
+      expect(changes.modified).toHaveLength(0);
+
+      const result = await cg.sync();
+      expect(result.filesModified).toBe(0);
+      expect(result.changedFilePaths ?? []).not.toContain('vendor/lib.ts');
+      expect(cg.searchNodes('vendoredHelper')).toHaveLength(0);
+    });
+
+    it('still syncs a normal modified source file (no over-filtering)', () => {
+      fs.writeFileSync(
+        path.join(testDir, 'src', 'index.ts'),
+        `export function hello() { return 'changed'; }`
+      );
+      const changes = cg.getChangedFiles();
+      expect(changes.modified).toContain('src/index.ts');
+    });
+  });
+
   describe('Cross-file module-attribute caller edges survive callee re-index (#899)', () => {
     let testDir: string;
     let cg: CodeGraph;

+ 24 - 3
src/extraction/index.ts

@@ -639,6 +639,18 @@ function collectGitStatus(repoDir: string, prefix: string, out: GitChanges): voi
     { cwd: repoDir, encoding: 'utf-8', timeout: 10000, maxBuffer: 50 * 1024 * 1024, stdio: ['pipe', 'pipe', 'pipe'], windowsHide: true }
   );
 
+  // This repo's own ignore rules — built-in defaults (#407) plus its .gitignore.
+  // Change detection must exclude the SAME files the full index does, but git
+  // status hides neither: it ignores nothing for *tracked* paths, and the
+  // built-in defaults aren't gitignore at all. Without this filter a committed
+  // vendor/ dir, or a tracked file under a .gitignored dir, surfaces here as a
+  // change — so `codegraph status` (which reads getChangedFiles) reports a
+  // pending edit the full index never tracks and `sync` never clears. Matching
+  // repo-relative `rel` at each recursion level mirrors getGitVisibleFiles'
+  // ScopeIgnore: every embedded repo is judged by ITS OWN rules, never the
+  // parent's. (#766)
+  const ig = buildDefaultIgnore(repoDir);
+
   const untrackedDirs: string[] = [];
   for (const line of output.split('\n')) {
     if (line.length < 4) continue; // Minimum: "XY file"
@@ -654,13 +666,22 @@ function collectGitStatus(repoDir: string, prefix: string, out: GitChanges): voi
     }
 
     const filePath = normalizePath(prefix + rel);
-    // Skip non-source files (git status already omits .gitignored paths).
     if (!isSourceFile(filePath)) continue;
 
+    if (statusCode.includes('D')) {
+      // Deletions stay unfiltered: getChangedFiles acts on one only when the
+      // path is already tracked in the DB, where removal is always correct — and
+      // that lets a newly-excluded dir's stale rows clean themselves up. (#766)
+      out.deleted.push(filePath);
+      continue;
+    }
+
+    // Added (`??`) / modified files inside an excluded dir must not enter the
+    // index — match against the repo-relative path, same as the full scan. (#766)
+    if (ig.ignores(rel)) continue;
+
     if (statusCode === '??') {
       out.added.push(filePath);
-    } else if (statusCode.includes('D')) {
-      out.deleted.push(filePath);
     } else {
       // M, MM, AM, A (staged), etc. — treat as modified
       out.modified.push(filePath);