Просмотр исходного кода

fix(watcher): retain pending files on zero-result sync (#450)

* fix(watcher): retain pending files on zero-result sync

* refactor(watcher): detect lock-unavailable at the wrapper

Replace the heuristic `(filesChanged === 0 && durationMs === 0)` check
inside `FileWatcher.flush()` with a typed `LockUnavailableError` thrown
by `CodeGraph.watch()`'s sync wrapper. The wrapper has access to the
full `SyncResult`, including `filesChecked` — which is **only** zero
when `sync()` failed to acquire the cross-process file lock (a real
empty sync always has `filesChecked > 0` because `scanDirectory` ran).
That eliminates the heuristic's edge case where a fast no-op sync
returns `durationMs === 0` by `Date.now()` rounding and gets mistaken
for a lock failure on tiny projects.

The watcher's `catch` block now distinguishes `LockUnavailableError`
from real errors: it logs at `logDebug` (not `logWarn`) and does NOT
call `onSyncError` — so a long-running external indexer holding the
lock doesn't spam stderr every debounce cycle via the MCP daemon's
`Auto-sync error` handler. The existing post-catch path already
preserves `pendingFiles` and reschedules, so no new control flow is
needed.

A/B validated end-to-end against the built dist on macOS with a
three-scenario repro (lock held, lock released mid-flight, real sync
error):

- main:           lock-held silently clears pendingFiles (BUG);
                  lock-released never recovers (no real sync runs).
- PR-as-is:       lock-held preserves pendingFiles; lock-released
                  drains. Same observable behavior as wrapper-level.
- wrapper-level:  same outcomes; lock-failure goes through the catch
                  path silently (logDebug only, no onSyncError noise);
                  real errors still surface via onSyncError.

Updates the regression test to throw `LockUnavailableError` (the real
contract surfaced to `FileWatcher` by `CodeGraph.watch()`), and
asserts `onSyncError` stays quiet during the lock-held cycle.

Closes #449.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Colby McHenry <me@colbymchenry.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
thismilktea 3 недель назад
Родитель
Сommit
72c08c2bef
5 измененных файлов с 80 добавлено и 12 удалено
  1. 1 0
      CHANGELOG.md
  2. 42 6
      __tests__/watcher.test.ts
  3. 10 2
      src/index.ts
  4. 1 1
      src/sync/index.ts
  5. 26 3
      src/sync/watcher.ts

+ 1 - 0
CHANGELOG.md

@@ -10,6 +10,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 ## [Unreleased]
 
 ### Fixed
+- **File watcher no longer marks edited files as fresh when another process holds the index lock.** When a second writer (concurrent `codegraph index`, a git hook, another MCP daemon) held `.codegraph/codegraph.lock`, `CodeGraph.sync()` returned a zero-shape no-op instead of throwing. The file watcher took that as a successful sync and cleared `pendingFiles` — so the per-file staleness signal MCP tools surface to agents (issue #403) dropped immediately, even though the edit was never indexed. `CodeGraph.watch()` now converts that no-op into a typed `LockUnavailableError` thrown into the watcher; the existing retry path preserves `pendingFiles` and reschedules until the lock becomes available. The error is logged at debug only (no `onSyncError` callback) so a long-running external indexer doesn't spam stderr every debounce cycle. Closes #449.
 - **Watch sync no longer aborts with `FOREIGN KEY constraint failed`.** PR #62 plugged this FK violation at the extraction layer (empty-named nodes whose containment edges had no target), but the same violation kept reappearing on v0.9.5 during the daemon's *watch sync* — not on initial index. Once an agent's daemon had been running long enough to accumulate edits, a resolver lookup that crossed a framework-specific cache could hand back a node whose row had been removed by a recent file rewrite, and the FK check then aborted the entire resolution batch, leaving the user's daemon log filling with `Watch sync failed { error: 'FOREIGN KEY constraint failed' }`. `QueryBuilder.insertEdges` now validates every batch's endpoints against the `nodes` table directly (one fresh `SELECT id IN (...)` per batch, no cache) and silently skips edges with missing source or target — so a stale lookup result drops one edge instead of aborting the whole sync. Surfaces as a fresh `codegraph init`/`index` cycle now surviving its first watch-sync cycle without the FK error, and the daemon recovering naturally instead of compounding into further failures. Closes #455.
 - **Hermes Agent: `codegraph install --target hermes` no longer corrupts `~/.hermes/config.yaml`.** Hermes serializes its config with PyYAML's default block style, which writes list items at the *same* indent as the parent mapping key (`cli:` and `- hermes-cli` both at column 2). The previous line-based YAML patcher mistook that first `  - hermes-cli` for the next sibling key, truncated the `cli:` block, and then spliced `- mcp-codegraph` at indent 4 *before* the existing items — leaving subsequent entries (`- browser`, `- clarify`, …) and even other platforms (`telegram:`, `discord:`) appearing at the `platform_toolsets:` level, which is no longer parseable YAML. The installer now recognizes the same-indent list style, finds the real end of the block at the next sibling key, and appends `- mcp-codegraph` at whatever indent the existing items already use. Re-installing on an already-corrupted file (or a 4-space-nested config that worked before) still produces a clean, parseable result. Closes #456.
 - **NestJS: `RouterModule.register([...])` route prefixes now propagate to controller routes.** Previously a controller declared inside a module wired through NestJS's `RouterModule` (a common pattern for modular apps with nested route prefixes) was indexed with its raw `@Controller(...) + @Get(...)` path — so `UsersController` under `RouterModule.register([{ path: 'admin', module: AdminModule, children: [{ path: 'users', module: UsersModule }] }])` showed up as `GET /` instead of `GET /admin/users`. The new cross-file pass walks every `*.module.{ts,js}` for `RouterModule.register/forRoot/forChild([...])` (recursive `children`) and `@Module({ controllers: [...] })`, then prepends the correct prefix to each affected route — including non-empty `@Controller` paths and method-level params (`/admin/users/:id`). The route node's `id` is preserved across the update so existing route→handler edges stay intact, and the pass is idempotent so incremental sync recovers when `app.module.ts` itself is edited. Closes #459.

+ 42 - 6
__tests__/watcher.test.ts

@@ -25,7 +25,7 @@ import { describe, it, expect, beforeEach, afterEach } from 'vitest';
 import * as fs from 'fs';
 import * as path from 'path';
 import * as os from 'os';
-import { FileWatcher } from '../src/sync/watcher';
+import { FileWatcher, LockUnavailableError } from '../src/sync/watcher';
 import CodeGraph from '../src/index';
 import { triggerFileEvent } from './__helpers__/chokidar-mock';
 
@@ -274,17 +274,53 @@ describe('FileWatcher', () => {
       const after = watcher.getPendingFiles();
       expect(after.some((p) => p.path === 'src/will-fail.ts')).toBe(true);
 
-      // Schedule a retry by emitting the event again (production would do
-      // this implicitly on the next file change; tests synthesize it).
-      triggerFileEvent(testDir, 'change', 'src/will-fail.ts');
-
-      // Retry resolves; entry clears.
+      // Retry resolves automatically; entry clears.
       await waitFor(
         () => !watcher.getPendingFiles().some((p) => p.path === 'src/will-fail.ts'),
       );
 
       watcher.stop();
     });
+
+    it('should retain pending files and retry when syncFn throws LockUnavailableError (#449)', async () => {
+      // CodeGraph.watch() converts the cross-process lock-failure no-op
+      // into LockUnavailableError so the watcher's retry path picks it up
+      // instead of falsely clearing pendingFiles. This test exercises the
+      // contract directly.
+      const syncFn = vi
+        .fn()
+        .mockRejectedValueOnce(new LockUnavailableError())
+        .mockResolvedValueOnce({ filesChanged: 1, durationMs: 10 });
+      const onSyncComplete = vi.fn();
+      const onSyncError = vi.fn();
+      const watcher = new FileWatcher(testDir, syncFn, {
+        debounceMs: 100,
+        onSyncComplete,
+        onSyncError,
+      });
+      watcher.start();
+      await watcher.waitUntilReady();
+
+      triggerFileEvent(testDir, 'add', 'src/locked.ts');
+
+      await waitFor(() => syncFn.mock.calls.length >= 1);
+      expect(watcher.getPendingFiles().some((p) => p.path === 'src/locked.ts')).toBe(true);
+      // A held-lock no-op is not a sync failure — onSyncError stays quiet
+      // so a long-running external indexer doesn't spam stderr every cycle.
+      expect(onSyncError).not.toHaveBeenCalled();
+      expect(onSyncComplete).not.toHaveBeenCalled();
+
+      await waitFor(() => syncFn.mock.calls.length >= 2);
+      await waitFor(
+        () => !watcher.getPendingFiles().some((p) => p.path === 'src/locked.ts'),
+      );
+
+      expect(onSyncComplete).toHaveBeenCalledTimes(1);
+      expect(onSyncComplete).toHaveBeenCalledWith({ filesChanged: 1, durationMs: 10 });
+      expect(onSyncError).not.toHaveBeenCalled();
+
+      watcher.stop();
+    });
   });
 
   describe('callbacks', () => {

+ 10 - 2
src/index.ts

@@ -46,7 +46,7 @@ import {
 import { GraphTraverser, GraphQueryManager } from './graph';
 import { ContextBuilder, createContextBuilder } from './context';
 import { Mutex, FileLock } from './utils';
-import { FileWatcher, WatchOptions, PendingFile } from './sync';
+import { FileWatcher, WatchOptions, PendingFile, LockUnavailableError } from './sync';
 
 // Re-export types for consumers
 export * from './types';
@@ -75,7 +75,7 @@ export {
   defaultLogger,
 } from './errors';
 export { Mutex, FileLock, processInBatches, debounce, throttle, MemoryMonitor } from './utils';
-export { FileWatcher, WatchOptions, PendingFile } from './sync';
+export { FileWatcher, WatchOptions, PendingFile, LockUnavailableError } from './sync';
 export { MCPServer } from './mcp';
 
 /**
@@ -495,6 +495,14 @@ export class CodeGraph {
       this.projectRoot,
       async () => {
         const result = await this.sync();
+        // sync() returns this exact zero-shape iff it failed to acquire the
+        // file lock (a real empty sync always has filesChecked > 0 because
+        // scanDirectory ran). Surface that to the watcher as a typed error
+        // so it keeps pendingFiles + reschedules instead of clearing them
+        // (#449).
+        if (result.filesChecked === 0 && result.durationMs === 0) {
+          throw new LockUnavailableError();
+        }
         const filesChanged = result.filesAdded + result.filesModified + result.filesRemoved;
         return { filesChanged, durationMs: result.durationMs };
       },

+ 1 - 1
src/sync/index.ts

@@ -13,7 +13,7 @@
  * - Incremental reindexing (in extraction module)
  */
 
-export { FileWatcher, WatchOptions, PendingFile } from './watcher';
+export { FileWatcher, WatchOptions, PendingFile, LockUnavailableError } from './watcher';
 export { watchDisabledReason, detectWsl } from './watch-policy';
 export {
   installGitSyncHook,

+ 26 - 3
src/sync/watcher.ts

@@ -45,6 +45,20 @@ export interface WatchOptions {
   onSyncError?: (error: Error) => void;
 }
 
+/**
+ * Thrown by a `syncFn` to signal that the underlying sync couldn't acquire
+ * the cross-process write lock (#449). The watcher treats this as "no
+ * progress" — preserves `pendingFiles`, skips `onSyncComplete`, and the
+ * `finally` block reschedules. Quiet (debug-only) because a long-running
+ * external indexer can hit this every debounce cycle.
+ */
+export class LockUnavailableError extends Error {
+  constructor(message = 'CodeGraph file lock unavailable; another process is writing') {
+    super(message);
+    this.name = 'LockUnavailableError';
+  }
+}
+
 /**
  * Per-file pending entry — tracks a source file the watcher saw an event for
  * but hasn't yet synced into the index. Exposed via {@link FileWatcher.getPendingFiles}
@@ -352,11 +366,20 @@ export class FileWatcher {
       }
       this.onSyncComplete?.(result);
     } catch (err) {
-      const error = err instanceof Error ? err : new Error(String(err));
-      logWarn('Watch sync failed', { error: error.message });
+      if (err instanceof LockUnavailableError) {
+        // Lock-failure no-op (another writer holds the lock). pendingFiles
+        // stays intact and the `finally` block reschedules. Debug-only —
+        // a long external index would otherwise spam stderr every cycle.
+        logDebug('Watch sync skipped: file lock unavailable', {
+          pendingFiles: this.pendingFiles.size,
+        });
+      } else {
+        const error = err instanceof Error ? err : new Error(String(err));
+        logWarn('Watch sync failed', { error: error.message });
+        this.onSyncError?.(error);
+      }
       // Failure: leave pendingFiles untouched. Every edit it tracks is
       // still unindexed; the rescheduled sync sees the same set.
-      this.onSyncError?.(error);
     } finally {
       this.syncing = false;