Просмотр исходного кода

fix(mcp): treat a stdin 'error' as shutdown so the server can't orphan/spin (#799) (#805)

A stdio MCP server's lifeline is stdin: when the host/client goes away,
stdin should end and the server should exit. The server paths listened
for stdin 'end'/'close' but NOT 'error'.

That gap bites with a socket-backed stdin — the shape VS Code / Claude
Code use (a socketpair, not a pipe). On client death the socket can
surface as an 'error' (ECONNRESET/hangup) instead of a clean 'close'.
Unhandled, it escalated to the process-wide uncaughtException handler,
which logs and keeps running — so the server orphaned instead of
exiting. On Linux a POLLHUP socket fd left registered in epoll then
wakes the event loop continuously, pinning a core at 100% CPU; once the
main thread spins, the setInterval PPID watchdog can't even fire, so the
orphan runs forever (the report's 28+ minutes).

Add treatStdinFailureAsShutdown(): listen for 'error' as well as
'end'/'close', and DESTROY the stdin stream on any terminal event so the
fd leaves epoll and can't churn, then run the path's shutdown. Wired into
the live paths — startDirect, the local-handshake proxy, and
StdioTransport — plus the legacy pipe proxy. Fires once (re-entry guard).

Note: this is hardening for a class of failure that matches every piece
of the report's evidence (socket stdin, userspace main-thread spin, high
involuntary context switches, watchdog never firing), but the exact 100%
CPU spin could not be reproduced in Docker (Linux) across /dev/null EOF,
socket peer-death (RST/FIN), the reporter's 0.9.7 bundle, and the npx
chain — all exited cleanly — so the trigger is environment-specific.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Colby Mchenry 1 неделя назад
Родитель
Сommit
0b1a2eed97
6 измененных файлов с 126 добавлено и 8 удалено
  1. 1 0
      CHANGELOG.md
  2. 46 0
      __tests__/stdin-teardown.test.ts
  3. 5 2
      src/mcp/index.ts
  4. 14 4
      src/mcp/proxy.ts
  5. 46 0
      src/mcp/stdin-teardown.ts
  6. 14 2
      src/mcp/transport.ts

+ 1 - 0
CHANGELOG.md

@@ -49,6 +49,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 - C++ method calls made through a singleton, factory, or chained getter now resolve to the correct class. A call like `Foo::instance().bar()`, `WidgetFactory::create().draw()`, `openSession()->run()`, or the same stored in an `auto` local first, used to lose the receiver's type — so when two classes had a same-named method the call silently attached to whichever was indexed first (or didn't resolve at all), corrupting callers, impact, and trace. CodeGraph now infers the receiver's type from what the inner call returns (capturing C++ return types for the first time) and creates the edge only when that class genuinely has the method, so a wrong guess produces no edge instead of a misleading one. Covers singletons and self-returning accessors, factories that return a different type, free-function factories, `make_unique` / `make_shared` / `new` / direct construction, and single-level member chains. Existing C/C++ indexes should be re-indexed (`codegraph index -f`) to benefit. Thanks @stabey. (#645) (C/C++)
 - The shared background server no longer logs a scary-looking `[error] … undefined` line on every session start. Attaching to the shared daemon is normal, healthy behavior, but the informational message was being surfaced by MCP hosts (Claude Code and others) as an error; it's now silent by default — set `CODEGRAPH_MCP_LOG_ATTACH=1` to surface it when debugging daemon attach. Thanks @mturac. (#618)
 - On Windows, CodeGraph's background processes no longer pile up without bound and saturate CPU over a long session. When the editor or agent that launched CodeGraph exited, its helper process couldn't tell its parent had gone — Windows reports process lineage differently than macOS and Linux — so the helper kept running, the shared background server never saw the client disconnect, and its idle timer never fired to shut it down. CodeGraph now detects parent-process exit directly on Windows, so helpers and the idle background server wind down promptly, the same as they already did on macOS and Linux. (#692, #576, #680)
+- The MCP server now shuts down cleanly when its editor/agent connection drops abruptly, instead of risking an orphaned process that pins a CPU core. Editors talk to a stdio MCP server over a socket; if that socket failed with an error rather than closing cleanly — which can happen when the editor window is reloaded or the launching process is killed — the server didn't treat it as a disconnect and could be left running. CodeGraph now treats any failure of its input stream as a shutdown signal and tears the stream down, so an orphaned server exits promptly. (#799)
 - The shared background server has two further safeguards against ever lingering: it now drops a client the moment it detects that client's process is gone (even if the disconnect arrived uncleanly — a force-quit or a dropped connection that never closed the socket), and it won't stay running indefinitely with clients attached but no activity. Together these guarantee it always winds down, on every platform. (#692)
 - A session no longer loses CodeGraph when the shared background server is restarted out from under it — for example when your MCP host (opencode and others) stops and restarts the server as you open another session. Previously the affected session's connection died silently and any request in flight at that moment hung; now CodeGraph keeps that session working by serving it locally, so the tools stay available without restarting the session. (#662)
 - React Native native→JS events now connect through the common `sendEvent(context, "X", body)` wrapper. Many libraries (react-native-device-info and others) wrap the event emitter behind a helper whose `.emit(eventName, …)` takes a *variable*, so the matcher — which looked for `.emit("literal", …)` — missed it; the literal event name actually lives in the wrapper call. Now a native method that fires `sendEvent(…, "batteryLevelChanged", …)` links to the JS `addListener('batteryLevelChanged', …)` handler, so editing the native emitter surfaces the JS subscriber. (React Native)

+ 46 - 0
__tests__/stdin-teardown.test.ts

@@ -0,0 +1,46 @@
+/**
+ * #799 — a socket-backed stdin that fails must shut the server down, not
+ * orphan/busy-spin. treatStdinFailureAsShutdown is the shared guard.
+ */
+import { describe, it, expect } from 'vitest';
+import { PassThrough } from 'stream';
+import { treatStdinFailureAsShutdown } from '../src/mcp/stdin-teardown';
+
+describe('treatStdinFailureAsShutdown (#799)', () => {
+  it("treats a stdin 'error' (ECONNRESET/hangup) as a shutdown signal", () => {
+    const s = new PassThrough();
+    let calls = 0;
+    treatStdinFailureAsShutdown(() => { calls++; }, s);
+
+    // No extra 'error' listener would throw here — the guard registers one.
+    s.emit('error', new Error('read ECONNRESET'));
+    expect(calls).toBe(1);
+  });
+
+  it("also fires on 'end' and on 'close'", () => {
+    for (const ev of ['end', 'close'] as const) {
+      const s = new PassThrough();
+      let calls = 0;
+      treatStdinFailureAsShutdown(() => { calls++; }, s);
+      s.emit(ev);
+      expect(calls, `event ${ev}`).toBe(1);
+    }
+  });
+
+  it('destroys the stream so a hung fd leaves epoll', () => {
+    const s = new PassThrough();
+    treatStdinFailureAsShutdown(() => { /* noop */ }, s);
+    s.emit('error', new Error('boom'));
+    expect(s.destroyed).toBe(true);
+  });
+
+  it('fires onTerminal at most once, even across error → close', () => {
+    const s = new PassThrough();
+    let calls = 0;
+    treatStdinFailureAsShutdown(() => { calls++; }, s);
+    s.emit('error', new Error('boom')); // fire() also destroys → emits 'close'
+    s.emit('close');                    // must not double-fire
+    s.emit('end');
+    expect(calls).toBe(1);
+  });
+});

+ 5 - 2
src/mcp/index.ts

@@ -50,6 +50,7 @@ import {
 import { connectWithHello, runLocalHandshakeProxy } from './proxy';
 import { getDaemonSocketPath } from './daemon-paths';
 import { supervisionLostReason } from './ppid-watchdog';
+import { treatStdinFailureAsShutdown } from './stdin-teardown';
 import { HOST_PPID_ENV } from '../extraction/wasm-runtime-flags';
 
 /**
@@ -330,8 +331,10 @@ export class MCPServer {
     // Detect parent-process death — same logic as pre-refactor. When stdin
     // closes we go through StdioTransport's `process.exit(0)` already, but
     // SIGKILL of the parent doesn't reliably close stdin on Linux (#277).
-    process.stdin.on('end', () => this.stop());
-    process.stdin.on('close', () => this.stop());
+    // Also treat a stdin `'error'` (a socket-backed stdin can fail with
+    // ECONNRESET/hangup instead of a clean close) as shutdown, and destroy the
+    // stream so a hung fd can't busy-spin the event loop (#799).
+    treatStdinFailureAsShutdown(() => this.stop());
 
     this.mode = 'direct';
     this.installSignalHandlers();

+ 14 - 4
src/mcp/proxy.ts

@@ -23,6 +23,7 @@ import * as net from 'net';
 import { HOST_PPID_ENV } from '../extraction/wasm-runtime-flags';
 import { DaemonClientHello, DaemonHello, MAX_HELLO_LINE_BYTES } from './daemon';
 import { supervisionLostReason } from './ppid-watchdog';
+import { treatStdinFailureAsShutdown } from './stdin-teardown';
 import { CodeGraphPackageVersion } from './version';
 import { SERVER_INFO, PROTOCOL_VERSION } from './session';
 import { SERVER_INSTRUCTIONS } from './server-instructions';
@@ -298,8 +299,11 @@ export async function runLocalHandshakeProxy(deps: LocalHandshakeDeps): Promise<
       }
     }
   });
-  process.stdin.on('end', shutdown);
-  process.stdin.on('close', shutdown);
+  // Shut down when stdin ends/closes — and also on a stdin `'error'`, which a
+  // socket-backed stdin (the VS Code stdio shape) can emit on client death
+  // instead of a clean close; destroying the stream stops a hung fd from
+  // busy-spinning the event loop (#799).
+  treatStdinFailureAsShutdown(shutdown);
   startPpidWatchdogNoSocket(shutdown);
 
   // ---- daemon connection (background) ----
@@ -459,10 +463,16 @@ function pipeUntilClose(socket: net.Socket): Promise<void> {
       try { socket.end(); } catch { /* ignore */ }
       done();
     });
-    process.stdin.on('close', () => {
+    // 'close' and 'error' both tear down: a socket-backed stdin can fail with
+    // an 'error' (ECONNRESET/hangup) rather than a clean close; destroying it
+    // stops a hung fd from busy-spinning the event loop (#799).
+    const teardown = () => {
+      try { process.stdin.destroy(); } catch { /* ignore */ }
       try { socket.destroy(); } catch { /* ignore */ }
       done();
-    });
+    };
+    process.stdin.on('close', teardown);
+    process.stdin.on('error', teardown);
 
     socket.on('data', (chunk) => {
       try { process.stdout.write(chunk); } catch { /* ignore */ }

+ 46 - 0
src/mcp/stdin-teardown.ts

@@ -0,0 +1,46 @@
+/**
+ * Treat a stdin failure as a shutdown signal — issue #799.
+ *
+ * An MCP stdio server's lifeline is its stdin: when the host/client goes away,
+ * stdin should end and the server should exit. The server paths listened for
+ * `'end'` and `'close'` — but NOT `'error'`.
+ *
+ * That gap bites with a socket-backed stdin, which is the shape VS Code /
+ * Claude Code use (a socketpair, not a pipe). When the client dies, the socket
+ * can surface as an `'error'` (ECONNRESET / hangup) rather than a clean
+ * `'close'`. With no `'error'` listener, Node escalates it to the process-wide
+ * `uncaughtException` handler, which logs and keeps running — so the server
+ * orphans instead of exiting. Worse, on Linux a `POLLHUP` socket fd left
+ * registered in epoll wakes the event loop continuously, pinning a core at
+ * 100% CPU (the spin reported in #799); once the main thread spins, the
+ * `setInterval` PPID watchdog can't even fire, so the orphan runs forever.
+ *
+ * Fix: listen for `'error'` as well, and DESTROY the stdin stream on any
+ * terminal event so the fd leaves epoll and can't keep churning, then run the
+ * caller's shutdown. Fires `onTerminal` at most once — callers' shutdowns are
+ * already re-entry-guarded, but the single-shot guard also keeps `destroy()`'s
+ * follow-on `'close'` from re-invoking it.
+ *
+ * `stream` is injectable for tests; it defaults to `process.stdin`.
+ */
+export function treatStdinFailureAsShutdown(
+  onTerminal: () => void,
+  stream: NodeJS.ReadableStream = process.stdin
+): void {
+  let fired = false;
+  const fire = (): void => {
+    if (fired) return;
+    fired = true;
+    // Drop the fd from epoll so a hung/half-closed socket can't keep waking
+    // the loop. Best-effort: the stream may already be torn down.
+    try {
+      (stream as Partial<{ destroy(): void }>).destroy?.();
+    } catch {
+      /* already gone */
+    }
+    onTerminal();
+  };
+  stream.on('end', fire);
+  stream.on('close', fire);
+  stream.on('error', fire);
+}

+ 14 - 2
src/mcp/transport.ts

@@ -286,12 +286,24 @@ export class StdioTransport extends LineBasedJsonRpcTransport {
       await this.handleLine(line);
     });
 
-    this.rl.on('close', () => {
+    // readline 'close' fires on a clean stdin EOF. But a socket-backed stdin
+    // (the VS Code stdio shape) can fail with an 'error' (ECONNRESET/hangup)
+    // that readline doesn't surface as 'close' — unhandled, it escalated to
+    // the global uncaughtException handler (which keeps running), orphaning
+    // the server and, on Linux, busy-spinning a POLLHUP fd at 100% CPU. Treat
+    // 'error' as terminal too, and destroy stdin so the fd leaves epoll (#799).
+    let closed = false;
+    const onStreamEnd = (): void => {
+      if (closed) return;
+      closed = true;
+      try { process.stdin.destroy(); } catch { /* already gone */ }
       this.opts.onClose();
       if (this.opts.exitOnClose) {
         process.exit(0);
       }
-    });
+    };
+    this.rl.on('close', onStreamEnd);
+    process.stdin.on('error', onStreamEnd);
   }
 
   stop(): void {