v1.38.0.0 fix wave: Windows install hardening + Unicode sanitization at server egress (4 community PRs) (#1505)

* fix(browse): single-point Unicode sanitization at server egress

Add sanitizeLoneSurrogates (regex-based UTF-16 lone-half cleaner) and sanitizeReplacer (JSON.stringify replacer that runs the cleaner on every string field during encoding).

Split handleCommandInternal into handleCommandInternalImpl (raw) plus a thin sanitizing wrapper. The wrapper applies sanitizeLoneSurrogates to cr.result so both single-command (handleCommand line 1034) and batch-loop (line 1966) egress paths inherit it. Inline INVARIANT comment near the wrapper documents the architectural constraint.

Both SSE producers (activity feed at /activity/stream and inspector stream) stringify with sanitizeReplacer. Post-stringify regex is ineffective on those paths because JSON.stringify has already converted the lone surrogate into the escape sequence "\\ud800" before any regex could match it; the replacer runs during stringify on the raw string value, so the substitution lands.

Originated from @realcarsonterry PR #1463 (handleCommand-only wrap). Architectural lift to handleCommandInternal + SSE coverage authored on this branch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(setup): _link_or_copy helper for Windows file-copy fallback

On Windows without Developer Mode (MSYS2/Git Bash), plain ln -snf silently creates a frozen file copy that doesn't refresh on git pull. Skill files become stale after every upgrade.

Add a _link_or_copy SRC DST helper near IS_WINDOWS detection (line ~33). It auto-dispatches: on Unix it preserves ln -snf semantics, on Windows it copies (cp -R for directories, cp -f for files). When the source is a Unix-style name-only alias that doesn't resolve on disk (the connect-chrome → gstack/open-gstack-browser pattern), the helper returns 0 silently on Windows rather than aborting setup under set -e.

Rewrite all 42 prior ln -snf call sites to route through the helper: link_claude_skill_dirs (line 437), team-claude install paths (lines 556, 581, 592), Codex host adapter block (lines 618-640), Factory host adapter block (lines 658-678), OpenCode host adapter block (lines 696-731), Kiro host adapter block (lines 939-953), plus migration and alias sites.

Add _print_windows_copy_note_once helper and call it from link_claude_skill_dirs after any linking work completes so Windows users see one user-visible note explaining they must re-run ./setup after every git pull.

Extend cleanup_old_claude_symlinks and cleanup_prefixed_claude_symlinks with a Windows branch: when the target is a real directory containing a real-file SKILL.md (no symlink to readlink), and IS_WINDOWS=1, treat the name-matched directory as gstack-managed and remove it. This makes --prefix / --no-prefix flips work on Windows instead of leaving stale copies behind.

Originated from @realcarsonterry PR #1462 (1 of 42 sites). Helper extraction, 42-site rewrite, alias-resolution edge case, and Windows cleanup compat authored on this branch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(docs): rename stale gbrain_sync_mode to artifacts_sync_mode + register /document-generate

Five stale gstack-config references in docs/ pointed to the deprecated gbrain_sync_mode key (renamed to artifacts_sync_mode in v1.27.0.0):
- docs/gbrain-sync.md: lines 62, 110, 111, 173
- docs/gbrain-sync-errors.md: lines 26, 203

Users following the docs would set a key that gstack-brain-sync no longer reads, silently breaking artifacts sync.

Originated from @realcarsonterry PR #1461 (verbatim).

Also register /document-generate in AGENTS.md (Operational + memory table) and docs/skills.md (skill index). The skill shipped in v1.35.0.0 but the doc-inventory cross-check in test/skill-validation.test.ts was failing because neither file mentioned it.

Allowlist the new test/docs-config-keys.test.ts file in test/no-stale-gstack-brain-refs.test.ts: it intentionally lists the deprecated keys in its DEPRECATED_KEYS denylist (defending the rename).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* ci(windows): migrate windows-free-tests to paid faster runner + register wave tests

Move the Windows free-test job from GitHub-hosted windows-latest to Blacksmith's paid Windows runner (blacksmith-2vcpu-windows-2022). Spin-up drops from ~60s to ~10s and Bun installs land 3-4x faster. The label can swap to namespace-profile-windows or ubicloud-windows-* if this repo's Blacksmith installation isn't configured.

Register the four new wave tests in the workflow's curated test list:
- browse/test/server-sanitize-surrogates.test.ts
- test/setup-windows-fallback.test.ts
- test/build-script-shell-compat.test.ts
- test/docs-config-keys.test.ts

These tests cover the Windows-hardening surface that this wave ships (sanitizer wiring, _link_or_copy helper, build-script subshells, doc-config drift), so they need to run on Windows where the bug shapes actually manifest.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test: wave coverage for sanitizer, link_or_copy, build script, doc drift

Four new test files (29 cases total):

browse/test/server-sanitize-surrogates.test.ts:
- 11 unit cases for sanitizeLoneSurrogates (passthrough, valid pair, lone high/low mid-string, trailing/leading lone, adjacent doubles, pair-then-lone, lone-then-pair, empty)
- 2 bug-repro tests pinning the regression intent (UTF-8 round-trip, JSON.parse round-trip with codepoint assertion)
- 4 wiring invariants asserting the architectural choke points stay intact (handleCommandInternalImpl rename, central sanitization line, sanitizeReplacer function exists, SSE producers stringify with replacer)

Function extracted from server.ts via regex + eval'd in test scope so no production-code export is needed.

test/setup-windows-fallback.test.ts:
- Static invariant (D7): zero raw `ln` calls outside the _link_or_copy helper body and comments
- Helper-existence assertions
- 4-cell behavior matrix (file/dir × Windows/Unix) via awk-style helper extraction + bash -c sourcing
- Windows-note printer registration check

Mirrors test/setup-conductor-worktree.test.ts patterns.

test/build-script-shell-compat.test.ts:
- Regex assertion that package.json scripts.* contain no bash brace groups (Bun-Windows-hostile)
- Subshell-precedence check for `.version` redirects

Strips single-quoted strings before regexing so embedded JS code inside echo '...' doesn't false-positive.

test/docs-config-keys.test.ts:
- DEPRECATED_KEYS denylist scanned across docs/**/*.md
- Round-trip test for `gstack-config get artifacts_sync_mode`

Defends the v1.27.0.0 rename from doc drift.

Updates to two existing tests:
- test/setup-conductor-worktree.test.ts: expect `_link_or_copy` instead of `ln -snf` at the Conductor-worktree guard call site
- test/gen-skill-docs.test.ts: same swap at three assertion sites (Codex section, Claude link_claude_skill_dirs body, Codex link_codex_skill_dirs body)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: bump v1.38.0.0 + build-script subshells + CHANGELOG

VERSION 1.35.0.0 → 1.38.0.0 (MINOR). PR #1500 (lyon-v2) claimed v1.37.0.0 ahead of this branch; v1.38.0.0 is the next free MINOR slot per bin/gstack-next-version queue check. Workspace-aware ship rule applies: queue-advancing past a claimed version within the same bump level is explicitly permitted.

package.json build script: three `{ git rev-parse HEAD ...; }` brace groups → `( git rev-parse HEAD ... )` subshells. Bun's Windows shell parser doesn't grok bash brace groups; subshells are POSIX-universal. Originated from @realcarsonterry PR #1460.

CHANGELOG entry covers the full wave:
- Windows install hardening (42-site _link_or_copy + cleanup compat)
- Unicode sanitization architecture (handleCommandInternal + SSE replacer)
- Build script POSIX-shell compat (subshells)
- Doc rename (gbrain_sync_mode → artifacts_sync_mode)
- Windows CI on paid faster runner
- 4 new wave tests (29 cases)

Frames each item as a current system property, not a fix narrative. Credits @realcarsonterry for PRs #1460, #1461, #1462, #1463 (the seed of the wave). Scope expansion to all 42 setup sites, every server egress path, Windows CI migration, and codex-flagged P0/P1 fixes (connect-chrome alias on Windows, SSE replacer, prefix-cleanup Windows compat) authored on this branch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: post-ship sync for v1.38.0.0

Document the two architectural invariants that landed in v1.38.0.0 in their persistent homes (not just CHANGELOG):
- README Windows section: add the `./setup` re-run-after-git-pull requirement that `_print_windows_copy_note_once` shows at runtime.
- CONTRIBUTING "Things to know": add the no-raw-`ln` invariant for contributors editing `setup`, with the test that enforces it.
- ARCHITECTURE: new "Unicode sanitization at server egress" section between Shell injection prevention and Prompt injection defense, with egress table (HTTP/batch/SSE) and the post-stringify-regex rationale.
- CLAUDE.md: cross-references for both invariants, matching the v1.6.0.0 dual-listener pattern (each constraint says which files to read before editing and which test pins it).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* ci(windows): use windows-latest-8-cores instead of unregistered Blacksmith label

actionlint failed PR #1505 because `blacksmith-2vcpu-windows-2022` isn't in the repo's approved runner-label list (actionlint.yaml only registers `ubicloud-standard-2`, and Ubicloud doesn't ship a Windows pool).

Switch to GitHub's paid larger Windows runner `windows-latest-8-cores`: 2x the cores of the free `windows-latest` at the larger-runner billing rate, no new third-party CI provider, no actionlint config changes.

CHANGELOG: replace "Blacksmith" / "blacksmith-2vcpu-windows-2022" / "~6x faster spin-up" claims with the actual choice (8 cores vs 4, paid larger runner).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* ci(windows): switch from windows-latest-8-cores to ubicloud-standard-2-windows

`windows-latest-8-cores` sat queued indefinitely because the GitHub larger-runner billing isn't enabled at the org level; the "Queued — Waiting to run this check" status surfaced on PR #1505 with no progress for the whole CI run.

Switch to Ubicloud Windows runners (`ubicloud-standard-2-windows`) so Windows CI uses the same provider as the existing Linux evals (`ubicloud-standard-2`). Billing stays under one account instead of two.

Register the new label in actionlint.yaml alongside the existing ubicloud-standard-2 entry so actionlint doesn't reject it as unknown.

CHANGELOG entry updated: runner row reflects the actual provider chosen, "Itemized changes" mentions the actionlint.yaml registration, and the narrative paragraph documents why `windows-latest-8-cores` failed first.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* ci: migrate all workflows to Ubicloud (Linux + Windows, 8-core)

Switch every `runs-on` in this repo to Ubicloud so CI has a single billing surface, consistent capacity, and 4x more cores on the workloads that were previously stuck on free `ubuntu-latest` (2 cores). Windows uses Ubicloud's Windows pool too (`ubicloud-standard-8-windows`), so the queued-forever problem with GitHub's `windows-latest-8-cores` paid larger runner (org-level larger-runner billing not enabled) goes away.

Workflows touched (9):
- evals.yml, evals-periodic.yml, ci-image.yml: bump default + matrix from `ubicloud-standard-2` to `ubicloud-standard-8`.
  The one matrix entry that was already on -8 stays.
- windows-free-tests.yml: `ubicloud-standard-2-windows` → `ubicloud-standard-8-windows`.
- make-pdf-gate.yml: matrix `ubuntu-latest` → `ubicloud-standard-8`. macOS entry preserved; the poppler-install `if: matrix.os` conditional swaps to match the new label.
- actionlint.yml, pr-title-sync.yml, skill-docs.yml, version-gate.yml: `ubuntu-latest` → `ubicloud-standard-8`.

.github/actionlint.yaml registers all four Ubicloud labels in one place:
- ubicloud-standard-2
- ubicloud-standard-8
- ubicloud-standard-2-windows (the v1.38.0.0 windows-free-tests target)
- ubicloud-standard-8-windows (this PR's windows-free-tests target)

Removed the duplicate `actionlint.yaml` at the repo root that I accidentally created in the prior commit. actionlint only reads `.github/actionlint.yaml`, so the root file was dead weight.

CHANGELOG entry updated: a single "all Ubicloud" sentence in the narrative plus a metrics-row covering the runner pool change, and the itemized line expanded to enumerate the 9 affected workflows. The previously-orphaned "Itemized changes" line about just `windows-free-tests.yml` is replaced.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* ci(windows): revert to free `windows-latest`

Ubicloud doesn't ship Windows runners (confirmed via their docs). The `ubicloud-standard-*-windows` labels I added do not exist and were causing `windows-free-tests` to sit "Queued — Waiting to run this check" forever (GitHub Actions can't tell a typoed label from a self-hosted runner that's about to register; it just waits).

Three prior Windows-runner attempts all failed for different reasons:
- `blacksmith-2vcpu-windows-2022`: Blacksmith app not installed on the org
- `windows-latest-8-cores`: GitHub paid larger-runner billing not enabled
- `ubicloud-standard-2/8-windows`: Ubicloud doesn't offer Windows at all

The free `windows-latest` runner (4 cores, ~60s spin-up, $0) is the one path that actually runs.
The wave-coverage Windows tests are <30s of real work; total job time stays under 2 minutes.

Cleaned up `.github/actionlint.yaml` to drop the bogus `ubicloud-standard-*-windows` entries; kept only the two real Linux labels.

CHANGELOG: split the runner-pool row into Linux (migrated to Ubicloud-8) vs Windows (stays on free windows-latest), with the why on each. Itemized line for windows-free-tests rewritten to reflect the actual outcome.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(windows): skip Unix-only cases on Windows runner

windows-free-tests on GitHub free windows-latest fails three cases that depend on Unix tooling the runner doesn't have:

1. `setup-windows-fallback.test.ts` behavior matrix: IS_WINDOWS=0 cells assert `ln -snf` produces a real symlink. On Windows-without-Developer-Mode (which the free `windows-latest` runner is), `ln -snf` silently creates a file copy. That's literally the bug `_link_or_copy` exists to work around, so the assertion can never pass there. Skip the whole describe block on win32. The static-invariant test (zero raw `ln` outside the helper body) above the matrix still runs and pins the shape the Windows install relies on.

2. `docs-config-keys.test.ts` round-trip: spawnSync(`bin/gstack-config`) on Windows doesn't read the bash shebang and fails to exec. Skip on win32; the deprecated-key denylist test in the same file still runs and is the actual invariant defending the v1.27.0.0 rename at the doc layer.

Use `describe.skipIf(process.platform === 'win32')(...)` and `test.skipIf(process.platform === 'win32')(...)`. Tests still run on macOS and Linux unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
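
The brace-group check that the commit message describes for test/build-script-shell-compat.test.ts is not part of this excerpt, so the following is only a rough sketch of how such a check might look. The helper names `stripSingleQuoted` and `hasBraceGroup`, the regex, and the sample script strings are all assumptions made for illustration, not the repository's code.

```typescript
// Hypothetical helpers; the real test file may be structured differently.

/** Blank out single-quoted shell strings so braces inside embedded JS are ignored. */
function stripSingleQuoted(script: string): string {
  return script.replace(/'[^']*'/g, "''");
}

/** True when a `{ ...; }` brace group opens at a command position. */
function hasBraceGroup(script: string): boolean {
  // A brace group opens at the start of the script or after ; & | (
  // and bash requires whitespace after the opening brace.
  return /(?:^|[;&|(])\s*\{\s/.test(stripSingleQuoted(script));
}

// Illustrative script strings, not copied from package.json.
const braceForm = "{ git rev-parse HEAD; } > .version";
const subshellForm = "( git rev-parse HEAD ) > .version";
const quotedBraces = "echo 'x; { y; }' > out.js";

console.log(hasBraceGroup(braceForm));    // true
console.log(hasBraceGroup(subshellForm)); // false
console.log(hasBraceGroup(quotedBraces)); // false
```

Without the quote-stripping step, the third sample would be flagged because `; { ` appears inside the quoted text, which is the false positive the commit message says the test avoids.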
2026-05-19 10:52:28 +08:00 · 2026-05-14 21:19:58 -07:00
parent e362b0ae2f
commit 3bf43766d5
28 changed files with 699 additions and 82 deletions
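
The claim that a post-stringify regex cannot see the surrogate, while a replacer can, is easy to check in isolation. The sketch below copies the sanitizer logic from the diff that follows and compares the two approaches on one payload. It assumes a runtime with well-formed JSON.stringify (ES2019, which recent Node and Bun versions implement); the `payload` object is made up for the demo.

```typescript
// Standalone copy of the sanitizer logic from browse/src/server.ts in this diff.
function sanitizeLoneSurrogates(str: string): string {
  return str.replace(/[\uD800-\uDFFF]/g, (match: string, offset: number) => {
    const code = match.charCodeAt(0);
    if (code <= 0xdbff) {
      // High surrogate: keep it only when a low surrogate follows.
      const next = str.charCodeAt(offset + 1);
      if (next >= 0xdc00 && next <= 0xdfff) return match;
    } else {
      // Low surrogate: keep it only when a high surrogate precedes.
      const prev = str.charCodeAt(offset - 1);
      if (prev >= 0xd800 && prev <= 0xdbff) return match;
    }
    return '\uFFFD';
  });
}

// Demo payload (an assumption): one string field holding a lone high surrogate.
const payload = { text: 'a\uD800b' };

// Approach 1: regex after stringify. The output already contains the six
// ASCII characters \ud800, so no surrogate code unit is left to match.
const postRegex = sanitizeLoneSurrogates(JSON.stringify(payload));

// Approach 2: the replacer sees the raw string value during stringify.
const viaReplacer = JSON.stringify(payload, (_key: string, value: unknown) =>
  typeof value === 'string' ? sanitizeLoneSurrogates(value) : value,
);

const parsedPost = JSON.parse(postRegex) as { text: string };
const parsedReplacer = JSON.parse(viaReplacer) as { text: string };

console.log(parsedPost.text.charCodeAt(1).toString(16));     // d800
console.log(parsedReplacer.text.charCodeAt(1).toString(16)); // fffd
```

The consumer that parses the first payload gets the lone surrogate back, which is why the SSE producers pass the replacer rather than cleaning the serialized string.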
--- a/browse/src/server.ts
+++ b/browse/src/server.ts
@@ -59,6 +59,43 @@ import * as net from 'net';
 import * as path from 'path';
 import * as crypto from 'crypto';

+// ─── Unicode Sanitization ───────────────────────────────────────
+// Remove unpaired UTF-16 surrogate halves (\uD800–\uDFFF). Page DOM text,
+// OCR output, and other CDP-sourced strings can contain lone surrogates;
+// JSON consumers downstream (Anthropic API in particular) reject them with
+// "no low surrogate in string". Valid surrogate pairs (e.g. emoji) survive
+// unchanged. Lone halves become U+FFFD (the replacement character).
+//
+// INVARIANT: every server egress path that ships page-content strings MUST
+// route through this sanitizer. handleCommandInternal wraps the final
+// cr.result string (text/plain bodies carry lone surrogates verbatim;
+// JSON.stringify already escapes them). The two SSE producers below
+// stringify with `sanitizeReplacer` so payload string fields get cleaned
+// BEFORE escaping. Plain post-stringify regex is a no-op there because
+// JSON.stringify converts \uD800 → "\\ud800" — the regex can't see the
+// surrogate after that point.
+function sanitizeLoneSurrogates(str: string): string {
+  return str.replace(/[\uD800-\uDFFF]/g, (match, offset) => {
+    const code = match.charCodeAt(0);
+    if (code >= 0xD800 && code <= 0xDBFF) {
+      const next = str.charCodeAt(offset + 1);
+      if (next >= 0xDC00 && next <= 0xDFFF) return match;
+    }
+    if (code >= 0xDC00 && code <= 0xDFFF) {
+      const prev = str.charCodeAt(offset - 1);
+      if (prev >= 0xD800 && prev <= 0xDBFF) return match;
+    }
+    return '\uFFFD';
+  });
+}
+
+// JSON.stringify replacer that sanitizes string values before they get
+// escape-encoded. Pair with stringify when the consumer will JSON.parse the
+// payload back into JS strings (SSE clients do this).
+function sanitizeReplacer(_key: string, value: unknown): unknown {
+  return typeof value === 'string' ? sanitizeLoneSurrogates(value) : value;
+}
+
 // ─── Config ─────────────────────────────────────────────────────
 const config = resolveConfig();
 ensureStateDir(config);
@@ -683,7 +720,7 @@ interface CommandResult {
 *   skipActivity: true when called from chain (chain emits 1 event for all subcommands)
 *   chainDepth: recursion guard — reject nested chains (depth > 0 means inside a chain)
 */
-async function handleCommandInternal(
+async function handleCommandInternalImpl(
  body: { command: string; args?: string[]; tabId?: number },
  tokenInfo?: TokenInfo | null,
  opts?: { skipRateCheck?: boolean; skipActivity?: boolean; chainDepth?: number },
@@ -1027,6 +1064,21 @@ async function handleCommandInternal(
  }
 }

+/**
+ * Sanitizing wrapper around handleCommandInternalImpl. ALL callers (single-command
+ * HTTP, batch loop, scoped-token dispatch) go through this so the lone-surrogate
+ * sanitization happens once at the architectural choke point, not per-leaf.
+ * Do not bypass this by calling handleCommandInternalImpl directly.
+ */
+async function handleCommandInternal(
+  body: { command: string; args?: string[]; tabId?: number },
+  tokenInfo?: TokenInfo | null,
+  opts?: { skipRateCheck?: boolean; skipActivity?: boolean; chainDepth?: number },
+): Promise<CommandResult> {
+  const cr = await handleCommandInternalImpl(body, tokenInfo, opts);
+  return { ...cr, result: sanitizeLoneSurrogates(cr.result) };
+}
+
 /** HTTP wrapper — converts CommandResult to Response */
 async function handleCommand(body: any, tokenInfo?: TokenInfo | null): Promise<Response> {
  const cr = await handleCommandInternal(body, tokenInfo);
@@ -1827,19 +1879,24 @@ export async function start() {

        const stream = new ReadableStream({
          start(controller) {
+            // SSE egress invariant: every JSON.stringify here ships page-content-derived
+            // fields (URLs, command args, errors) to the sidebar. Lone surrogates must
+            // be sanitized DURING stringify (via sanitizeReplacer) so they're cleaned
+            // before escape-encoding — post-stringify regex is ineffective because
+            // JSON.stringify has already converted \uD800 → "\\ud800".
            // 1. Gap detection + replay
            const { entries, gap, gapFrom, availableFrom } = getActivityAfter(afterId);
            if (gap) {
-              controller.enqueue(encoder.encode(`event: gap\ndata: ${JSON.stringify({ gapFrom, availableFrom })}\n\n`));
+              controller.enqueue(encoder.encode(`event: gap\ndata: ${JSON.stringify({ gapFrom, availableFrom }, sanitizeReplacer)}\n\n`));
            }
            for (const entry of entries) {
-              controller.enqueue(encoder.encode(`event: activity\ndata: ${JSON.stringify(entry)}\n\n`));
+              controller.enqueue(encoder.encode(`event: activity\ndata: ${JSON.stringify(entry, sanitizeReplacer)}\n\n`));
            }

            // 2. Subscribe for live events
            const unsubscribe = subscribe((entry) => {
              try {
-                controller.enqueue(encoder.encode(`event: activity\ndata: ${JSON.stringify(entry)}\n\n`));
+                controller.enqueue(encoder.encode(`event: activity\ndata: ${JSON.stringify(entry, sanitizeReplacer)}\n\n`));
              } catch (err: any) {
                console.debug('[browse] Activity SSE stream error, unsubscribing:', err.message);
                unsubscribe();
@@ -2188,10 +2245,15 @@ export async function start() {
        const encoder = new TextEncoder();
        const stream = new ReadableStream({
          start(controller) {
+            // SSE egress invariant: inspectorData and CDP event payloads carry
+            // page-DOM strings (selectors, attribute values, console messages).
+            // sanitizeReplacer cleans lone surrogates DURING JSON.stringify so
+            // they're neutralized before escape-encoding (post-stringify regex
+            // is a no-op once \uD800 has become "\\ud800").
            // Send current state immediately
            if (inspectorData) {
              controller.enqueue(encoder.encode(
-                `event: state\ndata: ${JSON.stringify({ data: inspectorData, timestamp: inspectorTimestamp })}\n\n`
+                `event: state\ndata: ${JSON.stringify({ data: inspectorData, timestamp: inspectorTimestamp }, sanitizeReplacer)}\n\n`
              ));
            }

@@ -2199,7 +2261,7 @@ export async function start() {
            const notify: InspectorSubscriber = (event) => {
              try {
                controller.enqueue(encoder.encode(
-                  `event: inspector\ndata: ${JSON.stringify(event)}\n\n`
+                  `event: inspector\ndata: ${JSON.stringify(event, sanitizeReplacer)}\n\n`
                ));
              } catch (err: any) {
                console.debug('[browse] Inspector SSE stream error:', err.message);
--- a/browse/test/server-sanitize-surrogates.test.ts
+++ b/browse/test/server-sanitize-surrogates.test.ts
@@ -0,0 +1,129 @@
+import { describe, test, expect } from 'bun:test';
+import * as fs from 'fs';
+import * as path from 'path';
+
+// The sanitizer is module-private in server.ts. Rather than refactor it to a
+// separate module just for testing, we extract its source via a regex slice and
+// eval it in a fresh function scope. Keeps the production layout untouched.
+const SERVER_PATH = path.resolve(import.meta.dir, '..', 'src', 'server.ts');
+const SERVER_SRC = fs.readFileSync(SERVER_PATH, 'utf-8');
+
+const fnMatch = SERVER_SRC.match(
+  /function sanitizeLoneSurrogates\(str: string\): string \{[\s\S]*?\n\}/
+);
+if (!fnMatch) throw new Error('Could not locate sanitizeLoneSurrogates in server.ts');
+
+// Strip TS annotations so eval works under plain JS.
+const jsSrc = fnMatch[0].replace('(str: string): string', '(str)');
+const sanitizeLoneSurrogates = new Function(`${jsSrc}\nreturn sanitizeLoneSurrogates;`)() as (
+  s: string,
+) => string;
+
+describe('sanitizeLoneSurrogates — unit cases', () => {
+  test('passthrough ASCII', () => {
+    expect(sanitizeLoneSurrogates('hello')).toBe('hello');
+  });
+
+  test('passthrough empty string', () => {
+    expect(sanitizeLoneSurrogates('')).toBe('');
+  });
+
+  test('preserves valid surrogate pair (U+1F389 🎉)', () => {
+    expect(sanitizeLoneSurrogates('hi 🎉')).toBe('hi 🎉');
+  });
+
+  test('replaces lone high surrogate mid-string', () => {
+    expect(sanitizeLoneSurrogates('a\uD800b')).toBe('a\uFFFDb');
+  });
+
+  test('replaces lone low surrogate mid-string', () => {
+    expect(sanitizeLoneSurrogates('a\uDC00b')).toBe('a\uFFFDb');
+  });
+
+  test('replaces trailing lone high at end of string', () => {
+    expect(sanitizeLoneSurrogates('a\uD800')).toBe('a\uFFFD');
+  });
+
+  test('replaces leading lone low at start of string', () => {
+    expect(sanitizeLoneSurrogates('\uDC00b')).toBe('\uFFFDb');
+  });
+
+  test('replaces two adjacent lone highs', () => {
+    expect(sanitizeLoneSurrogates('\uD800\uD800')).toBe('\uFFFD\uFFFD');
+  });
+
+  test('replaces two adjacent lone lows', () => {
+    expect(sanitizeLoneSurrogates('\uDC00\uDC00')).toBe('\uFFFD\uFFFD');
+  });
+
+  test('preserves valid pair followed by lone low', () => {
+    // \uD800\uDC00 = U+10000 (𐀀), then a separate lone low.
+    const input = '𐀀\uDC00';
+    const output = sanitizeLoneSurrogates(input);
+    // Valid pair intact, trailing lone low replaced.
+    expect(output).toBe('𐀀\uFFFD');
+  });
+
+  test('preserves valid pair preceded by lone low', () => {
+    const input = '\uDC00𐀀';
+    const output = sanitizeLoneSurrogates(input);
+    expect(output).toBe('\uFFFD𐀀');
+  });
+});
+
+describe('sanitizeLoneSurrogates — bug-repro (D5)', () => {
+  // Pin the regression intent: a future refactor that drops sanitization
+  // must fail this test even if happy-path tests still pass.
+  test('unsanitized lone surrogate causes UTF-8 encode to substitute, sanitized version is stable', () => {
+    const badPayload = 'page content\uD800more content';
+
+    // Buffer.from(str, 'utf-8') silently substitutes invalid sequences with
+    // EF BF BD (U+FFFD). Round-trip is therefore lossy for lone surrogates.
+    const roundTrippedRaw = Buffer.from(badPayload, 'utf-8').toString('utf-8');
+    expect(roundTrippedRaw).not.toBe(badPayload); // proves the bug exists pre-sanitize
+
+    // After sanitization the round-trip is stable.
+    const sanitized = sanitizeLoneSurrogates(badPayload);
+    const roundTrippedSanitized = Buffer.from(sanitized, 'utf-8').toString('utf-8');
+    expect(roundTrippedSanitized).toBe(sanitized);
+  });
+
+  test('JSON.parse(JSON.stringify(...)) round-trip is stable after sanitization', () => {
+    // Anthropic's API path wraps the response body in a tool_result JSON
+    // object. JSON.stringify CAN encode a lone surrogate (escapes it), but
+    // some downstream consumers reject the resulting body.
+    const badPayload = 'before\uD800after';
+    const sanitized = sanitizeLoneSurrogates(badPayload);
+    const wrapped = JSON.stringify({ content: sanitized });
+    const reparsed = JSON.parse(wrapped) as { content: string };
+    // .toBe(sanitized) proves the JSON round-trip is stable; the explicit
+    // check below pins the replaced code unit.
+    expect(reparsed.content).toBe(sanitized);
+    expect(reparsed.content.charCodeAt(6)).toBe(0xfffd); // U+FFFD, not \uD800
+  });
+});
+
+describe('sanitizeLoneSurrogates — wiring invariants', () => {
+  test('server.ts wraps every command result through handleCommandInternal', () => {
+    // The architectural choice is to wrap once at handleCommandInternal so
+    // both single-command HTTP and the batch loop inherit. If a future
+    // refactor moves sanitization back to handleCommand only, this test
+    // fails by detecting the missing wrapper.
+    expect(SERVER_SRC).toContain('async function handleCommandInternalImpl(');
+    expect(SERVER_SRC).toContain('result: sanitizeLoneSurrogates(cr.result)');
+  });
+
+  test('SSE activity feed sanitizes outbound frames via sanitizeReplacer', () => {
+    // Replacer must run DURING stringify; post-stringify regex is ineffective
+    // because JSON.stringify converts \uD800 → "\\ud800" before our regex sees it.
+    expect(SERVER_SRC).toContain('JSON.stringify(entry, sanitizeReplacer)');
+  });
+
+  test('SSE inspector stream sanitizes outbound frames via sanitizeReplacer', () => {
+    expect(SERVER_SRC).toContain('JSON.stringify(event, sanitizeReplacer)');
+  });
+
+  test('sanitizeReplacer is a function defined in server.ts', () => {
+    expect(SERVER_SRC).toContain('function sanitizeReplacer(');
+  });
+});
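
The extraction technique at the top of this test file (slice a module-private function out of the source text with a regex, strip its type annotations, and evaluate it with `new Function`) can be tried in isolation. This is a minimal sketch under simplifying assumptions: the source is an inline string rather than a file read from disk, and the `double` function is a made-up stand-in for `sanitizeLoneSurrogates`.

```typescript
// Stand-in for the contents of a TypeScript source file (assumption: inline
// text instead of fs.readFileSync, so the sketch has no file dependency).
const source = `
const unrelated = 1;
function double(n: number): number {
  return n * 2;
}
const alsoUnrelated = 2;
`;

// Lazy match from the signature to the first closing brace at column 0.
// This relies on the function body containing no earlier line that is just "}".
const match = source.match(/function double\(n: number\): number \{[\s\S]*?\n\}/);
if (!match) throw new Error('Could not locate double in source');

// Strip the TypeScript annotations so the text is plain JavaScript.
const jsSource = match[0].replace('(n: number): number', '(n)');

// Evaluate the declaration in a fresh function scope and hand back the function.
const double = new Function(`${jsSource}\nreturn double;`)() as (n: number) => number;

console.log(double(21)); // 42
```

The trade-off is that the test depends on the exact signature text, so a signature change in the source surfaces as the "could not locate" error rather than a silent pass.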