Files
gstack/extension/sidepanel-terminal.js
Garry Tan 7ca04d8ef0 v1.42.0.0 Daegu wave: 23 community-filed bugs + PTY classifier enforcement (24 bisect commits) (#1594)
* fix(gstack-paths): guard CLAUDE_PLUGIN_DATA against cross-plugin contamination (#1569)

gstack-paths previously trusted CLAUDE_PLUGIN_DATA as a fallback for
GSTACK_STATE_ROOT whenever GSTACK_HOME was unset. When another plugin
(e.g. Codex) persists its own CLAUDE_PLUGIN_DATA into the session env
via CLAUDE_ENV_FILE, gstack picked it up and wrote checkpoints,
analytics, and learnings into that plugin's directory. Anyone with the
Codex plugin installed alongside gstack hit this silently.

Fix: guard the CLAUDE_PLUGIN_DATA branch so it only fires when
CLAUDE_PLUGIN_ROOT confirms we're running as the gstack plugin (path
contains "gstack"). Skill installs fall through to \$HOME/.gstack.

Contributed by @ElliotDrel via #1570. Closes #1569.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(gbrain-sync): sourceLocalPath handles wrapped {sources:[...]} shape from gbrain v0.20+

gbrain v0.20+ changed `gbrain sources list --json` to return
{sources: [...]} instead of a flat array. sourceLocalPath crashed
upstream with `list.find is not a function` on every /sync-gbrain
invocation against modern gbrain. Accept both shapes for
forward/backward compat, matching probeSource/sourcePageCount in
lib/gbrain-sources.ts.

Contributed by @jakehann11 via #1571. Closes #1567. Supersedes #1564
(@tonyjzhou, same fix, different shape — credit retained).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(brain-context-load): probe gbrain via execFile, not shell builtin (#1559)

gbrainAvailable() used `execFileSync("command", ["-v", "gbrain"])`,
which fails in any environment where the `command` builtin isn't on
the spawned process's PATH (most non-interactive shells). The probe
then reported gbrain as missing even when it was installed, and
context-load silently skipped vector/list queries.

Fix: probe `gbrain --version` directly with a 500ms timeout (matching
the rest of the file's MCP_TIMEOUT_MS). Same semantics, works
everywhere execFile works.

Contributed by @jbetala7 via #1560. Closes #1559.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(gbrain-doctor): pin schema_version:2 doctor parse path (#1418)

Adds an exec-path regression test that runs a fake gbrain shim emitting
the v0.25+ doctor JSON shape (schema_version: 2, status: "warnings",
exit 1 for health_score < 100, no top-level `engine` field). Confirms
freshDetectEngineTier recovers stdout from the non-zero exit and falls
back to GBRAIN_HOME/config.json for the engine label.

The pre-existing test for #1415 only stripped gbrain from PATH; this
test exercises the actual doctor parse path, closing the gap that
codex's plan review flagged.

Also documents the schema_version separation in
lib/gbrain-local-status.ts: the local CacheEntry stays at version 1,
distinct from the doctor-output schema_version which we accept across
versions in gstack-memory-helpers.

Closes #1418 (credit @mvanhorn for surfacing the doctor + schema_v2
collapse). The fix landed pre-emptively in v1.29.x; this commit pins
it with a stronger test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(memory-ingest): pin put_page regression + scrub stale name from --help and comments (#1346)

#1346 reported that gstack-memory-ingest still called the renamed
gbrain put_page subcommand on gbrain v0.18+. The actual code migrated
to `gbrain put` and later to batch `gbrain import <dir>` before this
report landed — only documentation lag remained.

This commit:
- Updates the --help string ("Skip gbrain put calls (still updates
  state file)") so user-facing docs match the shipped subcommand
- Updates two inline comments that still referenced the old name
- Adds test/memory-ingest-no-put_page.test.ts: a regression pin that
  strips comments from bin/gstack-memory-ingest.ts and fails the build
  if "put_page" appears in any active code or string literal, plus a
  sanity check that the file still calls a supported gbrain page-write
  verb (put or import)

Closes #1346. Reporter @kylma-code surfaced the doc lag; the original
code migration credit is on the v1.27.x wave.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(resolvers): rewrite all gbrain put_page instructions to canonical put <slug>

scripts/resolvers/gbrain.ts emitted user-facing copy-paste instructions
using the renamed `gbrain put_page` subcommand across 10 skills
(office-hours, investigate, plan-ceo-review, retro, plan-eng-review,
ship, cso, design-consultation, fallback, entity-stub). Every gstack
user copying those snippets hit "unknown command: put_page" on gbrain
v0.18+.

This commit:
- Rewrites all 10 instruction templates to use `gbrain put <slug>
  --content "$(cat <<EOF...EOF)"` with title/tags moved into YAML
  frontmatter inside --content, matching the v0.18+ subcommand shape
- Updates README.md and USING_GBRAIN_WITH_GSTACK.md "common commands"
  table to reference `gbrain put` and `gbrain get`
- Adds test/resolvers-gbrain-put-rewrite.test.ts pinning two
  invariants: (a) resolver source ships only canonical instructions,
  (b) every tracked SKILL.md file is free of `gbrain put_page`

CHANGELOG entries are deliberately left untouched (historical record).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(build): extract package.json build to scripts/build.sh for Windows Bun compat (#1538, #1537, #1530, #1457, #1561)

Bun's Windows shell parser rejects multiple constructs the inline
package.json build chain used: brace groups `{ cmd; }`, subshells with
redirection `( git ... ) > path/.version`, and (in Bun 1.3.x) subshells
near redirections in general. Every Windows install + every
auto-upgrade since v1.34.2.0 has failed on `bun run build`.

Extracts the build chain to scripts/build.sh and the .version writes to
scripts/write-version-files.sh. POSIX-portable, no Bun shell parsing
involved. Also adds Windows-specific bun.exe handling for non-ASCII
PATHs (a separate Windows footgun where Bun's --compile fails when the
binary lives under a path with non-ASCII chars).

Updates test/build-script-shell-compat.test.ts to assert the new shape:
no subshells with redirections anywhere in the build chain, and build
delegates to scripts/build.sh which delegates .version writes.

Contributed by @Charlie-El via #1544. Supersedes #1531 (@scarson, fixed
in build helper), #1480 (@mikepsinn, partial overlap), #1460
(@realcarsonterry, brace-group fix subsumed) — credit retained.
Closes #1538, #1537, #1530, #1457, #1561.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(windows): .exe glob in .gitignore + .exe extension resolution in find-browse (#1554)

bun build --compile on Windows appends .exe to the output filename,
producing browse.exe instead of browse. find-browse's existsSync probe
only checked the bare path and returned null on Windows even when the
binary was correctly built. .gitignore similarly only excluded the
bare bin/gstack-global-discover path, leaving the .exe variant
tracked.

This commit:
- .gitignore: changes `bin/gstack-global-discover` →
  `bin/gstack-global-discover*` so the Windows .exe variant is ignored
- browse/src/find-browse.ts: adds isExecutable + findExecutable helpers
  that fall back to .exe/.cmd/.bat probing on Windows, mirroring the
  same helper already in make-pdf/src/browseClient.ts and pdftotext.ts

Contributed by @Mike-E-Log via #1554.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci(windows): add fresh-install E2E gate that runs bun run build on windows-latest

Adds .github/workflows/windows-setup-e2e.yml as the gate that catches
Bun shell-parser regressions in the build chain before they reach
users. Triggers on PRs touching package.json, scripts/build.sh,
scripts/write-version-files.sh, setup, browse cli/find-browse, or
gstack-paths.

What it verifies:
1. bun run build completes on Windows (the previously-broken path that
   #1538/#1537/#1530/#1457/#1561 reported)
2. All compiled binaries land on disk (browse.exe, find-browse.exe,
   design.exe, gstack-global-discover.exe)
3. find-browse resolves to the .exe variant on Windows (regression
   gate for #1554)
4. gstack-paths returns non-empty GSTACK_STATE_ROOT/PLAN_ROOT/TMP_ROOT
   on Windows (regression gate for #1570)

Complements the existing windows-free-tests.yml (curated unit subset);
this new workflow exercises the install path itself.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(codex): move diff scope into prompt instead of --base (Codex CLI 0.130+ argv conflict) (#1209)

Codex CLI ≥ 0.130.0 rejects passing a custom prompt and --base together
(mutually exclusive at argv level). Every /codex review, /review, and
/ship structured Codex review call ended with an argv error before the
model ran.

Fix: scope the diff in prompt text using
"Run git diff origin/<base>...HEAD 2>/dev/null || git diff <base>...HEAD"
instead of `--base <base>`. Preserves the filesystem boundary
instruction across all invocations and keeps Codex's review prompt
tuning.

Touches:
- codex/SKILL.md.tmpl + regenerated codex/SKILL.md
- scripts/resolvers/review.ts + regenerated review/SKILL.md, ship/SKILL.md
- test/gen-skill-docs.test.ts: new regression that fails if any of the
  five known files still contain the prompt+--base shape
- test/skill-validation.test.ts: corresponding negative + positive pin
  on the rendered SKILL.md files

Contributed by @jbetala7 via #1209. Closes #1479. Supersedes #1527
(@mvanhorn — same intent, different patch shape, CONFLICTING) and
#1449 (@Gujiassh — broader refactor, CONFLICTING). Credit retained
in CHANGELOG.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review): diff from git merge-base, not git diff origin/<base> (#1492)

git diff origin/<base> shows everything since the common ancestor in
both directions — it includes commits that landed on origin/<base>
after this branch was created as deletions. That made /review and
/ship's pre-landing structured review report inflated diff totals and
flagged "removed" code that was actually still present in the working
tree.

Fix: compute DIFF_BASE via git merge-base origin/<base> HEAD and diff
the working tree against that point. Same coverage of uncommitted
edits, no phantom deletions from out-of-order base advancement.

Applies to /review's Step 1 (diff existence check), Step 3 (get the
diff), the build-on-intent scope-creep check, the structured review
DIFF_INS/DIFF_DEL stats, and the Claude adversarial subagent prompt.
Same change flows into ship/SKILL.md via the shared resolver.

Touches:
- review/SKILL.md.tmpl + regenerated review/SKILL.md, ship/SKILL.md
- scripts/resolvers/review.ts
- scripts/resolvers/review-army.ts

Contributed by @mvanhorn via #1492.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(codex): pin filesystem-boundary preservation across all codex review surfaces (#1503, #1522)

#1503 reported that the bare codex review --base path stripped the
filesystem boundary instruction, letting Codex spend tokens reading
.claude/skills/ and agents/. #1522 proposed adding a skill-path
detector that switched to the custom-instructions route when the diff
touched skill files.

After C10 (#1209) restructured codex review to always carry the
boundary in the prompt (the prompt+--base argv conflict forced the
restructure), the skill-path detector becomes redundant — every
default call already preserves the boundary.

This commit pins the post-#1209 invariant with a test that fails the
build if any future refactor strips the boundary from codex/SKILL.md,
review/SKILL.md, or ship/SKILL.md. Closes #1503 by regression test.

#1522 (@genisis0x) is superseded by #1209 (the prompt rewrite covers
its safety concern); credit retained in CHANGELOG.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(skills): use command -v instead of which for codex detection (#1197)

`which` is not on PATH in every shell — some Windows shells, BusyBox-
only containers, and minimal CI images all fail when skills probe
codex availability via `which codex`. `command -v` is a POSIX builtin
and always available where the skill is running.

Touched:
- codex/SKILL.md.tmpl: CODEX_BIN=$(command -v codex || echo "")
- scripts/resolvers/review.ts and scripts/resolvers/design.ts:
  3 + 3 sites each rewritten to `command -v codex >/dev/null 2>&1`
- Regenerated all 10 affected SKILL.md files (codex, review, ship,
  design-consultation, design-review, office-hours, plan-ceo-review,
  plan-design-review, plan-devex-review, plan-eng-review)
- test/skill-validation.test.ts: updated pin + defensive regression
  test that fails if `which codex` returns to codex/SKILL.md
- test/skill-e2e-plan.test.ts: updated summary regex

Contributed by @mvanhorn via #1197.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(codex): surface non-zero exits so wrappers stop reading as silent stalls (#1467, #1327)

When codex exits non-zero (parse errors, arg-shape breaks, model API
errors that propagate as non-zero status), the calling agent
previously saw an empty output and burned 30-60 minutes misdiagnosing
as a silent model/API stall. The hang-detection block only caught
exit 124 (the timeout-wrapper signal).

Adds elif blocks in all four codex invocation sites (Review default,
Challenge, Consult new-session, Consult resume) that:
- Echo "[codex exit N] <stderr first line>" to stdout
- Indent the first 20 stderr lines for inline context
- Log codex_nonzero_exit telemetry tagged with the call site

Contributed by @genisis0x via #1467. Closes #1327.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(design): disclose OpenAI key source + warn on cwd .env match (#1278, closes #1248)

The design binary previously called process.env.OPENAI_API_KEY without
checking where the key came from. If a user ran $D inside someone
else's project that had OPENAI_API_KEY in its .env, the resulting
generation billed that project's account. Silent and irreversible.

Fix: resolveApiKeyInfo() returns both the key and its source. When the
env-var path matches an OPENAI_API_KEY entry in the current
directory's .env, .env.<NODE_ENV>, or .env.local file, we set a
warning. requireApiKey() prints "Using OpenAI key from <source>" plus
the warning before the run — never the key itself.

Adds 6 unit tests covering: config-vs-env precedence, env-only (no
match), env+cwd .env match, quoted/exported values, value-mismatch
(no false positive), and the no-leak invariant for requireApiKey
stderr output.

Contributed by @jbetala7 via #1278. Closes #1248.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(browse): guard full-page screenshots against Anthropic vision API >2000px brick (#1214)

Full-page screenshots of tall pages routinely exceeded 2000px on the
longest dimension, silently bricking the agent's session: the
resulting base64 reached the Anthropic vision API which rejected the
oversized image, leaving the agent burning turns on a useless blob
with no stderr trace from the browse side.

Adds browse/src/screenshot-size-guard.ts as a shared helper:
- guardScreenshotBuffer(buf) → downscales in-memory if max(w,h) > 2000
- guardScreenshotPath(path) → file-mode variant that rewrites in place
- Aspect ratio preserved via sharp's resize fit:inside
- Stderr diagnostic on any downscale so callers can see when it fired
- Lazy sharp import so non-screenshot paths pay no startup cost

Wires the guard into all three full-page callsites codex review
flagged:
- browse/src/snapshot.ts: annotated + heatmap fullPage captures
- browse/src/meta-commands.ts: screenshot command (path + base64
  fullPage modes) plus the responsive 3-viewport sweep
- browse/src/write-commands.ts: prettyscreenshot fullPage path

Covers seven unit cases (pass-through, downscale, aspect ratio,
exactly-2000px edge, file-mode rewrite) plus a static invariant test
that fails the build if any of the three callsites stops importing the
guard.

Closes #1214.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(security): add Node sidecar entry for L4 prompt-injection classifier (#1370)

The L4 TestSavant classifier in browse/src/security-classifier.ts
can't be imported into the compiled browse server (onnxruntime-node
dlopen fails from Bun's compile extract dir per CLAUDE.md). The agent
that used to host it (sidebar-agent.ts) was removed when the PTY
proved out — leaving the classifier file shipped but with zero
callers. Exactly the gap codex flagged in #1370.

Adds browse/src/security-sidecar-entry.ts: a Node script that runs the
classifier as a subprocess of the browse server. It reads NDJSON
requests from stdin and writes id-correlated NDJSON responses to
stdout, supporting:
  - op: "scan-page-content" — full L4 classifier scan
  - op: "ping" — liveness probe for the client's health check
  - op: "status" — classifier readiness (used by /pty-inject-scan to
    surface l4 { available: bool } in its response)

Plus browse/src/find-security-sidecar.ts: a resolver that locates
node + the bundled JS entry (browse/dist/security-sidecar.js, built in
a follow-up package.json change) or falls back to the dev TS entry.
Returns null cleanly when node isn't on PATH so the calling endpoint
can degrade per D7 (extension WARN + user confirm).

C17 of the security-stack wave. C18 adds the IPC client + lifecycle
management; C19 wires the endpoint; C20 routes the extension through it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(security): sidecar IPC client with lifecycle + circuit breaker (#1370)

Adds browse/src/security-sidecar-client.ts to manage the Node L4
classifier subprocess from the compiled browse server:

- Lazy spawn on first scan; reuses the same process across requests
- Id-correlated request/response via NDJSON over stdio
- 5s default per-scan timeout; 64KB payload cap (short-circuits before
  spawn so oversized requests don't waste a process)
- 3-in-10-minutes respawn cap → trips circuit breaker; subsequent
  scans throw immediately so the /pty-inject-scan endpoint can surface
  l4 { available: false } to the extension and degrade to WARN+confirm
- process.on('exit') sends SIGTERM to the child for clean teardown
- isSidecarAvailable() lets the endpoint probe before scan calls so
  the response shape reflects degraded mode honestly

Unit tests cover the payload cap, the availability probe, and the
breaker-doesn't-crash invariant under repeated rejected calls.

C18 of the security-stack wave. C19 adds POST /pty-inject-scan; C20
routes the extension through it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(security): add POST /pty-inject-scan endpoint for pre-PTY-inject scans (#1370)

The sidebar's gstackInjectToTerminal callers (toolbar Cleanup,
Inspector "Send to Code") were piping page-derived text directly into
the live claude PTY with ZERO classifier processing — the gap codex
flagged in #1370. The documented sidebar security stack had a hole
the size of every Cleanup-button click.

Adds POST /pty-inject-scan to browse/src/server.ts:
- Local-only binding (NOT in TUNNEL_PATHS — tunnel attempts get the
  general 404 path; never reaches the scan logic)
- Root-token auth via existing validateAuth() — 401 on unauth
- 64KB request cap → 413 + payload-too-large body
- 5s scan timeout via sidecar client
- URL-blocklist forced to BLOCK in PTY context (page-derived REPL
  input is higher-risk than ordinary tool output)
- L4 ML classifier via the sidecar when available; degrades to WARN
  per D7 when sidecar is unavailable
- Response goes through JSON.stringify(..., sanitizeReplacer) per
  v1.38.0.0 Unicode-egress hardening
- Imports only from security-sidecar-client.ts, never directly from
  security-classifier.ts (which would brick the compiled Bun binary)

Seven static-invariant tests pin the POST verb, auth gate, 64KB cap,
tunnel-listener exclusion, sanitizeReplacer wrapping, l4 availability
shape, and the no-direct-classifier-import rule.

C19 of the security-stack wave. C20 routes the extension through it;
C21 adds the invariant AST check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(extension): route gstackInjectToTerminal through /pty-inject-scan (#1370)

Closes the documented-vs-shipped gap codex flagged in #1370. The
sidebar's two PTY-injection call sites (Inspector "Send to Code" and
toolbar Cleanup) now pre-scan via the new /pty-inject-scan endpoint
before writing to the live claude REPL.

Adds window.gstackScanForPTYInject(text, origin) to
extension/sidepanel-terminal.js:
- Async, returns { allow, verdict, reasons, l4 }
- POST to /pty-inject-scan with the existing root-token auth
- WARN+confirm on scan failure (network down, sidecar absent, etc.)
  rather than silent PASS — D7 honest-degradation

gstackInjectToTerminal stays synchronous, returns boolean. Per D6:
keeping the inject sync means existing `const ok = ...?.()` callers
don't break, and the invariant test in
test/extension-pty-inject-invariant.test.ts can statically pin that
every call goes through the scan first.

extension/sidepanel.js call sites updated:
- inspectorSendBtn click → await scan, BLOCK drops + WARN prompts via
  window.confirm, PASS injects silently
- runCleanup() → same flow. Static cleanup prompt always PASSes but
  still routes through scan to honor the invariant.

C20 of the security-stack wave. C21 adds the static invariant test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(security): invariant — extension PTY inject must be scan-gated (#1370)

Static-analysis invariant test that fails the build if any
extension/*.js path calls window.gstackInjectToTerminal without a
preceding window.gstackScanForPTYInject in the same enclosing
function. Closes the documented-vs-shipped gap codex demanded a
machine check on.

Rules:
- Rule 1: any file that calls inject must also reference scan
- Rule 2: in the enclosing function (function declaration, arrow,
  async (), event handler), a scan call must appear before the inject
  call by source position
- Exemption: sidepanel-terminal.js (the file that DEFINES the inject
  function) is exempt from Rule 2 since the definition is not a call

Plus two structural checks:
- sidepanel-terminal.js defines both the inject and scan functions
- inject stays SYNCHRONOUS (no `async` modifier) per D6 — async would
  silently break the `const ok = ...?.()` pattern at every caller

C21 of the security-stack wave. The sidecar architecture (#1370) is
complete: server-side L1-L3 + L4-via-sidecar (C17+C18+C19), extension
pre-scan wiring (C20), and now the regression gate (C21).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(browse): opt-in extended stealth mode with 6 detection-vector patches (#1112)

Rebases @garrytan's PR #1112 (Apr 2026, abandoned) onto the current
browse/src/stealth.ts contract. The existing minimal "codex narrowed"
stealth (webdriver-mask + AutomationControlled launch arg) stays the
default. PR #1112's six additional patches are added behind an opt-in
GSTACK_STEALTH=extended env flag.

Extended-mode patches (applied AFTER the default mask, in order):
  1. delete navigator.webdriver from prototype (not just the getter —
     detectors check `"webdriver" in navigator`)
  2. WebGL renderer spoof to Apple M1 Pro (SwiftShader was the #1
     software-GPU tell in containers)
  3. navigator.plugins returns a PluginArray-prototype-passing array
     with MimeType objects and namedItem()
  4. window.chrome populated with chrome.app, chrome.runtime,
     chrome.loadTimes(), chrome.csi() with realistic shapes
  5. navigator.mediaDevices backfilled when headless drops it
  6. CDP cdc_*-prefixed window globals cleared

Why opt-in: the default mode's contract is fingerprint CONSISTENCY,
which protects against detectors that flag spoofing mismatch. Extended
mode actively lies about the environment; sites that reflect on these
properties can break. Users who hit detection in default mode can flip
GSTACK_STEALTH=extended for SannySoft 100% pass-rate.

Twenty unit tests pin the env-flag semantics, all six patches' code
presence, and the applyStealth wiring order. Live SannySoft pass-rate
verification stays in the periodic-tier E2E suite.

Contributed by @garrytan via #1112 (rebased — original PR opened
before the codex-narrowed minimum landed; rebase preserves the
narrowed default while adding the SannySoft-passing path as opt-in).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(fixtures): regenerate ship-SKILL.md golden baselines after C10-C13 + C16 templates

Updates the three ship-SKILL.md golden baselines (claude, codex,
factory hosts) to match the new shape produced by:
- C10 #1209 codex argv (prompt + diff scope, no --base)
- C11 #1492 merge-base diff (DIFF_BASE= preamble)
- C13 #1197 command -v for codex detection
- C12 + boundary preservation per regen-enforcing test

Per CLAUDE.md SKILL.md workflow: edit the .tmpl, run gen:skill-docs,
commit the regenerated outputs together. Goldens are part of the
regen contract — without this commit, test/host-config.test.ts'
golden-baseline checks fail with the diff codex review surfaced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): v1.41.0.0 — Daegu wave (24 bisect commits, 14 user-facing fixes)

Bumps VERSION 1.40.0.0 → 1.41.0.0. CHANGELOG entry follows the
release-summary format in CLAUDE.md: two-line headline, lead
paragraph, "The numbers that matter" table, "What this means for
builders" closer, then itemized Added/Changed/Fixed/For contributors
with inline credit to every PR author and original issue reporter.

Scale-aware bump per CLAUDE.md: 24 commits, ~6000 LOC net,
substantial new capability across security (PTY sidecar wiring),
install (Windows build chain), compat (gbrain 0.18-0.35, Codex CLI
0.130+), and quality (screenshot guard, design key disclosure,
extended stealth opt-in). MINOR is the right call.

Closes for users: #1567, #1559, #1569, #1346, #1418, #1538, #1537,
#1530, #1457, #1561, #1554, #1479, #1503, #1248, #1214, #1370, #1327,
#1193 pattern, #1152 pattern. Credit retained inline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(find-browse): resolve source-checkout layout <git-root>/browse/dist/browse[.exe]

windows-setup-e2e.yml runs `bun browse/src/find-browse.ts` against a
freshly-built repo where binaries land at browse/dist/browse.exe (no
.claude/skills/gstack/ install layout). The previous markers chain
only matched .codex/.agents/.claude prefixed paths, so find-browse
exited "not found" even when the binary was present.

Adds a source-checkout fallback after the marker scan: if no
installed layout resolves but <git-root>/browse/dist/browse[.exe]
exists, return that. Three real callers hit this path:
- gstack repo dev workflow before `./setup` runs
- windows-setup-e2e.yml CI (the breakage that surfaced this)
- make-pdf consumers running from a sibling source checkout

Smoke-verified: a fresh git repo with browse/dist/browse on disk now
resolves through the source-checkout branch (was returning null
before this commit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): bump v1.41.0.0 → v1.42.0.0 to clear queue collision with #1574

The version-gate workflow flagged a collision: PR #1574
(garrytan/colombo-v3) already claims v1.41.0.0, and #1592
(fix/audit-critical-high-bugs) claims v1.41.1.0. Per CLAUDE.md's
workspace-aware ship rule, queue-advancing past a claimed version
within the same bump level is permitted — MINOR work landing on top
of a queued MINOR still reads as MINOR relative to main.

Util's suggested next slot is v1.42.0.0; taking it. CHANGELOG entry
header bumped + dated 2026-05-19; entry body unchanged (same wave
content, same credit list).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 07:35:01 -07:00

533 lines
19 KiB
JavaScript

/**
* Terminal sidebar tab — interactive Claude Code PTY in xterm.js.
*
* Lifecycle (per plan + codex review):
* 1. Sidebar opens. Terminal is the default-active tab.
* 2. Bootstrap card shows "Press any key to start Claude Code."
* 3. On first keystroke (lazy spawn — codex finding #8): the extension
* a) POSTs /pty-session on the browse server with the AUTH_TOKEN to
* mint a short-lived HttpOnly cookie scoped to the terminal-agent.
* b) Opens ws://127.0.0.1:<terminalPort>/ws — the cookie travels
* automatically. Terminal-agent validates the cookie + the
* chrome-extension:// Origin (codex finding #9), then spawns
* claude in a PTY.
* 4. Bytes pump both ways. Resize observer sends {type:"resize"} text
* frames; tab-switch hooks send {type:"tabSwitch"} frames.
* 5. PTY exits or WS closes -> we show "Session ended" with a restart
* button. We do NOT auto-reconnect (codex finding #8: auto-reconnect
* = burn fresh claude session every time).
*
* Keep this file dependency-free. xterm.js + xterm-addon-fit are loaded
* via <script src> tags in sidepanel.html (window.Terminal, window.FitAddon).
*/
(function () {
'use strict';
const Terminal = window.Terminal;
const FitAddonModule = window.FitAddon;
if (!Terminal) {
console.error('[gstack terminal] xterm not loaded');
return;
}
const els = {
bootstrap: document.getElementById('terminal-bootstrap'),
bootstrapStatus: document.getElementById('terminal-bootstrap-status'),
installCard: document.getElementById('terminal-install-card'),
installRetry: document.getElementById('terminal-install-retry'),
mount: document.getElementById('terminal-mount'),
ended: document.getElementById('terminal-ended'),
restart: document.getElementById('terminal-restart'),
restartNow: document.getElementById('terminal-restart-now'),
};
/** State machine. */
const STATE = { IDLE: 'idle', CONNECTING: 'connecting', LIVE: 'live', ENDED: 'ended', NO_CLAUDE: 'no-claude' };
let state = STATE.IDLE;
let term = null;
let fitAddon = null;
let ws = null;
function show(el) { el.style.display = ''; }
function hide(el) { el.style.display = 'none'; }
function setState(next, opts = {}) {
state = next;
switch (next) {
case STATE.IDLE:
show(els.bootstrap);
hide(els.installCard);
hide(els.mount);
hide(els.ended);
els.bootstrapStatus.textContent = opts.message || 'Press any key to start Claude Code.';
break;
case STATE.CONNECTING:
show(els.bootstrap);
hide(els.installCard);
hide(els.mount);
hide(els.ended);
els.bootstrapStatus.textContent = 'Connecting...';
break;
case STATE.LIVE:
hide(els.bootstrap);
hide(els.installCard);
show(els.mount);
hide(els.ended);
break;
case STATE.ENDED:
hide(els.bootstrap);
hide(els.installCard);
hide(els.mount);
show(els.ended);
break;
case STATE.NO_CLAUDE:
show(els.bootstrap);
show(els.installCard);
hide(els.mount);
hide(els.ended);
els.bootstrapStatus.textContent = '';
break;
}
}
/**
* Read auth + terminalPort from the server's /health. We don't fetch this
* here — sidepanel.js already polls /health for connection state and
* exposes the relevant fields on window.gstackHealth (set below in init()).
* If terminalPort is missing, the agent isn't ready yet.
*/
function getHealth() {
return window.gstackHealth || {};
}
function getServerPort() {
return window.gstackServerPort || null;
}
function getAuthToken() {
return window.gstackAuthToken || null;
}
/**
* POST /pty-session to mint a fresh terminal session. Returns
* { terminalPort, ptySessionToken, expiresAt } on success, or
* { error } on failure. The token rides on the WebSocket
* Sec-WebSocket-Protocol header, which is the only auth header
* the browser WebSocket API lets us set. The token is NOT persisted —
* each sidebar load mints a fresh one and discards it on close.
*/
async function mintSession() {
const serverPort = getServerPort();
const token = getAuthToken();
if (!serverPort || !token) {
return { error: 'browse server not ready' };
}
try {
const resp = await fetch(`http://127.0.0.1:${serverPort}/pty-session`, {
method: 'POST',
headers: { 'Authorization': `Bearer ${token}` },
credentials: 'include',
});
if (!resp.ok) {
const body = await resp.text().catch(() => '');
return { error: `${resp.status} ${body || resp.statusText}` };
}
return await resp.json();
} catch (err) {
return { error: err && err.message ? err.message : String(err) };
}
}
async function checkClaudeAvailable(terminalPort) {
try {
const resp = await fetch(`http://127.0.0.1:${terminalPort}/claude-available`, {
credentials: 'include',
});
if (!resp.ok) return { available: false };
return await resp.json();
} catch {
return { available: false };
}
}
function ensureXterm() {
if (term) return;
term = new Terminal({
fontFamily: '"JetBrains Mono", "SF Mono", Menlo, "Noto Sans Mono CJK KR", "Malgun Gothic", monospace',
fontSize: 13,
theme: { background: '#0a0a0a', foreground: '#e5e5e5' },
cursorBlink: true,
scrollback: 5000,
allowTransparency: false,
convertEol: false,
});
if (FitAddonModule && FitAddonModule.FitAddon) {
fitAddon = new FitAddonModule.FitAddon();
term.loadAddon(fitAddon);
}
// CRITICAL: caller must make els.mount visible BEFORE invoking
// ensureXterm. xterm.js measures the container synchronously inside
// term.open() — if the mount is display:none, xterm caches a 0-size
// viewport and never auto-grows even after the container goes
// visible. The visible-first pattern is enforced by connect()
// calling setState(STATE.LIVE) before us.
term.open(els.mount);
// First fit waits for the next paint frame so the browser has
// applied the .active class transition. Otherwise term.cols/rows
// can come back as the minimum (2x2) when the mount's clientHeight
// is still being computed.
requestAnimationFrame(() => {
try {
fitAddon && fitAddon.fit();
if (ws && ws.readyState === WebSocket.OPEN) {
ws.send(JSON.stringify({ type: 'resize', cols: term.cols, rows: term.rows }));
}
} catch {}
});
const ro = new ResizeObserver(() => {
try {
fitAddon && fitAddon.fit();
if (ws && ws.readyState === WebSocket.OPEN) {
ws.send(JSON.stringify({ type: 'resize', cols: term.cols, rows: term.rows }));
}
} catch {}
});
ro.observe(els.mount);
// IME composition handling for Korean/CJK input (issue #1272).
// Suppress partial jamo during composition; only send the final
// composed string on compositionend. Without this, Korean IME
// sends fragmented input or doubles characters.
let composing = false;
const ta = term.textarea;
if (ta) {
ta.addEventListener('compositionstart', () => { composing = true; });
ta.addEventListener('compositionend', (e) => {
composing = false;
if (e.data && ws && ws.readyState === WebSocket.OPEN) {
ws.send(new TextEncoder().encode(e.data));
}
});
}
term.onData((data) => {
if (composing) return; // suppress partial input events during IME composition
if (ws && ws.readyState === WebSocket.OPEN) {
ws.send(new TextEncoder().encode(data));
}
});
}
/**
* Inject a string into the live PTY (the same way a real keystroke would).
* Used by the toolbar's Cleanup button and the Inspector's "Send to Code"
* action so the user can drive claude from outside-the-keyboard surfaces.
* Returns true if the bytes went out, false if no live session.
*
* IMPORTANT (D6): this function stays SYNCHRONOUS and SCAN-FREE. Page-
* derived input MUST be pre-scanned via window.gstackScanForPTYInject()
* before calling this. The invariant test in
* test/extension-pty-inject-invariant.test.ts fails the build if any
* extension/*.js path calls this without the preceding scan.
*
* Why not move the scan inside this function: callers already use the
* sync `const ok = gstackInjectToTerminal?.(text)` pattern. Making the
* inject async would turn `ok` into a Promise and silently break every
* existing call site. Pre-scanning at the caller keeps the boundary
* clean and the invariant testable.
*/
window.gstackInjectToTerminal = function (text) {
if (!text || !ws || ws.readyState !== WebSocket.OPEN) return false;
try {
ws.send(new TextEncoder().encode(text));
return true;
} catch {
return false;
}
};
/**
* Scan page-derived text via the browse server's /pty-inject-scan
* endpoint before injecting it into the PTY. Returns:
* { allow: true, verdict: "PASS" } → safe to inject
* { allow: true, verdict: "WARN", reasons: [...] } → caller should
* prompt the user before injecting
* { allow: false, verdict: "BLOCK", reasons: [...]} → drop the text;
* caller should surface a banner to the user
*
* On any network / endpoint failure: returns
* { allow: true, verdict: "WARN", reasons: ["scan-unreachable"] }
* so the caller falls back to WARN+confirm rather than silent PASS.
*
* Closes #1370.
*/
window.gstackScanForPTYInject = async function (text, origin) {
if (!text) return { allow: false, verdict: 'BLOCK', reasons: ['empty-text'] };
try {
const resp = await fetch('http://127.0.0.1:34567/pty-inject-scan', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': `Bearer ${await getAuthTokenForScan()}`,
},
body: JSON.stringify({ text, origin: origin || 'extension' }),
});
if (!resp.ok) {
return { allow: true, verdict: 'WARN', reasons: [`scan-http-${resp.status}`] };
}
const body = await resp.json();
const verdict = body.verdict || 'WARN';
const allow = verdict !== 'BLOCK';
return { allow, verdict, reasons: body.reasons || [], l4: body.l4 };
} catch (err) {
return {
allow: true,
verdict: 'WARN',
reasons: ['scan-unreachable', err && err.message ? err.message : 'fetch-failed'],
};
}
};
// The auth token for /pty-inject-scan comes from the same source the
// sidepanel uses for /pty-session — a runtime fetch from /health (which
// already returns AUTH_TOKEN in headed mode per CLAUDE.md's v1.1 TODO).
// We don't echo the token here; this helper is a thin proxy around the
// existing pattern.
async function getAuthTokenForScan() {
if (window.__gstackPtyScanToken) return window.__gstackPtyScanToken;
try {
const resp = await fetch('http://127.0.0.1:34567/health');
const body = await resp.json();
const token = body.AUTH_TOKEN || body.authToken || '';
if (token) window.__gstackPtyScanToken = token;
return token;
} catch {
return '';
}
}
async function connect() {
if (state !== STATE.IDLE) return; // already connecting/live
setState(STATE.CONNECTING);
const minted = await mintSession();
if (minted.error) {
setState(STATE.IDLE, { message: `Cannot start: ${minted.error}` });
return;
}
const { terminalPort, ptySessionToken } = minted;
if (!ptySessionToken) {
setState(STATE.IDLE, { message: 'Cannot start: no session token returned' });
return;
}
// Pre-flight: does claude even exist on PATH?
const claudeStatus = await checkClaudeAvailable(terminalPort);
if (!claudeStatus.available) {
setState(STATE.NO_CLAUDE);
return;
}
// setState(LIVE) flips terminal-mount from display:none to display:flex.
// We MUST do that BEFORE ensureXterm() — xterm.js measures the container
// synchronously inside term.open() and a hidden container yields a 0x0
// terminal that never recovers. ensureXterm + the requestAnimationFrame
// fit() inside it run after the browser has applied the layout.
setState(STATE.LIVE);
ensureXterm();
// Token rides on Sec-WebSocket-Protocol — the only auth header the
// browser WebSocket API lets us set. Cross-port HttpOnly cookies with
// SameSite=Strict don't survive the jump from server.ts:34567 to the
// agent's random port from a chrome-extension origin, so cookies
// alone weren't reliable.
ws = new WebSocket(`ws://127.0.0.1:${terminalPort}/ws`, [`gstack-pty.${ptySessionToken}`]);
ws.binaryType = 'arraybuffer';
ws.addEventListener('open', () => {
try {
ws.send(JSON.stringify({ type: 'resize', cols: term.cols, rows: term.rows }));
} catch {}
// Push a fresh tab snapshot so claude's tabs.json is populated by
// the time the lazy spawn finishes booting. Background.js exposes
// the snapshot helper via chrome.runtime; we ask for it here and
// forward whatever comes back.
try {
chrome.runtime.sendMessage({ type: 'getTabState' }, (resp) => {
if (resp && ws && ws.readyState === WebSocket.OPEN) {
try {
ws.send(JSON.stringify({
type: 'tabState',
active: resp.active,
tabs: resp.tabs,
reason: 'initial',
}));
} catch {}
}
});
} catch {}
// Send a single byte to nudge the agent to spawn claude (lazy-spawn trigger).
try { ws.send(new TextEncoder().encode('\n')); } catch {}
});
ws.addEventListener('message', (ev) => {
if (typeof ev.data === 'string') {
// Agent control message (rare). Treat as JSON; error frames carry code.
try {
const msg = JSON.parse(ev.data);
if (msg.type === 'error' && msg.code === 'CLAUDE_NOT_FOUND') {
setState(STATE.NO_CLAUDE);
try { ws.close(); } catch {}
}
} catch {}
return;
}
// Binary: feed to xterm.
const buf = ev.data instanceof ArrayBuffer ? new Uint8Array(ev.data) : ev.data;
term.write(buf);
});
ws.addEventListener('close', () => {
ws = null;
if (state !== STATE.NO_CLAUDE) setState(STATE.ENDED);
});
ws.addEventListener('error', (err) => {
console.error('[gstack terminal] ws error', err);
});
}
function teardown() {
try { ws && ws.close(); } catch {}
ws = null;
if (term) {
try { term.dispose(); } catch {}
term = null;
fitAddon = null;
}
setState(STATE.IDLE);
}
// ─── Wiring ───────────────────────────────────────────────────
/**
* Force a fresh session: close any open WS, dispose xterm, return to
* IDLE, kick off auto-connect. Safe to call from any state.
*/
function forceRestart() {
try { ws && ws.close(); } catch {}
ws = null;
if (term) {
try { term.dispose(); } catch {}
term = null;
fitAddon = null;
}
setState(STATE.IDLE, { message: 'Starting Claude Code...' });
tryAutoConnect();
}
/**
* Repaint xterm when the Terminal pane becomes visible. xterm.js has a
* known issue where its renderer doesn't redraw after a display:none →
* display:flex flip — the canvas/DOM stays blank until something forces
* a layout pass. fit() recomputes dimensions, refresh() redraws.
*/
function repaintIfLive() {
if (state !== STATE.LIVE || !term) return;
try { fitAddon && fitAddon.fit(); } catch {}
try { term.refresh(0, term.rows - 1); } catch {}
try {
if (ws && ws.readyState === WebSocket.OPEN) {
ws.send(JSON.stringify({ type: 'resize', cols: term.cols, rows: term.rows }));
}
} catch {}
}
function init() {
setState(STATE.IDLE, { message: 'Starting Claude Code...' });
els.installRetry?.addEventListener('click', () => {
// Re-probe claude on PATH, then try a connect.
setState(STATE.IDLE, { message: 'Starting Claude Code...' });
tryAutoConnect();
});
// Two restart buttons:
// - els.restart lives inside the ENDED state card (visible only after
// a session has ended).
// - els.restartNow lives in the always-visible toolbar (lets the user
// force a fresh claude mid-session without waiting for it to exit).
els.restart?.addEventListener('click', forceRestart);
els.restartNow?.addEventListener('click', forceRestart);
// Live browser-tab state. background.js → sidepanel.js → us. We
// forward over the live PTY WebSocket; terminal-agent.ts writes
// <stateDir>/active-tab.json + <stateDir>/tabs.json so claude can
// always read the current tab landscape.
document.addEventListener('gstack:tab-state', (ev) => {
if (!ws || ws.readyState !== WebSocket.OPEN) return;
try {
ws.send(JSON.stringify({
type: 'tabState',
active: ev.detail?.active,
tabs: ev.detail?.tabs,
reason: ev.detail?.reason,
}));
} catch {}
});
// Repaint after a debug-tab → primary-pane transition. The debug
// tabs (Activity / Refs / Inspector) hide the Terminal pane via
// .tab-content { display: none }; xterm doesn't auto-redraw when its
// container flips back to visible, so we listen for the close-debug
// event and force a fit + refresh.
const observer = new MutationObserver(() => {
const term = document.getElementById('tab-terminal');
if (term?.classList.contains('active')) {
requestAnimationFrame(repaintIfLive);
}
});
const target = document.getElementById('tab-terminal');
if (target) observer.observe(target, { attributes: true, attributeFilter: ['class'] });
tryAutoConnect();
}
/**
* Eager-connect when the sidebar opens. Polls for sidepanel.js to populate
* window.gstackServerPort + window.gstackAuthToken (which it does as soon
* as /health succeeds), then fires connect() automatically. The user
* doesn't have to press a key — Terminal is the default tab and "tap to
* start" was a needless paper cut on every reload.
*/
function tryAutoConnect() {
if (state !== STATE.IDLE) return;
let waited = 0;
const tick = () => {
// If the user navigated away (Chat tab) or already connected, drop out.
if (state !== STATE.IDLE) return;
if (getServerPort() && getAuthToken()) {
connect();
return;
}
waited += 200;
if (waited > 15000) {
setState(STATE.IDLE, { message: 'Browse server not ready. Reload sidebar to retry.' });
return;
}
setTimeout(tick, 200);
};
tick();
}
if (document.readyState === 'loading') {
document.addEventListener('DOMContentLoaded', init);
} else {
init();
}
})();