mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-22 04:38:24 +08:00
029356e1f0693f22cb1fa4524c9b0f28ceab5a1b
3 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
7ca04d8ef0 |
v1.42.0.0 Daegu wave: 23 community-filed bugs + PTY classifier enforcement (24 bisect commits) (#1594)
* fix(gstack-paths): guard CLAUDE_PLUGIN_DATA against cross-plugin contamination (#1569) gstack-paths previously trusted CLAUDE_PLUGIN_DATA as a fallback for GSTACK_STATE_ROOT whenever GSTACK_HOME was unset. When another plugin (e.g. Codex) persists its own CLAUDE_PLUGIN_DATA into the session env via CLAUDE_ENV_FILE, gstack picked it up and wrote checkpoints, analytics, and learnings into that plugin's directory. Anyone with the Codex plugin installed alongside gstack hit this silently. Fix: guard the CLAUDE_PLUGIN_DATA branch so it only fires when CLAUDE_PLUGIN_ROOT confirms we're running as the gstack plugin (path contains "gstack"). Skill installs fall through to \$HOME/.gstack. Contributed by @ElliotDrel via #1570. Closes #1569. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(gbrain-sync): sourceLocalPath handles wrapped {sources:[...]} shape from gbrain v0.20+ gbrain v0.20+ changed `gbrain sources list --json` to return {sources: [...]} instead of a flat array. sourceLocalPath crashed upstream with `list.find is not a function` on every /sync-gbrain invocation against modern gbrain. Accept both shapes for forward/backward compat, matching probeSource/sourcePageCount in lib/gbrain-sources.ts. Contributed by @jakehann11 via #1571. Closes #1567. Supersedes #1564 (@tonyjzhou, same fix, different shape — credit retained). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(brain-context-load): probe gbrain via execFile, not shell builtin (#1559) gbrainAvailable() used `execFileSync("command", ["-v", "gbrain"])`, which fails in any environment where the `command` builtin isn't on the spawned process's PATH (most non-interactive shells). The probe then reported gbrain as missing even when it was installed, and context-load silently skipped vector/list queries. Fix: probe `gbrain --version` directly with a 500ms timeout (matching the rest of the file's MCP_TIMEOUT_MS). Same semantics, works everywhere execFile works. Contributed by @jbetala7 via #1560. Closes #1559. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(gbrain-doctor): pin schema_version:2 doctor parse path (#1418) Adds an exec-path regression test that runs a fake gbrain shim emitting the v0.25+ doctor JSON shape (schema_version: 2, status: "warnings", exit 1 for health_score < 100, no top-level `engine` field). Confirms freshDetectEngineTier recovers stdout from the non-zero exit and falls back to GBRAIN_HOME/config.json for the engine label. The pre-existing test for #1415 only stripped gbrain from PATH; this test exercises the actual doctor parse path, closing the gap that codex's plan review flagged. Also documents the schema_version separation in lib/gbrain-local-status.ts: the local CacheEntry stays at version 1, distinct from the doctor-output schema_version which we accept across versions in gstack-memory-helpers. Closes #1418 (credit @mvanhorn for surfacing the doctor + schema_v2 collapse). The fix landed pre-emptively in v1.29.x; this commit pins it with a stronger test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(memory-ingest): pin put_page regression + scrub stale name from --help and comments (#1346) #1346 reported that gstack-memory-ingest still called the renamed gbrain put_page subcommand on gbrain v0.18+. The actual code migrated to `gbrain put` and later to batch `gbrain import <dir>` before this report landed — only documentation lag remained. This commit: - Updates the --help string ("Skip gbrain put calls (still updates state file)") so user-facing docs match the shipped subcommand - Updates two inline comments that still referenced the old name - Adds test/memory-ingest-no-put_page.test.ts: a regression pin that strips comments from bin/gstack-memory-ingest.ts and fails the build if "put_page" appears in any active code or string literal, plus a sanity check that the file still calls a supported gbrain page-write verb (put or import) Closes #1346. Reporter @kylma-code surfaced the doc lag; the original code migration credit is on the v1.27.x wave. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(resolvers): rewrite all gbrain put_page instructions to canonical put <slug> scripts/resolvers/gbrain.ts emitted user-facing copy-paste instructions using the renamed `gbrain put_page` subcommand across 10 skills (office-hours, investigate, plan-ceo-review, retro, plan-eng-review, ship, cso, design-consultation, fallback, entity-stub). Every gstack user copying those snippets hit "unknown command: put_page" on gbrain v0.18+. This commit: - Rewrites all 10 instruction templates to use `gbrain put <slug> --content "$(cat <<EOF...EOF)"` with title/tags moved into YAML frontmatter inside --content, matching the v0.18+ subcommand shape - Updates README.md and USING_GBRAIN_WITH_GSTACK.md "common commands" table to reference `gbrain put` and `gbrain get` - Adds test/resolvers-gbrain-put-rewrite.test.ts pinning two invariants: (a) resolver source ships only canonical instructions, (b) every tracked SKILL.md file is free of `gbrain put_page` CHANGELOG entries are deliberately left untouched (historical record). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(build): extract package.json build to scripts/build.sh for Windows Bun compat (#1538, #1537, #1530, #1457, #1561) Bun's Windows shell parser rejects multiple constructs the inline package.json build chain used: brace groups `{ cmd; }`, subshells with redirection `( git ... ) > path/.version`, and (in Bun 1.3.x) subshells near redirections in general. Every Windows install + every auto-upgrade since v1.34.2.0 has failed on `bun run build`. Extracts the build chain to scripts/build.sh and the .version writes to scripts/write-version-files.sh. POSIX-portable, no Bun shell parsing involved. Also adds Windows-specific bun.exe handling for non-ASCII PATHs (a separate Windows footgun where Bun's --compile fails when the binary lives under a path with non-ASCII chars). Updates test/build-script-shell-compat.test.ts to assert the new shape: no subshells with redirections anywhere in the build chain, and build delegates to scripts/build.sh which delegates .version writes. Contributed by @Charlie-El via #1544. Supersedes #1531 (@scarson, fixed in build helper), #1480 (@mikepsinn, partial overlap), #1460 (@realcarsonterry, brace-group fix subsumed) — credit retained. Closes #1538, #1537, #1530, #1457, #1561. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(windows): .exe glob in .gitignore + .exe extension resolution in find-browse (#1554) bun build --compile on Windows appends .exe to the output filename, producing browse.exe instead of browse. find-browse's existsSync probe only checked the bare path and returned null on Windows even when the binary was correctly built. .gitignore similarly only excluded the bare bin/gstack-global-discover path, leaving the .exe variant tracked. This commit: - .gitignore: changes `bin/gstack-global-discover` → `bin/gstack-global-discover*` so the Windows .exe variant is ignored - browse/src/find-browse.ts: adds isExecutable + findExecutable helpers that fall back to .exe/.cmd/.bat probing on Windows, mirroring the same helper already in make-pdf/src/browseClient.ts and pdftotext.ts Contributed by @Mike-E-Log via #1554. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci(windows): add fresh-install E2E gate that runs bun run build on windows-latest Adds .github/workflows/windows-setup-e2e.yml as the gate that catches Bun shell-parser regressions in the build chain before they reach users. Triggers on PRs touching package.json, scripts/build.sh, scripts/write-version-files.sh, setup, browse cli/find-browse, or gstack-paths. What it verifies: 1. bun run build completes on Windows (the previously-broken path that #1538/#1537/#1530/#1457/#1561 reported) 2. All compiled binaries land on disk (browse.exe, find-browse.exe, design.exe, gstack-global-discover.exe) 3. find-browse resolves to the .exe variant on Windows (regression gate for #1554) 4. gstack-paths returns non-empty GSTACK_STATE_ROOT/PLAN_ROOT/TMP_ROOT on Windows (regression gate for #1570) Complements the existing windows-free-tests.yml (curated unit subset); this new workflow exercises the install path itself. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(codex): move diff scope into prompt instead of --base (Codex CLI 0.130+ argv conflict) (#1209) Codex CLI ≥ 0.130.0 rejects passing a custom prompt and --base together (mutually exclusive at argv level). Every /codex review, /review, and /ship structured Codex review call ended with an argv error before the model ran. Fix: scope the diff in prompt text using "Run git diff origin/<base>...HEAD 2>/dev/null || git diff <base>...HEAD" instead of `--base <base>`. Preserves the filesystem boundary instruction across all invocations and keeps Codex's review prompt tuning. Touches: - codex/SKILL.md.tmpl + regenerated codex/SKILL.md - scripts/resolvers/review.ts + regenerated review/SKILL.md, ship/SKILL.md - test/gen-skill-docs.test.ts: new regression that fails if any of the five known files still contain the prompt+--base shape - test/skill-validation.test.ts: corresponding negative + positive pin on the rendered SKILL.md files Contributed by @jbetala7 via #1209. Closes #1479. Supersedes #1527 (@mvanhorn — same intent, different patch shape, CONFLICTING) and #1449 (@Gujiassh — broader refactor, CONFLICTING). Credit retained in CHANGELOG. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(review): diff from git merge-base, not git diff origin/<base> (#1492) git diff origin/<base> shows everything since the common ancestor in both directions — it includes commits that landed on origin/<base> after this branch was created as deletions. That made /review and /ship's pre-landing structured review report inflated diff totals and flagged "removed" code that was actually still present in the working tree. Fix: compute DIFF_BASE via git merge-base origin/<base> HEAD and diff the working tree against that point. Same coverage of uncommitted edits, no phantom deletions from out-of-order base advancement. Applies to /review's Step 1 (diff existence check), Step 3 (get the diff), the build-on-intent scope-creep check, the structured review DIFF_INS/DIFF_DEL stats, and the Claude adversarial subagent prompt. Same change flows into ship/SKILL.md via the shared resolver. Touches: - review/SKILL.md.tmpl + regenerated review/SKILL.md, ship/SKILL.md - scripts/resolvers/review.ts - scripts/resolvers/review-army.ts Contributed by @mvanhorn via #1492. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(codex): pin filesystem-boundary preservation across all codex review surfaces (#1503, #1522) #1503 reported that the bare codex review --base path stripped the filesystem boundary instruction, letting Codex spend tokens reading .claude/skills/ and agents/. #1522 proposed adding a skill-path detector that switched to the custom-instructions route when the diff touched skill files. After C10 (#1209) restructured codex review to always carry the boundary in the prompt (the prompt+--base argv conflict forced the restructure), the skill-path detector becomes redundant — every default call already preserves the boundary. This commit pins the post-#1209 invariant with a test that fails the build if any future refactor strips the boundary from codex/SKILL.md, review/SKILL.md, or ship/SKILL.md. Closes #1503 by regression test. #1522 (@genisis0x) is superseded by #1209 (the prompt rewrite covers its safety concern); credit retained in CHANGELOG. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(skills): use command -v instead of which for codex detection (#1197) `which` is not on PATH in every shell — some Windows shells, BusyBox- only containers, and minimal CI images all fail when skills probe codex availability via `which codex`. `command -v` is a POSIX builtin and always available where the skill is running. Touched: - codex/SKILL.md.tmpl: CODEX_BIN=$(command -v codex || echo "") - scripts/resolvers/review.ts and scripts/resolvers/design.ts: 3 + 3 sites each rewritten to `command -v codex >/dev/null 2>&1` - Regenerated all 10 affected SKILL.md files (codex, review, ship, design-consultation, design-review, office-hours, plan-ceo-review, plan-design-review, plan-devex-review, plan-eng-review) - test/skill-validation.test.ts: updated pin + defensive regression test that fails if `which codex` returns to codex/SKILL.md - test/skill-e2e-plan.test.ts: updated summary regex Contributed by @mvanhorn via #1197. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(codex): surface non-zero exits so wrappers stop reading as silent stalls (#1467, #1327) When codex exits non-zero (parse errors, arg-shape breaks, model API errors that propagate as non-zero status), the calling agent previously saw an empty output and burned 30-60 minutes misdiagnosing as a silent model/API stall. The hang-detection block only caught exit 124 (the timeout-wrapper signal). Adds elif blocks in all four codex invocation sites (Review default, Challenge, Consult new-session, Consult resume) that: - Echo "[codex exit N] <stderr first line>" to stdout - Indent the first 20 stderr lines for inline context - Log codex_nonzero_exit telemetry tagged with the call site Contributed by @genisis0x via #1467. Closes #1327. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(design): disclose OpenAI key source + warn on cwd .env match (#1278, closes #1248) The design binary previously called process.env.OPENAI_API_KEY without checking where the key came from. If a user ran $D inside someone else's project that had OPENAI_API_KEY in its .env, the resulting generation billed that project's account. Silent and irreversible. Fix: resolveApiKeyInfo() returns both the key and its source. When the env-var path matches an OPENAI_API_KEY entry in the current directory's .env, .env.<NODE_ENV>, or .env.local file, we set a warning. requireApiKey() prints "Using OpenAI key from <source>" plus the warning before the run — never the key itself. Adds 6 unit tests covering: config-vs-env precedence, env-only (no match), env+cwd .env match, quoted/exported values, value-mismatch (no false positive), and the no-leak invariant for requireApiKey stderr output. Contributed by @jbetala7 via #1278. Closes #1248. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(browse): guard full-page screenshots against Anthropic vision API >2000px brick (#1214) Full-page screenshots of tall pages routinely exceeded 2000px on the longest dimension, silently bricking the agent's session: the resulting base64 reached the Anthropic vision API which rejected the oversized image, leaving the agent burning turns on a useless blob with no stderr trace from the browse side. Adds browse/src/screenshot-size-guard.ts as a shared helper: - guardScreenshotBuffer(buf) → downscales in-memory if max(w,h) > 2000 - guardScreenshotPath(path) → file-mode variant that rewrites in place - Aspect ratio preserved via sharp's resize fit:inside - Stderr diagnostic on any downscale so callers can see when it fired - Lazy sharp import so non-screenshot paths pay no startup cost Wires the guard into all three full-page callsites codex review flagged: - browse/src/snapshot.ts: annotated + heatmap fullPage captures - browse/src/meta-commands.ts: screenshot command (path + base64 fullPage modes) plus the responsive 3-viewport sweep - browse/src/write-commands.ts: prettyscreenshot fullPage path Covers seven unit cases (pass-through, downscale, aspect ratio, exactly-2000px edge, file-mode rewrite) plus a static invariant test that fails the build if any of the three callsites stops importing the guard. Closes #1214. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(security): add Node sidecar entry for L4 prompt-injection classifier (#1370) The L4 TestSavant classifier in browse/src/security-classifier.ts can't be imported into the compiled browse server (onnxruntime-node dlopen fails from Bun's compile extract dir per CLAUDE.md). The agent that used to host it (sidebar-agent.ts) was removed when the PTY proved out — leaving the classifier file shipped but with zero callers. Exactly the gap codex flagged in #1370. Adds browse/src/security-sidecar-entry.ts: a Node script that runs the classifier as a subprocess of the browse server. It reads NDJSON requests from stdin and writes id-correlated NDJSON responses to stdout, supporting: - op: "scan-page-content" — full L4 classifier scan - op: "ping" — liveness probe for the client's health check - op: "status" — classifier readiness (used by /pty-inject-scan to surface l4 { available: bool } in its response) Plus browse/src/find-security-sidecar.ts: a resolver that locates node + the bundled JS entry (browse/dist/security-sidecar.js, built in a follow-up package.json change) or falls back to the dev TS entry. Returns null cleanly when node isn't on PATH so the calling endpoint can degrade per D7 (extension WARN + user confirm). C17 of the security-stack wave. C18 adds the IPC client + lifecycle management; C19 wires the endpoint; C20 routes the extension through it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(security): sidecar IPC client with lifecycle + circuit breaker (#1370) Adds browse/src/security-sidecar-client.ts to manage the Node L4 classifier subprocess from the compiled browse server: - Lazy spawn on first scan; reuses the same process across requests - Id-correlated request/response via NDJSON over stdio - 5s default per-scan timeout; 64KB payload cap (short-circuits before spawn so oversized requests don't waste a process) - 3-in-10-minutes respawn cap → trips circuit breaker; subsequent scans throw immediately so the /pty-inject-scan endpoint can surface l4 { available: false } to the extension and degrade to WARN+confirm - process.on('exit') sends SIGTERM to the child for clean teardown - isSidecarAvailable() lets the endpoint probe before scan calls so the response shape reflects degraded mode honestly Unit tests cover the payload cap, the availability probe, and the breaker-doesn't-crash invariant under repeated rejected calls. C18 of the security-stack wave. C19 adds POST /pty-inject-scan; C20 routes the extension through it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(security): add POST /pty-inject-scan endpoint for pre-PTY-inject scans (#1370) The sidebar's gstackInjectToTerminal callers (toolbar Cleanup, Inspector "Send to Code") were piping page-derived text directly into the live claude PTY with ZERO classifier processing — the gap codex flagged in #1370. The documented sidebar security stack had a hole the size of every Cleanup-button click. Adds POST /pty-inject-scan to browse/src/server.ts: - Local-only binding (NOT in TUNNEL_PATHS — tunnel attempts get the general 404 path; never reaches the scan logic) - Root-token auth via existing validateAuth() — 401 on unauth - 64KB request cap → 413 + payload-too-large body - 5s scan timeout via sidecar client - URL-blocklist forced to BLOCK in PTY context (page-derived REPL input is higher-risk than ordinary tool output) - L4 ML classifier via the sidecar when available; degrades to WARN per D7 when sidecar is unavailable - Response goes through JSON.stringify(..., sanitizeReplacer) per v1.38.0.0 Unicode-egress hardening - Imports only from security-sidecar-client.ts, never directly from security-classifier.ts (which would brick the compiled Bun binary) Seven static-invariant tests pin the POST verb, auth gate, 64KB cap, tunnel-listener exclusion, sanitizeReplacer wrapping, l4 availability shape, and the no-direct-classifier-import rule. C19 of the security-stack wave. C20 routes the extension through it; C21 adds the invariant AST check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(extension): route gstackInjectToTerminal through /pty-inject-scan (#1370) Closes the documented-vs-shipped gap codex flagged in #1370. The sidebar's two PTY-injection call sites (Inspector "Send to Code" and toolbar Cleanup) now pre-scan via the new /pty-inject-scan endpoint before writing to the live claude REPL. Adds window.gstackScanForPTYInject(text, origin) to extension/sidepanel-terminal.js: - Async, returns { allow, verdict, reasons, l4 } - POST to /pty-inject-scan with the existing root-token auth - WARN+confirm on scan failure (network down, sidecar absent, etc.) rather than silent PASS — D7 honest-degradation gstackInjectToTerminal stays synchronous, returns boolean. Per D6: keeping the inject sync means existing `const ok = ...?.()` callers don't break, and the invariant test in test/extension-pty-inject-invariant.test.ts can statically pin that every call goes through the scan first. extension/sidepanel.js call sites updated: - inspectorSendBtn click → await scan, BLOCK drops + WARN prompts via window.confirm, PASS injects silently - runCleanup() → same flow. Static cleanup prompt always PASSes but still routes through scan to honor the invariant. C20 of the security-stack wave. C21 adds the static invariant test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(security): invariant — extension PTY inject must be scan-gated (#1370) Static-analysis invariant test that fails the build if any extension/*.js path calls window.gstackInjectToTerminal without a preceding window.gstackScanForPTYInject in the same enclosing function. Closes the documented-vs-shipped gap codex demanded a machine check on. Rules: - Rule 1: any file that calls inject must also reference scan - Rule 2: in the enclosing function (function declaration, arrow, async (), event handler), a scan call must appear before the inject call by source position - Exemption: sidepanel-terminal.js (the file that DEFINES the inject function) is exempt from Rule 2 since the definition is not a call Plus two structural checks: - sidepanel-terminal.js defines both the inject and scan functions - inject stays SYNCHRONOUS (no `async` modifier) per D6 — async would silently break the `const ok = ...?.()` pattern at every caller C21 of the security-stack wave. The sidecar architecture (#1370) is complete: server-side L1-L3 + L4-via-sidecar (C17+C18+C19), extension pre-scan wiring (C20), and now the regression gate (C21). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(browse): opt-in extended stealth mode with 6 detection-vector patches (#1112) Rebases @garrytan's PR #1112 (Apr 2026, abandoned) onto the current browse/src/stealth.ts contract. The existing minimal "codex narrowed" stealth (webdriver-mask + AutomationControlled launch arg) stays the default. PR #1112's six additional patches are added behind an opt-in GSTACK_STEALTH=extended env flag. Extended-mode patches (applied AFTER the default mask, in order): 1. delete navigator.webdriver from prototype (not just the getter — detectors check `"webdriver" in navigator`) 2. WebGL renderer spoof to Apple M1 Pro (SwiftShader was the #1 software-GPU tell in containers) 3. navigator.plugins returns a PluginArray-prototype-passing array with MimeType objects and namedItem() 4. window.chrome populated with chrome.app, chrome.runtime, chrome.loadTimes(), chrome.csi() with realistic shapes 5. navigator.mediaDevices backfilled when headless drops it 6. CDP cdc_*-prefixed window globals cleared Why opt-in: the default mode's contract is fingerprint CONSISTENCY, which protects against detectors that flag spoofing mismatch. Extended mode actively lies about the environment; sites that reflect on these properties can break. Users who hit detection in default mode can flip GSTACK_STEALTH=extended for SannySoft 100% pass-rate. Twenty unit tests pin the env-flag semantics, all six patches' code presence, and the applyStealth wiring order. Live SannySoft pass-rate verification stays in the periodic-tier E2E suite. Contributed by @garrytan via #1112 (rebased — original PR opened before the codex-narrowed minimum landed; rebase preserves the narrowed default while adding the SannySoft-passing path as opt-in). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(fixtures): regenerate ship-SKILL.md golden baselines after C10-C13 + C16 templates Updates the three ship-SKILL.md golden baselines (claude, codex, factory hosts) to match the new shape produced by: - C10 #1209 codex argv (prompt + diff scope, no --base) - C11 #1492 merge-base diff (DIFF_BASE= preamble) - C13 #1197 command -v for codex detection - C12 + boundary preservation per regen-enforcing test Per CLAUDE.md SKILL.md workflow: edit the .tmpl, run gen:skill-docs, commit the regenerated outputs together. Goldens are part of the regen contract — without this commit, test/host-config.test.ts' golden-baseline checks fail with the diff codex review surfaced. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(release): v1.41.0.0 — Daegu wave (24 bisect commits, 14 user-facing fixes) Bumps VERSION 1.40.0.0 → 1.41.0.0. CHANGELOG entry follows the release-summary format in CLAUDE.md: two-line headline, lead paragraph, "The numbers that matter" table, "What this means for builders" closer, then itemized Added/Changed/Fixed/For contributors with inline credit to every PR author and original issue reporter. Scale-aware bump per CLAUDE.md: 24 commits, ~6000 LOC net, substantial new capability across security (PTY sidecar wiring), install (Windows build chain), compat (gbrain 0.18-0.35, Codex CLI 0.130+), and quality (screenshot guard, design key disclosure, extended stealth opt-in). MINOR is the right call. Closes for users: #1567, #1559, #1569, #1346, #1418, #1538, #1537, #1530, #1457, #1561, #1554, #1479, #1503, #1248, #1214, #1370, #1327, #1193 pattern, #1152 pattern. Credit retained inline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(find-browse): resolve source-checkout layout <git-root>/browse/dist/browse[.exe] windows-setup-e2e.yml runs `bun browse/src/find-browse.ts` against a freshly-built repo where binaries land at browse/dist/browse.exe (no .claude/skills/gstack/ install layout). The previous markers chain only matched .codex/.agents/.claude prefixed paths, so find-browse exited "not found" even when the binary was present. Adds a source-checkout fallback after the marker scan: if no installed layout resolves but <git-root>/browse/dist/browse[.exe] exists, return that. Three real callers hit this path: - gstack repo dev workflow before `./setup` runs - windows-setup-e2e.yml CI (the breakage that surfaced this) - make-pdf consumers running from a sibling source checkout Smoke-verified: a fresh git repo with browse/dist/browse on disk now resolves through the source-checkout branch (was returning null before this commit). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(release): bump v1.41.0.0 → v1.42.0.0 to clear queue collision with #1574 The version-gate workflow flagged a collision: PR #1574 (garrytan/colombo-v3) already claims v1.41.0.0, and #1592 (fix/audit-critical-high-bugs) claims v1.41.1.0. Per CLAUDE.md's workspace-aware ship rule, queue-advancing past a claimed version within the same bump level is permitted — MINOR work landing on top of a queued MINOR still reads as MINOR relative to main. Util's suggested next slot is v1.42.0.0; taking it. CHANGELOG entry header bumped + dated 2026-05-19; entry body unchanged (same wave content, same credit list). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
00f966b3ec |
v1.30.0.0 fix wave: 21 community PRs + Windows CI extension + codex flag-semantics smoke (#1391)
* fix(codex): use resume-compatible flags * fix: V-001 security vulnerability Automated security fix generated by Orbis Security AI * docs: align prompt-injection thresholds to security.ts (v1.6.4.0 catch-up) CLAUDE.md:290 and ARCHITECTURE.md:159 were missed when WARN was bumped 0.60 → 0.75 in |
||
|
|
ed1e4be2f6 |
feat: gstack browser sidebar = interactive Claude Code REPL with live tab awareness (v1.14.0.0) (#1216)
* build: vendor xterm@5 for the Terminal sidebar tab
Adds xterm@5 + xterm-addon-fit as devDependencies and a `vendor:xterm`
build step that copies the assets into `extension/lib/` at build time.
The vendored files are .gitignored so the npm version stays the source
of truth. xterm@5 is eval-free, so no MV3 CSP changes needed.
No runtime callers yet — this just stages the assets.
* feat(server): add pty-session-cookie module for the Terminal tab
Mirrors `sse-session-cookie.ts` exactly. Mints short-lived 30-min HttpOnly
cookies for authenticating the Terminal-tab WebSocket upgrade against
the terminal-agent. Same TTL, same opportunistic-pruning shape, same
"scoped tokens never valid as root" invariant. Two registries instead of
one because the cookie names are different (`gstack_sse` vs `gstack_pty`)
and the token spaces must not overlap.
No callers yet — wired up in the next commit.
* feat(server): add terminal-agent.ts (PTY for the Terminal sidebar tab)
Translates phoenix gbrowser's Go PTY (cmd/gbd/terminal.go) into a Bun
non-compiled process. Lives separately from `sidebar-agent.ts` so a
WS-framing or PTY-cleanup bug can't take down the chat path (codex
outside-voice review caught the coupling risk).
Architecture:
- Bun.serve on 127.0.0.1:0 (never tunneled).
- POST /internal/grant accepts cookie tokens from the parent server over
loopback, authenticated with a per-boot internal token.
- GET /ws upgrades require BOTH (a) Origin: chrome-extension://<id> and
(b) the gstack_pty cookie minted by /pty-session. Either gate alone is
insufficient (CSWSH defense + auth defense).
- Lazy spawn: claude PTY is not started until the WS receives its first
data frame. Idle sidebar opens cost nothing.
- Bun PTY API: `terminal: { rows, cols, data(t, chunk) }` — verified at
impl time on Bun 1.3.10. proc.terminal.write() for input,
proc.terminal.resize() for resize, proc.kill() + 3s SIGKILL fallback
on close.
- process.on('uncaughtException'|'unhandledRejection') handlers so a
framing bug logs but doesn't kill the listener loop.
Test-only `BROWSE_TERMINAL_BINARY` env override lets the integration
tests spawn /bin/bash instead of requiring claude on every CI runner.
Not yet spawned by anything — wired in the next commit.
* feat(server): wire /pty-session route + spawn terminal-agent
Server-side glue connecting the Terminal sidebar tab to the new
terminal-agent process.
server.ts:
- New POST /pty-session route. Validates AUTH_TOKEN, mints a gstack_pty
HttpOnly cookie via pty-session-cookie.ts, posts the cookie value to
the agent's loopback /internal/grant. Returns the terminalPort + Set-Cookie
to the extension.
- /health response gains `terminalPort` (just the port number — never a
shell token). Tokens flow via the cookie path, never /health, because
/health already surfaces AUTH_TOKEN to localhost callers in headed mode
(that's a separate v1.1+ TODO).
- /pty-session and /terminal/* are deliberately NOT added to TUNNEL_PATHS,
so the dual-listener tunnel surface 404s by default-deny.
- Shutdown path now also pkills terminal-agent and unlinks its state files
(terminal-port + terminal-internal-token) so a reconnect doesn't try to
hit a dead port.
cli.ts:
- After spawning sidebar-agent.ts, also spawn terminal-agent.ts. Same
pattern: pkill old instances, Bun.spawn(['bun', 'run', script]) with
BROWSE_STATE_FILE + BROWSE_SERVER_PORT env. Non-fatal if the spawn
fails — chat still works without the terminal agent.
* feat(extension): Terminal as default sidebar tab
Adds a primary tab bar (Terminal | Chat) above the existing tab-content
panes. Terminal is the default-active tab; clicking Chat returns to the
existing claude -p one-shot flow which is preserved verbatim.
manifest.json: adds ws://127.0.0.1:*/ to host_permissions so MV3 doesn't
block the WebSocket upgrade.
sidepanel.html: new primary-tabs nav, new #tab-terminal pane with a
"Press any key to start Claude Code" bootstrap card, claude-not-found
install card, xterm mount point, and "session ended" restart UI. Loads
xterm.js + xterm-addon-fit + sidepanel-terminal.js. tab-chat is no
longer the .active default.
sidepanel.js: new activePrimaryPaneId() helper that reads which primary
tab is selected. Debug-close paths now route back to whichever primary
pane is active (was hardcoded to tab-chat). Primary-tab click handler
toggles .active classes and aria-selected. window.gstackServerPort and
window.gstackAuthToken exposed so sidepanel-terminal.js can build the
/pty-session POST and the WS URL.
sidepanel-terminal.js (new): xterm.js lifecycle. Lazy-spawn — first
keystroke fires POST /pty-session, then opens
ws://127.0.0.1:<terminalPort>/ws. Origin + cookie are set automatically
by the browser. Resize observer sends {type:"resize"} text frames.
ResizeObserver, tab-switch hooks, restart button, install-card retry.
On WS close shows "Session ended, click to restart" — no auto-reconnect
(codex outside-voice flagged that as session-burning).
sidepanel.css: primary-tabs bar + Terminal pane styling (full-height
xterm container, install card, ended state).
* test: terminal-agent + cookie module + sidebar default-tab regression
Three new test files:
terminal-agent.test.ts (16 tests): pty-session-cookie mint/validate/
revoke, Set-Cookie shape (HttpOnly + SameSite=Strict + Path=/, NO Secure
since 127.0.0.1 over HTTP), source-level guards that /pty-session and
/terminal/* are NOT in TUNNEL_PATHS, /health does NOT surface ptyToken
or gstack_pty, terminal-agent binds 127.0.0.1, /ws upgrade enforces
chrome-extension:// Origin AND gstack_pty cookie, lazy-spawn invariant
(spawnClaude is called from message handler, not upgrade), uncaughtException/
unhandledRejection handlers exist, SIGINT-then-SIGKILL cleanup.
terminal-agent-integration.test.ts (7 tests): spawns the agent as a real
subprocess in a tmp state dir. Verifies /internal/grant accepts/rejects
the loopback token, /ws gates (no Origin → 403, bad Origin → 403, no
cookie → 401), real WebSocket round-trip with /bin/bash via the
BROWSE_TERMINAL_BINARY override (write 'echo hello-pty-world\n', read it
back), and resize message acceptance.
sidebar-tabs.test.ts (13 tests): structural regression suite locking the
load-bearing invariants of the default-tab change — Terminal is .active,
Chat is not, xterm assets are loaded, debug-close path no longer hardcodes
tab-chat (uses activePrimaryPaneId), primary-tab click handler exists,
chat surface is not accidentally deleted, terminal JS does NOT auto-
reconnect on close, manifest declares ws:// + http:// localhost host
permissions, no unsafe-eval.
Plan called for Playwright + extension regression; the codebase doesn't
ship Playwright extension launcher infra, so we follow the existing
extension-test pattern (source-level structural assertions). Same
load-bearing intent — locks the invariants before they regress.
* docs: Terminal flow + threat model + v1.1 follow-ups
SIDEBAR_MESSAGE_FLOW.md: new "Terminal flow" section. Documents the WS
upgrade path (/pty-session cookie mint → /ws Origin + cookie gate →
lazy claude spawn), the dual-token model (AUTH_TOKEN for /pty-session,
gstack_pty cookie for /ws, INTERNAL_TOKEN for server↔agent loopback),
and the threat-model boundary — the Terminal tab bypasses the entire
prompt-injection security stack on purpose; user keystrokes are the
trust source. That trust assumption is load-bearing on three transport
guarantees: local-only listener, Origin gate, cookie auth. Drop any
one of those three and the tab becomes unsafe.
CLAUDE.md: extends the "Sidebar architecture" note to include
terminal-agent.ts in the read-this-first list. Adds a "Terminal tab is
its own process" note so a future contributor doesn't bolt PTY logic
onto sidebar-agent.ts.
TODOS.md: three new follow-ups under a new "Sidebar Terminal" section:
- v1.1: PTY session survives sidebar reload (Issue 1C deferred).
- v1.1+: audit /health AUTH_TOKEN distribution (codex finding #2 —
a pre-existing soft leak that cc-pty-import sidesteps but doesn't
fix).
- v1.1+: apply terminal-agent's process.on exception handlers to
sidebar-agent.ts (codex finding #4 — chat path has no fatal
handlers).
* feat(extension): Terminal-only sidebar — auth fix, UX polish, chat rip
The chat queue path is gone. The Chrome side panel is now just an
interactive claude PTY in xterm.js. Activity / Refs / Inspector still
exist behind the `debug` toggle in the footer.
Three threads of change, all from dogfood iteration on top of
cc-pty-import:
1. fix(server): cross-port WS auth via Sec-WebSocket-Protocol
- Browsers can't set Authorization on a WebSocket upgrade. We had
been minting an HttpOnly gstack_pty cookie via /pty-session, but
SameSite=Strict cookies don't survive the cross-port jump from
server.ts:34567 to the agent's random port from a chrome-extension
origin. The WS opened then immediately closed → "Session ended."
- /pty-session now also returns ptySessionToken in the JSON body.
- Extension calls `new WebSocket(url, [`gstack-pty.<token>`])`.
Browser sends Sec-WebSocket-Protocol on the upgrade.
- Agent reads the protocol header, validates against validTokens,
and MUST echo the protocol back (Chromium closes the connection
immediately if a server doesn't pick one of the offered protocols).
- Cookie path is kept as a fallback for non-browser callers (curl,
integration tests).
- New integration test exercises the full protocol-auth round-trip
via raw fetch+Upgrade so a future regression of this exact class
fails in CI.
2. fix(extension): UX polish on the Terminal pane
- Eager auto-connect when the sidebar opens — no "Press any key to
start" friction every reload.
- Always-visible ↻ Restart button in the terminal toolbar (not
gated on the ENDED state) so the user can force a fresh claude
mid-session.
- MutationObserver on #tab-terminal's class attribute drives a
fitAddon.fit() + term.refresh() when the pane becomes visible
again — xterm doesn't auto-redraw after display:none → display:flex.
3. feat(extension): rip the chat tab + sidebar-agent.ts
- Sidebar is Terminal-only. No more Terminal | Chat primary nav.
- sidebar-agent.ts deleted. /sidebar-command, /sidebar-chat,
/sidebar-agent/event, /sidebar-tabs* and friends all deleted.
- The pickSidebarModel router (sonnet vs opus) is gone — the live
PTY uses whatever model the user's `claude` CLI is configured with.
- Quick-actions (🧹 Cleanup / 📸 Screenshot / 🍪 Cookies) survive
in the Terminal toolbar. Cleanup now injects its prompt into the
live PTY via window.gstackInjectToTerminal — no more
/sidebar-command POST. The Inspector "Send to Code" action uses
the same injection path.
- clear-chat button removed from the footer.
- sidepanel.js shed ~900 lines of chat polling, optimistic UI,
stop-agent, etc.
Net diff: -3.4k lines across 16 files. CLAUDE.md, TODOS.md, and
docs/designs/SIDEBAR_MESSAGE_FLOW.md rewritten to match. The sidebar
regression test (browse/test/sidebar-tabs.test.ts) is rewritten as 27
structural assertions locking the new layout — Terminal sole pane,
no chat input, quick-actions in toolbar, eager-connect, MutationObserver
repaint, restart helper.
* feat: live tab awareness for the Terminal pane
claude in the PTY now has continuous tab-aware context. Three pieces:
1. Live state files. background.js listens to chrome.tabs.onActivated /
onCreated / onRemoved / onUpdated (throttled to URL/title/status==
complete so loading spinners don't spam) and pushes a snapshot. The
sidepanel relays it as a custom event; sidepanel-terminal.js sends
{type:"tabState"} text frames over the live PTY WebSocket.
terminal-agent.ts writes:
<stateDir>/tabs.json all open tabs (id, url, title, active,
pinned, audible, windowId)
<stateDir>/active-tab.json current active tab (skips chrome:// and
chrome-extension:// internal pages)
Atomic write via tmp + rename so claude never reads a half-written
document. A fresh snapshot is pushed on WS open so the files exist by
the time claude finishes booting.
2. New $B tab-each <command> [args...] meta-command. Fans out a single
command across every open tab, returns
{command, args, total, results: [{tabId, url, title, status, output}]}.
Skips chrome:// pages; restores the originally active tab in a finally
block (so a mid-batch error doesn't leave the user looking at a
different tab); uses bringToFront: false so the OS window doesn't
jump on every fanout. Scope-checks the inner command BEFORE the loop.
3. --append-system-prompt hint at spawn time. Claude is told about both
the state files and the $B tab-each command up front, so it doesn't
have to discover the surface by trial. Passed via the --append-system-
prompt CLI flag, NOT as a leading PTY write — the hint stays out of
the visible transcript.
Tests:
- browse/test/tab-each.test.ts (new) — registration + source-level
invariants (scope check before loop, finally-restore, bringToFront:false,
chrome:// skip) + behavior tests with a mock BrowserManager that verify
iteration order, JSON shape, error handling, and active-tab restore.
- browse/test/terminal-agent.test.ts — three new assertions for
tabState handler shape, atomic-write pattern, and the
--append-system-prompt wiring at spawn.
Verified live: opened 5 tabs, ran $B tab-each url against the live
server, got per-tab JSON results back, original active tab restored
without OS focus stealing.
* chore: drop sidebar-agent test refs after chat rip
Five test files / describe blocks targeted the deleted chat path:
- browse/test/security-e2e-fullstack.test.ts (full-stack chat-pipeline E2E
with mock claude — whole file gone)
- browse/test/security-review-fullstack.test.ts (review-flow E2E with real
classifier — whole file gone)
- browse/test/security-review-sidepanel-e2e.test.ts (Playwright E2E for
the security event banner that was ripped from sidepanel.html)
- browse/test/security-audit-r2.test.ts (5 describe blocks: agent queue
permissions, isValidQueueEntry stateFile traversal, loadSession session-ID
validation, switchChatTab DocumentFragment, pollChat reentrancy guard,
/sidebar-tabs URL sanitization, sidebar-agent SIGTERM→SIGKILL escalation,
AGENT_SRC top-level read converted to graceful fallback)
- browse/test/security-adversarial-fixes.test.ts (canary stream-chunk split
detection on detectCanaryLeak; one tool-output test on sidebar-agent)
- test/skill-validation.test.ts (sidebar agent #584 describe block)
These all assumed sidebar-agent.ts existed and tested chat-queue plumbing,
chat-tab DOM round-trip, chat-polling reentrancy, or per-message classifier
canary detection. With the live PTY there is no chat queue, no chat tab,
no LLM stream to canary-scan, and no per-message subprocess. The Terminal
pane's invariants are covered by the new browse/test/sidebar-tabs.test.ts
(27 structural assertions), browse/test/terminal-agent.test.ts, and
browse/test/terminal-agent-integration.test.ts.
bun test → exit 0, 0 failures.
* chore: bump version and changelog (v1.14.0.0)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(extension): xterm fills the full Terminal panel height
The Terminal pane only rendered into the top portion of the panel — most
of the panel below the prompt was an empty black gap. Three layered
issues, all about xterm.js measuring dimensions during a layout state
that wasn't ready yet:
1. order-of-operations in connect(): ensureXterm() ran BEFORE
setState(LIVE), so term.open() measured els.mount while it was still
display:none. xterm caches a 0-size viewport synchronously inside
open() and never auto-recovers when the container goes visible.
Flipped: setState(LIVE) → ensureXterm.
2. first fit() ran synchronously before the browser had applied the
.active class transition. Wrapped in requestAnimationFrame so layout
has settled before fit() reads clientHeight.
3. CSS flex-overflow trap: .terminal-mount has flex:1 inside the
flex-column #tab-terminal, but .tab-content's `overflow-y: auto` and
the lack of `min-height: 0` on .terminal-mount meant the item
couldn't shrink below content size. flex:1 then refused to expand
into available space and xterm rendered into whatever its initial
2x2 measurement happened to be.
Fixes:
- extension/sidepanel-terminal.js: reorder + RAF fit
- extension/sidepanel.css: .terminal-mount gets `flex: 1 1 0` +
`min-height: 0` + `position: relative`. #tab-terminal overrides
.tab-content's `overflow-y: auto` to `overflow: hidden` (xterm has
its own viewport scroll; the parent shouldn't compete) and explicitly
re-declares `display: flex; flex-direction: column` for #tab-terminal.active.
bun test browse/test/sidebar-tabs.test.ts → 27/27 pass.
Manually verified: side panel opens → Terminal fills full panel height,
xterm scrollback works, debug-tab toggle still repaints correctly.
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|