* fix(codex): use resume-compatible flags
* fix: V-001 security vulnerability
Automated security fix generated by Orbis Security AI
* docs: align prompt-injection thresholds to security.ts (v1.6.4.0 catch-up)
CLAUDE.md:290 and ARCHITECTURE.md:159 were missed when WARN was bumped
0.60 → 0.75 in d75402bb (v1.6.4.0, "cut Haiku classifier FP from 44% to
23%, gate now enforced", #1135). browse/src/security.ts:37 has WARN: 0.75
and BROWSER.md:743 was updated alongside that commit; CLAUDE.md and
ARCHITECTURE.md still read 0.60.
Also adds the SOLO_CONTENT_BLOCK: 0.92 entry to CLAUDE.md (already in
security.ts:50 and BROWSER.md:745, missing from CLAUDE.md's threshold
table).
No code change. No behavior change. Pure doc-vs-code alignment.
Verification:
$ grep -n "WARN" browse/src/security.ts CLAUDE.md ARCHITECTURE.md BROWSER.md
browse/src/security.ts:37: WARN: 0.75,
CLAUDE.md:290: - \`WARN: 0.75\` ...
ARCHITECTURE.md:159: ...>= \`WARN\` (0.75)...
BROWSER.md:743: - \`WARN: 0.75\` ...
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: Korean/CJK IME input and rendering in Sidebar Terminal
Fixes#1272
This commit addresses three separate Korean/CJK bugs in the Sidebar Terminal:
**Bug 1 - IME Input**: Korean text typed via IME composition was not
reaching the PTY correctly. Added compositionstart/compositionend event
listeners to suppress partial jamo fragments and only send the final
composed string.
**Bug 2a - Font Rendering**: Added CJK monospace font fallbacks
("Noto Sans Mono CJK KR", "Malgun Gothic") to both the xterm.js
fontFamily config and the CSS --font-mono variable. This ensures
consistent cell-width calculations for Korean characters.
**Bug 2b - UTF-8 Boundary Detection**: Added buffering logic to prevent
multi-byte UTF-8 characters (Korean is 3 bytes) from being split across
WebSocket chunks. This follows the same pattern as PR #1007 which fixed
the sidebar-agent path, but extends it to the terminal-agent path.
Special thanks to @ldybob for the excellent root cause analysis and
proposed solutions in issue #1272.
Tested on WSL2 + Windows 11 with Korean IME.
* fix(ship): tighten Plan Completion gate (VAS-449 remediation)
VAS-446 shipped with a PLAN.md acceptance criterion (domain-hq has
/docs/dashboard.md) silently skipped. /ship's Plan Completion subagent
existed at ship time (added in v1.4.1.0) but the gate let the failure
through. Four structural fixes:
1. Path concreteness rule: items naming a concrete filesystem path MUST
be classified DONE/NOT DONE via [ -f <path> ], never UNVERIFIABLE.
2. Validator detection: CONTENT-SHAPE items scan target repo's
package.json for validate-* scripts and run them before falling back
to UNVERIFIABLE.
3. Per-item UNVERIFIABLE confirmation: replaces blanket "I've checked
each one" with per-item Y/N/D loop. The blanket-confirm path is the
exact failure VAS-449 surfaced.
4. Subagent fail-closed: if Plan Completion subagent + inline fallback
both fail, surface explicit AskUserQuestion instead of silent pass.
Replaces the prior "Never block /ship on subagent failure" fail-open.
Locked in by test/ship-plan-completion-invariants.test.ts (5 assertions,
no LLM dependency, ~60ms).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(browse): bash.exe wrap for telemetry on Windows
reportAttemptTelemetry() in browse/src/security.ts calls spawn(bin, args)
where bin is the gstack-telemetry-log bash script. On Windows this fails
silently with ENOENT — CreateProcess can't dispatch on shebang lines.
Adopts v1.24.0.0's Bun.which + GSTACK_*_BIN override pattern (from
browse/src/claude-bin.ts:resolveClaudeCommand, introduced in #1252) for
resolving bash.exe. resolveBashBinary() honors GSTACK_BASH_BIN absolute-path
or PATH-resolvable override, falling back to Bun.which('bash') which finds
Git Bash on the standard Windows install.
buildTelemetrySpawnCommand() wraps the script invocation on win32 only;
POSIX path is bit-identical. Returns null when bash can't be resolved on
Windows so caller skips spawn — local attempts.jsonl audit trail keeps
working without surfacing a Windows-only failure.
8 new unit tests cover resolveBashBinary (POSIX bash, absolute override,
quote-stripping, BASH_BIN fallback, empty-PATH null) and buildTelemetrySpawnCommand
(POSIX pass-through, win32 bash wrap, win32 null on unresolvable, arg-array
immutability).
POSIX path is bit-identical — Bun.which('bash') on Linux/macOS returns the
same /bin/bash or /usr/bin/bash that the old hardcoded spawn relied on.
* fix(make-pdf): Bun.which-based binary resolution for browse + pdftotext on Windows
Extends v1.24.0.0's Bun.which + GSTACK_*_BIN override pattern (introduced in
browse/src/claude-bin.ts via #1252) to the two other binary resolvers in the
codebase: make-pdf/src/browseClient.ts:resolveBrowseBin and
make-pdf/src/pdftotext.ts:resolvePdftotext.
Same Windows quirks (fs.accessSync(X_OK) degrades to existence-check; `which`
isn't available outside Git Bash; bun --compile --outfile X emits X.exe), same
Bun.which-based fix shape, same env override convention.
Changes:
- GSTACK_BROWSE_BIN / GSTACK_PDFTOTEXT_BIN as the v1.24-aligned overrides;
BROWSE_BIN / PDFTOTEXT_BIN remain as back-compat aliases.
- Bun.which() replaces execFileSync('which', ...) for PATH lookup. Handles
Windows PATHEXT natively; no more `where`-vs-`which` branch.
- findExecutable(base) helper exported from each module, probes .exe/.cmd/.bat
after the bare-path miss on win32. Linux/macOS behavior is bit-identical
(isExecutable short-circuits before the win32 branch ever runs).
- macCandidates renamed posixCandidates (always was — /opt/homebrew, /usr/local,
/usr/bin). No Windows candidates added; Poppler installs scatter across
Scoop/Chocolatey/portable zips and guessing causes false positives.
- Error messages get a Windows install hint (scoop install poppler / oschwartz10612)
and `setx` example for GSTACK_*_BIN.
- Pre-existing test 'honors BROWSE_BIN when it points at a real executable'
was hardcoded /bin/sh — made cross-platform via a REAL_EXE constant
(cmd.exe on win32, /bin/sh on POSIX). Was a Windows-CI blocker on its own.
Coordination: PR #1094 (@BkashJEE) covered browseClient.ts independently with a
narrower scope; this PR's pdftotext + cross-platform tests + GSTACK_*_BIN naming
are additive. Either order of merge works.
Test plan:
- bun test make-pdf/test/browseClient.test.ts make-pdf/test/pdftotext.test.ts
on win32 — 29 pass, 0 fail (12 new assertions: findExecutable POSIX/win32/null,
resolveBrowseBin GSTACK_BROWSE_BIN + BROWSE_BIN + precedence + quote-strip,
same shape for resolvePdftotext + Windows install hint in error message).
- POSIX branch unchanged — fs.accessSync(X_OK) on Linux/macOS short-circuits
before any win32 logic runs, matching the v1.24 claude-bin.ts pattern.
* fix(browse): NTFS ACL hardening for Windows state files via icacls
gstack's ~/.gstack/ state directory holds bearer tokens, canary tokens, agent
queue contents (with prompt history), session state, security-decision logs,
and saved cookie bundles — all written with { mode: 0o600 } / 0o700. On Windows,
those mode bits are a silent no-op: Node's fs module doesn't translate POSIX
modes to NTFS ACLs, and inherited ACLs leave every "restricted" file readable
by other principals on the machine (verified via icacls — six ACEs, the
intended user is the LAST of six).
Threat model is non-trivial on:
- Self-hosted CI runners (different service account on the same Windows box
can read developer tokens, canary tokens, prompt history)
- Shared development machines (agencies, studios, lab environments)
- Multi-tenant servers with shared home directories
Orthogonal to v1.24.0.0's binary-resolution work — complementary at the write
side. v1.24's bin/gstack-paths resolves ~/.gstack/ correctly across plugin /
global / local installs; this PR ensures files written into those resolved
paths actually get the POSIX 0o600 semantic translated to NTFS.
The fix:
- New browse/src/file-permissions.ts (158 LOC, 5 public + 1 test-reset).
restrictFilePermissions / restrictDirectoryPermissions wrap chmod (POSIX)
or icacls /inheritance:r /grant:r <user>:(F) (Windows). writeSecureFile /
appendSecureFile / mkdirSecure are drop-in wrappers for the common patterns.
- 19 call sites converted across 9 source files: browser-manager.ts,
browser-skill-write.ts, cli.ts, config.ts, meta-commands.ts,
security-classifier.ts, security.ts (4 sites), server.ts (5 sites),
terminal-agent.ts (8 sites), tunnel-denial-log.ts.
- (OI)(CI) inheritance flags on directories mean files created via fs.write*
*inside* an mkdirSecure-created dir inherit the owner-only ACL automatically
— important for tunnel-denial-log.ts where appends use async fsp.appendFile.
Error handling: icacls failures (nonexistent path, missing icacls.exe, hardened
environments) log a one-shot warning to stderr and proceed. Once-per-process
gating prevents log spam if the condition persists. Filesystem stays
functional; the file just ends up with inherited ACLs.
Test plan:
- bun test browse/test/file-permissions.test.ts — 13 pass, 0 fail (POSIX
mode-bit assertions, Windows no-throw, mkdir idempotence, recursive
creation, Buffer payloads, append-creates-then-reapplies-once semantics)
- bun test browse/test/security.test.ts — 38 pass, 0 fail (existing security
test suite plus the bash-binary resolution tests added in fix#1119; the
converted writeFileSync/appendFileSync/mkdirSync sites in security.ts
integrate cleanly)
- Empirical icacls before/after on a real file — 6 ACEs → 1 ACE
- bun build typecheck on all modified files — clean (server.ts has a
pre-existing playwright-core/electron resolution issue unrelated to this PR)
POSIX behavior is bit-identical to old code — fs.chmodSync(path, 0o6XX) on the
helper's POSIX branch matches the inline { mode: 0o6XX } it replaces. Linux
and macOS see no behavior change.
Inviting pushback on three judgment calls (in PR description):
1. icacls vs npm library
2. ACL scope — just user, or user + SYSTEM?
3. Graceful degradation — once-per-process warn, not silent, not hard-fail.
* fix(browse): declare lastConsoleFlushed to restore console-log persistence
flushBuffers() references a `lastConsoleFlushed` cursor at server.ts:337
and assigns it at :344, but the `let lastConsoleFlushed = 0;`
declaration is missing — only the network and dialog siblings are
declared at lines 327-328.
Result: every 1-second flushBuffers tick (line 376) throws
`ReferenceError: lastConsoleFlushed is not defined`, gets swallowed by
the catch at line 369 ("[browse] Buffer flush failed: ..."), and the
console branch's append never runs. browse-console.log is never
written in any production deployment since this regressed.
Discovered by stress-testing the daemon with 15 concurrent CLIs against
cold state — the race surfaced the buffer-flush error spam in one
spawned daemon's stderr. Verified by running the daemon against a real
file:// page with console.log events: in-memory `browse console`
returns the entries, but `.gstack/browse-console.log` is never created
on disk.
Regression introduced by 1a100a2a "fix: eliminate duplicate command
sets in chain, improve flush perf and type safety" — the flush refactor
switched from `Bun.write` to `fs.appendFileSync` and added the
`lastConsoleFlushed` cursor pattern alongside its network/dialog
siblings, but missed the matching `let` declaration. Tests don't
currently exercise flushBuffers, so the regression shipped silently.
Fix:
- Declare `let lastConsoleFlushed = 0;` next to `lastNetworkFlushed`
and `lastDialogFlushed` (browse/src/server.ts:327)
- Add a source-level guard test
(browse/test/server-flush-trackers.test.ts) that fails any future
refactor that adds a fourth `last*Flushed` cursor without the
matching declaration. Same pattern as terminal-agent.test.ts and
dual-listener.test.ts — read source as text, assert invariant, no
daemon required.
Test plan:
- [x] New regression test fails on current main, passes with the fix
- [x] `bun run build` clean
- [x] Manual smoke: spawn daemon -> goto file:// page with
console.log -> wait 4s -> .gstack/browse-console.log now
exists with the expected entries (163 bytes vs zero before)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
* fix(browse): per-process state-file temp path to fix concurrent-write ENOENT
The daemon writes `.gstack/browse.json` via the standard atomic-rename
pattern: `writeFileSync(tmp, …) → renameSync(tmp, stateFile)`. Four
sites in server.ts use this pattern (initial daemon-startup state at
:2002, /tunnel/start handler at :1479, BROWSE_TUNNEL=1 inline tunnel
update at :2083, BROWSE_TUNNEL_LOCAL_ONLY=1 update at :2113), and all
four hard-code the same temp filename `${stateFile}.tmp`.
Under concurrent writers the shared filename races on the rename:
t0 Writer A: writeFileSync(stateFile + '.tmp', payloadA)
t1 Writer B: writeFileSync(stateFile + '.tmp', payloadB) // overwrites A
t2 Writer A: renameSync(stateFile + '.tmp', stateFile) // moves B's payload
t3 Writer B: renameSync(stateFile + '.tmp', stateFile) // ENOENT — file gone
Reproduced empirically with 15 concurrent CLIs against a fresh `.gstack/`:
[browse] Failed to start: ENOENT: no such file or directory,
rename '…/.gstack/browse.json.tmp' -> '…/.gstack/browse.json'
Pre-fix success rate: **0 / 15** under cold-start race.
Post-fix success rate: **15 / 15**, zero ENOENT.
Fix:
- New `tmpStatePath()` helper (server.ts:333) returns
`${stateFile}.tmp.${pid}.${randomBytes(4).toString('hex')}`
- All 4 call sites use `tmpStatePath()` instead of the shared literal
- Atomic rename still gives last-writer-wins semantics on the final
state.json content; only behavior change is that concurrent writers
no longer kill each other on the rename step
Source-level guard test (browse/test/server-tmp-state-path.test.ts)
locks two invariants: (1) no remaining `stateFile + '.tmp'` literals,
(2) every state-write `writeFileSync` call uses `tmpStatePath()`. Same
read-source-as-text pattern as terminal-agent.test.ts and
dual-listener.test.ts — no daemon required, runs in tier-1 free.
Test plan:
- [x] Targeted source-level guard test passes (3 / 0)
- [x] `bun run build` clean
- [x] Live regression: 15 concurrent CLIs against cold state →
15 / 15 healthy, 0 ENOENT (vs 0 / 15 pre-fix)
- [x] No `.tmp.*` orphans left behind after rename succeeds
- [x] Related test cluster (server-auth, dual-listener, cdp-mutex,
findport) — same pre-existing flakes as `main`, no new
regressions introduced
🤖 Generated with [Claude Code](https://claude.com/claude-code)
* fix(browse): clear refs when iframe auto-detaches in getActiveFrameOrPage
Asymmetric cleanup between two equivalent staleness conditions:
onMainFrameNavigated() → clearRefs() + activeFrame = null ✓
getActiveFrameOrPage() → activeFrame = null (refs NOT cleared) ✗
Both paths see the same staleness condition — refs were captured
against a frame that no longer exists. The main-frame path correctly
clears both pieces of state. The iframe-detach path nulls the frame
but leaves the refMap intact.
The lazy click-time check in `resolveRef` (tab-session.ts:97) partially
saves us — `entry.locator.count()` on a detached-frame locator throws
or returns 0, so the click errors out as "Ref X is stale". But the
user has no signal that frame context silently changed underfoot: the
next `snapshot` runs against `this.page` (main) while old iframe refs
still litter `refMap` with the same role+name keys. New refs collide
with stale ones, the resolver picks one at random, the user clicks
the wrong element.
TODOS.md line 816-820 documents "Detached frame auto-recovery" as a
shipped iframe-support feature in v0.12.1.0. This restores the
documented intent — the recovery should leave the session in a clean
state, not a half-cleared one.
Fix: 1 line — add `this.clearRefs()` next to `this.activeFrame = null`
inside the if-branch.
Test plan:
- [x] New regression test: 4/4 pass
- refs cleared when getActiveFrameOrPage detects detached iframe
- refs preserved when active frame is still attached (no regression)
- refs preserved when no frame set (page-level path untouched)
- matches onMainFrameNavigated symmetry — both paths reach the
same clean end state
- [x] `bun run build` clean
🤖 Generated with [Claude Code](https://claude.com/claude-code)
* fix(codex): resolve python for JSON parser
* fix: add fail-fast probe for base branch in ship step 12
* fix(plan-devex-review): remove contradictory plan-mode handshake
* fix(design): honor Retry-After header in variants 429 handler
Closes#1244.
The 429 handler in `generateVariant` discarded the `Retry-After` response
header and fell straight through to a local exponential schedule (2s/4s/8s).
In image-generation batches, that burns retry attempts inside the provider's
cooldown window and the request never recovers.
Now we parse `Retry-After` per RFC 7231 — both delta-seconds (`Retry-After: 5`)
and HTTP-date (`Retry-After: Fri, 31 Dec 1999 23:59:59 GMT`). Honored waits
are capped at 60s to bound stalls from hostile or buggy headers. Delta-seconds
are validated as digits-only (rejects `2abc`). When `Retry-After` is honored
(including 0 / past-date "retry now"), the next iteration's leading exponential
sleep is skipped so we don't double-wait. Invalid or missing headers fall
through to the existing exponential schedule unchanged.
Behavior matrix:
| Header | Behavior |
|---------------------------------|-------------------------------------------|
| Retry-After: 5 | wait 5s, skip leading on next attempt |
| Retry-After: 999999 | capped to 60s, skip leading |
| Retry-After: 2abc | invalid, fall through to exponential |
| Retry-After: 0 | wait 0, skip leading (retry immediately) |
| Retry-After: <past HTTP-date> | wait 0, skip leading |
| Retry-After: <future date> | wait diff capped at 60s, skip leading |
| no header | fall through to existing exponential |
`generateVariant` now accepts an optional `fetchFn` parameter (defaults to
`globalThis.fetch`) so tests can inject a stub. Production call sites are
unchanged.
Tests cover the five behavior buckets above, asserting both the 1st-to-2nd
call timing gap and call counts. All five pass in ~8s.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(docs): correct per-skill symlink removal snippet in README uninstall
Closes#1130.
The manual-uninstall fallback in `## Uninstall` → `### Option 2` used
`find ~/.claude/skills -maxdepth 1 -type l`, which finds nothing on real
installs. Each `~/.claude/skills/<name>/` is a real directory, and only
`<name>/SKILL.md` inside it is a symlink into `gstack/`. The find never
matched, so the snippet silently removed nothing.
Replace with a directory walk that inspects each `<name>/SKILL.md`:
find ~/.claude/skills -mindepth 1 -maxdepth 1 -type d ! -name gstack
→ check $dir/SKILL.md is a symlink → readlink it
→ if target is gstack/* or */gstack/*: rm -f the link, rmdir the dir
(only if empty — preserves any user-added files)
Excludes the top-level `gstack/` dir from the walk; that's removed by
step 3 of the same uninstall block.
`bin/gstack-uninstall` (the script-mode path) already handles the layout
correctly via its own walk; only this manual fallback needed updating.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: reject partial browse client env integers
* fix(gemini-adapter): detect new ~/.gemini/oauth_creds.json auth path
gemini-cli >=0.30 stores OAuth credentials at ~/.gemini/oauth_creds.json
instead of the legacy ~/.config/gemini/ directory. The benchmark adapter's
availability check now succeeds for users on recent gemini-cli releases
who have authenticated via interactive login.
Both paths are accepted so users on older versions still work.
* fix(browser): add --no-sandbox for root user on Linux/WSL2
Chromium's sandbox can't initialize when running as root on Linux,
causing an immediate exit. Extend the existing CI/CONTAINER check to
also cover this case, keeping the Windows-safe `typeof getuid` guard.
* security: pass cwd to git via execFileSync, not interpolation through /bin/sh
`bin/gstack-memory-ingest.ts:632-643` ran `execSync(\`git -C ${JSON.stringify(cwd)}
remote get-url origin 2>/dev/null\`, ...)`. JSON.stringify escapes `"` and `\`
but not `$` or backticks, so a `cwd` of `"$(touch /tmp/marker)"` survived JSON
quoting and detonated under /bin/sh's command-substitution-inside-double-quotes.
`cwd` originates from transcript JSONL records under
`~/.claude/projects/<encoded-cwd>/<uuid>.jsonl` and
`~/.codex/sessions/YYYY/MM/DD/rollout-*.jsonl`. The walker grabs the first
`.cwd` it sees per session. That's an untrusted surface in the gstack threat
model — the L1-L6 sidebar security stack exists exactly because agent
transcripts can carry attacker-influenced text. Two pivots above the local
same-uid bar: (a) prompt-injection appending `cwd="$(...)"` to the active
session log turns the next /sync-gbrain run into RCE under the user's uid;
(b) cross-machine transcript share (a colleague's `.claude/projects` snippet
untar'd into HOME, a documented gbrain dogfooding shape) → RCE on first sync.
Fix swaps the one execSync for `execFileSync("git", ["-C", cwd, "remote",
"get-url", "origin"], ...)`. No shell, argv passed directly to git. The same
module already uses execFileSync for `gbrainAvailable()` (line 762 pre-patch)
and `gbrainPutPage()` (line 816 pre-patch) — this single execSync was the
outlier.
Test: `gstack-memory-ingest security: untrusted cwd cannot trigger shell
substitution` plants a Claude-Code-shaped JSONL with cwd=`$(touch <marker>)`
and asserts the marker file is not created after `--incremental --quiet`.
Negative control: with the patch reverted, the test fails (marker created);
with the patch applied, it passes (18/18 in test/gstack-memory-ingest.test.ts).
* security: gate domain-skill auto-promote on classifier_score > 0
`browse/src/domain-skill-commands.ts:140` (handleSave) writes
`classifier_score: 0` with the comment "L4 deferred to load-time / sidebar-agent
fills this in on first prompt-injection load." But CLAUDE.md "Sidebar
architecture" documents that sidebar-agent.ts was ripped, and grep for
recordSkillUse + classifierFlagged callers across browse/src/ returns zero hits
outside the module under test.
Net effect: every quarantined skill that survives three benign uses without
flag (`recordSkillUse(... , classifierFlagged: false)` x3) auto-promotes to
`active` and lands in prompt context wrapped as UNTRUSTED on every subsequent
visit to that host. The L4 score that was supposed to gate the promotion was
never written — the production save path puts 0 on disk and nothing later
updates it.
Threat model: a domain-skill body authored by an agent under the influence of
a poisoned page (the new `gstackInjectToTerminal` PTY path runs no L1-L3
either) would lose its auto-promote barrier after three uses. The exploit
isn't single-step but the bar is exactly N=3 prompt-injection-shaped uses on
a hostile page, which is well within reach.
Fix adds a single condition to the auto-promote gate in `recordSkillUse`:
if (state === 'quarantined' && useCount >= PROMOTE_THRESHOLD &&
flagCount === 0 && current.classifier_score > 0) {
state = 'active';
}
`classifier_score` is set once at writeSkill and never updated. Production
saves it as 0 (handleSave), so the gate stays closed; existing tests that
explicitly pass `classifierScore: 0.1` still auto-promote (the auto-promote
path is preserved for the day L4 is rewired).
Manual promotion via `domain-skill promote-to-global` is unaffected (it goes
through `promoteToGlobal` which has its own state-machine guard at line 337+).
Test: new regression case `does NOT auto-promote when classifier_score is 0
(production handleSave shape)` plants a skill with classifierScore=0 (matches
domain-skill-commands.ts:140), runs three uses without flag, asserts the skill
stays quarantined and readSkill returns null. Negative control: revert the
patch, the test fails with `Received: "active"`. With the patch: 15/15 pass.
* fix(ship): port #1302 SKILL.md edits to .tmpl + resolver source
PR #1302 added Verification Mode + UNVERIFIABLE classification + per-item
confirmation gate to ship/SKILL.md, but only the generated SKILL.md was
edited — not the .tmpl source or scripts/resolvers/review.ts. The next
`bun run gen:skill-docs` run would have wiped the changes.
Port the same content into the resolver and .tmpl so regeneration produces
the intended output.
* ci(windows): extend free-tests lane to cover icacls + Bun.which resolvers from fix-wave PRs
Closes #1306/#1307/#1308 validation gap. The four newly-added test files
already have process.platform guards so they run safely on both POSIX and
Windows lanes — only platform-relevant assertions execute on each.
Tests added to the windows-latest lane:
- browse/test/file-permissions.test.ts (#1308 icacls + writeSecureFile)
- browse/test/security.test.ts (#1306 bash.exe wrap pure-function path)
- make-pdf/test/browseClient.test.ts (#1307 Bun.which browse resolver)
- make-pdf/test/pdftotext.test.ts (#1307 Bun.which pdftotext resolver)
* test(codex): live flag-semantics smoke for codex exec resume
Closes#1270's regex-only test gap. PR #1270 asserted that codex/SKILL.md's
`codex exec resume` invocation drops -C/-s and uses sandbox_mode config.
That regex catches the skill template regressing, but not codex CLI itself
flipping flag semantics again.
This test probes `codex exec resume --help` and asserts the surface gstack
relies on: -c/sandbox_mode is accepted, top-level -C is absent. Skips
silently when codex isn't on PATH, so dev machines without codex installed
never see it fail.
* chore: regen SKILL.md after fix wave
One regen commit at the end of the merge wave per the plan. plan-devex-review
loses the contradictory plan-mode handshake (#1333). review/SKILL.md picks up
the Verification Mode + UNVERIFIABLE classification additions that #1302
authored against ship/SKILL.md (same resolver shared between ship and review
modes).
* fix(server.ts): keep fs.writeFileSync for state-file writes
#1308's writeSecureFile wrapper added Windows icacls hardening for the
4 state-file write sites in server.ts, but #1310's regression test grep's
for fs.writeFileSync(tmpStatePath()) calls. The two changes are technically
compatible only if the test relaxes — keeping the test strict (the safer
choice for catching regressions on the cold-start race) means the 4 state-
file sites stay on fs.writeFileSync(..., { mode: 0o600 }).
POSIX 0o600 hardening is preserved on those 4 sites. Windows icacls
hardening still applies to all the other writeSecureFile call sites
#1308 added (auth.json, mkdirSecure, etc.).
Also refreshes golden baselines after #1302 / port + minor wording tweak
in scripts/resolvers/review.ts to keep gen-skill-docs.test.ts assertion
'Cite the specific file' satisfied.
* v1.30.0.0: fix wave — 21 community PRs + 2 closing fixes for Windows + codex CI gaps
Headline release. Browse stops dropping console logs, cold-start race
fixed, codex resume works without python3, Windows hardening (icacls +
Bun.which + bash.exe wrap), ship gate gets VAS-449 remediation, two
closing fixes that put icacls/Bun.which/codex flag semantics under CI.
* test(domain-skills): cover #1369 classifier_score=0 quarantine + score>0 promote path
The pre-existing T6 test seeded skills via writeSkill (which defaults
classifier_score to 0 until L4 is rewired) and then expected 3 uses to
auto-promote. PR #1369 added `current.classifier_score > 0` to the gate
specifically to block that path — a quarantined skill written under the
influence of a poisoned page would otherwise auto-promote after three
benign uses.
Updated test asserts both halves of the new contract:
- classifier_score=0 + 3 uses → stays quarantined (the security guarantee)
- classifier_score>0 + 3 more uses → promotes to active (unblock path)
Catches both regressions: the gate going away (would re-allow the bypass)
and the unblock path breaking (would silently quarantine all skills
forever once L4 is rewired).
---------
Co-authored-by: Jayesh Betala <jayesh.betala7@gmail.com>
Co-authored-by: orbisai0security <mediratta01.pally@gmail.com>
Co-authored-by: Bryce Alan <brycealan.eth@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Terry Carson YM <cym3118288@gmail.com>
Co-authored-by: Vasko Ckorovski <vckorovski@gmail.com>
Co-authored-by: Samuel Carson <samuel.carson@gmail.com>
Co-authored-by: Yashwant Kotipalli <yashwant7kotipalli@gmail.com>
Co-authored-by: Jasper Chen <jasperchen925@gmail.com>
Co-authored-by: Stefan Neamtu <stefan.neamtu@gmail.com>
Co-authored-by: 陈家名 <chenjiaming@kezaihui.com>
Co-authored-by: Abigail Atheryon <abi@atheryon.ai>
Co-authored-by: Furkan Köykıran <furkankoykiran@gmail.com>
Co-authored-by: gus <gustavoraularagon@gmail.com>
* feat(browse): TabSession loadedHtml + command aliases + DX polish primitives
Adds the foundation layer for Puppeteer-parity features:
- TabSession.loadedHtml + setTabContent/getLoadedHtml/clearLoadedHtml —
enables load-html content to survive context recreation (viewport --scale)
via in-memory replay. ASCII lifecycle diagram in the source explains the
clear-before-navigation contract.
- COMMAND_ALIASES + canonicalizeCommand() helper — single source of truth
for name aliases (setcontent / set-content / setContent → load-html),
consumed by server dispatch and chain prevalidation.
- buildUnknownCommandError() pure function — rich error messages with
Levenshtein-based "Did you mean" suggestions (distance ≤ 2, input
length ≥ 4 to skip 2-letter noise) and NEW_IN_VERSION upgrade hints.
- load-html registered in WRITE_COMMANDS + SCOPE_WRITE so scoped write
tokens can use it.
- screenshot and viewport descriptions updated for upcoming flags.
- New browse/test/dx-polish.test.ts (15 tests): alias canonicalization,
Levenshtein threshold + alphabetical tiebreak, short-input guard,
NEW_IN_VERSION upgrade hint, alias + scope integration invariants.
No consumers yet — pure additive foundation. Safe to bisect on its own.
* feat(browse): accept file:// in goto with smart cwd/home-relative parsing
Extends validateNavigationUrl to accept file:// URLs scoped to safe dirs
(cwd + TEMP_DIR) via the existing validateReadPath policy. The workhorse is a
new normalizeFileUrl() helper that handles non-standard relative forms BEFORE
the WHATWG URL parser sees them:
file:///abs/path.html → unchanged
file://./docs/page.html → file://<cwd>/docs/page.html
file://~/Documents/page.html → file://<HOME>/Documents/page.html
file://docs/page.html → file://<cwd>/docs/page.html
file://localhost/abs/path → unchanged
file://host.example.com/... → rejected (UNC/network)
file:// and file:/// → rejected (would list a directory)
Host heuristic rejects segments with '.', ':', '\\', '%', IPv6 brackets, or
Windows drive-letter patterns — so file://docs.v1/page.html, file://127.0.0.1/x,
file://[::1]/x, and file://C:/Users/x are explicit errors.
Uses fileURLToPath() + pathToFileURL() from node:url (never string-concat) so
URL escapes like %20 decode correctly and Node rejects encoded-slash traversal
(%2F..%2F) outright.
Signature change: validateNavigationUrl now returns Promise<string> (the
normalized URL) instead of Promise<void>. Existing callers that ignore the
return value still compile — they just don't benefit from smart-parsing until
updated in follow-up commits. Callers will be migrated in the next few commits
(goto, diff, newTab, restoreState).
Rewrites the url-validation test file: updates existing tests for the new
return type, adds 20+ new tests covering every normalizeFileUrl shape variant,
URL-encoding edge cases, and path-traversal rejection.
References: codex consult v3 P1 findings on URL parser semantics and fileURLToPath.
* feat(browse): BrowserManager deviceScaleFactor + setContent replay + file:// plumbing
Three tightly-coupled changes to BrowserManager, all in service of the
Puppeteer-parity workflow:
1. deviceScaleFactor + currentViewport tracking. New private fields (default
scale=1, viewport=1280x720) + setDeviceScaleFactor(scale, w, h) method.
deviceScaleFactor is a context-level Playwright option — changing it
requires recreateContext(). The method validates (finite number, 1-3 cap,
headed-mode rejected), stores new values, calls recreateContext(), and
rolls back the fields on failure so a bad call doesn't leave inconsistent
state. Context options at all three sites (launch, recreate happy path,
recreate fallback) now honor the stored values instead of hardcoding
1280x720.
2. BrowserState.loadedHtml + loadedHtmlWaitUntil. saveState captures per-tab
loadedHtml from the session; restoreState replays it via newSession.
setTabContent() — NOT bare page.setContent() — so TabSession.loadedHtml
is rehydrated and survives *subsequent* scale changes. In-memory only,
never persisted to disk (HTML may contain secrets or customer data).
3. newTab + restoreState now consume validateNavigationUrl's normalized
return value. file://./x, file://~/x, and bare-segment forms now take
effect at every navigation site, not just the top-level goto command.
Together these enable: load-html → viewport --scale 2 → viewport --scale 1.5
→ screenshot, with content surviving both context recreations. Codex v2 P0
flagged that bare page.setContent in restoreState would lose content on the
second scale change — this commit implements the rehydration path.
References: codex v2 P0 (TabSession rehydration), codex v3 P1 (4-caller
return value), plan Feature 3 + Feature 4.
* feat(browse): load-html, screenshot --selector, viewport --scale, alias dispatch
Wires the new handlers and dispatch logic that the previous commits made
possible:
write-commands.ts
- New 'load-html' case: validateReadPath for safe-dir scoping, stat-based
actionable errors (not found, directory, oversize), extension allowlist
(.html/.htm/.xhtml/.svg), magic-byte sniff with UTF-8 BOM strip accepting
any <[a-zA-Z!?] markup opener (not just <!doctype — bare fragments like
<div>...</div> work for setContent), 50MB cap via GSTACK_BROWSE_MAX_HTML_BYTES
override, frame-context rejection. Calls session.setTabContent() so replay
metadata is rehydrated.
- viewport command extended: optional [<WxH>], optional [--scale <n>],
scale-only variant reads current size via page.viewportSize(). Invalid
scale (NaN, Infinity, empty, out of 1-3) throws with named value. Headed
mode rejected explicitly.
- clearLoadedHtml() called BEFORE goto/back/forward/reload navigation
(not after) so a timed-out goto post-commit doesn't leave stale metadata
that could resurrect on a later context recreation. Codex v2 P1 catch.
- goto uses validateNavigationUrl's normalized return value.
meta-commands.ts
- screenshot --selector <css> flag: explicit element-screenshot form.
Rejects alongside positional selector (both = error), preserves --clip
conflict at line 161, composes with --base64 at lines 168-174.
- chain canonicalizes each step with canonicalizeCommand — step shape is
now { rawName, name, args } so prevalidation, dispatch, WRITE_COMMANDS.has,
watch blocking, and result labels all use canonical names while audit
labels show 'rawName→name' when aliased. Codex v3 P2 catch — prior shape
only canonicalized at prevalidation and diverged everywhere else.
- diff command consumes validateNavigationUrl return value for both URLs.
server.ts
- Command canonicalization inserted immediately after parse, before scope /
watch / tab-ownership / content-wrapping checks. rawCommand preserved for
future audit (not wired into audit log in this commit — follow-up).
- Unknown-command handler replaced with buildUnknownCommandError() from
commands.ts — produces 'Unknown command: X. Did you mean Y?' with optional
upgrade hint for NEW_IN_VERSION entries.
security-audit-r2.test.ts
- Updated chain-loop marker from 'for (const cmd of commands)' to
'for (const c of commands)' to match the new chain step shape. Same
isWatching + BLOCKED invariants still asserted.
* chore: bump version and changelog (v1.1.0.0)
- VERSION: 1.0.0.0 → 1.1.0.0 (MINOR bump — new user-facing commands)
- package.json: matching version bump
- CHANGELOG.md: new 1.1.0.0 entry describing load-html, screenshot --selector,
viewport --scale, file:// support, setContent replay, and DX polish in user
voice with a dedicated Security section for file:// safe-dirs policy
- browse/SKILL.md.tmpl: adds pattern #12 "Render local HTML", pattern #13
"Retina screenshots", and a full Puppeteer → browse cheatsheet with side-by-
side API mapping and a worked tweet-renderer migration example
- browse/SKILL.md + SKILL.md: regenerated from templates via `bun run gen:skill-docs`
to reflect the new command descriptions
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: pre-landing review fixes (9 findings from specialist + adversarial review)
Adversarial review (Claude subagent + Codex) surfaced 9 bugs across
CRITICAL/HIGH severity. All fixed:
1. tab-session.ts:setTabContent — state mutation moved AFTER the setContent
await. Prior order left phantom HTML in replay metadata if setContent
threw (timeout, browser crash), which a later viewport --scale would
silently replay. Now loadedHtml is only recorded on successful load.
2. browser-manager.ts:setDeviceScaleFactor — rollback now forces a second
recreateContext after restoring the old fields. The fallback path in
the original recreateContext builds a blank context using whatever
this.deviceScaleFactor/currentViewport hold at that moment (which were
the NEW values we were trying to apply). Rolling back the fields without
a second recreate left the live context at new-scale while state tracked
old-scale. Now: restore fields, force re-recreate with old values, only
if that ALSO fails do we return a combined error.
3. commands.ts:buildUnknownCommandError — Levenshtein tiebreak simplified
to 'd <= 2 && d < bestDist' (strict less). Candidates are pre-sorted
alphabetically, so first equal-distance wins by default. The prior
'(d === bestDist && best !== undefined && cand < best)' clause was dead
code.
4. tab-session.ts:onMainFrameNavigated — now clears loadedHtml, not just
refs + frame. Without this, a user who load-html'd then clicked a link
(or had a form submit / JS redirect / OAuth flow) would retain the stale
replay metadata. The next viewport --scale would silently revert the
tab to the ORIGINAL loaded HTML, losing whatever the post-navigation
content was. Silent data corruption. Browser-emitted navigations trigger
this path via wirePageEvents.
5. browser-manager.ts:saveState + restoreState — tab ownership now flows
through BrowserState.owner. Without this, a scoped agent's viewport
--scale would strand them: tab IDs change during recreate, ownership
map held stale IDs, owner lookup failed. New IDs had no owner, so
writes without tabId were denied (DoS). Worse, if the agent sent a
stale tabId the server's swallowed-tab-switch-error path would let the
command hit whatever tab was currently active (cross-tab authz bypass).
Now: clear ownership before restore, re-add per-tab with new IDs.
6. meta-commands.ts:state load — disk-loaded state.pages is now explicit
allowlist (url, isActive, storage:null) instead of object spread.
Spreading accepted loadedHtml, loadedHtmlWaitUntil, and owner from a
user-writable state file, letting a tampered state.json smuggle HTML
past load-html's safe-dirs / extension / magic-byte / 50MB-cap
validators, or forge tab ownership. Now stripped at the boundary.
7. url-validation.ts:normalizeFileUrl — preserves query string + fragment
across normalization. file://./app.html?route=home#login previously
resolved to a filesystem path that URL-encoded '?' as %3F and '#' as
%23, or (for absolute forms) pathToFileURL dropped them entirely. SPAs
and fixture URLs with query params 404'd or loaded the wrong route.
Now: split on ?/# before path resolution, reattach after.
8. url-validation.ts:validateNavigationUrl — reattaches parsed.search +
parsed.hash to the normalized file:// URL. Same fix at the main
validator for absolute paths that go through fileURLToPath round-trip.
9. server.ts:writeAuditEntry — audit entries now include aliasOf when the
user typed an alias ('setcontent' → cmd: 'load-html', aliasOf:
'setcontent'). Previously the isAliased variable was computed but
dropped, losing the raw input from the forensic trail. Completes the
plan's codex v3 P2 requirement.
Also added bm.getCurrentViewport() and switched 'viewport --scale'-
without-size to read from it (more reliable than page.viewportSize() on
headed/transition contexts).
Tests pass: exit 0, no failures. Build clean.
* test: integration coverage for load-html, screenshot --selector, viewport --scale, replay, aliases
Adds 28 Playwright-integration tests that close the coverage gap flagged
by the ship-workflow coverage audit (50% → expected ~80%+).
**load-html (12 tests):**
- happy path loads HTML file, page text matches
- bare HTML fragments (<div>...</div>) accepted, not just full documents
- missing file arg throws usage
- non-.html extension rejected by allowlist
- /etc/passwd.html rejected by safe-dirs policy
- ENOENT path rejected with actionable "not found" error
- directory target rejected
- binary file (PNG magic bytes) disguised as .html rejected by magic-byte check
- UTF-8 BOM stripped before magic-byte check — BOM-prefixed HTML accepted
- --wait-until networkidle exercises non-default branch
- invalid --wait-until value rejected
- unknown flag rejected
**screenshot --selector (5 tests):**
- --selector flag captures element, validates Screenshot saved (element)
- conflicts with positional selector (both = error)
- conflicts with --clip (mutually exclusive)
- composes with --base64 (returns data:image/png;base64,...)
- missing value throws usage
**viewport --scale (5 tests):**
- WxH --scale 2 produces PNG with 2x element dimensions (parses IHDR bytes 16-23)
- --scale without WxH keeps current size + applies scale
- non-finite value (abc) throws "not a finite number"
- out-of-range (4, 0.5) throws "between 1 and 3"
- missing value throws
**setContent replay across context recreation (3 tests):**
- load-html → viewport --scale 2: content survives (hits setTabContent replay path)
- double cycle 2x → 1.5x: content still survives (proves TabSession rehydration)
- goto after load-html clears replay: subsequent viewport --scale does NOT
resurrect the stale HTML (validates the onMainFrameNavigated fix)
**Command aliases (2 tests):**
- setcontent routes to load-html via chain canonicalization
- set-content (hyphenated) also routes — both end-to-end through chain dispatch
Fixture paths use /tmp (SAFE_DIRECTORIES entry) instead of $TMPDIR which is
/var/folders/... on macOS and outside the safe-dirs boundary. Chain result
labels use rawName→name format when an alias is resolved (matches the
meta-commands.ts chain refactor).
Full suite: exit 0, 223/223 pass.
* docs: update BROWSER.md + CHANGELOG for v1.1.0.0
BROWSER.md:
- Command reference table updated: goto now lists file:// support,
load-html added to Navigate row, viewport flagged with --scale
option, screenshot row shows --selector + --base64 flags
- Screenshot modes table adds the fifth mode (element crop via
--selector flag) and notes the tag-selector-not-caught-positionally
gotcha
- New "Retina screenshots — viewport --scale" subsection explains
deviceScaleFactor mechanics, context recreation side effects, and
headed-mode rejection
- New "Loading local HTML — goto file:// vs load-html" subsection
explains the two paths, their tradeoffs (URL state, relative asset
resolution), the safe-dirs policy, extension allowlist + magic-byte
sniff, 50MB cap, setContent replay across recreateContext, and the
alias routing (setcontent → load-html before scope check)
CHANGELOG.md (v1.1.0.0 security section expanded, no existing content
removed):
- State files cannot smuggle HTML or forge tab ownership (allowlist
on disk-loaded page fields)
- Audit log records aliasOf when a canonical command was reached via
an alias (setcontent → load-html)
- load-html content clears on real navigations (clicks, form submits,
JS redirects) — not just explicit goto. Also notes SPA query/fragment
preservation for goto file://
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* plan: batch command endpoint + multi-tab parallel execution for GStack Browser
* refactor: extract TabSession from BrowserManager for per-tab state
Move per-tab state (refMap, lastSnapshot, frame) into a new TabSession
class. BrowserManager delegates to the active TabSession via
getActiveSession(). Zero behavior change — all existing tests pass.
This is the foundation for the /batch endpoint: both /command and /batch
will use the same handler functions with TabSession, eliminating shared
state races during parallel tab execution.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: update handler signatures to use TabSession
Change handleReadCommand and handleSnapshot to take TabSession instead of
BrowserManager. Change handleWriteCommand to take both TabSession (per-tab
ops) and BrowserManager (global ops like viewport, headers, dialog).
handleMetaCommand keeps BrowserManager for tab management.
Tests use thin wrapper functions that bridge the old 3-arg call pattern to
the new signatures via bm.getActiveSession().
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add POST /batch endpoint for parallel multi-tab execution
Execute multiple commands across tabs in a single HTTP request.
Commands targeting different tabs run concurrently via Promise.allSettled.
Commands targeting the same tab run sequentially within that group.
Features:
- Batch-safe command subset (text, goto, click, snapshot, screenshot, etc.)
- newtab/closetab as special commands within batch
- SSE streaming mode (stream: true) for partial results
- Per-command error isolation (one tab failing doesn't abort the batch)
- Max 50 commands per batch, soft batch-level timeout
A 143-page crawl drops from ~45 min (serial HTTP) to ~5 min (20 tabs
in parallel, batched commands).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add batch endpoint integration tests
10 tests covering:
- Multi-tab parallel execution (goto + text on different tabs)
- Same-tab sequential ordering
- Per-command error isolation (one tab fails, others succeed)
- Page-scoped refs (snapshot refs are per-session, not global)
- Per-tab lastSnapshot (snapshot -D with independent baselines)
- getSession/getActiveSession API
- Batch-safe command subset validation
- closeTab via page.close preserves at-least-one-page invariant
- Parallel goto on 3 tabs simultaneously
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: harden codex-review E2E — extract SKILL.md section, bump maxTurns to 25
The test was copying the full 55KB/1075-line codex SKILL.md into the fixture,
requiring 8 Read calls just to consume it and exhausting the 15-turn budget
before reaching the actual codex review command. Now extracts only the
review-relevant section (~6KB/148 lines), reducing Read calls from 8 to 1.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: move batch endpoint plan into BROWSER.md as feature documentation
The batch endpoint is implemented — document it as an actual feature in
BROWSER.md (architecture, API shape, design decisions, usage pattern)
and remove the standalone plan file.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.15.16.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: gstack <ship@gstack.dev>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>