Files
gstack/test/gstack-memory-ingest.test.ts
Garry Tan 00f966b3ec v1.30.0.0 fix wave: 21 community PRs + Windows CI extension + codex flag-semantics smoke (#1391)
* fix(codex): use resume-compatible flags

* fix: V-001 security vulnerability

Automated security fix generated by Orbis Security AI

* docs: align prompt-injection thresholds to security.ts (v1.6.4.0 catch-up)

CLAUDE.md:290 and ARCHITECTURE.md:159 were missed when WARN was bumped
0.60 → 0.75 in d75402bb (v1.6.4.0, "cut Haiku classifier FP from 44% to
23%, gate now enforced", #1135). browse/src/security.ts:37 has WARN: 0.75
and BROWSER.md:743 was updated alongside that commit; CLAUDE.md and
ARCHITECTURE.md still read 0.60.

Also adds the SOLO_CONTENT_BLOCK: 0.92 entry to CLAUDE.md (already in
security.ts:50 and BROWSER.md:745, missing from CLAUDE.md's threshold
table).

No code change. No behavior change. Pure doc-vs-code alignment.

Verification:
  $ grep -n "WARN" browse/src/security.ts CLAUDE.md ARCHITECTURE.md BROWSER.md
  browse/src/security.ts:37:  WARN: 0.75,
  CLAUDE.md:290: - \`WARN: 0.75\` ...
  ARCHITECTURE.md:159: ...>= \`WARN\` (0.75)...
  BROWSER.md:743: - \`WARN: 0.75\` ...

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: Korean/CJK IME input and rendering in Sidebar Terminal

Fixes #1272

This commit addresses three separate Korean/CJK bugs in the Sidebar Terminal:

**Bug 1 - IME Input**: Korean text typed via IME composition was not
reaching the PTY correctly. Added compositionstart/compositionend event
listeners to suppress partial jamo fragments and only send the final
composed string.

**Bug 2a - Font Rendering**: Added CJK monospace font fallbacks
("Noto Sans Mono CJK KR", "Malgun Gothic") to both the xterm.js
fontFamily config and the CSS --font-mono variable. This ensures
consistent cell-width calculations for Korean characters.

**Bug 2b - UTF-8 Boundary Detection**: Added buffering logic to prevent
multi-byte UTF-8 characters (Korean is 3 bytes) from being split across
WebSocket chunks. This follows the same pattern as PR #1007 which fixed
the sidebar-agent path, but extends it to the terminal-agent path.

Special thanks to @ldybob for the excellent root cause analysis and
proposed solutions in issue #1272.

Tested on WSL2 + Windows 11 with Korean IME.

* fix(ship): tighten Plan Completion gate (VAS-449 remediation)

VAS-446 shipped with a PLAN.md acceptance criterion (domain-hq has
/docs/dashboard.md) silently skipped. /ship's Plan Completion subagent
existed at ship time (added in v1.4.1.0) but the gate let the failure
through. Four structural fixes:

1. Path concreteness rule: items naming a concrete filesystem path MUST
   be classified DONE/NOT DONE via [ -f <path> ], never UNVERIFIABLE.
2. Validator detection: CONTENT-SHAPE items scan target repo's
   package.json for validate-* scripts and run them before falling back
   to UNVERIFIABLE.
3. Per-item UNVERIFIABLE confirmation: replaces blanket "I've checked
   each one" with per-item Y/N/D loop. The blanket-confirm path is the
   exact failure VAS-449 surfaced.
4. Subagent fail-closed: if Plan Completion subagent + inline fallback
   both fail, surface explicit AskUserQuestion instead of silent pass.
   Replaces the prior "Never block /ship on subagent failure" fail-open.

Locked in by test/ship-plan-completion-invariants.test.ts (5 assertions,
no LLM dependency, ~60ms).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(browse): bash.exe wrap for telemetry on Windows

reportAttemptTelemetry() in browse/src/security.ts calls spawn(bin, args)
where bin is the gstack-telemetry-log bash script. On Windows this fails
silently with ENOENT — CreateProcess can't dispatch on shebang lines.

Adopts v1.24.0.0's Bun.which + GSTACK_*_BIN override pattern (from
browse/src/claude-bin.ts:resolveClaudeCommand, introduced in #1252) for
resolving bash.exe. resolveBashBinary() honors GSTACK_BASH_BIN absolute-path
or PATH-resolvable override, falling back to Bun.which('bash') which finds
Git Bash on the standard Windows install.

buildTelemetrySpawnCommand() wraps the script invocation on win32 only;
POSIX path is bit-identical. Returns null when bash can't be resolved on
Windows so caller skips spawn — local attempts.jsonl audit trail keeps
working without surfacing a Windows-only failure.

8 new unit tests cover resolveBashBinary (POSIX bash, absolute override,
quote-stripping, BASH_BIN fallback, empty-PATH null) and buildTelemetrySpawnCommand
(POSIX pass-through, win32 bash wrap, win32 null on unresolvable, arg-array
immutability).

POSIX path is bit-identical — Bun.which('bash') on Linux/macOS returns the
same /bin/bash or /usr/bin/bash that the old hardcoded spawn relied on.

* fix(make-pdf): Bun.which-based binary resolution for browse + pdftotext on Windows

Extends v1.24.0.0's Bun.which + GSTACK_*_BIN override pattern (introduced in
browse/src/claude-bin.ts via #1252) to the two other binary resolvers in the
codebase: make-pdf/src/browseClient.ts:resolveBrowseBin and
make-pdf/src/pdftotext.ts:resolvePdftotext.

Same Windows quirks (fs.accessSync(X_OK) degrades to existence-check; `which`
isn't available outside Git Bash; bun --compile --outfile X emits X.exe), same
Bun.which-based fix shape, same env override convention.

Changes:
  - GSTACK_BROWSE_BIN / GSTACK_PDFTOTEXT_BIN as the v1.24-aligned overrides;
    BROWSE_BIN / PDFTOTEXT_BIN remain as back-compat aliases.
  - Bun.which() replaces execFileSync('which', ...) for PATH lookup. Handles
    Windows PATHEXT natively; no more `where`-vs-`which` branch.
  - findExecutable(base) helper exported from each module, probes .exe/.cmd/.bat
    after the bare-path miss on win32. Linux/macOS behavior is bit-identical
    (isExecutable short-circuits before the win32 branch ever runs).
  - macCandidates renamed posixCandidates (always was — /opt/homebrew, /usr/local,
    /usr/bin). No Windows candidates added; Poppler installs scatter across
    Scoop/Chocolatey/portable zips and guessing causes false positives.
  - Error messages get a Windows install hint (scoop install poppler / oschwartz10612)
    and `setx` example for GSTACK_*_BIN.
  - Pre-existing test 'honors BROWSE_BIN when it points at a real executable'
    was hardcoded /bin/sh — made cross-platform via a REAL_EXE constant
    (cmd.exe on win32, /bin/sh on POSIX). Was a Windows-CI blocker on its own.

Coordination: PR #1094 (@BkashJEE) covered browseClient.ts independently with a
narrower scope; this PR's pdftotext + cross-platform tests + GSTACK_*_BIN naming
are additive. Either order of merge works.

Test plan:
  - bun test make-pdf/test/browseClient.test.ts make-pdf/test/pdftotext.test.ts
    on win32 — 29 pass, 0 fail (12 new assertions: findExecutable POSIX/win32/null,
    resolveBrowseBin GSTACK_BROWSE_BIN + BROWSE_BIN + precedence + quote-strip,
    same shape for resolvePdftotext + Windows install hint in error message).
  - POSIX branch unchanged — fs.accessSync(X_OK) on Linux/macOS short-circuits
    before any win32 logic runs, matching the v1.24 claude-bin.ts pattern.

* fix(browse): NTFS ACL hardening for Windows state files via icacls

gstack's ~/.gstack/ state directory holds bearer tokens, canary tokens, agent
queue contents (with prompt history), session state, security-decision logs,
and saved cookie bundles — all written with { mode: 0o600 } / 0o700. On Windows,
those mode bits are a silent no-op: Node's fs module doesn't translate POSIX
modes to NTFS ACLs, and inherited ACLs leave every "restricted" file readable
by other principals on the machine (verified via icacls — six ACEs, the
intended user is the LAST of six).

Threat model is non-trivial on:
  - Self-hosted CI runners (different service account on the same Windows box
    can read developer tokens, canary tokens, prompt history)
  - Shared development machines (agencies, studios, lab environments)
  - Multi-tenant servers with shared home directories

Orthogonal to v1.24.0.0's binary-resolution work — complementary at the write
side. v1.24's bin/gstack-paths resolves ~/.gstack/ correctly across plugin /
global / local installs; this PR ensures files written into those resolved
paths actually get the POSIX 0o600 semantic translated to NTFS.

The fix:
  - New browse/src/file-permissions.ts (158 LOC, 5 public + 1 test-reset).
    restrictFilePermissions / restrictDirectoryPermissions wrap chmod (POSIX)
    or icacls /inheritance:r /grant:r <user>:(F) (Windows). writeSecureFile /
    appendSecureFile / mkdirSecure are drop-in wrappers for the common patterns.
  - 19 call sites converted across 9 source files: browser-manager.ts,
    browser-skill-write.ts, cli.ts, config.ts, meta-commands.ts,
    security-classifier.ts, security.ts (4 sites), server.ts (5 sites),
    terminal-agent.ts (8 sites), tunnel-denial-log.ts.
  - (OI)(CI) inheritance flags on directories mean files created via fs.write*
    *inside* an mkdirSecure-created dir inherit the owner-only ACL automatically
    — important for tunnel-denial-log.ts where appends use async fsp.appendFile.

Error handling: icacls failures (nonexistent path, missing icacls.exe, hardened
environments) log a one-shot warning to stderr and proceed. Once-per-process
gating prevents log spam if the condition persists. Filesystem stays
functional; the file just ends up with inherited ACLs.

Test plan:
  - bun test browse/test/file-permissions.test.ts — 13 pass, 0 fail (POSIX
    mode-bit assertions, Windows no-throw, mkdir idempotence, recursive
    creation, Buffer payloads, append-creates-then-reapplies-once semantics)
  - bun test browse/test/security.test.ts — 38 pass, 0 fail (existing security
    test suite plus the bash-binary resolution tests added in fix #1119; the
    converted writeFileSync/appendFileSync/mkdirSync sites in security.ts
    integrate cleanly)
  - Empirical icacls before/after on a real file — 6 ACEs → 1 ACE
  - bun build typecheck on all modified files — clean (server.ts has a
    pre-existing playwright-core/electron resolution issue unrelated to this PR)

POSIX behavior is bit-identical to old code — fs.chmodSync(path, 0o6XX) on the
helper's POSIX branch matches the inline { mode: 0o6XX } it replaces. Linux
and macOS see no behavior change.

Inviting pushback on three judgment calls (in PR description):
  1. icacls vs npm library
  2. ACL scope — just user, or user + SYSTEM?
  3. Graceful degradation — once-per-process warn, not silent, not hard-fail.

* fix(browse): declare lastConsoleFlushed to restore console-log persistence

flushBuffers() references a `lastConsoleFlushed` cursor at server.ts:337
and assigns it at :344, but the `let lastConsoleFlushed = 0;`
declaration is missing — only the network and dialog siblings are
declared at lines 327-328.

Result: every 1-second flushBuffers tick (line 376) throws
`ReferenceError: lastConsoleFlushed is not defined`, gets swallowed by
the catch at line 369 ("[browse] Buffer flush failed: ..."), and the
console branch's append never runs. browse-console.log is never
written in any production deployment since this regressed.

Discovered by stress-testing the daemon with 15 concurrent CLIs against
cold state — the race surfaced the buffer-flush error spam in one
spawned daemon's stderr. Verified by running the daemon against a real
file:// page with console.log events: in-memory `browse console`
returns the entries, but `.gstack/browse-console.log` is never created
on disk.

Regression introduced by 1a100a2a "fix: eliminate duplicate command
sets in chain, improve flush perf and type safety" — the flush refactor
switched from `Bun.write` to `fs.appendFileSync` and added the
`lastConsoleFlushed` cursor pattern alongside its network/dialog
siblings, but missed the matching `let` declaration. Tests don't
currently exercise flushBuffers, so the regression shipped silently.

Fix:
  - Declare `let lastConsoleFlushed = 0;` next to `lastNetworkFlushed`
    and `lastDialogFlushed` (browse/src/server.ts:327)
  - Add a source-level guard test
    (browse/test/server-flush-trackers.test.ts) that fails any future
    refactor that adds a fourth `last*Flushed` cursor without the
    matching declaration. Same pattern as terminal-agent.test.ts and
    dual-listener.test.ts — read source as text, assert invariant, no
    daemon required.

Test plan:
  - [x] New regression test fails on current main, passes with the fix
  - [x] `bun run build` clean
  - [x] Manual smoke: spawn daemon -> goto file:// page with
        console.log -> wait 4s -> .gstack/browse-console.log now
        exists with the expected entries (163 bytes vs zero before)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

* fix(browse): per-process state-file temp path to fix concurrent-write ENOENT

The daemon writes `.gstack/browse.json` via the standard atomic-rename
pattern: `writeFileSync(tmp, …) → renameSync(tmp, stateFile)`. Four
sites in server.ts use this pattern (initial daemon-startup state at
:2002, /tunnel/start handler at :1479, BROWSE_TUNNEL=1 inline tunnel
update at :2083, BROWSE_TUNNEL_LOCAL_ONLY=1 update at :2113), and all
four hard-code the same temp filename `${stateFile}.tmp`.

Under concurrent writers the shared filename races on the rename:

    t0  Writer A: writeFileSync(stateFile + '.tmp', payloadA)
    t1  Writer B: writeFileSync(stateFile + '.tmp', payloadB)   // overwrites A
    t2  Writer A: renameSync(stateFile + '.tmp', stateFile)    // moves B's payload
    t3  Writer B: renameSync(stateFile + '.tmp', stateFile)    // ENOENT — file gone

Reproduced empirically with 15 concurrent CLIs against a fresh `.gstack/`:

    [browse] Failed to start: ENOENT: no such file or directory,
    rename '…/.gstack/browse.json.tmp' -> '…/.gstack/browse.json'

Pre-fix success rate: **0 / 15** under cold-start race.
Post-fix success rate: **15 / 15**, zero ENOENT.

Fix:
  - New `tmpStatePath()` helper (server.ts:333) returns
    `${stateFile}.tmp.${pid}.${randomBytes(4).toString('hex')}`
  - All 4 call sites use `tmpStatePath()` instead of the shared literal
  - Atomic rename still gives last-writer-wins semantics on the final
    state.json content; only behavior change is that concurrent writers
    no longer kill each other on the rename step

Source-level guard test (browse/test/server-tmp-state-path.test.ts)
locks two invariants: (1) no remaining `stateFile + '.tmp'` literals,
(2) every state-write `writeFileSync` call uses `tmpStatePath()`. Same
read-source-as-text pattern as terminal-agent.test.ts and
dual-listener.test.ts — no daemon required, runs in tier-1 free.

Test plan:
  - [x] Targeted source-level guard test passes (3 / 0)
  - [x] `bun run build` clean
  - [x] Live regression: 15 concurrent CLIs against cold state →
        15 / 15 healthy, 0 ENOENT (vs 0 / 15 pre-fix)
  - [x] No `.tmp.*` orphans left behind after rename succeeds
  - [x] Related test cluster (server-auth, dual-listener, cdp-mutex,
        findport) — same pre-existing flakes as `main`, no new
        regressions introduced

🤖 Generated with [Claude Code](https://claude.com/claude-code)

* fix(browse): clear refs when iframe auto-detaches in getActiveFrameOrPage

Asymmetric cleanup between two equivalent staleness conditions:

  onMainFrameNavigated()  →  clearRefs() + activeFrame = null  ✓
  getActiveFrameOrPage()  →  activeFrame = null  (refs NOT cleared)  ✗

Both paths see the same staleness condition — refs were captured
against a frame that no longer exists. The main-frame path correctly
clears both pieces of state. The iframe-detach path nulls the frame
but leaves the refMap intact.

The lazy click-time check in `resolveRef` (tab-session.ts:97) partially
saves us — `entry.locator.count()` on a detached-frame locator throws
or returns 0, so the click errors out as "Ref X is stale". But the
user has no signal that frame context silently changed underfoot: the
next `snapshot` runs against `this.page` (main) while old iframe refs
still litter `refMap` with the same role+name keys. New refs collide
with stale ones, the resolver picks one at random, the user clicks
the wrong element.

TODOS.md line 816-820 documents "Detached frame auto-recovery" as a
shipped iframe-support feature in v0.12.1.0. This restores the
documented intent — the recovery should leave the session in a clean
state, not a half-cleared one.

Fix: 1 line — add `this.clearRefs()` next to `this.activeFrame = null`
inside the if-branch.

Test plan:
  - [x] New regression test: 4/4 pass
        - refs cleared when getActiveFrameOrPage detects detached iframe
        - refs preserved when active frame is still attached (no regression)
        - refs preserved when no frame set (page-level path untouched)
        - matches onMainFrameNavigated symmetry — both paths reach the
          same clean end state
  - [x] `bun run build` clean

🤖 Generated with [Claude Code](https://claude.com/claude-code)

* fix(codex): resolve python for JSON parser

* fix: add fail-fast probe for base branch in ship step 12

* fix(plan-devex-review): remove contradictory plan-mode handshake

* fix(design): honor Retry-After header in variants 429 handler

Closes #1244.

The 429 handler in `generateVariant` discarded the `Retry-After` response
header and fell straight through to a local exponential schedule (2s/4s/8s).
In image-generation batches, that burns retry attempts inside the provider's
cooldown window and the request never recovers.

Now we parse `Retry-After` per RFC 7231 — both delta-seconds (`Retry-After: 5`)
and HTTP-date (`Retry-After: Fri, 31 Dec 1999 23:59:59 GMT`). Honored waits
are capped at 60s to bound stalls from hostile or buggy headers. Delta-seconds
are validated as digits-only (rejects `2abc`). When `Retry-After` is honored
(including 0 / past-date "retry now"), the next iteration's leading exponential
sleep is skipped so we don't double-wait. Invalid or missing headers fall
through to the existing exponential schedule unchanged.

Behavior matrix:

| Header                          | Behavior                                  |
|---------------------------------|-------------------------------------------|
| Retry-After: 5                  | wait 5s, skip leading on next attempt     |
| Retry-After: 999999             | capped to 60s, skip leading               |
| Retry-After: 2abc               | invalid, fall through to exponential      |
| Retry-After: 0                  | wait 0, skip leading (retry immediately)  |
| Retry-After: <past HTTP-date>   | wait 0, skip leading                      |
| Retry-After: <future date>      | wait diff capped at 60s, skip leading     |
| no header                       | fall through to existing exponential      |

`generateVariant` now accepts an optional `fetchFn` parameter (defaults to
`globalThis.fetch`) so tests can inject a stub. Production call sites are
unchanged.

Tests cover the five behavior buckets above, asserting both the 1st-to-2nd
call timing gap and call counts. All five pass in ~8s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(docs): correct per-skill symlink removal snippet in README uninstall

Closes #1130.

The manual-uninstall fallback in `## Uninstall` → `### Option 2` used
`find ~/.claude/skills -maxdepth 1 -type l`, which finds nothing on real
installs. Each `~/.claude/skills/<name>/` is a real directory, and only
`<name>/SKILL.md` inside it is a symlink into `gstack/`. The find never
matched, so the snippet silently removed nothing.

Replace with a directory walk that inspects each `<name>/SKILL.md`:

  find ~/.claude/skills -mindepth 1 -maxdepth 1 -type d ! -name gstack
  → check $dir/SKILL.md is a symlink → readlink it
  → if target is gstack/* or */gstack/*: rm -f the link, rmdir the dir
    (only if empty — preserves any user-added files)

Excludes the top-level `gstack/` dir from the walk; that's removed by
step 3 of the same uninstall block.

`bin/gstack-uninstall` (the script-mode path) already handles the layout
correctly via its own walk; only this manual fallback needed updating.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: reject partial browse client env integers

* fix(gemini-adapter): detect new ~/.gemini/oauth_creds.json auth path

gemini-cli >=0.30 stores OAuth credentials at ~/.gemini/oauth_creds.json
instead of the legacy ~/.config/gemini/ directory. The benchmark adapter's
availability check now succeeds for users on recent gemini-cli releases
who have authenticated via interactive login.

Both paths are accepted so users on older versions still work.

* fix(browser): add --no-sandbox for root user on Linux/WSL2

Chromium's sandbox can't initialize when running as root on Linux,
causing an immediate exit. Extend the existing CI/CONTAINER check to
also cover this case, keeping the Windows-safe `typeof getuid` guard.

* security: pass cwd to git via execFileSync, not interpolation through /bin/sh

`bin/gstack-memory-ingest.ts:632-643` ran `execSync(\`git -C ${JSON.stringify(cwd)}
remote get-url origin 2>/dev/null\`, ...)`. JSON.stringify escapes `"` and `\`
but not `$` or backticks, so a `cwd` of `"$(touch /tmp/marker)"` survived JSON
quoting and detonated under /bin/sh's command-substitution-inside-double-quotes.

`cwd` originates from transcript JSONL records under
`~/.claude/projects/<encoded-cwd>/<uuid>.jsonl` and
`~/.codex/sessions/YYYY/MM/DD/rollout-*.jsonl`. The walker grabs the first
`.cwd` it sees per session. That's an untrusted surface in the gstack threat
model — the L1-L6 sidebar security stack exists exactly because agent
transcripts can carry attacker-influenced text. Two pivots above the local
same-uid bar: (a) prompt-injection appending `cwd="$(...)"` to the active
session log turns the next /sync-gbrain run into RCE under the user's uid;
(b) cross-machine transcript share (a colleague's `.claude/projects` snippet
untar'd into HOME, a documented gbrain dogfooding shape) → RCE on first sync.

Fix swaps the one execSync for `execFileSync("git", ["-C", cwd, "remote",
"get-url", "origin"], ...)`. No shell, argv passed directly to git. The same
module already uses execFileSync for `gbrainAvailable()` (line 762 pre-patch)
and `gbrainPutPage()` (line 816 pre-patch) — this single execSync was the
outlier.

Test: `gstack-memory-ingest security: untrusted cwd cannot trigger shell
substitution` plants a Claude-Code-shaped JSONL with cwd=`$(touch <marker>)`
and asserts the marker file is not created after `--incremental --quiet`.
Negative control: with the patch reverted, the test fails (marker created);
with the patch applied, it passes (18/18 in test/gstack-memory-ingest.test.ts).

* security: gate domain-skill auto-promote on classifier_score > 0

`browse/src/domain-skill-commands.ts:140` (handleSave) writes
`classifier_score: 0` with the comment "L4 deferred to load-time / sidebar-agent
fills this in on first prompt-injection load." But CLAUDE.md "Sidebar
architecture" documents that sidebar-agent.ts was ripped, and grep for
recordSkillUse + classifierFlagged callers across browse/src/ returns zero hits
outside the module under test.

Net effect: every quarantined skill that survives three benign uses without
flag (`recordSkillUse(... , classifierFlagged: false)` x3) auto-promotes to
`active` and lands in prompt context wrapped as UNTRUSTED on every subsequent
visit to that host. The L4 score that was supposed to gate the promotion was
never written — the production save path puts 0 on disk and nothing later
updates it.

Threat model: a domain-skill body authored by an agent under the influence of
a poisoned page (the new `gstackInjectToTerminal` PTY path runs no L1-L3
either) would lose its auto-promote barrier after three uses. The exploit
isn't single-step but the bar is exactly N=3 prompt-injection-shaped uses on
a hostile page, which is well within reach.

Fix adds a single condition to the auto-promote gate in `recordSkillUse`:

    if (state === 'quarantined' && useCount >= PROMOTE_THRESHOLD &&
        flagCount === 0 && current.classifier_score > 0) {
      state = 'active';
    }

`classifier_score` is set once at writeSkill and never updated. Production
saves it as 0 (handleSave), so the gate stays closed; existing tests that
explicitly pass `classifierScore: 0.1` still auto-promote (the auto-promote
path is preserved for the day L4 is rewired).

Manual promotion via `domain-skill promote-to-global` is unaffected (it goes
through `promoteToGlobal` which has its own state-machine guard at line 337+).

Test: new regression case `does NOT auto-promote when classifier_score is 0
(production handleSave shape)` plants a skill with classifierScore=0 (matches
domain-skill-commands.ts:140), runs three uses without flag, asserts the skill
stays quarantined and readSkill returns null. Negative control: revert the
patch, the test fails with `Received: "active"`. With the patch: 15/15 pass.

* fix(ship): port #1302 SKILL.md edits to .tmpl + resolver source

PR #1302 added Verification Mode + UNVERIFIABLE classification + per-item
confirmation gate to ship/SKILL.md, but only the generated SKILL.md was
edited — not the .tmpl source or scripts/resolvers/review.ts. The next
`bun run gen:skill-docs` run would have wiped the changes.

Port the same content into the resolver and .tmpl so regeneration produces
the intended output.

* ci(windows): extend free-tests lane to cover icacls + Bun.which resolvers from fix-wave PRs

Closes #1306/#1307/#1308 validation gap. The four newly-added test files
already have process.platform guards so they run safely on both POSIX and
Windows lanes — only platform-relevant assertions execute on each.

Tests added to the windows-latest lane:
- browse/test/file-permissions.test.ts (#1308 icacls + writeSecureFile)
- browse/test/security.test.ts (#1306 bash.exe wrap pure-function path)
- make-pdf/test/browseClient.test.ts (#1307 Bun.which browse resolver)
- make-pdf/test/pdftotext.test.ts (#1307 Bun.which pdftotext resolver)

* test(codex): live flag-semantics smoke for codex exec resume

Closes #1270's regex-only test gap. PR #1270 asserted that codex/SKILL.md's
`codex exec resume` invocation drops -C/-s and uses sandbox_mode config.
That regex catches the skill template regressing, but not codex CLI itself
flipping flag semantics again.

This test probes `codex exec resume --help` and asserts the surface gstack
relies on: -c/sandbox_mode is accepted, top-level -C is absent. Skips
silently when codex isn't on PATH, so dev machines without codex installed
never see it fail.

* chore: regen SKILL.md after fix wave

One regen commit at the end of the merge wave per the plan. plan-devex-review
loses the contradictory plan-mode handshake (#1333). review/SKILL.md picks up
the Verification Mode + UNVERIFIABLE classification additions that #1302
authored against ship/SKILL.md (same resolver shared between ship and review
modes).

* fix(server.ts): keep fs.writeFileSync for state-file writes

#1308's writeSecureFile wrapper added Windows icacls hardening for the
4 state-file write sites in server.ts, but #1310's regression test grep's
for fs.writeFileSync(tmpStatePath()) calls. The two changes are technically
compatible only if the test relaxes — keeping the test strict (the safer
choice for catching regressions on the cold-start race) means the 4 state-
file sites stay on fs.writeFileSync(..., { mode: 0o600 }).

POSIX 0o600 hardening is preserved on those 4 sites. Windows icacls
hardening still applies to all the other writeSecureFile call sites
#1308 added (auth.json, mkdirSecure, etc.).

Also refreshes golden baselines after #1302 / port + minor wording tweak
in scripts/resolvers/review.ts to keep gen-skill-docs.test.ts assertion
'Cite the specific file' satisfied.

* v1.30.0.0: fix wave — 21 community PRs + 2 closing fixes for Windows + codex CI gaps

Headline release. Browse stops dropping console logs, cold-start race
fixed, codex resume works without python3, Windows hardening (icacls +
Bun.which + bash.exe wrap), ship gate gets VAS-449 remediation, two
closing fixes that put icacls/Bun.which/codex flag semantics under CI.

* test(domain-skills): cover #1369 classifier_score=0 quarantine + score>0 promote path

The pre-existing T6 test seeded skills via writeSkill (which defaults
classifier_score to 0 until L4 is rewired) and then expected 3 uses to
auto-promote. PR #1369 added `current.classifier_score > 0` to the gate
specifically to block that path — a quarantined skill written under the
influence of a poisoned page would otherwise auto-promote after three
benign uses.

Updated test asserts both halves of the new contract:
- classifier_score=0 + 3 uses → stays quarantined (the security guarantee)
- classifier_score>0 + 3 more uses → promotes to active (unblock path)

Catches both regressions: the gate going away (would re-allow the bypass)
and the unblock path breaking (would silently quarantine all skills
forever once L4 is rewired).

---------

Co-authored-by: Jayesh Betala <jayesh.betala7@gmail.com>
Co-authored-by: orbisai0security <mediratta01.pally@gmail.com>
Co-authored-by: Bryce Alan <brycealan.eth@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Terry Carson YM <cym3118288@gmail.com>
Co-authored-by: Vasko Ckorovski <vckorovski@gmail.com>
Co-authored-by: Samuel Carson <samuel.carson@gmail.com>
Co-authored-by: Yashwant Kotipalli <yashwant7kotipalli@gmail.com>
Co-authored-by: Jasper Chen <jasperchen925@gmail.com>
Co-authored-by: Stefan Neamtu <stefan.neamtu@gmail.com>
Co-authored-by: 陈家名 <chenjiaming@kezaihui.com>
Co-authored-by: Abigail Atheryon <abi@atheryon.ai>
Co-authored-by: Furkan Köykıran <furkankoykiran@gmail.com>
Co-authored-by: gus <gustavoraularagon@gmail.com>
2026-05-09 08:06:47 -07:00

455 lines
19 KiB
TypeScript

/**
* Unit tests for bin/gstack-memory-ingest.ts (Lane A).
*
* Covers the unit-testable internals: parseTranscriptJsonl (Codex + Claude Code +
* truncated last line), buildTranscriptPage / buildArtifactPage shape, repoSlug,
* dateOnly, fileChangedSinceState mtime+sha logic, state file load/save with
* schema_version backup-on-mismatch.
*
* E2E coverage (full --probe / --bulk on real ~/.claude/projects) lives in
* test/skill-e2e-memory-ingest.test.ts (Lane F).
*
* Strategy: we re-import the module under test through bun's runtime and shell
* out to it for end-to-end mode tests; for the pure helpers, we re-import the
* source file via dynamic import.
*/
import { describe, it, expect, beforeEach, afterEach } from "bun:test";
import { mkdtempSync, writeFileSync, readFileSync, existsSync, rmSync, mkdirSync, statSync, chmodSync } from "fs";
import { tmpdir } from "os";
import { join } from "path";
import { spawnSync } from "child_process";
const SCRIPT = join(import.meta.dir, "..", "bin", "gstack-memory-ingest.ts");
// ── Helpers ────────────────────────────────────────────────────────────────
function makeTestHome(): string {
return mkdtempSync(join(tmpdir(), "gstack-memory-ingest-"));
}
function runScript(args: string[], env: Record<string, string> = {}): { stdout: string; stderr: string; exitCode: number } {
const result = spawnSync("bun", [SCRIPT, ...args], {
encoding: "utf-8",
timeout: 30000,
env: { ...process.env, ...env },
});
return {
stdout: result.stdout || "",
stderr: result.stderr || "",
exitCode: result.status ?? 1,
};
}
function writeClaudeCodeSession(home: string, projectName: string, sessionId: string, content: string): string {
const projectsDir = join(home, ".claude", "projects", projectName);
mkdirSync(projectsDir, { recursive: true });
const file = join(projectsDir, `${sessionId}.jsonl`);
writeFileSync(file, content, "utf-8");
return file;
}
function writeCodexSession(home: string, ymd: string, content: string): string {
const [y, m, d] = ymd.split("-");
const dir = join(home, ".codex", "sessions", y, m, d);
mkdirSync(dir, { recursive: true });
const file = join(dir, `rollout-${Date.now()}.jsonl`);
writeFileSync(file, content, "utf-8");
return file;
}
// ── --help and --probe ─────────────────────────────────────────────────────
describe("gstack-memory-ingest CLI", () => {
it("prints usage on --help and exits 0", () => {
const r = runScript(["--help"]);
expect(r.exitCode).toBe(0);
expect(r.stderr).toContain("Usage: gstack-memory-ingest");
expect(r.stderr).toContain("--probe");
expect(r.stderr).toContain("--incremental");
expect(r.stderr).toContain("--bulk");
});
it("rejects unknown arguments with exit 1", () => {
const r = runScript(["--bogus-flag"]);
expect(r.exitCode).toBe(1);
expect(r.stderr).toContain("Unknown argument: --bogus-flag");
});
it("--probe on empty home reports 0 files", () => {
const home = makeTestHome();
const gstackHome = join(home, ".gstack");
mkdirSync(gstackHome, { recursive: true });
const r = runScript(["--probe"], { HOME: home, GSTACK_HOME: gstackHome });
expect(r.exitCode).toBe(0);
expect(r.stdout).toContain("Total files in window: 0");
rmSync(home, { recursive: true, force: true });
});
it("--probe finds Claude Code sessions", () => {
const home = makeTestHome();
const gstackHome = join(home, ".gstack");
mkdirSync(gstackHome, { recursive: true });
const session = `{"type":"user","message":{"role":"user","content":"hello"},"timestamp":"${new Date().toISOString()}","cwd":"/tmp/x"}\n{"type":"assistant","message":{"role":"assistant","content":"hi"},"timestamp":"${new Date().toISOString()}"}\n`;
writeClaudeCodeSession(home, "tmp-x", "abc123", session);
const r = runScript(["--probe"], { HOME: home, GSTACK_HOME: gstackHome });
expect(r.exitCode).toBe(0);
expect(r.stdout).toContain("Total files in window: 1");
expect(r.stdout).toContain("transcript");
rmSync(home, { recursive: true, force: true });
});
it("--probe finds Codex sessions", () => {
const home = makeTestHome();
const gstackHome = join(home, ".gstack");
mkdirSync(gstackHome, { recursive: true });
const today = new Date();
const ymd = `${today.getFullYear()}-${String(today.getMonth() + 1).padStart(2, "0")}-${String(today.getDate()).padStart(2, "0")}`;
const session = `{"type":"session_meta","payload":{"id":"sess-xyz","cwd":"/tmp/x","git":{"repository_url":"https://github.com/foo/bar"}},"timestamp":"${today.toISOString()}"}\n`;
writeCodexSession(home, ymd, session);
const r = runScript(["--probe"], { HOME: home, GSTACK_HOME: gstackHome });
expect(r.exitCode).toBe(0);
expect(r.stdout).toContain("Total files in window: 1");
rmSync(home, { recursive: true, force: true });
});
it("--probe finds gstack artifacts (learnings, eureka, ceo-plan)", () => {
const home = makeTestHome();
const gstackHome = join(home, ".gstack");
mkdirSync(join(gstackHome, "analytics"), { recursive: true });
mkdirSync(join(gstackHome, "projects", "foo-bar", "ceo-plans"), { recursive: true });
writeFileSync(join(gstackHome, "analytics", "eureka.jsonl"), '{"insight":"lake first"}\n');
writeFileSync(join(gstackHome, "projects", "foo-bar", "learnings.jsonl"), '{"key":"a","insight":"b"}\n');
writeFileSync(join(gstackHome, "projects", "foo-bar", "ceo-plans", "2026-05-01-test.md"), "# Plan\n");
const r = runScript(["--probe"], { HOME: home, GSTACK_HOME: gstackHome });
expect(r.exitCode).toBe(0);
expect(r.stdout).toContain("Total files in window: 3");
expect(r.stdout).toContain("eureka");
expect(r.stdout).toContain("learning");
expect(r.stdout).toContain("ceo-plan");
rmSync(home, { recursive: true, force: true });
});
it("--sources filter limits the walk to specific types", () => {
const home = makeTestHome();
const gstackHome = join(home, ".gstack");
mkdirSync(join(gstackHome, "analytics"), { recursive: true });
mkdirSync(join(gstackHome, "projects", "foo", "ceo-plans"), { recursive: true });
writeFileSync(join(gstackHome, "analytics", "eureka.jsonl"), '{"insight":"x"}\n');
writeFileSync(join(gstackHome, "projects", "foo", "learnings.jsonl"), '{"key":"a"}\n');
const r = runScript(["--probe", "--sources", "eureka"], { HOME: home, GSTACK_HOME: gstackHome });
expect(r.exitCode).toBe(0);
expect(r.stdout).toContain("Total files in window: 1");
expect(r.stdout).toContain("eureka");
expect(r.stdout).not.toContain("learning ");
rmSync(home, { recursive: true, force: true });
});
it("--sources rejects empty list with exit 1", () => {
const r = runScript(["--probe", "--sources", "bogus"]);
expect(r.exitCode).toBe(1);
expect(r.stderr).toContain("--sources must include at least one of");
});
});
// ── State file behavior ────────────────────────────────────────────────────
describe("gstack-memory-ingest state file", () => {
it("--incremental on empty home creates state file with schema_version: 1", () => {
const home = makeTestHome();
const gstackHome = join(home, ".gstack");
mkdirSync(gstackHome, { recursive: true });
const r = runScript(["--incremental", "--quiet"], { HOME: home, GSTACK_HOME: gstackHome });
expect(r.exitCode).toBe(0);
const statePath = join(gstackHome, ".transcript-ingest-state.json");
expect(existsSync(statePath)).toBe(true);
const state = JSON.parse(readFileSync(statePath, "utf-8"));
expect(state.schema_version).toBe(1);
expect(state.last_writer).toBe("gstack-memory-ingest");
rmSync(home, { recursive: true, force: true });
});
it("backs up state file on schema_version mismatch", () => {
const home = makeTestHome();
const gstackHome = join(home, ".gstack");
mkdirSync(gstackHome, { recursive: true });
const statePath = join(gstackHome, ".transcript-ingest-state.json");
writeFileSync(statePath, JSON.stringify({ schema_version: 999, sessions: {} }), "utf-8");
const r = runScript(["--incremental", "--quiet"], { HOME: home, GSTACK_HOME: gstackHome });
expect(r.exitCode).toBe(0);
expect(existsSync(statePath + ".bak")).toBe(true);
const fresh = JSON.parse(readFileSync(statePath, "utf-8"));
expect(fresh.schema_version).toBe(1);
rmSync(home, { recursive: true, force: true });
});
it("backs up state file on JSON parse error", () => {
const home = makeTestHome();
const gstackHome = join(home, ".gstack");
mkdirSync(gstackHome, { recursive: true });
const statePath = join(gstackHome, ".transcript-ingest-state.json");
writeFileSync(statePath, "{ this is not valid json", "utf-8");
const r = runScript(["--incremental", "--quiet"], { HOME: home, GSTACK_HOME: gstackHome });
expect(r.exitCode).toBe(0);
expect(existsSync(statePath + ".bak")).toBe(true);
rmSync(home, { recursive: true, force: true });
});
});
// ── Security: cwd in transcript JSONL must not reach a shell ─────────────
describe("gstack-memory-ingest security: untrusted cwd cannot trigger shell substitution", () => {
it("does not invoke /bin/sh when a transcript record contains $() in cwd", () => {
// Transcript JSONL is an untrusted surface — a record's `.cwd` value
// can be set by anyone who can write to ~/.claude/projects (cross-machine
// share, prompt-injection appending to the active session log, etc.).
// resolveGitRemote() must use execFileSync, not execSync with template
// interpolation, or `cwd="$(...)"` triggers command substitution under
// /bin/sh -c on the next ingest run.
const home = makeTestHome();
const gstackHome = join(home, ".gstack");
mkdirSync(gstackHome, { recursive: true });
const markerDir = mkdtempSync(join(tmpdir(), "gstack-mi-cwd-marker-"));
const marker = join(markerDir, "PWNED");
// Plain $(...) — what an attacker would write into a transcript record.
// execFileSync passes this verbatim to git as a -C argument; execSync
// (the prior code path) wrapped it in a /bin/sh -c template that ran
// the substitution.
const malicious = "$(touch " + marker + ")";
const record = JSON.stringify({
type: "user",
uuid: "11111111-1111-1111-1111-111111111111",
sessionId: "abc",
cwd: malicious,
timestamp: new Date().toISOString(),
message: { role: "user", content: "hi" },
});
writeClaudeCodeSession(home, "-tmp-target", "abc", record + "\n");
const r = runScript(["--incremental", "--quiet"], {
HOME: home,
GSTACK_HOME: gstackHome,
GSTACK_MEMORY_INGEST_NO_WRITE: "1",
});
expect(r.exitCode).toBe(0);
expect(existsSync(marker)).toBe(false);
rmSync(home, { recursive: true, force: true });
rmSync(markerDir, { recursive: true, force: true });
});
});
// ── Transcript parser via re-import of the source module ───────────────────
describe("internal: parseTranscriptJsonl + buildTranscriptPage shape", () => {
it("parses a Claude Code JSONL session", async () => {
const dir = mkdtempSync(join(tmpdir(), "gstack-mi-parse-"));
const file = join(dir, "abc123.jsonl");
const content =
`{"type":"user","message":{"role":"user","content":"hi"},"timestamp":"2026-05-01T00:00:00Z","cwd":"/tmp/foo"}\n` +
`{"type":"assistant","message":{"role":"assistant","content":"hello"},"timestamp":"2026-05-01T00:00:01Z"}\n`;
writeFileSync(file, content, "utf-8");
// Re-import via dynamic import is tricky because the script auto-runs main().
// We instead test via shell invocation: --probe with this file should find 1 transcript.
const home = makeTestHome();
const projDir = join(home, ".claude", "projects", "tmp-foo");
mkdirSync(projDir, { recursive: true });
writeFileSync(join(projDir, "abc123.jsonl"), content, "utf-8");
const r = runScript(["--probe"], { HOME: home, GSTACK_HOME: join(home, ".gstack") });
expect(r.exitCode).toBe(0);
expect(r.stdout).toContain("Total files in window: 1");
rmSync(dir, { recursive: true, force: true });
rmSync(home, { recursive: true, force: true });
});
it("treats a truncated last line as partial (does not crash)", () => {
const home = makeTestHome();
const projDir = join(home, ".claude", "projects", "tmp-bar");
mkdirSync(projDir, { recursive: true });
// Truncated last line — JSON parse will fail on it
const content =
`{"type":"user","message":{"role":"user","content":"hi"},"timestamp":"2026-05-01T00:00:00Z","cwd":"/tmp/bar"}\n` +
`{"type":"assistant","message":{"role":"assistant","content":"hello"},"timestamp":"2026-05-01T00:00:01Z"}\n` +
`{"type":"assistant","message":{"role":"assistant","content":"this is truncat`; // no closing brace + no newline
writeFileSync(join(projDir, "trunc.jsonl"), content, "utf-8");
const r = runScript(["--probe"], { HOME: home, GSTACK_HOME: join(home, ".gstack") });
// Should not crash; should report 1 transcript
expect(r.exitCode).toBe(0);
expect(r.stdout).toContain("Total files in window: 1");
rmSync(home, { recursive: true, force: true });
});
});
// ── --limit shortcut for smoke tests ───────────────────────────────────────
describe("gstack-memory-ingest --limit", () => {
it("respects --limit by stopping after N writes (mocked via --probe shortcut)", () => {
const r = runScript(["--probe", "--limit", "1"]);
// --limit doesn't apply to probe but argument should parse without error
expect(r.exitCode).toBe(0);
});
it("rejects --limit 0 with exit 1", () => {
const r = runScript(["--probe", "--limit", "0"]);
expect(r.exitCode).toBe(1);
expect(r.stderr).toContain("--limit requires a positive integer");
});
});
// ── Writer regression: gbrain v0.27+ uses `put`, not `put_page` ───────────
/**
* Stand up a fake `gbrain` shim on PATH that:
* - advertises `put` in `--help` output (so gbrainAvailable() passes)
* - records `put <slug>` invocations + their stdin to a log
* - rejects `put_page` with a non-zero exit, mimicking real gbrain v0.27+
*
* If the writer ever regresses to the legacy flag-form, the bulk pass will
* report 0 writes and the assertion on `Wrote: 1` will fail loudly.
*/
function installFakeGbrain(home: string): { binDir: string; logFile: string; stdinFile: string } {
const binDir = join(home, "fake-bin");
mkdirSync(binDir, { recursive: true });
const logFile = join(home, "gbrain-calls.log");
const stdinFile = join(home, "gbrain-stdin.log");
const script = `#!/usr/bin/env bash
set -euo pipefail
LOG="${logFile}"
STDIN_LOG="${stdinFile}"
case "\${1:-}" in
--help|-h)
cat <<EOF
Usage: gbrain <command> [options]
Commands:
put <slug> Write a page (content via stdin, YAML frontmatter for metadata)
search <query> Keyword search across pages
ask <question> Hybrid semantic + keyword query
EOF
exit 0
;;
put)
if [ "\${2:-}" = "--help" ]; then
echo "Usage: gbrain put <slug>"
exit 0
fi
echo "put \${2:-}" >> "\$LOG"
{
echo "--- slug=\${2:-} ---"
cat
echo
} >> "\$STDIN_LOG"
exit 0
;;
put_page|put-page)
echo "Unknown command: \$1" >&2
exit 2
;;
*)
echo "Unknown command: \${1:-<empty>}" >&2
exit 2
;;
esac
`;
const binPath = join(binDir, "gbrain");
writeFileSync(binPath, script, "utf-8");
chmodSync(binPath, 0o755);
return { binDir, logFile, stdinFile };
}
describe("gstack-memory-ingest writer (gbrain v0.27+ `put` interface)", () => {
it("invokes `gbrain put <slug>` with stdin body, not legacy `put_page`", () => {
const home = makeTestHome();
const gstackHome = join(home, ".gstack");
mkdirSync(gstackHome, { recursive: true });
const { binDir, logFile, stdinFile } = installFakeGbrain(home);
// Single Claude Code session fixture. --include-unattributed lets it write
// even though there's no resolvable git remote in /tmp.
const session =
`{"type":"user","message":{"role":"user","content":"hi"},"timestamp":"2026-05-01T00:00:00Z","cwd":"/tmp/foo"}\n` +
`{"type":"assistant","message":{"role":"assistant","content":"hello"},"timestamp":"2026-05-01T00:00:01Z"}\n`;
writeClaudeCodeSession(home, "tmp-foo", "abc123", session);
const r = runScript(["--bulk", "--include-unattributed", "--quiet"], {
HOME: home,
GSTACK_HOME: gstackHome,
PATH: `${binDir}:${process.env.PATH || ""}`,
});
expect(r.exitCode).toBe(0);
expect(existsSync(logFile)).toBe(true);
const calls = readFileSync(logFile, "utf-8");
expect(calls).toContain("put ");
expect(calls).not.toContain("put_page");
// Body should ride stdin and carry frontmatter that gbrain can parse.
// The transcript builder prepends its own frontmatter (agent, session_id,
// etc.) but does NOT include title/type/tags — the writer injects those
// into the existing frontmatter so gbrain pages list/search/filter
// actually surface the page. Asserting all three guards against the
// exact regression that landed in v1.26.0.0 (writer ignored these fields
// entirely; pages landed empty-titled, un-typed, un-tagged).
const stdin = readFileSync(stdinFile, "utf-8");
expect(stdin).toContain("---");
expect(stdin).toMatch(/agent:\s+claude-code/);
expect(stdin).toMatch(/title:\s/);
expect(stdin).toMatch(/type:\s+transcript/);
expect(stdin).toMatch(/tags:/);
rmSync(home, { recursive: true, force: true });
});
it("fails fast when gbrain CLI is missing the `put` subcommand", () => {
const home = makeTestHome();
const gstackHome = join(home, ".gstack");
mkdirSync(gstackHome, { recursive: true });
// Fake gbrain that ONLY advertises legacy `put_page` (no `put`).
const binDir = join(home, "legacy-bin");
mkdirSync(binDir, { recursive: true });
const script = `#!/usr/bin/env bash
case "\${1:-}" in
--help|-h) echo "Commands:"; echo " put_page Write a page (legacy)"; exit 0 ;;
*) echo "Unknown command: \$1" >&2; exit 2 ;;
esac
`;
const binPath = join(binDir, "gbrain");
writeFileSync(binPath, script, "utf-8");
chmodSync(binPath, 0o755);
const session =
`{"type":"user","message":{"role":"user","content":"hi"},"timestamp":"2026-05-01T00:00:00Z","cwd":"/tmp/bar"}\n`;
writeClaudeCodeSession(home, "tmp-bar", "def456", session);
const r = runScript(["--bulk", "--include-unattributed"], {
HOME: home,
GSTACK_HOME: gstackHome,
PATH: `${binDir}:${process.env.PATH || ""}`,
});
// Bulk completes (the script is per-page tolerant), but every page
// surfaces the missing-`put` error rather than the old "Unknown command".
expect(r.stderr + r.stdout).toMatch(/missing `put` subcommand|gbrain CLI not in PATH/);
rmSync(home, { recursive: true, force: true });
});
});