Commit Graph

2 Commits

Author SHA1 Message Date
Garry Tan
b9371d716e v1.34.2.0 fix wave: /codex review on CLI 0.130+, /investigate learnings, /sync-gbrain on Supabase (3 community-reported bugs) (#1478)
* fix(learnings): accept type:"investigation" in gstack-learnings-log

The /investigate skill instructed agents to log learnings with type:"investigation",
but bin/gstack-learnings-log:22 rejected anything not in
[pattern, pitfall, preference, architecture, tool, operational]. Every
investigation run exited 1 to stderr and the learning was dropped, silently
to the user.

Fix: add 'investigation' to ALLOWED_TYPES.

Regression test: round-trips a learning with type:"investigation" and asserts
exit 0 + file write; second test reads investigate/SKILL.md.tmpl and asserts
it emits the literal type:"investigation" string, guarding the
template/validator contract at both ends.

Fixes #1423. Reported by diogolealassis.

* fix(gbrain): engine detection survives gbrain ≥0.25 schema + non-zero doctor exit

freshDetectEngineTier() in lib/gstack-memory-helpers.ts returned engine:
"unknown" for every Supabase user on gbrain ≥0.25. Two stacking bugs:

1. execSync("gbrain doctor --json --fast 2>/dev/null") threw on non-zero
   exit. gbrain doctor exits 1 whenever health_score < 100, which is
   essentially every fresh install due to resolver_health warnings. The
   JSON output never reached the parser.
2. gbrain ≥0.25 shipped schema_version:2 doctor output that dropped the
   top-level 'engine' field entirely.

Result: every /sync-gbrain on Supabase logged 'engine=unknown' and skipped
all sync stages silently.

Fix:
- Replace execSync with execFileSync (no shell, no bash-specific 2>/dev/null
  redirect; portable to Windows).
- Recover stdout from the thrown error object so non-zero exits still parse.
- Fall back to reading gbrain's config.json (respecting GBRAIN_HOME env var,
  defaulting to ~/.gbrain/config.json) when doctor output doesn't surface
  an engine field.
- Add logGbrainError() helper that appends one-line JSONL to
  ~/.gstack/.gbrain-errors.jsonl on parse failure, so future regressions
  leave a forensic trail.

The "supabase" tier here means "remote postgres" in practice — gbrain
config uses engine:"postgres" for both real Supabase and any other
remote postgres (e.g. local-postgres-for-testing). Downstream sync code
treats them identically, so the label compression is intentional and
documented inline.

Regression test: existing detectEngineTier suite now isolates HOME +
GBRAIN_HOME + PATH to temp dirs (closes a flake source where the prior
tests would read whatever was on the reviewer's machine). New test
forces gbrain off PATH, writes a synthetic config.json with
engine:"postgres", asserts detectEngineTier() returns
engine:"supabase".

Fixes #1415. Patch shape contributed by Shiv @shivasymbl (tested on
gstack v1.31.0.0 + gbrain v0.31.3 + Supabase).

* fix(codex): /codex review works on Codex CLI ≥0.130.0

Codex CLI 0.130.0 made [PROMPT] and --base <BRANCH> mutually exclusive at
argv level. Step 2A of codex/SKILL.md.tmpl had always passed both (the
filesystem boundary prefix as the prompt argument + the base branch), so
every /codex review call died with:

  error: the argument '[PROMPT]' cannot be used with '--base <BRANCH>'

Fix: split Step 2A into two paths.

Default (no custom user instructions): bare 'codex review --base <base>'.
Codex's review prompt is internally diff-scoped, so the model focuses on
the changes against base. The filesystem boundary prefix is dropped here
because Codex 0.130 has no documented system-prompt config key
(probed -c 'system_prompt="..."' against 0.130 — the flag is silently
accepted but the value isn't applied). Skill files under .claude/ and
agents/ are public, so this is a token-efficiency concern, not a safety
one.

Custom instructions (/codex review <focus>): route through codex exec
with the diff written to a tempfile, inlined into the prompt between
explicit DIFF_START / DIFF_END markers. The boundary is preserved here
because codex exec isn't auto-scoped to the diff. The DIFF_START/END
delimiters tell the model where data ends and instructions resume, which
materially reduces prompt-injection hijack rates when the diff contains
adversarial content.

Note on bash semantics: codex's earlier review flagged the exec route as
"command injection via $_DIFF interpolation." That framing is wrong —
bash parameter expansion does not re-evaluate $(...) or backticks inside
the expanded value, so a diff containing $(rm -rf /) is plain string
data to codex exec. The real risk is prompt injection (model-side, not
shell-side), which the DIFF_START/END pattern mitigates.

Regression tests in test/codex-hardening.test.ts assert across BOTH
codex/SKILL.md.tmpl AND the generated codex/SKILL.md:
1. No 'codex review' invocation line combines a quoted-string OR variable
   positional argument with --base.
2. Step 2A still contains either bare 'codex review --base' OR 'codex
   exec' (guards against accidental deletion of both fix paths).

Fixes #1428. Reported by Stashub.

* test: raise timeouts for slow integration tests

Two test files were timing out at the default 5s on developer machines,
both pre-existing on origin/main but unrelated to this branch's bug fixes:

- test/gstack-artifacts-init.test.ts: 13 tests spawning real subprocesses
  via fake gh/glab/git shims in PATH. bun's fork+exec overhead pushed
  these past 5s consistently. Added a local test-wrapper that aliases
  test() with a 30s timeout (matches the brain-sync.test.ts pattern
  already in the repo).
- test/gstack-next-version.test.ts: one integration smoke test that
  spawns 'bun run ./bin/gstack-next-version' and parses the resulting
  JSON. The subprocess does a 'gh pr list' against the live GitHub API
  to enumerate claimed version slots. Network latency makes 5s tight;
  raised this single test to 30s.

No production code changed. The tests already passed deterministically
once given enough wall-clock time.

* chore: bump version and changelog (v1.34.2.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-14 11:11:52 -04:00
Garry Tan
9ec4ab7eb9 codex + Apple Silicon hardening wave (v0.18.4.0) (#1056)
* fix: ad-hoc codesign compiled binaries on Apple Silicon after build

On some Apple Silicon machines, Bun's --compile produces a corrupt or
linker-only code signature. macOS kills these binaries with SIGKILL
(exit 137, zsh: killed) before they execute a single instruction.

Add a post-build codesign step to setup that runs only on Darwin arm64:
1. Remove the corrupt/linker-only signature (required — a direct re-sign
   fails with 'invalid or unsupported format for signature')
2. Apply a fresh ad-hoc signature

The step is idempotent, costs <1s, and is what Bun's own docs recommend
for distributed standalone executables. All four compiled binaries are
covered: browse, find-browse, design, and gstack-global-discover.
Failure is a non-fatal warning so Intel/CI builds are unaffected.

Fixes #997

* fix: prevent codex exec stdin deadlock with </dev/null redirect

codex CLI 0.120.0+ blocks indefinitely when stdin is a non-TTY pipe
(Claude Code Bash tool, background bash, CI). The CLI sees a non-TTY
stdin and waits for EOF to append it as a <stdin> block, even when the
prompt is passed as a positional argument.

Fix: add < /dev/null to every codex exec and codex review invocation
in the source-of-truth files (scripts/resolvers/*.ts and *.md.tmpl).
Generated SKILL.md files will be produced by bun run gen:skill-docs
in a subsequent commit (Tension D: template+resolver only, generator
is authoritative, not cherry-picked artifacts).

Affected source files (16 total invocations):
- scripts/resolvers/review.ts (4)
- scripts/resolvers/design.ts (3)
- codex/SKILL.md.tmpl (5)
- autoplan/SKILL.md.tmpl (4)

Fixes #971

Co-Authored-By: loning <loning@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: codex/autoplan hardening + Apple Silicon coreutils auto-install

Hardens /codex and /autoplan against silent failures surfaced by the #972
stdin fix and #1003 Apple Silicon codesign. Six-layer defense:

1. **Multi-signal auth probe** (new Step 0.5 / Phase 0.5): env-based auth
   ($CODEX_API_KEY, $OPENAI_API_KEY) OR file-based auth
   (${CODEX_HOME:-~/.codex}/auth.json). Rejects false negatives that the
   old file-only check produced for CI / platform-engineer users.

2. **Timeout wrapper** around every codex exec / codex review invocation:
   gtimeout → timeout → unwrapped fallback chain. On exit 124, surfaces
   common causes + actionable next step. Guards against model-API stalls
   not covered by the #972 stdin fix.

3. **Stderr capture in Challenge mode** (codex/SKILL.md.tmpl:208):
   2>/dev/null → 2>$TMPERR. Post-invocation grep for auth/login/unauthorized
   surfaces errors that were previously dropped silently.

4. **Completeness check** in the Python JSON parser: tracks turn.completed
   events and warns on zero (possible mid-stream disconnect).

5. **Version warning** for known-bad Codex CLI (0.120.0-0.120.2, the range
   that introduced the stdin deadlock #972 fixes). Anchored regex
   `(^|[^0-9.])0\.120\.(0|1|2)([^0-9.]|$)` prevents 0.120.10 / 0.120.20
   false positives.

6. **Failure telemetry + operational learnings**: codex_timeout,
   codex_auth_failed, codex_cli_missing, codex_version_warning events
   land in ~/.gstack/analytics/skill-usage.jsonl behind the existing
   telemetry opt-in. On timeout (exit 124), auto-logs an operational
   learning via gstack-learnings-log so future /investigate sessions
   surface prior hang patterns automatically.

**Shared helper** (bin/gstack-codex-probe): consolidates all four pieces
(auth probe, version check, timeout wrapper, telemetry logger) into one
bash file that /codex and /autoplan source. Namespace-prefixed
(_gstack_codex_*) with a unit test that verifies sourcing does not leak
shell options into the caller. pathRewrites in host configs rewrite
~/.claude/skills/gstack → $GSTACK_ROOT for Codex, $GSTACK_BIN for
Factory/Cursor/etc.

**Apple Silicon coreutils auto-install** (setup:264): macOS lacks GNU
timeout by default; Homebrew's coreutils installs it as gtimeout to
avoid shadowing BSD utilities. ./setup now auto-installs coreutils on
Darwin (arch-agnostic — applies to Intel + Apple Silicon) when neither
gtimeout nor timeout is present. Opt-out via GSTACK_SKIP_COREUTILS=1
for CI, managed machines, or offline envs.

**25 deterministic unit tests** (test/codex-hardening.test.ts):
- 8 auth probe combinations (env precedence, whitespace, alternate
  $CODEX_HOME, corrupt file paths)
- 10 version regex cases including 0.120.10 false-positive guards
  and v-prefixed / multiline output
- 4 timeout wrapper + namespace hygiene (bash -n, gtimeout
  preference, set-option leak check)
- 3 telemetry payload schema checks (confirms env values + auth
  tokens never leak into emitted events)

**1 periodic-tier E2E** (test/skill-e2e-autoplan-dual-voice.test.ts):
gates the /autoplan dual-voice path — asserts both Claude subagent
and Codex voices produce output in Phase 1, OR that [codex-unavailable]
is logged when Codex is absent. ~\$1/run, not a CI gate.

Golden baseline + gen-skill-docs exclusion list updated for the new
codex path references and the 16 < /dev/null redirects from #972.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: plan-review right-sized diff counterbalance (not minimal-diff default)

/plan-ceo-review and /plan-eng-review listed "minimal diff" as an
engineering preference without counterbalancing language. Reviewers
picked up on that and rejected rewrites that should have been approved.

The preference is now framed as "right-sized diff" with explicit
permission to recommend a rewrite when the existing foundation is
broken. Implementation alternatives section in CEO review gets an
equal-weight clarification: don't default to minimal viable just
because it is smaller. Recommend whichever best serves the user's
goal; if the right answer is a rewrite, say so.

Three-line tone edit per template, no voice / ETHOS / YC / promotional
content change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* release: v0.18.4.0 — codex + Apple Silicon hardening wave

- Apple Silicon codesign fix (#1003 @voidborne-d)
- Codex stdin deadlock fix (#972 @loning)
- Codex timeout wrapper (gtimeout → timeout → unwrapped fallback)
- Multi-signal auth gate for /codex + /autoplan
- Codex version warning for known-bad CLI (0.120.0-0.120.2)
- Challenge mode stderr capture + completeness check
- Plan-review right-sized diff counterbalance
- Failure telemetry + auto-log timeout as operational learning
- 25 deterministic unit tests + dual-voice periodic E2E

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com>
Co-authored-by: loning <loning@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 12:30:54 +08:00