gstack

hai/gstack

mirror of https://github.com/garrytan/gstack.git synced 2026-05-08 21:49:45 +08:00

Files

Garry Tan 9a424a9f55 test: apply ship review-army findings — helper extract, slice SKILL.md, defensive judge

Five categories of fixes surfaced by the /ship pre-landing reviews
(testing + maintainability + security + performance + adversarial Claude),
applied as one review-iteration commit.

Refactor — collapse 5x duplicated judge-assertion block:
- Add assertRecommendationQuality() + RECOMMENDATION_SUBSTANCE_THRESHOLD
  constant to test/helpers/e2e-helpers.ts.
- Plan-format (4 cases) and Phase 4 (1 case) collapse from ~22 lines each
  to a single helper call. Future rubric tweaks land in one place instead
  of five.

Performance — extract Phase 4 slice instead of copying full SKILL.md:
- Phase 4 test fixture now reads office-hours/SKILL.md and writes only the
  AskUserQuestion Format section + Phase 4 section to the tmpdir, per
  CLAUDE.md "extract, don't copy" rule. Verified locally: cost dropped
  from $0.51 → $0.36/run, turn count 8 → 4, latency 50s → 36s. Reduces
  Opus context bloat without weakening the regression check.
- Add `if (!workDir) return` guard to Phase 4 afterAll cleanup so a
  skipped describe block doesn't silently fs.rmSync(undefined) under the
  empty catch.

Defense — judge prompt + output:
- Wrap captured AskUserQuestion text in clearly delimited UNTRUSTED_CONTEXT
  block with explicit instruction to treat its content as data, not commands.
  Cheap defense against the (unlikely but real) injection vector where a
  captured AskUserQuestion contains "Ignore previous instructions" text.
- Bump captured-text budget from 4000 → 8000 chars; real plan-format menus
  with 4 options × ~800 chars exceed 4000 and were silently truncating
  Haiku context mid-option.

Cleanup — abbreviation rule + dead imports + touchfile consistency:
- AUQ → AskUserQuestion in 3 sites (office-hours/SKILL.md.tmpl Phase 4
  footer, two test comments) per the always-write-in-full memory rule.
  Regenerated office-hours/SKILL.md.
- Drop unused `describe`/`test` imports in 2 new test files (only
  describeIfSelected/testConcurrentIfSelected wrappers are used).
- Add `test/skill-e2e-office-hours-phase4.test.ts` to its own touchfile
  entry for consistency with other entries that include their test file.
- Fix misleading comment in fixture test about LLM short-circuiting (it's
  has_because, not commits, that skips the API call).

Verified: build clean, free `bun test` exits 0, fixture test 30/30
expect() calls pass, Phase 4 paid eval passes substance 5 in 36s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-01 18:40:01 -07:00

fixtures

v1.25.0.0 fix: AskUserQuestion resolves to host MCP variant when native is disallowed (#1287 )

2026-05-01 08:45:36 -07:00

helpers

test: apply ship review-army findings — helper extract, slice SKILL.md, defensive judge

2026-05-01 18:40:01 -07:00

agent-sdk-runner.test.ts

v1.11.1.0 fix: plan-mode handshake + canUseTool test harness (#1182 )

2026-04-24 00:04:53 -07:00

analytics.test.ts

feat: safety hook skills + skill usage telemetry (v0.7.1) (#189 )

2026-03-18 23:57:59 -05:00

audit-compliance.test.ts

feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040 )

2026-04-19 17:50:31 +08:00

benchmark-cli.test.ts

feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040 )

2026-04-19 17:50:31 +08:00

benchmark-runner.test.ts

feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040 )

2026-04-19 17:50:31 +08:00

brain-sync.test.ts

v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215 )

2026-04-26 13:55:13 -07:00

builder-profile.test.ts

feat: relationship closing — office-hours adapts to repeat users (v0.16.2.0) (#937 )

2026-04-08 22:21:28 -10:00

codex-e2e-plan-format.test.ts

fix(plan-reviews): restore RECOMMENDATION + Completeness split + Codex ELI10 (v1.6.3.0) (#1149 )

2026-04-23 07:25:20 -07:00

codex-e2e.test.ts

feat: worktree isolation for E2E tests + infrastructure elegance (v0.11.12.0) (#425 )

2026-03-23 23:05:22 -07:00

codex-hardening.test.ts

codex + Apple Silicon hardening wave (v0.18.4.0) (#1056 )

2026-04-18 12:30:54 +08:00

context-save-hardening.test.ts

fix(checkpoint): rename /checkpoint → /context-save + /context-restore (v1.0.1.0) (#1064 )

2026-04-19 08:38:19 +08:00

diff-scope.test.ts

feat: Review Army — parallel specialist reviewers for /review (v0.14.3.0) (#692 )

2026-03-30 22:07:50 -06:00

e2e-harness-audit.test.ts

v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215 )

2026-04-26 13:55:13 -07:00

explain-level-config.test.ts

feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039 )

2026-04-18 15:05:42 +08:00

gbrain-detect-install.test.ts

v1.12.0.0 feat: /setup-gbrain — coding-agent onboarding for gbrain (#1183 )

2026-04-24 01:38:21 -07:00

gbrain-lib-verify.test.ts

v1.12.0.0 feat: /setup-gbrain — coding-agent onboarding for gbrain (#1183 )

2026-04-24 01:38:21 -07:00

gbrain-repo-policy.test.ts

v1.12.0.0 feat: /setup-gbrain — coding-agent onboarding for gbrain (#1183 )

2026-04-24 01:38:21 -07:00

gbrain-supabase-provision.test.ts

v1.12.0.0 feat: /setup-gbrain — coding-agent onboarding for gbrain (#1183 )

2026-04-24 01:38:21 -07:00

gemini-e2e.test.ts

feat: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver (v0.18.0.0) (#1005 )

2026-04-16 10:41:38 -07:00

gen-skill-docs.test.ts

v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215 )

2026-04-26 13:55:13 -07:00

global-discover.test.ts

fix: close redundant PRs + friendly error on all design commands (v0.15.8.1) (#817 )

2026-04-05 02:02:06 -07:00

gstack-brain-init-gh-mock.test.ts

v1.12.2.0 fix: /setup-gbrain day-two fixes (MCP scope, version parse, gh repo create order, smoke test) (#1187 )

2026-04-24 07:51:46 -07:00

gstack-developer-profile.test.ts

feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039 )

2026-04-18 15:05:42 +08:00

gstack-gbrain-source-wireup.test.ts

v1.17.0.0: setup-gbrain wireup ships the gbrain federation surface (#1234 )

2026-04-28 01:17:54 -07:00

gstack-next-version.test.ts

v1.11.0.0 feat(ship): workspace-aware version allocation (#1168 )

2026-04-23 23:03:27 -07:00

gstack-paths.test.ts

v1.24.0.0 feat: cross-platform hardening — curated Windows lane + Bun.which resolver + path-portability helper (#1252 )

2026-05-01 07:21:28 -07:00

gstack-question-log.test.ts

feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039 )

2026-04-18 15:05:42 +08:00

gstack-question-preference.test.ts

feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039 )

2026-04-18 15:05:42 +08:00

gstack-upgrade-migration-v1_17_0_0.test.ts

v1.17.0.0: setup-gbrain wireup ships the gbrain federation surface (#1234 )

2026-04-28 01:17:54 -07:00

helpers-unit.test.ts

v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215 )

2026-04-26 13:55:13 -07:00

hook-scripts.test.ts

feat: safety hook skills + skill usage telemetry (v0.7.1) (#189 )

2026-03-18 23:57:59 -05:00

host-config.test.ts

community wave: 6 PRs + hardening (v0.18.1.0) (#1028 )

2026-04-17 00:45:13 -07:00

jargon-list.test.ts

feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039 )

2026-04-18 15:05:42 +08:00

learnings-injection.test.ts

fix: community security wave — 8 PRs, 4 contributors (v0.15.13.0) (#847 )

2026-04-06 00:47:04 -07:00

learnings.test.ts

feat: GStack Learns — per-project self-learning infrastructure (v0.13.4.0) (#622 )

2026-03-29 17:02:01 -06:00

llm-judge-recommendation.test.ts

test: apply ship review-army findings — helper extract, slice SKILL.md, defensive judge

2026-05-01 18:40:01 -07:00

migration-checkpoint-ownership.test.ts

fix(checkpoint): rename /checkpoint → /context-save + /context-restore (v1.0.1.0) (#1064 )

2026-04-19 08:38:19 +08:00

model-overlay-opus-4-7.test.ts

v1.13.0.0 feat: add Claude outside-voice skill (#1212 )

2026-04-25 11:52:48 -07:00

openclaw-native-skills.test.ts

community wave: 6 PRs + hardening (v0.18.1.0) (#1028 )

2026-04-17 00:45:13 -07:00

plan-tune.test.ts

feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039 )

2026-04-18 15:05:42 +08:00

pr-title-rewrite.test.ts

v1.23.0.0 feat: always prefix PR titles with v<VERSION> (#1284 )

2026-05-01 07:06:37 -07:00

preamble-compose.test.ts

v1.10.0.0: fix AskUserQuestion cadence + Pros/Cons format upgrade (#1178 )

2026-04-23 18:25:34 -07:00

readme-throughput.test.ts

feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039 )

2026-04-18 15:05:42 +08:00

relink.test.ts

fix: headed browser auto-shutdown + disconnect cleanup (v0.18.1.0) (#1025 )

2026-04-16 15:39:44 -07:00

resolver-ask-user-format.test.ts

v1.10.0.0: fix AskUserQuestion cadence + Pros/Cons format upgrade (#1178 )

2026-04-23 18:25:34 -07:00

review-log.test.ts

fix: community PRs + security hardening + E2E stability (v0.12.7.0) (#552 )

2026-03-26 23:21:27 -06:00

secret-sink-harness.test.ts

v1.12.0.0 feat: /setup-gbrain — coding-agent onboarding for gbrain (#1183 )

2026-04-24 01:38:21 -07:00

setup-codesign.test.ts

codex + Apple Silicon hardening wave (v0.18.4.0) (#1056 )

2026-04-18 12:30:54 +08:00

ship-version-sync.test.ts

fix(ship): detect + repair VERSION/package.json drift in Step 12 (v1.1.1.0) (#1063 )

2026-04-18 23:58:59 +08:00

skill-budget-regression.test.ts

v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215 )

2026-04-26 13:55:13 -07:00

skill-collision-sentinel.test.ts

fix(checkpoint): rename /checkpoint → /context-save + /context-restore (v1.0.1.0) (#1064 )

2026-04-19 08:38:19 +08:00

skill-e2e-ask-user-question-format-compliance.test.ts

v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215 )

2026-04-26 13:55:13 -07:00

skill-e2e-auto-decide-preserved.test.ts

v1.25.0.0 fix: AskUserQuestion resolves to host MCP variant when native is disallowed (#1287 )

2026-05-01 08:45:36 -07:00

skill-e2e-autoplan-auto-mode.test.ts

v1.25.0.0 fix: AskUserQuestion resolves to host MCP variant when native is disallowed (#1287 )

2026-05-01 08:45:36 -07:00

skill-e2e-autoplan-chain.test.ts

v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215 )

2026-04-26 13:55:13 -07:00

skill-e2e-autoplan-dual-voice.test.ts

fix(checkpoint): rename /checkpoint → /context-save + /context-restore (v1.0.1.0) (#1064 )

2026-04-19 08:38:19 +08:00

skill-e2e-benchmark-providers.test.ts

feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040 )

2026-04-19 17:50:31 +08:00

skill-e2e-brain-privacy-gate.test.ts

v1.12.0.0 feat: /setup-gbrain — coding-agent onboarding for gbrain (#1183 )

2026-04-24 01:38:21 -07:00

skill-e2e-bws.test.ts

feat(v1.9.0.0): gbrain-sync — cross-machine gstack memory (#1151 )

2026-04-23 17:54:54 -07:00

skill-e2e-context-skills.test.ts

fix(checkpoint): rename /checkpoint → /context-save + /context-restore (v1.0.1.0) (#1064 )

2026-04-19 08:38:19 +08:00

skill-e2e-cso.test.ts

feat: /cso v2 — infrastructure-first security audit (v0.11.6.0) (#384 )

2026-03-23 06:57:22 -07:00

skill-e2e-deploy.test.ts

feat: /land-and-deploy first-run dry run + staging-first + trust ladder (v0.12.2.0) (#518 )

2026-03-26 11:08:31 -07:00

skill-e2e-design.test.ts

feat: CI evals on Ubicloud — 12 parallel runners + Docker image (v0.11.10.0) (#360 )

2026-03-23 10:17:33 -07:00

skill-e2e-learnings.test.ts

feat: recursive self-improvement — operational learning + full skill wiring (v0.13.8.0) (#647 )

2026-03-31 23:08:22 -06:00

skill-e2e-office-hours-auto-mode.test.ts

v1.25.0.0 fix: AskUserQuestion resolves to host MCP variant when native is disallowed (#1287 )

2026-05-01 08:45:36 -07:00

skill-e2e-office-hours-phase4.test.ts

test: apply ship review-army findings — helper extract, slice SKILL.md, defensive judge

2026-05-01 18:40:01 -07:00

skill-e2e-office-hours.test.ts

feat: mode-posture energy fix for /plan-ceo-review and /office-hours (v1.1.2.0) (#1065 )

2026-04-19 05:44:39 +08:00

skill-e2e-opus-47.test.ts

feat(v1.5.2.0): Opus 4.7 migration — model overlay, voice, routing (#1117 )

2026-04-22 01:06:22 -07:00

skill-e2e-overlay-harness.test.ts

feat(v1.10.1.0): overlay efficacy harness + Opus 4.7 fanout nudge removal (#1166 )

2026-04-23 18:42:58 -07:00

skill-e2e-plan-ceo-finding-count.test.ts

v1.21.1.0 test: tighten plan-ceo-review smoke (Step 0 must fire) (#1255 )

2026-04-30 02:50:09 -07:00

skill-e2e-plan-ceo-mode-routing.test.ts

v1.21.1.0 test: tighten plan-ceo-review smoke (Step 0 must fire) (#1255 )

2026-04-30 02:50:09 -07:00

skill-e2e-plan-ceo-plan-mode.test.ts

v1.25.0.0 fix: AskUserQuestion resolves to host MCP variant when native is disallowed (#1287 )

2026-05-01 08:45:36 -07:00

skill-e2e-plan-design-finding-count.test.ts

v1.21.1.0 test: tighten plan-ceo-review smoke (Step 0 must fire) (#1255 )

2026-04-30 02:50:09 -07:00

skill-e2e-plan-design-plan-mode.test.ts

v1.25.0.0 fix: AskUserQuestion resolves to host MCP variant when native is disallowed (#1287 )

2026-05-01 08:45:36 -07:00

skill-e2e-plan-design-with-ui.test.ts

v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215 )

2026-04-26 13:55:13 -07:00

skill-e2e-plan-devex-finding-count.test.ts

v1.21.1.0 test: tighten plan-ceo-review smoke (Step 0 must fire) (#1255 )

2026-04-30 02:50:09 -07:00

skill-e2e-plan-devex-plan-mode.test.ts

v1.25.0.0 fix: AskUserQuestion resolves to host MCP variant when native is disallowed (#1287 )

2026-05-01 08:45:36 -07:00

skill-e2e-plan-eng-finding-count.test.ts

v1.21.1.0 test: tighten plan-ceo-review smoke (Step 0 must fire) (#1255 )

2026-04-30 02:50:09 -07:00

skill-e2e-plan-eng-plan-mode.test.ts

v1.25.0.0 fix: AskUserQuestion resolves to host MCP variant when native is disallowed (#1287 )

2026-05-01 08:45:36 -07:00

skill-e2e-plan-format.test.ts

test: apply ship review-army findings — helper extract, slice SKILL.md, defensive judge

2026-05-01 18:40:01 -07:00

skill-e2e-plan-mode-no-op.test.ts

v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215 )

2026-04-26 13:55:13 -07:00

skill-e2e-plan-prosons.test.ts

v1.10.0.0: fix AskUserQuestion cadence + Pros/Cons format upgrade (#1178 )

2026-04-23 18:25:34 -07:00

skill-e2e-plan-tune.test.ts

feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039 )

2026-04-18 15:05:42 +08:00

skill-e2e-plan.test.ts

feat: mode-posture energy fix for /plan-ceo-review and /office-hours (v1.1.2.0) (#1065 )

2026-04-19 05:44:39 +08:00

skill-e2e-qa-bugs.test.ts

feat: CI evals on Ubicloud — 12 parallel runners + Docker image (v0.11.10.0) (#360 )

2026-03-23 10:17:33 -07:00

skill-e2e-qa-workflow.test.ts

feat: CI evals on Ubicloud — 12 parallel runners + Docker image (v0.11.10.0) (#360 )

2026-03-23 10:17:33 -07:00

skill-e2e-review-army.test.ts

feat: Review Army — parallel specialist reviewers for /review (v0.14.3.0) (#692 )

2026-03-30 22:07:50 -06:00

skill-e2e-review.test.ts

feat: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver (v0.18.0.0) (#1005 )

2026-04-16 10:41:38 -07:00

skill-e2e-session-intelligence.test.ts

fix(checkpoint): rename /checkpoint → /context-save + /context-restore (v1.0.1.0) (#1064 )

2026-04-19 08:38:19 +08:00

skill-e2e-ship-idempotency.test.ts

v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215 )

2026-04-26 13:55:13 -07:00

skill-e2e-sidebar.test.ts

feat: declarative multi-host platform + OpenCode, Slate, Cursor, OpenClaw (v0.15.5.0) (#793 )

2026-04-04 15:32:20 -07:00

skill-e2e-skillify.test.ts

v1.20.0.0 feat: browser-skills runtime + gbrain-support carryover (#1233 )

2026-04-28 20:08:04 -07:00

skill-e2e-workflow.test.ts

refactor: extract TabSession for per-tab state isolation (v0.15.16.0) (#873 )

2026-04-07 00:23:36 -07:00

skill-e2e.test.ts

feat: recursive self-improvement — operational learning + full skill wiring (v0.13.8.0) (#647 )

2026-03-31 23:08:22 -06:00

skill-llm-eval.test.ts

feat: voice directive for all skills (v0.12.3.0) (#520 )

2026-03-26 17:31:53 -06:00

skill-parser.test.ts

feat: SKILL.md template system, 3-tier testing, DX tools (v0.3.3) (#41 )

2026-03-13 21:08:12 -07:00

skill-routing-e2e.test.ts

feat: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver (v0.18.0.0) (#1005 )

2026-04-16 10:41:38 -07:00

skill-validation.test.ts

v1.24.0.0 feat: cross-platform hardening — curated Windows lane + Bun.which resolver + path-portability helper (#1252 )

2026-05-01 07:21:28 -07:00

taste-engine.test.ts

feat(v1.3.0.0): open agents learnings + cross-model benchmark skill (#1040 )

2026-04-19 17:50:31 +08:00

team-mode.test.ts

feat(v1.5.2.0): Opus 4.7 migration — model overlay, voice, routing (#1117 )

2026-04-22 01:06:22 -07:00

telemetry.test.ts

feat: community wave — 7 fixes, relink, sidebar Write, discoverability (v0.13.5.0) (#641 )

2026-03-29 21:43:36 -06:00

test-free-shards.test.ts

v1.24.0.0 feat: cross-platform hardening — curated Windows lane + Bun.which resolver + path-portability helper (#1252 )

2026-05-01 07:21:28 -07:00

timeline.test.ts

feat: Session Intelligence Layer — /checkpoint + /health + context recovery (v0.15.0.0) (#733 )

2026-04-01 00:50:42 -06:00

touchfiles.test.ts

v1.25.0.0 fix: AskUserQuestion resolves to host MCP variant when native is disallowed (#1287 )

2026-05-01 08:45:36 -07:00

uninstall.test.ts

feat: community PRs — faster install, skill namespacing, uninstall, Codex fallback, Windows fix, Python patterns (v0.12.9.0) (#561 )

2026-03-27 00:44:37 -06:00

upgrade-migration-v1.test.ts

feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039 )

2026-04-18 15:05:42 +08:00

v0-dormancy.test.ts

feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039 )

2026-04-18 15:05:42 +08:00

worktree.test.ts

feat: content security — 4-layer prompt injection defense for pair-agent (#815 )

2026-04-06 14:41:06 -07:00

writing-style-resolver.test.ts

v1.15.0.0 feat: slim preamble + real-PTY plan-mode E2E harness (#1215 )

2026-04-26 13:55:13 -07:00