chore: bump version and changelog (v1.26.2.0)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 13:39:45 +08:00 · 2026-05-03 20:08:41 -07:00
parent 9df3e140d5
commit e213c10da8
3 changed files with 53 additions and 2 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,56 @@
 # Changelog

+## [1.26.2.0] - 2026-05-03
+
+## **`/plan-eng-review` always asks. Never silently writes findings to your plan first.**
+
+Plan-mode review skills now have a hard STOP gate before any AskUserQuestion. The bug
+this closes: a `/plan-eng-review` session would do Step 0 scope challenge, find real
+issues, write the findings into the plan file as prose, then call `ExitPlanMode` —
+never invoking AskUserQuestion. The user only saw "ready to execute" with the model's
+opinions already baked in. The tool to surface the question existed, the prompt
+told the model to use it, and the model still routed around it.
+
+Five sites in `plan-eng-review/SKILL.md.tmpl` now use the office-hours `b512be71`
+pattern verbatim: "the AskUserQuestion call is a tool_use, not prose — call the
+tool directly," named blockers ("do not edit the plan file, do not call
+ExitPlanMode"), and an anti-rationalization clause ("loading the schema via
+ToolSearch and writing the recommendation as chat prose is the failure mode this
+gate exists to prevent"). The four review-section gates (Architecture, Code
+Quality, Test, Performance) and the Step 0 complexity-check trigger all use the
+same language.
+
+### What you can now do
+
+- **Trust that any plan-* review skill that produces a plan file ends with the review report.** All four plan-mode E2E tests (`plan-eng`, `plan-ceo`, `plan-design`, `plan-devex`) now assert `## GSTACK REVIEW REPORT` is the last `## ` section of the plan file whenever one was written. The `{{PLAN_FILE_REVIEW_REPORT}}` resolver mandated this contract; nothing tested it until now.
+- **Seed a draft plan into a `runPlanSkillObservation` test run.** New `initialPlanContent` option in the runner pre-pumps a user message before invoking the skill, so STOP-gate regression tests have guaranteed-finding-triggering complexity (8+ files, custom-vs-builtin smell) to react to. Without this, plan mode created an empty plan and the skill had nothing to find.
+- **Catch the "writes findings to plan as prose before asking" failure mode.** New `wrote_findings_before_asking` classifier outcome fires when a `Write`/`Edit` to `.claude/plans/*` precedes any AskUserQuestion render in the session window. Opt-in via `strictPlanWrites: true` so existing tests where zero-findings → write plan → plan_ready stays legitimate.
+- **Run `plan-design-review-plan-mode` on PR CI again.** The touchfiles entry was duplicated — `plan-design-review-plan-mode` appeared at line 94 (gate, full deps) and line 243 (smaller deps). JS object literals: later wins. The effective tier was `periodic`, not `gate`. Three of four plan-mode siblings ran on every PR; design didn't.
+
+### Itemized changes
+
+#### Added
+
+- `runPlanSkillObservation`'s `initialPlanContent?: string` option. Pre-pumps a user message containing the seeded plan before invoking the skill, with a 3s gap so the message renders before the slash command.
+- `ClassifyResult` outcome `wrote_findings_before_asking` with companion `strictPlanWrites?` opt on `classifyVisible`. Six new unit tests in `claude-pty-runner.unit.test.ts` cover before/after-AUQ ordering plus the strict-off legacy path.
+- Shared test helper `assertReportAtBottomIfPlanWritten(obs)` in `claude-pty-runner.ts`. Wraps the existing `assertReviewReportAtBottom(content)` and gates on `obs.planFile` (artifact existing), so the assertion fires under `'asked'` and `'plan_ready'` both — wherever a plan file was actually written.
+- New seeded-plan test case in `skill-e2e-plan-eng-plan-mode.test.ts`: `STOP gate fires when seeded plan forces Step 0 findings`. Combines `initialPlanContent` + `--disallowedTools AskUserQuestion` to force the Conductor MCP-variant path through `mcp__*__AskUserQuestion`.
+
+#### Changed
+
+- `plan-eng-review/SKILL.md.tmpl` lines 116, 139, 152, 160, 169 ported from soft "STOP." prose to the office-hours pattern. Adds tool_use reminder, names blocked next steps explicitly, anti-rationalization clause.
+- `runPlanSkillObservation` now captures `obs.planFile` on every classifier outcome (was: only `'plan_ready'`). Catches the case where the skill wrote a plan partway through then paused on a question.
+
+#### Fixed
+
+- `test/helpers/touchfiles.ts` duplicate `plan-design-review-plan-mode` keys deleted (line 243 in `E2E_TOUCHFILES`, line 524 in `E2E_TIERS`). Effective tier is now `gate` again, matching the other three siblings.
+- `scripts/resolvers/review.ts` added to all four plan-mode-test touchfiles entries so changes to the `{{PLAN_FILE_REVIEW_REPORT}}` resolver text trigger all four sibling tests in `bun run eval:select`.
+
+#### For contributors
+
+- 6 new classifier unit tests in `test/helpers/claude-pty-runner.unit.test.ts` (70 → 76).
+- Outside-voice review by Codex (gpt-5-codex / medium) caught three blockers in the planning stage: a duplicate weaker `planFileEndsWithReviewReport` helper proposal (rejected — reuse existing `assertReviewReportAtBottom`), the seeded-plan-invisibility issue (resolved via `initialPlanContent`), and the touchfiles duplicate-key bug (resolved here). Plan iteration tracked as D3/D4/D5 in `~/.claude/plans/system-instruction-you-are-working-fancy-aho.md`.
+
 ## [1.26.0.0] - 2026-05-02

 ## **Your coding agent now remembers everything. Every gstack skill auto-loads what you actually did.**
--- a/2
+++ b/2
@@ -1 +1 @@
-1.26.0.0
+1.26.2.0
--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
 {
  "name": "gstack",
-  "version": "1.26.0.0",
+  "version": "1.26.2.0",
  "description": "Garry's Stack — Claude Code skills + fast headless browser. One repo, one install, entire AI engineering workflow.",
  "license": "MIT",
  "type": "module",
@@ -1 +1 @@
 .26.0.0
 .26.2.0