diff --git a/CHANGELOG.md b/CHANGELOG.md index 6d00e48a..2b3af621 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,56 @@ # Changelog +## [1.26.2.0] - 2026-05-03 + +## **`/plan-eng-review` always asks. Never silently writes findings to your plan first.** + +Plan-mode review skills now have a hard STOP gate before any AskUserQuestion. The bug +this closes: a `/plan-eng-review` session would do Step 0 scope challenge, find real +issues, write the findings into the plan file as prose, then call `ExitPlanMode` — +never invoking AskUserQuestion. The user only saw "ready to execute" with the model's +opinions already baked in. The tool to surface the question existed, the prompt +told the model to use it, and the model still routed around it. + +Five sites in `plan-eng-review/SKILL.md.tmpl` now use the office-hours `b512be71` +pattern verbatim: "the AskUserQuestion call is a tool_use, not prose — call the +tool directly," named blockers ("do not edit the plan file, do not call +ExitPlanMode"), and an anti-rationalization clause ("loading the schema via +ToolSearch and writing the recommendation as chat prose is the failure mode this +gate exists to prevent"). The four review-section gates (Architecture, Code +Quality, Test, Performance) and the Step 0 complexity-check trigger all use the +same language. + +### What you can now do + +- **Trust that any plan-* review skill that produces a plan file ends with the review report.** All four plan-mode E2E tests (`plan-eng`, `plan-ceo`, `plan-design`, `plan-devex`) now assert `## GSTACK REVIEW REPORT` is the last `## ` section of the plan file whenever one was written. The `{{PLAN_FILE_REVIEW_REPORT}}` resolver mandated this contract; nothing tested it until now. +- **Seed a draft plan into a `runPlanSkillObservation` test run.** New `initialPlanContent` option in the runner pre-pumps a user message before invoking the skill, so STOP-gate regression tests have guaranteed-finding-triggering complexity (8+ files, custom-vs-builtin smell) to react to. Without this, plan mode created an empty plan and the skill had nothing to find. +- **Catch the "writes findings to plan as prose before asking" failure mode.** New `wrote_findings_before_asking` classifier outcome fires when a `Write`/`Edit` to `.claude/plans/*` precedes any AskUserQuestion render in the session window. Opt-in via `strictPlanWrites: true` so existing tests where zero-findings → write plan → plan_ready stays legitimate. +- **Run `plan-design-review-plan-mode` on PR CI again.** The touchfiles entry was duplicated — `plan-design-review-plan-mode` appeared at line 94 (gate, full deps) and line 243 (smaller deps). JS object literals: later wins. The effective tier was `periodic`, not `gate`. Three of four plan-mode siblings ran on every PR; design didn't. + +### Itemized changes + +#### Added + +- `runPlanSkillObservation`'s `initialPlanContent?: string` option. Pre-pumps a user message containing the seeded plan before invoking the skill, with a 3s gap so the message renders before the slash command. +- `ClassifyResult` outcome `wrote_findings_before_asking` with companion `strictPlanWrites?` opt on `classifyVisible`. Six new unit tests in `claude-pty-runner.unit.test.ts` cover before/after-AUQ ordering plus the strict-off legacy path. +- Shared test helper `assertReportAtBottomIfPlanWritten(obs)` in `claude-pty-runner.ts`. Wraps the existing `assertReviewReportAtBottom(content)` and gates on `obs.planFile` (artifact existing), so the assertion fires under `'asked'` and `'plan_ready'` both — wherever a plan file was actually written. +- New seeded-plan test case in `skill-e2e-plan-eng-plan-mode.test.ts`: `STOP gate fires when seeded plan forces Step 0 findings`. Combines `initialPlanContent` + `--disallowedTools AskUserQuestion` to force the Conductor MCP-variant path through `mcp__*__AskUserQuestion`. + +#### Changed + +- `plan-eng-review/SKILL.md.tmpl` lines 116, 139, 152, 160, 169 ported from soft "STOP." prose to the office-hours pattern. Adds tool_use reminder, names blocked next steps explicitly, anti-rationalization clause. +- `runPlanSkillObservation` now captures `obs.planFile` on every classifier outcome (was: only `'plan_ready'`). Catches the case where the skill wrote a plan partway through then paused on a question. + +#### Fixed + +- `test/helpers/touchfiles.ts` duplicate `plan-design-review-plan-mode` keys deleted (line 243 in `E2E_TOUCHFILES`, line 524 in `E2E_TIERS`). Effective tier is now `gate` again, matching the other three siblings. +- `scripts/resolvers/review.ts` added to all four plan-mode-test touchfiles entries so changes to the `{{PLAN_FILE_REVIEW_REPORT}}` resolver text trigger all four sibling tests in `bun run eval:select`. + +#### For contributors + +- 6 new classifier unit tests in `test/helpers/claude-pty-runner.unit.test.ts` (70 → 76). +- Outside-voice review by Codex (gpt-5-codex / medium) caught three blockers in the planning stage: a duplicate weaker `planFileEndsWithReviewReport` helper proposal (rejected — reuse existing `assertReviewReportAtBottom`), the seeded-plan-invisibility issue (resolved via `initialPlanContent`), and the touchfiles duplicate-key bug (resolved here). Plan iteration tracked as D3/D4/D5 in `~/.claude/plans/system-instruction-you-are-working-fancy-aho.md`. + ## [1.26.0.0] - 2026-05-02 ## **Your coding agent now remembers everything. Every gstack skill auto-loads what you actually did.** diff --git a/VERSION b/VERSION index ed66fe8a..75334e9d 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -1.26.0.0 +1.26.2.0 diff --git a/package.json b/package.json index bd238a24..3fa1d216 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "gstack", - "version": "1.26.0.0", + "version": "1.26.2.0", "description": "Garry's Stack — Claude Code skills + fast headless browser. One repo, one install, entire AI engineering workflow.", "license": "MIT", "type": "module",