Adversarial review (during /ship Step 11) found that the previous gate-test
envelope ['asked', 'plan_ready'] for the AskUserQuestion-blocked regression
cases accepted the bug they exist to catch: a model that silently skips
Step 0 entirely (writes a plan with no questions, no `## Decisions to
confirm` section, just ExitPlanModes) reaches plan_ready and passes.
The fix tightens the contract in two layers:
1. Harness: PlanSkillObservation gains a `planFile?: string` field
populated when outcome is plan_ready. extractPlanFilePath() walks the
visible TTY buffer for "Plan saved to:", "Plan file:", or
".claude/plans/<name>.md" patterns and resolves tilde to absolute.
planFileHasDecisionsSection() reads the resolved file and returns true
if it contains a `## Decisions` heading (any form: "to confirm",
"needed", etc.).
2. Tests: 5 of 6 regression cases now require, when outcome is plan_ready,
that obs.planFile is set AND planFileHasDecisionsSection returns true.
Otherwise the test fails with a "Step 0 was silently skipped" diagnosis.
plan-design-review remains the sole exception — it legitimately
short-circuits to plan_ready on no-UI-scope branches and we have no
deterministic way to distinguish that from a silent skip.
This closes the loophole the adversarial review identified. The fix
preamble flow already tells the model to write `## Decisions to confirm`
when neither AUQ variant is callable — now the test verifies the model
actually did it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>