Calibration run 1 surfaced a second issue beyond the parser bug: the
default pick of 1 on /plan-ceo-review's scope-selection AUQ routes
the agent to "branch diff vs main" — so it reviews the gstack PR
itself (recursive!) instead of the seeded fixture plan we sent.
Added firstAUQPick callback to runPlanSkillCounting. Override applies
only to the FIRST AUQ; subsequent presses keep using defaultPick.
ceoStep0Boundary now fires on either the mode-pick AUQ (existing path)
or any AUQ containing "Skip interview and plan immediately" — which
is the scope-selection AUQ. Picking that option bypasses Step 0 and
routes straight to review-phase using the chat-paste plan as context.
Plan-ceo test wires firstAUQPick = pickSkipInterview which finds the
"Skip interview" option by label. Falls back to "describe inline" if
the option labels change.
Two new unit tests: ceoStep0Boundary fires on the scope-selection
fixture; existing mode-pick fixture still fires.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Calibration run 1 timed out with step0=0 review=0 because the parser
could not find the cursor in /plan-ceo-review's scope-selection AUQ.
The TTY's box-layout rendering inlines divider + header + prompt +
"1." onto one logical line — cursor escapes get stripped, leaving
text crushed onto a single line.
Cursor anchor regex changed from anchored to unanchored so it matches
mid-line. Cursor-line option extraction uses a non-anchored regex;
subsequent options stay with the original start-of-line parser.
parseQuestionPrompt picks up the inline prompt text BEFORE the cursor
on the cursor line (after stripping box-drawing chars + sigil) and
appends it after any walked-up multi-line prompt above.
Three new unit tests: clean-cursor still works, inline-cursor
extracts all 7 options, prompt extraction strips box chars.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure helpers landing ahead of runPlanSkillCounting:
- parseQuestionPrompt(visible) — extract the 1-3 line prompt above
the latest "❯ 1." cursor, normalize to a 240-char snippet
- auqFingerprint(prompt, opts) — Bun.hash of normalized prompt + sorted
options signature; distinct prompts with shared option labels
(the generic A/B/C TODO menu) get distinct fingerprints
- COMPLETION_SUMMARY_RE — terminal-signal regex matching all four
plan-review skills' completion / verdict markers
- assertReviewReportAtBottom(content) — checks "## GSTACK REVIEW
REPORT" is present and is the last "## " heading in a plan file
- Step0BoundaryPredicate type + four per-skill predicates
(ceo / eng / design / devex) — fire on the answered AUQ's
fingerprint, marking the end of Step 0 deterministically
(event-based, not content-based, per Codex F7)
Plus 37 deterministic unit tests covering option-label collision
regression, prompt extraction edge cases, predicate positive AND
negative cases, and review-report-at-bottom triple-check
(missing / mid-file / multiple trailing).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure classifier extracted from runPlanSkillObservation's polling loop so
unit tests can exercise the actual branch order with synthetic input
strings. Runner gains:
- env? passthrough on runPlanSkillObservation (forwarded to launchClaudePty).
gstack-config does not yet honor env overrides; plumbing is in place for a
future change to make tests hermetic.
- TAIL_SCAN_BYTES = 1500 exported constant. Replaces a duplicated magic
number in test/skill-e2e-plan-ceo-mode-routing.test.ts so tuning stays
in sync.
- isPermissionDialogVisible: the bare phrase "Do you want to proceed?" now
requires a file-edit context co-trigger. Other clauses unchanged. Skill
questions that contain the bare phrase are no longer mis-classified.
- classifyVisible(visible): pure function. Branch order silent_write →
plan_ready → asked → null. Permission dialogs filtered out of the
'asked' classification so a permission prompt cannot pose as a Step 0
skill question.
Adds 24 unit tests covering all classifier branches, edge cases, and the
co-trigger contract.