feat(test/helpers): initialPlanContent + wrote_findings_before_asking + shared report-at-bottom assertion

Three additions to claude-pty-runner.ts:

1. runPlanSkillObservation gains initialPlanContent?: string. Pre-pumps a
   user message containing the seeded plan before invoking the skill, with
   a 3s gap so the message renders before the slash command. claude has no
   --plan-file flag (verified via claude --help), so message-pump is the
   route. Lets STOP-gate regression tests force complexity findings.

2. ClassifyResult gains wrote_findings_before_asking with companion
   strictPlanWrites?: boolean opt on classifyVisible. Fires when a Write/
   Edit to .claude/plans/* precedes any AskUserQuestion render in the
   session window. Default off — preserves zero-findings → write plan →
   plan_ready as legitimate for unseeded smokes. Six new unit tests cover
   before/after-AUQ ordering, permission-dialog edge case, strict-off path.

3. assertReportAtBottomIfPlanWritten(obs) shared helper. Wraps the existing
   assertReviewReportAtBottom(content) and gates on obs.planFile (artifact
   existing), so the assertion fires under both 'asked' and 'plan_ready'
   when a plan was actually written.

Also: runPlanSkillObservation now captures obs.planFile on every classifier
outcome, not just 'plan_ready'. Catches the case where the skill wrote a
plan partway through then paused on a question.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-05-03 20:08:15 -07:00
parent 25ac219e98
commit ece2650d4f
2 changed files with 202 additions and 9 deletions

View File

@@ -295,6 +295,78 @@ describe('classifyVisible (runtime path through the runner classifier)', () => {
// recent-tail window would surface here.
expect(TAIL_SCAN_BYTES).toBe(1500);
});
// D4-B: strictPlanWrites detector. Catches the transcript bug where the
// model writes findings to the plan file before any AskUserQuestion fires.
test('strictPlanWrites: plan write before any AUQ → wrote_findings_before_asking', () => {
const visible = `
⏺ Edit(/Users/me/.claude/plans/some-plan.md)
⎿ Updated 12 lines
`;
const result = classifyVisible(visible, { strictPlanWrites: true });
expect(result?.outcome).toBe('wrote_findings_before_asking');
expect(result?.summary).toContain('.claude/plans/some-plan.md');
});
test('strictPlanWrites: plan write AFTER an AUQ render → not flagged', () => {
// AUQ renders first, then the model writes the plan post-answer. This is
// the legitimate end-of-workflow flow and must NOT trigger the detector.
const visible = `
D1 — Some scope question
1. Option A
2. Option B
⏺ Edit(/Users/me/.claude/plans/some-plan.md)
⎿ Updated 12 lines
`;
const result = classifyVisible(visible, { strictPlanWrites: true });
// Outcome is 'asked' (the numbered list rendered); the post-AUQ plan
// write is ignored by the detector.
expect(result?.outcome).toBe('asked');
});
test('strictPlanWrites: AUQ first then plan write — write_pos > auq_pos → not flagged', () => {
// Same scenario, more explicit ordering: the regex finds the write at a
// position AFTER the numbered list. Detector lets it through.
const visible = [
'D1 — Choose your approach',
'',
' 1. Approach A',
' 2. Approach B',
'',
'⏺ Write(/Users/me/.claude/plans/draft.md)',
'⎿ Wrote 42 lines',
].join('\n');
const result = classifyVisible(visible, { strictPlanWrites: true });
expect(result?.outcome).toBe('asked');
});
test('strictPlanWrites: only a permission dialog visible → plan write still flagged', () => {
// A permission dialog 1./2. is NOT an AUQ; pre-AUQ plan writes still
// hit the detector even when a permission prompt is on screen.
const visible = `
⏺ Edit(/Users/me/.claude/plans/some-plan.md)
Edit to /Users/me/.claude/plans/some-plan.md
Do you want to proceed?
1. Yes
2. No
`;
const result = classifyVisible(visible, { strictPlanWrites: true });
expect(result?.outcome).toBe('wrote_findings_before_asking');
});
test('strictPlanWrites OFF: plan write before AUQ → returns null (legacy behavior preserved)', () => {
const visible = `
⏺ Edit(/Users/me/.claude/plans/some-plan.md)
⎿ Updated 12 lines
`;
// Without strictPlanWrites, the sanctioned-path list lets this through.
expect(classifyVisible(visible)).toBeNull();
});
});
describe('parseNumberedOptions', () => {