test: plan-mode handshake E2E coverage and unit assertions

Adds 6 E2E test files and 8 new unit assertions to verify the plan-mode
handshake works end-to-end and stays correct under regeneration.

E2E tests (gate-tier, paid, EVALS=1 EVALS_TIER=gate, except the free harness audit):
- test/skill-e2e-plan-ceo-plan-mode.test.ts — handshake fires before any
  Write/Edit when plan-mode distinctive phrase is present; 2-option shape
  (Exit/Cancel); option A routes to ExitPlanMode cleanly
- test/skill-e2e-plan-eng-plan-mode.test.ts — same contract for plan-eng
- test/skill-e2e-plan-design-plan-mode.test.ts — same contract for
  plan-design; exercises C-cancel branch instead of A-exit
- test/skill-e2e-plan-devex-plan-mode.test.ts — same contract for plan-devex
- test/skill-e2e-plan-mode-no-op.test.ts — negative regression: handshake
  must NOT fire when distinctive phrase is absent; skill proceeds normally
  through Step 0 (REGRESSION RULE guardrail against breaking existing
  interactive-review sessions)
- test/e2e-harness-audit.test.ts — free unit test asserting every
  `interactive: true` skill has at least one canUseTool-using test file
  (prevents future drift where a skill opts in without coverage)
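
The harness audit described in the last bullet can be sketched as a pure function. This is a minimal illustration, not the contents of test/e2e-harness-audit.test.ts: the `SkillMeta` and `TestFile` shapes, the function name, and the substring-based coverage check are all assumptions made for the example.

```typescript
// Hypothetical shapes: the real audit presumably reads skill templates and
// test files from disk; here both are passed in as plain data.
interface SkillMeta {
  name: string;
  interactive: boolean;
}

interface TestFile {
  path: string;
  source: string;
}

/** Names of interactive skills that no canUseTool-using test file mentions. */
function findUncoveredInteractiveSkills(skills: SkillMeta[], tests: TestFile[]): string[] {
  // Only test files that wire up a canUseTool interceptor count as coverage.
  const canUseToolTests = tests.filter((t) => t.source.includes('canUseTool'));
  return skills
    .filter((s) => s.interactive)
    .filter((s) => !canUseToolTests.some((t) => t.source.includes(s.name)))
    .map((s) => s.name);
}

const skills: SkillMeta[] = [
  { name: 'plan-ceo-review', interactive: true },
  { name: 'plan-eng-review', interactive: true },
  { name: 'ship', interactive: false },
];
const tests: TestFile[] = [
  { path: 'test/a.test.ts', source: "runSkill('plan-ceo-review', { canUseTool })" },
  { path: 'test/b.test.ts', source: "runSkill('plan-eng-review')" }, // no interceptor
];

const uncovered = findUncoveredInteractiveSkills(skills, tests);
console.log(uncovered); // [ 'plan-eng-review' ]
```

An audit test would then assert that the returned list is empty, so a skill that opts in to `interactive: true` without a matching test fails the free tier immediately.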

Shared helper test/helpers/plan-mode-handshake-helpers.ts centralizes the
canUseTool interceptor + distinctive-phrase injection so the 4 sibling
E2E tests are thin wiring (~20 LOC each) and can't drift out of sync.
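
As a rough sketch of what such a shared helper might centralize, the code below records tool calls, denies mutating tools, and appends the distinctive phrase to a prompt. It does not reproduce test/helpers/plan-mode-handshake-helpers.ts: the `Decision` type, `createHandshakeRecorder`, `withPlanModePhrase`, and the assumption that the handshake surfaces as an `AskUserQuestion` call are all hypothetical. Only the phrase text is taken from the unit assertions in this commit.

```typescript
// Hypothetical decision shape, loosely modelled on a permission callback result.
type Decision = { behavior: 'allow' } | { behavior: 'deny'; message: string };

interface HandshakeRecorder {
  calls: string[];
  canUseTool: (toolName: string) => Decision;
  handshakeFiredFirst: () => boolean;
}

// Phrase asserted by the unit tests in this commit.
const PLAN_MODE_PHRASE =
  'Plan mode is active. The user indicated that they do not want you to execute yet';

/** Append the distinctive phrase so the skill sees a plan-mode session. */
function withPlanModePhrase(prompt: string): string {
  return `${prompt}\n\n${PLAN_MODE_PHRASE}`;
}

function createHandshakeRecorder(): HandshakeRecorder {
  const calls: string[] = [];
  const MUTATING = new Set(['Write', 'Edit']);
  return {
    calls,
    canUseTool(toolName) {
      calls.push(toolName);
      // Deny mutating tools so a misbehaving skill cannot touch the workspace.
      return MUTATING.has(toolName)
        ? { behavior: 'deny', message: 'blocked by handshake test' }
        : { behavior: 'allow' };
    },
    handshakeFiredFirst() {
      // Assumption: the handshake shows up as an AskUserQuestion tool call.
      const askIdx = calls.indexOf('AskUserQuestion');
      const firstMutation = calls.findIndex((c) => MUTATING.has(c));
      return askIdx !== -1 && (firstMutation === -1 || askIdx < firstMutation);
    },
  };
}

const recorder = createHandshakeRecorder();
recorder.canUseTool('Read');
recorder.canUseTool('AskUserQuestion');
recorder.canUseTool('Write');
console.log(recorder.handshakeFiredFirst()); // true
```

Keeping the ordering check in one place is what lets each sibling test stay at a few lines of wiring: it only supplies the skill name and the expected branch.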

Unit assertions added to test/gen-skill-docs.test.ts:
- handshake section present in all 4 Claude-generated SKILL.md files (one
  test.each case per skill, so this bullet accounts for 4 of the 8)
- handshake section absent from non-interactive Claude skills (ship,
  review, qa, office-hours, codex, retro, cso)
- handshake section absent from non-Claude host outputs (.agents, etc.)
- 0C-bis STOP block present in plan-ceo-review/SKILL.md at correct
  position (between the "Present these approach options" line and
  "### 0D-prelude" header)
- handshake resolver wired BEFORE generateUpgradeCheck in preamble
  composition order

6 new gate-tier entries added to test/helpers/touchfiles.ts so any change
to the handshake resolver, preamble composition, skill templates, question
registry, one-way-door classifier, or agent-sdk-runner fires the relevant
E2E tests. test/touchfiles.test.ts updated for the new selection count
(plan-ceo-review/** now triggers 15 tests, up from 8).
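
Touchfile-based selection of this kind can be sketched as a map from test name to path patterns. The entries, the `scripts/resolvers/**` path, and the simplified `/**` matcher below are illustrative assumptions, not the actual contents of test/helpers/touchfiles.ts.

```typescript
// Hypothetical touchfile map: test name -> patterns whose changes select it.
const TOUCHFILES: Record<string, string[]> = {
  'plan-ceo-plan-mode': ['plan-ceo-review/**', 'scripts/resolvers/**'],
  'plan-mode-no-op': ['plan-ceo-review/**'],
  'unrelated-docs-test': ['docs/**'],
};

// Simplified matcher: supports exact paths and a trailing "/**" only.
function matchesPattern(pattern: string, file: string): boolean {
  if (pattern.endsWith('/**')) {
    return file.startsWith(pattern.slice(0, -2)); // keep the trailing slash
  }
  return pattern === file;
}

/** Names of tests whose patterns match at least one changed file. */
function selectTests(changedFiles: string[]): string[] {
  return Object.entries(TOUCHFILES)
    .filter(([, patterns]) =>
      patterns.some((p) => changedFiles.some((f) => matchesPattern(p, f))),
    )
    .map(([name]) => name);
}

const selected = selectTests(['plan-ceo-review/SKILL.md.tmpl']);
console.log(selected); // [ 'plan-ceo-plan-mode', 'plan-mode-no-op' ]
```

A selection-count assertion like the one updated in test/touchfiles.test.ts would then compare `selected.length` against the expected number for a given changed path.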

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Garry Tan
2026-04-23 23:41:13 -07:00
parent 28b14fbf0c
commit d46a83b2e1
10 changed files with 565 additions and 2 deletions

@@ -2774,3 +2774,93 @@ describe('voice-triggers processing', () => {
    expect(frontmatter).not.toContain('voice-triggers:');
  });
});
describe('plan-mode handshake (interactive: true) resolver', () => {
  const INTERACTIVE_SKILLS = [
    'plan-ceo-review',
    'plan-eng-review',
    'plan-design-review',
    'plan-devex-review',
  ];
  const HANDSHAKE_MARKER = '## Plan Mode Handshake';
  test.each(INTERACTIVE_SKILLS)(
    '%s (Claude host) SKILL.md contains the handshake section',
    (skill) => {
      const content = fs.readFileSync(path.join(ROOT, skill, 'SKILL.md'), 'utf-8');
      expect(content).toContain(HANDSHAKE_MARKER);
      expect(content).toContain(
        'Plan mode is active. The user indicated that they do not want you to execute yet',
      );
    },
  );
  test('handshake is absent from non-interactive Claude skills', () => {
    const nonInteractive = ['ship', 'review', 'qa', 'office-hours', 'codex', 'retro', 'cso'];
    for (const skill of nonInteractive) {
      const content = fs.readFileSync(path.join(ROOT, skill, 'SKILL.md'), 'utf-8');
      expect(content).not.toContain(HANDSHAKE_MARKER);
    }
  });
  test('handshake is absent from non-Claude host outputs when present on disk', () => {
    // Non-Claude hosts render to hostSubdirs (.agents/, .openclaw/, etc). The
    // handshake resolver returns '' when ctx.host !== 'claude', so those
    // outputs must not contain the marker. The current gen-skill-docs layout
    // prefixes skill names as `gstack-<skill>` under the hostSubdir; older
    // layouts used `gstack/<skill>` (no prefix). Only stable-present paths
    // are asserted — older ones may or may not exist per install history.
    const candidateOutputs = [
      // Current prefixed layout
      path.join(ROOT, '.agents', 'skills', 'gstack-plan-ceo-review', 'SKILL.md'),
      path.join(ROOT, '.openclaw', 'skills', 'gstack-plan-ceo-review', 'SKILL.md'),
      path.join(ROOT, '.opencode', 'skills', 'gstack-plan-ceo-review', 'SKILL.md'),
      path.join(ROOT, '.factory', 'skills', 'gstack-plan-ceo-review', 'SKILL.md'),
      path.join(ROOT, '.hermes', 'skills', 'gstack-plan-ceo-review', 'SKILL.md'),
    ];
    let checked = 0;
    for (const out of candidateOutputs) {
      if (fs.existsSync(out)) {
        const content = fs.readFileSync(out, 'utf-8');
        expect(content).not.toContain(HANDSHAKE_MARKER);
        checked++;
      }
    }
    // At least one non-Claude host's output should exist after a full gen
    // run; this test is meaningful only if we checked something. If no
    // non-Claude outputs exist locally, the cross-host guarantee is still
    // enforced by the resolver's ctx.host check; this test is belt-and-
    // suspenders and becomes a no-op rather than a false positive.
    if (checked === 0) {
      // eslint-disable-next-line no-console
      console.warn(
        'plan-mode handshake: no non-Claude host outputs found for cross-host absence check — ' +
          'run `bun run gen:skill-docs --host all` to populate',
      );
    }
  });
  test('0C-bis STOP block present in plan-ceo-review/SKILL.md', () => {
    const content = fs.readFileSync(path.join(ROOT, 'plan-ceo-review', 'SKILL.md'), 'utf-8');
    const presentIdx = content.indexOf('Present these approach options via AskUserQuestion');
    const preludeIdx = content.indexOf('### 0D-prelude');
    expect(presentIdx).toBeGreaterThan(0);
    expect(preludeIdx).toBeGreaterThan(presentIdx);
    const between = content.slice(presentIdx, preludeIdx);
    expect(between).toContain('**STOP.**');
    expect(between).toContain('Do NOT proceed to Step 0D or 0F until the user responds to 0C-bis');
  });
  test('handshake resolver is wired BEFORE generateUpgradeCheck in preamble', () => {
    const content = fs.readFileSync(
      path.join(ROOT, 'plan-ceo-review', 'SKILL.md'),
      'utf-8',
    );
    const handshakeIdx = content.indexOf(HANDSHAKE_MARKER);
    const upgradeIdx = content.indexOf('UPGRADE_AVAILABLE');
    expect(handshakeIdx).toBeGreaterThan(0);
    expect(upgradeIdx).toBeGreaterThan(0);
    expect(handshakeIdx).toBeLessThan(upgradeIdx);
  });
});
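
For context, the resolver behaviour these assertions pin down can be sketched as follows. This is an illustration under stated assumptions, not the resolver shipped in this commit: `ResolverContext`, `generatePlanModeHandshake`, `composePreamble`, and both section bodies are hypothetical. The host guard follows the test comment above (the resolver returns '' when `ctx.host !== 'claude'`), and `generateUpgradeCheck` is named in the commit message, but its body here is a stand-in.

```typescript
// Hypothetical context shape for a preamble resolver.
interface ResolverContext {
  host: string;
  interactive: boolean;
}

const HANDSHAKE_MARKER = '## Plan Mode Handshake';

// Emits the handshake only for interactive skills rendered for the Claude host.
function generatePlanModeHandshake(ctx: ResolverContext): string {
  if (ctx.host !== 'claude' || !ctx.interactive) return '';
  return `${HANDSHAKE_MARKER}\n\n(handshake instructions would go here)`;
}

// Stand-in body: only the UPGRADE_AVAILABLE token matters for the ordering check.
function generateUpgradeCheck(): string {
  return 'If the output shows UPGRADE_AVAILABLE, follow the upgrade steps.';
}

// Handshake first, upgrade check second; empty sections are dropped.
function composePreamble(ctx: ResolverContext): string {
  return [generatePlanModeHandshake(ctx), generateUpgradeCheck()]
    .filter((section) => section.length > 0)
    .join('\n\n');
}

const claudePreamble = composePreamble({ host: 'claude', interactive: true });
const otherHostPreamble = composePreamble({ host: 'codex', interactive: true });

console.log(claudePreamble.indexOf(HANDSHAKE_MARKER) < claudePreamble.indexOf('UPGRADE_AVAILABLE')); // true
console.log(otherHostPreamble.includes(HANDSHAKE_MARKER)); // false
```

In this sketch the marker sits at index 0 of the preamble, so the check uses `indexOf(...) !== -1` style comparisons; the real SKILL.md files have content ahead of the preamble, which is why the assertions above can use `toBeGreaterThan(0)`.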