claude-pty-runner.ts:
- parseNumberedOptions(visible) anchors on the latest "❯ 1." cursor and
returns {index, label}[]; tests that route on option labels can find
indices without hard-coding positions
- isPermissionDialogVisible(visible) detects file-grant + workspace-trust
+ bash-permission shapes (multiple regex variants)
- isNumberedOptionListVisible: replaced \b2\. word-boundary regex with
[^0-9]2\. — stripAnsi removes TTY cursor-positioning escapes that
collapse "Option 2." to "Option2.", and \b fails on word-to-word
eval-store.ts:
- findBudgetRegressions(comparison, opts?) — pure function returning
tests where tools or turns grew >cap× vs prior run; floors at 5 prior
tools / 3 prior turns to avoid noise on tiny numbers
- assertNoBudgetRegression() — wrapper that throws with full violation
list. Env override GSTACK_BUDGET_RATIO
helpers-unit.test.ts: 23 unit tests covering empty/sparse/wrap-around
buffers for parseNumberedOptions, plus regression-floor + env-override
cases for findBudgetRegressions/assertNoBudgetRegression.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds test/helpers/claude-pty-runner.ts. Spawns the actual claude binary
via Bun.spawn({terminal:}) (Bun 1.3.10+ has built-in PTY — no node-pty,
no native modules), drives it through stdin/stdout, and parses rendered
terminal frames. Pattern adapted from the cc-pty-import branch's
terminal-agent.ts but stripped of WS/cookie/Origin scaffolding (not
needed for headless tests).
Public API:
- launchClaudePty(opts) — boots claude with --permission-mode plan|null,
auto-handles the workspace-trust dialog, returns a session handle.
- session.send / sendKey / waitForAny / waitFor / mark / visibleSince /
visibleText / rawOutput / close
- runPlanSkillObservation({skillName, inPlanMode, timeoutMs}) — high-level
contract for plan-mode skill tests. Returns { outcome, summary, evidence,
elapsedMs }. outcome ∈ {asked, plan_ready, silent_write, exited, timeout}.
Replaces the SDK-based runPlanModeSkillTest from plan-mode-helpers.ts
which never worked. Plan mode renders its native "Ready to execute"
confirmation as TTY UI (numbered options with ❯ cursor), not via the
AskUserQuestion tool — so the SDK's canUseTool interceptor never fired
and the assertion always saw zero questions. Real PTY observes the
rendered output directly.
Deletes test/helpers/plan-mode-helpers.ts. No production callers remained.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>