Mirror of https://github.com/garrytan/gstack.git (synced 2026-05-20 11:19:56 +08:00)
feat: granular touchfiles + 2-tier E2E test system (gate/periodic)
- Shrink GLOBAL_TOUCHFILES from 9 to 3 (only truly global deps)
- Move scoped deps (gen-skill-docs, llm-judge, test-server, worktree, codex/gemini session runners) into individual test entries
- Add E2E_TIERS map classifying each test as gate or periodic
- Replace EVALS_FAST with EVALS_TIER env var (gate/periodic)
- Add tier validation test (E2E_TIERS keys must match E2E_TOUCHFILES)
- CI runs only gate tests; periodic tests run weekly via cron
- Add evals-periodic.yml workflow (Monday 6 AM UTC + manual)
- Remove allow_failure flags (gate tests should be reliable)
- Add test:gate and test:periodic scripts, remove test:e2e:fast
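
The tier validation test from the commit message could look roughly like the sketch below. It assumes `E2E_TOUCHFILES` maps test names to dependency patterns and `E2E_TIERS` maps the same names to `"gate"` or `"periodic"`. The helper `findTierMismatches` and the example test names and paths are hypothetical; the actual test file is not shown on this page.

```typescript
// Sketch only: the real maps live in test/helpers/touchfiles.ts; the shapes
// assumed here (name -> patterns, name -> tier) are illustrative.
type Tier = "gate" | "periodic";

interface TierMismatches {
  untiered: string[];    // in E2E_TOUCHFILES but missing from E2E_TIERS
  orphanTiers: string[]; // in E2E_TIERS but missing from E2E_TOUCHFILES
}

// Hypothetical helper: compares the key sets of the two maps.
function findTierMismatches(
  touchfiles: Record<string, string[]>,
  tiers: Record<string, Tier>,
): TierMismatches {
  const touchKeys = new Set(Object.keys(touchfiles));
  const tierKeys = new Set(Object.keys(tiers));
  return {
    untiered: Array.from(touchKeys).filter((name) => !tierKeys.has(name)),
    orphanTiers: Array.from(tierKeys).filter((name) => !touchKeys.has(name)),
  };
}

// Example data; test names and paths are made up.
const E2E_TOUCHFILES: Record<string, string[]> = {
  "browse-smoke": ["browse/**"],
  "codex-review": ["codex/**"],
  "gemini-review": ["gemini/**"],
};
const E2E_TIERS: Record<string, Tier> = {
  "browse-smoke": "gate",
  "codex-review": "periodic",
};

// "gemini-review" has no tier, so mismatches is
// { untiered: ["gemini-review"], orphanTiers: [] }.
const mismatches = findTierMismatches(E2E_TOUCHFILES, E2E_TIERS);
```

In a `bun test` file this would become an assertion that both arrays are empty, so a newly added E2E test fails CI until it is classified.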
CLAUDE.md
```diff
@@ -7,6 +7,8 @@ bun install # install dependencies
 bun test # run free tests (browse + snapshot + skill validation)
 bun run test:evals # run paid evals: LLM judge + E2E (diff-based, ~$4/run max)
 bun run test:evals:all # run ALL paid evals regardless of diff
+bun run test:gate # run gate-tier tests only (CI default, blocks merge)
+bun run test:periodic # run periodic-tier tests only (weekly cron / manual)
 bun run test:e2e # run E2E tests only (diff-based, ~$3.85/run max)
 bun run test:e2e:all # run ALL E2E tests regardless of diff
 bun run eval:select # show which tests would run based on current diff
```
````diff
@@ -29,9 +31,17 @@ against the previous run.
 **Diff-based test selection:** `test:evals` and `test:e2e` auto-select tests based
 on `git diff` against the base branch. Each test declares its file dependencies in
 `test/helpers/touchfiles.ts`. Changes to global touchfiles (session-runner, eval-store,
-llm-judge, gen-skill-docs) trigger all tests. Use `EVALS_ALL=1` or the `:all` script
+touchfiles.ts itself) trigger all tests. Use `EVALS_ALL=1` or the `:all` script
 variants to force all tests. Run `eval:select` to preview which tests would run.
 
+**Two-tier system:** Tests are classified as `gate` or `periodic` in `E2E_TIERS`
+(in `test/helpers/touchfiles.ts`). CI runs only gate tests (`EVALS_TIER=gate`);
+periodic tests run weekly via cron or manually. Use `EVALS_TIER=gate` or
+`EVALS_TIER=periodic` to filter. When adding new E2E tests, classify them:
+1. Safety guardrail or deterministic functional test? -> `gate`
+2. Quality benchmark, Opus model test, or non-deterministic? -> `periodic`
+3. Requires external service (Codex, Gemini)? -> `periodic`
+
 ## Testing
 
 ```bash
````
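The selection rules described in the CLAUDE.md diff (per-test touchfiles, a short global list that triggers every test, `EVALS_ALL=1` to force all, `EVALS_TIER` to filter by tier) can be sketched as a single function. This is not the code in `test/helpers/touchfiles.ts`: the `dir/**` pattern syntax, the helper names `matchesPattern`, `touches` and `selectTests`, every file path except `test/helpers/touchfiles.ts`, and applying the tier filter after diff selection are all assumptions.

```typescript
// Sketch only. Assumptions: patterns are exact paths or "dir/**" prefixes,
// and the tier filter is applied after diff-based selection.
type Tier = "gate" | "periodic";

interface EvalEnv {
  EVALS_ALL?: string;
  EVALS_TIER?: string;
}

// Hypothetical matcher: "dir/**" matches any path under dir/;
// any other pattern must equal the changed path exactly.
function matchesPattern(file: string, pattern: string): boolean {
  if (pattern.endsWith("/**")) {
    return file.startsWith(pattern.slice(0, -2)); // keeps the trailing "/"
  }
  return file === pattern;
}

// True if any changed file matches any of the given patterns.
function touches(changedFiles: string[], patterns: string[]): boolean {
  return changedFiles.some((f) => patterns.some((p) => matchesPattern(f, p)));
}

// Hypothetical selector combining diff-based selection with the tier filter.
function selectTests(
  changedFiles: string[],
  touchfiles: Record<string, string[]>,
  globalTouchfiles: string[],
  tiers: Record<string, Tier>,
  env: EvalEnv,
): string[] {
  let selected = Object.keys(touchfiles);
  // EVALS_ALL=1 or a change to a global touchfile selects every test.
  const runAll =
    env.EVALS_ALL === "1" || touches(changedFiles, globalTouchfiles);
  if (!runAll) {
    selected = selected.filter((name) => touches(changedFiles, touchfiles[name]));
  }
  // EVALS_TIER=gate or EVALS_TIER=periodic narrows the selection further.
  const tier = env.EVALS_TIER;
  if (tier === "gate" || tier === "periodic") {
    selected = selected.filter((name) => tiers[name] === tier);
  }
  return selected;
}

// Example data: test names and most paths are made up.
const GLOBAL_TOUCHFILES = [
  "test/helpers/session-runner.ts",
  "test/helpers/eval-store.ts",
  "test/helpers/touchfiles.ts",
];
const E2E_TOUCHFILES: Record<string, string[]> = {
  "browse-smoke": ["browse/**"],
  "codex-review": ["codex/**", "test/helpers/codex-session-runner.ts"],
};
const E2E_TIERS: Record<string, Tier> = {
  "browse-smoke": "gate",
  "codex-review": "periodic",
};

// A browse change with EVALS_TIER=gate: ciRun is ["browse-smoke"].
const ciRun = selectTests(
  ["browse/src/cli.ts"], E2E_TOUCHFILES, GLOBAL_TOUCHFILES, E2E_TIERS,
  { EVALS_TIER: "gate" },
);

// Editing touchfiles.ts selects every test; EVALS_TIER=periodic then
// keeps only the periodic one: periodicRun is ["codex-review"].
const periodicRun = selectTests(
  ["test/helpers/touchfiles.ts"], E2E_TOUCHFILES, GLOBAL_TOUCHFILES, E2E_TIERS,
  { EVALS_TIER: "periodic" },
);
```

In a real runner the changed-file list would come from `git diff` against the base branch and `env` from `process.env`. Keeping the global list short is what makes the per-test entries matter: a change to a scoped dependency such as a session runner now selects only the tests that declare it.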