mirror of https://github.com/garrytan/gstack.git (synced 2026-05-21 03:40:00 +08:00)
fix: rewrite session-runner to claude -p subprocess, lower flaky baselines
Session runner now spawns `claude -p` as a subprocess instead of using Agent SDK query(), which fixes E2E tests hanging inside Claude Code. Also lowers command_reference completeness baseline to 3 (flaky oscillation), adds test:e2e script, and updates CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
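The diff below notes that the session runner strips CLAUDE* env vars before launching the subprocess. The following is a minimal sketch of that filtering step, not the actual gstack implementation: the helper name `stripClaudeEnv`, the prefix-match rule, and the variable names in the example are all assumptions made for illustration.

```typescript
// Hypothetical helper: the name and the exact matching rule are assumptions,
// not taken from the gstack source.
type Env = Record<string, string | undefined>;

function stripClaudeEnv(env: Env): Record<string, string> {
  const cleaned: Record<string, string> = {};
  for (const [key, value] of Object.entries(env)) {
    // Drop unset entries and any variable whose name starts with "CLAUDE".
    if (value === undefined || key.startsWith("CLAUDE")) continue;
    cleaned[key] = value;
  }
  return cleaned;
}

// Illustrative input; the CLAUDE-prefixed names are examples only.
const example = stripClaudeEnv({
  PATH: "/usr/bin",
  ANTHROPIC_API_KEY: "sk-placeholder",
  CLAUDECODE: "1",
  CLAUDE_CODE_ENTRYPOINT: "cli",
});
console.log(Object.keys(example)); // [ 'PATH', 'ANTHROPIC_API_KEY' ]
```

The cleaned object could then be passed as the `env` option when spawning `claude -p`, so the child process does not inherit the parent session's CLAUDE-prefixed variables while still receiving `ANTHROPIC_API_KEY`.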
@@ -6,6 +6,7 @@
bun install              # install dependencies
bun test                 # run free tests (browse + snapshot + skill validation)
bun run test:evals       # run paid evals: LLM judge + Agent SDK E2E (~$4/run)
bun run test:e2e         # run Agent SDK E2E tests only (~$3.85/run)
bun run dev <cmd>        # run CLI in dev mode, e.g. bun run dev goto https://example.com
bun run build            # gen docs + compile binaries
bun run gen:skill-docs   # regenerate SKILL.md files from templates
@@ -16,6 +17,9 @@ bun run dev:skill # watch mode: auto-regen + validate on change
`test:evals` requires `ANTHROPIC_API_KEY` and must be run from a plain terminal
(not inside Claude Code — nested Agent SDK sessions hang).

**Update (v0.3.5):** The session runner now strips CLAUDE* env vars automatically,
so `test:evals` may work inside Claude Code. If E2E tests hang, run from a plain terminal.

## Project structure

```