chore: bump version and changelog (v0.4.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-03-15 21:17:15 -05:00
parent 03a6270b9c
commit 210e1b1f25
5 changed files with 56 additions and 5 deletions

View File

@@ -1,5 +1,30 @@
# Changelog # Changelog
## 0.4.0 — 2026-03-16
### Added
- **QA-only skill** (`/qa-only`) — report-only QA mode that finds and documents bugs without making fixes. Uses `allowedTools` to block `Edit` tool entirely.
- **QA fix loop** — `/qa` now runs a find-fix-verify cycle: discover bugs, fix them, commit, re-navigate to confirm the fix took.
- **Plan-to-QA artifact flow** — `/plan-eng-review` writes test-plan artifacts to `~/.gstack/projects/<slug>/` that `/qa` picks up for targeted testing.
- **`{{QA_METHODOLOGY}}` DRY placeholder** — shared QA methodology block injected into both `/qa` and `/qa-only` SKILL.md templates via gen-skill-docs.
- **Eval efficiency metrics** — turns, duration, and cost now displayed across all eval surfaces (summary, comparison, list, watch). Comparison output includes natural-language **Takeaway** commentary interpreting deltas.
- **`generateCommentary()` engine** — pure function that interprets comparison deltas: flags regressions, notes improvements, reports per-test efficiency changes, and produces overall summary.
- **Eval list columns** — `bun run eval:list` now shows Turns and Duration per run.
- **Eval summary per-test efficiency** — `bun run eval:summary` shows average turns/duration/cost per test across runs.
- **`judgePassed()` unit tests** — extracted and tested the pass/fail judgment logic.
- **3 new E2E tests** — qa-only no-fix guardrail, qa fix loop with commit verification, plan-eng-review test-plan artifact.
- **Browser ref staleness detection** — `resolveRef()` now checks element count to detect stale refs after page mutations.
- 3 new snapshot tests for ref staleness.
### Changed
- QA skill prompt restructured with explicit two-cycle workflow (find → fix → verify).
- `formatComparison()` now shows per-test turns and duration deltas alongside cost.
- `printSummary()` shows turns and duration columns.
- `eval-store.test.ts` fixed pre-existing `_partial` file assertion bug.
### Fixed
- Browser ref staleness — refs collected before page mutation (e.g. SPA navigation) are now detected and re-collected.
## 0.3.9 — 2026-03-15 ## 0.3.9 — 2026-03-15
### Added ### Added

View File

@@ -42,6 +42,7 @@ gstack/
│ ├── gen-skill-docs.test.ts # Tier 1: generator quality (free, <1s) │ ├── gen-skill-docs.test.ts # Tier 1: generator quality (free, <1s)
│ ├── skill-llm-eval.test.ts # Tier 3: LLM-as-judge (~$0.15/run) │ ├── skill-llm-eval.test.ts # Tier 3: LLM-as-judge (~$0.15/run)
│ └── skill-e2e.test.ts # Tier 2: E2E via claude -p (~$3.85/run) │ └── skill-e2e.test.ts # Tier 2: E2E via claude -p (~$3.85/run)
├── qa-only/ # /qa-only skill (report-only QA, no fixes)
├── ship/ # Ship workflow skill ├── ship/ # Ship workflow skill
├── review/ # PR review skill ├── review/ # PR review skill
├── plan-ceo-review/ # /plan-ceo-review skill ├── plan-ceo-review/ # /plan-ceo-review skill

View File

@@ -2,7 +2,7 @@
**gstack turns Claude Code from one generic assistant into a team of specialists you can summon on demand.** **gstack turns Claude Code from one generic assistant into a team of specialists you can summon on demand.**
Eight opinionated workflow skills for [Claude Code](https://docs.anthropic.com/en/docs/claude-code). Plan review, code review, one-command shipping, browser automation, QA testing, and engineering retrospectives — all as slash commands. Nine opinionated workflow skills for [Claude Code](https://docs.anthropic.com/en/docs/claude-code). Plan review, code review, one-command shipping, browser automation, QA testing, and engineering retrospectives — all as slash commands.
### Without gstack ### Without gstack
@@ -22,7 +22,8 @@ Eight opinionated workflow skills for [Claude Code](https://docs.anthropic.com/e
| `/review` | Paranoid staff engineer | Find the bugs that pass CI but blow up in production. Triages Greptile review comments. | | `/review` | Paranoid staff engineer | Find the bugs that pass CI but blow up in production. Triages Greptile review comments. |
| `/ship` | Release engineer | Sync main, run tests, resolve Greptile reviews, push, open PR. For a ready branch, not for deciding what to build. | | `/ship` | Release engineer | Sync main, run tests, resolve Greptile reviews, push, open PR. For a ready branch, not for deciding what to build. |
| `/browse` | QA engineer | Give the agent eyes. It logs in, clicks through your app, takes screenshots, catches breakage. Full QA pass in 60 seconds. | | `/browse` | QA engineer | Give the agent eyes. It logs in, clicks through your app, takes screenshots, catches breakage. Full QA pass in 60 seconds. |
| `/qa` | QA lead | Systematic QA testing. On a feature branch, auto-analyzes your diff, identifies affected pages, and tests them. Also: full exploration, quick smoke test, regression mode. | | `/qa` | QA + fix engineer | Test app, find bugs, fix them with atomic commits, re-verify. Before/after health scores and ship-readiness summary. Three tiers: Quick, Standard, Exhaustive. |
| `/qa-only` | QA reporter | Report-only QA testing. Same methodology as /qa but never fixes anything. Use when you want a pure bug report without code changes. |
| `/setup-browser-cookies` | Session manager | Import cookies from your real browser (Comet, Chrome, Arc, Brave, Edge) into the headless session. Test authenticated pages without logging in manually. | | `/setup-browser-cookies` | Session manager | Import cookies from your real browser (Comet, Chrome, Arc, Brave, Edge) into the headless session. Test authenticated pages without logging in manually. |
| `/retro` | Engineering manager | Team-aware retro: your deep-dive + per-person praise and growth opportunities for every contributor. | | `/retro` | Engineering manager | Team-aware retro: your deep-dive + per-person praise and growth opportunities for every contributor. |
@@ -103,7 +104,7 @@ This is the setup I use. One person, ten parallel agents, each with the right co
Open Claude Code and paste this. Claude will do the rest. Open Claude Code and paste this. Claude will do the rest.
> Install gstack: run `git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup` then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, and lists the available skills: /plan-ceo-review, /plan-eng-review, /review, /ship, /browse, /qa, /setup-browser-cookies, /retro. Then ask the user if they also want to add gstack to the current project so teammates get it. > Install gstack: run `git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup` then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, and lists the available skills: /plan-ceo-review, /plan-eng-review, /review, /ship, /browse, /qa, /qa-only, /setup-browser-cookies, /retro. Then ask the user if they also want to add gstack to the current project so teammates get it.
### Step 2: Add to your repo so teammates get it (optional) ### Step 2: Add to your repo so teammates get it (optional)
@@ -613,7 +614,7 @@ Or set `auto_upgrade: true` in `~/.gstack/config.yaml` to upgrade automatically
Paste this into Claude Code: Paste this into Claude Code:
> Uninstall gstack: remove the skill symlinks by running `for s in browse plan-ceo-review plan-eng-review review ship retro qa setup-browser-cookies; do rm -f ~/.claude/skills/$s; done` then run `rm -rf ~/.claude/skills/gstack` and remove the gstack section from CLAUDE.md. If this project also has gstack at .claude/skills/gstack, remove it by running `for s in browse plan-ceo-review plan-eng-review review ship retro qa setup-browser-cookies; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack` and remove the gstack section from the project CLAUDE.md too. > Uninstall gstack: remove the skill symlinks by running `for s in browse plan-ceo-review plan-eng-review review ship retro qa qa-only setup-browser-cookies; do rm -f ~/.claude/skills/$s; done` then run `rm -rf ~/.claude/skills/gstack` and remove the gstack section from CLAUDE.md. If this project also has gstack at .claude/skills/gstack, remove it by running `for s in browse plan-ceo-review plan-eng-review review ship retro qa setup-browser-cookies; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack` and remove the gstack section from the project CLAUDE.md too.
## Development ## Development

View File

@@ -350,6 +350,30 @@
**Priority:** P3 **Priority:** P3
**Depends on:** Eval persistence (shipped in v0.3.6) **Depends on:** Eval persistence (shipped in v0.3.6)
### CI/CD QA quality gate
**What:** Run `/qa` as a GitHub Action step, fail PR if health score drops below threshold.
**Why:** Automated quality gate catches regressions before merge. Currently QA is manual — CI integration makes it part of the standard workflow.
**Context:** Requires headless browse binary available in CI. The `/qa` skill already produces `baseline.json` with health scores — CI step would compare against the main branch baseline and fail if score drops. Would need `ANTHROPIC_API_KEY` in CI secrets since `/qa` uses Claude.
**Effort:** M
**Priority:** P2
**Depends on:** None
### CDP-based DOM mutation detection for ref staleness
**What:** Use Chrome DevTools Protocol `DOM.documentUpdated` / MutationObserver events to proactively invalidate stale refs when the DOM changes, without requiring an explicit `snapshot` call.
**Why:** Current ref staleness detection (async count() check) only catches stale refs at action time. CDP mutation detection would proactively warn when refs become stale, preventing the 5-second timeout entirely for SPA re-renders.
**Context:** Parts 1+2 of ref staleness fix (RefEntry metadata + eager validation via count()) are shipped. This is Part 3 — the most ambitious piece. Requires CDP session alongside Playwright, MutationObserver bridge, and careful performance tuning to avoid overhead on every DOM change.
**Effort:** L
**Priority:** P3
**Depends on:** Ref staleness Parts 1+2 (shipped)
## Completed ## Completed
### Phase 1: Foundations (v0.2.0) ### Phase 1: Foundations (v0.2.0)

View File

@@ -1 +1 @@
0.3.9 0.4.0