Merge branch 'main' into garrytan/team-supabase-store

Brings in 55 commits from main (v0.12.x–v0.13.5.0): Factory Droid compat,
prompt injection defense, user sovereignty, security audit, design binary,
skill namespacing, modular resolvers, Chrome sidebar, and more.

Conflict resolution:
- .agents/ SKILL.md files: deleted (main moved to .factory/)
- 8 .tmpl templates: accepted main (new features: CDP mode, design tools,
  global retro, parallelization, distribution checks, plan audits)
- scripts/gen-skill-docs.ts: accepted main's modular resolver refactor
- test/helpers/session-runner.ts: accepted main + layered back CostEntry
  tracking from team branch
- Generated SKILL.md files: regenerated via bun run gen:skill-docs
- Updated tests to match main's gstack-slug output (2 lines, no PROJECTS_DIR)
  and review log mechanism (gstack-review-log, not $BRANCH.jsonl)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-03-29 15:12:12 -07:00
267 changed files with 60292 additions and 12207 deletions

121
CLAUDE.md
View File

@@ -7,6 +7,8 @@ bun install # install dependencies
bun test # run free tests (browse + snapshot + skill validation)
bun run test:evals # run paid evals: LLM judge + E2E (diff-based, ~$4/run max)
bun run test:evals:all # run ALL paid evals regardless of diff
bun run test:gate # run gate-tier tests only (CI default, blocks merge)
bun run test:periodic # run periodic-tier tests only (weekly cron / manual)
bun run test:e2e # run E2E tests only (diff-based, ~$3.85/run max)
bun run test:e2e:all # run ALL E2E tests regardless of diff
bun run eval:select # show which tests would run based on current diff
@@ -30,9 +32,17 @@ against the previous run.
**Diff-based test selection:** `test:evals` and `test:e2e` auto-select tests based
on `git diff` against the base branch. Each test declares its file dependencies in
`test/helpers/touchfiles.ts`. Changes to global touchfiles (session-runner, eval-store,
llm-judge, gen-skill-docs) trigger all tests. Use `EVALS_ALL=1` or the `:all` script
touchfiles.ts itself) trigger all tests. Use `EVALS_ALL=1` or the `:all` script
variants to force all tests. Run `eval:select` to preview which tests would run.
**Two-tier system:** Tests are classified as `gate` or `periodic` in `E2E_TIERS`
(in `test/helpers/touchfiles.ts`). CI runs only gate tests (`EVALS_TIER=gate`);
periodic tests run weekly via cron or manually. Use `EVALS_TIER=gate` or
`EVALS_TIER=periodic` to filter. When adding new E2E tests, classify them:
1. Safety guardrail or deterministic functional test? -> `gate`
2. Quality benchmark, Opus model test, or non-deterministic? -> `periodic`
3. Requires external service (Codex, Gemini)? -> `periodic`
## Testing
```bash
@@ -56,6 +66,7 @@ gstack/
│ └── dist/ # Compiled binary
├── scripts/ # Build + DX tooling
│ ├── gen-skill-docs.ts # Template → SKILL.md generator
│ ├── resolvers/ # Template resolver modules (preamble, design, review, etc.)
│ ├── skill-check.ts # Health dashboard
│ └── dev-skill.ts # Watch mode
├── test/ # Skill validation + eval tests
@@ -72,10 +83,31 @@ gstack/
├── review/ # PR review skill
├── plan-ceo-review/ # /plan-ceo-review skill
├── plan-eng-review/ # /plan-eng-review skill
├── autoplan/ # /autoplan skill (auto-review pipeline: CEO → design → eng)
├── benchmark/ # /benchmark skill (performance regression detection)
├── canary/ # /canary skill (post-deploy monitoring loop)
├── codex/ # /codex skill (multi-AI second opinion via OpenAI Codex CLI)
├── land-and-deploy/ # /land-and-deploy skill (merge → deploy → canary verify)
├── office-hours/ # /office-hours skill (YC Office Hours — startup diagnostic + builder brainstorm)
├── investigate/ # /investigate skill (systematic root-cause debugging)
├── retro/ # Retrospective skill
├── retro/ # Retrospective skill (includes /retro global cross-project mode)
├── bin/ # CLI utilities (gstack-repo-mode, gstack-slug, gstack-config, etc.)
├── document-release/ # /document-release skill (post-ship doc updates)
├── cso/ # /cso skill (OWASP Top 10 + STRIDE security audit)
├── design-consultation/ # /design-consultation skill (design system from scratch)
├── design-shotgun/ # /design-shotgun skill (visual design exploration)
├── connect-chrome/ # /connect-chrome skill (headed Chrome with side panel)
├── design/ # Design binary CLI (GPT Image API)
│ ├── src/ # CLI + commands (generate, variants, compare, serve, etc.)
│ ├── test/ # Integration tests
│ └── dist/ # Compiled binary
├── extension/ # Chrome extension (side panel + activity feed)
├── lib/ # Shared libraries (worktree.ts)
├── docs/designs/ # Design documents
├── setup-deploy/ # /setup-deploy skill (one-time deploy config)
├── .github/ # CI workflows + Docker image
│ ├── workflows/ # evals.yml (E2E on Ubicloud), skill-docs.yml, actionlint.yml
│ └── docker/ # Dockerfile.ci (pre-baked toolchain + Playwright/Chromium)
├── setup # One-time setup: build binary + symlink skills
├── SKILL.md # Generated from SKILL.md.tmpl (don't edit directly)
├── SKILL.md.tmpl # Template: edit this, run gen:skill-docs
@@ -150,10 +182,30 @@ symlink or a real copy. If it's a symlink to your working directory, be aware th
- During large refactors, remove the symlink (`rm .claude/skills/gstack`) so the
global install at `~/.claude/skills/gstack/` is used instead
**Prefix setting:** Skill symlinks use either short names (`qa -> gstack/qa`) or
namespaced (`gstack-qa -> gstack/qa`), controlled by `skill_prefix` in
`~/.gstack/config.yaml`. When vendoring into a project, run `./setup` after
symlinking to create the per-skill symlinks with your preferred naming. Pass
`--no-prefix` or `--prefix` to skip the interactive prompt.
**For plan reviews:** When reviewing plans that modify skill templates or the
gen-skill-docs pipeline, consider whether the changes should be tested in isolation
before going live (especially if the user is actively using gstack in other windows).
## Compiled binaries — NEVER commit browse/dist/ or design/dist/
The `browse/dist/` and `design/dist/` directories contain compiled Bun binaries
(`browse`, `find-browse`, `design`, ~58MB each). These are Mach-O arm64 only — they
do NOT work on Linux, Windows, or Intel Macs. The `./setup` script already builds
from source for every platform, so the checked-in binaries are redundant. They are
tracked by git due to a historical mistake and should eventually be removed with
`git rm --cached`.
**NEVER stage or commit these files.** They show up as modified in `git status`
because they're tracked despite `.gitignore` — ignore them. When staging files,
always use specific filenames (`git add file1 file2`) — never `git add .` or
`git add -A`, which will accidentally include the binaries.
## Commit style
**Always bisect commits.** Every commit should be a single logical change. When
@@ -170,7 +222,42 @@ Examples of good bisection:
When the user says "bisect commit" or "bisect and push," split staged/unstaged
changes into logical commits and push.
## CHANGELOG style
## Community PR guardrails
When reviewing or merging community PRs, **always AskUserQuestion** before accepting
any commit that:
1. **Touches ETHOS.md** — this file is Garry's personal builder philosophy. No edits
from external contributors or AI agents, period.
2. **Removes or softens promotional material** — YC references, founder perspective,
and product voice are intentional. PRs that frame these as "unnecessary" or
"too promotional" must be rejected.
3. **Changes Garry's voice** — the tone, humor, directness, and perspective in skill
templates, CHANGELOG, and docs are not generic. PRs that rewrite voice to be
more "neutral" or "professional" must be rejected.
Even if the agent strongly believes a change improves the project, these three
categories require explicit user approval via AskUserQuestion. No exceptions.
No auto-merging. No "I'll just clean this up."
## CHANGELOG + VERSION style
**VERSION and CHANGELOG are branch-scoped.** Every feature branch that ships gets its
own version bump and CHANGELOG entry. The entry describes what THIS branch adds —
not what was already on main.
**When to write the CHANGELOG entry:**
- At `/ship` time (Step 5), not during development or mid-branch.
- The entry covers ALL commits on this branch vs the base branch.
- Never fold new work into an existing CHANGELOG entry from a prior version that
already landed on main. If main has v0.10.0.0 and your branch adds features,
bump to v0.10.1.0 with a new entry — don't edit the v0.10.0.0 entry.
**Key questions before writing:**
1. What branch am I on? What did THIS branch change?
2. Is the base branch version already released? (If yes, bump and create new entry.)
3. Does an existing entry on this branch already cover earlier work? (If yes, replace
it with one unified entry for the final version.)
CHANGELOG.md is **for users**, not contributors. Write it like product release notes:
@@ -247,6 +334,30 @@ them. Report progress at each check (which tests passed, which are running, any
failures so far). The user wants to see the run complete, not a promise that
you'll check later.
## E2E test fixtures: extract, don't copy
**NEVER copy a full SKILL.md file into an E2E test fixture.** SKILL.md files are
1500-2000 lines. When `claude -p` reads a file that large, context bloat causes
timeouts, flaky turn limits, and tests that take 5-10x longer than necessary.
Instead, extract only the section the test actually needs:
```typescript
// BAD — agent reads 1900 lines, burns tokens on irrelevant sections
fs.copyFileSync(path.join(ROOT, 'ship', 'SKILL.md'), path.join(dir, 'ship-SKILL.md'));
// GOOD — agent reads ~60 lines, finishes in 38s instead of timing out
const full = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
const start = full.indexOf('## Review Readiness Dashboard');
const end = full.indexOf('\n---\n', start);
fs.writeFileSync(path.join(dir, 'ship-SKILL.md'), full.slice(start, end > start ? end : undefined));
```
Also when running targeted E2E tests to debug failures:
- Run in **foreground** (`bun test ...`), not background with `&` and `tee`
- Never `pkill` running eval processes and restart — you lose results and waste money
- One clean run beats three killed-and-restarted runs
## Deploying to the active skill
The active skill lives at `~/.claude/skills/gstack/`. After making changes:
@@ -255,4 +366,6 @@ The active skill lives at `~/.claude/skills/gstack/`. After making changes:
2. Fetch and reset in the skill directory: `cd ~/.claude/skills/gstack && git fetch origin && git reset --hard origin/main`
3. Rebuild: `cd ~/.claude/skills/gstack && bun run build`
Or copy the binary directly: `cp browse/dist/browse ~/.claude/skills/gstack/browse/dist/browse`
Or copy the binaries directly:
- `cp browse/dist/browse ~/.claude/skills/gstack/browse/dist/browse`
- `cp design/dist/design ~/.claude/skills/gstack/design/dist/design`