# Contributing to gstack

Thanks for wanting to make gstack better. Whether you're fixing a typo in a skill prompt or building an entirely new workflow, this guide will get you up and running fast.

## Quick start

gstack skills are Markdown files that Claude Code discovers from a `skills/` directory. Normally they live at `~/.claude/skills/gstack/` (your global install). But when you're developing gstack itself, you want Claude Code to use the skills *in your working tree* — so edits take effect instantly without copying or deploying anything.

That's what dev mode does. It symlinks your repo into the local `.claude/skills/` directory so Claude Code reads skills straight from your checkout.

```bash
git clone https://github.com/garrytan/gstack.git && cd gstack
bun install     # install dependencies
bin/dev-setup   # activate dev mode
```

> **Full clone vs shallow.** The README's user-facing install uses `--depth 1` for speed. As a contributor, use a full clone (no `--depth` flag) — you'll need history for `git log`, `git blame`, `git bisect`, and reviewing PRs against earlier versions. If you already have a `--depth 1` clone from following the README, promote it to a full clone with `git fetch --unshallow`.

Now edit any `SKILL.md`, invoke it in Claude Code (e.g. `/review`), and see your changes live. When you're done developing:

```bash
bin/dev-teardown   # deactivate — back to your global install
```

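If you are unsure whether dev mode is currently on, a quick check like the one below can help. It is a minimal sketch, not part of gstack: the function name `dev_mode_status` is hypothetical, and it assumes dev mode counts as active whenever `.claude/skills/gstack` inside the checkout is a symlink.

```bash
# Sketch: report whether dev mode looks active for a checkout.
# Assumption: "active" means <repo>/.claude/skills/gstack is a symlink.
dev_mode_status() {
  local repo="${1:-.}"
  if [ -L "$repo/.claude/skills/gstack" ]; then
    echo "active"
  else
    echo "inactive"
  fi
}
```

Run `dev_mode_status .` from the repo root before and after `bin/dev-setup` to see the state change.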
## Operational self-improvement

gstack automatically learns from failures. At the end of every skill session, the agent
reflects on what went wrong (CLI errors, wrong approaches, project quirks) and logs
operational learnings to `~/.gstack/projects/{slug}/learnings.jsonl`. Future sessions
surface these learnings automatically, so gstack gets smarter on your codebase over time.

No setup needed. Learnings are logged automatically. View them with `/learn`.

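To get a rough sense of how much has been logged per project, you can count entries straight from the files. This is a sketch under stated assumptions: `count_learnings` is a hypothetical helper, and it assumes only that each non-empty line of `learnings.jsonl` is one entry. It does not assume any field names inside the JSON.

```bash
# Sketch: print "<slug><TAB><entry count>" for every project with a learnings file.
# Assumption: one learning per non-empty line; the JSON fields are not inspected.
count_learnings() {
  local root="${1:-$HOME/.gstack/projects}" file slug count
  for file in "$root"/*/learnings.jsonl; do
    [ -f "$file" ] || continue          # glob matched nothing
    slug="$(basename "$(dirname "$file")")"
    count="$(grep -c . "$file")"        # counts non-empty lines
    printf '%s\t%s\n' "$slug" "$count"
  done
}
```

Calling `count_learnings` with no arguments scans the default location named above.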
### The contributor workflow

1. **Use gstack normally** — operational learnings are captured automatically
2. **Check your learnings:** `/learn` or `ls ~/.gstack/projects/*/learnings.jsonl`
3. **Fork and clone gstack** (if you haven't already)
4. **Symlink your fork into the project where you hit the bug:**
   ```bash
   # In your core project (the one where gstack annoyed you)
   ln -sfn /path/to/your/gstack-fork .claude/skills/gstack
   cd .claude/skills/gstack && bun install && bun run build && ./setup
   ```
   Setup creates per-skill directories with SKILL.md symlinks inside (`qa/SKILL.md -> gstack/qa/SKILL.md`)
   and asks your prefix preference. Pass `--no-prefix` to skip the prompt and use short names.
5. **Fix the issue** — your changes are live immediately in this project
6. **Test by actually using gstack** — do the thing that annoyed you, verify it's fixed
7. **Open a PR from your fork**

This is the best way to contribute: fix gstack while doing your real work, in the
project where you actually felt the pain.

### Session awareness

When you have 3+ gstack sessions open simultaneously, every question tells you which project, which branch, and what's happening. No more staring at a question thinking "wait, which window is this?" The format is consistent across all skills.

## Working on gstack inside the gstack repo

When you're editing gstack skills and want to test them by actually using gstack
in the same repo, `bin/dev-setup` wires this up. It creates `.claude/skills/`
symlinks (gitignored) pointing back to your working tree, so Claude Code uses
your local edits instead of the global install.

```
gstack/                          <- your working tree
├── .claude/skills/              <- created by dev-setup (gitignored)
│   ├── gstack -> ../../         <- symlink back to repo root
│   ├── review/                  <- real directory (short name, default)
│   │   └── SKILL.md -> gstack/review/SKILL.md
│   ├── ship/                    <- or gstack-review/, gstack-ship/ if --prefix
│   │   └── SKILL.md -> gstack/ship/SKILL.md
│   └── ...                      <- one directory per skill
├── review/
│   └── SKILL.md                 <- edit this, test with /review
├── ship/
│   └── SKILL.md
├── browse/
│   ├── src/                     <- TypeScript source
│   └── dist/                    <- compiled binary (gitignored)
└── ...
```

Setup creates real directories (not symlinks) at the top level with a SKILL.md
symlink inside. This ensures Claude discovers them as top-level skills, not nested
under `gstack/`. Names depend on your prefix setting (`~/.gstack/config.yaml`).
Short names (`/review`, `/ship`) are the default. Run `./setup --prefix` if you
prefer namespaced names (`/gstack-review`, `/gstack-ship`).

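The layout above can be spot-checked with a few lines of shell. The sketch below is illustrative only: `list_linked_skills` is a hypothetical name, and it assumes the structure described in this section (real per-skill directories, each holding a `SKILL.md` symlink, next to a `gstack` entry that points at the repo).

```bash
# Sketch: list per-skill directories whose SKILL.md is a symlink.
# Assumption: the directory follows the layout shown in the tree above.
list_linked_skills() {
  local skills_dir="$1" dir name
  for dir in "$skills_dir"/*/; do
    [ -d "$dir" ] || continue
    name="$(basename "$dir")"
    # The `gstack` entry is the link back to the repo, not a skill directory.
    [ "$name" = "gstack" ] && continue
    if [ -L "${dir}SKILL.md" ]; then
      printf '%s -> %s\n' "$name" "$(readlink "${dir}SKILL.md")"
    fi
  done
}
```

For example, `list_linked_skills .claude/skills` prints one line per skill that setup has wired up. An empty result suggests setup has not run in this project.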
## Day-to-day workflow

```bash
# 1. Enter dev mode
bin/dev-setup

# 2. Edit a skill
vim review/SKILL.md

# 3. Test it in Claude Code — changes are live
# > /review

# 4. Editing browse source? Rebuild the binary
bun run build

# 5. Done for the day? Tear down
bin/dev-teardown
```

## Testing & evals

### Setup

```bash
# 1. Copy .env.example and add your API key
cp .env.example .env
# Edit .env → set ANTHROPIC_API_KEY=sk-ant-...

# 2. Install deps (if you haven't already)
bun install
```

Bun auto-loads `.env` — no extra config. Conductor workspaces inherit `.env` from the main worktree automatically (see "Conductor workspaces" below).

### Test tiers

| Tier | Command | Cost | What it tests |
|------|---------|------|---------------|
| 1 — Static | `bun test` | Free | Command validation, snapshot flags, SKILL.md correctness, TODOS-format.md refs, observability unit tests |
| 2 — E2E | `bun run test:e2e` | ~$3.85 | Full skill execution via `claude -p` subprocess |
| 3 — LLM eval | `bun run test:evals` | ~$0.15 standalone | LLM-as-judge scoring of generated SKILL.md docs |
| 2+3 | `bun run test:evals` | ~$4 combined | E2E + LLM-as-judge (runs both) |

```bash
bun test              # Tier 1 only (runs on every commit, <5s)
bun run test:e2e      # Tier 2: E2E only (needs EVALS=1, can't run inside Claude Code)
bun run test:evals    # Tier 2 + 3 combined (~$4/run)
```

### Tier 1: Static validation (free)

Runs automatically with `bun test`. No API keys needed.

- **Skill parser tests** (`test/skill-parser.test.ts`) — Extracts every `$B` command from SKILL.md bash code blocks and validates against the command registry in `browse/src/commands.ts`. Catches typos, removed commands, and invalid snapshot flags.
- **Skill validation tests** (`test/skill-validation.test.ts`) — Validates that SKILL.md files reference only real commands and flags, and that command descriptions meet quality thresholds.
- **Generator tests** (`test/gen-skill-docs.test.ts`) — Tests the template system: verifies placeholders resolve correctly, output includes value hints for flags (e.g. `-d <N>` not just `-d`), enriched descriptions for key commands (e.g. `is` lists valid states, `press` lists key examples).

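To make the skill parser's extraction step concrete, here is a rough shell approximation. It is not the logic in `test/skill-parser.test.ts`, which is TypeScript and also validates against the registry. The function name `extract_browse_commands` is hypothetical, and the sketch assumes a `$B` invocation always starts its line inside a fence tagged `bash`.

```bash
# Sketch: print the unique command names that follow `$B` inside bash fences.
# Assumption: each invocation is a line whose first word is the literal $B.
extract_browse_commands() {
  awk '
    BEGIN { fence = sprintf("%c%c%c", 96, 96, 96) }   # three backticks
    index($0, fence) == 1 {
      # A fence tagged bash opens a block; any fence line seen inside one closes it.
      in_block = ((!in_block) && ($0 ~ /bash/))
      next
    }
    in_block && $1 == "$B" && NF >= 2 { print $2 }
  ' "$1" | sort -u
}
```

Running `extract_browse_commands review/SKILL.md` gives a list you could compare by eye against `browse/src/commands.ts`.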
### Tier 2: E2E via `claude -p` (~$3.85/run)

Spawns `claude -p` as a subprocess with `--output-format stream-json --verbose`, streams NDJSON for real-time progress, and scans for browse errors. This is the closest thing to "does this skill actually work end-to-end?"

```bash
# Must run from a plain terminal — can't nest inside Claude Code or Conductor
EVALS=1 bun test test/skill-e2e-*.test.ts
```

- Gated by `EVALS=1` env var (prevents accidental expensive runs)
- Auto-skips if running inside Claude Code (`claude -p` can't nest)
- API connectivity pre-check — fails fast on ConnectionRefused before burning budget
- Real-time progress to stderr: `[Ns] turn T tool #C: Name(...)`
- Saves full NDJSON transcripts and failure JSON for debugging
- Tests live in `test/skill-e2e-*.test.ts` (split by category), runner logic in `test/helpers/session-runner.ts`

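The first two gates in the list above can be pictured as a small pre-flight check. The sketch below is an illustration, not the runner's code: `should_run_e2e` is a hypothetical name, and using a `CLAUDECODE` environment variable to detect a Claude Code session is an assumption. The real suite may detect nesting differently.

```bash
# Sketch: decide whether an expensive E2E run should start.
# Assumption: a non-empty CLAUDECODE variable marks a shell inside Claude Code.
should_run_e2e() {
  if [ "${EVALS:-}" != "1" ]; then
    echo "skip: EVALS=1 not set"
    return 1
  fi
  if [ -n "${CLAUDECODE:-}" ]; then
    echo "skip: running inside Claude Code"
    return 1
  fi
  echo "run"
}
```

A wrapper script could call `should_run_e2e && EVALS=1 bun test test/skill-e2e-*.test.ts` so that a skipped run never reaches the paid step.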
### E2E observability

When E2E tests run, they produce machine-readable artifacts in `~/.gstack-dev/`:

| Artifact | Path | Purpose |
|----------|------|---------|
| Heartbeat | `e2e-live.json` | Current test status (updated per tool call) |
| Partial results | `evals/_partial-e2e.json` | Completed tests (survives kills) |
| Progress log | `e2e-runs/{runId}/progress.log` | Append-only text log |
| NDJSON transcripts | `e2e-runs/{runId}/{test}.ndjson` | Raw `claude -p` output per test |
| Failure JSON | `e2e-runs/{runId}/{test}-failure.json` | Diagnostic data on failure |

**Live dashboard:** Run `bun run eval:watch` in a second terminal to see a live dashboard showing completed tests, the currently running test, and cost. Use `--tail` to also show the last 10 lines of progress.log.

**Eval history tools:**

```bash
bun run eval:list      # list all eval runs (turns, duration, cost per run)
bun run eval:compare   # compare two runs — shows per-test deltas + Takeaway commentary
bun run eval:summary   # aggregate stats + per-test efficiency averages across runs
```

**Eval comparison commentary:** `eval:compare` generates natural-language Takeaway sections interpreting what changed between runs — flagging regressions, noting improvements, calling out efficiency gains (fewer turns, faster, cheaper), and producing an overall summary. This is driven by `generateCommentary()` in `eval-store.ts`.

Artifacts are never cleaned up — they accumulate in `~/.gstack-dev/` for post-mortem debugging and trend analysis.

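Because the failure files follow the `{test}-failure.json` naming shown in the table, a short helper can tell you which tests failed in a given run without opening the dashboard. This is a sketch: `list_failed_tests` is a hypothetical name, and it relies only on that file-name pattern, not on the JSON contents.

```bash
# Sketch: list tests that left a failure JSON in one E2E run directory.
# Assumption: files are named {test}-failure.json as in the table above.
list_failed_tests() {
  local run_dir="$1" file name
  for file in "$run_dir"/*-failure.json; do
    [ -f "$file" ] || continue           # no failures in this run
    name="$(basename "$file")"
    echo "${name%-failure.json}"         # strip the suffix, keep the test name
  done
}
```

Point it at one run, for example `list_failed_tests ~/.gstack-dev/e2e-runs/<runId>`, substituting a run ID from `bun run eval:list`.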
### Tier 3: LLM-as-judge (~$0.15/run)

Uses Claude Sonnet to score generated SKILL.md docs on three dimensions:

- **Clarity** — Can an AI agent understand the instructions without ambiguity?
- **Completeness** — Are all commands, flags, and usage patterns documented?
- **Actionability** — Can the agent execute tasks using only the information in the doc?

Each dimension is scored 1-5. Threshold: every dimension must score **≥ 4**. There's also a regression test that compares generated docs against the hand-maintained baseline from `origin/main`: the generated docs must score equal to or higher than the baseline.

```bash
# Needs ANTHROPIC_API_KEY in .env — included in bun run test:evals
```

- Uses `claude-sonnet-4-6` for scoring stability
- Tests live in `test/skill-llm-eval.test.ts`
- Calls the Anthropic API directly (not `claude -p`), so it works from anywhere including inside Claude Code

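The pass rule is an "all dimensions" check rather than an average, which the following sketch makes explicit. `passes_judge_threshold` is a hypothetical helper written for illustration. The actual assertion lives in the TypeScript test file.

```bash
# Sketch: succeed only when every score (scale 1-5) is at least 4.
# The threshold of 4 comes from the paragraph above; the function name is illustrative.
passes_judge_threshold() {
  local threshold=4 score
  for score in "$@"; do
    if [ "$score" -lt "$threshold" ]; then
      return 1    # a single low dimension fails the doc
    fi
  done
  return 0
}
```

So `passes_judge_threshold 5 5 3` fails even though the mean is above 4, while `passes_judge_threshold 4 4 4` passes.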
### CI

A GitHub Action (`.github/workflows/skill-docs.yml`) runs `bun run gen:skill-docs --dry-run` on every push and PR. If the generated SKILL.md files differ from what's committed, CI fails. This catches stale docs before they merge.

Tests run against the browse binary directly — they don't require dev mode.

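The freshness idea behind the skill-docs workflow reduces to "regenerate, compare, fail on any difference". The sketch below shows that comparison for a single file. It is an assumption-laden illustration: `check_fresh` is a hypothetical name, and the real `--dry-run` mode may compare output in memory rather than through files on disk.

```bash
# Sketch: fail when freshly generated output differs from the committed file.
# Assumption: both versions are available as files; the real generator may differ.
check_fresh() {
  local committed="$1" generated="$2"
  if cmp -s "$committed" "$generated"; then
    echo "FRESH: $committed"
  else
    echo "STALE: $committed"
    return 1
  fi
}
```

A non-zero exit from a check like this is what turns a stale doc into a failed CI job.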
## Editing SKILL.md files

SKILL.md files are **generated** from `.tmpl` templates. Don't edit the `.md` directly — your changes will be overwritten on the next build.

```bash
# 1. Edit the template
vim SKILL.md.tmpl              # or browse/SKILL.md.tmpl

# 2. Regenerate for all hosts
bun run gen:skill-docs --host all

# 3. Check health (reports all hosts)
bun run skill:check

# Or use watch mode — auto-regenerates on save
bun run dev:skill
```

For template authoring best practices (natural language over bash-isms, dynamic branch detection, `{{BASE_BRANCH_DETECT}}` usage), see CLAUDE.md's "Writing SKILL templates" section.

To add a browse command, add it to `browse/src/commands.ts`. To add a snapshot flag, add it to `SNAPSHOT_FLAGS` in `browse/src/snapshot.ts`. Then rebuild.

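A common slip is editing a generated `SKILL.md` and forgetting the template. The sketch below flags that case from a list of changed paths. It is not a gstack tool: `find_untemplated_edits` is a hypothetical name, and it assumes every `SKILL.md` has a sibling `SKILL.md.tmpl`, as this section describes.

```bash
# Sketch: read changed paths on stdin and print each SKILL.md that changed
# without its SKILL.md.tmpl also changing.
# Assumption: the template always sits next to the generated file.
find_untemplated_edits() {
  local changed path
  changed="$(cat)"
  printf '%s\n' "$changed" | while IFS= read -r path; do
    case "$path" in
      SKILL.md|*/SKILL.md)
        if ! printf '%s\n' "$changed" | grep -qxF "${path}.tmpl"; then
          echo "$path"
        fi
        ;;
    esac
  done
}
```

One way to use it is `git diff --name-only origin/main | find_untemplated_edits`. Any path it prints was probably edited by hand and will be overwritten on the next regeneration.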
## Jargon list (V1 writing style)

gstack's Writing Style section (injected into every tier-≥2 skill's preamble)
glosses technical terms on first use per skill invocation. The list of terms
that qualify for glossing lives at `scripts/jargon-list.json` — ~50 curated
high-frequency terms (idempotent, race condition, N+1, backpressure, etc.).
Terms not on the list are assumed plain-English enough.

**Adding or removing a term:** open a PR editing `scripts/jargon-list.json`.
Run `bun run gen:skill-docs` after the edit — terms are baked into every
generated SKILL.md at gen time, so changes take effect only after regeneration.
No runtime loading; no user-side override. The repo list is the source of truth.

Good candidates for addition: high-frequency terms that non-technical users
encounter in review output without context (common database/concurrency
terminology, security jargon, frontend framework concepts). Don't add terms
that only appear in one or two niche skills — the cost-to-value trade isn't
worth the review overhead.

## Multi-host development

gstack generates SKILL.md files for 8 hosts from one set of `.tmpl` templates.
Each host is a typed config in `hosts/*.ts`. The generator reads these configs
to produce host-appropriate output (different frontmatter, paths, tool names).

**Supported hosts:** Claude (primary), Codex, Factory, Kiro, OpenCode, Slate, Cursor, OpenClaw.

### Generating for all hosts

```bash
# Generate for a specific host
bun run gen:skill-docs                    # Claude (default)
bun run gen:skill-docs --host codex       # Codex
bun run gen:skill-docs --host opencode    # OpenCode
bun run gen:skill-docs --host all         # All 8 hosts

# Or use build, which does all hosts + compiles binaries
bun run build
```

### What changes between hosts

Each host config (`hosts/*.ts`) controls:

| Aspect | Example (Claude vs Codex) |
|--------|---------------------------|
| Output directory | `{skill}/SKILL.md` vs `.agents/skills/gstack-{skill}/SKILL.md` |
| Frontmatter | Full (name, description, hooks, version) vs minimal (name + description) |
| Paths | `~/.claude/skills/gstack` vs `$GSTACK_ROOT` |
| Tool names | "use the Bash tool" vs same (Factory rewrites to "run this command") |
| Hook skills | `hooks:` frontmatter vs inline safety advisory prose |
| Suppressed sections | None vs Codex self-invocation sections stripped |

See `scripts/host-config.ts` for the full `HostConfig` interface.

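The "Output directory" row of the table can be read as a simple mapping from host and skill name to a file path. The sketch below encodes just the two hosts the table shows. `skill_output_path` is a hypothetical function, and the other six hosts are left out on purpose because their paths are not stated here.

```bash
# Sketch: where a generated SKILL.md lands for the two hosts in the table.
# Only claude and codex are covered; other hosts are deliberately not guessed.
skill_output_path() {
  local host="$1" skill="$2"
  case "$host" in
    claude) echo "${skill}/SKILL.md" ;;
    codex)  echo ".agents/skills/gstack-${skill}/SKILL.md" ;;
    *)      echo "unknown host: $host" >&2; return 1 ;;
  esac
}
```

In the real generator this mapping comes from each host's config rather than a `case` statement, which is why adding a host needs no generator changes.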
### Testing host output

```bash
# Run all static tests (includes parameterized smoke tests for all hosts)
bun test

# Check freshness for all hosts
bun run gen:skill-docs --host all --dry-run

# Health dashboard covers all hosts
bun run skill:check
```

### Adding a new host

See [docs/ADDING_A_HOST.md](docs/ADDING_A_HOST.md) for the full guide. Short version:

1. Create `hosts/myhost.ts` (copy from `hosts/opencode.ts`)
2. Add to `hosts/index.ts`
3. Add `.myhost/` to `.gitignore`
4. Run `bun run gen:skill-docs --host myhost`
5. Run `bun test` (parameterized tests auto-cover it)

Zero generator, setup, or tooling code changes needed.

### Adding a new skill

When you add a new skill template, all hosts get it automatically:

1. Create `{skill}/SKILL.md.tmpl`
2. Run `bun run gen:skill-docs --host all`
3. Dynamic template discovery picks it up; there is no static list to update
4. Commit `{skill}/SKILL.md`. External host output is generated at setup time and gitignored

## Conductor workspaces

If you're using [Conductor](https://conductor.build) to run multiple Claude Code sessions in parallel, `conductor.json` wires up workspace lifecycle automatically:

| Hook | Script | What it does |
|------|--------|-------------|
| `setup` | `bin/dev-setup` | Copies `.env` from main worktree, installs deps, symlinks skills |
| `archive` | `bin/dev-teardown` | Removes skill symlinks, cleans up `.claude/` directory |

When Conductor creates a new workspace, `bin/dev-setup` runs automatically. It detects the main worktree (via `git worktree list`), copies your `.env` so API keys carry over, and sets up dev mode — no manual steps needed.

**First-time setup:** Put your `ANTHROPIC_API_KEY` in `.env` in the main repo (see `.env.example`). Every Conductor workspace inherits it automatically.

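The `.env` inheritance described above can be approximated in a few lines. This is a sketch, not the contents of `bin/dev-setup`: both function names are hypothetical, it uses `git worktree list --porcelain` (where Git lists the main worktree first), and it assumes an existing `.env` in the workspace should be left alone.

```bash
# Sketch: pick the main worktree out of `git worktree list --porcelain` output.
# Git lists the main worktree first, so the first "worktree <path>" line wins.
main_worktree_from_porcelain() {
  awk '$1 == "worktree" { sub(/^worktree /, ""); print; exit }'
}

# Copy .env from the main worktree into a workspace.
# Assumption: never overwrite a .env that already exists in the target.
inherit_env() {
  local main="$1" target="${2:-.}"
  if [ -f "$main/.env" ] && [ ! -e "$target/.env" ]; then
    cp "$main/.env" "$target/.env"
  fi
}
```

Inside a workspace, the two combine as `inherit_env "$(git worktree list --porcelain | main_worktree_from_porcelain)"`.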
## Things to know

- **SKILL.md files are generated.** Edit the `.tmpl` template, not the `.md`. Run `bun run gen:skill-docs` to regenerate.
- **TODOS.md is the unified backlog.** Organized by skill/component with P0-P4 priorities. `/ship` auto-detects completed items. All planning/review/retro skills read it for context.
- **Browse source changes need a rebuild.** If you touch `browse/src/*.ts`, run `bun run build`.
- **Dev mode shadows your global install.** Project-local skills take priority over `~/.claude/skills/gstack`. `bin/dev-teardown` restores the global one.
- **Conductor workspaces are independent.** Each workspace is its own git worktree. `bin/dev-setup` runs automatically via `conductor.json`.
- **`.env` propagates across worktrees.** Set it once in the main repo and every Conductor workspace gets it.
- **`.claude/skills/` is gitignored.** The symlinks never get committed.
- **Never write raw `ln -snf` in `setup`.** Every link site in `setup` MUST route through the `_link_or_copy SRC DST` helper near the `IS_WINDOWS` detection. The helper preserves `ln -snf` on Unix and switches to `cp -R` / `cp -f` on Windows without Developer Mode, where plain `ln -snf` produces frozen file copies that don't refresh on `git pull`. `test/setup-windows-fallback.test.ts` enforces this with a static invariant — a single raw `ln` call outside the helper body fails CI.

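To make the last rule concrete, here is a minimal sketch of what a link-or-copy dispatch can look like. It is illustrative only: the real helper lives in `setup` and may differ. The sketch assumes `IS_WINDOWS` is `1` on Windows, that `SRC` resolves from the current directory, and that removing a stale destination before copying is acceptable.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; the real helper lives in `setup` and may differ.
# Assumes IS_WINDOWS is "1" on Windows and anything else on Unix.
IS_WINDOWS="${IS_WINDOWS:-0}"

_link_or_copy() {
  local src="$1" dst="$2"
  [ -n "$dst" ] || return 1        # refuse to operate on an empty path

  if [ "$IS_WINDOWS" != "1" ]; then
    ln -snf "$src" "$dst"          # Unix: unchanged symlink semantics
    return
  fi

  # Windows: a source that does not exist on disk is skipped, not an error,
  # so a dangling alias cannot abort a script running under `set -e`.
  [ -e "$src" ] || return 0

  rm -rf "$dst"                    # assumption: replace any stale copy
  if [ -d "$src" ]; then
    cp -R "$src" "$dst"            # directories are copied recursively
  else
    cp -f "$src" "$dst"            # single files are copied in place
  fi
}
```

A call site then reads `_link_or_copy "$SRC" "$DST"` on every platform, which is what lets a static test flag any raw `ln` outside the helper body.
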
## Testing your changes in a real project

**This is the recommended way to develop gstack.** Symlink your gstack checkout
into the project where you actually use it, so your changes are live while you
do real work.

### Step 1: Symlink your checkout

```bash
# In your core project (not the gstack repo)
ln -sfn /path/to/your/gstack-checkout .claude/skills/gstack
```

### Step 2: Run setup to create per-skill symlinks

The `gstack` symlink alone isn't enough. Claude Code discovers skills through
individual top-level directories (`qa/SKILL.md`, `ship/SKILL.md`, etc.), not through
the `gstack/` directory itself. Run `./setup` to create them:

```bash
cd .claude/skills/gstack && bun install && bun run build && ./setup
```

Setup will ask whether you want short names (`/qa`) or namespaced (`/gstack-qa`).
Your choice is saved to `~/.gstack/config.yaml` and remembered for future runs.
To skip the prompt, pass `--no-prefix` (short names) or `--prefix` (namespaced).

### Step 3: Develop

Edit a template, run `bun run gen:skill-docs`, and the next `/review` or `/qa`
call picks it up immediately. No restart needed.

### Going back to the stable global install

Remove the project-local symlink. Claude Code falls back to `~/.claude/skills/gstack/`:

```bash
rm .claude/skills/gstack
```

The per-skill directories (`qa/`, `ship/`, etc.) contain SKILL.md symlinks that point
to `gstack/...`, so they'll resolve to the global install automatically.

### Switching prefix mode

If you installed gstack with one prefix setting and want to switch:

```bash
cd .claude/skills/gstack && ./setup --no-prefix  # switch to /qa, /ship
cd .claude/skills/gstack && ./setup --prefix     # switch to /gstack-qa, /gstack-ship
```

Setup cleans up the old symlinks automatically. No manual cleanup needed.

### Alternative: point your global install at a branch

If you don't want per-project symlinks, you can switch the global install:

```bash
cd ~/.claude/skills/gstack
git fetch origin
git checkout origin/<branch>
bun install && bun run build && ./setup
```

This affects all projects. To revert: `git checkout main && git pull && bun run build && ./setup`.

## Community PR triage (wave process)

When community PRs accumulate, batch them into themed waves:

1. **Categorize** — group by theme (security, features, infra, docs)
2. **Deduplicate** — if two PRs fix the same thing, pick the one that
   changes fewer lines. Close the other with a note pointing to the winner.
3. **Collector branch** — create `pr-wave-N`, merge clean PRs, resolve
   conflicts for dirty ones, verify with `bun test && bun run build`
4. **Close with context** — every closed PR gets a comment explaining
   why and what (if anything) supersedes it. Contributors did real work;
   respect that with clear communication.
5. **Ship as one PR** — single PR to main with all attributions preserved
   in merge commits. Include a summary table of what merged and what closed.

See [PR #205](../../pull/205) (v0.8.3) for the first wave as an example.

## Upgrade migrations

When a release changes on-disk state (directory structure, config format, stale
files) in ways that `./setup` alone can't fix, add a migration script so existing
users get a clean upgrade.

### When to add a migration

- Changed how skill directories are created (symlinks vs real dirs)
- Renamed or moved config keys in `~/.gstack/config.yaml`
- Need to delete orphaned files from a previous version
- Changed the format of `~/.gstack/` state files

Don't add a migration for: new features (users get them automatically), new
skills (setup discovers them), or code-only changes (no on-disk state).

### How to add one

1. Create `gstack-upgrade/migrations/v{VERSION}.sh` where `{VERSION}` matches
   the VERSION file for the release that needs the fix.
2. Make it executable: `chmod +x gstack-upgrade/migrations/v{VERSION}.sh`
3. The script must be **idempotent** (safe to run multiple times) and
   **non-fatal** (failures are logged but don't block the upgrade).
4. Include a comment block at the top explaining what changed, why the
   migration is needed, and which users are affected.

Example:

```bash
#!/usr/bin/env bash
# Migration: v0.15.2.0 — Fix skill directory structure
# Affected: users who installed with --no-prefix before v0.15.2.0
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")/../.." && pwd)"
"$SCRIPT_DIR/bin/gstack-relink" 2>/dev/null || true
```

### How it runs

During `/gstack-upgrade`, after `./setup` completes (Step 4.75), the upgrade
skill scans `gstack-upgrade/migrations/` and runs every `v*.sh` script whose
version is newer than the user's old version. Scripts run in version order.
Failures are logged but never block the upgrade.

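The selection rule above can be sketched in a few lines of shell. This is a hypothetical illustration, not the upgrade skill's actual code: `version_gt` and `run_migrations` are made-up names, and the sketch assumes a `sort` that supports `-V` (version sort), as GNU coreutils does.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the selection logic, not the upgrade skill's code.
# Assumes a `sort` that supports -V (version sort), as GNU coreutils does.

# True when version $1 is strictly newer than version $2.
version_gt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Run every v*.sh in DIR that is newer than OLD_VERSION, oldest first.
run_migrations() {
  local dir="$1" old_version="$2" script name names
  # Collect script names (without .sh) and order them by version.
  names="$(
    for script in "$dir"/v*.sh; do
      [ -e "$script" ] || continue   # glob matched nothing
      basename "$script" .sh
    done | sort -V
  )"
  for name in $names; do
    if version_gt "${name#v}" "$old_version"; then
      # A failing migration is reported on stderr and the loop continues.
      "$dir/$name.sh" || echo "migration $name failed (continuing)" >&2
    fi
  done
}
```

For example, `run_migrations gstack-upgrade/migrations 0.15.1.0` would run `v0.15.2.0.sh` and anything later, and skip scripts at or below the old version. Version sort matters here because a plain lexical sort would place `v0.9.0.0` after `v0.15.2.0`.
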
### Testing migrations

Migrations are tested as part of `bun test` (tier 1, free). The test suite
verifies that all migration scripts in `gstack-upgrade/migrations/` are
executable and parse without syntax errors.

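If you want the same two checks locally before running the full suite, something like the following works as a quick pre-flight. It is a hypothetical helper (the name `check_migrations` is not part of the repo); the authoritative checks are the ones in `bun test`.

```bash
#!/usr/bin/env bash
# Hypothetical local pre-flight; the real checks run inside `bun test`.

# Print one line per problem and return non-zero if any script fails a check.
check_migrations() {
  local dir="$1" script status=0
  for script in "$dir"/v*.sh; do
    [ -e "$script" ] || continue          # glob matched nothing
    if [ ! -x "$script" ]; then
      echo "not executable: $script"
      status=1
    fi
    # `bash -n` parses the script without executing it.
    if ! bash -n "$script" 2>/dev/null; then
      echo "syntax error: $script"
      status=1
    fi
  done
  return "$status"
}
```

Run it as `check_migrations gstack-upgrade/migrations`; an empty output and a zero exit status mean both checks passed.
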
## Shipping your changes

When you're happy with your skill edits:

```bash
/ship
```

This runs tests, reviews the diff, triages Greptile comments (with 2-tier escalation), manages TODOS.md, bumps the version, and opens a PR. See `ship/SKILL.md` for the full workflow.