fix: community security + stability fixes (wave 1) (#325)

* feat: add /cso skill — OWASP Top 10 + STRIDE security audit

* fix: harden gstack-slug against shell injection via eval

Whitelist safe characters (a-zA-Z0-9._-) in SLUG and BRANCH output
to prevent shell metacharacter injection when used with eval.

Only affects self-hosted git servers with lax naming rules — GitHub
and GitLab enforce safe characters already. Defense-in-depth.

* fix(security): sanitize gstack-slug output against shell injection

The gstack-slug script is consumed via eval $(gstack-slug) throughout
skill templates. If a git remote URL contains shell metacharacters
like $(), backticks, or semicolons, they would be executed by eval.

Fix: strip all characters except [a-zA-Z0-9._-] from both SLUG and
BRANCH before output. This preserves normal values while neutralizing
any injection payload in malicious remote URLs.

Before: eval $(gstack-slug) with remote "foo/bar$(rm -rf /)" → executes rm
After:  eval $(gstack-slug) with remote "foo/bar$(rm -rf /)" → SLUG=foo-barrm-rf-
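
The whitelist can be sketched in TypeScript as below. This is illustration only: the real `gstack-slug` is a shell script, and `sanitizeSlug` is a hypothetical name. Converting `/` to `-` before filtering is an assumption, chosen because it reproduces the "After" value shown above.

```typescript
// Hypothetical helper mirroring the [a-zA-Z0-9._-] whitelist described above.
const UNSAFE_CHARS = /[^a-zA-Z0-9._-]/g;

function sanitizeSlug(raw: string): string {
  // Assumed pre-step: path separators become hyphens before filtering.
  const hyphenated = raw.replace(/\//g, "-");
  // Drop every character outside the whitelist, including $, (, ), `, ; and spaces.
  return hyphenated.replace(UNSAFE_CHARS, "");
}

console.log(sanitizeSlug("foo/bar$(rm -rf /)")); // foo-barrm-rf-
console.log(sanitizeSlug("garrytan/gstack"));    // garrytan-gstack
```

Because nothing outside the whitelist survives, the output cannot contain command substitutions or separators, even if a caller still passes it to `eval`.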

* fix(security): redact sensitive values in storage command output

The browse `storage` command dumps all localStorage and sessionStorage
as JSON. This can expose tokens, API keys, JWTs, and session credentials
in QA reports and agent transcripts.

Fix: redact values where the key matches sensitive patterns (token,
secret, key, password, auth, jwt, csrf) or the value starts with known
credential prefixes (eyJ for JWT, sk- for Stripe, ghp_ for GitHub, etc.).

Redacted values show length to aid debugging: [REDACTED — 128 chars]
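
A minimal sketch of the redaction rule, assuming the patterns listed above. The names `SENSITIVE_KEY`, `SENSITIVE_VALUE`, and `redactStorage` are illustrative, and the actual regexes in the browse code may differ.

```typescript
// Assumed patterns, modeled on the description above.
const SENSITIVE_KEY = /token|secret|key|password|auth|jwt|csrf/i;
const SENSITIVE_VALUE = /^(eyJ|sk-|ghp_)/;

function redactStorage(entries: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const key of Object.keys(entries)) {
    const value = entries[key];
    const sensitive = SENSITIVE_KEY.test(key) || SENSITIVE_VALUE.test(value);
    // Keep the length so a reader can still tell an empty value from a real one.
    out[key] = sensitive ? `[REDACTED — ${value.length} chars]` : value;
  }
  return out;
}

console.log(redactStorage({ auth_token: "abc", theme: "dark" }));
```

Here `auth_token` is redacted by its key while `theme` passes through unchanged.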

* fix(browse): kill old server before restart to prevent orphaned chromium processes

When the health check fails or the server connection drops, `ensureServer()`
and `sendCommand()` would call `startServer()` without first killing the
previous server process. This left orphaned `chrome-headless-shell` renderer
processes running at ~120% CPU each.

After several reconnect cycles (e.g. pages that crash during hydration or
trigger hard navigations via `window.location.href`), dozens of zombie
chromium processes accumulate and exhaust system resources.

Fix: call `killServer()` on the stale PID before spawning a new server in
both the `ensureServer()` unhealthy path and the `sendCommand()` connection-
lost retry path.

Fixes #294
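
The kill-before-restart ordering can be sketched as follows. The `ServerDeps` interface and the synchronous signatures are assumptions made to keep the example self-contained; the real `ensureServer()` is wired differently.

```typescript
// Hypothetical dependency shape, injected so the ordering is easy to observe.
interface ServerDeps {
  readPid: () => number | null;      // PID of the previously started server, if any
  isHealthy: () => boolean;          // health check against that server
  killServer: (pid: number) => void; // terminate the stale server
  startServer: () => number;         // spawn a new server, return its PID
}

function ensureServer(deps: ServerDeps): number {
  const stalePid = deps.readPid();
  if (stalePid !== null && deps.isHealthy()) {
    return stalePid; // existing server is fine, reuse it
  }
  if (stalePid !== null) {
    // Kill first, so the old browser processes do not outlive their server.
    deps.killServer(stalePid);
  }
  return deps.startServer();
}
```

The same ordering would apply on the `sendCommand()` retry path: kill the stale PID, then start.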

* Fix YAML linter error: nested mapping in compact sequence entries

Having "Run: bun" inside a plain scalar is not allowed by the YAML spec, which states: plain scalars must never contain the “: ” and “ #” character combinations.

This simple fix switches to block scalars (|) to eliminate the ambiguity without changing runtime behavior.

* fix(security): add Azure metadata endpoint to SSRF blocklist

Add metadata.azure.internal to BLOCKED_METADATA_HOSTS alongside the
existing AWS/GCP endpoints. Closes the coverage gap identified in #125.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add coverage for storage redaction

Test key-based redaction (auth_token, api_key), value-based redaction
(JWT prefix, GitHub PAT prefix), pass-through for normal keys, and
length preservation in redacted output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add community PR triage process to CONTRIBUTING.md

Document the wave-based PR triage pattern used for batching community
contributions. References PR #205 (v0.8.3) as the original example.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: adjust test key names to avoid redaction pattern collision

Rename testKey→testData and normalKey→displayName in storage tests
to avoid triggering #238's SENSITIVE_KEY regex (which matches 'key').
Also generate Codex variant of /cso skill.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update project documentation for v0.9.10.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: zero-noise /cso security audits with FP filtering (v0.11.0.0)

Absorb Anthropic's security-review false positive filtering into /cso:
- 17 hard exclusions (DoS, test files, log spoofing, SSRF path-only,
  regex injection, race conditions unless concrete, etc.)
- 9 precedents (React XSS-safe, env vars trusted, client-side code
  doesn't need auth, shell scripts need concrete untrusted input path)
- 8/10 confidence gate — below threshold = don't report
- Independent sub-agent verification for each finding
- Exploit scenario requirement per finding
- Framework-aware analysis (Rails CSRF, React escaping, Angular sanitization)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: consolidate CHANGELOG — merge /cso launch + community wave into v0.11.0.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: rewrite README — lead with Karpathy quote, cut LinkedIn phrases, add /cso

Opens with the revolution (Karpathy, Steinberger/OpenClaw), keeps credentials
and LOC numbers, cuts filler phrases, adds hater bait, restores hiring block,
removes bloated "What's new" section, adds /cso to skills table and install.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(cso): adversarial review fixes — FP filtering, prompt injection, language coverage

- Exclusion #10: test files must be verified as not imported by non-test code
- Exclusion #13: distinguish user-message AI input from system-prompt injection
- Exclusion #14: ReDoS in user-input regex IS a real CVE class, don't exclude
- Add anti-manipulation rule: ignore audit-influencing instructions in codebase
- Fix confidence gate: remove contradictory 7-8 tier, hard cutoff at 8
- Fix verifier anchoring: send only file+line, not category/description
- Add Go, PHP, Java, C#, Kotlin to grep patterns (was 4 languages, now 8)
- Add GraphQL, gRPC, WebSocket endpoint detection to attack surface mapping

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(docs): correct skill counts, add /autoplan to README tables

Skill count was wrong in 3 places (said 19+7=26, said 25, actual is 28).
Added /autoplan to specialist table. Fixed troubleshooting skills list
to include all skills added since v0.7.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(browse): DNS rebinding protection for SSRF blocklist

validateNavigationUrl is now async — resolves hostname to IP and checks
against blocked metadata IPs. Prevents DNS rebinding where evil.com
initially resolves to a safe IP, then switches to 169.254.169.254.
All callers updated to await. Tests updated for async assertions.
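
A sketch of the resolve-then-check idea, under stated assumptions. The resolver is injected so the example runs without network access, the blocklist contents are illustrative apart from the hosts and IP named in these commit messages, and the real `validateNavigationUrl` signature may differ.

```typescript
// Assumed blocklists for illustration.
const BLOCKED_METADATA_HOSTS = new Set(["metadata.google.internal", "metadata.azure.internal"]);
const BLOCKED_METADATA_IPS = new Set(["169.254.169.254"]);

// Hypothetical resolver type: hostname in, list of IP strings out.
type Resolver = (hostname: string) => Promise<string[]>;

async function validateNavigationUrl(url: string, resolve: Resolver): Promise<void> {
  const { hostname } = new URL(url);
  if (BLOCKED_METADATA_HOSTS.has(hostname)) {
    throw new Error(`Blocked metadata host: ${hostname}`);
  }
  // A harmless-looking name can still point at a metadata IP, so check what it resolves to.
  const addresses = await resolve(hostname);
  for (const ip of addresses) {
    if (BLOCKED_METADATA_IPS.has(ip)) {
      throw new Error(`Blocked metadata IP ${ip} for host ${hostname}`);
    }
  }
}
```

In a Node or Bun runtime the resolver could wrap `dns.promises.lookup(hostname, { all: true })` and map the results to their `address` fields.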

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(browse): lockfile prevents concurrent server start races

Adds exclusive lockfile (O_CREAT|O_EXCL) around ensureServer to prevent
TOCTOU race where two CLI invocations could both kill the old server and
start new ones, leaving an orphaned chromium process. Second caller now
waits for the first to finish starting.
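
An exclusive lockfile of this kind can be sketched with Node's `fs` module, where the `"wx"` open flag includes `O_CREAT | O_EXCL`. The helper names and the choice to return `null` on contention are assumptions, not the project's actual code.

```typescript
import { openSync, closeSync, unlinkSync } from "node:fs";

// Hypothetical helper: returns a file descriptor if we won the lock, null if someone else holds it.
function tryAcquireLock(path: string): number | null {
  try {
    // "wx" creates the file and fails with EEXIST if it already exists (O_CREAT | O_EXCL).
    return openSync(path, "wx");
  } catch (err) {
    if ((err as { code?: string }).code === "EEXIST") return null;
    throw err; // any other error is unexpected and should surface
  }
}

function releaseLock(path: string, fd: number): void {
  closeSync(fd);
  unlinkSync(path);
}
```

A caller that gets `null` would wait and retry rather than starting a second server. Because creation and the existence check happen in one system call, two CLI invocations cannot both win.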

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(browse): improve storage redaction — word-boundary keys + more value prefixes

Key regex: use underscore/dot/hyphen boundaries instead of \b (which treats
_ as word char). Now correctly redacts auth_token, session_token while
skipping keyboardShortcuts, monkeyPatch, primaryKey.

Value regex: add AWS (AKIA), Stripe (sk_live_, pk_live_), Anthropic (sk-ant-),
Google (AIza), SendGrid (SG.), Supabase (sbp_) prefixes.
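
The boundary change can be illustrated as below. The regex is a sketch built from the description above, using a hypothetical name `BOUNDED_SENSITIVE_KEY`; the pattern in the codebase may be written differently.

```typescript
// \b does not help here: "_" counts as a word character, so there is no boundary inside auth_token.
console.log(/\btoken\b/i.test("auth_token")); // false

// Assumed replacement: a sensitive word must sit between start/end of string or one of _ . -
const BOUNDED_SENSITIVE_KEY = /(^|[_.-])(token|secret|key|password|auth|jwt|csrf)($|[_.-])/i;

function isSensitiveKey(name: string): boolean {
  return BOUNDED_SENSITIVE_KEY.test(name);
}

console.log(isSensitiveKey("auth_token"));        // true
console.log(isSensitiveKey("session_token"));     // true
console.log(isSensitiveKey("keyboardShortcuts")); // false
console.log(isSensitiveKey("primaryKey"));        // false
```

One consequence of this design is that camelCase names such as `primaryKey` are never matched, which is the trade-off the commit message describes.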

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: migrate all remaining eval callers to source, fix stale CHANGELOG claim

5 templates and 2 bin scripts still used eval $(gstack-slug). All now use
source <(gstack-slug). Updated gstack-slug comment to match. Fixed v0.8.3
CHANGELOG entry that falsely claimed eval was fully eliminated — it was
the output sanitization that made it safe, not a calling convention change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(docs): add /autoplan to install instructions, regen skill docs

The install instruction blocks and troubleshooting section were missing
/autoplan. All three skill list locations now include the complete 28-skill
set. Regenerated codex/agents SKILL.md files to match template changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update project documentation for v0.11.0.0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs(cso): add disclaimer — not a substitute for professional security audits

LLMs can miss subtle vulns and produce false negatives. For production
systems with sensitive data, hire a real firm. /cso is a first pass,
not your only line of defense. Disclaimer appended to every report.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Arun Kumar Thiagarajan <arunkt.bm14@gmail.com>
Co-authored-by: Tyrone Robb <tyrone.robb@icloud.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Orkun Duman <orkun1675@gmail.com>
Garry Tan
2026-03-22 13:19:10 -07:00
committed by GitHub
parent d0300d4a9d
commit cf3582c637
32 changed files with 1920 additions and 139 deletions

111
README.md

@@ -1,10 +1,12 @@
# gstack
Hi, I'm [Garry Tan](https://x.com/garrytan). I'm President & CEO of [Y Combinator](https://www.ycombinator.com/), where I've worked with thousands of startups including Coinbase, Instacart, and Rippling when the founders were just one or two people in a garage — companies now worth tens of billions of dollars. Before YC, I designed the Palantir logo and was one of the first eng manager/PM/designers there. I cofounded Posterous, a blog platform we sold to Twitter. I built Bookface, YC's internal social network, back in 2013. I've been building products as a designer, PM, and eng manager for a long time.
> "I don't think I've typed like a line of code probably since December, basically, which is an extremely large change." — [Andrej Karpathy](https://fortune.com/2026/03/21/andrej-karpathy-openai-cofounder-ai-agents-coding-state-of-psychosis-openclaw/), No Priors podcast, March 2026
And right now I am in the middle of something that feels like a new era entirely.
When I heard Karpathy say this, I wanted to find out how. How does one person ship like a team of twenty? Peter Steinberger built [OpenClaw](https://github.com/openclaw/openclaw) — 247K GitHub stars — essentially solo with AI agents. The revolution is here. A single builder with the right tooling can move faster than a traditional team.
In the last 60 days I have written **over 600,000 lines of production code** — 35% tests — and I am doing **10,000 to 20,000 usable lines of code per day** as a part-time part of my day while doing all my duties as CEO of YC. That is not a typo. My last `/retro` (developer stats from the last 7 days) across 3 projects: **140,751 lines added, 362 commits, ~115k net LOC**. The models are getting dramatically better every week. We are at the dawn of something real — one person shipping at a scale that used to require a team of twenty.
I'm [Garry Tan](https://x.com/garrytan), President & CEO of [Y Combinator](https://www.ycombinator.com/). I've worked with thousands of startups — Coinbase, Instacart, Rippling — when they were one or two people in a garage. Before YC, I was one of the first eng/PM/designers at Palantir, cofounded Posterous (sold to Twitter), and built Bookface, YC's internal social network.
**gstack is my answer.** I've been building products for twenty years, and right now I'm shipping more code than I ever have. In the last 60 days: **600,000+ lines of production code** (35% tests), **10,000-20,000 lines per day**, part-time, while running YC full-time. Here's my last `/retro` across 3 projects: **140,751 lines added, 362 commits, ~115k net LOC** in one week.
**2026 — 1,237 contributions and counting:**
@@ -16,31 +18,27 @@ In the last 60 days I have written **over 600,000 lines of production code** —
Same person. Different era. The difference is the tooling.
**gstack is how I do it.** It is my open source software factory. It turns Claude Code into a virtual engineering team you actually manage — a CEO who rethinks the product, an eng manager who locks the architecture, a designer who catches AI slop, a paranoid reviewer who finds production bugs, a QA lead who opens a real browser and clicks through your app, and a release engineer who ships the PR. Eighteen specialists and seven power tools, all as slash commands, all Markdown, **all free, MIT license, available right now.**
**gstack is how I do it.** It turns Claude Code into a virtual engineering team — a CEO who rethinks the product, an eng manager who locks architecture, a designer who catches AI slop, a reviewer who finds production bugs, a QA lead who opens a real browser, a security officer who runs OWASP + STRIDE audits, and a release engineer who ships the PR. Twenty specialists and eight power tools, all slash commands, all Markdown, all free, MIT license.
I am learning how to get to the edge of what agentic systems can do as of March 2026, and this is my live experiment. I am sharing it because I want the whole world on this journey with me.
This is my open source software factory. I use it every day. I'm sharing it because these tools should be available to everyone.
Fork it. Improve it. Make it yours. Don't player hate, appreciate.
Fork it. Improve it. Make it yours. And if you want to hate on free open source software — you're welcome to, but I'd rather you just try it first.
**Who this is for:**
- **Founders and CEOs** — especially technical ones who still want to ship. This is how you build like a team of twenty.
- **First-time Claude Code users** — gstack is the best way to start. Structured roles instead of a blank prompt.
- **Tech leads and staff engineers** — bring rigorous review, QA, and release automation to every PR
- **Founders and CEOs** — especially technical ones who still want to ship
- **First-time Claude Code users** — structured roles instead of a blank prompt
- **Tech leads and staff engineers** — rigorous review, QA, and release automation on every PR
## Quick start: your first 10 minutes
## Quick start
1. Install gstack (30 seconds — see below)
2. Run `/office-hours` — describe what you're building. It will reframe the problem before you write a line of code.
2. Run `/office-hours` — describe what you're building
3. Run `/plan-ceo-review` on any feature idea
4. Run `/review` on any branch with changes
5. Run `/qa` on your staging URL
6. Stop there. You'll know if this is for you.
Expect first useful run in under 5 minutes on any repo with tests already set up.
**If you only read one more section, read this one.**
## Install — takes 30 seconds
## Install — 30 seconds
**Requirements:** [Claude Code](https://docs.anthropic.com/en/docs/claude-code), [Git](https://git-scm.com/), [Bun](https://bun.sh/) v1.0+, [Node.js](https://nodejs.org/) (Windows only)
@@ -48,11 +46,11 @@ Expect first useful run in under 5 minutes on any repo with tests already set up
Open Claude Code and paste this. Claude does the rest.
> Install gstack: run **`git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup`** then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, and lists the available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review, /design-consultation, /review, /ship, /land-and-deploy, /canary, /benchmark, /browse, /qa, /qa-only, /design-review, /setup-browser-cookies, /setup-deploy, /retro, /investigate, /document-release, /codex, /careful, /freeze, /guard, /unfreeze, /gstack-upgrade. Then ask the user if they also want to add gstack to the current project so teammates get it.
> Install gstack: run **`git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup`** then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, and lists the available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review, /design-consultation, /review, /ship, /land-and-deploy, /canary, /benchmark, /browse, /qa, /qa-only, /design-review, /setup-browser-cookies, /setup-deploy, /retro, /investigate, /document-release, /codex, /cso, /autoplan, /careful, /freeze, /guard, /unfreeze, /gstack-upgrade. Then ask the user if they also want to add gstack to the current project so teammates get it.
### Step 2: Add to your repo so teammates get it (optional)
> Add gstack to this project: run **`cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup`** then add a "gstack" section to this project's CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, lists the available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review, /design-consultation, /review, /ship, /land-and-deploy, /canary, /benchmark, /browse, /qa, /qa-only, /design-review, /setup-browser-cookies, /setup-deploy, /retro, /investigate, /document-release, /codex, /careful, /freeze, /guard, /unfreeze, /gstack-upgrade, and tells Claude that if gstack skills aren't working, run `cd .claude/skills/gstack && ./setup` to build the binary and register skills.
> Add gstack to this project: run **`cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup`** then add a "gstack" section to this project's CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, lists the available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review, /design-consultation, /review, /ship, /land-and-deploy, /canary, /benchmark, /browse, /qa, /qa-only, /design-review, /setup-browser-cookies, /setup-deploy, /retro, /investigate, /document-release, /codex, /cso, /careful, /freeze, /guard, /unfreeze, /gstack-upgrade, and tells Claude that if gstack skills aren't working, run `cd .claude/skills/gstack && ./setup` to build the binary and register skills.
Real files get committed to your repo (not a submodule), so `git clone` just works. Everything lives inside `.claude/`. Nothing touches your PATH or runs in the background.
@@ -72,7 +70,7 @@ git clone https://github.com/garrytan/gstack.git ~/gstack
cd ~/gstack && ./setup --host auto
```
This installs to `~/.claude/skills/gstack` and/or `~/.codex/skills/gstack` depending on what's available. All 25 skills work across all supported agents. Hook-based safety skills (careful, freeze, guard) use inline safety advisory prose on non-Claude hosts.
This installs to `~/.claude/skills/gstack` and/or `~/.codex/skills/gstack` depending on what's available. All 28 skills work across all supported agents. Hook-based safety skills (careful, freeze, guard) use inline safety advisory prose on non-Claude hosts.
## See it work
@@ -115,38 +113,38 @@ You: /ship
Tests: 42 → 51 (+9 new). PR: github.com/you/app/pull/42
```
You said "daily briefing app." The agent said "you're building a chief of staff AI" — because it listened to your pain, not your feature request. Then it challenged your premises, generated three approaches, recommended the narrowest wedge, and wrote a design doc that fed into every downstream skill. Eight commands. That is not a copilot. That is a team.
You said "daily briefing app." The agent said "you're building a chief of staff AI" — because it listened to your pain, not your feature request. Eight commands, end to end. That is not a copilot. That is a team.
## The sprint
gstack is a process, not a collection of tools. The skills are ordered the way a sprint runs:
gstack is a process, not a collection of tools. The skills run in the order a sprint runs:
**Think → Plan → Build → Review → Test → Ship → Reflect**
Each skill feeds into the next. `/office-hours` writes a design doc that `/plan-ceo-review` reads. `/plan-eng-review` writes a test plan that `/qa` picks up. `/review` catches bugs that `/ship` verifies are fixed. Nothing falls through the cracks because every step knows what came before it.
One sprint, one person, one feature — that takes about 30 minutes with gstack. But here's what changes everything: you can run 10-15 of these sprints in parallel. Different features, different branches, different agents — all at the same time. That is how I ship 10,000+ lines of production code per day while doing my actual job.
| Skill | Your specialist | What they do |
|-------|----------------|--------------|
| `/office-hours` | **YC Office Hours** | Start here. Six forcing questions that reframe your product before you write code. Pushes back on your framing, challenges premises, generates implementation alternatives. Design doc feeds into every downstream skill. |
| `/plan-ceo-review` | **CEO / Founder** | Rethink the problem. Find the 10-star product hiding inside the request. Four modes: Expansion, Selective Expansion, Hold Scope, Reduction. |
| `/plan-eng-review` | **Eng Manager** | Lock in architecture, data flow, diagrams, edge cases, and tests. Forces hidden assumptions into the open. |
| `/plan-design-review` | **Senior Designer** | Rates each design dimension 0-10, explains what a 10 looks like, then edits the plan to get there. AI Slop detection. Interactive — one AskUserQuestion per design choice. |
| `/design-consultation` | **Design Partner** | Build a complete design system from scratch. Knows the landscape, proposes creative risks, generates realistic product mockups. Design at the heart of all other phases. |
| `/design-consultation` | **Design Partner** | Build a complete design system from scratch. Researches the landscape, proposes creative risks, generates realistic product mockups. |
| `/review` | **Staff Engineer** | Find the bugs that pass CI but blow up in production. Auto-fixes the obvious ones. Flags completeness gaps. |
| `/investigate` | **Debugger** | Systematic root-cause debugging. Iron Law: no fixes without investigation. Traces data flow, tests hypotheses, stops after 3 failed fixes. |
| `/design-review` | **Designer Who Codes** | Same audit as /plan-design-review, then fixes what it finds. Atomic commits, before/after screenshots. |
| `/qa` | **QA Lead** | Test your app, find bugs, fix them with atomic commits, re-verify. Auto-generates regression tests for every fix. |
| `/qa-only` | **QA Reporter** | Same methodology as /qa but report only. Use when you want a pure bug report without code changes. |
| `/ship` | **Release Engineer** | Sync main, run tests, audit coverage, push, open PR. Bootstraps test frameworks if you don't have one. One command. |
| `/land-and-deploy` | **Release Engineer** | Merge the PR, wait for CI and deploy, verify production health. Takes over after `/ship`. One command from "approved" to "verified in production." |
| `/canary` | **SRE** | Post-deploy monitoring loop. Watches for console errors, performance regressions, and page failures. Periodic screenshots and anomaly detection. |
| `/benchmark` | **Performance Engineer** | Baseline page load times, Core Web Vitals, and resource sizes. Compare before/after on every PR. Catch bundle size regressions before they ship. |
| `/qa-only` | **QA Reporter** | Same methodology as /qa but report only. Pure bug report without code changes. |
| `/cso` | **Chief Security Officer** | OWASP Top 10 + STRIDE threat model. Zero-noise: 17 false positive exclusions, 8/10+ confidence gate, independent finding verification. Each finding includes a concrete exploit scenario. |
| `/ship` | **Release Engineer** | Sync main, run tests, audit coverage, push, open PR. Bootstraps test frameworks if you don't have one. |
| `/land-and-deploy` | **Release Engineer** | Merge the PR, wait for CI and deploy, verify production health. One command from "approved" to "verified in production." |
| `/canary` | **SRE** | Post-deploy monitoring loop. Watches for console errors, performance regressions, and page failures. |
| `/benchmark` | **Performance Engineer** | Baseline page load times, Core Web Vitals, and resource sizes. Compare before/after on every PR. |
| `/document-release` | **Technical Writer** | Update all project docs to match what you just shipped. Catches stale READMEs automatically. |
| `/retro` | **Eng Manager** | Team-aware weekly retro. Per-person breakdowns, shipping streaks, test health trends, growth opportunities. |
| `/browse` | **QA Engineer** | Give the agent eyes. Real Chromium browser, real clicks, real screenshots. ~100ms per command. |
| `/setup-browser-cookies` | **Session Manager** | Import cookies from your real browser (Chrome, Arc, Brave, Edge) into the headless session. Test authenticated pages. |
| `/browse` | **QA Engineer** | Real Chromium browser, real clicks, real screenshots. ~100ms per command. |
| `/setup-browser-cookies` | **Session Manager** | Import cookies from your real browser into the headless session. Test authenticated pages. |
| `/autoplan` | **Review Pipeline** | One command, fully reviewed plan. Runs CEO → design → eng review automatically with encoded decision principles. Surfaces only taste decisions for your approval. |
### Power tools
@@ -162,51 +160,17 @@ One sprint, one person, one feature — that takes about 30 minutes with gstack.
**[Deep dives with examples and philosophy for every skill →](docs/skills.md)**
## What's new and why it matters
## Parallel sprints
**`/office-hours` reframes your product before you write code.** You say "daily briefing app." It listens to your actual pain, pushes back on the framing, tells you you're really building a personal chief of staff AI, challenges your premises, and generates three implementation approaches with effort estimates. The design doc it writes feeds directly into `/plan-ceo-review` and `/plan-eng-review` — so every downstream skill starts with real clarity instead of a vague feature request.
gstack works well with one sprint. It gets interesting with ten running at once.
**Design is at the heart.** `/design-consultation` doesn't just pick fonts. It researches what's out there in your space, proposes safe choices AND creative risks, generates realistic mockups of your actual product, and writes `DESIGN.md` — and then `/design-review` and `/plan-eng-review` read what you chose. Design decisions flow through the whole system.
**`/qa` was a massive unlock.** It let me go from 6 to 12 parallel workers. Claude Code saying *"I SEE THE ISSUE"* and then actually fixing it, generating a regression test, and verifying the fix — that changed how I work. The agent has eyes now.
**Smart review routing.** Just like at a well-run startup: CEO doesn't have to look at infra bug fixes, design review isn't needed for backend changes. gstack tracks what reviews are run, figures out what's appropriate, and just does the smart thing. The Review Readiness Dashboard tells you where you stand before you ship.
**Test everything.** `/ship` bootstraps test frameworks from scratch if your project doesn't have one. Every `/ship` run produces a coverage audit. Every `/qa` bug fix generates a regression test. 100% test coverage is the goal — tests make vibe coding safe instead of yolo coding.
**Ship to production in one command.** `/land-and-deploy` picks up where `/ship` left off — merges your PR, waits for CI and deploy, then runs canary verification on your production URL. Auto-detects Fly.io, Render, Vercel, Netlify, Heroku, or GitHub Actions. If something breaks, it offers a revert. Pair with `/canary` for extended post-deploy monitoring and `/benchmark` to catch performance regressions before they ship.
**`/document-release` is the engineer you never had.** It reads every doc file in your project, cross-references the diff, and updates everything that drifted. README, ARCHITECTURE, CONTRIBUTING, CLAUDE.md, TODOS — all kept current automatically. And now `/ship` auto-invokes it — docs stay current without an extra command.
**Browser handoff when the AI gets stuck.** Hit a CAPTCHA, auth wall, or MFA prompt? `$B handoff` opens a visible Chrome at the exact same page with all your cookies and tabs intact. Solve the problem, tell Claude you're done, `$B resume` picks up right where it left off. The agent even suggests it automatically after 3 consecutive failures.
**Multi-AI second opinion.** `/codex` gets an independent review from OpenAI's Codex CLI — a completely different AI looking at the same diff. Three modes: code review with a pass/fail gate, adversarial challenge that actively tries to break your code, and open consultation with session continuity. When both `/review` (Claude) and `/codex` (OpenAI) have reviewed the same branch, you get a cross-model analysis showing which findings overlap and which are unique to each.
**Safety guardrails on demand.** Say "be careful" and `/careful` warns before any destructive command — rm -rf, DROP TABLE, force-push, git reset --hard. `/freeze` locks edits to one directory while debugging so Claude can't accidentally "fix" unrelated code. `/guard` activates both. `/investigate` auto-freezes to the module being investigated.
**Proactive skill suggestions.** gstack notices what stage you're in — brainstorming, reviewing, debugging, testing — and suggests the right skill. Don't like it? Say "stop suggesting" and it remembers across sessions.
## 10-15 parallel sprints
gstack is powerful with one sprint. It is transformative with ten running at once.
[Conductor](https://conductor.build) runs multiple Claude Code sessions in parallel — each in its own isolated workspace. One session running `/office-hours` on a new idea, another doing `/review` on a PR, a third implementing a feature, a fourth running `/qa` on staging, and six more on other branches. All at the same time. I regularly run 10-15 parallel sprints — that's the practical max right now.
The sprint structure is what makes parallelism work. Without a process, ten agents is ten sources of chaos. With a process — think, plan, build, review, test, ship — each agent knows exactly what to do and when to stop. You manage them the way a CEO manages a team: check in on the decisions that matter, let the rest run.
[Conductor](https://conductor.build) runs multiple Claude Code sessions in parallel — each in its own isolated workspace. One session on `/office-hours`, another on `/review`, a third implementing a feature, a fourth running `/qa`. All at the same time. The sprint structure is what makes parallelism work — without a process, ten agents is ten sources of chaos. With a process, each agent knows exactly what to do and when to stop.
---
## Come ride the wave
Free, MIT licensed, open source. No premium tier, no waitlist.
This is **free, MIT licensed, open source, available now.** No premium tier. No waitlist. No strings.
I open sourced how I do development and I am actively upgrading my own software factory here. You can fork it and make it your own. That's the whole point. I want everyone on this journey.
Same tools, different outcome — because gstack gives you structured roles and review gates, not generic agent chaos. That governance is the difference between shipping fast and shipping reckless.
The models are getting better fast. The people who figure out how to work with them now — really work with them, not just dabble — are going to have a massive advantage. This is that window. Let's go.
Eighteen specialists and seven power tools. All slash commands. All Markdown. All free. **[github.com/garrytan/gstack](https://github.com/garrytan/gstack)** — MIT License
I open sourced how I build software. You can fork it and make it your own.
> **We're hiring.** Want to ship 10K+ LOC/day and help harden gstack?
> Come work at YC — [ycombinator.com/software](https://ycombinator.com/software)
@@ -253,9 +217,10 @@ Data is stored in [Supabase](https://supabase.com) (open source Firebase alterna
## gstack
Use /browse from gstack for all web browsing. Never use mcp__claude-in-chrome__* tools.
Available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review,
/design-consultation, /review, /ship, /browse, /qa, /qa-only, /design-review,
/setup-browser-cookies, /retro, /investigate, /document-release, /codex, /careful,
/freeze, /guard, /unfreeze, /gstack-upgrade.
/design-consultation, /review, /ship, /land-and-deploy, /canary, /benchmark, /browse,
/qa, /qa-only, /design-review, /setup-browser-cookies, /setup-deploy, /retro,
/investigate, /document-release, /codex, /cso, /autoplan, /careful, /freeze, /guard,
/unfreeze, /gstack-upgrade.
```
## License