docs: v0.9.8.0 — deploy pipeline docs + pre-merge readiness gate (#306)

* docs: v0.9.8.0 — deploy pipeline + E2E performance + pre-merge gate CHANGELOG: added v0.9.8.0 entry covering /land-and-deploy, /canary, /benchmark, /setup-deploy, /review perf pass, E2E model pinning, and 3 test fixes. README: added 4 new skills to tables and install instructions, updated specialist/tool counts (18+7), added deploy pipeline to "What's new" section. /land-and-deploy: added Step 3.5 pre-merge readiness gate that checks review dashboard, E2E results, free tests, and doc-release status before merging. Uses AskUserQuestion for explicit confirmation. VERSION: 0.9.7.0 → 0.9.8.0 TODOS: updated deploy pipeline to Completed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: comprehensive pre-merge readiness gate in /land-and-deploy Step 3.5 now checks 5 dimensions before allowing merge: 1. Review staleness — compares review commit hash against HEAD, flags if significant code changes happened after last review 2. Tests — runs free tests inline, checks today's E2E and LLM eval results from ~/.gstack-dev/evals/ 3. PR body accuracy — compares PR description against actual commits, flags missing features or stale descriptions 4. Document-release — checks if CHANGELOG/VERSION were updated when new features are present in the diff 5. Full readiness report — ASCII dashboard with warnings/blockers, explicit AskUserQuestion confirmation required before merge Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-08 21:49:45 +08:00 · 2026-03-21 14:54:26 -07:00
parent 00bc482fe1
commit 2c0d4b39c7
8 changed files with 577 additions and 24 deletions
--- a/land-and-deploy/SKILL.md
+++ b/land-and-deploy/SKILL.md
@@ -293,11 +293,14 @@ When the user types `/land-and-deploy`, run this skill.
 - `/land-and-deploy #123` — specific PR number
 - `/land-and-deploy #123 <url>` — specific PR + verification URL

-## Non-interactive philosophy (like /ship)
+## Non-interactive philosophy (like /ship) — with one critical gate

-This is a **non-interactive, fully automated** workflow. Do NOT ask for confirmation at any step except the ones listed below. The user said `/land-and-deploy` which means DO IT.
+This is a **mostly automated** workflow. Do NOT ask for confirmation at any step except
+the ones listed below. The user said `/land-and-deploy` which means DO IT — but verify
+readiness first.

-**Only stop for:**
+**Always stop for:**
+- **Pre-merge readiness gate (Step 3.5)** — this is the ONE confirmation before merge
 - GitHub CLI not authenticated
 - No PR found for this branch
 - CI failures or merge conflicts
@@ -307,7 +310,6 @@ This is a **non-interactive, fully automated** workflow. Do NOT ask for confirma

 **Never stop for:**
 - Choosing merge method (auto-detect from repo settings)
- Confirming the merge
 - Timeout warnings (warn and continue gracefully)

 ---
@@ -372,6 +374,177 @@ If timeout (15 min): **STOP.** "CI has been running for 15 minutes. Investigate

 ---

+## Step 3.5: Pre-merge readiness gate
+
+**This is the critical safety check before an irreversible merge.** The merge cannot
+be undone without a revert commit. Gather ALL evidence, build a readiness report,
+and get explicit user confirmation before proceeding.
+
+Collect evidence for each check below. Track warnings (yellow) and blockers (red).
+
+### 3.5a: Review staleness check
+
+```bash
+~/.claude/skills/gstack/bin/gstack-review-read 2>/dev/null
+```
+
+Parse the output. For each review skill (plan-eng-review, plan-ceo-review,
+plan-design-review, design-review-lite, codex-review):
+
+1. Find the most recent entry within the last 7 days.
+2. Extract its `commit` field.
+3. Compare against current HEAD: `git rev-list --count STORED_COMMIT..HEAD`
+
+**Staleness rules:**
+- 0 commits since review → CURRENT
+- 1-3 commits since review → RECENT (yellow if those commits touch code, not just docs)
+- 4+ commits since review → STALE (red — review may not reflect current code)
+- No review found → NOT RUN
+
+**Critical check:** Look at what changed AFTER the last review. Run:
+```bash
+git log --oneline STORED_COMMIT..HEAD
+```
+If any commits after the review contain words like "fix", "refactor", "rewrite",
+"overhaul", or touch more than 5 files — flag as **STALE (significant changes
+since review)**. The review was done on different code than what's about to merge.
+
+### 3.5b: Test results
+
+**Free tests — run them now:**
+
+Read CLAUDE.md to find the project's test command. If not specified, use `bun test`.
+Run the test command and capture the exit code and output.
+
+```bash
+bun test 2>&1 | tail -10
+```
+
+If tests fail: **BLOCKER.** Cannot merge with failing tests.
+
+**E2E tests — check recent results:**
+
+```bash
+ls -t ~/.gstack-dev/evals/*-e2e-*-$(date +%Y-%m-%d)*.json 2>/dev/null | head -20
+```
+
+For each eval file from today, parse pass/fail counts. Show:
+- Total tests, pass count, fail count
+- How long ago the run finished (from file timestamp)
+- Total cost
+- Names of any failing tests
+
+If no E2E results from today: **WARNING — no E2E tests run today.**
+If E2E results exist but have failures: **WARNING — N tests failed.** List them.
+
+**LLM judge evals — check recent results:**
+
+```bash
+ls -t ~/.gstack-dev/evals/*-llm-judge-*-$(date +%Y-%m-%d)*.json 2>/dev/null | head -5
+```
+
+If found, parse and show pass/fail. If not found, note "No LLM evals run today."
+
+### 3.5c: PR body accuracy check
+
+Read the current PR body:
+```bash
+gh pr view --json body -q .body
+```
+
+Read the current diff summary:
+```bash
+git log --oneline $(gh pr view --json baseRefName -q .baseRefName 2>/dev/null || echo main)..HEAD | head -20
+```
+
+Compare the PR body against the actual commits. Check for:
+1. **Missing features** — commits that add significant functionality not mentioned in the PR
+2. **Stale descriptions** — PR body mentions things that were later changed or reverted
+3. **Wrong version** — PR title or body references a version that doesn't match VERSION file
+
+If the PR body looks stale or incomplete: **WARNING — PR body may not reflect current
+changes.** List what's missing or stale.
+
+### 3.5d: Document-release check
+
+Check if documentation was updated on this branch:
+
+```bash
+git log --oneline --all-match --grep="docs:" $(gh pr view --json baseRefName -q .baseRefName 2>/dev/null || echo main)..HEAD | head -5
+```
+
+Also check if key doc files were modified:
+```bash
+git diff --name-only $(gh pr view --json baseRefName -q .baseRefName 2>/dev/null || echo main)...HEAD -- README.md CHANGELOG.md ARCHITECTURE.md CONTRIBUTING.md CLAUDE.md VERSION
+```
+
+If CHANGELOG.md and VERSION were NOT modified on this branch and the diff includes
+new features (new files, new commands, new skills): **WARNING — /document-release
+likely not run. CHANGELOG and VERSION not updated despite new features.**
+
+If only docs changed (no code): skip this check.
+
+### 3.5e: Readiness report and confirmation
+
+Build the full readiness report:
+
+```
+╔══════════════════════════════════════════════════════════╗
+║              PRE-MERGE READINESS REPORT                  ║
+╠══════════════════════════════════════════════════════════╣
+║                                                          ║
+║  PR: #NNN — title                                        ║
+║  Branch: feature → main                                  ║
+║                                                          ║
+║  REVIEWS                                                 ║
+║  ├─ Eng Review:    CURRENT / STALE (N commits) / —       ║
+║  ├─ CEO Review:    CURRENT / — (optional)                ║
+║  ├─ Design Review: CURRENT / — (optional)                ║
+║  └─ Codex Review:  CURRENT / — (optional)                ║
+║                                                          ║
+║  TESTS                                                   ║
+║  ├─ Free tests:    PASS / FAIL (blocker)                 ║
+║  ├─ E2E tests:     52/52 pass (25 min ago) / NOT RUN     ║
+║  └─ LLM evals:     PASS / NOT RUN                        ║
+║                                                          ║
+║  DOCUMENTATION                                           ║
+║  ├─ CHANGELOG:     Updated / NOT UPDATED (warning)       ║
+║  ├─ VERSION:       0.9.8.0 / NOT BUMPED (warning)        ║
+║  └─ Doc release:   Run / NOT RUN (warning)               ║
+║                                                          ║
+║  PR BODY                                                 ║
+║  └─ Accuracy:      Current / STALE (warning)             ║
+║                                                          ║
+║  WARNINGS: N  |  BLOCKERS: N                             ║
+╚══════════════════════════════════════════════════════════╝
+```
+
+If there are BLOCKERS (failing free tests): list them and recommend B.
+If there are WARNINGS but no blockers: list each warning and recommend A if
+warnings are minor, or B if warnings are significant.
+If everything is green: recommend A.
+
+Use AskUserQuestion:
+
+- **Re-ground:** "About to merge PR #NNN (title) from branch X to Y. Here's the
+  readiness report." Show the report above.
+- List each warning and blocker explicitly.
+- **RECOMMENDATION:** Choose A if green. Choose B if there are significant warnings.
+  Choose C only if the user understands the risks.
+- A) Merge — readiness checks passed (Completeness: 10/10)
+- B) Don't merge yet — address the warnings first (Completeness: 10/10)
+- C) Merge anyway — I understand the risks (Completeness: 3/10)
+
+If the user chooses B: **STOP.** List exactly what needs to be done:
+- If reviews are stale: "Re-run /plan-eng-review (or /review) to review current code."
+- If E2E not run: "Run `bun run test:e2e` to verify."
+- If docs not updated: "Run /document-release to update documentation."
+- If PR body stale: "Update the PR body to reflect current changes."
+
+If the user chooses A or C: continue to Step 4.
+
+---
+
 ## Step 4: Merge the PR

 Record the start timestamp for timing data.