fix: resolve merge conflicts with origin/main (v0.6.0 + v0.6.0.1 + v0.6.1)

Merge main's test bootstrap, boil-the-lake completeness principle,
selective expansion, ship gate overrides, and gstack-upgrade vendor sync.

Conflicts resolved:
- CHANGELOG: keep main's 0.6.1/0.6.0.1/0.6.0/0.5.4/0.5.3 entries
- VERSION: take main's 0.6.1
- design-consultation: office-hours naming + main's "what's out there" phrasing
- ship: keep both verification rules (fresh evidence + coverage tests)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-03-17 14:37:22 -07:00
42 changed files with 3926 additions and 1054 deletions

View File

@@ -11,6 +11,7 @@ allowed-tools:
- Grep
- Glob
- AskUserQuestion
- WebSearch
---
{{PREAMBLE}}
@@ -39,6 +40,7 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
- Multi-file changesets (auto-split into bisectable commits)
- TODOS.md completed-item detection (auto-mark)
- Auto-fixable review findings (dead code, N+1, stale comments — fixed automatically)
- Test coverage gaps (auto-generate and commit, or flag in PR body)
---
@@ -54,10 +56,27 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
{{REVIEW_DASHBOARD}}
If the verdict is NOT "CLEARED TO SHIP (3/3)", use AskUserQuestion:
- Show which reviews are missing or have open issues
- RECOMMENDATION: Choose B (run missing reviews first) unless the change is trivial
- Options: A) Ship anyway B) Abort — run missing review(s) first C) Reviews not relevant for this change
If the Eng Review is NOT "CLEAR":
1. **Check for a prior override on this branch:**
```bash
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
grep '"skill":"ship-review-override"' ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl 2>/dev/null || echo "NO_OVERRIDE"
```
If an override exists, display the dashboard and note "Review gate previously accepted — continuing." Do NOT ask again.
2. **If no override exists,** use AskUserQuestion:
- Show that Eng Review is missing or has open issues
- RECOMMENDATION: Choose C if the change is obviously trivial (< 20 lines, typo fix, config-only); Choose B for larger changes
- Options: A) Ship anyway B) Abort — run /plan-eng-review first C) Change is too small to need eng review
- If CEO/Design reviews are missing, mention them as informational ("CEO Review not run — recommended for product changes") but do NOT block or recommend aborting for them
3. **If the user chooses A or C,** persist the decision so future `/ship` runs on this branch skip the gate:
```bash
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
echo '{"skill":"ship-review-override","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","decision":"USER_CHOICE"}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl
```
Substitute USER_CHOICE with "ship_anyway" or "not_relevant".
---
@@ -75,6 +94,12 @@ git fetch origin <base> && git merge origin/<base> --no-edit
---
## Step 2.5: Test Framework Bootstrap
{{TEST_BOOTSTRAP}}
---
## Step 3: Run tests (on merged code)
**Do NOT run `RAILS_ENV=test bin/rails db:migrate`** — `bin/test-lane` already calls
@@ -159,6 +184,144 @@ If multiple suites need to run, run them sequentially (each needs a test lane).
---
## Step 3.4: Test Coverage Audit
100% coverage is the goal — every untested path is a path where bugs hide and vibe coding becomes yolo coding. Evaluate what was ACTUALLY coded (from the diff), not what was planned.
**0. Before/after test count:**
```bash
# Count test files before any generation
find . -name '*.test.*' -o -name '*.spec.*' -o -name '*_test.*' -o -name '*_spec.*' | grep -v node_modules | wc -l
```
Store this number for the PR body.
**1. Trace every codepath changed** using `git diff origin/<base>...HEAD`:
Read every changed file. For each one, trace how data flows through the code — don't just list functions, actually follow the execution:
1. **Read the diff.** For each changed file, read the full file (not just the diff hunk) to understand context.
2. **Trace data flow.** Starting from each entry point (route handler, exported function, event listener, component render), follow the data through every branch:
- Where does input come from? (request params, props, database, API call)
- What transforms it? (validation, mapping, computation)
- Where does it go? (database write, API response, rendered output, side effect)
- What can go wrong at each step? (null/undefined, invalid input, network failure, empty collection)
3. **Diagram the execution.** For each changed file, draw an ASCII diagram showing:
- Every function/method that was added or modified
- Every conditional branch (if/else, switch, ternary, guard clause, early return)
- Every error path (try/catch, rescue, error boundary, fallback)
- Every call to another function (trace into it — does IT have untested branches?)
- Every edge: what happens with null input? Empty array? Invalid type?
This is the critical step — you're building a map of every line of code that can execute differently based on input. Every branch in this diagram needs a test.
**2. Map user flows, interactions, and error states:**
Code coverage isn't enough — you need to cover how real users interact with the changed code. For each changed feature, think through:
- **User flows:** What sequence of actions does a user take that touches this code? Map the full journey (e.g., "user clicks 'Pay' → form validates → API call → success/failure screen"). Each step in the journey needs a test.
- **Interaction edge cases:** What happens when the user does something unexpected?
- Double-click/rapid resubmit
- Navigate away mid-operation (back button, close tab, click another link)
- Submit with stale data (page sat open for 30 minutes, session expired)
- Slow connection (API takes 10 seconds — what does the user see?)
- Concurrent actions (two tabs, same form)
- **Error states the user can see:** For every error the code handles, what does the user actually experience?
- Is there a clear error message or a silent failure?
- Can the user recover (retry, go back, fix input) or are they stuck?
- What happens with no network? With a 500 from the API? With invalid data from the server?
- **Empty/zero/boundary states:** What does the UI show with zero results? With 10,000 results? With a single character input? With maximum-length input?
Add these to your diagram alongside the code branches. A user flow with no test is just as much a gap as an untested if/else.
**3. Check each branch against existing tests:**
Go through your diagram branch by branch — both code paths AND user flows. For each one, search for a test that exercises it:
- Function `processPayment()` → look for `billing.test.ts`, `billing.spec.ts`, `test/billing_test.rb`
- An if/else → look for tests covering BOTH the true AND false path
- An error handler → look for a test that triggers that specific error condition
- A call to `helperFn()` that has its own branches → those branches need tests too
- A user flow → look for an integration or E2E test that walks through the journey
- An interaction edge case → look for a test that simulates the unexpected action
Quality scoring rubric:
- ★★★ Tests behavior with edge cases AND error paths
- ★★ Tests correct behavior, happy path only
- ★ Smoke test / existence check / trivial assertion (e.g., "it renders", "it doesn't throw")
**4. Output ASCII coverage diagram:**
Include BOTH code paths and user flows in the same diagram:
```
CODE PATH COVERAGE
===========================
[+] src/services/billing.ts
├── processPayment()
│ ├── [★★★ TESTED] Happy path + card declined + timeout — billing.test.ts:42
│ ├── [GAP] Network timeout — NO TEST
│ └── [GAP] Invalid currency — NO TEST
└── refundPayment()
├── [★★ TESTED] Full refund — billing.test.ts:89
└── [★ TESTED] Partial refund (checks non-throw only) — billing.test.ts:101
USER FLOW COVERAGE
===========================
[+] Payment checkout flow
├── [★★★ TESTED] Complete purchase — checkout.e2e.ts:15
├── [GAP] Double-click submit — NO TEST
├── [GAP] Navigate away during payment — NO TEST
└── [★ TESTED] Form validation errors (checks render only) — checkout.test.ts:40
[+] Error states
├── [★★ TESTED] Card declined message — billing.test.ts:58
├── [GAP] Network timeout UX (what does user see?) — NO TEST
└── [GAP] Empty cart submission — NO TEST
─────────────────────────────────
COVERAGE: 5/12 paths tested (42%)
Code paths: 3/5 (60%)
User flows: 2/7 (29%)
QUALITY: ★★★: 2 ★★: 2 ★: 1
GAPS: 7 paths need tests
─────────────────────────────────
```
**Fast path:** All paths covered → "Step 3.4: All new code paths have test coverage ✓" Continue.
**5. Generate tests for uncovered paths:**
If test framework detected (or bootstrapped in Step 2.5):
- Prioritize error handlers and edge cases first (happy paths are more likely already tested)
- Read 2-3 existing test files to match conventions exactly
- Generate unit tests. Mock all external dependencies (DB, API, Redis).
- Write tests that exercise the specific uncovered path with real assertions
- Run each test. Passes → commit as `test: coverage for {feature}`
- Fails → fix once. Still fails → revert, note gap in diagram.
Caps: 30 code paths max, 20 tests generated max (code + user flow combined), 2-min per-test exploration cap.
If no test framework AND user declined bootstrap → diagram only, no generation. Note: "Test generation skipped — no test framework configured."
**Diff is test-only changes:** Skip Step 3.4 entirely: "No new application code paths to audit."
**6. After-count and coverage summary:**
```bash
# Count test files after generation
find . -name '*.test.*' -o -name '*.spec.*' -o -name '*_test.*' -o -name '*_spec.*' | grep -v node_modules | wc -l
```
For PR body: `Tests: {before} → {after} (+{delta} new)`
Coverage line: `Test Coverage Audit: N new code paths. M covered (X%). K tests generated, J committed.`
---
## Step 3.5: Pre-Landing Review
Review the diff for structural issues that tests don't catch.
@@ -409,6 +572,10 @@ gh pr create --base <base> --title "<type>: <summary>" --body "$(cat <<'EOF'
## Summary
<bullet points from CHANGELOG>
## Test Coverage
<coverage diagram from Step 3.4, or "All new code paths have test coverage.">
<If Step 3.4 ran: "Tests: {before} → {after} (+{delta} new)">
## Pre-Landing Review
<findings from Step 3.5, or "No issues found.">
@@ -451,4 +618,5 @@ EOF
- **TODOS.md completion detection must be conservative.** Only mark items as completed when the diff clearly shows the work is done.
- **Use Greptile reply templates from greptile-triage.md.** Every reply includes evidence (inline diff, code references, re-rank suggestion). Never post vague replies.
- **Never push without fresh verification evidence.** If code changed after Step 3 tests, re-run before pushing.
- **Step 3.4 generates coverage tests.** They must pass before committing. Never commit failing tests.
- **The goal is: user says `/ship`, next thing they see is the review + PR URL.**