fix: review log architecture — close gaps, add attribution (v0.11.21.0) (#512)

* fix: review log architecture — close gaps, fix orphans, add attribution

- Ship Step 3.5 now logs its code review to the review log (via:"ship")
- Remove eng review gate — ship runs its own review in Step 3.5
- Dashboard Outside Voice row mapped to codex-plan-review
- Dashboard shows via source attribution (e.g., "via /autoplan")
- land-and-deploy checks all 8 review skill types (was 5)
- codex-review log gets commit field for staleness detection
- autoplan uses placeholder tokens instead of hardcoded "clean"
- Document autoplan-voices as audit-trail-only in review.ts
- E2E test for dashboard via attribution

* chore: bump version and changelog (v0.11.21.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-03-26 08:31:53 -06:00
committed by GitHub
parent 1bf888d75c
commit 997f7b1da6
16 changed files with 209 additions and 60 deletions

View File

@@ -1,5 +1,16 @@
# Changelog
## [0.11.21.0] - 2026-03-26
### Fixed
- **`/autoplan` reviews now count toward the ship readiness gate.** When `/autoplan` ran full CEO + Design + Eng reviews, `/ship` still showed "0 runs" for Eng Review because autoplan-logged entries weren't being read correctly. Now the dashboard shows source attribution (e.g., "CLEAR (PLAN via /autoplan)") so you can see exactly which tool satisfied each review.
- **`/ship` no longer tells you to "run /review first."** Ship runs its own pre-landing review in Step 3.5 — asking you to run the same review separately was redundant. The gate is removed; ship just does it.
- **`/land-and-deploy` now checks all 8 review types.** Previously missed `review`, `adversarial-review`, and `codex-plan-review` — if you only ran `/review` (not `/plan-eng-review`), land-and-deploy wouldn't see it.
- **Dashboard Outside Voice row now works.** Was showing "0 runs" even after outside voices ran in `/plan-ceo-review` or `/plan-eng-review`. Now correctly maps to `codex-plan-review` entries.
- **`/codex review` now tracks staleness.** Added the `commit` field to codex review log entries so the dashboard can detect when a codex review is outdated.
- **`/autoplan` no longer hardcodes "clean" status.** Review log entries from autoplan used to always record `status:"clean"` even when issues were found. Now uses proper placeholder tokens that Claude substitutes with real values.
## [0.11.20.0] - 2026-03-26
### Added

View File

@@ -1 +1 @@
0.11.20.0
0.11.21.0

View File

@@ -929,24 +929,24 @@ AskUserQuestion options:
## Completion: Write Review Logs
On approval, write 3 separate review log entries so /ship's dashboard recognizes them:
On approval, write 3 separate review log entries so /ship's dashboard recognizes them.
Replace TIMESTAMP, STATUS, and N with actual values from each review phase.
STATUS is "clean" if no unresolved issues, "issues_open" otherwise.
```bash
COMMIT=$(git rev-parse --short HEAD 2>/dev/null)
TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ)
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-ceo-review","timestamp":"'"$TIMESTAMP"'","status":"clean","unresolved":0,"critical_gaps":0,"mode":"SELECTIVE_EXPANSION","via":"autoplan","commit":"'"$COMMIT"'"}'
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-ceo-review","timestamp":"'"$TIMESTAMP"'","status":"STATUS","unresolved":N,"critical_gaps":N,"mode":"SELECTIVE_EXPANSION","via":"autoplan","commit":"'"$COMMIT"'"}'
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-eng-review","timestamp":"'"$TIMESTAMP"'","status":"clean","unresolved":0,"critical_gaps":0,"issues_found":0,"mode":"FULL_REVIEW","via":"autoplan","commit":"'"$COMMIT"'"}'
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-eng-review","timestamp":"'"$TIMESTAMP"'","status":"STATUS","unresolved":N,"critical_gaps":N,"issues_found":N,"mode":"FULL_REVIEW","via":"autoplan","commit":"'"$COMMIT"'"}'
```
If Phase 2 ran (UI scope):
```bash
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-design-review","timestamp":"'"$TIMESTAMP"'","status":"clean","unresolved":0,"via":"autoplan","commit":"'"$COMMIT"'"}'
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-design-review","timestamp":"'"$TIMESTAMP"'","status":"STATUS","unresolved":N,"via":"autoplan","commit":"'"$COMMIT"'"}'
```
Replace field values with actual counts from the review.
Dual voice logs (one per phase that ran):
```bash
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"autoplan-voices","timestamp":"'"$TIMESTAMP"'","status":"STATUS","source":"SOURCE","phase":"ceo","via":"autoplan","consensus_confirmed":N,"consensus_disagree":N,"commit":"'"$COMMIT"'"}'

View File

@@ -584,24 +584,24 @@ AskUserQuestion options:
## Completion: Write Review Logs
On approval, write 3 separate review log entries so /ship's dashboard recognizes them:
On approval, write 3 separate review log entries so /ship's dashboard recognizes them.
Replace TIMESTAMP, STATUS, and N with actual values from each review phase.
STATUS is "clean" if no unresolved issues, "issues_open" otherwise.
```bash
COMMIT=$(git rev-parse --short HEAD 2>/dev/null)
TIMESTAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ)
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-ceo-review","timestamp":"'"$TIMESTAMP"'","status":"clean","unresolved":0,"critical_gaps":0,"mode":"SELECTIVE_EXPANSION","via":"autoplan","commit":"'"$COMMIT"'"}'
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-ceo-review","timestamp":"'"$TIMESTAMP"'","status":"STATUS","unresolved":N,"critical_gaps":N,"mode":"SELECTIVE_EXPANSION","via":"autoplan","commit":"'"$COMMIT"'"}'
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-eng-review","timestamp":"'"$TIMESTAMP"'","status":"clean","unresolved":0,"critical_gaps":0,"issues_found":0,"mode":"FULL_REVIEW","via":"autoplan","commit":"'"$COMMIT"'"}'
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-eng-review","timestamp":"'"$TIMESTAMP"'","status":"STATUS","unresolved":N,"critical_gaps":N,"issues_found":N,"mode":"FULL_REVIEW","via":"autoplan","commit":"'"$COMMIT"'"}'
```
If Phase 2 ran (UI scope):
```bash
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-design-review","timestamp":"'"$TIMESTAMP"'","status":"clean","unresolved":0,"via":"autoplan","commit":"'"$COMMIT"'"}'
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-design-review","timestamp":"'"$TIMESTAMP"'","status":"STATUS","unresolved":N,"via":"autoplan","commit":"'"$COMMIT"'"}'
```
Replace field values with actual counts from the review.
Dual voice logs (one per phase that ran):
```bash
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"autoplan-voices","timestamp":"'"$TIMESTAMP"'","status":"STATUS","source":"SOURCE","phase":"ceo","via":"autoplan","consensus_confirmed":N,"consensus_disagree":N,"commit":"'"$COMMIT"'"}'

View File

@@ -423,7 +423,7 @@ CROSS-MODEL ANALYSIS:
7. Persist the review result:
```bash
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE","findings":N,"findings_fixed":N}'
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE","findings":N,"findings_fixed":N,"commit":"'"$(git rev-parse --short HEAD)"'"}'
```
Substitute: TIMESTAMP (ISO 8601), STATUS ("clean" if PASS, "issues_found" if FAIL),

View File

@@ -127,7 +127,7 @@ CROSS-MODEL ANALYSIS:
7. Persist the review result:
```bash
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE","findings":N,"findings_fixed":N}'
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"TIMESTAMP","status":"STATUS","gate":"GATE","findings":N,"findings_fixed":N,"commit":"'"$(git rev-parse --short HEAD)"'"}'
```
Substitute: TIMESTAMP (ISO 8601), STATUS ("clean" if PASS, "issues_found" if FAIL),

View File

@@ -447,7 +447,8 @@ Collect evidence for each check below. Track warnings (yellow) and blockers (red
```
Parse the output. For each review skill (plan-eng-review, plan-ceo-review,
plan-design-review, design-review-lite, codex-review):
plan-design-review, design-review-lite, codex-review, review, adversarial-review,
codex-plan-review):
1. Find the most recent entry within the last 7 days.
2. Extract its `commit` field.
@@ -594,7 +595,7 @@ Use AskUserQuestion:
- C) Merge anyway — I understand the risks (Completeness: 3/10)
If the user chooses B: **STOP.** List exactly what needs to be done:
- If reviews are stale: "Re-run /plan-eng-review (or /review) to review current code."
- If reviews are stale: "Re-run `/plan-eng-review`, `/review`, or `/autoplan` to review current code."
- If E2E not run: "Run `bun run test:e2e` to verify."
- If docs not updated: "Run /document-release to update documentation."
- If PR body stale: "Update the PR body to reflect current changes."

View File

@@ -134,7 +134,8 @@ Collect evidence for each check below. Track warnings (yellow) and blockers (red
```
Parse the output. For each review skill (plan-eng-review, plan-ceo-review,
plan-design-review, design-review-lite, codex-review):
plan-design-review, design-review-lite, codex-review, review, adversarial-review,
codex-plan-review):
1. Find the most recent entry within the last 7 days.
2. Extract its `commit` field.
@@ -281,7 +282,7 @@ Use AskUserQuestion:
- C) Merge anyway — I understand the risks (Completeness: 3/10)
If the user chooses B: **STOP.** List exactly what needs to be done:
- If reviews are stale: "Re-run /plan-eng-review (or /review) to review current code."
- If reviews are stale: "Re-run `/plan-eng-review`, `/review`, or `/autoplan` to review current code."
- If E2E not run: "Run `bun run test:e2e` to verify."
- If docs not updated: "Run /document-release to update documentation."
- If PR body stale: "Update the PR body to reflect current changes."

View File

@@ -1262,7 +1262,13 @@ After completing the review, read the review log and config to display the dashb
~/.claude/skills/gstack/bin/gstack-review-read
```
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display:
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. For the Outside Voice row, show the most recent `codex-plan-review` entry — this captures outside voices from both /plan-ceo-review and /plan-eng-review.
**Source attribution:** If the most recent entry for a skill has a \`"via"\` field, append it to the status label in parentheses. Examples: `plan-eng-review` with `via:"autoplan"` shows as "CLEAR (PLAN via /autoplan)". `review` with `via:"ship"` shows as "CLEAR (DIFF via /ship)". Entries without a `via` field show as "CLEAR (PLAN)" or "CLEAR (DIFF)" as before.
Note: `autoplan-voices` and `design-outside-voices` entries are audit-trail-only (forensic data for cross-model consensus analysis). They do not appear in the dashboard and are not checked by any consumer.
Display:
```
+====================================================================+

View File

@@ -768,7 +768,13 @@ After completing the review, read the review log and config to display the dashb
~/.claude/skills/gstack/bin/gstack-review-read
```
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display:
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. For the Outside Voice row, show the most recent `codex-plan-review` entry — this captures outside voices from both /plan-ceo-review and /plan-eng-review.
**Source attribution:** If the most recent entry for a skill has a \`"via"\` field, append it to the status label in parentheses. Examples: `plan-eng-review` with `via:"autoplan"` shows as "CLEAR (PLAN via /autoplan)". `review` with `via:"ship"` shows as "CLEAR (DIFF via /ship)". Entries without a `via` field show as "CLEAR (PLAN)" or "CLEAR (DIFF)" as before.
Note: `autoplan-voices` and `design-outside-voices` entries are audit-trail-only (forensic data for cross-model consensus analysis). They do not appear in the dashboard and are not checked by any consumer.
Display:
```
+====================================================================+

View File

@@ -870,7 +870,13 @@ After completing the review, read the review log and config to display the dashb
~/.claude/skills/gstack/bin/gstack-review-read
```
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display:
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. For the Outside Voice row, show the most recent `codex-plan-review` entry — this captures outside voices from both /plan-ceo-review and /plan-eng-review.
**Source attribution:** If the most recent entry for a skill has a \`"via"\` field, append it to the status label in parentheses. Examples: `plan-eng-review` with `via:"autoplan"` shows as "CLEAR (PLAN via /autoplan)". `review` with `via:"ship"` shows as "CLEAR (DIFF via /ship)". Entries without a `via` field show as "CLEAR (PLAN)" or "CLEAR (DIFF)" as before.
Note: `autoplan-voices` and `design-outside-voices` entries are audit-trail-only (forensic data for cross-model consensus analysis). They do not appear in the dashboard and are not checked by any consumer.
Display:
```
+====================================================================+

View File

@@ -9,7 +9,13 @@ After completing the review, read the review log and config to display the dashb
~/.claude/skills/gstack/bin/gstack-review-read
\`\`\`
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between \`review\` (diff-scoped pre-landing review) and \`plan-eng-review\` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between \`adversarial-review\` (new auto-scaled) and \`codex-review\` (legacy). For Design Review, show whichever is more recent between \`plan-design-review\` (full visual audit) and \`design-review-lite\` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display:
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between \`review\` (diff-scoped pre-landing review) and \`plan-eng-review\` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between \`adversarial-review\` (new auto-scaled) and \`codex-review\` (legacy). For Design Review, show whichever is more recent between \`plan-design-review\` (full visual audit) and \`design-review-lite\` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. For the Outside Voice row, show the most recent \`codex-plan-review\` entry — this captures outside voices from both /plan-ceo-review and /plan-eng-review.
**Source attribution:** If the most recent entry for a skill has a \\\`"via"\\\` field, append it to the status label in parentheses. Examples: \`plan-eng-review\` with \`via:"autoplan"\` shows as "CLEAR (PLAN via /autoplan)". \`review\` with \`via:"ship"\` shows as "CLEAR (DIFF via /ship)". Entries without a \`via\` field show as "CLEAR (PLAN)" or "CLEAR (DIFF)" as before.
Note: \`autoplan-voices\` and \`design-outside-voices\` entries are audit-trail-only (forensic data for cross-model consensus analysis). They do not appear in the dashboard and are not checked by any consumer.
Display:
\`\`\`
+====================================================================+

View File

@@ -364,7 +364,13 @@ After completing the review, read the review log and config to display the dashb
~/.claude/skills/gstack/bin/gstack-review-read
```
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display:
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between `review` (diff-scoped pre-landing review) and `plan-eng-review` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between `adversarial-review` (new auto-scaled) and `codex-review` (legacy). For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. For the Outside Voice row, show the most recent `codex-plan-review` entry — this captures outside voices from both /plan-ceo-review and /plan-eng-review.
**Source attribution:** If the most recent entry for a skill has a \`"via"\` field, append it to the status label in parentheses. Examples: `plan-eng-review` with `via:"autoplan"` shows as "CLEAR (PLAN via /autoplan)". `review` with `via:"ship"` shows as "CLEAR (DIFF via /ship)". Entries without a `via` field show as "CLEAR (PLAN)" or "CLEAR (DIFF)" as before.
Note: `autoplan-voices` and `design-outside-voices` entries are audit-trail-only (forensic data for cross-model consensus analysis). They do not appear in the dashboard and are not checked by any consumer.
Display:
```
+====================================================================+
@@ -403,26 +409,15 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl
If the Eng Review is NOT "CLEAR":
1. **Check for a prior override on this branch:**
```bash
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
grep '"skill":"ship-review-override"' ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl 2>/dev/null || echo "NO_OVERRIDE"
```
If an override exists, display the dashboard and note "Review gate previously accepted — continuing." Do NOT ask again.
Print: "No prior eng review found — ship will run its own pre-landing review in Step 3.5."
2. **If no override exists,** use AskUserQuestion:
- Show that Eng Review is missing or has open issues
- RECOMMENDATION: Choose C if the change is obviously trivial (< 20 lines, typo fix, config-only); Choose B for larger changes
- Options: A) Ship anyway B) Abort — run /review or /plan-eng-review first C) Change is too small to need eng review
- If CEO Review is missing, mention as informational ("CEO Review not run — recommended for product changes") but do NOT block
- For Design Review: run `source <(~/.claude/skills/gstack/bin/gstack-diff-scope <base> 2>/dev/null)`. If `SCOPE_FRONTEND=true` and no design review (plan-design-review or design-review-lite) exists in the dashboard, mention: "Design Review not run — this PR changes frontend code. The lite design check will run automatically in Step 3.5, but consider running /design-review for a full visual audit post-implementation." Still never block.
Check diff size: `git diff <base>...HEAD --stat | tail -1`. If the diff is >200 lines, add: "Note: This is a large diff. Consider running `/plan-eng-review` or `/autoplan` for architecture-level review before shipping."
3. **If the user chooses A or C,** persist the decision so future `/ship` runs on this branch skip the gate:
```bash
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
echo '{"skill":"ship-review-override","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","decision":"USER_CHOICE"}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl
```
Substitute USER_CHOICE with "ship_anyway" or "not_relevant".
If CEO Review is missing, mention as informational ("CEO Review not run — recommended for product changes") but do NOT block.
For Design Review: run `source <(~/.claude/skills/gstack/bin/gstack-diff-scope <base> 2>/dev/null)`. If `SCOPE_FRONTEND=true` and no design review (plan-design-review or design-review-lite) exists in the dashboard, mention: "Design Review not run — this PR changes frontend code. The lite design check will run automatically in Step 3.5, but consider running /design-review for a full visual audit post-implementation." Still never block.
Continue to Step 1.5 — do NOT block or ask. Ship runs its own review in Step 3.5.
---
@@ -1340,6 +1335,13 @@ Present Codex output under a `CODEX (design):` header, merged with the checklist
If no issues found: `Pre-Landing Review: No issues found.`
9. Persist the review result to the review log:
```bash
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"review","timestamp":"TIMESTAMP","status":"STATUS","issues_found":N,"critical":N,"informational":N,"commit":"'"$(git rev-parse --short HEAD)"'","via":"ship"}'
```
Substitute TIMESTAMP (ISO 8601), STATUS ("clean" if no issues, "issues_found" otherwise),
and N values from the summary counts above. The `via:"ship"` distinguishes from standalone `/review` runs.
Save the review output — it goes into the PR body in Step 8.
---

View File

@@ -64,26 +64,15 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
If the Eng Review is NOT "CLEAR":
1. **Check for a prior override on this branch:**
```bash
{{SLUG_EVAL}}
grep '"skill":"ship-review-override"' ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl 2>/dev/null || echo "NO_OVERRIDE"
```
If an override exists, display the dashboard and note "Review gate previously accepted — continuing." Do NOT ask again.
Print: "No prior eng review found — ship will run its own pre-landing review in Step 3.5."
2. **If no override exists,** use AskUserQuestion:
- Show that Eng Review is missing or has open issues
- RECOMMENDATION: Choose C if the change is obviously trivial (< 20 lines, typo fix, config-only); Choose B for larger changes
- Options: A) Ship anyway B) Abort — run /review or /plan-eng-review first C) Change is too small to need eng review
- If CEO Review is missing, mention as informational ("CEO Review not run — recommended for product changes") but do NOT block
- For Design Review: run `source <(~/.claude/skills/gstack/bin/gstack-diff-scope <base> 2>/dev/null)`. If `SCOPE_FRONTEND=true` and no design review (plan-design-review or design-review-lite) exists in the dashboard, mention: "Design Review not run — this PR changes frontend code. The lite design check will run automatically in Step 3.5, but consider running /design-review for a full visual audit post-implementation." Still never block.
Check diff size: `git diff <base>...HEAD --stat | tail -1`. If the diff is >200 lines, add: "Note: This is a large diff. Consider running `/plan-eng-review` or `/autoplan` for architecture-level review before shipping."
3. **If the user chooses A or C,** persist the decision so future `/ship` runs on this branch skip the gate:
```bash
{{SLUG_EVAL}}
echo '{"skill":"ship-review-override","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","decision":"USER_CHOICE"}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl
```
Substitute USER_CHOICE with "ship_anyway" or "not_relevant".
If CEO Review is missing, mention as informational ("CEO Review not run — recommended for product changes") but do NOT block.
For Design Review: run `source <(~/.claude/skills/gstack/bin/gstack-diff-scope <base> 2>/dev/null)`. If `SCOPE_FRONTEND=true` and no design review (plan-design-review or design-review-lite) exists in the dashboard, mention: "Design Review not run — this PR changes frontend code. The lite design check will run automatically in Step 3.5, but consider running /design-review for a full visual audit post-implementation." Still never block.
Continue to Step 1.5 — do NOT block or ask. Ship runs its own review in Step 3.5.
---
@@ -275,6 +264,13 @@ Review the diff for structural issues that tests don't catch.
If no issues found: `Pre-Landing Review: No issues found.`
9. Persist the review result to the review log:
```bash
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"review","timestamp":"TIMESTAMP","status":"STATUS","issues_found":N,"critical":N,"informational":N,"commit":"'"$(git rev-parse --short HEAD)"'","via":"ship"}'
```
Substitute TIMESTAMP (ISO 8601), STATUS ("clean" if no issues, "issues_found" otherwise),
and N values from the summary counts above. The `via:"ship"` distinguishes from standalone `/review` runs.
Save the review output — it goes into the PR body in Step 8.
---

View File

@@ -79,6 +79,7 @@ export const E2E_TOUCHFILES: Record<string, string[]> = {
// Ship
'ship-base-branch': ['ship/**', 'bin/gstack-repo-mode'],
'ship-local-workflow': ['ship/**', 'scripts/gen-skill-docs.ts'],
'review-dashboard-via': ['ship/**', 'scripts/resolvers/review.ts', 'codex/**', 'autoplan/**', 'land-and-deploy/**'],
'ship-plan-completion': ['ship/**', 'scripts/gen-skill-docs.ts'],
'ship-plan-verification': ['ship/**', 'scripts/gen-skill-docs.ts'],

View File

@@ -529,6 +529,119 @@ Analyze the git history and produce the narrative report as described in the SKI
}, 420_000);
});
// --- Review Dashboard Via Attribution E2E ---
describeIfSelected('Review Dashboard Via Attribution', ['review-dashboard-via'], () => {
let dashDir: string;
beforeAll(() => {
dashDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-dashboard-via-'));
const run = (cmd: string, args: string[], cwd = dashDir) =>
spawnSync(cmd, args, { cwd, stdio: 'pipe', timeout: 5000 });
// Create git repo with feature branch
run('git', ['init', '-b', 'main']);
run('git', ['config', 'user.email', 'test@test.com']);
run('git', ['config', 'user.name', 'Test']);
fs.writeFileSync(path.join(dashDir, 'app.ts'), 'console.log("v1");\n');
run('git', ['add', 'app.ts']);
run('git', ['commit', '-m', 'initial']);
run('git', ['checkout', '-b', 'feature/dashboard-test']);
fs.writeFileSync(path.join(dashDir, 'app.ts'), 'console.log("v2");\n');
run('git', ['add', 'app.ts']);
run('git', ['commit', '-m', 'feat: update']);
// Get HEAD commit for review entries
const headResult = spawnSync('git', ['rev-parse', '--short', 'HEAD'], { cwd: dashDir, stdio: 'pipe' });
const commit = headResult.stdout.toString().trim();
// Pre-populate review log with autoplan-sourced entries
// gstack-review-read reads from ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl
// For the test, we'll write a mock gstack-review-read script that returns our test data
const timestamp = new Date().toISOString().replace(/\.\d{3}Z$/, 'Z');
const reviewData = [
`{"skill":"plan-eng-review","timestamp":"${timestamp}","status":"clean","unresolved":0,"critical_gaps":0,"issues_found":0,"mode":"FULL_REVIEW","via":"autoplan","commit":"${commit}"}`,
`{"skill":"plan-ceo-review","timestamp":"${timestamp}","status":"clean","unresolved":0,"critical_gaps":0,"mode":"SELECTIVE_EXPANSION","via":"autoplan","commit":"${commit}"}`,
`{"skill":"codex-plan-review","timestamp":"${timestamp}","status":"clean","source":"codex","commit":"${commit}"}`,
].join('\n');
// Write a mock gstack-review-read that returns our test data
const mockBinDir = path.join(dashDir, '.mock-bin');
fs.mkdirSync(mockBinDir, { recursive: true });
fs.writeFileSync(path.join(mockBinDir, 'gstack-review-read'), [
'#!/usr/bin/env bash',
`echo '${reviewData.split('\n').join("'\necho '")}'`,
'echo "---CONFIG---"',
'echo "false"',
'echo "---HEAD---"',
`echo "${commit}"`,
].join('\n'));
fs.chmodSync(path.join(mockBinDir, 'gstack-review-read'), 0o755);
// Copy ship skill
fs.copyFileSync(path.join(ROOT, 'ship', 'SKILL.md'), path.join(dashDir, 'ship-SKILL.md'));
});
afterAll(() => {
try { fs.rmSync(dashDir, { recursive: true, force: true }); } catch {}
});
testConcurrentIfSelected('review-dashboard-via', async () => {
const mockBinDir = path.join(dashDir, '.mock-bin');
const result = await runSkillTest({
prompt: `Read ship-SKILL.md. You only need to run the Review Readiness Dashboard section.
Instead of running ~/.claude/skills/gstack/bin/gstack-review-read, run this mock: ${mockBinDir}/gstack-review-read
Parse the output and display the dashboard table. Pay attention to:
1. The "via" field in entries — show source attribution (e.g., "via /autoplan")
2. The codex-plan-review entry — it should populate the Outside Voice row
3. Since Eng Review IS clear, there should be NO gate blocking — just display the dashboard
Skip the preamble, lake intro, telemetry, and all other ship steps.
Write the dashboard output to ${dashDir}/dashboard-output.md`,
workingDirectory: dashDir,
maxTurns: 12,
timeout: 90_000,
testName: 'review-dashboard-via',
runId,
});
logCost('/ship dashboard-via', result);
recordE2E(evalCollector, '/ship review dashboard via attribution', 'Dashboard via field', result);
expect(result.exitReason).toBe('success');
// Check dashboard output for via attribution
const dashPath = path.join(dashDir, 'dashboard-output.md');
const allOutput = [
result.output || '',
...result.toolCalls.map(tc => tc.output || ''),
].join('\n').toLowerCase();
// Verify via attribution appears somewhere (conversation or file)
let dashContent = '';
if (fs.existsSync(dashPath)) {
dashContent = fs.readFileSync(dashPath, 'utf-8').toLowerCase();
}
const combined = allOutput + dashContent;
// Should mention autoplan attribution
expect(combined).toMatch(/autoplan/);
// Should show eng review as CLEAR (it has a clean entry)
expect(combined).toMatch(/clear/i);
// Should NOT contain AskUserQuestion gate (no blocking)
const gateQuestions = result.toolCalls.filter(tc =>
tc.tool === 'mcp__conductor__AskUserQuestion' ||
(tc.tool === 'AskUserQuestion')
);
// Ship dashboard should not gate when eng review is clear
expect(gateQuestions).toHaveLength(0);
}, 120_000);
});
// Module-level afterAll — finalize eval collector after all tests complete
afterAll(async () => {
await finalizeEvalCollector(evalCollector);