fix: resolve merge conflicts with origin/main (v0.5.0 + v0.4.5)

Merge brings in design review skills (plan-design-review, qa-design-review, design-consultation), fix-first review mode (v0.4.5), and browse JS/click fixes. Conflicts resolved in VERSION (keep 0.6.0), CHANGELOG (keep all entries), TODOS (update design review to shipped), and test arrays (combine all skill registrations). Regenerated all SKILL.md files so design skills get the improved preamble (ELI16, escalation protocol, branch detection). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-21 12:18:24 +08:00 · 2026-03-16 19:59:27 -07:00
parent 9a55efe7c7 4a77cc2c34
commit 95e4430e63
25 changed files with 3766 additions and 75 deletions
--- a/review/SKILL.md
+++ b/review/SKILL.md
@@ -216,14 +216,53 @@ Follow the output format specified in the checklist. Respect the suppressions

 ---

-## Step 5: Output findings
+## Step 5: Fix-First Review

-**Always output ALL findings** — both critical and informational. The user must see every issue.
+**Every finding gets action — not just critical ones.**

- If CRITICAL issues found: output all findings, then for EACH critical issue use a separate AskUserQuestion with the problem, then `RECOMMENDATION: Choose A because [one-line reason]`, then options (A: Fix it now, B: Acknowledge, C: False positive — skip).
-  After all critical questions are answered, output a summary of what the user chose for each issue. If the user chose A (fix) on any issue, apply the recommended fixes. If only B/C were chosen, no action needed.
- If only non-critical issues found: output findings. No further action needed.
- If no issues found: output `Pre-Landing Review: No issues found.`
+Output a summary header: `Pre-Landing Review: N issues (X critical, Y informational)`
+
+### Step 5a: Classify each finding
+
+For each finding, classify as AUTO-FIX or ASK per the Fix-First Heuristic in
+checklist.md. Critical findings lean toward ASK; informational findings lean
+toward AUTO-FIX.
+
+### Step 5b: Auto-fix all AUTO-FIX items
+
+Apply each fix directly. For each one, output a one-line summary:
+`[AUTO-FIXED] [file:line] Problem → what you did`
+
+### Step 5c: Batch-ask about ASK items
+
+If there are ASK items remaining, present them in ONE AskUserQuestion:
+
+- List each item with a number, the severity label, the problem, and a recommended fix
+- For each item, provide options: A) Fix as recommended, B) Skip
+- Include an overall RECOMMENDATION
+
+Example format:
+```
+I auto-fixed 5 issues. 2 need your input:
+
+1. [CRITICAL] app/models/post.rb:42 — Race condition in status transition
+   Fix: Add `WHERE status = 'draft'` to the UPDATE
+   → A) Fix  B) Skip
+
+2. [INFORMATIONAL] app/services/generator.rb:88 — LLM output not type-checked before DB write
+   Fix: Add JSON schema validation
+   → A) Fix  B) Skip
+
+RECOMMENDATION: Fix both — #1 is a real race condition, #2 prevents silent data corruption.
+```
+
+If 3 or fewer ASK items, you may use individual AskUserQuestion calls instead of batching.
+
+### Step 5d: Apply user-approved fixes
+
+Apply fixes for items where the user chose "Fix." Output what was fixed.
+
+If no ASK items exist (everything was AUTO-FIX), skip the question entirely.

 ### Verification of claims

@@ -243,7 +282,7 @@ After outputting your own findings, if Greptile comments were classified in Step

 Before replying to any comment, run the **Escalation Detection** algorithm from greptile-triage.md to determine whether to use Tier 1 (friendly) or Tier 2 (firm) reply templates.

-1. **VALID & ACTIONABLE comments:** These are already included in your CRITICAL findings — they follow the same AskUserQuestion flow (A: Fix it now, B: Acknowledge, C: False positive). If the user chooses A (fix), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation). If the user chooses C (false positive), reply using the **False Positive reply template** (include evidence + suggested re-rank), save to both per-project and global greptile-history.
+1. **VALID & ACTIONABLE comments:** These are included in your findings — they follow the Fix-First flow (auto-fixed if mechanical, batched into ASK if not) (A: Fix it now, B: Acknowledge, C: False positive). If the user chooses A (fix), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation). If the user chooses C (false positive), reply using the **False Positive reply template** (include evidence + suggested re-rank), save to both per-project and global greptile-history.

 2. **FALSE POSITIVE comments:** Present each one via AskUserQuestion:
   - Show the Greptile comment: file:line (or [top-level]) + body summary + permalink URL
@@ -292,7 +331,7 @@ If no documentation files exist, skip this step silently.
 ## Important Rules

 - **Read the FULL diff before commenting.** Do not flag issues already addressed in the diff.
- **Read-only by default.** Only modify files if the user explicitly chooses "Fix it now" on a critical issue. Never commit, push, or create PRs.
+- **Fix-first, not read-only.** AUTO-FIX items are applied directly. ASK items are only applied after user approval. Never commit, push, or create PRs — that's /ship's job.
 - **Be terse.** One line problem, one line fix. No preamble.
 - **Only flag real problems.** Skip anything that's fine.
 - **Use Greptile reply templates from greptile-triage.md.** Every reply includes evidence. Never post vague replies.
--- a/review/SKILL.md.tmpl
+++ b/review/SKILL.md.tmpl
@@ -109,14 +109,53 @@ Follow the output format specified in the checklist. Respect the suppressions

 ---

-## Step 5: Output findings
+## Step 5: Fix-First Review

-**Always output ALL findings** — both critical and informational. The user must see every issue.
+**Every finding gets action — not just critical ones.**

- If CRITICAL issues found: output all findings, then for EACH critical issue use a separate AskUserQuestion with the problem, then `RECOMMENDATION: Choose A because [one-line reason]`, then options (A: Fix it now, B: Acknowledge, C: False positive — skip).
-  After all critical questions are answered, output a summary of what the user chose for each issue. If the user chose A (fix) on any issue, apply the recommended fixes. If only B/C were chosen, no action needed.
- If only non-critical issues found: output findings. No further action needed.
- If no issues found: output `Pre-Landing Review: No issues found.`
+Output a summary header: `Pre-Landing Review: N issues (X critical, Y informational)`
+
+### Step 5a: Classify each finding
+
+For each finding, classify as AUTO-FIX or ASK per the Fix-First Heuristic in
+checklist.md. Critical findings lean toward ASK; informational findings lean
+toward AUTO-FIX.
+
+### Step 5b: Auto-fix all AUTO-FIX items
+
+Apply each fix directly. For each one, output a one-line summary:
+`[AUTO-FIXED] [file:line] Problem → what you did`
+
+### Step 5c: Batch-ask about ASK items
+
+If there are ASK items remaining, present them in ONE AskUserQuestion:
+
+- List each item with a number, the severity label, the problem, and a recommended fix
+- For each item, provide options: A) Fix as recommended, B) Skip
+- Include an overall RECOMMENDATION
+
+Example format:
+```
+I auto-fixed 5 issues. 2 need your input:
+
+1. [CRITICAL] app/models/post.rb:42 — Race condition in status transition
+   Fix: Add `WHERE status = 'draft'` to the UPDATE
+   → A) Fix  B) Skip
+
+2. [INFORMATIONAL] app/services/generator.rb:88 — LLM output not type-checked before DB write
+   Fix: Add JSON schema validation
+   → A) Fix  B) Skip
+
+RECOMMENDATION: Fix both — #1 is a real race condition, #2 prevents silent data corruption.
+```
+
+If 3 or fewer ASK items, you may use individual AskUserQuestion calls instead of batching.
+
+### Step 5d: Apply user-approved fixes
+
+Apply fixes for items where the user chose "Fix." Output what was fixed.
+
+If no ASK items exist (everything was AUTO-FIX), skip the question entirely.

 ### Verification of claims

@@ -136,7 +175,7 @@ After outputting your own findings, if Greptile comments were classified in Step

 Before replying to any comment, run the **Escalation Detection** algorithm from greptile-triage.md to determine whether to use Tier 1 (friendly) or Tier 2 (firm) reply templates.

-1. **VALID & ACTIONABLE comments:** These are already included in your CRITICAL findings — they follow the same AskUserQuestion flow (A: Fix it now, B: Acknowledge, C: False positive). If the user chooses A (fix), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation). If the user chooses C (false positive), reply using the **False Positive reply template** (include evidence + suggested re-rank), save to both per-project and global greptile-history.
+1. **VALID & ACTIONABLE comments:** These are included in your findings — they follow the Fix-First flow (auto-fixed if mechanical, batched into ASK if not) (A: Fix it now, B: Acknowledge, C: False positive). If the user chooses A (fix), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation). If the user chooses C (false positive), reply using the **False Positive reply template** (include evidence + suggested re-rank), save to both per-project and global greptile-history.

 2. **FALSE POSITIVE comments:** Present each one via AskUserQuestion:
   - Show the Greptile comment: file:line (or [top-level]) + body summary + permalink URL
@@ -185,7 +224,7 @@ If no documentation files exist, skip this step silently.
 ## Important Rules

 - **Read the FULL diff before commenting.** Do not flag issues already addressed in the diff.
- **Read-only by default.** Only modify files if the user explicitly chooses "Fix it now" on a critical issue. Never commit, push, or create PRs.
+- **Fix-first, not read-only.** AUTO-FIX items are applied directly. ASK items are only applied after user approval. Never commit, push, or create PRs — that's /ship's job.
 - **Be terse.** One line problem, one line fix. No preamble.
 - **Only flag real problems.** Skip anything that's fine.
 - **Use Greptile reply templates from greptile-triage.md.** Every reply includes evidence. Never post vague replies.
--- a/review/checklist.md
+++ b/review/checklist.md
@@ -5,21 +5,23 @@
 Review the `git diff origin/main` output for the issues listed below. Be specific — cite `file:line` and suggest fixes. Skip anything that's fine. Only flag real problems.

 **Two-pass review:**
- **Pass 1 (CRITICAL):** Run SQL & Data Safety and LLM Output Trust Boundary first. These can block `/ship`.
- **Pass 2 (INFORMATIONAL):** Run all remaining categories. These are included in the PR body but do not block.
+- **Pass 1 (CRITICAL):** Run SQL & Data Safety and LLM Output Trust Boundary first. Highest severity.
+- **Pass 2 (INFORMATIONAL):** Run all remaining categories. Lower severity but still actioned.
+
+All findings get action via Fix-First Review: obvious mechanical fixes are applied automatically,
+genuinely ambiguous issues are batched into a single user question.

 **Output format:**

 ```
 Pre-Landing Review: N issues (X critical, Y informational)

-**CRITICAL** (blocking /ship):
- [file:line] Problem description
-  Fix: suggested fix
+**AUTO-FIXED:**
+- [file:line] Problem → fix applied

-**Issues** (non-blocking):
+**NEEDS INPUT:**
 - [file:line] Problem description
-  Fix: suggested fix
+  Recommended fix: suggested fix
 ```

 If no issues found: `Pre-Landing Review: No issues found.`
@@ -102,10 +104,10 @@ To do this: use Grep to find all references to the sibling values (e.g., grep fo

 ---

-## Gate Classification
+## Severity Classification

 ```
-CRITICAL (blocks /ship):          INFORMATIONAL (in PR body):
+CRITICAL (highest severity):      INFORMATIONAL (lower severity):
 ├─ SQL & Data Safety              ├─ Conditional Side Effects
 ├─ Race Conditions & Concurrency  ├─ Magic Numbers & String Coupling
 ├─ LLM Output Trust Boundary      ├─ Dead Code & Consistency
@@ -115,10 +117,41 @@ CRITICAL (blocks /ship):          INFORMATIONAL (in PR body):
                                   ├─ Time Window Safety
                                   ├─ Type Coercion at Boundaries
                                   └─ View/Frontend
+
+All findings are actioned via Fix-First Review. Severity determines
+presentation order and classification of AUTO-FIX vs ASK — critical
+findings lean toward ASK (they're riskier), informational findings
+lean toward AUTO-FIX (they're more mechanical).
 ```

 ---

+## Fix-First Heuristic
+
+This heuristic is referenced by both `/review` and `/ship`. It determines whether
+the agent auto-fixes a finding or asks the user.
+
+```
+AUTO-FIX (agent fixes without asking):     ASK (needs human judgment):
+├─ Dead code / unused variables            ├─ Security (auth, XSS, injection)
+├─ N+1 queries (missing .includes())      ├─ Race conditions
+├─ Stale comments contradicting code       ├─ Design decisions
+├─ Magic numbers → named constants         ├─ Large fixes (>20 lines)
+├─ Missing LLM output validation           ├─ Enum completeness
+├─ Version/path mismatches                 ├─ Removing functionality
+├─ Variables assigned but never read       └─ Anything changing user-visible
+└─ Inline styles, O(n*m) view lookups        behavior
+```
+
+**Rule of thumb:** If the fix is mechanical and a senior engineer would apply it
+without discussion, it's AUTO-FIX. If reasonable engineers could disagree about
+the fix, it's ASK.
+
+**Critical findings default toward ASK** (they're inherently riskier).
+**Informational findings default toward AUTO-FIX** (they're more mechanical).
+
+---
+
 ## Suppressions — DO NOT flag these

 - "X is redundant with Y" when the redundancy is harmless and aids readability (e.g., `present?` redundant with `length > 20`)