feat: contributor mode, session awareness, recommendation format (#90)

* feat: contributor mode, session awareness, universal RECOMMENDATION format

  - Rename {{UPDATE_CHECK}} → {{PREAMBLE}} across all 10 skill templates
  - Add session tracking (touch ~/.gstack/sessions/$PPID, count active sessions)
  - ELI16 mode when 3+ concurrent sessions detected (re-ground user on context)
  - Contributor mode: auto-file field reports to ~/.gstack/contributor-logs/
  - Universal AskUserQuestion format: context → question → RECOMMENDATION → options
  - Update plan-ceo-review and plan-eng-review to reference preamble baseline
  - Add vendored symlink awareness section to CLAUDE.md
  - Rewrite CONTRIBUTING.md with contributor workflow and cross-project testing
  - Add tests for contributor mode and session awareness in generated output
  - Add E2E eval for contributor mode report filing

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add Enum & Value Completeness to /review critical checklist

  New CRITICAL review category that traces new enum values, status strings, and type constants through every consumer outside the diff. Catches the class of bugs where a new value is added but not handled in all switch/case chains, allowlists, or frontend-backend contracts.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: bump v0.4.1, user-facing changelog, update qa-only template and architecture docs

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add CHANGELOG style guide — user-facing, sell the feature

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: rewrite v0.4.1 changelog to be user-facing and sell the features

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add evals for RECOMMENDATION format, session awareness, and enum completeness

  Free tests (Tier 1): RECOMMENDATION format + session awareness in all preamble SKILL.md files, enum completeness checklist structure and CRITICAL classification.
  E2E eval: /review catches missed enum handlers when a new status value is added but not handled in case/switch and notify methods.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add E2E eval for session awareness ELI16 mode

  Stubs _SESSIONS=4, gives agent a decision point on feature/add-payments branch, verifies the output re-grounds the user with project, branch, context, and RECOMMENDATION — the ELI16 mode behavior for 3+ sessions.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: contributor mode eval marked FAIL due to expected browse error

  The test intentionally runs a nonexistent binary to trigger contributor mode. The session runner's browse error detection catches "no such file or directory...browse" and sets browseErrors, causing recordE2E to mark passed=false. Override passed to check only exitReason since the browse error is the expected scenario.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-21 03:40:00 +08:00 · 2026-03-16 01:45:50 -05:00
parent f3ee0ee28a
commit 3e3843c4a9
32 changed files with 1044 additions and 130 deletions
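
Reviewer note: the session tracking described in the commit message (touch a marker file per session, count recent markers, switch to ELI16 mode at 3 or more) might be exercised in isolation roughly as follows. This is a minimal sketch, not the project's code: the function names `count_sessions` and `session_mode` and the mode labels `eli16`/`normal` are hypothetical, and the directory is passed as a parameter so the check can run against a scratch directory rather than `~/.gstack/sessions`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating the session-awareness check.

# count_sessions DIR: print how many regular files in DIR were modified
# within the last 120 minutes (the find/wc/tr pipeline the preamble uses).
count_sessions() {
  find "$1" -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' '
}

# session_mode COUNT: print "eli16" when COUNT is 3 or more, else "normal".
# Both labels are made up for this sketch.
session_mode() {
  if [ "$1" -ge 3 ]; then
    echo "eli16"
  else
    echo "normal"
  fi
}

# Demo: three fake session markers in a scratch directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/101" "$demo_dir/102" "$demo_dir/103"
n=$(count_sessions "$demo_dir")
echo "sessions=$n mode=$(session_mode "$n")"
rm -rf "$demo_dir"
```

Keeping the threshold in one small function makes the "3 or more" rule easy to test without stubbing the whole preamble.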
--- a/review/SKILL.md
+++ b/review/SKILL.md
@@ -16,15 +16,63 @@ allowed-tools:
 <!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
 <!-- Regenerate: bun run gen:skill-docs -->

-## Update Check (run first)
+## Preamble (run first)

 ```bash
 _UPD=$(~/.claude/skills/gstack/bin/gstack-update-check 2>/dev/null || .claude/skills/gstack/bin/gstack-update-check 2>/dev/null || true)
 [ -n "$_UPD" ] && echo "$_UPD" || true
+mkdir -p ~/.gstack/sessions
+touch ~/.gstack/sessions/"$PPID"
+_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ')
+find ~/.gstack/sessions -mmin +120 -type f -delete 2>/dev/null || true
+_CONTRIB=$(~/.claude/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || true)
 ```

 If output shows `UPGRADE_AVAILABLE <old> <new>`: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED <from> <to>`: tell user "Running gstack v{to} (just updated!)" and continue.

+## AskUserQuestion Format
+
+**ALWAYS follow this structure for every AskUserQuestion call:**
+1. Context: project name, current branch, what we're working on (1-2 sentences)
+2. The specific question or decision point
+3. `RECOMMENDATION: Choose [X] because [one-line reason]`
+4. Lettered options: `A) ... B) ... C) ...`
+
+If `_SESSIONS` is 3 or more: the user is juggling multiple gstack sessions and context-switching heavily. **ELI16 mode** — they may not remember what this conversation is about. Every AskUserQuestion MUST re-ground them: state the project, the branch, the current plan/task, then the specific problem, THEN the recommendation and options. Be extra clear and self-contained — assume they haven't looked at this window in 20 minutes.
+
+Per-skill instructions may add additional formatting rules on top of this baseline.
+
+## Contributor Mode
+
+If `_CONTRIB` is `true`: you are in **contributor mode**. When you hit friction with **gstack itself** (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened."
+
+**gstack issues:** browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff.
+**NOT gstack issues:** user's app bugs, network errors to user's URL, auth failures on user's site.
+
+**To file:** write `~/.gstack/contributor-logs/{slug}.md` with this structure:
+
+```
+# {Title}
+
+Hey gstack team — ran into this while using /{skill-name}:
+
+**What I was trying to do:** {what the user/agent was attempting}
+**What happened instead:** {what actually happened}
+**How annoying (1-5):** {1=meh, 3=friction, 5=blocker}
+
+## Steps to reproduce
+1. {step}
+
+## Raw output
+(wrap any error messages or unexpected output in a markdown code block)
+
+**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill}
+```
+
+Then run: `mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md`
+
+Slug: lowercase, hyphens, max 60 chars (e.g. `browse-snapshot-ref-gap`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
+
 # Pre-Landing PR Review

 You are running the `/review` workflow. Analyze the current branch's diff against main for structural issues that tests don't catch.
@@ -73,9 +121,11 @@ Run `git diff origin/main` to get the full diff. This includes both committed an

 Apply the checklist against the diff in two passes:

-1. **Pass 1 (CRITICAL):** SQL & Data Safety, LLM Output Trust Boundary
+1. **Pass 1 (CRITICAL):** SQL & Data Safety, Race Conditions & Concurrency, LLM Output Trust Boundary, Enum & Value Completeness
 2. **Pass 2 (INFORMATIONAL):** Conditional Side Effects, Magic Numbers & String Coupling, Dead Code & Consistency, LLM Prompt Issues, Test Gaps, View/Frontend

+**Enum & Value Completeness requires reading code OUTSIDE the diff.** When the diff introduces a new enum value, status, tier, or type constant, use Grep to find all files that reference sibling values, then Read those files to check if the new value is handled. This is the one category where within-diff review is insufficient.
+
 Follow the output format specified in the checklist. Respect the suppressions — do NOT flag items listed in the "DO NOT flag" section.

 ---
@@ -84,7 +134,7 @@ Follow the output format specified in the checklist. Respect the suppressions

 **Always output ALL findings** — both critical and informational. The user must see every issue.

-- If CRITICAL issues found: output all findings, then for EACH critical issue use a separate AskUserQuestion with the problem, your recommended fix, and options (A: Fix it now, B: Acknowledge, C: False positive — skip).
+- If CRITICAL issues found: output all findings, then for EACH critical issue use a separate AskUserQuestion with the problem, then `RECOMMENDATION: Choose A because [one-line reason]`, then options (A: Fix it now, B: Acknowledge, C: False positive — skip).
  After all critical questions are answered, output a summary of what the user chose for each issue. If the user chose A (fix) on any issue, apply the recommended fixes. If only B/C were chosen, no action needed.
 - If only non-critical issues found: output findings. No further action needed.
 - If no issues found: output `Pre-Landing Review: No issues found.`
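
Reviewer note: the Contributor Mode rules in this file (slug is lowercase with hyphens, at most 60 characters, and a report is skipped if the file already exists) could be sketched in shell as below. The helper names `slugify` and `file_report` are hypothetical and are not part of gstack; the sketch writes only the title line, not the full report template, and it takes the target directory as an argument instead of hard-coding `~/.gstack/contributor-logs`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating the field-report slug and skip rules.

# slugify TITLE: lowercase the title, turn each run of non-alphanumerics
# into one hyphen, trim edge hyphens, and cap the result at 60 characters.
slugify() {
  printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//' \
    | cut -c1-60 \
    | sed -E 's/-+$//'
}

# file_report DIR TITLE: create DIR/{slug}.md with a title heading.
# Prints the path on success; returns 1 without writing if it exists.
file_report() {
  local dir=$1 title=$2 path
  path="$dir/$(slugify "$title").md"
  mkdir -p "$dir"
  if [ -e "$path" ]; then
    return 1
  fi
  printf '# %s\n' "$title" > "$path"
  echo "$path"
}

# Demo against a scratch directory.
demo_dir=$(mktemp -d)
file_report "$demo_dir" "Browse snapshot: ref gap!"
rm -rf "$demo_dir"
```

Creating the directory before writing the file avoids a failed write on first use; the per-session cap of 3 reports is left out of this sketch.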
--- a/review/SKILL.md.tmpl
+++ b/review/SKILL.md.tmpl
@@ -14,7 +14,7 @@ allowed-tools:
  - AskUserQuestion
 ---

-{{UPDATE_CHECK}}
+{{PREAMBLE}}

 # Pre-Landing PR Review

@@ -64,9 +64,11 @@ Run `git diff origin/main` to get the full diff. This includes both committed an

 Apply the checklist against the diff in two passes:

-1. **Pass 1 (CRITICAL):** SQL & Data Safety, LLM Output Trust Boundary
+1. **Pass 1 (CRITICAL):** SQL & Data Safety, Race Conditions & Concurrency, LLM Output Trust Boundary, Enum & Value Completeness
 2. **Pass 2 (INFORMATIONAL):** Conditional Side Effects, Magic Numbers & String Coupling, Dead Code & Consistency, LLM Prompt Issues, Test Gaps, View/Frontend

+**Enum & Value Completeness requires reading code OUTSIDE the diff.** When the diff introduces a new enum value, status, tier, or type constant, use Grep to find all files that reference sibling values, then Read those files to check if the new value is handled. This is the one category where within-diff review is insufficient.
+
 Follow the output format specified in the checklist. Respect the suppressions — do NOT flag items listed in the "DO NOT flag" section.

 ---
@@ -75,7 +77,7 @@ Follow the output format specified in the checklist. Respect the suppressions

 **Always output ALL findings** — both critical and informational. The user must see every issue.

-- If CRITICAL issues found: output all findings, then for EACH critical issue use a separate AskUserQuestion with the problem, your recommended fix, and options (A: Fix it now, B: Acknowledge, C: False positive — skip).
+- If CRITICAL issues found: output all findings, then for EACH critical issue use a separate AskUserQuestion with the problem, then `RECOMMENDATION: Choose A because [one-line reason]`, then options (A: Fix it now, B: Acknowledge, C: False positive — skip).
  After all critical questions are answered, output a summary of what the user chose for each issue. If the user chose A (fix) on any issue, apply the recommended fixes. If only B/C were chosen, no action needed.
 - If only non-critical issues found: output findings. No further action needed.
 - If no issues found: output `Pre-Landing Review: No issues found.`
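
Reviewer note: the Enum & Value Completeness step added to both the generated file and the template (grep for sibling values, then read each match) might start from a helper like the one below. It is a heuristic sketch under stated assumptions: `find_enum_gaps` is a hypothetical name, and a whole-word text match only shows that a file never mentions the new value, not that files mentioning it handle it correctly, so every listed file still has to be read.

```bash
#!/usr/bin/env bash
# Hypothetical helper: narrow down which consumers to Read in full.

# find_enum_gaps NEW_VALUE SIBLING DIR: list files under DIR that mention
# SIBLING as a whole word but never mention NEW_VALUE.
find_enum_gaps() {
  local new_value=$1 sibling=$2 dir=$3 file
  grep -rlw -- "$sibling" "$dir" 2>/dev/null | sort | while IFS= read -r file; do
    grep -qw -- "$new_value" "$file" || echo "$file"
  done
}

# Demo: "revise" was added to the tier list but not to the case chain.
demo_dir=$(mktemp -d)
printf 'TIERS = %%w[quick lfg mega revise]\n' > "$demo_dir/tiers.rb"
printf 'case tier\nwhen "lfg" then 1\nwhen "mega" then 2\nend\n' > "$demo_dir/pricing.rb"
find_enum_gaps revise lfg "$demo_dir"
rm -rf "$demo_dir"
```

The demo reuses the checklist's tier example (`quick`, `lfg`, `mega`, `revise`); the file names `tiers.rb` and `pricing.rb` are invented for illustration.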
--- a/review/checklist.md
+++ b/review/checklist.md
@@ -48,6 +48,13 @@ Be terse. For each issue: one line describing the problem, one line with the fix
 - LLM-generated values (emails, URLs, names) written to DB or passed to mailers without format validation. Add lightweight guards (`EMAIL_REGEXP`, `URI.parse`, `.strip`) before persisting.
 - Structured tool output (arrays, hashes) accepted without type/shape checks before database writes.

+#### Enum & Value Completeness
+When the diff introduces a new enum value, status string, tier name, or type constant:
+- **Trace it through every consumer.** Read (don't just grep — READ) each file that switches on, filters by, or displays that value. If any consumer doesn't handle the new value, flag it. Common miss: adding a value to the frontend dropdown but the backend model/compute method doesn't persist it.
+- **Check allowlists/filter arrays.** Search for arrays or `%w[]` lists containing sibling values (e.g., if adding "revise" to tiers, find every `%w[quick lfg mega]` and verify "revise" is included where needed).
+- **Check `case`/`if-elsif` chains.** If existing code branches on the enum, does the new value fall through to a wrong default?
+To do this: use Grep to find all references to the sibling values (e.g., grep for "lfg" or "mega" to find all tier consumers). Read each match. This step requires reading code OUTSIDE the diff.
+
 ### Pass 2 — INFORMATIONAL

 #### Conditional Side Effects
@@ -101,8 +108,8 @@ Be terse. For each issue: one line describing the problem, one line with the fix
 CRITICAL (blocks /ship):          INFORMATIONAL (in PR body):
 ├─ SQL & Data Safety              ├─ Conditional Side Effects
 ├─ Race Conditions & Concurrency  ├─ Magic Numbers & String Coupling
-└─ LLM Output Trust Boundary      ├─ Dead Code & Consistency
-                                   ├─ LLM Prompt Issues
+├─ LLM Output Trust Boundary      ├─ Dead Code & Consistency
+└─ Enum & Value Completeness      ├─ LLM Prompt Issues
                                   ├─ Test Gaps
                                   ├─ Crypto & Entropy
                                   ├─ Time Window Safety