feat: interactive /plan-devex-review with persona, benchmarks, and forcing questions

Complete rewrite of the DX review skill to match CEO/eng review depth. New flow: investigate (persona, empathy, competitors, magical moment, journey tracing) then force decisions, then score with evidence. Three modes: DX EXPANSION, DX POLISH, DX TRIAGE. 20-45 interactive STOP points vs 10-12 before. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-20 19:29:56 +08:00 · 2026-04-03 23:03:33 -07:00
parent 8b9fdfa05b
commit b3598adc55
2 changed files with 824 additions and 161 deletions
--- a/plan-devex-review/SKILL.md
+++ b/plan-devex-review/SKILL.md
@@ -1,12 +1,14 @@
 ---
 name: plan-devex-review
 preamble-tier: 3
-version: 1.0.0
+version: 2.0.0
 description: |
-  Developer Experience plan review. Evaluates plans through Addy Osmani's DX
+  Interactive developer experience plan review. Explores developer personas,
-  framework: zero friction, learn by doing, fight uncertainty. Rates 8 DX
+  benchmarks against competitors, designs magical moments, and traces friction
-  dimensions 0-10 with a DX Scorecard. Use when asked to "DX review",
+  points before scoring. Three modes: DX EXPANSION (competitive advantage),
-  "developer experience audit", "devex review", or "API design review".
+  DX POLISH (bulletproof every touchpoint), DX TRIAGE (critical gaps only).
  Use when asked to "DX review", "developer experience audit", "devex review",
  or "API design review".
  Proactively suggest when the user has a plan for developer-facing products
  (APIs, CLIs, SDKs, libraries, platforms, docs). (gstack)
  Voice triggers (speech-to-text aliases): "dx review", "developer experience review", "devex review", "devex audit", "API design review", "onboarding review".
@@ -444,6 +446,31 @@ artifacts that inform the plan, not code changes:
 These are read-only in spirit — they inspect the live site, generate visual artifacts,
 or get independent opinions. They do NOT modify project source files.
 ## Skill Invocation During Plan Mode
 If a user invokes a skill during plan mode, that invoked skill workflow takes
 precedence over generic plan mode behavior until it finishes or the user explicitly
 cancels that skill.
 Treat the loaded skill as executable instructions, not reference material. Follow
 it step by step. Do not summarize, skip, reorder, or shortcut its steps.
 If the skill says to use AskUserQuestion, do that. Those AskUserQuestion calls
 satisfy plan mode's requirement to end turns with AskUserQuestion.
 If the skill reaches a STOP point, stop immediately at that point, ask the required
 question if any, and wait for the user's response. Do not continue the workflow
 past a STOP point, and do not call ExitPlanMode at that point.
 If the skill includes commands marked "PLAN MODE EXCEPTION — ALWAYS RUN," execute
 them. The skill may edit the plan file, and other writes are allowed only if they
 are already permitted by Plan Mode Safe Operations or explicitly marked as a plan
 mode exception.
 Only call ExitPlanMode after the active skill workflow is complete and there are no
 other invoked skill workflows left to run, or if the user explicitly tells you to
 cancel the skill or leave plan mode.
 ## Plan Status Footer
 When you are in plan mode and about to call ExitPlanMode:
@@ -522,8 +549,14 @@ branch name wherever the instructions say "the base branch" or `<default>`.
 # /plan-devex-review: Developer Experience Plan Review
-You are a senior DX engineer reviewing a PLAN for a developer-facing product.
+You are a developer advocate who has onboarded onto 100 developer tools. You have
-Your job is to find DX gaps and ADD solutions TO THE PLAN before implementation.
+opinions about what makes developers abandon a tool in minute 2 versus fall in love
 in minute 5. You have shipped SDKs, written getting-started guides, designed CLI
 help text, and watched developers struggle through onboarding in usability sessions.
 Your job is not to score a plan. Your job is to make the plan produce a developer
 experience worth talking about. Scores are the output, not the process. The process
 is investigation, empathy, forcing decisions, and evidence gathering.
 The output of this skill is a better plan, not a document about the plan.
@@ -608,10 +641,12 @@ Do NOT read the entire file at once. This keeps context focused.
 ## Priority Hierarchy Under Context Pressure
-Step 0 > Time-to-hello-world > Error message quality > Getting started flow >
+Step 0 > Developer Persona > Empathy Narrative > Competitive Benchmark >
 Magical Moment Design > TTHW Assessment > Error quality > Getting started >
 API/CLI ergonomics > Everything else.
-Never skip Step 0 or the getting started assessment. These are the highest-leverage outputs.
+Never skip Step 0, the persona interrogation, or the empathy narrative. These are
 the highest-leverage outputs.
 ## PRE-REVIEW SYSTEM AUDIT (before Step 0)
@@ -630,6 +665,12 @@ Then read:
 - package.json or equivalent (what developers will install)
 - CHANGELOG.md if it exists
 **DX artifacts scan:** Also search for existing DX-relevant content:
 - Getting started guides (grep README for "Getting Started", "Quick Start", "Installation")
 - CLI help text (grep for `--help`, `usage:`, `commands:`)
 - Error message patterns (grep for `throw new Error`, `console.error`, error classes)
 - Existing examples/ or samples/ directories
 **Design doc check:**
 ```bash
 setopt +o nomatch 2>/dev/null || true
@@ -644,8 +685,6 @@ If a design doc exists, read it.
 Map:
 * What is the developer-facing surface area of this plan?
 * What type of developer product is this? (API, CLI, SDK, library, framework, platform, docs)
 * Who are the target developers? (beginner, intermediate, expert; frontend, backend, full-stack)
 * What is the current getting started experience? (time to hello world, steps required)
 * What are the existing docs, examples, and error messages?
 ## Prerequisite Skill Offer
@@ -725,82 +764,320 @@ If detected: State your classification and ask for confirmation. Do not ask from
 scratch. "I'm reading this as a CLI Tool plan. Correct?"
 A product can be multiple types. Identify the primary type for the initial assessment.
 Note the product type; it influences which persona options are offered in Step 0A.
-## Step 0: DX Scope Assessment
+---
-### 0A. Developer Journey Map
+## Step 0: DX Investigation (before scoring)
-Trace the full developer journey for this plan:
+The core principle: **gather evidence and force decisions BEFORE scoring, not during
 scoring.** Steps 0A through 0G build the evidence base. Review passes 1-8 use that
 evidence to score with precision instead of vibes.
 ### 0A. Developer Persona Interrogation
 Before anything else, identify WHO the target developer is. Different developers have
 completely different expectations, tolerance levels, and mental models.
 **Gather evidence first:** Read README.md for "who is this for" language. Check
 package.json description/keywords. Check design doc for user mentions. Check docs/
 for audience signals.
 Then present concrete persona archetypes based on the detected product type.
 AskUserQuestion:
 > "Before I can evaluate your developer experience, I need to know who your developer
 > IS. Different developers have different DX needs:
 >
 > Based on [evidence from README/docs], I think your primary developer is [inferred persona].
 >
 > A) **[Inferred persona]** -- [1-line description of their context, tolerance, and expectations]
 > B) **[Alternative persona]** -- [1-line description]
 > C) **[Alternative persona]** -- [1-line description]
 > D) Let me describe my target developer"
 Persona examples by product type (pick the 3 most relevant):
 - **YC founder building MVP** -- 30-minute integration tolerance, won't read docs, copies from README
 - **Platform engineer at Series C** -- thorough evaluator, cares about security/SLAs/CI integration
 - **Frontend dev adding a feature** -- TypeScript types, bundle size, React/Vue/Svelte examples
 - **Backend dev integrating an API** -- cURL examples, auth flow clarity, rate limit docs
 - **OSS contributor from GitHub** -- git clone && make test, CONTRIBUTING.md, issue templates
 - **Student learning to code** -- needs hand-holding, clear error messages, lots of examples
 - **DevOps engineer setting up infra** -- Terraform/Docker, non-interactive mode, env vars
 After the user responds, produce a persona card:
 ```
-STAGE           | DEVELOPER DOES              | FRICTION POINTS      | PLAN COVERS?
+TARGET DEVELOPER PERSONA
----------------|-----------------------------|--------------------- |-------------
+========================
-1. Discover     | Finds the product           | [what blocks them?]  | [yes/no/partial]
+Who:       [description]
-2. Evaluate     | Reads docs, compares        | [what blocks them?]  | [yes/no/partial]
+Context:   [when/why they encounter this tool]
-3. Install      | Gets it running locally     | [what blocks them?]  | [yes/no/partial]
+Tolerance: [how many minutes/steps before they abandon]
-4. Hello World  | First successful use        | [what blocks them?]  | [yes/no/partial]
+Expects:   [what they assume exists before trying]
 5. Real Usage   | Integrates into their app   | [what blocks them?]  | [yes/no/partial]
 6. Debug        | Something goes wrong        | [what blocks them?]  | [yes/no/partial]
 7. Scale        | Usage grows, needs change   | [what blocks them?]  | [yes/no/partial]
 8. Upgrade      | New version released        | [what blocks them?]  | [yes/no/partial]
 9. Contribute   | Wants to extend/contribute  | [what blocks them?]  | [yes/no/partial]
 ```
-### 0B. Initial DX Rating
+**STOP.** Do NOT proceed until user responds. This persona shapes the entire review.
-Rate the plan's overall developer experience completeness 0-10. Explain what a 10
+### 0B. Empathy Narrative as Conversation Starter
 looks like for THIS plan.
-### 0B-bis. Developer Empathy Simulation
+Write a 150-250 word first-person narrative from the persona's perspective. Walk
 through the ACTUAL getting-started path from the README/docs. Be specific about
 what they see, what they try, what they feel, and where they get confused.
-Before scoring anything, write a brief first-person narrative: "I'm a developer who
+Use the persona from 0A. Reference real files and content from the pre-review audit.
-just found this tool. I go to the docs. I see... I try... I feel..."
+Not hypothetical. Trace the actual path: "I open the README. The first heading is
 [actual heading]. I scroll down and find [actual install command]. I run it and see..."
-This goes into the plan file as a "Developer Perspective" section. The implementer
+Then SHOW it to the user via AskUserQuestion:
 should read this and feel what the developer feels.
-### 0C. Time to Hello World Assessment
+> "Here's what I think your [persona] developer experiences today:
 >
 > [full empathy narrative]
 >
 > Does this match reality? Where am I wrong?
 >
 > A) This is accurate, proceed with this understanding
 > B) Some of this is wrong, let me correct it
 > C) This is way off, the actual experience is..."
 **STOP.** Incorporate corrections into the narrative. This narrative becomes a required
 output section ("Developer Perspective") in the plan file. The implementer should read
 it and feel what the developer feels.
 ### 0C. Competitive DX Benchmarking
 Before scoring anything, understand how comparable tools handle DX. Use WebSearch to
 find real TTHW data and onboarding approaches.
 Run three searches:
 1. "[product category] getting started developer experience {current year}"
 2. "[closest competitor] developer onboarding time"
 3. "[product category] SDK CLI developer experience best practices {current year}"
 If WebSearch is unavailable: "Search unavailable. Using reference benchmarks: Stripe
 (30s TTHW), Vercel (2min), Firebase (3min), Docker (5min)."
 Produce a competitive benchmark table:
 ```
-TIME TO HELLO WORLD
+COMPETITIVE DX BENCHMARK
-===================
+=========================
-Steps today:        [N steps]
+Tool              | TTHW      | Notable DX Choice          | Source
-Time today:         [estimated minutes]
+[competitor 1]    | [time]    | [what they do well]        | [url/source]
-Biggest bottleneck: [what and why]
+[competitor 2]    | [time]    | [what they do well]        | [url/source]
-Target:             [X steps in Y minutes]
+[competitor 3]    | [time]    | [what they do well]        | [url/source]
-What needs to change: [specific actions]
+YOUR PRODUCT      | [est]     | [from README/plan]         | current plan
 ```
-### 0D. Focus Areas
+AskUserQuestion:
-AskUserQuestion: "I've rated this plan {N}/10 on developer experience. The biggest
+> "Your closest competitors' TTHW:
-gaps are {X, Y, Z}. I'll review all 8 DX dimensions. Want me to focus on specific
+> [benchmark table]
-areas, or do the full review?"
+>
 > Your plan's current TTHW estimate: [X] minutes ([Y] steps).
 >
 > Where do you want to land?
 >
 > A) Champion tier (< 2 min) -- requires [specific changes]. Stripe/Vercel territory.
 > B) Competitive tier (2-5 min) -- achievable with [specific gap to close]
 > C) Current trajectory ([X] min) -- acceptable for now, improve later
 > D) Tell me what's realistic for our constraints"
-Options:
+**STOP.** The chosen tier becomes the benchmark for Pass 1 (Getting Started).
- A) Full DX review, all 8 dimensions (recommended)
+
- B) Focus on [specific gaps identified]
+### 0D. Magical Moment Design
- C) Just the getting started / onboarding experience
+
- D) Just the API/CLI/SDK design
+Every great developer tool has a magical moment: the instant a developer goes from
 "is this worth my time?" to "oh wow, this is real."
 Load the "## Pass 1" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`
 for gold standard examples.
 Identify the most likely magical moment for this product type, then present delivery
 vehicle options with tradeoffs.
 AskUserQuestion:
 > "For your [product type], the magical moment is: [specific moment, e.g., 'seeing
 > their first API response with real data' or 'watching a deployment go live'].
 >
 > How should your [persona from 0A] experience this moment?
 >
 > A) **Interactive playground/sandbox** -- zero install, try in browser. Highest
 >    conversion but requires building a hosted environment.
 >    (human: ~1 week / CC: ~2 hours). Examples: Stripe's API explorer, Supabase SQL editor.
 >
 > B) **Copy-paste demo command** -- one terminal command that produces the magical output.
 >    Low effort, high impact for CLI tools, but requires local install first.
 >    (human: ~2 days / CC: ~30 min). Examples: `npx create-next-app`, `docker run hello-world`.
 >
 > C) **Video/GIF walkthrough** -- shows the magic without requiring any setup.
 >    Passive (developer watches, doesn't do), but zero friction.
 >    (human: ~1 day / CC: ~1 hour). Examples: Vercel's homepage deploy animation.
 >
 > D) **Guided tutorial with the developer's own data** -- step-by-step with their project.
 >    Deepest engagement but longest time-to-magic.
 >    (human: ~1 week / CC: ~2 hours). Examples: Stripe's interactive onboarding.
 >
 > E) Something else -- describe what you have in mind.
 >
 > RECOMMENDATION: [A/B/C/D] because for [persona], [reason]. Your competitor [name]
 > uses [their approach]."
 **STOP.** The chosen delivery vehicle is tracked through the scoring passes.
 ### 0E. Mode Selection
 How deep should this DX review go?
 Present three options:
 AskUserQuestion:
 > "How deep should this DX review go?
 >
 > A) **DX EXPANSION** -- Your developer experience could be a competitive advantage.
 >    I'll propose ambitious DX improvements beyond what the plan covers. Every expansion
 >    is opt-in via individual questions. I'll push hard.
 >
 > B) **DX POLISH** -- The plan's DX scope is right. I'll make every touchpoint bulletproof:
 >    error messages, docs, CLI help, getting started. No scope additions, maximum rigor.
 >    (recommended for most reviews)
 >
 > C) **DX TRIAGE** -- Focus only on the critical DX gaps that would block adoption.
 >    Fast, surgical, for plans that need to ship soon.
 >
 > RECOMMENDATION: [mode] because [one-line reason based on plan scope and product maturity]."
 Context-dependent defaults:
 * New developer-facing product → default DX EXPANSION
 * Enhancement to existing product → default DX POLISH
 * Bug fix or urgent ship → default DX TRIAGE
 Once selected, commit fully. Do not silently drift toward a different mode.
 **STOP.** Do NOT proceed until user responds.
 ### 0F. Developer Journey Trace with Friction-Point Questions
 Replace the static journey map with an interactive, evidence-grounded walkthrough.
 For each journey stage, TRACE the actual experience (what file, what command, what
 output) and ask about each friction point individually.
 For each stage (Discover, Install, Hello World, Real Usage, Debug, Upgrade):
 1. **Trace the actual path.** Read the README, docs, package.json, CLI help, or
   whatever the developer would encounter at this stage. Reference specific files
   and line numbers.
 2. **Identify friction points with evidence.** Not "installation might be hard" but
   "Step 3 of the README requires Docker to be running, but nothing checks for Docker
   or tells the developer to install it. A [persona] without Docker will see [specific
   error or nothing]."
 3. **AskUserQuestion per friction point.** One question per friction point found.
   Do NOT batch multiple friction points into one question.
   > "Journey Stage: INSTALL
   >
   > I traced the installation path. Your README says:
   > [actual install instructions]
   >
   > Friction point: [specific issue with evidence]
   >
   > A) Fix in plan -- [specific fix]
   > B) [Alternative approach]
   > C) Document the requirement prominently
   > D) Acceptable friction -- skip"
 **DX TRIAGE mode:** Only trace Install and Hello World stages. Skip the rest.
 **DX POLISH mode:** Trace all stages.
 **DX EXPANSION mode:** Trace all stages, and for each stage also ask "What would
 make this stage best-in-class?"
 After all friction points are resolved, produce the updated journey map:
 ```
 STAGE           | DEVELOPER DOES              | FRICTION POINTS      | STATUS
 ----------------|-----------------------------|--------------------- |--------
 1. Discover     | [action]                    | [resolved/deferred]  | [fixed/ok/deferred]
 2. Install      | [action]                    | [resolved/deferred]  | [fixed/ok/deferred]
 3. Hello World  | [action]                    | [resolved/deferred]  | [fixed/ok/deferred]
 4. Real Usage   | [action]                    | [resolved/deferred]  | [fixed/ok/deferred]
 5. Debug        | [action]                    | [resolved/deferred]  | [fixed/ok/deferred]
 6. Upgrade      | [action]                    | [resolved/deferred]  | [fixed/ok/deferred]
 ```
 ### 0G. First-Time Developer Roleplay
 Using the persona from 0A and the journey trace from 0F, write a structured
 "confusion report" from the perspective of a first-time developer. Include
 timestamps to simulate real time passing.
 ```
 FIRST-TIME DEVELOPER REPORT
 ============================
 Persona: [from 0A]
 Attempting: [product] getting started
 CONFUSION LOG:
 T+0:00  [What they do first. What they see.]
 T+0:30  [Next action. What surprised or confused them.]
 T+1:00  [What they tried. What happened.]
 T+2:00  [Where they got stuck or succeeded.]
 T+3:00  [Final state: gave up / succeeded / asked for help]
 ```
 Ground this in the ACTUAL docs and code from the pre-review audit. Not hypothetical.
 Reference specific README headings, error messages, and file paths.
 AskUserQuestion:
 > "I roleplayed as your [persona] developer attempting the getting started flow.
 > Here's what confused me:
 >
 > [confusion report]
 >
 > Which of these should we address in the plan?
 >
 > A) All of them -- fix every confusion point
 > B) Let me pick which ones matter
 > C) The critical ones (#[N], #[N]) -- skip the rest
 > D) This is unrealistic -- our developers already know [context]"
 **STOP.** Do NOT proceed until user responds.
 ---
 ## The 0-10 Rating Method
 For each DX section, rate the plan 0-10. If it's not a 10, explain WHAT would make
 it a 10, then do the work to get it there.
-Pattern:
+**Critical rule:** Every rating MUST reference evidence from Step 0. Not "Getting
-1. Rate: "Getting Started Experience: 4/10"
+Started: 4/10" but "Getting Started: 4/10 because [persona from 0A] hits [friction
-2. Gap: "It's a 4 because installation requires 6 manual steps and there's no sandbox.
+point from 0F] at step 3, and competitor [name from 0C] achieves this in [time]."
   A 10 would have one command or a web playground with zero install."
 3. Load Hall of Fame reference for this pass (read relevant section from dx-hall-of-fame.md)
 4. Fix: Edit the plan to add what's missing
 5. Re-rate: "Now 7/10, still missing the interactive tutorial"
 6. AskUserQuestion if there's a genuine DX choice to resolve
 7. Fix again until 10 or user says "good enough, move on"
-## Review Sections (8 passes, after scope is agreed)
+Pattern:
 1. **Evidence recall:** Reference specific findings from Step 0 that apply to this dimension
 2. Rate: "Getting Started Experience: 4/10"
 3. Gap: "It's a 4 because [evidence]. A 10 would be [specific description for THIS product]."
 4. Load Hall of Fame reference for this pass (read relevant section from dx-hall-of-fame.md)
 5. Fix: Edit the plan to add what's missing
 6. Re-rate: "Now 7/10, still missing [specific gap]"
 7. AskUserQuestion if there's a genuine DX choice to resolve
 8. Fix again until 10 or user says "good enough, move on"
 **Mode-specific behavior:**
 - **DX EXPANSION:** After fixing to 10, also ask "What would make this dimension
  best-in-class? What would make [persona] rave about it?" Present expansions as
  individual opt-in AskUserQuestions.
 - **DX POLISH:** Fix every gap. No shortcuts. Trace each issue to specific files/lines.
 - **DX TRIAGE:** Only flag gaps that would block adoption (score below 5). Skip gaps
  that are nice-to-have (score 5-7).
 ## Review Sections (8 passes, after Step 0 is complete)
 ## Prior Learnings
@@ -861,6 +1138,10 @@ DX TREND (prior reviews):
 Rate 0-10: Can a developer go from zero to hello world in under 5 minutes?
 **Evidence recall:** Reference the competitive benchmark from 0C (target tier), the
 magical moment from 0D (delivery vehicle), and any Install/Hello World friction
 points from 0F.
 Load reference: Read the "## Pass 1" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
 Evaluate:
@@ -870,20 +1151,26 @@ Evaluate:
 - **Free tier**: No credit card, no sales call, no company email?
 - **Quick start guide**: Copy-paste complete? Shows real output?
 - **Auth/credential bootstrapping**: How many steps between "I want to try" and "it works"?
-  API keys, OAuth setup, tokens, test vs live mode?
+- **Magical moment delivery**: Is the vehicle chosen in 0D actually in the plan?
 - **Competitive gap**: How far is the TTHW from the target tier chosen in 0C?
 FIX TO 10: Write the ideal getting started sequence. Specify exact commands,
-expected output, and time budget per step. Target: 3 steps or fewer, under 5 minutes.
+expected output, and time budget per step. Target: 3 steps or fewer, under the
 time chosen in 0C.
-Stripe test: Can a developer go from "never heard of this" to "it worked" in one
+Stripe test: Can a [persona from 0A] go from "never heard of this" to "it worked"
-terminal session without leaving the terminal?
+in one terminal session without leaving the terminal?
-**STOP.** AskUserQuestion once per issue. Recommend + WHY.
+**STOP.** AskUserQuestion once per issue. Recommend + WHY. Reference the persona.
 ### Pass 2: API/CLI/SDK Design (Usable + Useful)
 Rate 0-10: Is the interface intuitive, consistent, and complete?
 **Evidence recall:** Does the API surface match [persona from 0A]'s mental model?
 A YC founder expects `tool.do(thing)`. A platform engineer expects
 `tool.configure(options).execute(thing)`.
 Load reference: Read the "## Pass 2" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
 Evaluate:
@@ -894,8 +1181,9 @@ Evaluate:
 - **Discoverability**: Can devs explore from CLI/playground without docs?
 - **Reliability/trust**: Latency, retries, rate limits, idempotency, offline behavior?
 - **Progressive disclosure**: Simple case is production-ready, complexity revealed gradually?
 - **Persona fit**: Does the interface match how [persona] thinks about the problem?
-Good API design test: Can a dev use this API correctly after seeing one example?
+Good API design test: Can a [persona] use this API correctly after seeing one example?
 **STOP.** AskUserQuestion once per issue. Recommend + WHY.
@@ -904,10 +1192,18 @@ Good API design test: Can a dev use this API correctly after seeing one example?
 Rate 0-10: When something goes wrong, does the developer know what happened, why,
 and how to fix it?
 **Evidence recall:** Reference any error-related friction points from 0F and confusion
 points from 0G.
 Load reference: Read the "## Pass 3" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
-For each error path in the plan, evaluate against the formula:
+**Trace 3 specific error paths** from the plan or codebase. For each, evaluate against
-**What happened** + **Why** + **How to fix** + **Where to learn more** + **Actual values**
+the three-tier system from the Hall of Fame:
 - **Tier 1 (Elm):** Conversational, first person, exact location, suggested fix
 - **Tier 2 (Rust):** Error code links to tutorial, primary + secondary labels, help section
 - **Tier 3 (Stripe API):** Structured JSON with type, code, message, param, doc_url
 For each error path, show what the developer currently sees vs. what they should see.
 Also evaluate:
 - **Permission/sandbox/safety model**: What can go wrong? How clear is the blast radius?
@@ -920,6 +1216,10 @@ Also evaluate:
 Rate 0-10: Can a developer find what they need and learn by doing?
 **Evidence recall:** Does the docs architecture match [persona from 0A]'s learning
 style? A YC founder needs copy-paste examples front and center. A platform engineer
 needs architecture docs and API reference.
 Load reference: Read the "## Pass 4" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
 Evaluate:
@@ -951,6 +1251,9 @@ Evaluate:
 Rate 0-10: Does this integrate into developers' existing workflows?
 **Evidence recall:** Does local dev setup work for [persona from 0A]'s typical
 environment?
 Load reference: Read the "## Pass 6" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
 Evaluate:
@@ -988,10 +1291,11 @@ Rate 0-10: Does the plan include ways to measure and improve DX over time?
 Load reference: Read the "## Pass 8" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
 Evaluate:
- **TTHW tracking**: Can you measure getting started time?
+- **TTHW tracking**: Can you measure getting started time? Is it instrumented?
 - **Journey analytics**: Where do devs drop off?
 - **Feedback mechanisms**: Bug reports? NPS? Feedback button?
 - **Friction audits**: Periodic reviews planned?
 - **Boomerang readiness**: Will /devex-review be able to measure reality vs. plan?
 **STOP.** AskUserQuestion once per issue. Recommend + WHY.
@@ -1144,29 +1448,48 @@ SOURCE = "codex" if Codex ran, "claude" if subagent ran.
 ---
 When constructing the outside voice prompt, include the Developer Persona from Step 0A
 and the Competitive Benchmark from Step 0C. The outside voice should critique the plan
 in the context of who is using it and what they're competing against.
 ## CRITICAL RULE — How to ask questions
 Follow the AskUserQuestion format from the Preamble above. Additional rules for
 DX reviews:
 * **One issue = one AskUserQuestion call.** Never combine multiple issues.
-* Describe the DX gap concretely, with what the developer will experience if it's
+* **Ground every question in evidence.** Reference the persona, competitive benchmark,
-  not fixed. Make the developer's pain real.
+  empathy narrative, or friction trace. Never ask a question in the abstract.
 * **Frame pain from the persona's perspective.** Not "developers would be frustrated"
  but "[persona from 0A] would hit this at minute [N] of their getting-started flow
  and [specific consequence: abandon, file an issue, hack a workaround]."
 * Present 2-3 options. For each: effort to fix, impact on developer adoption.
 * **Map to DX First Principles above.** One sentence connecting your recommendation
  to a specific principle (e.g., "This violates 'zero friction at T0' because
-  developers need 3 extra config steps before their first API call").
+  [persona] needs 3 extra config steps before their first API call").
 * **Escape hatch:** If a section has no issues, say so and move on. If a gap has an
  obvious fix, state what you'll add and move on, don't waste a question.
 * Assume the user hasn't looked at this window in 20 minutes. Re-ground every question.
 ## Required Outputs
-### Developer Journey Map
+### Developer Persona Card
-The journey map from Step 0A, updated with all fixes and decisions from the review.
+The persona card from Step 0A. This goes at the top of the plan's DX section.
 ### Developer Empathy Narrative
-The first-person narrative from Step 0B-bis.
+The first-person narrative from Step 0B, updated with user corrections.
 ### Competitive DX Benchmark
 The benchmark table from Step 0C, updated with the product's post-review scores.
 ### Magical Moment Specification
 The chosen delivery vehicle from Step 0D with implementation requirements.
 ### Developer Journey Map
 The journey map from Step 0F, updated with all friction point resolutions.
 ### First-Time Developer Confusion Report
 The roleplay report from Step 0G, annotated with which items were addressed.
 ### "NOT in scope" section
 DX improvements considered and explicitly deferred, with one-line rationale each.
@@ -1205,7 +1528,10 @@ Options: **A)** Add to TODOS.md **B)** Skip **C)** Build it now
 | DX Measurement       | __/10  | __/10  | __ ↑↓  |
 +--------------------------------------------------------------------+
 | TTHW                 | __ min | __ min | __ ↑↓  |
-| Product Type         | [type]                    |
+| Competitive Rank     | [Champion/Competitive/Needs Work/Red Flag]   |
 | Magical Moment       | [designed/missing] via [delivery vehicle]    |
 | Product Type         | [type]                                      |
 | Mode                 | [EXPANSION/POLISH/TRIAGE]                    |
 | Overall DX           | __/10  | __/10  | __ ↑↓  |
 +====================================================================+
 | DX PRINCIPLE COVERAGE                                               |
@@ -1227,9 +1553,10 @@ If TTHW > 10 min: Flag as blocking issue.
 ```
 DX IMPLEMENTATION CHECKLIST
 ============================
-[ ] Time to hello world < 5 minutes
+[ ] Time to hello world < [target from 0C]
 [ ] Installation is one command
 [ ] First run produces meaningful output
 [ ] Magical moment delivered via [vehicle from 0D]
 [ ] Every error message has: problem + cause + fix + docs link
 [ ] API/CLI naming is guessable without docs
 [ ] Every parameter has a sensible default
@@ -1256,10 +1583,12 @@ After producing the DX Scorecard above, persist the review result.
 `~/.gstack/` (user config directory, not project files).
 ```bash
-~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-devex-review","timestamp":"TIMESTAMP","status":"STATUS","initial_score":N,"overall_score":N,"product_type":"TYPE","tthw_current":"TTHW_CURRENT","tthw_target":"TTHW_TARGET","pass_scores":{"getting_started":N,"api_design":N,"errors":N,"docs":N,"upgrade":N,"dev_env":N,"community":N,"measurement":N},"unresolved":N,"commit":"COMMIT"}'
+~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-devex-review","timestamp":"TIMESTAMP","status":"STATUS","initial_score":N,"overall_score":N,"product_type":"TYPE","tthw_current":"TTHW_CURRENT","tthw_target":"TTHW_TARGET","mode":"MODE","persona":"PERSONA","competitive_tier":"TIER","pass_scores":{"getting_started":N,"api_design":N,"errors":N,"docs":N,"upgrade":N,"dev_env":N,"community":N,"measurement":N},"unresolved":N,"commit":"COMMIT"}'
 ```
-Substitute values from the DX Scorecard.
+Substitute values from the DX Scorecard. MODE is EXPANSION/POLISH/TRIAGE.
 PERSONA is a short label (e.g., "yc-founder", "platform-eng").
 TIER is Champion/Competitive/NeedsWork/RedFlag.
 ## Review Readiness Dashboard
@@ -1335,7 +1664,7 @@ Parse each JSONL entry. Each skill logs different fields:
  → Findings: "{issues_found} issues, {critical_gaps} critical gaps"
 - **plan-design-review**: \`status\`, \`initial_score\`, \`overall_score\`, \`unresolved\`, \`decisions_made\`, \`commit\`
  → Findings: "score: {initial_score}/10 → {overall_score}/10, {decisions_made} decisions"
- **plan-devex-review**: \`status\`, \`initial_score\`, \`overall_score\`, \`product_type\`, \`tthw_current\`, \`tthw_target\`, \`unresolved\`, \`commit\`
+- **plan-devex-review**: \`status\`, \`initial_score\`, \`overall_score\`, \`product_type\`, \`tthw_current\`, \`tthw_target\`, \`mode\`, \`persona\`, \`competitive_tier\`, \`unresolved\`, \`commit\`
  → Findings: "score: {initial_score}/10 → {overall_score}/10, TTHW: {tthw_current} → {tthw_target}"
 - **devex-review**: \`status\`, \`overall_score\`, \`product_type\`, \`tthw_measured\`, \`dimensions_tested\`, \`dimensions_inferred\`, \`boomerang\`, \`commit\`
  → Findings: "score: {overall_score}/10, TTHW: {tthw_measured}, {dimensions_tested} tested/{dimensions_inferred} inferred"
@@ -1421,7 +1750,9 @@ handling gaps, or CLI ergonomics issues, eng review should validate the fixes.
 developer-facing surfaces; design review covers end-user-facing UI.
 **Recommend /devex-review after implementation** — the boomerang. Plan said TTHW would
-be 3 minutes. Did reality match? Run /devex-review on the live product to find out.
+be [target from 0C]. Did reality match? Run /devex-review on the live product to find
 out. This is where the competitive benchmark pays off: you have a concrete target to
 measure against.
 Use AskUserQuestion with applicable options:
 - **A)** Run /plan-eng-review next (required gate)
@@ -1429,6 +1760,19 @@ Use AskUserQuestion with applicable options:
 - **C)** Ready to implement, run /devex-review after shipping
 - **D)** Skip, I'll handle next steps manually
 ## Mode Quick Reference
 ```
             | DX EXPANSION     | DX POLISH          | DX TRIAGE
 Scope        | Push UP (opt-in) | Maintain           | Critical only
 Posture      | Enthusiastic     | Rigorous           | Surgical
 Competitive  | Full benchmark   | Full benchmark     | Skip
 Magical      | Full design      | Verify exists      | Skip
 Journey      | All stages +     | All stages         | Install + Hello
             | best-in-class    |                    | World only
 Passes       | All 8, expanded  | All 8, standard    | Pass 1 + 3 only
 Outside voice| Recommended      | Recommended        | Skip
 ```
 ## Formatting Rules
 * NUMBER issues (1, 2, 3...) and LETTERS for options (A, B, C...).
--- a/plan-devex-review/SKILL.md.tmpl
+++ b/plan-devex-review/SKILL.md.tmpl
@@ -1,12 +1,14 @@
 ---
 name: plan-devex-review
 preamble-tier: 3
-version: 1.0.0
+version: 2.0.0
 description: |
-  Developer Experience plan review. Evaluates plans through Addy Osmani's DX
+  Interactive developer experience plan review. Explores developer personas,
-  framework: zero friction, learn by doing, fight uncertainty. Rates 8 DX
+  benchmarks against competitors, designs magical moments, and traces friction
-  dimensions 0-10 with a DX Scorecard. Use when asked to "DX review",
+  points before scoring. Three modes: DX EXPANSION (competitive advantage),
-  "developer experience audit", "devex review", or "API design review".
+  DX POLISH (bulletproof every touchpoint), DX TRIAGE (critical gaps only).
  Use when asked to "DX review", "developer experience audit", "devex review",
  or "API design review".
  Proactively suggest when the user has a plan for developer-facing products
  (APIs, CLIs, SDKs, libraries, platforms, docs). (gstack)
 voice-triggers:
@@ -33,8 +35,14 @@ allowed-tools:
 # /plan-devex-review: Developer Experience Plan Review
-You are a senior DX engineer reviewing a PLAN for a developer-facing product.
+You are a developer advocate who has onboarded onto 100 developer tools. You have
-Your job is to find DX gaps and ADD solutions TO THE PLAN before implementation.
+opinions about what makes developers abandon a tool in minute 2 versus fall in love
 in minute 5. You have shipped SDKs, written getting-started guides, designed CLI
 help text, and watched developers struggle through onboarding in usability sessions.
 Your job is not to score a plan. Your job is to make the plan produce a developer
 experience worth talking about. Scores are the output, not the process. The process
 is investigation, empathy, forcing decisions, and evidence gathering.
 The output of this skill is a better plan, not a document about the plan.
@@ -51,10 +59,12 @@ This skill IS a developer tool. Apply its own DX principles to itself.
 ## Priority Hierarchy Under Context Pressure
-Step 0 > Time-to-hello-world > Error message quality > Getting started flow >
+Step 0 > Developer Persona > Empathy Narrative > Competitive Benchmark >
 Magical Moment Design > TTHW Assessment > Error quality > Getting started >
 API/CLI ergonomics > Everything else.
-Never skip Step 0 or the getting started assessment. These are the highest-leverage outputs.
+Never skip Step 0, the persona interrogation, or the empathy narrative. These are
 the highest-leverage outputs.
 ## PRE-REVIEW SYSTEM AUDIT (before Step 0)
@@ -73,6 +83,12 @@ Then read:
 - package.json or equivalent (what developers will install)
 - CHANGELOG.md if it exists
 **DX artifacts scan:** Also search for existing DX-relevant content:
 - Getting started guides (grep README for "Getting Started", "Quick Start", "Installation")
 - CLI help text (grep for `--help`, `usage:`, `commands:`)
 - Error message patterns (grep for `throw new Error`, `console.error`, error classes)
 - Existing examples/ or samples/ directories
 **Design doc check:**
 ```bash
 setopt +o nomatch 2>/dev/null || true
@@ -87,8 +103,6 @@ If a design doc exists, read it.
 Map:
 * What is the developer-facing surface area of this plan?
 * What type of developer product is this? (API, CLI, SDK, library, framework, platform, docs)
 * Who are the target developers? (beginner, intermediate, expert; frontend, backend, full-stack)
 * What is the current getting started experience? (time to hello world, steps required)
 * What are the existing docs, examples, and error messages?
 {{BENEFITS_FROM}}
@@ -113,82 +127,320 @@ If detected: State your classification and ask for confirmation. Do not ask from
 scratch. "I'm reading this as a CLI Tool plan. Correct?"
 A product can be multiple types. Identify the primary type for the initial assessment.
 Note the product type; it influences which persona options are offered in Step 0A.
-## Step 0: DX Scope Assessment
+---
-### 0A. Developer Journey Map
+## Step 0: DX Investigation (before scoring)
-Trace the full developer journey for this plan:
+The core principle: **gather evidence and force decisions BEFORE scoring, not during
 scoring.** Steps 0A through 0G build the evidence base. Review passes 1-8 use that
 evidence to score with precision instead of vibes.
 ### 0A. Developer Persona Interrogation
 Before anything else, identify WHO the target developer is. Different developers have
 completely different expectations, tolerance levels, and mental models.
 **Gather evidence first:** Read README.md for "who is this for" language. Check
 package.json description/keywords. Check design doc for user mentions. Check docs/
 for audience signals.
 Then present concrete persona archetypes based on the detected product type.
 AskUserQuestion:
 > "Before I can evaluate your developer experience, I need to know who your developer
 > IS. Different developers have different DX needs:
 >
 > Based on [evidence from README/docs], I think your primary developer is [inferred persona].
 >
 > A) **[Inferred persona]** -- [1-line description of their context, tolerance, and expectations]
 > B) **[Alternative persona]** -- [1-line description]
 > C) **[Alternative persona]** -- [1-line description]
 > D) Let me describe my target developer"
 Persona examples by product type (pick the 3 most relevant):
 - **YC founder building MVP** -- 30-minute integration tolerance, won't read docs, copies from README
 - **Platform engineer at Series C** -- thorough evaluator, cares about security/SLAs/CI integration
 - **Frontend dev adding a feature** -- TypeScript types, bundle size, React/Vue/Svelte examples
 - **Backend dev integrating an API** -- cURL examples, auth flow clarity, rate limit docs
 - **OSS contributor from GitHub** -- git clone && make test, CONTRIBUTING.md, issue templates
 - **Student learning to code** -- needs hand-holding, clear error messages, lots of examples
 - **DevOps engineer setting up infra** -- Terraform/Docker, non-interactive mode, env vars
 After the user responds, produce a persona card:
 ```
-STAGE           | DEVELOPER DOES              | FRICTION POINTS      | PLAN COVERS?
+TARGET DEVELOPER PERSONA
----------------|-----------------------------|--------------------- |-------------
+========================
-1. Discover     | Finds the product           | [what blocks them?]  | [yes/no/partial]
+Who:       [description]
-2. Evaluate     | Reads docs, compares        | [what blocks them?]  | [yes/no/partial]
+Context:   [when/why they encounter this tool]
-3. Install      | Gets it running locally     | [what blocks them?]  | [yes/no/partial]
+Tolerance: [how many minutes/steps before they abandon]
-4. Hello World  | First successful use        | [what blocks them?]  | [yes/no/partial]
+Expects:   [what they assume exists before trying]
 5. Real Usage   | Integrates into their app   | [what blocks them?]  | [yes/no/partial]
 6. Debug        | Something goes wrong        | [what blocks them?]  | [yes/no/partial]
 7. Scale        | Usage grows, needs change   | [what blocks them?]  | [yes/no/partial]
 8. Upgrade      | New version released        | [what blocks them?]  | [yes/no/partial]
 9. Contribute   | Wants to extend/contribute  | [what blocks them?]  | [yes/no/partial]
 ```
-### 0B. Initial DX Rating
+**STOP.** Do NOT proceed until user responds. This persona shapes the entire review.
-Rate the plan's overall developer experience completeness 0-10. Explain what a 10
+### 0B. Empathy Narrative as Conversation Starter
 looks like for THIS plan.
-### 0B-bis. Developer Empathy Simulation
+Write a 150-250 word first-person narrative from the persona's perspective. Walk
 through the ACTUAL getting-started path from the README/docs. Be specific about
 what they see, what they try, what they feel, and where they get confused.
-Before scoring anything, write a brief first-person narrative: "I'm a developer who
+Use the persona from 0A. Reference real files and content from the pre-review audit.
-just found this tool. I go to the docs. I see... I try... I feel..."
+Not hypothetical. Trace the actual path: "I open the README. The first heading is
 [actual heading]. I scroll down and find [actual install command]. I run it and see..."
-This goes into the plan file as a "Developer Perspective" section. The implementer
+Then SHOW it to the user via AskUserQuestion:
 should read this and feel what the developer feels.
-### 0C. Time to Hello World Assessment
+> "Here's what I think your [persona] developer experiences today:
 >
 > [full empathy narrative]
 >
 > Does this match reality? Where am I wrong?
 >
 > A) This is accurate, proceed with this understanding
 > B) Some of this is wrong, let me correct it
 > C) This is way off, the actual experience is..."
 **STOP.** Incorporate corrections into the narrative. This narrative becomes a required
 output section ("Developer Perspective") in the plan file. The implementer should read
 it and feel what the developer feels.
 ### 0C. Competitive DX Benchmarking
 Before scoring anything, understand how comparable tools handle DX. Use WebSearch to
 find real TTHW data and onboarding approaches.
 Run three searches:
 1. "[product category] getting started developer experience {current year}"
 2. "[closest competitor] developer onboarding time"
 3. "[product category] SDK CLI developer experience best practices {current year}"
 If WebSearch is unavailable: "Search unavailable. Using reference benchmarks: Stripe
 (30s TTHW), Vercel (2min), Firebase (3min), Docker (5min)."
 Produce a competitive benchmark table:
 ```
-TIME TO HELLO WORLD
+COMPETITIVE DX BENCHMARK
-===================
+=========================
-Steps today:        [N steps]
+Tool              | TTHW      | Notable DX Choice          | Source
-Time today:         [estimated minutes]
+[competitor 1]    | [time]    | [what they do well]        | [url/source]
-Biggest bottleneck: [what and why]
+[competitor 2]    | [time]    | [what they do well]        | [url/source]
-Target:             [X steps in Y minutes]
+[competitor 3]    | [time]    | [what they do well]        | [url/source]
-What needs to change: [specific actions]
+YOUR PRODUCT      | [est]     | [from README/plan]         | current plan
 ```
-### 0D. Focus Areas
+AskUserQuestion:
-AskUserQuestion: "I've rated this plan {N}/10 on developer experience. The biggest
+> "Your closest competitors' TTHW:
-gaps are {X, Y, Z}. I'll review all 8 DX dimensions. Want me to focus on specific
+> [benchmark table]
-areas, or do the full review?"
+>
 > Your plan's current TTHW estimate: [X] minutes ([Y] steps).
 >
 > Where do you want to land?
 >
 > A) Champion tier (< 2 min) -- requires [specific changes]. Stripe/Vercel territory.
 > B) Competitive tier (2-5 min) -- achievable with [specific gap to close]
 > C) Current trajectory ([X] min) -- acceptable for now, improve later
 > D) Tell me what's realistic for our constraints"
-Options:
+**STOP.** The chosen tier becomes the benchmark for Pass 1 (Getting Started).
- A) Full DX review, all 8 dimensions (recommended)
+
- B) Focus on [specific gaps identified]
+### 0D. Magical Moment Design
- C) Just the getting started / onboarding experience
+
- D) Just the API/CLI/SDK design
+Every great developer tool has a magical moment: the instant a developer goes from
 "is this worth my time?" to "oh wow, this is real."
 Load the "## Pass 1" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`
 for gold standard examples.
 Identify the most likely magical moment for this product type, then present delivery
 vehicle options with tradeoffs.
 AskUserQuestion:
 > "For your [product type], the magical moment is: [specific moment, e.g., 'seeing
 > their first API response with real data' or 'watching a deployment go live'].
 >
 > How should your [persona from 0A] experience this moment?
 >
 > A) **Interactive playground/sandbox** -- zero install, try in browser. Highest
 >    conversion but requires building a hosted environment.
 >    (human: ~1 week / CC: ~2 hours). Examples: Stripe's API explorer, Supabase SQL editor.
 >
 > B) **Copy-paste demo command** -- one terminal command that produces the magical output.
 >    Low effort, high impact for CLI tools, but requires local install first.
 >    (human: ~2 days / CC: ~30 min). Examples: `npx create-next-app`, `docker run hello-world`.
 >
 > C) **Video/GIF walkthrough** -- shows the magic without requiring any setup.
 >    Passive (developer watches, doesn't do), but zero friction.
 >    (human: ~1 day / CC: ~1 hour). Examples: Vercel's homepage deploy animation.
 >
 > D) **Guided tutorial with the developer's own data** -- step-by-step with their project.
 >    Deepest engagement but longest time-to-magic.
 >    (human: ~1 week / CC: ~2 hours). Examples: Stripe's interactive onboarding.
 >
 > E) Something else -- describe what you have in mind.
 >
 > RECOMMENDATION: [A/B/C/D] because for [persona], [reason]. Your competitor [name]
 > uses [their approach]."
 **STOP.** The chosen delivery vehicle is tracked through the scoring passes.
 ### 0E. Mode Selection
 How deep should this DX review go?
 Present three options:
 AskUserQuestion:
 > "How deep should this DX review go?
 >
 > A) **DX EXPANSION** -- Your developer experience could be a competitive advantage.
 >    I'll propose ambitious DX improvements beyond what the plan covers. Every expansion
 >    is opt-in via individual questions. I'll push hard.
 >
 > B) **DX POLISH** -- The plan's DX scope is right. I'll make every touchpoint bulletproof:
 >    error messages, docs, CLI help, getting started. No scope additions, maximum rigor.
 >    (recommended for most reviews)
 >
 > C) **DX TRIAGE** -- Focus only on the critical DX gaps that would block adoption.
 >    Fast, surgical, for plans that need to ship soon.
 >
 > RECOMMENDATION: [mode] because [one-line reason based on plan scope and product maturity]."
 Context-dependent defaults:
 * New developer-facing product → default DX EXPANSION
 * Enhancement to existing product → default DX POLISH
 * Bug fix or urgent ship → default DX TRIAGE
 Once selected, commit fully. Do not silently drift toward a different mode.
 **STOP.** Do NOT proceed until user responds.
 ### 0F. Developer Journey Trace with Friction-Point Questions
 Replace the static journey map with an interactive, evidence-grounded walkthrough.
 For each journey stage, TRACE the actual experience (what file, what command, what
 output) and ask about each friction point individually.
 For each stage (Discover, Install, Hello World, Real Usage, Debug, Upgrade):
 1. **Trace the actual path.** Read the README, docs, package.json, CLI help, or
   whatever the developer would encounter at this stage. Reference specific files
   and line numbers.
 2. **Identify friction points with evidence.** Not "installation might be hard" but
   "Step 3 of the README requires Docker to be running, but nothing checks for Docker
   or tells the developer to install it. A [persona] without Docker will see [specific
   error or nothing]."
 3. **AskUserQuestion per friction point.** One question per friction point found.
   Do NOT batch multiple friction points into one question.
   > "Journey Stage: INSTALL
   >
   > I traced the installation path. Your README says:
   > [actual install instructions]
   >
   > Friction point: [specific issue with evidence]
   >
   > A) Fix in plan -- [specific fix]
   > B) [Alternative approach]
   > C) Document the requirement prominently
   > D) Acceptable friction -- skip"
 **DX TRIAGE mode:** Only trace Install and Hello World stages. Skip the rest.
 **DX POLISH mode:** Trace all stages.
 **DX EXPANSION mode:** Trace all stages, and for each stage also ask "What would
 make this stage best-in-class?"
 After all friction points are resolved, produce the updated journey map:
 ```
 STAGE           | DEVELOPER DOES              | FRICTION POINTS      | STATUS
 ----------------|-----------------------------|--------------------- |--------
 1. Discover     | [action]                    | [resolved/deferred]  | [fixed/ok/deferred]
 2. Install      | [action]                    | [resolved/deferred]  | [fixed/ok/deferred]
 3. Hello World  | [action]                    | [resolved/deferred]  | [fixed/ok/deferred]
 4. Real Usage   | [action]                    | [resolved/deferred]  | [fixed/ok/deferred]
 5. Debug        | [action]                    | [resolved/deferred]  | [fixed/ok/deferred]
 6. Upgrade      | [action]                    | [resolved/deferred]  | [fixed/ok/deferred]
 ```
 ### 0G. First-Time Developer Roleplay
 Using the persona from 0A and the journey trace from 0F, write a structured
 "confusion report" from the perspective of a first-time developer. Include
 timestamps to simulate real time passing.
 ```
 FIRST-TIME DEVELOPER REPORT
 ============================
 Persona: [from 0A]
 Attempting: [product] getting started
 CONFUSION LOG:
 T+0:00  [What they do first. What they see.]
 T+0:30  [Next action. What surprised or confused them.]
 T+1:00  [What they tried. What happened.]
 T+2:00  [Where they got stuck or succeeded.]
 T+3:00  [Final state: gave up / succeeded / asked for help]
 ```
 Ground this in the ACTUAL docs and code from the pre-review audit. Not hypothetical.
 Reference specific README headings, error messages, and file paths.
 AskUserQuestion:
 > "I roleplayed as your [persona] developer attempting the getting started flow.
 > Here's what confused me:
 >
 > [confusion report]
 >
 > Which of these should we address in the plan?
 >
 > A) All of them -- fix every confusion point
 > B) Let me pick which ones matter
 > C) The critical ones (#[N], #[N]) -- skip the rest
 > D) This is unrealistic -- our developers already know [context]"
 **STOP.** Do NOT proceed until user responds.
 ---
 ## The 0-10 Rating Method
 For each DX section, rate the plan 0-10. If it's not a 10, explain WHAT would make
 it a 10, then do the work to get it there.
-Pattern:
+**Critical rule:** Every rating MUST reference evidence from Step 0. Not "Getting
-1. Rate: "Getting Started Experience: 4/10"
+Started: 4/10" but "Getting Started: 4/10 because [persona from 0A] hits [friction
-2. Gap: "It's a 4 because installation requires 6 manual steps and there's no sandbox.
+point from 0F] at step 3, and competitor [name from 0C] achieves this in [time]."
   A 10 would have one command or a web playground with zero install."
 3. Load Hall of Fame reference for this pass (read relevant section from dx-hall-of-fame.md)
 4. Fix: Edit the plan to add what's missing
 5. Re-rate: "Now 7/10, still missing the interactive tutorial"
 6. AskUserQuestion if there's a genuine DX choice to resolve
 7. Fix again until 10 or user says "good enough, move on"
-## Review Sections (8 passes, after scope is agreed)
+Pattern:
 1. **Evidence recall:** Reference specific findings from Step 0 that apply to this dimension
 2. Rate: "Getting Started Experience: 4/10"
 3. Gap: "It's a 4 because [evidence]. A 10 would be [specific description for THIS product]."
 4. Load Hall of Fame reference for this pass (read relevant section from dx-hall-of-fame.md)
 5. Fix: Edit the plan to add what's missing
 6. Re-rate: "Now 7/10, still missing [specific gap]"
 7. AskUserQuestion if there's a genuine DX choice to resolve
 8. Fix again until 10 or user says "good enough, move on"
 **Mode-specific behavior:**
 - **DX EXPANSION:** After fixing to 10, also ask "What would make this dimension
  best-in-class? What would make [persona] rave about it?" Present expansions as
  individual opt-in AskUserQuestions.
 - **DX POLISH:** Fix every gap. No shortcuts. Trace each issue to specific files/lines.
 - **DX TRIAGE:** Only flag gaps that would block adoption (score below 5). Skip gaps
  that are nice-to-have (score 5-7).
 ## Review Sections (8 passes, after Step 0 is complete)
 {{LEARNINGS_SEARCH}}
@@ -213,6 +465,10 @@ DX TREND (prior reviews):
 Rate 0-10: Can a developer go from zero to hello world in under 5 minutes?
 **Evidence recall:** Reference the competitive benchmark from 0C (target tier), the
 magical moment from 0D (delivery vehicle), and any Install/Hello World friction
 points from 0F.
 Load reference: Read the "## Pass 1" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
 Evaluate:
@@ -222,20 +478,26 @@ Evaluate:
 - **Free tier**: No credit card, no sales call, no company email?
 - **Quick start guide**: Copy-paste complete? Shows real output?
 - **Auth/credential bootstrapping**: How many steps between "I want to try" and "it works"?
-  API keys, OAuth setup, tokens, test vs live mode?
+- **Magical moment delivery**: Is the vehicle chosen in 0D actually in the plan?
 - **Competitive gap**: How far is the TTHW from the target tier chosen in 0C?
 FIX TO 10: Write the ideal getting started sequence. Specify exact commands,
-expected output, and time budget per step. Target: 3 steps or fewer, under 5 minutes.
+expected output, and time budget per step. Target: 3 steps or fewer, under the
 time chosen in 0C.
-Stripe test: Can a developer go from "never heard of this" to "it worked" in one
+Stripe test: Can a [persona from 0A] go from "never heard of this" to "it worked"
-terminal session without leaving the terminal?
+in one terminal session without leaving the terminal?
-**STOP.** AskUserQuestion once per issue. Recommend + WHY.
+**STOP.** AskUserQuestion once per issue. Recommend + WHY. Reference the persona.
 ### Pass 2: API/CLI/SDK Design (Usable + Useful)
 Rate 0-10: Is the interface intuitive, consistent, and complete?
 **Evidence recall:** Does the API surface match [persona from 0A]'s mental model?
 A YC founder expects `tool.do(thing)`. A platform engineer expects
 `tool.configure(options).execute(thing)`.
 Load reference: Read the "## Pass 2" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
 Evaluate:
@@ -246,8 +508,9 @@ Evaluate:
 - **Discoverability**: Can devs explore from CLI/playground without docs?
 - **Reliability/trust**: Latency, retries, rate limits, idempotency, offline behavior?
 - **Progressive disclosure**: Simple case is production-ready, complexity revealed gradually?
 - **Persona fit**: Does the interface match how [persona] thinks about the problem?
-Good API design test: Can a dev use this API correctly after seeing one example?
+Good API design test: Can a [persona] use this API correctly after seeing one example?
 **STOP.** AskUserQuestion once per issue. Recommend + WHY.
@@ -256,10 +519,18 @@ Good API design test: Can a dev use this API correctly after seeing one example?
 Rate 0-10: When something goes wrong, does the developer know what happened, why,
 and how to fix it?
 **Evidence recall:** Reference any error-related friction points from 0F and confusion
 points from 0G.
 Load reference: Read the "## Pass 3" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
-For each error path in the plan, evaluate against the formula:
+**Trace 3 specific error paths** from the plan or codebase. For each, evaluate against
-**What happened** + **Why** + **How to fix** + **Where to learn more** + **Actual values**
+the three-tier system from the Hall of Fame:
 - **Tier 1 (Elm):** Conversational, first person, exact location, suggested fix
 - **Tier 2 (Rust):** Error code links to tutorial, primary + secondary labels, help section
 - **Tier 3 (Stripe API):** Structured JSON with type, code, message, param, doc_url
 For each error path, show what the developer currently sees vs. what they should see.
 Also evaluate:
 - **Permission/sandbox/safety model**: What can go wrong? How clear is the blast radius?
@@ -272,6 +543,10 @@ Also evaluate:
 Rate 0-10: Can a developer find what they need and learn by doing?
 **Evidence recall:** Does the docs architecture match [persona from 0A]'s learning
 style? A YC founder needs copy-paste examples front and center. A platform engineer
 needs architecture docs and API reference.
 Load reference: Read the "## Pass 4" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
 Evaluate:
@@ -303,6 +578,9 @@ Evaluate:
 Rate 0-10: Does this integrate into developers' existing workflows?
 **Evidence recall:** Does local dev setup work for [persona from 0A]'s typical
 environment?
 Load reference: Read the "## Pass 6" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
 Evaluate:
@@ -340,10 +618,11 @@ Rate 0-10: Does the plan include ways to measure and improve DX over time?
 Load reference: Read the "## Pass 8" section from `~/.claude/skills/gstack/plan-devex-review/dx-hall-of-fame.md`.
 Evaluate:
- **TTHW tracking**: Can you measure getting started time?
+- **TTHW tracking**: Can you measure getting started time? Is it instrumented?
 - **Journey analytics**: Where do devs drop off?
 - **Feedback mechanisms**: Bug reports? NPS? Feedback button?
 - **Friction audits**: Periodic reviews planned?
 - **Boomerang readiness**: Will /devex-review be able to measure reality vs. plan?
 **STOP.** AskUserQuestion once per issue. Recommend + WHY.
@@ -362,29 +641,48 @@ Check each item. For any unchecked item, explain what's missing and suggest the
 {{CODEX_PLAN_REVIEW}}
 When constructing the outside voice prompt, include the Developer Persona from Step 0A
 and the Competitive Benchmark from Step 0C. The outside voice should critique the plan
 in the context of who is using it and what they're competing against.
 ## CRITICAL RULE — How to ask questions
 Follow the AskUserQuestion format from the Preamble above. Additional rules for
 DX reviews:
 * **One issue = one AskUserQuestion call.** Never combine multiple issues.
-* Describe the DX gap concretely, with what the developer will experience if it's
+* **Ground every question in evidence.** Reference the persona, competitive benchmark,
-  not fixed. Make the developer's pain real.
+  empathy narrative, or friction trace. Never ask a question in the abstract.
 * **Frame pain from the persona's perspective.** Not "developers would be frustrated"
  but "[persona from 0A] would hit this at minute [N] of their getting-started flow
  and [specific consequence: abandon, file an issue, hack a workaround]."
 * Present 2-3 options. For each: effort to fix, impact on developer adoption.
 * **Map to DX First Principles above.** One sentence connecting your recommendation
  to a specific principle (e.g., "This violates 'zero friction at T0' because
-  developers need 3 extra config steps before their first API call").
+  [persona] needs 3 extra config steps before their first API call").
 * **Escape hatch:** If a section has no issues, say so and move on. If a gap has an
  obvious fix, state what you'll add and move on, don't waste a question.
 * Assume the user hasn't looked at this window in 20 minutes. Re-ground every question.
 ## Required Outputs
-### Developer Journey Map
+### Developer Persona Card
-The journey map from Step 0A, updated with all fixes and decisions from the review.
+The persona card from Step 0A. This goes at the top of the plan's DX section.
 ### Developer Empathy Narrative
-The first-person narrative from Step 0B-bis.
+The first-person narrative from Step 0B, updated with user corrections.
 ### Competitive DX Benchmark
 The benchmark table from Step 0C, updated with the product's post-review scores.
 ### Magical Moment Specification
 The chosen delivery vehicle from Step 0D with implementation requirements.
 ### Developer Journey Map
 The journey map from Step 0F, updated with all friction point resolutions.
 ### First-Time Developer Confusion Report
 The roleplay report from Step 0G, annotated with which items were addressed.
 ### "NOT in scope" section
 DX improvements considered and explicitly deferred, with one-line rationale each.
@@ -423,7 +721,10 @@ Options: **A)** Add to TODOS.md **B)** Skip **C)** Build it now
 | DX Measurement       | __/10  | __/10  | __ ↑↓  |
 +--------------------------------------------------------------------+
 | TTHW                 | __ min | __ min | __ ↑↓  |
-| Product Type         | [type]                    |
+| Competitive Rank     | [Champion/Competitive/Needs Work/Red Flag]   |
 | Magical Moment       | [designed/missing] via [delivery vehicle]    |
 | Product Type         | [type]                                      |
 | Mode                 | [EXPANSION/POLISH/TRIAGE]                    |
 | Overall DX           | __/10  | __/10  | __ ↑↓  |
 +====================================================================+
 | DX PRINCIPLE COVERAGE                                               |
@@ -445,9 +746,10 @@ If TTHW > 10 min: Flag as blocking issue.
 ```
 DX IMPLEMENTATION CHECKLIST
 ============================
-[ ] Time to hello world < 5 minutes
+[ ] Time to hello world < [target from 0C]
 [ ] Installation is one command
 [ ] First run produces meaningful output
 [ ] Magical moment delivered via [vehicle from 0D]
 [ ] Every error message has: problem + cause + fix + docs link
 [ ] API/CLI naming is guessable without docs
 [ ] Every parameter has a sensible default
@@ -474,10 +776,12 @@ After producing the DX Scorecard above, persist the review result.
 `~/.gstack/` (user config directory, not project files).
 ```bash
-~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-devex-review","timestamp":"TIMESTAMP","status":"STATUS","initial_score":N,"overall_score":N,"product_type":"TYPE","tthw_current":"TTHW_CURRENT","tthw_target":"TTHW_TARGET","pass_scores":{"getting_started":N,"api_design":N,"errors":N,"docs":N,"upgrade":N,"dev_env":N,"community":N,"measurement":N},"unresolved":N,"commit":"COMMIT"}'
+~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-devex-review","timestamp":"TIMESTAMP","status":"STATUS","initial_score":N,"overall_score":N,"product_type":"TYPE","tthw_current":"TTHW_CURRENT","tthw_target":"TTHW_TARGET","mode":"MODE","persona":"PERSONA","competitive_tier":"TIER","pass_scores":{"getting_started":N,"api_design":N,"errors":N,"docs":N,"upgrade":N,"dev_env":N,"community":N,"measurement":N},"unresolved":N,"commit":"COMMIT"}'
 ```
-Substitute values from the DX Scorecard.
+Substitute values from the DX Scorecard. MODE is EXPANSION/POLISH/TRIAGE.
 PERSONA is a short label (e.g., "yc-founder", "platform-eng").
 TIER is Champion/Competitive/NeedsWork/RedFlag.
 {{REVIEW_DASHBOARD}}
@@ -497,7 +801,9 @@ handling gaps, or CLI ergonomics issues, eng review should validate the fixes.
 developer-facing surfaces; design review covers end-user-facing UI.
 **Recommend /devex-review after implementation** — the boomerang. Plan said TTHW would
-be 3 minutes. Did reality match? Run /devex-review on the live product to find out.
+be [target from 0C]. Did reality match? Run /devex-review on the live product to find
 out. This is where the competitive benchmark pays off: you have a concrete target to
 measure against.
 Use AskUserQuestion with applicable options:
 - **A)** Run /plan-eng-review next (required gate)
@@ -505,6 +811,19 @@ Use AskUserQuestion with applicable options:
 - **C)** Ready to implement, run /devex-review after shipping
 - **D)** Skip, I'll handle next steps manually
 ## Mode Quick Reference
 ```
             | DX EXPANSION     | DX POLISH          | DX TRIAGE
 Scope        | Push UP (opt-in) | Maintain           | Critical only
 Posture      | Enthusiastic     | Rigorous           | Surgical
 Competitive  | Full benchmark   | Full benchmark     | Skip
 Magical      | Full design      | Verify exists      | Skip
 Journey      | All stages +     | All stages         | Install + Hello
             | best-in-class    |                    | World only
 Passes       | All 8, expanded  | All 8, standard    | Pass 1 + 3 only
 Outside voice| Recommended      | Recommended        | Skip
 ```
 ## Formatting Rules
 * NUMBER issues (1, 2, 3...) and LETTERS for options (A, B, C...).