feat: adversarial spec review loop + skill chaining (v0.9.1.0) (#249)

* feat: add {{SPEC_REVIEW_LOOP}}, {{DESIGN_SKETCH}}, benefits-from resolvers

Three new resolvers in gen-skill-docs.ts:

- {{SPEC_REVIEW_LOOP}}: adversarial subagent reviews documents on 5
  dimensions (completeness, consistency, clarity, scope, feasibility)
  with convergence guard, quality score, and JSONL metrics
- {{DESIGN_SKETCH}}: generates rough HTML wireframes for UI ideas using
  DESIGN.md constraints and design principles, renders via $B
- {{BENEFITS_FROM}}: parses benefits-from frontmatter and generates
  skill chaining offer prose (one-hop-max, never blocks)

Also extends TemplateContext with benefitsFrom field and adds inline
YAML frontmatter parsing for the new field.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: /office-hours spec review loop + visual sketch phases

- Phase 4.5 ({{DESIGN_SKETCH}}): for UI ideas, generates rough HTML
  wireframe using design principles from {{DESIGN_METHODOLOGY}} and
  DESIGN.md, renders via $B, presents screenshot for iteration
- Phase 5.5 ({{SPEC_REVIEW_LOOP}}): adversarial subagent reviews the
  design doc before user sees it — catches gaps in completeness,
  consistency, clarity, scope, and feasibility
- Adds {{BROWSE_SETUP}} for $B availability in sketch phase

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: skill chaining — plan reviews offer /office-hours

- plan-ceo-review: benefits-from office-hours, offers /office-hours when
  no design doc found, mid-session detection when user seems lost,
  spec review loop on CEO plan documents
- plan-eng-review: benefits-from office-hours, offers /office-hours when
  no design doc found
- One-hop-max chaining: never blocks, max one offer per session

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add validation + E2E tests for spec review, sketch, benefits-from

Unit tests (32 new assertions):
- SPEC_REVIEW_LOOP: 5 dimensions, Agent dispatch, 3 iterations, quality
  score, metrics path, convergence guard, graceful failure
- DESIGN_SKETCH: DESIGN.md awareness, wireframe, $B goto/screenshot,
  rough aesthetic, skip conditions
- BENEFITS_FROM: prerequisite offer in CEO + eng review, graceful
  decline, skills without benefits-from don't get offer
- office-hours structure: spec review loop, adversarial dimensions,
  visual sketch section

E2E tests (2 new):
- office-hours-spec-review: verifies agent understands the spec review
  loop from SKILL.md
- plan-ceo-review-benefits: verifies agent understands the skill
  chaining offer

Touchfiles updated for diff-based test selection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.9.1.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-03-20 06:24:22 -07:00
committed by GitHub
parent 91bea06675
commit ae2d841012
16 changed files with 1020 additions and 5 deletions

View File

@@ -8,6 +8,7 @@ description: |
"review the architecture", "engineering review", or "lock in the plan".
Proactively suggest when the user has a plan or design doc and is about to
start coding — to catch architecture issues before implementation.
benefits-from: [office-hours]
allowed-tools:
- Read
- Write
@@ -277,6 +278,25 @@ DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head
```
If a design doc exists, read it. Use it as the source of truth for the problem statement, constraints, and chosen approach. If it has a `Supersedes:` field, note that this is a revised design — check the prior version for context on what changed and why.
## Prerequisite Skill Offer
When the design doc check above prints "No design doc found," offer the prerequisite
skill before proceeding.
Say to the user via AskUserQuestion:
> "No design doc found for this branch. `/office-hours` produces a structured problem
> statement, premise challenge, and explored alternatives — it gives this review much
> sharper input to work with. Takes about 10 minutes. The design doc is per-feature,
> not per-product — it captures the thinking behind this specific change."
Options:
- A) Run /office-hours first (in another window, then come back)
- B) Skip — proceed with standard review
If they skip: "No worries — standard review. If you ever want sharper input, try
/office-hours first next time." Then proceed normally. Do not re-offer later in the session.
### Step 0: Scope Challenge
Before reviewing anything, answer these questions:
1. **What existing code already partially or fully solves each sub-problem?** Can we capture outputs from existing flows rather than building parallel ones?

View File

@@ -8,6 +8,7 @@ description: |
"review the architecture", "engineering review", or "lock in the plan".
Proactively suggest when the user has a plan or design doc and is about to
start coding — to catch architecture issues before implementation.
benefits-from: [office-hours]
allowed-tools:
- Read
- Write
@@ -73,6 +74,8 @@ DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head
```
If a design doc exists, read it. Use it as the source of truth for the problem statement, constraints, and chosen approach. If it has a `Supersedes:` field, note that this is a revised design — check the prior version for context on what changed and why.
{{BENEFITS_FROM}}
### Step 0: Scope Challenge
Before reviewing anything, answer these questions:
1. **What existing code already partially or fully solves each sub-problem?** Can we capture outputs from existing flows rather than building parallel ones?