feat: adversarial spec review loop + skill chaining (v0.9.1.0) (#249)

* feat: add {{SPEC_REVIEW_LOOP}}, {{DESIGN_SKETCH}}, benefits-from resolvers

Three new resolvers in gen-skill-docs.ts:
- {{SPEC_REVIEW_LOOP}}: adversarial subagent reviews documents on 5 dimensions (completeness, consistency, clarity, scope, feasibility) with convergence guard, quality score, and JSONL metrics
- {{DESIGN_SKETCH}}: generates rough HTML wireframes for UI ideas using DESIGN.md constraints and design principles, renders via $B
- {{BENEFITS_FROM}}: parses benefits-from frontmatter and generates skill chaining offer prose (one-hop-max, never blocks)

Also extends TemplateContext with benefitsFrom field and adds inline YAML frontmatter parsing for the new field.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: /office-hours spec review loop + visual sketch phases

- Phase 4.5 ({{DESIGN_SKETCH}}): for UI ideas, generates rough HTML wireframe using design principles from {{DESIGN_METHODOLOGY}} and DESIGN.md, renders via $B, presents screenshot for iteration
- Phase 5.5 ({{SPEC_REVIEW_LOOP}}): adversarial subagent reviews the design doc before user sees it, catching gaps in completeness, consistency, clarity, scope, and feasibility
- Adds {{BROWSE_SETUP}} for $B availability in sketch phase

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: skill chaining - plan reviews offer /office-hours

- plan-ceo-review: benefits-from office-hours, offers /office-hours when no design doc found, mid-session detection when user seems lost, spec review loop on CEO plan documents
- plan-eng-review: benefits-from office-hours, offers /office-hours when no design doc found
- One-hop-max chaining: never blocks, max one offer per session

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add validation + E2E tests for spec review, sketch, benefits-from

Unit tests (32 new assertions):
- SPEC_REVIEW_LOOP: 5 dimensions, Agent dispatch, 3 iterations, quality score, metrics path, convergence guard, graceful failure
- DESIGN_SKETCH: DESIGN.md awareness, wireframe, $B goto/screenshot, rough aesthetic, skip conditions
- BENEFITS_FROM: prerequisite offer in CEO + eng review, graceful decline, skills without benefits-from don't get offer
- office-hours structure: spec review loop, adversarial dimensions, visual sketch section

E2E tests (2 new):
- office-hours-spec-review: verifies agent understands the spec review loop from SKILL.md
- plan-ceo-review-benefits: verifies agent understands the skill chaining offer

Touchfiles updated for diff-based test selection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.9.1.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
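
The control flow the commit message describes for the spec review loop (five review dimensions, at most 3 iterations, a convergence guard, a quality score, one JSONL metrics record) might look roughly like the sketch below. This is only an illustration under stated assumptions: the resolver in this commit emits prose instructions for an agent, not this code, and every name here (`runSpecReviewLoop`, `Issue`, `ReviewOutcome`, the `review` and `fix` callbacks) is hypothetical. In particular, treating "convergence guard" as "stop when two consecutive passes report identical issues" and scoring as "10 minus open issues, floored at 1" are guesses, not behavior confirmed by the source.

```typescript
// Hypothetical sketch: none of these names are taken from gen-skill-docs.ts.
type Dimension = 'completeness' | 'consistency' | 'clarity' | 'scope' | 'feasibility';

interface Issue {
  dimension: Dimension;
  note: string;
}

interface ReviewOutcome {
  iterations: number;   // review passes actually run (1..3)
  openIssues: number;   // issues reported by the last review pass
  stalled: boolean;     // true when the convergence guard stopped the loop
  qualityScore: number; // ASSUMPTION: 10 minus open issues, floored at 1
  metricsLine: string;  // one JSONL record, newline-terminated
}

const MAX_ITERATIONS = 3;

function runSpecReviewLoop(
  doc: string,
  review: (doc: string) => Issue[],              // stands in for the adversarial subagent
  fix: (doc: string, issues: Issue[]) => string, // stands in for the author revising the doc
): ReviewOutcome {
  let iterations = 0;
  let open: Issue[] = [];
  let previousKey = '';
  let stalled = false;

  for (let pass = 1; pass <= MAX_ITERATIONS; pass++) {
    iterations = pass;
    open = review(doc);
    if (open.length === 0) break; // clean pass: nothing left to fix

    // Convergence guard (assumed meaning): identical issues on two
    // consecutive passes mean the fixes are not landing, so stop early.
    const key = JSON.stringify(open);
    if (key === previousKey) {
      stalled = true;
      break;
    }
    previousKey = key;

    // Skip the fix on the final pass so `open` reflects the returned state.
    if (pass < MAX_ITERATIONS) doc = fix(doc, open);
  }

  const qualityScore = Math.max(1, 10 - open.length);
  const metricsLine =
    JSON.stringify({ iterations, openIssues: open.length, stalled, qualityScore }) + '\n';
  return { iterations, openIssues: open.length, stalled, qualityScore, metricsLine };
}
```

Passing `review` and `fix` in as callbacks keeps the sketch runnable without an agent runtime; in the skill itself those roles would be played by the dispatched subagent and the authoring agent, and the metrics line would be appended to a file such as the `spec-review.jsonl` path the tests check for.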
2026-05-08 21:49:45 +08:00 · 2026-03-20 06:24:22 -07:00
parent 91bea06675
commit ae2d841012
16 changed files with 1020 additions and 5 deletions
--- a/test/skill-validation.test.ts
+++ b/test/skill-validation.test.ts
@@ -644,6 +644,59 @@ describe('office-hours skill structure', () => {
  test('contains builder operating principles', () => {
    expect(content).toContain('Delight is the currency');
  });
+
+  // Spec Review Loop (Phase 5.5)
+  test('contains spec review loop', () => {
+    expect(content).toContain('Spec Review Loop');
+  });
+
+  test('contains adversarial review dimensions', () => {
+    for (const dim of ['Completeness', 'Consistency', 'Clarity', 'Scope', 'Feasibility']) {
+      expect(content).toContain(dim);
+    }
+  });
+
+  test('contains subagent dispatch instruction', () => {
+    expect(content).toMatch(/Agent.*tool|subagent/i);
+  });
+
+  test('contains max 3 iterations', () => {
+    expect(content).toMatch(/3.*iteration|maximum.*3/i);
+  });
+
+  test('contains quality score', () => {
+    expect(content).toContain('quality score');
+  });
+
+  test('contains spec review metrics path', () => {
+    expect(content).toContain('spec-review.jsonl');
+  });
+
+  test('contains convergence guard', () => {
+    expect(content).toMatch(/convergence/i);
+  });
+
+  // Visual Sketch (Phase 4.5)
+  test('contains visual sketch section', () => {
+    expect(content).toContain('Visual Sketch');
+  });
+
+  test('contains wireframe generation', () => {
+    expect(content).toMatch(/wireframe|sketch/i);
+  });
+
+  test('contains DESIGN.md awareness', () => {
+    expect(content).toContain('DESIGN.md');
+  });
+
+  test('contains browse rendering', () => {
+    expect(content).toContain('$B goto');
+    expect(content).toContain('$B screenshot');
+  });
+
+  test('contains rough aesthetic instruction', () => {
+    expect(content).toMatch(/rough|hand-drawn/i);
+  });
 });

 describe('investigate skill structure', () => {
@@ -856,6 +909,22 @@ describe('CEO review mode validation', () => {
    expect(content).toContain('HOLD SCOPE');
    expect(content).toContain('REDUCTION');
  });
+
+  // Skill chaining (benefits-from)
+  test('contains prerequisite skill offer for office-hours', () => {
+    expect(content).toContain('Prerequisite Skill Offer');
+    expect(content).toContain('/office-hours');
+  });
+
+  test('contains mid-session detection', () => {
+    expect(content).toContain('Mid-session detection');
+    expect(content).toMatch(/still figuring out|seems lost/i);
+  });
+
+  // Spec review on CEO plans
+  test('contains spec review loop for CEO plan documents', () => {
+    expect(content).toContain('Spec Review Loop');
+  });
 });

 // --- gstack-slug helper ---