fix: enrich SKILL.md docs to pass LLM evals, upgrade judge to Sonnet 4.6 (#43)

* fix: enrich command descriptions and snapshot flags for LLM eval quality 14 command descriptions enriched with specific arg formats, valid values, error behavior, and return types. Fixed header usage from <name> <value> to <name>:<value>. Added cookie usage syntax. Snapshot flags now show long names, ref numbering, and output format examples. * refactor: auto-generate server.ts help text from COMMAND_DESCRIPTIONS Replace hand-maintained help block with generateHelpText() that reads from COMMAND_DESCRIPTIONS and SNAPSHOT_FLAGS. Eliminates help text drift from source of truth. * test: add usage consistency and pipe guard tests Usage consistency test cross-checks Usage: patterns in implementation against COMMAND_DESCRIPTIONS using structural skeleton comparison. Pipe guard test ensures descriptions don't contain | which would break markdown table rendering. * chore: upgrade eval judge to Sonnet 4.6, update changelog Switch LLM-as-judge evals from Haiku to Sonnet 4.6 for more stable, nuanced scoring. Add changelog entry for all eval improvements. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-21 20:28:24 +08:00 · 2026-03-13 22:14:14 -07:00
parent 5205070299
commit a468374272
10 changed files with 233 additions and 112 deletions
--- a/test/gen-skill-docs.test.ts
+++ b/test/gen-skill-docs.test.ts
@@ -139,6 +139,14 @@ describe('description quality evals', () => {
    }
  });

+  // Guard: descriptions must not contain pipe (breaks markdown table cells)
+  // Usage strings are backtick-wrapped in the table so pipes there are safe.
+  test('no command description contains pipe character', () => {
+    for (const [cmd, meta] of Object.entries(COMMAND_DESCRIPTIONS)) {
+      expect(meta.description).not.toContain('|');
+    }
+  });
+
  // Guard: generated output uses → not ->
  test('generated SKILL.md uses unicode arrows', () => {
    const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');