mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-21 20:28:24 +08:00
fix: enrich SKILL.md docs to pass LLM evals, upgrade judge to Sonnet 4.6 (#43)
* fix: enrich command descriptions and snapshot flags for LLM eval quality

  14 command descriptions enriched with specific arg formats, valid values, error behavior, and return types. Fixed header usage from <name> <value> to <name>:<value>. Added cookie usage syntax. Snapshot flags now show long names, ref numbering, and output format examples.

* refactor: auto-generate server.ts help text from COMMAND_DESCRIPTIONS

  Replace hand-maintained help block with generateHelpText() that reads from COMMAND_DESCRIPTIONS and SNAPSHOT_FLAGS. Eliminates help text drift from source of truth.

* test: add usage consistency and pipe guard tests

  Usage consistency test cross-checks Usage: patterns in implementation against COMMAND_DESCRIPTIONS using structural skeleton comparison. Pipe guard test ensures descriptions don't contain | which would break markdown table rendering.

* chore: upgrade eval judge to Sonnet 4.6, update changelog

  Switch LLM-as-judge evals from Haiku to Sonnet 4.6 for more stable, nuanced scoring. Add changelog entry for all eval improvements.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
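The refactor described in the commit message, generating help text from the description tables instead of maintaining it by hand, could look roughly like the sketch below. Only the names generateHelpText, COMMAND_DESCRIPTIONS, and SNAPSHOT_FLAGS come from the commit message. The field names (`usage`, `short`, `long`), the sample entries, and the column width are assumptions made for illustration and may not match server.ts.

```typescript
// Hypothetical shapes: the real tables in the repository may differ.
interface CommandMeta {
  description: string;
  usage?: string;
}

interface SnapshotFlag {
  short: string;
  long: string;
  description: string;
}

// Illustrative entries only, not the repository's actual data.
const COMMAND_DESCRIPTIONS: Record<string, CommandMeta> = {
  goto: { description: 'Navigate to a URL', usage: 'goto <url>' },
  header: { description: 'Set a request header', usage: 'header <name>:<value>' },
};

const SNAPSHOT_FLAGS: SnapshotFlag[] = [
  { short: '-x', long: '--example', description: 'Made-up flag for this sketch' },
];

// Build the help block from the tables so it cannot drift from them.
function generateHelpText(
  commands: Record<string, CommandMeta>,
  flags: SnapshotFlag[],
): string {
  const lines: string[] = ['Commands:'];
  for (const [name, meta] of Object.entries(commands)) {
    // Fall back to the bare command name when no usage string is given.
    const usage = meta.usage ?? name;
    lines.push(`  ${usage.padEnd(28)}${meta.description}`);
  }
  lines.push('', 'Snapshot flags:');
  for (const flag of flags) {
    const names = `${flag.short}, ${flag.long}`;
    lines.push(`  ${names.padEnd(28)}${flag.description}`);
  }
  return lines.join('\n');
}

const helpText = generateHelpText(COMMAND_DESCRIPTIONS, SNAPSHOT_FLAGS);
```

Because the help block is derived from the same tables that feed the documentation, a change to a usage string only has to be made in one place.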
@@ -139,6 +139,14 @@ describe('description quality evals', () => {
    }
  });

  // Guard: descriptions must not contain pipe (breaks markdown table cells)
  // Usage strings are backtick-wrapped in the table so pipes there are safe.
  test('no command description contains pipe character', () => {
    for (const [cmd, meta] of Object.entries(COMMAND_DESCRIPTIONS)) {
      expect(meta.description).not.toContain('|');
    }
  });

  // Guard: generated output uses → not ->
  test('generated SKILL.md uses unicode arrows', () => {
    const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
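To illustrate why the pipe guard in this hunk matters, here is a minimal sketch of how a stray `|` in a description changes the number of cells in a rendered markdown table row. The helpers renderRow, countCells, and findPipeOffenders are hypothetical names introduced for this example and are not part of the repository. The cell count is deliberately naive: it splits on every pipe and ignores escaping and code spans.

```typescript
// Hypothetical data shape; the real COMMAND_DESCRIPTIONS may carry more fields.
type CommandTable = Record<string, { description: string }>;

// Render one markdown table row: | command | description |
function renderRow(cmd: string, description: string): string {
  return `| ${cmd} | ${description} |`;
}

// Naive cell count: split on every pipe, then drop the empty strings
// produced by the leading and trailing pipe.
function countCells(row: string): number {
  return row.split('|').slice(1, -1).length;
}

// Return the names of commands whose description contains a pipe.
function findPipeOffenders(table: CommandTable): string[] {
  return Object.entries(table)
    .filter(([, meta]) => meta.description.includes('|'))
    .map(([cmd]) => cmd);
}

// Made-up entries; the second one is deliberately bad.
const sample: CommandTable = {
  click: { description: 'Click an element by ref' },
  wait: { description: 'Wait for load|idle' },
};

const goodCells = countCells(renderRow('click', 'Click an element by ref'));
const badCells = countCells(renderRow('wait', 'Wait for load|idle'));
const offenders = findPipeOffenders(sample);
```

The clean row yields two cells while the row with the embedded pipe yields three, which is the kind of column shift the test is meant to catch before the table is generated.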