gstack/test/skill-llm-eval.test.ts at 0adc71a13b48c8c6cbbbe467a7653456a69a7d4b

hai/gstack

mirror of https://github.com/garrytan/gstack.git synced 2026-05-17 17:51:27 +08:00

Garry Tan 0adc71a13b fix: lower command reference completeness threshold to 3

The LLM judge consistently scores the command reference table's
completeness at 3/5 because it's a terse quick-reference format.
Detailed argument docs live in per-command sections, not the summary
table. The baseline already expects 3 — align the direct test threshold.

2026-03-24 14:27:11 -07:00

34 KiB
