mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-20 19:29:56 +08:00
fix: browse binary discovery broken for agents (v0.3.5) (#44)
* fix: replace find-browse with direct path in SKILL.md setup blocks
Agents were skipping the find-browse binary and guessing bin/browse
(wrong path). Now the setup block explicitly checks browse/dist/browse
with workspace-local priority, global fallback.
Also adds || true to update check to prevent misleading exit code 1.
Adds {{UPDATE_CHECK}} and {{BROWSE_SETUP}} template placeholders to
gen-skill-docs.ts so all skills share a single source of truth.
* refactor: convert qa/ and setup-browser-cookies/ to .tmpl templates
Replaces hardcoded update check and find-browse blocks with
{{UPDATE_CHECK}} and {{BROWSE_SETUP}} placeholders. Both skills
are now generated from templates via gen-skill-docs.
* test: add e2e and LLM eval tests for SKILL.md setup block
- 3 Agent SDK e2e tests: happy path, NEEDS_SETUP, non-git-repo
- LLM eval: setup block clarity + actionability >= 4
- New error pattern: 'no such file or directory.*browse'
These tests catch the exact failure mode where agents can't discover
the browse binary via SKILL.md instructions.
* chore: bump version and changelog (v0.3.5)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -115,6 +115,19 @@ describeEval('LLM-as-judge quality evals', () => {
|
||||
expect(scores.actionability).toBeGreaterThanOrEqual(4);
|
||||
}, 30_000);
|
||||
|
||||
test('setup block scores >= 4 on actionability and clarity', async () => {
|
||||
const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
|
||||
const setupStart = content.indexOf('## SETUP');
|
||||
const setupEnd = content.indexOf('## IMPORTANT');
|
||||
const section = content.slice(setupStart, setupEnd);
|
||||
|
||||
const scores = await judge('setup/binary discovery instructions', section);
|
||||
console.log('Setup block scores:', JSON.stringify(scores, null, 2));
|
||||
|
||||
expect(scores.actionability).toBeGreaterThanOrEqual(4);
|
||||
expect(scores.clarity).toBeGreaterThanOrEqual(4);
|
||||
}, 30_000);
|
||||
|
||||
test('regression check: compare branch vs baseline quality', async () => {
|
||||
// This test compares the generated output against the hand-maintained
|
||||
// baseline from main. The generated version should score equal or higher.
|
||||
|
||||
Reference in New Issue
Block a user