mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-20 03:12:28 +08:00
feat: QA restructure, browser ref staleness, eval efficiency metrics (v0.4.0) (#83)
* feat: browser ref staleness detection via async count() validation

  resolveRef() now checks element count to detect stale refs after page mutations (e.g. SPA navigation). RefEntry stores role+name metadata for better diagnostics. 3 new snapshot tests for staleness detection.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: qa-only skill, qa fix loop, plan-to-QA artifact flow

  Add /qa-only (report-only, Edit tool blocked), restructure /qa with find-fix-verify cycle, add {{QA_METHODOLOGY}} DRY placeholder for shared methodology. /plan-eng-review now writes test-plan artifacts to ~/.gstack/projects/<slug>/ for QA consumption.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: eval efficiency metrics — turns, duration, commentary across all surfaces

  Add generateCommentary() for natural-language delta interpretation, per-test turns/duration in comparison and summary output, judgePassed unit tests, 3 new E2E tests (qa-only, qa fix loop, plan artifact).

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: bump version and changelog (v0.4.0)

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update ARCHITECTURE, BROWSER, CONTRIBUTING, README for v0.4.0

  - ARCHITECTURE: add ref staleness detection section, update RefEntry type
  - BROWSER: add ref staleness paragraph to snapshot system docs
  - CONTRIBUTING: update eval tool descriptions with commentary feature
  - README: fix missing qa-only in project-local uninstall command

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add user-facing benefit descriptions to v0.4.0 changelog

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
@@ -120,7 +120,7 @@ Refs (`@e1`, `@e2`, `@c1`) are how the agent addresses page elements without wri
 2. Server calls Playwright's page.accessibility.snapshot()
 3. Parser walks the ARIA tree, assigns sequential refs: @e1, @e2, @e3...
 4. For each ref, builds a Playwright Locator: getByRole(role, { name }).nth(index)
-5. Stores Map<string, Locator> on the BrowserManager instance
+5. Stores Map<string, RefEntry> on the BrowserManager instance (role + name + Locator)
6. Returns the annotated tree as plain text
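Steps 3 to 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: `AriaNode`, `LocatorFactory`, `buildRefMap`, and the annotated line format are hypothetical names, every node is assumed to get a ref, and the factory stands in for Playwright's `getByRole(role, { name }).nth(index)` so the sketch runs without a browser.

```typescript
// Hypothetical minimal shape of a node in the accessibility tree.
interface AriaNode {
  role: string;
  name: string;
  children?: AriaNode[];
}

// Mirrors the RefEntry described above: role + name + Locator.
interface RefEntry<L> {
  role: string;
  name: string;
  locator: L;
}

// Stand-in for page.getByRole(role, { name }).nth(index).
type LocatorFactory<L> = (role: string, name: string, index: number) => L;

function buildRefMap<L>(
  root: AriaNode,
  makeLocator: LocatorFactory<L>,
): { refs: Map<string, RefEntry<L>>; text: string } {
  const refs = new Map<string, RefEntry<L>>();
  // Counts earlier nodes with the same role + name; used as the nth() index.
  const seen = new Map<string, number>();
  const lines: string[] = [];
  let counter = 0;

  const walk = (node: AriaNode, depth: number): void => {
    const key = node.role + "\u0000" + node.name;
    const index = seen.get(key) ?? 0;
    seen.set(key, index + 1);

    counter += 1;
    const id = "e" + counter; // stored without the "@" prefix
    refs.set(id, {
      role: node.role,
      name: node.name,
      locator: makeLocator(node.role, node.name, index),
    });
    // Assumed output format for the annotated tree.
    lines.push("  ".repeat(depth) + "@" + id + " [" + node.role + '] "' + node.name + '"');

    for (const child of node.children ?? []) walk(child, depth + 1);
  };

  walk(root, 0);
  return { refs, text: lines.join("\n") };
}
```

Two buttons with the same name get the same role and name but different `nth()` indexes, which is why the index is tracked per role + name pair rather than globally.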
Later:
@@ -142,6 +142,19 @@ Playwright Locators are external to the DOM. They use the accessibility tree (wh
Refs are cleared on navigation (the `framenavigated` event on the main frame). This is correct — after navigation, all locators are stale. The agent must run `snapshot` again to get fresh refs. This is by design: stale refs should fail loudly, not click the wrong element.
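A minimal sketch of that clearing rule, assuming a hypothetical `RefStore` class. `PageLike` and `FrameLike` are structural stand-ins covering only the calls used here (`page.on("framenavigated", ...)` and `page.mainFrame()`, both of which exist on Playwright's `Page`), so the example runs without a browser.

```typescript
// Stand-ins for the small part of Playwright's Frame and Page used below.
interface FrameLike {
  url(): string;
}

interface PageLike {
  mainFrame(): FrameLike;
  on(event: "framenavigated", listener: (frame: FrameLike) => void): void;
}

// Hypothetical holder for the ref map; the text says it lives on BrowserManager.
class RefStore<E> {
  private refs = new Map<string, E>();

  attach(page: PageLike): void {
    page.on("framenavigated", (frame) => {
      // Only a main-frame navigation invalidates the refs; iframe
      // navigations fire the same event and are ignored here.
      if (frame === page.mainFrame()) this.refs.clear();
    });
  }

  set(id: string, entry: E): void {
    this.refs.set(id, entry);
  }

  get(id: string): E | undefined {
    return this.refs.get(id);
  }

  get size(): number {
    return this.refs.size;
  }
}
```

After the map is cleared, a lookup for an old ref returns `undefined`, so the caller can fail loudly instead of acting on a locator from the previous page.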
### Ref staleness detection

SPAs can mutate the DOM without triggering `framenavigated` (e.g. React router transitions, tab switches, modal opens). This makes refs stale even though the page URL didn't change. To catch this, `resolveRef()` performs an async `count()` check before using any ref:

```
resolveRef(@e3) → entry = refMap.get("e3")
                → count = await entry.locator.count()
                → if count === 0: throw "Ref @e3 is stale — element no longer exists. Run 'snapshot' to get fresh refs."
                → if count > 0: return { locator }
```
This fails fast (~5ms overhead) instead of letting Playwright's 30-second action timeout expire on a missing element. The `RefEntry` stores `role` and `name` metadata alongside the Locator so the error message can tell the agent what the element was.
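The check can be sketched in TypeScript roughly as below. This is an illustration under stated assumptions, not the actual `resolveRef()`: `CountableLocator` is a stand-in for the one Playwright `Locator` method used (`count()`), the unknown-ref branch is hypothetical, and the exact wording of the stale message with role and name included is a guess at how the metadata might be surfaced.

```typescript
// Stand-in for the single Playwright Locator method this check needs.
interface CountableLocator {
  count(): Promise<number>;
}

interface RefEntry<L extends CountableLocator> {
  role: string;
  name: string;
  locator: L;
}

async function resolveRef<L extends CountableLocator>(
  refMap: Map<string, RefEntry<L>>,
  ref: string,
): Promise<{ locator: L }> {
  const id = ref.replace(/^@/, ""); // "@e3" -> "e3", matching the map keys
  const entry = refMap.get(id);
  if (!entry) {
    // Hypothetical branch: the ref was never assigned or was cleared on navigation.
    throw new Error("Unknown ref @" + id + ". Run 'snapshot' to get fresh refs.");
  }

  // One quick query up front instead of waiting out an action timeout later.
  const count = await entry.locator.count();
  if (count === 0) {
    throw new Error(
      "Ref @" + id + " (" + entry.role + ' "' + entry.name + '") is stale: ' +
        "element no longer exists. Run 'snapshot' to get fresh refs.",
    );
  }
  return { locator: entry.locator };
}
```

Including `role` and `name` in the message tells the agent which element disappeared, which is the diagnostic benefit the `RefEntry` metadata is meant to provide.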
### Cursor-interactive refs (@c)
The `-C` flag finds elements that are clickable but not in the ARIA tree — things styled with `cursor: pointer`, elements with `onclick` attributes, or custom `tabindex`. These get `@c1`, `@c2` refs in a separate namespace. This catches custom components that frameworks render as `<div>` but are actually buttons.
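The selection rule might look like the sketch below. It is a guess at the heuristic, not the project's implementation: `ElementFacts`, `isCursorInteractive`, and `assignCursorRefs` are hypothetical names, the facts are assumed to have been collected from the page already, and "custom `tabindex`" is interpreted here as any explicit `tabindex` attribute.

```typescript
// Hypothetical per-element facts gathered from the page beforehand.
interface ElementFacts {
  tag: string;
  cursor: string; // computed CSS `cursor` value
  hasOnclick: boolean; // element carries an `onclick` attribute
  tabindex: string | null; // raw `tabindex` attribute, or null if absent
  inAriaTree: boolean; // already addressable through an @e ref
}

function isCursorInteractive(el: ElementFacts): boolean {
  // Elements in the ARIA tree already have @e refs; skip them.
  if (el.inAriaTree) return false;
  return el.cursor === "pointer" || el.hasOnclick || el.tabindex !== null;
}

function assignCursorRefs(elements: ElementFacts[]): Map<string, ElementFacts> {
  const refs = new Map<string, ElementFacts>();
  let n = 0;
  for (const el of elements) {
    if (isCursorInteractive(el)) {
      n += 1;
      refs.set("c" + n, el); // separate namespace: c1, c2, ... alongside e1, e2, ...
    }
  }
  return refs;
}
```

Keeping the `c` counter independent of the `e` counter means adding or removing `-C` does not renumber the ARIA-based refs.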