Phase 2: Rewrite SKILL.md as QA playbook + command reference

Reorient SKILL.md files from raw command reference to QA-first playbook
with 10 workflow patterns (test user flows, verify deployments, dogfood
features, responsive layouts, file upload, forms, dialogs, compare pages).
Compact command reference tables at the bottom.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-03-12 13:33:54 -07:00
parent f3ebd0adbf
commit d827276a8d
3 changed files with 383 additions and 428 deletions

View File

@@ -8,15 +8,16 @@ This document covers the command reference and internals of gstack's headless br
|----------|----------|----------| |----------|----------|----------|
| Navigate | `goto`, `back`, `forward`, `reload`, `url` | Get to a page | | Navigate | `goto`, `back`, `forward`, `reload`, `url` | Get to a page |
| Read | `text`, `html`, `links`, `forms`, `accessibility` | Extract content | | Read | `text`, `html`, `links`, `forms`, `accessibility` | Extract content |
| Snapshot | `snapshot [-i] [-c] [-d N] [-s sel]` | Get refs for interaction | | Snapshot | `snapshot [-i] [-c] [-d N] [-s sel] [-D] [-a] [-o] [-C]` | Get refs, diff, annotate |
| Interact | `click`, `fill`, `select`, `hover`, `type`, `press`, `scroll`, `wait`, `viewport` | Use the page | | Interact | `click`, `fill`, `select`, `hover`, `type`, `press`, `scroll`, `wait`, `viewport`, `upload` | Use the page |
| Inspect | `js`, `eval`, `css`, `attrs`, `console`, `network`, `cookies`, `storage`, `perf` | Debug and verify | | Inspect | `js`, `eval`, `css`, `attrs`, `is`, `console`, `network`, `dialog`, `cookies`, `storage`, `perf` | Debug and verify |
| Visual | `screenshot`, `pdf`, `responsive` | See what Claude sees | | Visual | `screenshot`, `pdf`, `responsive` | See what Claude sees |
| Compare | `diff <url1> <url2>` | Spot differences between environments | | Compare | `diff <url1> <url2>` | Spot differences between environments |
| Dialogs | `dialog-accept [text]`, `dialog-dismiss` | Control alert/confirm/prompt handling |
| Tabs | `tabs`, `tab`, `newtab`, `closetab` | Multi-page workflows | | Tabs | `tabs`, `tab`, `newtab`, `closetab` | Multi-page workflows |
| Multi-step | `chain` (JSON from stdin) | Batch commands in one call | | Multi-step | `chain` (JSON from stdin) | Batch commands in one call |
All selector arguments accept CSS selectors or `@ref` after `snapshot`. 40+ commands total. All selector arguments accept CSS selectors, `@e` refs after `snapshot`, or `@c` refs after `snapshot -C`. 50+ commands total.
## How it works ## How it works
@@ -60,11 +61,11 @@ browse/
│ ├── cli.ts # Thin client — reads state file, sends HTTP, prints response │ ├── cli.ts # Thin client — reads state file, sends HTTP, prints response
│ ├── server.ts # Bun.serve HTTP server — routes commands to Playwright │ ├── server.ts # Bun.serve HTTP server — routes commands to Playwright
│ ├── browser-manager.ts # Chromium lifecycle — launch, tabs, ref map, crash handling │ ├── browser-manager.ts # Chromium lifecycle — launch, tabs, ref map, crash handling
│ ├── snapshot.ts # Accessibility tree → @ref assignment → Locator map │ ├── snapshot.ts # Accessibility tree → @ref assignment → Locator map + diff/annotate/-C
│ ├── read-commands.ts # Non-mutating commands (text, html, links, js, css, etc.) │ ├── read-commands.ts # Non-mutating commands (text, html, links, js, css, is, dialog, etc.)
│ ├── write-commands.ts # Mutating commands (click, fill, select, navigate, etc.) │ ├── write-commands.ts # Mutating commands (click, fill, select, upload, dialog-accept, etc.)
│ ├── meta-commands.ts # Server management (status, stop, restart) │ ├── meta-commands.ts # Server management, chain, diff, snapshot routing
│ └── buffers.ts # Console + network log capture (in-memory + disk flush) │ └── buffers.ts # CircularBuffer<T> + console/network/dialog capture
├── test/ # Integration tests + HTML fixtures ├── test/ # Integration tests + HTML fixtures
└── dist/ └── dist/
└── browse # Compiled binary (~58MB, Bun --compile) └── browse # Compiled binary (~58MB, Bun --compile)
@@ -82,18 +83,28 @@ The browser's key innovation is ref-based element selection, built on Playwright
No DOM mutation. No injected scripts. Just Playwright's native accessibility API. No DOM mutation. No injected scripts. Just Playwright's native accessibility API.
**Extended snapshot features:**
- `--diff` (`-D`): Stores each snapshot as a baseline. On the next `-D` call, returns a unified diff showing what changed. Use this to verify that an action (click, fill, etc.) actually worked.
- `--annotate` (`-a`): Injects temporary overlay divs at each ref's bounding box, takes a screenshot with ref labels visible, then removes the overlays. Use `-o <path>` to control the output path.
- `--cursor-interactive` (`-C`): Scans for non-ARIA interactive elements (divs with `cursor:pointer`, `onclick`, `tabindex>=0`) using `page.evaluate`. Assigns `@c1`, `@c2`... refs with deterministic `nth-child` CSS selectors. These are elements the ARIA tree misses but users can still click.
### Authentication ### Authentication
Each server session generates a random UUID as a bearer token. The token is written to the state file (`/tmp/browse-server.json`) with chmod 600. Every HTTP request must include `Authorization: Bearer <token>`. This prevents other processes on the machine from controlling the browser. Each server session generates a random UUID as a bearer token. The token is written to the state file (`/tmp/browse-server.json`) with chmod 600. Every HTTP request must include `Authorization: Bearer <token>`. This prevents other processes on the machine from controlling the browser.
### Console and network capture ### Console, network, and dialog capture
The server hooks into Playwright's `page.on('console')` and `page.on('response')` events. All entries are kept in memory and flushed to disk every second: The server hooks into Playwright's `page.on('console')`, `page.on('response')`, and `page.on('dialog')` events. All entries are kept in O(1) circular buffers (50,000 capacity each) and flushed to disk asynchronously via `Bun.write()`:
- Console: `/tmp/browse-console.log` - Console: `/tmp/browse-console.log`
- Network: `/tmp/browse-network.log` - Network: `/tmp/browse-network.log`
- Dialog: `/tmp/browse-dialog.log`
The `console` and `network` commands read from the in-memory buffers, not disk. The `console`, `network`, and `dialog` commands read from the in-memory buffers, not disk.
### Dialog handling
Dialogs (alert, confirm, prompt) are auto-accepted by default to prevent browser lockup. The `dialog-accept` and `dialog-dismiss` commands control this behavior. For prompts, `dialog-accept <text>` provides the response text. All dialogs are logged to the dialog buffer with type, message, and action taken.
### Multi-workspace support ### Multi-workspace support
@@ -184,7 +195,7 @@ bun test browse/test/commands # run command integration tests only
bun test browse/test/snapshot # run snapshot tests only bun test browse/test/snapshot # run snapshot tests only
``` ```
Tests spin up a local HTTP server (`browse/test/test-server.ts`) serving HTML fixtures from `browse/test/fixtures/`, then exercise the CLI commands against those pages. Tests take ~3 seconds. Tests spin up a local HTTP server (`browse/test/test-server.ts`) serving HTML fixtures from `browse/test/fixtures/`, then exercise the CLI commands against those pages. 148 tests across 2 files, ~15 seconds total.
### Source map ### Source map
@@ -193,11 +204,11 @@ Tests spin up a local HTTP server (`browse/test/test-server.ts`) serving HTML fi
| `browse/src/cli.ts` | Entry point. Reads `/tmp/browse-server.json`, sends HTTP to the server, prints response. | | `browse/src/cli.ts` | Entry point. Reads `/tmp/browse-server.json`, sends HTTP to the server, prints response. |
| `browse/src/server.ts` | Bun HTTP server. Routes commands to the right handler. Manages idle timeout. | | `browse/src/server.ts` | Bun HTTP server. Routes commands to the right handler. Manages idle timeout. |
| `browse/src/browser-manager.ts` | Chromium lifecycle — launch, tab management, ref map, crash detection. | | `browse/src/browser-manager.ts` | Chromium lifecycle — launch, tab management, ref map, crash detection. |
| `browse/src/snapshot.ts` | Parses Playwright's accessibility tree, assigns `@ref` labels, builds Locator map. | | `browse/src/snapshot.ts` | Parses accessibility tree, assigns `@e`/`@c` refs, builds Locator map. Handles `--diff`, `--annotate`, `-C`. |
| `browse/src/read-commands.ts` | Non-mutating commands: `text`, `html`, `links`, `js`, `css`, `forms`, etc. | | `browse/src/read-commands.ts` | Non-mutating commands: `text`, `html`, `links`, `js`, `css`, `is`, `dialog`, `forms`, etc. Exports `getCleanText()`. |
| `browse/src/write-commands.ts` | Mutating commands: `goto`, `click`, `fill`, `select`, `scroll`, etc. | | `browse/src/write-commands.ts` | Mutating commands: `goto`, `click`, `fill`, `upload`, `dialog-accept`, `useragent` (with context recreation), etc. |
| `browse/src/meta-commands.ts` | Server management: `status`, `stop`, `restart`. | | `browse/src/meta-commands.ts` | Server management, chain routing, diff (DRY via `getCleanText`), snapshot delegation. |
| `browse/src/buffers.ts` | In-memory + disk capture for console and network logs. | | `browse/src/buffers.ts` | `CircularBuffer<T>` (O(1) ring buffer) + console/network/dialog capture with async disk flush. |
### Deploying to the active skill ### Deploying to the active skill

464
SKILL.md
View File

@@ -1,254 +1,324 @@
--- ---
name: gstack name: gstack
version: 1.0.0 version: 1.1.0
description: | description: |
Fast web browsing for Claude Code via persistent headless Chromium daemon. Navigate to any URL, Fast headless browser for QA testing and site dogfooding. Navigate any URL, interact with
read page content, click elements, fill forms, run JavaScript, take screenshots, elements, verify page state, diff before/after actions, take annotated screenshots, check
inspect CSS/DOM, capture console/network logs, and more. ~100ms per command after responsive layouts, test forms and uploads, handle dialogs, and assert element states.
first call. Use when you need to check a website, verify a deployment, read docs, ~100ms per command. Use when you need to test a feature, verify a deployment, dogfood a
or interact with any web page. No MCP, no Chrome extension — just fast CLI. user flow, or file a bug with evidence.
allowed-tools: allowed-tools:
- Bash - Bash
- Read - Read
--- ---
# gstack: Persistent Browser for Claude Code # gstack browse: QA Testing & Dogfooding
Persistent headless Chromium daemon. First call auto-starts the server (~3s). Persistent headless Chromium. First call auto-starts (~3s), then ~100-200ms per command.
Every subsequent call: ~100-200ms. Auto-shuts down after 30 min idle. Auto-shuts down after 30 min idle. State persists between calls (cookies, tabs, sessions).
## SETUP (run this check BEFORE any browse command) ## SETUP (run this check BEFORE any browse command)
Before using any browse command, find the skill and check if the binary exists:
```bash ```bash
# Check project-level first, then user-level B=$(browse/bin/find-browse 2>/dev/null || ~/.claude/skills/gstack/browse/bin/find-browse 2>/dev/null)
if test -x .claude/skills/gstack/browse/dist/browse; then if [ -n "$B" ]; then
echo "READY_PROJECT" echo "READY: $B"
elif test -x ~/.claude/skills/gstack/browse/dist/browse; then
echo "READY_USER"
else else
echo "NEEDS_SETUP" echo "NEEDS_SETUP"
fi fi
``` ```
Set `B` to whichever path is READY and use it for all commands. Prefer project-level if both exist.
If `NEEDS_SETUP`: If `NEEDS_SETUP`:
1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait for their response. 1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait.
2. If they approve, determine the skill directory (project-level `.claude/skills/gstack` or user-level `~/.claude/skills/gstack`) and run: 2. Run: `cd <SKILL_DIR> && ./setup`
```bash 3. If `bun` is not installed: `curl -fsSL https://bun.sh/install | bash`
cd <SKILL_DIR> && ./setup
```
3. If `bun` is not installed, tell the user to install it: `curl -fsSL https://bun.sh/install | bash`
4. Verify the `.gitignore` in the skill directory contains `browse/dist/` and `node_modules/`. If either line is missing, add it.
Once setup is done, it never needs to run again (the compiled binary persists).
## IMPORTANT ## IMPORTANT
- Use the compiled binary via Bash: `.claude/skills/gstack/browse/dist/browse` (project) or `~/.claude/skills/gstack/browse/dist/browse` (user). - Use the compiled binary via Bash: `$B <command>`
- NEVER use `mcp__claude-in-chrome__*` tools. They are slow and unreliable. - NEVER use `mcp__claude-in-chrome__*` tools. They are slow and unreliable.
- The browser persists between calls — cookies, tabs, and state carry over. - Browser persists between calls — cookies, login sessions, and tabs carry over.
- The server auto-starts on first command. No setup needed. - Dialogs (alert/confirm/prompt) are auto-accepted by default — no browser lockup.
## Quick Reference ## QA Workflows
### Test a user flow (login, signup, checkout, etc.)
```bash ```bash
B=~/.claude/skills/gstack/browse/dist/browse B=~/.claude/skills/gstack/browse/dist/browse
# Navigate to a page # 1. Go to the page
$B goto https://example.com $B goto https://app.example.com/login
# Read cleaned page text # 2. See what's interactive
$B text
# Take a screenshot (then Read the image)
$B screenshot /tmp/page.png
# Snapshot: accessibility tree with refs
$B snapshot -i $B snapshot -i
# Click by ref (after snapshot) # 3. Fill the form using refs
$B click @e3 $B fill @e3 "test@example.com"
$B fill @e4 "password123"
$B click @e5
# Fill by ref # 4. Verify it worked
$B fill @e4 "test@test.com" $B snapshot -D # diff shows what changed after clicking
$B is visible ".dashboard" # assert the dashboard appeared
# Run JavaScript $B screenshot /tmp/after-login.png
$B js "document.title"
# Get all links
$B links
# Click by CSS selector
$B click "button.submit"
# Fill a form by CSS selector
$B fill "#email" "test@test.com"
$B fill "#password" "abc123"
$B click "button[type=submit]"
# Get HTML of an element
$B html "main"
# Get computed CSS
$B css "body" "font-family"
# Get element attributes
$B attrs "nav"
# Wait for element to appear
$B wait ".loaded"
# Accessibility tree
$B accessibility
# Set viewport
$B viewport 375x812
# Set cookies / headers
$B cookie "session=abc123"
$B header "Authorization:Bearer token123"
``` ```
## Command Reference ### Verify a deployment / check prod
### Navigation ```bash
``` $B goto https://yourapp.com
browse goto <url> Navigate current tab $B text # read the page — does it load?
browse back Go back $B console # any JS errors?
browse forward Go forward $B network # any failed requests?
browse reload Reload page $B js "document.title" # correct title?
browse url Print current URL $B is visible ".hero-section" # key elements present?
$B screenshot /tmp/prod-check.png
``` ```
### Content extraction ### Dogfood a feature end-to-end
```
browse text Cleaned page text (no scripts/styles) ```bash
browse html [selector] innerHTML of element, or full page HTML # Navigate to the feature
browse links All links as "text → href" $B goto https://app.example.com/new-feature
browse forms All forms + fields as JSON
browse accessibility Accessibility tree snapshot (ARIA) # Take annotated screenshot — shows every interactive element with labels
$B snapshot -i -a -o /tmp/feature-annotated.png
# Find ALL clickable things (including divs with cursor:pointer)
$B snapshot -C
# Walk through the flow
$B snapshot -i # baseline
$B click @e3 # interact
$B snapshot -D # what changed? (unified diff)
# Check element states
$B is visible ".success-toast"
$B is enabled "#next-step-btn"
$B is checked "#agree-checkbox"
# Check console for errors after interactions
$B console
``` ```
### Snapshot (ref-based element selection) ### Test responsive layouts
```
browse snapshot Full accessibility tree with @refs ```bash
browse snapshot -i Interactive elements only (buttons, links, inputs) # Quick: 3 screenshots at mobile/tablet/desktop
browse snapshot -c Compact (no empty structural elements) $B goto https://yourapp.com
browse snapshot -d <N> Limit depth to N levels $B responsive /tmp/layout
browse snapshot -s <sel> Scope to CSS selector
# Manual: specific viewport
$B viewport 375x812 # iPhone
$B screenshot /tmp/mobile.png
$B viewport 1440x900 # Desktop
$B screenshot /tmp/desktop.png
``` ```
After snapshot, use @refs as selectors in any command: ### Test file upload
```bash
$B goto https://app.example.com/upload
$B snapshot -i
$B upload @e3 /path/to/test-file.pdf
$B is visible ".upload-success"
$B screenshot /tmp/upload-result.png
``` ```
browse click @e3 Click the element assigned ref @e3
browse fill @e4 "value" Fill the input assigned ref @e4 ### Test forms with validation
browse hover @e1 Hover the element assigned ref @e1
browse html @e2 Get innerHTML of ref @e2 ```bash
browse css @e5 "color" Get computed CSS of ref @e5 $B goto https://app.example.com/form
browse attrs @e6 Get attributes of ref @e6 $B snapshot -i
# Submit empty — check validation errors appear
$B click @e10 # submit button
$B snapshot -D # diff shows error messages appeared
$B is visible ".error-message"
# Fill and resubmit
$B fill @e3 "valid input"
$B click @e10
$B snapshot -D # diff shows errors gone, success state
```
### Test dialogs (delete confirmations, prompts)
```bash
# Set up dialog handling BEFORE triggering
$B dialog-accept # will auto-accept next alert/confirm
$B click "#delete-button" # triggers confirmation dialog
$B dialog # see what dialog appeared
$B snapshot -D # verify the item was deleted
# For prompts that need input
$B dialog-accept "my answer" # accept with text
$B click "#rename-button" # triggers prompt
```
### Compare two pages / environments
```bash
$B diff https://staging.app.com https://prod.app.com
```
### Multi-step chain (efficient for long flows)
```bash
echo '[
["goto","https://app.example.com"],
["snapshot","-i"],
["fill","@e3","test@test.com"],
["fill","@e4","password"],
["click","@e5"],
["snapshot","-D"],
["screenshot","/tmp/result.png"]
]' | $B chain
```
## Quick Assertion Patterns
```bash
# Element exists and is visible
$B is visible ".modal"
# Button is enabled/disabled
$B is enabled "#submit-btn"
$B is disabled "#submit-btn"
# Checkbox state
$B is checked "#agree"
# Input is editable
$B is editable "#name-field"
# Element has focus
$B is focused "#search-input"
# Page contains text
$B js "document.body.textContent.includes('Success')"
# Element count
$B js "document.querySelectorAll('.list-item').length"
# Specific attribute value
$B attrs "#logo" # returns all attributes as JSON
# CSS property
$B css ".button" "background-color"
```
## Snapshot System
The snapshot is your primary tool for understanding and interacting with pages.
```bash
$B snapshot -i # Interactive elements only (buttons, links, inputs) with @e refs
$B snapshot -c # Compact (no empty structural elements)
$B snapshot -d 3 # Limit depth to 3 levels
$B snapshot -s "main" # Scope to CSS selector
$B snapshot -D # Diff against previous snapshot (what changed?)
$B snapshot -a # Annotated screenshot with ref labels
$B snapshot -o /tmp/x.png # Output path for annotated screenshot
$B snapshot -C # Cursor-interactive elements (@c refs — divs with pointer, onclick)
```
Combine flags: `$B snapshot -i -a -C -o /tmp/annotated.png`
After snapshot, use @refs everywhere:
```bash
$B click @e3 $B fill @e4 "value" $B hover @e1
$B html @e2 $B css @e5 "color" $B attrs @e6
$B click @c1 # cursor-interactive ref (from -C)
``` ```
Refs are invalidated on navigation — run `snapshot` again after `goto`. Refs are invalidated on navigation — run `snapshot` again after `goto`.
## Command Reference
### Navigation
| Command | Description |
|---------|-------------|
| `goto <url>` | Navigate to URL |
| `back` / `forward` | History navigation |
| `reload` | Reload page |
| `url` | Print current URL |
### Reading
| Command | Description |
|---------|-------------|
| `text` | Cleaned page text |
| `html [selector]` | innerHTML |
| `links` | All links as "text -> href" |
| `forms` | Forms + fields as JSON |
| `accessibility` | Full ARIA tree |
### Interaction ### Interaction
``` | Command | Description |
browse click <selector> Click element (CSS selector or @ref) |---------|-------------|
browse fill <selector> <value> Fill input field | `click <sel>` | Click element |
browse select <selector> <val> Select dropdown value | `fill <sel> <val>` | Fill input |
browse hover <selector> Hover over element | `select <sel> <val>` | Select dropdown |
browse type <text> Type into focused element | `hover <sel>` | Hover element |
browse press <key> Press key (Enter, Tab, Escape, etc.) | `type <text>` | Type into focused element |
browse scroll [selector] Scroll element into view, or page bottom | `press <key>` | Press key (Enter, Tab, Escape) |
browse wait <selector> Wait for element to appear (max 10s) | `scroll [sel]` | Scroll element into view |
browse viewport <WxH> Set viewport size (e.g. 375x812) | `wait <sel>` | Wait for element (max 10s) |
``` | `wait --networkidle` | Wait for network to be idle |
| `wait --load` | Wait for page load event |
| `upload <sel> <file...>` | Upload file(s) |
| `cookie-import <json>` | Import cookies from JSON file |
| `dialog-accept [text]` | Auto-accept dialogs |
| `dialog-dismiss` | Auto-dismiss dialogs |
| `viewport <WxH>` | Set viewport size |
### Inspection ### Inspection
``` | Command | Description |
browse js <expression> Run JS, print result |---------|-------------|
browse eval <js-file> Run JS file against page | `js <expr>` | Run JavaScript |
browse css <selector> <prop> Get computed CSS property | `eval <file>` | Run JS file |
browse attrs <selector> Get element attributes as JSON | `css <sel> <prop>` | Computed CSS |
browse console Dump captured console messages | `attrs <sel>` | Element attributes |
browse console --clear Clear console buffer | `is <prop> <sel>` | State check (visible/hidden/enabled/disabled/checked/editable/focused) |
browse network Dump captured network requests | `console [--clear\|--errors]` | Console messages (--errors filters to error/warning) |
browse network --clear Clear network buffer | `network [--clear]` | Network requests |
browse cookies Dump all cookies as JSON | `dialog [--clear]` | Dialog messages |
browse storage localStorage + sessionStorage as JSON | `cookies` | All cookies |
browse storage set <key> <val> Set localStorage value | `storage` | localStorage + sessionStorage |
browse perf Page load performance timings | `perf` | Page load timings |
```
### Visual ### Visual
``` | Command | Description |
browse screenshot [path] Screenshot (default: /tmp/browse-screenshot.png) |---------|-------------|
browse pdf [path] Save as PDF | `screenshot [path]` | Screenshot |
browse responsive [prefix] Screenshots at mobile/tablet/desktop | `pdf [path]` | Save as PDF |
``` | `responsive [prefix]` | Mobile/tablet/desktop screenshots |
| `diff <url1> <url2>` | Text diff between pages |
### Compare
```
browse diff <url1> <url2> Text diff between two pages
```
### Multi-step (chain)
```
echo '[["goto","https://example.com"],["snapshot","-i"],["click","@e1"],["screenshot","/tmp/result.png"]]' | browse chain
```
### Tabs ### Tabs
``` | Command | Description |
browse tabs List tabs (id, url, title) |---------|-------------|
browse tab <id> Switch to tab | `tabs` | List tabs |
browse newtab [url] Open new tab | `tab <id>` | Switch tab |
browse closetab [id] Close tab | `newtab [url]` | Open tab |
``` | `closetab [id]` | Close tab |
### Server management ### Server
``` | Command | Description |
browse status Server health, uptime, tab count |---------|-------------|
browse stop Shutdown server | `status` | Health check |
browse restart Kill + restart server | `stop` | Shutdown |
``` | `restart` | Restart |
## Speed Rules ## Tips
1. **Navigate once, query many times.** `goto` loads the page; then `text`, `js`, `css`, `screenshot` all run against the loaded page instantly. 1. **Navigate once, query many times.** `goto` loads the page; then `text`, `js`, `screenshot` all hit the loaded page instantly.
2. **Use `snapshot -i` for interaction.** Get refs for all interactive elements, then click/fill by ref. No need to guess CSS selectors. 2. **Use `snapshot -i` first.** See all interactive elements, then click/fill by ref. No CSS selector guessing.
3. **Use `js` for precision.** `js "document.querySelector('.price').textContent"` is faster than parsing full page text. 3. **Use `snapshot -D` to verify.** Baseline → action → diff. See exactly what changed.
4. **Use `links` to survey.** Faster than `text` when you just need navigation structure. 4. **Use `is` for assertions.** `is visible .modal` is faster and more reliable than parsing page text.
5. **Use `chain` for multi-step flows.** Avoids CLI overhead per step. 5. **Use `snapshot -a` for evidence.** Annotated screenshots are great for bug reports.
6. **Use `responsive` for layout checks.** One command = 3 viewport screenshots. 6. **Use `snapshot -C` for tricky UIs.** Finds clickable divs that the accessibility tree misses.
7. **Check `console` after actions.** Catch JS errors that don't surface visually.
## When to Use What 8. **Use `chain` for long flows.** Single command, no per-step CLI overhead.
| Task | Commands |
|------|----------|
| Read a page | `goto <url>` then `text` |
| Interact with elements | `snapshot -i` then `click @e3` |
| Check if element exists | `js "!!document.querySelector('.thing')"` |
| Extract specific data | `js "document.querySelector('.price').textContent"` |
| Visual check | `screenshot /tmp/x.png` then Read the image |
| Fill and submit form | `snapshot -i``fill @e4 "val"``click @e5``screenshot` |
| Check CSS | `css "selector" "property"` or `css @e3 "property"` |
| Inspect DOM | `html "selector"` or `attrs @e3` |
| Debug console errors | `console` |
| Check network requests | `network` |
| Check local dev | `goto http://127.0.0.1:3000` |
| Compare two pages | `diff <url1> <url2>` |
| Mobile layout check | `responsive /tmp/prefix` |
| Multi-step flow | `echo '[...]' \| browse chain` |
## Architecture
- Persistent Chromium daemon on localhost (port 9400-9410)
- Bearer token auth per session
- State file: `/tmp/browse-server.json`
- Console log: `/tmp/browse-console.log`
- Network log: `/tmp/browse-network.log`
- Auto-shutdown after 30 min idle
- Chromium crash → server exits → auto-restarts on next command

View File

@@ -1,254 +1,128 @@
--- ---
name: browse name: browse
version: 1.0.0 version: 1.1.0
description: | description: |
Fast web browsing for Claude Code via persistent headless Chromium daemon. Navigate to any URL, Fast headless browser for QA testing and site dogfooding. Navigate any URL, interact with
read page content, click elements, fill forms, run JavaScript, take screenshots, elements, verify page state, diff before/after actions, take annotated screenshots, check
inspect CSS/DOM, capture console/network logs, and more. ~100ms per command after responsive layouts, test forms and uploads, handle dialogs, and assert element states.
first call. Use when you need to check a website, verify a deployment, read docs, ~100ms per command. Use when you need to test a feature, verify a deployment, dogfood a
or interact with any web page. No MCP, no Chrome extension — just fast CLI. user flow, or file a bug with evidence.
allowed-tools: allowed-tools:
- Bash - Bash
- Read - Read
--- ---
# gstack: Persistent Browser for Claude Code # browse: QA Testing & Dogfooding
Persistent headless Chromium daemon. First call auto-starts the server (~3s). Persistent headless Chromium. First call auto-starts (~3s), then ~100ms per command.
Every subsequent call: ~100-200ms. Auto-shuts down after 30 min idle. State persists between calls (cookies, tabs, login sessions).
## SETUP (run this check BEFORE any browse command) ## Core QA Patterns
Before using any browse command, find the skill and check if the binary exists:
### 1. Verify a page loads correctly
```bash ```bash
# Check project-level first, then user-level $B goto https://yourapp.com
if test -x .claude/skills/gstack/browse/dist/browse; then $B text # content loads?
echo "READY_PROJECT" $B console # JS errors?
elif test -x ~/.claude/skills/gstack/browse/dist/browse; then $B network # failed requests?
echo "READY_USER" $B is visible ".main-content" # key elements present?
else
echo "NEEDS_SETUP"
fi
``` ```
Set `B` to whichever path is READY and use it for all commands. Prefer project-level if both exist. ### 2. Test a user flow
If `NEEDS_SETUP`:
1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait for their response.
2. If they approve, determine the skill directory (project-level `.claude/skills/gstack` or user-level `~/.claude/skills/gstack`) and run:
```bash ```bash
cd <SKILL_DIR> && ./setup $B goto https://app.com/login
$B snapshot -i # see all interactive elements
$B fill @e3 "user@test.com"
$B fill @e4 "password"
$B click @e5 # submit
$B snapshot -D # diff: what changed after submit?
$B is visible ".dashboard" # success state present?
``` ```
3. If `bun` is not installed, tell the user to install it: `curl -fsSL https://bun.sh/install | bash`
4. Verify the `.gitignore` in the skill directory contains `browse/dist/` and `node_modules/`. If either line is missing, add it.
Once setup is done, it never needs to run again (the compiled binary persists).
## IMPORTANT
- Use the compiled binary via Bash: `.claude/skills/gstack/browse/dist/browse` (project) or `~/.claude/skills/gstack/browse/dist/browse` (user).
- NEVER use `mcp__claude-in-chrome__*` tools. They are slow and unreliable.
- The browser persists between calls — cookies, tabs, and state carry over.
- The server auto-starts on first command. No setup needed.
## Quick Reference
### 3. Verify an action worked
```bash ```bash
B=~/.claude/skills/gstack/browse/dist/browse $B snapshot # baseline
$B click @e3 # do something
# Navigate to a page $B snapshot -D # unified diff shows exactly what changed
$B goto https://example.com
# Read cleaned page text
$B text
# Take a screenshot (then Read the image)
$B screenshot /tmp/page.png
# Snapshot: accessibility tree with refs
$B snapshot -i
# Click by ref (after snapshot)
$B click @e3
# Fill by ref
$B fill @e4 "test@test.com"
# Run JavaScript
$B js "document.title"
# Get all links
$B links
# Click by CSS selector
$B click "button.submit"
# Fill a form by CSS selector
$B fill "#email" "test@test.com"
$B fill "#password" "abc123"
$B click "button[type=submit]"
# Get HTML of an element
$B html "main"
# Get computed CSS
$B css "body" "font-family"
# Get element attributes
$B attrs "nav"
# Wait for element to appear
$B wait ".loaded"
# Accessibility tree
$B accessibility
# Set viewport
$B viewport 375x812
# Set cookies / headers
$B cookie "session=abc123"
$B header "Authorization:Bearer token123"
``` ```
## Command Reference ### 4. Visual evidence for bug reports
```bash
### Navigation $B snapshot -i -a -o /tmp/annotated.png # labeled screenshot
``` $B screenshot /tmp/bug.png # plain screenshot
browse goto <url> Navigate current tab $B console # error log
browse back Go back
browse forward Go forward
browse reload Reload page
browse url Print current URL
``` ```
### Content extraction ### 5. Find all clickable elements (including non-ARIA)
``` ```bash
browse text Cleaned page text (no scripts/styles) $B snapshot -C # finds divs with cursor:pointer, onclick, tabindex
browse html [selector] innerHTML of element, or full page HTML $B click @c1 # interact with them
browse links All links as "text → href"
browse forms All forms + fields as JSON
browse accessibility Accessibility tree snapshot (ARIA)
``` ```
### Snapshot (ref-based element selection) ### 6. Assert element states
``` ```bash
browse snapshot Full accessibility tree with @refs $B is visible ".modal"
browse snapshot -i Interactive elements only (buttons, links, inputs) $B is enabled "#submit-btn"
browse snapshot -c Compact (no empty structural elements) $B is disabled "#submit-btn"
browse snapshot -d <N> Limit depth to N levels $B is checked "#agree-checkbox"
browse snapshot -s <sel> Scope to CSS selector $B is editable "#name-field"
$B is focused "#search-input"
$B js "document.body.textContent.includes('Success')"
``` ```
After snapshot, use @refs as selectors in any command: ### 7. Test responsive layouts
``` ```bash
browse click @e3 Click the element assigned ref @e3 $B responsive /tmp/layout # mobile + tablet + desktop screenshots
browse fill @e4 "value" Fill the input assigned ref @e4 $B viewport 375x812 # or set specific viewport
browse hover @e1 Hover the element assigned ref @e1 $B screenshot /tmp/mobile.png
browse html @e2 Get innerHTML of ref @e2
browse css @e5 "color" Get computed CSS of ref @e5
browse attrs @e6 Get attributes of ref @e6
``` ```
Refs are invalidated on navigation — run `snapshot` again after `goto`. ### 8. Test file uploads
```bash
### Interaction $B upload "#file-input" /path/to/file.pdf
``` $B is visible ".upload-success"
browse click <selector> Click element (CSS selector or @ref)
browse fill <selector> <value> Fill input field
browse select <selector> <val> Select dropdown value
browse hover <selector> Hover over element
browse type <text> Type into focused element
browse press <key> Press key (Enter, Tab, Escape, etc.)
browse scroll [selector] Scroll element into view, or page bottom
browse wait <selector> Wait for element to appear (max 10s)
browse viewport <WxH> Set viewport size (e.g. 375x812)
``` ```
### Inspection ### 9. Test dialogs
``` ```bash
browse js <expression> Run JS, print result $B dialog-accept "yes" # set up handler
browse eval <js-file> Run JS file against page $B click "#delete-button" # trigger dialog
browse css <selector> <prop> Get computed CSS property $B dialog # see what appeared
browse attrs <selector> Get element attributes as JSON $B snapshot -D # verify deletion happened
browse console Dump captured console messages
browse console --clear Clear console buffer
browse network Dump captured network requests
browse network --clear Clear network buffer
browse cookies Dump all cookies as JSON
browse storage localStorage + sessionStorage as JSON
browse storage set <key> <val> Set localStorage value
browse perf Page load performance timings
``` ```
### Visual ### 10. Compare environments
``` ```bash
browse screenshot [path] Screenshot (default: /tmp/browse-screenshot.png) $B diff https://staging.app.com https://prod.app.com
browse pdf [path] Save as PDF
browse responsive [prefix] Screenshots at mobile/tablet/desktop
``` ```
### Compare ## Snapshot Flags
``` ```
browse diff <url1> <url2> Text diff between two pages -i Interactive elements only (buttons, links, inputs)
-c Compact (no empty structural nodes)
-d <N> Limit depth
-s <sel> Scope to CSS selector
-D Diff against previous snapshot
-a Annotated screenshot with ref labels
-o <path> Output path for screenshot
-C Cursor-interactive elements (@c refs)
``` ```
### Multi-step (chain) Combine: `$B snapshot -i -a -C -o /tmp/annotated.png`
```
echo '[["goto","https://example.com"],["snapshot","-i"],["click","@e1"],["screenshot","/tmp/result.png"]]' | browse chain
```
### Tabs Use @refs after snapshot: `$B click @e3`, `$B fill @e4 "value"`, `$B click @c1`
```
browse tabs List tabs (id, url, title)
browse tab <id> Switch to tab
browse newtab [url] Open new tab
browse closetab [id] Close tab
```
### Server management ## Full Command List
```
browse status Server health, uptime, tab count
browse stop Shutdown server
browse restart Kill + restart server
```
## Speed Rules **Navigate:** goto, back, forward, reload, url
**Read:** text, html, links, forms, accessibility
1. **Navigate once, query many times.** `goto` loads the page; then `text`, `js`, `css`, `screenshot` all run against the loaded page instantly. **Snapshot:** snapshot (with flags above)
2. **Use `snapshot -i` for interaction.** Get refs for all interactive elements, then click/fill by ref. No need to guess CSS selectors. **Interact:** click, fill, select, hover, type, press, scroll, wait, wait --networkidle, wait --load, viewport, upload, cookie-import, dialog-accept, dialog-dismiss
3. **Use `js` for precision.** `js "document.querySelector('.price').textContent"` is faster than parsing full page text. **Inspect:** js, eval, css, attrs, is, console, console --errors, network, dialog, cookies, storage, perf
4. **Use `links` to survey.** Faster than `text` when you just need navigation structure. **Visual:** screenshot, pdf, responsive
5. **Use `chain` for multi-step flows.** Avoids CLI overhead per step. **Compare:** diff
6. **Use `responsive` for layout checks.** One command = 3 viewport screenshots. **Multi-step:** chain (pipe JSON array)
**Tabs:** tabs, tab, newtab, closetab
## When to Use What **Server:** status, stop, restart
| Task | Commands |
|------|----------|
| Read a page | `goto <url>` then `text` |
| Interact with elements | `snapshot -i` then `click @e3` |
| Check if element exists | `js "!!document.querySelector('.thing')"` |
| Extract specific data | `js "document.querySelector('.price').textContent"` |
| Visual check | `screenshot /tmp/x.png` then Read the image |
| Fill and submit form | `snapshot -i``fill @e4 "val"``click @e5``screenshot` |
| Check CSS | `css "selector" "property"` or `css @e3 "property"` |
| Inspect DOM | `html "selector"` or `attrs @e3` |
| Debug console errors | `console` |
| Check network requests | `network` |
| Check local dev | `goto http://127.0.0.1:3000` |
| Compare two pages | `diff <url1> <url2>` |
| Mobile layout check | `responsive /tmp/prefix` |
| Multi-step flow | `echo '[...]' \| browse chain` |
## Architecture
- Persistent Chromium daemon on localhost (port 9400-9410)
- Bearer token auth per session
- State file: `/tmp/browse-server.json`
- Console log: `/tmp/browse-console.log`
- Network log: `/tmp/browse-network.log`
- Auto-shutdown after 30 min idle
- Chromium crash → server exits → auto-restarts on next command