mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-18 02:22:04 +08:00
VERSION 1.27.1.0 → 1.28.0.0 (MINOR — substantial new capability: five new flags/features, ~600 LOC added, new socks dep, multiple new modules). browse/SKILL.md.tmpl: new "Headed Mode + Proxy + Anti-Bot Sites" section between User Handoff and Snapshot Flags. Documents --headed (auto-Xvfb on Linux), --proxy (with embedded SOCKS5 bridge for auth), download --navigate, the cred-mixing policy, daemon-discipline (refuse-on-mismatch), the narrowed webdriver-only stealth, container support caveats, and the fail-fast/no-retry failure modes. CHANGELOG entry follows the release-summary format from CLAUDE.md: two-line headline, lead paragraph, "The numbers that matter" table tied to specific test files that prove each capability, "What this means for AI agents" closing tied to a real workflow shift, then itemized Added/Changed/Fixed/For-contributors sections. Browse SKILL.md regenerated via bun run gen:skill-docs. gstack/llms.txt regenerated automatically from the same pipeline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
258 lines
11 KiB
Cheetah
---
name: browse
preamble-tier: 1
version: 1.1.0
description: |
  Fast headless browser for QA testing and site dogfooding. Navigate any URL, interact with
  elements, verify page state, diff before/after actions, take annotated screenshots, check
  responsive layouts, test forms and uploads, handle dialogs, and assert element states.
  ~100ms per command. Use when you need to test a feature, verify a deployment, dogfood a
  user flow, or file a bug with evidence. Use when asked to "open in browser", "test the
  site", "take a screenshot", or "dogfood this". (gstack)
triggers:
  - browse a page
  - headless browser
  - take page screenshot
allowed-tools:
  - Bash
  - Read
  - AskUserQuestion

---

{{PREAMBLE}}

# browse: QA Testing & Dogfooding

Persistent headless Chromium. First call auto-starts (~3s), then ~100ms per command.
State persists between calls (cookies, tabs, login sessions).

{{BROWSE_SETUP}}

## Core QA Patterns

### 1. Verify a page loads correctly
```bash
$B goto https://yourapp.com
$B text                          # content loads?
$B console                       # JS errors?
$B network                       # failed requests?
$B is visible ".main-content"    # key elements present?
```
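
The checks above can be wrapped in one function so a script stops at the first failing step. This is only a sketch: `smoke_check` is a hypothetical helper, not a browse command, and it assumes that each `$B` command exits non-zero on failure, which this document does not state. `console` and `network` are left out because they are read by eye rather than asserted.

```bash
# Hypothetical helper (not part of browse). Assumes $B names the browse
# binary and that each command exits non-zero on failure (unverified here).
smoke_check() {
  url=$1
  selector=$2
  $B goto "$url"            || { echo "FAIL: goto $url" >&2; return 1; }
  $B text >/dev/null        || { echo "FAIL: text" >&2; return 1; }
  $B is visible "$selector" || { echo "FAIL: $selector not visible" >&2; return 1; }
  echo "OK: $url"
}
```

Usage would look like `smoke_check https://yourapp.com ".main-content"`.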

### 2. Test a user flow
```bash
$B goto https://app.com/login
$B snapshot -i                   # see all interactive elements
$B fill @e3 "user@test.com"
$B fill @e4 "password"
$B click @e5                     # submit
$B snapshot -D                   # diff: what changed after submit?
$B is visible ".dashboard"       # success state present?
```

### 3. Verify an action worked
```bash
$B snapshot                      # baseline
$B click @e3                     # do something
$B snapshot -D                   # unified diff shows exactly what changed
```

### 4. Visual evidence for bug reports
```bash
$B snapshot -i -a -o /tmp/annotated.png   # labeled screenshot
$B screenshot /tmp/bug.png                # plain screenshot
$B console                                # error log
```

### 5. Find all clickable elements (including non-ARIA)
```bash
$B snapshot -C                   # finds divs with cursor:pointer, onclick, tabindex
$B click @c1                     # interact with them
```

### 6. Assert element states
```bash
$B is visible ".modal"
$B is enabled "#submit-btn"
$B is disabled "#submit-btn"
$B is checked "#agree-checkbox"
$B is editable "#name-field"
$B is focused "#search-input"
$B js "document.body.textContent.includes('Success')"
```

### 7. Test responsive layouts
```bash
$B responsive /tmp/layout        # mobile + tablet + desktop screenshots
$B viewport 375x812              # or set specific viewport
$B screenshot /tmp/mobile.png
```

### 8. Test file uploads
```bash
$B upload "#file-input" /path/to/file.pdf
$B is visible ".upload-success"
```

### 9. Test dialogs
```bash
$B dialog-accept "yes"           # set up handler
$B click "#delete-button"        # trigger dialog
$B dialog                        # see what appeared
$B snapshot -D                   # verify deletion happened
```

### 10. Compare environments
```bash
$B diff https://staging.app.com https://prod.app.com
```

### 11. Show screenshots to the user
After `$B screenshot`, `$B snapshot -a -o`, or `$B responsive`, always use the Read tool on the output PNG(s) so the user can see them. Without this, screenshots are invisible.

### 12. Render local HTML (no HTTP server needed)
Two paths, pick the cleaner one:
```bash
# HTML file on disk → goto file:// (absolute, or cwd-relative)
$B goto file:///tmp/report.html
$B goto file://./docs/page.html       # cwd-relative
$B goto file://~/Documents/page.html  # home-relative

# HTML generated in memory → load-html reads the file into setContent
echo '<div class="tweet">hello</div>' > /tmp/tweet.html
$B load-html /tmp/tweet.html
```

`goto file://...` is usually cleaner (URL is saved in state, relative asset URLs resolve against the file's dir, scale changes replay naturally). `load-html` uses `page.setContent()` — URL stays `about:blank`, but the content survives `viewport --scale` via in-memory replay. Both are scoped to files under cwd or `$TMPDIR`.
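
To see what "scoped to files under cwd or `$TMPDIR`" means in practice, the rule can be approximated with a small pre-check before calling browse. This is a sketch under assumptions: `in_browse_scope` is a hypothetical helper, `/tmp` is used when `$TMPDIR` is unset, and browse's own check (symlink handling in particular) may differ.

```bash
# Hypothetical pre-check (not a browse command): succeed only if FILE sits in
# a directory under the current directory or under $TMPDIR (default /tmp).
in_browse_scope() {
  file=$1
  # Resolve the file's directory to a physical path; fail if it doesn't exist.
  dir=$(cd "$(dirname "$file")" 2>/dev/null && pwd -P) || return 1
  cwd=$(pwd -P)
  tmp=$(cd "${TMPDIR:-/tmp}" 2>/dev/null && pwd -P) || tmp=
  case "$dir/" in
    "${cwd%/}"/*) return 0 ;;   # inside the working directory
  esac
  [ -n "$tmp" ] || return 1
  case "$dir/" in
    "${tmp%/}"/*) return 0 ;;   # inside the temp directory
  esac
  return 1
}
```

A caller might write `in_browse_scope ./docs/page.html && $B goto file://./docs/page.html`.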

### 13. Retina screenshots (deviceScaleFactor)
```bash
$B viewport 480x600 --scale 2    # 2x deviceScaleFactor
$B load-html /tmp/tweet.html     # or: $B goto file://./tweet.html
$B screenshot /tmp/out.png --selector .tweet-card
# → /tmp/out.png is 2x the pixel dimensions of the element
```
Scale must be 1-3 (gstack policy cap). Changing `--scale` recreates the browser context; refs from `snapshot` are invalidated (rerun `snapshot`), but `load-html` content is replayed automatically. Not supported in headed mode.
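
The size arithmetic is simple enough to check before taking the screenshot. The sketch below is a hypothetical helper, `scaled_size`, that multiplies an element's CSS-pixel dimensions by the scale and rejects values outside the 1-3 cap. It handles integer scales only; whether browse accepts fractional scales is not stated here.

```bash
# Hypothetical helper: expected PNG size for an element of W x H CSS pixels
# rendered at a given deviceScaleFactor. Integer scales 1-3 only.
scaled_size() {
  w=$1; h=$2; scale=$3
  case "$scale" in
    1|2|3) ;;
    *) echo "scale must be 1-3" >&2; return 1 ;;
  esac
  echo "$((w * scale))x$((h * scale))"
}

scaled_size 400 200 2   # prints 800x400
```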

## Puppeteer → browse cheatsheet

Migrating from Puppeteer? Here's the 1:1 mapping for the core workflow:

| Puppeteer | browse |
|---|---|
| `await page.goto(url)` | `$B goto <url>` |
| `await page.setContent(html)` | `$B load-html <file>` (or `$B goto file://<abs>`) |
| `await page.setViewport({width, height})` | `$B viewport WxH` |
| `await page.setViewport({width, height, deviceScaleFactor: 2})` | `$B viewport WxH --scale 2` |
| `await (await page.$('.x')).screenshot({path})` | `$B screenshot <path> --selector .x` |
| `await page.screenshot({fullPage: true, path})` | `$B screenshot <path>` (full page default) |
| `await page.screenshot({clip: {x, y, width, height}, path})` | `$B screenshot <path> --clip x,y,w,h` |

Worked example (the tweet-renderer flow — Puppeteer → browse):

```bash
# Generate HTML in memory, render at 2x scale, screenshot the tweet card.
# box-sizing:border-box keeps the card at 400x200 CSS px including its padding.
echo '<div class="tweet-card" style="box-sizing:border-box;width:400px;height:200px;background:#1da1f2;color:white;padding:20px">hello</div>' > /tmp/tweet.html
$B viewport 480x600 --scale 2
$B load-html /tmp/tweet.html
$B screenshot /tmp/out.png --selector .tweet-card
# /tmp/out.png is 800x400 px, crisp (2x deviceScaleFactor).
```

Aliases: typing `setcontent` or `set-content` routes to `load-html` automatically. A typo (`load-htm`) returns `Did you mean 'load-html'?`.
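
Alias routing of this kind amounts to a lookup before dispatch. The sketch below is illustrative only: `canonical_command` is a hypothetical name, it covers just the two aliases listed above, and it does not attempt the typo suggestion.

```bash
# Hypothetical sketch of alias routing: map an alias to its canonical
# command name and pass every other name through unchanged.
canonical_command() {
  case "$1" in
    setcontent|set-content) echo "load-html" ;;
    *) echo "$1" ;;
  esac
}

canonical_command set-content   # prints load-html
```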

## User Handoff

When you hit something you can't handle in headless mode (CAPTCHA, complex auth, multi-factor
login), hand off to the user:

```bash
# 1. Open a visible Chrome at the current page
$B handoff "Stuck on CAPTCHA at login page"

# 2. Tell the user what happened (via AskUserQuestion)
#    "I've opened Chrome at the login page. Please solve the CAPTCHA
#     and let me know when you're done."

# 3. When user says "done", re-snapshot and continue
$B resume
```

**When to use handoff:**
- CAPTCHAs or bot detection
- Multi-factor authentication (SMS, authenticator app)
- OAuth flows that require user interaction
- Complex interactions the AI can't handle after 3 attempts

The browser preserves all state (cookies, localStorage, tabs) across the handoff.
After `resume`, you get a fresh snapshot of wherever the user left off.

## Headed Mode + Proxy + Anti-Bot Sites

For sites that block headless browsers, fingerprint Playwright defaults, or require routing through an authenticated SOCKS5 proxy (residential VPN, etc.), browse exposes three coordinated flags:

```bash
# Headed mode — visible Chromium window. Auto-spawns Xvfb on Linux
# containers without DISPLAY (no extra setup needed on Debian/Ubuntu).
browse --headed goto https://example.com

# SOCKS5 with auth (Chromium can't prompt for SOCKS5 creds itself —
# browse runs a local 127.0.0.1 bridge that handles the auth handshake).
browse --proxy socks5://user:pass@residential.proxy.host:1080 goto https://example.com

# HTTP/HTTPS proxy (passes through to Chromium directly):
browse --proxy http://corp-proxy:3128 goto https://example.com

# Browser-triggered file download (Content-Disposition, redirect chain,
# anti-bot CDN — falls back from page.request.fetch() to browser native
# download handler):
browse download "https://protected.example.com/file" /tmp/file.bin --navigate

# Combined: headed + proxy + navigate-download
browse --headed --proxy socks5://user:pass@host:1080 \
  download "https://protected.example.com/file" /tmp/file.bin --navigate
```

**Credential policy.** Pass creds via either the URL (`socks5://user:pass@host`) OR the env vars `BROWSE_PROXY_USER` and `BROWSE_PROXY_PASS` — never both. Browse refuses with a clear hint when both are set, because silent override creates "works on my machine" debugging traps.
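
The either/or rule can be pictured as a small decision function. This is a sketch, not browse's code: `proxy_cred_source` is a hypothetical name, and it assumes that setting either env var counts as "env credentials" and that any userinfo before `@` in the URL authority counts as "URL credentials".

```bash
# Hypothetical sketch of the either/or rule (not browse's implementation).
# Prints which source supplies credentials, or fails when both do.
proxy_cred_source() {
  url=$1
  rest=${url#*://}          # drop the scheme
  authority=${rest%%/*}     # drop any path
  case "$authority" in
    *@*) in_url=1 ;;
    *)   in_url=0 ;;
  esac
  if [ -n "${BROWSE_PROXY_USER:-}" ] || [ -n "${BROWSE_PROXY_PASS:-}" ]; then
    in_env=1
  else
    in_env=0
  fi
  if [ "$in_url" = 1 ] && [ "$in_env" = 1 ]; then
    echo "credentials given in both the proxy URL and the env vars; use one" >&2
    return 1
  fi
  if [ "$in_url" = 1 ]; then echo url
  elif [ "$in_env" = 1 ]; then echo env
  else echo none
  fi
}
```

Refusing rather than choosing a winner keeps the outcome identical on every machine, whatever happens to be exported in the shell.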

**Daemon discipline.** Browse runs as a long-lived daemon. `--proxy` and `--headed` change daemon-startup config, so they only apply on a fresh daemon. If a daemon is already running with different config, browse refuses and tells you to `browse disconnect` first. No silent restart that would drop tab state, cookies, or logged-in sessions.
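
Refuse-on-mismatch reduces to a three-way comparison. The sketch below is hypothetical: `check_daemon_config` and the config strings (for example `headed=1 proxy=socks5://host:1080`) are illustrative, and it assumes any difference between the running and requested config counts as a mismatch.

```bash
# Hypothetical sketch of refuse-on-mismatch (not browse's implementation).
check_daemon_config() {
  running=$1     # config of the live daemon, empty if none is running
  requested=$2   # config implied by the flags on this invocation
  if [ -z "$running" ]; then
    echo "start"   # no daemon yet: start one with the requested config
  elif [ "$running" = "$requested" ]; then
    echo "reuse"   # same config: keep tabs, cookies, and sessions
  else
    echo "config mismatch: run 'browse disconnect' first" >&2
    return 1       # never restart silently
  fi
}
```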

**Stealth.** When `--headed` or `--proxy` is set, browse masks `navigator.webdriver` (the obvious automation tell) via Chromium's `--disable-blink-features=AutomationControlled` plus a small init script. We do NOT fake `navigator.plugins`, `navigator.languages`, or `window.chrome`: modern fingerprinters check those for consistency, and synthesizing fixed values can make the browser look MORE bot-like, not less.

**Container support.** `--headed` on Linux without `DISPLAY` automatically picks a free X display (`:99`, `:100`, ...) and spawns Xvfb. Cleanup on `browse disconnect` validates that the recorded PID's `/proc/<pid>/cmdline` matches `Xvfb` AND that its start time matches before sending any signal, so there are no PID-reuse footguns. Standard Debian/Ubuntu containers work out of the box; minimal images (alpine, distroless) may also need fonts/dbus/gtk libs for headed Chromium to render.
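
A rough picture of the two container mechanisms, display selection and PID validation, is sketched below. Both functions are hypothetical. The display search assumes the conventional X lock files (`/tmp/.X<n>-lock`) mark a display as taken, and the PID check covers only the `cmdline` half of the validation; the start-time comparison is omitted.

```bash
# Hypothetical sketch: pick the first free X display at or above :99 by
# looking for X lock files. LOCK_DIR is overridable for testing.
pick_free_display() {
  lock_dir=${LOCK_DIR:-/tmp}
  n=99
  while [ "$n" -lt 200 ]; do
    if [ ! -e "$lock_dir/.X$n-lock" ]; then
      echo ":$n"
      return 0
    fi
    n=$((n + 1))
  done
  return 1   # nothing free in the searched range
}

# Hypothetical sketch: succeed only if PID exists and its command line
# mentions Xvfb (Linux /proc only; start-time check not shown).
pid_is_xvfb() {
  pid=$1
  [ -r "/proc/$pid/cmdline" ] || return 1
  tr '\0' ' ' < "/proc/$pid/cmdline" | grep -q 'Xvfb'
}
```

Checking the command line before signalling matters because a recorded PID can be reassigned to an unrelated process after Xvfb exits.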

**Failure modes.** SOCKS5 upstream rejected or unreachable → fail-fast at startup with a redacted error after 3 retries (5s budget). Mid-stream upstream drop → browse kills the affected client connection only; no transport retries (which could corrupt browser traffic). Mismatched daemon config → exit 1 with a `browse disconnect` hint.
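
Two pieces of the startup failure path, redaction and the bounded retry, can be sketched in a few lines. Both helpers are hypothetical and not taken from browse. The retry loop reads "3 retries" as one initial attempt plus up to three more, stops early once 5 seconds have passed, and omits any backoff delay.

```bash
# Hypothetical sketch: strip credentials from a proxy URL before logging it.
redact_proxy_url() {
  printf '%s\n' "$1" | sed -E 's#(://)[^/@]*@#\1***@#'
}

# Hypothetical sketch: run a command, retrying up to 3 times within a
# 5-second budget. Returns 0 on the first success, 1 otherwise.
retry_with_budget() {
  deadline=$(( $(date +%s) + 5 ))
  tries=0
  while :; do
    "$@" && return 0
    tries=$((tries + 1))
    [ "$tries" -gt 3 ] && return 1                  # initial try + 3 retries used
    [ "$(date +%s)" -ge "$deadline" ] && return 1   # 5s budget spent
  done
}

redact_proxy_url socks5://user:pass@host:1080   # prints socks5://***@host:1080
```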

## Snapshot Flags

{{SNAPSHOT_FLAGS}}

## CSS Inspector & Style Modification

### Inspect element CSS
```bash
$B inspect .header               # full CSS cascade for selector
$B inspect                       # latest picked element from sidebar
$B inspect --all                 # include user-agent stylesheet rules
$B inspect --history             # show modification history
```

### Modify styles live
```bash
$B style .header background-color #1a1a1a   # modify CSS property
$B style --undo                             # revert last change
$B style --undo 2                           # revert specific change
```

### Clean screenshots
```bash
$B cleanup --all                 # remove ads, cookies, sticky, social
$B cleanup --ads --cookies       # selective cleanup
$B prettyscreenshot --cleanup --scroll-to ".pricing" --width 1440 ~/Desktop/hero.png
```

## Full Command List

{{COMMAND_REFERENCE}}