mirror of https://github.com/garrytan/gstack.git, synced 2026-05-11 06:57:25 +08:00
docs: v1.28.0.0 — browse SKILL section + VERSION + CHANGELOG
VERSION 1.27.1.0 → 1.28.0.0 (MINOR: substantial new capability, with five new flags/features, ~600 LOC added, a new socks dep, and multiple new modules).

browse/SKILL.md.tmpl: new "Headed Mode + Proxy + Anti-Bot Sites" section between User Handoff and Snapshot Flags. Documents --headed (auto-Xvfb on Linux), --proxy (with embedded SOCKS5 bridge for auth), download --navigate, the cred-mixing policy, daemon-discipline (refuse-on-mismatch), the narrowed webdriver-only stealth, container support caveats, and the fail-fast/no-retry failure modes.

CHANGELOG entry follows the release-summary format from CLAUDE.md: two-line headline, lead paragraph, "The numbers that matter" table tied to specific test files that prove each capability, "What this means for AI agents" closing tied to a real workflow shift, then itemized Added/Changed/Fixed/For-contributors sections.

Browse SKILL.md regenerated via bun run gen:skill-docs. gstack/llms.txt regenerated automatically from the same pipeline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
116
CHANGELOG.md
@@ -1,5 +1,121 @@
# Changelog

## [1.28.0.0] - 2026-05-07

## **Browse handles real-world automation now: SOCKS5 with auth, container Xvfb, browser-native downloads. Plus a single-file `llms.txt` index agents can crawl in one read.**

Five capabilities ship in one PR. Browse picks up `--proxy` (with an
embedded SOCKS5 bridge so Chromium can speak to authenticated
upstreams it can't speak to natively), `--headed` (auto-spawns Xvfb
on Linux containers without DISPLAY), and `download --navigate` (uses
the browser's native download handler for Content-Disposition,
multi-hop CDN redirects, and anti-bot CDN chains where
`page.request.fetch()` falls over). Stealth is narrowed to
`navigator.webdriver` masking only: modern fingerprinters punish
inconsistent fakes, so faking plugins/languages was making
detection easier, not harder. And `gstack/llms.txt` is now
auto-generated from the same source as every SKILL.md, so any agent
that reads `llms.txt` boots into the full surface (47 skills, 75
browse commands) in one fetch.

### The numbers that matter

End-to-end verified via `bun test browse/test/{socks-bridge,proxy-config,proxy-redact,xvfb,stealth-webdriver,bridge-chromium-e2e}.test.ts test/llms-txt-shape.test.ts`:

| Surface | Before | After | Δ |
|---|---|---|---|
| `browse --proxy` (SOCKS5 with auth) | not supported | works end-to-end | new capability |
| `browse --headed` on Linux without DISPLAY | not supported | auto-Xvfb on first free display | new capability |
| `download --navigate` (browser-native) | only `page.request.fetch()` | added native download path | new capability |
| `gstack/llms.txt` index for agents | none | 47 skills + 75 commands in 11KB | new capability |
| Bridge PID validation defenses | n/a | both `/proc/<pid>/cmdline` AND start-time | full safety |
| Tests covering proxy + headed + navigate | 0 | 70+ tests across 7 files | from zero to comprehensive |

The `bridge-chromium-e2e.test.ts` test is the one that proves the feature
actually works: real Chromium launches with `proxy.server =
socks5://127.0.0.1:<bridgePort>`, navigates to a local HTTP fixture,
and we assert the auth upstream's connect counter and the HTTP
fixture's hit counter both increment. Without that test we could
ship a working byte-relay and a broken Chromium integration and never
notice.

### What this means for AI agents

Agents can now reach sites that browse could not handle before.
DDoS-Guard'd CDN
behind an auth-required residential SOCKS5 → `browse --proxy
socks5://user:pass@host:1080 --headed download <url> /tmp/file
--navigate` and the file lands. Linux container without DISPLAY →
`--headed` auto-spawns Xvfb, no manual setup. The `llms.txt` index
makes discovery a one-fetch operation: agents stop scanning 47
SKILL.md files and start with the right skill on the first try.

### Itemized changes

#### Added
- `browse --proxy <url>` flag. Supports SOCKS5 with username/password
  auth, HTTP, and HTTPS. SOCKS5+auth runs through an embedded local
  bridge (`browse/src/socks-bridge.ts`, ~250 LOC) bound to 127.0.0.1
  on an ephemeral port. The bridge handles the SOCKS5 auth handshake
  so Chromium (which can't prompt for SOCKS5 creds) can still use
  authenticated upstreams.
- Pre-flight `testUpstream()` runs before Chromium launches: 5s total
  budget, 3 retries with 500ms backoff (handles VPN warm-up race).
  On failure, exits 1 with a redacted error message, so there is no
  confusing "connection refused" on first navigation.
- `browse --headed` flag with auto-Xvfb on Linux. Walks the display
  range (`:99`, `:100`, ...) until `xdpyinfo` says free; never
  hardcodes `:99` and never unlinks `/tmp/.X<n>-lock` for displays
  it didn't create. Xvfb child PID + start-time + display recorded
  in `~/.gstack/browse.json` so cleanup-on-disconnect can validate
  ownership before signaling. Skips spawn when `WAYLAND_DISPLAY` is
  set (Chromium uses Wayland natively).
- `download --navigate` flag (community PR #1355, attribution preserved).
  Uses `page.waitForEvent('download')` and `page.goto(url, {
  waitUntil: 'commit' })` instead of `page.request.fetch()`.
  Required for sites where the download is triggered by browser
  navigation (Content-Disposition headers, redirect chains, anti-bot
  CDNs).
- `gstack/llms.txt` auto-generated from skill frontmatter and the
  browse `COMMAND_DESCRIPTIONS` registry. Regenerates on every
  `bun run gen:skill-docs`. Strict mode (used in tests) refuses any
  skill missing `name` or `description` in its frontmatter.

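
The display walk described in the `--headed` item above can be sketched as follows. This is only an illustration, not the code in `browse/src/xvfb.ts`: the function name `pickFreeDisplay`, the injected `isFree` probe, and the cap of 20 candidates are assumptions made for the example. In a real run the probe would presumably wrap an `xdpyinfo` call; it is injected here so the logic can be exercised without an X server.

```typescript
// Hypothetical sketch of walking X display numbers until a free one is found.
// `isFree` stands in for an `xdpyinfo`-based probe (an assumption for this example).
type DisplayProbe = (display: string) => boolean;

function pickFreeDisplay(
  isFree: DisplayProbe,
  start = 99, // first candidate, matching the `:99` mentioned above
  maxCandidates = 20, // assumed cap so the walk always terminates
): string | null {
  for (let n = start; n < start + maxCandidates; n++) {
    const display = `:${n}`;
    if (isFree(display)) return display;
  }
  return null; // caller decides how to fail when every candidate is taken
}

// Usage: pretend :99 and :100 are already owned by other X servers.
const taken = new Set([":99", ":100"]);
const chosen = pickFreeDisplay((d) => !taken.has(d));
console.log(chosen); // :101
```

Returning `null` instead of falling back to `:99` keeps the "never hardcodes `:99`" property: the caller has to handle the no-free-display case explicitly.
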
#### Changed
- Stealth narrowed to `navigator.webdriver` masking only. The
  pre-existing `launchHeaded` patches that faked `navigator.plugins`
  and `navigator.languages` were removed because modern
  fingerprinters check those for consistency with `userAgent`/
  `platform`, and synthesized fixed values can make the browser look
  MORE bot-like, not less. The cdc_/__webdriver runtime cleanup and
  Permissions API patch are kept: those remove ChromeDriver-injected
  artifacts rather than synthesize natural-browser values.
- Browse daemon refuses to silently restart on `--proxy`/`--headed`
  flag mismatch. Existing daemon with config A + new invocation with
  config B → exits 1 with a `browse disconnect` hint. No silent
  state loss.
- Cred policy: passing creds in BOTH the URL and `BROWSE_PROXY_USER`/
  `BROWSE_PROXY_PASS` env vars now fails fast with a clear error.
  Silent override was a debugging trap.

#### Fixed
- N/A: all-new code paths.

#### For contributors
- New module boundary: `browse/src/socks-bridge.ts`,
  `browse/src/proxy-config.ts`, `browse/src/proxy-redact.ts`,
  `browse/src/xvfb.ts`, `browse/src/stealth.ts`. Each is small,
  testable in isolation, and has matching `*.test.ts` coverage.
- 70+ new tests across 7 files. The `bridge-chromium-e2e.test.ts`
  test launches real Chromium through the bridge and asserts the
  request actually traversed it (upstream connect counter + HTTP
  fixture hit counter both increment).
- `socks` npm dependency added (~30KB).
- Xvfb + x11-utils added to `.github/docker/Dockerfile.ci` so
  `headed-xvfb`/`headed-orphan-cleanup` exercise the Linux container
  path on every CI run instead of only manual smoke tests.
- Community PR #1355 from @garrytan-agents merged; attribution
  preserved on the merging commit.

## [1.27.1.0] - 2026-05-06

## **Plan-mode reviews now refuse to dump findings without asking. Four gate-tier tests catch the regression on every PR.**

2
SKILL.md
@@ -862,7 +862,7 @@ Refs are invalidated on navigation — run `snapshot` again after `goto`.
| Command | Description |
|---------|-------------|
| `archive [path]` | Save complete page as MHTML via CDP |
-| `download <url|@ref> [path] [--base64]` | Download URL or media element to disk using browser cookies |
+| `download <url|@ref> [path] [--base64] [--navigate]` | Download URL or media element to disk using browser cookies. Use --navigate for URLs that trigger browser downloads (CDN redirects, Content-Disposition, anti-bot protected sites) |
| `scrape <images|videos|media> [--selector sel] [--dir path] [--limit N]` | Bulk download all media from page. Writes manifest.json |

### Interaction

@@ -679,6 +679,42 @@ $B resume
The browser preserves all state (cookies, localStorage, tabs) across the handoff.
After `resume`, you get a fresh snapshot of wherever the user left off.

## Headed Mode + Proxy + Anti-Bot Sites

For sites that block headless browsers, fingerprint Playwright defaults, or require routing through an authenticated SOCKS5 proxy (residential VPN, etc.), browse exposes three coordinated flags:

```bash
# Headed mode: visible Chromium window. Auto-spawns Xvfb on Linux
# containers without DISPLAY (no extra setup needed on Debian/Ubuntu).
browse --headed goto https://example.com

# SOCKS5 with auth (Chromium can't prompt for SOCKS5 creds itself, so
# browse runs a local 127.0.0.1 bridge that handles the auth handshake).
browse --proxy socks5://user:pass@residential.proxy.host:1080 goto https://example.com

# HTTP/HTTPS proxy (passes through to Chromium directly):
browse --proxy http://corp-proxy:3128 goto https://example.com

# Browser-triggered file download (Content-Disposition, redirect chain,
# anti-bot CDN): --navigate uses the browser's native download handler
# instead of page.request.fetch().
browse download "https://protected.example.com/file" /tmp/file.bin --navigate

# Combined: headed + proxy + navigate-download
browse --headed --proxy socks5://user:pass@host:1080 \
  download "https://protected.example.com/file" /tmp/file.bin --navigate
```

**Credential policy.** Pass creds via either the URL (`socks5://user:pass@host`) OR the env vars `BROWSE_PROXY_USER` and `BROWSE_PROXY_PASS`, never both. Browse refuses with a clear hint when both are set, because silent override creates "works on my machine" debugging traps.

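
A minimal sketch of the "URL or env vars, never both" rule, assuming nothing about how `browse/src/proxy-config.ts` is actually written. The function name `resolveProxyCreds`, the `ProxyCreds` shape, and the error wording are hypothetical; only the two env var names come from the text above.

```typescript
// Hypothetical sketch of the credential-mixing check (not the real implementation).
interface ProxyCreds {
  username: string;
  password: string;
}

function resolveProxyCreds(
  proxyUrl: string,
  env: Record<string, string | undefined>,
): ProxyCreds | null {
  const url = new URL(proxyUrl);
  const inUrl = url.username !== "" || url.password !== "";
  const inEnv =
    env.BROWSE_PROXY_USER !== undefined || env.BROWSE_PROXY_PASS !== undefined;

  if (inUrl && inEnv) {
    // Fail fast instead of silently preferring one source over the other.
    throw new Error(
      "Proxy credentials found in both the URL and BROWSE_PROXY_USER/BROWSE_PROXY_PASS; use only one",
    );
  }
  if (inUrl) {
    // URL components are percent-encoded, so decode before use.
    return {
      username: decodeURIComponent(url.username),
      password: decodeURIComponent(url.password),
    };
  }
  if (inEnv) {
    return {
      username: env.BROWSE_PROXY_USER ?? "",
      password: env.BROWSE_PROXY_PASS ?? "",
    };
  }
  return null; // unauthenticated proxy
}
```

Passing the environment in as a parameter, rather than reading `process.env` inside the function, keeps the check easy to test with fixed inputs.
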
**Daemon discipline.** Browse runs as a long-lived daemon. `--proxy` and `--headed` change daemon-startup config, so they only apply on a fresh daemon. If a daemon is already running with different config, browse refuses and tells you to `browse disconnect` first. No silent restart that would drop tab state, cookies, or logged-in sessions.

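
The refuse-on-mismatch behavior could look roughly like the sketch below. The `DaemonConfig` shape, the `checkDaemonConfig` name, and the hint wording are assumptions for illustration. The sketch also assumes that any difference in either flag counts as a mismatch, which the text above does not spell out.

```typescript
// Hypothetical sketch of refuse-on-mismatch for daemon startup config.
interface DaemonConfig {
  headed: boolean;
  proxy: string | null;
}

type Decision =
  | { action: "reuse" }
  | { action: "refuse"; exitCode: 1; hint: string };

function checkDaemonConfig(running: DaemonConfig, requested: DaemonConfig): Decision {
  const mismatched: string[] = [];
  if (running.headed !== requested.headed) mismatched.push("--headed");
  if (running.proxy !== requested.proxy) mismatched.push("--proxy");

  if (mismatched.length === 0) return { action: "reuse" };
  return {
    action: "refuse",
    exitCode: 1,
    hint:
      `daemon is already running with a different ${mismatched.join(" and ")} setting; ` +
      "run `browse disconnect` first",
  };
}
```

The function only reports a decision and never restarts anything itself, which mirrors the "no silent restart" rule: the user has to disconnect explicitly.
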
**Stealth.** When `--headed` or `--proxy` is set, browse masks `navigator.webdriver` (the obvious automation tell) via Chromium's `--disable-blink-features=AutomationControlled` plus a small init script. We do NOT fake `navigator.plugins`, `navigator.languages`, or `window.chrome`. Modern fingerprinters check those for consistency, and synthesizing fixed values can make the browser look MORE bot-like, not less.

**Container support.** `--headed` on Linux without `DISPLAY` automatically picks a free X display (`:99`, `:100`, ...) and spawns Xvfb. Cleanup on `browse disconnect` validates the recorded PID's `/proc/<pid>/cmdline` matches `Xvfb` AND start-time matches before sending any signal, so there are no PID-reuse footguns. Standard Debian/Ubuntu containers work out of the box; minimal images (alpine, distroless) may also need fonts/dbus/gtk libs for headed Chromium to render.

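
The two-part ownership check (command line AND start-time) can be illustrated with a small pure function. This is a sketch under stated assumptions, not the code in `browse/src/xvfb.ts`: `ownsXvfb`, the two interfaces, and the injected `inspect` reader are hypothetical names, and start-time is treated as an opaque string.

```typescript
// Hypothetical sketch of validating ownership before signaling a recorded PID.
interface RecordedProcess {
  pid: number;
  startTime: string; // opaque start-time value saved when Xvfb was spawned
}

interface ObservedProcess {
  cmdline: string; // raw contents of /proc/<pid>/cmdline
  startTime: string;
}

// `inspect` stands in for reading /proc; it returns null when the PID is gone.
function ownsXvfb(
  recorded: RecordedProcess,
  inspect: (pid: number) => ObservedProcess | null,
): boolean {
  const observed = inspect(recorded.pid);
  if (observed === null) return false; // process already exited

  // /proc/<pid>/cmdline separates arguments with NUL bytes; argv[0] is the binary.
  const argv0 = observed.cmdline.split("\0")[0] ?? "";
  const isXvfb = argv0 === "Xvfb" || argv0.endsWith("/Xvfb");

  // Both checks must pass: a recycled PID can fail either one.
  return isXvfb && observed.startTime === recorded.startTime;
}
```

Checking start-time in addition to the command line covers the case where the original Xvfb exited and an unrelated Xvfb was later assigned the same PID.
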
**Failure modes.** SOCKS5 upstream rejected or unreachable → fail-fast at startup with a redacted error after 3 retries (5s budget). Mid-stream upstream drop → browse kills the affected client connection only; no transport retries (which could corrupt browser traffic). Mismatched daemon config → exit 1 with a `browse disconnect` hint.

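
To make "redacted error" concrete, here is one way credentials could be stripped from a proxy URL before it reaches an error message. The exact format browse prints is not shown above, so the `***` placeholder and the names `redactProxyUrl` and `redactMessage` are assumptions, not the behavior of `browse/src/proxy-redact.ts`.

```typescript
// Hypothetical sketch: remove credentials from a proxy URL before logging it.
function redactProxyUrl(raw: string): string {
  try {
    const url = new URL(raw);
    if (url.username === "" && url.password === "") return raw; // nothing to hide
    const port = url.port ? `:${url.port}` : "";
    return `${url.protocol}//***@${url.hostname}${port}`;
  } catch {
    // Not parseable as a URL: fall back to masking a "//user:pass@" prefix.
    return raw.replace(/\/\/[^/@\s]*@/, "//***@");
  }
}

// Replace every occurrence of the raw proxy URL inside an error message.
function redactMessage(message: string, proxyUrl: string): string {
  return message.split(proxyUrl).join(redactProxyUrl(proxyUrl));
}

console.log(redactProxyUrl("socks5://user:pass@host:1080")); // socks5://***@host:1080
```

Masking the whole userinfo part, rather than only the password, avoids leaking the account name of a paid proxy in shared logs.
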
## Snapshot Flags

The snapshot is your primary tool for understanding and interacting with pages.
@@ -786,7 +822,7 @@ $B prettyscreenshot --cleanup --scroll-to ".pricing" --width 1440 ~/Desktop/hero
| Command | Description |
|---------|-------------|
| `archive [path]` | Save complete page as MHTML via CDP |
-| `download <url|@ref> [path] [--base64]` | Download URL or media element to disk using browser cookies |
+| `download <url|@ref> [path] [--base64] [--navigate]` | Download URL or media element to disk using browser cookies. Use --navigate for URLs that trigger browser downloads (CDN redirects, Content-Disposition, anti-bot protected sites) |
| `scrape <images|videos|media> [--selector sel] [--dir path] [--limit N]` | Bulk download all media from page. Writes manifest.json |

### Interaction

@@ -188,6 +188,42 @@ $B resume
The browser preserves all state (cookies, localStorage, tabs) across the handoff.
After `resume`, you get a fresh snapshot of wherever the user left off.

## Headed Mode + Proxy + Anti-Bot Sites

For sites that block headless browsers, fingerprint Playwright defaults, or require routing through an authenticated SOCKS5 proxy (residential VPN, etc.), browse exposes three coordinated flags:

```bash
# Headed mode: visible Chromium window. Auto-spawns Xvfb on Linux
# containers without DISPLAY (no extra setup needed on Debian/Ubuntu).
browse --headed goto https://example.com

# SOCKS5 with auth (Chromium can't prompt for SOCKS5 creds itself, so
# browse runs a local 127.0.0.1 bridge that handles the auth handshake).
browse --proxy socks5://user:pass@residential.proxy.host:1080 goto https://example.com

# HTTP/HTTPS proxy (passes through to Chromium directly):
browse --proxy http://corp-proxy:3128 goto https://example.com

# Browser-triggered file download (Content-Disposition, redirect chain,
# anti-bot CDN): --navigate uses the browser's native download handler
# instead of page.request.fetch().
browse download "https://protected.example.com/file" /tmp/file.bin --navigate

# Combined: headed + proxy + navigate-download
browse --headed --proxy socks5://user:pass@host:1080 \
  download "https://protected.example.com/file" /tmp/file.bin --navigate
```

**Credential policy.** Pass creds via either the URL (`socks5://user:pass@host`) OR the env vars `BROWSE_PROXY_USER` and `BROWSE_PROXY_PASS`, never both. Browse refuses with a clear hint when both are set, because silent override creates "works on my machine" debugging traps.

**Daemon discipline.** Browse runs as a long-lived daemon. `--proxy` and `--headed` change daemon-startup config, so they only apply on a fresh daemon. If a daemon is already running with different config, browse refuses and tells you to `browse disconnect` first. No silent restart that would drop tab state, cookies, or logged-in sessions.

**Stealth.** When `--headed` or `--proxy` is set, browse masks `navigator.webdriver` (the obvious automation tell) via Chromium's `--disable-blink-features=AutomationControlled` plus a small init script. We do NOT fake `navigator.plugins`, `navigator.languages`, or `window.chrome`. Modern fingerprinters check those for consistency, and synthesizing fixed values can make the browser look MORE bot-like, not less.

**Container support.** `--headed` on Linux without `DISPLAY` automatically picks a free X display (`:99`, `:100`, ...) and spawns Xvfb. Cleanup on `browse disconnect` validates the recorded PID's `/proc/<pid>/cmdline` matches `Xvfb` AND start-time matches before sending any signal, so there are no PID-reuse footguns. Standard Debian/Ubuntu containers work out of the box; minimal images (alpine, distroless) may also need fonts/dbus/gtk libs for headed Chromium to render.

**Failure modes.** SOCKS5 upstream rejected or unreachable → fail-fast at startup with a redacted error after 3 retries (5s budget). Mid-stream upstream drop → browse kills the affected client connection only; no transport retries (which could corrupt browser traffic). Mismatched daemon config → exit 1 with a `browse disconnect` hint.

## Snapshot Flags

{{SNAPSHOT_FLAGS}}