diff --git a/CHANGELOG.md b/CHANGELOG.md index 6cccadb9..9eeb30a1 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,121 @@ # Changelog +## [1.28.0.0] - 2026-05-07 + +## **Browse handles real-world automation now: SOCKS5 with auth, container Xvfb, browser-native downloads. Plus a single-file `llms.txt` index agents can crawl in one read.** + +Five capabilities ship in one PR. Browse picks up `--proxy` (with an +embedded SOCKS5 bridge so Chromium can speak to authenticated +upstreams it can't speak to natively), `--headed` (auto-spawns Xvfb +on Linux containers without DISPLAY), and `download --navigate` (uses +the browser's native download handler for Content-Disposition, +multi-hop CDN redirects, and anti-bot CDN chains where +`page.request.fetch()` falls over). Stealth is narrowed to +`navigator.webdriver` masking only — modern fingerprinters punish +inconsistent fakes, so faking plugins/languages was making +detection easier, not harder. And `gstack/llms.txt` is now +auto-generated from the same source as every SKILL.md, so any agent +that reads `llms.txt` boots into the full surface (47 skills, 75 +browse commands) in one fetch. + +### The numbers that matter + +End-to-end verified via `bun test browse/test/{socks-bridge,proxy-config,proxy-redact,xvfb,stealth-webdriver,bridge-chromium-e2e}.test.ts test/llms-txt-shape.test.ts`: + +| Surface | Before | After | Δ | +|---|---|---|---| +| `browse --proxy` (SOCKS5 with auth) | not supported | works end-to-end | new capability | +| `browse --headed` on Linux without DISPLAY | not supported | auto-Xvfb on first free display | new capability | +| `download --navigate` (browser-native) | only `page.request.fetch()` | added native download path | new capability | +| `gstack/llms.txt` index for agents | none | 47 skills + 75 commands in 11KB | new capability | +| Bridge PID validation defenses | n/a | both `/proc//cmdline` AND start-time | full safety | +| Tests covering proxy + headed + navigate | 0 | 70+ tests across 7 files | from zero to comprehensive | + +The `bridge-chromium-e2e.test.ts` is the one that proves the feature +actually works: real Chromium launches with `proxy.server = +socks5://127.0.0.1:`, navigates to a local HTTP fixture, +and we assert the auth upstream's connect counter and the HTTP +fixture's hit counter both increment. Without that test we could +ship a working byte-relay and a broken Chromium integration and never +notice. + +### What this means for AI agents + +Any agent on any project can now hit any site. DDoS-Guard'd CDN +behind an auth-required residential SOCKS5 → `browse --proxy +socks5://user:pass@host:1080 --headed download /tmp/file +--navigate` and the file lands. Linux container without DISPLAY → +`--headed` auto-spawns Xvfb, no manual setup. The `llms.txt` index +makes discovery a one-fetch operation: agents stop scanning 47 +SKILL.md files and start with the right skill on the first try. + +### Itemized changes + +#### Added +- `browse --proxy ` flag. Supports SOCKS5 with username/password + auth, HTTP, and HTTPS. SOCKS5+auth runs through an embedded local + bridge (`browse/src/socks-bridge.ts`, ~250 LOC) bound to 127.0.0.1 + on an ephemeral port. The bridge handles the SOCKS5 auth handshake + so Chromium (which can't prompt for SOCKS5 creds) can still use + authenticated upstreams. +- Pre-flight `testUpstream()` runs before Chromium launches: 5s total + budget, 3 retries with 500ms backoff (handles VPN warm-up race). + On failure, exits 1 with a redacted error message — no confusing + "connection refused" on first navigation. +- `browse --headed` flag with auto-Xvfb on Linux. Walks the display + range (`:99`, `:100`, ...) until `xdpyinfo` says free; never + hardcodes `:99` and never unlinks `/tmp/.X-lock` for displays + it didn't create. Xvfb child PID + start-time + display recorded + in `~/.gstack/browse.json` so cleanup-on-disconnect can validate + ownership before signaling. Skips spawn when `WAYLAND_DISPLAY` is + set (Chromium uses Wayland natively). +- `download --navigate` flag (community PR #1355, attribution preserved). + Uses `page.waitForEvent('download')` and `page.goto(url, { + waitUntil: 'commit' })` instead of `page.request.fetch()`. + Required for sites where the download is triggered by browser + navigation (Content-Disposition headers, redirect chains, anti-bot + CDNs). +- `gstack/llms.txt` auto-generated from skill frontmatter and the + browse `COMMAND_DESCRIPTIONS` registry. Regenerates on every + `bun run gen:skill-docs`. Strict mode (used in tests) refuses any + skill missing `name` or `description` in its frontmatter. + +#### Changed +- Stealth narrowed to `navigator.webdriver` masking only. The + pre-existing `launchHeaded` patches that faked `navigator.plugins` + and `navigator.languages` were removed because modern + fingerprinters check those for consistency with `userAgent`/ + `platform`, and synthesized fixed values can flag MORE bot-like, + not less. The cdc_/__webdriver runtime cleanup and Permissions API + patch are kept — those remove ChromeDriver-injected artifacts + rather than synthesize natural-browser values. +- Browse daemon refuses to silently restart on `--proxy`/`--headed` + flag mismatch. Existing daemon with config A + new invocation with + config B → exits 1 with a `browse disconnect` hint. No silent + state loss. +- Cred policy: passing creds in BOTH the URL and `BROWSE_PROXY_USER`/ + `BROWSE_PROXY_PASS` env vars now fails fast with a clear error. + Silent override was a debugging trap. + +#### Fixed +- N/A — all-new code paths. + +#### For contributors +- New module boundary: `browse/src/socks-bridge.ts`, + `browse/src/proxy-config.ts`, `browse/src/proxy-redact.ts`, + `browse/src/xvfb.ts`, `browse/src/stealth.ts`. Each is small, + testable in isolation, and has matching `*.test.ts` coverage. +- 70+ new tests across 7 files. The `bridge-chromium-e2e.test.ts` + test launches real Chromium through the bridge and asserts the + request actually traversed it (upstream connect counter + HTTP + fixture hit counter both increment). +- `socks` npm dependency added (~30KB). +- Xvfb + x11-utils added to `.github/docker/Dockerfile.ci` so + `headed-xvfb`/`headed-orphan-cleanup` exercise the Linux container + path on every CI run instead of only manual smoke tests. +- Community PR #1355 from @garrytan-agents merged; attribution + preserved on the merging commit. + ## [1.27.1.0] - 2026-05-06 ## **Plan-mode reviews now refuse to dump findings without asking. Four gate-tier tests catch the regression on every PR.** diff --git a/SKILL.md b/SKILL.md index ddeee4be..c9070438 100644 --- a/SKILL.md +++ b/SKILL.md @@ -862,7 +862,7 @@ Refs are invalidated on navigation — run `snapshot` again after `goto`. | Command | Description | |---------|-------------| | `archive [path]` | Save complete page as MHTML via CDP | -| `download [path] [--base64]` | Download URL or media element to disk using browser cookies | +| `download [path] [--base64] [--navigate]` | Download URL or media element to disk using browser cookies. Use --navigate for URLs that trigger browser downloads (CDN redirects, Content-Disposition, anti-bot protected sites) | | `scrape [--selector sel] [--dir path] [--limit N]` | Bulk download all media from page. Writes manifest.json | ### Interaction diff --git a/VERSION b/VERSION index a1f241e2..06513fc2 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -1.27.1.0 +1.28.0.0 diff --git a/browse/SKILL.md b/browse/SKILL.md index 7ebc3c62..fff54bcc 100644 --- a/browse/SKILL.md +++ b/browse/SKILL.md @@ -679,6 +679,42 @@ $B resume The browser preserves all state (cookies, localStorage, tabs) across the handoff. After `resume`, you get a fresh snapshot of wherever the user left off. +## Headed Mode + Proxy + Anti-Bot Sites + +For sites that block headless browsers, fingerprint Playwright defaults, or require routing through an authenticated SOCKS5 proxy (residential VPN, etc.), browse exposes three coordinated flags: + +```bash +# Headed mode — visible Chromium window. Auto-spawns Xvfb on Linux +# containers without DISPLAY (no extra setup needed on Debian/Ubuntu). +browse --headed goto https://example.com + +# SOCKS5 with auth (Chromium can't prompt for SOCKS5 creds itself — +# browse runs a local 127.0.0.1 bridge that handles the auth handshake). +browse --proxy socks5://user:pass@residential.proxy.host:1080 goto https://example.com + +# HTTP/HTTPS proxy (passes through to Chromium directly): +browse --proxy http://corp-proxy:3128 goto https://example.com + +# Browser-triggered file download (Content-Disposition, redirect chain, +# anti-bot CDN — falls back from page.request.fetch() to browser native +# download handler): +browse download "https://protected.example.com/file" /tmp/file.bin --navigate + +# Combined: headed + proxy + navigate-download +browse --headed --proxy socks5://user:pass@host:1080 \ + download "https://protected.example.com/file" /tmp/file.bin --navigate +``` + +**Credential policy.** Pass creds via either the URL (`socks5://user:pass@host`) OR the env vars `BROWSE_PROXY_USER` and `BROWSE_PROXY_PASS` — never both. Browse refuses with a clear hint when both are set, because silent override creates "works on my machine" debugging traps. + +**Daemon discipline.** Browse runs as a long-lived daemon. `--proxy` and `--headed` change daemon-startup config, so they only apply on a fresh daemon. If a daemon is already running with different config, browse refuses and tells you to `browse disconnect` first. No silent restart that would drop tab state, cookies, or logged-in sessions. + +**Stealth.** When `--headed` or `--proxy` are set, browse masks `navigator.webdriver` (the obvious automation tell) via Chromium's `--disable-blink-features=AutomationControlled` plus a small init script. We do NOT fake `navigator.plugins`, `navigator.languages`, or `window.chrome` — modern fingerprinters check those for consistency, and synthesizing fixed values can flag MORE bot-like, not less. + +**Container support.** `--headed` on Linux without `DISPLAY` automatically picks a free X display (`:99`, `:100`, ...) and spawns Xvfb. Cleanup on `browse disconnect` validates the recorded PID's `/proc//cmdline` matches `Xvfb` AND start-time matches before sending any signal — no PID-reuse footguns. Standard Debian/Ubuntu containers work out of the box; minimal images (alpine, distroless) may also need fonts/dbus/gtk libs for headed Chromium to render. + +**Failure modes.** SOCKS5 upstream rejected or unreachable → fail-fast at startup with a redacted error after 3 retries (5s budget). Mid-stream upstream drop → browse kills the affected client connection only; no transport retries (which could corrupt browser traffic). Mismatched daemon config → exit 1 with a `browse disconnect` hint. + ## Snapshot Flags The snapshot is your primary tool for understanding and interacting with pages. @@ -786,7 +822,7 @@ $B prettyscreenshot --cleanup --scroll-to ".pricing" --width 1440 ~/Desktop/hero | Command | Description | |---------|-------------| | `archive [path]` | Save complete page as MHTML via CDP | -| `download [path] [--base64]` | Download URL or media element to disk using browser cookies | +| `download [path] [--base64] [--navigate]` | Download URL or media element to disk using browser cookies. Use --navigate for URLs that trigger browser downloads (CDN redirects, Content-Disposition, anti-bot protected sites) | | `scrape [--selector sel] [--dir path] [--limit N]` | Bulk download all media from page. Writes manifest.json | ### Interaction diff --git a/browse/SKILL.md.tmpl b/browse/SKILL.md.tmpl index ec4fcad7..a466fc44 100644 --- a/browse/SKILL.md.tmpl +++ b/browse/SKILL.md.tmpl @@ -188,6 +188,42 @@ $B resume The browser preserves all state (cookies, localStorage, tabs) across the handoff. After `resume`, you get a fresh snapshot of wherever the user left off. +## Headed Mode + Proxy + Anti-Bot Sites + +For sites that block headless browsers, fingerprint Playwright defaults, or require routing through an authenticated SOCKS5 proxy (residential VPN, etc.), browse exposes three coordinated flags: + +```bash +# Headed mode — visible Chromium window. Auto-spawns Xvfb on Linux +# containers without DISPLAY (no extra setup needed on Debian/Ubuntu). +browse --headed goto https://example.com + +# SOCKS5 with auth (Chromium can't prompt for SOCKS5 creds itself — +# browse runs a local 127.0.0.1 bridge that handles the auth handshake). +browse --proxy socks5://user:pass@residential.proxy.host:1080 goto https://example.com + +# HTTP/HTTPS proxy (passes through to Chromium directly): +browse --proxy http://corp-proxy:3128 goto https://example.com + +# Browser-triggered file download (Content-Disposition, redirect chain, +# anti-bot CDN — falls back from page.request.fetch() to browser native +# download handler): +browse download "https://protected.example.com/file" /tmp/file.bin --navigate + +# Combined: headed + proxy + navigate-download +browse --headed --proxy socks5://user:pass@host:1080 \ + download "https://protected.example.com/file" /tmp/file.bin --navigate +``` + +**Credential policy.** Pass creds via either the URL (`socks5://user:pass@host`) OR the env vars `BROWSE_PROXY_USER` and `BROWSE_PROXY_PASS` — never both. Browse refuses with a clear hint when both are set, because silent override creates "works on my machine" debugging traps. + +**Daemon discipline.** Browse runs as a long-lived daemon. `--proxy` and `--headed` change daemon-startup config, so they only apply on a fresh daemon. If a daemon is already running with different config, browse refuses and tells you to `browse disconnect` first. No silent restart that would drop tab state, cookies, or logged-in sessions. + +**Stealth.** When `--headed` or `--proxy` are set, browse masks `navigator.webdriver` (the obvious automation tell) via Chromium's `--disable-blink-features=AutomationControlled` plus a small init script. We do NOT fake `navigator.plugins`, `navigator.languages`, or `window.chrome` — modern fingerprinters check those for consistency, and synthesizing fixed values can flag MORE bot-like, not less. + +**Container support.** `--headed` on Linux without `DISPLAY` automatically picks a free X display (`:99`, `:100`, ...) and spawns Xvfb. Cleanup on `browse disconnect` validates the recorded PID's `/proc//cmdline` matches `Xvfb` AND start-time matches before sending any signal — no PID-reuse footguns. Standard Debian/Ubuntu containers work out of the box; minimal images (alpine, distroless) may also need fonts/dbus/gtk libs for headed Chromium to render. + +**Failure modes.** SOCKS5 upstream rejected or unreachable → fail-fast at startup with a redacted error after 3 retries (5s budget). Mid-stream upstream drop → browse kills the affected client connection only; no transport retries (which could corrupt browser traffic). Mismatched daemon config → exit 1 with a `browse disconnect` hint. + ## Snapshot Flags {{SNAPSHOT_FLAGS}}