Files
gstack/browse/src/commands.ts
Garry Tan b73f364411 feat: browser data platform for AI agents (v0.16.0.0) (#907)
* refactor: extract path-security.ts shared module

validateOutputPath, validateReadPath, and SAFE_DIRECTORIES were duplicated
across write-commands.ts, meta-commands.ts, and read-commands.ts. Extract
to a single shared module with re-exports for backward compatibility.

Also adds validateTempPath() for the upcoming GET /file endpoint (TEMP_DIR
only, not cwd, to prevent remote agents from reading project files).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: default paired agents to full access, split SCOPE_CONTROL

The trust boundary for paired agents is the pairing ceremony itself, not
the scope. An agent with write scope can already click anything and navigate
anywhere. Gating js/cookies behind --admin was security theater.

Changes:
- Default pair scopes: read+write+admin+meta (was read+write)
- New SCOPE_CONTROL for browser-wide destructive ops (stop, restart,
  disconnect, state, handoff, resume, connect)
- --admin flag now grants control scope (backward compat)
- New --restrict flag for limited access (e.g., --restrict read)
- Updated hint text: "re-pair with --control" instead of "--admin"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add media and data commands for page content extraction

media command: discovers all img/video/audio/background-image elements
on the page. Returns JSON with URLs, dimensions, srcset, loading state,
HLS/DASH detection. Supports --images/--videos/--audio filters and
optional CSS selector scoping.

data command: extracts structured data embedded in pages (JSON-LD,
Open Graph, Twitter Cards, meta tags). One command returns product
prices, article metadata, social share info without DOM scraping.

Both are READ scope with untrusted content wrapping.
Shared media-extract.ts helper for reuse by the upcoming scrape command.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add download, scrape, and archive commands

download: fetch any URL or @ref element to disk using browser session
cookies via page.request.fetch(). Supports blob: URLs via in-page
base64 conversion. --base64 flag returns inline data URI (cap 10MB).
Detects HLS/DASH and rejects with yt-dlp hint.

scrape: bulk media download composing media discovery + download loop.
Sequential with 100ms delay, URL deduplication, configurable --limit.
Writes manifest.json with per-file metadata for machine consumption.

archive: saves complete page as MHTML via CDP Page.captureSnapshot.
No silent fallback -- errors clearly if CDP unavailable.

All three are WRITE scope (write to disk, blocked in watch mode).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add GET /file endpoint for remote agent file retrieval

Remote paired agents can now retrieve downloaded files over HTTP.
TEMP_DIR only (not cwd) to prevent project file exfiltration.

- Bearer token auth (root or scoped with read scope)
- Path validation via validateTempPath() (symlink-aware)
- 200MB size cap
- Extension-based MIME detection
- Zero-copy streaming via Bun.file()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add scroll --times N for automated repeated scrolling

Extends the scroll command with --times N flag for infinite feed
scraping. Scrolls N times with configurable --wait delay (default
1000ms) between each scroll for content loading.

Usage: scroll --times 10
       scroll --times 5 --wait 2000
       scroll --times 3 .feed-container

Composable with scrape: scroll to load content, then scrape images.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add network response body capture (--capture/--export/--bodies)

The killer feature for social media scraping. Extends the existing
network command to intercept API response bodies:

  network --capture [--filter graphql]  # start capturing
  network --capture stop                # stop
  network --export /tmp/api.jsonl       # export as JSONL
  network --bodies                      # show summary

Uses page.on('response') listener with URL pattern filtering.
SizeCappedBuffer (50MB total, 5MB per-entry cap) evicts oldest
entries when full. Binary responses stored as base64, text as-is.

This lets agents tap Instagram's GraphQL API, TikTok's hydration
data, and any SPA's internal API responses instead of fragile DOM
scraping.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add screenshot --base64 for inline image return

Returns data:image/png;base64,... instead of writing to disk.
Cap at 10MB. Works with all screenshot modes (element, clip, viewport).

Eliminates the two-step screenshot+file-serve dance for remote agents.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add data platform tests and media fixture

Tests for SizeCappedBuffer (eviction, export, summary), validateTempPath
(TEMP_DIR only, rejects cwd), command registration (all new commands in
correct scope sets), and MIME mapping source checks.

Rich HTML fixture with: standard images, lazy-loaded images, srcset,
video with sources + HLS, audio, CSS background-images, JSON-LD,
Open Graph, Twitter Cards, and meta tags.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: regenerate SKILL.md with Extraction category

Add Extraction category to browse command table ordering. Regenerate
SKILL.md files to include media, data, download, scrape, archive
commands in the generated documentation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.16.0.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 00:41:55 -07:00

160 lines
12 KiB
TypeScript

/**
* Command registry — single source of truth for all browse commands.
*
* Dependency graph:
* commands.ts ──▶ server.ts (runtime dispatch)
* ──▶ gen-skill-docs.ts (doc generation)
* ──▶ skill-parser.ts (validation)
* ──▶ skill-check.ts (health reporting)
*
* Zero side effects. Safe to import from build scripts and tests.
*/
export const READ_COMMANDS = new Set([
'text', 'html', 'links', 'forms', 'accessibility',
'js', 'eval', 'css', 'attrs',
'console', 'network', 'cookies', 'storage', 'perf',
'dialog', 'is',
'inspect',
'media', 'data',
]);
export const WRITE_COMMANDS = new Set([
'goto', 'back', 'forward', 'reload',
'click', 'fill', 'select', 'hover', 'type', 'press', 'scroll', 'wait',
'viewport', 'cookie', 'cookie-import', 'cookie-import-browser', 'header', 'useragent',
'upload', 'dialog-accept', 'dialog-dismiss',
'style', 'cleanup', 'prettyscreenshot',
'download', 'scrape', 'archive',
]);
export const META_COMMANDS = new Set([
'tabs', 'tab', 'newtab', 'closetab',
'status', 'stop', 'restart',
'screenshot', 'pdf', 'responsive',
'chain', 'diff',
'url', 'snapshot',
'handoff', 'resume',
'connect', 'disconnect', 'focus',
'inbox',
'watch',
'state',
'frame',
]);
export const ALL_COMMANDS = new Set([...READ_COMMANDS, ...WRITE_COMMANDS, ...META_COMMANDS]);
/** Commands that return untrusted third-party page content */
export const PAGE_CONTENT_COMMANDS = new Set([
'text', 'html', 'links', 'forms', 'accessibility', 'attrs',
'console', 'dialog',
'media', 'data',
]);
/** Wrap output from untrusted-content commands with trust boundary markers */
export function wrapUntrustedContent(result: string, url: string): string {
// Sanitize URL: remove newlines to prevent marker injection via history.pushState
const safeUrl = url.replace(/[\n\r]/g, '').slice(0, 200);
// Escape marker strings in content to prevent boundary escape attacks
const safeResult = result.replace(/--- (BEGIN|END) UNTRUSTED EXTERNAL CONTENT/g, '--- $1 UNTRUSTED EXTERNAL C\u200BONTENT');
return `--- BEGIN UNTRUSTED EXTERNAL CONTENT (source: ${safeUrl}) ---\n${safeResult}\n--- END UNTRUSTED EXTERNAL CONTENT ---`;
}
export const COMMAND_DESCRIPTIONS: Record<string, { category: string; description: string; usage?: string }> = {
// Navigation
'goto': { category: 'Navigation', description: 'Navigate to URL', usage: 'goto <url>' },
'back': { category: 'Navigation', description: 'History back' },
'forward': { category: 'Navigation', description: 'History forward' },
'reload': { category: 'Navigation', description: 'Reload page' },
'url': { category: 'Navigation', description: 'Print current URL' },
// Reading
'text': { category: 'Reading', description: 'Cleaned page text' },
'html': { category: 'Reading', description: 'innerHTML of selector (throws if not found), or full page HTML if no selector given', usage: 'html [selector]' },
'links': { category: 'Reading', description: 'All links as "text → href"' },
'forms': { category: 'Reading', description: 'Form fields as JSON' },
'accessibility': { category: 'Reading', description: 'Full ARIA tree' },
'media': { category: 'Reading', description: 'All media elements (images, videos, audio) with URLs, dimensions, types', usage: 'media [--images|--videos|--audio] [selector]' },
'data': { category: 'Reading', description: 'Structured data: JSON-LD, Open Graph, Twitter Cards, meta tags', usage: 'data [--jsonld|--og|--meta|--twitter]' },
// Inspection
'js': { category: 'Inspection', description: 'Run JavaScript expression and return result as string', usage: 'js <expr>' },
'eval': { category: 'Inspection', description: 'Run JavaScript from file and return result as string (path must be under /tmp or cwd)', usage: 'eval <file>' },
'css': { category: 'Inspection', description: 'Computed CSS value', usage: 'css <sel> <prop>' },
'attrs': { category: 'Inspection', description: 'Element attributes as JSON', usage: 'attrs <sel|@ref>' },
'is': { category: 'Inspection', description: 'State check (visible/hidden/enabled/disabled/checked/editable/focused)', usage: 'is <prop> <sel>' },
'console': { category: 'Inspection', description: 'Console messages (--errors filters to error/warning)', usage: 'console [--clear|--errors]' },
'network': { category: 'Inspection', description: 'Network requests', usage: 'network [--clear]' },
'dialog': { category: 'Inspection', description: 'Dialog messages', usage: 'dialog [--clear]' },
'cookies': { category: 'Inspection', description: 'All cookies as JSON' },
'storage': { category: 'Inspection', description: 'Read all localStorage + sessionStorage as JSON, or set <key> <value> to write localStorage', usage: 'storage [set k v]' },
'perf': { category: 'Inspection', description: 'Page load timings' },
// Interaction
'click': { category: 'Interaction', description: 'Click element', usage: 'click <sel>' },
'fill': { category: 'Interaction', description: 'Fill input', usage: 'fill <sel> <val>' },
'select': { category: 'Interaction', description: 'Select dropdown option by value, label, or visible text', usage: 'select <sel> <val>' },
'hover': { category: 'Interaction', description: 'Hover element', usage: 'hover <sel>' },
'type': { category: 'Interaction', description: 'Type into focused element', usage: 'type <text>' },
'press': { category: 'Interaction', description: 'Press key — Enter, Tab, Escape, ArrowUp/Down/Left/Right, Backspace, Delete, Home, End, PageUp, PageDown, or modifiers like Shift+Enter', usage: 'press <key>' },
'scroll': { category: 'Interaction', description: 'Scroll element into view, or scroll to page bottom if no selector', usage: 'scroll [sel]' },
'wait': { category: 'Interaction', description: 'Wait for element, network idle, or page load (timeout: 15s)', usage: 'wait <sel|--networkidle|--load>' },
'upload': { category: 'Interaction', description: 'Upload file(s)', usage: 'upload <sel> <file> [file2...]' },
'viewport':{ category: 'Interaction', description: 'Set viewport size', usage: 'viewport <WxH>' },
'cookie': { category: 'Interaction', description: 'Set cookie on current page domain', usage: 'cookie <name>=<value>' },
'cookie-import': { category: 'Interaction', description: 'Import cookies from JSON file', usage: 'cookie-import <json>' },
'cookie-import-browser': { category: 'Interaction', description: 'Import cookies from installed Chromium browsers (opens picker, or use --domain for direct import)', usage: 'cookie-import-browser [browser] [--domain d]' },
'header': { category: 'Interaction', description: 'Set custom request header (colon-separated, sensitive values auto-redacted)', usage: 'header <name>:<value>' },
'useragent': { category: 'Interaction', description: 'Set user agent', usage: 'useragent <string>' },
'dialog-accept': { category: 'Interaction', description: 'Auto-accept next alert/confirm/prompt. Optional text is sent as the prompt response', usage: 'dialog-accept [text]' },
'dialog-dismiss': { category: 'Interaction', description: 'Auto-dismiss next dialog' },
// Data extraction
'download': { category: 'Extraction', description: 'Download URL or media element to disk using browser cookies', usage: 'download <url|@ref> [path] [--base64]' },
'scrape': { category: 'Extraction', description: 'Bulk download all media from page. Writes manifest.json', usage: 'scrape <images|videos|media> [--selector sel] [--dir path] [--limit N]' },
'archive': { category: 'Extraction', description: 'Save complete page as MHTML via CDP', usage: 'archive [path]' },
// Visual
'screenshot': { category: 'Visual', description: 'Save screenshot (supports element crop via CSS/@ref, --clip region, --viewport)', usage: 'screenshot [--viewport] [--clip x,y,w,h] [selector|@ref] [path]' },
'pdf': { category: 'Visual', description: 'Save as PDF', usage: 'pdf [path]' },
'responsive': { category: 'Visual', description: 'Screenshots at mobile (375x812), tablet (768x1024), desktop (1280x720). Saves as {prefix}-mobile.png etc.', usage: 'responsive [prefix]' },
'diff': { category: 'Visual', description: 'Text diff between pages', usage: 'diff <url1> <url2>' },
// Tabs
'tabs': { category: 'Tabs', description: 'List open tabs' },
'tab': { category: 'Tabs', description: 'Switch to tab', usage: 'tab <id>' },
'newtab': { category: 'Tabs', description: 'Open new tab', usage: 'newtab [url]' },
'closetab':{ category: 'Tabs', description: 'Close tab', usage: 'closetab [id]' },
// Server
'status': { category: 'Server', description: 'Health check' },
'stop': { category: 'Server', description: 'Shutdown server' },
'restart': { category: 'Server', description: 'Restart server' },
// Meta
'snapshot':{ category: 'Snapshot', description: 'Accessibility tree with @e refs for element selection. Flags: -i interactive only, -c compact, -d N depth limit, -s sel scope, -D diff vs previous, -a annotated screenshot, -o path output, -C cursor-interactive @c refs', usage: 'snapshot [flags]' },
'chain': { category: 'Meta', description: 'Run commands from JSON stdin. Format: [["cmd","arg1",...],...]' },
// Handoff
'handoff': { category: 'Server', description: 'Open visible Chrome at current page for user takeover', usage: 'handoff [message]' },
'resume': { category: 'Server', description: 'Re-snapshot after user takeover, return control to AI', usage: 'resume' },
// Headed mode
'connect': { category: 'Server', description: 'Launch headed Chromium with Chrome extension', usage: 'connect' },
'disconnect': { category: 'Server', description: 'Disconnect headed browser, return to headless mode' },
'focus': { category: 'Server', description: 'Bring headed browser window to foreground (macOS)', usage: 'focus [@ref]' },
// Inbox
'inbox': { category: 'Meta', description: 'List messages from sidebar scout inbox', usage: 'inbox [--clear]' },
// Watch
'watch': { category: 'Meta', description: 'Passive observation — periodic snapshots while user browses', usage: 'watch [stop]' },
// State
'state': { category: 'Server', description: 'Save/load browser state (cookies + URLs)', usage: 'state save|load <name>' },
// Frame
'frame': { category: 'Meta', description: 'Switch to iframe context (or main to return)', usage: 'frame <sel|@ref|--name n|--url pattern|main>' },
// CSS Inspector
'inspect': { category: 'Inspection', description: 'Deep CSS inspection via CDP — full rule cascade, box model, computed styles', usage: 'inspect [selector] [--all] [--history]' },
'style': { category: 'Interaction', description: 'Modify CSS property on element (with undo support)', usage: 'style <sel> <prop> <value> | style --undo [N]' },
'cleanup': { category: 'Interaction', description: 'Remove page clutter (ads, cookie banners, sticky elements, social widgets)', usage: 'cleanup [--ads] [--cookies] [--sticky] [--social] [--all]' },
'prettyscreenshot': { category: 'Visual', description: 'Clean screenshot with optional cleanup, scroll positioning, and element hiding', usage: 'prettyscreenshot [--scroll-to sel|text] [--cleanup] [--hide sel...] [--width px] [path]' },
};
// Load-time validation: descriptions must cover exactly the command sets
const allCmds = new Set([...READ_COMMANDS, ...WRITE_COMMANDS, ...META_COMMANDS]);
const descKeys = new Set(Object.keys(COMMAND_DESCRIPTIONS));
for (const cmd of allCmds) {
if (!descKeys.has(cmd)) throw new Error(`COMMAND_DESCRIPTIONS missing entry for: ${cmd}`);
}
for (const key of descKeys) {
if (!allCmds.has(key)) throw new Error(`COMMAND_DESCRIPTIONS has unknown command: ${key}`);
}