Files
gstack/browse/src/write-commands.ts
Garry Tan 7e96fe299b fix: security wave 3 — 12 fixes, 7 contributors (v0.16.4.0) (#988)
* fix(security): validateOutputPath symlink bypass — check file-level symlinks

validateOutputPath() previously only resolved symlinks on the parent directory.
A symlink at /tmp/evil.png → /etc/crontab passed the parent check (parent is
/tmp, which is safe) but the write followed the symlink outside safe dirs.

Add lstatSync() check: if the target file exists and is a symlink, resolve
through it and verify the real target is within SAFE_DIRECTORIES. ENOENT
(file doesn't exist yet) falls through to the existing parent-dir check.

Closes #921

Co-Authored-By: Yunsu <Hybirdss@users.noreply.github.com>

* fix(security): shell injection in bin/ scripts — use env vars instead of interpolation

gstack-settings-hook interpolated $SETTINGS_FILE directly into bun -e
double-quoted blocks. A path containing quotes or backticks breaks the JS
string context, enabling arbitrary code execution.

Replace direct interpolation with environment variables (process.env).
Same fix applied to gstack-team-init which had the same pattern.

Systematic audit confirmed only these two scripts were vulnerable — all
other bin/ scripts already use stdin piping or env vars.

Closes #858

Co-Authored-By: Gus <garagon@users.noreply.github.com>

* fix(security): cookie-import path validation bypass + hardcoded /tmp

Two fixes:
1. cookie-import relative path bypass (#707): path.isAbsolute() gated the
   entire validation, so relative paths like "sensitive-file.json" bypassed
   the safe-directory check entirely. Now always resolves to absolute path
   with realpathSync for symlink resolution, matching validateOutputPath().

2. Hardcoded /tmp in cookie-import-browser (#708): openDbFromCopy used
   /tmp directly instead of os.tmpdir(), breaking Windows support.

Also adds explicit imports for SAFE_DIRECTORIES and isPathWithin in
write-commands.ts (previously resolved implicitly through bundler).

Closes #852

Co-Authored-By: Toby Morning <urbantech@users.noreply.github.com>

* fix(security): redact form fields with sensitive names, not just type=password

Form redaction only applied to type="password" fields. Hidden and text
fields named csrf_token, api_key, session_id, etc. were exposed unredacted
in LLM context, leaking secrets.

Extend redaction to check field name and id against sensitive patterns:
token, secret, key, password, credential, auth, jwt, session, csrf, sid,
api_key. Uses the same pattern style as SENSITIVE_COOKIE_NAME.

Closes #860

Co-Authored-By: Gus <garagon@users.noreply.github.com>

* fix(security): restrict session file permissions to owner-only

Design session files written to /tmp with default umask (0644) were
world-readable on shared systems. Sessions contain design prompts and
feedback history.

Set mode 0o600 (owner read/write only) on both create and update paths.

Closes #859

Co-Authored-By: Gus <garagon@users.noreply.github.com>

* fix(security): enforce frozen lockfile during setup

bun install without --frozen-lockfile resolves ^semver ranges from npm on
every run. If an attacker publishes a compromised compatible version of any
dependency, the next ./setup pulls it silently.

Add --frozen-lockfile with fallback to plain install (for fresh clones
where bun.lock may not exist yet). Matches the pattern already used in
the .agents/ generation block (line 237).

Closes #614

Co-Authored-By: Alberto Martinez <halbert04@users.noreply.github.com>

* fix: remove duplicate recursive chmod on /tmp in Dockerfile.ci

chmod -R 1777 /tmp recursively sets sticky bit on files (no defined
behavior), not just the directory. Deduplicate to single chmod 1777 /tmp.

Closes #747

Co-Authored-By: Maksim Soltan <Gonzih@users.noreply.github.com>

* fix(security): learnings input validation + cross-project trust gate

Three fixes to the learnings system:

1. Input validation in gstack-learnings-log: type must be from allowed list,
   key must be alphanumeric, confidence must be 1-10 integer, source must
   be from allowed list. Prevents injection via malformed fields.

2. Prompt injection defense: insight field checked against 10 instruction-like
   patterns (ignore previous, system:, override, etc.). Rejected with clear
   error message.

3. Cross-project trust gate in gstack-learnings-search: AI-generated learnings
   from other projects are filtered out. Only user-stated learnings cross
   project boundaries. Prevents silent prompt injection across codebases.

Also adds trusted field (true for user-stated source, false for AI-generated)
to enable the trust gate at read time.

Closes #841

Co-Authored-By: Ziad Al Sharif <Ziadstr@users.noreply.github.com>

* feat(security): track cookie-imported domains and scope cookie imports

Foundation for origin-pinned JS execution (#616). Tracks which domains
cookies were imported from so the JS/eval commands can verify execution
stays within imported origins.

Changes:
- BrowserManager: new cookieImportedDomains Set with track/get/has methods
- cookie-import: tracks imported cookie domains after addCookies
- cookie-import-browser: tracks domains on --domain direct import
- cookie-import-browser --all: new explicit opt-in for all-domain import
  (previously implicit behavior, now requires deliberate flag)

Closes #615

Co-Authored-By: Alberto Martinez <halbert04@users.noreply.github.com>

* feat(security): pin JS/eval execution to cookie-imported origins

When cookies have been imported for specific domains, block JS execution
on pages whose origin doesn't match. Prevents the attack chain:
1. Agent imports cookies for github.com
2. Prompt injection navigates to attacker.com
3. Agent runs js document.cookie → exfiltrates github cookies

assertJsOriginAllowed() checks the current page hostname against imported
cookie domains with subdomain matching (.github.com allows api.github.com).
When no cookies are imported, all origins allowed (nothing to protect).
about:blank and data: URIs are allowed (no cookies at risk).

Depends on #615 (cookie domain tracking).

Closes #616

Co-Authored-By: Alberto Martinez <halbert04@users.noreply.github.com>

* feat(security): add persistent command audit log

Append-only JSONL audit trail for all browse server commands. Unlike
in-memory ring buffers, the audit log persists across restarts and is
never truncated. Each entry records: timestamp, command, args (truncated
to 200 chars), page origin, duration, status, error (truncated to 300
chars), hasCookies flag, connection mode.

All writes are best-effort — audit failures never block command execution.
Log stored at ~/.gstack/.browse/browse-audit.jsonl.

Closes #617

Co-Authored-By: Alberto Martinez <halbert04@users.noreply.github.com>

* fix(security): block hex-encoded IPv4-mapped IPv6 metadata bypass

URL constructor normalizes ::ffff:169.254.169.254 to ::ffff:a9fe:a9fe
(hex form), which was not in the blocklist. Similarly, ::169.254.169.254
normalizes to ::a9fe:a9fe.

Add both hex-encoded forms to BLOCKED_METADATA_HOSTS so they're caught
by the direct hostname check in validateNavigationUrl.

Closes #739

Co-Authored-By: Osman Mehmood <mehmoodosman@users.noreply.github.com>

* chore: bump version and changelog (v0.16.4.0)

Security wave 3: 12 fixes, 7 contributors.
Cookie origin pinning, command audit log, domain tracking.
Symlink bypass, path validation, shell injection, form redaction,
learnings injection, IPv6 SSRF, session permissions, frozen lockfile.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Yunsu <Hybirdss@users.noreply.github.com>
Co-authored-by: Gus <garagon@users.noreply.github.com>
Co-authored-by: Toby Morning <urbantech@users.noreply.github.com>
Co-authored-by: Alberto Martinez <halbert04@users.noreply.github.com>
Co-authored-by: Maksim Soltan <Gonzih@users.noreply.github.com>
Co-authored-by: Ziad Al Sharif <Ziadstr@users.noreply.github.com>
Co-authored-by: Osman Mehmood <mehmoodosman@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 07:49:37 -10:00

1164 lines
50 KiB
TypeScript

/**
* Write commands — navigate and interact with pages (side effects)
*
* goto, back, forward, reload, click, fill, select, hover, type,
* press, scroll, wait, viewport, cookie, header, useragent
*/
import type { TabSession } from './tab-session';
import type { BrowserManager } from './browser-manager';
import { findInstalledBrowsers, importCookies, listSupportedBrowserNames } from './cookie-import-browser';
import { generatePickerCode } from './cookie-picker-routes';
import { validateNavigationUrl } from './url-validation';
import { validateOutputPath } from './path-security';
import * as fs from 'fs';
import * as path from 'path';
import { TEMP_DIR, isPathWithin } from './platform';
import { SAFE_DIRECTORIES } from './path-security';
import { modifyStyle, undoModification, resetModifications, getModificationHistory } from './cdp-inspector';
/**
* Aggressive page cleanup selectors and heuristics.
* Goal: make the page readable and clean while keeping it recognizable.
* Inspired by uBlock Origin filter lists, Readability.js, and reader mode heuristics.
*/
const CLEANUP_SELECTORS = {
ads: [
// Google Ads
'ins.adsbygoogle', '[id^="google_ads"]', '[id^="div-gpt-ad"]',
'iframe[src*="doubleclick"]', 'iframe[src*="googlesyndication"]',
'[data-google-query-id]', '.google-auto-placed',
// Generic ad patterns (uBlock Origin common filters)
'[class*="ad-banner"]', '[class*="ad-wrapper"]', '[class*="ad-container"]',
'[class*="ad-slot"]', '[class*="ad-unit"]', '[class*="ad-zone"]',
'[class*="ad-placement"]', '[class*="ad-holder"]', '[class*="ad-block"]',
'[class*="adbox"]', '[class*="adunit"]', '[class*="adwrap"]',
'[id*="ad-banner"]', '[id*="ad-wrapper"]', '[id*="ad-container"]',
'[id*="ad-slot"]', '[id*="ad_banner"]', '[id*="ad_container"]',
'[data-ad]', '[data-ad-slot]', '[data-ad-unit]', '[data-adunit]',
'[class*="sponsored"]', '[class*="Sponsored"]',
'.ad', '.ads', '.advert', '.advertisement',
'#ad', '#ads', '#advert', '#advertisement',
// Common ad network iframes
'iframe[src*="amazon-adsystem"]', 'iframe[src*="outbrain"]',
'iframe[src*="taboola"]', 'iframe[src*="criteo"]',
'iframe[src*="adsafeprotected"]', 'iframe[src*="moatads"]',
// Promoted/sponsored content
'[class*="promoted"]', '[class*="Promoted"]',
'[data-testid*="promo"]', '[class*="native-ad"]',
// Empty ad placeholders (divs with only ad classes, no real content)
'aside[class*="ad"]', 'section[class*="ad-"]',
],
cookies: [
// Cookie consent frameworks
'[class*="cookie-consent"]', '[class*="cookie-banner"]', '[class*="cookie-notice"]',
'[id*="cookie-consent"]', '[id*="cookie-banner"]', '[id*="cookie-notice"]',
'[class*="consent-banner"]', '[class*="consent-modal"]', '[class*="consent-wall"]',
'[class*="gdpr"]', '[id*="gdpr"]', '[class*="GDPR"]',
'[class*="CookieConsent"]', '[id*="CookieConsent"]',
// OneTrust (very common)
'#onetrust-consent-sdk', '.onetrust-pc-dark-filter', '#onetrust-banner-sdk',
// Cookiebot
'#CybotCookiebotDialog', '#CybotCookiebotDialogBodyUnderlay',
// TrustArc / TRUSTe
'#truste-consent-track', '.truste_overlay', '.truste_box_overlay',
// Quantcast
'.qc-cmp2-container', '#qc-cmp2-main',
// Generic patterns
'[class*="cc-banner"]', '[class*="cc-window"]', '[class*="cc-overlay"]',
'[class*="privacy-banner"]', '[class*="privacy-notice"]',
'[id*="privacy-banner"]', '[id*="privacy-notice"]',
'[class*="accept-cookies"]', '[id*="accept-cookies"]',
],
overlays: [
// Paywall / subscription overlays
'[class*="paywall"]', '[class*="Paywall"]', '[id*="paywall"]',
'[class*="subscribe-wall"]', '[class*="subscription-wall"]',
'[class*="meter-wall"]', '[class*="regwall"]', '[class*="reg-wall"]',
// Newsletter / signup popups
'[class*="newsletter-popup"]', '[class*="newsletter-modal"]',
'[class*="signup-modal"]', '[class*="signup-popup"]',
'[class*="email-capture"]', '[class*="lead-capture"]',
'[class*="popup-modal"]', '[class*="modal-overlay"]',
// Interstitials
'[class*="interstitial"]', '[id*="interstitial"]',
// Push notification prompts
'[class*="push-notification"]', '[class*="notification-prompt"]',
'[class*="web-push"]',
// Survey / feedback popups
'[class*="survey-"]', '[class*="feedback-modal"]',
'[id*="survey-"]', '[class*="nps-"]',
// App download banners
'[class*="app-banner"]', '[class*="smart-banner"]', '[class*="app-download"]',
'[id*="branch-banner"]', '.smartbanner',
// Cross-promotion / "follow us" / "preferred source" widgets
'[class*="promo-banner"]', '[class*="cross-promo"]', '[class*="partner-promo"]',
'[class*="preferred-source"]', '[class*="google-promo"]',
],
clutter: [
// Audio/podcast player widgets (not part of the article text)
'[class*="audio-player"]', '[class*="podcast-player"]', '[class*="listen-widget"]',
'[class*="everlit"]', '[class*="Everlit"]',
'audio', // bare audio elements
// Sidebar games/puzzles widgets
'[class*="puzzle"]', '[class*="daily-game"]', '[class*="games-widget"]',
'[class*="crossword-promo"]', '[class*="mini-game"]',
// "Most Popular" / "Trending" sidebar recirculation (not the top nav trending bar)
'aside [class*="most-popular"]', 'aside [class*="trending"]',
'aside [class*="most-read"]', 'aside [class*="recommended"]',
// Related articles / recirculation at bottom
'[class*="related-articles"]', '[class*="more-stories"]',
'[class*="recirculation"]', '[class*="taboola"]', '[class*="outbrain"]',
// Hearst-specific (SF Chronicle, etc.)
'[class*="nativo"]', '[data-tb-region]',
],
sticky: [
// Handled via JavaScript evaluation, not pure selectors
],
social: [
'[class*="social-share"]', '[class*="share-buttons"]', '[class*="share-bar"]',
'[class*="social-widget"]', '[class*="social-icons"]', '[class*="share-tools"]',
'iframe[src*="facebook.com/plugins"]', 'iframe[src*="platform.twitter"]',
'[class*="fb-like"]', '[class*="tweet-button"]',
'[class*="addthis"]', '[class*="sharethis"]',
// Follow prompts
'[class*="follow-us"]', '[class*="social-follow"]',
],
};
export async function handleWriteCommand(
command: string,
args: string[],
session: TabSession,
bm: BrowserManager
): Promise<string> {
const page = session.getPage();
// Frame-aware target for locator-based operations (click, fill, etc.)
const target = session.getActiveFrameOrPage();
const inFrame = session.getFrame() !== null;
switch (command) {
case 'goto': {
if (inFrame) throw new Error('Cannot use goto inside a frame. Run \'frame main\' first.');
const url = args[0];
if (!url) throw new Error('Usage: browse goto <url>');
await validateNavigationUrl(url);
const response = await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 15000 });
const status = response?.status() || 'unknown';
return `Navigated to ${url} (${status})`;
}
case 'back': {
if (inFrame) throw new Error('Cannot use back inside a frame. Run \'frame main\' first.');
await page.goBack({ waitUntil: 'domcontentloaded', timeout: 15000 });
return `Back → ${page.url()}`;
}
case 'forward': {
if (inFrame) throw new Error('Cannot use forward inside a frame. Run \'frame main\' first.');
await page.goForward({ waitUntil: 'domcontentloaded', timeout: 15000 });
return `Forward → ${page.url()}`;
}
case 'reload': {
if (inFrame) throw new Error('Cannot use reload inside a frame. Run \'frame main\' first.');
await page.reload({ waitUntil: 'domcontentloaded', timeout: 15000 });
return `Reloaded ${page.url()}`;
}
case 'click': {
const selector = args[0];
if (!selector) throw new Error('Usage: browse click <selector>');
// Auto-route: if ref points to a real <option> inside a <select>, use selectOption
const role = session.getRefRole(selector);
if (role === 'option') {
const resolved = await session.resolveRef(selector);
if ('locator' in resolved) {
const optionInfo = await resolved.locator.evaluate(el => {
if (el.tagName !== 'OPTION') return null; // custom [role=option], not real <option>
const option = el as HTMLOptionElement;
const select = option.closest('select');
if (!select) return null;
return { value: option.value, text: option.text };
});
if (optionInfo) {
await resolved.locator.locator('xpath=ancestor::select').selectOption(optionInfo.value, { timeout: 5000 });
return `Selected "${optionInfo.text}" (auto-routed from click on <option>) → now at ${page.url()}`;
}
// Real <option> with no parent <select> or custom [role=option] — fall through to normal click
}
}
const resolved = await session.resolveRef(selector);
try {
if ('locator' in resolved) {
await resolved.locator.click({ timeout: 5000 });
} else {
await target.locator(resolved.selector).click({ timeout: 5000 });
}
} catch (err: any) {
// Enhanced error guidance: clicking <option> elements always fails (not visible / timeout)
const isOption = 'locator' in resolved
? await resolved.locator.evaluate(el => el.tagName === 'OPTION').catch(() => false)
: await target.locator(resolved.selector).evaluate(
el => el.tagName === 'OPTION'
).catch(() => false);
if (isOption) {
throw new Error(
`Cannot click <option> elements. Use 'browse select <parent-select> <value>' instead of 'click' for dropdown options.`
);
}
throw err;
}
// Wait for network to settle (catches XHR/fetch triggered by clicks)
await page.waitForLoadState('networkidle', { timeout: 2000 }).catch(() => {});
return `Clicked ${selector} → now at ${page.url()}`;
}
case 'fill': {
const [selector, ...valueParts] = args;
const value = valueParts.join(' ');
if (!selector || !value) throw new Error('Usage: browse fill <selector> <value>');
const resolved = await session.resolveRef(selector);
if ('locator' in resolved) {
await resolved.locator.fill(value, { timeout: 5000 });
} else {
await target.locator(resolved.selector).fill(value, { timeout: 5000 });
}
// Wait for network to settle (form validation XHRs)
await page.waitForLoadState('networkidle', { timeout: 2000 }).catch(() => {});
return `Filled ${selector}`;
}
case 'select': {
const [selector, ...valueParts] = args;
const value = valueParts.join(' ');
if (!selector || !value) throw new Error('Usage: browse select <selector> <value>');
const resolved = await session.resolveRef(selector);
if ('locator' in resolved) {
await resolved.locator.selectOption(value, { timeout: 5000 });
} else {
await target.locator(resolved.selector).selectOption(value, { timeout: 5000 });
}
// Wait for network to settle (dropdown-triggered requests)
await page.waitForLoadState('networkidle', { timeout: 2000 }).catch(() => {});
return `Selected "${value}" in ${selector}`;
}
case 'hover': {
const selector = args[0];
if (!selector) throw new Error('Usage: browse hover <selector>');
const resolved = await session.resolveRef(selector);
if ('locator' in resolved) {
await resolved.locator.hover({ timeout: 5000 });
} else {
await target.locator(resolved.selector).hover({ timeout: 5000 });
}
return `Hovered ${selector}`;
}
case 'type': {
const text = args.join(' ');
if (!text) throw new Error('Usage: browse type <text>');
await page.keyboard.type(text);
return `Typed ${text.length} characters`;
}
case 'press': {
const key = args[0];
if (!key) throw new Error('Usage: browse press <key> (e.g., Enter, Tab, Escape)');
await page.keyboard.press(key);
return `Pressed ${key}`;
}
case 'scroll': {
// Parse --times N and --wait ms flags
const timesIdx = args.indexOf('--times');
const times = timesIdx >= 0 ? parseInt(args[timesIdx + 1], 10) || 1 : 0;
const waitIdx = args.indexOf('--wait');
const waitMs = waitIdx >= 0 ? parseInt(args[waitIdx + 1], 10) || 1000 : 1000;
const selector = args.find(a => !a.startsWith('--') && args.indexOf(a) !== timesIdx + 1 && args.indexOf(a) !== waitIdx + 1);
if (times > 0) {
// Repeated scroll mode
for (let i = 0; i < times; i++) {
if (selector) {
const resolved = await bm.resolveRef(selector);
if ('locator' in resolved) {
await resolved.locator.scrollIntoViewIfNeeded({ timeout: 5000 });
} else {
await target.locator(resolved.selector).scrollIntoViewIfNeeded({ timeout: 5000 });
}
} else {
await target.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
}
if (i < times - 1) await new Promise(r => setTimeout(r, waitMs));
}
return `Scrolled ${times} times${selector ? ` (${selector})` : ''} with ${waitMs}ms delay`;
}
// Single scroll (original behavior)
if (selector) {
const resolved = await session.resolveRef(selector);
if ('locator' in resolved) {
await resolved.locator.scrollIntoViewIfNeeded({ timeout: 5000 });
} else {
await target.locator(resolved.selector).scrollIntoViewIfNeeded({ timeout: 5000 });
}
return `Scrolled ${selector} into view`;
}
await target.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
return 'Scrolled to bottom';
}
case 'wait': {
const selector = args[0];
if (!selector) throw new Error('Usage: browse wait <selector|--networkidle|--load|--domcontentloaded>');
if (selector === '--networkidle') {
const MAX_WAIT_MS = 300_000;
const MIN_WAIT_MS = 1_000;
const timeout = Math.min(Math.max(args[1] ? parseInt(args[1], 10) || MIN_WAIT_MS : 15000, MIN_WAIT_MS), MAX_WAIT_MS);
await page.waitForLoadState('networkidle', { timeout });
return 'Network idle';
}
if (selector === '--load') {
await page.waitForLoadState('load');
return 'Page loaded';
}
if (selector === '--domcontentloaded') {
await page.waitForLoadState('domcontentloaded');
return 'DOM content loaded';
}
const MAX_WAIT_MS = 300_000;
const MIN_WAIT_MS = 1_000;
const timeout = Math.min(Math.max(args[1] ? parseInt(args[1], 10) || MIN_WAIT_MS : 15000, MIN_WAIT_MS), MAX_WAIT_MS);
const resolved = await session.resolveRef(selector);
if ('locator' in resolved) {
await resolved.locator.waitFor({ state: 'visible', timeout });
} else {
await target.locator(resolved.selector).waitFor({ state: 'visible', timeout });
}
return `Element ${selector} appeared`;
}
case 'viewport': {
const size = args[0];
if (!size || !size.includes('x')) throw new Error('Usage: browse viewport <WxH> (e.g., 375x812)');
const [rawW, rawH] = size.split('x').map(Number);
const w = Math.min(Math.max(Math.round(rawW) || 1280, 1), 16384);
const h = Math.min(Math.max(Math.round(rawH) || 720, 1), 16384);
await bm.setViewport(w, h);
return `Viewport set to ${w}x${h}`;
}
case 'cookie': {
const cookieStr = args[0];
if (!cookieStr || !cookieStr.includes('=')) throw new Error('Usage: browse cookie <name>=<value>');
const eq = cookieStr.indexOf('=');
const name = cookieStr.slice(0, eq);
const value = cookieStr.slice(eq + 1);
const url = new URL(page.url());
await page.context().addCookies([{
name,
value,
domain: url.hostname,
path: '/',
}]);
return `Cookie set: ${name}=****`;
}
case 'header': {
const headerStr = args[0];
if (!headerStr || !headerStr.includes(':')) throw new Error('Usage: browse header <name>:<value>');
const sep = headerStr.indexOf(':');
const name = headerStr.slice(0, sep).trim();
const value = headerStr.slice(sep + 1).trim();
await bm.setExtraHeader(name, value);
const sensitiveHeaders = ['authorization', 'cookie', 'set-cookie', 'x-api-key', 'x-auth-token'];
const redactedValue = sensitiveHeaders.includes(name.toLowerCase()) ? '****' : value;
return `Header set: ${name}: ${redactedValue}`;
}
case 'useragent': {
const ua = args.join(' ');
if (!ua) throw new Error('Usage: browse useragent <string>');
bm.setUserAgent(ua);
const error = await bm.recreateContext();
if (error) {
return `User agent set to "${ua}" but: ${error}`;
}
return `User agent set: ${ua}`;
}
case 'upload': {
const [selector, ...filePaths] = args;
if (!selector || filePaths.length === 0) throw new Error('Usage: browse upload <selector> <file1> [file2...]');
// Validate paths are within safe directories (same check as cookie-import)
for (const fp of filePaths) {
if (!fs.existsSync(fp)) throw new Error(`File not found: ${fp}`);
if (path.isAbsolute(fp)) {
let resolvedFp: string;
try { resolvedFp = fs.realpathSync(path.resolve(fp)); } catch (err: any) { if (err?.code !== 'ENOENT') throw err; resolvedFp = path.resolve(fp); }
if (!SAFE_DIRECTORIES.some(dir => isPathWithin(resolvedFp, dir))) {
throw new Error(`Path must be within: ${SAFE_DIRECTORIES.join(', ')}`);
}
}
if (path.normalize(fp).includes('..')) {
throw new Error('Path traversal sequences (..) are not allowed');
}
}
const resolved = await session.resolveRef(selector);
if ('locator' in resolved) {
await resolved.locator.setInputFiles(filePaths);
} else {
await target.locator(resolved.selector).setInputFiles(filePaths);
}
const fileInfo = filePaths.map(fp => {
const stat = fs.statSync(fp);
return `${path.basename(fp)} (${stat.size}B)`;
}).join(', ');
return `Uploaded: ${fileInfo}`;
}
case 'dialog-accept': {
const text = args.length > 0 ? args.join(' ') : null;
bm.setDialogAutoAccept(true);
bm.setDialogPromptText(text);
return text
? `Dialogs will be accepted with text: "${text}"`
: 'Dialogs will be accepted';
}
case 'dialog-dismiss': {
bm.setDialogAutoAccept(false);
bm.setDialogPromptText(null);
return 'Dialogs will be dismissed';
}
case 'cookie-import': {
const filePath = args[0];
if (!filePath) throw new Error('Usage: browse cookie-import <json-file>');
// Path validation — resolve to absolute and check against safe dirs.
// Fixes #707: relative paths previously bypassed the safe directory check.
// Mirrors validateOutputPath() — resolves symlinks (e.g., macOS /tmp → /private/tmp).
const resolved = path.resolve(filePath);
let resolvedReal = resolved;
try { resolvedReal = fs.realpathSync(resolved); } catch {
// File may not exist yet — resolve parent dir instead
try { resolvedReal = path.join(fs.realpathSync(path.dirname(resolved)), path.basename(resolved)); } catch {}
}
if (!SAFE_DIRECTORIES.some(dir => isPathWithin(resolvedReal, dir))) {
throw new Error(`Path must be within: ${SAFE_DIRECTORIES.join(', ')}`);
}
if (!fs.existsSync(filePath)) throw new Error(`File not found: ${filePath}`);
const raw = fs.readFileSync(filePath, 'utf-8');
let cookies: any[];
try { cookies = JSON.parse(raw); } catch (err: any) { throw new Error(`Invalid JSON in ${filePath}: ${err?.message || err}`); }
if (!Array.isArray(cookies)) throw new Error('Cookie file must contain a JSON array');
// Auto-fill domain from current page URL when missing (consistent with cookie command)
const pageUrl = new URL(page.url());
const defaultDomain = pageUrl.hostname;
for (const c of cookies) {
if (!c.name || c.value === undefined) throw new Error('Each cookie must have "name" and "value" fields');
if (!c.domain) {
c.domain = defaultDomain;
} else {
const cookieDomain = c.domain.startsWith('.') ? c.domain.slice(1) : c.domain;
if (cookieDomain !== defaultDomain && !defaultDomain.endsWith('.' + cookieDomain)) {
throw new Error(`Cookie domain "${c.domain}" does not match current page domain "${defaultDomain}". Use the target site first.`);
}
}
if (!c.path) c.path = '/';
}
await page.context().addCookies(cookies);
const importedDomains = [...new Set(cookies.map((c: any) => c.domain).filter(Boolean))];
if (importedDomains.length > 0) bm.trackCookieImportDomains(importedDomains);
return `Loaded ${cookies.length} cookies from ${filePath}`;
}
case 'cookie-import-browser': {
// Two modes:
// 1. Direct CLI import: cookie-import-browser <browser> --domain <domain> [--profile <profile>]
// Requires --domain (or --all to explicitly import everything).
// 2. Open picker UI: cookie-import-browser [browser] (interactive domain selection)
const browserArg = args[0];
const domainIdx = args.indexOf('--domain');
const profileIdx = args.indexOf('--profile');
const hasAll = args.includes('--all');
const profile = (profileIdx !== -1 && profileIdx + 1 < args.length) ? args[profileIdx + 1] : 'Default';
if (domainIdx !== -1 && domainIdx + 1 < args.length) {
// Direct import mode — scoped to specific domain
const domain = args[domainIdx + 1];
// Validate --domain against current page hostname to prevent cross-site cookie injection
const pageHostname = new URL(page.url()).hostname;
const normalizedDomain = domain.startsWith('.') ? domain.slice(1) : domain;
if (normalizedDomain !== pageHostname && !pageHostname.endsWith('.' + normalizedDomain)) {
throw new Error(`--domain "${domain}" does not match current page domain "${pageHostname}". Navigate to the target site first.`);
}
const browser = browserArg || 'comet';
const result = await importCookies(browser, [domain], profile);
if (result.cookies.length > 0) {
await page.context().addCookies(result.cookies);
bm.trackCookieImportDomains([domain]);
}
const msg = [`Imported ${result.count} cookies for ${domain} from ${browser}`];
if (result.failed > 0) msg.push(`(${result.failed} failed to decrypt)`);
return msg.join(' ');
}
if (hasAll) {
// Explicit all-cookies import — requires --all flag as a deliberate opt-in.
// Imports every non-expired cookie domain from the browser.
const browser = browserArg || 'comet';
const { listDomains } = await import('./cookie-import-browser');
const { domains } = listDomains(browser, profile);
const allDomainNames = domains.map((d: any) => d.domain);
if (allDomainNames.length === 0) {
return `No cookies found in ${browser} (profile: ${profile})`;
}
const result = await importCookies(browser, allDomainNames, profile);
if (result.cookies.length > 0) {
await page.context().addCookies(result.cookies);
bm.trackCookieImportDomains(allDomainNames);
}
const msg = [`Imported ${result.count} cookies across ${Object.keys(result.domainCounts).length} domains from ${browser}`];
msg.push('(used --all: all browser cookies imported, consider --domain for tighter scoping)');
if (result.failed > 0) msg.push(`(${result.failed} failed to decrypt)`);
return msg.join(' ');
}
// Picker UI mode — open in user's browser for interactive domain selection
const port = bm.serverPort;
if (!port) throw new Error('Server port not available');
const browsers = findInstalledBrowsers();
if (browsers.length === 0) {
throw new Error(`No Chromium browsers found. Supported: ${listSupportedBrowserNames().join(', ')}`);
}
const code = generatePickerCode();
const pickerUrl = `http://127.0.0.1:${port}/cookie-picker?code=${code}`;
try {
Bun.spawn(['open', pickerUrl], { stdout: 'ignore', stderr: 'ignore' });
} catch (err: any) {
// open may fail on non-macOS or if 'open' binary is missing — URL is in the message below
if (err?.code !== 'ENOENT' && !err?.message?.includes('spawn')) throw err;
}
return `Cookie picker opened at http://127.0.0.1:${port}/cookie-picker\nDetected browsers: ${browsers.map(b => b.name).join(', ')}\nSelect domains to import, then close the picker when done.\n\nTip: For scripted imports, use --domain <domain> to scope cookies to a single domain.`;
}
case 'style': {
// style --undo [N] → revert modification
if (args[0] === '--undo') {
const idx = args[1] ? parseInt(args[1], 10) : undefined;
await undoModification(page, idx);
return idx !== undefined ? `Reverted modification #${idx}` : 'Reverted last modification';
}
// style <selector> <property> <value>
const [selector, property, ...valueParts] = args;
const value = valueParts.join(' ');
if (!selector || !property || !value) {
throw new Error('Usage: browse style <sel> <prop> <value> | style --undo [N]');
}
// Validate CSS property name
if (!/^[a-zA-Z-]+$/.test(property)) {
throw new Error(`Invalid CSS property name: ${property}. Only letters and hyphens allowed.`);
}
// Validate CSS value — block data exfiltration patterns
const DANGEROUS_CSS = /url\s*\(|expression\s*\(|@import|javascript:|data:/i;
if (DANGEROUS_CSS.test(value)) {
throw new Error('CSS value rejected: contains potentially dangerous pattern.');
}
const mod = await modifyStyle(page, selector, property, value);
return `Style modified: ${selector} { ${property}: ${mod.oldValue || '(none)'}${value} } (${mod.method})`;
}
case 'cleanup': {
// Parse flags
let doAds = false, doCookies = false, doSticky = false, doSocial = false;
let doOverlays = false, doClutter = false;
let doAll = false;
// Default to --all if no args (most common use case from sidebar button)
if (args.length === 0) {
doAll = true;
}
for (const arg of args) {
switch (arg) {
case '--ads': doAds = true; break;
case '--cookies': doCookies = true; break;
case '--sticky': doSticky = true; break;
case '--social': doSocial = true; break;
case '--overlays': doOverlays = true; break;
case '--clutter': doClutter = true; break;
case '--all': doAll = true; break;
default:
throw new Error(`Unknown cleanup flag: ${arg}. Use: --ads, --cookies, --sticky, --social, --overlays, --clutter, --all`);
}
}
if (doAll) {
doAds = doCookies = doSticky = doSocial = doOverlays = doClutter = true;
}
const removed: string[] = [];
// Build selector list for categories to clean
const selectors: string[] = [];
if (doAds) selectors.push(...CLEANUP_SELECTORS.ads);
if (doCookies) selectors.push(...CLEANUP_SELECTORS.cookies);
if (doSocial) selectors.push(...CLEANUP_SELECTORS.social);
if (doOverlays) selectors.push(...CLEANUP_SELECTORS.overlays);
if (doClutter) selectors.push(...CLEANUP_SELECTORS.clutter);
if (selectors.length > 0) {
const count = await page.evaluate((sels: string[]) => {
let removed = 0;
for (const sel of sels) {
try {
const els = document.querySelectorAll(sel);
els.forEach(el => {
(el as HTMLElement).style.setProperty('display', 'none', 'important');
removed++;
});
} catch (err: any) {
// querySelectorAll throws DOMException on invalid CSS selectors — skip those
if (!(err instanceof DOMException)) throw err;
}
}
return removed;
}, selectors);
if (count > 0) {
if (doAds) removed.push('ads');
if (doCookies) removed.push('cookie banners');
if (doSocial) removed.push('social widgets');
if (doOverlays) removed.push('overlays/popups');
if (doClutter) removed.push('clutter');
}
}
// Sticky/fixed elements — handled separately with computed style check
if (doSticky) {
const stickyCount = await page.evaluate(() => {
let removed = 0;
// Collect all sticky/fixed elements, sort by vertical position
const stickyEls: Array<{ el: Element; top: number; width: number; height: number }> = [];
const allElements = document.querySelectorAll('*');
const viewportWidth = window.innerWidth;
for (const el of allElements) {
const style = getComputedStyle(el);
if (style.position === 'fixed' || style.position === 'sticky') {
const rect = el.getBoundingClientRect();
stickyEls.push({ el, top: rect.top, width: rect.width, height: rect.height });
}
}
// Sort by vertical position (topmost first)
stickyEls.sort((a, b) => a.top - b.top);
let preservedTopNav = false;
for (const { el, top, width, height } of stickyEls) {
const tag = el.tagName.toLowerCase();
// Always skip nav/header semantic elements
if (tag === 'nav' || tag === 'header') continue;
if (el.getAttribute('role') === 'navigation') continue;
// Skip the gstack control indicator
if ((el as HTMLElement).id === 'gstack-ctrl') continue;
// Preserve the FIRST full-width element near the top (site's main nav bar)
// This catches divs that act as navbars but aren't semantic <nav> elements
if (!preservedTopNav && top <= 50 && width > viewportWidth * 0.8 && height < 120) {
preservedTopNav = true;
continue;
}
(el as HTMLElement).style.setProperty('display', 'none', 'important');
removed++;
}
return removed;
});
if (stickyCount > 0) removed.push(`${stickyCount} sticky/fixed elements`);
}
// Unlock scrolling (many sites lock body scroll when modals are open)
const scrollFixed = await page.evaluate(() => {
let fixed = 0;
// Unlock body and html scroll
for (const el of [document.body, document.documentElement]) {
if (!el) continue;
const style = getComputedStyle(el);
if (style.overflow === 'hidden' || style.overflowY === 'hidden') {
(el as HTMLElement).style.setProperty('overflow', 'auto', 'important');
(el as HTMLElement).style.setProperty('overflow-y', 'auto', 'important');
fixed++;
}
// Remove height:100% + position:fixed that locks scroll
if (style.position === 'fixed' && (el === document.body || el === document.documentElement)) {
(el as HTMLElement).style.setProperty('position', 'static', 'important');
fixed++;
}
}
// Remove blur/filter effects (paywalls often blur the content)
const blurred = document.querySelectorAll('[style*="blur"], [style*="filter"]');
blurred.forEach(el => {
const s = (el as HTMLElement).style;
if (s.filter?.includes('blur') || s.webkitFilter?.includes('blur')) {
s.setProperty('filter', 'none', 'important');
s.setProperty('-webkit-filter', 'none', 'important');
fixed++;
}
});
// Remove max-height truncation (article truncation)
const truncated = document.querySelectorAll('[class*="truncat"], [class*="preview"], [class*="teaser"]');
truncated.forEach(el => {
const s = getComputedStyle(el);
if (s.maxHeight && s.maxHeight !== 'none' && parseInt(s.maxHeight) < 500) {
(el as HTMLElement).style.setProperty('max-height', 'none', 'important');
(el as HTMLElement).style.setProperty('overflow', 'visible', 'important');
fixed++;
}
});
return fixed;
});
if (scrollFixed > 0) removed.push('scroll unlocked');
// Remove "ADVERTISEMENT" / "Article continues below" text labels
const adLabelCount = await page.evaluate(() => {
let removed = 0;
const adTextPatterns = [
/^advertisement$/i, /^sponsored$/i, /^promoted$/i,
/article continues/i, /continues below/i,
/^ad$/i, /^paid content$/i, /^partner content$/i,
];
// Walk text-heavy small elements looking for ad labels
const candidates = document.querySelectorAll('div, span, p, figcaption, label');
for (const el of candidates) {
const text = (el.textContent || '').trim();
if (text.length > 50) continue; // Too much text, probably real content
if (adTextPatterns.some(p => p.test(text))) {
// Also hide the parent if it's a wrapper with little else
const parent = el.parentElement;
if (parent && (parent.textContent || '').trim().length < 80) {
(parent as HTMLElement).style.setProperty('display', 'none', 'important');
} else {
(el as HTMLElement).style.setProperty('display', 'none', 'important');
}
removed++;
}
}
return removed;
});
if (adLabelCount > 0) removed.push(`${adLabelCount} ad labels`);
// Remove empty ad placeholder whitespace (divs that are now empty after ad removal)
const collapsedCount = await page.evaluate(() => {
let collapsed = 0;
const candidates = document.querySelectorAll(
'div[class*="ad"], div[id*="ad"], aside[class*="ad"], div[class*="sidebar"], ' +
'div[class*="rail"], div[class*="right-col"], div[class*="widget"]'
);
for (const el of candidates) {
const rect = el.getBoundingClientRect();
// If the element has significant height but no visible text content, collapse it
if (rect.height > 50 && rect.width > 0) {
const text = (el.textContent || '').trim();
const images = el.querySelectorAll('img:not([src*="logo"]):not([src*="icon"])');
const links = el.querySelectorAll('a');
// Empty or mostly empty: collapse
if (text.length < 20 && images.length === 0 && links.length < 2) {
(el as HTMLElement).style.setProperty('display', 'none', 'important');
collapsed++;
}
}
}
return collapsed;
});
if (collapsedCount > 0) removed.push(`${collapsedCount} empty placeholders`);
if (removed.length === 0) return 'No clutter elements found to remove.';
return `Cleaned up: ${removed.join(', ')}`;
}
case 'prettyscreenshot': {
// Parse flags
let scrollTo: string | undefined;
let doCleanup = false;
const hideSelectors: string[] = [];
let viewportWidth: number | undefined;
let outputPath: string | undefined;
for (let i = 0; i < args.length; i++) {
if (args[i] === '--scroll-to' && i + 1 < args.length) {
scrollTo = args[++i];
} else if (args[i] === '--cleanup') {
doCleanup = true;
} else if (args[i] === '--hide' && i + 1 < args.length) {
// Collect all following non-flag args as selectors to hide
i++;
while (i < args.length && !args[i].startsWith('--')) {
hideSelectors.push(args[i]);
i++;
}
i--; // Back up since the for loop will increment
} else if (args[i] === '--width' && i + 1 < args.length) {
viewportWidth = parseInt(args[++i], 10);
if (isNaN(viewportWidth)) throw new Error('--width must be a number');
} else if (!args[i].startsWith('--')) {
outputPath = args[i];
} else {
throw new Error(`Unknown prettyscreenshot flag: ${args[i]}`);
}
}
// Default output path
if (!outputPath) {
const timestamp = new Date().toISOString().replace(/[:.]/g, '-').slice(0, 19);
outputPath = `${TEMP_DIR}/browse-pretty-${timestamp}.png`;
}
validateOutputPath(outputPath);
const originalViewport = page.viewportSize();
// Set viewport width if specified
if (viewportWidth && originalViewport) {
await page.setViewportSize({ width: viewportWidth, height: originalViewport.height });
}
// Run cleanup if requested
if (doCleanup) {
const allSelectors = [
...CLEANUP_SELECTORS.ads,
...CLEANUP_SELECTORS.cookies,
...CLEANUP_SELECTORS.social,
];
await page.evaluate((sels: string[]) => {
for (const sel of sels) {
try {
document.querySelectorAll(sel).forEach(el => {
(el as HTMLElement).style.display = 'none';
});
} catch (err: any) {
if (!(err instanceof DOMException)) throw err;
}
}
// Also hide fixed/sticky (except nav)
for (const el of document.querySelectorAll('*')) {
const style = getComputedStyle(el);
if (style.position === 'fixed' || style.position === 'sticky') {
const tag = el.tagName.toLowerCase();
if (tag === 'nav' || tag === 'header') continue;
if (el.getAttribute('role') === 'navigation') continue;
(el as HTMLElement).style.display = 'none';
}
}
}, allSelectors);
}
// Hide specific elements
if (hideSelectors.length > 0) {
await page.evaluate((sels: string[]) => {
for (const sel of sels) {
try {
document.querySelectorAll(sel).forEach(el => {
(el as HTMLElement).style.display = 'none';
});
} catch (err: any) {
if (!(err instanceof DOMException)) throw err;
}
}
}, hideSelectors);
}
// Scroll to target
if (scrollTo) {
// Try as CSS selector first, then as text content
const scrolled = await page.evaluate((target: string) => {
// Try CSS selector
let el = document.querySelector(target);
if (el) {
el.scrollIntoView({ behavior: 'instant', block: 'center' });
return true;
}
// Try text match
const walker = document.createTreeWalker(
document.body,
NodeFilter.SHOW_TEXT,
null,
);
let node: Node | null;
while ((node = walker.nextNode())) {
if (node.textContent?.includes(target)) {
const parent = node.parentElement;
if (parent) {
parent.scrollIntoView({ behavior: 'instant', block: 'center' });
return true;
}
}
}
return false;
}, scrollTo);
if (!scrolled) {
// Restore viewport before throwing
if (viewportWidth && originalViewport) {
await page.setViewportSize(originalViewport);
}
throw new Error(`Could not find element or text to scroll to: ${scrollTo}`);
}
// Brief wait for scroll to settle
await page.waitForTimeout(300);
}
// Take screenshot
await page.screenshot({ path: outputPath, fullPage: !scrollTo });
// Restore viewport
if (viewportWidth && originalViewport) {
await page.setViewportSize(originalViewport);
}
const parts = ['Screenshot saved'];
if (doCleanup) parts.push('(cleaned)');
if (scrollTo) parts.push(`(scrolled to: ${scrollTo})`);
parts.push(`: ${outputPath}`);
return parts.join(' ');
}
case 'download': {
if (args.length === 0) throw new Error('Usage: download <url|@ref> [path] [--base64]');
const isBase64 = args.includes('--base64');
const filteredArgs = args.filter(a => a !== '--base64');
let url = filteredArgs[0];
const outputPath = filteredArgs[1];
// Resolve @ref to element src
if (url.startsWith('@')) {
const resolved = await bm.resolveRef(url);
if (!('locator' in resolved)) throw new Error(`Expected @ref, got CSS selector: ${url}`);
const locator = resolved.locator;
const tagName = await locator.evaluate(el => el.tagName.toLowerCase());
if (tagName === 'img') {
url = await locator.evaluate(el => {
const img = el as HTMLImageElement;
return img.currentSrc || img.src || img.getAttribute('data-src') || '';
});
} else if (tagName === 'video') {
url = await locator.evaluate(el => (el as HTMLVideoElement).currentSrc || (el as HTMLVideoElement).src || '');
} else if (tagName === 'audio') {
url = await locator.evaluate(el => (el as HTMLAudioElement).currentSrc || (el as HTMLAudioElement).src || '');
} else {
// Try src attribute on any element
url = await locator.evaluate(el => el.getAttribute('src') || '');
}
if (!url) throw new Error(`Could not extract URL from ${filteredArgs[0]} (${tagName})`);
}
// Check for HLS/DASH
if (url.includes('.m3u8') || url.includes('.mpd')) {
throw new Error('This is an HLS/DASH stream. Use yt-dlp or ffmpeg for adaptive stream downloads.');
}
// Determine output path and extension
const page = bm.getPage();
let contentType = 'application/octet-stream';
let buffer: Buffer;
if (url.startsWith('blob:')) {
// Strategy 3: Blob URL -- in-page fetch + base64
const dataUrl = await page.evaluate(async (blobUrl) => {
try {
const resp = await fetch(blobUrl);
const blob = await resp.blob();
if (blob.size > 100 * 1024 * 1024) return 'ERROR:TOO_LARGE';
return new Promise<string>((resolve, reject) => {
const reader = new FileReader();
reader.onloadend = () => resolve(reader.result as string);
reader.onerror = () => reject('Failed to read blob');
reader.readAsDataURL(blob);
});
} catch (err: any) {
return `ERROR:EXPIRED:${err?.message || 'unknown'}`;
}
}, url);
if (dataUrl === 'ERROR:TOO_LARGE') throw new Error('Blob too large (>100MB). Use a different approach.');
if (dataUrl.startsWith('ERROR:EXPIRED')) throw new Error(`Blob URL expired or inaccessible: ${dataUrl.slice('ERROR:EXPIRED:'.length)}`);
const match = dataUrl.match(/^data:([^;]+);base64,(.+)$/);
if (!match) throw new Error('Failed to decode blob data');
contentType = match[1];
buffer = Buffer.from(match[2], 'base64');
} else {
// Strategy 1: Direct URL via page.request.fetch()
const response = await page.request.fetch(url, { timeout: 30000 });
const status = response.status();
if (status >= 400) {
throw new Error(`Download failed: HTTP ${status} ${response.statusText()}`);
}
contentType = response.headers()['content-type'] || 'application/octet-stream';
buffer = Buffer.from(await response.body());
if (buffer.length > 200 * 1024 * 1024) {
throw new Error('File too large (>200MB).');
}
}
// --base64 mode: return inline
if (isBase64) {
if (buffer.length > 10 * 1024 * 1024) {
throw new Error('File too large for --base64 (>10MB). Use disk download + GET /file instead.');
}
const mimeType = contentType.split(';')[0].trim();
return `data:${mimeType};base64,${buffer.toString('base64')}`;
}
// Write to disk
const ext = contentType.split(';')[0].includes('/')
? mimeToExt(contentType.split(';')[0].trim())
: '.bin';
const destPath = outputPath || path.join(TEMP_DIR, `browse-download-${Date.now()}${ext}`);
validateOutputPath(destPath);
fs.writeFileSync(destPath, buffer);
const sizeKB = Math.round(buffer.length / 1024);
return `Downloaded: ${destPath} (${sizeKB}KB, ${contentType.split(';')[0].trim()})`;
}
case 'scrape': {
if (args.length === 0) throw new Error('Usage: scrape <images|videos|media> [--selector sel] [--dir path] [--limit N]');
const mediaType = args[0];
if (!['images', 'videos', 'media'].includes(mediaType)) {
throw new Error(`Invalid type: ${mediaType}. Use: images, videos, or media`);
}
// Parse flags
const selectorIdx = args.indexOf('--selector');
const selector = selectorIdx >= 0 ? args[selectorIdx + 1] : undefined;
const dirIdx = args.indexOf('--dir');
const dir = dirIdx >= 0 ? args[dirIdx + 1] : path.join(TEMP_DIR, `browse-scrape-${Date.now()}`);
const limitIdx = args.indexOf('--limit');
const limit = Math.min(limitIdx >= 0 ? parseInt(args[limitIdx + 1], 10) || 50 : 50, 200);
validateOutputPath(dir);
fs.mkdirSync(dir, { recursive: true });
const { extractMedia } = await import('./media-extract');
const target = bm.getActiveFrameOrPage();
const filter = mediaType === 'images' ? 'images' as const
: mediaType === 'videos' ? 'videos' as const
: undefined;
const mediaResult = await extractMedia(target, { selector, filter });
// Collect URLs to download
const urls: Array<{ url: string; type: string }> = [];
const seen = new Set<string>();
for (const img of mediaResult.images) {
const url = img.currentSrc || img.src || img.dataSrc;
if (url && !seen.has(url) && !url.startsWith('data:')) {
seen.add(url);
urls.push({ url, type: 'image' });
}
}
for (const vid of mediaResult.videos) {
const url = vid.currentSrc || vid.src;
if (url && !seen.has(url) && !url.startsWith('blob:') && !vid.isHLS && !vid.isDASH) {
seen.add(url);
urls.push({ url, type: 'video' });
}
}
for (const bg of mediaResult.backgroundImages) {
if (bg.url && !seen.has(bg.url)) {
seen.add(bg.url);
urls.push({ url: bg.url, type: 'image' });
}
}
const toDownload = urls.slice(0, limit);
const page = bm.getPage();
const manifest: any = {
url: page.url(),
scraped_at: new Date().toISOString(),
files: [] as any[],
total_size: 0,
succeeded: 0,
failed: 0,
};
const lines: string[] = [];
for (let i = 0; i < toDownload.length; i++) {
const { url, type } = toDownload[i];
try {
const response = await page.request.fetch(url, { timeout: 30000 });
if (response.status() >= 400) throw new Error(`HTTP ${response.status()}`);
const ct = response.headers()['content-type'] || 'application/octet-stream';
const ext = mimeToExt(ct.split(';')[0].trim());
const filename = `${type}-${String(i + 1).padStart(3, '0')}${ext}`;
const filePath = path.join(dir, filename);
const body = Buffer.from(await response.body());
try {
fs.writeFileSync(filePath, body);
} catch (writeErr: any) {
throw new Error(`Disk write failed: ${writeErr.message}`);
}
manifest.files.push({ path: filename, src: url, size: body.length, type: ct.split(';')[0].trim() });
manifest.total_size += body.length;
manifest.succeeded++;
lines.push(` [${i + 1}/${toDownload.length}] ${filename} (${Math.round(body.length / 1024)}KB)`);
} catch (err: any) {
manifest.files.push({ path: null, src: url, size: 0, type: '', error: err.message });
manifest.failed++;
lines.push(` [${i + 1}/${toDownload.length}] FAILED: ${err.message}`);
}
// 100ms delay between downloads
if (i < toDownload.length - 1) await new Promise(r => setTimeout(r, 100));
}
// Write manifest
fs.writeFileSync(path.join(dir, 'manifest.json'), JSON.stringify(manifest, null, 2));
return `Scraped ${toDownload.length} items to ${dir}/\n${lines.join('\n')}\n\nSummary: ${manifest.succeeded} succeeded, ${manifest.failed} failed, ${Math.round(manifest.total_size / 1024)}KB total`;
}
case 'archive': {
const page = bm.getPage();
const outputPath = args[0] || path.join(TEMP_DIR, `browse-archive-${Date.now()}.mhtml`);
validateOutputPath(outputPath);
try {
const cdp = await page.context().newCDPSession(page);
const { data } = await cdp.send('Page.captureSnapshot', { format: 'mhtml' });
await cdp.detach();
fs.writeFileSync(outputPath, data);
return `Archive saved: ${outputPath} (${Math.round(data.length / 1024)}KB, MHTML)`;
} catch (err: any) {
throw new Error(`MHTML archive requires Chromium CDP. Use 'text' or 'html' for raw page content. (${err.message})`);
}
}
default:
throw new Error(`Unknown write command: ${command}`);
}
}
/** Map MIME type to file extension. */
function mimeToExt(mime: string): string {
const map: Record<string, string> = {
'image/png': '.png', 'image/jpeg': '.jpg', 'image/gif': '.gif',
'image/webp': '.webp', 'image/svg+xml': '.svg', 'image/avif': '.avif',
'video/mp4': '.mp4', 'video/webm': '.webm', 'video/quicktime': '.mov',
'audio/mpeg': '.mp3', 'audio/wav': '.wav', 'audio/ogg': '.ogg',
'application/pdf': '.pdf', 'application/json': '.json',
'text/html': '.html', 'text/plain': '.txt',
};
return map[mime] || '.bin';
}