mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-10 06:28:23 +08:00
* fix(security): validateOutputPath symlink bypass — check file-level symlinks validateOutputPath() previously only resolved symlinks on the parent directory. A symlink at /tmp/evil.png → /etc/crontab passed the parent check (parent is /tmp, which is safe) but the write followed the symlink outside safe dirs. Add lstatSync() check: if the target file exists and is a symlink, resolve through it and verify the real target is within SAFE_DIRECTORIES. ENOENT (file doesn't exist yet) falls through to the existing parent-dir check. Closes #921 Co-Authored-By: Yunsu <Hybirdss@users.noreply.github.com> * fix(security): shell injection in bin/ scripts — use env vars instead of interpolation gstack-settings-hook interpolated $SETTINGS_FILE directly into bun -e double-quoted blocks. A path containing quotes or backticks breaks the JS string context, enabling arbitrary code execution. Replace direct interpolation with environment variables (process.env). Same fix applied to gstack-team-init which had the same pattern. Systematic audit confirmed only these two scripts were vulnerable — all other bin/ scripts already use stdin piping or env vars. Closes #858 Co-Authored-By: Gus <garagon@users.noreply.github.com> * fix(security): cookie-import path validation bypass + hardcoded /tmp Two fixes: 1. cookie-import relative path bypass (#707): path.isAbsolute() gated the entire validation, so relative paths like "sensitive-file.json" bypassed the safe-directory check entirely. Now always resolves to absolute path with realpathSync for symlink resolution, matching validateOutputPath(). 2. Hardcoded /tmp in cookie-import-browser (#708): openDbFromCopy used /tmp directly instead of os.tmpdir(), breaking Windows support. Also adds explicit imports for SAFE_DIRECTORIES and isPathWithin in write-commands.ts (previously resolved implicitly through bundler). Closes #852 Co-Authored-By: Toby Morning <urbantech@users.noreply.github.com> * fix(security): redact form fields with sensitive names, not just type=password Form redaction only applied to type="password" fields. Hidden and text fields named csrf_token, api_key, session_id, etc. were exposed unredacted in LLM context, leaking secrets. Extend redaction to check field name and id against sensitive patterns: token, secret, key, password, credential, auth, jwt, session, csrf, sid, api_key. Uses the same pattern style as SENSITIVE_COOKIE_NAME. Closes #860 Co-Authored-By: Gus <garagon@users.noreply.github.com> * fix(security): restrict session file permissions to owner-only Design session files written to /tmp with default umask (0644) were world-readable on shared systems. Sessions contain design prompts and feedback history. Set mode 0o600 (owner read/write only) on both create and update paths. Closes #859 Co-Authored-By: Gus <garagon@users.noreply.github.com> * fix(security): enforce frozen lockfile during setup bun install without --frozen-lockfile resolves ^semver ranges from npm on every run. If an attacker publishes a compromised compatible version of any dependency, the next ./setup pulls it silently. Add --frozen-lockfile with fallback to plain install (for fresh clones where bun.lock may not exist yet). Matches the pattern already used in the .agents/ generation block (line 237). Closes #614 Co-Authored-By: Alberto Martinez <halbert04@users.noreply.github.com> * fix: remove duplicate recursive chmod on /tmp in Dockerfile.ci chmod -R 1777 /tmp recursively sets sticky bit on files (no defined behavior), not just the directory. Deduplicate to single chmod 1777 /tmp. Closes #747 Co-Authored-By: Maksim Soltan <Gonzih@users.noreply.github.com> * fix(security): learnings input validation + cross-project trust gate Three fixes to the learnings system: 1. Input validation in gstack-learnings-log: type must be from allowed list, key must be alphanumeric, confidence must be 1-10 integer, source must be from allowed list. Prevents injection via malformed fields. 2. Prompt injection defense: insight field checked against 10 instruction-like patterns (ignore previous, system:, override, etc.). Rejected with clear error message. 3. Cross-project trust gate in gstack-learnings-search: AI-generated learnings from other projects are filtered out. Only user-stated learnings cross project boundaries. Prevents silent prompt injection across codebases. Also adds trusted field (true for user-stated source, false for AI-generated) to enable the trust gate at read time. Closes #841 Co-Authored-By: Ziad Al Sharif <Ziadstr@users.noreply.github.com> * feat(security): track cookie-imported domains and scope cookie imports Foundation for origin-pinned JS execution (#616). Tracks which domains cookies were imported from so the JS/eval commands can verify execution stays within imported origins. Changes: - BrowserManager: new cookieImportedDomains Set with track/get/has methods - cookie-import: tracks imported cookie domains after addCookies - cookie-import-browser: tracks domains on --domain direct import - cookie-import-browser --all: new explicit opt-in for all-domain import (previously implicit behavior, now requires deliberate flag) Closes #615 Co-Authored-By: Alberto Martinez <halbert04@users.noreply.github.com> * feat(security): pin JS/eval execution to cookie-imported origins When cookies have been imported for specific domains, block JS execution on pages whose origin doesn't match. Prevents the attack chain: 1. Agent imports cookies for github.com 2. Prompt injection navigates to attacker.com 3. Agent runs js document.cookie → exfiltrates github cookies assertJsOriginAllowed() checks the current page hostname against imported cookie domains with subdomain matching (.github.com allows api.github.com). When no cookies are imported, all origins allowed (nothing to protect). about:blank and data: URIs are allowed (no cookies at risk). Depends on #615 (cookie domain tracking). Closes #616 Co-Authored-By: Alberto Martinez <halbert04@users.noreply.github.com> * feat(security): add persistent command audit log Append-only JSONL audit trail for all browse server commands. Unlike in-memory ring buffers, the audit log persists across restarts and is never truncated. Each entry records: timestamp, command, args (truncated to 200 chars), page origin, duration, status, error (truncated to 300 chars), hasCookies flag, connection mode. All writes are best-effort — audit failures never block command execution. Log stored at ~/.gstack/.browse/browse-audit.jsonl. Closes #617 Co-Authored-By: Alberto Martinez <halbert04@users.noreply.github.com> * fix(security): block hex-encoded IPv4-mapped IPv6 metadata bypass URL constructor normalizes ::ffff:169.254.169.254 to ::ffff:a9fe:a9fe (hex form), which was not in the blocklist. Similarly, ::169.254.169.254 normalizes to ::a9fe:a9fe. Add both hex-encoded forms to BLOCKED_METADATA_HOSTS so they're caught by the direct hostname check in validateNavigationUrl. Closes #739 Co-Authored-By: Osman Mehmood <mehmoodosman@users.noreply.github.com> * chore: bump version and changelog (v0.16.4.0) Security wave 3: 12 fixes, 7 contributors. Cookie origin pinning, command audit log, domain tracking. Symlink bypass, path validation, shell injection, form redaction, learnings injection, IPv6 SSRF, session permissions, frozen lockfile. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Yunsu <Hybirdss@users.noreply.github.com> Co-authored-by: Gus <garagon@users.noreply.github.com> Co-authored-by: Toby Morning <urbantech@users.noreply.github.com> Co-authored-by: Alberto Martinez <halbert04@users.noreply.github.com> Co-authored-by: Maksim Soltan <Gonzih@users.noreply.github.com> Co-authored-by: Ziad Al Sharif <Ziadstr@users.noreply.github.com> Co-authored-by: Osman Mehmood <mehmoodosman@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1164 lines
50 KiB
TypeScript
1164 lines
50 KiB
TypeScript
/**
|
|
* Write commands — navigate and interact with pages (side effects)
|
|
*
|
|
* goto, back, forward, reload, click, fill, select, hover, type,
|
|
* press, scroll, wait, viewport, cookie, header, useragent
|
|
*/
|
|
|
|
import type { TabSession } from './tab-session';
|
|
import type { BrowserManager } from './browser-manager';
|
|
import { findInstalledBrowsers, importCookies, listSupportedBrowserNames } from './cookie-import-browser';
|
|
import { generatePickerCode } from './cookie-picker-routes';
|
|
import { validateNavigationUrl } from './url-validation';
|
|
import { validateOutputPath } from './path-security';
|
|
import * as fs from 'fs';
|
|
import * as path from 'path';
|
|
import { TEMP_DIR, isPathWithin } from './platform';
|
|
import { SAFE_DIRECTORIES } from './path-security';
|
|
import { modifyStyle, undoModification, resetModifications, getModificationHistory } from './cdp-inspector';
|
|
|
|
/**
|
|
* Aggressive page cleanup selectors and heuristics.
|
|
* Goal: make the page readable and clean while keeping it recognizable.
|
|
* Inspired by uBlock Origin filter lists, Readability.js, and reader mode heuristics.
|
|
*/
|
|
const CLEANUP_SELECTORS = {
|
|
ads: [
|
|
// Google Ads
|
|
'ins.adsbygoogle', '[id^="google_ads"]', '[id^="div-gpt-ad"]',
|
|
'iframe[src*="doubleclick"]', 'iframe[src*="googlesyndication"]',
|
|
'[data-google-query-id]', '.google-auto-placed',
|
|
// Generic ad patterns (uBlock Origin common filters)
|
|
'[class*="ad-banner"]', '[class*="ad-wrapper"]', '[class*="ad-container"]',
|
|
'[class*="ad-slot"]', '[class*="ad-unit"]', '[class*="ad-zone"]',
|
|
'[class*="ad-placement"]', '[class*="ad-holder"]', '[class*="ad-block"]',
|
|
'[class*="adbox"]', '[class*="adunit"]', '[class*="adwrap"]',
|
|
'[id*="ad-banner"]', '[id*="ad-wrapper"]', '[id*="ad-container"]',
|
|
'[id*="ad-slot"]', '[id*="ad_banner"]', '[id*="ad_container"]',
|
|
'[data-ad]', '[data-ad-slot]', '[data-ad-unit]', '[data-adunit]',
|
|
'[class*="sponsored"]', '[class*="Sponsored"]',
|
|
'.ad', '.ads', '.advert', '.advertisement',
|
|
'#ad', '#ads', '#advert', '#advertisement',
|
|
// Common ad network iframes
|
|
'iframe[src*="amazon-adsystem"]', 'iframe[src*="outbrain"]',
|
|
'iframe[src*="taboola"]', 'iframe[src*="criteo"]',
|
|
'iframe[src*="adsafeprotected"]', 'iframe[src*="moatads"]',
|
|
// Promoted/sponsored content
|
|
'[class*="promoted"]', '[class*="Promoted"]',
|
|
'[data-testid*="promo"]', '[class*="native-ad"]',
|
|
// Empty ad placeholders (divs with only ad classes, no real content)
|
|
'aside[class*="ad"]', 'section[class*="ad-"]',
|
|
],
|
|
cookies: [
|
|
// Cookie consent frameworks
|
|
'[class*="cookie-consent"]', '[class*="cookie-banner"]', '[class*="cookie-notice"]',
|
|
'[id*="cookie-consent"]', '[id*="cookie-banner"]', '[id*="cookie-notice"]',
|
|
'[class*="consent-banner"]', '[class*="consent-modal"]', '[class*="consent-wall"]',
|
|
'[class*="gdpr"]', '[id*="gdpr"]', '[class*="GDPR"]',
|
|
'[class*="CookieConsent"]', '[id*="CookieConsent"]',
|
|
// OneTrust (very common)
|
|
'#onetrust-consent-sdk', '.onetrust-pc-dark-filter', '#onetrust-banner-sdk',
|
|
// Cookiebot
|
|
'#CybotCookiebotDialog', '#CybotCookiebotDialogBodyUnderlay',
|
|
// TrustArc / TRUSTe
|
|
'#truste-consent-track', '.truste_overlay', '.truste_box_overlay',
|
|
// Quantcast
|
|
'.qc-cmp2-container', '#qc-cmp2-main',
|
|
// Generic patterns
|
|
'[class*="cc-banner"]', '[class*="cc-window"]', '[class*="cc-overlay"]',
|
|
'[class*="privacy-banner"]', '[class*="privacy-notice"]',
|
|
'[id*="privacy-banner"]', '[id*="privacy-notice"]',
|
|
'[class*="accept-cookies"]', '[id*="accept-cookies"]',
|
|
],
|
|
overlays: [
|
|
// Paywall / subscription overlays
|
|
'[class*="paywall"]', '[class*="Paywall"]', '[id*="paywall"]',
|
|
'[class*="subscribe-wall"]', '[class*="subscription-wall"]',
|
|
'[class*="meter-wall"]', '[class*="regwall"]', '[class*="reg-wall"]',
|
|
// Newsletter / signup popups
|
|
'[class*="newsletter-popup"]', '[class*="newsletter-modal"]',
|
|
'[class*="signup-modal"]', '[class*="signup-popup"]',
|
|
'[class*="email-capture"]', '[class*="lead-capture"]',
|
|
'[class*="popup-modal"]', '[class*="modal-overlay"]',
|
|
// Interstitials
|
|
'[class*="interstitial"]', '[id*="interstitial"]',
|
|
// Push notification prompts
|
|
'[class*="push-notification"]', '[class*="notification-prompt"]',
|
|
'[class*="web-push"]',
|
|
// Survey / feedback popups
|
|
'[class*="survey-"]', '[class*="feedback-modal"]',
|
|
'[id*="survey-"]', '[class*="nps-"]',
|
|
// App download banners
|
|
'[class*="app-banner"]', '[class*="smart-banner"]', '[class*="app-download"]',
|
|
'[id*="branch-banner"]', '.smartbanner',
|
|
// Cross-promotion / "follow us" / "preferred source" widgets
|
|
'[class*="promo-banner"]', '[class*="cross-promo"]', '[class*="partner-promo"]',
|
|
'[class*="preferred-source"]', '[class*="google-promo"]',
|
|
],
|
|
clutter: [
|
|
// Audio/podcast player widgets (not part of the article text)
|
|
'[class*="audio-player"]', '[class*="podcast-player"]', '[class*="listen-widget"]',
|
|
'[class*="everlit"]', '[class*="Everlit"]',
|
|
'audio', // bare audio elements
|
|
// Sidebar games/puzzles widgets
|
|
'[class*="puzzle"]', '[class*="daily-game"]', '[class*="games-widget"]',
|
|
'[class*="crossword-promo"]', '[class*="mini-game"]',
|
|
// "Most Popular" / "Trending" sidebar recirculation (not the top nav trending bar)
|
|
'aside [class*="most-popular"]', 'aside [class*="trending"]',
|
|
'aside [class*="most-read"]', 'aside [class*="recommended"]',
|
|
// Related articles / recirculation at bottom
|
|
'[class*="related-articles"]', '[class*="more-stories"]',
|
|
'[class*="recirculation"]', '[class*="taboola"]', '[class*="outbrain"]',
|
|
// Hearst-specific (SF Chronicle, etc.)
|
|
'[class*="nativo"]', '[data-tb-region]',
|
|
],
|
|
sticky: [
|
|
// Handled via JavaScript evaluation, not pure selectors
|
|
],
|
|
social: [
|
|
'[class*="social-share"]', '[class*="share-buttons"]', '[class*="share-bar"]',
|
|
'[class*="social-widget"]', '[class*="social-icons"]', '[class*="share-tools"]',
|
|
'iframe[src*="facebook.com/plugins"]', 'iframe[src*="platform.twitter"]',
|
|
'[class*="fb-like"]', '[class*="tweet-button"]',
|
|
'[class*="addthis"]', '[class*="sharethis"]',
|
|
// Follow prompts
|
|
'[class*="follow-us"]', '[class*="social-follow"]',
|
|
],
|
|
};
|
|
|
|
export async function handleWriteCommand(
|
|
command: string,
|
|
args: string[],
|
|
session: TabSession,
|
|
bm: BrowserManager
|
|
): Promise<string> {
|
|
const page = session.getPage();
|
|
// Frame-aware target for locator-based operations (click, fill, etc.)
|
|
const target = session.getActiveFrameOrPage();
|
|
const inFrame = session.getFrame() !== null;
|
|
|
|
switch (command) {
|
|
case 'goto': {
|
|
if (inFrame) throw new Error('Cannot use goto inside a frame. Run \'frame main\' first.');
|
|
const url = args[0];
|
|
if (!url) throw new Error('Usage: browse goto <url>');
|
|
await validateNavigationUrl(url);
|
|
const response = await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 15000 });
|
|
const status = response?.status() || 'unknown';
|
|
return `Navigated to ${url} (${status})`;
|
|
}
|
|
|
|
case 'back': {
|
|
if (inFrame) throw new Error('Cannot use back inside a frame. Run \'frame main\' first.');
|
|
await page.goBack({ waitUntil: 'domcontentloaded', timeout: 15000 });
|
|
return `Back → ${page.url()}`;
|
|
}
|
|
|
|
case 'forward': {
|
|
if (inFrame) throw new Error('Cannot use forward inside a frame. Run \'frame main\' first.');
|
|
await page.goForward({ waitUntil: 'domcontentloaded', timeout: 15000 });
|
|
return `Forward → ${page.url()}`;
|
|
}
|
|
|
|
case 'reload': {
|
|
if (inFrame) throw new Error('Cannot use reload inside a frame. Run \'frame main\' first.');
|
|
await page.reload({ waitUntil: 'domcontentloaded', timeout: 15000 });
|
|
return `Reloaded ${page.url()}`;
|
|
}
|
|
|
|
case 'click': {
|
|
const selector = args[0];
|
|
if (!selector) throw new Error('Usage: browse click <selector>');
|
|
|
|
// Auto-route: if ref points to a real <option> inside a <select>, use selectOption
|
|
const role = session.getRefRole(selector);
|
|
if (role === 'option') {
|
|
const resolved = await session.resolveRef(selector);
|
|
if ('locator' in resolved) {
|
|
const optionInfo = await resolved.locator.evaluate(el => {
|
|
if (el.tagName !== 'OPTION') return null; // custom [role=option], not real <option>
|
|
const option = el as HTMLOptionElement;
|
|
const select = option.closest('select');
|
|
if (!select) return null;
|
|
return { value: option.value, text: option.text };
|
|
});
|
|
if (optionInfo) {
|
|
await resolved.locator.locator('xpath=ancestor::select').selectOption(optionInfo.value, { timeout: 5000 });
|
|
return `Selected "${optionInfo.text}" (auto-routed from click on <option>) → now at ${page.url()}`;
|
|
}
|
|
// Real <option> with no parent <select> or custom [role=option] — fall through to normal click
|
|
}
|
|
}
|
|
|
|
const resolved = await session.resolveRef(selector);
|
|
try {
|
|
if ('locator' in resolved) {
|
|
await resolved.locator.click({ timeout: 5000 });
|
|
} else {
|
|
await target.locator(resolved.selector).click({ timeout: 5000 });
|
|
}
|
|
} catch (err: any) {
|
|
// Enhanced error guidance: clicking <option> elements always fails (not visible / timeout)
|
|
const isOption = 'locator' in resolved
|
|
? await resolved.locator.evaluate(el => el.tagName === 'OPTION').catch(() => false)
|
|
: await target.locator(resolved.selector).evaluate(
|
|
el => el.tagName === 'OPTION'
|
|
).catch(() => false);
|
|
if (isOption) {
|
|
throw new Error(
|
|
`Cannot click <option> elements. Use 'browse select <parent-select> <value>' instead of 'click' for dropdown options.`
|
|
);
|
|
}
|
|
throw err;
|
|
}
|
|
// Wait for network to settle (catches XHR/fetch triggered by clicks)
|
|
await page.waitForLoadState('networkidle', { timeout: 2000 }).catch(() => {});
|
|
return `Clicked ${selector} → now at ${page.url()}`;
|
|
}
|
|
|
|
case 'fill': {
|
|
const [selector, ...valueParts] = args;
|
|
const value = valueParts.join(' ');
|
|
if (!selector || !value) throw new Error('Usage: browse fill <selector> <value>');
|
|
const resolved = await session.resolveRef(selector);
|
|
if ('locator' in resolved) {
|
|
await resolved.locator.fill(value, { timeout: 5000 });
|
|
} else {
|
|
await target.locator(resolved.selector).fill(value, { timeout: 5000 });
|
|
}
|
|
// Wait for network to settle (form validation XHRs)
|
|
await page.waitForLoadState('networkidle', { timeout: 2000 }).catch(() => {});
|
|
return `Filled ${selector}`;
|
|
}
|
|
|
|
case 'select': {
|
|
const [selector, ...valueParts] = args;
|
|
const value = valueParts.join(' ');
|
|
if (!selector || !value) throw new Error('Usage: browse select <selector> <value>');
|
|
const resolved = await session.resolveRef(selector);
|
|
if ('locator' in resolved) {
|
|
await resolved.locator.selectOption(value, { timeout: 5000 });
|
|
} else {
|
|
await target.locator(resolved.selector).selectOption(value, { timeout: 5000 });
|
|
}
|
|
// Wait for network to settle (dropdown-triggered requests)
|
|
await page.waitForLoadState('networkidle', { timeout: 2000 }).catch(() => {});
|
|
return `Selected "${value}" in ${selector}`;
|
|
}
|
|
|
|
case 'hover': {
|
|
const selector = args[0];
|
|
if (!selector) throw new Error('Usage: browse hover <selector>');
|
|
const resolved = await session.resolveRef(selector);
|
|
if ('locator' in resolved) {
|
|
await resolved.locator.hover({ timeout: 5000 });
|
|
} else {
|
|
await target.locator(resolved.selector).hover({ timeout: 5000 });
|
|
}
|
|
return `Hovered ${selector}`;
|
|
}
|
|
|
|
case 'type': {
|
|
const text = args.join(' ');
|
|
if (!text) throw new Error('Usage: browse type <text>');
|
|
await page.keyboard.type(text);
|
|
return `Typed ${text.length} characters`;
|
|
}
|
|
|
|
case 'press': {
|
|
const key = args[0];
|
|
if (!key) throw new Error('Usage: browse press <key> (e.g., Enter, Tab, Escape)');
|
|
await page.keyboard.press(key);
|
|
return `Pressed ${key}`;
|
|
}
|
|
|
|
case 'scroll': {
|
|
// Parse --times N and --wait ms flags
|
|
const timesIdx = args.indexOf('--times');
|
|
const times = timesIdx >= 0 ? parseInt(args[timesIdx + 1], 10) || 1 : 0;
|
|
const waitIdx = args.indexOf('--wait');
|
|
const waitMs = waitIdx >= 0 ? parseInt(args[waitIdx + 1], 10) || 1000 : 1000;
|
|
const selector = args.find(a => !a.startsWith('--') && args.indexOf(a) !== timesIdx + 1 && args.indexOf(a) !== waitIdx + 1);
|
|
|
|
if (times > 0) {
|
|
// Repeated scroll mode
|
|
for (let i = 0; i < times; i++) {
|
|
if (selector) {
|
|
const resolved = await bm.resolveRef(selector);
|
|
if ('locator' in resolved) {
|
|
await resolved.locator.scrollIntoViewIfNeeded({ timeout: 5000 });
|
|
} else {
|
|
await target.locator(resolved.selector).scrollIntoViewIfNeeded({ timeout: 5000 });
|
|
}
|
|
} else {
|
|
await target.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
|
|
}
|
|
if (i < times - 1) await new Promise(r => setTimeout(r, waitMs));
|
|
}
|
|
return `Scrolled ${times} times${selector ? ` (${selector})` : ''} with ${waitMs}ms delay`;
|
|
}
|
|
|
|
// Single scroll (original behavior)
|
|
if (selector) {
|
|
const resolved = await session.resolveRef(selector);
|
|
if ('locator' in resolved) {
|
|
await resolved.locator.scrollIntoViewIfNeeded({ timeout: 5000 });
|
|
} else {
|
|
await target.locator(resolved.selector).scrollIntoViewIfNeeded({ timeout: 5000 });
|
|
}
|
|
return `Scrolled ${selector} into view`;
|
|
}
|
|
await target.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
|
|
return 'Scrolled to bottom';
|
|
}
|
|
|
|
case 'wait': {
|
|
const selector = args[0];
|
|
if (!selector) throw new Error('Usage: browse wait <selector|--networkidle|--load|--domcontentloaded>');
|
|
if (selector === '--networkidle') {
|
|
const MAX_WAIT_MS = 300_000;
|
|
const MIN_WAIT_MS = 1_000;
|
|
const timeout = Math.min(Math.max(args[1] ? parseInt(args[1], 10) || MIN_WAIT_MS : 15000, MIN_WAIT_MS), MAX_WAIT_MS);
|
|
await page.waitForLoadState('networkidle', { timeout });
|
|
return 'Network idle';
|
|
}
|
|
if (selector === '--load') {
|
|
await page.waitForLoadState('load');
|
|
return 'Page loaded';
|
|
}
|
|
if (selector === '--domcontentloaded') {
|
|
await page.waitForLoadState('domcontentloaded');
|
|
return 'DOM content loaded';
|
|
}
|
|
const MAX_WAIT_MS = 300_000;
|
|
const MIN_WAIT_MS = 1_000;
|
|
const timeout = Math.min(Math.max(args[1] ? parseInt(args[1], 10) || MIN_WAIT_MS : 15000, MIN_WAIT_MS), MAX_WAIT_MS);
|
|
const resolved = await session.resolveRef(selector);
|
|
if ('locator' in resolved) {
|
|
await resolved.locator.waitFor({ state: 'visible', timeout });
|
|
} else {
|
|
await target.locator(resolved.selector).waitFor({ state: 'visible', timeout });
|
|
}
|
|
return `Element ${selector} appeared`;
|
|
}
|
|
|
|
case 'viewport': {
|
|
const size = args[0];
|
|
if (!size || !size.includes('x')) throw new Error('Usage: browse viewport <WxH> (e.g., 375x812)');
|
|
const [rawW, rawH] = size.split('x').map(Number);
|
|
const w = Math.min(Math.max(Math.round(rawW) || 1280, 1), 16384);
|
|
const h = Math.min(Math.max(Math.round(rawH) || 720, 1), 16384);
|
|
await bm.setViewport(w, h);
|
|
return `Viewport set to ${w}x${h}`;
|
|
}
|
|
|
|
case 'cookie': {
|
|
const cookieStr = args[0];
|
|
if (!cookieStr || !cookieStr.includes('=')) throw new Error('Usage: browse cookie <name>=<value>');
|
|
const eq = cookieStr.indexOf('=');
|
|
const name = cookieStr.slice(0, eq);
|
|
const value = cookieStr.slice(eq + 1);
|
|
const url = new URL(page.url());
|
|
await page.context().addCookies([{
|
|
name,
|
|
value,
|
|
domain: url.hostname,
|
|
path: '/',
|
|
}]);
|
|
return `Cookie set: ${name}=****`;
|
|
}
|
|
|
|
case 'header': {
|
|
const headerStr = args[0];
|
|
if (!headerStr || !headerStr.includes(':')) throw new Error('Usage: browse header <name>:<value>');
|
|
const sep = headerStr.indexOf(':');
|
|
const name = headerStr.slice(0, sep).trim();
|
|
const value = headerStr.slice(sep + 1).trim();
|
|
await bm.setExtraHeader(name, value);
|
|
const sensitiveHeaders = ['authorization', 'cookie', 'set-cookie', 'x-api-key', 'x-auth-token'];
|
|
const redactedValue = sensitiveHeaders.includes(name.toLowerCase()) ? '****' : value;
|
|
return `Header set: ${name}: ${redactedValue}`;
|
|
}
|
|
|
|
case 'useragent': {
|
|
const ua = args.join(' ');
|
|
if (!ua) throw new Error('Usage: browse useragent <string>');
|
|
bm.setUserAgent(ua);
|
|
const error = await bm.recreateContext();
|
|
if (error) {
|
|
return `User agent set to "${ua}" but: ${error}`;
|
|
}
|
|
return `User agent set: ${ua}`;
|
|
}
|
|
|
|
case 'upload': {
|
|
const [selector, ...filePaths] = args;
|
|
if (!selector || filePaths.length === 0) throw new Error('Usage: browse upload <selector> <file1> [file2...]');
|
|
|
|
// Validate paths are within safe directories (same check as cookie-import)
|
|
for (const fp of filePaths) {
|
|
if (!fs.existsSync(fp)) throw new Error(`File not found: ${fp}`);
|
|
if (path.isAbsolute(fp)) {
|
|
let resolvedFp: string;
|
|
try { resolvedFp = fs.realpathSync(path.resolve(fp)); } catch (err: any) { if (err?.code !== 'ENOENT') throw err; resolvedFp = path.resolve(fp); }
|
|
if (!SAFE_DIRECTORIES.some(dir => isPathWithin(resolvedFp, dir))) {
|
|
throw new Error(`Path must be within: ${SAFE_DIRECTORIES.join(', ')}`);
|
|
}
|
|
}
|
|
if (path.normalize(fp).includes('..')) {
|
|
throw new Error('Path traversal sequences (..) are not allowed');
|
|
}
|
|
}
|
|
|
|
const resolved = await session.resolveRef(selector);
|
|
if ('locator' in resolved) {
|
|
await resolved.locator.setInputFiles(filePaths);
|
|
} else {
|
|
await target.locator(resolved.selector).setInputFiles(filePaths);
|
|
}
|
|
|
|
const fileInfo = filePaths.map(fp => {
|
|
const stat = fs.statSync(fp);
|
|
return `${path.basename(fp)} (${stat.size}B)`;
|
|
}).join(', ');
|
|
return `Uploaded: ${fileInfo}`;
|
|
}
|
|
|
|
case 'dialog-accept': {
|
|
const text = args.length > 0 ? args.join(' ') : null;
|
|
bm.setDialogAutoAccept(true);
|
|
bm.setDialogPromptText(text);
|
|
return text
|
|
? `Dialogs will be accepted with text: "${text}"`
|
|
: 'Dialogs will be accepted';
|
|
}
|
|
|
|
case 'dialog-dismiss': {
|
|
bm.setDialogAutoAccept(false);
|
|
bm.setDialogPromptText(null);
|
|
return 'Dialogs will be dismissed';
|
|
}
|
|
|
|
case 'cookie-import': {
|
|
const filePath = args[0];
|
|
if (!filePath) throw new Error('Usage: browse cookie-import <json-file>');
|
|
// Path validation — resolve to absolute and check against safe dirs.
|
|
// Fixes #707: relative paths previously bypassed the safe directory check.
|
|
// Mirrors validateOutputPath() — resolves symlinks (e.g., macOS /tmp → /private/tmp).
|
|
const resolved = path.resolve(filePath);
|
|
let resolvedReal = resolved;
|
|
try { resolvedReal = fs.realpathSync(resolved); } catch {
|
|
// File may not exist yet — resolve parent dir instead
|
|
try { resolvedReal = path.join(fs.realpathSync(path.dirname(resolved)), path.basename(resolved)); } catch {}
|
|
}
|
|
if (!SAFE_DIRECTORIES.some(dir => isPathWithin(resolvedReal, dir))) {
|
|
throw new Error(`Path must be within: ${SAFE_DIRECTORIES.join(', ')}`);
|
|
}
|
|
if (!fs.existsSync(filePath)) throw new Error(`File not found: ${filePath}`);
|
|
const raw = fs.readFileSync(filePath, 'utf-8');
|
|
let cookies: any[];
|
|
try { cookies = JSON.parse(raw); } catch (err: any) { throw new Error(`Invalid JSON in ${filePath}: ${err?.message || err}`); }
|
|
if (!Array.isArray(cookies)) throw new Error('Cookie file must contain a JSON array');
|
|
|
|
// Auto-fill domain from current page URL when missing (consistent with cookie command)
|
|
const pageUrl = new URL(page.url());
|
|
const defaultDomain = pageUrl.hostname;
|
|
|
|
for (const c of cookies) {
|
|
if (!c.name || c.value === undefined) throw new Error('Each cookie must have "name" and "value" fields');
|
|
if (!c.domain) {
|
|
c.domain = defaultDomain;
|
|
} else {
|
|
const cookieDomain = c.domain.startsWith('.') ? c.domain.slice(1) : c.domain;
|
|
if (cookieDomain !== defaultDomain && !defaultDomain.endsWith('.' + cookieDomain)) {
|
|
throw new Error(`Cookie domain "${c.domain}" does not match current page domain "${defaultDomain}". Use the target site first.`);
|
|
}
|
|
}
|
|
if (!c.path) c.path = '/';
|
|
}
|
|
|
|
await page.context().addCookies(cookies);
|
|
const importedDomains = [...new Set(cookies.map((c: any) => c.domain).filter(Boolean))];
|
|
if (importedDomains.length > 0) bm.trackCookieImportDomains(importedDomains);
|
|
return `Loaded ${cookies.length} cookies from ${filePath}`;
|
|
}
|
|
|
|
case 'cookie-import-browser': {
|
|
// Two modes:
|
|
// 1. Direct CLI import: cookie-import-browser <browser> --domain <domain> [--profile <profile>]
|
|
// Requires --domain (or --all to explicitly import everything).
|
|
// 2. Open picker UI: cookie-import-browser [browser] (interactive domain selection)
|
|
const browserArg = args[0];
|
|
const domainIdx = args.indexOf('--domain');
|
|
const profileIdx = args.indexOf('--profile');
|
|
const hasAll = args.includes('--all');
|
|
const profile = (profileIdx !== -1 && profileIdx + 1 < args.length) ? args[profileIdx + 1] : 'Default';
|
|
|
|
if (domainIdx !== -1 && domainIdx + 1 < args.length) {
|
|
// Direct import mode — scoped to specific domain
|
|
const domain = args[domainIdx + 1];
|
|
// Validate --domain against current page hostname to prevent cross-site cookie injection
|
|
const pageHostname = new URL(page.url()).hostname;
|
|
const normalizedDomain = domain.startsWith('.') ? domain.slice(1) : domain;
|
|
if (normalizedDomain !== pageHostname && !pageHostname.endsWith('.' + normalizedDomain)) {
|
|
throw new Error(`--domain "${domain}" does not match current page domain "${pageHostname}". Navigate to the target site first.`);
|
|
}
|
|
const browser = browserArg || 'comet';
|
|
const result = await importCookies(browser, [domain], profile);
|
|
if (result.cookies.length > 0) {
|
|
await page.context().addCookies(result.cookies);
|
|
bm.trackCookieImportDomains([domain]);
|
|
}
|
|
const msg = [`Imported ${result.count} cookies for ${domain} from ${browser}`];
|
|
if (result.failed > 0) msg.push(`(${result.failed} failed to decrypt)`);
|
|
return msg.join(' ');
|
|
}
|
|
|
|
if (hasAll) {
|
|
// Explicit all-cookies import — requires --all flag as a deliberate opt-in.
|
|
// Imports every non-expired cookie domain from the browser.
|
|
const browser = browserArg || 'comet';
|
|
const { listDomains } = await import('./cookie-import-browser');
|
|
const { domains } = listDomains(browser, profile);
|
|
const allDomainNames = domains.map((d: any) => d.domain);
|
|
if (allDomainNames.length === 0) {
|
|
return `No cookies found in ${browser} (profile: ${profile})`;
|
|
}
|
|
const result = await importCookies(browser, allDomainNames, profile);
|
|
if (result.cookies.length > 0) {
|
|
await page.context().addCookies(result.cookies);
|
|
bm.trackCookieImportDomains(allDomainNames);
|
|
}
|
|
const msg = [`Imported ${result.count} cookies across ${Object.keys(result.domainCounts).length} domains from ${browser}`];
|
|
msg.push('(used --all: all browser cookies imported, consider --domain for tighter scoping)');
|
|
if (result.failed > 0) msg.push(`(${result.failed} failed to decrypt)`);
|
|
return msg.join(' ');
|
|
}
|
|
|
|
// Picker UI mode — open in user's browser for interactive domain selection
|
|
const port = bm.serverPort;
|
|
if (!port) throw new Error('Server port not available');
|
|
|
|
const browsers = findInstalledBrowsers();
|
|
if (browsers.length === 0) {
|
|
throw new Error(`No Chromium browsers found. Supported: ${listSupportedBrowserNames().join(', ')}`);
|
|
}
|
|
|
|
const code = generatePickerCode();
|
|
const pickerUrl = `http://127.0.0.1:${port}/cookie-picker?code=${code}`;
|
|
try {
|
|
Bun.spawn(['open', pickerUrl], { stdout: 'ignore', stderr: 'ignore' });
|
|
} catch (err: any) {
|
|
// open may fail on non-macOS or if 'open' binary is missing — URL is in the message below
|
|
if (err?.code !== 'ENOENT' && !err?.message?.includes('spawn')) throw err;
|
|
}
|
|
|
|
return `Cookie picker opened at http://127.0.0.1:${port}/cookie-picker\nDetected browsers: ${browsers.map(b => b.name).join(', ')}\nSelect domains to import, then close the picker when done.\n\nTip: For scripted imports, use --domain <domain> to scope cookies to a single domain.`;
|
|
}
|
|
|
|
case 'style': {
|
|
// style --undo [N] → revert modification
|
|
if (args[0] === '--undo') {
|
|
const idx = args[1] ? parseInt(args[1], 10) : undefined;
|
|
await undoModification(page, idx);
|
|
return idx !== undefined ? `Reverted modification #${idx}` : 'Reverted last modification';
|
|
}
|
|
|
|
// style <selector> <property> <value>
|
|
const [selector, property, ...valueParts] = args;
|
|
const value = valueParts.join(' ');
|
|
if (!selector || !property || !value) {
|
|
throw new Error('Usage: browse style <sel> <prop> <value> | style --undo [N]');
|
|
}
|
|
|
|
// Validate CSS property name
|
|
if (!/^[a-zA-Z-]+$/.test(property)) {
|
|
throw new Error(`Invalid CSS property name: ${property}. Only letters and hyphens allowed.`);
|
|
}
|
|
|
|
// Validate CSS value — block data exfiltration patterns
|
|
const DANGEROUS_CSS = /url\s*\(|expression\s*\(|@import|javascript:|data:/i;
|
|
if (DANGEROUS_CSS.test(value)) {
|
|
throw new Error('CSS value rejected: contains potentially dangerous pattern.');
|
|
}
|
|
|
|
const mod = await modifyStyle(page, selector, property, value);
|
|
return `Style modified: ${selector} { ${property}: ${mod.oldValue || '(none)'} → ${value} } (${mod.method})`;
|
|
}
|
|
|
|
case 'cleanup': {
|
|
// Parse flags
|
|
let doAds = false, doCookies = false, doSticky = false, doSocial = false;
|
|
let doOverlays = false, doClutter = false;
|
|
let doAll = false;
|
|
|
|
// Default to --all if no args (most common use case from sidebar button)
|
|
if (args.length === 0) {
|
|
doAll = true;
|
|
}
|
|
|
|
for (const arg of args) {
|
|
switch (arg) {
|
|
case '--ads': doAds = true; break;
|
|
case '--cookies': doCookies = true; break;
|
|
case '--sticky': doSticky = true; break;
|
|
case '--social': doSocial = true; break;
|
|
case '--overlays': doOverlays = true; break;
|
|
case '--clutter': doClutter = true; break;
|
|
case '--all': doAll = true; break;
|
|
default:
|
|
throw new Error(`Unknown cleanup flag: ${arg}. Use: --ads, --cookies, --sticky, --social, --overlays, --clutter, --all`);
|
|
}
|
|
}
|
|
|
|
if (doAll) {
|
|
doAds = doCookies = doSticky = doSocial = doOverlays = doClutter = true;
|
|
}
|
|
|
|
const removed: string[] = [];
|
|
|
|
// Build selector list for categories to clean
|
|
const selectors: string[] = [];
|
|
if (doAds) selectors.push(...CLEANUP_SELECTORS.ads);
|
|
if (doCookies) selectors.push(...CLEANUP_SELECTORS.cookies);
|
|
if (doSocial) selectors.push(...CLEANUP_SELECTORS.social);
|
|
if (doOverlays) selectors.push(...CLEANUP_SELECTORS.overlays);
|
|
if (doClutter) selectors.push(...CLEANUP_SELECTORS.clutter);
|
|
|
|
if (selectors.length > 0) {
|
|
const count = await page.evaluate((sels: string[]) => {
|
|
let removed = 0;
|
|
for (const sel of sels) {
|
|
try {
|
|
const els = document.querySelectorAll(sel);
|
|
els.forEach(el => {
|
|
(el as HTMLElement).style.setProperty('display', 'none', 'important');
|
|
removed++;
|
|
});
|
|
} catch (err: any) {
|
|
// querySelectorAll throws DOMException on invalid CSS selectors — skip those
|
|
if (!(err instanceof DOMException)) throw err;
|
|
}
|
|
}
|
|
return removed;
|
|
}, selectors);
|
|
if (count > 0) {
|
|
if (doAds) removed.push('ads');
|
|
if (doCookies) removed.push('cookie banners');
|
|
if (doSocial) removed.push('social widgets');
|
|
if (doOverlays) removed.push('overlays/popups');
|
|
if (doClutter) removed.push('clutter');
|
|
}
|
|
}
|
|
|
|
// Sticky/fixed elements — handled separately with computed style check
|
|
if (doSticky) {
|
|
const stickyCount = await page.evaluate(() => {
|
|
let removed = 0;
|
|
// Collect all sticky/fixed elements, sort by vertical position
|
|
const stickyEls: Array<{ el: Element; top: number; width: number; height: number }> = [];
|
|
const allElements = document.querySelectorAll('*');
|
|
const viewportWidth = window.innerWidth;
|
|
for (const el of allElements) {
|
|
const style = getComputedStyle(el);
|
|
if (style.position === 'fixed' || style.position === 'sticky') {
|
|
const rect = el.getBoundingClientRect();
|
|
stickyEls.push({ el, top: rect.top, width: rect.width, height: rect.height });
|
|
}
|
|
}
|
|
// Sort by vertical position (topmost first)
|
|
stickyEls.sort((a, b) => a.top - b.top);
|
|
let preservedTopNav = false;
|
|
for (const { el, top, width, height } of stickyEls) {
|
|
const tag = el.tagName.toLowerCase();
|
|
// Always skip nav/header semantic elements
|
|
if (tag === 'nav' || tag === 'header') continue;
|
|
if (el.getAttribute('role') === 'navigation') continue;
|
|
// Skip the gstack control indicator
|
|
if ((el as HTMLElement).id === 'gstack-ctrl') continue;
|
|
// Preserve the FIRST full-width element near the top (site's main nav bar)
|
|
// This catches divs that act as navbars but aren't semantic <nav> elements
|
|
if (!preservedTopNav && top <= 50 && width > viewportWidth * 0.8 && height < 120) {
|
|
preservedTopNav = true;
|
|
continue;
|
|
}
|
|
(el as HTMLElement).style.setProperty('display', 'none', 'important');
|
|
removed++;
|
|
}
|
|
return removed;
|
|
});
|
|
if (stickyCount > 0) removed.push(`${stickyCount} sticky/fixed elements`);
|
|
}
|
|
|
|
// Unlock scrolling (many sites lock body scroll when modals are open)
|
|
const scrollFixed = await page.evaluate(() => {
|
|
let fixed = 0;
|
|
// Unlock body and html scroll
|
|
for (const el of [document.body, document.documentElement]) {
|
|
if (!el) continue;
|
|
const style = getComputedStyle(el);
|
|
if (style.overflow === 'hidden' || style.overflowY === 'hidden') {
|
|
(el as HTMLElement).style.setProperty('overflow', 'auto', 'important');
|
|
(el as HTMLElement).style.setProperty('overflow-y', 'auto', 'important');
|
|
fixed++;
|
|
}
|
|
// Remove height:100% + position:fixed that locks scroll
|
|
if (style.position === 'fixed' && (el === document.body || el === document.documentElement)) {
|
|
(el as HTMLElement).style.setProperty('position', 'static', 'important');
|
|
fixed++;
|
|
}
|
|
}
|
|
// Remove blur/filter effects (paywalls often blur the content)
|
|
const blurred = document.querySelectorAll('[style*="blur"], [style*="filter"]');
|
|
blurred.forEach(el => {
|
|
const s = (el as HTMLElement).style;
|
|
if (s.filter?.includes('blur') || s.webkitFilter?.includes('blur')) {
|
|
s.setProperty('filter', 'none', 'important');
|
|
s.setProperty('-webkit-filter', 'none', 'important');
|
|
fixed++;
|
|
}
|
|
});
|
|
// Remove max-height truncation (article truncation)
|
|
const truncated = document.querySelectorAll('[class*="truncat"], [class*="preview"], [class*="teaser"]');
|
|
truncated.forEach(el => {
|
|
const s = getComputedStyle(el);
|
|
if (s.maxHeight && s.maxHeight !== 'none' && parseInt(s.maxHeight) < 500) {
|
|
(el as HTMLElement).style.setProperty('max-height', 'none', 'important');
|
|
(el as HTMLElement).style.setProperty('overflow', 'visible', 'important');
|
|
fixed++;
|
|
}
|
|
});
|
|
return fixed;
|
|
});
|
|
if (scrollFixed > 0) removed.push('scroll unlocked');
|
|
|
|
// Remove "ADVERTISEMENT" / "Article continues below" text labels
|
|
const adLabelCount = await page.evaluate(() => {
|
|
let removed = 0;
|
|
const adTextPatterns = [
|
|
/^advertisement$/i, /^sponsored$/i, /^promoted$/i,
|
|
/article continues/i, /continues below/i,
|
|
/^ad$/i, /^paid content$/i, /^partner content$/i,
|
|
];
|
|
// Walk text-heavy small elements looking for ad labels
|
|
const candidates = document.querySelectorAll('div, span, p, figcaption, label');
|
|
for (const el of candidates) {
|
|
const text = (el.textContent || '').trim();
|
|
if (text.length > 50) continue; // Too much text, probably real content
|
|
if (adTextPatterns.some(p => p.test(text))) {
|
|
// Also hide the parent if it's a wrapper with little else
|
|
const parent = el.parentElement;
|
|
if (parent && (parent.textContent || '').trim().length < 80) {
|
|
(parent as HTMLElement).style.setProperty('display', 'none', 'important');
|
|
} else {
|
|
(el as HTMLElement).style.setProperty('display', 'none', 'important');
|
|
}
|
|
removed++;
|
|
}
|
|
}
|
|
return removed;
|
|
});
|
|
if (adLabelCount > 0) removed.push(`${adLabelCount} ad labels`);
|
|
|
|
// Remove empty ad placeholder whitespace (divs that are now empty after ad removal)
|
|
const collapsedCount = await page.evaluate(() => {
|
|
let collapsed = 0;
|
|
const candidates = document.querySelectorAll(
|
|
'div[class*="ad"], div[id*="ad"], aside[class*="ad"], div[class*="sidebar"], ' +
|
|
'div[class*="rail"], div[class*="right-col"], div[class*="widget"]'
|
|
);
|
|
for (const el of candidates) {
|
|
const rect = el.getBoundingClientRect();
|
|
// If the element has significant height but no visible text content, collapse it
|
|
if (rect.height > 50 && rect.width > 0) {
|
|
const text = (el.textContent || '').trim();
|
|
const images = el.querySelectorAll('img:not([src*="logo"]):not([src*="icon"])');
|
|
const links = el.querySelectorAll('a');
|
|
// Empty or mostly empty: collapse
|
|
if (text.length < 20 && images.length === 0 && links.length < 2) {
|
|
(el as HTMLElement).style.setProperty('display', 'none', 'important');
|
|
collapsed++;
|
|
}
|
|
}
|
|
}
|
|
return collapsed;
|
|
});
|
|
if (collapsedCount > 0) removed.push(`${collapsedCount} empty placeholders`);
|
|
|
|
if (removed.length === 0) return 'No clutter elements found to remove.';
|
|
return `Cleaned up: ${removed.join(', ')}`;
|
|
}
|
|
|
|
case 'prettyscreenshot': {
|
|
// Parse flags
|
|
let scrollTo: string | undefined;
|
|
let doCleanup = false;
|
|
const hideSelectors: string[] = [];
|
|
let viewportWidth: number | undefined;
|
|
let outputPath: string | undefined;
|
|
|
|
for (let i = 0; i < args.length; i++) {
|
|
if (args[i] === '--scroll-to' && i + 1 < args.length) {
|
|
scrollTo = args[++i];
|
|
} else if (args[i] === '--cleanup') {
|
|
doCleanup = true;
|
|
} else if (args[i] === '--hide' && i + 1 < args.length) {
|
|
// Collect all following non-flag args as selectors to hide
|
|
i++;
|
|
while (i < args.length && !args[i].startsWith('--')) {
|
|
hideSelectors.push(args[i]);
|
|
i++;
|
|
}
|
|
i--; // Back up since the for loop will increment
|
|
} else if (args[i] === '--width' && i + 1 < args.length) {
|
|
viewportWidth = parseInt(args[++i], 10);
|
|
if (isNaN(viewportWidth)) throw new Error('--width must be a number');
|
|
} else if (!args[i].startsWith('--')) {
|
|
outputPath = args[i];
|
|
} else {
|
|
throw new Error(`Unknown prettyscreenshot flag: ${args[i]}`);
|
|
}
|
|
}
|
|
|
|
// Default output path
|
|
if (!outputPath) {
|
|
const timestamp = new Date().toISOString().replace(/[:.]/g, '-').slice(0, 19);
|
|
outputPath = `${TEMP_DIR}/browse-pretty-${timestamp}.png`;
|
|
}
|
|
validateOutputPath(outputPath);
|
|
|
|
const originalViewport = page.viewportSize();
|
|
|
|
// Set viewport width if specified
|
|
if (viewportWidth && originalViewport) {
|
|
await page.setViewportSize({ width: viewportWidth, height: originalViewport.height });
|
|
}
|
|
|
|
// Run cleanup if requested
|
|
if (doCleanup) {
|
|
const allSelectors = [
|
|
...CLEANUP_SELECTORS.ads,
|
|
...CLEANUP_SELECTORS.cookies,
|
|
...CLEANUP_SELECTORS.social,
|
|
];
|
|
await page.evaluate((sels: string[]) => {
|
|
for (const sel of sels) {
|
|
try {
|
|
document.querySelectorAll(sel).forEach(el => {
|
|
(el as HTMLElement).style.display = 'none';
|
|
});
|
|
} catch (err: any) {
|
|
if (!(err instanceof DOMException)) throw err;
|
|
}
|
|
}
|
|
// Also hide fixed/sticky (except nav)
|
|
for (const el of document.querySelectorAll('*')) {
|
|
const style = getComputedStyle(el);
|
|
if (style.position === 'fixed' || style.position === 'sticky') {
|
|
const tag = el.tagName.toLowerCase();
|
|
if (tag === 'nav' || tag === 'header') continue;
|
|
if (el.getAttribute('role') === 'navigation') continue;
|
|
(el as HTMLElement).style.display = 'none';
|
|
}
|
|
}
|
|
}, allSelectors);
|
|
}
|
|
|
|
// Hide specific elements
|
|
if (hideSelectors.length > 0) {
|
|
await page.evaluate((sels: string[]) => {
|
|
for (const sel of sels) {
|
|
try {
|
|
document.querySelectorAll(sel).forEach(el => {
|
|
(el as HTMLElement).style.display = 'none';
|
|
});
|
|
} catch (err: any) {
|
|
if (!(err instanceof DOMException)) throw err;
|
|
}
|
|
}
|
|
}, hideSelectors);
|
|
}
|
|
|
|
// Scroll to target
|
|
if (scrollTo) {
|
|
// Try as CSS selector first, then as text content
|
|
const scrolled = await page.evaluate((target: string) => {
|
|
// Try CSS selector
|
|
let el = document.querySelector(target);
|
|
if (el) {
|
|
el.scrollIntoView({ behavior: 'instant', block: 'center' });
|
|
return true;
|
|
}
|
|
// Try text match
|
|
const walker = document.createTreeWalker(
|
|
document.body,
|
|
NodeFilter.SHOW_TEXT,
|
|
null,
|
|
);
|
|
let node: Node | null;
|
|
while ((node = walker.nextNode())) {
|
|
if (node.textContent?.includes(target)) {
|
|
const parent = node.parentElement;
|
|
if (parent) {
|
|
parent.scrollIntoView({ behavior: 'instant', block: 'center' });
|
|
return true;
|
|
}
|
|
}
|
|
}
|
|
return false;
|
|
}, scrollTo);
|
|
|
|
if (!scrolled) {
|
|
// Restore viewport before throwing
|
|
if (viewportWidth && originalViewport) {
|
|
await page.setViewportSize(originalViewport);
|
|
}
|
|
throw new Error(`Could not find element or text to scroll to: ${scrollTo}`);
|
|
}
|
|
// Brief wait for scroll to settle
|
|
await page.waitForTimeout(300);
|
|
}
|
|
|
|
// Take screenshot
|
|
await page.screenshot({ path: outputPath, fullPage: !scrollTo });
|
|
|
|
// Restore viewport
|
|
if (viewportWidth && originalViewport) {
|
|
await page.setViewportSize(originalViewport);
|
|
}
|
|
|
|
const parts = ['Screenshot saved'];
|
|
if (doCleanup) parts.push('(cleaned)');
|
|
if (scrollTo) parts.push(`(scrolled to: ${scrollTo})`);
|
|
parts.push(`: ${outputPath}`);
|
|
return parts.join(' ');
|
|
}
|
|
|
|
case 'download': {
|
|
if (args.length === 0) throw new Error('Usage: download <url|@ref> [path] [--base64]');
|
|
const isBase64 = args.includes('--base64');
|
|
const filteredArgs = args.filter(a => a !== '--base64');
|
|
let url = filteredArgs[0];
|
|
const outputPath = filteredArgs[1];
|
|
|
|
// Resolve @ref to element src
|
|
if (url.startsWith('@')) {
|
|
const resolved = await bm.resolveRef(url);
|
|
if (!('locator' in resolved)) throw new Error(`Expected @ref, got CSS selector: ${url}`);
|
|
const locator = resolved.locator;
|
|
const tagName = await locator.evaluate(el => el.tagName.toLowerCase());
|
|
if (tagName === 'img') {
|
|
url = await locator.evaluate(el => {
|
|
const img = el as HTMLImageElement;
|
|
return img.currentSrc || img.src || img.getAttribute('data-src') || '';
|
|
});
|
|
} else if (tagName === 'video') {
|
|
url = await locator.evaluate(el => (el as HTMLVideoElement).currentSrc || (el as HTMLVideoElement).src || '');
|
|
} else if (tagName === 'audio') {
|
|
url = await locator.evaluate(el => (el as HTMLAudioElement).currentSrc || (el as HTMLAudioElement).src || '');
|
|
} else {
|
|
// Try src attribute on any element
|
|
url = await locator.evaluate(el => el.getAttribute('src') || '');
|
|
}
|
|
if (!url) throw new Error(`Could not extract URL from ${filteredArgs[0]} (${tagName})`);
|
|
}
|
|
|
|
// Check for HLS/DASH
|
|
if (url.includes('.m3u8') || url.includes('.mpd')) {
|
|
throw new Error('This is an HLS/DASH stream. Use yt-dlp or ffmpeg for adaptive stream downloads.');
|
|
}
|
|
|
|
// Determine output path and extension
|
|
const page = bm.getPage();
|
|
let contentType = 'application/octet-stream';
|
|
let buffer: Buffer;
|
|
|
|
if (url.startsWith('blob:')) {
|
|
// Strategy 3: Blob URL -- in-page fetch + base64
|
|
const dataUrl = await page.evaluate(async (blobUrl) => {
|
|
try {
|
|
const resp = await fetch(blobUrl);
|
|
const blob = await resp.blob();
|
|
if (blob.size > 100 * 1024 * 1024) return 'ERROR:TOO_LARGE';
|
|
return new Promise<string>((resolve, reject) => {
|
|
const reader = new FileReader();
|
|
reader.onloadend = () => resolve(reader.result as string);
|
|
reader.onerror = () => reject('Failed to read blob');
|
|
reader.readAsDataURL(blob);
|
|
});
|
|
} catch (err: any) {
|
|
return `ERROR:EXPIRED:${err?.message || 'unknown'}`;
|
|
}
|
|
}, url);
|
|
|
|
if (dataUrl === 'ERROR:TOO_LARGE') throw new Error('Blob too large (>100MB). Use a different approach.');
|
|
if (dataUrl.startsWith('ERROR:EXPIRED')) throw new Error(`Blob URL expired or inaccessible: ${dataUrl.slice('ERROR:EXPIRED:'.length)}`);
|
|
|
|
const match = dataUrl.match(/^data:([^;]+);base64,(.+)$/);
|
|
if (!match) throw new Error('Failed to decode blob data');
|
|
contentType = match[1];
|
|
buffer = Buffer.from(match[2], 'base64');
|
|
} else {
|
|
// Strategy 1: Direct URL via page.request.fetch()
|
|
const response = await page.request.fetch(url, { timeout: 30000 });
|
|
const status = response.status();
|
|
if (status >= 400) {
|
|
throw new Error(`Download failed: HTTP ${status} ${response.statusText()}`);
|
|
}
|
|
contentType = response.headers()['content-type'] || 'application/octet-stream';
|
|
buffer = Buffer.from(await response.body());
|
|
if (buffer.length > 200 * 1024 * 1024) {
|
|
throw new Error('File too large (>200MB).');
|
|
}
|
|
}
|
|
|
|
// --base64 mode: return inline
|
|
if (isBase64) {
|
|
if (buffer.length > 10 * 1024 * 1024) {
|
|
throw new Error('File too large for --base64 (>10MB). Use disk download + GET /file instead.');
|
|
}
|
|
const mimeType = contentType.split(';')[0].trim();
|
|
return `data:${mimeType};base64,${buffer.toString('base64')}`;
|
|
}
|
|
|
|
// Write to disk
|
|
const ext = contentType.split(';')[0].includes('/')
|
|
? mimeToExt(contentType.split(';')[0].trim())
|
|
: '.bin';
|
|
const destPath = outputPath || path.join(TEMP_DIR, `browse-download-${Date.now()}${ext}`);
|
|
validateOutputPath(destPath);
|
|
fs.writeFileSync(destPath, buffer);
|
|
const sizeKB = Math.round(buffer.length / 1024);
|
|
return `Downloaded: ${destPath} (${sizeKB}KB, ${contentType.split(';')[0].trim()})`;
|
|
}
|
|
|
|
case 'scrape': {
|
|
if (args.length === 0) throw new Error('Usage: scrape <images|videos|media> [--selector sel] [--dir path] [--limit N]');
|
|
const mediaType = args[0];
|
|
if (!['images', 'videos', 'media'].includes(mediaType)) {
|
|
throw new Error(`Invalid type: ${mediaType}. Use: images, videos, or media`);
|
|
}
|
|
|
|
// Parse flags
|
|
const selectorIdx = args.indexOf('--selector');
|
|
const selector = selectorIdx >= 0 ? args[selectorIdx + 1] : undefined;
|
|
const dirIdx = args.indexOf('--dir');
|
|
const dir = dirIdx >= 0 ? args[dirIdx + 1] : path.join(TEMP_DIR, `browse-scrape-${Date.now()}`);
|
|
const limitIdx = args.indexOf('--limit');
|
|
const limit = Math.min(limitIdx >= 0 ? parseInt(args[limitIdx + 1], 10) || 50 : 50, 200);
|
|
|
|
validateOutputPath(dir);
|
|
fs.mkdirSync(dir, { recursive: true });
|
|
|
|
const { extractMedia } = await import('./media-extract');
|
|
const target = bm.getActiveFrameOrPage();
|
|
const filter = mediaType === 'images' ? 'images' as const
|
|
: mediaType === 'videos' ? 'videos' as const
|
|
: undefined;
|
|
const mediaResult = await extractMedia(target, { selector, filter });
|
|
|
|
// Collect URLs to download
|
|
const urls: Array<{ url: string; type: string }> = [];
|
|
const seen = new Set<string>();
|
|
|
|
for (const img of mediaResult.images) {
|
|
const url = img.currentSrc || img.src || img.dataSrc;
|
|
if (url && !seen.has(url) && !url.startsWith('data:')) {
|
|
seen.add(url);
|
|
urls.push({ url, type: 'image' });
|
|
}
|
|
}
|
|
for (const vid of mediaResult.videos) {
|
|
const url = vid.currentSrc || vid.src;
|
|
if (url && !seen.has(url) && !url.startsWith('blob:') && !vid.isHLS && !vid.isDASH) {
|
|
seen.add(url);
|
|
urls.push({ url, type: 'video' });
|
|
}
|
|
}
|
|
for (const bg of mediaResult.backgroundImages) {
|
|
if (bg.url && !seen.has(bg.url)) {
|
|
seen.add(bg.url);
|
|
urls.push({ url: bg.url, type: 'image' });
|
|
}
|
|
}
|
|
|
|
const toDownload = urls.slice(0, limit);
|
|
const page = bm.getPage();
|
|
const manifest: any = {
|
|
url: page.url(),
|
|
scraped_at: new Date().toISOString(),
|
|
files: [] as any[],
|
|
total_size: 0,
|
|
succeeded: 0,
|
|
failed: 0,
|
|
};
|
|
|
|
const lines: string[] = [];
|
|
for (let i = 0; i < toDownload.length; i++) {
|
|
const { url, type } = toDownload[i];
|
|
try {
|
|
const response = await page.request.fetch(url, { timeout: 30000 });
|
|
if (response.status() >= 400) throw new Error(`HTTP ${response.status()}`);
|
|
const ct = response.headers()['content-type'] || 'application/octet-stream';
|
|
const ext = mimeToExt(ct.split(';')[0].trim());
|
|
const filename = `${type}-${String(i + 1).padStart(3, '0')}${ext}`;
|
|
const filePath = path.join(dir, filename);
|
|
const body = Buffer.from(await response.body());
|
|
try {
|
|
fs.writeFileSync(filePath, body);
|
|
} catch (writeErr: any) {
|
|
throw new Error(`Disk write failed: ${writeErr.message}`);
|
|
}
|
|
manifest.files.push({ path: filename, src: url, size: body.length, type: ct.split(';')[0].trim() });
|
|
manifest.total_size += body.length;
|
|
manifest.succeeded++;
|
|
lines.push(` [${i + 1}/${toDownload.length}] ${filename} (${Math.round(body.length / 1024)}KB)`);
|
|
} catch (err: any) {
|
|
manifest.files.push({ path: null, src: url, size: 0, type: '', error: err.message });
|
|
manifest.failed++;
|
|
lines.push(` [${i + 1}/${toDownload.length}] FAILED: ${err.message}`);
|
|
}
|
|
// 100ms delay between downloads
|
|
if (i < toDownload.length - 1) await new Promise(r => setTimeout(r, 100));
|
|
}
|
|
|
|
// Write manifest
|
|
fs.writeFileSync(path.join(dir, 'manifest.json'), JSON.stringify(manifest, null, 2));
|
|
|
|
return `Scraped ${toDownload.length} items to ${dir}/\n${lines.join('\n')}\n\nSummary: ${manifest.succeeded} succeeded, ${manifest.failed} failed, ${Math.round(manifest.total_size / 1024)}KB total`;
|
|
}
|
|
|
|
case 'archive': {
|
|
const page = bm.getPage();
|
|
const outputPath = args[0] || path.join(TEMP_DIR, `browse-archive-${Date.now()}.mhtml`);
|
|
validateOutputPath(outputPath);
|
|
|
|
try {
|
|
const cdp = await page.context().newCDPSession(page);
|
|
const { data } = await cdp.send('Page.captureSnapshot', { format: 'mhtml' });
|
|
await cdp.detach();
|
|
fs.writeFileSync(outputPath, data);
|
|
return `Archive saved: ${outputPath} (${Math.round(data.length / 1024)}KB, MHTML)`;
|
|
} catch (err: any) {
|
|
throw new Error(`MHTML archive requires Chromium CDP. Use 'text' or 'html' for raw page content. (${err.message})`);
|
|
}
|
|
}
|
|
|
|
default:
|
|
throw new Error(`Unknown write command: ${command}`);
|
|
}
|
|
}
|
|
|
|
/** Map MIME type to file extension. */
|
|
function mimeToExt(mime: string): string {
|
|
const map: Record<string, string> = {
|
|
'image/png': '.png', 'image/jpeg': '.jpg', 'image/gif': '.gif',
|
|
'image/webp': '.webp', 'image/svg+xml': '.svg', 'image/avif': '.avif',
|
|
'video/mp4': '.mp4', 'video/webm': '.webm', 'video/quicktime': '.mov',
|
|
'audio/mpeg': '.mp3', 'audio/wav': '.wav', 'audio/ogg': '.ogg',
|
|
'application/pdf': '.pdf', 'application/json': '.json',
|
|
'text/html': '.html', 'text/plain': '.txt',
|
|
};
|
|
return map[mime] || '.bin';
|
|
}
|