feat(browse): Puppeteer parity — load-html, screenshot --selector, viewport --scale, file:// (v1.1.0.0) (#1062)

* feat(browse): TabSession loadedHtml + command aliases + DX polish primitives

Adds the foundation layer for Puppeteer-parity features:

- TabSession.loadedHtml + setTabContent/getLoadedHtml/clearLoadedHtml —
  enables load-html content to survive context recreation (viewport --scale)
  via in-memory replay. ASCII lifecycle diagram in the source explains the
  clear-before-navigation contract.

- COMMAND_ALIASES + canonicalizeCommand() helper — single source of truth
  for name aliases (setcontent / set-content / setContent → load-html),
  consumed by server dispatch and chain prevalidation.

- buildUnknownCommandError() pure function — rich error messages with
  Levenshtein-based "Did you mean" suggestions (distance ≤ 2, input
  length ≥ 4 to skip 2-letter noise) and NEW_IN_VERSION upgrade hints.

- load-html registered in WRITE_COMMANDS + SCOPE_WRITE so scoped write
  tokens can use it.

- screenshot and viewport descriptions updated for upcoming flags.

- New browse/test/dx-polish.test.ts (15 tests): alias canonicalization,
  Levenshtein threshold + alphabetical tiebreak, short-input guard,
  NEW_IN_VERSION upgrade hint, alias + scope integration invariants.

No consumers yet — pure additive foundation. Safe to bisect on its own.

* feat(browse): accept file:// in goto with smart cwd/home-relative parsing

Extends validateNavigationUrl to accept file:// URLs scoped to safe dirs
(cwd + TEMP_DIR) via the existing validateReadPath policy. The workhorse is a
new normalizeFileUrl() helper that handles non-standard relative forms BEFORE
the WHATWG URL parser sees them:

    file:///abs/path.html       → unchanged
    file://./docs/page.html     → file://<cwd>/docs/page.html
    file://~/Documents/page.html → file://<HOME>/Documents/page.html
    file://docs/page.html       → file://<cwd>/docs/page.html
    file://localhost/abs/path   → unchanged
    file://host.example.com/... → rejected (UNC/network)
    file:// and file:///        → rejected (would list a directory)

Host heuristic rejects segments with '.', ':', '\\', '%', IPv6 brackets, or
Windows drive-letter patterns — so file://docs.v1/page.html, file://127.0.0.1/x,
file://[::1]/x, and file://C:/Users/x are explicit errors.

Uses fileURLToPath() + pathToFileURL() from node:url (never string-concat) so
URL escapes like %20 decode correctly and Node rejects encoded-slash traversal
(%2F..%2F) outright.

Signature change: validateNavigationUrl now returns Promise<string> (the
normalized URL) instead of Promise<void>. Existing callers that ignore the
return value still compile — they just don't benefit from smart-parsing until
updated in follow-up commits. Callers will be migrated in the next few commits
(goto, diff, newTab, restoreState).

Rewrites the url-validation test file: updates existing tests for the new
return type, adds 20+ new tests covering every normalizeFileUrl shape variant,
URL-encoding edge cases, and path-traversal rejection.

References: codex consult v3 P1 findings on URL parser semantics and fileURLToPath.

* feat(browse): BrowserManager deviceScaleFactor + setContent replay + file:// plumbing

Three tightly-coupled changes to BrowserManager, all in service of the
Puppeteer-parity workflow:

1. deviceScaleFactor + currentViewport tracking. New private fields (default
   scale=1, viewport=1280x720) + setDeviceScaleFactor(scale, w, h) method.
   deviceScaleFactor is a context-level Playwright option — changing it
   requires recreateContext(). The method validates (finite number, 1-3 cap,
   headed-mode rejected), stores new values, calls recreateContext(), and
   rolls back the fields on failure so a bad call doesn't leave inconsistent
   state. Context options at all three sites (launch, recreate happy path,
   recreate fallback) now honor the stored values instead of hardcoding
   1280x720.

2. BrowserState.loadedHtml + loadedHtmlWaitUntil. saveState captures per-tab
   loadedHtml from the session; restoreState replays it via newSession.
   setTabContent() — NOT bare page.setContent() — so TabSession.loadedHtml
   is rehydrated and survives *subsequent* scale changes. In-memory only,
   never persisted to disk (HTML may contain secrets or customer data).

3. newTab + restoreState now consume validateNavigationUrl's normalized
   return value. file://./x, file://~/x, and bare-segment forms now take
   effect at every navigation site, not just the top-level goto command.

Together these enable: load-html → viewport --scale 2 → viewport --scale 1.5
→ screenshot, with content surviving both context recreations. Codex v2 P0
flagged that bare page.setContent in restoreState would lose content on the
second scale change — this commit implements the rehydration path.

References: codex v2 P0 (TabSession rehydration), codex v3 P1 (4-caller
return value), plan Feature 3 + Feature 4.

* feat(browse): load-html, screenshot --selector, viewport --scale, alias dispatch

Wires the new handlers and dispatch logic that the previous commits made
possible:

write-commands.ts
- New 'load-html' case: validateReadPath for safe-dir scoping, stat-based
  actionable errors (not found, directory, oversize), extension allowlist
  (.html/.htm/.xhtml/.svg), magic-byte sniff with UTF-8 BOM strip accepting
  any <[a-zA-Z!?] markup opener (not just <!doctype — bare fragments like
  <div>...</div> work for setContent), 50MB cap via GSTACK_BROWSE_MAX_HTML_BYTES
  override, frame-context rejection. Calls session.setTabContent() so replay
  metadata is rehydrated.
- viewport command extended: optional [<WxH>], optional [--scale <n>],
  scale-only variant reads current size via page.viewportSize(). Invalid
  scale (NaN, Infinity, empty, out of 1-3) throws with named value. Headed
  mode rejected explicitly.
- clearLoadedHtml() called BEFORE goto/back/forward/reload navigation
  (not after) so a timed-out goto post-commit doesn't leave stale metadata
  that could resurrect on a later context recreation. Codex v2 P1 catch.
- goto uses validateNavigationUrl's normalized return value.

meta-commands.ts
- screenshot --selector <css> flag: explicit element-screenshot form.
  Rejects alongside positional selector (both = error), preserves --clip
  conflict at line 161, composes with --base64 at lines 168-174.
- chain canonicalizes each step with canonicalizeCommand — step shape is
  now { rawName, name, args } so prevalidation, dispatch, WRITE_COMMANDS.has,
  watch blocking, and result labels all use canonical names while audit
  labels show 'rawName→name' when aliased. Codex v3 P2 catch — prior shape
  only canonicalized at prevalidation and diverged everywhere else.
- diff command consumes validateNavigationUrl return value for both URLs.

server.ts
- Command canonicalization inserted immediately after parse, before scope /
  watch / tab-ownership / content-wrapping checks. rawCommand preserved for
  future audit (not wired into audit log in this commit — follow-up).
- Unknown-command handler replaced with buildUnknownCommandError() from
  commands.ts — produces 'Unknown command: X. Did you mean Y?' with optional
  upgrade hint for NEW_IN_VERSION entries.

security-audit-r2.test.ts
- Updated chain-loop marker from 'for (const cmd of commands)' to
  'for (const c of commands)' to match the new chain step shape. Same
  isWatching + BLOCKED invariants still asserted.

* chore: bump version and changelog (v1.1.0.0)

- VERSION: 1.0.0.0 → 1.1.0.0 (MINOR bump — new user-facing commands)
- package.json: matching version bump
- CHANGELOG.md: new 1.1.0.0 entry describing load-html, screenshot --selector,
  viewport --scale, file:// support, setContent replay, and DX polish in user
  voice with a dedicated Security section for file:// safe-dirs policy
- browse/SKILL.md.tmpl: adds pattern #12 "Render local HTML", pattern #13
  "Retina screenshots", and a full Puppeteer → browse cheatsheet with side-by-
  side API mapping and a worked tweet-renderer migration example
- browse/SKILL.md + SKILL.md: regenerated from templates via `bun run gen:skill-docs`
  to reflect the new command descriptions

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: pre-landing review fixes (9 findings from specialist + adversarial review)

Adversarial review (Claude subagent + Codex) surfaced 9 bugs across
CRITICAL/HIGH severity. All fixed:

1. tab-session.ts:setTabContent — state mutation moved AFTER the setContent
   await. Prior order left phantom HTML in replay metadata if setContent
   threw (timeout, browser crash), which a later viewport --scale would
   silently replay. Now loadedHtml is only recorded on successful load.

2. browser-manager.ts:setDeviceScaleFactor — rollback now forces a second
   recreateContext after restoring the old fields. The fallback path in
   the original recreateContext builds a blank context using whatever
   this.deviceScaleFactor/currentViewport hold at that moment (which were
   the NEW values we were trying to apply). Rolling back the fields without
   a second recreate left the live context at new-scale while state tracked
   old-scale. Now: restore fields, force re-recreate with old values, only
   if that ALSO fails do we return a combined error.

3. commands.ts:buildUnknownCommandError — Levenshtein tiebreak simplified
   to 'd <= 2 && d < bestDist' (strict less). Candidates are pre-sorted
   alphabetically, so first equal-distance wins by default. The prior
   '(d === bestDist && best !== undefined && cand < best)' clause was dead
   code.

4. tab-session.ts:onMainFrameNavigated — now clears loadedHtml, not just
   refs + frame. Without this, a user who load-html'd then clicked a link
   (or had a form submit / JS redirect / OAuth flow) would retain the stale
   replay metadata. The next viewport --scale would silently revert the
   tab to the ORIGINAL loaded HTML, losing whatever the post-navigation
   content was. Silent data corruption. Browser-emitted navigations trigger
   this path via wirePageEvents.

5. browser-manager.ts:saveState + restoreState — tab ownership now flows
   through BrowserState.owner. Without this, a scoped agent's viewport
   --scale would strand them: tab IDs change during recreate, ownership
   map held stale IDs, owner lookup failed. New IDs had no owner, so
   writes without tabId were denied (DoS). Worse, if the agent sent a
   stale tabId the server's swallowed-tab-switch-error path would let the
   command hit whatever tab was currently active (cross-tab authz bypass).
   Now: clear ownership before restore, re-add per-tab with new IDs.

6. meta-commands.ts:state load — disk-loaded state.pages is now explicit
   allowlist (url, isActive, storage:null) instead of object spread.
   Spreading accepted loadedHtml, loadedHtmlWaitUntil, and owner from a
   user-writable state file, letting a tampered state.json smuggle HTML
   past load-html's safe-dirs / extension / magic-byte / 50MB-cap
   validators, or forge tab ownership. Now stripped at the boundary.

7. url-validation.ts:normalizeFileUrl — preserves query string + fragment
   across normalization. file://./app.html?route=home#login previously
   resolved to a filesystem path that URL-encoded '?' as %3F and '#' as
   %23, or (for absolute forms) pathToFileURL dropped them entirely. SPAs
   and fixture URLs with query params 404'd or loaded the wrong route.
   Now: split on ?/# before path resolution, reattach after.

8. url-validation.ts:validateNavigationUrl — reattaches parsed.search +
   parsed.hash to the normalized file:// URL. Same fix at the main
   validator for absolute paths that go through fileURLToPath round-trip.

9. server.ts:writeAuditEntry — audit entries now include aliasOf when the
   user typed an alias ('setcontent' → cmd: 'load-html', aliasOf:
   'setcontent'). Previously the isAliased variable was computed but
   dropped, losing the raw input from the forensic trail. Completes the
   plan's codex v3 P2 requirement.

Also added bm.getCurrentViewport() and switched 'viewport --scale'-
without-size to read from it (more reliable than page.viewportSize() on
headed/transition contexts).

Tests pass: exit 0, no failures. Build clean.

* test: integration coverage for load-html, screenshot --selector, viewport --scale, replay, aliases

Adds 28 Playwright-integration tests that close the coverage gap flagged
by the ship-workflow coverage audit (50% → expected ~80%+).

**load-html (12 tests):**
- happy path loads HTML file, page text matches
- bare HTML fragments (<div>...</div>) accepted, not just full documents
- missing file arg throws usage
- non-.html extension rejected by allowlist
- /etc/passwd.html rejected by safe-dirs policy
- ENOENT path rejected with actionable "not found" error
- directory target rejected
- binary file (PNG magic bytes) disguised as .html rejected by magic-byte check
- UTF-8 BOM stripped before magic-byte check — BOM-prefixed HTML accepted
- --wait-until networkidle exercises non-default branch
- invalid --wait-until value rejected
- unknown flag rejected

**screenshot --selector (5 tests):**
- --selector flag captures element, validates Screenshot saved (element)
- conflicts with positional selector (both = error)
- conflicts with --clip (mutually exclusive)
- composes with --base64 (returns data:image/png;base64,...)
- missing value throws usage

**viewport --scale (5 tests):**
- WxH --scale 2 produces PNG with 2x element dimensions (parses IHDR bytes 16-23)
- --scale without WxH keeps current size + applies scale
- non-finite value (abc) throws "not a finite number"
- out-of-range (4, 0.5) throws "between 1 and 3"
- missing value throws

**setContent replay across context recreation (3 tests):**
- load-html → viewport --scale 2: content survives (hits setTabContent replay path)
- double cycle 2x → 1.5x: content still survives (proves TabSession rehydration)
- goto after load-html clears replay: subsequent viewport --scale does NOT
  resurrect the stale HTML (validates the onMainFrameNavigated fix)

**Command aliases (2 tests):**
- setcontent routes to load-html via chain canonicalization
- set-content (hyphenated) also routes — both end-to-end through chain dispatch

Fixture paths use /tmp (SAFE_DIRECTORIES entry) instead of $TMPDIR which is
/var/folders/... on macOS and outside the safe-dirs boundary. Chain result
labels use rawName→name format when an alias is resolved (matches the
meta-commands.ts chain refactor).

Full suite: exit 0, 223/223 pass.

* docs: update BROWSER.md + CHANGELOG for v1.1.0.0

BROWSER.md:
- Command reference table updated: goto now lists file:// support,
  load-html added to Navigate row, viewport flagged with --scale
  option, screenshot row shows --selector + --base64 flags
- Screenshot modes table adds the fifth mode (element crop via
  --selector flag) and notes the tag-selector-not-caught-positionally
  gotcha
- New "Retina screenshots — viewport --scale" subsection explains
  deviceScaleFactor mechanics, context recreation side effects, and
  headed-mode rejection
- New "Loading local HTML — goto file:// vs load-html" subsection
  explains the two paths, their tradeoffs (URL state, relative asset
  resolution), the safe-dirs policy, extension allowlist + magic-byte
  sniff, 50MB cap, setContent replay across recreateContext, and the
  alias routing (setcontent → load-html before scope check)

CHANGELOG.md (v1.1.0.0 security section expanded, no existing content
removed):
- State files cannot smuggle HTML or forge tab ownership (allowlist
  on disk-loaded page fields)
- Audit log records aliasOf when a canonical command was reached via
  an alias (setcontent → load-html)
- load-html content clears on real navigations (clicks, form submits,
  JS redirects) — not just explicit goto. Also notes SPA query/fragment
  preservation for goto file://

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-04-18 23:25:33 +08:00
committed by GitHub
parent 4d2c8d94d0
commit c15b805cd8
20 changed files with 1439 additions and 92 deletions

View File

@@ -10,9 +10,10 @@ import type { BrowserManager } from './browser-manager';
import { findInstalledBrowsers, importCookies, importCookiesViaCdp, hasV20Cookies, listSupportedBrowserNames } from './cookie-import-browser';
import { generatePickerCode } from './cookie-picker-routes';
import { validateNavigationUrl } from './url-validation';
import { validateOutputPath } from './path-security';
import { validateOutputPath, validateReadPath } from './path-security';
import * as fs from 'fs';
import * as path from 'path';
import type { SetContentWaitUntil } from './tab-session';
import { TEMP_DIR, isPathWithin } from './platform';
import { SAFE_DIRECTORIES } from './path-security';
import { modifyStyle, undoModification, resetModifications, getModificationHistory } from './cdp-inspector';
@@ -142,30 +143,129 @@ export async function handleWriteCommand(
if (inFrame) throw new Error('Cannot use goto inside a frame. Run \'frame main\' first.');
const url = args[0];
if (!url) throw new Error('Usage: browse goto <url>');
await validateNavigationUrl(url);
const response = await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 15000 });
// Clear loadedHtml BEFORE navigation — a timeout after the main-frame commit
// must not leave stale content that could resurrect on a later context recreation.
session.clearLoadedHtml();
const normalizedUrl = await validateNavigationUrl(url);
const response = await page.goto(normalizedUrl, { waitUntil: 'domcontentloaded', timeout: 15000 });
const status = response?.status() || 'unknown';
return `Navigated to ${url} (${status})`;
return `Navigated to ${normalizedUrl} (${status})`;
}
case 'back': {
if (inFrame) throw new Error('Cannot use back inside a frame. Run \'frame main\' first.');
session.clearLoadedHtml();
await page.goBack({ waitUntil: 'domcontentloaded', timeout: 15000 });
return `Back → ${page.url()}`;
}
case 'forward': {
if (inFrame) throw new Error('Cannot use forward inside a frame. Run \'frame main\' first.');
session.clearLoadedHtml();
await page.goForward({ waitUntil: 'domcontentloaded', timeout: 15000 });
return `Forward → ${page.url()}`;
}
case 'reload': {
if (inFrame) throw new Error('Cannot use reload inside a frame. Run \'frame main\' first.');
session.clearLoadedHtml();
await page.reload({ waitUntil: 'domcontentloaded', timeout: 15000 });
return `Reloaded ${page.url()}`;
}
case 'load-html': {
if (inFrame) throw new Error('Cannot use load-html inside a frame. Run \'frame main\' first.');
const filePath = args[0];
if (!filePath) throw new Error('Usage: browse load-html <file> [--wait-until load|domcontentloaded|networkidle]');
// Parse --wait-until flag
let waitUntil: SetContentWaitUntil = 'domcontentloaded';
for (let i = 1; i < args.length; i++) {
if (args[i] === '--wait-until') {
const val = args[++i];
if (val !== 'load' && val !== 'domcontentloaded' && val !== 'networkidle') {
throw new Error(`Invalid --wait-until '${val}'. Must be one of: load, domcontentloaded, networkidle.`);
}
waitUntil = val;
} else if (args[i].startsWith('--')) {
throw new Error(`Unknown flag: ${args[i]}`);
}
}
// Extension allowlist
const ALLOWED_EXT = ['.html', '.htm', '.xhtml', '.svg'];
const ext = path.extname(filePath).toLowerCase();
if (!ALLOWED_EXT.includes(ext)) {
throw new Error(
`load-html: file does not appear to be HTML. Expected .html/.htm/.xhtml/.svg, got ${ext || '(no extension)'}. Rename the file if it's really HTML.`
);
}
const absolutePath = path.resolve(filePath);
// Safe-dirs check (reuses canonical read-side policy)
try {
validateReadPath(absolutePath);
} catch (e: any) {
throw new Error(
`load-html: ${absolutePath} must be under ${SAFE_DIRECTORIES.join(' or ')} (security policy). Copy the file into the project tree or /tmp first.`
);
}
// stat check — reject non-file targets with actionable error
let stat: fs.Stats;
try {
stat = await fs.promises.stat(absolutePath);
} catch (e: any) {
if (e.code === 'ENOENT') {
throw new Error(
`load-html: file not found at ${absolutePath}. Check spelling or copy the file under ${process.cwd()} or ${TEMP_DIR}.`
);
}
throw e;
}
if (stat.isDirectory()) {
throw new Error(`load-html: ${absolutePath} is a directory, not a file. Pass a .html file.`);
}
if (!stat.isFile()) {
throw new Error(`load-html: ${absolutePath} is not a regular file.`);
}
// Size cap
const MAX_BYTES = parseInt(process.env.GSTACK_BROWSE_MAX_HTML_BYTES || '', 10) || (50 * 1024 * 1024);
if (stat.size > MAX_BYTES) {
throw new Error(
`load-html: file too large (${stat.size} bytes > ${MAX_BYTES} cap). Raise with GSTACK_BROWSE_MAX_HTML_BYTES=<N> or split the HTML.`
);
}
// Single read: Buffer → magic-byte peek → utf-8 string
const buf = await fs.promises.readFile(absolutePath);
// Magic-byte check: strip UTF-8 BOM + leading whitespace, then verify the first
// non-whitespace byte starts a markup construct. Accepts any <tag, <!doctype,
// <!-- comment, <?xml prolog — including bare HTML fragments like `<div>...</div>`
// which setContent wraps in a full document. Rejects binary files mis-renamed .html
// (first byte won't be `<`).
let peek = buf.slice(0, 200);
if (peek[0] === 0xEF && peek[1] === 0xBB && peek[2] === 0xBF) {
peek = peek.slice(3);
}
const peekStr = peek.toString('utf8').trimStart();
// Valid markup opener: '<' followed by alpha (tag), '!' (doctype/comment), or '?' (xml prolog)
const looksLikeMarkup = /^<[a-zA-Z!?]/.test(peekStr);
if (!looksLikeMarkup) {
const hexDump = Array.from(buf.slice(0, 16)).map(b => b.toString(16).padStart(2, '0')).join(' ');
throw new Error(
`load-html: ${absolutePath} has ${ext} extension but content does not look like HTML. First bytes: ${hexDump}`
);
}
const html = buf.toString('utf8');
await session.setTabContent(html, { waitUntil });
return `Loaded HTML: ${absolutePath} (${stat.size} bytes)`;
}
case 'click': {
const selector = args[0];
if (!selector) throw new Error('Usage: browse click <selector>');
@@ -343,11 +443,55 @@ export async function handleWriteCommand(
}
case 'viewport': {
const size = args[0];
if (!size || !size.includes('x')) throw new Error('Usage: browse viewport <WxH> (e.g., 375x812)');
const [rawW, rawH] = size.split('x').map(Number);
const w = Math.min(Math.max(Math.round(rawW) || 1280, 1), 16384);
const h = Math.min(Math.max(Math.round(rawH) || 720, 1), 16384);
// Parse args: [<WxH>] [--scale <n>]. Either may be omitted, but NOT both.
let sizeArg: string | undefined;
let scaleArg: number | undefined;
for (let i = 0; i < args.length; i++) {
if (args[i] === '--scale') {
const val = args[++i];
if (val === undefined || val === '') {
throw new Error('viewport --scale: missing value. Usage: viewport [WxH] --scale <n>');
}
const parsed = Number(val);
if (!Number.isFinite(parsed)) {
throw new Error(`viewport --scale: value '${val}' is not a finite number.`);
}
scaleArg = parsed;
} else if (args[i].startsWith('--')) {
throw new Error(`Unknown viewport flag: ${args[i]}`);
} else if (sizeArg === undefined) {
sizeArg = args[i];
} else {
throw new Error(`Unexpected positional arg: ${args[i]}. Usage: viewport [WxH] [--scale <n>]`);
}
}
if (sizeArg === undefined && scaleArg === undefined) {
throw new Error('Usage: browse viewport [<WxH>] [--scale <n>] (e.g. 375x812, or --scale 2 to keep current size)');
}
// Resolve width/height: either from sizeArg or from current viewport if --scale-only.
let w: number, h: number;
if (sizeArg) {
if (!sizeArg.includes('x')) throw new Error('Usage: browse viewport [<WxH>] [--scale <n>] (e.g., 375x812)');
const [rawW, rawH] = sizeArg.split('x').map(Number);
w = Math.min(Math.max(Math.round(rawW) || 1280, 1), 16384);
h = Math.min(Math.max(Math.round(rawH) || 720, 1), 16384);
} else {
// --scale without WxH → use BrowserManager's tracked viewport (source of truth
// since setViewport + launchContext keep it in sync). Falls back reliably on
// headed → headless transitions or contexts with viewport:null.
const current = bm.getCurrentViewport();
w = current.width;
h = current.height;
}
if (scaleArg !== undefined) {
const err = await bm.setDeviceScaleFactor(scaleArg, w, h);
if (err) return `Viewport partially set: ${err}`;
return `Viewport set to ${w}x${h} @ ${scaleArg}x (context recreated; refs and load-html content replayed)`;
}
await bm.setViewport(w, h);
return `Viewport set to ${w}x${h}`;
}