feat(extension): Terminal-only sidebar — auth fix, UX polish, chat rip

The chat queue path is gone. The Chrome side panel is now just an
interactive claude PTY in xterm.js. Activity / Refs / Inspector still
exist behind the `debug` toggle in the footer.

Three threads of change, all from dogfood iteration on top of
cc-pty-import:

1. fix(server): cross-port WS auth via Sec-WebSocket-Protocol
   - Browsers can't set Authorization on a WebSocket upgrade. We had
     been minting an HttpOnly gstack_pty cookie via /pty-session, but
     SameSite=Strict cookies don't survive the cross-port jump from
     server.ts:34567 to the agent's random port from a chrome-extension
     origin. The WS opened then immediately closed → "Session ended."
   - /pty-session now also returns ptySessionToken in the JSON body.
   - Extension calls `new WebSocket(url, [`gstack-pty.<token>`])`.
     Browser sends Sec-WebSocket-Protocol on the upgrade.
   - Agent reads the protocol header, validates against validTokens,
     and MUST echo the protocol back (Chromium closes the connection
     immediately if a server doesn't pick one of the offered protocols).
   - Cookie path is kept as a fallback for non-browser callers (curl,
     integration tests).
   - New integration test exercises the full protocol-auth round-trip
     via raw fetch+Upgrade so a future regression of this exact class
     fails in CI.

2. fix(extension): UX polish on the Terminal pane
   - Eager auto-connect when the sidebar opens — no "Press any key to
     start" friction every reload.
   - Always-visible ↻ Restart button in the terminal toolbar (not
     gated on the ENDED state) so the user can force a fresh claude
     mid-session.
   - MutationObserver on #tab-terminal's class attribute drives a
     fitAddon.fit() + term.refresh() when the pane becomes visible
     again — xterm doesn't auto-redraw after display:none → display:flex.

3. feat(extension): rip the chat tab + sidebar-agent.ts
   - Sidebar is Terminal-only. No more Terminal | Chat primary nav.
   - sidebar-agent.ts deleted. /sidebar-command, /sidebar-chat,
     /sidebar-agent/event, /sidebar-tabs* and friends all deleted.
   - The pickSidebarModel router (sonnet vs opus) is gone — the live
     PTY uses whatever model the user's `claude` CLI is configured with.
   - Quick-actions (🧹 Cleanup / 📸 Screenshot / 🍪 Cookies) survive
     in the Terminal toolbar. Cleanup now injects its prompt into the
     live PTY via window.gstackInjectToTerminal — no more
     /sidebar-command POST. The Inspector "Send to Code" action uses
     the same injection path.
   - clear-chat button removed from the footer.
   - sidepanel.js shed ~900 lines of chat polling, optimistic UI,
     stop-agent, etc.
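
The protocol-auth handshake from item 1 can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in terminal-agent.ts: `selectAuthProtocol` and `PROTOCOL_PREFIX` are hypothetical names, and only the `gstack-pty.<token>` shape and the `validTokens` set come from the description above.

```typescript
// Hypothetical sketch of the agent-side check. Names are illustrative only.
const PROTOCOL_PREFIX = "gstack-pty.";

/**
 * Given the raw Sec-WebSocket-Protocol request header (a comma-separated
 * list of offered subprotocols), return the offered entry that carries a
 * valid token, or null if none does. The returned string is what the server
 * would echo back in the upgrade response.
 */
function selectAuthProtocol(
  header: string | null,
  validTokens: Set<string>,
): string | null {
  if (!header) return null;
  for (const offered of header.split(",").map((s) => s.trim())) {
    if (!offered.startsWith(PROTOCOL_PREFIX)) continue;
    const token = offered.slice(PROTOCOL_PREFIX.length);
    if (token.length > 0 && validTokens.has(token)) {
      // Echo the entry exactly as offered; the browser matches it verbatim.
      return offered;
    }
  }
  return null;
}
```

On the extension side the matching call would be `new WebSocket(url, [PROTOCOL_PREFIX + ptySessionToken])`. One caveat worth keeping in mind: subprotocol names must be valid HTTP tokens, so this scheme assumes the session token contains no spaces, commas, or other separator characters.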

Net diff: -3.4k lines across 16 files. CLAUDE.md, TODOS.md, and
docs/designs/SIDEBAR_MESSAGE_FLOW.md rewritten to match. The sidebar
regression test (browse/test/sidebar-tabs.test.ts) is rewritten as 27
structural assertions locking the new layout — Terminal sole pane,
no chat input, quick-actions in toolbar, eager-connect, MutationObserver
repaint, restart helper.
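
The MutationObserver repaint mentioned above can be sketched as a small edge detector that fires only on the hidden-to-visible transition. This is a hedged sketch, not the code in sidepanel-terminal.js: `makeRepaintOnShow`, `TerminalLike`, and `FitLike` are hypothetical names, and the `active` class in the wiring comment is a guess.

```typescript
// Hypothetical sketch. The interfaces mirror the xterm.js calls named in the
// commit message (fitAddon.fit() and term.refresh()); everything else is assumed.
interface TerminalLike {
  rows: number;
  refresh(start: number, end: number): void;
}
interface FitLike {
  fit(): void;
}

/**
 * Returns a callback suitable for a MutationObserver. It re-fits and redraws
 * the terminal only when the pane goes from hidden to visible.
 */
function makeRepaintOnShow(
  term: TerminalLike,
  fitAddon: FitLike,
  isVisible: () => boolean,
): () => void {
  let wasVisible = isVisible();
  return () => {
    const nowVisible = isVisible();
    if (nowVisible && !wasVisible) {
      fitAddon.fit(); // recompute cols/rows for the restored layout
      term.refresh(0, term.rows - 1); // redraw every visible row
    }
    wasVisible = nowVisible;
  };
}

// Browser wiring (assumed, shown as a comment so the sketch runs without a DOM):
// const pane = document.querySelector("#tab-terminal")!;
// const onChange = makeRepaintOnShow(term, fitAddon, () => pane.classList.contains("active"));
// new MutationObserver(onChange).observe(pane, { attributes: true, attributeFilter: ["class"] });
```

Tracking the previous visibility keeps unrelated class changes from triggering redundant fits.
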

Garry Tan
2026-04-25 21:03:04 -07:00
parent 0361acfb6a
commit 006dbe19f1
16 changed files with 771 additions and 4229 deletions


@@ -225,24 +225,35 @@ When you need to interact with a browser (QA, dogfooding, cookie setup), use the
 project uses.
 **Sidebar architecture:** Before modifying `sidepanel.js`, `background.js`,
-`content.js`, `sidebar-agent.ts`, `terminal-agent.ts`, or sidebar-related
-server endpoints, read `docs/designs/SIDEBAR_MESSAGE_FLOW.md`. It documents
-the full initialization timeline, message flow, auth token chain, tab
-concurrency model, the Terminal-tab PTY flow, and known failure modes.
-The sidebar spans 6 files across 2 codebases (extension + server) with
-non-obvious ordering dependencies. The doc exists to prevent the kind of
-silent failures that come from not understanding the cross-component flow.
+`content.js`, `terminal-agent.ts`, or sidebar-related server endpoints,
+read `docs/designs/SIDEBAR_MESSAGE_FLOW.md`. The sidebar has one primary
+surface — the **Terminal** pane (interactive `claude` PTY) — with
+Activity / Refs / Inspector as debug overlays behind the footer's
+`debug` toggle. The chat queue path was ripped once the PTY proved out;
+`sidebar-agent.ts` and the `/sidebar-command` / `/sidebar-chat` /
+`/sidebar-agent/event` endpoints are gone. The doc covers the WS auth
+flow, dual-token model, and threat-model boundary — silent failures
+here usually trace to not understanding the cross-component flow.
-**Terminal tab is its own process.** `terminal-agent.ts` is a separate
-non-compiled bun process from `sidebar-agent.ts`. Do not bolt PTY logic
-onto sidebar-agent — codex confirmed it would couple chat reliability to
-PTY framing bugs. Cookie minting (`pty-session-cookie.ts`) lives in the
-server; the cookie travels via `Set-Cookie` and back via `Cookie:` on the
-WebSocket upgrade. The WS upgrade gates on Origin AND cookie; both are
-load-bearing for the Terminal tab to be safe. `/health` MUST NOT surface
-the cookie value or any shell-grant token (codex finding: existing
-`AUTH_TOKEN` is already exposed there in headed mode; that's a separate
-v1.1+ TODO, not something to widen).
+**WebSocket auth uses Sec-WebSocket-Protocol, not cookies.** Browsers
+can't set `Authorization` on a WebSocket upgrade, but they CAN set
+`Sec-WebSocket-Protocol` via `new WebSocket(url, [token])`. The agent
+reads it, validates against `validTokens`, and MUST echo the protocol
+back in the upgrade response — without the echo, Chromium closes the
+connection immediately. `Set-Cookie: gstack_pty=...` is kept as a
+fallback for non-browser callers (the cross-port `SameSite=Strict`
+cookie path doesn't survive from a chrome-extension origin).
+**Cross-pane PTY injection.** The toolbar's Cleanup button and the
+Inspector's "Send to Code" action both pipe text into the live claude
+PTY via `window.gstackInjectToTerminal(text)`, exposed by
+`sidepanel-terminal.js`. No `/sidebar-command` POST — the live REPL is
+the only execution surface in the sidebar now.
+**`/health` MUST NOT surface any shell-grant token.** It already leaks
+`AUTH_TOKEN` to localhost callers in headed mode (a v1.1+ TODO). Don't
+make that worse by adding the PTY session token there. PTY auth flows
+through `POST /pty-session` only.
 **Transport-layer security** (v1.6.0.0+). When `pair-agent` starts an ngrok tunnel,
 the daemon binds two HTTP listeners: a local listener (127.0.0.1, full command


@@ -52,28 +52,6 @@ scope of that PR; deliberately deferred to keep PTY-import small.
 ---
-### v1.1+: Apply terminal-agent's exception handlers to sidebar-agent
-**What:** While reviewing cc-pty-import, codex noted that `sidebar-agent.ts`
-has no `process.on('uncaughtException'|'unhandledRejection')` handlers.
-A bug in claude stream parsing or queue I/O can take down the chat path
-silently. terminal-agent.ts ships with these handlers; sidebar-agent
-should get them too.
-**Why:** Today a single uncaught exception in chat = entire sidebar chat
-dies and nothing tells the user. The CLI doesn't supervise the agent.
-**Pros:** Chat survives transient bugs. **Cons:** Catching uncaught
-exceptions can hide real failures — pair the handlers with structured
-logging so we still see the bug.
-**Context:** codex finding #4 on cc-pty-import plan-eng review.
-**Priority:** P2.
-**Effort:** S.
----
 ## Testing
 ### Pre-existing test failures surfaced during v1.12.0.0 ship


@@ -853,7 +853,7 @@ Refs: After 'snapshot', use @e1, @e2... as selectors:
 // Delete stale state file
 safeUnlinkQuiet(config.stateFile);
-console.log('Launching headed Chromium with extension + sidebar agent...');
+console.log('Launching headed Chromium with extension + terminal agent...');
 try {
 // Start server in headed mode with extension auto-loaded
 // Use a well-known port so the Chrome extension auto-connects
@@ -882,61 +882,12 @@ Refs: After 'snapshot', use @e1, @e2... as selectors:
 const status = await resp.text();
 console.log(`Connected to real Chrome\n${status}`);
-// Auto-start sidebar agent
-// __dirname is inside $bunfs in compiled binaries — resolve from execPath instead
-let agentScript = path.resolve(__dirname, 'sidebar-agent.ts');
-if (!fs.existsSync(agentScript)) {
-agentScript = path.resolve(path.dirname(process.execPath), '..', 'src', 'sidebar-agent.ts');
-}
-try {
-if (!fs.existsSync(agentScript)) {
-throw new Error(`sidebar-agent.ts not found at ${agentScript}`);
-}
-// Clear old agent queue
-const agentQueue = path.join(process.env.HOME || '/tmp', '.gstack', 'sidebar-agent-queue.jsonl');
-try {
-fs.mkdirSync(path.dirname(agentQueue), { recursive: true, mode: 0o700 });
-fs.writeFileSync(agentQueue, '', { mode: 0o600 });
-} catch (err: any) {
-if (err?.code !== 'EACCES') throw err;
-}
+// sidebar-agent.ts spawn was here. Ripped alongside the chat queue —
+// the Terminal pane runs an interactive PTY now, no more one-shot
+// claude -p subprocesses to multiplex.
-// Resolve browse binary path the same way — execPath-relative
-let browseBin = path.resolve(__dirname, '..', 'dist', 'browse');
-if (!fs.existsSync(browseBin)) {
-browseBin = process.execPath; // the compiled binary itself
-}
-// Kill any existing sidebar-agent processes before starting a new one.
-// Old agents have stale auth tokens and will silently fail to relay events,
-// causing the server to mark the agent as "hung".
-try {
-const { spawnSync } = require('child_process');
-spawnSync('pkill', ['-f', 'sidebar-agent\\.ts'], { stdio: 'ignore', timeout: 3000 });
-} catch (err: any) {
-if (err?.code !== 'ENOENT') throw err;
-}
-const agentProc = Bun.spawn(['bun', 'run', agentScript], {
-cwd: config.projectDir,
-env: {
-...process.env,
-BROWSE_BIN: browseBin,
-BROWSE_STATE_FILE: config.stateFile,
-BROWSE_SERVER_PORT: String(newState.port),
-},
-stdio: ['ignore', 'ignore', 'ignore'],
-});
-agentProc.unref();
-console.log(`[browse] Sidebar agent started (PID: ${agentProc.pid})`);
-} catch (err: any) {
-console.error(`[browse] Sidebar agent failed to start: ${err.message}`);
-console.error(`[browse] Run manually: bun run ${agentScript}`);
-}
-// Auto-start terminal agent (non-compiled, parallel to sidebar-agent).
-// Owns the PTY WebSocket for the Terminal sidebar tab. Crash-isolated
-// from the chat agent per codex outside-voice review.
+// Auto-start terminal agent (non-compiled bun process). Owns the PTY
+// WebSocket for the sidebar Terminal pane.
 let termAgentScript = path.resolve(__dirname, 'terminal-agent.ts');
 if (!fs.existsSync(termAgentScript)) {
 termAgentScript = path.resolve(path.dirname(process.execPath), '..', 'src', 'terminal-agent.ts');

File diff suppressed because it is too large


@@ -1,947 +0,0 @@
/**
* Sidebar Agent — polls agent-queue from server, spawns claude -p for each
* message, streams live events back to the server via /sidebar-agent/event.
*
* This runs as a NON-COMPILED bun process because compiled bun binaries
* cannot posix_spawn external executables. The server writes to the queue
* file, this process reads it and spawns claude.
*
* Usage: BROWSE_BIN=/path/to/browse bun run browse/src/sidebar-agent.ts
*/
import { spawn } from 'child_process';
import * as fs from 'fs';
import * as path from 'path';
import { safeUnlink } from './error-handling';
import {
checkCanaryInStructure, logAttempt, hashPayload, extractDomain,
combineVerdict, writeSessionState, readSessionState, THRESHOLDS,
readDecision, clearDecision, excerptForReview,
type LayerSignal,
} from './security';
import {
loadTestsavant, scanPageContent, checkTranscript,
shouldRunTranscriptCheck, getClassifierStatus,
loadDeberta, scanPageContentDeberta,
type ToolCallInput,
} from './security-classifier';
const QUEUE = process.env.SIDEBAR_QUEUE_PATH || path.join(process.env.HOME || '/tmp', '.gstack', 'sidebar-agent-queue.jsonl');
const KILL_FILE = path.join(path.dirname(QUEUE), 'sidebar-agent-kill');
const SERVER_PORT = parseInt(process.env.BROWSE_SERVER_PORT || '34567', 10);
const SERVER_URL = `http://127.0.0.1:${SERVER_PORT}`;
const POLL_MS = 200; // 200ms poll — keeps time-to-first-token low
const B = process.env.BROWSE_BIN || path.resolve(__dirname, '../../.claude/skills/gstack/browse/dist/browse');
const CANCEL_DIR = path.join(process.env.HOME || '/tmp', '.gstack');
function cancelFileForTab(tabId: number): string {
return path.join(CANCEL_DIR, `sidebar-agent-cancel-${tabId}`);
}
interface QueueEntry {
prompt: string;
args?: string[];
stateFile?: string;
cwd?: string;
tabId?: number | null;
message?: string | null;
pageUrl?: string | null;
sessionId?: string | null;
ts?: string;
canary?: string; // session-scoped token; leak = prompt injection evidence
}
function isValidQueueEntry(e: unknown): e is QueueEntry {
if (typeof e !== 'object' || e === null) return false;
const obj = e as Record<string, unknown>;
if (typeof obj.prompt !== 'string' || obj.prompt.length === 0) return false;
if (obj.args !== undefined && (!Array.isArray(obj.args) || !obj.args.every(a => typeof a === 'string'))) return false;
if (obj.stateFile !== undefined) {
if (typeof obj.stateFile !== 'string') return false;
if (obj.stateFile.includes('..')) return false;
}
if (obj.cwd !== undefined) {
if (typeof obj.cwd !== 'string') return false;
if (obj.cwd.includes('..')) return false;
}
if (obj.tabId !== undefined && obj.tabId !== null && typeof obj.tabId !== 'number') return false;
if (obj.message !== undefined && obj.message !== null && typeof obj.message !== 'string') return false;
if (obj.pageUrl !== undefined && obj.pageUrl !== null && typeof obj.pageUrl !== 'string') return false;
if (obj.sessionId !== undefined && obj.sessionId !== null && typeof obj.sessionId !== 'string') return false;
if (obj.canary !== undefined && typeof obj.canary !== 'string') return false;
return true;
}
let lastLine = 0;
let authToken: string | null = null;
// Per-tab processing — each tab can run its own agent concurrently
const processingTabs = new Set<number>();
// Active claude subprocesses — keyed by tabId for targeted kill
const activeProcs = new Map<number, ReturnType<typeof spawn>>();
let activeProc: ReturnType<typeof spawn> | null = null;
// Kill-file timestamp last seen — avoids double-kill on same write
let lastKillTs = 0;
// ─── File drop relay ──────────────────────────────────────────
function getGitRoot(): string | null {
try {
const { execSync } = require('child_process');
return execSync('git rev-parse --show-toplevel', { encoding: 'utf-8', stdio: ['pipe', 'pipe', 'pipe'] }).trim();
} catch (err: any) {
console.debug('[sidebar-agent] Not in a git repo:', err.message);
return null;
}
}
function writeToInbox(message: string, pageUrl?: string, sessionId?: string): void {
const gitRoot = getGitRoot();
if (!gitRoot) {
console.error('[sidebar-agent] Cannot write to inbox — not in a git repo');
return;
}
const inboxDir = path.join(gitRoot, '.context', 'sidebar-inbox');
fs.mkdirSync(inboxDir, { recursive: true, mode: 0o700 });
const now = new Date();
const timestamp = now.toISOString().replace(/:/g, '-');
const filename = `${timestamp}-observation.json`;
const tmpFile = path.join(inboxDir, `.${filename}.tmp`);
const finalFile = path.join(inboxDir, filename);
const inboxMessage = {
type: 'observation',
timestamp: now.toISOString(),
page: { url: pageUrl || 'unknown', title: '' },
userMessage: message,
sidebarSessionId: sessionId || 'unknown',
};
fs.writeFileSync(tmpFile, JSON.stringify(inboxMessage, null, 2), { mode: 0o600 });
fs.renameSync(tmpFile, finalFile);
console.log(`[sidebar-agent] Wrote inbox message: ${filename}`);
}
// ─── Auth ────────────────────────────────────────────────────────
async function refreshToken(): Promise<string | null> {
// Read token from state file (same-user, mode 0o600) instead of /health
try {
const stateFile = process.env.BROWSE_STATE_FILE ||
path.join(process.env.HOME || '/tmp', '.gstack', 'browse.json');
const data = JSON.parse(fs.readFileSync(stateFile, 'utf-8'));
authToken = data.token || null;
return authToken;
} catch (err: any) {
console.error('[sidebar-agent] Failed to refresh auth token:', err.message);
return null;
}
}
// ─── Event relay to server ──────────────────────────────────────
async function sendEvent(event: Record<string, any>, tabId?: number): Promise<void> {
if (!authToken) await refreshToken();
if (!authToken) return;
try {
await fetch(`${SERVER_URL}/sidebar-agent/event`, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': `Bearer ${authToken}`,
},
body: JSON.stringify({ ...event, tabId: tabId ?? null }),
});
} catch (err) {
console.error('[sidebar-agent] Failed to send event:', err);
}
}
// ─── Claude subprocess ──────────────────────────────────────────
function shorten(str: string): string {
return str
.replace(new RegExp(B.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'), 'g'), '$B')
.replace(/\/Users\/[^/]+/g, '~')
.replace(/\/conductor\/workspaces\/[^/]+\/[^/]+/g, '')
.replace(/\.claude\/skills\/gstack\//g, '')
.replace(/browse\/dist\/browse/g, '$B');
}
function describeToolCall(tool: string, input: any): string {
if (!input) return '';
// For Bash commands, generate a plain-English description
if (tool === 'Bash' && input.command) {
const cmd = input.command;
// Browse binary commands — the most common case
const browseMatch = cmd.match(/\$B\s+(\w+)|browse[^\s]*\s+(\w+)/);
if (browseMatch) {
const browseCmd = browseMatch[1] || browseMatch[2];
const args = cmd.split(/\s+/).slice(2).join(' ');
switch (browseCmd) {
case 'goto': return `Opening ${args.replace(/['"]/g, '')}`;
case 'snapshot': return args.includes('-i') ? 'Scanning for interactive elements' : args.includes('-D') ? 'Checking what changed' : 'Taking a snapshot of the page';
case 'screenshot': return `Saving screenshot${args ? ` to ${shorten(args)}` : ''}`;
case 'click': return `Clicking ${args}`;
case 'fill': { const parts = args.split(/\s+/); return `Typing "${parts.slice(1).join(' ')}" into ${parts[0]}`; }
case 'text': return 'Reading page text';
case 'html': return args ? `Reading HTML of ${args}` : 'Reading full page HTML';
case 'links': return 'Finding all links on the page';
case 'forms': return 'Looking for forms';
case 'console': return 'Checking browser console for errors';
case 'network': return 'Checking network requests';
case 'url': return 'Checking current URL';
case 'back': return 'Going back';
case 'forward': return 'Going forward';
case 'reload': return 'Reloading the page';
case 'scroll': return args ? `Scrolling to ${args}` : 'Scrolling down';
case 'wait': return `Waiting for ${args}`;
case 'inspect': return args ? `Inspecting CSS of ${args}` : 'Getting CSS for last picked element';
case 'style': return `Changing CSS: ${args}`;
case 'cleanup': return 'Removing page clutter (ads, popups, banners)';
case 'prettyscreenshot': return 'Taking a clean screenshot';
case 'css': return `Checking CSS property: ${args}`;
case 'is': return `Checking if element is ${args}`;
case 'diff': return `Comparing ${args}`;
case 'responsive': return 'Taking screenshots at mobile, tablet, and desktop sizes';
case 'status': return 'Checking browser status';
case 'tabs': return 'Listing open tabs';
case 'focus': return 'Bringing browser to front';
case 'select': return `Selecting option in ${args}`;
case 'hover': return `Hovering over ${args}`;
case 'viewport': return `Setting viewport to ${args}`;
case 'upload': return `Uploading file to ${args.split(/\s+/)[0]}`;
default: return `Running browse ${browseCmd} ${args}`.trim();
}
}
// Non-browse bash commands
if (cmd.includes('git ')) return `Running: ${shorten(cmd)}`;
let short = shorten(cmd);
return short.length > 100 ? short.slice(0, 100) + '…' : short;
}
if (tool === 'Read' && input.file_path) {
// Skip Claude's internal tool-result file reads — they're plumbing, not user-facing
if (input.file_path.includes('/tool-results/') || input.file_path.includes('/.claude/projects/')) return '';
return `Reading ${shorten(input.file_path)}`;
}
if (tool === 'Edit' && input.file_path) return `Editing ${shorten(input.file_path)}`;
if (tool === 'Write' && input.file_path) return `Writing ${shorten(input.file_path)}`;
if (tool === 'Grep' && input.pattern) return `Searching for "${input.pattern}"`;
if (tool === 'Glob' && input.pattern) return `Finding files matching ${input.pattern}`;
try { return shorten(JSON.stringify(input)).slice(0, 80); } catch { return ''; }
}
// Keep the old name as an alias for backward compat
function summarizeToolInput(tool: string, input: any): string {
return describeToolCall(tool, input);
}
/**
* Scan a Claude stream event for the session canary. Returns the channel where
* it leaked, or null if clean. Covers every outbound channel: text blocks,
* text deltas, tool_use arguments (including nested URL/path/command strings),
* and result payloads.
*/
function detectCanaryLeak(event: any, canary: string, buf?: DeltaBuffer): string | null {
if (!canary) return null;
if (event.type === 'assistant' && event.message?.content) {
for (const block of event.message.content) {
if (block.type === 'text' && typeof block.text === 'string' && block.text.includes(canary)) {
return 'assistant_text';
}
if (block.type === 'tool_use' && checkCanaryInStructure(block.input, canary)) {
return `tool_use:${block.name}`;
}
}
}
if (event.type === 'content_block_start' && event.content_block?.type === 'tool_use') {
if (checkCanaryInStructure(event.content_block.input, canary)) {
return `tool_use:${event.content_block.name}`;
}
}
if (event.type === 'content_block_delta' && event.delta?.type === 'text_delta') {
if (typeof event.delta.text === 'string') {
// Rolling buffer: an attacker can ask Claude to emit the canary split
// across two deltas (e.g., "CANARY-" then "ABCDEF"). A per-delta
// substring check misses this. Concatenate the previous tail with
// this chunk and search, then trim the tail to last canary.length-1
// chars for the next event.
const combined = buf ? buf.text_delta + event.delta.text : event.delta.text;
if (combined.includes(canary)) return 'text_delta';
if (buf) buf.text_delta = combined.slice(-(canary.length - 1));
}
}
if (event.type === 'content_block_delta' && event.delta?.type === 'input_json_delta') {
if (typeof event.delta.partial_json === 'string') {
const combined = buf ? buf.input_json_delta + event.delta.partial_json : event.delta.partial_json;
if (combined.includes(canary)) return 'tool_input_delta';
if (buf) buf.input_json_delta = combined.slice(-(canary.length - 1));
}
}
if (event.type === 'content_block_stop' && buf) {
// Block boundary — reset the rolling buffer so a canary straddling
// two independent tool_use blocks isn't inferred.
buf.text_delta = '';
buf.input_json_delta = '';
}
if (event.type === 'result' && typeof event.result === 'string' && event.result.includes(canary)) {
return 'result';
}
return null;
}
/** Rolling-window tails for delta canary detection. See detectCanaryLeak. */
interface DeltaBuffer {
text_delta: string;
input_json_delta: string;
}
interface CanaryContext {
canary: string;
pageUrl: string;
onLeak: (channel: string) => void;
deltaBuf: DeltaBuffer;
}
interface ToolResultScanContext {
scan: (toolName: string, text: string) => Promise<void>;
}
/**
* Per-tab map of tool_use_id → tool name. Lets the tool_result handler
* know what tool produced the content (Read, Grep, Glob, Bash $B ...) so
* we can tag attack logs with the ingress source.
*/
const toolUseRegistry = new Map<string, { toolName: string; toolInput: unknown }>();
/**
* Extract plain-text content from a tool_result block. The Claude stream
* encodes it as either a string or an array of content blocks (text, image).
* We care about text — images can't carry prompt injection at this layer.
*/
function extractToolResultText(content: unknown): string {
if (typeof content === 'string') return content;
if (!Array.isArray(content)) return '';
const parts: string[] = [];
for (const block of content) {
if (block && typeof block === 'object') {
const b = block as Record<string, unknown>;
if (b.type === 'text' && typeof b.text === 'string') parts.push(b.text);
}
}
return parts.join('\n');
}
/**
* Tools whose outputs should be ML-scanned. Bash/$B outputs already get
* scanned via the page-content flow. Read/Glob/Grep outputs have been
* uncovered — Codex review flagged this gap. Adding coverage here closes it.
*/
const SCANNED_TOOLS = new Set(['Read', 'Grep', 'Glob', 'Bash', 'WebFetch']);
async function handleStreamEvent(event: any, tabId?: number, canaryCtx?: CanaryContext, toolResultScanCtx?: ToolResultScanContext): Promise<void> {
// Canary check runs BEFORE any outbound send — we never want to relay
// a leaked token to the sidepanel UI.
if (canaryCtx) {
const channel = detectCanaryLeak(event, canaryCtx.canary, canaryCtx.deltaBuf);
if (channel) {
canaryCtx.onLeak(channel);
return; // drop the event — never relay content that leaked the canary
}
}
if (event.type === 'system' && event.session_id) {
// Relay claude session ID for --resume support
await sendEvent({ type: 'system', claudeSessionId: event.session_id }, tabId);
}
if (event.type === 'assistant' && event.message?.content) {
for (const block of event.message.content) {
if (block.type === 'tool_use') {
// Register the tool_use so we can correlate tool_results back to
// the originating tool when they arrive in the next user-role message.
if (block.id) toolUseRegistry.set(block.id, { toolName: block.name, toolInput: block.input });
await sendEvent({ type: 'tool_use', tool: block.name, input: summarizeToolInput(block.name, block.input) }, tabId);
} else if (block.type === 'text' && block.text) {
await sendEvent({ type: 'text', text: block.text }, tabId);
}
}
}
// Tool results come back in user-role messages. Content can be a string
// or an array of typed content blocks.
if (event.type === 'user' && event.message?.content) {
for (const block of event.message.content) {
if (block && typeof block === 'object' && block.type === 'tool_result') {
const meta = block.tool_use_id ? toolUseRegistry.get(block.tool_use_id) : null;
const toolName = meta?.toolName ?? 'Unknown';
const text = extractToolResultText(block.content);
// Scan this tool output with the ML classifier if the tool is in
// the SCANNED_TOOLS set and the content is non-trivial.
if (SCANNED_TOOLS.has(toolName) && text.length >= 32 && toolResultScanCtx) {
// Fire-and-forget — never block the stream handler. If BLOCK
// fires, onToolResultBlock handles kill + emit.
toolResultScanCtx.scan(toolName, text).catch(() => {});
}
if (block.tool_use_id) toolUseRegistry.delete(block.tool_use_id);
}
}
}
if (event.type === 'content_block_start' && event.content_block?.type === 'tool_use') {
if (event.content_block.id) {
toolUseRegistry.set(event.content_block.id, {
toolName: event.content_block.name,
toolInput: event.content_block.input,
});
}
await sendEvent({ type: 'tool_use', tool: event.content_block.name, input: summarizeToolInput(event.content_block.name, event.content_block.input) }, tabId);
}
if (event.type === 'content_block_delta' && event.delta?.type === 'text_delta' && event.delta.text) {
await sendEvent({ type: 'text_delta', text: event.delta.text }, tabId);
}
// Relay tool results so the sidebar can show what happened
if (event.type === 'content_block_delta' && event.delta?.type === 'input_json_delta') {
// Tool input streaming — skip, we already announced the tool
}
if (event.type === 'result') {
await sendEvent({ type: 'result', text: event.result || '' }, tabId);
}
// Tool result events — summarize and relay
if (event.type === 'tool_result' || (event.type === 'assistant' && event.message?.content)) {
// Tool results come in the next assistant turn — handled above
}
}
/**
* Fire the prompt-injection-detected event to the server. This terminates
* the session from the sidepanel's perspective and renders the canary leak
* banner. Also logs locally (salted hash + domain only) and fires telemetry
* if configured.
*/
async function onCanaryLeaked(params: {
tabId: number;
channel: string;
canary: string;
pageUrl: string;
}): Promise<void> {
const { tabId, channel, canary, pageUrl } = params;
const domain = extractDomain(pageUrl);
console.warn(`[sidebar-agent] CANARY LEAK detected on ${channel} for tab ${tabId} (domain=${domain || 'unknown'})`);
// Local log — salted hash + domain only, never the payload
logAttempt({
ts: new Date().toISOString(),
urlDomain: domain,
payloadHash: hashPayload(canary), // hash the canary, not the payload (which might be leaked content)
confidence: 1.0,
layer: 'canary',
verdict: 'block',
});
// Broadcast to sidepanel so it can render the approved banner
await sendEvent({
type: 'security_event',
verdict: 'block',
reason: 'canary_leaked',
layer: 'canary',
channel,
domain,
}, tabId);
// Also emit agent_error so the sidepanel's existing error surface
// reflects that the session terminated. Keeps old clients working.
await sendEvent({
type: 'agent_error',
error: `Session terminated — prompt injection detected${domain ? ` from ${domain}` : ''}`,
}, tabId);
}
/**
* Pre-spawn ML scan of the user message. If the classifier fires at BLOCK,
* we log the attempt, emit a security_event to the sidepanel, and DO NOT
* spawn claude. Returns true if the scan blocked the session.
*
* Fail-open: any classifier error or degraded state returns false (safe) so
* the sidebar keeps working. The architectural controls (XML framing +
* command allowlist, live in server.ts:554-577) still defend.
*/
async function preSpawnSecurityCheck(entry: QueueEntry): Promise<boolean> {
const { message, canary, pageUrl, tabId } = entry;
if (!message || message.length === 0) return false;
const tid = tabId ?? 0;
// L4: scan the user message for direct injection patterns (TestSavantAI)
// L4c: also scan with DeBERTa-v3 when ensemble is enabled (opt-in)
const [contentSignal, debertaSignal] = await Promise.all([
scanPageContent(message),
scanPageContentDeberta(message),
]);
const signals: LayerSignal[] = [contentSignal, debertaSignal];
// L4b: only bother with Haiku if another layer already lit up at >= LOG_ONLY.
// Saves ~70% of Haiku calls per plan §E1 "gating optimization".
if (shouldRunTranscriptCheck(signals)) {
const transcriptSignal = await checkTranscript({
user_message: message,
tool_calls: [], // no tool calls yet at session start
});
signals.push(transcriptSignal);
}
const result = combineVerdict(signals);
if (result.verdict !== 'block') return false;
// BLOCK verdict. Log + emit + refuse to spawn.
const domain = extractDomain(pageUrl ?? '');
const leaderSignal = signals.reduce((a, b) => (a.confidence > b.confidence ? a : b));
logAttempt({
ts: new Date().toISOString(),
urlDomain: domain,
payloadHash: hashPayload(message),
confidence: result.confidence,
layer: leaderSignal.layer,
verdict: 'block',
});
console.warn(`[sidebar-agent] Pre-spawn BLOCK (${result.reason}) for tab ${tid}, confidence=${result.confidence.toFixed(3)}`);
await sendEvent({
type: 'security_event',
verdict: 'block',
reason: result.reason ?? 'ml_classifier',
layer: leaderSignal.layer,
confidence: result.confidence,
domain,
}, tid);
await sendEvent({
type: 'agent_error',
error: `Session blocked — prompt injection detected${domain ? ` from ${domain}` : ' in your message'}`,
}, tid);
return true;
}
async function askClaude(queueEntry: QueueEntry): Promise<void> {
const { prompt, args, stateFile, cwd, tabId, canary, pageUrl } = queueEntry;
const tid = tabId ?? 0;
processingTabs.add(tid);
await sendEvent({ type: 'agent_start' }, tid);
// Pre-spawn ML scan: if the user message trips the ensemble, refuse to
// spawn claude. Fail-open on classifier errors.
if (await preSpawnSecurityCheck(queueEntry)) {
processingTabs.delete(tid);
return;
}
return new Promise((resolve) => {
// Canary context is set after proc is spawned (needs proc reference for kill).
let canaryCtx: CanaryContext | undefined;
let canaryTriggered = false;
// Use args from queue entry (server sets --model, --allowedTools, prompt framing).
// Fall back to defaults only if queue entry has no args (backward compat).
// Write doesn't expand attack surface beyond what Bash already provides.
// The security boundary is the localhost-only message path, not the tool allowlist.
let claudeArgs = args || ['-p', prompt, '--output-format', 'stream-json', '--verbose',
'--allowedTools', 'Bash,Read,Glob,Grep,Write'];
// Validate cwd exists — queue may reference a stale worktree
let effectiveCwd = cwd || process.cwd();
try { fs.accessSync(effectiveCwd); } catch (err: any) {
console.warn('[sidebar-agent] Worktree path inaccessible, falling back to cwd:', effectiveCwd, err.message);
effectiveCwd = process.cwd();
}
// Clear any stale cancel signal for this tab before starting
const cancelFile = cancelFileForTab(tid);
safeUnlink(cancelFile);
const proc = spawn('claude', claudeArgs, {
stdio: ['pipe', 'pipe', 'pipe'],
cwd: effectiveCwd,
env: {
...process.env,
BROWSE_STATE_FILE: stateFile || '',
// Connect to the existing headed browse server, never start a new one.
// BROWSE_PORT tells the CLI which port to check.
// BROWSE_NO_AUTOSTART prevents spawning an invisible headless browser
// if the headed server is down — fail fast with a clear error instead.
BROWSE_PORT: process.env.BROWSE_PORT || '34567',
BROWSE_NO_AUTOSTART: '1',
// Pin this agent to its tab — prevents cross-tab interference
// when multiple agents run simultaneously
BROWSE_TAB: String(tid),
},
});
// Track active procs so kill-file polling can terminate them
activeProcs.set(tid, proc);
activeProc = proc;
proc.stdin.end();
// Now that proc exists, set up the canary-leak handler. It fires at most
// once; on fire we kill the subprocess, emit security_event + agent_error,
// and let the normal close handler resolve the promise.
if (canary) {
canaryCtx = {
canary,
pageUrl: pageUrl ?? '',
deltaBuf: { text_delta: '', input_json_delta: '' },
onLeak: (channel: string) => {
if (canaryTriggered) return;
canaryTriggered = true;
onCanaryLeaked({ tabId: tid, channel, canary, pageUrl: pageUrl ?? '' });
try { proc.kill('SIGTERM'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; }
setTimeout(() => {
try { proc.kill('SIGKILL'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; }
}, 2000);
},
};
}
// Tool-result ML scan context. Addresses the Codex review gap: Read,
// Grep, Glob, and WebFetch outputs enter Claude's context without
// passing through the Bash $B pipeline that content-security.ts
// already wraps. Scan them here.
let toolResultBlockFired = false;
const toolResultScanCtx: ToolResultScanContext = {
scan: async (toolName: string, text: string) => {
if (toolResultBlockFired) return;
// Parallel L4 + L4c ensemble scan (DeBERTa no-op when disabled).
// We run L4/L4c AND Haiku in parallel on tool outputs regardless of
// L4's score, because BrowseSafe-Bench shows L4 (TestSavantAI) has
// low recall on browser-agent-specific attacks (~15% at v1). Gating
// Haiku on L4 meant our best signal almost never ran. The cost is
// ~$0.002 + ~300ms per tool output, bounded by the Haiku timeout
// and offset by Haiku actually seeing the real attack context.
//
// Haiku only runs when the Claude CLI is available (checkHaikuAvailable
// caches the probe). In environments without it, the call returns a
// degraded signal and the verdict falls back to L4 alone.
const [contentSignal, debertaSignal, transcriptSignal] = await Promise.all([
scanPageContent(text),
scanPageContentDeberta(text),
checkTranscript({
user_message: queueEntry.message ?? '',
tool_calls: [{ tool_name: toolName, tool_input: {} }],
tool_output: text,
}),
]);
const signals: LayerSignal[] = [contentSignal, debertaSignal, transcriptSignal];
const result = combineVerdict(signals, { toolOutput: true });
if (result.verdict !== 'block') return;
toolResultBlockFired = true;
const domain = extractDomain(pageUrl ?? '');
const payloadHash = hashPayload(text.slice(0, 4096));
// Log pending — if the user overrides, we'll update via a separate
// log line. The attempts.jsonl is append-only so both entries survive.
logAttempt({
ts: new Date().toISOString(),
urlDomain: domain,
payloadHash,
confidence: result.confidence,
layer: 'testsavant_content',
verdict: 'block',
});
console.warn(`[sidebar-agent] Tool-result BLOCK on ${toolName} for tab ${tid} (confidence=${result.confidence.toFixed(3)}) — awaiting user decision`);
// Surface a REVIEWABLE block event. Sidepanel renders the suspected
// text + layer scores + [Allow and continue] / [Block session] buttons.
// The user has 60s to decide; default is BLOCK (safe fallback).
const layerScores = signals
.filter((s) => s.confidence > 0)
.map((s) => ({ layer: s.layer, confidence: s.confidence }));
await sendEvent({
type: 'security_event',
verdict: 'block',
reason: 'tool_result_ml',
layer: 'testsavant_content',
confidence: result.confidence,
domain,
tool: toolName,
reviewable: true,
suspected_text: excerptForReview(text),
signals: layerScores,
}, tid);
// Poll for the user's decision. Default to BLOCK on timeout.
const REVIEW_TIMEOUT_MS = 60_000;
const POLL_MS = 500;
clearDecision(tid); // clear any stale decision from a prior session
const deadline = Date.now() + REVIEW_TIMEOUT_MS;
let decision: 'allow' | 'block' = 'block';
let decisionReason = 'timeout';
while (Date.now() < deadline) {
const rec = readDecision(tid);
if (rec?.decision === 'allow' || rec?.decision === 'block') {
decision = rec.decision;
decisionReason = rec.reason ?? 'user';
break;
}
await new Promise((r) => setTimeout(r, POLL_MS));
}
clearDecision(tid);
if (decision === 'allow') {
// User overrode. Log the override so the audit trail captures it.
// The flag is reset once the override is recorded, so later tool
// results in this session are scanned (and can prompt) again.
logAttempt({
ts: new Date().toISOString(),
urlDomain: domain,
payloadHash,
confidence: result.confidence,
layer: 'testsavant_content',
verdict: 'user_overrode',
});
await sendEvent({
type: 'security_event',
verdict: 'user_overrode',
reason: 'tool_result_ml',
layer: 'testsavant_content',
confidence: result.confidence,
domain,
tool: toolName,
}, tid);
console.warn(`[sidebar-agent] Tab ${tid}: user overrode BLOCK — session continues`);
// Let the block stay consumed; reset the flag so subsequent tool
// results get scanned fresh.
toolResultBlockFired = false;
return;
}
// User chose BLOCK (or timed out). Kill the session as before.
await sendEvent({
type: 'agent_error',
error: `Session terminated — prompt injection detected in ${toolName} output${decisionReason === 'timeout' ? ' (review timeout)' : ''}`,
}, tid);
try { proc.kill('SIGTERM'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; }
setTimeout(() => {
try { proc.kill('SIGKILL'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; }
}, 2000);
},
};
// Poll for per-tab cancel signal from server's killAgent()
const cancelCheck = setInterval(() => {
try {
if (fs.existsSync(cancelFile)) {
console.log(`[sidebar-agent] Cancel signal received for tab ${tid} — killing claude subprocess`);
try { proc.kill('SIGTERM'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; }
setTimeout(() => { try { proc.kill('SIGKILL'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; } }, 3000);
fs.unlinkSync(cancelFile);
clearInterval(cancelCheck);
}
} catch (err: any) { if (err?.code !== 'ENOENT') throw err; }
}, 500);
let buffer = '';
proc.stdout.on('data', (data: Buffer) => {
buffer += data.toString();
const lines = buffer.split('\n');
buffer = lines.pop() || '';
for (const line of lines) {
if (!line.trim()) continue;
try { handleStreamEvent(JSON.parse(line), tid, canaryCtx, toolResultScanCtx); } catch (err: any) {
console.error(`[sidebar-agent] Tab ${tid}: Failed to parse stream line:`, line.slice(0, 100), err.message);
}
}
});
let stderrBuffer = '';
proc.stderr.on('data', (data: Buffer) => {
stderrBuffer += data.toString();
});
proc.on('close', (code) => {
clearInterval(cancelCheck);
activeProc = null;
activeProcs.delete(tid);
if (buffer.trim()) {
try { handleStreamEvent(JSON.parse(buffer), tid, canaryCtx, toolResultScanCtx); } catch (err: any) {
console.error(`[sidebar-agent] Tab ${tid}: Failed to parse final buffer:`, buffer.slice(0, 100), err.message);
}
}
const doneEvent: Record<string, any> = { type: 'agent_done' };
if (code !== 0 && stderrBuffer.trim()) {
doneEvent.stderr = stderrBuffer.trim().slice(-500);
}
sendEvent(doneEvent, tid).then(() => {
processingTabs.delete(tid);
resolve();
});
});
proc.on('error', (err) => {
clearInterval(cancelCheck);
activeProc = null;
activeProcs.delete(tid); // mirror the close handler so a failed spawn is not left tracked
const errorMsg = stderrBuffer.trim()
? `${err.message}\nstderr: ${stderrBuffer.trim().slice(-500)}`
: err.message;
sendEvent({ type: 'agent_error', error: errorMsg }, tid).then(() => {
processingTabs.delete(tid);
resolve();
});
});
// Timeout (default 300s / 5 min — multi-page tasks need time)
const timeoutMs = parseInt(process.env.SIDEBAR_AGENT_TIMEOUT || '300000', 10);
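setTimeout(() => {
// This timer is never cleared, so bail out if this proc already exited
// (or was replaced); otherwise a finished session would report a bogus
// timeout and clear processingTabs for a newer agent on the same tab.
if (activeProcs.get(tid) !== proc) return;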
try { proc.kill('SIGTERM'); } catch (killErr: any) {
console.warn(`[sidebar-agent] Tab ${tid}: Failed to kill timed-out process:`, killErr.message);
}
setTimeout(() => { try { proc.kill('SIGKILL'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; } }, 3000);
const timeoutMsg = stderrBuffer.trim()
? `Timed out after ${timeoutMs / 1000}s\nstderr: ${stderrBuffer.trim().slice(-500)}`
: `Timed out after ${timeoutMs / 1000}s`;
sendEvent({ type: 'agent_error', error: timeoutMsg }, tid).then(() => {
processingTabs.delete(tid);
resolve();
});
}, timeoutMs);
});
}
// ─── Poll loop ───────────────────────────────────────────────────
function countLines(): number {
try {
return fs.readFileSync(QUEUE, 'utf-8').split('\n').filter(Boolean).length;
} catch (err: any) {
console.error('[sidebar-agent] Failed to read queue file:', err.message);
return 0;
}
}
function readLine(n: number): string | null {
try {
const lines = fs.readFileSync(QUEUE, 'utf-8').split('\n').filter(Boolean);
return lines[n - 1] || null;
} catch (err: any) {
console.error(`[sidebar-agent] Failed to read queue line ${n}:`, err.message);
return null;
}
}
async function poll() {
const current = countLines();
if (current <= lastLine) return;
while (lastLine < current) {
lastLine++;
const line = readLine(lastLine);
if (!line) continue;
let parsed: unknown;
try { parsed = JSON.parse(line); } catch (err: any) {
console.warn(`[sidebar-agent] Skipping malformed queue entry at line ${lastLine}:`, line.slice(0, 80), err.message);
continue;
}
if (!isValidQueueEntry(parsed)) {
console.warn(`[sidebar-agent] Skipping invalid queue entry at line ${lastLine}: failed schema validation`);
continue;
}
const entry = parsed;
const tid = entry.tabId ?? 0;
// Skip if this tab already has an agent running — server queues per-tab
if (processingTabs.has(tid)) continue;
console.log(`[sidebar-agent] Processing tab ${tid}: "${entry.message}"`);
// Write to inbox so workspace agent can pick it up
writeToInbox(entry.message || entry.prompt, entry.pageUrl, entry.sessionId);
// Fire and forget — each tab's agent runs concurrently
askClaude(entry).catch((err) => {
console.error(`[sidebar-agent] Error on tab ${tid}:`, err);
sendEvent({ type: 'agent_error', error: String(err) }, tid);
});
}
}
// ─── Main ────────────────────────────────────────────────────────
function pollKillFile(): void {
try {
const stat = fs.statSync(KILL_FILE);
const mtime = stat.mtimeMs;
if (mtime > lastKillTs) {
lastKillTs = mtime;
if (activeProcs.size > 0) {
console.log(`[sidebar-agent] Kill signal received — terminating ${activeProcs.size} active agent(s)`);
for (const [tid, proc] of activeProcs) {
try { proc.kill('SIGTERM'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; }
setTimeout(() => { try { proc.kill('SIGKILL'); } catch (err: any) { if (err?.code !== 'ESRCH') throw err; } }, 2000);
processingTabs.delete(tid);
}
activeProcs.clear();
}
}
} catch {
// Kill file doesn't exist yet — normal state
}
}
async function main() {
const dir = path.dirname(QUEUE);
fs.mkdirSync(dir, { recursive: true, mode: 0o700 });
if (!fs.existsSync(QUEUE)) fs.writeFileSync(QUEUE, '', { mode: 0o600 });
try { fs.chmodSync(QUEUE, 0o600); } catch (err: any) { if (err?.code !== 'ENOENT') throw err; }
lastLine = countLines();
await refreshToken();
console.log(`[sidebar-agent] Started. Watching ${QUEUE} from line ${lastLine}`);
console.log(`[sidebar-agent] Server: ${SERVER_URL}`);
console.log(`[sidebar-agent] Browse binary: ${B}`);
// If GSTACK_SECURITY_ENSEMBLE=deberta is set, also warm the DeBERTa-v3
// ensemble classifier. Fire-and-forget alongside TestSavantAI — they
// warm in parallel. No-op when the env var is unset.
loadDeberta((msg) => console.log(`[security-classifier] ${msg}`))
.catch((err) => console.warn('[sidebar-agent] DeBERTa warmup failed:', err?.message));
// Warm up the ML classifier in the background. First call triggers a 112MB
// download (~30s on average broadband). Non-blocking — the sidebar stays
// functional on cold start; classifier just reports 'off' until warmed.
//
// On warmup completion (success or failure), write the classifier status to
// ~/.gstack/security/session-state.json so server.ts's /health endpoint can
// report it to the sidepanel for shield icon rendering.
loadTestsavant((msg) => console.log(`[security-classifier] ${msg}`))
.then(() => {
const s = getClassifierStatus();
console.log(`[sidebar-agent] Classifier warmup complete: ${JSON.stringify(s)}`);
const existing = readSessionState();
writeSessionState({
sessionId: existing?.sessionId ?? String(process.pid),
canary: existing?.canary ?? '',
warnedDomains: existing?.warnedDomains ?? [],
classifierStatus: s,
lastUpdated: new Date().toISOString(),
});
})
.catch((err) => console.warn('[sidebar-agent] Classifier warmup failed (degraded mode):', err?.message));
setInterval(poll, POLL_MS);
setInterval(pollKillFile, POLL_MS);
}
main().catch(console.error);
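
The stdout handler in askClaude above reassembles newline-delimited JSON from chunks that can split a line anywhere. A minimal sketch of that buffering technique in isolation follows; `createLineSplitter` is a hypothetical name used for illustration and is not a function in sidebar-agent.ts.

```typescript
// Hypothetical helper (not in the PR): splits a chunked stream into complete
// lines and carries the trailing partial line over to the next chunk.
function createLineSplitter(onLine: (line: string) => void) {
  let buffer = '';
  return {
    // Feed one chunk; emits every line that is now complete.
    push(chunk: string): void {
      buffer += chunk;
      const lines = buffer.split('\n');
      buffer = lines.pop() ?? ''; // last element is the incomplete tail
      for (const line of lines) {
        if (line.trim()) onLine(line); // skip blank lines
      }
    },
    // Call once the stream closes to emit an unterminated final line.
    flush(): void {
      if (buffer.trim()) onLine(buffer);
      buffer = '';
    },
  };
}

// Demo: one JSON event split across two chunks, plus an unterminated last line.
const seen: unknown[] = [];
const splitter = createLineSplitter((line) => seen.push(JSON.parse(line)));
splitter.push('{"type":"sys');
splitter.push('tem"}\n{"type":"result"}');
splitter.flush();
console.log(seen);
```

Keeping the splitter separate from `JSON.parse` means a malformed line can be logged and skipped by the caller without losing the buffered tail, which is how the original handler treats parse failures.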


@@ -200,10 +200,18 @@ function buildServer() {
 // /ws — WebSocket upgrade. CRITICAL gates:
 // (1) Origin must be chrome-extension://<id>. Cross-site WS hijacking
-// defense per codex finding #9.
-// (2) Cookie gstack_pty must be in validTokens. The cookie was
-// minted by the parent server's /pty-session route under a
-// valid AUTH_TOKEN, so a request without it can't get a shell.
+// defense — required, not optional.
+// (2) Token must be in validTokens. We accept the token via two
+// transports for compatibility:
+// - Sec-WebSocket-Protocol (preferred for browsers — the only
+// auth header settable from the browser WebSocket API)
+// - Cookie gstack_pty (works for non-browser callers and
+// same-port browser callers; doesn't survive the cross-port
+// jump from server.ts:34567 to the agent's random port
+// when SameSite=Strict is set)
+// Either path works; both verify against the same in-memory
+// validTokens Set, populated by the parent server's
+// authenticated /pty-session → /internal/grant chain.
 if (url.pathname === '/ws') {
 const origin = req.headers.get('origin') || '';
 const isExtensionOrigin = origin.startsWith('chrome-extension://');
@@ -214,18 +222,48 @@ function buildServer() {
 return new Response('forbidden origin', { status: 403 });
 }
-const cookieHeader = req.headers.get('cookie') || '';
-let cookieToken: string | null = null;
-for (const part of cookieHeader.split(';')) {
-const [name, ...rest] = part.trim().split('=');
-if (name === 'gstack_pty') { cookieToken = rest.join('=') || null; break; }
+// Try Sec-WebSocket-Protocol first. Format: a single token, possibly
+// with a `gstack-pty.` prefix (which we strip). Browsers send a
+// comma-separated list when multiple were requested; we pick the
+// first that matches a known token.
+const protoHeader = req.headers.get('sec-websocket-protocol') || '';
+let token: string | null = null;
+let acceptedProtocol: string | null = null;
+for (const raw of protoHeader.split(',').map(s => s.trim()).filter(Boolean)) {
+const candidate = raw.startsWith('gstack-pty.') ? raw.slice('gstack-pty.'.length) : raw;
+if (validTokens.has(candidate)) {
+token = candidate;
+acceptedProtocol = raw;
+break;
+}
 }
-if (!cookieToken || !validTokens.has(cookieToken)) {
+// Fallback: Cookie gstack_pty (legacy / non-browser callers).
+if (!token) {
+const cookieHeader = req.headers.get('cookie') || '';
+for (const part of cookieHeader.split(';')) {
+const [name, ...rest] = part.trim().split('=');
+if (name === 'gstack_pty') {
+const candidate = rest.join('=') || null;
+if (candidate && validTokens.has(candidate)) {
+token = candidate;
+}
+break;
+}
+}
+}
+if (!token) {
 return new Response('unauthorized', { status: 401 });
 }
 const upgraded = server.upgrade(req, {
-data: { cookie: cookieToken },
+data: { cookie: token },
+// Echo the protocol back so the browser accepts the upgrade.
+// Required when the client sends Sec-WebSocket-Protocol — the
+// server MUST select one of the offered protocols, otherwise
+// the browser closes the connection immediately.
+...(acceptedProtocol ? { headers: { 'Sec-WebSocket-Protocol': acceptedProtocol } } : {}),
 });
 return upgraded ? undefined : new Response('upgrade failed', { status: 500 });
 }
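
The token-selection step in the /ws handler above can be pulled out as a pure function, which makes the protocol-auth path testable without a live socket. The following is a minimal sketch under that assumption: `pickPtyToken` and `PickedToken` are hypothetical names that do not appear in the PR, while the `gstack-pty.` prefix and the `validTokens` set come from the diff.

```typescript
// Hypothetical helper (not part of the PR): mirrors the Sec-WebSocket-Protocol
// parsing done inline in the /ws handler.
interface PickedToken {
  token: string;            // bare token that matched validTokens
  acceptedProtocol: string; // exact offered string to echo back on the upgrade
}

const PTY_PROTOCOL_PREFIX = 'gstack-pty.';

function pickPtyToken(protoHeader: string, validTokens: Set<string>): PickedToken | null {
  // Browsers send the offered subprotocols as one comma-separated header value.
  const offered = protoHeader.split(',').map((s) => s.trim()).filter(Boolean);
  for (const raw of offered) {
    const candidate = raw.startsWith(PTY_PROTOCOL_PREFIX)
      ? raw.slice(PTY_PROTOCOL_PREFIX.length)
      : raw;
    if (validTokens.has(candidate)) {
      // Echo `raw`, not `candidate`: the server has to select a value the
      // client actually offered, prefix included.
      return { token: candidate, acceptedProtocol: raw };
    }
  }
  return null;
}

const demoTokens = new Set(['abc123']);
// Token matched; the protocol string is returned verbatim for the echo.
console.log(pickPtyToken('gstack-pty.abc123', demoTokens));
// No offered protocol carries a valid token, so the caller falls back to the cookie.
console.log(pickPtyToken('chat, gstack-pty.nope', demoTokens)); // null
```

On the extension side, the commit message describes the matching call as `new WebSocket(url, [`gstack-pty.<token>`])`, so the value echoed by the server is the prefixed string rather than the bare token.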


@@ -1,226 +0,0 @@
/**
* Layer 3: Sidebar agent round-trip tests.
* Starts server + sidebar-agent together. Mocks the `claude` binary with a shell
* script that outputs canned stream-json. Verifies events flow end-to-end:
* POST /sidebar-command → queue → sidebar-agent → mock claude → events → /sidebar-chat
*/
import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
import { spawn, type Subprocess } from 'bun';
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';
let serverProc: Subprocess | null = null;
let agentProc: Subprocess | null = null;
let serverPort: number = 0;
let authToken: string = '';
let tmpDir: string = '';
let stateFile: string = '';
let queueFile: string = '';
let mockBinDir: string = '';
async function api(pathname: string, opts: RequestInit = {}): Promise<Response> {
const headers: Record<string, string> = {
'Content-Type': 'application/json',
...(opts.headers as Record<string, string> || {}),
};
if (!headers['Authorization'] && authToken) {
headers['Authorization'] = `Bearer ${authToken}`;
}
return fetch(`http://127.0.0.1:${serverPort}${pathname}`, { ...opts, headers });
}
async function resetState() {
await api('/sidebar-session/new', { method: 'POST' });
fs.writeFileSync(queueFile, '');
}
async function pollChatUntil(
predicate: (entries: any[]) => boolean,
timeoutMs = 10000,
): Promise<any[]> {
const deadline = Date.now() + timeoutMs;
while (Date.now() < deadline) {
const resp = await api('/sidebar-chat?after=0');
const data = await resp.json();
if (predicate(data.entries)) return data.entries;
await new Promise(r => setTimeout(r, 300));
}
// Return whatever we have on timeout
const resp = await api('/sidebar-chat?after=0');
return (await resp.json()).entries;
}
function writeMockClaude(script: string) {
const mockPath = path.join(mockBinDir, 'claude');
fs.writeFileSync(mockPath, script, { mode: 0o755 });
}
beforeAll(async () => {
tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'sidebar-roundtrip-'));
stateFile = path.join(tmpDir, 'browse.json');
queueFile = path.join(tmpDir, 'sidebar-queue.jsonl');
mockBinDir = path.join(tmpDir, 'bin');
fs.mkdirSync(mockBinDir, { recursive: true });
fs.mkdirSync(path.dirname(queueFile), { recursive: true });
// Write default mock claude that outputs canned events
writeMockClaude(`#!/bin/bash
echo '{"type":"system","session_id":"mock-session-123"}'
echo '{"type":"assistant","message":{"content":[{"type":"text","text":"I can see the page. It looks like a test fixture."}]}}'
echo '{"type":"result","result":"Done."}'
`);
// Start server (no browser)
const serverScript = path.resolve(__dirname, '..', 'src', 'server.ts');
serverProc = spawn(['bun', 'run', serverScript], {
env: {
...process.env,
BROWSE_STATE_FILE: stateFile,
BROWSE_HEADLESS_SKIP: '1',
BROWSE_PORT: '0',
SIDEBAR_QUEUE_PATH: queueFile,
BROWSE_IDLE_TIMEOUT: '300',
},
stdio: ['ignore', 'pipe', 'pipe'],
});
// Wait for server
const deadline = Date.now() + 15000;
while (Date.now() < deadline) {
if (fs.existsSync(stateFile)) {
try {
const state = JSON.parse(fs.readFileSync(stateFile, 'utf-8'));
if (state.port && state.token) {
serverPort = state.port;
authToken = state.token;
break;
}
} catch {}
}
await new Promise(r => setTimeout(r, 100));
}
if (!serverPort) throw new Error('Server did not start in time');
// Start sidebar-agent with mock claude on PATH
const agentScript = path.resolve(__dirname, '..', 'src', 'sidebar-agent.ts');
agentProc = spawn(['bun', 'run', agentScript], {
env: {
...process.env,
PATH: `${mockBinDir}:${process.env.PATH}`,
BROWSE_SERVER_PORT: String(serverPort),
BROWSE_STATE_FILE: stateFile,
SIDEBAR_QUEUE_PATH: queueFile,
SIDEBAR_AGENT_TIMEOUT: '10000',
BROWSE_BIN: 'browse', // doesn't matter, mock claude doesn't use it
},
stdio: ['ignore', 'pipe', 'pipe'],
});
// Give sidebar-agent time to start polling
await new Promise(r => setTimeout(r, 1000));
}, 20000);
afterAll(() => {
if (agentProc) { try { agentProc.kill(); } catch {} }
if (serverProc) { try { serverProc.kill(); } catch {} }
try { fs.rmSync(tmpDir, { recursive: true, force: true }); } catch {}
});
describe('sidebar-agent round-trip', () => {
test('full message round-trip with mock claude', async () => {
await resetState();
// Send a command
const resp = await api('/sidebar-command', {
method: 'POST',
body: JSON.stringify({
message: 'what is on this page?',
activeTabUrl: 'https://example.com/test',
}),
});
expect(resp.status).toBe(200);
// Wait for mock claude to process and events to arrive
const entries = await pollChatUntil(
(entries) => entries.some((e: any) => e.type === 'agent_done'),
15000,
);
// Verify the flow: user message → agent_start → text → agent_done
const userEntry = entries.find((e: any) => e.role === 'user');
expect(userEntry).toBeDefined();
expect(userEntry.message).toBe('what is on this page?');
// The mock claude outputs text — check for any agent text entry
const textEntries = entries.filter((e: any) => e.role === 'agent' && (e.type === 'text' || e.type === 'result'));
expect(textEntries.length).toBeGreaterThan(0);
const doneEntry = entries.find((e: any) => e.type === 'agent_done');
expect(doneEntry).toBeDefined();
// Agent should be back to idle
const session = await (await api('/sidebar-session')).json();
expect(session.agent.status).toBe('idle');
}, 20000);
test('claude crash produces agent_error', async () => {
await resetState();
// Replace mock claude with one that crashes
writeMockClaude(`#!/bin/bash
echo '{"type":"system","session_id":"crash-test"}' >&2
exit 1
`);
await api('/sidebar-command', {
method: 'POST',
body: JSON.stringify({ message: 'crash test' }),
});
// Wait for agent_done (sidebar-agent sends agent_done even on crash via proc.on('close'))
const entries = await pollChatUntil(
(entries) => entries.some((e: any) => e.type === 'agent_done' || e.type === 'agent_error'),
15000,
);
// Agent should recover to idle
const session = await (await api('/sidebar-session')).json();
expect(session.agent.status).toBe('idle');
// Restore working mock
writeMockClaude(`#!/bin/bash
echo '{"type":"assistant","message":{"content":[{"type":"text","text":"recovered"}]}}'
`);
}, 20000);
test('sequential queue drain', async () => {
await resetState();
// Restore working mock
writeMockClaude(`#!/bin/bash
echo '{"type":"assistant","message":{"content":[{"type":"text","text":"response to: '"'"'$*'"'"'"}]}}'
`);
// Send two messages rapidly — first processes, second queues
await api('/sidebar-command', {
method: 'POST',
body: JSON.stringify({ message: 'first message' }),
});
await api('/sidebar-command', {
method: 'POST',
body: JSON.stringify({ message: 'second message' }),
});
// Wait for both to complete (two agent_done events)
const entries = await pollChatUntil(
(entries) => entries.filter((e: any) => e.type === 'agent_done').length >= 2,
20000,
);
// Both user messages should be in chat
const userEntries = entries.filter((e: any) => e.role === 'user');
expect(userEntries.length).toBeGreaterThanOrEqual(2);
}, 25000);
});

View File

@@ -1,562 +0,0 @@
/**
* Tests for sidebar agent queue parsing and inbox writing.
*
* sidebar-agent.ts functions are not exported (it's an entry-point script),
* so we test the same logic inline: JSONL parsing, writeToInbox filesystem
* behavior, and edge cases.
*/
import { describe, test, expect, beforeEach, afterEach } from 'bun:test';
import * as fs from 'fs';
import * as path from 'path';
import * as os from 'os';
// ─── Helpers: replicate sidebar-agent logic for unit testing ──────
/** Parse a single JSONL line — same logic as sidebar-agent poll() */
function parseQueueLine(line: string): any | null {
if (!line.trim()) return null;
try {
const entry = JSON.parse(line);
if (!entry.message && !entry.prompt) return null;
return entry;
} catch {
return null;
}
}
/** Read all valid entries from a JSONL string — same as countLines + readLine loop */
function parseQueueFile(content: string): any[] {
const entries: any[] = [];
const lines = content.split('\n').filter(Boolean);
for (const line of lines) {
const entry = parseQueueLine(line);
if (entry) entries.push(entry);
}
return entries;
}
/** Write to inbox — extracted logic from sidebar-agent.ts writeToInbox() */
function writeToInbox(
gitRoot: string,
message: string,
pageUrl?: string,
sessionId?: string,
): string | null {
if (!gitRoot) return null;
const inboxDir = path.join(gitRoot, '.context', 'sidebar-inbox');
fs.mkdirSync(inboxDir, { recursive: true });
const now = new Date();
const timestamp = now.toISOString().replace(/:/g, '-');
const filename = `${timestamp}-observation.json`;
const tmpFile = path.join(inboxDir, `.${filename}.tmp`);
const finalFile = path.join(inboxDir, filename);
const inboxMessage = {
type: 'observation',
timestamp: now.toISOString(),
page: { url: pageUrl || 'unknown', title: '' },
userMessage: message,
sidebarSessionId: sessionId || 'unknown',
};
fs.writeFileSync(tmpFile, JSON.stringify(inboxMessage, null, 2));
fs.renameSync(tmpFile, finalFile);
return finalFile;
}
/** Shorten paths — same logic as sidebar-agent.ts shorten() */
function shorten(str: string): string {
return str
.replace(/\/Users\/[^/]+/g, '~')
.replace(/\/conductor\/workspaces\/[^/]+\/[^/]+/g, '')
.replace(/\.claude\/skills\/gstack\//g, '')
.replace(/browse\/dist\/browse/g, '$B');
}
/** describeToolCall — replicated from sidebar-agent.ts for unit testing */
function describeToolCall(tool: string, input: any): string {
if (!input) return '';
if (tool === 'Bash' && input.command) {
const cmd = input.command;
const browseMatch = cmd.match(/\$B\s+(\w+)|browse[^\s]*\s+(\w+)/);
if (browseMatch) {
const browseCmd = browseMatch[1] || browseMatch[2];
const args = cmd.split(/\s+/).slice(2).join(' ');
switch (browseCmd) {
case 'goto': return `Opening ${args.replace(/['"]/g, '')}`;
case 'snapshot': return args.includes('-i') ? 'Scanning for interactive elements' : args.includes('-D') ? 'Checking what changed' : 'Taking a snapshot of the page';
case 'screenshot': return `Saving screenshot${args ? ` to ${shorten(args)}` : ''}`;
case 'click': return `Clicking ${args}`;
case 'fill': { const parts = args.split(/\s+/); return `Typing "${parts.slice(1).join(' ')}" into ${parts[0]}`; }
case 'text': return 'Reading page text';
case 'html': return args ? `Reading HTML of ${args}` : 'Reading full page HTML';
case 'links': return 'Finding all links on the page';
case 'forms': return 'Looking for forms';
case 'console': return 'Checking browser console for errors';
case 'network': return 'Checking network requests';
case 'url': return 'Checking current URL';
case 'back': return 'Going back';
case 'forward': return 'Going forward';
case 'reload': return 'Reloading the page';
case 'scroll': return args ? `Scrolling to ${args}` : 'Scrolling down';
case 'wait': return `Waiting for ${args}`;
case 'inspect': return args ? `Inspecting CSS of ${args}` : 'Getting CSS for last picked element';
case 'style': return `Changing CSS: ${args}`;
case 'cleanup': return 'Removing page clutter (ads, popups, banners)';
case 'prettyscreenshot': return 'Taking a clean screenshot';
case 'css': return `Checking CSS property: ${args}`;
case 'is': return `Checking if element is ${args}`;
case 'diff': return `Comparing ${args}`;
case 'responsive': return 'Taking screenshots at mobile, tablet, and desktop sizes';
case 'status': return 'Checking browser status';
case 'tabs': return 'Listing open tabs';
case 'focus': return 'Bringing browser to front';
case 'select': return `Selecting option in ${args}`;
case 'hover': return `Hovering over ${args}`;
case 'viewport': return `Setting viewport to ${args}`;
case 'upload': return `Uploading file to ${args.split(/\s+/)[0]}`;
default: return `Running browse ${browseCmd} ${args}`.trim();
}
}
if (cmd.includes('git ')) return `Running: ${shorten(cmd)}`;
let short = shorten(cmd);
return short.length > 100 ? short.slice(0, 100) + '…' : short;
}
if (tool === 'Read' && input.file_path) return `Reading ${shorten(input.file_path)}`;
if (tool === 'Edit' && input.file_path) return `Editing ${shorten(input.file_path)}`;
if (tool === 'Write' && input.file_path) return `Writing ${shorten(input.file_path)}`;
if (tool === 'Grep' && input.pattern) return `Searching for "${input.pattern}"`;
if (tool === 'Glob' && input.pattern) return `Finding files matching ${input.pattern}`;
try { return shorten(JSON.stringify(input)).slice(0, 80); } catch { return ''; }
}
// ─── Test setup ──────────────────────────────────────────────────
let tmpDir: string;
beforeEach(() => {
tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'sidebar-agent-test-'));
});
afterEach(() => {
fs.rmSync(tmpDir, { recursive: true, force: true });
});
// ─── Queue File Parsing ─────────────────────────────────────────
describe('queue file parsing', () => {
test('valid JSONL line parsed correctly', () => {
const line = JSON.stringify({ message: 'hello', prompt: 'check this', pageUrl: 'https://example.com' });
const entry = parseQueueLine(line);
expect(entry).not.toBeNull();
expect(entry.message).toBe('hello');
expect(entry.prompt).toBe('check this');
expect(entry.pageUrl).toBe('https://example.com');
});
test('malformed JSON line skipped without crash', () => {
const entry = parseQueueLine('this is not json {{{');
expect(entry).toBeNull();
});
test('valid JSON without message or prompt is skipped', () => {
const line = JSON.stringify({ foo: 'bar' });
const entry = parseQueueLine(line);
expect(entry).toBeNull();
});
test('empty file returns no entries', () => {
const entries = parseQueueFile('');
expect(entries).toEqual([]);
});
test('file with blank lines returns no entries', () => {
const entries = parseQueueFile('\n\n\n');
expect(entries).toEqual([]);
});
test('mixed valid and invalid lines', () => {
const content = [
JSON.stringify({ message: 'first' }),
'not json',
JSON.stringify({ unrelated: true }),
JSON.stringify({ message: 'second', prompt: 'do stuff' }),
].join('\n');
const entries = parseQueueFile(content);
expect(entries.length).toBe(2);
expect(entries[0].message).toBe('first');
expect(entries[1].message).toBe('second');
});
});
// ─── writeToInbox ────────────────────────────────────────────────
describe('writeToInbox', () => {
test('creates .context/sidebar-inbox/ directory', () => {
writeToInbox(tmpDir, 'test message');
const inboxDir = path.join(tmpDir, '.context', 'sidebar-inbox');
expect(fs.existsSync(inboxDir)).toBe(true);
expect(fs.statSync(inboxDir).isDirectory()).toBe(true);
});
test('writes valid JSON file', () => {
const filePath = writeToInbox(tmpDir, 'test message', 'https://example.com', 'session-123');
expect(filePath).not.toBeNull();
expect(fs.existsSync(filePath!)).toBe(true);
const data = JSON.parse(fs.readFileSync(filePath!, 'utf-8'));
expect(data.type).toBe('observation');
expect(data.userMessage).toBe('test message');
expect(data.page.url).toBe('https://example.com');
expect(data.sidebarSessionId).toBe('session-123');
expect(data.timestamp).toBeTruthy();
});
test('atomic write — final file exists, no .tmp left', () => {
const filePath = writeToInbox(tmpDir, 'atomic test');
expect(filePath).not.toBeNull();
expect(fs.existsSync(filePath!)).toBe(true);
// Check no .tmp files remain in the inbox directory
const inboxDir = path.join(tmpDir, '.context', 'sidebar-inbox');
const files = fs.readdirSync(inboxDir);
const tmpFiles = files.filter(f => f.endsWith('.tmp'));
expect(tmpFiles.length).toBe(0);
// Final file should end with -observation.json
const jsonFiles = files.filter(f => f.endsWith('-observation.json') && !f.startsWith('.'));
expect(jsonFiles.length).toBe(1);
});
test('handles missing git root gracefully', () => {
const result = writeToInbox('', 'test');
expect(result).toBeNull();
});
test('defaults pageUrl to unknown when not provided', () => {
const filePath = writeToInbox(tmpDir, 'no url provided');
expect(filePath).not.toBeNull();
const data = JSON.parse(fs.readFileSync(filePath!, 'utf-8'));
expect(data.page.url).toBe('unknown');
});
test('defaults sessionId to unknown when not provided', () => {
const filePath = writeToInbox(tmpDir, 'no session');
expect(filePath).not.toBeNull();
const data = JSON.parse(fs.readFileSync(filePath!, 'utf-8'));
expect(data.sidebarSessionId).toBe('unknown');
});
test('multiple writes create separate files', () => {
writeToInbox(tmpDir, 'message 1');
// Tiny delay to ensure different timestamps
const t = Date.now();
while (Date.now() === t) {} // spin until next ms
writeToInbox(tmpDir, 'message 2');
const inboxDir = path.join(tmpDir, '.context', 'sidebar-inbox');
const files = fs.readdirSync(inboxDir).filter(f => f.endsWith('.json') && !f.startsWith('.'));
expect(files.length).toBe(2);
});
});
// ─── describeToolCall (verbose narration) ────────────────────────
describe('describeToolCall', () => {
// Browse navigation commands
test('goto → plain English with URL', () => {
const result = describeToolCall('Bash', { command: '$B goto https://example.com' });
expect(result).toBe('Opening https://example.com');
});
test('goto strips quotes from URL', () => {
const result = describeToolCall('Bash', { command: '$B goto "https://example.com"' });
expect(result).toBe('Opening https://example.com');
});
test('url → checking current URL', () => {
expect(describeToolCall('Bash', { command: '$B url' })).toBe('Checking current URL');
});
test('back/forward/reload → plain English', () => {
expect(describeToolCall('Bash', { command: '$B back' })).toBe('Going back');
expect(describeToolCall('Bash', { command: '$B forward' })).toBe('Going forward');
expect(describeToolCall('Bash', { command: '$B reload' })).toBe('Reloading the page');
});
// Snapshot variants
test('snapshot -i → scanning for interactive elements', () => {
expect(describeToolCall('Bash', { command: '$B snapshot -i' })).toBe('Scanning for interactive elements');
});
test('snapshot -D → checking what changed', () => {
expect(describeToolCall('Bash', { command: '$B snapshot -D' })).toBe('Checking what changed');
});
test('snapshot (plain) → taking a snapshot', () => {
expect(describeToolCall('Bash', { command: '$B snapshot' })).toBe('Taking a snapshot of the page');
});
// Interaction commands
test('click → clicking element', () => {
expect(describeToolCall('Bash', { command: '$B click @e3' })).toBe('Clicking @e3');
});
test('fill → typing into element', () => {
expect(describeToolCall('Bash', { command: '$B fill @e4 "hello world"' })).toBe('Typing ""hello world"" into @e4');
});
test('scroll with selector → scrolling to element', () => {
expect(describeToolCall('Bash', { command: '$B scroll .footer' })).toBe('Scrolling to .footer');
});
test('scroll without args → scrolling down', () => {
expect(describeToolCall('Bash', { command: '$B scroll' })).toBe('Scrolling down');
});
// Reading commands
test('text → reading page text', () => {
expect(describeToolCall('Bash', { command: '$B text' })).toBe('Reading page text');
});
test('html with selector → reading HTML of element', () => {
expect(describeToolCall('Bash', { command: '$B html .header' })).toBe('Reading HTML of .header');
});
test('html without selector → reading full page HTML', () => {
expect(describeToolCall('Bash', { command: '$B html' })).toBe('Reading full page HTML');
});
test('links → finding all links', () => {
expect(describeToolCall('Bash', { command: '$B links' })).toBe('Finding all links on the page');
});
test('console → checking console', () => {
expect(describeToolCall('Bash', { command: '$B console' })).toBe('Checking browser console for errors');
});
// Inspector commands
test('inspect with selector → inspecting CSS', () => {
expect(describeToolCall('Bash', { command: '$B inspect .header' })).toBe('Inspecting CSS of .header');
});
test('inspect without args → getting last picked element', () => {
expect(describeToolCall('Bash', { command: '$B inspect' })).toBe('Getting CSS for last picked element');
});
test('style → changing CSS', () => {
expect(describeToolCall('Bash', { command: '$B style .header color red' })).toBe('Changing CSS: .header color red');
});
test('cleanup → removing page clutter', () => {
expect(describeToolCall('Bash', { command: '$B cleanup --all' })).toBe('Removing page clutter (ads, popups, banners)');
});
// Visual commands
test('screenshot → saving screenshot', () => {
expect(describeToolCall('Bash', { command: '$B screenshot /tmp/shot.png' })).toBe('Saving screenshot to /tmp/shot.png');
});
test('screenshot without path', () => {
expect(describeToolCall('Bash', { command: '$B screenshot' })).toBe('Saving screenshot');
});
test('responsive → multi-size screenshots', () => {
expect(describeToolCall('Bash', { command: '$B responsive' })).toBe('Taking screenshots at mobile, tablet, and desktop sizes');
});
// Non-browse tools
test('Read tool → reading file', () => {
expect(describeToolCall('Read', { file_path: '/Users/foo/project/src/app.ts' })).toBe('Reading ~/project/src/app.ts');
});
test('Grep tool → searching for pattern', () => {
expect(describeToolCall('Grep', { pattern: 'handleClick' })).toBe('Searching for "handleClick"');
});
test('Glob tool → finding files', () => {
expect(describeToolCall('Glob', { pattern: '**/*.tsx' })).toBe('Finding files matching **/*.tsx');
});
test('Edit tool → editing file', () => {
expect(describeToolCall('Edit', { file_path: '/Users/foo/src/main.ts' })).toBe('Editing ~/src/main.ts');
});
// Edge cases
test('null input → empty string', () => {
expect(describeToolCall('Bash', null)).toBe('');
});
test('unknown browse command → generic description', () => {
expect(describeToolCall('Bash', { command: '$B newtab https://foo.com' })).toContain('newtab');
});
test('non-browse bash → shortened command', () => {
expect(describeToolCall('Bash', { command: 'echo hello' })).toBe('echo hello');
});
test('full browse binary path recognized', () => {
const result = describeToolCall('Bash', { command: '/Users/garrytan/.claude/skills/gstack/browse/dist/browse goto https://example.com' });
expect(result).toBe('Opening https://example.com');
});
test('tab command → switching tab', () => {
expect(describeToolCall('Bash', { command: '$B tab 2' })).toContain('tab');
});
});
// ─── Per-tab agent concurrency (source code validation) ──────────
describe('per-tab agent concurrency', () => {
const serverSrc = fs.readFileSync(path.join(__dirname, '..', 'src', 'server.ts'), 'utf-8');
const agentSrc = fs.readFileSync(path.join(__dirname, '..', 'src', 'sidebar-agent.ts'), 'utf-8');
test('server has per-tab agent state map', () => {
expect(serverSrc).toContain('tabAgents');
expect(serverSrc).toContain('TabAgentState');
expect(serverSrc).toContain('getTabAgent');
});
test('server returns per-tab agent status in /sidebar-chat', () => {
expect(serverSrc).toContain('getTabAgentStatus');
expect(serverSrc).toContain('tabAgentStatus');
});
test('spawnClaude accepts forTabId parameter', () => {
const spawnFn = serverSrc.slice(
serverSrc.indexOf('function spawnClaude('),
serverSrc.indexOf('\nfunction ', serverSrc.indexOf('function spawnClaude(') + 1),
);
expect(spawnFn).toContain('forTabId');
expect(spawnFn).toContain('tabState.status');
});
test('sidebar-command endpoint uses per-tab agent state', () => {
expect(serverSrc).toContain('msgTabId');
expect(serverSrc).toContain('tabState.status');
expect(serverSrc).toContain('tabState.queue');
});
test('agent event handler resets per-tab state', () => {
expect(serverSrc).toContain('eventTabId');
expect(serverSrc).toContain('tabState.status = \'idle\'');
});
test('agent event handler processes per-tab queue', () => {
// After agent_done, should process next message from THIS tab's queue
expect(serverSrc).toContain('tabState.queue.length > 0');
expect(serverSrc).toContain('tabState.queue.shift');
});
test('sidebar-agent uses per-tab processing set', () => {
expect(agentSrc).toContain('processingTabs');
expect(agentSrc).not.toContain('isProcessing');
});
test('sidebar-agent sends tabId with all events', () => {
// sendEvent should accept tabId parameter
expect(agentSrc).toContain('async function sendEvent(event: Record<string, any>, tabId?: number)');
// askClaude destructures tabId from queue entry (regex tolerates
// additional fields like `canary` and `pageUrl` from security module).
expect(agentSrc).toMatch(
/const \{[^}]*\bprompt\b[^}]*\bargs\b[^}]*\bstateFile\b[^}]*\bcwd\b[^}]*\btabId\b[^}]*\}/
);
});
test('sidebar-agent allows concurrent agents across tabs', () => {
// poll() should not block globally — it should check per-tab
expect(agentSrc).toContain('processingTabs.has(tid)');
// askClaude should be fire-and-forget (no await blocking the loop)
expect(agentSrc).toContain('askClaude(entry).catch');
});
test('queue entries include tabId', () => {
const spawnFn = serverSrc.slice(
serverSrc.indexOf('function spawnClaude('),
serverSrc.indexOf('\nfunction ', serverSrc.indexOf('function spawnClaude(') + 1),
);
expect(spawnFn).toContain('tabId: agentTabId');
});
test('health check monitors all per-tab agents', () => {
expect(serverSrc).toContain('for (const [tid, state] of tabAgents)');
});
});
describe('BROWSE_TAB tab pinning (cross-tab isolation)', () => {
const serverSrc = fs.readFileSync(path.join(__dirname, '..', 'src', 'server.ts'), 'utf-8');
const agentSrc = fs.readFileSync(path.join(__dirname, '..', 'src', 'sidebar-agent.ts'), 'utf-8');
const cliSrc = fs.readFileSync(path.join(__dirname, '..', 'src', 'cli.ts'), 'utf-8');
test('sidebar-agent passes BROWSE_TAB env var to claude process', () => {
// The env block should include BROWSE_TAB set to the tab ID
expect(agentSrc).toContain('BROWSE_TAB');
expect(agentSrc).toContain('String(tid)');
});
test('CLI reads BROWSE_TAB and sends tabId in command body', () => {
// BROWSE_TAB env var is still honored (sidebar-agent path). After the
// make-pdf refactor, the CLI layer now also accepts --tab-id <N>, with
// the CLI flag taking precedence over the env var. Both resolve to the
// same `tabId` body field.
expect(cliSrc).toContain('process.env.BROWSE_TAB');
expect(cliSrc).toContain('parseInt(envTab, 10)');
});
test('handleCommandInternal accepts tabId from request body', () => {
const handleFn = serverSrc.slice(
serverSrc.indexOf('async function handleCommandInternal('),
serverSrc.indexOf('\n/** HTTP wrapper', serverSrc.indexOf('async function handleCommandInternal(') + 1) > 0
? serverSrc.indexOf('\n/** HTTP wrapper', serverSrc.indexOf('async function handleCommandInternal(') + 1)
: serverSrc.indexOf('\nasync function ', serverSrc.indexOf('async function handleCommandInternal(') + 200),
);
// Should destructure tabId from body
expect(handleFn).toContain('tabId');
// Should save and restore the active tab
expect(handleFn).toContain('savedTabId');
expect(handleFn).toContain('switchTab(tabId');
});
test('handleCommandInternal restores active tab after command (success path)', () => {
// On success, should restore savedTabId without stealing focus
const handleFn = serverSrc.slice(
serverSrc.indexOf('async function handleCommandInternal('),
serverSrc.length,
);
// Count restore calls — should appear in both success and error paths
const restoreCount = (handleFn.match(/switchTab\(savedTabId/g) || []).length;
expect(restoreCount).toBeGreaterThanOrEqual(2); // success + error paths
});
test('handleCommandInternal restores active tab on error path', () => {
// The catch block should also restore
const catchBlock = serverSrc.slice(
serverSrc.indexOf('} catch (err: any) {', serverSrc.indexOf('async function handleCommandInternal(')),
);
expect(catchBlock).toContain('switchTab(savedTabId');
});
test('tab pinning only activates when tabId is provided', () => {
const handleFn = serverSrc.slice(
serverSrc.indexOf('async function handleCommandInternal('),
serverSrc.indexOf('try {', serverSrc.indexOf('async function handleCommandInternal(') + 1),
);
// Should check tabId is not undefined/null before switching
expect(handleFn).toContain('tabId !== undefined');
expect(handleFn).toContain('tabId !== null');
});
test('CLI only sends tabId when it is a valid number', () => {
// Body should conditionally include tabId. Historically that was keyed off
// the BROWSE_TAB env var. After the make-pdf refactor, the CLI also honors
// a --tab-id <N> flag on the CLI itself, so the check is "tabId defined
// AND not NaN" rather than literally inspecting the env var.
expect(cliSrc).toContain('tabId !== undefined && !isNaN(tabId)');
});
});


@@ -1,26 +1,15 @@
 /**
- * Regression: changing the default sidebar tab to Terminal must NOT break
- * the existing Chat path or the debug-tab return-to logic.
+ * Regression: sidebar layout invariants after the chat-tab rip.
  *
- * Original /plan-eng-review Issue 3A asked for a Playwright + extension
- * E2E test. The codebase doesn't ship Playwright extension launcher
- * infrastructure (extension tests here are source-level), so this regression
- * is implemented as a structural assertion suite over the extension files.
- * That's enough to lock the load-bearing invariants:
+ * The Chrome side panel used to host two surfaces: Chat (one-shot
+ * `claude -p` queue) and Terminal (interactive PTY). Chat was ripped
+ * once the PTY proved out — sidebar-agent.ts is gone, the chat queue
+ * endpoints are gone, and the primary-tab nav (Terminal | Chat) is
+ * gone. Terminal is now the sole primary surface.
  *
- * 1. Terminal is the default-active primary tab.
- * 2. Chat exists as a non-active primary tab.
- * 3. The xterm assets are loaded.
- * 4. The debug-close path no longer hardcodes `tab-chat` (uses the
- * activePrimaryPaneId helper that respects whichever primary tab
- * the user has selected).
- * 5. Manifest declares the ws://127.0.0.1 host permission so MV3
- * doesn't block the WebSocket upgrade.
- * 6. The chat surface (chat-messages, chat input wiring) still exists
- * and was not accidentally deleted alongside the default-tab change.
- *
- * If a future refactor regresses any of these, this test fails BEFORE the
- * change ships.
+ * This file locks the load-bearing invariants of that layout so a
+ * future refactor can't silently re-introduce the old surface or break
+ * the new one.
  */
 import { describe, test, expect } from 'bun:test';
@@ -32,84 +21,220 @@ const JS = fs.readFileSync(path.join(import.meta.dir, '../../extension/sidepanel
 const TERM_JS = fs.readFileSync(path.join(import.meta.dir, '../../extension/sidepanel-terminal.js'), 'utf-8');
 const MANIFEST = JSON.parse(fs.readFileSync(path.join(import.meta.dir, '../../extension/manifest.json'), 'utf-8'));
-describe('sidebar tabs regression: Terminal is default, Chat survives', () => {
-  test('primary tab bar declares Terminal and Chat with Terminal active', () => {
-    // Terminal is the active button.
-    expect(HTML).toMatch(/<button[^>]*class="primary-tab active"[^>]*data-pane="terminal"/);
-    // Chat is a primary tab, present and non-active.
-    expect(HTML).toMatch(/<button[^>]*class="primary-tab"[^>]*data-pane="chat"/);
+describe('sidebar: chat tab + nav are removed, Terminal is sole primary surface', () => {
+  test('No primary-tab nav element exists', () => {
+    expect(HTML).not.toContain('class="primary-tabs"');
+    expect(HTML).not.toContain('data-pane="chat"');
+    expect(HTML).not.toContain('data-pane="terminal"');
   });
-  test('Terminal pane is active and Chat pane is not active', () => {
-    // tab-terminal has the .active class on its <main>.
-    expect(HTML).toMatch(/<main id="tab-terminal" class="tab-content active"/);
-    // tab-chat is present but NOT active.
-    expect(HTML).toMatch(/<main id="tab-chat" class="tab-content"(?! active)/);
+  test('No <main id="tab-chat"> pane', () => {
+    expect(HTML).not.toMatch(/<main[^>]*id="tab-chat"/);
+    expect(HTML).not.toContain('id="chat-messages"');
+    expect(HTML).not.toContain('id="chat-loading"');
+    expect(HTML).not.toContain('id="chat-welcome"');
   });
-  test('xterm assets are loaded for the Terminal pane', () => {
-    expect(HTML).toContain('lib/xterm.css');
-    expect(HTML).toContain('lib/xterm.js');
-    expect(HTML).toContain('lib/xterm-addon-fit.js');
-    expect(HTML).toContain('sidepanel-terminal.js');
+  test('No chat input / send button / experimental banner', () => {
+    expect(HTML).not.toContain('class="command-bar"');
+    expect(HTML).not.toContain('id="command-input"');
+    expect(HTML).not.toContain('id="send-btn"');
+    expect(HTML).not.toContain('id="stop-agent-btn"');
+    expect(HTML).not.toContain('id="experimental-banner"');
   });
-  test('chat surface still exists (no accidental deletion)', () => {
-    // The chat input and chat-messages containers are load-bearing for the
-    // existing sidebar-agent flow. If the default-tab change accidentally
-    // removed them, this catches it before users do.
-    expect(HTML).toContain('id="chat-messages"');
-    expect(HTML).toContain('id="chat-loading"');
+  test('No clear-chat button in footer', () => {
+    expect(HTML).not.toContain('id="clear-chat"');
   });
-  test('debug-close path no longer hardcodes tab-chat', () => {
-    // Before the Terminal default flip, sidepanel.js had two literal
-    // `getElementById('tab-chat').classList.add('active')` calls inside the
-    // debug-close handlers. Both must now go through activePrimaryPaneId()
-    // so closing debug returns to whichever primary tab is selected.
-    expect(JS).toContain('function activePrimaryPaneId');
-    // Old hardcoded form is gone (don't ban the string everywhere — there
-    // are legitimate references elsewhere in the file).
-    const debugToggleBlock = JS.slice(
-      JS.indexOf("debugToggle.addEventListener('click'"),
-      JS.indexOf("closeDebug.addEventListener('click'"),
-    );
-    expect(debugToggleBlock).not.toContain("'tab-chat'");
-    expect(debugToggleBlock).toContain('activePrimaryPaneId');
+  test('Terminal pane is .active by default and has the toolbar', () => {
+    expect(HTML).toMatch(/<main[^>]*id="tab-terminal"[^>]*class="tab-content active"/);
+    expect(HTML).toContain('id="terminal-toolbar"');
+    expect(HTML).toContain('id="terminal-restart-now"');
   });
-  test('primary-tab click handler exists and toggles classes', () => {
-    expect(JS).toContain("querySelectorAll('.primary-tab')");
-    expect(JS).toContain('aria-selected');
+  test('Quick-actions buttons (Cleanup / Screenshot / Cookies) survive in the terminal toolbar', () => {
+    // Garry explicitly wanted these kept after the chat rip — they drive
+    // browser actions, not chat.
+    expect(HTML).toContain('id="chat-cleanup-btn"');
+    expect(HTML).toContain('id="chat-screenshot-btn"');
+    expect(HTML).toContain('id="chat-cookies-btn"');
+    // They live inside the terminal toolbar now (siblings of the Restart
+    // button), not as a separate strip below all panes.
+    const toolbarStart = HTML.indexOf('id="terminal-toolbar"');
+    const toolbarEnd = HTML.indexOf('</div>', toolbarStart);
+    const toolbarBlock = HTML.slice(toolbarStart, toolbarEnd + 6);
+    expect(toolbarBlock).toContain('id="chat-cleanup-btn"');
+    expect(toolbarBlock).toContain('id="chat-screenshot-btn"');
+    expect(toolbarBlock).toContain('id="chat-cookies-btn"');
   });
 });
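A note on the toolbar slice in the last test of this block: `HTML.indexOf('</div>', toolbarStart)` stops at the first closing tag after the toolbar's `id`, so the assertion holds only while the three buttons come before any nested `<div>` inside the toolbar. If the markup ever gains nested wrappers, a depth-counting slice is one option. The sketch below uses a hypothetical helper, `sliceBalancedDiv`, that is not part of this PR, and assumes the toolbar element is itself a `<div>`.

```typescript
// Hypothetical helper (not in the PR): return the full <div>...</div> block
// whose opening tag contains `marker`, counting nested <div> pairs.
function sliceBalancedDiv(html: string, marker: string): string {
  const markerAt = html.indexOf(marker);
  if (markerAt < 0) return '';
  // Walk back to the opening tag that carries the marker attribute.
  const start = html.lastIndexOf('<div', markerAt);
  if (start < 0) return '';
  const tagRe = /<div\b|<\/div>/g;
  tagRe.lastIndex = start;
  let depth = 0;
  let m: RegExpExecArray | null;
  while ((m = tagRe.exec(html)) !== null) {
    depth += m[0] === '</div>' ? -1 : 1;
    // Depth returns to zero at the close tag that matches the opening tag.
    if (depth === 0) return html.slice(start, m.index + m[0].length);
  }
  return ''; // unbalanced markup
}

// Illustrative markup only; the real sidepanel HTML is not shown in this diff.
const sampleHtml =
  '<div id="terminal-toolbar"><div class="group"><button id="a"></button></div>' +
  '<button id="chat-cleanup-btn"></button></div><div id="other"></div>';
const toolbarBlock = sliceBalancedDiv(sampleHtml, 'id="terminal-toolbar"');
```

With the sample markup, a first-`</div>` slice would end before `chat-cleanup-btn`, while the balanced slice includes it and still excludes the sibling `#other` element.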
-describe('sidebar terminal: lazy spawn + auth chain', () => {
-  test('terminal JS waits for first key to start (lazy-spawn)', () => {
-    expect(TERM_JS).toContain('function onAnyKey');
-    expect(TERM_JS).toContain('terminalActive');
-    expect(TERM_JS).toContain('connect()');
+describe('sidepanel.js: chat helpers ripped, terminal-injection helper survives', () => {
+  test('No primary-tab click handler', () => {
+    expect(JS).not.toContain("querySelectorAll('.primary-tab')");
+    expect(JS).not.toContain('activePrimaryPaneId');
   });
-  test('terminal JS does NOT auto-reconnect on close (codex finding #8)', () => {
-    // Close handler transitions to ENDED and shows a restart button,
-    // not a reconnect timer.
-    const closeBlock = TERM_JS.slice(TERM_JS.indexOf("addEventListener('close'"));
-    expect(closeBlock).toContain('ENDED');
-    // Forbid bare setTimeout(...connect... patterns inside this file's
-    // close handler — would indicate auto-reconnect crept back in.
-    expect(TERM_JS).not.toMatch(/close[\s\S]{0,200}setTimeout\([^)]*connect/);
+  test('No chat polling, sendMessage, sendChat, stopAgent, or pollTabs', () => {
+    expect(JS).not.toContain('chatPollInterval');
+    expect(JS).not.toContain('function sendMessage');
+    expect(JS).not.toContain('function pollChat');
+    expect(JS).not.toContain('function pollTabs');
+    expect(JS).not.toContain('function switchChatTab');
+    expect(JS).not.toContain('function stopAgent');
+    expect(JS).not.toContain('function applyChatEnabled');
+    expect(JS).not.toContain('function showSecurityBanner');
   });
-  test('terminal JS reaches /pty-session with the bootstrap auth token', () => {
-    expect(TERM_JS).toContain('/pty-session');
-    expect(TERM_JS).toContain('Bearer ${token}');
-    expect(TERM_JS).toContain('credentials');
+  test('Cleanup runs through the live PTY (no /sidebar-command POST)', () => {
+    // The new Cleanup handler injects the prompt straight into claude's
+    // PTY via gstackInjectToTerminal. The dead code path was a POST to
+    // /sidebar-command which kicked off a fresh claude -p subprocess.
+    const cleanup = JS.slice(JS.indexOf('async function runCleanup'));
+    expect(cleanup).toContain('window.gstackInjectToTerminal');
+    expect(cleanup).not.toContain('/sidebar-command');
+    expect(cleanup).not.toContain('addChatEntry');
   });
-  test('terminal JS opens ws://127.0.0.1 (not wss)', () => {
-    expect(TERM_JS).toContain('new WebSocket(`ws://127.0.0.1:');
-    // Origin is implicit (browser sets chrome-extension://<id>); no manual override.
+  test('Inspector "Send to Code" routes through the live PTY', () => {
+    const sendBtn = JS.slice(JS.indexOf('inspectorSendBtn.addEventListener'));
+    expect(sendBtn).toContain('window.gstackInjectToTerminal');
+    expect(sendBtn).not.toContain("type: 'sidebar-command'");
+  });
+  test('updateConnection no longer kicks off chat / tab polling', () => {
+    const update = JS.slice(JS.indexOf('function updateConnection'), JS.indexOf('function updateConnection') + 1500);
+    expect(update).not.toContain('chatPollInterval');
+    expect(update).not.toContain('tabPollInterval');
+    expect(update).not.toContain('pollChat');
+    expect(update).not.toContain('pollTabs');
+    // BUT must still expose the bootstrap globals for sidepanel-terminal.js.
+    expect(update).toContain('window.gstackServerPort');
+    expect(update).toContain('window.gstackAuthToken');
+  });
+});
+describe('sidepanel-terminal.js: eager auto-connect + injection API', () => {
+  test('Exposes window.gstackInjectToTerminal for cross-pane use', () => {
+    expect(TERM_JS).toContain('window.gstackInjectToTerminal');
+    // Returns false when no live session, true when bytes go out.
+    const inject = TERM_JS.slice(TERM_JS.indexOf('window.gstackInjectToTerminal'));
+    expect(inject).toContain('return false');
+    expect(inject).toContain('return true');
+    expect(inject).toContain('ws.readyState !== WebSocket.OPEN');
+  });
+  test('Auto-connects on init (no keypress required)', () => {
+    expect(TERM_JS).not.toContain('function onAnyKey');
+    expect(TERM_JS).not.toContain("addEventListener('keydown'");
+    expect(TERM_JS).toContain('function tryAutoConnect');
+  });
+  test('Repaint hook fires when Terminal pane becomes visible', () => {
+    // The chat-tab rip removed gstack:primary-tab-changed; we use a
+    // MutationObserver on #tab-terminal's class attr instead. The
+    // observer must call repaintIfLive when the .active class returns.
+    expect(TERM_JS).toContain('MutationObserver');
+    expect(TERM_JS).toContain("attributeFilter: ['class']");
+    expect(TERM_JS).toContain('repaintIfLive');
+    const repaint = TERM_JS.slice(TERM_JS.indexOf('function repaintIfLive'));
+    expect(repaint).toContain('fitAddon && fitAddon.fit()');
+    expect(repaint).toContain('term.refresh');
+    expect(repaint).toContain("type: 'resize'");
+  });
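The repaint-on-visible behaviour asserted in the test above can be sketched without a DOM. This is only an illustration of the "repaint when `.active` returns" decision: `Repaintable` and `makeClassChangeHandler` are hypothetical names, and the real `repaintIfLive` in sidepanel-terminal.js also sends a `resize` message, which is omitted here.

```typescript
// Hypothetical shape: anything that can refit and redraw, e.g. a thin
// wrapper around fitAddon.fit() and term.refresh().
interface Repaintable {
  fit(): void;
  refresh(): void;
}

// Returns a handler to call with the pane's current className each time the
// class attribute changes. It repaints only on the inactive -> active edge.
function makeClassChangeHandler(target: Repaintable): (className: string) => void {
  let wasActive = false;
  return (className: string): void => {
    const isActive = className.split(/\s+/).indexOf('active') !== -1;
    if (isActive && !wasActive) {
      target.fit();
      target.refresh();
    }
    wasActive = isActive;
  };
}

// Stand-in for the terminal: counts repaints instead of drawing.
let repaints = 0;
const onClassChange = makeClassChangeHandler({
  fit: () => {},
  refresh: () => { repaints += 1; },
});
onClassChange('tab-content active'); // pane shown: repaint
onClassChange('tab-content');        // pane hidden: nothing
onClassChange('tab-content active'); // pane shown again: repaint
```

In the extension, a handler like this would presumably be invoked from the MutationObserver callback with the observed element's `className`.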
+  test('No auto-reconnect on close (Restart is user-initiated)', () => {
+    const closeOnly = TERM_JS.slice(
+      TERM_JS.indexOf("ws.addEventListener('close'"),
+      TERM_JS.indexOf("ws.addEventListener('error'"),
+    );
+    expect(closeOnly).not.toContain('setTimeout');
+    expect(closeOnly).not.toContain('tryAutoConnect');
+    expect(closeOnly).not.toContain('connect()');
+  });
+  test('forceRestart helper closes ws, disposes xterm, returns to IDLE', () => {
+    expect(TERM_JS).toContain('function forceRestart');
+    const fn = TERM_JS.slice(TERM_JS.indexOf('function forceRestart'));
+    expect(fn).toContain('ws && ws.close()');
+    expect(fn).toContain('term.dispose()');
+    expect(fn).toContain('STATE.IDLE');
+    expect(fn).toContain('tryAutoConnect()');
+  });
+  test('Both restart buttons (mid-session and ENDED) call forceRestart', () => {
+    expect(TERM_JS).toContain("els.restart?.addEventListener('click', forceRestart)");
+    expect(TERM_JS).toContain("els.restartNow?.addEventListener('click', forceRestart)");
+  });
+});
+describe('server.ts: chat / sidebar-agent endpoints are gone', () => {
+  const SERVER_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/server.ts'), 'utf-8');
+  test('No /sidebar-command, /sidebar-chat, /sidebar-agent/* routes', () => {
+    expect(SERVER_SRC).not.toMatch(/url\.pathname === ['"]\/sidebar-command['"]/);
+    expect(SERVER_SRC).not.toMatch(/url\.pathname === ['"]\/sidebar-chat['"]/);
+    expect(SERVER_SRC).not.toMatch(/url\.pathname\.startsWith\(['"]\/sidebar-agent\//);
+    expect(SERVER_SRC).not.toMatch(/url\.pathname === ['"]\/sidebar-agent\/event['"]/);
+    expect(SERVER_SRC).not.toMatch(/url\.pathname === ['"]\/sidebar-tabs['"]/);
+    expect(SERVER_SRC).not.toMatch(/url\.pathname === ['"]\/sidebar-session['"]/);
+  });
+  test('No chat-related state declarations or helpers', () => {
+    // Allow the symbol names inside the rip-marker comments — but no
+    // `let`, `const`, `function`, or `interface` declarations of them.
+    expect(SERVER_SRC).not.toMatch(/^let agentProcess/m);
+    expect(SERVER_SRC).not.toMatch(/^let agentStatus/m);
+    expect(SERVER_SRC).not.toMatch(/^let messageQueue/m);
+    expect(SERVER_SRC).not.toMatch(/^let sidebarSession/m);
+    expect(SERVER_SRC).not.toMatch(/^const tabAgents/m);
+    expect(SERVER_SRC).not.toMatch(/^function pickSidebarModel/m);
+    expect(SERVER_SRC).not.toMatch(/^function processAgentEvent/m);
+    expect(SERVER_SRC).not.toMatch(/^function killAgent/m);
+    expect(SERVER_SRC).not.toMatch(/^function addChatEntry/m);
+    expect(SERVER_SRC).not.toMatch(/^interface ChatEntry/m);
+    expect(SERVER_SRC).not.toMatch(/^interface SidebarSession/m);
+  });
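The `^` anchor combined with the `m` flag is what lets rip-marker comments keep mentioning the removed symbols: in multiline mode `^` matches at the start of every line, so only a declaration that begins at column 0 trips the check. A small sketch of the same idea, with `declaresAtTopLevel` as a hypothetical helper and made-up sample sources:

```typescript
type DeclKind = 'let' | 'const' | 'function' | 'interface';

// Hypothetical helper (not in the PR): true when `src` has a line that
// starts with `<kind> <name>`. `name` is assumed to be a plain identifier,
// so it needs no regex escaping.
function declaresAtTopLevel(src: string, kind: DeclKind, name: string): boolean {
  const re = new RegExp(`^${kind} ${name}\\b`, 'm');
  return re.test(src);
}

// Made-up sources for illustration; the real server.ts is not shown here.
const rippedSrc = [
  '// agentStatus and `let agentStatus` were removed with the chat rip',
  'let terminalPort = 0;',
].join('\n');
const regressedSrc = rippedSrc + "\nlet agentStatus = 'idle';";

const rippedHasDecl = declaresAtTopLevel(rippedSrc, 'let', 'agentStatus');
const regressedHasDecl = declaresAtTopLevel(regressedSrc, 'let', 'agentStatus');
```

The trade-off is that an indented declaration (inside a function or class) would not be caught, which appears to be acceptable here because the removed state lived at module scope.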
+  test('/health no longer surfaces agentStatus or messageQueue length', () => {
+    const health = SERVER_SRC.slice(SERVER_SRC.indexOf("url.pathname === '/health'"));
+    const slice = health.slice(0, 2000);
+    expect(slice).not.toContain('agentStatus');
+    expect(slice).not.toContain('messageQueue');
+    expect(slice).not.toContain('agentStartTime');
+    // chatEnabled is hardcoded false now (older clients still see the field).
+    expect(slice).toMatch(/chatEnabled:\s*false/);
+    // terminalPort survives.
+    expect(slice).toContain('terminalPort');
+  });
+});
+describe('cli.ts: sidebar-agent is no longer spawned', () => {
+  const CLI_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/cli.ts'), 'utf-8');
+  test('No Bun.spawn of sidebar-agent.ts', () => {
+    expect(CLI_SRC).not.toMatch(/Bun\.spawn\(\s*\['bun',\s*'run',\s*\w*[Aa]gent[Ss]cript\][\s\S]{0,300}sidebar-agent/);
+    // The variable name `agentScript` was for sidebar-agent. After the
+    // rip there's only termAgentScript. Allow comments to mention the
+    // history but not active spawn calls.
+    expect(CLI_SRC).not.toMatch(/^\s*let agentScript = path\.resolve/m);
+  });
+  test('Terminal-agent spawn survives', () => {
+    expect(CLI_SRC).toContain('terminal-agent.ts');
+    expect(CLI_SRC).toMatch(/Bun\.spawn\(\['bun',\s*'run',\s*termAgentScript\]/);
+  });
+});
+describe('files: sidebar-agent.ts and its tests are deleted', () => {
+  test('browse/src/sidebar-agent.ts is gone', () => {
+    expect(fs.existsSync(path.join(import.meta.dir, '../src/sidebar-agent.ts'))).toBe(false);
+  });
+  test('sidebar-agent test files are gone', () => {
+    expect(fs.existsSync(path.join(import.meta.dir, 'sidebar-agent.test.ts'))).toBe(false);
+    expect(fs.existsSync(path.join(import.meta.dir, 'sidebar-agent-roundtrip.test.ts'))).toBe(false);
   });
 });
@@ -123,8 +248,6 @@ describe('manifest: ws permission + xterm-safe CSP', () => {
   });
   test('manifest does NOT add unsafe-eval to extension_pages CSP', () => {
-    // xterm@5 is eval-free (verified at vendor time). If a future xterm
-    // upgrade requires unsafe-eval, this test fires and forces a decision.
     const csp = MANIFEST.content_security_policy;
     if (csp && csp.extension_pages) {
       expect(csp.extension_pages).not.toContain('unsafe-eval');


@@ -127,7 +127,7 @@ describe('terminal-agent: /ws gates', () => {
   });
 });
-describe('terminal-agent: PTY round-trip via real WebSocket', () => {
+describe('terminal-agent: PTY round-trip via real WebSocket (Cookie auth)', () => {
   test('binary writes go to PTY stdin, output streams back', async () => {
     const cookie = 'rt-token-must-be-at-least-seventeen-chars-long';
     const granted = await grantToken(cookie);
@@ -182,6 +182,65 @@ describe('terminal-agent: PTY round-trip via real WebSocket', () => {
     await Bun.sleep(200);
   });
+  test('Sec-WebSocket-Protocol auth path: browser-style upgrade with token in protocol', async () => {
+    // This is the path the actual browser extension takes. Cross-port
+    // SameSite=Strict cookies don't reliably survive the jump from the
+    // browse server (port A) to the agent (port B) when initiated from a
+    // chrome-extension origin, so we send the token via the only auth
+    // header the browser WebSocket API lets us set: Sec-WebSocket-Protocol.
+    //
+    // The browser sends `gstack-pty.<token>` and the agent must:
+    // 1) strip the gstack-pty. prefix
+    // 2) validate the token
+    // 3) ECHO the protocol back in the upgrade response
+    // Without (3) the browser closes the connection immediately, which
+    // is the exact bug the original cookie-only implementation hit in
+    // manual dogfood. This test catches that regression in CI.
+    const token = 'sec-protocol-token-must-be-at-least-seventeen-chars';
+    await grantToken(token);
+    // We exercise the protocol path by raw-handshaking via fetch+Upgrade,
+    // because Bun's test-client WebSocket constructor doesn't propagate
+    // `protocols` cleanly when also passed `headers` (the constructor
+    // detects the third-arg form unreliably). Real browsers (Chromium)
+    // use the standard protocols arg fine — the server-side handler is
+    // identical either way, so this test still locks the load-bearing
+    // invariant: the agent accepts a token via Sec-WebSocket-Protocol
+    // and echoes the protocol back so a browser would accept the upgrade.
+    const handshakeKey = 'dGhlIHNhbXBsZSBub25jZQ==';
+    const resp = await fetch(`http://127.0.0.1:${agentPort}/ws`, {
+      headers: {
+        'Connection': 'Upgrade',
+        'Upgrade': 'websocket',
+        'Sec-WebSocket-Version': '13',
+        'Sec-WebSocket-Key': handshakeKey,
+        'Sec-WebSocket-Protocol': `gstack-pty.${token}`,
+        'Origin': 'chrome-extension://test-extension-id',
+      },
+    });
+    // 101 Switching Protocols + protocol echoed back = browser would accept.
+    // 401/403/anything else = browser would close the connection immediately
+    // (the bug we hit in manual dogfood).
+    expect(resp.status).toBe(101);
+    expect(resp.headers.get('upgrade')?.toLowerCase()).toBe('websocket');
+    expect(resp.headers.get('sec-websocket-protocol')).toBe(`gstack-pty.${token}`);
+  });
test('Sec-WebSocket-Protocol auth: rejects unknown token even with valid Origin', async () => {
const resp = await fetch(`http://127.0.0.1:${agentPort}/ws`, {
headers: {
'Connection': 'Upgrade',
'Upgrade': 'websocket',
'Sec-WebSocket-Version': '13',
'Sec-WebSocket-Key': 'dGhlIHNhbXBsZSBub25jZQ==',
'Sec-WebSocket-Protocol': 'gstack-pty.never-granted-token',
'Origin': 'chrome-extension://test-extension-id',
},
});
expect(resp.status).toBe(401);
});
test('text frame {type:"resize"} is accepted (no crash, ws stays open)', async () => { test('text frame {type:"resize"} is accepted (no crash, ws stays open)', async () => {
const cookie = 'resize-token-must-be-at-least-seventeen-chars'; const cookie = 'resize-token-must-be-at-least-seventeen-chars';
await grantToken(cookie); await grantToken(cookie);

@@ -122,12 +122,26 @@ describe('Source-level guard: terminal-agent', () => {
expect(wsHandler).toContain('forbidden origin'); expect(wsHandler).toContain('forbidden origin');
}); });
test('validates gstack_pty cookie against an in-memory token set', () => { test('validates the session token against an in-memory token set', () => {
const wsHandler = AGENT_SRC.slice(AGENT_SRC.indexOf("if (url.pathname === '/ws')")); const wsHandler = AGENT_SRC.slice(AGENT_SRC.indexOf("if (url.pathname === '/ws')"));
// Two transports: Sec-WebSocket-Protocol (preferred for browsers) and
// Cookie gstack_pty (fallback). Both verify against validTokens.
expect(wsHandler).toContain('sec-websocket-protocol');
expect(wsHandler).toContain('gstack_pty'); expect(wsHandler).toContain('gstack_pty');
expect(wsHandler).toContain('validTokens.has'); expect(wsHandler).toContain('validTokens.has');
}); });
test('Sec-WebSocket-Protocol auth: strips gstack-pty. prefix and echoes back', () => {
const wsHandler = AGENT_SRC.slice(AGENT_SRC.indexOf("if (url.pathname === '/ws')"));
// Browsers send `Sec-WebSocket-Protocol: gstack-pty.<token>`. The agent
// must strip the prefix before checking validTokens, AND echo the
// protocol back in the upgrade response — without the echo, the
// browser closes the connection immediately.
expect(wsHandler).toContain("'gstack-pty.'");
expect(wsHandler).toContain('Sec-WebSocket-Protocol');
expect(wsHandler).toContain('acceptedProtocol');
});
test('lazy spawn: claude PTY is spawned in message handler, not on upgrade', () => { test('lazy spawn: claude PTY is spawned in message handler, not on upgrade', () => {
// The whole point of lazy-spawn (codex finding #8) is that the WS // The whole point of lazy-spawn (codex finding #8) is that the WS
// upgrade itself does NOT call spawnClaude. Spawn happens on first // upgrade itself does NOT call spawnClaude. Spawn happens on first
@@ -158,14 +172,19 @@ describe('Source-level guard: terminal-agent', () => {
}); });
describe('Source-level guard: server.ts /pty-session route', () => { describe('Source-level guard: server.ts /pty-session route', () => {
test('validates AUTH_TOKEN and uses cookie-based grant', () => { test('validates AUTH_TOKEN, grants over loopback, returns token + Set-Cookie', () => {
const route = SERVER_SRC.slice(SERVER_SRC.indexOf("url.pathname === '/pty-session'")); const route = SERVER_SRC.slice(SERVER_SRC.indexOf("url.pathname === '/pty-session'"));
// Must check auth before minting. // Must check auth before minting.
const beforeMint = route.slice(0, route.indexOf('mintPtySessionToken')); const beforeMint = route.slice(0, route.indexOf('mintPtySessionToken'));
expect(beforeMint).toContain('validateAuth'); expect(beforeMint).toContain('validateAuth');
// Must call the loopback grant before responding. // Must call the loopback grant before responding (otherwise the
// agent's validTokens Set never sees the token and /ws would 401).
expect(route).toContain('grantPtyToken'); expect(route).toContain('grantPtyToken');
// Must Set-Cookie with the minted token. // Must return the token in the JSON body for the
// Sec-WebSocket-Protocol auth path (cross-port cookies don't survive
// SameSite=Strict from a chrome-extension origin).
expect(route).toContain('ptySessionToken');
// Set-Cookie is kept as a fallback for non-browser callers.
expect(route).toContain('Set-Cookie'); expect(route).toContain('Set-Cookie');
expect(route).toContain('buildPtySetCookie'); expect(route).toContain('buildPtySetCookie');
}); });

@@ -1,211 +1,27 @@
# Sidebar Message Flow # Sidebar Flow
How the GStack Browser sidebar actually works. Read this before touching How the GStack Browser sidebar actually works. Read this before touching
sidepanel.js, background.js, content.js, server.ts sidebar endpoints, `sidepanel.js`, `background.js`, `content.js`, `terminal-agent.ts`, or
or sidebar-agent.ts. sidebar-related server endpoints.
The sidebar has one primary surface — the **Terminal** pane, an interactive
`claude` PTY. Activity / Refs / Inspector survive as debug overlays behind
the `debug` toggle in the footer. The chat queue path (one-shot `claude -p`,
sidebar-agent.ts) was ripped once the PTY proved out — the Terminal pane is
strictly more capable.
## Components ## Components
```
┌─────────────────┐ ┌──────────────┐ ┌─────────────┐ ┌────────────────┐
│ sidepanel.js │────▶│ background.js│────▶│ server.ts │────▶│sidebar-agent.ts│
│ (Chrome panel) │ │ (svc worker) │ │ (Bun HTTP) │ │ (Bun process) │
└─────────────────┘ └──────────────┘ └─────────────┘ └────────────────┘
▲ │ │
│ polls /sidebar-chat │ polls queue file │
└───────────────────────────────────────────┘ │
◀──────────────────────┘
POST /sidebar-agent/event
```
## Startup Timeline
```
T+0ms CLI runs `$B connect`
├── Server starts on port 34567
├── Writes state to .gstack/browse.json (pid, port, token)
├── Launches headed Chromium with extension
└── Clears sidebar-agent-queue.jsonl
T+500ms sidebar-agent.ts spawned by CLI
├── Reads auth token from .gstack/browse.json
├── Creates queue file if missing
├── Sets lastLine = current line count
└── Starts polling every 200ms
T+1-3s Extension loads in Chromium
├── background.js: health poll every 1s (fast startup)
│ └── GET /health → gets auth token
├── content.js: injects on welcome page
│ └── Does NOT fire gstack-extension-ready (waits for sidebar)
└── Side panel: may auto-open via chrome.sidePanel.open()
T+2-10s Side panel connects
├── tryConnect() → asks background for port/token
├── Fallback: direct GET /health for token
├── updateConnection(url, token)
│ ├── Starts chat polling (1s interval)
│ ├── Starts tab polling (2s interval)
│ ├── Connects SSE activity stream
│ └── Sends { type: 'sidebarOpened' } to background
└── background relays to content script → hides welcome arrow
T+10s+ Ready for messages
```
## Message Flow: User Types → Claude Responds
```
1. User types "go to hn" in sidebar, hits Enter
2. sidepanel.js sendMessage()
├── Renders user bubble immediately (optimistic)
├── Renders thinking dots immediately
├── Switches to fast poll (300ms)
└── chrome.runtime.sendMessage({ type: 'sidebar-command', message, tabId })
3. background.js
├── Gets active Chrome tab URL
└── POST /sidebar-command { message, activeTabUrl }
with Authorization: Bearer ${authToken}
4. server.ts /sidebar-command handler
├── validateAuth(req)
├── syncActiveTabByUrl(extensionUrl) — syncs Playwright tab to Chrome tab
├── pickSidebarModel(message) — 'sonnet' for actions, 'opus' for analysis
├── Adds user message to chat buffer
├── Builds system prompt + args
└── Appends JSON to ~/.gstack/sidebar-agent-queue.jsonl
5. sidebar-agent.ts poll() (within 200ms)
├── Reads new line from queue file
├── Parses JSON entry
├── Checks processingTabs — skips if tab already has agent running
└── askClaude(entry) — fire and forget
6. sidebar-agent.ts askClaude()
├── spawn('claude', ['-p', prompt, '--model', model, ...])
├── Streams stdout line-by-line (stream-json format)
├── For each event: POST /sidebar-agent/event { type, tool, text, tabId }
└── On close: POST /sidebar-agent/event { type: 'agent_done' }
7. server.ts processAgentEvent()
├── Adds entry to chat buffer (in-memory + disk)
├── On agent_done: sets tab status to 'idle'
└── On agent_done: processes next queued message for that tab
8. sidepanel.js pollChat() (every 300ms during fast poll)
├── GET /sidebar-chat?after=${chatLineCount}&tabId=${tabId}
├── Renders new entries (text, tool_use, agent_done)
└── On agent idle: removes thinking dots, stops fast poll
```
## Arrow Hint Hide Flow (4-step signal chain)
The welcome page shows a right-pointing arrow until the sidebar opens.
```
1. sidepanel.js updateConnection()
└── chrome.runtime.sendMessage({ type: 'sidebarOpened' })
2. background.js
└── chrome.tabs.sendMessage(activeTabId, { type: 'sidebarOpened' })
3. content.js onMessage handler
└── document.dispatchEvent(new CustomEvent('gstack-extension-ready'))
4. welcome.html script
└── addEventListener('gstack-extension-ready', () => arrow.classList.add('hidden'))
```
The arrow does NOT hide when the extension loads. Only when the sidebar connects.
## Auth Token Flow
```
Server starts → AUTH_TOKEN = crypto.randomUUID()
├── GET /health (no auth) → returns { token: AUTH_TOKEN }
├── background.js checkHealth() → authToken = data.token
│ └── Refreshes on EVERY health poll (fixes stale token on restart)
├── sidepanel.js tryConnect() → serverToken from background or /health
│ └── Used for chat polling: Authorization: Bearer ${serverToken}
└── sidebar-agent.ts refreshToken() → reads from .gstack/browse.json
└── Used for event relay: Authorization: Bearer ${authToken}
```
If the server restarts, all three components get fresh tokens within 10s
(background health poll interval).
## Model Routing
`pickSidebarModel(message)` in server.ts classifies messages:
| Pattern | Model | Why |
|---------|-------|-----|
| "click @e24", "go to hn", "screenshot" | sonnet | Deterministic tool calls, no thinking needed |
| "what does this page say?", "summarize" | opus | Needs comprehension |
| "find bugs", "check for broken links" | opus | Analysis task |
| "navigate to X and fill the form" | sonnet | Action-oriented, no analysis words |
Analysis words (`what`, `why`, `how`, `summarize`, `describe`, `analyze`, `read X and Y`)
always override action verbs and force opus.
## Known Failure Modes
| Failure | Symptom | Root Cause | Fix |
|---------|---------|------------|-----|
| Stale auth token | "Unauthorized" in input | Server restarted, background had old token | background.js refreshes token on every health poll |
| Tab ID mismatch | Message sent, no response visible | Server assigned tabId 1, sidebar polling tabId 0 | switchChatTab preserves optimistic UI during switch |
| Sidebar agent not running | Messages queue forever | Agent process failed to spawn or crashed | Check `ps aux | grep sidebar-agent` |
| Agent stale token | Agent runs but no events appear in sidebar | sidebar-agent has old token from .gstack/browse.json | Agent re-reads token before each event POST |
| Queue file missing | spawnClaude fails | Race between server start and agent start | Both sides create file if missing |
| Optimistic UI blown away | User bubble + dots vanish | switchChatTab replaced DOM with welcome screen | Preserved DOM when lastOptimisticMsg is set |
## Per-Tab Concurrency
Each browser tab can run its own agent simultaneously:
- Server: `tabAgents: Map<number, TabAgentState>` with per-tab queue (max 5)
- sidebar-agent: `processingTabs: Set<number>` prevents duplicate spawns
- Two messages on same tab: queued sequentially, processed in order
- Two messages on different tabs: run concurrently
## File Locations
| Component | File | Runs in |
|-----------|------|---------|
| Sidebar UI | `extension/sidepanel.js` | Chrome side panel |
| Service worker | `extension/background.js` | Chrome background |
| Content script | `extension/content.js` | Page context |
| Welcome page | `browse/src/welcome.html` | Page context |
| HTTP server | `browse/src/server.ts` | Bun (compiled binary) |
| Agent process | `browse/src/sidebar-agent.ts` | Bun (non-compiled, can spawn) |
| CLI entry | `browse/src/cli.ts` | Bun (compiled binary) |
| Queue file | `~/.gstack/sidebar-agent-queue.jsonl` | Filesystem |
| State file | `.gstack/browse.json` | Filesystem |
| Chat log | `~/.gstack/sessions/<id>/chat.jsonl` | Filesystem |
## Terminal flow
The sidebar has a second primary tab next to Chat: **Terminal**. Where Chat
spawns one-shot `claude -p` per message, Terminal runs **interactive
`claude` in a real PTY** with xterm.js as the renderer.
### Components
``` ```
┌─────────────────┐ ┌──────────────┐ ┌──────────────────┐ ┌─────────────────┐ ┌──────────────┐ ┌──────────────────┐
│ sidepanel.js + │────▶│ server.ts │────▶│terminal-agent.ts │ sidepanel.js + │────▶│ server.ts │────▶│terminal-agent.ts │
│ -terminal.js │ │ (compiled) │ │ (non-compiled) │ │ -terminal.js │ │ (compiled) │ │ (non-compiled) │
│ (xterm.js) │ │ │ │ PTY listener │ │ (xterm.js) │ │ │ │ PTY listener │
└─────────────────┘ └──────────────┘ └──────────────────┘ └─────────────────┘ └──────────────┘ └──────────────────┘
▲ │ │ ▲ │ │
│ ws://127.0.0.1:<termPort>/ws (cookie auth) │ Bun.spawn(claude) │ ws://127.0.0.1:<termPort>/ws (Sec-WebSocket-Protocol auth)
└───────────────────────┼──────────────────────▶│ terminal: {data} └───────────────────────┼──────────────────────▶│ Bun.spawn(claude)
│ │ terminal: {data}
│ ▼ │ ▼
│ ┌──────────────────┐ │ ┌──────────────────┐
│ │ claude PTY │ │ │ claude PTY │
@@ -216,7 +32,8 @@ spawns one-shot `claude -p` per message, Terminal runs **interactive
┌──────────────────┐ ┌──────────────────┐
│ pty-session- │ │ pty-session- │
│ cookie.ts │ │ cookie.ts │
│ (HttpOnly cookie) │ (in-memory token
│ registry) │
└──────────────────┘ └──────────────────┘
│ POST /internal/grant (loopback) │ POST /internal/grant (loopback)
@@ -227,7 +44,11 @@ spawns one-shot `claude -p` per message, Terminal runs **interactive
└──────────────────┘ └──────────────────┘
``` ```
### Startup + first-key timeline The compiled browse server can't `posix_spawn` external executables —
`terminal-agent.ts` runs as a separate non-compiled `bun run` process and
owns the `claude` subprocess.
## Startup + first-keystroke timeline
``` ```
T+0ms CLI runs `$B connect` T+0ms CLI runs `$B connect`
@@ -241,81 +62,139 @@ T+500ms terminal-agent.ts boots
└── Probes claude → writes claude-available.json └── Probes claude → writes claude-available.json
T+1-3s Extension loads, sidebar opens T+1-3s Extension loads, sidebar opens
├── Terminal tab is default-active ├── sidepanel-terminal.js: setState(IDLE), shows "Starting Claude Code..."
── sidepanel-terminal.js: setState(IDLE), shows "Press any key" ── tryAutoConnect() polls until window.gstackServerPort + token are set
└── No PTY spawned yet (lazy)
T+user-keys First keystroke fires onAnyKey T+ready tryAutoConnect calls connect()
├── POST /pty-session (Authorization: Bearer AUTH_TOKEN) ├── POST /pty-session (Authorization: Bearer AUTH_TOKEN)
│ └── server mints cookie, posts /internal/grant to agent │ └── server mints session token, posts /internal/grant to agent
│ └── responds with Set-Cookie: gstack_pty=<HttpOnly> │ └── responds with {terminalPort, ptySessionToken}
│ └── responds with terminalPort
├── GET /claude-available (preflight) ├── GET /claude-available (preflight)
├── new WebSocket(ws://127.0.0.1:<terminalPort>/ws) ├── new WebSocket(`ws://127.0.0.1:<terminalPort>/ws`,
└── Browser carries gstack_pty cookie + Origin automatically [`gstack-pty.<token>`])
│ └── Agent validates Origin AND cookie BEFORE upgrading │ └── Browser sends Sec-WebSocket-Protocol + Origin
├── On upgrade success, send {type:"resize"} then a single byte │ └── Agent validates Origin AND token BEFORE upgrading
└── Agent message handler sees first byte → spawnClaude() └── Agent echoes the protocol back (REQUIRED — browser
│ closes the connection without it)
├── On open: send {type:"resize"} then a single \n byte
└── Agent message handler sees the byte → spawnClaude()
``` ```
## Auth: WebSocket can't send Authorization headers
Browser WebSocket clients can't set `Authorization`. They CAN set
`Sec-WebSocket-Protocol` via the second arg of `new WebSocket(url,
protocols)`. We exploit that:
1. `POST /pty-session` (auth: Bearer AUTH_TOKEN) → server mints a
short-lived session token, pushes it to the agent over loopback,
returns it in the JSON body.
2. Extension calls `new WebSocket(url, ['gstack-pty.<token>'])`.
3. Agent reads `Sec-WebSocket-Protocol`, strips `gstack-pty.`, validates
against `validTokens`, echoes the protocol back. Echo is mandatory —
without it Chromium closes the connection on receipt of the upgrade
response.
A `Set-Cookie: gstack_pty=...` header is also returned for non-browser
callers (curl, integration tests). The cookie path was the original v1
design but `SameSite=Strict` cookies don't survive the cross-port jump
from server.ts:34567 → agent:<random> from a chrome-extension origin.
The protocol-token path is what the browser actually uses.
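The agent-side half of step 3 can be sketched as a pure function. This is a minimal sketch, not the code in `terminal-agent.ts`: `pickPtyProtocol` and its return shape are hypothetical names, and it assumes the header may carry several comma-separated subprotocols, as the WebSocket handshake allows. Only the `gstack-pty.` prefix, `validTokens`, and `acceptedProtocol` come from the source.

```typescript
// Hypothetical helper: illustrates the strip / validate / echo steps only.
const PTY_PROTOCOL_PREFIX = "gstack-pty.";

interface AcceptedUpgrade {
  token: string;            // value that was checked against validTokens
  acceptedProtocol: string; // value to echo back in the upgrade response
}

function pickPtyProtocol(
  headerValue: string | null,
  validTokens: Set<string>,
): AcceptedUpgrade | null {
  if (!headerValue) return null;
  // Sec-WebSocket-Protocol is a comma-separated list of offered subprotocols.
  for (const offered of headerValue.split(",").map((p) => p.trim())) {
    if (!offered.startsWith(PTY_PROTOCOL_PREFIX)) continue;
    const token = offered.slice(PTY_PROTOCOL_PREFIX.length);
    if (token.length > 0 && validTokens.has(token)) {
      return { token, acceptedProtocol: offered };
    }
  }
  return null; // no valid token offered: the caller would refuse the upgrade
}
```

A caller would place `acceptedProtocol` in the `Sec-WebSocket-Protocol` response header when it upgrades. A `null` result corresponds to the 401 path that the integration test asserts for an unknown token.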
### Dual-token model ### Dual-token model
| Token | Lives in | Used for | Lifetime | | Token | Lives in | Used for | Lifetime |
|-------|----------|----------|----------| |-------|----------|----------|----------|
| `AUTH_TOKEN` | `<stateDir>/browse.json`; in-memory in server.ts | `/pty-session` POST (mint cookie) | server lifetime | | `AUTH_TOKEN` | `<stateDir>/browse.json`; in-memory in server.ts | `/pty-session` POST (mint cookie + token) | server lifetime |
| `gstack_pty` cookie | Browser HttpOnly jar; agent `validTokens` Set | `/ws` upgrade auth | 30 min, dies on WS close | | `gstack-pty.<...>` (Sec-WebSocket-Protocol) | Browser memory only; agent `validTokens` Set | `/ws` upgrade auth | 30 min, auto-revoked on WS close |
| `INTERNAL_TOKEN` | `<stateDir>/terminal-internal-token`; in agent memory | server → agent loopback `/internal/grant` | agent lifetime | | `INTERNAL_TOKEN` | `<stateDir>/terminal-internal-token`; in agent memory | server → agent loopback `/internal/grant` | agent lifetime |
`AUTH_TOKEN` is **never** valid for `/ws` directly. The cookie is **never** `AUTH_TOKEN` is **never** valid for `/ws` directly. The session token is
valid for `/pty-session` or `/command`. Strict separation prevents an SSE **never** valid for `/pty-session` or `/command`. Strict separation
or sidebar-chat token leak from escalating into shell access. prevents an SSE or page-content token leak from escalating into shell
access.
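The session-token row of the table (30 minute lifetime, revoked on WS close) could be modeled as below. This is a minimal sketch under stated assumptions: `PtyTokenRegistry`, its method names, and the injectable clock are hypothetical, and the real store in `pty-session-cookie.ts` and the agent's `validTokens` Set may be organized differently.

```typescript
// Hypothetical in-memory registry; the 30 min TTL is taken from the table.
const SESSION_TTL_MS = 30 * 60 * 1000;

class PtyTokenRegistry {
  private expiries = new Map<string, number>();
  private now: () => number;

  // The clock is injectable so expiry can be exercised without waiting.
  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  /** Record a freshly minted token and return its expiry timestamp. */
  grant(token: string): number {
    const expiresAt = this.now() + SESSION_TTL_MS;
    this.expiries.set(token, expiresAt);
    return expiresAt;
  }

  /** True only for a granted, unexpired, unrevoked token. */
  has(token: string): boolean {
    const expiresAt = this.expiries.get(token);
    if (expiresAt === undefined) return false;
    if (this.now() >= expiresAt) {
      this.expiries.delete(token); // lazily drop expired entries
      return false;
    }
    return true;
  }

  /** Called when the WebSocket closes so the token cannot be replayed. */
  revoke(token: string): void {
    this.expiries.delete(token);
  }
}
```

Keeping this registry separate from `AUTH_TOKEN` checks is what makes the strict separation above enforceable: a value that is valid in one place is simply absent from the other.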
### Threat model ## Threat model
The Terminal tab **bypasses the entire prompt-injection security stack** The Terminal pane **bypasses the prompt-injection security stack** on
(`content-security.ts` datamarking, `security-classifier.ts` ML scoring, purpose — the user is typing directly to claude, there's no untrusted
canary detection, ensemble verdicts). On the Terminal tab the user is page content in the loop. Trust source is the keyboard, same as any
typing directly to claude — there is no untrusted page content in the local terminal.
loop, so the threat model is "user trusts themselves," same as opening
a terminal locally.
That trust assumption is load-bearing on three transport-layer guarantees: That trust assumption is load-bearing on three transport guarantees:
1. **Local-only listener.** `terminal-agent.ts` binds `127.0.0.1` only. 1. **Local-only listener.** terminal-agent.ts binds `127.0.0.1` only.
The dual-listener tunnel surface (server.ts:95 `TUNNEL_PATHS`) does The dual-listener tunnel surface (server.ts `TUNNEL_PATHS`) does
**not** include `/pty-session` or `/terminal/*`, so the tunnel returns not include `/pty-session` or `/terminal/*`, so the tunnel returns
404 by default-deny. 404 by default-deny.
2. **Origin gate.** `/ws` upgrades require 2. **Origin gate.** `/ws` upgrades require
`Origin: chrome-extension://<id>`. A localhost web page cannot mount a `Origin: chrome-extension://<id>`. A localhost web page can't mount
cross-site WebSocket hijack against the shell because its Origin is a cross-site WebSocket hijack against the shell because its Origin
a regular `http(s)://...`. is a regular `http(s)://...`.
3. **Cookie auth.** `gstack_pty` is HttpOnly + SameSite=Strict, scoped to 3. **Session token auth.** Minted only by an authenticated
the local listener, minted only by an authenticated `/pty-session` `/pty-session` POST, scoped to one WS, auto-revoked on close.
POST. JS injected into a page can't read it; cross-site requests
can't send it.
Drop any of those three and the whole tab becomes unsafe. Drop any one of those three and the whole tab becomes unsafe.
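Guarantees 2 and 3 amount to two checks that run before the upgrade. The following is a hedged sketch: `gateUpgrade` is a hypothetical name, the 403 status for a bad Origin is an assumption (the source only shows the `forbidden origin` message and a 401 for an unknown token), and the real handler may check more than the scheme prefix.

```typescript
type GateResult =
  | { ok: true }
  | { ok: false; status: number; reason: string };

const EXTENSION_ORIGIN_PREFIX = "chrome-extension://";

function gateUpgrade(
  origin: string | null,
  token: string | null,
  validTokens: Set<string>,
): GateResult {
  // Origin gate: ordinary web pages send http(s):// origins and are refused.
  const originOk =
    origin !== null &&
    origin.startsWith(EXTENSION_ORIGIN_PREFIX) &&
    origin.length > EXTENSION_ORIGIN_PREFIX.length;
  if (!originOk) {
    return { ok: false, status: 403, reason: "forbidden origin" }; // 403 is assumed
  }
  // Session token gate: the token must have been granted via /pty-session.
  if (token === null || !validTokens.has(token)) {
    return { ok: false, status: 401, reason: "unauthorized" };
  }
  return { ok: true };
}
```

Both checks run before any PTY exists, so a failed gate never reaches the point where `claude` could be spawned.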
### Lifecycle ## Lifecycle
- **Lazy spawn**: claude is not started until the user types a key. Idle - **Eager auto-connect.** Sidebar opens → tryAutoConnect polls for the
sidebar opens cost nothing. bootstrap globals and connects as soon as they're set. No keypress
- **One PTY per WS**: closing the WebSocket SIGINTs claude, then SIGKILLs required.
after 3s. The `gstack_pty` cookie is also revoked so a stolen cookie - **One PTY per WS.** Closing the WebSocket SIGINTs claude, then SIGKILLs
can't be replayed against a new PTY. after 3s. The session token is revoked so a stolen token can't be
- **No auto-reconnect**: when the WS closes the user sees "Session ended, replayed.
click to start a new session." Auto-reconnect would burn a fresh - **No auto-reconnect on close.** The user sees "Session ended, click to
claude session every reload. v1.1 may add session resumption keyed on start a new session." Auto-reconnect would burn a fresh claude session
tab/session id (see TODOS). on every reload. v1.1 may add session resumption keyed on tab/session
id (see TODOS).
- **Manual restart anytime.** A `↻ Restart` button lives in the always-
visible terminal toolbar — works mid-session, not just from the ENDED
state.
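The eager auto-connect bullet describes polling until the bootstrap globals are set. The body of `tryAutoConnect` is not reproduced in this document, so the following is only an illustrative sketch of that polling shape: `pollUntilReady`, the 200 ms interval, the 10 s timeout, and the injectable scheduler are all assumptions.

```typescript
type Schedule = (fn: () => void, delayMs: number) => void;

function pollUntilReady(
  isReady: () => boolean,   // e.g. "are the server port and auth token set?"
  onReady: () => void,      // e.g. call connect()
  onTimeout: () => void,    // e.g. show a "cannot reach server" message
  intervalMs: number = 200,
  timeoutMs: number = 10_000,
  schedule: Schedule = (fn, ms) => { setTimeout(fn, ms); },
): void {
  let waited = 0;
  const tick = () => {
    if (isReady()) { onReady(); return; }
    if (waited >= timeoutMs) { onTimeout(); return; }
    waited += intervalMs;
    schedule(tick, intervalMs); // check again after one interval
  };
  tick(); // first check happens immediately, with no initial delay
}
```

Checking once before the first delay means a sidebar that opens after the bootstrap values are already populated connects without waiting an interval.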
### Files ## Quick-action toolbar
Three browser-action buttons live next to the Restart button at the top
of the Terminal pane:
| Button | Behavior |
|--------|----------|
| 🧹 Cleanup | `window.gstackInjectToTerminal(prompt)` — pipes a "remove ads/banners" instruction into the live PTY. claude in the terminal sees it and acts. |
| 📸 Screenshot | `POST /command screenshot` — direct browse-server call, no PTY involvement. |
| 🍪 Cookies | Navigates to the `/cookie-picker` page. |
The Inspector's "Send to Code" button uses the same `gstackInjectToTerminal`
path to forward CSS inspector data into claude.
## Debug surfaces (Activity / Refs / Inspector)
Behind the `debug` toggle in the footer. SSE-driven, independent of the
Terminal pane:
- **Activity** — streams every browse command via `/activity/stream` SSE.
- **Refs** — REST: `GET /refs` — current page's `@ref` element labels.
- **Inspector** — CDP-based element picker; SSE on `/inspector/events`.
When the debug strip closes, the Terminal pane becomes visible again.
xterm.js doesn't auto-redraw when its container flips from `display:none`
to `display:flex`, so sidepanel-terminal.js runs a `MutationObserver` on
`#tab-terminal`'s class attribute and forces a fit + refresh when
`.active` returns.
## Files
| Component | File | Runs in | | Component | File | Runs in |
|-----------|------|---------| |-----------|------|---------|
| Terminal UI | `extension/sidepanel-terminal.js` + xterm.js in `extension/lib/` | Chrome side panel | | Sidebar UI shell | `extension/sidepanel.html` + `sidepanel.js` + `sidepanel.css` | Chrome side panel |
| PTY agent | `browse/src/terminal-agent.ts` | Bun (non-compiled, can spawn) | | Terminal UI | `extension/sidepanel-terminal.js` + `extension/lib/xterm.js` | Chrome side panel |
| Cookie store | `browse/src/pty-session-cookie.ts` | Bun (compiled, in server.ts) | | Service worker | `extension/background.js` | Chrome background |
| Port file | `<stateDir>/terminal-port` | Filesystem | | Content script | `extension/content.js` | Page context |
| HTTP server | `browse/src/server.ts` | Bun (compiled binary) |
| PTY agent | `browse/src/terminal-agent.ts` | Bun (non-compiled) |
| PTY token store | `browse/src/pty-session-cookie.ts` | Bun (compiled, in server.ts) |
| CLI entry | `browse/src/cli.ts` | Bun (compiled binary) |
| State file | `<stateDir>/browse.json` | Filesystem |
| Terminal port | `<stateDir>/terminal-port` | Filesystem |
| Internal token | `<stateDir>/terminal-internal-token` | Filesystem | | Internal token | `<stateDir>/terminal-internal-token` | Filesystem |
| Claude probe | `<stateDir>/claude-available.json` | Filesystem | | Claude probe | `<stateDir>/claude-available.json` | Filesystem |
| Active tab | `<stateDir>/active-tab.json` | Filesystem (claude reads) | | Active tab | `<stateDir>/active-tab.json` | Filesystem (claude reads) |

@@ -38,6 +38,7 @@
mount: document.getElementById('terminal-mount'), mount: document.getElementById('terminal-mount'),
ended: document.getElementById('terminal-ended'), ended: document.getElementById('terminal-ended'),
restart: document.getElementById('terminal-restart'), restart: document.getElementById('terminal-restart'),
restartNow: document.getElementById('terminal-restart-now'),
}; };
/** State machine. */ /** State machine. */
@@ -109,10 +110,12 @@
} }
/** /**
* POST /pty-session to mint the HttpOnly cookie. Returns { terminalPort, * POST /pty-session to mint a fresh terminal session. Returns
* expiresAt } on success, or null with reason on failure. Note: we do * { terminalPort, ptySessionToken, expiresAt } on success, or
* NOT receive the cookie value; it lives in the browser's HttpOnly jar * { error } on failure. The token rides on the WebSocket
* and travels with the next same-origin request automatically. * Sec-WebSocket-Protocol header, which is the only auth header
* the browser WebSocket API lets us set. The token is NOT persisted —
* each sidebar load mints a fresh one and discards it on close.
*/ */
async function mintSession() { async function mintSession() {
const serverPort = getServerPort(); const serverPort = getServerPort();
@@ -183,6 +186,22 @@
}); });
} }
/**
* Inject a string into the live PTY (the same way a real keystroke would).
* Used by the toolbar's Cleanup button and the Inspector's "Send to Code"
* action so the user can drive claude from outside-the-keyboard surfaces.
* Returns true if the bytes went out, false if no live session.
*/
window.gstackInjectToTerminal = function (text) {
if (!text || !ws || ws.readyState !== WebSocket.OPEN) return false;
try {
ws.send(new TextEncoder().encode(text));
return true;
} catch {
return false;
}
};
async function connect() { async function connect() {
if (state !== STATE.IDLE) return; // already connecting/live if (state !== STATE.IDLE) return; // already connecting/live
setState(STATE.CONNECTING); setState(STATE.CONNECTING);
@@ -192,7 +211,11 @@
setState(STATE.IDLE, { message: `Cannot start: ${minted.error}` }); setState(STATE.IDLE, { message: `Cannot start: ${minted.error}` });
return; return;
} }
const { terminalPort } = minted; const { terminalPort, ptySessionToken } = minted;
if (!ptySessionToken) {
setState(STATE.IDLE, { message: 'Cannot start: no session token returned' });
return;
}
// Pre-flight: does claude even exist on PATH? // Pre-flight: does claude even exist on PATH?
const claudeStatus = await checkClaudeAvailable(terminalPort); const claudeStatus = await checkClaudeAvailable(terminalPort);
@@ -205,7 +228,12 @@
setState(STATE.LIVE); setState(STATE.LIVE);
fitAddon && fitAddon.fit(); fitAddon && fitAddon.fit();
ws = new WebSocket(`ws://127.0.0.1:${terminalPort}/ws`); // Token rides on Sec-WebSocket-Protocol — the only auth header the
// browser WebSocket API lets us set. Cross-port HttpOnly cookies with
// SameSite=Strict don't survive the jump from server.ts:34567 to the
// agent's random port from a chrome-extension origin, so cookies
// alone weren't reliable.
ws = new WebSocket(`ws://127.0.0.1:${terminalPort}/ws`, [`gstack-pty.${ptySessionToken}`]);
ws.binaryType = 'arraybuffer'; ws.binaryType = 'arraybuffer';
ws.addEventListener('open', () => { ws.addEventListener('open', () => {
@@ -256,66 +284,101 @@
// ─── Wiring ─────────────────────────────────────────────────── // ─── Wiring ───────────────────────────────────────────────────
function init() { /**
// First-keystroke trigger on the bootstrap card. * Force a fresh session: close any open WS, dispose xterm, return to
document.addEventListener('keydown', onAnyKey, { once: false, capture: true }); * IDLE, kick off auto-connect. Safe to call from any state.
*/
function forceRestart() {
try { ws && ws.close(); } catch {}
ws = null;
if (term) {
try { term.dispose(); } catch {}
term = null;
fitAddon = null;
}
setState(STATE.IDLE, { message: 'Starting Claude Code...' });
tryAutoConnect();
}
els.installRetry?.addEventListener('click', async () => { /**
// Re-probe and try connecting again. * Repaint xterm when the Terminal pane becomes visible. xterm.js has a
const minted = await mintSession(); * known issue where its renderer doesn't redraw after a display:none →
if (!minted.error) { * display:flex flip — the canvas/DOM stays blank until something forces
const claudeStatus = await checkClaudeAvailable(minted.terminalPort); * a layout pass. fit() recomputes dimensions, refresh() redraws.
if (claudeStatus.available) { */
setState(STATE.IDLE); function repaintIfLive() {
// Auto-trigger reconnect on next key if (state !== STATE.LIVE || !term) return;
} try { fitAddon && fitAddon.fit(); } catch {}
} try { term.refresh(0, term.rows - 1); } catch {}
}); try {
els.restart?.addEventListener('click', () => {
// Clean restart. Drop xterm state too — codex 1C: each session is fresh.
if (term) {
try { term.dispose(); } catch {}
term = null;
fitAddon = null;
}
setState(STATE.IDLE);
});
// Tab switching: tell the agent which browser tab is active so claude's
// active-tab.json stays in sync. sidepanel.js owns the active-tab state;
// we listen for its "tab activated" event.
document.addEventListener('gstack:active-tab-changed', (ev) => {
if (ws && ws.readyState === WebSocket.OPEN) { if (ws && ws.readyState === WebSocket.OPEN) {
try { ws.send(JSON.stringify({ type: 'resize', cols: term.cols, rows: term.rows }));
ws.send(JSON.stringify({
type: 'tabSwitch',
tabId: ev.detail?.tabId,
url: ev.detail?.url,
title: ev.detail?.title,
}));
} catch {}
} }
} catch {}
}
function init() {
setState(STATE.IDLE, { message: 'Starting Claude Code...' });
els.installRetry?.addEventListener('click', () => {
// Re-probe claude on PATH, then try a connect.
setState(STATE.IDLE, { message: 'Starting Claude Code...' });
tryAutoConnect();
}); });
// Initial state // Two restart buttons:
setState(STATE.IDLE); // - els.restart lives inside the ENDED state card (visible only after
// a session has ended).
// - els.restartNow lives in the always-visible toolbar (lets the user
// force a fresh claude mid-session without waiting for it to exit).
els.restart?.addEventListener('click', forceRestart);
els.restartNow?.addEventListener('click', forceRestart);
// Repaint after a debug-tab → primary-pane transition. The debug
// tabs (Activity / Refs / Inspector) hide the Terminal pane via
// .tab-content { display: none }; xterm doesn't auto-redraw when its
// container flips back to visible, so we observe #tab-terminal's class
// attribute and force a fit + refresh once the pane is .active again.
const observer = new MutationObserver(() => {
// Named `pane` so it does not shadow the module-level xterm `term`.
const pane = document.getElementById('tab-terminal');
if (pane?.classList.contains('active')) {
requestAnimationFrame(repaintIfLive);
}
});
const target = document.getElementById('tab-terminal');
if (target) observer.observe(target, { attributes: true, attributeFilter: ['class'] });
tryAutoConnect();
} }
function onAnyKey(ev) { /**
// Only trigger if Terminal pane is the active one and we're idle. * Eager-connect when the sidebar opens. Polls for sidepanel.js to populate
const terminalActive = document.getElementById('tab-terminal')?.classList.contains('active'); * window.gstackServerPort + window.gstackAuthToken (which it does as soon
if (!terminalActive) return; * as /health succeeds), then fires connect() automatically. The user
* doesn't have to press a key — Terminal is the default tab and "tap to
* start" was a needless paper cut on every reload.
*/
function tryAutoConnect() {
if (state !== STATE.IDLE) return; if (state !== STATE.IDLE) return;
// Ignore pure modifier keys. let waited = 0;
if (['Shift', 'Control', 'Alt', 'Meta', 'CapsLock'].includes(ev.key)) return; const tick = () => {
connect(); // If the state is no longer IDLE (already connecting or connected), drop out.
if (state !== STATE.IDLE) return;
if (getServerPort() && getAuthToken()) {
connect();
return;
}
waited += 200;
if (waited > 15000) {
setState(STATE.IDLE, { message: 'Browse server not ready. Reload sidebar to retry.' });
return;
}
setTimeout(tick, 200);
};
tick();
} }
// Wait for sidepanel.js to populate window.gstackServerPort + window.gstackAuthToken.
// sidepanel.js already polls /health and resolves the connection; we just need
// to wait for it. If those globals aren't available within 10s, surface a
// "browse server not ready" message — user can reload sidebar.
if (document.readyState === 'loading') { if (document.readyState === 'loading') {
document.addEventListener('DOMContentLoaded', init); document.addEventListener('DOMContentLoaded', init);
} else { } else {
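
Reviewer note: the tick() loop in tryAutoConnect() above is a poll-with-deadline pattern. A minimal standalone sketch of the same idea follows, with the 200 ms interval and 15000 ms cap taken from the diff. The helper name `pollUntilReady`, its option names, and the injectable `schedule` parameter are assumptions made for illustration and do not appear in this PR.

```javascript
// Hypothetical helper, not part of this diff: a generic form of the
// tick() loop in tryAutoConnect(). `schedule` defaults to setTimeout and
// is injectable so the loop can be driven without real timers.
function pollUntilReady({
  isReady,                      // () => boolean, e.g. port and token both set
  onReady,                      // called once when isReady() returns true
  onTimeout,                    // called once when the deadline passes
  shouldAbort = () => false,    // e.g. () => state !== STATE.IDLE
  intervalMs = 200,
  timeoutMs = 15000,
  schedule = setTimeout,
}) {
  let waited = 0;
  const tick = () => {
    if (shouldAbort()) return;            // state changed underneath us
    if (isReady()) { onReady(); return; }
    waited += intervalMs;
    if (waited > timeoutMs) { onTimeout(); return; }
    schedule(tick, intervalMs);           // try again after the interval
  };
  tick();
}
```

Under these assumptions, tryAutoConnect() would reduce to one call that passes `connect` as `onReady` and the "Browse server not ready" setState call as `onTimeout`. Whether that refactor is worth it is a judgment call for the author.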


@@ -675,36 +675,40 @@ body::after {
} }
.tab-content.active { display: flex; flex-direction: column; } .tab-content.active { display: flex; flex-direction: column; }
/* ─── Primary surface tabs (Terminal | Chat) ──────────────────── */
.primary-tabs {
display: flex;
border-bottom: 1px solid var(--border);
background: #0f0f0f;
padding: 0 8px;
flex-shrink: 0;
}
.primary-tab {
background: transparent;
border: none;
color: #71717a;
padding: 8px 14px;
font-size: 12px;
font-family: 'JetBrains Mono', monospace;
cursor: pointer;
border-bottom: 2px solid transparent;
margin-bottom: -1px;
}
.primary-tab:hover { color: #e5e5e5; }
.primary-tab.active {
color: #e5e5e5;
border-bottom-color: #f59e0b;
}
/* ─── Terminal Tab ────────────────────────────────────────────── */ /* ─── Terminal Tab ────────────────────────────────────────────── */
#tab-terminal { #tab-terminal {
background: #0a0a0a; background: #0a0a0a;
padding: 0; padding: 0;
} }
.terminal-toolbar {
display: flex;
align-items: center;
justify-content: space-between;
gap: 6px;
padding: 4px 8px;
border-bottom: 1px solid #1a1a1a;
background: #0a0a0a;
flex-shrink: 0;
}
.terminal-toolbar-actions {
display: flex;
gap: 4px;
flex-wrap: wrap;
}
.terminal-toolbar-btn {
background: transparent;
border: 1px solid #27272a;
color: #a1a1aa;
padding: 3px 10px;
font-size: 11px;
font-family: 'JetBrains Mono', monospace;
border-radius: 3px;
cursor: pointer;
}
.terminal-toolbar-btn:hover {
color: #f59e0b;
border-color: #f59e0b;
}
.terminal-bootstrap { .terminal-bootstrap {
flex: 1; flex: 1;
display: flex; display: flex;


@@ -25,57 +25,28 @@
</div> </div>
</div> </div>
<!-- Security event banner — fires on prompt injection detection.
Variant A from /plan-design-review 2026-04-19: centered alert-heavy,
big red error icon, mono layer scores in expandable details. -->
<div class="security-banner" id="security-banner" role="alert" aria-live="assertive" style="display:none">
<button class="security-banner-close" id="security-banner-close" aria-label="Dismiss">&times;</button>
<div class="security-banner-icon" aria-hidden="true">
<svg width="28" height="28" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
<circle cx="12" cy="12" r="10"></circle>
<line x1="12" y1="8" x2="12" y2="12"></line>
<line x1="12" y1="16" x2="12.01" y2="16"></line>
</svg>
</div>
<div class="security-banner-title" id="security-banner-title">Session terminated</div>
<div class="security-banner-subtitle" id="security-banner-subtitle">prompt injection detected</div>
<button class="security-banner-expand" id="security-banner-expand" aria-expanded="false" aria-controls="security-banner-details">
<span>What happened</span>
<svg class="security-banner-chevron" width="12" height="12" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
<polyline points="6 9 12 15 18 9"></polyline>
</svg>
</button>
<div class="security-banner-details" id="security-banner-details" hidden>
<div class="security-banner-section-label">SECURITY LAYERS</div>
<div class="security-banner-layers" id="security-banner-layers"></div>
<div class="security-banner-section-label" id="security-banner-suspect-label" hidden>SUSPECTED TEXT</div>
<pre class="security-banner-suspect" id="security-banner-suspect" hidden></pre>
</div>
<div class="security-banner-actions" id="security-banner-actions" hidden>
<button type="button" class="security-banner-btn security-banner-btn-block" id="security-banner-btn-block">Block session</button>
<button type="button" class="security-banner-btn security-banner-btn-allow" id="security-banner-btn-allow">Allow and continue</button>
</div>
</div>
<!-- Browser tab bar --> <!-- Browser tab bar -->
<div class="browser-tabs" id="browser-tabs" style="display:none"></div> <div class="browser-tabs" id="browser-tabs" style="display:none"></div>
<!-- Primary surface tabs: Terminal (default) | Chat. Activity / Refs / <!-- Terminal pane is now the sole primary surface. Activity / Refs /
Inspector still exist as a separate debug-tabs strip below. The Inspector still exist behind the `debug` toggle in the footer. -->
Terminal tab is default-active per /plan-eng-review Issue 1B
(subsequently informed by codex's spawn-waste finding: PTY only
spawns when the user types, so default-active is cheap). -->
<nav class="primary-tabs" id="primary-tabs" role="tablist">
<button class="primary-tab active" role="tab" data-pane="terminal" aria-selected="true">Terminal</button>
<button class="primary-tab" role="tab" data-pane="chat" aria-selected="false">Chat</button>
</nav>
<!-- Terminal Tab (default-active) -->
<main id="tab-terminal" class="tab-content active" role="tabpanel" aria-label="Terminal"> <main id="tab-terminal" class="tab-content active" role="tabpanel" aria-label="Terminal">
<!-- Toolbar with browser quick-actions on the left, Restart on the right.
Restart is always visible so the user can force a fresh claude any
time, not just from the ENDED state. -->
<div class="terminal-toolbar" id="terminal-toolbar">
<div class="terminal-toolbar-actions">
<button id="chat-cleanup-btn" class="terminal-toolbar-btn" title="Remove ads, banners, popups">🧹 Cleanup</button>
<button id="chat-screenshot-btn" class="terminal-toolbar-btn" title="Take a screenshot">📸 Screenshot</button>
<button id="chat-cookies-btn" class="terminal-toolbar-btn" title="Import cookies from your browser">🍪 Cookies</button>
</div>
<button class="terminal-toolbar-btn" id="terminal-restart-now" title="Restart Claude Code session">↻ Restart</button>
</div>
<div class="terminal-bootstrap" id="terminal-bootstrap"> <div class="terminal-bootstrap" id="terminal-bootstrap">
<div class="terminal-bootstrap-icon"></div> <div class="terminal-bootstrap-icon"></div>
<p id="terminal-bootstrap-status">Press any key to start Claude Code.</p> <p id="terminal-bootstrap-status">Starting Claude Code...</p>
<p class="muted" id="terminal-bootstrap-hint">Real PTY. Real terminal. Real claude.</p> <p class="muted" id="terminal-bootstrap-hint">Real PTY. Real terminal. Real claude.</p>
<pre id="loading-debug" class="muted" style="font-size:11px; font-family:'JetBrains Mono',monospace; white-space:pre-wrap; margin-top:8px; color:#71717A;"></pre>
</div> </div>
<div class="terminal-install-card" id="terminal-install-card" style="display:none"> <div class="terminal-install-card" id="terminal-install-card" style="display:none">
<p><strong>Claude Code not found</strong></p> <p><strong>Claude Code not found</strong></p>
@@ -89,22 +60,6 @@
</div> </div>
</main> </main>
<!-- Chat Tab (the existing claude -p one-shot chat path; preserved verbatim) -->
<main id="tab-chat" class="tab-content" role="tabpanel" aria-label="Chat">
<div class="chat-messages" id="chat-messages">
<div class="chat-loading" id="chat-loading">
<div class="chat-loading-spinner"></div>
<p id="loading-status">Looking for browse server...</p>
<pre id="loading-debug" class="muted" style="font-size:11px; font-family:'JetBrains Mono',monospace; white-space:pre-wrap; margin-top:8px; color:#71717A;"></pre>
</div>
<div class="chat-welcome" id="chat-welcome" style="display:none">
<div class="chat-welcome-icon">G</div>
<p>Send a message to Claude Code.</p>
<p class="muted">Your agent will see it and act on it.</p>
</div>
</div>
</main>
<!-- Debug: Activity Tab (hidden by default) --> <!-- Debug: Activity Tab (hidden by default) -->
<main id="tab-activity" class="tab-content" role="log" aria-live="polite"> <main id="tab-activity" class="tab-content" role="log" aria-live="polite">
<div class="empty-state" id="empty-state"> <div class="empty-state" id="empty-state">
@@ -204,30 +159,10 @@
</div> </div>
</main> </main>
<!-- Experimental chat banner (shown when chatEnabled) -->
<div id="experimental-banner" class="experimental-banner" style="display: none;">
Browser co-pilot &mdash; controls this browser, reports back to your workspace
</div>
<!-- Quick Actions Toolbar -->
<div class="quick-actions" id="quick-actions">
<button id="chat-cleanup-btn" class="quick-action-btn" title="Remove ads, banners, popups">🧹 Cleanup</button>
<button id="chat-screenshot-btn" class="quick-action-btn" title="Take a screenshot">📸 Screenshot</button>
<button id="chat-cookies-btn" class="quick-action-btn" title="Import cookies from your browser">🍪 Cookies</button>
</div>
<!-- Command Bar -->
<div class="command-bar">
<button class="stop-btn" id="stop-agent-btn" title="Stop agent" style="display: none;">&#x25A0;</button>
<input type="text" class="command-input" id="command-input" placeholder="Ask about this page..." autocomplete="off" spellcheck="false">
<button class="send-btn" id="send-btn" title="Send">&#x2191;</button>
</div>
<!-- Footer with connection + debug toggle --> <!-- Footer with connection + debug toggle -->
<footer> <footer>
<div class="footer-left"> <div class="footer-left">
<button class="debug-toggle" id="debug-toggle" title="Toggle debug panels">debug</button> <button class="debug-toggle" id="debug-toggle" title="Toggle debug panels">debug</button>
<button class="footer-btn" id="clear-chat" title="Clear chat">clear</button>
<button class="footer-btn" id="reload-sidebar" title="Reload sidebar">reload</button> <button class="footer-btn" id="reload-sidebar" title="Reload sidebar">reload</button>
</div> </div>
<div class="footer-right"> <div class="footer-right">

File diff suppressed because it is too large