feat(scrape): /scrape <intent> skill template

One entry point for pulling page data. Three paths under the hood: 1. Match — agent reads $B skill list, semantically matches the user's intent against each skill's triggers + description + host. Confident match = $B skill run <name> in ~200ms. 2. Prototype — no match, drive the page with $B goto/text/html/links etc. Return JSON, append a one-line "say /skillify" nudge. 3. Mutating refusal — verbs like submit/click/fill route to /automate (Phase 2b P0); /scrape is read-only by contract. Match decision lives in the agent, not the daemon. No new code in browse/src/, no expanded daemon command surface, no new prompt-injection blast radius. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 02:42:29 +08:00 · 2026-04-27 18:33:36 -07:00
parent 0b723c437f
commit 5ae696b6fa
2 changed files with 984 additions and 0 deletions
--- a/scrape/SKILL.md.tmpl
+++ b/scrape/SKILL.md.tmpl
@@ -0,0 +1,152 @@
+---
+name: scrape
+version: 1.0.0
+description: |
+  Pull data from a web page. First call on a new intent prototypes the flow
+  via $B primitives and returns JSON. Subsequent calls on a matching intent
+  route to a codified browser-skill and return in ~200ms. Read-only — for
+  mutating flows (form fills, clicks, submissions), use /automate.
+  Use when asked to "scrape", "get data from", "pull", "extract from", or
+  "what's on" a page. (gstack)
+allowed-tools:
+  - Bash
+  - Read
+  - AskUserQuestion
+triggers:
+  - scrape this page
+  - get data from
+  - pull from
+  - extract from
+  - what is on
+---
+
+{{PREAMBLE}}
+
+# /scrape — pull data from a page
+
+One entry point for getting data off the web. Two paths under the hood:
+
+1. **Match path** (~200ms) — if the user's intent matches an existing
+   browser-skill's triggers, run it via `$B skill run <name>` and emit
+   the JSON.
+2. **Prototype path** (~30s) — no matching skill yet, so drive the page
+   with `$B` primitives, return the JSON, and suggest `/skillify` so the
+   next call lands on the match path.
+
+Read-only by contract. If the intent implies writing (submitting forms,
+clicking buttons that mutate state), refuse and route to `/automate`.
+
+## Step 1 — Determine intent
+
+The user's request after `/scrape` is the intent. If they did not include
+one, ask once:
+
+> "What do you want to scrape? Describe it in one line, e.g. 'top stories
+> on Hacker News' or 'product names + prices on example.com/products'."
+
+Do not ask multiple clarifying questions up front. Any further questions
+go in the prototype path where they're cheaper.
+
+## Step 2 — Refuse mutating intents
+
+If the intent implies writes — verbs like *submit*, *post*, *send*, *log
+in*, *click X*, *fill the form*, *delete*, *create*, *order*, *book* —
+respond:
+
+> "/scrape is read-only. For mutating flows, use /automate (browser-skills
+> Phase 2 P0 in TODOS.md — not yet shipped). Until then, use $B click /
+> $B fill / $B type directly."
+
+Stop. Do not enter the match or prototype path.
+
+## Step 3 — Match phase
+
+List existing browser-skills:
+
+```bash
+$B skill list
+```
+
+For each skill, `$B skill show <name>` exposes the full SKILL.md including
+`triggers:`, `description:`, and `host:`. Read these and judge whether the
+user's intent semantically matches one of them.
+
+A confident match means **all three** are true:
+
+- The intent's domain matches the skill's `host` (or one of its hostnames)
+- A `triggers:` phrase or the `description:` covers the same data the
+  intent asks for
+- The intent does not require args the skill does not declare in `args:`
+
+If matched, parse any `--arg key=value` from the intent (or pass none for
+zero-arg skills) and run:
+
+```bash
+$B skill run <name> [--arg key=value ...]
+```
+
+Emit the JSON the skill prints to stdout. Stop.
+
+If matching is ambiguous (two skills could plausibly fit), pick the
+narrower-tier one (project > global > bundled — `$B skill list` shows the
+tier). If still ambiguous, fall through to the prototype path rather than
+guess wrong.
+
+## Step 4 — Prototype phase
+
+No match. Drive the page using `$B` primitives:
+
+1. `$B goto <url>` — navigate to the target. The user's intent usually
+   names a host or a URL; use it directly.
+2. `$B snapshot --text` (or `$B text`) — get a clean text view of the
+   page to find selectors.
+3. `$B html` — pull the raw HTML when you need to parse structured data
+   (lists, tables, repeated rows).
+4. `$B links` — when the intent is to gather URLs.
+5. Iterate: try a selector, check the output, refine.
+
+Emit the result as JSON on stdout (one document, not pretty-printed).
+Use a stable shape — typically `{ "items": [...], "count": N }` or
+similar — so downstream consumers can treat it as data.
+
+## Step 5 — Skillify nudge
+
+After a successful prototype, append exactly one line:
+
+> "Say /skillify to make this a permanent skill (200ms on next call)."
+
+That is the entire nudge. Do not nag, do not list pros, do not push.
+Proactive surfacing is a Phase 3 knob (`gstack-config browser_skillify_prompts`),
+not this skill's job.
+
+## When the prototype fails
+
+If the page loads but data extraction does not yield a sensible JSON shape
+after 3-4 selector attempts:
+
+- Report what you tried, what came back, and what's blocking (lazy-loaded,
+  JS-rendered, paywalled, etc.).
+- Do NOT write a partial result and call it done.
+- Do NOT suggest /skillify on a broken prototype.
+- Ask the user whether they want to (a) try a different selector, (b)
+  switch to a different page, or (c) stop.
+
+## What this skill does NOT do
+
+- Mutating actions (use /automate when shipped, or $B primitives directly)
+- Auth flows / cookie import (use /setup-browser-cookies first)
+- Multi-page crawls (this is one-shot per call)
+- Anything that requires the daemon to not be running
+
+## Output discipline
+
+The match path returns whatever JSON the matched skill emits. The
+prototype path returns whatever JSON you construct. In both cases:
+
+- One JSON document, on stdout.
+- Stderr (or chat) is for logs and the skillify nudge.
+- Do not embed prose around the JSON in the chat reply unless the user
+  asked for an explanation — many `/scrape` callers pipe the output to
+  `jq`.
+
+{{LEARNINGS_LOG}}