Two fixes for E2E test reliability:
1. session-runner.ts: error_max_turns was misclassified as error_api
because is_error flag was checked before subtype. Now known subtypes
like error_max_turns are preserved even when is_error is set. The
is_error override only applies when subtype=success (API failure).
2. worktree.ts: pruneStale() now skips worktrees < 1 hour old to avoid
deleting worktrees from concurrent test runs still in progress.
Previously any second test execution would kill the first's worktrees.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: pair-agent was added without completing the gen-skill-docs
compliance checklist. All 16 failures traced back to this.
Fixes:
- Sync package.json version to VERSION (0.15.9.0)
- Add "(gstack)" to pair-agent description for discoverability
- Add pair-agent to Codex path exception (legitimately documents ~/.codex/)
- Add CLI_COMMANDS (status, pair-agent, tunnel) to skill parser allowlist
- Regenerate SKILL.md for all hosts (claude, codex, factory, kiro, etc.)
- Update golden file baselines for ship skill
- Fix relink tests: pass GSTACK_INSTALL_DIR to auto-relink calls so they
use the fast mock install instead of scanning real ~/.claude/skills/gstack
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: review army idempotency + cross-review dedup resolver
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: ship re-run executes all checks, adds review army + dedup
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: regression guards for ship specialist dispatch + dedup + idempotency
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.15.10.0)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instructs remote agents to treat content inside untrusted envelopes
as potentially malicious. Lists common injection phrases to watch for.
Directs agents to only use @refs from the trusted INTERACTIVE ELEMENTS
section, not from page content.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Scoped tokens get a split snapshot: trusted @refs section (for click/fill)
separated from untrusted web content in an envelope. Ref names truncated
to 50 chars in trusted section. Root tokens unchanged (backward compat).
Resume command also uses split format for scoped tokens.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Detects CSS-hidden elements (opacity, font-size, off-screen, same-color,
clip-path) and ARIA label injection patterns. Marks elements with
data-gstack-hidden, extracts text from a clean clone (no DOM mutation),
then removes markers. Only active for scoped tokens on text command.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Chain subcommands now route through handleCommandInternal for full security
enforcement (scope, domain, tab ownership, rate limiting, content wrapping).
Adds recursion guard for nested chains, rate-limit exemption for chain
subcommands, and activity event suppression (1 event per chain, not per sub).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add 4 native OpenClaw skills for ClaHub publishing
Hand-crafted methodology skills for the OpenClaw wintermute workspace:
- gstack-openclaw-office-hours (375 lines) — 6 forcing questions, startup + builder modes
- gstack-openclaw-ceo-review (193 lines) — 4 scope modes, 18 cognitive patterns
- gstack-openclaw-investigate (136 lines) — Iron Law, 4-phase debugging
- gstack-openclaw-retro (301 lines) — git analytics, per-person praise/growth
Pure methodology, no gstack infrastructure. All frontmatter uses single-line
inline JSON for OpenClaw parser compatibility.
* feat: add AGENTS.md dispatch section with behavioral rules
Ready-to-paste section for OpenClaw AGENTS.md with 3 iron-clad rules:
1. Always spawn sessions, never redirect user to Claude Code
2. Resolve repo path or ask, don't punt
3. Autoplan runs end-to-end, reports back in chat
Includes full dispatch routing (Simple/Medium/Heavy/Full/Plan tiers).
* chore: clear OpenClaw includeSkills — native skills replace generated
Native ClaHub skills replace the gen-skill-docs pipeline output for
these 4 skills. Updated test to validate empty includeSkills array.
* docs: ClaHub install instructions + dispatch routing rules
- README: add Native OpenClaw Skills section with clawhub install command
- OPENCLAW.md: update dispatch routing with behavioral rules, update
native skills section to reference ClaHub
* chore: bump version and changelog (v0.15.10.0)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: add gstack-upgrade to OpenClaw dispatch routing
Ensures "upgrade gstack" routes to a Claude Code session with
/gstack-upgrade instead of Wintermute trying to handle it conversationally.
* fix: stop tracking 58MB compiled binary bin/gstack-global-discover
Already in .gitignore but was tracked due to historical mistake.
Same issue as browse/dist/ and design/dist/. The .ts source is right
next to it and ./setup builds from source for every platform.
* test: detect compiled binaries and large files tracked by git
Two new tests in skill-validation:
- No Mach-O or ELF binaries tracked (catches accidental git add of compiled output)
- No files over 2MB tracked (catches bloated binaries sneaking in)
Both print the exact git rm --cached command to fix the issue.
* fix: ClaHub → ClawHub (correct spelling)
* docs: add ClawHub publishing instructions to CLAUDE.md
Documents the clawhub publish command (not clawhub skill publish),
auth flow, version bumping, and verification.
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
OpenClaw section promoted to peer of Claude Code install. Single copy-paste
prompt that installs gstack for Claude Code AND wires up AGENTS.md dispatch.
Usage table shows what happens when you talk naturally. Other agents collapsed
from repeated git clones into one auto-detect command + table. Voice input
moved after 10-15 parallel sprints section.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add includeSkills to HostConfig + update OpenClaw config
Add includeSkills allowlist field with union logic (include minus skip).
Update OpenClaw to generate only 4 native methodology skills (office-hours,
plan-ceo-review, investigate, retro). Remove staticFiles.SOUL.md reference
(pointed to non-existent file).
* feat: OpenClaw integration — gstack-lite/full generation + spawned session detection
Add includeSkills filter to gen-skill-docs pipeline. Generate gstack-lite
(planning discipline for spawned coding sessions) and gstack-full (complete
feature pipeline) for OpenClaw host. Add OPENCLAW_SESSION env var detection
in preamble for spawned session auto-detect. Update setup --host openclaw
to print redirect message.
* docs: OpenClaw architecture doc + regenerate all SKILL.md with spawned session detection
Add docs/OPENCLAW.md with 4-tier dispatch routing and integration architecture.
Generate gstack-lite and gstack-full prompt templates. Regenerate all SKILL.md
files with OPENCLAW_SESSION env var check in preamble.
* test: update golden baselines + OpenClaw includeSkills tests
Update golden SKILL.md baselines for preamble SPAWNED_SESSION change.
Replace staticFiles SOUL.md test with includeSkills validation.
* chore: bump version and changelog (v0.15.9.0)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: remove all Wintermute references from source files
Replace with generic "orchestrator" or "OpenClaw" as appropriate.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add Plan dispatch tier — full review gauntlet for Claude Code project planning
New gstack-plan template chains /office-hours → /autoplan (CEO + eng + design + DX
+ codex adversarial), saves the reviewed plan, and reports back to the orchestrator.
The orchestrator persists the plan link to its own memory store. 5 tiers now:
Simple, Medium, Heavy, Full, Plan.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: friendly OpenAI org error on all design commands
Previously only generate.ts showed a user-friendly message when the
OpenAI org wasn't verified. Now evolve, iterate, variants, and check
all detect the 403 + "organization must be verified" pattern and show
a clear message with the correct verification URL.
* test: regression test for >128KB Codex session_meta
Documents the current 128KB buffer limitation. When Codex embeds
session_meta beyond 128KB, this test will fail, signaling the need
for a streaming parse or larger buffer.
* chore: bump version and changelog (v0.15.8.1)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Each skill gets a real narrative paragraph explaining the workflow,
not just a table cell. design-shotgun: visual exploration with taste
memory. design-html: production HTML with Pretext computed layout.
pair-agent: cross-vendor AI agent coordination through shared browser.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Lead with what it does for the user: type /pair-agent, paste into
your other agent, done. First time AI agents from different companies
can coordinate through a shared browser with real security boundaries.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. Chain command now pre-validates ALL subcommand scopes before
executing any. A read+meta token can no longer escalate to
admin via chain (eval, js, cookies were dispatched without
scope checks). tokenInfo flows through handleMetaCommand into
the chain handler. Rejects entire chain if any subcommand fails.
2. /health strips sensitive fields (currentUrl, agent.currentMessage,
session) when tunnel is active. Only operational metadata (status,
mode, uptime, tabs) exposed to the internet. Previously anyone
reaching the ngrok URL could surveil browsing activity.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When pair-agent detects headless mode, it auto-switches to headed
(visible Chromium window) so the user can watch what the remote
agent does. Use --headless to skip this. Fixed compiled binary
path resolution (process.execPath, not process.argv[1] which is
virtual /$bunfs/ in Bun compiled binaries).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The blanket validateAuth() gate (root-only) sat above the /command
endpoint, rejecting all scoped tokens with 401 before they reached
getTokenInfo(). Moved /command above the gate so both root and
scoped tokens are accepted. This was the bug Wintermute hit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Added CRITICAL instruction: the agent MUST output the full instruction
block so the user can copy it. Previously the agent could summarize
over it, leaving the user with nothing to paste.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
pair-agent now auto-starts the ngrok tunnel without restarting the
server. New POST /tunnel/start endpoint reads authtoken from env,
~/.gstack/ngrok.env, or ngrok's native config. CLI detects ngrok
availability and calls the endpoint automatically. Zero manual steps
when ngrok is installed and authed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The pair-agent command now checks ngrok's native config (not just
~/.gstack/ngrok.env) and auto-starts the tunnel when ngrok is
available. The skill template walks users through ngrok install
and auth if not set up, instead of just printing a dead localhost
URL.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The paste-into-agent instruction block now teaches the snapshot→@ref
workflow (the most powerful browsing pattern), shows the server URL
prominently, and uses clearer formatting. Tests updated to match.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Full API reference, snapshot→@ref pattern, scopes, tab isolation,
error codes, ngrok setup, and same-machine shortcuts. The instruction
block points here for deeper reading.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Users remember /pair-agent, not $B pair-agent. The skill walks through
agent selection (OpenClaw, Hermes, Codex, Cursor, generic), local vs
remote setup, tunnel configuration, and includes platform-specific
notes for each agent type. Wraps the CLI command with context.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
14 tests covering tab ownership lifecycle (access checks, unowned
tabs, transferTab) and instruction block generator (scopes, URLs,
admin flag, troubleshooting section). Fix server-auth test that
used fragile sliceBetween boundaries broken by new endpoints.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
One command to pair a remote agent: $B pair-agent. Creates a setup
key via POST /pair, prints a copy-pasteable instruction block with
curl commands. Smart tunnel fallback (tunnel URL > auto-start >
localhost). Flags: --for HOST, --local HOST, --admin, --client NAME.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Server-side tab ownership check blocks scoped agents from writing to
unowned tabs. Special-case newtab records ownership for scoped tokens.
POST /pair endpoint creates setup keys for the pairing ceremony.
Activity events now include clientId for attribution.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add per-tab ownership tracking to BrowserManager. Scoped agents
must create their own tab via newtab before writing. Unowned tabs
(pre-existing, user-opened) are root-only for writes. Read access
always allowed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add test_stub optional field to specialist finding schema
All specialist prompts now document test_stub as an optional output field,
enabling specialists to suggest test code alongside findings.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: adaptive gating + test framework detection for review army
Adds gstack-specialist-stats binary for tracking specialist hit rates.
Resolver now detects test framework for test_stub generation, applies
adaptive gating to skip silent specialists, and compiles per-specialist
stats for the review-log entry.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: cross-review finding dedup + test stub override + enriched review-log
Step 5.0 suppresses findings previously skipped by the user when the
relevant code hasn't changed. Test stub findings force ASK classification
so users approve test creation. Review-log now includes quality_score,
per-specialist stats, and per-finding action records.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.15.2.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: bash operator precedence in test framework detection
[ -f a ] || [ -f b ] && X="y" evaluates as A || (B && C), so the
assignment only runs when the second test passes. Wrap the OR group
in braces: { [ -f a ] || [ -f b ]; } && X="y".
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: DNS rebinding protection checks AAAA (IPv6) records too
Cherry-pick PR #744 by @Gonzih. Closes the IPv6-only DNS rebinding gap
by checking both A and AAAA records independently.
Co-Authored-By: Gonzih <gonzih@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: validateOutputPath symlink bypass — resolve real path before safe-dir check
Cherry-pick PR #745 by @Gonzih. Adds a second pass using fs.realpathSync()
to resolve symlinks after lexical path validation.
Co-Authored-By: Gonzih <gonzih@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: validate saved URLs before navigation in restoreState
Cherry-pick PR #751 by @Gonzih. Prevents navigation to cloud metadata
endpoints or file:// URIs embedded in user-writable state files.
Co-Authored-By: Gonzih <gonzih@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: telemetry-ingest uses anon key instead of service role key
Cherry-pick PR #750 by @Gonzih. The service role key bypasses RLS and
grants unrestricted database access — anon key + RLS is the right model
for a public telemetry endpoint.
Co-Authored-By: Gonzih <gonzih@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: killAgent() actually kills the sidebar claude subprocess
Cherry-pick PR #743 by @Gonzih. Implements cross-process kill signaling
via kill-file + polling pattern, tracks active processes per-tab.
Co-Authored-By: Gonzih <gonzih@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(design): bind server to localhost and validate reload paths
Cherry-pick PR #803 by @garagon. Adds hostname: '127.0.0.1' to Bun.serve()
and validates /api/reload paths are within cwd() or tmpdir(). Closes C1+C2
from security audit #783.
Co-Authored-By: garagon <garagon@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: add auth gate to /inspector/events SSE endpoint (C3)
The /inspector/events endpoint had no authentication, unlike /activity/stream
which validates tokens. Now requires the same Bearer header or ?token= query
param check. Closes C3 from security audit #783.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: sanitize design feedback with trust boundary markers (C4+H5)
Wrap user feedback in <user-feedback> XML markers with tag escaping to
prevent prompt injection via malicious feedback text. Cap accumulated
feedback to last 5 iterations to limit incremental poisoning.
Closes C4 and H5 from security audit #783.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: harden file/directory permissions to owner-only (C5+H9+M9+M10)
Add mode 0o700 to all mkdirSync calls for state/session directories.
Add mode 0o600 to all writeFileSync calls for session.json, chat.jsonl,
and log files. Add umask 077 to setup script. Prevents auth tokens, chat
history, and browser logs from being world-readable on multi-user systems.
Closes C5, H9, M9, M10 from security audit #783.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: TOCTOU race in setup symlink creation (C6)
Remove the existence check before mkdir -p (it's idempotent) and validate
the target isn't already a symlink before creating the link. Prevents a
local attacker from racing between the check and mkdir to redirect
SKILL.md writes. Closes C6 from security audit #783.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: remove CORS wildcard, restrict to localhost (H1)
Replace Access-Control-Allow-Origin: * with http://127.0.0.1 on sidebar
tab/chat endpoints. The Chrome extension uses manifest host_permissions
to bypass CORS entirely, so this only blocks malicious websites from
making cross-origin requests. Closes H1 from security audit #783.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: make cookie picker auth mandatory (H2)
Remove the conditional if(authToken) guard that skipped auth when
authToken was undefined. Now all cookie picker data/action routes
reject unauthenticated requests. Closes H2 from security audit #783.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: gate /health token on chrome-extension Origin header
Only return the auth token in /health response when the request Origin
starts with chrome-extension://. The Chrome extension always sends this
origin via manifest host_permissions. Regular HTTP requests (including
tunneled ones from ngrok/SSH) won't get the token. The extension also
has a fallback path through background.js that reads the token from the
state file directly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: update server-auth test for chrome-extension Origin gating
The test previously checked for 'localhost-only' comment. Now checks for
'chrome-extension://' since the token is gated on Origin header.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.15.7.0)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Gonzih <gonzih@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: garagon <garagon@users.noreply.github.com>
* feat: anti-skip rule for all review skills
Review skills sometimes skip sections when reviewing strategy or spec
plans. This adds an explicit anti-skip rule to CEO (1-11), eng (1-4),
design (1-7), and DX (1-8) review skills. Also fixes CEO header from
"10 sections" to "11 sections" to match actual count.
* chore: bump version and changelog (v0.15.6.1)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: self-healing gstack-relink after setup to prevent skill prefix drift
Setup now runs gstack-relink as a final consistency check after linking
Claude skills. This independently reads skill_prefix from config and
ensures name: fields and directory names match, catching cases where
interrupted setups or stale state left skills incorrectly prefixed.
* chore: bump version and changelog (v0.15.6.1)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
BROWSE_TUNNEL=1 env var starts an ngrok tunnel after Bun.serve().
Reads NGROK_AUTHTOKEN from env or ~/.gstack/ngrok.env. Reads
NGROK_DOMAIN for dedicated domain (stable URL). Updates state
file with tunnel URL. Feasibility spike confirmed: SDK works in
compiled Bun binary.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Server changes for multi-agent browser access:
- /connect endpoint: setup key exchange for /pair-agent ceremony
- /token endpoint: root-only minting of scoped sub-tokens
- /token/:clientId DELETE: revoke agent tokens
- /agents endpoint: list connected agents (root-only)
- /health: strips root token when tunnel is active (P0 security fix)
- /command: scope/rate/domain checks via token registry before dispatch
- Idle timer skips shutdown when tunnel is active
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add golden-file baselines for host config refactor
Snapshot generated SKILL.md output for ship skill across all 3 existing
hosts (Claude, Codex, Factory). These baselines verify the config-driven
refactor produces identical output to the current hardcoded system.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add HostConfig interface and validator for declarative host system
New scripts/host-config.ts defines the typed HostConfig interface that
captures all per-host variation: paths, frontmatter rules, path/tool
rewrites, suppressed resolvers, runtime root symlinks, install strategy,
and behavioral config (co-author trailer, learnings mode, boundary
instruction). Includes validateHostConfig() and validateAllConfigs() with
regex-based security validation and cross-config uniqueness checks.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add typed host configs for Claude, Codex, Factory, and Kiro
Extract all hardcoded host-specific values from gen-skill-docs.ts,
types.ts, preamble.ts, review.ts, and setup into typed HostConfig
objects. Each host is a single file in hosts/ with its paths, frontmatter
rules, path/tool rewrites, runtime root manifest, and install behavior.
hosts/index.ts exports all configs, derives the Host type, and provides
resolveHostArg() for CLI alias handling (e.g., 'agents' -> 'codex',
'droid' -> 'factory').
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: derive Host type and HOST_PATHS from host configs
types.ts no longer hardcodes host names or paths. The Host type is
derived from ALL_HOST_CONFIGS in hosts/index.ts, and HOST_PATHS is
built dynamically from each config's globalRoot/localSkillRoot/usesEnvVars.
Adding a new host to hosts/index.ts automatically extends the type system.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: gen-skill-docs.ts consumes typed host configs
Replace hardcoded EXTERNAL_HOST_CONFIG, transformFrontmatter host
branches, path/tool rewrite if-chains, and ALL_HOSTS array with
config-driven lookups from hosts/*.ts.
- Host detection uses resolveHostArg() (handles aliases like agents/droid)
- transformFrontmatter uses config's allowlist/denylist mode, extraFields,
conditionalFields, renameFields, and descriptionLimitBehavior
- Path rewrites use config's pathRewrites array (replaceAll, order matters)
- Tool rewrites use config's toolRewrites object
- Skill skipping uses config's generation.skipSkills
- ALL_HOSTS derived from ALL_HOST_NAMES
- Token budget display regex derived from host configs
Golden-file comparison: all 3 hosts produce IDENTICAL output to baselines.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: preamble, co-author trailer, and resolver suppression use host configs
- preamble.ts: hostConfigDir derived from config.globalRoot instead of
hardcoded Record
- utility.ts: generateCoAuthorTrailer reads from config.coAuthorTrailer
instead of host switch statement
- gen-skill-docs.ts: suppressedResolvers from config skip resolver
execution at placeholder replacement time (belt+suspenders with
existing ctx.host checks in individual resolvers)
Golden-file comparison: all 3 hosts produce IDENTICAL output to baselines.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: setup tooling uses config-driven host detection
- host-config-export.ts: new CLI that exposes host configs to bash
(list, get, detect, validate, symlinks commands)
- bin/gstack-platform-detect: reads host configs instead of hardcoded
binary/path mapping
- scripts/skill-check.ts: iterates host configs for skill validation
and freshness checks instead of separate Codex/Factory blocks
- lib/worktree.ts: iterates host configs for directory copy instead
of hardcoded .agents
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add OpenCode, Slate, and Cursor host configs
Three new hosts added to the declarative config system. Each is a typed
HostConfig object with paths, frontmatter rules, and path rewrites.
All generate valid SKILL.md output with zero .claude/skills path leakage.
- hosts/opencode.ts: OpenCode (opencode.ai), skills at ~/.config/opencode/
- hosts/slate.ts: Slate (Random Labs), skills at ~/.slate/
- hosts/cursor.ts: Cursor, skills at ~/.cursor/
- .gitignore: add .kiro/, .opencode/, .slate/, .cursor/, .openclaw/
Zero code changes needed — just config files + re-export in index.ts.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add OpenClaw host config with adapter for tool mapping
OpenClaw gets a hybrid approach: typed config for paths/frontmatter/
detection + a post-processing adapter for semantic tool rewrites.
Config handles: path rewrites, frontmatter (name+description+version),
CLAUDE.md→AGENTS.md, tool name rewrites (Bash→exec, Read→read, etc.),
suppressed resolvers, SOUL.md via staticFiles.
Adapter handles: AskUserQuestion→prose, Agent→sessions_spawn, $B→exec $B.
Zero .claude/skills path leakage. Zero hardcoded tool references remaining.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: contributor add-host skill + fix version sync
- contrib/add-host/SKILL.md.tmpl: contributor-only skill that guides
new host config creation. Lives in contrib/, excluded from user installs.
- package.json: sync version with VERSION file (0.15.2.1)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* test: add parameterized host smoke tests for all hosts
35 new tests covering all 7 external hosts (Codex, Factory, Kiro,
OpenCode, Slate, Cursor, OpenClaw). Each host gets 4-5 tests:
- output exists on disk with SKILL.md files
- no .claude/skills path leakage in non-root skills
- frontmatter has name + description fields
- --dry-run freshness check passes
- /codex skill excluded (for hosts with skipSkills: ['codex'])
Tests are parameterized over ALL_HOST_CONFIGS so adding a new host
automatically gets smoke-tested with zero new test code.
Also updates --host all test to verify all registered hosts generate.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* test: 100% coverage for host config system
71 new tests in test/host-config.test.ts covering:
- hosts/index.ts: ALL_HOST_CONFIGS, getHostConfig, resolveHostArg (aliases),
getExternalHosts, uniqueness checks
- host-config.ts validateHostConfig: name regex, displayName, cliCommand,
cliAliases, globalRoot, localSkillRoot, hostSubdir, frontmatter.mode,
linkingStrategy, shell injection attempts, paths with $ and ~
- host-config.ts validateAllConfigs: duplicate name/hostSubdir/globalRoot
detection, error prefix format, real configs pass
- HOST_PATHS derivation: env vars for external hosts, literal paths for
Claude, localSkillRoot matches config, every host has entry
- host-config-export.ts CLI: list, get (string/boolean/array), detect,
validate, symlinks, error cases (missing args, unknown field/host)
- Golden-file regression: claude/codex/factory ship SKILL.md vs baselines
- Individual host config correctness: prefixable, linkingStrategy,
usesEnvVars, description limits, metadata, sidecar, tool rewrites,
conditional fields, suppressed resolvers, boundary instruction,
co-author trailers, skip rules, path rewrites, runtime root assets
Combined with the 35 parameterized smoke tests from gen-skill-docs.test.ts,
total new test coverage for multi-host: 106 tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: update golden baselines and sync version after merge from main
Golden files refreshed to match post-merge generated output. package.json
version synced to VERSION file (0.15.4.0).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: bump version and changelog (v0.15.5.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: sidebar E2E tests now self-contained and passing
- sidebar-url-accuracy: fix stale assertion that expected extensionUrl
in prompt text (prompt format changed, URL is now in pageUrl field)
- sidebar-css-interaction: simplify task from multi-step HN comment
navigation to single-page example.com style injection (faster, more
reliable, still exercises goto + style + completion flow)
- Update golden baselines after merge from main
All 3 sidebar tests now pass: 3/3, 0 fail, ~36s total.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: add ADDING_A_HOST.md guide + update docs for multi-host system
- docs/ADDING_A_HOST.md: step-by-step guide for adding a new host
(create config, register, gitignore, generate, test). Covers the
full HostConfig interface, adapter pattern, and validation.
- CONTRIBUTING.md: replace stale "Dual-host development" section with
"Multi-host development" covering all 8 hosts and linking to the guide.
- README.md: consolidate Codex/Factory install sections into one
"Other AI Agents" section listing all supported hosts with auto-detect.
- CLAUDE.md: add hosts/, host-config.ts, host-adapters/, contrib/ to
project structure tree.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: README per-host install instructions for all 8 agents
Each supported agent now has its own copy-paste install block with
the exact command and where skills end up on disk. Includes: auto-detect,
Codex, OpenCode, Cursor, Factory, OpenClaw, Slate, and Kiro.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: skill invocation during plan mode takes precedence over generic plan mode
Adds a "Skill Invocation During Plan Mode" section to the preamble resolver so
all generated SKILL.md files include it. Fixes a bug where Claude treats loaded
skill content as reference material instead of executable instructions, and keeps
trying to ExitPlanMode instead of following the skill workflow step by step.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: interactive /plan-devex-review with persona, benchmarks, and forcing questions
Complete rewrite of the DX review skill to match CEO/eng review depth. New flow:
investigate (persona, empathy, competitors, magical moment, journey tracing) then
force decisions, then score with evidence. Three modes: DX EXPANSION, DX POLISH,
DX TRIAGE. 20-45 interactive STOP points vs 10-12 before.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: autoplan DX POLISH mode + review log schema for new devex fields
Adds mode selection, persona, competitive, and magical moment override rules to
autoplan Phase 3.5. Documents new review log fields (mode, persona, competitive_tier)
in the plan-file-review-report schema. Syncs package.json version to VERSION.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: update project documentation for v0.15.5.0
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: CDP inspector module — persistent sessions, CSS cascade, style modification
New browse/src/cdp-inspector.ts with full CDP inspection engine:
- inspectElement() via CSS.getMatchedStylesForNode + DOM.getBoxModel
- modifyStyle() via CSS.setStyleTexts with headless page.evaluate fallback
- Persistent CDP session lifecycle (create, reuse, detach on nav, re-create)
- Specificity sorting, overridden property detection, UA rule filtering
- Modification history with undo support
- formatInspectorResult() for CLI output
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: browse server inspector endpoints + inspect/style/cleanup/prettyscreenshot CLI
Server endpoints: POST /inspector/pick, GET /inspector, POST /inspector/apply,
POST /inspector/reset, GET /inspector/history, GET /inspector/events (SSE).
CLI commands: inspect (CDP cascade), style (live CSS mod), cleanup (page clutter
removal), prettyscreenshot (clean screenshot pipeline).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: sidebar CSS inspector — element picker, box model, rule cascade, quick edit
Extension changes for the visual CSS inspector:
- inspector.js: element picker with hover highlight, CSS selector generation,
basic mode fallback (getComputedStyle + CSSOM), page alteration handlers
- inspector.css: picker overlay styles (blue highlight + tooltip)
- background.js: inspector message routing (picker <-> server <-> sidepanel)
- sidepanel: Inspector tab with box model viz (gstack palette), matched rules
with specificity badges, computed styles, click-to-edit quick edit,
Send to Agent/Code button, empty/loading/error states
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: document inspect, style, cleanup, prettyscreenshot browse commands
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: auto-track user-created tabs and handle tab close
browser-manager.ts changes:
- context.on('page') listener: automatically tracks tabs opened by the user
(Cmd+T, right-click open in new tab, window.open). Previously only
programmatic newTab() was tracked, so user tabs were invisible.
- page.on('close') handler in wirePageEvents: removes closed tabs from the
pages map and switches activeTabId to the last remaining tab.
- syncActiveTabByUrl: match Chrome extension's active tab URL to the correct
Playwright page for accurate tab identity.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: per-tab agent isolation via BROWSE_TAB environment variable
Prevents parallel sidebar agents from interfering with each other's tab context.
Three-layer fix:
- sidebar-agent.ts: passes BROWSE_TAB=<tabId> env var to each claude process,
per-tab processing set allows concurrent agents across tabs
- cli.ts: reads process.env.BROWSE_TAB and includes tabId in command request body
- server.ts: handleCommand() temporarily switches activeTabId when tabId is present,
restores after command completes (safe: Bun event loop is single-threaded)
Also: per-tab agent state (TabAgentState map), per-tab message queuing,
per-tab chat buffers, verbose streaming narration, stop button endpoint.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: sidebar per-tab chat context, tab bar sync, stop button, UX polish
Extension changes:
- sidepanel.js: per-tab chat history (tabChatHistories map), switchChatTab()
swaps entire chat view, browserTabActivated handler for instant tab sync,
stop button wired to /sidebar-agent/stop, pollTabs renders tab bar
- sidepanel.html: updated banner text ("Browser co-pilot"), stop button markup,
input placeholder "Ask about this page..."
- sidepanel.css: tab bar styles, stop button styles, loading state fixes
- background.js: chrome.tabs.onActivated sends browserTabActivated to sidepanel
with tab URL for instant tab switch detection
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: per-tab isolation, BROWSE_TAB pinning, tab tracking, sidebar UX
sidebar-agent.test.ts (new tests):
- BROWSE_TAB env var passed to claude process
- CLI reads BROWSE_TAB and sends tabId in body
- handleCommand accepts tabId, saves/restores activeTabId
- Tab pinning only activates when tabId provided
- Per-tab agent state, queue, concurrency
- processingTabs set for parallel agents
sidebar-ux.test.ts (new tests):
- context.on('page') tracks user-created tabs
- page.on('close') removes tabs from pages map
- Tab isolation uses BROWSE_TAB not system prompt hack
- Per-tab chat context in sidepanel
- Tab bar rendering, stop button, banner text
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: resolve merge conflicts — keep security defenses + per-tab isolation
Merged main's security improvements (XML escaping, prompt injection defense,
allowed commands whitelist, --model opus, Write tool, stderr capture) with
our branch's per-tab isolation (BROWSE_TAB env var, processingTabs set,
no --resume). Updated test expectations for expanded system prompt.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.13.9.0)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: add inspector message types to background.js allowlist
Pre-existing bug found by Codex: ALLOWED_TYPES in background.js was missing
all inspector message types (startInspector, stopInspector, elementPicked,
pickerCancelled, applyStyle, toggleClass, injectCSS, resetAll, inspectResult).
Messages were silently rejected, making the inspector broken on ALL pages.
Also: separate executeScript and insertCSS into individual try blocks in
injectInspector(), store inspectorMode for routing, and add content.js
fallback when script injection fails (CSP, chrome:// pages).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: basic element picker in content.js for CSP-restricted pages
When inspector.js can't be injected (CSP, chrome:// pages), content.js
provides a basic picker using getComputedStyle + CSSOM:
- startBasicPicker/stopBasicPicker message handlers
- captureBasicData() with ~30 key CSS properties, box model, matched rules
- Hover highlight with outline save/restore (never leaves artifacts)
- Click uses e.target directly (no re-querying by selector)
- Sends inspectResult with mode:'basic' for sidebar rendering
- Escape key cancels picker and restores outlines
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: cleanup + screenshot buttons in sidebar inspector toolbar
Two action buttons in the inspector toolbar:
- Cleanup (🧹): POSTs cleanup --all to server, shows spinner, chat
notification on success, resets inspector state (element may be removed)
- Screenshot (📸): POSTs screenshot to server, shows spinner, chat
notification with saved file path
Shared infrastructure:
- .inspector-action-btn CSS with loading spinner via ::after pseudo-element
- chat-notification type in addChatEntry() for system messages
- package.json version bump to 0.13.9.0
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: inspector allowlist, CSP fallback, cleanup/screenshot buttons
16 new tests in sidebar-ux.test.ts:
- Inspector message allowlist includes all inspector types
- content.js basic picker (startBasicPicker, captureBasicData, CSSOM,
outline save/restore, inspectResult with mode basic, Escape cleanup)
- background.js CSP fallback (separate try blocks, inspectorMode, fallback)
- Cleanup button (POST /command, inspector reset after success)
- Screenshot button (POST /command, notification rendering)
- Chat notification type and CSS styles
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: update project documentation for v0.13.9.0
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: cleanup + screenshot buttons in chat toolbar (not just inspector)
Quick actions toolbar (🧹 Cleanup, 📸 Screenshot) now appears above the chat
input, always visible. Both inspector and chat buttons share runCleanup() and
runScreenshot() helper functions. Clicking either set shows loading state on
both simultaneously.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: chat toolbar buttons, shared helpers, quick-action-btn styles
Tests that chat toolbar exists (chat-cleanup-btn, chat-screenshot-btn,
quick-actions container), CSS styles (.quick-action-btn, .quick-action-btn.loading),
shared runCleanup/runScreenshot helper functions, and cleanup inspector reset.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: aggressive cleanup heuristics — overlays, scroll unlock, blur removal
Massively expanded CLEANUP_SELECTORS with patterns from uBlock Origin and
Readability.js research:
- ads: 30+ selectors (Google, Amazon, Outbrain, Taboola, Criteo, etc.)
- cookies: OneTrust, Cookiebot, TrustArc, Quantcast + generic patterns
- overlays (NEW): paywalls, newsletter popups, interstitials, push prompts,
app download banners, survey modals
- social: follow prompts, share tools
- Cleanup now defaults to --all when no args (sidebar button fix)
- Uses !important on all display:none (overrides inline styles)
- Unlocks body/html scroll (overflow:hidden from modal lockout)
- Removes blur/filter effects (paywall content blur)
- Removes max-height truncation (article teaser truncation)
- Collapses empty ad placeholder whitespace (empty divs after ad removal)
- Skips gstack-ctrl indicator in sticky removal
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: disable action buttons when disconnected, no error spam
- setActionButtonsEnabled() toggles .disabled class on all cleanup/screenshot
buttons (both chat toolbar and inspector toolbar)
- Called with false in updateConnection when server URL is null
- Called with true when connection established
- runCleanup/runScreenshot silently return when disconnected instead of
showing 'Not connected' error notifications
- CSS .disabled style: pointer-events:none, opacity:0.3, cursor:not-allowed
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: cleanup heuristics, button disabled state, overlay selectors
17 new tests:
- cleanup defaults to --all on empty args
- CLEANUP_SELECTORS overlays category (paywall, newsletter, interstitial)
- Major ad networks in selectors (doubleclick, taboola, criteo, etc.)
- Major consent frameworks (OneTrust, Cookiebot, TrustArc, Quantcast)
- !important override for inline styles
- Scroll unlock (body overflow:hidden)
- Blur removal (paywall content blur)
- Article truncation removal (max-height)
- Empty placeholder collapse
- gstack-ctrl indicator skip in sticky cleanup
- setActionButtonsEnabled function
- Buttons disabled when disconnected
- No error spam from cleanup/screenshot when disconnected
- CSS disabled styles for action buttons
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: LLM-based page cleanup — agent analyzes page semantically
Instead of brittle CSS selectors, the cleanup button now sends a prompt to
the sidebar agent (which IS an LLM). The agent:
1. Runs deterministic $B cleanup --all as a quick first pass
2. Takes a snapshot to see what's left
3. Analyzes the page semantically to identify remaining clutter
4. Removes elements intelligently, preserving site branding
This means cleanup works correctly on any site without site-specific selectors.
The LLM understands that "Your Daily Puzzles" is clutter, "ADVERTISEMENT" is
junk, but the SF Chronicle masthead should stay.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: aggressive cleanup heuristics + preserve top nav bar
Deterministic cleanup improvements (used as first pass before LLM analysis):
- New 'clutter' category: audio players, podcast widgets, sidebar puzzles/games,
recirculation widgets (taboola, outbrain, nativo), cross-promotion banners
- Text-content detection: removes "ADVERTISEMENT", "Article continues below",
"Sponsored", "Paid content" labels and their parent wrappers
- Sticky fix: preserves the topmost full-width element near viewport top (site
nav bar) instead of hiding all sticky/fixed elements. Sorts by vertical
position, preserves the first one that spans >80% viewport width.
Tests: clutter category, ad label removal, nav bar preservation logic.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: LLM-based cleanup architecture, deterministic heuristics, sticky nav
22 new tests covering:
- Cleanup button uses /sidebar-command (agent) not /command (deterministic)
- Cleanup prompt includes deterministic first pass + agent snapshot analysis
- Cleanup prompt lists specific clutter categories for agent guidance
- Cleanup prompt preserves site identity (masthead, headline, body, byline)
- Cleanup prompt instructs scroll unlock and $B eval removal
- Loading state management (async agent, setTimeout)
- Deterministic clutter: audio/podcast, games/puzzles, recirculation
- Ad label text patterns (ADVERTISEMENT, Sponsored, Article continues)
- Ad label parent wrapper hiding for small containers
- Sticky nav preservation (sort by position, first full-width near top)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: GStack Browser stealth + branding — anti-bot patches, custom UA, rebrand
- Add GSTACK_CHROMIUM_PATH env var for custom Chromium binary
- Add BROWSE_EXTENSIONS_DIR env var for extension path override
- Move auth token to /health endpoint (fixes read-only .app bundles)
- Anti-bot stealth: disable navigator.webdriver, fake plugins, languages
- Custom user agent: Chrome/<version> GStackBrowser (auto-detects version)
- Rebrand Chromium plist to "GStack Browser" at launch time
- Update security test to match new token-via-health approach
* feat: GStack Browser .app bundle — launcher script + build system
- scripts/app/gstack-browser: dual-mode launcher (dev + .app bundle)
- scripts/build-app.sh: compiles binary, bundles Chromium + extension, creates DMG
- Rebrands Chromium plist during build for "GStack Browser" in menu bar
- 389MB .app, 189MB compressed DMG, launches in ~5s
* docs: GStack Browser V0 master plan — AI-native development browser vision
5-phase roadmap from .app wrapper through Chromium fork, 9 capability
visions, competitive landscape, architecture diagrams, design system.
* fix: restore package.json and sync version to 0.14.3.0
* chore: bump version and changelog (v0.14.4.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: gitignore top-level dist/ (GStack Browser build output)
* feat: GStack Browser icon — custom .icns replaces Chromium's Dock icon
- Generated 1024px icon: dark terminal window with amber prompt cursor
- Converted to .icns with all macOS sizes (16-1024px, 1x and 2x)
- build-app.sh copies icon into both the outer .app and bundled Chromium's
Resources (Chromium's process owns the Dock icon, not the launcher)
- browser-manager.ts patches Chromium's icon at runtime for dev mode too
- Both the Dock and Cmd+Tab now show the GStack icon
* feat: rename /connect-chrome → /open-gstack-browser
- Rename skill directory + update frontmatter name and description
- Update SKILL.md.tmpl to reference GStack Browser branding/stealth
- Create connect-chrome symlink for backwards compatibility
- Setup script creates /connect-chrome alias in .claude/skills/
- Fix package.json version sync (0.14.5.0 → 0.14.6.0)
* feat: rename /connect-chrome → /open-gstack-browser across all references
Update README skill lists, docs/skills.md deep dive, extension sidepanel
banner copy button, and reconnect clipboard text.
* feat: left-align sidebar UI + extension-ready event for welcome page
- Left-align all sidebar text (chat welcome, loading, empty states,
notifications, inspector empty, session placeholder)
- Dispatch 'gstack-extension-ready' CustomEvent from content.js so
the welcome page can detect when the sidebar is active
* chore: add GStack Browser TODOs — CDP stealth patches + Chromium fork
P1: rebrowser-style postinstall patcher for Playwright 1.58.2 (suppress
Runtime.enable, addBinding context discovery, 6 files, ~200 lines).
P2: long-term Chromium fork for permanent stealth + native sidebar.
* chore: regenerate open-gstack-browser/SKILL.md from template
Fix timeline skill name (connect-chrome → open-gstack-browser) and
preamble formatting from merge with main's updated template.
* feat: welcome page served from browse server on headed launch
- Add /welcome endpoint to server.ts, serves welcome.html
- Navigate to /welcome after server starts (not during launchHeaded,
which runs before the server is listening)
- welcome.html bundled in browse/src/ for portability
* feat: auto-open sidebar on every browser launch, not just first install
- Add top-level setTimeout in background.js that fires on every service
worker startup (onInstalled only fires on install/update)
- Remove misaligned arrow from welcome page, replace with text fallback
that hides when extension content script fires gstack-extension-ready
* fix: sidebar auto-open retry with backoff + welcome page tests
- Replace single-attempt sidePanel.open() with autoOpenSidePanel() that
retries up to 5 times with 500ms-5000ms backoff
- Fire on both onInstalled AND every service worker startup
- Remove misaligned arrow from welcome page, replace with text fallback
- Add 12 tests: welcome page structure, /welcome endpoint, headed launch
navigation timing, sidebar auto-open retry logic, extension-ready event
* feat: reload button in sidebar footer
Adds a "reload" button next to "debug" and "clear" in the sidebar
footer. Calls location.reload() to fully refresh the side panel,
re-run connection logic, and clear stale state.
* feat: right-pointing arrow hint for sidebar on welcome page
Replace invisible text fallback with visible amber bubble + animated
right arrow (→) pointing toward where the sidebar opens. Always correct
regardless of window size (unlike the old up arrow at toolbar chrome).
* fix: sidebar auth race — pass token in getPort response
The sidebar called tryConnect() → getPort → got {port, connected} but
NO token. All subsequent requests (SSE, chat poll) failed with 401.
The token only arrived later via the health broadcast, but by then
the SSE connection was already broken.
Fix: include authToken in the getPort response so the sidebar has
the token from its very first connection attempt.
* feat: sidebar debug visibility + auth race tests
- Show attempt count in loading screen ("Connecting... attempt 3")
- After 5 failed attempts, show debug details (port, connected, token)
so stuck users can see exactly what's failing
- Add 4 tests: getPort includes token, tryConnect uses token,
dead state exists with MAX_RECONNECT_ATTEMPTS, reconnectAttempts visible
* fix: startup health check retries every 1s instead of 10s
Root cause: extension service worker starts before Bun.serve() is
listening. First checkHealth() fails, next attempt is 10 seconds
later. User stares at "Connecting..." for 10 seconds.
Fix: retry every 1s for up to 15 attempts on startup, then switch
to 10s polling once connected (or after 15s gives up). Sidebar
should connect within 1-2 seconds of server becoming available.
3 new tests verify the fast-retry → slow-poll transition.
* feat: detailed step-by-step status in sidebar loading screen
Replace useless "Connecting..." with real-time debug info:
- "Looking for browse server... (attempt N)"
- Shows port, server responding status, token status
- Shows chrome.runtime errors if extension messaging fails
- Tells user to run /open-gstack-browser if server not found
* fix: sidebar connects directly to /health instead of waiting for background
Root cause: sidepanel asked background "are you connected?" but background's
health check hadn't succeeded yet (1-10s gap). Sidepanel waited forever.
Fix: when background says not connected, sidepanel hits /health directly
with fetch(). Gets the token from the response. Bypasses background
entirely for initial connection. Shows step-by-step debug info:
"Checking server directly... port: 34567 / Trying GET /health..."
* fix: suppress fake "session ended" and timeout errors in sidebar
Two issues making the sidebar look broken when it's actually working:
1. "Timed out after 300s" error displayed after agent_done — this is a
cleanup timer, not a real error. Now suppressed when no active session.
2. "(session ended)" text appended on every idle poll — removed entirely.
The thinking spinner is cleaned up silently instead.
* fix: sidebar agent passes BROWSE_PORT to child claude
Ensures the child claude process connects to the existing headed
browse server (port 34567) instead of spawning a new headless one.
Without this, sidebar chat commands run in an invisible browser.
* feat: BROWSE_NO_AUTOSTART prevents sidebar from spawning headless browser
When set, the browse CLI refuses to start a new server and exits with
a clear error: "Server not available, run /open-gstack-browser to restart."
The sidebar agent sets this so users never get an invisible headless
browser when the headed one is closed.
* test: BROWSE_NO_AUTOSTART guard in CLI + sidebar-agent env vars
5 tests: CLI checks env var before starting server, shows actionable
error, sidebar-agent sets the flag + BROWSE_PORT, guard runs before
lock acquisition to prevent stale lock files.
* fix: stale auth token causes Unauthorized + invisible error text
background.js checkHealth() never refreshed authToken from /health responses,
so when the browse server restarted with a new token, all sidebar-command
requests got 401 Unauthorized forever.
Also: error placeholder text was #3f3f46 on #0C0C0C (nearly invisible).
Now shows in red to match the error border.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: replace 40+ silent catch blocks with debug logging
Every empty catch {} in sidepanel.js, sidebar-agent.ts now logs with
[gstack sidebar] or [sidebar-agent] prefix. Chat poll 401s, stop agent,
tab poll, clear chat, SSE parse, refs fetch, stream JSON parse, queue
read/parse, process kill — all now visible in console.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: noisy debug logging + auto model routing in browse server
Server-side silent catch blocks (22 instances) now log with [browse] prefix:
chat persistence, session save/load, agent kill, tab pin/restore, welcome
page, buffer flush, worktree cleanup, lock files, SSE streams.
Also adds pickSidebarModel() — routes sidebar messages to sonnet for
navigation/interaction (click, goto, fill, screenshot) and opus for
analysis/comprehension (summarize, describe, find bugs). Sonnet is
~4x faster for action commands with zero quality difference.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: update sidebar tests for model router + longer stopAgent slice
- stopAgent slice 800→1000 to accommodate added error logging lines
- Replace hardcoded opus assertion with model router assertions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: sidebar arrow hint stays visible until sidebar actually opens
Previously the welcome page arrow hid immediately when the extension's
content script loaded — but extension loaded ≠ sidebar open. Now the
signal flow is: sidepanel connects → tells background.js → relays to
content script → dispatches gstack-extension-ready → arrow hides.
Adds welcome-page.test.ts: 14 tests verifying arrow, branding, feature
cards, dark theme, and auto-hide behavior via real HTTP server.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: arrow hide signal chain (4-step) + stale session-ended assertion
8 new tests verify the sidebarOpened → background → content → welcome
signal chain. Updates stale "(session ended)" test that checked for
text removed in a prior commit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: preserve optimistic UI during tab switch on first message
When the user sends a message and the server assigns it to a new tab
(because Chrome's active tab changed), switchChatTab() was blowing away
the optimistic user bubble and thinking dots with a welcome screen.
Now preserves the current DOM if we're mid-send with a thinking indicator.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: sidebar message flow architecture doc + CLAUDE.md pointer
SIDEBAR_MESSAGE_FLOW.md documents the full init timeline, message flow
(user types → claude responds), auth token chain, arrow hint signal
chain, model routing, tab concurrency, and known failure modes.
CLAUDE.md now tells you to read it before touching sidebar files.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: sidebar chat resets idle timer + shutdown kills sidebar-agent
Two fixes for the "browser died while chatting" problem:
1. /sidebar-command now calls resetIdleTimer(). Previously only CLI
commands reset it, so the server would shut down after 30 min even
while the user was actively chatting in the sidebar.
2. shutdown() now pkills the sidebar-agent daemon. Previously the agent
survived server shutdown, kept polling a dead server, and spawned
confused claude processes that auto-started headless browsers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: disable idle timeout in headed mode — browser lives until closed
The 30-minute idle timeout only applies to headless mode now. In headed
mode the user is looking at the Chrome window, so auto-shutdown is wrong.
The browser stays alive until explicit disconnect or window close.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: cookies button in sidebar footer opens cookie picker
One-click cookie import from the sidebar. Navigates the headed browser
to /cookie-picker where you can select which domains to import from
your real Chrome profile.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: update project documentation for GStack Browser improvements
README.md: updated Real browser mode and sidebar agent sections with
model routing, cookie import button, no idle timeout in headed mode.
Updated skill table entries for /browse and /open-gstack-browser.
docs/skills.md: updated /open-gstack-browser deep dive with model
routing and cookie import details.
GSTACK_BROWSER_V0.md: added 6 new SHIPPED items to implementation
status table (model routing, debug logging, idle timeout, cookie
button, arrow hint, architecture doc).
TODOS.md: marked "Sidebar agent Write tool + error visibility" as
SHIPPED. Added new P2 TODO for direct API calls to eliminate
claude -p startup tax.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add Claude Code terminal example to welcome page TRY IT NOW
Fifth example shows the parent agent workflow: navigate, extract CSS,
write to file. The other four are all sidebar-only. This one shows
co-presence — the Claude Code session that launched the browser can
also control it directly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: hide internal tool-result file reads from sidebar activity
Claude reads its own ~/.claude/projects/.../tool-results/ files as
internal plumbing. These showed up as long unreadable paths in the
sidebar. Now: describeToolCall returns empty for tool-result reads,
and the sidebar skips rendering tool_use entries with no description.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: collapse tool calls into "See reasoning" disclosure on completion
While the agent is working, tool calls stream live so you can watch
progress. When the agent finishes, all tool calls collapse into a
"See reasoning (N steps)" disclosure. Click to expand and see what
the agent did. The final text answer stays visible.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: 17 new tests for recent sidebar fixes
Covers: tool-result file filtering, empty tool_use skip, reasoning
disclosure collapse, idle timeout headed mode bypass, sidebar-command
idle reset, shutdown sidebar-agent kill, cookie button, and model
routing analysis-before-action priority.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: move cookies button to quick actions toolbar
Cookies now sits next to Cleanup and Screenshot as a primary action
button (🍪 Cookies) instead of buried in the footer. Same behavior,
more discoverable.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add instructional text to cookie picker page
"Select the domains of cookies you want to import to GStack Browser.
You'll be able to browse those sites with the same login as your
other browser."
Also fixes stale test that expected hardcoded '--model', 'opus'.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: 6-card welcome page with cookie import + dual-agent cards
3x2 grid layout (was 2x2). New cards: "Import your cookies" (click
🍪 Cookies to import login sessions from Chrome/Arc/Brave) and
"Or use your main agent" (your Claude Code terminal also controls
this browser). Responsive: 3 cols > 2 cols > 1 col.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: move sidebar arrow hint to top-right instead of vertically centered
The arrow was centered vertically which put it behind the feature cards.
Now positioned at top: 80px where there's open space and it's more visible.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: document /plan-devex-review and /devex-review in README
Add both skills to the install instructions, skills table, and a new
"Which review?" comparison table showing when to use each review type.
* feat: add /plan-devex-review to /autoplan as conditional Phase 3.5
Auto-detects developer-facing scope (API, CLI, SDK, shell, agent, MCP,
OpenClaw) and runs DX review with dual adversarial voices after Eng review.
* chore: bump version and changelog (v0.15.4.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add DX framework resolver for shared principles and scoring rubric
New {{DX_FRAMEWORK}} resolver provides compact (~150 lines) shared content
for /plan-devex-review and /devex-review: Addy Osmani's 8 DX principles,
7 characteristics table, 10 cognitive patterns, scoring rubric, and TTHW
benchmarks. Hall of Fame examples loaded on-demand per pass to avoid bloat.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add DX Review row to review dashboard
Adds plan-devex-review and devex-review schema entries to the review
dashboard resolver and placeholder table in the preamble. All existing
SKILL.md files regenerated to include the new DX Review row.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: /plan-devex-review skill — DX plan review with Osmani framework
Plan-stage developer experience review. Rates 8 DX dimensions 0-10:
getting started, API/CLI/SDK design, error messages, docs, upgrade path,
dev environment, community, and DX measurement. Includes developer empathy
simulation, auto-detect product type with applicability gate, DX scorecard
with trend tracking, and a conditional Claude Code Skill DX checklist.
Hall of Fame examples loaded on-demand per pass from dx-hall-of-fame.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: /devex-review skill — live DX audit with browse
Live-system developer experience audit using browse tool. Tests all 8
dimensions aligned with /plan-devex-review for boomerang comparison
(plan said 3 min TTHW, reality says 8). Each dimension marked TESTED,
INFERRED, or N/A with evidence. Scope-aware: declares what browse can
and cannot test, falls back to file artifacts for untestable dimensions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.15.3.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>