merge: resolve CHANGELOG conflict with main, bump to v0.12.8.0

Main claimed v0.12.7.0 for community PRs + security hardening. Bumped our codex-cwd-bug entry to v0.12.8.0. Both entries preserved. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-22 04:38:24 +08:00 · 2026-03-26 23:28:18 -06:00
parent af0a26ee1f b343ba2797
commit 9996a9e5cc
22 changed files with 335 additions and 84 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,6 +1,6 @@
 # Changelog
-## [0.12.7.0] - 2026-03-27 — Codex No Longer Reviews the Wrong Project
+## [0.12.8.0] - 2026-03-27 — Codex No Longer Reviews the Wrong Project
 When you run gstack in Conductor with multiple workspaces open, Codex could silently review the wrong project. The `codex exec -C` flag resolved the repo root inline via `$(git rev-parse --show-toplevel)`, which evaluates in whatever cwd the background shell inherits. In multi-workspace environments, that cwd might be a different project entirely.
@@ -18,6 +18,30 @@ When you run gstack in Conductor with multiple workspaces open, Codex could sile
 - **Regression test** that scans all `.tmpl`, resolver `.ts`, and generated `SKILL.md` files for codex commands using inline `$(git rev-parse --show-toplevel)`. Prevents reintroduction.
 ## [0.12.7.0] - 2026-03-27 — Community PRs + Security Hardening
 Seven community contributions merged, reviewed, and tested. Plus security hardening for telemetry and review logging, and E2E test stability fixes.
 ### Added
 - **Dotfile filtering in skill discovery.** Hidden directories (`.git`, `.vscode`, etc.) are no longer picked up as skill templates.
 - **JSON validation gate in review-log.** Malformed input is rejected instead of appended to the JSONL file.
 - **Telemetry input sanitization.** All string fields are stripped of quotes, backslashes, and control characters before being written to JSONL.
 - **Host-specific co-author trailers.** `/ship` and `/document-release` now use the correct co-author line for Codex vs Claude.
 - **10 new security tests** covering telemetry injection, review-log validation, and dotfile filtering.
 ### Fixed
 - **File paths starting with `./` no longer treated as CSS selectors.** `$B screenshot ./path/to/file.png` now works instead of trying to find a CSS element.
 - **Build chain resilience.** `gen:skill-docs` failure no longer blocks binary compilation.
 - **Update checker fall-through.** After upgrading, the checker now also checks for newer remote versions instead of stopping.
 - **Flaky E2E tests stabilized.** `browse-basic`, `ship-base-branch`, and `review-dashboard-via` tests now pass reliably by extracting only relevant SKILL.md sections instead of copying full 1900-line files into test fixtures.
 - **Removed unreliable `journey-think-bigger` routing test.** Never passed reliably because the routing signal was too ambiguous. 10 other journey tests cover routing with clear signals.
 ### For contributors
 - New CLAUDE.md rule: never copy full SKILL.md files into E2E test fixtures. Extract the relevant section only.
 ## [0.12.6.0] - 2026-03-27 — Sidebar Knows What Page You're On
 The Chrome sidebar agent used to navigate to the wrong page when you asked it to do something. If you'd manually browsed to a site, the sidebar would ignore that and go to whatever Playwright last saw (often Hacker News from the demo). Now it works.
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -298,6 +298,30 @@ them. Report progress at each check (which tests passed, which are running, any
 failures so far). The user wants to see the run complete, not a promise that
 you'll check later.
 ## E2E test fixtures: extract, don't copy
 **NEVER copy a full SKILL.md file into an E2E test fixture.** SKILL.md files are
 1500-2000 lines. When `claude -p` reads a file that large, context bloat causes
 timeouts, flaky turn limits, and tests that take 5-10x longer than necessary.
 Instead, extract only the section the test actually needs:
 ```typescript
 // BAD — agent reads 1900 lines, burns tokens on irrelevant sections
 fs.copyFileSync(path.join(ROOT, 'ship', 'SKILL.md'), path.join(dir, 'ship-SKILL.md'));
 // GOOD — agent reads ~60 lines, finishes in 38s instead of timing out
 const full = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
 const start = full.indexOf('## Review Readiness Dashboard');
 const end = full.indexOf('\n---\n', start);
 fs.writeFileSync(path.join(dir, 'ship-SKILL.md'), full.slice(start, end > start ? end : undefined));
 ```
 Also when running targeted E2E tests to debug failures:
 - Run in **foreground** (`bun test ...`), not background with `&` and `tee`
 - Never `pkill` running eval processes and restart — you lose results and waste money
 - One clean run beats three killed-and-restarted runs
 ## Deploying to the active skill
 The active skill lives at `~/.claude/skills/gstack/`. After making changes:
--- a/2
+++ b/2
@@ -1 +1 @@
-0.12.7.0
+0.12.8.0
--- a/bin/gstack-review-log
+++ b/bin/gstack-review-log
@@ -6,4 +6,13 @@ SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
 eval "$("$SCRIPT_DIR/gstack-slug" 2>/dev/null)"
 GSTACK_HOME="${GSTACK_HOME:-$HOME/.gstack}"
 mkdir -p "$GSTACK_HOME/projects/$SLUG"
-echo "$1" >> "$GSTACK_HOME/projects/$SLUG/$BRANCH-reviews.jsonl"
+
 # Validate: input must be parseable JSON (reject malformed or injection attempts)
 INPUT="$1"
 if ! printf '%s' "$INPUT" | bun -e "JSON.parse(await Bun.stdin.text())" 2>/dev/null; then
  # Not valid JSON — refuse to append
  echo "gstack-review-log: invalid JSON, skipping" >&2
  exit 1
 fi
 echo "$INPUT" >> "$GSTACK_HOME/projects/$SLUG/$BRANCH-reviews.jsonl"
--- a/bin/gstack-telemetry-log
+++ b/bin/gstack-telemetry-log
@@ -151,15 +151,23 @@ fi
 # ─── Construct and append JSON ───────────────────────────────
 mkdir -p "$ANALYTICS_DIR"
-# Escape null fields
+# Sanitize string fields for JSON safety (strip quotes, backslashes, control chars)
 json_safe() { printf '%s' "$1" | tr -d '"\\\n\r\t' | head -c 200; }
 SKILL="$(json_safe "$SKILL")"
 OUTCOME="$(json_safe "$OUTCOME")"
 SESSION_ID="$(json_safe "$SESSION_ID")"
 SOURCE="$(json_safe "$SOURCE")"
 EVENT_TYPE="$(json_safe "$EVENT_TYPE")"
 # Escape null fields — sanitize ERROR_CLASS and FAILED_STEP via json_safe()
 ERR_FIELD="null"
-[ -n "$ERROR_CLASS" ] && ERR_FIELD="\"$ERROR_CLASS\""
+[ -n "$ERROR_CLASS" ] && ERR_FIELD="\"$(json_safe "$ERROR_CLASS")\""
 ERR_MSG_FIELD="null"
-[ -n "$ERROR_MESSAGE" ] && ERR_MSG_FIELD="\"$(echo "$ERROR_MESSAGE" | head -c 200 | sed 's/"/\\"/g')\""
+[ -n "$ERROR_MESSAGE" ] && ERR_MSG_FIELD="\"$(printf '%s' "$ERROR_MESSAGE" | head -c 200 | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' -e 's/	/\\t/g' | tr '\n\r' '  ')\""
 STEP_FIELD="null"
-[ -n "$FAILED_STEP" ] && STEP_FIELD="\"$(echo "$FAILED_STEP" | head -c 30)\""
+[ -n "$FAILED_STEP" ] && STEP_FIELD="\"$(json_safe "$FAILED_STEP")\""
 # Cap unreasonable durations
 if [ -n "$DURATION" ] && [ "$DURATION" -gt 86400 ] 2>/dev/null; then
--- a/bin/gstack-update-check
+++ b/bin/gstack-update-check
@@ -113,12 +113,11 @@ if [ -f "$MARKER_FILE" ]; then
  OLD="$(cat "$MARKER_FILE" 2>/dev/null | tr -d '[:space:]')"
  rm -f "$MARKER_FILE"
  rm -f "$SNOOZE_FILE"
  mkdir -p "$STATE_DIR"
  echo "UP_TO_DATE $LOCAL" > "$CACHE_FILE"
  if [ -n "$OLD" ]; then
    echo "JUST_UPGRADED $OLD $LOCAL"
  fi
-  exit 0
+  # Don't exit — fall through to remote check in case
  # more updates landed since the upgrade
 fi
 # ─── Step 3: Check cache freshness ──────────────────────────
--- a/browse/src/meta-commands.ts
+++ b/browse/src/meta-commands.ts
@@ -137,7 +137,11 @@ export async function handleMetaCommand(
      // Separate target (selector/@ref) from output path
      for (const arg of remaining) {
-        if (arg.startsWith('@e') || arg.startsWith('@c') || arg.startsWith('.') || arg.startsWith('#') || arg.includes('[')) {
+        // File paths containing / and ending with an image/pdf extension are never CSS selectors
        const isFilePath = arg.includes('/') && /\.(png|jpe?g|webp|pdf)$/i.test(arg);
        if (isFilePath) {
          outputPath = arg;
        } else if (arg.startsWith('@e') || arg.startsWith('@c') || arg.startsWith('.') || arg.startsWith('#') || arg.includes('[')) {
          targetSelector = arg;
        } else {
          outputPath = arg;
--- a/browse/test/commands.test.ts
+++ b/browse/test/commands.test.ts
@@ -543,6 +543,17 @@ describe('Visual', () => {
    }
  });
  test('screenshot treats relative dot-slash path as file path, not CSS selector', async () => {
    await handleWriteCommand('goto', [baseUrl + '/basic.html'], bm);
    // ./path/to/file.png must be treated as output path, not a CSS class selector (#495)
    const relPath = './browse-test-dotpath.png';
    const absPath = path.resolve(relPath);
    const result = await handleMetaCommand('screenshot', [relPath], bm, async () => {});
    expect(result).toContain('Screenshot saved');
    expect(fs.existsSync(absPath)).toBe(true);
    fs.unlinkSync(absPath);
  });
  test('screenshot with nonexistent selector throws timeout', async () => {
    await handleWriteCommand('goto', [baseUrl + '/basic.html'], bm);
    try {
--- a/browse/test/gstack-update-check.test.ts
+++ b/browse/test/gstack-update-check.test.ts
@@ -92,6 +92,35 @@ describe('gstack-update-check', () => {
    expect(cache).toContain('UP_TO_DATE');
  });
  // ─── Path C2: Just-upgraded marker + newer remote ──────────
  test('just-upgraded marker does not mask newer remote version', () => {
    writeFileSync(join(gstackDir, 'VERSION'), '0.4.0\n');
    writeFileSync(join(stateDir, 'just-upgraded-from'), '0.3.3\n');
    writeFileSync(join(gstackDir, 'REMOTE_VERSION'), '0.5.0\n');
    const { exitCode, stdout } = run();
    expect(exitCode).toBe(0);
    // Should output both the just-upgraded notice AND the new upgrade
    expect(stdout).toContain('JUST_UPGRADED 0.3.3 0.4.0');
    expect(stdout).toContain('UPGRADE_AVAILABLE 0.4.0 0.5.0');
    // Cache should reflect the upgrade available, not UP_TO_DATE
    const cache = readFileSync(join(stateDir, 'last-update-check'), 'utf-8');
    expect(cache).toContain('UPGRADE_AVAILABLE 0.4.0 0.5.0');
  });
  // ─── Path C3: Just-upgraded marker + remote matches local ──
  test('just-upgraded with no further updates writes UP_TO_DATE cache', () => {
    writeFileSync(join(gstackDir, 'VERSION'), '0.4.0\n');
    writeFileSync(join(stateDir, 'just-upgraded-from'), '0.3.3\n');
    writeFileSync(join(gstackDir, 'REMOTE_VERSION'), '0.4.0\n');
    const { exitCode, stdout } = run();
    expect(exitCode).toBe(0);
    expect(stdout).toBe('JUST_UPGRADED 0.3.3 0.4.0');
    const cache = readFileSync(join(stateDir, 'last-update-check'), 'utf-8');
    expect(cache).toContain('UP_TO_DATE');
  });
  // ─── Path D1: Fresh cache, UP_TO_DATE ───────────────────────
  test('exits silently when cache says UP_TO_DATE and is fresh', () => {
    writeFileSync(join(gstackDir, 'VERSION'), '0.3.3\n');
--- a/document-release/SKILL.md.tmpl
+++ b/document-release/SKILL.md.tmpl
@@ -280,7 +280,7 @@ committing.
 git commit -m "$(cat <<'EOF'
 docs: update project documentation for vX.Y.Z.W
-Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
+{{CO_AUTHOR_TRAILER}}
 EOF
 )"
 ```
--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
 {
  "name": "gstack",
-  "version": "0.12.7.0",
+  "version": "0.12.8.0",
  "description": "Garry's Stack — Claude Code skills + fast headless browser. One repo, one install, entire AI engineering workflow.",
  "license": "MIT",
  "type": "module",
@@ -8,7 +8,7 @@
    "browse": "./browse/dist/browse"
  },
  "scripts": {
-    "build": "bun run gen:skill-docs && bun run gen:skill-docs --host codex && bun build --compile browse/src/cli.ts --outfile browse/dist/browse && bun build --compile browse/src/find-browse.ts --outfile browse/dist/find-browse && bun build --compile bin/gstack-global-discover.ts --outfile bin/gstack-global-discover && bash browse/scripts/build-node-server.sh && git rev-parse HEAD > browse/dist/.version && rm -f .*.bun-build || true",
+    "build": "bun run gen:skill-docs; bun run gen:skill-docs --host codex; bun build --compile browse/src/cli.ts --outfile browse/dist/browse && bun build --compile browse/src/find-browse.ts --outfile browse/dist/find-browse && bun build --compile bin/gstack-global-discover.ts --outfile bin/gstack-global-discover && bash browse/scripts/build-node-server.sh && git rev-parse HEAD > browse/dist/.version && rm -f .*.bun-build || true",
    "gen:skill-docs": "bun run scripts/gen-skill-docs.ts",
    "dev": "bun run browse/src/cli.ts",
    "server": "bun run browse/src/server.ts",
--- a/scripts/discover-skills.ts
+++ b/scripts/discover-skills.ts
@@ -10,7 +10,7 @@ const SKIP = new Set(['node_modules', '.git', 'dist']);
 function subdirs(root: string): string[] {
  return fs.readdirSync(root, { withFileTypes: true })
-    .filter(d => d.isDirectory() && !SKIP.has(d.name))
+    .filter(d => d.isDirectory() && !d.name.startsWith('.') && !SKIP.has(d.name))
    .map(d => d.name);
 }
--- a/scripts/resolvers/index.ts
+++ b/scripts/resolvers/index.ts
@@ -12,7 +12,7 @@ import { generateCommandReference, generateSnapshotFlags, generateBrowseSetup }
 import { generateDesignMethodology, generateDesignHardRules, generateDesignOutsideVoices, generateDesignReviewLite, generateDesignSketch } from './design';
 import { generateTestBootstrap, generateTestCoverageAuditPlan, generateTestCoverageAuditShip, generateTestCoverageAuditReview } from './testing';
 import { generateReviewDashboard, generatePlanFileReviewReport, generateSpecReviewLoop, generateBenefitsFrom, generateCodexSecondOpinion, generateAdversarialStep, generateCodexPlanReview, generatePlanCompletionAuditShip, generatePlanCompletionAuditReview, generatePlanVerificationExec } from './review';
-import { generateSlugEval, generateSlugSetup, generateBaseBranchDetect, generateDeployBootstrap, generateQAMethodology } from './utility';
+import { generateSlugEval, generateSlugSetup, generateBaseBranchDetect, generateDeployBootstrap, generateQAMethodology, generateCoAuthorTrailer } from './utility';
 export const RESOLVERS: Record<string, (ctx: TemplateContext) => string> = {
  SLUG_EVAL: generateSlugEval,
@@ -44,4 +44,5 @@ export const RESOLVERS: Record<string, (ctx: TemplateContext) => string> = {
  PLAN_COMPLETION_AUDIT_SHIP: generatePlanCompletionAuditShip,
  PLAN_COMPLETION_AUDIT_REVIEW: generatePlanCompletionAuditReview,
  PLAN_VERIFICATION_EXEC: generatePlanVerificationExec,
  CO_AUTHOR_TRAILER: generateCoAuthorTrailer,
 };
--- a/scripts/resolvers/utility.ts
+++ b/scripts/resolvers/utility.ts
@@ -365,3 +365,10 @@ Minimum 0 per category.
 11. **Show screenshots to the user.** After every \`$B screenshot\`, \`$B snapshot -a -o\`, or \`$B responsive\` command, use the Read tool on the output file(s) so the user can see them inline. For \`responsive\` (3 files), Read all three. This is critical — without it, screenshots are invisible to the user.
 12. **Never refuse to use the browser.** When the user invokes /qa or /qa-only, they are requesting browser-based testing. Never suggest evals, unit tests, or other alternatives as a substitute. Even if the diff appears to have no UI changes, backend changes affect app behavior — always open the browser and test.`;
 }
 export function generateCoAuthorTrailer(ctx: TemplateContext): string {
  if (ctx.host === 'codex') {
    return 'Co-Authored-By: OpenAI Codex <noreply@openai.com>';
  }
  return 'Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>';
 }
--- a/ship/SKILL.md.tmpl
+++ b/ship/SKILL.md.tmpl
@@ -464,7 +464,7 @@ Save this summary — it goes into the PR body in Step 8.
 git commit -m "$(cat <<'EOF'
 chore: bump version and changelog (vX.Y.Z.W)
-Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
+{{CO_AUTHOR_TRAILER}}
 EOF
 )"
 ```
--- a/test/gen-skill-docs.test.ts
+++ b/test/gen-skill-docs.test.ts
@@ -3,6 +3,7 @@ import { COMMAND_DESCRIPTIONS } from '../browse/src/commands';
 import { SNAPSHOT_FLAGS } from '../browse/src/snapshot';
 import * as fs from 'fs';
 import * as path from 'path';
 import * as os from 'os';
 const ROOT = path.resolve(import.meta.dir, '..');
 const MAX_SKILL_DESCRIPTION_LENGTH = 1024;
@@ -1599,6 +1600,29 @@ describe('setup script validation', () => {
  });
 });
 describe('discover-skills hidden directory filtering', () => {
  test('discoverTemplates skips dot-prefixed directories', () => {
    const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'gstack-discover-'));
    try {
      // Create a hidden dir with a template (should be excluded)
      fs.mkdirSync(path.join(tmpDir, '.hidden'), { recursive: true });
      fs.writeFileSync(path.join(tmpDir, '.hidden', 'SKILL.md.tmpl'), '---\nname: evil\n---\ntest');
      // Create a visible dir with a template (should be included)
      fs.mkdirSync(path.join(tmpDir, 'visible'), { recursive: true });
      fs.writeFileSync(path.join(tmpDir, 'visible', 'SKILL.md.tmpl'), '---\nname: good\n---\ntest');
      const { discoverTemplates } = require('../scripts/discover-skills');
      const results = discoverTemplates(tmpDir);
      const dirs = results.map((r: { tmpl: string }) => r.tmpl);
      expect(dirs).toContain('visible/SKILL.md.tmpl');
      expect(dirs).not.toContain('.hidden/SKILL.md.tmpl');
    } finally {
      fs.rmSync(tmpDir, { recursive: true, force: true });
    }
  });
 });
 describe('telemetry', () => {
  test('generated SKILL.md contains telemetry start block', () => {
    const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
--- a/test/helpers/touchfiles.ts
+++ b/test/helpers/touchfiles.ts
@@ -151,7 +151,6 @@ export const E2E_TOUCHFILES: Record<string, string[]> = {
  // Skill routing — journey-stage tests (depend on ALL skill descriptions)
  'journey-ideation':       ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'],
  'journey-plan-eng':       ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'],
  'journey-think-bigger':   ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'],
  'journey-debug':          ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'],
  'journey-qa':             ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'],
  'journey-code-review':    ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'],
@@ -276,7 +275,6 @@ export const E2E_TIERS: Record<string, 'gate' | 'periodic'> = {
  // Skill routing — periodic (LLM routing is non-deterministic)
  'journey-ideation': 'periodic',
  'journey-plan-eng': 'periodic',
  'journey-think-bigger': 'periodic',
  'journey-debug': 'periodic',
  'journey-qa': 'periodic',
  'journey-code-review': 'periodic',
--- a/test/review-log.test.ts
+++ b/test/review-log.test.ts
@@ -0,0 +1,77 @@
 import { describe, test, expect, beforeEach, afterEach } from 'bun:test';
 import { execSync, ExecSyncOptionsWithStringEncoding } from 'child_process';
 import * as fs from 'fs';
 import * as path from 'path';
 import * as os from 'os';
 const ROOT = path.resolve(import.meta.dir, '..');
 const BIN = path.join(ROOT, 'bin');
 let tmpDir: string;
 let slugDir: string;
 function run(input: string, opts: { expectFail?: boolean } = {}): { stdout: string; exitCode: number } {
  const execOpts: ExecSyncOptionsWithStringEncoding = {
    cwd: ROOT,
    env: { ...process.env, GSTACK_HOME: tmpDir },
    encoding: 'utf-8',
    timeout: 10000,
  };
  try {
    const stdout = execSync(`${BIN}/gstack-review-log '${input.replace(/'/g, "'\\''")}'`, execOpts).trim();
    return { stdout, exitCode: 0 };
  } catch (e: any) {
    if (opts.expectFail) {
      return { stdout: e.stderr?.toString() || '', exitCode: e.status || 1 };
    }
    throw e;
  }
 }
 beforeEach(() => {
  tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'gstack-revlog-'));
  // gstack-review-log uses gstack-slug which needs a git repo — create the projects dir
  // with a predictable slug by pre-creating the directory structure
  slugDir = path.join(tmpDir, 'projects');
  fs.mkdirSync(slugDir, { recursive: true });
 });
 afterEach(() => {
  fs.rmSync(tmpDir, { recursive: true, force: true });
 });
 describe('gstack-review-log', () => {
  test('appends valid JSON to review JSONL file', () => {
    const input = '{"skill":"plan-eng-review","status":"clean"}';
    const result = run(input);
    expect(result.exitCode).toBe(0);
    // Find the JSONL file that was written
    const projectDirs = fs.readdirSync(slugDir);
    expect(projectDirs.length).toBeGreaterThan(0);
    const projectDir = path.join(slugDir, projectDirs[0]);
    const jsonlFiles = fs.readdirSync(projectDir).filter(f => f.endsWith('.jsonl'));
    expect(jsonlFiles.length).toBeGreaterThan(0);
    const content = fs.readFileSync(path.join(projectDir, jsonlFiles[0]), 'utf-8').trim();
    const parsed = JSON.parse(content);
    expect(parsed.skill).toBe('plan-eng-review');
    expect(parsed.status).toBe('clean');
  });
  test('rejects non-JSON input with non-zero exit code', () => {
    const result = run('not json at all', { expectFail: true });
    expect(result.exitCode).not.toBe(0);
    // Verify nothing was written
    const projectDirs = fs.readdirSync(slugDir);
    if (projectDirs.length > 0) {
      const projectDir = path.join(slugDir, projectDirs[0]);
      const jsonlFiles = fs.readdirSync(projectDir).filter(f => f.endsWith('.jsonl'));
      if (jsonlFiles.length > 0) {
        const content = fs.readFileSync(path.join(projectDir, jsonlFiles[0]), 'utf-8').trim();
        expect(content).toBe('');
      }
    }
  });
 });
--- a/test/skill-e2e-bws.test.ts
+++ b/test/skill-e2e-bws.test.ts
@@ -45,7 +45,7 @@ describeIfSelected('Skill E2E tests', [
 4. $B screenshot /tmp/skill-e2e-test.png
 Report the results of each command.`,
      workingDirectory: tmpDir,
-      maxTurns: 5,
+      maxTurns: 7,
      timeout: 60_000,
      testName: 'browse-basic',
      runId,
--- a/test/skill-e2e-review.test.ts
+++ b/test/skill-e2e-review.test.ts
@@ -340,21 +340,22 @@ Write your findings to ${dir}/review-output.md`,
    run('git', ['add', 'app.ts'], dir);
    run('git', ['commit', '-m', 'feat: update to v2'], dir);
-    // Copy ship skill
+    // Extract only Step 0 (base branch detection) from ship/SKILL.md
-    fs.copyFileSync(path.join(ROOT, 'ship', 'SKILL.md'), path.join(dir, 'ship-SKILL.md'));
+    // (copying the full 1900-line file causes agent context bloat and flaky timeouts)
    const fullShipSkill = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
    const step0Start = fullShipSkill.indexOf('## Step 0: Detect platform and base branch');
    const step0End = fullShipSkill.indexOf('## Step 1: Pre-flight');
    const shipSection = fullShipSkill.slice(step0Start, step0End > step0Start ? step0End : undefined);
    fs.writeFileSync(path.join(dir, 'ship-SKILL.md'), shipSection);
    const result = await runSkillTest({
-      prompt: `Read ship-SKILL.md for the ship workflow.
+      prompt: `Read ship-SKILL.md. It contains Step 0 (Detect base branch) from the ship workflow.
-Skip the preamble bash block, lake intro, telemetry, and contributor mode sections — go straight to Step 0.
+Run the base branch detection. Since there is no remote, gh commands will fail — fall back to main.
-Run ONLY Step 0 (Detect base branch) and Step 1 (Pre-flight) from the ship workflow.
+Then run git diff and git log against the detected base branch.
 Since there is no remote, gh commands will fail — fall back to main.
-After completing Step 0 and Step 1, STOP. Do NOT proceed to Step 2 or beyond.
+Write a summary to ${dir}/ship-preflight.md including:
 Do NOT push, create PRs, or modify VERSION/CHANGELOG.
 Write a summary of what you detected to ${dir}/ship-preflight.md including:
 - The detected base branch name
 - The current branch name
 - The diff stat against the base branch`,
@@ -580,8 +581,13 @@ describeIfSelected('Review Dashboard Via Attribution', ['review-dashboard-via'],
    ].join('\n'));
    fs.chmodSync(path.join(mockBinDir, 'gstack-review-read'), 0o755);
-    // Copy ship skill
+    // Extract only the Review Readiness Dashboard section from ship/SKILL.md
-    fs.copyFileSync(path.join(ROOT, 'ship', 'SKILL.md'), path.join(dashDir, 'ship-SKILL.md'));
+    // (copying the full 1900-line file causes agent context bloat and timeouts)
    const fullSkill = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
    const dashStart = fullSkill.indexOf('## Review Readiness Dashboard');
    const dashEnd = fullSkill.indexOf('\n---\n', dashStart);
    const dashSection = fullSkill.slice(dashStart, dashEnd > dashStart ? dashEnd : undefined);
    fs.writeFileSync(path.join(dashDir, 'ship-SKILL.md'), dashSection);
  });
  afterAll(() => {
@@ -605,7 +611,7 @@ Skip the preamble, lake intro, telemetry, and all other ship steps.
 Write the dashboard output to ${dashDir}/dashboard-output.md`,
      workingDirectory: dashDir,
      maxTurns: 12,
-      timeout: 90_000,
+      timeout: 180_000,
      testName: 'review-dashboard-via',
      runId,
    });
@@ -639,7 +645,7 @@ Write the dashboard output to ${dashDir}/dashboard-output.md`,
    );
    // Ship dashboard should not gate when eng review is clear
    expect(gateQuestions).toHaveLength(0);
-  }, 120_000);
+  }, 240_000);
 });
 // Module-level afterAll — finalize eval collector after all tests complete
--- a/test/skill-routing-e2e.test.ts
+++ b/test/skill-routing-e2e.test.ts
@@ -250,56 +250,10 @@ describeE2E('Skill Routing E2E — Developer Journey', () => {
    }
  }, 150_000);
-  testIfSelected('journey-think-bigger', async () => {
+  // Removed: journey-think-bigger
-    const tmpDir = createRoutingWorkDir('think-bigger');
+  // Tested ambiguous routing ("think bigger" → plan-ceo-review) but Claude
-    try {
+  // legitimately answers directly instead of routing. Never passed reliably.
-      fs.writeFileSync(path.join(tmpDir, 'plan.md'), `# Waitlist App Architecture
+  // The other 10 journey tests cover routing with clear signals.
 ## Components
 - REST API (Express.js)
 - PostgreSQL database
 - React frontend
 - SMS integration (Twilio)
 ## Data Model
 - restaurants (id, name, settings)
 - parties (id, restaurant_id, name, size, phone, status, created_at)
 - wait_estimates (id, restaurant_id, avg_wait_minutes)
 ## API Endpoints
 - POST /api/parties - add party to waitlist
 - GET /api/parties - list current waitlist
 - PATCH /api/parties/:id/status - update party status
 - GET /api/estimate - get current wait estimate
 `);
      spawnSync('git', ['add', '.'], { cwd: tmpDir, stdio: 'pipe', timeout: 5000 });
      spawnSync('git', ['commit', '-m', 'initial'], { cwd: tmpDir, stdio: 'pipe', timeout: 5000 });
      const testName = 'journey-think-bigger';
      const expectedSkill = 'plan-ceo-review';
      const result = await runSkillTest({
        prompt: "Actually, looking at this plan again, I feel like we're thinking too small. We're just doing waitlists but what about the whole restaurant guest experience? Is there a bigger opportunity here we should go after?",
        workingDirectory: tmpDir,
        maxTurns: 5,
        allowedTools: ['Skill', 'Read', 'Bash', 'Glob', 'Grep'],
        timeout: 120_000,
        testName,
        runId,
      });
      const skillCalls = result.toolCalls.filter(tc => tc.tool === 'Skill');
      const actualSkill = skillCalls.length > 0 ? skillCalls[0]?.input?.skill : undefined;
      logCost(`journey: ${testName}`, result);
      recordRouting(testName, result, expectedSkill, actualSkill);
      expect(skillCalls.length, `Expected Skill tool to be called but got 0 calls. Claude may have answered directly without invoking a skill. Tool calls: ${result.toolCalls.map(tc => tc.tool).join(', ')}`).toBeGreaterThan(0);
      const validSkills = ['plan-ceo-review', 'office-hours'];
      expect(validSkills, `Expected one of ${validSkills.join('/')} but got ${actualSkill}`).toContain(actualSkill);
    } finally {
      fs.rmSync(tmpDir, { recursive: true, force: true });
    }
  }, 180_000);
  testIfSelected('journey-debug', async () => {
    const tmpDir = createRoutingWorkDir('debug');
--- a/test/telemetry.test.ts
+++ b/test/telemetry.test.ts
@@ -125,6 +125,82 @@ describe('gstack-telemetry-log', () => {
    expect(events[0]).toHaveProperty('_branch');
  });
  // ─── json_safe() injection prevention tests ────────────────
  test('sanitizes skill name with quote injection attempt', () => {
    setConfig('telemetry', 'anonymous');
    run(`${BIN}/gstack-telemetry-log --skill 'review","injected":"true' --duration 10 --outcome success --session-id inj-1`);
    const lines = readJsonl();
    expect(lines).toHaveLength(1);
    // Must be valid JSON (no injection — quotes stripped, so no field injection possible)
    const event = JSON.parse(lines[0]);
    // The key check: no injected top-level property was created
    expect(event).not.toHaveProperty('injected');
    // Skill field should have quotes stripped but content preserved
    expect(event.skill).not.toContain('"');
  });
  test('truncates skill name exceeding 200 chars', () => {
    setConfig('telemetry', 'anonymous');
    const longSkill = 'a'.repeat(250);
    run(`${BIN}/gstack-telemetry-log --skill '${longSkill}' --duration 10 --outcome success --session-id trunc-1`);
    const events = parseJsonl();
    expect(events[0].skill.length).toBeLessThanOrEqual(200);
  });
  test('sanitizes outcome with newline injection attempt', () => {
    setConfig('telemetry', 'anonymous');
    // Use printf to pass actual newline in the argument
    run(`bash -c 'OUTCOME=$(printf "success\\nfake\\":\\"true"); ${BIN}/gstack-telemetry-log --skill qa --duration 10 --outcome "$OUTCOME" --session-id inj-2'`);
    const lines = readJsonl();
    expect(lines).toHaveLength(1);
    const event = JSON.parse(lines[0]);
    expect(event).not.toHaveProperty('fake');
  });
  test('sanitizes session_id with backslash-quote injection', () => {
    setConfig('telemetry', 'anonymous');
    run(`${BIN}/gstack-telemetry-log --skill qa --duration 10 --outcome success --session-id 'id\\\\"","x":"y'`);
    const lines = readJsonl();
    expect(lines).toHaveLength(1);
    const event = JSON.parse(lines[0]);
    expect(event).not.toHaveProperty('x');
  });
  test('sanitizes error_class with quote injection', () => {
    setConfig('telemetry', 'anonymous');
    run(`${BIN}/gstack-telemetry-log --skill qa --duration 10 --outcome error --error-class 'timeout","extra":"val' --session-id inj-3`);
    const lines = readJsonl();
    expect(lines).toHaveLength(1);
    const event = JSON.parse(lines[0]);
    expect(event).not.toHaveProperty('extra');
  });
  test('sanitizes failed_step with quote injection', () => {
    setConfig('telemetry', 'anonymous');
    run(`${BIN}/gstack-telemetry-log --skill qa --duration 10 --outcome error --failed-step 'step1","hacked":"yes' --session-id inj-4`);
    const lines = readJsonl();
    expect(lines).toHaveLength(1);
    const event = JSON.parse(lines[0]);
    expect(event).not.toHaveProperty('hacked');
  });
  test('escapes error_message quotes and preserves content', () => {
    setConfig('telemetry', 'anonymous');
    run(`${BIN}/gstack-telemetry-log --skill qa --duration 10 --outcome error --error-message 'Error: file "test.txt" not found' --session-id inj-5`);
    const lines = readJsonl();
    expect(lines).toHaveLength(1);
    const event = JSON.parse(lines[0]);
    expect(event.error_message).toContain('file');
    expect(event.error_message).toContain('not found');
  });
  test('creates analytics directory if missing', () => {
    // Remove analytics dir
    const analyticsDir = path.join(tmpDir, 'analytics');