merge: resolve CHANGELOG conflict with main, bump to v0.12.8.0

Main claimed v0.12.7.0 for community PRs + security hardening.
Bumped our codex-cwd-bug entry to v0.12.8.0. Both entries preserved.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-03-26 23:28:18 -06:00
22 changed files with 335 additions and 84 deletions

View File

@@ -1,6 +1,6 @@
# Changelog # Changelog
## [0.12.7.0] - 2026-03-27 — Codex No Longer Reviews the Wrong Project ## [0.12.8.0] - 2026-03-27 — Codex No Longer Reviews the Wrong Project
When you run gstack in Conductor with multiple workspaces open, Codex could silently review the wrong project. The `codex exec -C` flag resolved the repo root inline via `$(git rev-parse --show-toplevel)`, which evaluates in whatever cwd the background shell inherits. In multi-workspace environments, that cwd might be a different project entirely. When you run gstack in Conductor with multiple workspaces open, Codex could silently review the wrong project. The `codex exec -C` flag resolved the repo root inline via `$(git rev-parse --show-toplevel)`, which evaluates in whatever cwd the background shell inherits. In multi-workspace environments, that cwd might be a different project entirely.
@@ -18,6 +18,30 @@ When you run gstack in Conductor with multiple workspaces open, Codex could sile
- **Regression test** that scans all `.tmpl`, resolver `.ts`, and generated `SKILL.md` files for codex commands using inline `$(git rev-parse --show-toplevel)`. Prevents reintroduction. - **Regression test** that scans all `.tmpl`, resolver `.ts`, and generated `SKILL.md` files for codex commands using inline `$(git rev-parse --show-toplevel)`. Prevents reintroduction.
## [0.12.7.0] - 2026-03-27 — Community PRs + Security Hardening
Seven community contributions merged, reviewed, and tested. Plus security hardening for telemetry and review logging, and E2E test stability fixes.
### Added
- **Dotfile filtering in skill discovery.** Hidden directories (`.git`, `.vscode`, etc.) are no longer picked up as skill templates.
- **JSON validation gate in review-log.** Malformed input is rejected instead of appended to the JSONL file.
- **Telemetry input sanitization.** All string fields are stripped of quotes, backslashes, and control characters before being written to JSONL.
- **Host-specific co-author trailers.** `/ship` and `/document-release` now use the correct co-author line for Codex vs Claude.
- **10 new security tests** covering telemetry injection, review-log validation, and dotfile filtering.
### Fixed
- **File paths starting with `./` no longer treated as CSS selectors.** `$B screenshot ./path/to/file.png` now works instead of trying to find a CSS element.
- **Build chain resilience.** `gen:skill-docs` failure no longer blocks binary compilation.
- **Update checker fall-through.** After upgrading, the checker now also checks for newer remote versions instead of stopping.
- **Flaky E2E tests stabilized.** `browse-basic`, `ship-base-branch`, and `review-dashboard-via` tests now pass reliably by extracting only relevant SKILL.md sections instead of copying full 1900-line files into test fixtures.
- **Removed unreliable `journey-think-bigger` routing test.** Never passed reliably because the routing signal was too ambiguous. 10 other journey tests cover routing with clear signals.
### For contributors
- New CLAUDE.md rule: never copy full SKILL.md files into E2E test fixtures. Extract the relevant section only.
## [0.12.6.0] - 2026-03-27 — Sidebar Knows What Page You're On ## [0.12.6.0] - 2026-03-27 — Sidebar Knows What Page You're On
The Chrome sidebar agent used to navigate to the wrong page when you asked it to do something. If you'd manually browsed to a site, the sidebar would ignore that and go to whatever Playwright last saw (often Hacker News from the demo). Now it works. The Chrome sidebar agent used to navigate to the wrong page when you asked it to do something. If you'd manually browsed to a site, the sidebar would ignore that and go to whatever Playwright last saw (often Hacker News from the demo). Now it works.

View File

@@ -298,6 +298,30 @@ them. Report progress at each check (which tests passed, which are running, any
failures so far). The user wants to see the run complete, not a promise that failures so far). The user wants to see the run complete, not a promise that
you'll check later. you'll check later.
## E2E test fixtures: extract, don't copy
**NEVER copy a full SKILL.md file into an E2E test fixture.** SKILL.md files are
1500-2000 lines. When `claude -p` reads a file that large, context bloat causes
timeouts, flaky turn limits, and tests that take 5-10x longer than necessary.
Instead, extract only the section the test actually needs:
```typescript
// BAD — agent reads 1900 lines, burns tokens on irrelevant sections
fs.copyFileSync(path.join(ROOT, 'ship', 'SKILL.md'), path.join(dir, 'ship-SKILL.md'));
// GOOD — agent reads ~60 lines, finishes in 38s instead of timing out
const full = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
const start = full.indexOf('## Review Readiness Dashboard');
const end = full.indexOf('\n---\n', start);
fs.writeFileSync(path.join(dir, 'ship-SKILL.md'), full.slice(start, end > start ? end : undefined));
```
Also when running targeted E2E tests to debug failures:
- Run in **foreground** (`bun test ...`), not background with `&` and `tee`
- Never `pkill` running eval processes and restart — you lose results and waste money
- One clean run beats three killed-and-restarted runs
## Deploying to the active skill ## Deploying to the active skill
The active skill lives at `~/.claude/skills/gstack/`. After making changes: The active skill lives at `~/.claude/skills/gstack/`. After making changes:

View File

@@ -1 +1 @@
0.12.7.0 0.12.8.0

View File

@@ -6,4 +6,13 @@ SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
eval "$("$SCRIPT_DIR/gstack-slug" 2>/dev/null)" eval "$("$SCRIPT_DIR/gstack-slug" 2>/dev/null)"
GSTACK_HOME="${GSTACK_HOME:-$HOME/.gstack}" GSTACK_HOME="${GSTACK_HOME:-$HOME/.gstack}"
mkdir -p "$GSTACK_HOME/projects/$SLUG" mkdir -p "$GSTACK_HOME/projects/$SLUG"
echo "$1" >> "$GSTACK_HOME/projects/$SLUG/$BRANCH-reviews.jsonl"
# Validate: input must be parseable JSON (reject malformed or injection attempts)
INPUT="$1"
if ! printf '%s' "$INPUT" | bun -e "JSON.parse(await Bun.stdin.text())" 2>/dev/null; then
# Not valid JSON — refuse to append
echo "gstack-review-log: invalid JSON, skipping" >&2
exit 1
fi
echo "$INPUT" >> "$GSTACK_HOME/projects/$SLUG/$BRANCH-reviews.jsonl"

View File

@@ -151,15 +151,23 @@ fi
# ─── Construct and append JSON ─────────────────────────────── # ─── Construct and append JSON ───────────────────────────────
mkdir -p "$ANALYTICS_DIR" mkdir -p "$ANALYTICS_DIR"
# Escape null fields # Sanitize string fields for JSON safety (strip quotes, backslashes, control chars)
json_safe() { printf '%s' "$1" | tr -d '"\\\n\r\t' | head -c 200; }
SKILL="$(json_safe "$SKILL")"
OUTCOME="$(json_safe "$OUTCOME")"
SESSION_ID="$(json_safe "$SESSION_ID")"
SOURCE="$(json_safe "$SOURCE")"
EVENT_TYPE="$(json_safe "$EVENT_TYPE")"
# Escape null fields — sanitize ERROR_CLASS and FAILED_STEP via json_safe()
ERR_FIELD="null" ERR_FIELD="null"
[ -n "$ERROR_CLASS" ] && ERR_FIELD="\"$ERROR_CLASS\"" [ -n "$ERROR_CLASS" ] && ERR_FIELD="\"$(json_safe "$ERROR_CLASS")\""
ERR_MSG_FIELD="null" ERR_MSG_FIELD="null"
[ -n "$ERROR_MESSAGE" ] && ERR_MSG_FIELD="\"$(echo "$ERROR_MESSAGE" | head -c 200 | sed 's/"/\\"/g')\"" [ -n "$ERROR_MESSAGE" ] && ERR_MSG_FIELD="\"$(printf '%s' "$ERROR_MESSAGE" | head -c 200 | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' -e 's/ /\\t/g' | tr '\n\r' ' ')\""
STEP_FIELD="null" STEP_FIELD="null"
[ -n "$FAILED_STEP" ] && STEP_FIELD="\"$(echo "$FAILED_STEP" | head -c 30)\"" [ -n "$FAILED_STEP" ] && STEP_FIELD="\"$(json_safe "$FAILED_STEP")\""
# Cap unreasonable durations # Cap unreasonable durations
if [ -n "$DURATION" ] && [ "$DURATION" -gt 86400 ] 2>/dev/null; then if [ -n "$DURATION" ] && [ "$DURATION" -gt 86400 ] 2>/dev/null; then

View File

@@ -113,12 +113,11 @@ if [ -f "$MARKER_FILE" ]; then
OLD="$(cat "$MARKER_FILE" 2>/dev/null | tr -d '[:space:]')" OLD="$(cat "$MARKER_FILE" 2>/dev/null | tr -d '[:space:]')"
rm -f "$MARKER_FILE" rm -f "$MARKER_FILE"
rm -f "$SNOOZE_FILE" rm -f "$SNOOZE_FILE"
mkdir -p "$STATE_DIR"
echo "UP_TO_DATE $LOCAL" > "$CACHE_FILE"
if [ -n "$OLD" ]; then if [ -n "$OLD" ]; then
echo "JUST_UPGRADED $OLD $LOCAL" echo "JUST_UPGRADED $OLD $LOCAL"
fi fi
exit 0 # Don't exit — fall through to remote check in case
# more updates landed since the upgrade
fi fi
# ─── Step 3: Check cache freshness ────────────────────────── # ─── Step 3: Check cache freshness ──────────────────────────

View File

@@ -137,7 +137,11 @@ export async function handleMetaCommand(
// Separate target (selector/@ref) from output path // Separate target (selector/@ref) from output path
for (const arg of remaining) { for (const arg of remaining) {
if (arg.startsWith('@e') || arg.startsWith('@c') || arg.startsWith('.') || arg.startsWith('#') || arg.includes('[')) { // File paths containing / and ending with an image/pdf extension are never CSS selectors
const isFilePath = arg.includes('/') && /\.(png|jpe?g|webp|pdf)$/i.test(arg);
if (isFilePath) {
outputPath = arg;
} else if (arg.startsWith('@e') || arg.startsWith('@c') || arg.startsWith('.') || arg.startsWith('#') || arg.includes('[')) {
targetSelector = arg; targetSelector = arg;
} else { } else {
outputPath = arg; outputPath = arg;

View File

@@ -543,6 +543,17 @@ describe('Visual', () => {
} }
}); });
test('screenshot treats relative dot-slash path as file path, not CSS selector', async () => {
await handleWriteCommand('goto', [baseUrl + '/basic.html'], bm);
// ./path/to/file.png must be treated as output path, not a CSS class selector (#495)
const relPath = './browse-test-dotpath.png';
const absPath = path.resolve(relPath);
const result = await handleMetaCommand('screenshot', [relPath], bm, async () => {});
expect(result).toContain('Screenshot saved');
expect(fs.existsSync(absPath)).toBe(true);
fs.unlinkSync(absPath);
});
test('screenshot with nonexistent selector throws timeout', async () => { test('screenshot with nonexistent selector throws timeout', async () => {
await handleWriteCommand('goto', [baseUrl + '/basic.html'], bm); await handleWriteCommand('goto', [baseUrl + '/basic.html'], bm);
try { try {

View File

@@ -92,6 +92,35 @@ describe('gstack-update-check', () => {
expect(cache).toContain('UP_TO_DATE'); expect(cache).toContain('UP_TO_DATE');
}); });
// ─── Path C2: Just-upgraded marker + newer remote ──────────
test('just-upgraded marker does not mask newer remote version', () => {
writeFileSync(join(gstackDir, 'VERSION'), '0.4.0\n');
writeFileSync(join(stateDir, 'just-upgraded-from'), '0.3.3\n');
writeFileSync(join(gstackDir, 'REMOTE_VERSION'), '0.5.0\n');
const { exitCode, stdout } = run();
expect(exitCode).toBe(0);
// Should output both the just-upgraded notice AND the new upgrade
expect(stdout).toContain('JUST_UPGRADED 0.3.3 0.4.0');
expect(stdout).toContain('UPGRADE_AVAILABLE 0.4.0 0.5.0');
// Cache should reflect the upgrade available, not UP_TO_DATE
const cache = readFileSync(join(stateDir, 'last-update-check'), 'utf-8');
expect(cache).toContain('UPGRADE_AVAILABLE 0.4.0 0.5.0');
});
// ─── Path C3: Just-upgraded marker + remote matches local ──
test('just-upgraded with no further updates writes UP_TO_DATE cache', () => {
writeFileSync(join(gstackDir, 'VERSION'), '0.4.0\n');
writeFileSync(join(stateDir, 'just-upgraded-from'), '0.3.3\n');
writeFileSync(join(gstackDir, 'REMOTE_VERSION'), '0.4.0\n');
const { exitCode, stdout } = run();
expect(exitCode).toBe(0);
expect(stdout).toBe('JUST_UPGRADED 0.3.3 0.4.0');
const cache = readFileSync(join(stateDir, 'last-update-check'), 'utf-8');
expect(cache).toContain('UP_TO_DATE');
});
// ─── Path D1: Fresh cache, UP_TO_DATE ─────────────────────── // ─── Path D1: Fresh cache, UP_TO_DATE ───────────────────────
test('exits silently when cache says UP_TO_DATE and is fresh', () => { test('exits silently when cache says UP_TO_DATE and is fresh', () => {
writeFileSync(join(gstackDir, 'VERSION'), '0.3.3\n'); writeFileSync(join(gstackDir, 'VERSION'), '0.3.3\n');

View File

@@ -280,7 +280,7 @@ committing.
git commit -m "$(cat <<'EOF' git commit -m "$(cat <<'EOF'
docs: update project documentation for vX.Y.Z.W docs: update project documentation for vX.Y.Z.W
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> {{CO_AUTHOR_TRAILER}}
EOF EOF
)" )"
``` ```

View File

@@ -1,6 +1,6 @@
{ {
"name": "gstack", "name": "gstack",
"version": "0.12.7.0", "version": "0.12.8.0",
"description": "Garry's Stack — Claude Code skills + fast headless browser. One repo, one install, entire AI engineering workflow.", "description": "Garry's Stack — Claude Code skills + fast headless browser. One repo, one install, entire AI engineering workflow.",
"license": "MIT", "license": "MIT",
"type": "module", "type": "module",
@@ -8,7 +8,7 @@
"browse": "./browse/dist/browse" "browse": "./browse/dist/browse"
}, },
"scripts": { "scripts": {
"build": "bun run gen:skill-docs && bun run gen:skill-docs --host codex && bun build --compile browse/src/cli.ts --outfile browse/dist/browse && bun build --compile browse/src/find-browse.ts --outfile browse/dist/find-browse && bun build --compile bin/gstack-global-discover.ts --outfile bin/gstack-global-discover && bash browse/scripts/build-node-server.sh && git rev-parse HEAD > browse/dist/.version && rm -f .*.bun-build || true", "build": "bun run gen:skill-docs; bun run gen:skill-docs --host codex; bun build --compile browse/src/cli.ts --outfile browse/dist/browse && bun build --compile browse/src/find-browse.ts --outfile browse/dist/find-browse && bun build --compile bin/gstack-global-discover.ts --outfile bin/gstack-global-discover && bash browse/scripts/build-node-server.sh && git rev-parse HEAD > browse/dist/.version && rm -f .*.bun-build || true",
"gen:skill-docs": "bun run scripts/gen-skill-docs.ts", "gen:skill-docs": "bun run scripts/gen-skill-docs.ts",
"dev": "bun run browse/src/cli.ts", "dev": "bun run browse/src/cli.ts",
"server": "bun run browse/src/server.ts", "server": "bun run browse/src/server.ts",

View File

@@ -10,7 +10,7 @@ const SKIP = new Set(['node_modules', '.git', 'dist']);
function subdirs(root: string): string[] { function subdirs(root: string): string[] {
return fs.readdirSync(root, { withFileTypes: true }) return fs.readdirSync(root, { withFileTypes: true })
.filter(d => d.isDirectory() && !SKIP.has(d.name)) .filter(d => d.isDirectory() && !d.name.startsWith('.') && !SKIP.has(d.name))
.map(d => d.name); .map(d => d.name);
} }

View File

@@ -12,7 +12,7 @@ import { generateCommandReference, generateSnapshotFlags, generateBrowseSetup }
import { generateDesignMethodology, generateDesignHardRules, generateDesignOutsideVoices, generateDesignReviewLite, generateDesignSketch } from './design'; import { generateDesignMethodology, generateDesignHardRules, generateDesignOutsideVoices, generateDesignReviewLite, generateDesignSketch } from './design';
import { generateTestBootstrap, generateTestCoverageAuditPlan, generateTestCoverageAuditShip, generateTestCoverageAuditReview } from './testing'; import { generateTestBootstrap, generateTestCoverageAuditPlan, generateTestCoverageAuditShip, generateTestCoverageAuditReview } from './testing';
import { generateReviewDashboard, generatePlanFileReviewReport, generateSpecReviewLoop, generateBenefitsFrom, generateCodexSecondOpinion, generateAdversarialStep, generateCodexPlanReview, generatePlanCompletionAuditShip, generatePlanCompletionAuditReview, generatePlanVerificationExec } from './review'; import { generateReviewDashboard, generatePlanFileReviewReport, generateSpecReviewLoop, generateBenefitsFrom, generateCodexSecondOpinion, generateAdversarialStep, generateCodexPlanReview, generatePlanCompletionAuditShip, generatePlanCompletionAuditReview, generatePlanVerificationExec } from './review';
import { generateSlugEval, generateSlugSetup, generateBaseBranchDetect, generateDeployBootstrap, generateQAMethodology } from './utility'; import { generateSlugEval, generateSlugSetup, generateBaseBranchDetect, generateDeployBootstrap, generateQAMethodology, generateCoAuthorTrailer } from './utility';
export const RESOLVERS: Record<string, (ctx: TemplateContext) => string> = { export const RESOLVERS: Record<string, (ctx: TemplateContext) => string> = {
SLUG_EVAL: generateSlugEval, SLUG_EVAL: generateSlugEval,
@@ -44,4 +44,5 @@ export const RESOLVERS: Record<string, (ctx: TemplateContext) => string> = {
PLAN_COMPLETION_AUDIT_SHIP: generatePlanCompletionAuditShip, PLAN_COMPLETION_AUDIT_SHIP: generatePlanCompletionAuditShip,
PLAN_COMPLETION_AUDIT_REVIEW: generatePlanCompletionAuditReview, PLAN_COMPLETION_AUDIT_REVIEW: generatePlanCompletionAuditReview,
PLAN_VERIFICATION_EXEC: generatePlanVerificationExec, PLAN_VERIFICATION_EXEC: generatePlanVerificationExec,
CO_AUTHOR_TRAILER: generateCoAuthorTrailer,
}; };

View File

@@ -365,3 +365,10 @@ Minimum 0 per category.
11. **Show screenshots to the user.** After every \`$B screenshot\`, \`$B snapshot -a -o\`, or \`$B responsive\` command, use the Read tool on the output file(s) so the user can see them inline. For \`responsive\` (3 files), Read all three. This is critical — without it, screenshots are invisible to the user. 11. **Show screenshots to the user.** After every \`$B screenshot\`, \`$B snapshot -a -o\`, or \`$B responsive\` command, use the Read tool on the output file(s) so the user can see them inline. For \`responsive\` (3 files), Read all three. This is critical — without it, screenshots are invisible to the user.
12. **Never refuse to use the browser.** When the user invokes /qa or /qa-only, they are requesting browser-based testing. Never suggest evals, unit tests, or other alternatives as a substitute. Even if the diff appears to have no UI changes, backend changes affect app behavior — always open the browser and test.`; 12. **Never refuse to use the browser.** When the user invokes /qa or /qa-only, they are requesting browser-based testing. Never suggest evals, unit tests, or other alternatives as a substitute. Even if the diff appears to have no UI changes, backend changes affect app behavior — always open the browser and test.`;
} }
export function generateCoAuthorTrailer(ctx: TemplateContext): string {
if (ctx.host === 'codex') {
return 'Co-Authored-By: OpenAI Codex <noreply@openai.com>';
}
return 'Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>';
}

View File

@@ -464,7 +464,7 @@ Save this summary — it goes into the PR body in Step 8.
git commit -m "$(cat <<'EOF' git commit -m "$(cat <<'EOF'
chore: bump version and changelog (vX.Y.Z.W) chore: bump version and changelog (vX.Y.Z.W)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> {{CO_AUTHOR_TRAILER}}
EOF EOF
)" )"
``` ```

View File

@@ -3,6 +3,7 @@ import { COMMAND_DESCRIPTIONS } from '../browse/src/commands';
import { SNAPSHOT_FLAGS } from '../browse/src/snapshot'; import { SNAPSHOT_FLAGS } from '../browse/src/snapshot';
import * as fs from 'fs'; import * as fs from 'fs';
import * as path from 'path'; import * as path from 'path';
import * as os from 'os';
const ROOT = path.resolve(import.meta.dir, '..'); const ROOT = path.resolve(import.meta.dir, '..');
const MAX_SKILL_DESCRIPTION_LENGTH = 1024; const MAX_SKILL_DESCRIPTION_LENGTH = 1024;
@@ -1599,6 +1600,29 @@ describe('setup script validation', () => {
}); });
}); });
describe('discover-skills hidden directory filtering', () => {
test('discoverTemplates skips dot-prefixed directories', () => {
const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'gstack-discover-'));
try {
// Create a hidden dir with a template (should be excluded)
fs.mkdirSync(path.join(tmpDir, '.hidden'), { recursive: true });
fs.writeFileSync(path.join(tmpDir, '.hidden', 'SKILL.md.tmpl'), '---\nname: evil\n---\ntest');
// Create a visible dir with a template (should be included)
fs.mkdirSync(path.join(tmpDir, 'visible'), { recursive: true });
fs.writeFileSync(path.join(tmpDir, 'visible', 'SKILL.md.tmpl'), '---\nname: good\n---\ntest');
const { discoverTemplates } = require('../scripts/discover-skills');
const results = discoverTemplates(tmpDir);
const dirs = results.map((r: { tmpl: string }) => r.tmpl);
expect(dirs).toContain('visible/SKILL.md.tmpl');
expect(dirs).not.toContain('.hidden/SKILL.md.tmpl');
} finally {
fs.rmSync(tmpDir, { recursive: true, force: true });
}
});
});
describe('telemetry', () => { describe('telemetry', () => {
test('generated SKILL.md contains telemetry start block', () => { test('generated SKILL.md contains telemetry start block', () => {
const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8'); const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');

View File

@@ -151,7 +151,6 @@ export const E2E_TOUCHFILES: Record<string, string[]> = {
// Skill routing — journey-stage tests (depend on ALL skill descriptions) // Skill routing — journey-stage tests (depend on ALL skill descriptions)
'journey-ideation': ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'], 'journey-ideation': ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'],
'journey-plan-eng': ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'], 'journey-plan-eng': ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'],
'journey-think-bigger': ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'],
'journey-debug': ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'], 'journey-debug': ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'],
'journey-qa': ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'], 'journey-qa': ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'],
'journey-code-review': ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'], 'journey-code-review': ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'],
@@ -276,7 +275,6 @@ export const E2E_TIERS: Record<string, 'gate' | 'periodic'> = {
// Skill routing — periodic (LLM routing is non-deterministic) // Skill routing — periodic (LLM routing is non-deterministic)
'journey-ideation': 'periodic', 'journey-ideation': 'periodic',
'journey-plan-eng': 'periodic', 'journey-plan-eng': 'periodic',
'journey-think-bigger': 'periodic',
'journey-debug': 'periodic', 'journey-debug': 'periodic',
'journey-qa': 'periodic', 'journey-qa': 'periodic',
'journey-code-review': 'periodic', 'journey-code-review': 'periodic',

77
test/review-log.test.ts Normal file
View File

@@ -0,0 +1,77 @@
import { describe, test, expect, beforeEach, afterEach } from 'bun:test';
import { execSync, ExecSyncOptionsWithStringEncoding } from 'child_process';
import * as fs from 'fs';
import * as path from 'path';
import * as os from 'os';
const ROOT = path.resolve(import.meta.dir, '..');
const BIN = path.join(ROOT, 'bin');
let tmpDir: string;
let slugDir: string;
function run(input: string, opts: { expectFail?: boolean } = {}): { stdout: string; exitCode: number } {
const execOpts: ExecSyncOptionsWithStringEncoding = {
cwd: ROOT,
env: { ...process.env, GSTACK_HOME: tmpDir },
encoding: 'utf-8',
timeout: 10000,
};
try {
const stdout = execSync(`${BIN}/gstack-review-log '${input.replace(/'/g, "'\\''")}'`, execOpts).trim();
return { stdout, exitCode: 0 };
} catch (e: any) {
if (opts.expectFail) {
return { stdout: e.stderr?.toString() || '', exitCode: e.status || 1 };
}
throw e;
}
}
beforeEach(() => {
tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'gstack-revlog-'));
// gstack-review-log uses gstack-slug which needs a git repo — create the projects dir
// with a predictable slug by pre-creating the directory structure
slugDir = path.join(tmpDir, 'projects');
fs.mkdirSync(slugDir, { recursive: true });
});
afterEach(() => {
fs.rmSync(tmpDir, { recursive: true, force: true });
});
describe('gstack-review-log', () => {
test('appends valid JSON to review JSONL file', () => {
const input = '{"skill":"plan-eng-review","status":"clean"}';
const result = run(input);
expect(result.exitCode).toBe(0);
// Find the JSONL file that was written
const projectDirs = fs.readdirSync(slugDir);
expect(projectDirs.length).toBeGreaterThan(0);
const projectDir = path.join(slugDir, projectDirs[0]);
const jsonlFiles = fs.readdirSync(projectDir).filter(f => f.endsWith('.jsonl'));
expect(jsonlFiles.length).toBeGreaterThan(0);
const content = fs.readFileSync(path.join(projectDir, jsonlFiles[0]), 'utf-8').trim();
const parsed = JSON.parse(content);
expect(parsed.skill).toBe('plan-eng-review');
expect(parsed.status).toBe('clean');
});
test('rejects non-JSON input with non-zero exit code', () => {
const result = run('not json at all', { expectFail: true });
expect(result.exitCode).not.toBe(0);
// Verify nothing was written
const projectDirs = fs.readdirSync(slugDir);
if (projectDirs.length > 0) {
const projectDir = path.join(slugDir, projectDirs[0]);
const jsonlFiles = fs.readdirSync(projectDir).filter(f => f.endsWith('.jsonl'));
if (jsonlFiles.length > 0) {
const content = fs.readFileSync(path.join(projectDir, jsonlFiles[0]), 'utf-8').trim();
expect(content).toBe('');
}
}
});
});

View File

@@ -45,7 +45,7 @@ describeIfSelected('Skill E2E tests', [
4. $B screenshot /tmp/skill-e2e-test.png 4. $B screenshot /tmp/skill-e2e-test.png
Report the results of each command.`, Report the results of each command.`,
workingDirectory: tmpDir, workingDirectory: tmpDir,
maxTurns: 5, maxTurns: 7,
timeout: 60_000, timeout: 60_000,
testName: 'browse-basic', testName: 'browse-basic',
runId, runId,

View File

@@ -340,21 +340,22 @@ Write your findings to ${dir}/review-output.md`,
run('git', ['add', 'app.ts'], dir); run('git', ['add', 'app.ts'], dir);
run('git', ['commit', '-m', 'feat: update to v2'], dir); run('git', ['commit', '-m', 'feat: update to v2'], dir);
// Copy ship skill // Extract only Step 0 (base branch detection) from ship/SKILL.md
fs.copyFileSync(path.join(ROOT, 'ship', 'SKILL.md'), path.join(dir, 'ship-SKILL.md')); // (copying the full 1900-line file causes agent context bloat and flaky timeouts)
const fullShipSkill = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
const step0Start = fullShipSkill.indexOf('## Step 0: Detect platform and base branch');
const step0End = fullShipSkill.indexOf('## Step 1: Pre-flight');
const shipSection = fullShipSkill.slice(step0Start, step0End > step0Start ? step0End : undefined);
fs.writeFileSync(path.join(dir, 'ship-SKILL.md'), shipSection);
const result = await runSkillTest({ const result = await runSkillTest({
prompt: `Read ship-SKILL.md for the ship workflow. prompt: `Read ship-SKILL.md. It contains Step 0 (Detect base branch) from the ship workflow.
Skip the preamble bash block, lake intro, telemetry, and contributor mode sections — go straight to Step 0. Run the base branch detection. Since there is no remote, gh commands will fail — fall back to main.
Run ONLY Step 0 (Detect base branch) and Step 1 (Pre-flight) from the ship workflow. Then run git diff and git log against the detected base branch.
Since there is no remote, gh commands will fail — fall back to main.
After completing Step 0 and Step 1, STOP. Do NOT proceed to Step 2 or beyond. Write a summary to ${dir}/ship-preflight.md including:
Do NOT push, create PRs, or modify VERSION/CHANGELOG.
Write a summary of what you detected to ${dir}/ship-preflight.md including:
- The detected base branch name - The detected base branch name
- The current branch name - The current branch name
- The diff stat against the base branch`, - The diff stat against the base branch`,
@@ -580,8 +581,13 @@ describeIfSelected('Review Dashboard Via Attribution', ['review-dashboard-via'],
].join('\n')); ].join('\n'));
fs.chmodSync(path.join(mockBinDir, 'gstack-review-read'), 0o755); fs.chmodSync(path.join(mockBinDir, 'gstack-review-read'), 0o755);
// Copy ship skill // Extract only the Review Readiness Dashboard section from ship/SKILL.md
fs.copyFileSync(path.join(ROOT, 'ship', 'SKILL.md'), path.join(dashDir, 'ship-SKILL.md')); // (copying the full 1900-line file causes agent context bloat and timeouts)
const fullSkill = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
const dashStart = fullSkill.indexOf('## Review Readiness Dashboard');
const dashEnd = fullSkill.indexOf('\n---\n', dashStart);
const dashSection = fullSkill.slice(dashStart, dashEnd > dashStart ? dashEnd : undefined);
fs.writeFileSync(path.join(dashDir, 'ship-SKILL.md'), dashSection);
}); });
afterAll(() => { afterAll(() => {
@@ -605,7 +611,7 @@ Skip the preamble, lake intro, telemetry, and all other ship steps.
Write the dashboard output to ${dashDir}/dashboard-output.md`, Write the dashboard output to ${dashDir}/dashboard-output.md`,
workingDirectory: dashDir, workingDirectory: dashDir,
maxTurns: 12, maxTurns: 12,
timeout: 90_000, timeout: 180_000,
testName: 'review-dashboard-via', testName: 'review-dashboard-via',
runId, runId,
}); });
@@ -639,7 +645,7 @@ Write the dashboard output to ${dashDir}/dashboard-output.md`,
); );
// Ship dashboard should not gate when eng review is clear // Ship dashboard should not gate when eng review is clear
expect(gateQuestions).toHaveLength(0); expect(gateQuestions).toHaveLength(0);
}, 120_000); }, 240_000);
}); });
// Module-level afterAll — finalize eval collector after all tests complete // Module-level afterAll — finalize eval collector after all tests complete

View File

@@ -250,56 +250,10 @@ describeE2E('Skill Routing E2E — Developer Journey', () => {
} }
}, 150_000); }, 150_000);
testIfSelected('journey-think-bigger', async () => { // Removed: journey-think-bigger
const tmpDir = createRoutingWorkDir('think-bigger'); // Tested ambiguous routing ("think bigger" → plan-ceo-review) but Claude
try { // legitimately answers directly instead of routing. Never passed reliably.
fs.writeFileSync(path.join(tmpDir, 'plan.md'), `# Waitlist App Architecture // The other 10 journey tests cover routing with clear signals.
## Components
- REST API (Express.js)
- PostgreSQL database
- React frontend
- SMS integration (Twilio)
## Data Model
- restaurants (id, name, settings)
- parties (id, restaurant_id, name, size, phone, status, created_at)
- wait_estimates (id, restaurant_id, avg_wait_minutes)
## API Endpoints
- POST /api/parties - add party to waitlist
- GET /api/parties - list current waitlist
- PATCH /api/parties/:id/status - update party status
- GET /api/estimate - get current wait estimate
`);
spawnSync('git', ['add', '.'], { cwd: tmpDir, stdio: 'pipe', timeout: 5000 });
spawnSync('git', ['commit', '-m', 'initial'], { cwd: tmpDir, stdio: 'pipe', timeout: 5000 });
const testName = 'journey-think-bigger';
const expectedSkill = 'plan-ceo-review';
const result = await runSkillTest({
prompt: "Actually, looking at this plan again, I feel like we're thinking too small. We're just doing waitlists but what about the whole restaurant guest experience? Is there a bigger opportunity here we should go after?",
workingDirectory: tmpDir,
maxTurns: 5,
allowedTools: ['Skill', 'Read', 'Bash', 'Glob', 'Grep'],
timeout: 120_000,
testName,
runId,
});
const skillCalls = result.toolCalls.filter(tc => tc.tool === 'Skill');
const actualSkill = skillCalls.length > 0 ? skillCalls[0]?.input?.skill : undefined;
logCost(`journey: ${testName}`, result);
recordRouting(testName, result, expectedSkill, actualSkill);
expect(skillCalls.length, `Expected Skill tool to be called but got 0 calls. Claude may have answered directly without invoking a skill. Tool calls: ${result.toolCalls.map(tc => tc.tool).join(', ')}`).toBeGreaterThan(0);
const validSkills = ['plan-ceo-review', 'office-hours'];
expect(validSkills, `Expected one of ${validSkills.join('/')} but got ${actualSkill}`).toContain(actualSkill);
} finally {
fs.rmSync(tmpDir, { recursive: true, force: true });
}
}, 180_000);
testIfSelected('journey-debug', async () => { testIfSelected('journey-debug', async () => {
const tmpDir = createRoutingWorkDir('debug'); const tmpDir = createRoutingWorkDir('debug');

View File

@@ -125,6 +125,82 @@ describe('gstack-telemetry-log', () => {
expect(events[0]).toHaveProperty('_branch'); expect(events[0]).toHaveProperty('_branch');
}); });
// ─── json_safe() injection prevention tests ────────────────
test('sanitizes skill name with quote injection attempt', () => {
setConfig('telemetry', 'anonymous');
run(`${BIN}/gstack-telemetry-log --skill 'review","injected":"true' --duration 10 --outcome success --session-id inj-1`);
const lines = readJsonl();
expect(lines).toHaveLength(1);
// Must be valid JSON (no injection — quotes stripped, so no field injection possible)
const event = JSON.parse(lines[0]);
// The key check: no injected top-level property was created
expect(event).not.toHaveProperty('injected');
// Skill field should have quotes stripped but content preserved
expect(event.skill).not.toContain('"');
});
test('truncates skill name exceeding 200 chars', () => {
setConfig('telemetry', 'anonymous');
const longSkill = 'a'.repeat(250);
run(`${BIN}/gstack-telemetry-log --skill '${longSkill}' --duration 10 --outcome success --session-id trunc-1`);
const events = parseJsonl();
expect(events[0].skill.length).toBeLessThanOrEqual(200);
});
test('sanitizes outcome with newline injection attempt', () => {
setConfig('telemetry', 'anonymous');
// Use printf to pass actual newline in the argument
run(`bash -c 'OUTCOME=$(printf "success\\nfake\\":\\"true"); ${BIN}/gstack-telemetry-log --skill qa --duration 10 --outcome "$OUTCOME" --session-id inj-2'`);
const lines = readJsonl();
expect(lines).toHaveLength(1);
const event = JSON.parse(lines[0]);
expect(event).not.toHaveProperty('fake');
});
test('sanitizes session_id with backslash-quote injection', () => {
setConfig('telemetry', 'anonymous');
run(`${BIN}/gstack-telemetry-log --skill qa --duration 10 --outcome success --session-id 'id\\\\"","x":"y'`);
const lines = readJsonl();
expect(lines).toHaveLength(1);
const event = JSON.parse(lines[0]);
expect(event).not.toHaveProperty('x');
});
test('sanitizes error_class with quote injection', () => {
setConfig('telemetry', 'anonymous');
run(`${BIN}/gstack-telemetry-log --skill qa --duration 10 --outcome error --error-class 'timeout","extra":"val' --session-id inj-3`);
const lines = readJsonl();
expect(lines).toHaveLength(1);
const event = JSON.parse(lines[0]);
expect(event).not.toHaveProperty('extra');
});
test('sanitizes failed_step with quote injection', () => {
setConfig('telemetry', 'anonymous');
run(`${BIN}/gstack-telemetry-log --skill qa --duration 10 --outcome error --failed-step 'step1","hacked":"yes' --session-id inj-4`);
const lines = readJsonl();
expect(lines).toHaveLength(1);
const event = JSON.parse(lines[0]);
expect(event).not.toHaveProperty('hacked');
});
test('escapes error_message quotes and preserves content', () => {
setConfig('telemetry', 'anonymous');
run(`${BIN}/gstack-telemetry-log --skill qa --duration 10 --outcome error --error-message 'Error: file "test.txt" not found' --session-id inj-5`);
const lines = readJsonl();
expect(lines).toHaveLength(1);
const event = JSON.parse(lines[0]);
expect(event.error_message).toContain('file');
expect(event.error_message).toContain('not found');
});
test('creates analytics directory if missing', () => { test('creates analytics directory if missing', () => {
// Remove analytics dir // Remove analytics dir
const analyticsDir = path.join(tmpDir, 'analytics'); const analyticsDir = path.join(tmpDir, 'analytics');