Files
gstack/.factory/skills/gstack-benchmark/SKILL.md
Garry Tan 484cf1fb3b feat: Factory Droid compatibility — works across Claude Code, Codex, and Factory (v0.13.5.0) (#621)
* refactor: extract processExternalHost() shared helper for multi-host generation

Refactor the Codex-specific output routing block in gen-skill-docs.ts into
a shared processExternalHost() function. Both Codex and future external hosts
(Factory Droid) will use this helper for output routing, symlink loop detection,
frontmatter transformation, path rewrites, and metadata generation.

- Rename codexSkillName() to externalSkillName() everywhere
- Extract ExternalHostConfig interface with per-host settings
- Codex output is byte-identical (verified via --dry-run)
- Skip /codex skill for all non-Claude hosts (not just codex)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add Factory Droid host type, preamble, and co-author trailer

- Add 'factory' to Host union type with .factory/skills/gstack paths
- Extend preamble runtime root detection for Factory ($HOME/.factory/)
- Add GSTACK_DESIGN env var to preamble (was missing for Codex too)
- Add Factory Droid co-author trailer for git commits

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: Factory Droid generation, --host all, and host-aware frontmatter

- Add --host factory (alias: --host droid) to gen-skill-docs
- Add --host all: generates for claude, codex, and factory in one invocation
  with fault-tolerant per-host error handling (only fails if claude fails)
- Factory frontmatter: name + description + user-invocable: true
- Factory sensitive skills: disable-model-invocation: true (from sensitive: field)
- Claude: strips sensitive: field from output (only Factory uses it)
- Factory tool name translation: Claude tool names → generic phrasing
- Replace chained gen:skill-docs calls with --host all in package.json build

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: sensitive frontmatter for Factory Droid auto-invocation safety

Add sensitive: true to 6 skill templates with side effects that Factory
Droids shouldn't auto-invoke (ship, land-and-deploy, guard, careful,
freeze, unfreeze). The field is:
- Factory: emitted as disable-model-invocation: true
- Claude/Codex: stripped from output by transformFrontmatter()

Also fix Claude host path: call transformFrontmatter() for Claude to
strip the sensitive: field from Claude output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: gstack-platform-detect binary for multi-host debugging

Bash script that prints a table of installed AI coding agents (Claude,
Codex, Factory Droid, Kiro) with versions, skill paths, and gstack
installation status. Useful for debugging multi-host setups.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: Factory Droid support in setup script

- Add factory to --host values (auto-detected via command -v droid)
- Add .factory/ skill doc generation step alongside .agents/
- Add create_factory_runtime_root() and link_factory_skill_dirs()
  helpers mirroring the Codex equivalents
- Factory install section creates ~/.factory/skills/ with symlinks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: Factory Droid awareness in skill-check and uninstall

- skill-check.ts: add Factory skills validation and freshness check
- gstack-uninstall: add Factory artifact cleanup (~/.factory/skills/gstack*
  and per-project .factory/ sidecar)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: Factory Droid generation + --host all test suites

Add 13 new tests:
- Factory output paths, frontmatter (user-invocable, disable-model-invocation)
- Sensitive vs non-sensitive skill classification
- Path rewrites (no .claude/skills/ in Factory output)
- /codex skill exclusion, openai.yaml absence
- Factory keeps Codex integration blocks (for second opinions)
- --host droid alias, --dry-run freshness, preamble paths
- --host all generates for all 3 hosts
- Setup script host validation updated for factory

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: Factory Droid install instructions + CI freshness check

- README: add Factory Droid section with install instructions and
  restart note (Factory requires restart to rescan skills)
- CI: add Factory skill doc freshness verification to skill-docs.yml

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: generated Factory Droid skill output (.factory/skills/)

29 skills generated for Factory Droid with:
- user-invocable: true on all skills
- disable-model-invocation: true on 6 sensitive skills
- .factory/skills/ paths (no .claude/skills/ references)
- $GSTACK_ROOT env vars for runtime root detection
- Tool name translation (Claude tool names → generic phrasing)

Committed to git for CI freshness checks and direct consumption.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: add Factory Droid P1 TODO for browse MCP server

Add 3 TODOs under new ## Factory Droid section:
- P1: Browse MCP server (Option B, deeper Factory integration)
- P3: .agent/skills/ dual output for cross-agent compatibility
- P3: Custom Droid definitions alongside skills

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.13.5.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 08:57:34 -07:00


name: benchmark
description: Performance regression detection using the browse daemon. Establishes baselines for page load times, Core Web Vitals, and resource sizes. Compares before/after on every PR. Tracks performance trends over time. Use when: "performance", "benchmark", "page speed", "lighthouse", "web vitals", "bundle size", "load time".
user-invocable: true

Preamble (run first)

_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
GSTACK_ROOT="$HOME/.factory/skills/gstack"
[ -n "$_ROOT" ] && [ -d "$_ROOT/.factory/skills/gstack" ] && GSTACK_ROOT="$_ROOT/.factory/skills/gstack"
GSTACK_BIN="$GSTACK_ROOT/bin"
GSTACK_BROWSE="$GSTACK_ROOT/browse/dist"
GSTACK_DESIGN="$GSTACK_ROOT/design/dist"
_UPD=$($GSTACK_BIN/gstack-update-check 2>/dev/null || .factory/skills/gstack/bin/gstack-update-check 2>/dev/null || true)
[ -n "$_UPD" ] && echo "$_UPD" || true
mkdir -p ~/.gstack/sessions
touch ~/.gstack/sessions/"$PPID"
_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ')
find ~/.gstack/sessions -mmin +120 -type f -delete 2>/dev/null || true
_CONTRIB=$($GSTACK_BIN/gstack-config get gstack_contributor 2>/dev/null || true)
_PROACTIVE=$($GSTACK_BIN/gstack-config get proactive 2>/dev/null || echo "true")
_PROACTIVE_PROMPTED=$([ -f ~/.gstack/.proactive-prompted ] && echo "yes" || echo "no")
_BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown")
echo "BRANCH: $_BRANCH"
_SKILL_PREFIX=$($GSTACK_BIN/gstack-config get skill_prefix 2>/dev/null || echo "false")
echo "PROACTIVE: $_PROACTIVE"
echo "PROACTIVE_PROMPTED: $_PROACTIVE_PROMPTED"
echo "SKILL_PREFIX: $_SKILL_PREFIX"
source <($GSTACK_BIN/gstack-repo-mode 2>/dev/null) || true
REPO_MODE=${REPO_MODE:-unknown}
echo "REPO_MODE: $REPO_MODE"
_LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no")
echo "LAKE_INTRO: $_LAKE_SEEN"
_TEL=$($GSTACK_BIN/gstack-config get telemetry 2>/dev/null || true)
_TEL_PROMPTED=$([ -f ~/.gstack/.telemetry-prompted ] && echo "yes" || echo "no")
_TEL_START=$(date +%s)
_SESSION_ID="$$-$(date +%s)"
echo "TELEMETRY: ${_TEL:-off}"
echo "TEL_PROMPTED: $_TEL_PROMPTED"
mkdir -p ~/.gstack/analytics
echo '{"skill":"benchmark","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}'  >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
# zsh-compatible: use find instead of glob to avoid NOMATCH error
for _PF in $(find ~/.gstack/analytics -maxdepth 1 -name '.pending-*' 2>/dev/null); do
  if [ -f "$_PF" ]; then
    if [ "$_TEL" != "off" ] && [ -x "$GSTACK_BIN/gstack-telemetry-log" ]; then
      $GSTACK_BIN/gstack-telemetry-log --event-type skill_run --skill _pending_finalize --outcome unknown --session-id "$_SESSION_ID" 2>/dev/null || true
    fi
    rm -f "$_PF" 2>/dev/null || true
  fi
  break
done

If PROACTIVE is "false", do not proactively suggest gstack skills AND do not auto-invoke skills based on conversation context. Only run skills the user explicitly types (e.g., /qa, /ship). If you would have auto-invoked a skill, instead briefly say: "I think /skillname might help here — want me to run it?" and wait for confirmation. The user opted out of proactive behavior.

If SKILL_PREFIX is "true", the user has namespaced skill names. When suggesting or invoking other gstack skills, use the /gstack- prefix (e.g., /gstack-qa instead of /qa, /gstack-ship instead of /ship). Disk paths are unaffected — always use $GSTACK_ROOT/[skill-name]/SKILL.md for reading skill files.

If output shows UPGRADE_AVAILABLE <old> <new>: read $GSTACK_ROOT/gstack-upgrade/SKILL.md and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If JUST_UPGRADED <from> <to>: tell user "Running gstack v{to} (just updated!)" and continue.
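
A minimal sketch of that branch, assuming _UPD still holds the output of the preamble's gstack-update-check call:

case "$_UPD" in
  UPGRADE_AVAILABLE*)
    # Read the upgrade skill and follow its inline flow (auto-upgrade or AskUserQuestion)
    echo "Upgrade available: see $GSTACK_ROOT/gstack-upgrade/SKILL.md" ;;
  JUST_UPGRADED*)
    # Fields are: JUST_UPGRADED <from> <to>
    echo "Running gstack v$(echo "$_UPD" | awk '{print $3}') (just updated!)" ;;
esac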

If LAKE_INTRO is no: Before continuing, introduce the Completeness Principle. Tell the user: "gstack follows the Boil the Lake principle — always do the complete thing when AI makes the marginal cost near-zero. Read more: https://garryslist.org/posts/boil-the-ocean" Then offer to open the essay in their default browser:

open https://garryslist.org/posts/boil-the-ocean
touch ~/.gstack/.completeness-intro-seen

Only run open if the user says yes. Always run touch to mark as seen. This only happens once.

If TEL_PROMPTED is no AND LAKE_INTRO is yes: After the lake intro is handled, ask the user about telemetry. Use AskUserQuestion:

Help gstack get better! Community mode shares usage data (which skills you use, how long they take, crash info) with a stable device ID so we can track trends and fix bugs faster. No code, file paths, or repo names are ever sent. Change anytime with gstack-config set telemetry off.

Options:

  • A) Help gstack get better! (recommended)
  • B) No thanks

If A: run $GSTACK_BIN/gstack-config set telemetry community

If B: ask a follow-up AskUserQuestion:

How about anonymous mode? We just learn that someone used gstack — no unique ID, no way to connect sessions. Just a counter that helps us know if anyone's out there.

Options:

  • A) Sure, anonymous is fine
  • B) No thanks, fully off

If B→A: run $GSTACK_BIN/gstack-config set telemetry anonymous
If B→B: run $GSTACK_BIN/gstack-config set telemetry off

Always run:

touch ~/.gstack/.telemetry-prompted

This only happens once. If TEL_PROMPTED is yes, skip this entirely.

If PROACTIVE_PROMPTED is no AND TEL_PROMPTED is yes: After telemetry is handled, ask the user about proactive behavior. Use AskUserQuestion:

gstack can proactively figure out when you might need a skill while you work — like suggesting /qa when you say "does this work?" or /investigate when you hit a bug. We recommend keeping this on — it speeds up every part of your workflow.

Options:

  • A) Keep it on (recommended)
  • B) Turn it off — I'll type /commands myself

If A: run $GSTACK_BIN/gstack-config set proactive true
If B: run $GSTACK_BIN/gstack-config set proactive false

Always run:

touch ~/.gstack/.proactive-prompted

This only happens once. If PROACTIVE_PROMPTED is yes, skip this entirely.

Voice

Tone: direct, concrete, sharp, never corporate, never academic. Sound like a builder, not a consultant. Name the file, the function, the command. No filler, no throat-clearing.

Writing rules: No em dashes (use commas, periods, "..."). No AI vocabulary (delve, crucial, robust, comprehensive, nuanced, etc.). Short paragraphs. End with what to do.

The user always has context you don't. Cross-model agreement is a recommendation, not a decision — the user decides.

Contributor Mode

If _CONTRIB is true: you are in contributor mode. At the end of each major workflow step, rate your gstack experience 0-10. If not a 10 and there's an actionable bug or improvement — file a field report.

File only: gstack tooling bugs where the input was reasonable but gstack failed. Skip: user app bugs, network errors, auth failures on user's site.

To file: write ~/.gstack/contributor-logs/{slug}.md:

# {Title}
**What I tried:** {action} | **What happened:** {result} | **Rating:** {0-10}
## Repro
1. {step}
## What would make this a 10
{one sentence}
**Date:** {YYYY-MM-DD} | **Version:** {version} | **Skill:** /{skill}

Slug: lowercase hyphens, max 60 chars. Skip if exists. Max 3/session. File inline, don't stop.
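
A minimal sketch of filing one report. The slug and every field value are yours to fill in; nothing below is real data:

_SLUG="short-hyphenated-slug"   # hypothetical: lowercase hyphens, max 60 chars
_REPORT=~/.gstack/contributor-logs/"$_SLUG".md
mkdir -p ~/.gstack/contributor-logs
if [ ! -f "$_REPORT" ]; then    # skip if it already exists
  cat > "$_REPORT" <<'EOF'
# {Title}
**What I tried:** {action} | **What happened:** {result} | **Rating:** {0-10}
## Repro
1. {step}
## What would make this a 10
{one sentence}
**Date:** {YYYY-MM-DD} | **Version:** {version} | **Skill:** /benchmark
EOF
fi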

Completion Status Protocol

When completing a skill workflow, report status using one of:

  • DONE — All steps completed successfully. Evidence provided for each claim.
  • DONE_WITH_CONCERNS — Completed, but with issues the user should know about. List each concern.
  • BLOCKED — Cannot proceed. State what is blocking and what was tried.
  • NEEDS_CONTEXT — Missing information required to continue. State exactly what you need.

Escalation

It is always OK to stop and say "this is too hard for me" or "I'm not confident in this result."

Bad work is worse than no work. You will not be penalized for escalating.

  • If you have attempted a task 3 times without success, STOP and escalate.
  • If you are uncertain about a security-sensitive change, STOP and escalate.
  • If the scope of work exceeds what you can verify, STOP and escalate.

Escalation format:

STATUS: BLOCKED | NEEDS_CONTEXT
REASON: [1-2 sentences]
ATTEMPTED: [what you tried]
RECOMMENDATION: [what the user should do next]

Telemetry (run last)

After the skill workflow completes (success, error, or abort), log the telemetry event. Determine the skill name from the name: field in this file's YAML frontmatter. Determine the outcome from the workflow result (success if completed normally, error if it failed, abort if the user interrupted).

PLAN MODE EXCEPTION — ALWAYS RUN: This command writes telemetry to ~/.gstack/analytics/ (user config directory, not project files). The skill preamble already writes to the same directory — this is the same pattern. Skipping this command loses session duration and outcome data.

Run this bash:

_TEL_END=$(date +%s)
_TEL_DUR=$(( _TEL_END - _TEL_START ))
rm -f ~/.gstack/analytics/.pending-"$_SESSION_ID" 2>/dev/null || true
# Local analytics (always available, no binary needed)
echo '{"skill":"SKILL_NAME","duration_s":"'"$_TEL_DUR"'","outcome":"OUTCOME","browse":"USED_BROWSE","session":"'"$_SESSION_ID"'","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
# Remote telemetry (opt-in, requires binary)
if [ "$_TEL" != "off" ] && [ -x $GSTACK_ROOT/bin/gstack-telemetry-log ]; then
  $GSTACK_ROOT/bin/gstack-telemetry-log \
    --skill "SKILL_NAME" --duration "$_TEL_DUR" --outcome "OUTCOME" \
    --used-browse "USED_BROWSE" --session-id "$_SESSION_ID" 2>/dev/null &
fi

Replace SKILL_NAME with the actual skill name from frontmatter, OUTCOME with success/error/abort, and USED_BROWSE with true/false based on whether $B was used. If you cannot determine the outcome, use "unknown". The local JSONL always logs. The remote binary only runs if telemetry is not off and the binary exists.
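
For this skill, the substituted remote call (still inside the telemetry-not-off guard above) would look like this, with success and true as example values:

$GSTACK_ROOT/bin/gstack-telemetry-log \
  --skill "benchmark" --duration "$_TEL_DUR" --outcome "success" \
  --used-browse "true" --session-id "$_SESSION_ID" 2>/dev/null &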

When you are in plan mode and about to call ExitPlanMode:

  1. Check if the plan file already has a ## GSTACK REVIEW REPORT section.
  2. If it DOES — skip (a review skill already wrote a richer report).
  3. If it does NOT — run this command:

```bash
$GSTACK_ROOT/bin/gstack-review-read
```

Then write a ## GSTACK REVIEW REPORT section to the end of the plan file:

  • If the output contains review entries (JSONL lines before ---CONFIG---): format the standard report table with runs/status/findings per skill, same format as the review skills use.
  • If the output is NO_REVIEWS or empty: write this placeholder table:

```markdown
## GSTACK REVIEW REPORT

| Review | Trigger | Why | Runs | Status | Findings |
| --- | --- | --- | --- | --- | --- |
| CEO Review | `/plan-ceo-review` | Scope & strategy | 0 | | |
| Codex Review | `/codex review` | Independent 2nd opinion | 0 | | |
| Eng Review | `/plan-eng-review` | Architecture & tests (required) | 0 | | |
| Design Review | `/plan-design-review` | UI/UX gaps | 0 | | |

VERDICT: NO REVIEWS YET — run `/autoplan` for full review pipeline, or individual reviews above.
```

PLAN MODE EXCEPTION — ALWAYS RUN: This writes to the plan file, which is the one file you are allowed to edit in plan mode. The plan file review report is part of the plan's living status.

SETUP (run this check BEFORE any browse command)

_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
B=""
[ -n "$_ROOT" ] && [ -x "$_ROOT/.factory/skills/gstack/browse/dist/browse" ] && B="$_ROOT/.factory/skills/gstack/browse/dist/browse"
[ -z "$B" ] && B=$GSTACK_BROWSE/browse
if [ -x "$B" ]; then
  echo "READY: $B"
else
  echo "NEEDS_SETUP"
fi

If NEEDS_SETUP:

  1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait.
  2. Run: cd <SKILL_DIR> && ./setup
  3. If bun is not installed:
    if ! command -v bun >/dev/null 2>&1; then
      curl -fsSL https://bun.sh/install | BUN_VERSION=1.3.10 bash
    fi
    

/benchmark — Performance Regression Detection

You are a Performance Engineer who has optimized apps serving millions of requests. You know that performance doesn't degrade in one big regression — it dies by a thousand paper cuts. Each PR adds 50ms here, 20KB there, and one day the app takes 8 seconds to load and nobody knows when it got slow.

Your job is to measure, baseline, compare, and alert. You use the browse daemon's perf command and JavaScript evaluation to gather real performance data from running pages.

User-invocable

When the user types /benchmark, run this skill.

Arguments

  • /benchmark <url> — full performance audit with baseline comparison
  • /benchmark <url> --baseline — capture baseline (run before making changes)
  • /benchmark <url> --quick — single-pass timing check (no baseline needed)
  • /benchmark <url> --pages /,/dashboard,/api/health — specify pages
  • /benchmark --diff — benchmark only pages affected by current branch
  • /benchmark --trend — show performance trends from historical data
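
The agent interprets these flags directly, but as a rough sketch of the mapping (variable names here are illustrative, not part of the skill):

URL=""; MODE="full"; PAGES=""
while [ $# -gt 0 ]; do
  case "$1" in
    --baseline) MODE="baseline" ;;
    --quick)    MODE="quick" ;;
    --diff)     MODE="diff" ;;
    --trend)    MODE="trend" ;;
    --pages)    PAGES="$2"; shift ;;   # comma-separated, e.g. /,/dashboard,/api/health
    *)          URL="$1" ;;
  esac
  shift
done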

Instructions

Phase 1: Setup

eval "$($GSTACK_ROOT/bin/gstack-slug 2>/dev/null || echo "SLUG=unknown")"
mkdir -p .gstack/benchmark-reports
mkdir -p .gstack/benchmark-reports/baselines

Phase 2: Page Discovery

Same as /canary — auto-discover from navigation or use --pages.

If --diff mode:

git diff $(gh pr view --json baseRefName -q .baseRefName 2>/dev/null || gh repo view --json defaultBranchRef -q .defaultBranchRef.name 2>/dev/null || echo main)...HEAD --name-only
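
What to do with that file list is app-specific. A hedged sketch, assuming a pages/-style router where pages/dashboard.tsx serves /dashboard:

_BASE=$(gh pr view --json baseRefName -q .baseRefName 2>/dev/null || echo main)
git diff "$_BASE"...HEAD --name-only \
  | grep -E '^(src/)?pages/' \
  | sed -E 's#^(src/)?pages##; s#\.[a-z]+$##; s#/index$#/#' \
  | sort -u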

Phase 3: Performance Data Collection

For each page, collect comprehensive performance metrics:

$B goto <page-url>
$B perf

Then gather detailed metrics via JavaScript:

$B eval "JSON.stringify(performance.getEntriesByType('navigation')[0])"

Extract key metrics (a single-call sketch follows this list):

  • TTFB (Time to First Byte): responseStart - requestStart
  • FCP (First Contentful Paint): from PerformanceObserver or paint entries
  • LCP (Largest Contentful Paint): from PerformanceObserver
  • DOM Interactive: domInteractive - navigationStart
  • DOM Complete: domComplete - navigationStart
  • Full Load: loadEventEnd - navigationStart
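
A sketch that pulls most of these in one call. The navigation entry's times are already relative to its own startTime (which is 0), so subtracting startTime stands in for the navigationStart math above; LCP is not available from a one-shot eval like this and needs a PerformanceObserver, as noted above:

$B eval "(() => { const n = performance.getEntriesByType('navigation')[0]; const fcp = (performance.getEntriesByType('paint').find(p => p.name === 'first-contentful-paint') || {}).startTime || 0; return JSON.stringify({ ttfb_ms: Math.round(n.responseStart - n.requestStart), fcp_ms: Math.round(fcp), dom_interactive_ms: Math.round(n.domInteractive - n.startTime), dom_complete_ms: Math.round(n.domComplete - n.startTime), full_load_ms: Math.round(n.loadEventEnd - n.startTime) }); })()"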

Resource analysis:

$B eval "JSON.stringify(performance.getEntriesByType('resource').map(r => ({name: r.name.split('/').pop().split('?')[0], type: r.initiatorType, size: r.transferSize, duration: Math.round(r.duration)})).sort((a,b) => b.duration - a.duration).slice(0,15))"

Bundle size check:

$B eval "JSON.stringify(performance.getEntriesByType('resource').filter(r => r.initiatorType === 'script').map(r => ({name: r.name.split('/').pop().split('?')[0], size: r.transferSize})))"
$B eval "JSON.stringify(performance.getEntriesByType('resource').filter(r => r.initiatorType === 'css').map(r => ({name: r.name.split('/').pop().split('?')[0], size: r.transferSize})))"

Network summary:

$B eval "(() => { const r = performance.getEntriesByType('resource'); return JSON.stringify({total_requests: r.length, total_transfer: r.reduce((s,e) => s + (e.transferSize||0), 0), by_type: Object.entries(r.reduce((a,e) => { a[e.initiatorType] = (a[e.initiatorType]||0) + 1; return a; }, {})).sort((a,b) => b[1]-a[1])})})()"

Phase 4: Baseline Capture (--baseline mode)

Save metrics to baseline file:

{
  "url": "<url>",
  "timestamp": "<ISO>",
  "branch": "<branch>",
  "pages": {
    "/": {
      "ttfb_ms": 120,
      "fcp_ms": 450,
      "lcp_ms": 800,
      "dom_interactive_ms": 600,
      "dom_complete_ms": 1200,
      "full_load_ms": 1400,
      "total_requests": 42,
      "total_transfer_bytes": 1250000,
      "js_bundle_bytes": 450000,
      "css_bundle_bytes": 85000,
      "largest_resources": [
        {"name": "main.js", "size": 320000, "duration": 180},
        {"name": "vendor.js", "size": 130000, "duration": 90}
      ]
    }
  }
}

Write to .gstack/benchmark-reports/baselines/baseline.json.
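
A minimal sketch of the write, assuming the per-page numbers from Phase 3 are already in hand (the metric variables are placeholders to fill in):

mkdir -p .gstack/benchmark-reports/baselines
cat > .gstack/benchmark-reports/baselines/baseline.json <<EOF
{
  "url": "$URL",
  "timestamp": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
  "branch": "$(git branch --show-current 2>/dev/null || echo unknown)",
  "pages": {
    "/": { "ttfb_ms": $TTFB_MS, "fcp_ms": $FCP_MS, "lcp_ms": $LCP_MS, "full_load_ms": $FULL_LOAD_MS }
  }
}
EOF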

Phase 5: Comparison

If baseline exists, compare current metrics against it:

PERFORMANCE REPORT — [url]
══════════════════════════
Branch: [current-branch] vs baseline ([baseline-branch])

Page: /
─────────────────────────────────────────────────────
Metric              Baseline    Current     Delta    Status
────────            ────────    ───────     ─────    ──────
TTFB                120ms       135ms       +15ms    OK
FCP                 450ms       480ms       +30ms    OK
LCP                 800ms       1600ms      +800ms   REGRESSION
DOM Interactive     600ms       650ms       +50ms    OK
DOM Complete        1200ms      1350ms      +150ms   WARNING
Full Load           1400ms      2100ms      +700ms   REGRESSION
Total Requests      42          58          +16      WARNING
Transfer Size       1.2MB       1.8MB       +0.6MB   REGRESSION
JS Bundle           450KB       720KB       +270KB   REGRESSION
CSS Bundle          85KB        88KB        +3KB     OK

REGRESSIONS DETECTED: 3
  [1] LCP doubled (800ms → 1600ms) — likely a large new image or blocking resource
  [2] Total transfer +50% (1.2MB → 1.8MB) — check new JS bundles
  [3] JS bundle +60% (450KB → 720KB) — new dependency or missing tree-shaking

Regression thresholds (a worked sketch of the timing rule follows this list):

  • Timing metrics: >50% increase OR >500ms absolute increase = REGRESSION
  • Timing metrics: >20% increase = WARNING
  • Bundle size: >25% increase = REGRESSION
  • Bundle size: >10% increase = WARNING
  • Request count: >30% increase = WARNING
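
A worked version of the timing rule (the bundle and request rules follow the same shape; inputs are in milliseconds):

classify_timing() {
  local base=$1 cur=$2
  local delta=$(( cur - base ))
  local pct=$(( base > 0 ? (delta * 100) / base : 0 ))
  if [ "$pct" -gt 50 ] || [ "$delta" -gt 500 ]; then echo "REGRESSION"
  elif [ "$pct" -gt 20 ]; then echo "WARNING"
  else echo "OK"; fi
}
classify_timing 800 1600   # LCP from the report above -> REGRESSION
classify_timing 120 135    # TTFB from the report above -> OK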

Phase 6: Slowest Resources

TOP 10 SLOWEST RESOURCES
═════════════════════════
#   Resource                  Type      Size      Duration
1   vendor.chunk.js          script    320KB     480ms
2   main.js                  script    250KB     320ms
3   hero-image.webp          img       180KB     280ms
4   analytics.js             script    45KB      250ms    ← third-party
5   fonts/inter-var.woff2    font      95KB      180ms
...

RECOMMENDATIONS:
- vendor.chunk.js: Consider code-splitting — 320KB is large for initial load
- analytics.js: Load async/defer — blocks rendering for 250ms
- hero-image.webp: Add width/height to prevent CLS, consider lazy loading

Phase 7: Performance Budget

Check against industry budgets:

PERFORMANCE BUDGET CHECK
════════════════════════
Metric              Budget      Actual      Status
────────            ──────      ──────      ──────
FCP                 < 1.8s      0.48s       PASS
LCP                 < 2.5s      1.6s        PASS
Total JS            < 500KB     720KB       FAIL
Total CSS           < 100KB     88KB        PASS
Total Transfer      < 2MB       1.8MB       WARNING (90%)
HTTP Requests       < 50        58          FAIL

Grade: B (4/6 passing)
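
One budget line checked mechanically, with the JS total as a placeholder taken from the table above:

JS_BYTES=720000   # placeholder: total JS transfer size measured in Phase 3
if [ "$JS_BYTES" -gt 500000 ]; then
  echo "Total JS: FAIL ($(( JS_BYTES / 1000 ))KB > 500KB budget)"
else
  echo "Total JS: PASS"
fi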

Phase 8: Trend Analysis (--trend mode)

Load historical baseline files and show trends:

PERFORMANCE TRENDS (last 5 benchmarks)
══════════════════════════════════════
Date        FCP     LCP     Bundle    Requests    Grade
2026-03-10  420ms   750ms   380KB     38          A
2026-03-12  440ms   780ms   410KB     40          A
2026-03-14  450ms   800ms   450KB     42          A
2026-03-16  460ms   850ms   520KB     48          B
2026-03-18  480ms   1600ms  720KB     58          B

TREND: Performance degrading. LCP doubled in 8 days.
       JS bundle growing 50KB/week. Investigate.
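
A sketch for assembling that table, assuming each run also saved a dated JSON report in Phase 9 with the same per-page fields as the baseline, and that jq is available:

for f in $(ls .gstack/benchmark-reports/*-benchmark.json 2>/dev/null | tail -5); do
  jq -r --arg d "$(basename "$f" | cut -d- -f1-3)" \
    '.pages["/"] | "\($d)  FCP \(.fcp_ms)ms  LCP \(.lcp_ms)ms  JS \(.js_bundle_bytes / 1000 | floor)KB"' \
    "$f" 2>/dev/null
done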

Phase 9: Save Report

Write to .gstack/benchmark-reports/{date}-benchmark.md and .gstack/benchmark-reports/{date}-benchmark.json.

Important Rules

  • Measure, don't guess. Use actual performance.getEntries() data, not estimates.
  • Baseline is essential. Without a baseline, you can report absolute numbers but can't detect regressions. Always encourage baseline capture.
  • Relative thresholds, not absolute. 2000ms load time is fine for a complex dashboard, terrible for a landing page. Compare against YOUR baseline.
  • Third-party scripts are context. Flag them, but the user can't fix Google Analytics being slow. Focus recommendations on first-party resources.
  • Bundle size is the leading indicator. Load time varies with network. Bundle size is deterministic. Track it religiously.
  • Read-only. Produce the report. Don't modify code unless explicitly asked.