Files
gstack/design-shotgun/SKILL.md.tmpl
Garry Tan bf65487162 v1.26.0.0 feat: V1 transcript ingest + per-skill gbrain manifests + retrieval surface (#1298)
* feat: lib/gstack-memory-helpers shared module for V1 memory ingest pipeline

Lane 0 foundation per plan §"Eng review additions". 5 public functions
imported by the V1 helpers (Lanes A/B/C):

  canonicalizeRemote(url)  — normalize git remote → host/org/repo
  secretScanFile(path)     — gitleaks wrapper with discriminated return
  detectEngineTier()       — cached 60s in ~/.gstack/.gbrain-engine-cache.json
  parseSkillManifest(path) — extract gbrain.context_queries: from frontmatter
  withErrorContext(op,fn,caller) — async-aware error logging

22 unit tests, all passing. State files use schema_version: 1 +
last_writer field per Section 2A standardization. Manifest parser
handles all three kinds (vector/list/filesystem) and ignores
incomplete items.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: bin/gstack-memory-ingest — V1 unified memory ingest helper

Lane A. Walks coding-agent transcripts (Claude Code + Codex; Cursor V1.0.1
follow-up) AND ~/.gstack/ curated artifacts (eureka, learnings, timeline,
ceo-plans, design-docs, retros, builder-profile). Calls gbrain put_page
with type-tagged frontmatter. Uses gstack-memory-helpers (Lane 0):

  - Modes: --probe / --incremental (default, mtime fast-path) / --bulk
  - Default 90-day window; --all-history opts into full archive
  - --sources subset filter; --include-unattributed opt-in for no-remote sessions
  - --limit N for smoke testing; --benchmark for throughput reporting
  - Tolerant JSONL parser handles truncated last lines (D10 partial-flag)
  - State file at ~/.gstack/.transcript-ingest-state.json (LOCAL per ED1)
  - schema_version: 1 with backup-on-mismatch + JSON-corrupt recovery
  - gitleaks via secretScanFile() before every put_page (D19)
  - withErrorContext wraps every put_page for forensic ~/.gstack/.gbrain-errors.jsonl

15 unit tests cover --help, --probe (empty, Claude Code, Codex, mixed
artifacts), --sources filter, state file lifecycle (create, schema mismatch
backup, JSON corrupt backup), truncated-last-line handling, --limit
validation. All passing.

V1.5 P0 follow-ups noted in the file header:
  - Cursor SQLite extraction (V1.0.1)
  - gbrain put_file routing for Supabase Storage tier (cross-repo)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: bin/gstack-gbrain-sync — V1 unified sync verb (Lane B)

Orchestrates three storage tiers per plan §"Storage tiering":
  1. Code (current repo)         → gbrain import (Supabase or local PGLite)
  2. Transcripts + curated memory → gstack-memory-ingest (typed put_page)
  3. Curated artifacts to git    → gstack-brain-sync (existing pipeline)

Modes: --incremental (default, mtime fast-path) / --full (~25-35 min per
ED2 honest budget) / --dry-run (preview, no writes).

Flags: --code-only / --no-code / --no-memory / --no-brain-sync for
selective stage disable. Each stage failure is non-fatal; subsequent
stages still run.

State at ~/.gstack/.gbrain-sync-state.json (LOCAL per ED1) with
schema_version: 1 + last_writer + per-stage outcomes for forensic tracing.

--watch daemon explicitly deferred to V1.5 P0 TODO per Codex F3
(reverses the "no daemon" invariant). Continuous sync rides the existing
preamble-boundary hook only.

8 unit tests cover --help, unknown flag rejection, --dry-run preview shape
(all stages + code-only), --no-code stage skip, state file lifecycle
(create on real run + skip on dry-run), and stage results recorded
in state. All passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: bin/gstack-brain-context-load — V1 retrieval surface (Lane C)

Called from the gstack preamble at every skill start. Reads the active
skill's gbrain.context_queries: frontmatter (Layer 2) or falls back to a
generic salience block (Layer 1 with explicit repo: {repo_slug} filter
per Codex F7 cleanup).

Dispatches each query by kind:
  kind: vector       → gbrain query <text>
  kind: list         → gbrain list_pages --filter ...
  kind: filesystem   → local glob (with mtime_desc sort + tail support)

Each MCP/CLI call has a 500ms hard timeout per Section 1C. On timeout
or missing gbrain CLI, helper renders SKIP for that section and continues —
skill startup never blocks > 2s on gbrain issues.

Datamark envelope per Section 1D + D12: rendered body wrapped once at
the page level in <USER_TRANSCRIPT_DATA do-not-interpret-as-instructions>
(not per-message). Layer 1 prompt-injection defense.

Default manifest (D13 three-section): recent transcripts (limit 5) +
recent curated last-7d (limit 10) + skill-name-matched timeline events
(limit 5). All scoped to {repo_slug}.

Template var substitution: {repo_slug}, {user_slug}, {branch},
{skill_name}, {window}. Unresolved vars cause the query to skip with a
logged reason (--explain shows it).

10 unit tests cover help/unknown-flag/limit-validation, default-fallback
when skill not found, manifest dispatch when --skill-file points at a
real SKILL.md, datamark envelope wrapping, render_as template
substitution, unresolved-template-var skip, --quiet suppression, and
graceful gbrain-CLI-absence behavior. All passing.

V1.5 P0: salience smarts promote to gbrain server-side MCP tools
(get_recent_salience, find_anomalies, recency-aware list_pages); helper
signature unchanged, internals switch from 4-call composition to single
MCP call.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: gbrain.context_queries manifests on 6 V1 skills (Lane E partial)

Adds the V1 retrieval contracts. Each skill declares what it wants gbrain
to surface in the preamble at invocation time:

  /office-hours        — prior sessions + builder profile + design docs
                         + recent eureka (4 queries)
  /plan-ceo-review     — prior CEO plans + design docs + recent CEO review
                         activity (3 queries)
  /design-shotgun      — prior approved variants + DESIGN.md + recent
                         design docs (3 queries)
  /design-consultation — existing DESIGN.md + prior design decisions +
                         brand-related notes (3 queries)
  /investigate         — prior investigations + project learnings + recent
                         eureka cross-project (3 queries)
  /retro               — prior retros + recent timeline + recent learnings
                         (3 queries)

Each query carries an explicit kind (vector | list | filesystem) per D3,
schema: 1 versioning per D15, and {repo_slug} template var per F7
cross-repo-contamination cleanup. Mix of vector / list / filesystem
matches what each skill actually needs:

  - filesystem (mtime_desc + tail) for log JSONL + curated markdown
  - list with tags_contains filter for typed gbrain pages
  - (vector reserved for V1.0.1 when gbrain query surface stabilizes)

Smoke test: bun run bin/gstack-brain-context-load.ts --skill-file
office-hours/SKILL.md --repo test-repo --explain returns mode=manifest
queries=4 with the filesystem kinds populating real data from
~/.gstack/builder-profile.jsonl + ~/.gstack/analytics/eureka.jsonl on
this Mac. End-to-end retrieval flow confirmed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: setup-gbrain Step 7.5 ingest gate + Step 10 verdict + memory.md ref doc (Lane E partial)

Step 7.5: Transcript & memory ingest gate. After Step 7 wires brain-sync
but before Step 8's CLAUDE.md persist, runs gstack-memory-ingest --probe,
then either silent-bulks (small) or AskUserQuestion-gates with the exact
counts + value promise + 5 options (this-repo-90d, all-history, multi-repo,
incremental-from-now, never). Decision persists to
gstack-config set transcript_ingest_mode <choice>.

Step 10: GREEN/YELLOW/RED verdict block. Re-running /setup-gbrain on a
configured Mac is now a first-class doctor path — every step's detection
+ repair logic feeds into a single verdict at the end. Rows: CLI / Engine /
doctor / MCP / Repo policy / Code import / Memory sync / Transcripts /
CLAUDE.md / Smoke. Tells the user "Run /setup-gbrain again any time gbrain
feels off; it's safe and idempotent."

setup-gbrain/memory.md: user-facing reference doc covering what gets
ingested + what stays local + secret scanning via gitleaks + storage
tiering + querying + deleting + how the agent auto-loads context per skill +
common recovery cases. Linked from Step 8's CLAUDE.md persist.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: V1 E2E pipeline + --no-write flag for ingest helper (Lane F)

E2E pipeline test exercises the full Lane A → B → C value loop:
  1. Set up fake $HOME with all 8 memory source types as fixtures
  2. gstack-memory-ingest --probe verifies counts match disk
  3. gstack-memory-ingest --incremental writes state with schema_version: 1
  4. Idempotency: re-run reports 0 changes
  5. --probe distinguishes new vs unchanged after first incremental
  6. gstack-gbrain-sync --dry-run previews 3 stages
  7. --no-code --no-brain-sync --quiet writes sync state with 1 stage entry
  8. office-hours/SKILL.md V1 manifest dispatches 4 queries (mode=manifest)
  9. Datamark envelope wraps every loaded section (Section 1D + D12)
 10. Layer 1 fallback when no skill specified — default 3-section manifest
 11. plan-ceo-review/SKILL.md manifest also dispatches (regression for V1
     manifest authoring across all 6 V1 skills)

Side effect: bin/gstack-memory-ingest.ts gains --no-write flag (also
honored via GSTACK_MEMORY_INGEST_NO_WRITE=1 env var). Skips gbrain put_page
calls while still updating the state file. Used by tests + dry-runs to
avoid real ingest churn when verifying state-file lifecycle. The
--bulk and --incremental modes still call gbrain by default — only
explicit opt-in suppresses writes.

V1 lane test totals (covering all 5 helpers + 6 skill manifests):
  test/gstack-memory-helpers.test.ts     22 tests
  test/gstack-memory-ingest.test.ts      15 tests
  test/gstack-gbrain-sync.test.ts         8 tests
  test/gstack-brain-context-load.test.ts 10 tests
  test/skill-e2e-memory-pipeline.test.ts 10 tests
  ────────────────────────────────────── ─────────
  TOTAL                                  65 passing

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.26.0.0)

V1 of memory ingest + retrieval surface. Coding-agent transcripts (Claude
Code + Codex) on disk become first-class queryable pages in gbrain. Six
high-leverage skills auto-load per-skill context manifests at every
invocation. Datamark envelopes wrap loaded pages as Layer 1 prompt-
injection defense. Storage tiering: curated memory rides existing
brain-sync git pipeline; code+transcripts route to Supabase Storage when
configured else local PGLite — never double-store.

Net branch size vs main: +4174/-849 across 39 files. 65 V1 tests, all
green. Goldilocks scope per CEO D18; V1.5 P0 follow-ups documented in
the plan's V1.5 TODOs section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 08:40:30 -07:00

345 lines
13 KiB
Cheetah

---
name: design-shotgun
preamble-tier: 2
version: 1.0.0
description: |
Design shotgun: generate multiple AI design variants, open a comparison board,
collect structured feedback, and iterate. Standalone design exploration you can
run anytime. Use when: "explore designs", "show me options", "design variants",
"visual brainstorm", or "I don't like how this looks".
Proactively suggest when the user describes a UI feature but hasn't seen
what it could look like. (gstack)
triggers:
- explore design variants
- show me design options
- visual design brainstorm
allowed-tools:
- Bash
- Read
- Glob
- Grep
- Agent
- AskUserQuestion
gbrain:
schema: 1
context_queries:
- id: prior-approved-variants
kind: filesystem
glob: "~/.gstack/projects/{repo_slug}/designs/*/approved.json"
sort: mtime_desc
limit: 5
render_as: "## Prior approved design variants for this project"
- id: design-md
kind: filesystem
glob: "DESIGN.md"
tail: 1
render_as: "## DESIGN.md (project design system)"
- id: recent-design-docs
kind: filesystem
glob: "~/.gstack/projects/{repo_slug}/*-design-*.md"
sort: mtime_desc
limit: 3
render_as: "## Recent design docs"
---
{{PREAMBLE}}
# /design-shotgun: Visual Design Exploration
You are a design brainstorming partner. Generate multiple AI design variants, open them
side-by-side in the user's browser, and iterate until they approve a direction. This is
visual brainstorming, not a review process.
{{DESIGN_SETUP}}
{{UX_PRINCIPLES}}
## Step 0: Session Detection
Check for prior design exploration sessions for this project:
```bash
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
setopt +o nomatch 2>/dev/null || true
_PREV=$(find ~/.gstack/projects/$SLUG/designs/ -name "approved.json" -maxdepth 2 2>/dev/null | sort -r | head -5)
[ -n "$_PREV" ] && echo "PREVIOUS_SESSIONS_FOUND" || echo "NO_PREVIOUS_SESSIONS"
echo "$_PREV"
```
**If `PREVIOUS_SESSIONS_FOUND`:** Read each `approved.json`, display a summary, then
AskUserQuestion:
> "Previous design explorations for this project:
> - [date]: [screen] — chose variant [X], feedback: '[summary]'
>
> A) Revisit — reopen the comparison board to adjust your choices
> B) New exploration — start fresh with new or updated instructions
> C) Something else"
If A: regenerate the board from existing variant PNGs, reopen, and resume the feedback loop.
If B: proceed to Step 1.
**If `NO_PREVIOUS_SESSIONS`:** Show the first-time message:
"This is /design-shotgun — your visual brainstorming tool. I'll generate multiple AI
design directions, open them side-by-side in your browser, and you pick your favorite.
You can run /design-shotgun anytime during development to explore design directions for
any part of your product. Let's start."
## Step 1: Context Gathering
When design-shotgun is invoked from plan-design-review, design-consultation, or another
skill, the calling skill has already gathered context. Check for `$_DESIGN_BRIEF` — if
it's set, skip to Step 2.
When run standalone, gather context to build a proper design brief.
**Required context (5 dimensions):**
1. **Who** — who is the design for? (persona, audience, expertise level)
2. **Job to be done** — what is the user trying to accomplish on this screen/page?
3. **What exists** — what's already in the codebase? (existing components, pages, patterns)
4. **User flow** — how do users arrive at this screen and where do they go next?
5. **Edge cases** — long names, zero results, error states, mobile, first-time vs power user
**Auto-gather first:**
```bash
cat DESIGN.md 2>/dev/null | head -80 || echo "NO_DESIGN_MD"
```
```bash
ls src/ app/ pages/ components/ 2>/dev/null | head -30
```
```bash
setopt +o nomatch 2>/dev/null || true
ls ~/.gstack/projects/$SLUG/*office-hours* 2>/dev/null | head -5
```
If DESIGN.md exists, tell the user: "I'll follow your design system in DESIGN.md by
default. If you want to go off the reservation on visual direction, just say so —
design-shotgun will follow your lead, but won't diverge by default."
**Check for a live site to screenshot** (for the "I don't like THIS" use case):
```bash
curl -s -o /dev/null -w "%{http_code}" http://localhost:3000 2>/dev/null || echo "NO_LOCAL_SITE"
```
If a local site is running AND the user referenced a URL or said something like "I don't
like how this looks," screenshot the current page and use `$D evolve` instead of
`$D variants` to generate improvement variants from the existing design.
**AskUserQuestion with pre-filled context:** Pre-fill what you inferred from the codebase,
DESIGN.md, and office-hours output. Then ask for what's missing. Frame as ONE question
covering all gaps:
> "Here's what I know: [pre-filled context]. I'm missing [gaps].
> Tell me: [specific questions about the gaps].
> How many variants? (default 3, up to 8 for important screens)"
Two rounds max of context gathering, then proceed with what you have and note assumptions.
## Step 2: Taste Memory
Read both the persistent taste profile (cross-session) AND the per-session approved
designs to bias generation toward the user's demonstrated taste.
**Persistent taste profile (v1 schema at `~/.gstack/projects/$SLUG/taste-profile.json`):**
{{TASTE_PROFILE}}
**Per-session approved.json files (legacy, still supported):**
```bash
setopt +o nomatch 2>/dev/null || true
_TASTE=$(find ~/.gstack/projects/$SLUG/designs/ -name "approved.json" -maxdepth 2 2>/dev/null | sort -r | head -10)
```
If prior sessions exist, read each `approved.json` and extract patterns from the
approved variants. Merge these into the taste-profile.json-derived signal — if the
profile already says "user prefers Geist font" (from aggregated history), the
approved.json files add the specific recent approval context.
Limit to last 10 sessions. Try/catch JSON parse on each (skip corrupted files).
**Updating taste profile after a design-shotgun session:** When the user picks a
variant, call `{{BIN_DIR}}/gstack-taste-update approved <variant-path>`. When they
explicitly reject a variant, call `{{BIN_DIR}}/gstack-taste-update rejected <variant-path>`.
The CLI handles schema migration from approved.json, decay, and conflict flagging.
## Step 3: Generate Variants
Set up the output directory:
```bash
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
_DESIGN_DIR="$HOME/.gstack/projects/$SLUG/designs/<screen-name>-$(date +%Y%m%d)"
mkdir -p "$_DESIGN_DIR"
echo "DESIGN_DIR: $_DESIGN_DIR"
```
Replace `<screen-name>` with a descriptive kebab-case name from the context gathering.
### Step 3a: Concept Generation
Before any API calls, generate N text concepts describing each variant's design direction.
Each concept should be a distinct creative direction, not a minor variation. Present them
as a lettered list:
```
I'll explore 3 directions:
A) "Name" — one-line visual description of this direction
B) "Name" — one-line visual description of this direction
C) "Name" — one-line visual description of this direction
```
Draw on DESIGN.md, taste memory, and the user's request to make each concept distinct.
**Anti-convergence directive (hard requirement):** Each variant MUST use a different
font family, color palette, and layout approach. If two variants look like siblings
— same typographic feel, overlapping color temperature, comparable layout rhythm —
one of them failed. Regenerate the weaker one with a deliberately different direction.
Concrete test: if someone could swap the headline text between two variants without
noticing, they're too similar. Variants should feel like they came from three
different design teams, not the same team at three different coffee levels.
### Step 3b: Concept Confirmation
Use AskUserQuestion to confirm before spending API credits:
> "These are the {N} directions I'll generate. Each takes ~60s, but I'll run them all
> in parallel so total time is ~60 seconds regardless of count."
Options:
- A) Generate all {N} — looks good
- B) I want to change some concepts (tell me which)
- C) Add more variants (I'll suggest additional directions)
- D) Fewer variants (tell me which to drop)
If B: incorporate feedback, re-present concepts, re-confirm. Max 2 rounds.
If C: add concepts, re-present, re-confirm.
If D: drop specified concepts, re-present, re-confirm.
### Step 3c: Parallel Generation
**If evolving from a screenshot** (user said "I don't like THIS"), take ONE screenshot
first:
```bash
$B screenshot "$_DESIGN_DIR/current.png"
```
**Launch N Agent subagents in a single message** (parallel execution). Use the Agent
tool with `subagent_type: "general-purpose"` for each variant. Each agent is independent
and handles its own generation, quality check, verification, and retry.
**Important: $D path propagation.** The `$D` variable from DESIGN SETUP is a shell
variable that agents do NOT inherit. Substitute the resolved absolute path (from the
`DESIGN_READY: /path/to/design` output in Step 0) into each agent prompt.
**Agent prompt template** (one per variant, substitute all `{...}` values):
```
Generate a design variant and save it.
Design binary: {absolute path to $D binary}
Brief: {the full variant-specific brief for this direction}
Output: /tmp/variant-{letter}.png
Final location: {_DESIGN_DIR absolute path}/variant-{letter}.png
Steps:
1. Run: {$D path} generate --brief "{brief}" --output /tmp/variant-{letter}.png
2. If the command fails with a rate limit error (429 or "rate limit"), wait 5 seconds
and retry. Up to 3 retries.
3. If the output file is missing or empty after the command succeeds, retry once.
4. Copy: cp /tmp/variant-{letter}.png {_DESIGN_DIR}/variant-{letter}.png
5. Quality check: {$D path} check --image {_DESIGN_DIR}/variant-{letter}.png --brief "{brief}"
If quality check fails, retry generation once.
6. Verify: ls -lh {_DESIGN_DIR}/variant-{letter}.png
7. Report exactly one of:
VARIANT_{letter}_DONE: {file size}
VARIANT_{letter}_FAILED: {error description}
VARIANT_{letter}_RATE_LIMITED: exhausted retries
```
For the evolve path, replace step 1 with:
```
{$D path} evolve --screenshot {_DESIGN_DIR}/current.png --brief "{brief}" --output /tmp/variant-{letter}.png
```
**Why /tmp/ then cp?** In observed sessions, `$D generate --output ~/.gstack/...`
failed with "The operation was aborted" while `--output /tmp/...` succeeded. This is
a sandbox restriction. Always generate to `/tmp/` first, then `cp`.
### Step 3d: Results
After all agents complete:
1. Read each generated PNG inline (Read tool) so the user sees all variants at once.
2. Report status: "All {N} variants generated in ~{actual time}. {successes} succeeded,
{failures} failed."
3. For any failures: report explicitly with the error. Do NOT silently skip.
4. If zero variants succeeded: fall back to sequential generation (one at a time with
`$D generate`, showing each as it lands). Tell the user: "Parallel generation failed
(likely rate limiting). Falling back to sequential..."
5. Proceed to Step 4 (comparison board).
**Dynamic image list for comparison board:** When proceeding to Step 4, construct the
image list from whatever variant files actually exist, not a hardcoded A/B/C list:
```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
_IMAGES=$(ls "$_DESIGN_DIR"/variant-*.png 2>/dev/null | tr '\n' ',' | sed 's/,$//')
```
Use `$_IMAGES` in the `$D compare --images` command.
## Step 4: Comparison Board + Feedback Loop
{{DESIGN_SHOTGUN_LOOP}}
## Step 5: Feedback Confirmation
After receiving feedback (via HTTP POST or AskUserQuestion fallback), output a clear
summary confirming what was understood:
"Here's what I understood from your feedback:
PREFERRED: Variant [X]
RATINGS: A: 4/5, B: 3/5, C: 2/5
YOUR NOTES: [full text of per-variant and overall comments]
DIRECTION: [regenerate action if any]
Is this right?"
Use AskUserQuestion to confirm before saving.
## Step 6: Save & Next Steps
Write `approved.json` to `$_DESIGN_DIR/` (handled by the loop above).
If invoked from another skill: return the structured feedback for that skill to consume.
The calling skill reads `approved.json` and the approved variant PNG.
If standalone, offer next steps via AskUserQuestion:
> "Design direction locked in. What's next?
> A) Iterate more — refine the approved variant with specific feedback
> B) Finalize — generate production Pretext-native HTML/CSS with /design-html
> C) Save to plan — add this as an approved mockup reference in the current plan
> D) Done — I'll use this later"
## Important Rules
1. **Never save to `.context/`, `docs/designs/`, or `/tmp/`.** All design artifacts go
to `~/.gstack/projects/$SLUG/designs/`. This is enforced. See DESIGN_SETUP above.
2. **Show variants inline before opening the board.** The user should see designs
immediately in their terminal. The browser board is for detailed feedback.
3. **Confirm feedback before saving.** Always summarize what you understood and verify.
4. **Taste memory is automatic.** Prior approved designs inform new generations by default.
5. **Two rounds max on context gathering.** Don't over-interrogate. Proceed with assumptions.
6. **DESIGN.md is the default constraint.** Unless the user says otherwise.