diff --git a/setup-gbrain/SKILL.md b/setup-gbrain/SKILL.md
index f987ffe5..e0dcc91b 100644
--- a/setup-gbrain/SKILL.md
+++ b/setup-gbrain/SKILL.md
@@ -1047,6 +1047,75 @@ the prereq is fixed.
 
 ---
 
+## Step 7.5: Transcript & memory ingest gate
+
+After memory sync is wired (Step 7) but before persisting the CLAUDE.md
+config (Step 8), offer to bring this Mac's coding-agent transcripts +
+curated `~/.gstack/` artifacts into gbrain so the retrieval surface
+(per-skill manifests, salience block) has data to surface.
+
+Run the probe to size the operation:
+```bash
+~/.claude/skills/gstack/bin/gstack-memory-ingest --probe
+```
+
+Read the output. If `Total files in window: 0`, skip — there's nothing
+to ingest. Set `gstack-config set transcript_ingest_mode incremental`
+silently and continue to Step 8.
+
+If `New (never ingested)` is < 200 AND total bytes are < 100MB: silent
+bulk via `gstack-memory-ingest --bulk --quiet`. Set
+`transcript_ingest_mode=incremental` and continue.
+
+Otherwise (the "many transcripts on disk" path): AskUserQuestion with
+the exact counts AND the value promise. Default scope is **current repo
+only, last 90 days**:
+
+> "Found transcripts in THIS repo () over the last
+> 90 days, plus across other repos on this machine (
+> total if all ingested). Ingest THIS repo's transcripts into gbrain?
+>
+> What you get after this: every gstack skill auto-loads recent salience
+> from your past sessions in this repo, so the agent finds your prior
+> work without you describing it. You can query 'what was I doing on
+> day X' and get a real answer. Per-session pages are searchable,
+> taggable, and deletable. Secret scanning runs before any push.
+>
+> What stays the same: nothing leaves your machine unless gbrain sync
+> is enabled (Step 7). Per-repo trust policies still apply.
+>
+> Multi-Mac note: if you HAVE enabled brain sync (Step 7), these
+> transcript pages will sync across your Macs. Caveat: deleting a
+> transcript page later removes it from gbrain but git history retains
+> it in prior commits. Use `gstack-transcript-prune` to delete in bulk;
+> use `git filter-repo` on the brain remote for hard-delete from
+> history."
+
+Options:
+- A) Yes — this repo, last 90 days (recommended; ~est min)
+- B) Yes — this repo, ALL history
+- C) Yes — this repo + other repos on this machine
+- D) Skip historical, track new from now (`transcript_ingest_mode=incremental`)
+- E) Never ingest transcripts (`transcript_ingest_mode=off`)
+
+After answer:
+```bash
+~/.claude/skills/gstack/bin/gstack-config set transcript_ingest_mode
+~/.claude/skills/gstack/bin/gstack-gbrain-sync --full --no-brain-sync
+```
+(`--no-brain-sync` because Step 7 already wired that path; this just
+runs the code import + memory ingest stages. Brain-sync will run on the
+next preamble hook.)
+
+If A/D/E, ingest is incremental from this point on; preamble-boundary
+hook runs `gstack-gbrain-sync --incremental --quiet` on every skill
+start (cheap mtime fast-path).
+
+Reference doc for users: `setup-gbrain/memory.md` (linked from CLAUDE.md
+Step 8).
+
+---
+
 ## Step 8: Persist `## GBrain Configuration` in CLAUDE.md
 
 Find-and-replace (or append) this section in CLAUDE.md:
@@ -1076,6 +1145,48 @@ and STOP with a NEEDS_CONTEXT escalation.
 
 ---
 
+## Step 10: GREEN/YELLOW/RED verdict block (idempotent doctor output)
+
+After Steps 1-9 complete, summarize. Re-running `/setup-gbrain` on a
+configured Mac is a first-class doctor path: every step detects existing
+state, repairs only what's missing, and reports here.
+
+```bash
+~/.claude/skills/gstack/bin/gstack-gbrain-detect 2>/dev/null || true
+~/.claude/skills/gstack/bin/gstack-config get transcript_ingest_mode 2>/dev/null || echo "off"
+~/.claude/skills/gstack/bin/gstack-config get gbrain_sync_mode 2>/dev/null || echo "off"
+[ -f ~/.gstack/.gbrain-sync-state.json ] && cat ~/.gstack/.gbrain-sync-state.json || echo "{}"
+```
+
+Print the verdict block. Each row is `[OK]/[FIX]/[WARN]/[ERR]` — see
+template below; substitute your detect outputs:
+
+```
+gbrain status: GREEN
+
+  CLI ............. OK
+  Engine .......... OK at
+  doctor .......... OK
+  MCP ............. OK registered (user scope)
+  Repo policy ..... OK
+  Code import ..... OK
+  Memory sync ..... OK to
+  Transcripts ..... OK sessions, last ingest
+  CLAUDE.md ....... OK
+  Smoke test ...... OK put → search → delete round-trip
+
+Run `/setup-gbrain` again any time gbrain feels off; it's safe and idempotent.
+```
+
+If any row is YELLOW or RED, the verdict line says so and the failing rows
+surface a one-line "next action" (e.g.,
+`Engine .......... ERR PGLite corrupt — run \`gbrain restore-from-sync\` (V1.5)`).
+For V1, restore-from-sync is a V1.5 P0 cross-repo TODO; until it ships,
+the user's brain remote (with brain-sync enabled) holds curated artifacts
+as markdown + git, recoverable manually via `gbrain import` from a clone.
+
+---
+
 ## `/setup-gbrain --cleanup-orphans` (D20)
 
 Re-collect a PAT (Step 4 path-2a scope disclosure), then:
diff --git a/setup-gbrain/SKILL.md.tmpl b/setup-gbrain/SKILL.md.tmpl
index 3bbf9b12..3b1ff2d7 100644
--- a/setup-gbrain/SKILL.md.tmpl
+++ b/setup-gbrain/SKILL.md.tmpl
@@ -398,6 +398,75 @@ the prereq is fixed.
 
 ---
 
+## Step 7.5: Transcript & memory ingest gate
+
+After memory sync is wired (Step 7) but before persisting the CLAUDE.md
+config (Step 8), offer to bring this Mac's coding-agent transcripts +
+curated `~/.gstack/` artifacts into gbrain so the retrieval surface
+(per-skill manifests, salience block) has data to surface.
+
+Run the probe to size the operation:
+```bash
+~/.claude/skills/gstack/bin/gstack-memory-ingest --probe
+```
+
+Read the output. If `Total files in window: 0`, skip — there's nothing
+to ingest. Set `gstack-config set transcript_ingest_mode incremental`
+silently and continue to Step 8.
+
+If `New (never ingested)` is < 200 AND total bytes are < 100MB: silent
+bulk via `gstack-memory-ingest --bulk --quiet`. Set
+`transcript_ingest_mode=incremental` and continue.
+
+Otherwise (the "many transcripts on disk" path): AskUserQuestion with
+the exact counts AND the value promise. Default scope is **current repo
+only, last 90 days**:
+
+> "Found transcripts in THIS repo () over the last
+> 90 days, plus across other repos on this machine (
+> total if all ingested). Ingest THIS repo's transcripts into gbrain?
+>
+> What you get after this: every gstack skill auto-loads recent salience
+> from your past sessions in this repo, so the agent finds your prior
+> work without you describing it. You can query 'what was I doing on
+> day X' and get a real answer. Per-session pages are searchable,
+> taggable, and deletable. Secret scanning runs before any push.
+>
+> What stays the same: nothing leaves your machine unless gbrain sync
+> is enabled (Step 7). Per-repo trust policies still apply.
+>
+> Multi-Mac note: if you HAVE enabled brain sync (Step 7), these
+> transcript pages will sync across your Macs. Caveat: deleting a
+> transcript page later removes it from gbrain but git history retains
+> it in prior commits. Use `gstack-transcript-prune` to delete in bulk;
+> use `git filter-repo` on the brain remote for hard-delete from
+> history."
+
+Options:
+- A) Yes — this repo, last 90 days (recommended; ~est min)
+- B) Yes — this repo, ALL history
+- C) Yes — this repo + other repos on this machine
+- D) Skip historical, track new from now (`transcript_ingest_mode=incremental`)
+- E) Never ingest transcripts (`transcript_ingest_mode=off`)
+
+After answer:
+```bash
+~/.claude/skills/gstack/bin/gstack-config set transcript_ingest_mode
+~/.claude/skills/gstack/bin/gstack-gbrain-sync --full --no-brain-sync
+```
+(`--no-brain-sync` because Step 7 already wired that path; this just
+runs the code import + memory ingest stages. Brain-sync will run on the
+next preamble hook.)
+
+If A/D/E, ingest is incremental from this point on; preamble-boundary
+hook runs `gstack-gbrain-sync --incremental --quiet` on every skill
+start (cheap mtime fast-path).
+
+Reference doc for users: `setup-gbrain/memory.md` (linked from CLAUDE.md
+Step 8).
+
+---
+
 ## Step 8: Persist `## GBrain Configuration` in CLAUDE.md
 
 Find-and-replace (or append) this section in CLAUDE.md:
@@ -427,6 +496,48 @@ and STOP with a NEEDS_CONTEXT escalation.
 
 ---
 
+## Step 10: GREEN/YELLOW/RED verdict block (idempotent doctor output)
+
+After Steps 1-9 complete, summarize. Re-running `/setup-gbrain` on a
+configured Mac is a first-class doctor path: every step detects existing
+state, repairs only what's missing, and reports here.
+
+```bash
+~/.claude/skills/gstack/bin/gstack-gbrain-detect 2>/dev/null || true
+~/.claude/skills/gstack/bin/gstack-config get transcript_ingest_mode 2>/dev/null || echo "off"
+~/.claude/skills/gstack/bin/gstack-config get gbrain_sync_mode 2>/dev/null || echo "off"
+[ -f ~/.gstack/.gbrain-sync-state.json ] && cat ~/.gstack/.gbrain-sync-state.json || echo "{}"
+```
+
+Print the verdict block. Each row is `[OK]/[FIX]/[WARN]/[ERR]` — see
+template below; substitute your detect outputs:
+
+```
+gbrain status: GREEN
+
+  CLI ............. OK
+  Engine .......... OK at
+  doctor .......... OK
+  MCP ............. OK registered (user scope)
+  Repo policy ..... OK
+  Code import ..... OK
+  Memory sync ..... OK to
+  Transcripts ..... OK sessions, last ingest
+  CLAUDE.md ....... OK
+  Smoke test ...... OK put → search → delete round-trip
+
+Run `/setup-gbrain` again any time gbrain feels off; it's safe and idempotent.
+```
+
+If any row is YELLOW or RED, the verdict line says so and the failing rows
+surface a one-line "next action" (e.g.,
+`Engine .......... ERR PGLite corrupt — run \`gbrain restore-from-sync\` (V1.5)`).
+For V1, restore-from-sync is a V1.5 P0 cross-repo TODO; until it ships,
+the user's brain remote (with brain-sync enabled) holds curated artifacts
+as markdown + git, recoverable manually via `gbrain import` from a clone.
+
+---
+
 ## `/setup-gbrain --cleanup-orphans` (D20)
 
 Re-collect a PAT (Step 4 path-2a scope disclosure), then:
diff --git a/setup-gbrain/memory.md b/setup-gbrain/memory.md
new file mode 100644
index 00000000..40f38922
--- /dev/null
+++ b/setup-gbrain/memory.md
@@ -0,0 +1,178 @@
+# gstack memory ingest — what it does, what stays local, what you can do with it
+
+This is the user-facing reference for the V1 transcript + memory ingest
+feature in `/setup-gbrain`. If you ran `/setup-gbrain` and it asked
+"Ingest THIS repo's transcripts into gbrain?", this doc explains what
+happens after you say yes.
+
+## What gets ingested
+
+| Source | Type | Where | Sensitivity |
+|---|---|---|---|
+| Claude Code session JSONL | `transcript` | `~/.claude/projects/*/` | High — full conversations including tool I/O |
+| Codex CLI session JSONL | `transcript` | `~/.codex/sessions/YYYY/MM/DD/` | High |
+| Cursor session SQLite (V1.0.1) | `transcript` | `~/Library/Application Support/Cursor/` | Same — deferred V1.0.1 |
+| Eureka log | `eureka` | `~/.gstack/analytics/eureka.jsonl` | Medium — your insights, often non-secret |
+| Project learnings | `learning` | `~/.gstack/projects//learnings.jsonl` | Medium |
+| Project timeline | `timeline` | `~/.gstack/projects//timeline.jsonl` | Low |
+| CEO plans | `ceo-plan` | `~/.gstack/projects//ceo-plans/*.md` | Medium |
+| Design docs | `design-doc` | `~/.gstack/projects//*-design-*.md` | Medium |
+| Retros | `retro` | `~/.gstack/projects//retros/*.md` | Medium |
+| Builder profile | `builder-profile-entry` | `~/.gstack/builder-profile.jsonl` | Low |
+
+## What stays local
+
+- **State files** (`~/.gstack/.gbrain-sync-state.json`,
+  `~/.gstack/.transcript-ingest-state.json`,
+  `~/.gstack/.gbrain-engine-cache.json`,
+  `~/.gstack/.gbrain-errors.jsonl`) are local-only per ED1 (state file
+  sync semantics decision). They are not synced via the brain remote.
+
+- **Sessions with no resolvable git remote** (running in `/tmp/`, scratch
+  dirs, etc.) are skipped by default. Pass `--include-unattributed` to
+  the ingest helper to opt them in.
+
+- **Repos under a `deny` trust policy** (set in `/setup-gbrain` Step 6)
+  are skipped — neither code nor transcripts from those repos ingest.
+
+## What gets scanned for secrets
+
+Every ingested page passes through **gitleaks** before write
+(per D19 — replaces the regex scanner that previously ran only on
+staged git diffs). Gitleaks is an industry-standard scanner; it covers:
+
+- AWS / GCP / Azure access keys
+- ANTHROPIC_API_KEY, OPENAI_API_KEY, GitHub tokens
+- Stripe keys, Slack tokens, JWT secrets
+- Generic high-entropy strings (configurable threshold)
+
+A session with a positive finding is **skipped entirely** — not partially
+redacted. The match line + rule ID are logged to stderr; you can see what
+was skipped via `bun run bin/gstack-memory-ingest.ts --probe` (which
+shows new vs. updated counts) or by reviewing the helper's output during
+`/gbrain-sync --full`.
+
+If gitleaks is not installed (run `brew install gitleaks` on macOS, or
+`apt install gitleaks` on Linux), the helper warns once and disables
+secret scanning. **In that mode, transcripts ingest unscanned. Don't run
+ingest without gitleaks if you have any concern about secrets in your
+sessions.**
+
+## Where it goes
+
+Storage tier depends on your gbrain engine (set during `/setup-gbrain`):
+
+- **Supabase configured:** code + transcripts go to Supabase Storage
+  (multi-Mac native). Curated memory (eureka/learnings/etc.) goes to the
+  brain-linked git repo via `gstack-brain-sync`.
+- **Local PGLite only:** everything stays on this Mac. Curated memory
+  syncs via git if you've enabled brain-sync.
+
+The "never double-store" rule per the plan: code and transcripts NEVER
+go in the gbrain-linked git repo. They're too big and they're
+replaceable from disk on each Mac.
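
The storage rules in "Where it goes" can be read as a small decision table. The sketch below is only an illustration of that table, not code from gstack: `storage_target` and its three labels (`supabase-storage`, `brain-git-repo`, `local-only`) are hypothetical names, and it assumes brain-sync is what carries curated memory to git in both engine modes.

```shell
# Hypothetical helper (not part of gstack): print where a page would be stored.
#   $1 engine:      "supabase" or "pglite"
#   $2 page type:   "code", "transcript", or a curated type such as "eureka"
#   $3 brain-sync:  "on" or "off"
storage_target() {
  engine=$1
  page_type=$2
  brain_sync=$3
  case "$page_type" in
    code|transcript)
      # "Never double-store": bulk data never enters the brain-linked git repo.
      if [ "$engine" = "supabase" ]; then
        echo "supabase-storage"
      else
        echo "local-only"
      fi
      ;;
    *)
      # Curated memory (eureka, learnings, plans, retros, and so on).
      # Assumption: it reaches git only when brain-sync is enabled.
      if [ "$brain_sync" = "on" ]; then
        echo "brain-git-repo"
      else
        echo "local-only"
      fi
      ;;
  esac
}
```

For instance, `storage_target pglite transcript on` prints `local-only`: even with brain-sync enabled, transcripts stay out of git.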
+
+## What you can do with it
+
+- **Query in natural language:**
+  ```bash
+  gbrain query "what was I doing on the auth migration"
+  gbrain search "session_id:abc123"
+  ```
+
+- **Browse by type:**
+  ```bash
+  gbrain list_pages --type transcript --limit 10
+  gbrain list_pages --type ceo-plan
+  ```
+
+- **Read a specific page:**
+  ```bash
+  gbrain get_page transcripts/claude-code/garrytan-gstack/2026-05-01-abc123
+  ```
+
+- **Delete a page:**
+  ```bash
+  gbrain delete_page
+  ```
+  Caveat: with brain-sync enabled, the page is removed from gbrain's
+  index but git history retains it. For hard-delete, run `git filter-repo`
+  on the brain remote.
+
+- **Bulk-delete by criteria** (V1.0.1 follow-up — `gstack-transcript-prune`
+  helper). For V1.0, use `gbrain delete_page ` per-page or write
+  a small loop over `gbrain list_pages` output.
+
+- **Disable entirely:**
+  ```bash
+  gstack-config set transcript_ingest_mode off
+  gstack-config set gbrain_context_load off  # also disables retrieval
+  ```
+
+## How the agent uses it
+
+At every gstack skill start, the preamble runs
+`gstack-brain-context-load` which:
+
+1. Reads the active skill's `gbrain.context_queries:` frontmatter
+2. Dispatches each query to gbrain (vector / list / filesystem)
+3. Renders results into `## ` sections wrapped in
+   `` envelopes
+4. The model sees this as part of the preamble before making any decisions
+
+For example, when you run `/office-hours`, the model context
+automatically includes:
+
+- `## Prior office-hours sessions in this repo` (last 5)
+- `## Your builder profile snapshot` (latest entry)
+- `## Recent design docs for this project` (last 3)
+- `## Recent eureka moments` (last 5)
+
+So the "Welcome back, last time you were on X" beat is sourced from
+your actual data, not cold-start.
+
+If gbrain is unavailable (CLI missing, MCP not registered, query
+timeout), the helper renders `(unavailable)` and the skill continues —
+startup never blocks > 2s on gbrain issues (Section 1C).
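
The time-boxed fallback described above (render `(unavailable)` instead of blocking startup) can be sketched in plain shell. This is a minimal illustration under stated assumptions, not how `gstack-brain-context-load` is actually implemented: `with_fallback` is a hypothetical name, and the sketch avoids the GNU `timeout` utility because stock macOS does not ship it.

```shell
# Hypothetical helper: run a command with a time limit in seconds.
# Prints the command's stdout on success, or "(unavailable)" if the command
# is missing, fails, or exceeds the limit. The body runs in a subshell
# (parentheses) so its variables do not leak into the caller.
with_fallback() (
  limit=$1
  shift
  out=$(mktemp) || exit 1

  # Run the real command in the background, capturing stdout to a temp file.
  "$@" >"$out" 2>/dev/null &
  pid=$!

  # Watchdog: kill the command once the limit passes. Its output is sent to
  # /dev/null so a leftover sleep never holds the caller's pipe open.
  ( sleep "$limit"; kill "$pid" 2>/dev/null ) >/dev/null 2>&1 &
  watchdog=$!

  if wait "$pid"; then
    cat "$out"
  else
    echo "(unavailable)"
  fi

  kill "$watchdog" 2>/dev/null || true
  rm -f "$out"
)
```

A call such as `with_fallback 2 gbrain query "what was I doing on the auth migration"` would then either print the query result or the placeholder, and in both cases return within roughly two seconds.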
+
+## What to do when something feels off
+
+Run `/setup-gbrain` again. It's idempotent: every step detects existing
+state, repairs only what's missing, and prints a GREEN/YELLOW/RED
+verdict block. If a row is RED, the row tells you what to do.
+
+Common cases:
+
+- **Salience block is empty** — your transcripts may not be ingested
+  yet. Run `gstack-gbrain-sync --full` to do a full pass.
+
+- **"gbrain CLI missing" in the preamble output** — gbrain isn't on
+  your PATH. Run `/setup-gbrain` to install/wire it.
+
+- **PGLite engine corrupt (V1.5)** — V1.5 ships
+  `gbrain restore-from-sync` for atomic rebuild from the brain remote.
+  For V1.0, manual recovery: `cd ~/.gbrain && rm -rf db && gbrain init
+  --pglite && gbrain import `.
+
+- **A page has stale or wrong content** — `gbrain delete_page `,
+  then re-run `gstack-gbrain-sync --incremental` to re-ingest from
+  source if the source file is still on disk and unchanged.
+
+## Privacy + audit
+
+- Every `secretScanFile` finding is logged to stderr at ingest time.
+- Every gbrain put/delete is logged to `~/.gstack/.gbrain-errors.jsonl`
+  with `{ts, op, duration_ms, outcome}` for forensic tracing.
+- `~/.gstack/.gbrain-engine-cache.json` shows which storage tier is
+  active (PGLite vs Supabase).
+- Brain-sync git history shows every curated artifact push with the
+  user's git identity.
+
+If you find a transcript page that contains a secret gitleaks missed,
+the recovery path is:
+1. `gbrain delete_page ` — removes from index immediately
+2. Rotate the secret (rotate it anyway as a defensive measure)
+3. If brain-sync is on: `git filter-repo --invert-paths --path `
+   on the brain remote for hard-delete from history
+4. File a gitleaks issue with the pattern (or extend the gitleaks config
+   at `~/.gitleaks.toml`).
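
The four recovery steps above can be turned into a dry-run checklist. The sketch below only prints the commands and reminders, it executes nothing. `print_recovery_plan` is a hypothetical helper, and both the page slug and the path inside the brain repo are caller-supplied placeholders, since the doc does not specify how the two map to each other.

```shell
# Hypothetical dry-run helper: print the recovery steps for a leaked page.
#   $1 gbrain page slug (caller-supplied)
#   $2 "on" if brain-sync is enabled, anything else otherwise
#   $3 path of the page inside the brain repo (caller-supplied; only used
#      when brain-sync is on)
print_recovery_plan() {
  page=$1
  brain_sync=$2
  repo_path=$3
  echo "gbrain delete_page $page"
  echo "# rotate the exposed secret with its provider"
  if [ "$brain_sync" = "on" ]; then
    # History rewrite is only needed when the page was pushed to git.
    echo "git filter-repo --invert-paths --path $repo_path"
  fi
  echo "# report the missed pattern to gitleaks or extend ~/.gitleaks.toml"
}
```

Printing rather than executing keeps the destructive steps (`delete_page`, history rewrite) under manual control; review the output before running any line of it.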