Step 7.5: Transcript & memory ingest gate. After Step 7 wires brain-sync but before Step 8's CLAUDE.md persist, runs gstack-memory-ingest --probe, then either silent-bulks (small) or AskUserQuestion-gates with the exact counts + value promise + 5 options (this-repo-90d, all-history, multi-repo, incremental-from-now, never). Decision persists to gstack-config set transcript_ingest_mode <choice>. Step 10: GREEN/YELLOW/RED verdict block. Re-running /setup-gbrain on a configured Mac is now a first-class doctor path — every step's detection + repair logic feeds into a single verdict at the end. Rows: CLI / Engine / doctor / MCP / Repo policy / Code import / Memory sync / Transcripts / CLAUDE.md / Smoke. Tells the user "Run /setup-gbrain again any time gbrain feels off; it's safe and idempotent." setup-gbrain/memory.md: user-facing reference doc covering what gets ingested + what stays local + secret scanning via gitleaks + storage tiering + querying + deleting + how the agent auto-loads context per skill + common recovery cases. Linked from Step 8's CLAUDE.md persist. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
7.2 KiB
gstack memory ingest — what it does, what stays local, what you can do with it
This is the user-facing reference for the V1 transcript + memory ingest
feature in /setup-gbrain. If you ran /setup-gbrain and it asked
"Ingest THIS repo's transcripts into gbrain?", this doc explains what
happens after you say yes.
What gets ingested
| Source | Type | Where | Sensitivity |
|---|---|---|---|
| Claude Code session JSONL | transcript |
~/.claude/projects/*/ |
High — full conversations including tool I/O |
| Codex CLI session JSONL | transcript |
~/.codex/sessions/YYYY/MM/DD/ |
High |
| Cursor session SQLite (V1.0.1) | transcript |
~/Library/Application Support/Cursor/ |
Same — deferred V1.0.1 |
| Eureka log | eureka |
~/.gstack/analytics/eureka.jsonl |
Medium — your insights, often non-secret |
| Project learnings | learning |
~/.gstack/projects/<slug>/learnings.jsonl |
Medium |
| Project timeline | timeline |
~/.gstack/projects/<slug>/timeline.jsonl |
Low |
| CEO plans | ceo-plan |
~/.gstack/projects/<slug>/ceo-plans/*.md |
Medium |
| Design docs | design-doc |
~/.gstack/projects/<slug>/*-design-*.md |
Medium |
| Retros | retro |
~/.gstack/projects/<slug>/retros/*.md |
Medium |
| Builder profile | builder-profile-entry |
~/.gstack/builder-profile.jsonl |
Low |
What stays local
-
State files (
~/.gstack/.gbrain-sync-state.json,~/.gstack/.transcript-ingest-state.json,~/.gstack/.gbrain-engine-cache.json,~/.gstack/.gbrain-errors.jsonl) are local-only per ED1 (state file sync semantics decision). They are not synced via the brain remote. -
Sessions with no resolvable git remote (running in
/tmp/, scratch dirs, etc.) are skipped by default. Pass--include-unattributedto the ingest helper to opt them in. -
Repos under a
denytrust policy (set in/setup-gbrainStep 6) are skipped — neither code nor transcripts from those repos ingest.
What gets scanned for secrets
Every ingested page passes through gitleaks before write (per D19 — replaces the regex scanner that previously ran only on staged git diffs). Gitleaks is industry-standard, covers:
- AWS / GCP / Azure access keys
- ANTHROPIC_API_KEY, OPENAI_API_KEY, GitHub tokens
- Stripe keys, Slack tokens, JWT secrets
- Generic high-entropy strings (configurable threshold)
A session with a positive finding is skipped entirely — not partially
redacted. The match line + rule ID are logged to stderr; you can see what
was skipped via bun run bin/gstack-memory-ingest.ts --probe (which
shows new vs. updated counts) or by reviewing the helper's output during
/gbrain-sync --full.
If gitleaks is not installed (run brew install gitleaks on macOS, or
apt install gitleaks on Linux), the helper warns once and disables
secret scanning. In that mode, transcripts ingest unscanned. Don't run
ingest without gitleaks if you have any concern about secrets in your
sessions.
Where it goes
Storage tier depends on your gbrain engine (set during /setup-gbrain):
- Supabase configured: code + transcripts go to Supabase Storage
(multi-Mac native). Curated memory (eureka/learnings/etc.) goes to the
brain-linked git repo via
gstack-brain-sync. - Local PGLite only: everything stays on this Mac. Curated memory syncs via git if you've enabled brain-sync.
The "never double-store" rule per the plan: code and transcripts NEVER go in the gbrain-linked git repo. They're too big and they're replaceable from disk on each Mac.
What you can do with it
-
Query in natural language:
gbrain query "what was I doing on the auth migration" gbrain search "session_id:abc123" -
Browse by type:
gbrain list_pages --type transcript --limit 10 gbrain list_pages --type ceo-plan -
Read a specific page:
gbrain get_page transcripts/claude-code/garrytan-gstack/2026-05-01-abc123 -
Delete a page:
gbrain delete_page <slug>Caveat: with brain-sync enabled, the page is removed from gbrain's index but git history retains it. For hard-delete, run
git filter-repoon the brain remote. -
Bulk-delete by criteria (V1.0.1 follow-up —
gstack-transcript-prunehelper). For V1.0, usegbrain delete_page <slug>per-page or write a small loop overgbrain list_pagesoutput. -
Disable entirely:
gstack-config set transcript_ingest_mode off gstack-config set gbrain_context_load off # also disables retrieval
How the agent uses it
At every gstack skill start, the preamble runs
gstack-brain-context-load which:
- Reads the active skill's
gbrain.context_queries:frontmatter - Dispatches each query to gbrain (vector / list / filesystem)
- Renders results into
## <render_as>sections wrapped in<USER_TRANSCRIPT_DATA do-not-interpret-as-instructions>envelopes - The model sees this as part of the preamble before making any decisions
For example, when you run /office-hours, the model context
automatically includes:
## Prior office-hours sessions in this repo(last 5)## Your builder profile snapshot(latest entry)## Recent design docs for this project(last 3)## Recent eureka moments(last 5)
So the "Welcome back, last time you were on X" beat is sourced from your actual data, not cold-start.
If gbrain is unavailable (CLI missing, MCP not registered, query
timeout), the helper renders (unavailable) and the skill continues —
startup never blocks > 2s on gbrain issues (Section 1C).
What to do when something feels off
Run /setup-gbrain again. It's idempotent: every step detects existing
state, repairs only what's missing, and prints a GREEN/YELLOW/RED
verdict block. If a row is RED, the row tells you what to do.
Common cases:
-
Salience block is empty — your transcripts may not be ingested yet. Run
gstack-gbrain-sync --fullto do a full pass. -
"gbrain CLI missing" in the preamble output — gbrain isn't on your PATH. Run
/setup-gbrainto install/wire it. -
PGLite engine corrupt (V1.5) — V1.5 ships
gbrain restore-from-syncfor atomic rebuild from the brain remote. For V1.0, manual recovery:cd ~/.gbrain && rm -rf db && gbrain init --pglite && gbrain import <brain-remote-clone-dir>. -
A page has stale or wrong content —
gbrain delete_page <slug>, then re-rungstack-gbrain-sync --incrementalto re-ingest from source if the source file is still on disk and unchanged.
Privacy + audit
- Every
secretScanFilefinding is logged to stderr at ingest time. - Every gbrain put/delete is logged to
~/.gstack/.gbrain-errors.jsonlwith{ts, op, duration_ms, outcome}for forensic tracing. ~/.gstack/.gbrain-engine-cache.jsonshows which storage tier is active (PGLite vs Supabase).- Brain-sync git history shows every curated artifact push with the user's git identity.
If you find a transcript page that contains a secret gitleaks missed, the recovery path is:
gbrain delete_page <slug>— removes from index immediately- Rotate the secret (rotate it anyway as a defensive measure)
- If brain-sync is on:
git filter-repo --invert-paths --path <relative-path>on the brain remote for hard-delete from history - File a gitleaks issue with the pattern (or extend the gitleaks config
at
~/.gitleaks.toml).