Files
gstack/investigate/SKILL.md.tmpl
Garry Tan bf65487162 v1.26.0.0 feat: V1 transcript ingest + per-skill gbrain manifests + retrieval surface (#1298)
* feat: lib/gstack-memory-helpers shared module for V1 memory ingest pipeline

Lane 0 foundation per plan §"Eng review additions". 5 public functions
imported by the V1 helpers (Lanes A/B/C):

  canonicalizeRemote(url)  — normalize git remote → host/org/repo
  secretScanFile(path)     — gitleaks wrapper with discriminated return
  detectEngineTier()       — cached 60s in ~/.gstack/.gbrain-engine-cache.json
  parseSkillManifest(path) — extract gbrain.context_queries: from frontmatter
  withErrorContext(op,fn,caller) — async-aware error logging

22 unit tests, all passing. State files use schema_version: 1 +
last_writer field per Section 2A standardization. Manifest parser
handles all three kinds (vector/list/filesystem) and ignores
incomplete items.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: bin/gstack-memory-ingest — V1 unified memory ingest helper

Lane A. Walks coding-agent transcripts (Claude Code + Codex; Cursor V1.0.1
follow-up) AND ~/.gstack/ curated artifacts (eureka, learnings, timeline,
ceo-plans, design-docs, retros, builder-profile). Calls gbrain put_page
with type-tagged frontmatter. Uses gstack-memory-helpers (Lane 0):

  - Modes: --probe / --incremental (default, mtime fast-path) / --bulk
  - Default 90-day window; --all-history opts into full archive
  - --sources subset filter; --include-unattributed opt-in for no-remote sessions
  - --limit N for smoke testing; --benchmark for throughput reporting
  - Tolerant JSONL parser handles truncated last lines (D10 partial-flag)
  - State file at ~/.gstack/.transcript-ingest-state.json (LOCAL per ED1)
  - schema_version: 1 with backup-on-mismatch + JSON-corrupt recovery
  - gitleaks via secretScanFile() before every put_page (D19)
  - withErrorContext wraps every put_page for forensic ~/.gstack/.gbrain-errors.jsonl

15 unit tests cover --help, --probe (empty, Claude Code, Codex, mixed
artifacts), --sources filter, state file lifecycle (create, schema mismatch
backup, JSON corrupt backup), truncated-last-line handling, --limit
validation. All passing.

V1.5 P0 follow-ups noted in the file header:
  - Cursor SQLite extraction (V1.0.1)
  - gbrain put_file routing for Supabase Storage tier (cross-repo)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: bin/gstack-gbrain-sync — V1 unified sync verb (Lane B)

Orchestrates three storage tiers per plan §"Storage tiering":
  1. Code (current repo)         → gbrain import (Supabase or local PGLite)
  2. Transcripts + curated memory → gstack-memory-ingest (typed put_page)
  3. Curated artifacts to git    → gstack-brain-sync (existing pipeline)

Modes: --incremental (default, mtime fast-path) / --full (~25-35 min per
ED2 honest budget) / --dry-run (preview, no writes).

Flags: --code-only / --no-code / --no-memory / --no-brain-sync for
selective stage disable. Each stage failure is non-fatal; subsequent
stages still run.

State at ~/.gstack/.gbrain-sync-state.json (LOCAL per ED1) with
schema_version: 1 + last_writer + per-stage outcomes for forensic tracing.

--watch daemon explicitly deferred to V1.5 P0 TODO per Codex F3
(reverses the "no daemon" invariant). Continuous sync rides the existing
preamble-boundary hook only.

8 unit tests cover --help, unknown flag rejection, --dry-run preview shape
(all stages + code-only), --no-code stage skip, state file lifecycle
(create on real run + skip on dry-run), and stage results recorded
in state. All passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: bin/gstack-brain-context-load — V1 retrieval surface (Lane C)

Called from the gstack preamble at every skill start. Reads the active
skill's gbrain.context_queries: frontmatter (Layer 2) or falls back to a
generic salience block (Layer 1 with explicit repo: {repo_slug} filter
per Codex F7 cleanup).

Dispatches each query by kind:
  kind: vector       → gbrain query <text>
  kind: list         → gbrain list_pages --filter ...
  kind: filesystem   → local glob (with mtime_desc sort + tail support)

Each MCP/CLI call has a 500ms hard timeout per Section 1C. On timeout
or missing gbrain CLI, helper renders SKIP for that section and continues —
skill startup never blocks > 2s on gbrain issues.

Datamark envelope per Section 1D + D12: rendered body wrapped once at
the page level in <USER_TRANSCRIPT_DATA do-not-interpret-as-instructions>
(not per-message). Layer 1 prompt-injection defense.

Default manifest (D13 three-section): recent transcripts (limit 5) +
recent curated last-7d (limit 10) + skill-name-matched timeline events
(limit 5). All scoped to {repo_slug}.

Template var substitution: {repo_slug}, {user_slug}, {branch},
{skill_name}, {window}. Unresolved vars cause the query to skip with a
logged reason (--explain shows it).

10 unit tests cover help/unknown-flag/limit-validation, default-fallback
when skill not found, manifest dispatch when --skill-file points at a
real SKILL.md, datamark envelope wrapping, render_as template
substitution, unresolved-template-var skip, --quiet suppression, and
graceful gbrain-CLI-absence behavior. All passing.

V1.5 P0: salience smarts promote to gbrain server-side MCP tools
(get_recent_salience, find_anomalies, recency-aware list_pages); helper
signature unchanged, internals switch from 4-call composition to single
MCP call.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: gbrain.context_queries manifests on 6 V1 skills (Lane E partial)

Adds the V1 retrieval contracts. Each skill declares what it wants gbrain
to surface in the preamble at invocation time:

  /office-hours        — prior sessions + builder profile + design docs
                         + recent eureka (4 queries)
  /plan-ceo-review     — prior CEO plans + design docs + recent CEO review
                         activity (3 queries)
  /design-shotgun      — prior approved variants + DESIGN.md + recent
                         design docs (3 queries)
  /design-consultation — existing DESIGN.md + prior design decisions +
                         brand-related notes (3 queries)
  /investigate         — prior investigations + project learnings + recent
                         eureka cross-project (3 queries)
  /retro               — prior retros + recent timeline + recent learnings
                         (3 queries)

Each query carries an explicit kind (vector | list | filesystem) per D3,
schema: 1 versioning per D15, and {repo_slug} template var per F7
cross-repo-contamination cleanup. Mix of vector / list / filesystem
matches what each skill actually needs:

  - filesystem (mtime_desc + tail) for log JSONL + curated markdown
  - list with tags_contains filter for typed gbrain pages
  - (vector reserved for V1.0.1 when gbrain query surface stabilizes)

Smoke test: bun run bin/gstack-brain-context-load.ts --skill-file
office-hours/SKILL.md --repo test-repo --explain returns mode=manifest
queries=4 with the filesystem kinds populating real data from
~/.gstack/builder-profile.jsonl + ~/.gstack/analytics/eureka.jsonl on
this Mac. End-to-end retrieval flow confirmed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: setup-gbrain Step 7.5 ingest gate + Step 10 verdict + memory.md ref doc (Lane E partial)

Step 7.5: Transcript & memory ingest gate. After Step 7 wires brain-sync
but before Step 8's CLAUDE.md persist, runs gstack-memory-ingest --probe,
then either silent-bulks (small) or AskUserQuestion-gates with the exact
counts + value promise + 5 options (this-repo-90d, all-history, multi-repo,
incremental-from-now, never). Decision persists to
gstack-config set transcript_ingest_mode <choice>.

Step 10: GREEN/YELLOW/RED verdict block. Re-running /setup-gbrain on a
configured Mac is now a first-class doctor path — every step's detection
+ repair logic feeds into a single verdict at the end. Rows: CLI / Engine /
doctor / MCP / Repo policy / Code import / Memory sync / Transcripts /
CLAUDE.md / Smoke. Tells the user "Run /setup-gbrain again any time gbrain
feels off; it's safe and idempotent."

setup-gbrain/memory.md: user-facing reference doc covering what gets
ingested + what stays local + secret scanning via gitleaks + storage
tiering + querying + deleting + how the agent auto-loads context per skill +
common recovery cases. Linked from Step 8's CLAUDE.md persist.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: V1 E2E pipeline + --no-write flag for ingest helper (Lane F)

E2E pipeline test exercises the full Lane A → B → C value loop:
  1. Set up fake $HOME with all 8 memory source types as fixtures
  2. gstack-memory-ingest --probe verifies counts match disk
  3. gstack-memory-ingest --incremental writes state with schema_version: 1
  4. Idempotency: re-run reports 0 changes
  5. --probe distinguishes new vs unchanged after first incremental
  6. gstack-gbrain-sync --dry-run previews 3 stages
  7. --no-code --no-brain-sync --quiet writes sync state with 1 stage entry
  8. office-hours/SKILL.md V1 manifest dispatches 4 queries (mode=manifest)
  9. Datamark envelope wraps every loaded section (Section 1D + D12)
 10. Layer 1 fallback when no skill specified — default 3-section manifest
 11. plan-ceo-review/SKILL.md manifest also dispatches (regression for V1
     manifest authoring across all 6 V1 skills)

Side effect: bin/gstack-memory-ingest.ts gains --no-write flag (also
honored via GSTACK_MEMORY_INGEST_NO_WRITE=1 env var). Skips gbrain put_page
calls while still updating the state file. Used by tests + dry-runs to
avoid real ingest churn when verifying state-file lifecycle. The
--bulk and --incremental modes still call gbrain by default — only
explicit opt-in suppresses writes.

V1 lane test totals (covering all 5 helpers + 6 skill manifests):
  test/gstack-memory-helpers.test.ts     22 tests
  test/gstack-memory-ingest.test.ts      15 tests
  test/gstack-gbrain-sync.test.ts         8 tests
  test/gstack-brain-context-load.test.ts 10 tests
  test/skill-e2e-memory-pipeline.test.ts 10 tests
  ────────────────────────────────────── ─────────
  TOTAL                                  65 passing

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.26.0.0)

V1 of memory ingest + retrieval surface. Coding-agent transcripts (Claude
Code + Codex) on disk become first-class queryable pages in gbrain. Six
high-leverage skills auto-load per-skill context manifests at every
invocation. Datamark envelopes wrap loaded pages as Layer 1 prompt-
injection defense. Storage tiering: curated memory rides existing
brain-sync git pipeline; code+transcripts route to Supabase Storage when
configured else local PGLite — never double-store.

Net branch size vs main: +4174/-849 across 39 files. 65 V1 tests, all
green. Goldilocks scope per CEO D18; V1.5 P0 follow-ups documented in
the plan's V1.5 TODOs section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 08:40:30 -07:00

244 lines
9.8 KiB
Cheetah

---
name: investigate
preamble-tier: 2
version: 1.0.0
description: |
Systematic debugging with root cause investigation. Four phases: investigate,
analyze, hypothesize, implement. Iron Law: no fixes without root cause.
Use when asked to "debug this", "fix this bug", "why is this broken",
"investigate this error", or "root cause analysis".
Proactively invoke this skill (do NOT debug directly) when the user reports
errors, 500 errors, stack traces, unexpected behavior, "it was working
yesterday", or is troubleshooting why something stopped working. (gstack)
allowed-tools:
- Bash
- Read
- Write
- Edit
- Grep
- Glob
- AskUserQuestion
- WebSearch
triggers:
- debug this
- fix this bug
- why is this broken
- root cause analysis
- investigate this error
hooks:
PreToolUse:
- matcher: "Edit"
hooks:
- type: command
command: "bash ${CLAUDE_SKILL_DIR}/../freeze/bin/check-freeze.sh"
statusMessage: "Checking debug scope boundary..."
- matcher: "Write"
hooks:
- type: command
command: "bash ${CLAUDE_SKILL_DIR}/../freeze/bin/check-freeze.sh"
statusMessage: "Checking debug scope boundary..."
gbrain:
schema: 1
context_queries:
- id: prior-investigations
kind: list
filter:
type: timeline
tags_contains: "repo:{repo_slug}"
content_contains: "investigate"
sort: updated_at_desc
limit: 5
render_as: "## Prior investigations in this repo"
- id: project-learnings
kind: filesystem
glob: "~/.gstack/projects/{repo_slug}/learnings.jsonl"
tail: 10
render_as: "## Recent learnings (patterns + pitfalls)"
- id: recent-eureka
kind: filesystem
glob: "~/.gstack/analytics/eureka.jsonl"
tail: 5
render_as: "## Recent eureka moments (cross-project)"
---
{{PREAMBLE}}
# Systematic Debugging
## Iron Law
**NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST.**
Fixing symptoms creates whack-a-mole debugging. Every fix that doesn't address root cause makes the next bug harder to find. Find the root cause, then fix it.
---
{{GBRAIN_CONTEXT_LOAD}}
## Phase 1: Root Cause Investigation
Gather context before forming any hypothesis.
1. **Collect symptoms:** Read the error messages, stack traces, and reproduction steps. If the user hasn't provided enough context, ask ONE question at a time via AskUserQuestion.
2. **Read the code:** Trace the code path from the symptom back to potential causes. Use Grep to find all references, Read to understand the logic.
3. **Check recent changes:**
```bash
git log --oneline -20 -- <affected-files>
```
Was this working before? What changed? A regression means the root cause is in the diff.
4. **Reproduce:** Can you trigger the bug deterministically? If not, gather more evidence before proceeding.
5. **Check investigation history:** Search prior learnings for investigations on the same files. Recurring bugs in the same area are an architectural smell. If prior investigations exist, note patterns and check if the root cause was structural.
{{LEARNINGS_SEARCH}}
Output: **"Root cause hypothesis: ..."** — a specific, testable claim about what is wrong and why.
---
## Scope Lock
After forming your root cause hypothesis, lock edits to the affected module to prevent scope creep.
```bash
[ -x "${CLAUDE_SKILL_DIR}/../freeze/bin/check-freeze.sh" ] && echo "FREEZE_AVAILABLE" || echo "FREEZE_UNAVAILABLE"
```
**If FREEZE_AVAILABLE:** Identify the narrowest directory containing the affected files. Write it to the freeze state file:
```bash
eval "$(~/.claude/skills/gstack/bin/gstack-paths)"
STATE_DIR="$GSTACK_STATE_ROOT"
mkdir -p "$STATE_DIR"
echo "<detected-directory>/" > "$STATE_DIR/freeze-dir.txt"
echo "Debug scope locked to: <detected-directory>/"
```
Substitute `<detected-directory>` with the actual directory path (e.g., `src/auth/`). Tell the user: "Edits restricted to `<dir>/` for this debug session. This prevents changes to unrelated code. Run `/unfreeze` to remove the restriction."
If the bug spans the entire repo or the scope is genuinely unclear, skip the lock and note why.
**If FREEZE_UNAVAILABLE:** Skip scope lock. Edits are unrestricted.
---
## Phase 2: Pattern Analysis
Check if this bug matches a known pattern:
| Pattern | Signature | Where to look |
|---------|-----------|---------------|
| Race condition | Intermittent, timing-dependent | Concurrent access to shared state |
| Nil/null propagation | NoMethodError, TypeError | Missing guards on optional values |
| State corruption | Inconsistent data, partial updates | Transactions, callbacks, hooks |
| Integration failure | Timeout, unexpected response | External API calls, service boundaries |
| Configuration drift | Works locally, fails in staging/prod | Env vars, feature flags, DB state |
| Stale cache | Shows old data, fixes on cache clear | Redis, CDN, browser cache, Turbo |
Also check:
- `TODOS.md` for related known issues
- `git log` for prior fixes in the same area — **recurring bugs in the same files are an architectural smell**, not a coincidence
**External pattern search:** If the bug doesn't match a known pattern above, WebSearch for:
- "{framework} {generic error type}" — **sanitize first:** strip hostnames, IPs, file paths, SQL, customer data. Search the error category, not the raw message.
- "{library} {component} known issues"
If WebSearch is unavailable, skip this search and proceed with hypothesis testing. If a documented solution or known dependency bug surfaces, present it as a candidate hypothesis in Phase 3.
---
## Phase 3: Hypothesis Testing
Before writing ANY fix, verify your hypothesis.
1. **Confirm the hypothesis:** Add a temporary log statement, assertion, or debug output at the suspected root cause. Run the reproduction. Does the evidence match?
2. **If the hypothesis is wrong:** Before forming the next hypothesis, consider searching for the error. **Sanitize first** — strip hostnames, IPs, file paths, SQL fragments, customer identifiers, and any internal/proprietary data from the error message. Search only the generic error type and framework context: "{component} {sanitized error type} {framework version}". If the error message is too specific to sanitize safely, skip the search. If WebSearch is unavailable, skip and proceed. Then return to Phase 1. Gather more evidence. Do not guess.
3. **3-strike rule:** If 3 hypotheses fail, **STOP**. Use AskUserQuestion:
```
3 hypotheses tested, none match. This may be an architectural issue
rather than a simple bug.
A) Continue investigating — I have a new hypothesis: [describe]
B) Escalate for human review — this needs someone who knows the system
C) Add logging and wait — instrument the area and catch it next time
```
**Red flags** — if you see any of these, slow down:
- "Quick fix for now" — there is no "for now." Fix it right or escalate.
- Proposing a fix before tracing data flow — you're guessing.
- Each fix reveals a new problem elsewhere — wrong layer, not wrong code.
---
## Phase 4: Implementation
Once root cause is confirmed:
1. **Fix the root cause, not the symptom.** The smallest change that eliminates the actual problem.
2. **Minimal diff:** Fewest files touched, fewest lines changed. Resist the urge to refactor adjacent code.
3. **Write a regression test** that:
- **Fails** without the fix (proves the test is meaningful)
- **Passes** with the fix (proves the fix works)
4. **Run the full test suite.** Paste the output. No regressions allowed.
5. **If the fix touches >5 files:** Use AskUserQuestion to flag the blast radius:
```
This fix touches N files. That's a large blast radius for a bug fix.
A) Proceed — the root cause genuinely spans these files
B) Split — fix the critical path now, defer the rest
C) Rethink — maybe there's a more targeted approach
```
---
## Phase 5: Verification & Report
**Fresh verification:** Reproduce the original bug scenario and confirm it's fixed. This is not optional.
Run the test suite and paste the output.
Output a structured debug report:
```
DEBUG REPORT
════════════════════════════════════════
Symptom: [what the user observed]
Root cause: [what was actually wrong]
Fix: [what was changed, with file:line references]
Evidence: [test output, reproduction attempt showing fix works]
Regression test: [file:line of the new test]
Related: [TODOS.md items, prior bugs in same area, architectural notes]
Status: DONE | DONE_WITH_CONCERNS | BLOCKED
════════════════════════════════════════
```
Log the investigation as a learning for future sessions. Use `type: "investigation"` and include the affected files so future investigations on the same area can find this:
```bash
~/.claude/skills/gstack/bin/gstack-learnings-log '{"skill":"investigate","type":"investigation","key":"ROOT_CAUSE_KEY","insight":"ROOT_CAUSE_SUMMARY","confidence":9,"source":"observed","files":["affected/file1.ts","affected/file2.ts"]}'
```
{{LEARNINGS_LOG}}
{{GBRAIN_SAVE_RESULTS}}
---
## Important Rules
- **3+ failed fix attempts → STOP and question the architecture.** Wrong architecture, not failed hypothesis.
- **Never apply a fix you cannot verify.** If you can't reproduce and confirm, don't ship it.
- **Never say "this should fix it."** Verify and prove it. Run the tests.
- **If fix touches >5 files → AskUserQuestion** about blast radius before proceeding.
- **Completion status:**
- DONE — root cause found, fix applied, regression test written, all tests pass
- DONE_WITH_CONCERNS — fixed but cannot fully verify (e.g., intermittent bug, requires staging)
- BLOCKED — root cause unclear after investigation, escalated