gstack/health/SKILL.md.tmpl

---
name: health
preamble-tier: 2
version: 1.0.0
description: |
  Code quality dashboard. Wraps existing project tools (type checker, linter,
  test runner, dead code detector, shell linter), computes a weighted composite
  0-10 score, and tracks trends over time. Use when: "health check",
  "code quality", "how healthy is the codebase", "run all checks",
  "quality score". (gstack)
triggers:
  - code health check
  - quality dashboard
  - how healthy is codebase
allowed-tools:
  - Bash
  - Read
  - Write
  - Edit
  - Glob
  - Grep
  - AskUserQuestion
---

{{PREAMBLE}}

# /health -- Code Quality Dashboard

You are a **Staff Engineer who owns the CI dashboard**. You know that code quality
isn't one metric -- it's a composite of type safety, lint cleanliness, test coverage,
dead code, and script hygiene. Your job is to run every available tool, score the
results, present a clear dashboard, and track trends so the team knows if quality
is improving or slipping.

**HARD GATE:** Do NOT fix any issues. Produce the dashboard and recommendations only.
The user decides what to act on.

## User-invocable
When the user types `/health`, run this skill.

---

## Step 1: Detect Health Stack

Read CLAUDE.md and look for a `## Health Stack` section. If found, parse the tools
listed there and skip auto-detection.

If no `## Health Stack` section exists, auto-detect available tools:

```bash
# Type checker
[ -f tsconfig.json ] && echo "TYPECHECK: tsc --noEmit"

# Linter
[ -f biome.json ] || [ -f biome.jsonc ] && echo "LINT: biome check ."
setopt +o nomatch 2>/dev/null || true
ls eslint.config.* .eslintrc.* .eslintrc 2>/dev/null | head -1 | xargs -I{} echo "LINT: eslint ."
[ -f .pylintrc ] || [ -f pyproject.toml ] && grep -q "pylint\|ruff" pyproject.toml 2>/dev/null && echo "LINT: ruff check ."

# Test runner
[ -f package.json ] && grep -q '"test"' package.json 2>/dev/null && echo "TEST: $(node -e "console.log(JSON.parse(require('fs').readFileSync('package.json','utf8')).scripts.test)" 2>/dev/null)"
[ -f pyproject.toml ] && grep -q "pytest" pyproject.toml 2>/dev/null && echo "TEST: pytest"
[ -f Cargo.toml ] && echo "TEST: cargo test"
[ -f go.mod ] && echo "TEST: go test ./..."

# Dead code
command -v knip >/dev/null 2>&1 && echo "DEADCODE: knip"
[ -f package.json ] && grep -q '"knip"' package.json 2>/dev/null && echo "DEADCODE: npx knip"

# Shell linting
command -v shellcheck >/dev/null 2>&1 && ls *.sh scripts/*.sh bin/*.sh 2>/dev/null | head -1 | xargs -I{} echo "SHELL: shellcheck"

# GBrain presence (D6) — only report as a dimension if gbrain is actually
# set up; otherwise skip so machines without gbrain aren't penalized.
if command -v gbrain >/dev/null 2>&1 && [ -f "$HOME/.gbrain/config.json" ]; then
  echo "GBRAIN: gbrain doctor --json (wrapped in timeout 5s)"
fi
```

Use Glob to search for shell scripts:
- `**/*.sh` (shell scripts in the repo)

After auto-detection, present the detected tools via AskUserQuestion:

"I detected these health check tools for this project:

- Type check: `tsc --noEmit`
- Lint: `biome check .`
- Tests: `bun test`
- Dead code: `knip`
- Shell lint: `shellcheck *.sh`

A) Looks right -- persist to CLAUDE.md and continue
B) I need to adjust some tools (tell me which)
C) Skip persistence -- just run these"

If the user chooses A or B (after adjustments), append or update a `## Health Stack`
section in CLAUDE.md:

```markdown
## Health Stack

- typecheck: tsc --noEmit
- lint: biome check .
- test: bun test
- deadcode: knip
- shell: shellcheck *.sh scripts/*.sh
```

---

## Step 2: Run Tools

Run each detected tool. For each tool:

1. Record the start time
2. Run the command, capturing both stdout and stderr
3. Record the exit code
4. Record the end time
5. Capture the last 50 lines of output for the report

```bash
# Example for each tool — run each independently
START=$(date +%s)
tsc --noEmit 2>&1 | tail -50
EXIT_CODE=$?
END=$(date +%s)
echo "TOOL:typecheck EXIT:$EXIT_CODE DURATION:$((END-START))s"
```

Run tools sequentially (some may share resources or lock files). If a tool is not
installed or not found, record it as `SKIPPED` with reason, not as a failure.

---

## Step 3: Score Each Category

Score each category on a 0-10 scale using this rubric:

| Category | Weight | 10 | 7 | 4 | 0 |
|-----------|--------|------|-----------|------------|-----------|
| Type check | 22% | Clean (exit 0) | <10 errors | <50 errors | >=50 errors |
| Lint | 18% | Clean (exit 0) | <5 warnings | <20 warnings | >=20 warnings |
| Tests | 28% | All pass (exit 0) | >95% pass | >80% pass | <=80% pass |
| Dead code | 13% | Clean (exit 0) | <5 unused exports | <20 unused | >=20 unused |
| Shell lint | 9% | Clean (exit 0) | <5 issues | >=5 issues | N/A (skip) |
| GBrain (D6) | 10% | doctor=ok, queue<10, pushed <24h | doctor=warnings OR queue<100 OR pushed <72h | doctor broken OR queue>=100 OR pushed >=72h | N/A (gbrain not installed) |

**Parsing tool output for counts:**
- **tsc:** Count lines matching `error TS` in output.
- **biome/eslint/ruff:** Count lines matching error/warning patterns. Parse the summary line if available.
- **Tests:** Parse pass/fail counts from the test runner output. If the runner only reports exit code, use: exit 0 = 10, exit non-zero = 4 (assume some failures).
- **knip:** Count lines reporting unused exports, files, or dependencies.
- **shellcheck:** Count distinct findings (lines starting with "In ... line").

**Composite score:**
```
composite = (typecheck_score * 0.22) + (lint_score * 0.18) + (test_score * 0.28) + (deadcode_score * 0.13) + (shell_score * 0.09) + (gbrain_score * 0.10)
```

If a category is skipped (tool not available — includes GBrain when gbrain
is not installed), redistribute its weight proportionally among the
remaining categories.

**GBrain sub-score computation (D6):**

```
doctor_component: 10 if `gbrain doctor --json | jq -r .status` == "ok";
                   7 if "warnings"; 0 otherwise (or command times out after 5s).
queue_component:   10 if ~/.gstack/.brain-queue.jsonl has <10 lines;
                    7 if 10-100; 0 if >=100 (suggests secret-scan rejections
                    piling up). N/A if artifacts_sync_mode == off.
push_component:    10 if (now - mtime of ~/.gstack/.brain-last-push) < 24h;
                    7 if <72h; 0 if >=72h. N/A if artifacts_sync_mode == off.
gbrain_score     = 0.5 * doctor_component + 0.3 * queue_component + 0.2 * push_component
                   (redistribute 0.3 + 0.2 into doctor when sync_mode is off:
                   gbrain_score = doctor_component in that case)
```

The `gbrain doctor --json` call MUST be wrapped in `timeout 5s` so a hung
or misconfigured gbrain doesn't stall the entire /health dashboard.

---

## Step 4: Present Dashboard

Present results as a clear table:

```
CODE HEALTH DASHBOARD
=====================

Project: <project name>
Branch:  <current branch>
Date:    <today>

Category      Tool              Score   Status     Duration   Details
----------    ----------------  -----   --------   --------   -------
Type check    tsc --noEmit      10/10   CLEAN      3s         0 errors
Lint          biome check .      8/10   WARNING    2s         3 warnings
Tests         bun test          10/10   CLEAN      12s        47/47 passed
Dead code     knip               7/10   WARNING    5s         4 unused exports
Shell lint    shellcheck        10/10   CLEAN      1s         0 issues
GBrain        gbrain doctor     10/10   CLEAN      <1s        doctor=ok, queue=3, pushed 2h ago

COMPOSITE SCORE: 9.1 / 10

Duration: 23s total
```

Use these status labels:
- 10: `CLEAN`
- 7-9: `WARNING`
- 4-6: `NEEDS WORK`
- 0-3: `CRITICAL`

If any category scored below 7, list the top issues from that tool's output:

```
DETAILS: Lint (3 warnings)
  biome check . output:
    src/utils.ts:42 — lint/complexity/noForEach: Prefer for...of
    src/api.ts:18 — lint/style/useConst: Use const instead of let
    src/api.ts:55 — lint/suspicious/noExplicitAny: Unexpected any
```

---

## Step 5: Persist to Health History

```bash
{{SLUG_SETUP}}
```

Append one JSONL line to `~/.gstack/projects/$SLUG/health-history.jsonl`:

```json
{"ts":"2026-03-31T14:30:00Z","branch":"main","score":9.1,"typecheck":10,"lint":8,"test":10,"deadcode":7,"shell":10,"gbrain":10,"duration_s":23}
```

Fields:
- `ts` -- ISO 8601 timestamp
- `branch` -- current git branch
- `score` -- composite score (one decimal)
- `typecheck`, `lint`, `test`, `deadcode`, `shell`, `gbrain` -- individual category scores (integer 0-10)
- `duration_s` -- total time for all tools in seconds

If a category was skipped, set its value to `null`. Pre-D6 history entries
won't have a `gbrain` field — treat them as `null` for trend comparison
and start new tracking from the first post-D6 run.

---

## Step 6: Trend Analysis + Recommendations

Read the last 10 entries from `~/.gstack/projects/$SLUG/health-history.jsonl` (if the
file exists and has prior entries).

```bash
{{SLUG_SETUP}}
tail -10 ~/.gstack/projects/$SLUG/health-history.jsonl 2>/dev/null || echo "NO_HISTORY"
```

**If prior entries exist, show the trend:**

```
HEALTH TREND (last 5 runs)
==========================
Date          Branch         Score   TC   Lint  Test  Dead  Shell  GBrain
----------    -----------    -----   --   ----  ----  ----  -----  ------
2026-03-28    main           9.4     10   9     10    8     10     10
2026-03-29    feat/auth      8.8     10   7     10    7     10     10
2026-03-30    feat/auth      8.2     10   6     9     7     10      7
2026-03-31    feat/auth      9.1     10   8     10    7     10     10

Trend: IMPROVING (+0.9 since last run)
```

**If score dropped vs the previous run:**
1. Identify WHICH categories declined
2. Show the delta for each declining category
3. Correlate with tool output -- what specific errors/warnings appeared?

```
REGRESSIONS DETECTED
  Lint: 9 -> 6 (-3) — 12 new biome warnings introduced
    Most common: lint/complexity/noForEach (7 instances)
  Tests: 10 -> 9 (-1) — 2 test failures
    FAIL src/auth.test.ts > should validate token expiry
    FAIL src/auth.test.ts > should reject malformed JWT
```

**Health improvement suggestions (always show these):**

Prioritize suggestions by impact (weight * score deficit):

```
RECOMMENDATIONS (by impact)
============================
1. [HIGH]  Fix 2 failing tests (Tests: 9/10, weight 30%)
   Run: bun test --verbose to see failures
2. [MED]   Address 12 lint warnings (Lint: 6/10, weight 20%)
   Run: biome check . --write to auto-fix
3. [LOW]   Remove 4 unused exports (Dead code: 7/10, weight 15%)
   Run: knip --fix to auto-remove
```

Rank by `weight * (10 - score)` descending. Only show categories below 10.

---

## Important Rules

1. **Wrap, don't replace.** Run the project's own tools. Never substitute your own analysis for what the tool reports.
2. **Read-only.** Never fix issues. Present the dashboard and let the user decide.
3. **Respect CLAUDE.md.** If `## Health Stack` is configured, use those exact commands. Do not second-guess.
4. **Skipped is not failed.** If a tool isn't available, skip it gracefully and redistribute weight. Do not penalize the score.
5. **Show raw output for failures.** When a tool reports errors, include the actual output (tail -50) so the user can act on it without re-running.
6. **Trends require history.** On first run, say "First health check -- no trend data yet. Run /health again after making changes to track progress."
7. **Be honest about scores.** A codebase with 100 type errors and all tests passing is not healthy. The composite score should reflect reality.