Mirror of https://github.com/garrytan/gstack.git, synced 2026-05-22 04:38:24 +08:00
merge: integrate origin/main (v0.5.1-v0.6.4) into team-supabase-store

Resolves conflicts in package.json (keep unified cli-eval.ts + add eval:select) and test/skill-llm-eval.test.ts (keep judgeCost/judgeCosts helpers + add diff-based test selection).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
74
TODOS.md
@@ -263,6 +263,30 @@
**Effort:** S
**Priority:** P3

### CI/CD generation for non-GitHub providers

**What:** Extend CI/CD bootstrap to generate GitLab CI (`.gitlab-ci.yml`), CircleCI (`.circleci/config.yml`), and Bitrise pipelines.

**Why:** Not all projects use GitHub Actions. Generating pipelines for the other major providers would let test bootstrap work for those projects too.

**Context:** v1 ships with GitHub Actions only. Detection logic already checks for `.gitlab-ci.yml`, `.circleci/`, and `bitrise.yml`, and skips with an informational note. Each provider needs ~20 lines of template text in `generateTestBootstrap()`.

**Effort:** M
**Priority:** P3
**Depends on:** Test bootstrap (shipped)

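The detect-then-skip behavior described in the Context could be sketched roughly as follows. This is an illustration only, not the code inside `generateTestBootstrap()`: the names `CiProvider`, `MARKERS`, `detectCiProviders`, and `planCiBootstrap` are hypothetical, and the marker paths for the non-GitHub providers are the ones listed above.

```typescript
// Hypothetical names throughout; this is not the actual gstack implementation.
type CiProvider = "github-actions" | "gitlab-ci" | "circleci" | "bitrise";

// Marker paths relative to the repo root. A trailing slash means "any file under this directory".
const MARKERS: Record<CiProvider, string> = {
  "github-actions": ".github/workflows/",
  "gitlab-ci": ".gitlab-ci.yml",
  "circleci": ".circleci/",
  "bitrise": "bitrise.yml",
};

// Return every provider whose marker appears in the list of repo-relative paths.
function detectCiProviders(repoPaths: string[]): CiProvider[] {
  const providers = Object.keys(MARKERS) as CiProvider[];
  return providers.filter((provider) => {
    const marker = MARKERS[provider];
    return marker.endsWith("/")
      ? repoPaths.some((path) => path.startsWith(marker))
      : repoPaths.includes(marker);
  });
}

type BootstrapPlan = { provider: CiProvider; action: "generate" | "skip-with-note" };

// v1 behavior as described in the TODO: only GitHub Actions is generated,
// every other detected provider is skipped with an informational note.
function planCiBootstrap(
  repoPaths: string[],
  supported: CiProvider[] = ["github-actions"],
): BootstrapPlan[] {
  return detectCiProviders(repoPaths).map((provider) => ({
    provider,
    action: supported.includes(provider) ? "generate" : "skip-with-note",
  }));
}
```

Under this sketch, shipping the TODO would amount to adding a template per provider and widening the `supported` list.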
### Auto-upgrade weak tests (★) to strong tests (★★★)

**What:** When the Step 3.4 coverage audit identifies existing ★-rated tests (smoke/trivial assertions), generate improved versions that test edge cases and error paths.

**Why:** Many codebases have tests that technically exist but don't catch real bugs: `expect(component).toBeDefined()` isn't testing behavior. Upgrading these closes the gap between "has tests" and "has good tests."

**Context:** Requires the quality scoring rubric from the test coverage audit. Modifying existing test files is riskier than creating new ones, and needs careful diffing to ensure the upgraded test still passes. Consider creating a companion test file rather than modifying the original.

**Effort:** M
**Priority:** P3
**Depends on:** Test quality scoring (shipped)

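To make the ★ versus ★★★ distinction concrete, here is a small sketch. The function under test (`parsePort`) and the `check` helper are hypothetical and exist only so the example runs without a test framework; the rating comments follow the description above (smoke assertions versus edge cases and error paths), not the actual rubric.

```typescript
// Hypothetical function under test; not part of gstack.
function parsePort(input: string): number {
  const port = Number(input);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`invalid port: ${input}`);
  }
  return port;
}

// Minimal assertion helper so the sketch runs without a test framework.
function check(condition: boolean, message: string): void {
  if (!condition) throw new Error(message);
}

// ★ smoke test: passes for almost any implementation, including a broken one.
function weakTest(): void {
  check(parsePort("8080") !== undefined, "parsePort returns something");
}

// ★★★ version: pins the happy path, the boundaries, and the error paths.
function strongTest(): void {
  check(parsePort("8080") === 8080, "parses a valid port");
  check(parsePort("1") === 1 && parsePort("65535") === 65535, "accepts both boundaries");
  for (const bad of ["0", "65536", "80.5", "abc", ""]) {
    let threw = false;
    try {
      parsePort(bad);
    } catch (err) {
      threw = err instanceof RangeError;
    }
    check(threw, `rejects ${JSON.stringify(bad)}`);
  }
}
```

Keeping `strongTest` alongside `weakTest` instead of overwriting it mirrors the companion-file suggestion in the Context: the original stays untouched while the stronger version is verified.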
## Retro
### Deployment health tracking (retro + browse)
@@ -488,6 +512,16 @@ Implemented as `supabase/functions/weekly-digest/index.ts`. pg_cron Monday 9am U
**Priority:** P2
**Depends on:** None

### Cross-platform URL open helper

**What:** `gstack-open-url` helper script that detects the platform and uses `open` (macOS) or `xdg-open` (Linux).

**Why:** The first-time Completeness Principle intro uses macOS `open` to launch the essay. If gstack ever supports Linux, that call fails silently.

**Effort:** S (human: ~30 min / CC: ~2 min)
**Priority:** P4
**Depends on:** None

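The platform dispatch could look roughly like the sketch below. The helper does not exist yet, so `OpenCommand` and `openCommandFor` are hypothetical names; the platform strings are assumed to be the values Node's `process.platform` reports (`"darwin"` for macOS, `"linux"` for Linux).

```typescript
// Hypothetical helper; the real `gstack-open-url` script has not been written yet.
type OpenCommand = { command: string; args: string[] };

// Map a platform identifier to the command that opens a URL in the default browser.
// Returns null for platforms this sketch does not cover, so the caller can
// report the problem instead of failing silently.
function openCommandFor(platform: string, url: string): OpenCommand | null {
  switch (platform) {
    case "darwin":
      return { command: "open", args: [url] };
    case "linux":
      return { command: "xdg-open", args: [url] };
    default:
      return null;
  }
}

// Human-readable fallback for unsupported platforms.
function describeOpen(platform: string, url: string): string {
  const cmd = openCommandFor(platform, url);
  return cmd
    ? `${cmd.command} ${cmd.args.join(" ")}`
    : `Cannot open a browser on "${platform}". Visit ${url} manually.`;
}
```

A caller would pass `process.platform` and hand the result to a process spawner; printing the URL when the result is `null` addresses the silent-failure concern in the Why.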
### CDP-based DOM mutation detection for ref staleness

**What:** Use Chrome DevTools Protocol `DOM.documentUpdated` / MutationObserver events to proactively invalidate stale refs when the DOM changes, without requiring an explicit `snapshot` call.

@@ -534,6 +568,46 @@ Shipped as `/design-consultation` on garrytan/design branch. Renamed from `/setu
**Priority:** P2
**Depends on:** None

## Ship Confidence Dashboard
### Smart review relevance detection — PARTIALLY SHIPPED

~~**What:** Auto-detect which of the 4 reviews are relevant based on branch changes (skip Design Review if no CSS/view changes, skip Code Review if plan-only).~~

`bin/gstack-diff-scope` shipped. It categorizes the diff into SCOPE_FRONTEND, SCOPE_BACKEND, SCOPE_PROMPTS, SCOPE_TESTS, SCOPE_DOCS, and SCOPE_CONFIG, and design-review-lite uses it to skip when no frontend files changed. Dashboard integration for conditional row display is a follow-up.

**Remaining:** Dashboard conditional row display (hide "Design Review: NOT YET RUN" when SCOPE_FRONTEND=false). Extend to Eng Review (skip for docs-only) and CEO Review (skip for config-only).

**Effort:** S
**Priority:** P3
**Depends on:** gstack-diff-scope (shipped)

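One way the remaining dashboard logic might consume the scope flags is sketched below. Only the six `SCOPE_*` names and the three skip rules come from the entry above; the file-matching patterns, `categorizeDiff`, `onlyScope`, and `visibleReviews` are assumptions made for illustration, and `bin/gstack-diff-scope` defines its own rules.

```typescript
type Scope =
  | "SCOPE_FRONTEND" | "SCOPE_BACKEND" | "SCOPE_PROMPTS"
  | "SCOPE_TESTS" | "SCOPE_DOCS" | "SCOPE_CONFIG";
type ScopeFlags = Record<Scope, boolean>;

// Assumed patterns, first match wins. These are NOT the rules used by bin/gstack-diff-scope.
const RULES: Array<[Scope, RegExp]> = [
  ["SCOPE_TESTS", /(^|\/)test\/|\.test\.[jt]sx?$/],
  ["SCOPE_PROMPTS", /(^|\/)prompts\//],
  ["SCOPE_FRONTEND", /\.(css|scss|html|jsx|tsx|vue)$/],
  ["SCOPE_DOCS", /\.md$/],
  ["SCOPE_CONFIG", /\.(json|ya?ml|toml)$/],
  ["SCOPE_BACKEND", /\.(ts|js|py|rb|go)$/],
];

// Set a flag for every scope touched by at least one changed file.
function categorizeDiff(changedFiles: string[]): ScopeFlags {
  const flags: ScopeFlags = {
    SCOPE_FRONTEND: false, SCOPE_BACKEND: false, SCOPE_PROMPTS: false,
    SCOPE_TESTS: false, SCOPE_DOCS: false, SCOPE_CONFIG: false,
  };
  for (const file of changedFiles) {
    const rule = RULES.find(([, pattern]) => pattern.test(file));
    if (rule) flags[rule[0]] = true;
  }
  return flags;
}

// True when `scope` is set and every other scope is unset.
function onlyScope(flags: ScopeFlags, scope: Scope): boolean {
  return (Object.keys(flags) as Scope[]).every((s) => flags[s] === (s === scope));
}

// Dashboard rows to display, following the three skip rules listed under "Remaining".
function visibleReviews(flags: ScopeFlags): string[] {
  const rows: string[] = [];
  if (flags.SCOPE_FRONTEND) rows.push("Design Review");
  if (!onlyScope(flags, "SCOPE_DOCS")) rows.push("Eng Review");
  if (!onlyScope(flags, "SCOPE_CONFIG")) rows.push("CEO Review");
  return rows;
}
```

For a docs-only branch this sketch leaves only the CEO Review row, and for a branch touching both frontend and backend files it shows all three.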
### /merge skill — review-gated PR merge

**What:** Create a `/merge` skill that merges an approved PR, but first checks the Review Readiness Dashboard and runs `/review` (Fix-First) if code review hasn't been done. Separates "ship" (create PR) from "merge" (land it).

**Why:** Currently `/review` runs inside `/ship` Step 3.5 but isn't tracked as a gate. A `/merge` skill ensures code review always happens before landing, and enables workflows where someone else reviews the PR first.

**Context:** `/ship` creates the PR. `/merge` would: check dashboard → run `/review` if needed → `gh pr merge`. Code review tracking belongs at merge time, not at plan time.

**Effort:** M
**Priority:** P2
**Depends on:** Ship Confidence Dashboard (shipped)

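The gate described in the Context (check dashboard, run `/review` if needed, then `gh pr merge`) can be expressed as a small planning function. This is a sketch under assumptions: `MergeInput`, `Step`, and `planMerge` are hypothetical names, and how the dashboard state is actually read is left out.

```typescript
// Hypothetical input: what the skill would learn from the PR and the dashboard.
interface MergeInput {
  prNumber: number;
  approved: boolean;        // the PR has an approving review
  codeReviewDone: boolean;  // the dashboard shows /review has run for this branch
}

type Step =
  | { kind: "abort"; reason: string }
  | { kind: "run-skill"; skill: "/review" }
  | { kind: "shell"; argv: string[] };

// Produce the ordered steps for a review-gated merge. Nothing is executed here.
function planMerge(input: MergeInput): Step[] {
  if (!input.approved) {
    return [{ kind: "abort", reason: `PR #${input.prNumber} is not approved` }];
  }
  const steps: Step[] = [];
  if (!input.codeReviewDone) {
    // The gate: code review must run before landing.
    steps.push({ kind: "run-skill", skill: "/review" });
  }
  steps.push({ kind: "shell", argv: ["gh", "pr", "merge", String(input.prNumber)] });
  return steps;
}
```

Returning a plan instead of executing it keeps the gate easy to test, and the merge strategy is deliberately left unspecified since the entry does not name one.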
## Completeness
### Completeness metrics dashboard

**What:** Track how often Claude chooses the complete option versus the shortcut across gstack sessions. Aggregate the results into a dashboard showing the completeness trend over time.

**Why:** Without measurement, we can't know whether the Completeness Principle is working. Tracking could also surface patterns (e.g., certain skills still bias toward shortcuts).

**Context:** Would require logging choices (e.g., appending to a JSONL file when AskUserQuestion resolves), parsing them, and displaying trends. Similar pattern to eval persistence.

**Effort:** M (human) / S (CC)
**Priority:** P3
**Depends on:** Boil the Lake (shipped, v0.6.1)

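The log-parse-aggregate pipeline from the Context might look like this minimal sketch. The record shape (`ts`, `skill`, `choice`) and the function names are assumptions, since the entry does not define a log format; file I/O is omitted so the functions stay pure.

```typescript
// Hypothetical record shape: one JSON object per line, appended when a choice resolves.
interface ChoiceRecord {
  ts: string;                      // ISO 8601 timestamp, e.g. "2026-03-01T10:00:00Z"
  skill: string;                   // skill that asked the question
  choice: "complete" | "shortcut";
}

// Parse JSONL text, ignoring blank lines.
function parseChoices(jsonl: string): ChoiceRecord[] {
  return jsonl
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as ChoiceRecord);
}

// Fraction of "complete" choices per calendar day (keyed by YYYY-MM-DD).
function completenessByDay(records: ChoiceRecord[]): Map<string, number> {
  const totals = new Map<string, { complete: number; total: number }>();
  for (const record of records) {
    const day = record.ts.slice(0, 10);
    const entry = totals.get(day) ?? { complete: 0, total: 0 };
    entry.total += 1;
    if (record.choice === "complete") entry.complete += 1;
    totals.set(day, entry);
  }
  const rates = new Map<string, number>();
  totals.forEach((entry, day) => rates.set(day, entry.complete / entry.total));
  return rates;
}
```

Grouping by `skill` instead of by day would use the same shape and could surface the per-skill shortcut bias mentioned in the Why.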
## Completed
### Phase 1: Foundations (v0.2.0)