mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-21 20:28:24 +08:00
feat: /codex skill — multi-AI second opinion + proactive suggestions (#197)
* feat: /codex skill — multi-AI second opinion (review, challenge, consult) Three modes: code review with pass/fail gate, adversarial challenge mode, and conversational consult with session continuity. First multi-AI skill in gstack, wrapping OpenAI's Codex CLI. * feat: integrate /codex into /review, /ship, /plan-eng-review + dashboard /review offers Codex second opinion after completing its own review. /ship offers Codex review as optional gate before pushing. /plan-eng-review offers Codex plan critique after scope challenge. Review Readiness Dashboard shows Codex Review as optional row. * chore: bump version and changelog (v0.8.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * test: codex skill validation (12 stub tests) + E2E eval test Stub tests (free tier): verify template content — three modes, gate verdict, session continuity, cost tracking, cross-model comparison, binary discovery, error handling, mktemp usage, and integrations into /review, /ship, /plan-eng-review. E2E test (paid tier): runs /codex review on vulnerable fixture repo via session-runner, verifies output contains findings and GATE verdict. * fix: codex auth error message — use codex login, not OPENAI_API_KEY Codex authenticates via ChatGPT OAuth (codex login), not an env var. * feat: codex uses high reasoning effort by default gpt-5.2-codex is the only model available with ChatGPT login. All commands now use model_reasoning_effort="high" for maximum depth — the whole point is a thorough second opinion. * feat: crank codex reasoning to xhigh (maximum) * feat: per-mode reasoning (high for review/consult, xhigh for challenge) + web search Review and consult use high reasoning — thorough but not slow. Challenge (adversarial) uses xhigh — maximum depth for breaking code. All modes enable web_search_cached so Codex can look up docs/APIs. * refactor: don't hardcode model — use codex default (always latest) * feat: JSONL output for codex challenge + consult modes Use --json flag to parse codex's JSONL events, extracting reasoning traces ([codex thinking]), tool calls ([codex ran]), and token counts. This gives richer output than the -o flag alone — you can see what codex thought through before its answer. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: only persist codex-review log when code review actually ran Don't write a codex-review entry to reviews.jsonl when only the adversarial challenge (option B) was selected — there's no gate verdict to record, and a false entry misleads the Review Readiness Dashboard into thinking a code review happened. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add codex plan review option to /plan-eng-review After scope challenge (Step 0), offer to have Codex independently review the plan with a brutally honest tech reviewer persona. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * test: update e2e test for codex skill Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: codex integration bugs — plan content, review persistence, quoting, stderr - plan-eng-review: Codex now reads the plan file itself instead of inlining content as a CLI arg (avoids ARG_MAX for large plans) - review: add missing echo to persist codex-review results to reviews.jsonl - codex: consult mode uses $TMPERR (mktemp) instead of hardcoded stderr path - codex + review: quote $SLUG/$BRANCH_SLUG in review log paths - codex: scope plan lookup to current project, warn on cross-project fallback Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: add .context/ to .gitignore to prevent session ID leaks Codex consult mode stores session IDs in .context/codex-session-id. Without this ignore rule, session IDs could leak into commits. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: proactive skill suggestions + opt-out + trigger phrase tests - Preamble reads proactive config via gstack-config - Root SKILL.md.tmpl has lifecycle map (stage → skill suggestion) - Users can opt out ("stop suggesting") / opt in ("be proactive again") - Restored trigger phrase validation tests (16 skills × "Use when" check) - Added missing "Use when" trigger phrases to /debug and /office-hours Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: update changelog for v0.8.0 — add proactive suggestions note Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -28,14 +28,14 @@ touch ~/.gstack/sessions/"$PPID"
|
||||
_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ')
|
||||
find ~/.gstack/sessions -mmin +120 -type f -delete 2>/dev/null || true
|
||||
_CONTRIB=$(~/.claude/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || true)
|
||||
_PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true")
|
||||
_BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown")
|
||||
echo "BRANCH: $_BRANCH"
|
||||
echo "PROACTIVE: $_PROACTIVE"
|
||||
_LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no")
|
||||
echo "LAKE_INTRO: $_LAKE_SEEN"
|
||||
mkdir -p ~/.gstack/analytics
|
||||
echo '{"skill":"review","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
|
||||
_PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true")
|
||||
echo "PROACTIVE: $_PROACTIVE"
|
||||
```
|
||||
|
||||
If `PROACTIVE` is `"false"`, do not proactively suggest gstack skills — only invoke
|
||||
@@ -417,6 +417,58 @@ If no documentation files exist, skip this step silently.
|
||||
|
||||
---
|
||||
|
||||
## Step 5.7: Codex second opinion (optional)
|
||||
|
||||
After completing the review, check if the Codex CLI is available:
|
||||
|
||||
```bash
|
||||
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
|
||||
```
|
||||
|
||||
If Codex is available, use AskUserQuestion:
|
||||
|
||||
```
|
||||
Review complete. Want an independent second opinion from Codex (OpenAI)?
|
||||
|
||||
A) Run Codex code review — independent diff review with pass/fail gate
|
||||
B) Run Codex adversarial challenge — try to find ways this code will fail in production
|
||||
C) Both — review first, then adversarial challenge
|
||||
D) Skip — no Codex review needed
|
||||
```
|
||||
|
||||
If the user chooses A, B, or C:
|
||||
|
||||
**For code review (A or C):** Run `codex review --base <base>` with a 5-minute timeout.
|
||||
Present the full output verbatim under a `CODEX SAYS (code review):` header.
|
||||
Check the output for `[P1]` markers — if found, note `GATE: FAIL`, otherwise `GATE: PASS`.
|
||||
After presenting, compare Codex's findings with your own review findings from Steps 4-5
|
||||
and output a CROSS-MODEL ANALYSIS showing what both found, what only Codex found,
|
||||
and what only Claude found.
|
||||
|
||||
**For adversarial challenge (B or C):** Run:
|
||||
```bash
|
||||
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, failure modes. Be adversarial." -s read-only
|
||||
```
|
||||
Present the full output verbatim under a `CODEX SAYS (adversarial challenge):` header.
|
||||
|
||||
**Only if a code review ran (user chose A or C):** Persist the Codex review result to the review log:
|
||||
```bash
|
||||
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
|
||||
BRANCH_SLUG=$(git rev-parse --abbrev-ref HEAD 2>/dev/null | tr '/' '-')
|
||||
mkdir -p ~/.gstack/projects/"$SLUG"
|
||||
echo '{"skill":"codex-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","gate":"GATE"}' >> ~/.gstack/projects/"$SLUG"/"$BRANCH_SLUG"-reviews.jsonl
|
||||
```
|
||||
|
||||
Substitute: STATUS ("clean" if PASS, "issues_found" if FAIL), GATE ("pass" or "fail").
|
||||
|
||||
**Do NOT persist a codex-review entry when only the adversarial challenge (B) ran** —
|
||||
there is no gate verdict to record, and a false entry would make the Review Readiness
|
||||
Dashboard believe a code review happened when it didn't.
|
||||
|
||||
If Codex is not available, skip this step silently.
|
||||
|
||||
---
|
||||
|
||||
## Important Rules
|
||||
|
||||
- **Read the FULL diff before commenting.** Do not flag issues already addressed in the diff.
|
||||
|
||||
@@ -231,6 +231,58 @@ If no documentation files exist, skip this step silently.
|
||||
|
||||
---
|
||||
|
||||
## Step 5.7: Codex second opinion (optional)
|
||||
|
||||
After completing the review, check if the Codex CLI is available:
|
||||
|
||||
```bash
|
||||
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
|
||||
```
|
||||
|
||||
If Codex is available, use AskUserQuestion:
|
||||
|
||||
```
|
||||
Review complete. Want an independent second opinion from Codex (OpenAI)?
|
||||
|
||||
A) Run Codex code review — independent diff review with pass/fail gate
|
||||
B) Run Codex adversarial challenge — try to find ways this code will fail in production
|
||||
C) Both — review first, then adversarial challenge
|
||||
D) Skip — no Codex review needed
|
||||
```
|
||||
|
||||
If the user chooses A, B, or C:
|
||||
|
||||
**For code review (A or C):** Run `codex review --base <base>` with a 5-minute timeout.
|
||||
Present the full output verbatim under a `CODEX SAYS (code review):` header.
|
||||
Check the output for `[P1]` markers — if found, note `GATE: FAIL`, otherwise `GATE: PASS`.
|
||||
After presenting, compare Codex's findings with your own review findings from Steps 4-5
|
||||
and output a CROSS-MODEL ANALYSIS showing what both found, what only Codex found,
|
||||
and what only Claude found.
|
||||
|
||||
**For adversarial challenge (B or C):** Run:
|
||||
```bash
|
||||
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, failure modes. Be adversarial." -s read-only
|
||||
```
|
||||
Present the full output verbatim under a `CODEX SAYS (adversarial challenge):` header.
|
||||
|
||||
**Only if a code review ran (user chose A or C):** Persist the Codex review result to the review log:
|
||||
```bash
|
||||
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
|
||||
BRANCH_SLUG=$(git rev-parse --abbrev-ref HEAD 2>/dev/null | tr '/' '-')
|
||||
mkdir -p ~/.gstack/projects/"$SLUG"
|
||||
echo '{"skill":"codex-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","gate":"GATE"}' >> ~/.gstack/projects/"$SLUG"/"$BRANCH_SLUG"-reviews.jsonl
|
||||
```
|
||||
|
||||
Substitute: STATUS ("clean" if PASS, "issues_found" if FAIL), GATE ("pass" or "fail").
|
||||
|
||||
**Do NOT persist a codex-review entry when only the adversarial challenge (B) ran** —
|
||||
there is no gate verdict to record, and a false entry would make the Review Readiness
|
||||
Dashboard believe a code review happened when it didn't.
|
||||
|
||||
If Codex is not available, skip this step silently.
|
||||
|
||||
---
|
||||
|
||||
## Important Rules
|
||||
|
||||
- **Read the FULL diff before commenting.** Do not flag issues already addressed in the diff.
|
||||
|
||||
Reference in New Issue
Block a user