diff --git a/agents/code-reviewer.md b/agents/code-reviewer.md index 91cd7dc3..c0db2a22 100644 --- a/agents/code-reviewer.md +++ b/agents/code-reviewer.md @@ -27,6 +27,80 @@ When invoked: - **Consolidate** similar issues (e.g., "5 functions missing error handling" not 5 separate findings) - **Prioritize** issues that could cause bugs, security vulnerabilities, or data loss +### Pre-Report Gate + +Before writing a finding, answer all four questions. If any answer is "no" or +"unsure", downgrade severity or drop the finding. + +1. **Can I cite the exact line?** Name the file and line. Vague findings like + "somewhere in the auth layer" are not actionable and must be dropped. +2. **Can I describe the concrete failure mode?** Name the input, state, and bad + outcome. If you cannot name the trigger, you are pattern-matching, not + reviewing. +3. **Have I read the surrounding context?** Check callers, imports, and tests. + Many apparent issues are already handled one frame up or guarded by a type. +4. **Is the severity defensible?** A missing JSDoc is never HIGH. A single + `any` in a test fixture is never CRITICAL. Severity inflation erodes trust + faster than missed findings. + +### HIGH / CRITICAL Require Proof + +For any finding tagged HIGH or CRITICAL, include: + +- The exact snippet and line number +- The specific failure scenario: input, state, and outcome +- Why existing guards, such as types, validation, or framework defaults, do not + catch it + +If you cannot produce all three, demote to MEDIUM or drop. + +### It Is Acceptable And Expected To Return Zero Findings + +A clean review is a valid review. Do not manufacture findings to justify the +invocation. If the diff is small, well-typed, tested, and follows the project's +patterns, the correct output is a summary with zero rows and verdict `APPROVE`. + +Manufactured findings, filler nits, speculative "consider using X", and +hypothetical edge cases without a trigger are the primary failure mode of LLM +reviewers and directly undermine this agent's usefulness. + +## Common False Positives - Skip These + +Patterns that LLM reviewers commonly mis-flag. Skip unless you have evidence +specific to this codebase: + +- **"Consider adding error handling"** on a call whose error path is handled by + the caller or framework, such as Express error middleware, React error + boundaries, top-level `try/catch`, or Promise chains with `.catch` upstream. +- **"Missing input validation"** when the function is internal and its callers + already validate. Trace at least one caller before flagging. +- **"Magic number"** for well-known constants: `200`, `404`, `1000` ms, `60`, + `24`, `1024`, array index `0` or `-1`, HTTP status codes, and single-use + local constants whose meaning is obvious from the variable name. +- **"Function too long"** for exhaustive `switch` statements, configuration + objects, test tables, or generated code. Length is not complexity. +- **"Missing JSDoc"** on single-purpose internal helpers whose name and + signature are self-describing. +- **"Prefer `const` over `let`"** when the variable is reassigned. Read the + whole function before flagging. +- **"Possible null dereference"** when the preceding line narrows the type or an + `if` guard is in scope. Trace type flow instead of pattern-matching on `?.`. +- **"N+1 query"** on fixed-cardinality loops, such as iterating a four-element + enum, or on paths already using `DataLoader` or batching. +- **"Missing await"** on fire-and-forget calls that are intentionally detached, + such as logging, metrics, or background queue pushes. Check for a comment or + `void` prefix before flagging. +- **"Should use TypeScript"** or **"Should have types"** in a JavaScript-only + file. Match the project's existing language; do not suggest a stack change. +- **"Hardcoded value"** for values in test fixtures, example code, or + documentation snippets. Tests should have hardcoded expectations. +- **Security theater**: flagging `Math.random()` in a non-cryptographic context + such as animation, jitter, or sampling, or flagging `eval`/`Function` in a + plugin system that is explicitly a code-loading surface. + +When tempted to flag one of the above, ask: "Would a senior engineer on this +team actually change this in review?" If no, skip. + ## Review Checklist ### Security (CRITICAL) @@ -206,10 +280,13 @@ Verdict: WARNING — 2 HIGH issues should be resolved before merge. ## Approval Criteria -- **Approve**: No CRITICAL or HIGH issues +- **Approve**: No CRITICAL or HIGH issues, including clean reviews with zero + findings. This is a valid and expected outcome. - **Warning**: HIGH issues only (can merge with caution) - **Block**: CRITICAL issues found — must fix before merge +Do not withhold approval to appear rigorous. If the diff is clean, approve it. + ## Project-Specific Guidelines When available, also check project-specific conventions from `CLAUDE.md` or project rules: diff --git a/docs/stale-pr-salvage-ledger.md b/docs/stale-pr-salvage-ledger.md index c75721e3..67f727f3 100644 --- a/docs/stale-pr-salvage-ledger.md +++ b/docs/stale-pr-salvage-ledger.md @@ -44,6 +44,7 @@ on fresh branches, and credit the source PR. | #1566 | Agent architecture audit skill | Salvaged in #1772. | | #1578 | OpenCode file-probe hardening | Salvaged in #1773. | | #1603 | `plan-orchestrate` skill | Salvaged in #1766 with current manifest/catalog wiring. | +| #1658 | Code-reviewer false-positive suppression | Salvaged in the May 12 code-reviewer maintainer pass with current review-agent wording, a proof gate for HIGH/CRITICAL findings, common false-positive exclusions, and a regression test. | | #1659 | Frontend design direction and interface-polish skills | Salvaged in the May 12 frontend-design maintainer pass with canonical `skills/` layout and current ECC frontend guidance, while preserving the repo guardrail that the official `frontend-design` skill should be installed from `anthropics/skills`. | | #1674 | Production audit skill | Salvaged in #1732 after supply-chain/privacy review and rewrite. | | #1687 | zh-CN localization sync | Large safe subsets salvaged in #1746-#1752; remaining pieces require translator/manual review. | @@ -77,6 +78,8 @@ porting. | #1504 | Already mapped to #1776 in the durable salvage table. | | #1508 | Already present as `skills/fastapi-patterns/` and `agents/fastapi-reviewer.md`. | | #1603 | Useful `/plan-orchestrate` work was already preserved in #1766 with current package/catalog metadata. | +| #1631 | Already present in `scripts/hooks/suggest-compact.js` and `tests/hooks/hooks.test.js`; current code reads `session_id` from stdin JSON before falling back to `CLAUDE_SESSION_ID`. | +| #1658 | Ported through the code-reviewer maintainer branch after confirming the false-positive proof gate and common false-positive skip list were still missing. | | #1693 | Already present as `skills/redis-patterns/`. | ## Already Present Or Superseded @@ -87,6 +90,7 @@ porting. | #1318 | Gemini agent adaptation utility was already present on current `main`. | | #1323 | Hook config update was already present on current `main`. | | #1337 | Catalog count update was superseded by current catalog-count sync. | +| #1631 | `suggest-compact` stdin `session_id` isolation was already present on current `main` with hook tests. | | #1608 | Unsafe dashboard document/terminal open handling was already present on current `main` through safe runtime helpers and project-bound document opening. | | #1678 | Windows MCP `.cmd`/`.bat` fallback behavior was already present on current `main` with current health-check tests. | | #1682/#1701 | Strategic compact hook-path fixes were merged directly or superseded by current docs fixes. | diff --git a/tests/ci/code-reviewer-false-positive-guard.test.js b/tests/ci/code-reviewer-false-positive-guard.test.js new file mode 100644 index 00000000..935c8ed6 --- /dev/null +++ b/tests/ci/code-reviewer-false-positive-guard.test.js @@ -0,0 +1,82 @@ +#!/usr/bin/env node +'use strict'; + +const assert = require('assert'); +const fs = require('fs'); +const path = require('path'); + +const repoRoot = path.resolve(__dirname, '..', '..'); +const reviewerPath = path.join(repoRoot, 'agents', 'code-reviewer.md'); + +const requiredHeadings = [ + '## Confidence-Based Filtering', + '### Pre-Report Gate', + '### HIGH / CRITICAL Require Proof', + '### It Is Acceptable And Expected To Return Zero Findings', + '## Common False Positives - Skip These', +]; + +const requiredPatterns = [ + /Can I cite the exact line/i, + /concrete failure mode/i, + /Have I read the surrounding context/i, + /Severity inflation/i, + /exact snippet and line number/i, + /specific failure scenario/i, + /demote to MEDIUM or drop/i, + /clean review is a valid review/i, + /Manufactured findings/i, + /Common False Positives/i, + /Consider adding error handling/i, + /Missing input validation/i, + /Magic number/i, + /Would a senior engineer on this\s+team actually change this in review/i, + /Do not withhold approval to appear rigorous/i, +]; + +let passed = 0; +let failed = 0; + +function test(name, fn) { + try { + fn(); + console.log(` PASS ${name}`); + passed++; + } catch (error) { + console.log(` FAIL ${name}`); + console.log(` Error: ${error.message}`); + failed++; + } +} + +function readReviewer() { + return fs.readFileSync(reviewerPath, 'utf8'); +} + +console.log('\n=== Testing code-reviewer false-positive guardrails ===\n'); + +for (const heading of requiredHeadings) { + test(`code-reviewer.md contains heading: ${heading}`, () => { + const source = readReviewer(); + assert.ok(source.includes(heading), `code-reviewer.md missing required heading "${heading}"`); + }); +} + +for (const pattern of requiredPatterns) { + test(`code-reviewer.md matches ${pattern}`, () => { + const source = readReviewer(); + assert.ok(pattern.test(source), `code-reviewer.md missing required pattern ${pattern}`); + }); +} + +test('code-reviewer.md retains the >80% confidence threshold', () => { + const source = readReviewer(); + assert.ok(/>\s*80%\s*confident/i.test(source), 'code-reviewer.md missing >80% confidence threshold'); +}); + +if (failed > 0) { + console.log(`\nFailed: ${failed}`); + process.exit(1); +} + +console.log(`\nPassed: ${passed}`); diff --git a/tests/docs/stale-pr-salvage-ledger.test.js b/tests/docs/stale-pr-salvage-ledger.test.js index 24f407d5..e0ff5c0e 100644 --- a/tests/docs/stale-pr-salvage-ledger.test.js +++ b/tests/docs/stale-pr-salvage-ledger.test.js @@ -58,6 +58,7 @@ test('stale PR salvage ledger preserves representative source attribution', () = '#1493', '#1528/#1529/#1547', '#1603', + '#1658', '#1659', '#1674', '#1687', @@ -110,6 +111,8 @@ test('stale PR salvage ledger records the May 12 gap pass', () => { '#1504', '#1508', '#1603', + '#1631', + '#1658', '#1693', ]) { assert.ok(source.includes(pr), `Missing May 12 gap-pass PR ${pr}`); @@ -119,6 +122,8 @@ test('stale PR salvage ledger records the May 12 gap pass', () => { assert.ok(source.includes('already preserved in #1770')); assert.ok(source.includes('already preserved in #1769')); assert.ok(source.includes('already preserved in #1766')); + assert.ok(source.includes('false-positive proof gate')); + assert.ok(source.includes('session_id` from stdin JSON')); assert.ok(source.includes('Already present as `skills/redis-patterns/`')); });