mirror of
https://github.com/affaan-m/everything-claude-code.git
synced 2026-05-13 08:03:04 +08:00
feat: salvage code-reviewer false-positive guardrails (#1817)
This commit is contained in:
@@ -27,6 +27,80 @@ When invoked:
|
||||
- **Consolidate** similar issues (e.g., "5 functions missing error handling" not 5 separate findings)
|
||||
- **Prioritize** issues that could cause bugs, security vulnerabilities, or data loss
|
||||
|
||||
### Pre-Report Gate
|
||||
|
||||
Before writing a finding, answer all four questions. If any answer is "no" or
|
||||
"unsure", downgrade severity or drop the finding.
|
||||
|
||||
1. **Can I cite the exact line?** Name the file and line. Vague findings like
|
||||
"somewhere in the auth layer" are not actionable and must be dropped.
|
||||
2. **Can I describe the concrete failure mode?** Name the input, state, and bad
|
||||
outcome. If you cannot name the trigger, you are pattern-matching, not
|
||||
reviewing.
|
||||
3. **Have I read the surrounding context?** Check callers, imports, and tests.
|
||||
Many apparent issues are already handled one frame up or guarded by a type.
|
||||
4. **Is the severity defensible?** A missing JSDoc is never HIGH. A single
|
||||
`any` in a test fixture is never CRITICAL. Severity inflation erodes trust
|
||||
faster than missed findings.
|
||||
|
||||
### HIGH / CRITICAL Require Proof
|
||||
|
||||
For any finding tagged HIGH or CRITICAL, include:
|
||||
|
||||
- The exact snippet and line number
|
||||
- The specific failure scenario: input, state, and outcome
|
||||
- Why existing guards, such as types, validation, or framework defaults, do not
|
||||
catch it
|
||||
|
||||
If you cannot produce all three, demote to MEDIUM or drop.
|
||||
|
||||
### It Is Acceptable And Expected To Return Zero Findings
|
||||
|
||||
A clean review is a valid review. Do not manufacture findings to justify the
|
||||
invocation. If the diff is small, well-typed, tested, and follows the project's
|
||||
patterns, the correct output is a summary with zero rows and verdict `APPROVE`.
|
||||
|
||||
Manufactured findings, filler nits, speculative "consider using X", and
|
||||
hypothetical edge cases without a trigger are the primary failure mode of LLM
|
||||
reviewers and directly undermine this agent's usefulness.
|
||||
|
||||
## Common False Positives - Skip These
|
||||
|
||||
Patterns that LLM reviewers commonly mis-flag. Skip unless you have evidence
|
||||
specific to this codebase:
|
||||
|
||||
- **"Consider adding error handling"** on a call whose error path is handled by
|
||||
the caller or framework, such as Express error middleware, React error
|
||||
boundaries, top-level `try/catch`, or Promise chains with `.catch` upstream.
|
||||
- **"Missing input validation"** when the function is internal and its callers
|
||||
already validate. Trace at least one caller before flagging.
|
||||
- **"Magic number"** for well-known constants: `200`, `404`, `1000` ms, `60`,
|
||||
`24`, `1024`, array index `0` or `-1`, HTTP status codes, and single-use
|
||||
local constants whose meaning is obvious from the variable name.
|
||||
- **"Function too long"** for exhaustive `switch` statements, configuration
|
||||
objects, test tables, or generated code. Length is not complexity.
|
||||
- **"Missing JSDoc"** on single-purpose internal helpers whose name and
|
||||
signature are self-describing.
|
||||
- **"Prefer `const` over `let`"** when the variable is reassigned. Read the
|
||||
whole function before flagging.
|
||||
- **"Possible null dereference"** when the preceding line narrows the type or an
|
||||
`if` guard is in scope. Trace type flow instead of pattern-matching on `?.`.
|
||||
- **"N+1 query"** on fixed-cardinality loops, such as iterating a four-element
|
||||
enum, or on paths already using `DataLoader` or batching.
|
||||
- **"Missing await"** on fire-and-forget calls that are intentionally detached,
|
||||
such as logging, metrics, or background queue pushes. Check for a comment or
|
||||
`void` prefix before flagging.
|
||||
- **"Should use TypeScript"** or **"Should have types"** in a JavaScript-only
|
||||
file. Match the project's existing language; do not suggest a stack change.
|
||||
- **"Hardcoded value"** for values in test fixtures, example code, or
|
||||
documentation snippets. Tests should have hardcoded expectations.
|
||||
- **Security theater**: flagging `Math.random()` in a non-cryptographic context
|
||||
such as animation, jitter, or sampling, or flagging `eval`/`Function` in a
|
||||
plugin system that is explicitly a code-loading surface.
|
||||
|
||||
When tempted to flag one of the above, ask: "Would a senior engineer on this
|
||||
team actually change this in review?" If no, skip.
|
||||
|
||||
## Review Checklist
|
||||
|
||||
### Security (CRITICAL)
|
||||
@@ -206,10 +280,13 @@ Verdict: WARNING — 2 HIGH issues should be resolved before merge.
|
||||
|
||||
## Approval Criteria
|
||||
|
||||
- **Approve**: No CRITICAL or HIGH issues
|
||||
- **Approve**: No CRITICAL or HIGH issues, including clean reviews with zero
|
||||
findings. This is a valid and expected outcome.
|
||||
- **Warning**: HIGH issues only (can merge with caution)
|
||||
- **Block**: CRITICAL issues found — must fix before merge
|
||||
|
||||
Do not withhold approval to appear rigorous. If the diff is clean, approve it.
|
||||
|
||||
## Project-Specific Guidelines
|
||||
|
||||
When available, also check project-specific conventions from `CLAUDE.md` or project rules:
|
||||
|
||||
@@ -44,6 +44,7 @@ on fresh branches, and credit the source PR.
|
||||
| #1566 | Agent architecture audit skill | Salvaged in #1772. |
|
||||
| #1578 | OpenCode file-probe hardening | Salvaged in #1773. |
|
||||
| #1603 | `plan-orchestrate` skill | Salvaged in #1766 with current manifest/catalog wiring. |
|
||||
| #1658 | Code-reviewer false-positive suppression | Salvaged in the May 12 code-reviewer maintainer pass with current review-agent wording, a proof gate for HIGH/CRITICAL findings, common false-positive exclusions, and a regression test. |
|
||||
| #1659 | Frontend design direction and interface-polish skills | Salvaged in the May 12 frontend-design maintainer pass with canonical `skills/` layout and current ECC frontend guidance, while preserving the repo guardrail that the official `frontend-design` skill should be installed from `anthropics/skills`. |
|
||||
| #1674 | Production audit skill | Salvaged in #1732 after supply-chain/privacy review and rewrite. |
|
||||
| #1687 | zh-CN localization sync | Large safe subsets salvaged in #1746-#1752; remaining pieces require translator/manual review. |
|
||||
@@ -77,6 +78,8 @@ porting.
|
||||
| #1504 | Already mapped to #1776 in the durable salvage table. |
|
||||
| #1508 | Already present as `skills/fastapi-patterns/` and `agents/fastapi-reviewer.md`. |
|
||||
| #1603 | Useful `/plan-orchestrate` work was already preserved in #1766 with current package/catalog metadata. |
|
||||
| #1631 | Already present in `scripts/hooks/suggest-compact.js` and `tests/hooks/hooks.test.js`; current code reads `session_id` from stdin JSON before falling back to `CLAUDE_SESSION_ID`. |
|
||||
| #1658 | Ported through the code-reviewer maintainer branch after confirming the false-positive proof gate and common false-positive skip list were still missing. |
|
||||
| #1693 | Already present as `skills/redis-patterns/`. |
|
||||
|
||||
## Already Present Or Superseded
|
||||
@@ -87,6 +90,7 @@ porting.
|
||||
| #1318 | Gemini agent adaptation utility was already present on current `main`. |
|
||||
| #1323 | Hook config update was already present on current `main`. |
|
||||
| #1337 | Catalog count update was superseded by current catalog-count sync. |
|
||||
| #1631 | `suggest-compact` stdin `session_id` isolation was already present on current `main` with hook tests. |
|
||||
| #1608 | Unsafe dashboard document/terminal open handling was already present on current `main` through safe runtime helpers and project-bound document opening. |
|
||||
| #1678 | Windows MCP `.cmd`/`.bat` fallback behavior was already present on current `main` with current health-check tests. |
|
||||
| #1682/#1701 | Strategic compact hook-path fixes were merged directly or superseded by current docs fixes. |
|
||||
|
||||
82
tests/ci/code-reviewer-false-positive-guard.test.js
Normal file
82
tests/ci/code-reviewer-false-positive-guard.test.js
Normal file
@@ -0,0 +1,82 @@
|
||||
#!/usr/bin/env node
|
||||
'use strict';
|
||||
|
||||
const assert = require('assert');
|
||||
const fs = require('fs');
|
||||
const path = require('path');
|
||||
|
||||
const repoRoot = path.resolve(__dirname, '..', '..');
|
||||
const reviewerPath = path.join(repoRoot, 'agents', 'code-reviewer.md');
|
||||
|
||||
const requiredHeadings = [
|
||||
'## Confidence-Based Filtering',
|
||||
'### Pre-Report Gate',
|
||||
'### HIGH / CRITICAL Require Proof',
|
||||
'### It Is Acceptable And Expected To Return Zero Findings',
|
||||
'## Common False Positives - Skip These',
|
||||
];
|
||||
|
||||
const requiredPatterns = [
|
||||
/Can I cite the exact line/i,
|
||||
/concrete failure mode/i,
|
||||
/Have I read the surrounding context/i,
|
||||
/Severity inflation/i,
|
||||
/exact snippet and line number/i,
|
||||
/specific failure scenario/i,
|
||||
/demote to MEDIUM or drop/i,
|
||||
/clean review is a valid review/i,
|
||||
/Manufactured findings/i,
|
||||
/Common False Positives/i,
|
||||
/Consider adding error handling/i,
|
||||
/Missing input validation/i,
|
||||
/Magic number/i,
|
||||
/Would a senior engineer on this\s+team actually change this in review/i,
|
||||
/Do not withhold approval to appear rigorous/i,
|
||||
];
|
||||
|
||||
let passed = 0;
|
||||
let failed = 0;
|
||||
|
||||
function test(name, fn) {
|
||||
try {
|
||||
fn();
|
||||
console.log(` PASS ${name}`);
|
||||
passed++;
|
||||
} catch (error) {
|
||||
console.log(` FAIL ${name}`);
|
||||
console.log(` Error: ${error.message}`);
|
||||
failed++;
|
||||
}
|
||||
}
|
||||
|
||||
function readReviewer() {
|
||||
return fs.readFileSync(reviewerPath, 'utf8');
|
||||
}
|
||||
|
||||
console.log('\n=== Testing code-reviewer false-positive guardrails ===\n');
|
||||
|
||||
for (const heading of requiredHeadings) {
|
||||
test(`code-reviewer.md contains heading: ${heading}`, () => {
|
||||
const source = readReviewer();
|
||||
assert.ok(source.includes(heading), `code-reviewer.md missing required heading "${heading}"`);
|
||||
});
|
||||
}
|
||||
|
||||
for (const pattern of requiredPatterns) {
|
||||
test(`code-reviewer.md matches ${pattern}`, () => {
|
||||
const source = readReviewer();
|
||||
assert.ok(pattern.test(source), `code-reviewer.md missing required pattern ${pattern}`);
|
||||
});
|
||||
}
|
||||
|
||||
test('code-reviewer.md retains the >80% confidence threshold', () => {
|
||||
const source = readReviewer();
|
||||
assert.ok(/>\s*80%\s*confident/i.test(source), 'code-reviewer.md missing >80% confidence threshold');
|
||||
});
|
||||
|
||||
if (failed > 0) {
|
||||
console.log(`\nFailed: ${failed}`);
|
||||
process.exit(1);
|
||||
}
|
||||
|
||||
console.log(`\nPassed: ${passed}`);
|
||||
@@ -58,6 +58,7 @@ test('stale PR salvage ledger preserves representative source attribution', () =
|
||||
'#1493',
|
||||
'#1528/#1529/#1547',
|
||||
'#1603',
|
||||
'#1658',
|
||||
'#1659',
|
||||
'#1674',
|
||||
'#1687',
|
||||
@@ -110,6 +111,8 @@ test('stale PR salvage ledger records the May 12 gap pass', () => {
|
||||
'#1504',
|
||||
'#1508',
|
||||
'#1603',
|
||||
'#1631',
|
||||
'#1658',
|
||||
'#1693',
|
||||
]) {
|
||||
assert.ok(source.includes(pr), `Missing May 12 gap-pass PR ${pr}`);
|
||||
@@ -119,6 +122,8 @@ test('stale PR salvage ledger records the May 12 gap pass', () => {
|
||||
assert.ok(source.includes('already preserved in #1770'));
|
||||
assert.ok(source.includes('already preserved in #1769'));
|
||||
assert.ok(source.includes('already preserved in #1766'));
|
||||
assert.ok(source.includes('false-positive proof gate'));
|
||||
assert.ok(source.includes('session_id` from stdin JSON'));
|
||||
assert.ok(source.includes('Already present as `skills/redis-patterns/`'));
|
||||
});
|
||||
|
||||
|
||||
Reference in New Issue
Block a user