feat: add 7 specialist checklist files for Review Army

- testing.md (always-on): coverage gaps, flaky patterns, security enforcement - maintainability.md (always-on): dead code, DRY, stale comments - security.md (conditional): OWASP deep analysis, auth bypass, injection - performance.md (conditional): N+1 queries, bundle impact, complexity - data-migration.md (conditional): reversibility, lock duration, backfill - api-contract.md (conditional): breaking changes, versioning, error format - red-team.md (conditional): adversarial analysis, cross-cutting concerns All use standard header with JSON output schema and NO FINDINGS fallback.
2026-05-20 19:29:56 +08:00 · 2026-03-30 19:52:36 -07:00
parent 55d80c4ccc
commit f06c38d975
7 changed files with 340 additions and 0 deletions
--- a/review/specialists/api-contract.md
+++ b/review/specialists/api-contract.md
@@ -0,0 +1,48 @@
 # API Contract Specialist Review Checklist
 Scope: When SCOPE_API=true
 Output: JSON objects, one finding per line. Schema:
 {"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"api-contract","summary":"...","fix":"...","fingerprint":"path:line:api-contract","specialist":"api-contract"}
 If no findings: output `NO FINDINGS` and nothing else.
 ---
 ## Categories
 ### Breaking Changes
 - Removed fields from response bodies (clients may depend on them)
 - Changed field types (string → number, object → array)
 - New required parameters added to existing endpoints
 - Changed HTTP methods (GET → POST) or status codes (200 → 201)
 - Renamed endpoints without maintaining the old path as a redirect/alias
 - Changed authentication requirements (public → authenticated)
 ### Versioning Strategy
 - Breaking changes made without a version bump (v1 → v2)
 - Multiple versioning strategies mixed in the same API (URL vs header vs query param)
 - Deprecated endpoints without a sunset timeline or migration guide
 - Version-specific logic scattered across controllers instead of centralized
 ### Error Response Consistency
 - New endpoints returning different error formats than existing ones
 - Error responses missing standard fields (error code, message, details)
 - HTTP status codes that don't match the error type (200 for errors, 500 for validation)
 - Error messages that leak internal implementation details (stack traces, SQL)
 ### Rate Limiting & Pagination
 - New endpoints missing rate limiting when similar endpoints have it
 - Pagination changes (offset → cursor) without backwards compatibility
 - Changed page sizes or default limits without documentation
 - Missing total count or next-page indicators in paginated responses
 ### Documentation Drift
 - OpenAPI/Swagger spec not updated to match new endpoints or changed params
 - README or API docs describing old behavior after changes
 - Example requests/responses that no longer work
 - Missing documentation for new endpoints or changed parameters
 ### Backwards Compatibility
 - Clients on older versions: will they break?
 - Mobile apps that can't force-update: does the API still work for them?
 - Webhook payloads changed without notifying subscribers
 - SDK or client library changes needed to use new features
--- a/review/specialists/data-migration.md
+++ b/review/specialists/data-migration.md
@@ -0,0 +1,47 @@
 # Data Migration Specialist Review Checklist
 Scope: When SCOPE_MIGRATIONS=true
 Output: JSON objects, one finding per line. Schema:
 {"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"data-migration","summary":"...","fix":"...","fingerprint":"path:line:data-migration","specialist":"data-migration"}
 If no findings: output `NO FINDINGS` and nothing else.
 ---
 ## Categories
 ### Reversibility
 - Can this migration be rolled back without data loss?
 - Is there a corresponding down/rollback migration?
 - Does the rollback actually undo the change or just no-op?
 - Would rolling back break the current application code?
 ### Data Loss Risk
 - Dropping columns that still contain data (add deprecation period first)
 - Changing column types that truncate data (varchar(255) → varchar(50))
 - Removing tables without verifying no code references them
 - Renaming columns without updating all references (ORM, raw SQL, views)
 - NOT NULL constraints added to columns with existing NULL values (needs backfill first)
 ### Lock Duration
 - ALTER TABLE on large tables without CONCURRENTLY (PostgreSQL)
 - Adding indexes without CONCURRENTLY on tables with >100K rows
 - Multiple ALTER TABLE statements that could be combined into one lock acquisition
 - Schema changes that acquire exclusive locks during peak traffic hours
 ### Backfill Strategy
 - New NOT NULL columns without DEFAULT value (requires backfill before constraint)
 - New columns with computed defaults that need batch population
 - Missing backfill script or rake task for existing records
 - Backfill that updates all rows at once instead of batching (locks table)
 ### Index Creation
 - CREATE INDEX without CONCURRENTLY on production tables
 - Duplicate indexes (new index covers same columns as existing one)
 - Missing indexes on new foreign key columns
 - Partial indexes where a full index would be more useful (or vice versa)
 ### Multi-Phase Safety
 - Migrations that must be deployed in a specific order with application code
 - Schema changes that break the current running code (deploy code first, then migrate)
 - Migrations that assume a deploy boundary (old code + new schema = crash)
 - Missing feature flag to handle mixed old/new code during rolling deploy
--- a/review/specialists/maintainability.md
+++ b/review/specialists/maintainability.md
@@ -0,0 +1,45 @@
 # Maintainability Specialist Review Checklist
 Scope: Always-on (every review)
 Output: JSON objects, one finding per line. Schema:
 {"severity":"INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"maintainability","summary":"...","fix":"...","fingerprint":"path:line:maintainability","specialist":"maintainability"}
 If no findings: output `NO FINDINGS` and nothing else.
 ---
 ## Categories
 ### Dead Code & Unused Imports
 - Variables assigned but never read in the changed files
 - Functions/methods defined but never called (check with Grep across the repo)
 - Imports/requires that are no longer referenced after the change
 - Commented-out code blocks (either remove or explain why they exist)
 ### Magic Numbers & String Coupling
 - Bare numeric literals used in logic (thresholds, limits, retry counts) — should be named constants
 - Error message strings used as query filters or conditionals elsewhere
 - Hardcoded URLs, ports, or hostnames that should be config
 - Duplicated literal values across multiple files
 ### Stale Comments & Docstrings
 - Comments that describe old behavior after the code was changed in this diff
 - TODO/FIXME comments that reference completed work
 - Docstrings with parameter lists that don't match the current function signature
 - ASCII diagrams in comments that no longer match the code flow
 ### DRY Violations
 - Similar code blocks (3+ lines) appearing multiple times within the diff
 - Copy-paste patterns where a shared helper would be cleaner
 - Configuration or setup logic duplicated across test files
 - Repeated conditional chains that could be a lookup table or map
 ### Conditional Side Effects
 - Code paths that branch on a condition but forget a side effect on one branch
 - Log messages that claim an action happened but the action was conditionally skipped
 - State transitions where one branch updates related records but the other doesn't
 - Event emissions that only fire on the happy path, missing error/edge paths
 ### Module Boundary Violations
 - Reaching into another module's internal implementation (accessing private-by-convention methods)
 - Direct database queries in controllers/views that should go through a service/model
 - Tight coupling between components that should communicate through interfaces
--- a/review/specialists/performance.md
+++ b/review/specialists/performance.md
@@ -0,0 +1,51 @@
 # Performance Specialist Review Checklist
 Scope: When SCOPE_BACKEND=true OR SCOPE_FRONTEND=true
 Output: JSON objects, one finding per line. Schema:
 {"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"performance","summary":"...","fix":"...","fingerprint":"path:line:performance","specialist":"performance"}
 If no findings: output `NO FINDINGS` and nothing else.
 ---
 ## Categories
 ### N+1 Queries
 - ActiveRecord/ORM associations traversed in loops without eager loading (.includes, joinedload, include)
 - Database queries inside iteration blocks (each, map, forEach) that could be batched
 - Nested serializers that trigger lazy-loaded associations
 - GraphQL resolvers that query per-field instead of batching (check for DataLoader usage)
 ### Missing Database Indexes
 - New WHERE clauses on columns without indexes (check migration files or schema)
 - New ORDER BY on non-indexed columns
 - Composite queries (WHERE a AND b) without composite indexes
 - Foreign key columns added without indexes
 ### Algorithmic Complexity
 - O(n^2) or worse patterns: nested loops over collections, Array.find inside Array.map
 - Repeated linear searches that could use a hash/map/set lookup
 - String concatenation in loops (use join or StringBuilder)
 - Sorting or filtering large collections multiple times when once would suffice
 ### Bundle Size Impact (Frontend)
 - New production dependencies that are known-heavy (moment.js, lodash full, jquery)
 - Barrel imports (import from 'library') instead of deep imports (import from 'library/specific')
 - Large static assets (images, fonts) committed without optimization
 - Missing code splitting for route-level chunks
 ### Rendering Performance (Frontend)
 - Fetch waterfalls: sequential API calls that could be parallel (Promise.all)
 - Unnecessary re-renders from unstable references (new objects/arrays in render)
 - Missing React.memo, useMemo, or useCallback on expensive computations
 - Layout thrashing from reading then writing DOM properties in loops
 - Missing loading="lazy" on below-fold images
 ### Missing Pagination
 - List endpoints that return unbounded results (no LIMIT, no pagination params)
 - Database queries without LIMIT that grow with data volume
 - API responses that embed full nested objects instead of IDs with expansion
 ### Blocking in Async Contexts
 - Synchronous I/O (file reads, subprocess, HTTP requests) inside async functions
 - time.sleep() / Thread.sleep() inside event-loop-based handlers
 - CPU-intensive computation blocking the main thread without worker offload
--- a/review/specialists/red-team.md
+++ b/review/specialists/red-team.md
@@ -0,0 +1,44 @@
 # Red Team Review
 Scope: When diff > 200 lines OR security specialist found CRITICAL findings. Runs AFTER other specialists.
 Output: JSON objects, one finding per line. Schema:
 {"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"red-team","summary":"...","fix":"...","fingerprint":"path:line:red-team","specialist":"red-team"}
 If no findings: output `NO FINDINGS` and nothing else.
 ---
 This is NOT a checklist review. This is adversarial analysis.
 You have access to the other specialists' findings (provided in your prompt). Your job is to find what they MISSED. Think like an attacker, a chaos engineer, and a hostile QA tester simultaneously.
 ## Approach
 ### 1. Attack the Happy Path
 - What happens when the system is under 10x normal load?
 - What happens when two requests hit the same resource simultaneously?
 - What happens when the database is slow (>5s query time)?
 - What happens when an external service returns garbage?
 ### 2. Find the Silent Failures
 - Error handling that swallows exceptions (catch-all with just a log)
 - Operations that can partially complete (3 of 5 items processed, then crash)
 - State transitions that leave records in inconsistent states on failure
 - Background jobs that fail without alerting anyone
 ### 3. Exploit Trust Assumptions
 - Data validated on the frontend but not the backend
 - Internal APIs called without authentication (assuming "only our code calls this")
 - Configuration values assumed to be present but not validated
 - File paths or URLs constructed from user input without sanitization
 ### 4. Break the Edge Cases
 - What happens with the maximum possible input size?
 - What happens with zero items, empty strings, null values?
 - What happens on the first run ever (no existing data)?
 - What happens when the user clicks the button twice in 100ms?
 ### 5. Find What the Other Specialists Missed
 - Review each specialist's findings. What's the gap between their categories?
 - Look for cross-category issues (e.g., a performance issue that's also a security issue)
 - Look for issues at integration boundaries (where two systems meet)
 - Look for issues that only manifest in specific deployment configurations
--- a/review/specialists/security.md
+++ b/review/specialists/security.md
@@ -0,0 +1,60 @@
 # Security Specialist Review Checklist
 Scope: When SCOPE_AUTH=true OR (SCOPE_BACKEND=true AND diff > 100 lines)
 Output: JSON objects, one finding per line. Schema:
 {"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"security","summary":"...","fix":"...","fingerprint":"path:line:security","specialist":"security"}
 If no findings: output `NO FINDINGS` and nothing else.
 ---
 This checklist goes deeper than the main CRITICAL pass. The main agent already checks SQL injection, race conditions, LLM trust, and enum completeness. This specialist focuses on auth/authz patterns, cryptographic misuse, and attack surface expansion.
 ## Categories
 ### Input Validation at Trust Boundaries
 - User input accepted without validation at controller/handler level
 - Query parameters used directly in database queries or file paths
 - Request body fields accepted without type checking or schema validation
 - File uploads without type/size/content validation
 - Webhook payloads processed without signature verification
 ### Auth & Authorization Bypass
 - Endpoints missing authentication middleware (check route definitions)
 - Authorization checks that default to "allow" instead of "deny"
 - Role escalation paths (user can modify their own role/permissions)
 - Direct object reference vulnerabilities (user A accesses user B's data by changing an ID)
 - Session fixation or session hijacking opportunities
 - Token/API key validation that doesn't check expiration
 ### Injection Vectors (beyond SQL)
 - Command injection via subprocess calls with user-controlled arguments
 - Template injection (Jinja2, ERB, Handlebars) with user input
 - LDAP injection in directory queries
 - SSRF via user-controlled URLs (fetch, redirect, webhook targets)
 - Path traversal via user-controlled file paths (../../etc/passwd)
 - Header injection via user-controlled values in HTTP headers
 ### Cryptographic Misuse
 - Weak hashing algorithms (MD5, SHA1) for security-sensitive operations
 - Predictable randomness (Math.random, rand()) for tokens or secrets
 - Non-constant-time comparisons (==) on secrets, tokens, or digests
 - Hardcoded encryption keys or IVs
 - Missing salt in password hashing
 ### Secrets Exposure
 - API keys, tokens, or passwords in source code (even in comments)
 - Secrets logged in application logs or error messages
 - Credentials in URLs (query parameters or basic auth in URL)
 - Sensitive data in error responses returned to users
 - PII stored in plaintext when encryption is expected
 ### XSS via Escape Hatches
 - Rails: .html_safe, raw() on user-controlled data
 - React: dangerouslySetInnerHTML with user content
 - Vue: v-html with user content
 - Django: |safe, mark_safe() on user input
 - General: innerHTML assignment with unsanitized data
 ### Deserialization
 - Deserializing untrusted data (pickle, Marshal, YAML.load, JSON.parse of executable types)
 - Accepting serialized objects from user input or external APIs without schema validation
--- a/review/specialists/testing.md
+++ b/review/specialists/testing.md
@@ -0,0 +1,45 @@
 # Testing Specialist Review Checklist
 Scope: Always-on (every review)
 Output: JSON objects, one finding per line. Schema:
 {"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"testing","summary":"...","fix":"...","fingerprint":"path:line:testing","specialist":"testing"}
 If no findings: output `NO FINDINGS` and nothing else.
 ---
 ## Categories
 ### Missing Negative-Path Tests
 - New code paths that handle errors, rejections, or invalid input with NO corresponding test
 - Guard clauses and early returns that are untested
 - Error branches in try/catch, rescue, or error boundaries with no failure-path test
 - Permission/auth checks that are asserted in code but never tested for the "denied" case
 ### Missing Edge-Case Coverage
 - Boundary values: zero, negative, max-int, empty string, empty array, nil/null/undefined
 - Single-element collections (off-by-one on loops)
 - Unicode and special characters in user-facing inputs
 - Concurrent access patterns with no race-condition test
 ### Test Isolation Violations
 - Tests sharing mutable state (class variables, global singletons, DB records not cleaned up)
 - Order-dependent tests (pass in sequence, fail when randomized)
 - Tests that depend on system clock, timezone, or locale
 - Tests that make real network calls instead of using stubs/mocks
 ### Flaky Test Patterns
 - Timing-dependent assertions (sleep, setTimeout, waitFor with tight timeouts)
 - Assertions on ordering of unordered results (hash keys, Set iteration, async resolution order)
 - Tests that depend on external services (APIs, databases) without fallback
 - Randomized test data without seed control
 ### Security Enforcement Tests Missing
 - Auth/authz checks in controllers with no test for the "unauthorized" case
 - Rate limiting logic with no test proving it actually blocks
 - Input sanitization with no test for malicious input
 - CSRF/CORS configuration with no integration test
 ### Coverage Gaps
 - New public methods/functions with zero test coverage
 - Changed methods where existing tests only cover the old behavior, not the new branch
 - Utility functions called from multiple places but tested only indirectly