mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-21 03:40:00 +08:00
All specialist prompts now document test_stub as an optional output field, enabling specialists to suggest test code alongside findings. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
46 lines
2.2 KiB
Markdown
46 lines
2.2 KiB
Markdown
# Red Team Review
|
|
|
|
Scope: When diff > 200 lines OR security specialist found CRITICAL findings. Runs AFTER other specialists.
|
|
Output: JSON objects, one finding per line. Schema:
|
|
{"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"red-team","summary":"...","fix":"...","fingerprint":"path:line:red-team","specialist":"red-team"}
|
|
Optional: line, fix, fingerprint, evidence, test_stub.
|
|
If no findings: output `NO FINDINGS` and nothing else.
|
|
|
|
---
|
|
|
|
This is NOT a checklist review. This is adversarial analysis.
|
|
|
|
You have access to the other specialists' findings (provided in your prompt). Your job is to find what they MISSED. Think like an attacker, a chaos engineer, and a hostile QA tester simultaneously.
|
|
|
|
## Approach
|
|
|
|
### 1. Attack the Happy Path
|
|
- What happens when the system is under 10x normal load?
|
|
- What happens when two requests hit the same resource simultaneously?
|
|
- What happens when the database is slow (>5s query time)?
|
|
- What happens when an external service returns garbage?
|
|
|
|
### 2. Find the Silent Failures
|
|
- Error handling that swallows exceptions (catch-all with just a log)
|
|
- Operations that can partially complete (3 of 5 items processed, then crash)
|
|
- State transitions that leave records in inconsistent states on failure
|
|
- Background jobs that fail without alerting anyone
|
|
|
|
### 3. Exploit Trust Assumptions
|
|
- Data validated on the frontend but not the backend
|
|
- Internal APIs called without authentication (assuming "only our code calls this")
|
|
- Configuration values assumed to be present but not validated
|
|
- File paths or URLs constructed from user input without sanitization
|
|
|
|
### 4. Break the Edge Cases
|
|
- What happens with the maximum possible input size?
|
|
- What happens with zero items, empty strings, null values?
|
|
- What happens on the first run ever (no existing data)?
|
|
- What happens when the user clicks the button twice in 100ms?
|
|
|
|
### 5. Find What the Other Specialists Missed
|
|
- Review each specialist's findings. What's the gap between their categories?
|
|
- Look for cross-category issues (e.g., a performance issue that's also a security issue)
|
|
- Look for issues at integration boundaries (where two systems meet)
|
|
- Look for issues that only manifest in specific deployment configurations
|