combineVerdict's 2-of-N ensemble rule was designed for user input —
the Stack Overflow FP mitigation where a dev asking about injection
shouldn't kill the session. For tool output (page content, Read/Grep
results), the content wasn't user-authored, so that FP risk doesn't
apply. Before this change: testsavant_content=0.99 on a hostile page
downgraded to WARN when the transcript classifier degraded (timeout,
Haiku unavailable) or voted differently.
Add CombineVerdictOpts.toolOutput flag. When true, a single ML
classifier >= BLOCK threshold blocks directly. User-input default
path unchanged — still requires 2-of-N to block.
Caller: sidebar-agent.ts tool-result scan now passes { toolOutput: true }.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>