Gap caught by user: the review-flow tests verified the decision path
(POST, file write, agent_error emission) but not the actual security
property — that Block stops subsequent tool calls and Allow lets them
continue.

The mock-claude tool_result_injection scenario now emits a second
tool_use ~8s after the injected tool_result, targeting
post-block-followup.example.com. If Block really blocks, that event never
reaches the chat feed (SIGTERM kills the subprocess before it can emit
it). If Allow really allows, it does.
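
Sketch of the data-plane property in isolation. This is not the actual
mock-claude or test code: MOCK_SCRIPT, run_scenario, the JSON event shape
and the shortened delays are all hypothetical. A child process emits the
injected tool_result, sleeps, then emits the followup tool_use.
Terminating it mid-sleep means the followup is never written, which is
the ordering the Block test relies on.

```python
import json
import subprocess
import sys
import time

# Hypothetical stand-in for the mock-claude scenario: emit an injected
# tool_result, sleep, then emit a followup tool_use. The delay is passed in
# so it can be scaled down from the real ~8s for illustration.
MOCK_SCRIPT = r"""
import json, sys, time
delay = float(sys.argv[1])
print(json.dumps({"type": "tool_result", "injected": True}), flush=True)
time.sleep(delay)
print(json.dumps({"type": "tool_use",
                  "host": "post-block-followup.example.com"}), flush=True)
"""


def run_scenario(followup_delay, kill_after=None):
    """Run the mock and return the event types it managed to emit.

    If kill_after is set, terminate the child after that many seconds
    (SIGTERM on POSIX), mimicking a Block decision; otherwise let it run
    to completion, mimicking Allow.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", MOCK_SCRIPT, str(followup_delay)],
        stdout=subprocess.PIPE,
        text=True,
    )
    if kill_after is not None:
        time.sleep(kill_after)
        proc.terminate()
    # Drain whatever the child wrote before exiting or being killed.
    out, _ = proc.communicate(timeout=10)
    return [json.loads(line)["type"] for line in out.splitlines() if line]
```

Unlike the real flow, nothing here polls a decision file; the kill is
simply timed, so this only illustrates kill-before-emit ordering.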

The Allow test asserts the followup tool_use DOES appear → the session
lives. The Block test asserts the followup tool_use does NOT appear
within a 12s window → the kill actually stopped further work. Both tests
previously proved only the control plane (decision file → agent poll →
agent_error); they now prove the data plane too.
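
A minimal sketch of the two assertion shapes, assuming the chat feed can
be read as a growing list of event dicts. wait_for, assert_appears,
assert_stays_absent and delayed_append are hypothetical helpers, not the
suite's API. The positive check returns as soon as the event shows up;
the negative check has to watch the whole window.

```python
import threading
import time


def wait_for(predicate, events, timeout, poll=0.05):
    """Return True once any event matches predicate, or False if
    `timeout` seconds pass without a match."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if any(predicate(e) for e in list(events)):
            return True
        time.sleep(poll)
    # One last look in case the event landed during the final sleep.
    return any(predicate(e) for e in list(events))


def assert_appears(predicate, events, timeout):
    # Allow case: passes as soon as the followup shows up.
    assert wait_for(predicate, events, timeout), "expected event never appeared"


def assert_stays_absent(predicate, events, quiet_window):
    # Block case: can only pass after observing the full quiet window.
    assert not wait_for(predicate, events, quiet_window), "event appeared"


def delayed_append(events, event, delay):
    """Test double for the chat feed: append `event` after `delay` seconds."""
    timer = threading.Timer(delay, events.append, args=(event,))
    timer.start()
    return timer
```

The quiet window only means something if it outlasts the scenario's ~8s
followup delay with some margin, which is why the Block test waits 12s;
a window shorter than the delay would pass even if the kill did nothing.
Because the negative assertion always burns the full window, it adds
directly to the test's wall-clock time.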

Test timeout bumped 60s → 90s to accommodate the 12s quiet window.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>