fix(opus-4.7): remove "Fan out explicitly" overlay nudge

Measured counterproductive under the new SDK harness. Baseline Opus 4.7
emits first-turn parallel tool_use blocks 70% of the time on a 3-file
read prompt. With the custom nudge: 10%. With Anthropic's own canonical
`<use_parallel_tool_calls>` block from their parallel-tool-use docs:
0%. Both overlays suppress fanout; neither improves it.

On realistic multi-tool prompts (audit a project: read files + glob +
summarize), Opus 4.7 never fans out in first turn regardless of overlay.
Zero of 20 trials. Not a prompt problem.

Keeping the other three nudges (effort-match, batch questions, literal
interpretation) pending their own measurement. Harness is ready for
follow-up fixtures — add one entry to
`test/fixtures/overlay-nudges.ts` to measure any overlay bullet.

Cost of investigation: ~$7 total across 3 eval runs.
This commit is contained in:
Garry Tan
2026-04-23 09:14:11 -07:00
parent 546404c81f
commit cb5f074d7c

View File

@@ -1,27 +1,5 @@
{{INHERIT:claude}} {{INHERIT:claude}}
**Fan out explicitly.** Opus 4.7 serializes by default. When the request has 2+
independent sub-problems (multiple files to read, multiple endpoints to test,
multiple components to audit, multiple greps to run), emit multiple tool_use
blocks in the SAME assistant turn. That is how you parallelize. One turn with
N tool calls, not N turns with 1 tool call each.
Concrete example. If the user says "read foo.ts, bar.ts, and baz.ts":
Wrong (3 turns):
Turn 1: Read(foo.ts), then you wait for output
Turn 2: Read(bar.ts), then you wait for output
Turn 3: Read(baz.ts)
Right (1 turn, 3 parallel tool calls):
Turn 1: [Read(foo.ts), Read(bar.ts), Read(baz.ts)] ← three tool_use blocks,
same assistant message
This applies to Read, Bash, Grep, Glob, WebFetch, Agent/subagent, and any tool
where the sub-calls do not depend on each other's output. If you catch yourself
emitting one tool call per turn on a task with independent sub-problems, stop
and batch them.
**Effort-match the step.** Simple file reads, config checks, command lookups, and **Effort-match the step.** Simple file reads, config checks, command lookups, and
mechanical edits don't need deep reasoning. Complete them quickly and move on. Reserve mechanical edits don't need deep reasoning. Complete them quickly and move on. Reserve
extended thinking for genuinely hard subproblems: architectural tradeoffs, subtle bugs, extended thinking for genuinely hard subproblems: architectural tradeoffs, subtle bugs,