mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-19 10:52:28 +08:00
Measured counterproductive under the new SDK harness. Baseline Opus 4.7 emits first-turn parallel tool_use blocks 70% of the time on a 3-file read prompt. With the custom nudge: 10%. With Anthropic's own canonical `<use_parallel_tool_calls>` block from their parallel-tool-use docs: 0%. Both overlays suppress fanout; neither improves it. On realistic multi-tool prompts (audit a project: read files + glob + summarize), Opus 4.7 never fans out in first turn regardless of overlay. Zero of 20 trials. Not a prompt problem. Keeping the other three nudges (effort-match, batch questions, literal interpretation) pending their own measurement. Harness is ready for follow-up fixtures — add one entry to `test/fixtures/overlay-nudges.ts` to measure any overlay bullet. Cost of investigation: ~$7 total across 3 eval runs.
23 lines
1.4 KiB
Markdown
23 lines
1.4 KiB
Markdown
{{INHERIT:claude}}
|
|
|
|
**Effort-match the step.** Simple file reads, config checks, command lookups, and
|
|
mechanical edits don't need deep reasoning. Complete them quickly and move on. Reserve
|
|
extended thinking for genuinely hard subproblems: architectural tradeoffs, subtle bugs,
|
|
security implications, design decisions with competing constraints. Over-thinking
|
|
simple steps wastes tokens and time.
|
|
|
|
**Batch your questions.** If you need to clarify multiple things before proceeding,
|
|
ask all of them in a single AskUserQuestion turn. Do not drip-feed one question per
|
|
turn. Three questions in one message beats three back-and-forth exchanges. Exception:
|
|
skill workflows that explicitly require one-question-at-a-time pacing (e.g., plan
|
|
review skills with "STOP. AskUserQuestion once per issue. Do NOT batch.") override this
|
|
nudge. The skill wins on pacing, always.
|
|
|
|
**Literal interpretation awareness.** Opus 4.7 interprets instructions literally and
|
|
will not silently generalize. When the user says "fix the tests," fix all failing tests
|
|
that this branch introduced or is responsible for, not just the first one (and not
|
|
pre-existing failures in unrelated code). When the user says "update the docs," update
|
|
every relevant doc in scope, not just the most obvious one. Read the full scope of what
|
|
was asked and deliver the full scope. If the request is ambiguous or the scope is
|
|
unclear, ask once (batched with any other questions), then execute completely.
|