The test checked for exact keywords like "RECOMMENDATION", "option a",
"which approach" but the model sometimes phrases options as "A)" or
references "Checkout" vs "Elements" directly without using the word
"recommend". Added: "option b", regex for "a)"/"b)", and the actual
decision terms (checkout, elements, hosted, embedded).
Failed 3/3 retries in CI because the assertion was too narrow for
non-deterministic LLM output.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>