Skip to content

Audit Azure AI claude-opus-4-7 reasoning_type (still budget; flip to adaptive when Azure enforces) #1163

@gltanaka

Description

@gltanaka

Background

On 2026-05-23 ~17:25 UTC, Anthropic's direct API enforced the new adaptive thinking shape for Claude Opus 4.7 — legacy thinking={"type":"enabled","budget_tokens":N} returns 400 "is not supported for this model". PR #1156 flipped the Anthropic claude-opus-4-7 row to reasoning_type=adaptive and added _ADAPTIVE_ANTHROPIC_MODELS + _is_adaptive_anthropic_model() in generate_model_catalog.py.

The azure_ai/claude-opus-4-7 row was intentionally left at budget because:

  1. Azure AI relays through Anthropic but on its own enforcement timeline (typically weeks-to-months behind Anthropic's direct API).
  2. llm_invoke.py's adaptive serialization (around line ~3556) is gated on provider_lower == 'anthropic' — flipping the Azure row to adaptive today would silently send no thinking parameter at all (worse than the current state, not better).
  3. Zero production traffic to Azure AI: no AZURE_AI_API_KEY is set on any pdd_cloud Cloud Run service, so the row is filtered out by _ensure_api_key before it can fail.

When this issue should be acted on

Whichever comes first:

  • Someone sets AZURE_AI_API_KEY in a server context (cron, Cloud Run, CI). Then Azure AI rows become eligible candidates and we need to know whether Azure has flipped enforcement.
  • We observe a thinking.type.enabled 400 from Azure in any monitoring (cloud function logs, user reports). That's the signal Azure has caught up.

Scope of fix when triggered

Two changes in one PR:

  1. In pdd/data/llm_model.csv, flip Azure AI,azure_ai/claude-opus-4-7,… from reasoning_type=budget to reasoning_type=adaptive (and max_reasoning_tokens=16000 if matching the Anthropic row).
  2. In pdd/generate_model_catalog.py, extend _ADAPTIVE_PROVIDERS = {"anthropic"} to {"anthropic", "azure_ai"} so the catalog regenerator doesn't silently revert (1).
  3. In pdd/llm_invoke.py, extend the adaptive-serialization branch (around line ~3556) so provider_lower in ('anthropic', 'azure ai') triggers it — otherwise the flipped row still emits no thinking parameter. This is the load-bearing change — without it, (1) makes Azure calls effectively no-thinking.
  4. Add a regression test mirroring the Anthropic-side test for the Azure row.

Acceptance criteria

  • pytest tests/test_generate_model_catalog.py -k azure -v passes
  • A monkeypatched Azure AI BadRequestError test confirms (3) sends the adaptive parameter shape
  • Verified against actual Azure AI behavior (live test or staging-call) — adaptive shape accepted

Effort

~30 minutes once Azure enforcement is confirmed. Tiny patch, mostly tests.

Priority

Low until trigger fires. Not blocking anything today.

Related: PR #1156 (the Anthropic-side flip).

🤖 Generated with Claude Code — incident-response follow-up from 2026-05-24

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    Status
    In progress

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions