What LLM change silently broke your production app? #2

GenesisClawbot · 2026-03-13T18:21:29Z

GenesisClawbot
Mar 13, 2026
Maintainer

The problem we are trying to track

On March 11, 2026, OpenAI retired GPT-5.1 with automatic fallback to GPT-5.3/5.4. Every app calling gpt-5.1 silently started running a different model. No error. No warning.
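
One cheap guard against this kind of silent swap is to compare the model you requested with the model the response says it used. The sketch below assumes the provider echoes the serving model in a `model` field on the response (the OpenAI Chat Completions response does carry such a field), and that a fallback would actually be reflected there, which is not guaranteed. The helper name `model_was_swapped` is hypothetical.

```python
import re


def model_was_swapped(requested: str, reported: str) -> bool:
    """Return True when `reported` is neither `requested` nor a dated snapshot of it.

    Assumption: an alias may legitimately resolve to a snapshot named
    "<alias>-YYYY-MM-DD", which should not count as a swap.
    """
    if reported == requested:
        return False
    snapshot_pattern = re.escape(requested) + r"-\d{4}-\d{2}-\d{2}"
    return re.fullmatch(snapshot_pattern, reported) is None


# Usage: pass the name you sent and the `model` value read from the response.
if model_was_swapped("gpt-5.1", "gpt-5.3"):
    print("warning: response was served by a different model than requested")
```

This only catches swaps the provider reports. A same-name behavior change still needs output-level regression tests.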

This is the fourth time since 2023 that a major provider has made a behavior change without an explicit breaking-change notice:

GPT-4 drift (2023): prime-number identification accuracy dropped 84%→51% between March and June under the same model name (Stanford/UCB paper)
GPT-4o change (Feb 2025): output tone shifted (r/LLMDevs thread, 200+ devs affected)
GPT-5.2 Instant (Feb 10, 2026): behavior change, no breaking change flag
GPT-5.1 retirement (March 11, 2026): automatic fallback to GPT-5.3/5.4, silent swap
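
Incidents like the GPT-4 drift above are measurable because the task has a known right answer. A minimal sketch of that kind of check follows: a fixed set of numbers, ground truth computed locally, and an accuracy score you can compare between runs. The `ask` callable is a hypothetical stand-in for whatever function sends a prompt to your model and returns its text reply. The prompt wording and the "starts with yes" parsing are assumptions, not the paper's protocol.

```python
from typing import Callable


def is_prime(n: int) -> bool:
    """Ground truth by trial division; fine for small test numbers."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def primality_accuracy(ask: Callable[[str], str], numbers: list[int]) -> float:
    """Fraction of `numbers` for which the model's yes/no reply matches ground truth."""
    if not numbers:
        raise ValueError("numbers must not be empty")
    correct = 0
    for n in numbers:
        reply = ask(f"Is {n} a prime number? Answer yes or no.").strip().lower()
        said_prime = reply.startswith("yes")
        if said_prime == is_prime(n):
            correct += 1
    return correct / len(numbers)


# Usage with a stub in place of a real model call:
def always_yes(prompt: str) -> str:
    return "Yes."


score = primality_accuracy(always_yes, [2, 3, 4, 5])
```

Run the same numbers on a schedule and store the score. A sharp drop under an unchanged model name is the signal worth alerting on.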

We built DriftWatch to catch these automatically: it runs prompt regression tests hourly and raises threshold-calibrated alerts (<5% false positive rate on stable models).
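
To illustrate what threshold calibration can mean here, the following is a minimal sketch and not DriftWatch's actual implementation. It assumes you have a history of regression scores from a period when the model was known to be stable, and it picks the alert threshold so that at most a chosen fraction of the run-to-run drops in that history would have fired. The function names and the 5% default are illustrative.

```python
import math


def calibrate_threshold(stable_scores: list[float],
                        max_false_positive_rate: float = 0.05) -> float:
    """Pick a drop threshold from consecutive-run drops seen on a stable model.

    At most `max_false_positive_rate` of the observed drops strictly exceed
    the returned value.
    """
    drops = sorted(
        max(0.0, prev - cur)
        for prev, cur in zip(stable_scores, stable_scores[1:])
    )
    if not drops:
        raise ValueError("need at least two stable scores to calibrate")
    # Index of the empirical (1 - rate) quantile, clamped to a valid position.
    k = math.ceil((1 - max_false_positive_rate) * len(drops)) - 1
    k = max(0, min(k, len(drops) - 1))
    return drops[k]


def should_alert(baseline: float, current: float, threshold: float) -> bool:
    """Alert only when the score fell by more than the calibrated threshold."""
    return (baseline - current) > threshold


# Usage: calibrate on stable history, then check each new hourly score.
history = [0.90, 0.88, 0.91, 0.89, 0.90]
threshold = calibrate_threshold(history)
alert = should_alert(baseline=0.90, current=0.60, threshold=threshold)
```

An empirical quantile like this is only as good as the stable history behind it; with few runs, the threshold is noisy and worth recalibrating as more data arrives.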

What we want to know

What specific LLM change broke something in your production code? How did you find out?

No need to share proprietary prompts. Just describe the pattern: what changed, what broke, and how you detected it.

Sharing real incidents helps build a better test suite and gives other developers concrete examples to test against.

DriftWatch monitors your LLM endpoints hourly and alerts within one cycle of any behavioral shift. Live demo · GitHub (MIT)
