Commit e8585f1
feat: add circuit breaker for upstream provider overload protection (#75)
* feat: add circuit breaker for upstream provider overload protection
Implement per-provider circuit breakers that detect upstream rate limiting
(429/503/529 status codes) and temporarily stop sending requests when
providers are overloaded.
Key features:
- Per-provider circuit breakers (Anthropic, OpenAI)
- Configurable failure threshold, time window, and cooldown period
- Half-open state allows gradual recovery testing
- Prometheus metrics for monitoring (state gauge, trips counter, rejects counter)
- Thread-safe implementation with proper state machine transitions
- Disabled by default for backward compatibility
Circuit breaker states:
- Closed: normal operation, tracking failures within sliding window
- Open: all requests rejected with 503, waiting for cooldown
- Half-Open: limited requests allowed to test if upstream recovered
Status codes that trigger circuit breaker:
- 429 Too Many Requests
- 503 Service Unavailable
- 529 Anthropic Overloaded
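A default failure check for these three codes might look like the following. The name `defaultIsFailure` is hypothetical; a later entry in this commit describes a configurable `IsFailure` func that defaults to the same codes. Note that `net/http` has no constant for 529, so the sketch defines one.

```go
package main

import (
	"fmt"
	"net/http"
)

// statusOverloaded is Anthropic's non-standard 529 "Overloaded" status.
const statusOverloaded = 529

// defaultIsFailure reports whether an upstream status should count against
// the breaker. Other statuses, including 400 and 500, are not counted as failures.
func defaultIsFailure(status int) bool {
	switch status {
	case http.StatusTooManyRequests, http.StatusServiceUnavailable, statusOverloaded:
		return true
	}
	return false
}

func main() {
	for _, s := range []int{200, 429, 500, 503, 529} {
		fmt.Println(s, defaultIsFailure(s))
	}
}
```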
Relates to: coder/internal#1153
* chore: apply make fmt
* refactor: use sony/gobreaker for circuit breakers with per-endpoint isolation
- Replace custom circuit breaker implementation with sony/gobreaker
- Change from per-provider to per-endpoint circuit breakers
(e.g., OpenAI chat completions failing won't block responses API)
- Simplify API: CircuitBreakers manages all breakers internally
- Update metrics to include endpoint label
- Simplify tests to focus on key behaviors
Based on PR review feedback suggesting use of established library
and per-endpoint granularity for better fault isolation.
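Per-endpoint isolation amounts to keying breakers by provider and endpoint rather than by provider alone. A minimal sketch, assuming a hypothetical `registry` type that lazily creates one breaker per key; `failureCounter` is a placeholder for a real breaker, and the endpoint paths are illustrative.

```go
package main

import (
	"fmt"
	"sync"
)

// key identifies one breaker: a provider plus one of its endpoints.
type key struct {
	provider string
	endpoint string
}

// failureCounter stands in for a real circuit breaker (it is not itself thread-safe).
type failureCounter struct{ failures int }

// registry lazily creates and caches one breaker per (provider, endpoint).
type registry struct {
	mu       sync.Mutex
	breakers map[key]*failureCounter
}

func (r *registry) get(provider, endpoint string) *failureCounter {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.breakers == nil {
		r.breakers = make(map[key]*failureCounter)
	}
	k := key{provider, endpoint}
	cb, ok := r.breakers[k]
	if !ok {
		cb = &failureCounter{}
		r.breakers[k] = cb
	}
	return cb
}

func main() {
	var r registry
	r.get("openai", "/v1/chat/completions").failures += 5
	// The responses endpoint has its own breaker, unaffected by chat failures.
	fmt.Println(r.get("openai", "/v1/responses").failures)        // 0
	fmt.Println(r.get("openai", "/v1/chat/completions").failures) // 5
}
```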
* refactor: align CircuitBreakerConfig fields with gobreaker.Settings
Rename fields to match gobreaker naming convention:
- Window -> Interval
- Cooldown -> Timeout
- HalfOpenMaxRequests -> MaxRequests
- FailureThreshold type int64 -> uint32
* refactor: remove CircuitState, use gobreaker.State directly
* refactor: implement circuit breaker as middleware with per-provider configs
Address PR review feedback:
1. Middleware pattern - Circuit breaker is now HTTP middleware that wraps
handlers, capturing response status codes directly instead of extracting
from provider-specific error types.
2. Per-provider configs - NewCircuitBreakers takes map[string]CircuitBreakerConfig
keyed by provider name. Providers not in the map have no circuit breaker.
3. Remove provider overfitting - Deleted extractStatusCodeFromError() which
hardcoded AnthropicErrorResponse and OpenAIErrorResponse types. Middleware
now uses statusCapturingWriter to inspect actual HTTP response codes.
4. Configurable failure detection - IsFailure func in config allows providers
to define custom status codes as failures. Defaults to 429/503/529.
5. Fix gauge values - State gauge now uses 0 (closed), 0.5 (half-open), 1 (open)
6. Integration tests - Replaced unit tests with httptest-based integration tests
that verify actual behavior: upstream errors trip circuit, requests get
blocked, recovery after timeout, per-endpoint isolation.
7. Error message - Changed from 'upstream rate limiting' to 'circuit breaker is open'
* docs: clarify noop behavior when provider not configured
* Update go.mod
* fix: update metrics help text to reflect 0/0.5/1 gauge values
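The 0 / 0.5 / 1 encoding lets one gauge show all three states. A sketch of the mapping, assuming a local `State` type that stands in for `gobreaker.State` (whose constants are `StateClosed`, `StateHalfOpen` and `StateOpen`) so the example has no external dependency.

```go
package main

import "fmt"

// State stands in for gobreaker.State in this dependency-free sketch.
type State int

const (
	StateClosed State = iota
	StateHalfOpen
	StateOpen
)

// stateGaugeValue maps a breaker state to the value exported on the state gauge.
func stateGaugeValue(s State) float64 {
	switch s {
	case StateOpen:
		return 1
	case StateHalfOpen:
		return 0.5
	default:
		return 0 // closed
	}
}

func main() {
	for _, s := range []State{StateClosed, StateHalfOpen, StateOpen} {
		fmt.Println(stateGaugeValue(s))
	}
}
```

With the Prometheus Go client the value would typically be set through `GaugeVec.WithLabelValues(...).Set(...)`; the metric and label names this commit uses are not shown here.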
* refactor: add CircuitBreaker interface with NoopCircuitBreaker
- Add CircuitBreaker interface with Allow(), RecordSuccess(), RecordFailure()
- Add NoopCircuitBreaker struct for providers without circuit breaker config
- Add gobreakerCircuitBreaker wrapping sony/gobreaker implementation
- CircuitBreakers.Get() returns NoopCircuitBreaker when provider not configured
- Add http.Flusher support to statusCapturingWriter for SSE streaming
- Add Unwrap() for ResponseWriter interface detection
* refactor: use gobreaker Execute for proper half-open rejection handling
- Changed CircuitBreaker interface to Execute(fn func() int) (statusCode, rejected)
- Use gobreaker.Execute() to properly handle both ErrOpenState and ErrTooManyRequests
- NoopCircuitBreaker.Execute simply runs the function and returns not rejected
- Simplified middleware by removing separate Allow/Record pattern
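The Execute-style contract folds admission and recording into one call: the wrapped function returns the upstream status, and the breaker reports whether it ran at all. A sketch of the interface and no-op variant as described above; `alwaysOpen` is a toy for contrast, and returning status 0 on rejection is an assumption, since the commit does not say what status accompanies a rejection.

```go
package main

import "fmt"

// CircuitBreaker wraps one upstream call. fn returns the upstream HTTP status;
// rejected reports that the breaker refused to run fn at all.
type CircuitBreaker interface {
	Execute(fn func() int) (statusCode int, rejected bool)
}

// NoopCircuitBreaker is used for providers without a circuit breaker config:
// it always runs fn and never rejects.
type NoopCircuitBreaker struct{}

func (NoopCircuitBreaker) Execute(fn func() int) (int, bool) {
	return fn(), false
}

// alwaysOpen is a toy breaker that never runs fn.
type alwaysOpen struct{}

func (alwaysOpen) Execute(fn func() int) (int, bool) { return 0, true }

func main() {
	calls := 0
	upstream := func() int { calls++; return 429 }
	for _, cb := range []CircuitBreaker{NoopCircuitBreaker{}, alwaysOpen{}} {
		status, rejected := cb.Execute(upstream)
		fmt.Println(status, rejected) // 429 false, then 0 true
	}
	fmt.Println("upstream calls:", calls) // upstream calls: 1
}
```

A gobreaker-backed implementation would presumably call the breaker's `Execute` method and treat `gobreaker.ErrOpenState` and `gobreaker.ErrTooManyRequests` as rejections, as the entry describes; that adapter is omitted here to keep the example free of external dependencies.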
* refactor: remove unused circuitBreakers field and getter from RequestBridge
* use per-provider maps for endpoints
* make fmt
* use mux.Handle for cb middleware
* Move CircuitBreakerConfig to the Provider struct
* Update tests
* default noop func for onChange
* create CircuitBreakers per Provider instead of a global one and remove gobreakerCircuitBreaker along with the interface and noop struct
* Update bridge.go
Co-authored-by: Paweł Banaszewski <pawel@coder.com>
* fix format
* Apply review suggestions
* Apply review suggestions and add proper integration tests
* Add test to check circuit breaker config
* Remove test
* Remove TestCircuitBreaker_HalfOpenAndRecovery
* Apply review suggestions
* Apply review suggestions
* Fix test
* Add TestCircuitBreaker_HalfOpenMaxRequests test and add Retry-After header
* Apply review suggestions
* Apply review suggestions
* Update provider/anthropic.go
Co-authored-by: Danny Kopping <danny@coder.com>
* Fix tests
* Fix fmt
---------
Co-authored-by: Paweł Banaszewski <pawel@coder.com>
Co-authored-by: Danny Kopping <danny@coder.com>

1 parent 61a792b
12 files changed
Lines changed: 1061 additions & 17 deletions
Changed directories: circuitbreaker, config, metrics, provider