
Commit 846b882

Support configurable retry rules per provider via retryRules
1 parent 0e3ef68 commit 846b882

9 files changed

Lines changed: 289 additions & 24 deletions


AGENTS.md

Lines changed: 1 addition & 1 deletion
@@ -30,4 +30,4 @@ ECA Agent Guide (AGENTS.md)
 - Use java class typing to avoid GraalVM reflection issues
 - Avoid adding too many comments, only add essential or when you think is really important to mention something.
 - ECA's protocol specification of client <-> server lives in docs/protocol.md
-- When adding support to a new feature or fixing a existing github issue, add a entry to Unreleased in CHANGELOG.md if not already there.
+- When adding support to a new feature or fixing a existing github issue, add a entry to Unreleased in CHANGELOG.md if not already there, be concise like the rest.

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@
 
 - Fix MCP server initialization crash (`String cannot be cast to IPersistentCollection`) when OAuth metadata endpoint returns a non-JSON or error response.
 - Auto-approve `eca__read_file` for tool call output cache files (`~/.cache/eca/toolCallOutputs`).
+- Support configurable retry rules per provider via `retryRules`, allowing users to define custom HTTP status and/or body pattern matching for automatic retries with optional labels.
 
 ## 0.110.3
 
docs/config.json

Lines changed: 30 additions & 0 deletions
@@ -424,6 +424,36 @@
         "okhttp"
       ]
     },
+    "retryRules": {
+      "type": "array",
+      "description": "Custom retry rules. Each rule can match by HTTP status code and/or response body regex pattern. When matched, the request is retried with exponential backoff.",
+      "markdownDescription": "Custom retry rules. Each rule can match by HTTP status code and/or response body regex pattern. When matched, the request is retried with exponential backoff.",
+      "items": {
+        "type": "object",
+        "properties": {
+          "status": {
+            "type": "integer",
+            "description": "HTTP status code to match.",
+            "markdownDescription": "HTTP status code to match."
+          },
+          "bodyPattern": {
+            "type": "string",
+            "description": "Regex pattern to match against the response body (case-insensitive).",
+            "markdownDescription": "Regex pattern to match against the response body (case-insensitive)."
+          },
+          "label": {
+            "type": "string",
+            "description": "Human-readable label shown in the retry progress message (e.g. 'Rate limited').",
+            "markdownDescription": "Human-readable label shown in the retry progress message (e.g. 'Rate limited')."
+          }
+        },
+        "anyOf": [
+          { "required": ["status"] },
+          { "required": ["bodyPattern"] }
+        ],
+        "additionalProperties": false
+      }
+    },
     "models": {
       "type": "object",
       "description": "Available models for this provider. Each key is the model alias.",
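The sketch below restates the constraints of the new `retryRules` item schema in Clojure, which can help when checking your own config:

- Only `status`, `bodyPattern` and `label` are allowed (`additionalProperties: false`).
- At least one of `status` or `bodyPattern` must be present (`anyOf`).

`valid-retry-rule?` is a hypothetical helper and is not part of ECA. It assumes rule keys are keywordized, which matches how `llm_api.clj` and `errors.clj` read them (`(:retryRules provider-config)`, `:bodyPattern`).

```clojure
;; Hypothetical validator mirroring the JSON Schema for one retryRules item.
;; Assumes keys are keywords, as in (:retryRules provider-config).
(def allowed-rule-keys #{:status :bodyPattern :label})

(defn valid-retry-rule?
  "True when `rule` satisfies the retryRules item schema."
  [rule]
  (and (map? rule)
       ;; additionalProperties: false
       (every? allowed-rule-keys (keys rule))
       ;; anyOf: required status, or required bodyPattern
       (or (contains? rule :status) (contains? rule :bodyPattern))
       ;; property types
       (or (not (contains? rule :status)) (integer? (:status rule)))
       (or (not (contains? rule :bodyPattern)) (string? (:bodyPattern rule)))
       (or (not (contains? rule :label)) (string? (:label rule)))))

(valid-retry-rule? {:status 418 :label "Corporate proxy throttle"}) ;; => true
(valid-retry-rule? {:label "No condition"})                         ;; => false (fails anyOf)
(valid-retry-rule? {:status 503 :retries 3})                        ;; => false (extra key)
```

At runtime, `classify-by-custom-rules` also guards with `has-condition?`. A rule that slips past validation without `status` or `bodyPattern` is therefore ignored rather than matching every error.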

docs/config/models.md

Lines changed: 37 additions & 0 deletions
@@ -78,6 +78,7 @@ Schema:
 | `thinkTagStart` | string | Optional override the think start tag tag for openai-chat (Default: "<think>") api | No |
 | `thinkTagEnd` | string | Optional override the think end tag for openai-chat (Default: "</think>") api | No |
 | `httpClient` | map | Allow customize the http-client for this provider requests, like changing http version | No |
+| `retryRules` | array | Custom retry rules that match by HTTP status and/or body regex pattern (see [Retry Rules](#retry-rules)) | No |
 | `models` | map | Key: model name, value: its config | Yes |
 | `models <model> extraPayload` | map | Extra payload sent in body to LLM | No |
 | `models <model> extraHeaders` | map | Extra headers sent to LLM request | No |
@@ -230,6 +231,42 @@ Notes:
 - Authentication priority (short): `key` (with dynamic string pase support) > OAuth.
 - All providers with API key auth can use credential files.
 
+### Retry Rules
+
+ECA automatically retries requests on common transient errors (429, 500, 502, 503, 529) with exponential backoff. You can define custom retry rules per provider using `retryRules` to handle additional status codes or response body patterns.
+
+Each rule can match by:
+
+- **`status`** (integer): HTTP status code to match
+- **`bodyPattern`** (string): Regex pattern to match against the response body (case-insensitive)
+- **`label`** (string, optional): Human-readable text shown in the retry progress message
+
+At least one of `status` or `bodyPattern` is required. When both are specified, both must match. Custom rules are checked before built-in classification, so they can override default behavior.
+
+```javascript title="~/.config/eca/config.json"
+{
+  "providers": {
+    "my-company": {
+      "api": "openai-chat",
+      "url": "${env:MY_COMPANY_API_URL}",
+      "key": "${env:MY_COMPANY_API_KEY}",
+      "retryRules": [
+        {"status": 418, "label": "Corporate proxy throttle"},
+        {"bodyPattern": "capacity.*exceeded", "label": "Capacity exceeded"},
+        {"status": 503, "bodyPattern": "maintenance", "label": "Under maintenance"}
+      ],
+      "models": {
+        "gpt-5": {}
+      }
+    }
+  }
+}
+```
+
+When a retry rule matches, the chat shows a progress message like:
+
+> ⏳ Corporate proxy throttle. Retrying in 2s (attempt 1/10)
+
 ## Providers examples
 
 === "Anthropic"
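The "Retry Rules" docs describe three matching semantics:

- Every condition a rule declares must match.
- `bodyPattern` is case-insensitive.
- Rules are tried in order.

The sketch below illustrates these with the three rules from the example config. `rule-matches?` and `first-matching-label` are hypothetical names used only here. The shipped logic is `classify-by-custom-rules` in `errors.clj`, and the sketch only approximates it.

```clojure
;; Hypothetical restatement of the documented semantics (not ECA's code).
(defn rule-matches?
  "True when every condition the rule declares matches the response.
  A rule with neither :status nor :bodyPattern never matches."
  [{:keys [status bodyPattern]} {resp-status :status resp-body :body}]
  (boolean
   (and (or status bodyPattern)
        (or (nil? status) (= status resp-status))
        (or (nil? bodyPattern)
            (and (string? resp-body)
                 ;; (?i) makes the user pattern case-insensitive
                 (re-find (re-pattern (str "(?i)" bodyPattern)) resp-body))))))

(defn first-matching-label
  "Label of the first matching rule (\"matched\" if it has no label),
  or nil when no rule matches and built-in classification takes over."
  [rules response]
  (some (fn [rule]
          (when (rule-matches? rule response)
            (or (:label rule) "matched")))
        rules))

(def doc-rules
  [{:status 418 :label "Corporate proxy throttle"}
   {:bodyPattern "capacity.*exceeded" :label "Capacity exceeded"}
   {:status 503 :bodyPattern "maintenance" :label "Under maintenance"}])

(first-matching-label doc-rules {:status 418 :body "I'm a teapot"})
;; => "Corporate proxy throttle"
(first-matching-label doc-rules {:status 503 :body "Scheduled MAINTENANCE window"})
;; => "Under maintenance"
(first-matching-label doc-rules {:status 503 :body "Service unavailable"})
;; => nil
```

In the last case the status matches the third rule but the body does not, so no custom rule claims the error. Built-in classification then decides, and the `classify-error` docstring lists 503 under `:overloaded`, which is still retried.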

src/eca/features/chat.clj

Lines changed: 7 additions & 6 deletions
@@ -1299,12 +1299,13 @@
 :variant (:variant chat-ctx)
 :subagent? (some? (get-in @db* [:chats chat-id :subagent]))
 :cancelled? (fn [] (identical? :stopping (get-in @db* [:chats chat-id :status])))
-:on-retry (fn [{:keys [attempt max-retries delay-ms error-data]}]
-            (let [{error-type :error/type} (llm-providers.errors/classify-error error-data)
-                  reason (case error-type
-                           :rate-limited "Rate limited"
-                           :overloaded "Provider overloaded"
-                           "Transient error")]
+:on-retry (fn [{:keys [attempt max-retries delay-ms classified]}]
+            (let [{error-type :error/type error-label :error/label} classified
+                  reason (or error-label
+                             (case error-type
+                               :rate-limited "Rate limited"
+                               :overloaded "Provider overloaded"
+                               "Transient error"))]
               (send-content! chat-ctx :system
                              {:type :progress
                               :state :running
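With this change, the progress text prefers a rule's `:error/label` and only falls back to the per-type wording when no label is present. The sketch below restates that fallback. It also pairs it with a formatter for the message shape shown in the docs ("⏳ Corporate proxy throttle. Retrying in 2s (attempt 1/10)").

`retry-reason` and `progress-text` are hypothetical names. The format string is an assumption inferred from that docs example, because the code that renders the text lies outside this hunk.

```clojure
;; Mirrors the `reason` binding above: label first, then per-type wording.
(defn retry-reason [{error-type :error/type error-label :error/label}]
  (or error-label
      (case error-type
        :rate-limited "Rate limited"
        :overloaded "Provider overloaded"
        "Transient error")))

;; Hypothetical formatter; the message shape is assumed from the docs example.
(defn progress-text [{:keys [attempt max-retries delay-ms classified]}]
  (format "⏳ %s. Retrying in %ds (attempt %d/%d)"
          (retry-reason classified) (quot delay-ms 1000) attempt max-retries))

(progress-text {:attempt 1 :max-retries 10 :delay-ms 2000
                :classified {:error/type :retryable-custom
                             :error/label "Corporate proxy throttle"}})
;; => "⏳ Corporate proxy throttle. Retrying in 2s (attempt 1/10)"
```

A custom rule without a `label` produces type `:retryable-custom`, which the `case` does not list. Such a retry is therefore reported as "Transient error".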

src/eca/llm_api.clj

Lines changed: 7 additions & 4 deletions
@@ -371,9 +371,12 @@
   (when-not (:silent? (ex-data exception))
     (logger/error args)
     (on-error args)))
+provider-config (get-in config [:providers provider])
+retry-rules (:retryRules provider-config)
 maybe-retry (fn [error-data attempt on-give-up retry-prompt-fn]
-              (let [{error-type :error/type} (llm-providers.errors/classify-error error-data)]
-                (if (and (contains? #{:rate-limited :overloaded} error-type)
+              (let [{error-type :error/type
+                     :as classified} (llm-providers.errors/classify-error error-data retry-rules)]
+                (if (and (contains? #{:rate-limited :overloaded :retryable-custom} error-type)
                          (< attempt default-max-retries)
                          (not (cancelled?)))
                   (let [delay-ms (retry-delay-ms attempt)]
@@ -387,14 +390,14 @@
                         (on-retry {:attempt (inc attempt)
                                    :max-retries default-max-retries
                                    :delay-ms delay-ms
-                                   :error-data error-data})
+                                   :error-data error-data
+                                   :classified classified})
                         (catch Exception e
                           (logger/warn logger-tag "on-retry callback failed" {:exception e}))))
                     (if (sleep-with-cancel delay-ms cancelled?)
                       (retry-prompt-fn (inc attempt))
                       (on-give-up error-data)))
                   (on-give-up error-data))))
-provider-config (get-in config [:providers provider])
 model-config (get-in provider-config [:models model])
 model-config (update model-config :variants #(config/effective-model-variants config provider model %))
 api-handler (provider->api-handler provider model config)
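`maybe-retry` now retries `:retryable-custom` alongside the built-in types. It applies the same attempt limit and cancellation check, and it waits `(retry-delay-ms attempt)` between tries.

Neither `retry-delay-ms` nor the value of `default-max-retries` appears in this diff. The sketch below therefore assumes:

- a 2 s base delay that doubles per attempt,
- a 60 s cap,
- 10 retries.

These values were chosen only to line up with the "Retrying in 2s (attempt 1/10)" docs example. All names in the sketch are hypothetical.

```clojure
;; Assumed values; the real retry-delay-ms / default-max-retries are not shown.
(def assumed-max-retries 10)
(def retryable-types #{:rate-limited :overloaded :retryable-custom})

(defn backoff-ms
  "Hypothetical exponential backoff: 2 s for attempt 0, doubling, capped at 60 s.
  Computed in doubles so large attempts clamp to the cap instead of overflowing."
  [attempt]
  (long (min 60000.0 (* 2000.0 (Math/pow 2.0 attempt)))))

(defn should-retry?
  "Mirrors the condition in maybe-retry: retryable type, attempts left, not cancelled."
  [error-type attempt cancelled?]
  (and (contains? retryable-types error-type)
       (< attempt assumed-max-retries)
       (not (cancelled?))))

(map backoff-ms (range 6))
;; => (2000 4000 8000 16000 32000 60000)
```

`maybe-retry` reports `(inc attempt)` to `on-retry`, so attempt 0 here corresponds to the "attempt 1/10" message.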

src/eca/llm_providers/errors.clj

Lines changed: 37 additions & 12 deletions
@@ -68,35 +68,60 @@
 
     :else nil)))
 
+(defn ^:private classify-by-custom-rules
+  "Checks user-configured retry rules. Each rule may have :status (int),
+  :body-pattern (regex string, case-insensitive), and :label (string).
+  Returns {:error/type :retryable-custom :error/label label} on first match, nil otherwise."
+  [{:keys [status body]} retry-rules]
+  (when (seq retry-rules)
+    (some (fn [{rule-status :status rule-body-pattern :bodyPattern rule-label :label}]
+            (let [status-matches? (if rule-status
+                                    (= rule-status status)
+                                    true)
+                  body-matches? (if rule-body-pattern
+                                  (when (string? body)
+                                    (re-find (re-pattern (str "(?i)" rule-body-pattern)) body))
+                                  true)
+                  has-condition? (or rule-status rule-body-pattern)]
+              (when (and has-condition? status-matches? body-matches?)
+                (cond-> {:error/type :retryable-custom}
+                  rule-label (assoc :error/label rule-label)))))
+          retry-rules)))
+
 (defn classify-error
   "Classifies an error map into a semantic error type.
 
   Accepts the standard on-error map shape: {:message :status :body :exception}.
+  Optional `retry-rules` seq of user-configured rules checked before built-in classification.
   Returns a map with :error/type — one of:
+    :retryable-custom — matched a user-configured retry rule (with optional :error/label)
     :context-overflow — prompt exceeds model context window
     :rate-limited — 429 or rate limit pattern in body/message
     :overloaded — provider overloaded (503, 529, etc.)
     :auth — authentication/authorization failure (401, 403)
     :unknown — unclassified error"
-  [{:keys [status exception] :as error-data}]
-  (or (when status
-        (classify-by-status-and-body error-data))
-      (classify-by-message error-data)
-      (when exception
-        (classify-by-message {:message (ex-message exception)}))
-      {:error/type :unknown}))
+  ([error-data] (classify-error error-data nil))
+  ([{:keys [status exception] :as error-data} retry-rules]
+   (or (classify-by-custom-rules error-data retry-rules)
+       (when status
+         (classify-by-status-and-body error-data))
+       (classify-by-message error-data)
+       (when exception
+         (classify-by-message {:message (ex-message exception)}))
+       {:error/type :unknown})))
 
 (defn context-overflow?
   "Returns true if the error is a context window overflow."
   [error-data]
   (= :context-overflow (:error/type (classify-error error-data))))
 
 (def ^:private retryable-error-types
-  #{:rate-limited :overloaded})
+  #{:rate-limited :overloaded :retryable-custom})
 
 (defn retryable?
   "Returns true if the error is transient and the request can be retried
-  (rate-limited or provider overloaded)."
-  [error-data]
-  (contains? retryable-error-types
-             (:error/type (classify-error error-data))))
+  (rate-limited, provider overloaded, or matched a custom retry rule)."
+  ([error-data] (retryable? error-data nil))
+  ([error-data retry-rules]
+   (contains? retryable-error-types
+              (:error/type (classify-error error-data retry-rules)))))
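`classify-by-custom-rules` makes each `bodyPattern` case-insensitive by prefixing it with the embedded flag `(?i)` before calling `re-pattern`. It treats the string returned by `re-find` as a truthy match, and the first expression below shows that behavior.

Because the pattern is compiled on every classification, an invalid regex in the config could throw `java.util.regex.PatternSyntaxException` while an error is being classified. `compile-body-pattern` is a hypothetical helper that sketches one way to compile patterns defensively. This commit does not add it.

```clojure
;; Case-insensitive match via the embedded (?i) flag, as in classify-by-custom-rules.
(def sample-match
  (re-find (re-pattern (str "(?i)" "capacity.*exceeded")) "Server CAPACITY was exceeded"))
;; sample-match => "CAPACITY was exceeded"

(defn compile-body-pattern
  "Hypothetical helper: compiles a user bodyPattern case-insensitively,
  returning nil instead of throwing when the regex is invalid."
  [pattern]
  (try
    (re-pattern (str "(?i)" pattern))
    (catch java.util.regex.PatternSyntaxException _
      nil)))

(compile-body-pattern "capacity(") ;; => nil (unclosed group)
```

Compiling once, when the config is loaded, would also avoid re-parsing the same pattern on every retry decision.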

test/eca/llm_api_test.clj

Lines changed: 84 additions & 0 deletions
@@ -338,3 +338,87 @@
            :on-message-received identity})))
       (is (= 1 @attempt*))
       (is (true? @on-error-called*)))))
+
+(deftest sync-retry-on-custom-retry-rule-test
+  (testing "retries when custom retryRules status matches"
+    (let [attempt* (atom 0)
+          retry-events* (atom [])
+          on-error-called* (atom false)]
+      (with-redefs [eca.llm-api/prompt! (fn [_opts]
+                                          (let [attempt (swap! attempt* inc)]
+                                            (if (= 1 attempt)
+                                              {:error {:status 418
+                                                       :body "I'm a teapot"
+                                                       :message "LLM response status: 418"}}
+                                              {:output-text "success"
+                                               :usage {:input-tokens 10 :output-tokens 5}})))
+                    eca.llm-api/sleep-with-cancel (fn [_ cancelled?] (not (cancelled?)))]
+        (llm-api/sync-or-async-prompt!
+         (make-prompt-opts
+          {:stream false
+           :config {:providers {"anthropic" {:key "test-key"
+                                             :url "http://test"
+                                             :retryRules [{:status 418 :label "Proxy throttle"}]
+                                             :models {"claude-sonnet-4-6" {:extraPayload {:stream false}}}}}}
+           :on-retry (fn [event] (swap! retry-events* conj event))
+           :on-error (fn [_] (reset! on-error-called* true))
+           :on-message-received identity})))
+      (is (= 2 @attempt*))
+      (is (= 1 (count @retry-events*)))
+      (is (= :retryable-custom (get-in (first @retry-events*) [:classified :error/type])))
+      (is (= "Proxy throttle" (get-in (first @retry-events*) [:classified :error/label])))
+      (is (false? @on-error-called*)))))
+
+(deftest async-retry-on-custom-retry-rule-body-pattern-test
+  (testing "retries async when custom retryRules bodyPattern matches"
+    (let [attempt* (atom 0)
+          retry-events* (atom [])
+          received-text* (atom "")
+          on-error-called* (atom false)]
+      (with-redefs [eca.llm-api/prompt! (fn [{:keys [on-message-received on-error]}]
+                                          (let [attempt (swap! attempt* inc)]
+                                            (if (= 1 attempt)
+                                              (on-error {:status 500
+                                                         :body "server capacity exceeded"
+                                                         :message "LLM response status: 500"})
+                                              (do
+                                                (on-message-received {:type :text :text "hello"})
+                                                (on-message-received {:type :finish :finish-reason "stop"})))))
+                    eca.llm-api/sleep-with-cancel (fn [_ cancelled?] (not (cancelled?)))]
+        (llm-api/sync-or-async-prompt!
+         (make-prompt-opts
+          {:config {:providers {"anthropic" {:key "test-key"
+                                             :url "http://test"
+                                             :retryRules [{:bodyPattern "capacity.*exceeded"
+                                                           :label "Capacity exceeded"}]
+                                             :models {"claude-sonnet-4-6" {}}}}}
+           :on-retry (fn [event] (swap! retry-events* conj event))
+           :on-error (fn [_] (reset! on-error-called* true))
+           :on-message-received (fn [{:keys [type text]}]
+                                  (when (= :text type)
+                                    (swap! received-text* str text)))})))
+      (is (= 2 @attempt*))
+      (is (= 1 (count @retry-events*)))
+      (is (= "Capacity exceeded" (get-in (first @retry-events*) [:classified :error/label])))
+      (is (false? @on-error-called*))
+      (is (= "hello" @received-text*))))
+
+  (testing "does not retry when no custom rule matches"
+    (let [attempt* (atom 0)
+          on-error-called* (atom false)]
+      (with-redefs [eca.llm-api/prompt! (fn [{:keys [on-error]}]
+                                          (swap! attempt* inc)
+                                          (on-error {:status 418
+                                                     :body "I'm a teapot"
+                                                     :message "LLM response status: 418"}))
+                    eca.llm-api/sleep-with-cancel (fn [_ _] true)]
+        (llm-api/sync-or-async-prompt!
+         (make-prompt-opts
+          {:config {:providers {"anthropic" {:key "test-key"
+                                             :url "http://test"
+                                             :retryRules [{:status 599 :label "Something else"}]
+                                             :models {"claude-sonnet-4-6" {}}}}}
+           :on-error (fn [_] (reset! on-error-called* true))
+           :on-message-received identity})))
+      (is (= 1 @attempt*))
+      (is (true? @on-error-called*)))))
