Commit 02174cb
feat: add automatic retry for transient dbt command errors (#2125)
* feat: add automatic retry for transient dbt command errors
Add per-adapter transient error detection and automatic retry logic
using tenacity to CommandLineDbtRunner._run_command.
- New module transient_errors.py with per-adapter error patterns for
BigQuery, Snowflake, Redshift, Databricks, Athena, Dremio, Postgres,
Trino, and ClickHouse, plus common connection error patterns.
- _execute_inner_command wraps _inner_run_command with tenacity retry
(3 attempts, exponential backoff 10-60s).
- Only retries when output matches a known transient error pattern for
the active adapter. Non-transient failures propagate immediately.
- Handles both raise_on_failure=True (DbtCommandError) and
raise_on_failure=False (result.success=False) code paths.
- Added tenacity>=8.0,<10.0 to pyproject.toml dependencies.
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
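
The retry policy described above (3 attempts, exponential backoff bounded to 10-60s, retry only on transient failures) can be sketched roughly as follows. This is a stdlib-only illustration, not the code from this commit, which uses tenacity. `run_with_retries`, `TransientCommandError`, and the injectable `sleep` parameter are hypothetical names, and the doubling schedule is an assumption; only the attempt count and the 10-60s bounds come from the message above.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientCommandError(Exception):
    """Hypothetical marker: the command output matched a transient pattern."""


def run_with_retries(
    command: Callable[[], T],
    attempts: int = 3,
    min_wait: float = 10.0,
    max_wait: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `command`, retrying only when it raises TransientCommandError."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return command()
        except TransientCommandError:
            # Out of attempts: let the last transient error reach the caller.
            if attempt == attempts:
                raise
            # Assumed schedule: double the wait each time, clamped to max_wait.
            sleep(min(max_wait, min_wait * 2 ** (attempt - 1)))
    raise AssertionError("unreachable when attempts >= 1")
```

Any other exception type is not caught, so non-transient failures propagate on the first attempt. Passing a custom `sleep` keeps tests from actually waiting.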
* fix: guard _build_haystack against non-string arguments
When tests mock _inner_run_command, result.output and result.stderr
may be MagicMock objects instead of strings. Add isinstance checks
to _build_haystack to avoid TypeError in str.join().
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
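
A minimal sketch of the guard described above, assuming the haystack is simply the string parts joined together. `build_haystack` here is a hypothetical stand-in for the private `_build_haystack`; the space separator and the lowercasing are assumptions, not confirmed details of the real helper.

```python
from unittest.mock import MagicMock


def build_haystack(*parts: object) -> str:
    """Join only the real strings, skipping mocks, None, and other objects."""
    strings = [part for part in parts if isinstance(part, str)]
    # Lowercasing is an assumption that makes pattern matching case-insensitive.
    return " ".join(strings).lower()


# A mocked attribute is ignored instead of raising TypeError in str.join().
print(build_haystack("Connection Reset", MagicMock(), None))
```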
* fix: address CodeRabbit review feedback
- Fix ''.join → ' '.join for readable error messages
- Use logger.exception instead of logger.error for stack traces
- Make '503' pattern more specific ('503 service unavailable', 'http 503')
- Make 'incident' pattern more specific ('incident id:')
- Remove 'connection refused' from common patterns (too broad)
- Remove redundant dremio patterns already covered by common
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
* fix: address CodeRabbit round 2 feedback
- Always capture output for transient error detection (capture_output=True)
- Extract actual output/stderr from DbtCommandError.proc_err
- Preserve raise_on_failure contract: re-raise DbtCommandError after retries
- Deduplicate databricks/databricks_catalog patterns via _DATABRICKS_PATTERNS
- Check all adapter patterns when target is not a known adapter type
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
* fix: address CodeRabbit round 3 feedback
- Add explicit exception chaining (raise from exc) to satisfy Ruff B904
- Treat target=None as unknown target, checking all adapter patterns defensively
- Pre-compute _ALL_ADAPTER_PATTERNS at import time for efficiency
- Add unit tests for retry branch behavior (6 test cases covering
transient DbtCommandError retry+re-raise, failed result retry+return,
and non-transient immediate propagation)
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
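
The pattern lookup described in the review rounds above might look roughly like this. It is a sketch under stated assumptions: the pattern strings and their assignment to adapters are illustrative placeholders (only 'error 409' is tied to BigQuery by this commit message), the function signature is hypothetical, and the parameter uses the `adapter_type` name that a later commit in this series introduces.

```python
from typing import Dict, Optional, Tuple

# Placeholder patterns for illustration; the real lists live in transient_errors.py.
_COMMON_PATTERNS: Tuple[str, ...] = ("503 service unavailable", "http 503")
_ADAPTER_PATTERNS: Dict[str, Tuple[str, ...]] = {
    "bigquery": ("error 409",),
    "snowflake": ("incident id:",),  # adapter assignment is an assumption
}
# Pre-computed once at import time so unknown adapters do not rebuild it per call.
_ALL_ADAPTER_PATTERNS: Tuple[str, ...] = tuple(
    sorted({p for patterns in _ADAPTER_PATTERNS.values() for p in patterns})
)


def is_transient_error(
    adapter_type: Optional[str], output: str = "", stderr: str = ""
) -> bool:
    """Return True if the combined output matches a known transient pattern."""
    haystack = f"{output} {stderr}".lower()
    adapter_patterns = _ADAPTER_PATTERNS.get(adapter_type) if adapter_type else None
    if adapter_patterns is None:
        # None or an unrecognised adapter: check every adapter's patterns.
        adapter_patterns = _ALL_ADAPTER_PATTERNS
    return any(p in haystack for p in _COMMON_PATTERNS + adapter_patterns)
```

Checking all patterns for an unknown adapter trades a small risk of retrying a non-transient failure for never missing a transient one.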
* fix: remove unused imports and fix isort ordering in test_retry_logic
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
* style: fix black formatting in test_retry_logic imports
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
* test: add early retry success test case
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
* fix: restore original capture_output passthrough to preserve streaming output
When capture_output=False, dbt output should stream directly to the
terminal. The previous implementation always passed capture_output=True
to _inner_run_command, which silently captured output that was meant
to be streamed.
Transient-error detection still works:
- DbtCommandError path: output extracted from exc.proc_err
- Failed-result path with capture: result.output available
- Failed-result path without capture: output streamed to terminal,
treated as non-transient (user already saw output)
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
* fix: always capture output for transient detection, print to terminal when capture_output=False
Revert to always passing capture_output=True to _inner_run_command so
transient-error detection can always inspect stdout/stderr. When the
caller set capture_output=False (expecting to see output), we now
explicitly write the captured output to sys.stdout/sys.stderr after
the command completes.
This means capture_output now only controls:
- Whether --log-format json is added to dbt CLI args
- Whether output is parsed/logged via parse_dbt_output
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
* fix: guard sys.stdout/stderr.write with isinstance check
The existing test_dbt_runner tests mock subprocess.run, which causes
result.output/stderr to be MagicMock objects. Add isinstance(str)
checks before writing to stdout/stderr to avoid TypeError.
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
* refactor: always set --log-format and always capture output, make capture_output a no-op
- Always pass --log-format to dbt CLI (previously gated on capture_output)
- Always capture subprocess output (for transient-error detection)
- Always parse output when log_format is json (previously gated on capture_output)
- Remove capture_output from internal methods (_run_command, _execute_inner_command, _inner_run_command)
- Keep capture_output on public API methods (run, test, deps, run_operation) as a deprecated no-op for backward compatibility
- Remove sys.stdout/stderr.write hack (no longer needed since output is always parsed/logged)
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
* fix: update test_alerts_fetcher positional indices for --log-format prepend
The refactor to always prepend --log-format json to dbt commands shifted
all positional args by 2. Update hardcoded indices in test assertions.
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
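
The index shift mentioned above can be illustrated with a small sketch. `build_dbt_args` is a hypothetical helper, not a function from this repository; it only shows why prepending two tokens moves every existing positional argument by 2.

```python
from typing import List, Sequence


def build_dbt_args(command_args: Sequence[str], log_format: str = "json") -> List[str]:
    """Prepend the --log-format flag and its value to the dbt arguments."""
    return ["--log-format", log_format, *command_args]


args = build_dbt_args(["run-operation", "my_macro"])
# "run-operation" used to be at index 0 and is now at index 2.
print(args[2])
```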
* fix: parse output regardless of log_format, not just json
parse_dbt_output already handles both json and text formats, so remove
the unnecessary log_format == 'json' guard.
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
* fix: add BigQuery 409 duplicate job ID to transient error patterns
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
* fix: narrow BigQuery 409 pattern to 'error 409' instead of generic 'already exists'
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
* refactor: simplify retry flow with _inner_run_command_with_retries
- Replace _execute_inner_command + nested _attempt() with a single
_inner_run_command_with_retries method decorated with tenacity @retry
- Move exhausted-retry handling (log, re-raise or return exc.result)
into _run_command try/except
- Add module-level _before_retry_log(retry_state) for retry logging;
log_command_args read from retry_state.kwargs
- Call chain: _run_command -> _inner_run_command_with_retries -> _inner_run_command
- Update test docstring to reference new method name
Made-with: Cursor
* style: fix black formatting for is_transient_error call
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
* docs: fix docstring for target=None in is_transient_error (all patterns checked, not just common)
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
* feat: resolve adapter type from profiles.yml for transient error detection
- Add _get_adapter_type() method to CommandLineDbtRunner that parses
dbt_project.yml and profiles.yml to resolve the actual adapter type
(e.g. 'bigquery', 'snowflake') for the selected target.
- Pass adapter_type instead of self.target to is_transient_error(),
ensuring correct per-adapter pattern matching.
- Remove duplicate 'databricks_catalog' entry from _ADAPTER_PATTERNS
since profiles.yml always reports the adapter type, not the profile name.
- Update docstrings to reflect that target should be the adapter type.
- Gracefully falls back to None (check all patterns) if profiles cannot
be parsed.
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
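
The lookup described above can be sketched as follows, assuming dbt_project.yml and profiles.yml have already been parsed into dictionaries (for example with a YAML loader). `resolve_adapter_type` is a hypothetical function, not the `_get_adapter_type` method from this commit; it relies on the standard dbt layout, where dbt_project.yml names a `profile` and each profile has a default `target` plus `outputs` keyed by target name.

```python
from typing import Any, Mapping, Optional


def resolve_adapter_type(
    project: Mapping[str, Any],
    profiles: Mapping[str, Any],
    target: Optional[str] = None,
) -> Optional[str]:
    """Return the adapter type (e.g. 'bigquery') or None if it cannot be resolved."""
    profile_name = project.get("profile")
    profile = profiles.get(profile_name) if profile_name else None
    if not isinstance(profile, Mapping):
        return None
    # An explicit target wins; otherwise use the profile's default target.
    target_name = target or profile.get("target")
    outputs = profile.get("outputs")
    if not target_name or not isinstance(outputs, Mapping):
        return None
    output = outputs.get(target_name)
    if not isinstance(output, Mapping):
        return None
    adapter_type = output.get("type")
    return adapter_type if isinstance(adapter_type, str) else None
```

Returning None on any missing key matches the fallback described above, where the caller then checks all adapter patterns.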
* refactor: simplify _get_adapter_type — remove broad try/except, streamline logic
Addresses Itamar's review feedback:
- Removed the over-defensive try..except wrapper
- Simplified flow: parse profiles.yml directly, then dbt_project.yml for profile name
- Each missing-key case returns None with a debug log (no silent exception swallowing)
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
* refactor: rename target→adapter_type in is_transient_error signature
Addresses Itamar's review comment — the parameter now reflects that it
receives the adapter type (e.g. 'bigquery'), not the profile target name
(e.g. 'dev'). No logic change; callers pass it positionally.
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Itamar Hartstein <haritamar@gmail.com>
1 parent c0a9602 commit 02174cb
7 files changed
Lines changed: 586 additions & 56 deletions
File tree
- elementary/clients/dbt
- tests/unit
- clients/dbt_runner
- monitor/fetchers/alerts