feat: add automatic retry for transient dbt command errors #2125
Merged
haritamar merged 23 commits into master from devin/1772197495-add-transient-retry-logic on Feb 28, 2026
Changes from 6 commits
Commits
a27063c feat: add automatic retry for transient dbt command errors (devin-ai-integration[bot])
07045ae fix: guard _build_haystack against non-string arguments (devin-ai-integration[bot])
bd25d9b fix: address CodeRabbit review feedback (devin-ai-integration[bot])
7594348 fix: address CodeRabbit round 2 feedback (devin-ai-integration[bot])
74a436b fix: address CodeRabbit round 3 feedback (devin-ai-integration[bot])
2fb59ee fix: remove unused imports and fix isort ordering in test_retry_logic (devin-ai-integration[bot])
d2edbd0 style: fix black formatting in test_retry_logic imports (devin-ai-integration[bot])
f6a91db test: add early retry success test case (devin-ai-integration[bot])
02521ed fix: restore original capture_output passthrough to preserve streamin… (devin-ai-integration[bot])
6ac8a9d fix: always capture output for transient detection, print to terminal… (devin-ai-integration[bot])
0c5f1e5 fix: guard sys.stdout/stderr.write with isinstance check (devin-ai-integration[bot])
24e593c refactor: always set --log-format and always capture output, make cap… (devin-ai-integration[bot])
42b24da fix: update test_alerts_fetcher positional indices for --log-format p… (devin-ai-integration[bot])
0d2673d fix: parse output regardless of log_format, not just json (devin-ai-integration[bot])
05a2e45 fix: add BigQuery 409 duplicate job ID to transient error patterns (devin-ai-integration[bot])
b5de019 fix: narrow BigQuery 409 pattern to 'error 409' instead of generic 'a… (devin-ai-integration[bot])
dcd8a9f refactor: simplify retry flow with _inner_run_command_with_retries (haritamar)
532e592 style: fix black formatting for is_transient_error call (devin-ai-integration[bot])
54005bc docs: fix docstring for target=None in is_transient_error (all patter… (devin-ai-integration[bot])
3aeaf01 Merge remote-tracking branch 'origin/master' into devin/1772197495-ad… (devin-ai-integration[bot])
91e41b3 feat: resolve adapter type from profiles.yml for transient error dete… (devin-ai-integration[bot])
4866840 refactor: simplify _get_adapter_type — remove broad try/except, strea… (devin-ai-integration[bot])
ff057ff refactor: rename target→adapter_type in is_transient_error signature (devin-ai-integration[bot])
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,159 @@ | ||
| """Per-adapter transient error patterns for automatic retry. | ||
| | ||
| Each adapter may produce transient errors that are safe to retry. This | ||
| module centralises those patterns so that the runner can decide whether a | ||
| failed dbt command should be retried transparently. | ||
| | ||
| To add patterns for a new adapter, append a new entry to | ||
| ``_ADAPTER_PATTERNS`` with the **adapter type** as key (e.g. | ||
| ``"bigquery"``, ``"snowflake"``) and a tuple of **plain, lowercase** | ||
| substrings that appear in the error output. Matching is | ||
| case-insensitive substring search so regex is not needed. | ||
| | ||
| Note: The ``target`` argument accepted by :func:`is_transient_error` may | ||
| be either the dbt adapter type *or* the profile target name (e.g. | ||
| ``"dev"``, ``"prod"``). When it does not match any known adapter key, | ||
| **all** adapter patterns are checked defensively. This is safe because | ||
| adapter-specific error messages only appear in output from that adapter. | ||
| """ | ||
| | ||
| from typing import Dict, Optional, Sequence, Tuple | ||
| | ||
| # --------------------------------------------------------------------------- | ||
| # Per-adapter transient error substrings (all lowercase). | ||
| # | ||
| # A command failure is considered *transient* when the dbt output | ||
| # (stdout + stderr, lowercased) contains **any** of the substrings | ||
| # listed for the active adapter **or** in the ``_COMMON`` list. | ||
| # --------------------------------------------------------------------------- | ||
| | ||
| _COMMON: Tuple[str, ...] = ( | ||
| # Generic connection / HTTP errors that any adapter can surface. | ||
| "connection reset by peer", | ||
| "connection was closed", | ||
| "remotedisconnected", | ||
| "connectionerror", | ||
| "brokenpipeerror", | ||
| "connection aborted", | ||
| "read timed out", | ||
| ) | ||
| | ||
| _DATABRICKS_PATTERNS: Tuple[str, ...] = ( | ||
| "temporarily_unavailable", | ||
| "504 gateway timeout", | ||
| "502 bad gateway", | ||
| "service unavailable", | ||
| ) | ||
| | ||
| _ADAPTER_PATTERNS: Dict[str, Tuple[str, ...]] = { | ||
| "bigquery": ( | ||
| # Streaming-buffer delay after a streaming insert. | ||
| "streaming data from", | ||
| "is temporarily unavailable", | ||
| # Generic transient backend error (500). | ||
| "retrying may solve the problem", | ||
| "backenderror", | ||
| # Rate-limit / quota errors. | ||
| "exceeded rate limits", | ||
| "ratelimitexceeded", | ||
| "quota exceeded", | ||
| # Internal errors surfaced as 503 / "internal error". | ||
| "internal error encountered", | ||
| "503 service unavailable", | ||
| "http 503", | ||
| ), | ||
| "snowflake": ( | ||
| "could not connect to snowflake backend", | ||
| "authentication token has expired", | ||
| "incident id:", | ||
| "service is unavailable", | ||
| ), | ||
| "redshift": ( | ||
| "connection timed out", | ||
| "could not connect to the server", | ||
| "ssl syscall error", | ||
| ), | ||
| "databricks": _DATABRICKS_PATTERNS, | ||
| "databricks_catalog": _DATABRICKS_PATTERNS, | ||
| "athena": ( | ||
| "throttlingexception", | ||
| "toomanyrequestsexception", | ||
| "service unavailable", | ||
| ), | ||
| "dremio": ( | ||
| # Common patterns (remotedisconnected, connection was closed) already | ||
| # cover the most frequent Dremio transient errors. Add Dremio-specific | ||
| # patterns here as they are identified. | ||
| ), | ||
| "postgres": ( | ||
| "could not connect to server", | ||
| "connection timed out", | ||
| "server closed the connection unexpectedly", | ||
| "ssl syscall error", | ||
| ), | ||
| "trino": ( | ||
| "service unavailable", | ||
| "server returned http response code: 503", | ||
| ), | ||
| "clickhouse": ( | ||
| "connection timed out", | ||
| "broken pipe", | ||
| ), | ||
| } | ||
| | ||
| # Pre-computed union of all adapter-specific patterns for the unknown-target | ||
| # fallback path. Built once at import time to avoid repeated iteration. | ||
| _ALL_ADAPTER_PATTERNS: Tuple[str, ...] = tuple( | ||
| pattern for patterns in _ADAPTER_PATTERNS.values() for pattern in patterns | ||
| ) | ||
| | ||
| | ||
| def is_transient_error( | ||
| target: Optional[str], | ||
| output: Optional[str] = None, | ||
| stderr: Optional[str] = None, | ||
| ) -> bool: | ||
| """Return ``True`` if *output*/*stderr* contain a known transient error. | ||
| | ||
| Parameters | ||
| ---------- | ||
| target: | ||
| The dbt adapter type (e.g. ``"bigquery"``, ``"snowflake"``) **or** | ||
| the dbt profile target name (e.g. ``"dev"``, ``"prod"``). | ||
| When the value matches a key in ``_ADAPTER_PATTERNS``, only that | ||
| adapter's patterns (plus ``_COMMON``) are used. When it does | ||
| **not** match any known adapter, **all** adapter patterns are | ||
| checked defensively to avoid missing transient errors. | ||
| When ``None``, **all** adapter patterns (plus ``_COMMON``) are checked as well. | ||
| output: | ||
| The captured stdout of the dbt command (may be ``None``). | ||
| stderr: | ||
| The captured stderr of the dbt command (may be ``None``). | ||
| """ | ||
| haystack = _build_haystack(output, stderr) | ||
| if not haystack: | ||
| return False | ||
| | ||
| if isinstance(target, str): | ||
| adapter_patterns = _ADAPTER_PATTERNS.get(target.lower()) | ||
| if adapter_patterns is not None: | ||
| # Known adapter — use common + adapter-specific patterns. | ||
| patterns: Sequence[str] = (*_COMMON, *adapter_patterns) | ||
| else: | ||
| # Unknown target key (e.g. profile target name). Check all adapters. | ||
| patterns = (*_COMMON, *_ALL_ADAPTER_PATTERNS) | ||
| else: | ||
| # No target provided; still check all adapters defensively. | ||
| patterns = (*_COMMON, *_ALL_ADAPTER_PATTERNS) | ||
| | ||
| return any(pattern in haystack for pattern in patterns) | ||
| | ||
| | ||
| def _build_haystack(output: Optional[str] = None, stderr: Optional[str] = None) -> str: | ||
| """Concatenate and lowercase *output* + *stderr* for matching.""" | ||
| parts = [] | ||
| if output and isinstance(output, str): | ||
| parts.append(output) | ||
| if stderr and isinstance(stderr, str): | ||
| parts.append(stderr) | ||
| return "\n".join(parts).lower() | ||
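
For reviewers who want to see how the detection above might plug into a retry loop, here is a minimal sketch. It is not the runner code from this PR: the pattern tables are trimmed-down copies of the ones in the diff, and `run_with_retries`, `CommandResult`, `max_attempts`, and `delay_seconds` are hypothetical names chosen for illustration. The command is assumed to be wrapped in a callable that returns `(returncode, stdout, stderr)`.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Trimmed-down copies of the pattern tables from the diff (illustration only).
_COMMON: Tuple[str, ...] = ("connection reset by peer", "read timed out")
_ADAPTER_PATTERNS: Dict[str, Tuple[str, ...]] = {
    "bigquery": ("exceeded rate limits", "backenderror"),
    "postgres": ("server closed the connection unexpectedly",),
}
_ALL_ADAPTER_PATTERNS: Tuple[str, ...] = tuple(
    p for patterns in _ADAPTER_PATTERNS.values() for p in patterns
)


def is_transient_error(
    target: Optional[str],
    output: Optional[str] = None,
    stderr: Optional[str] = None,
) -> bool:
    """Condensed version of the function in the diff, same matching rules."""
    parts = [p for p in (output, stderr) if isinstance(p, str) and p]
    haystack = "\n".join(parts).lower()
    if not haystack:
        return False
    known = _ADAPTER_PATTERNS.get(target.lower()) if isinstance(target, str) else None
    # Unknown or missing target: fall back to every adapter's patterns.
    adapter_patterns = known if known is not None else _ALL_ADAPTER_PATTERNS
    return any(p in haystack for p in (*_COMMON, *adapter_patterns))


# Hypothetical result shape: (returncode, stdout, stderr).
CommandResult = Tuple[int, str, str]


def run_with_retries(
    run_once: Callable[[], CommandResult],
    target: Optional[str],
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
) -> Tuple[CommandResult, int]:
    """Run a command, retrying only when the failure looks transient.

    Returns the last result and the number of attempts made.
    """
    attempts = 0
    while True:
        attempts += 1
        result = run_once()
        returncode, stdout, stderr = result
        if returncode == 0:
            return result, attempts
        out_of_attempts = attempts >= max_attempts
        if out_of_attempts or not is_transient_error(target, stdout, stderr):
            return result, attempts
        time.sleep(delay_seconds)
```

Under these assumptions, a failure such as a compilation error is returned after a single attempt, while output containing `Connection reset by peer` is retried until the command succeeds or `max_attempts` is reached. A real runner would likely add backoff between attempts and log each retry.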