AP-683 Add Retries with Backoffs to DAG#65
Conversation
- added default retry args to DAG
- added try/catch logic to raise some http errors so Airflow will retry
- add unit test to raise 429 exception
- added Dag-Task_Group-Task test to ensure task retries a 429 status
- some extra efforts are made to properly mock a task that is
embedded in an Airflow partial/expand operation
There was a problem hiding this comment.
Pull request overview
Adds Airflow-level retry/backoff configuration to the gen_llm_image_descriptions DAG and ensures that transient TIND failures (HTTP 429 and 5xx) propagate as exceptions so Airflow will retry the task, while other HTTP errors continue to be recorded as per-record failures. Tests are added at the unit level (image fetcher) and at the DAG-task level.
Changes:
- Add
default_args(3 retries, exponential backoff up to 10 minutes) to the DAG decorator and re-raiserequests.HTTPErrorfor status 429 and ≥500 insidefetch_image_to_record_directory. - Add a unit test verifying that a 429 from the mocked TIND hook propagates out of the
ImageFetcher. - Add a DAG-level test that mocks the embedded partial/expand task callable and asserts a 429 raises through, plus relax two pylint rules to accommodate the DAG file.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| mokelumne/dags/gen_llm_image_descriptions.py | Adds retry default_args and selective re-raise of retryable HTTP errors (429/5xx); adds a too-many-locals disable on generate_summary. |
| test/unit/test_image_fetcher.py | Adds MockTindHookWith429 and a unit test asserting HTTPError propagation from the fetcher. |
| test/tests/test_gen_llm_image_descriptions.py | New DAG-level test asserting that 429 from TIND raises through the fetch task callable. |
| pyproject.toml | Disables additional pylint rules (pointless-statement, too-many-statements) project-wide. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| from airflow.dag_processing.dagbag import DagBag | ||
| from pathlib import Path | ||
|
|
||
| dag_dir = Path(__file__).resolve().parent.parent / "mokelumne/dags" |
There was a problem hiding this comment.
I'm pretty sure this is false since the tests passed in CI and locally when I ran it.
anarchivist
left a comment
There was a problem hiding this comment.
r+; just want clarification on the linter config chagnes.
Purpose
During a DAG run, calls to TIND may result in HTTP status 429
too many requestsresponses. We should ensure that these responses are properly raised so that any DAG task that called them can properly retry. Also, we should put in some simple retry configuration for overall DAG operations.Changes