Skip to content

AP-683 Add Retries with Backoffs to DAG#65

Open
jason-raitz wants to merge 1 commit into
mainfrom
AP-683_retry-fetching-with-backoff
Open

AP-683 Add Retries with Backoffs to DAG#65
jason-raitz wants to merge 1 commit into
mainfrom
AP-683_retry-fetching-with-backoff

Conversation

@jason-raitz
Copy link
Copy Markdown
Contributor

Purpose

During a DAG run, calls to TIND may result in HTTP status 429 too many requests responses. We should ensure that these responses are properly raised so that any DAG task that called them can properly retry. Also, we should put in some simple retry configuration for overall DAG operations.

Changes

  • added default retry args to DAG
  • added try/catch logic to raise some http errors so Airflow will retry
  • add unit test to raise 429 exception
  • added Dag-Task_Group-Task test to ensure task retries a 429 status
    • some extra efforts are made to properly mock a task that is embedded in an Airflow partial/expand operation

 - added default retry args to DAG
 - added try/catch logic to raise some http errors so Airflow will retry
 - add unit test to raise 429 exception
 - added Dag-Task_Group-Task test to ensure task retries a 429 status
   - some extra efforts are made to properly mock a task that is
     embedded in an Airflow partial/expand operation
@jason-raitz jason-raitz self-assigned this May 14, 2026
@jason-raitz jason-raitz marked this pull request as ready for review May 14, 2026 15:25
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds Airflow-level retry/backoff configuration to the gen_llm_image_descriptions DAG and ensures that transient TIND failures (HTTP 429 and 5xx) propagate as exceptions so Airflow will retry the task, while other HTTP errors continue to be recorded as per-record failures. Tests are added at the unit level (image fetcher) and at the DAG-task level.

Changes:

  • Add default_args (3 retries, exponential backoff up to 10 minutes) to the DAG decorator and re-raise requests.HTTPError for status 429 and ≥500 inside fetch_image_to_record_directory.
  • Add a unit test verifying that a 429 from the mocked TIND hook propagates out of the ImageFetcher.
  • Add a DAG-level test that mocks the embedded partial/expand task callable and asserts a 429 raises through, plus relax two pylint rules to accommodate the DAG file.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
mokelumne/dags/gen_llm_image_descriptions.py Adds retry default_args and selective re-raise of retryable HTTP errors (429/5xx); adds a too-many-locals disable on generate_summary.
test/unit/test_image_fetcher.py Adds MockTindHookWith429 and a unit test asserting HTTPError propagation from the fetcher.
test/tests/test_gen_llm_image_descriptions.py New DAG-level test asserting that 429 from TIND raises through the fetch task callable.
pyproject.toml Disables additional pylint rules (pointless-statement, too-many-statements) project-wide.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

from airflow.dag_processing.dagbag import DagBag
from pathlib import Path

dag_dir = Path(__file__).resolve().parent.parent / "mokelumne/dags"
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure this is false since the tests passed in CI and locally when I ran it.

Copy link
Copy Markdown
Member

@anarchivist anarchivist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r+; just want clarification on the linter config chagnes.

Comment thread pyproject.toml
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants