Fix BatchOperator deferrable XCom links #64745
Conversation
When BatchOperator runs with deferrable=True, operator links (job definition, job queue, CloudWatch logs) were not being persisted as XCom values, causing ERROR-level logs when the UI tried to render them.

This change:
- Extracts link persistence logic into a _persist_links() method
- Calls _persist_links() from execute_complete() for deferrable tasks
- Refactors monitor_job() to use the same method (DRY principle)

Fixes "XCom not found" errors for:
- batch_job_definition
- batch_job_queue
- cloudwatch_events
Added _persist_links(context) call in execute() method before deferring. This ensures operator links (batch_job_definition, batch_job_queue, cloudwatch_events) are persisted as XCom values immediately after job submission, making them available in the UI while the task is deferred.
- Persist a placeholder None value for the cloudwatch_events XCom when logs aren't available at submission time
- Change log level from WARNING to INFO for missing CloudWatch logs
- Prevents "XCom not found" warning in UI while the task is deferred
- CloudWatch link will be populated when the job completes and logs exist
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
Pull request overview
This PR aims to ensure the AWS Batch operator’s extra UI links (job definition/queue/logs) are available for deferrable runs by persisting the related XComs before/around deferral and completion.
Changes:
- Persist operator extra-link XComs before deferring in execute() and again in execute_complete().
- Refactor link persistence logic into a new _persist_links() helper.
- Adjust CloudWatch log-link persistence and logging behavior.
providers/amazon/src/airflow/providers/amazon/aws/operators/batch.py — 4 resolved review comments (outdated)
Converting this to draft since there is unaddressed Copilot feedback and failing tests. Change this to ready for review once all feedback is addressed and the tests are green 🙂
providers/amazon/src/airflow/providers/amazon/aws/operators/batch.py — 2 resolved review comments (outdated)
- Reduce AWS API calls by 67% to avoid throttling
- Fix missing job success check when awslogs disabled
- Improve CloudWatch log error visibility
- Handle edge cases in unit tests

This addresses code review feedback on PR apache#64745.
I've implemented the feedback from GitHub Copilot.
providers/amazon/src/airflow/providers/amazon/aws/operators/batch.py — 4 resolved review comments (outdated)
Address GitHub Copilot code review feedback:
- Reduce AWS Batch API calls by 67% to prevent throttling
- Remove dead code and redundant operations
- Fix missing job success check when awslogs disabled
- Improve CloudWatch error logging visibility
- Use keyword arguments for better test compatibility

This optimizes the deferrable execution path by reusing job descriptions and eliminating duplicate link persistence and CloudWatch log fetching.

Addresses code review feedback on PR apache#64745.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
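The "67% fewer API calls" claim above corresponds to collapsing three per-link describe calls into one shared response. A hedged sketch of that idea, with hypothetical function names standing in for the provider's actual boto3 plumbing:

```python
def persist_all_links(describe_job, push):
    """Persist all three operator-link XComs from a single describe call.

    Sketch: previously each link could trigger its own describe_jobs-style
    lookup (3 calls); reusing one response drops that to 1 (~67% fewer).
    `describe_job` and `push` are illustrative stand-ins.
    """
    job = describe_job()  # single AWS API call, reused for every link below
    push("batch_job_definition", job["jobDefinition"])
    push("batch_job_queue", job["jobQueue"])
    push("cloudwatch_events", job.get("logStreamName"))
    return job  # caller can also reuse it, e.g. to read job status
```

Returning the job description lets the caller check status without a second round trip, which is the same reuse the commit message describes.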
…hub.com/kakatur/airflow into fix-batchoperator-deferrable-xcom-links
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (1)
providers/amazon/src/airflow/providers/amazon/aws/operators/batch.py:233
- In the deferrable fast-path, _persist_links(..., skip_cloudwatch=True) is used even when the job is already in a terminal state (e.g. SUCCEEDED). In that case the task returns immediately and the CloudWatch log link is never persisted, so the operator extra link can remain missing in the UI. Consider conditionally persisting CloudWatch links when job_status is already terminal (e.g. call _persist_links(..., skip_cloudwatch=False, job_description=job) before returning/raising).
```python
# Persist operator links before deferring so they're available in the UI
# Skip CloudWatch logs (not available yet) and reuse job description to reduce API calls
job = self._persist_links(context, skip_cloudwatch=True)
job_status = job.get("status")
if job_status == self.hook.SUCCESS_STATE:
    self.log.info("Job completed.")
    return self.job_id
if job_status == self.hook.FAILURE_STATE:
    raise AirflowException(f"Error while running job: {self.job_id} is in {job_status} state")
```
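Copilot's suggestion amounts to persisting the CloudWatch link before the early return when the job is already terminal. A sketch of that control flow, assuming illustrative state names and callback parameters rather than the provider's real API:

```python
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}  # illustrative subset of Batch states


def fast_path(job_status, persist_links, defer):
    """Sketch of the deferrable fast-path with the suggested fix applied.

    `persist_links` and `defer` are hypothetical callbacks standing in for
    the operator's _persist_links(...) and self.defer(...).
    """
    if job_status in TERMINAL_STATES:
        # Job finished before we could defer: logs exist now, so persist
        # the CloudWatch link too instead of skipping it.
        persist_links(skip_cloudwatch=False)
        return job_status
    # Job still running: the CloudWatch stream may not exist yet, skip it;
    # execute_complete() will persist the real link later.
    persist_links(skip_cloudwatch=True)
    defer()
    return "DEFERRED"
```

The key point is that the skip-CloudWatch shortcut only applies on the path that defers; any path that terminates immediately should persist the full link set.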
- Fix missing CloudWatch links in deferrable fast-path
- Extract _persist_cloudwatch_link() helper method
- Eliminate boolean flag parameters for clarity
- Improve code readability and maintainability
providers/amazon/src/airflow/providers/amazon/aws/operators/batch.py — 3 resolved review comments (outdated)
- Remove redundant wait_for_job() call in execute_complete()
- Deduplicate CloudWatch link persistence logic
- Reduce AWS Batch API calls after job completion
- Ensure consistent CloudWatch links across all execution paths
- Persist CloudWatch links on deferred job failures
- Skip CloudWatch API call when do_xcom_push is False
- Ensure consistent link behavior across success and failure paths
- Skip CloudWatch API calls when context lacks task instance
- Respect awslogs_enabled flag in monitor_job()
- Reduce unnecessary AWS Batch API usage
- Improve unit test compatibility with minimal context

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
providers/amazon/src/airflow/providers/amazon/aws/operators/batch.py — 1 resolved review comment (outdated)
The awslogs_enabled flag controls log streaming during wait_for_job(), not CloudWatch link persistence in the UI. Users expect CloudWatch links to be available even when log streaming is disabled, so they can view logs after task completion. Reverting the awslogs_enabled gate in monitor_job() preserves the original behavior and avoids breaking existing unit tests that expect CloudWatch links regardless of the flag value.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
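The distinction argued above, streaming gated by the flag but link persistence unconditional, can be sketched like this (callback names are illustrative, not the provider's actual methods):

```python
def monitor_job(awslogs_enabled, stream_logs, wait_for_job, persist_cloudwatch_link):
    """Sketch: awslogs_enabled gates log *streaming*, never link persistence.

    `stream_logs`, `wait_for_job`, and `persist_cloudwatch_link` are
    hypothetical stand-ins for the operator's real helpers.
    """
    if awslogs_enabled:
        stream_logs()  # tail CloudWatch output while the job runs
    wait_for_job()
    persist_cloudwatch_link()  # link persisted regardless of the flag
```

Keeping persistence outside the flag check is what preserves the original behavior the comment describes: the UI link survives even when a user turns log streaming off.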
I've fixed all issues raised by Copilot :)
Was generative AI tooling used to co-author this PR?
A newsfragment file, {pr_number}.significant.rst, may be needed in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.