
Fix BatchOperator deferrable XCom links #64745

Open
kakatur wants to merge 14 commits into apache:main from kakatur:fix-batchoperator-deferrable-xcom-links



@kakatur kakatur commented Apr 5, 2026


Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

kakatur added 3 commits April 4, 2026 13:46
When BatchOperator runs with deferrable=True, operator links
(job definition, job queue, CloudWatch logs) were not being
persisted as XCom values, causing ERROR-level logs when the UI
tried to render them.

This change:
- Extracts link persistence logic into _persist_links() method
- Calls _persist_links() from execute_complete() for deferrable tasks
- Refactors monitor_job() to use the same method (DRY principle)

Fixes XCom not found errors for:
- batch_job_definition
- batch_job_queue
- cloudwatch_events
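The refactor described in this commit message (one `_persist_links()` helper shared by `monitor_job()` and `execute_complete()`) can be illustrated with a self-contained sketch. It does not use Airflow: `FakeTaskInstance`, `LinkPersistingOperator`, and the job-description dict shape are hypothetical stand-ins, and only the method names and the XCom keys `batch_job_definition`, `batch_job_queue` and `cloudwatch_events` come from the PR.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeTaskInstance:
    """Hypothetical stand-in for an Airflow TaskInstance; XComs live in a dict."""

    xcoms: dict[str, Any] = field(default_factory=dict)

    def xcom_push(self, key: str, value: Any) -> None:
        self.xcoms[key] = value


class LinkPersistingOperator:
    """Hypothetical operator: one helper writes the links for both code paths."""

    def __init__(self, job_description: dict[str, Any]) -> None:
        # Assumed shape: loosely modeled on one entry of a Batch describe-jobs response.
        self.job_description = job_description

    def _persist_links(self, context: dict[str, Any]) -> None:
        # The single place that writes the extra-link XComs (the DRY refactor).
        ti = context["ti"]
        job = self.job_description
        ti.xcom_push("batch_job_definition", job.get("jobDefinition"))
        ti.xcom_push("batch_job_queue", job.get("jobQueue"))
        ti.xcom_push("cloudwatch_events", job.get("container", {}).get("logStreamName"))

    def monitor_job(self, context: dict[str, Any]) -> None:
        # Non-deferrable path: polling is omitted here; links are persisted afterwards.
        self._persist_links(context)

    def execute_complete(self, context: dict[str, Any], event: dict[str, Any]) -> str:
        # Deferrable path: without this call the link XComs were never written.
        self._persist_links(context)
        return event["job_id"]
```

Because both paths call the same helper, the deferrable and non-deferrable runs end up with identical link XComs.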

Added _persist_links(context) call in execute() method before deferring.
This ensures operator links (batch_job_definition, batch_job_queue,
cloudwatch_events) are persisted as XCom values immediately after job
submission, making them available in the UI while the task is deferred.
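The ordering this commit relies on can be sketched as follows. This is a minimal illustration, not the operator's code: `FakeTaskDeferred`, `DeferringOperator`, `submit_job()` and the `context["xcoms"]` dict are hypothetical stand-ins for Airflow's deferral mechanism and XCom storage. The point is that links are written before the deferral exception leaves `execute()`.

```python
from __future__ import annotations

from typing import Any


class FakeTaskDeferred(Exception):
    """Hypothetical stand-in for the exception raised when an Airflow task defers."""


class DeferringOperator:
    """Hypothetical operator: persist links first, then defer."""

    def __init__(self) -> None:
        self.job_id: str | None = None

    def submit_job(self) -> dict[str, Any]:
        # Assumption: submission yields the job definition and queue used by the links.
        self.job_id = "job-1"
        return {"jobId": self.job_id, "jobDefinition": "demo-def:1", "jobQueue": "demo-queue"}

    def _persist_links(self, context: dict[str, Any], job: dict[str, Any]) -> None:
        context["xcoms"]["batch_job_definition"] = job["jobDefinition"]
        context["xcoms"]["batch_job_queue"] = job["jobQueue"]

    def execute(self, context: dict[str, Any]) -> None:
        job = self.submit_job()
        # Persist before deferring: once deferred, execute() does not resume,
        # so anything not written here stays missing until the task completes.
        self._persist_links(context, job)
        raise FakeTaskDeferred(self.job_id)
```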

- Persist placeholder None value for cloudwatch_events XCom when logs
  aren't available at submission time
- Change log level from WARNING to INFO for missing CloudWatch logs
- Prevents 'XCom not found' warning in UI while task is deferred
- CloudWatch link will be populated when job completes and logs exist
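A minimal sketch of the placeholder behavior described in this commit, assuming a hypothetical `persist_cloudwatch_link()` helper, a plain dict for XCom storage, and an `awslogs_info` dict with `awslogs_group`/`awslogs_stream_name` keys (an assumed shape). Missing logs at submission time are treated as expected, so they are logged at INFO.

```python
from __future__ import annotations

import logging
from typing import Any

log = logging.getLogger("batch_sketch")


def persist_cloudwatch_link(xcoms: dict[str, Any], awslogs_info: dict[str, Any] | None) -> None:
    """Hypothetical helper: store a log link, or a None placeholder if logs don't exist yet."""
    if not awslogs_info:
        # Normal right after submission, so INFO rather than WARNING.
        log.info("CloudWatch logs are not available yet; storing a placeholder.")
        xcoms["cloudwatch_events"] = None
        return
    # Later call, once the job has produced logs: overwrite the placeholder.
    xcoms["cloudwatch_events"] = {
        "awslogs_group": awslogs_info["awslogs_group"],
        "awslogs_stream_name": awslogs_info["awslogs_stream_name"],
    }
```

A later commit in this PR drops the placeholder again so that the `do_xcom_push` setting is respected.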

boring-cyborg bot commented Apr 5, 2026

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our prek-hooks will help you with that.
  • In case of a new feature, add useful documentation (in docstrings or in the docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.

Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts contact the developers at:
Mailing List: dev@airflow.apache.org
Slack: https://s.apache.org/airflow-slack


Copilot AI left a comment


Pull request overview

This PR aims to ensure the AWS Batch operator’s extra UI links (job definition/queue/logs) are available for deferrable runs by persisting the related XComs before/around deferral and completion.

Changes:

  • Persist operator extra-link XComs before deferring in execute() and again in execute_complete().
  • Refactor link persistence logic into a new _persist_links() helper.
  • Adjust CloudWatch log-link persistence and logging behavior.

@eladkal eladkal requested review from ferruzzi and vincbeck April 7, 2026 05:43
@o-nikolas o-nikolas marked this pull request as draft April 8, 2026 00:58
@o-nikolas

Converting this to draft since there is unaddressed Copilot feedback and there are failing tests. Change this to ready for review once all feedback is addressed and the tests are green 🙂


Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

kakatur and others added 3 commits April 10, 2026 18:18
  - Reduce AWS API calls by 67% to avoid throttling
  - Fix missing job success check when awslogs disabled
  - Improve CloudWatch log error visibility
  - Handle edge cases in unit tests

  This addresses code review feedback on PR apache#64745.
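The "missing job success check when awslogs disabled" fix can be sketched as control flow. Everything here is hypothetical: `wait_then_check()`, `JobFailedError`, the recorded step names, and the `"SUCCEEDED"` status string are illustrative, not the operator's API. The success check sits after the branch, so it runs whether or not log streaming is enabled.

```python
from __future__ import annotations


class JobFailedError(Exception):
    """Hypothetical stand-in for the exception raised on a failed Batch job."""


def wait_then_check(status: str, awslogs_enabled: bool, calls: list[str]) -> None:
    """Hypothetical sketch: both wait branches must end with the success check."""
    if awslogs_enabled:
        calls.append("wait_with_log_streaming")
    else:
        calls.append("wait_without_logs")
    # The fix: this check runs on both branches, so a failed job is reported
    # even when log streaming is disabled.
    calls.append("check_job_success")
    if status != "SUCCEEDED":  # assumed terminal success status string
        raise JobFailedError(f"job ended in {status} state")
```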
@kakatur kakatur marked this pull request as ready for review April 11, 2026 03:20

kakatur commented Apr 11, 2026

I've implemented the feedback from GitHub Copilot:

  1. Reduced API calls - 67% reduction (3→1) before deferring
  2. Context validation - Handles empty context in unit tests
  3. Removed XCom placeholder - Respects do_xcom_push setting
  4. Eliminated duplication in monitor_job - CloudWatch fetched once
  5. Better error logging - WARNING level for CloudWatch failures
  6. No duplication in execute_complete - Conditional logic based on awslogs_enabled
  7. Bug fix - Added missing check_job_success() when awslogs_enabled=False
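The "3→1" reduction in item 1 amounts to describing the job once and reusing the result. The sketch below assumes, for illustration, that the three values were previously each fetched with their own describe call; `CountingBatchClient`, `links_naive()` and `links_reused()` are hypothetical, and the fake client only counts calls. Going from 3 calls to 1 is a reduction of 2/3, which is where the ~67% figure comes from.

```python
from __future__ import annotations

from typing import Any


class CountingBatchClient:
    """Hypothetical fake client that counts describe calls."""

    def __init__(self, job: dict[str, Any]) -> None:
        self.job = job
        self.describe_calls = 0

    def describe_job(self, job_id: str) -> dict[str, Any]:
        self.describe_calls += 1
        return {**self.job, "jobId": job_id}


def links_naive(client: CountingBatchClient, job_id: str) -> dict[str, Any]:
    # Assumed "before" shape: each value triggers its own describe call (3 calls).
    return {
        "batch_job_definition": client.describe_job(job_id).get("jobDefinition"),
        "batch_job_queue": client.describe_job(job_id).get("jobQueue"),
        "status": client.describe_job(job_id).get("status"),
    }


def links_reused(client: CountingBatchClient, job_id: str) -> dict[str, Any]:
    # "After" shape: describe once and reuse the result (1 call).
    job = client.describe_job(job_id)
    return {
        "batch_job_definition": job.get("jobDefinition"),
        "batch_job_queue": job.get("jobQueue"),
        "status": job.get("status"),
    }
```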


Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.

kakatur and others added 3 commits April 11, 2026 07:58
Address GitHub Copilot code review feedback:
- Reduce AWS Batch API calls by 67% to prevent throttling
- Remove dead code and redundant operations
- Fix missing job success check when awslogs disabled
- Improve CloudWatch error logging visibility
- Use keyword arguments for better test compatibility

This optimizes the deferrable execution path by reusing job
descriptions and eliminating duplicate link persistence and
CloudWatch log fetching.

Addresses code review feedback on PR apache#64745.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)

providers/amazon/src/airflow/providers/amazon/aws/operators/batch.py:233

  • In the deferrable fast-path, _persist_links(..., skip_cloudwatch=True) is used even when the job is already in a terminal state (e.g. SUCCEEDED). In that case the task returns immediately and the CloudWatch log link is never persisted, so the operator extra link can remain missing in the UI. Consider conditionally persisting CloudWatch links when job_status is already terminal (e.g. call _persist_links(..., skip_cloudwatch=False, job_description=job) before returning/raising).
            # Persist operator links before deferring so they're available in the UI
            # Skip CloudWatch logs (not available yet) and reuse job description to reduce API calls
            job = self._persist_links(context, skip_cloudwatch=True)
            job_status = job.get("status")
            if job_status == self.hook.SUCCESS_STATE:
                self.log.info("Job completed.")
                return self.job_id
            if job_status == self.hook.FAILURE_STATE:
                raise AirflowException(f"Error while running job: {self.job_id} is in {job_status} state")

   - Fix missing CloudWatch links in deferrable fast-path
   - Extract _persist_cloudwatch_link() helper method
   - Eliminate boolean flag parameters for clarity
   - Improve code readability and maintainability
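The fast-path issue raised by Copilot, together with the flag-free helpers from this commit, can be sketched as follows. All names are hypothetical (`FastPathOperator`, `_persist_job_links()`, `JobFailedError`, the `DEFERRED` sentinel standing in for a real deferral), and the status strings are assumptions. When the job is already terminal, the task never defers, so the CloudWatch link is persisted before returning or raising.

```python
from __future__ import annotations

from typing import Any

SUCCESS_STATE = "SUCCEEDED"  # assumed terminal status strings
FAILURE_STATE = "FAILED"
DEFERRED = "DEFERRED"  # sentinel standing in for a real self.defer(...) call


class JobFailedError(Exception):
    """Hypothetical stand-in for AirflowException in this sketch."""


class FastPathOperator:
    """Hypothetical operator with two flag-free link helpers."""

    def __init__(self, job: dict[str, Any]) -> None:
        self.job = job  # assumed: the single describe result, reused throughout
        self.xcoms: dict[str, Any] = {}

    def _persist_job_links(self, job: dict[str, Any]) -> None:
        self.xcoms["batch_job_definition"] = job.get("jobDefinition")
        self.xcoms["batch_job_queue"] = job.get("jobQueue")

    def _persist_cloudwatch_link(self, job: dict[str, Any]) -> None:
        stream = job.get("container", {}).get("logStreamName")
        if stream:
            self.xcoms["cloudwatch_events"] = stream

    def execute_deferrable(self) -> str:
        job = self.job
        self._persist_job_links(job)
        status = job.get("status")
        if status in (SUCCESS_STATE, FAILURE_STATE):
            # Already terminal: the task will not defer, so persist the log link now.
            self._persist_cloudwatch_link(job)
            if status == FAILURE_STATE:
                raise JobFailedError(f"Error while running job: {job['jobId']} is in {status} state")
            return job["jobId"]
        return DEFERRED
```

Two single-purpose helpers replace a `skip_cloudwatch=True/False` flag, so each call site states what it persists.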

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

  - Remove redundant wait_for_job() call in execute_complete()
  - Deduplicate CloudWatch link persistence logic
  - Reduce AWS Batch API calls after job completion
  - Ensure consistent CloudWatch links across all execution paths
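Removing the redundant `wait_for_job()` call works because, in a deferrable run, `execute_complete()` is only resumed after the trigger has seen the job finish. The sketch below is hypothetical: `CountingHook` is a fake that counts calls, and the event shape `{"status": "success", "job_id": ...}` is an assumption. A single describe call is enough to build the CloudWatch link.

```python
from __future__ import annotations

from typing import Any


class CountingHook:
    """Hypothetical hook that counts how often each method is called."""

    def __init__(self) -> None:
        self.wait_calls = 0
        self.describe_calls = 0

    def wait_for_job(self, job_id: str) -> None:
        self.wait_calls += 1  # a real hook would poll the API repeatedly here

    def describe_job(self, job_id: str) -> dict[str, Any]:
        self.describe_calls += 1
        return {"jobId": job_id, "container": {"logStreamName": f"demo/{job_id}"}}


def execute_complete(hook: CountingHook, event: dict[str, Any], xcoms: dict[str, Any]) -> str:
    # Assumed event shape produced by the trigger.
    if event.get("status") != "success":
        raise RuntimeError(f"Error while running job: {event}")
    # No hook.wait_for_job() here: the trigger only fires once the job is terminal.
    job = hook.describe_job(event["job_id"])
    xcoms["cloudwatch_events"] = job["container"]["logStreamName"]
    return event["job_id"]
```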

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

  - Persist CloudWatch links on deferred job failures
  - Skip CloudWatch API call when do_xcom_push is False
  - Ensure consistent link behavior across success and failure paths
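Both points in this commit come down to ordering and gating: persist the log link before deciding success or failure, and skip the lookup entirely when `do_xcom_push` is False. `complete_deferred_job()` and the injected `fetch_log_link` callable are hypothetical names, and the event shape is an assumption.

```python
from __future__ import annotations

from typing import Any, Callable


def complete_deferred_job(
    event: dict[str, Any],
    do_xcom_push: bool,
    fetch_log_link: Callable[[str], str],
    xcoms: dict[str, Any],
) -> str:
    """Hypothetical sketch: persist the log link before deciding success or failure."""
    if do_xcom_push:
        # Only call the (hypothetical) log lookup when the result will be stored.
        xcoms["cloudwatch_events"] = fetch_log_link(event["job_id"])
    # Checked after persistence, so failed jobs keep their log link too.
    if event.get("status") != "success":
        raise RuntimeError(f"Job {event['job_id']} failed")
    return event["job_id"]
```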

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

  - Skip CloudWatch API calls when context lacks task instance
  - Respect awslogs_enabled flag in monitor_job()
  - Reduce unnecessary AWS Batch API usage
  - Improve unit test compatibility with minimal context

  Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
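The "skip CloudWatch API calls when context lacks task instance" change can be sketched as an early return. This assumes, for illustration, that the task instance is stored under the `"ti"` key; `persist_cloudwatch_link()` and the injected `fetch_log_info` callable are hypothetical. A minimal unit-test context then neither fails nor triggers the lookup.

```python
from __future__ import annotations

from typing import Any, Callable


def persist_cloudwatch_link(context: dict[str, Any], fetch_log_info: Callable[[], Any]) -> bool:
    """Hypothetical sketch: skip the log lookup entirely when there is no task instance."""
    ti = context.get("ti")  # assumption: the task instance is stored under "ti"
    if ti is None:
        # Nowhere to push the XCom (e.g. a minimal test context), so skip the API call.
        return False
    ti.xcom_push(key="cloudwatch_events", value=fetch_log_info())
    return True
```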

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

  The awslogs_enabled flag controls log streaming during wait_for_job(),
  not CloudWatch link persistence in the UI. Users expect CloudWatch
  links to be available even when log streaming is disabled, so they
  can view logs after task completion.

  Reverting the awslogs_enabled gate in monitor_job() preserves the
  original behavior and avoids breaking existing unit tests that expect
  CloudWatch links regardless of the flag value.

  Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
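The distinction drawn in this commit message, where `awslogs_enabled` gates log streaming but not link persistence, can be sketched as follows. `monitor_job()` here is a hypothetical simplification that only records the steps it would take.

```python
from __future__ import annotations


def monitor_job(job_id: str, awslogs_enabled: bool, events: list[str]) -> None:
    """Hypothetical sketch: the flag controls log streaming, not link persistence."""
    if awslogs_enabled:
        events.append(f"wait {job_id} while streaming logs")
    else:
        events.append(f"wait {job_id}")
    # Not gated by awslogs_enabled: users can still open the CloudWatch link
    # after the task finishes, even if logs were not streamed.
    events.append(f"persist cloudwatch link for {job_id}")
```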

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.


kakatur commented Apr 12, 2026

I've fixed all issues raised by Copilot :)

@o-nikolas

I see many reviews by Copilot and also some comments from @kakatur saying some of them are addressed. @kakatur, can you please resolve each individual Copilot thread that you have addressed?


Labels

area:providers, provider:amazon (AWS/Amazon-related issues)


3 participants