fix: (CDK) (AsyncRetriever) - fix the regression. The TIMEOUT Job Status should Retry on server status.#431
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR addresses a regression by distinguishing between TIMED_OUT and FORCED_TIME_OUT job statuses so that server-induced timeouts are retried while system-forced timeouts are handled as terminal errors.
- Introduces the FORCED_TIME_OUT status and updates its usage in job status evaluation
- Alters job replacement and removal logic to differentiate how TIMED_OUT versus FORCED_TIME_OUT jobs are processed
- Updates unit tests to verify the new forced timeout behavior
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| airbyte_cdk/sources/declarative/async_job/job_orchestrator.py | Updates job status checks and job replacement/removal logic for forced vs. server timeouts |
| unit_tests/sources/declarative/async_job/test_job_orchestrator.py | Adds test cases to validate forced timeout behavior |
| airbyte_cdk/sources/declarative/async_job/job.py | Modifies job status to return FORCED_TIME_OUT when a timer expires |
| airbyte_cdk/sources/declarative/async_job/status.py | Introduces FORCED_TIME_OUT in the status enumeration and updates related comments |
Comments suppressed due to low confidence (2)
airbyte_cdk/sources/declarative/async_job/job_orchestrator.py:187
- Double-check that including TIMED_OUT jobs (and not FORCED_TIME_OUT) in the jobs-to-replace logic matches the intended behavior; if FORCED_TIME_OUT jobs should not be retried, ensure that this distinction is clearly documented.
failed_status_jobs = (AsyncJobStatus.FAILED, AsyncJobStatus.TIMED_OUT)
airbyte_cdk/sources/declarative/async_job/job.py:36
- Confirm that changing the status from TIMED_OUT to FORCED_TIME_OUT in job() is fully supported by downstream logic, as this alteration affects how timeouts are handled elsewhere in the system.
if self._timer.has_timed_out(): return AsyncJobStatus.FORCED_TIME_OUT
📝 WalkthroughWalkthroughThe changes update the asynchronous job handling by introducing a new timeout status. The Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Orchestrator
participant Job
participant Status
Client->>Orchestrator: Initiate job processing
Orchestrator->>Job: Call status()
Job-->>Orchestrator: Return FORCED_TIME_OUT
Orchestrator->>Orchestrator: Process forced timeout (invoke special handling logic)
Orchestrator->>Client: Raise AirbyteTracedException
Possibly related PRs
Suggested reviewers
Tip ⚡🧪 Multi-step agentic review comment chat (experimental)
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
unit_tests/sources/declarative/async_job/test_job.py (2)
22-25: Method rename and assertion update looks appropriate for the new status type.I see you've renamed the test method and updated the assertion to check for
FORCED_TIME_OUTinstead ofTIMED_OUT. This aligns perfectly with the implementation changes inAsyncJob.status()and the new enum value inAsyncJobStatus. The test now accurately verifies that when a timer times out, we get the more specificFORCED_TIME_OUTstatus.One thought - would it make sense to add an additional test for the
AsyncJobOrchestratorclass to verify it correctly handles this new status too? The orchestrator code was also updated to include a specific case forFORCED_TIME_OUT. Just a suggestion to increase test coverage, wdyt?
27-32: Placeholder test has helpful documentation but no implementation.I noticed this test is documented but only contains a
passstatement. The comment explains that this test will become important for future metrics/stats functionality. Would it make sense to either:
- Implement a basic version now to ensure the timer-stopping behavior works as expected, or
- Add a TODO comment with an issue reference to track when this should be implemented?
Having an empty test might be confusing for other developers wondering if the functionality is tested elsewhere.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte_cdk/sources/declarative/async_job/job_orchestrator.py(7 hunks)unit_tests/sources/declarative/async_job/test_job.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- airbyte_cdk/sources/declarative/async_job/job_orchestrator.py
🧰 Additional context used
🧬 Code Definitions (1)
unit_tests/sources/declarative/async_job/test_job.py (3)
airbyte_cdk/sources/declarative/async_job/job.py (2) (2)
AsyncJob(13-52)status(35-38)airbyte_cdk/sources/declarative/async_job/job_orchestrator.py (1) (1)
status(87-102)airbyte_cdk/sources/declarative/async_job/status.py (1) (1)
AsyncJobStatus(9-26)
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Analyze (python)
- GitHub Check: Pytest (Fast)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
|
Closed, in favor of: #432 |
What
Given this dialog: https://airbytehq-team.slack.com/archives/C07JMAAE620/p1742346710533189?thread_ts=1742236041.905529&cid=C07JMAAE620
Related to:
TIMEOUTJobs are retried, ignoring thepolling_job_timeoutsetting #429There is a need to introduce the
FORCED_TIME_OUTJob status, so we can distinguish between the Server Timeout status of the Job and the status that should be treated as adead endbecause of thepooling_job_timeoutlimitation. This helps us to prevent inf loops when thestatus_mappingprovided incorrectly, such as:Summary by CodeRabbit
New Features
Tests