fix: re-poll for fresh download URL before fetching async job records#973
Closed
devin-ai-integration[bot] wants to merge 2 commits intomainfrom
Closed
fix: re-poll for fresh download URL before fetching async job records#973devin-ai-integration[bot] wants to merge 2 commits intomainfrom
devin-ai-integration[bot] wants to merge 2 commits intomainfrom
Conversation
Download URLs (e.g. Azure Blob Storage SAS tokens) may expire between poll-completion and actual download when many concurrent streams delay record fetching. This adds a re-poll step in fetch_records() to ensure the download URL is still valid before attempting the download. Co-Authored-By: bot_apk <apk@cognition.ai>
Contributor
Author
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. 💡 Show Tips and TricksTesting This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1775057411-fix-sas-token-expiry-async-download#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1775057411-fix-sas-token-expiry-async-downloadPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
Co-Authored-By: bot_apk <apk@cognition.ai>
Contributor
|
I don't want a CDK fix. Closing |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes an issue where download URLs with short TTLs (e.g. Azure Blob Storage SAS tokens with 10-minute expiry) can expire between poll-completion and the actual download in
AsyncHttpJobRepository. This is triggered when many concurrent async streams cause a significant delay betweenupdate_jobs_statusstoring the polling response andfetch_recordsextracting the download URL from it.The fix adds a
_refresh_download_urlcall at the start offetch_records()that re-polls the API to get a fresh response before extracting download targets.Reported in: https://github.com/airbytehq/oncall/issues/11749 (Bing Ads connector, 29 concurrent streams, SAS token 403 errors)
Review & Testing Checklist for Human
fetch_recordscall with no staleness check. This adds one extra HTTP request per job download for all async connectors, not just those with expiring URLs. Verify this overhead is acceptable, or consider gating behind a time-elapsed threshold or a flag._get_validated_polling_responseraises during the refresh (transient network error, API 5xx), the download will now fail even though a cached response exists. Consider whether the refresh should be best-effort (try/except falling back to cached response).Notes
AsyncHttpJobRepository. The Bing Ads connector (manifest-only, usessource-declarative-manifestbase image) will pick up the fix automatically once a new CDK version is released and the base image is rebuilt.Link to Devin session: https://app.devin.ai/sessions/e3c1004bcc834f40a854c0c489a70b98