fix(source-bing-ads): refresh report download URLs at download time to avoid expired SAS URLs (AI-Triage PR)#75428
Conversation
…o avoid expired SAS URLs Add BingAdsReportUrlRefresher custom component that re-polls GenerateReport/Poll at download time to obtain a fresh ReportDownloadUrl, preventing SAS URL expiration failures when many report jobs queue for download concurrently. Co-Authored-By: bot_apk <apk@cognition.ai>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. 💡 Show Tips and TricksPR Slash CommandsAirbyte Maintainers (that's you!) can execute the following slash commands on your PR:
📚 Show Repo GuidanceHelpful Resources
|
Co-Authored-By: bot_apk <apk@cognition.ai>
… request context Co-Authored-By: bot_apk <apk@cognition.ai>
Co-Authored-By: bot_apk <apk@cognition.ai>
Co-Authored-By: bot_apk <apk@cognition.ai>
Co-Authored-By: bot_apk <apk@cognition.ai>
|
|
resolve merge conflicts |
…der from master) Co-Authored-By: bot_apk <apk@cognition.ai>
|
Merge conflicts resolved — merged latest master which brought in the new |
What
Resolves https://github.com/airbytehq/airbyte-internal-issues/issues/16105
Report streams using the
AsyncRetrievercache theReportDownloadUrlfrom the polling response when a job completes. This URL is an Azure SAS URL with a short TTL (~5 minutes). When many report jobs complete concurrently, the download queue can cause delays that exceed the TTL, resulting in 403 errors from expired URLs.Related: airbytehq/airbyte-python-cdk#961 (CDK-side mitigation that reclassified these from
config_errortosystem_errorto enable retries).How
New custom component
BingAdsReportUrlRefresher(components.py): Implements theRequesterinterface. When called as adownload_target_requester, it:ReportRequestIdfrom the cachedpolling_responsein the stream sliceCustomerId/CustomerAccountIdfrom the original polling request's headersGenerateReport/Pollto obtain a newReportDownloadUrldownload_target_extractorcan extract the fresh URLManifest wiring (
manifest.yaml): Addsdownload_target_requestertobasic_async_retriever, referencing the new component with the connector's OAuth authenticator.Review guide
components.py— TheBingAdsReportUrlRefresherclass. Key areas to scrutinize:send_request()makes a rawrequests.post()call, bypassing the CDK's HTTP client. This means no CDK-level logging or error classification. Manual retry logic (3 attempts, exponential backoff) partially mitigates this — verify this is acceptable for a download-target-requester that runs once per job.CustomerId/CustomerAccountIdfrom the cachedPreparedRequest.headersviapolling_response["request"]. This is populated by the CDK's_get_polling_response_interpolation_context()(http_job_repository.py:301-322) but is not part of any public contract. If the CDK changes this structure, the component silently falls back to empty strings.CustomerId/CustomerAccountIdheaders can't be extracted from the cached request, the code falls back to empty strings. This could cause opaque auth failures at the Bing Ads API — worth verifying whether the API rejects requests without these headers or silently degrades.polling_responsestructure assumption: expectsReportRequestStatus.ReportRequestIdand arequestobject with headers.manifest.yaml— 4-line addition wiring thedownload_target_requester. Verify the existingdownload_target_extractor(DPath:ReportRequestStatus.ReportDownloadUrl) will correctly extract from the re-poll response.Reviewer checklist
requests.post()) is acceptable here, given the manual retry logicpolling_response["request"]contract with CDK team — silent fallback to emptyCustomerId/CustomerAccountIdif CDK internals changeHTTPError+ConnectionErrorcatches) covers expected transient failure modesCustomerId/CustomerAccountIdheaders are empty strings (auth error vs. silent degradation)User Impact
Report streams should no longer fail with expired SAS URL errors when multiple report jobs queue for download. This is especially impactful for customers with many enabled report streams.
No user-facing configuration changes; no schema, state, or spec changes.
Can this PR be safely reverted and rolled back?
Link to Devin session: https://app.devin.ai/sessions/85cd0c285ac8496d8744d7f5a103cec7