chore(source-google-ads): Update CDK with deadlock fix + HTTP socket timeout + retry on HTTP 500#76074
Conversation
… from PR #977 Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
|
Note 📝 PR Converted to Draft More info...Thank you for creating this PR. As a policy to protect our engineers' time, Airbyte requires all PRs to be created first in draft status. Your PR has been automatically converted to draft status in respect for this policy. As soon as your PR is ready for formal review, you can proceed to convert the PR to "ready for review" status by clicking the "Ready for review" button at the bottom of the PR page. To skip draft status in future PRs, please include |
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. 💡 Show Tips and TricksPR Slash CommandsAirbyte Maintainers (that's you!) can execute the following slash commands on your PR:
📚 Show Repo GuidanceHelpful Resources
|
Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
|
/publish-connectors-prerelease
|
|
Deploy preview for airbyte-docs ready!
Deployed with vercel-action |
|
|
/ai-canary-prerelease
|
🔬 Canary Prerelease Testing — StartingBeginning canary prerelease testing. I will:
Session: https://app.devin.ai/sessions/19f6855e0406404f9976b1a1ab4f4bdf Updates will be posted as progress is made. |
🔬 Canary Prerelease — Phase 2 UpdateBreaking Change & Reversibility Gate: ✅ PASSED Evaluation detailsBreaking change checklist:
Reversibility:
Prerelease version: Canary connection selection: 8 connections selected across diverse tiers and regions:
⏳ Awaiting approval via Slack Session: https://app.devin.ai/sessions/19f6855e0406404f9976b1a1ab4f4bdf |
🔬 Canary Prerelease — Phase 4: Connections PinnedAll 8 connections successfully pinned to
Next: Triggering initial syncs and beginning 2-4 hour monitoring window. Updates will be posted every 1-2 hours. Session: https://app.devin.ai/sessions/19f6855e0406404f9976b1a1ab4f4bdf |
🛑 Canary Prerelease — ABORTED (Approval Rejected)Connector: SummaryThe HITL (Human-in-the-Loop) approval request for canary pinning was rejected via Slack. Per playbook protocol, all version pins have been immediately reverted. What happened
Revert status
All 8/8 connections have been restored to the default version. No connections remain pinned to the prerelease. Verdict: ABORTEDCanary testing was aborted due to HITL rejection. No action is required — all connections have been safely reverted. |
Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
…worker hangs Add TimeoutHTTPAdapter that enforces a 300s socket-level idle timeout on all Google Ads API HTTP calls. This prevents workers from hanging indefinitely on unresponsive API endpoints (e.g., ssl.recv_into() blocking for hours), as observed in sync job 79758185. Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
… for HTTPS-only check - Bump pyproject.toml version from 4.2.2 to 4.2.3 to match metadata.yaml - Remove http:// adapter mount (Google Ads API is HTTPS-only anyway) Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
…of failing Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
|
Devin is currently unreachable - the session may have died. |
…500 under same version Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
|
/publish-connectors-prerelease
|
|
/ai-canary-prerelease
|
Canary Prerelease: StartingKicking off canary testing for this PR. I'll:
Customer-identifying details will be kept to the linked private issue. This PR will only see anonymized references. Session: https://app.devin.ai/sessions/23a6349f1a19409cb74826397d3fb59a |
Canary Prerelease — Phase 2 & 3Prerelease under test: Breaking-change + reversibility gate: PASSEvaluation
Canary selection: 8 connectionsUnpinned actors with successful syncs in the last 24h, picked for tier and region diversity. Customer identifiers are kept to the private tracking record.
All 8 actors are currently on their default version and have no existing pin.
NextRequesting HITL approval in Slack |
Canary Prerelease — Phase 4: 8/8 Connections PinnedHITL approval received. All 8 canary connections successfully pinned to
All selected actors already sync regularly and had successful runs within the last 24h, so no manual sync triggers are needed — the scheduler will drive the canary. Rollback-on-failure is a single Next: Monitoring window has started. Next update at ~T+60 min, broader sweep at ~T+2.5 h. |
Canary Prerelease — Monitoring Update (T+60 min)Pin applied at approximately 2026-04-23 13:14 UTC. Status as of 14:13 UTC.
All post-pin sync records report the prerelease Status: HEALTHY. Continuing to the final check at T+150 min. |
Canary Prerelease — Final ReportConnector: SummaryAll scheduled syncs on pinned connections completed successfully on the prerelease. 17 sync runs observed across 5 of 8 actors (the other 3 have longer intervals between scheduled syncs and did not run during the window). No new failures on any of the pinned actors, and no failures anywhere on the prerelease Detailed Results
All post-pin syncs report the prerelease Canary VerdictOverall Status: PASS The prerelease performed cleanly across all canary connections that ran during the window. No regressions observed. Recommend proceeding to formal release. Next steps:
For full customer details, see the linked private record. |
|
↪️ Triggering Reason: PR is ready for review with no prior |
Reviewing PR for connector safety and quality.
|
AI PR Review ReportReview Action: REQUEST CHANGES
🔧 Remediation RequiredRequired ActionsTest Coverage (FAIL):
Alternatively, add an 📋 PR DetailsConnector & PR InfoConnector(s): Risk LevelLevel: 4 — High Risk Level is reported for downstream consumers (e.g. auto-merge policy, reviewer routing). It does not change the review action — APPROVE here means "no blocking objection," not "safe to merge unattended." Review Action DetailsREQUEST CHANGES - The Test Coverage enforced gate is FAIL. The PR title contains behavioral change indicators ("fix") but no new test files or test content were added. Once tests are added (or a sufficient justification is provided), re-run
🔍 Gate Evaluation DetailsGate-by-Gate Analysis
Test Coverage — Detailed Evidence:
Behavioral Changes — Keyword Matches:
Breaking Dependencies — CDK Version Jump:
📚 Evidence ConsultedEvidence
❓ How to RespondProviding Context or JustificationYou can add explanations that the bot will see on the next review: Option 1: PR Description (recommended) ## AI PR Review Justification
### Test Coverage
[Your explanation here — e.g., explain why the TimeoutHTTPAdapter and error handler change don't require new unit tests, or why existing CDK tests cover this behavior]Option 2: PR Comment After adding your response, re-run Note: Justifications provide context for the bot to evaluate. For the Test Coverage gate, a sufficient justification can lead to PASS if it explains why the change does not require a new test (e.g., existing tests already cover the behavior, or the change is a trivial refactor with no behavioral impact). |
Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
|
|
What
Three changes to address Google Ads sync hangs and transient failures:
CDK deadlock fix — Updates
airbyte-cdkto a pre-release version (7.16.0.post1.dev23950401533) that includes a fix for a concurrent source deadlock (CDK PR #977). The deadlock occurs when the main thread (sole queue consumer) callsqueue.put()on a full sharedQueue(maxsize=10,000)viaConcurrentMessageRepository.emit_message(), blocking itself indefinitely since no other thread can drain the queue.HTTP socket timeout — Adds a
TimeoutHTTPAdapterthat enforces a 300-second (5-minute) socket-level idle timeout on all Google Ads API HTTP calls. This prevents workers from hanging indefinitely when the Google Ads API becomes unresponsive (observed in sync job 79758185 where a worker was stuck onssl.recv_into()for 3.5+ hours). Timed-out requests are automatically retried by the CDK sincerequests.ReadTimeoutis inTRANSIENT_EXCEPTIONS.Retry on HTTP 500 — Changes
base_error_handlerinsource_google_ads/manifest.yamlto treat HTTP 500 responses from the Google Ads API asRETRYinstead ofFAIL. Google Ads returns 500 withINTERNAL/TRANSIENT_ERRORstatus and documents them as retryable. Previously a single 500 would fail the whole attempt, forcing a full-job retry; now they are retried at the HTTP layer with CDK default exponential backoff. All streams that usebase_error_handlerinherit this.How
airbyte-cdkfrom7.9.2to7.16.0.post1.dev23950401533inpyproject.toml4.2.1to4.2.5in bothmetadata.yamlandpyproject.toml(accounting for 4.2.2/4.2.3/4.2.4 landing on master in the meantime)poetry.lockto matchTimeoutHTTPAdapterclass incomponents.pythat setstimeout=300on every outgoing requesthttps://in bothGoogleAdsHttpRequester.__post_init__andCustomGAQueryHttpRequester.__post_init__(Google Ads API is HTTPS-only, so nohttp://mount needed)base_error_handlerHTTP 500 filter fromaction: FAILtoaction: RETRYinsource_google_ads/manifest.yaml4.2.5Review guide
source_google_ads/components.py— NewTimeoutHTTPAdapterclass + mounting inGoogleAdsHttpRequester.__post_init__andCustomGAQueryHttpRequester.__post_init__source_google_ads/manifest.yaml—base_error_handler.response_filters[0].actionchanged fromFAILtoRETRYfor HTTP 500pyproject.toml— CDK version bump + connector version bumpmetadata.yaml— connectordockerImageTagbumpdocs/integrations/sources/google-ads.md— changelog entrypoetry.lock— auto-regeneratedpoe use-cdk-latestto re-pin before merge.7.9.2→7.16.0.post1. Verify no unintended breaking changes are pulled in beyond the deadlock fix.self._http_client._session.mount(...). This works today but is fragile — if CDK renames the internal session attribute, this will break silently (no timeout enforcement, no error).recv(), not total: The 300s timeout is a socket idle timeout. Streaming responses (Google AdssearchStream) that keep sending data are unaffected. However, a response that sends one byte every 4 minutes would never timeout — this is the expected tradeoff.DefaultErrorHandler's default backoff (exponential) with the CDK default max attempts. No custombackoff_strategiesadded.2.0) instead of the original Poetry 2.0.1 (lock-version2.1). This removesgroupsandmarkersmetadata from package entries. Confirm this doesn't cause issues with CI or other contributors using Poetry 2.x.User Impact
No user-facing impact at the schema or config level. Syncs that previously hung on unresponsive Google Ads API calls, or partially failed on transient HTTP 500s, should now recover automatically.
Can this PR be safely reverted and rolled back?
Link to Devin session: https://app.devin.ai/sessions/bc28bf62dcc3400ca2906b3131b84394