fix(source-google-search-console): guard numeric fields against complex types (AI-Triage PR)#75426
Conversation
…ex types to prevent serialization deadlocks Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
|
bot_apk seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. 💡 Show Tips and TricksPR Slash CommandsAirbyte Maintainers (that's you!) can execute the following slash commands on your PR:
📚 Show Repo GuidanceHelpful Resources
|
Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
|
Deploy preview for airbyte-docs ready! ✅ Preview Built with commit 6b4e766. |
Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
|
|
|
↪️ Triggering Reason: Draft PR with CI passing. Guards numeric fields against null/non-numeric values in Google Search Console. |
Fix Validation EvidenceOutcome: Fix/Feature Proven Successfully Evidence SummaryTested pre-release Next Steps
Connector & PR DetailsConnector: Evidence PlanProving Criteria
Disproving Criteria
Cases Attempted
Pre-flight Checks
Regression Tests
Detailed Evidence LogTimeline
Live Test: Sync Job 76647297
Note: Connection IDs and detailed logs are recorded in the linked private issue. 🤖 Devin Session: https://app.devin.ai/sessions/c8e21c2bd8f04048a1232e29fa78005b |
|
|
add unit tests for SanitizeNumericFields |
Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
|
suisuixia42 Added unit tests for
|
Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
|
↪️ Triggering Reason: Prove-fix completed with live validation (783K records, zero failures across 7 streams). CI passing. Ready for review. |
Reviewing PR for connector safety and quality.
|
AI PR Review ReportReview Action: NO ACTION (NOT ELIGIBLE)
📋 PR Details & EligibilityConnector & PR InfoConnector(s): Auto-Approve EligibilityEligible: No Review Action DetailsNO ACTION (NOT ELIGIBLE) — All gates pass but the PR is not eligible for auto-approval because it contains functional code changes (new Python class, manifest YAML modifications, new unit tests). No PR review submitted. Human review is required.
🔍 Gate Evaluation DetailsGate-by-Gate Analysis
📚 Evidence ConsultedEvidence
|
…ex types (AI-Triage PR) (airbytehq#75426) Co-authored-by: bot_apk <apk@cognition.ai> Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: suisuixia42 <suisui.xia@airbyte.io>
What
Resolves https://github.com/airbytehq/oncall/issues/11654:
Related: #74883
Google Search Console's search analytics streams can produce Python
complexnumbers in numeric metric fields (ctr,position,clicks,impressions) due to Jinja interpolation in the CDK. These values are not JSON-serializable, causingTypeErrorexceptions that crash worker threads and deadlock the concurrent read pipeline.Companion to CDK-level defense-in-depth fix: airbytehq/airbyte-python-cdk#954
How
Adds a
SanitizeNumericFieldscustomRecordTransformationincomponents.pythat:complextype values.realcomponent (afloat) to preserve correct numeric dataThis transformation is wired into all 13 search analytics stream definitions in
manifest.yaml(9 regular streams, 3 keyword streams, 1 dynamic custom reports template) as the last transformation in each stream's pipeline.Why connector-level instead of CDK-only? The CDK fix now raises
AirbyteTracedExceptionon non-serializable types, which prevents the deadlock but stops the sync entirely. This connector-level fix proactively converts complex values to correct floats (e.g.,complex(0.0423, 0).real→0.0423), allowing the sync to complete successfully with correct data.Review guide
components.py— NewSanitizeNumericFieldsclass (lines 139–165). Core logic is simple:isinstancecheck +.realextraction + warning log (field name only, value omitted to avoid logging sensitive data).manifest.yaml— 13 identicalCustomTransformationinsertions. Verify the transformation appears as the last entry in each stream'stransformationslist (order matters — fields must be extracted fromkeysbefore sanitization runs). Streams:search_analytics_all_fields,search_analytics_by_country,search_analytics_by_date,search_analytics_by_device,search_analytics_by_page,search_analytics_by_query,search_analytics_page_report,search_analytics_site_report_by_page,search_analytics_site_report_by_site,search_analytics_keyword_page_report,search_analytics_keyword_site_report_by_page,search_analytics_keyword_site_report_by_site, and the dynamicstream_template.unit_tests/test_components.py— 6 new tests inTestSanitizeNumericFieldscovering: complex value sanitization, normal value passthrough, None handling, missing fields, nonzero imaginary parts, and warning log emission.Human review checklist
isinstance(value, complex)does not match regularfloat/intvalues (it shouldn't — Python'scomplexbuiltin is a distinct type, not a superclass offloat)SanitizeNumericFieldsmust run after key extraction (AddFields + RemoveFields) since the metric fields (clicks,impressions,ctr,position) come directly from the API response, not fromkeysUser Impact
complextype serialization errors will now complete successfully with correct numeric values preserved.Can this PR be safely reverted and rolled back?
Updates since last revision
AirbyteTracedExceptioninstead ofdefault=strfallback (per review from suisuixia42)Link to Devin session: https://app.devin.ai/sessions/d8317c1f4ce64f70b5e807425b72aca2