fix(source-github): make parse_response and error handlers defensive against unexpected response shapes#77685
Conversation
…against unexpected response shapes Co-Authored-By: bot_apk <apk@cognition.ai>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
Co-Authored-By: bot_apk <apk@cognition.ai>
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. 💡 Show Tips and TricksPR Slash CommandsAirbyte Maintainers (that's you!) can execute the following slash commands on your PR:
📚 Show Repo GuidanceHelpful Resources
|
|
Co-Authored-By: bot_apk <apk@cognition.ai>
Nice work on the defensive hardening. Pattern is consistent, test coverage is solid. Below are the fixes I'd like to see before merge, framed by priority. P1 — Should fix before mergeP1.1 — Guard
|
…parse failure, consistency fixes Co-Authored-By: bot_apk <apk@cognition.ai>
|
Thanks for the thorough review Patrick Nilan (@pnilan)! Addressed all items in 7338b72: P1.1 — Added P1.2 — P2.1 — Skipped: P2.3 — Dropped P2.4 — Fixed P2.5 — Added All 145 unit tests pass locally. |
|
Deploy preview for airbyte-docs ready!
Deployed with vercel-action |
| try: | ||
| response_data = response_or_exception.json() | ||
| except ValueError: | ||
| return False |
There was a problem hiding this comment.
Done — added a warning log with the first 50 chars of the response body in 02fc609.
| try: | ||
| body = response.json() | ||
| except ValueError: | ||
| return False |
There was a problem hiding this comment.
Done — added a warning log with HTTP status and first 50 chars of the response body in 02fc609.
| "`%s` received non-JSON response (HTTP %s, first 500 chars: %r). Yielding no records for this page.", | ||
| self.name, | ||
| response.status_code, | ||
| response.text[:500], |
There was a problem hiding this comment.
too many characters -> :50
There was a problem hiding this comment.
Done — reduced to [:50] and updated the log message text to match in 02fc609.
| "`%s` received non-JSON response (HTTP %s, first 500 chars: %r). Yielding no records for this page.", | ||
| self.name, | ||
| response.status_code, | ||
| response.text[:500], |
| "`%s` received non-JSON response (HTTP %s, first 500 chars: %r). Yielding no records for this page.", | ||
| self.name, | ||
| response.status_code, | ||
| response.text[:500], |
| "`%s` received non-JSON response (HTTP %s, first 500 chars: %r). " "Yielding base record without timeline events.", | ||
| self.name, | ||
| response.status_code, | ||
| response.text[:500], |
…ror handler catch blocks Co-Authored-By: bot_apk <apk@cognition.ai>
|
/ai-prove-fix
|
Fix Validation Evidence —
|
| Command | Result |
|---|---|
SPEC |
✅ No regression |
CHECK |
✅ No regression |
DISCOVER |
✅ No regression |
READ |
✅ No regression (39 streams, 1028 vs 1026 records) |
All 39 streams tested. The streams with modified parse_response methods (workflows, issue_timeline_events) produced identical record counts between target and control.
Pre-flight Checks
- Viability: Fix wraps
.json()calls in try/except, validates expected keys/types, logs warnings instead of crashing - Safety: No malicious code, no credential changes — pure defensive error handling
- Breaking Change: Not breaking — no schema/spec/stream/state changes, patch bump (2.1.24 → 2.1.25)
- Reversibility: Fully reversible — no state/config format changes
Evidence Plan & Rationale
Strategy: Regression tests only (no live connection tests needed)
Proving criteria: Regression tests show no behavioral differences (all ✅) between target and control, demonstrating the fix does not regress normal operation.
Rationale: This fix adds defensive error handling for edge cases (non-JSON responses, missing keys) that only occur during GitHub API outages. In normal operation, there should be zero behavioral change. The error-case behavior is validated by 36 unit tests in the PR.
Live connection testing was not needed because:
- Error conditions cannot be triggered against real GitHub API
- All affected streams produce identical output in regression tests
source-githubdoes not use forced OAuth write-back
|
|
|
|
↪️ Triggering Reason: PR is ready for review with no prior |
Reviewing PR for connector safety and quality.
|
AI PR Review ReportReview Action: APPROVED
📋 PR DetailsConnector & PR InfoConnector(s): Risk LevelLevel: 4 — High Risk Level is reported for downstream consumers (e.g. auto-merge policy, reviewer routing). It does not change the review action — APPROVE here means "no blocking objection," not "safe to merge unattended." Review Action DetailsAPPROVED — All enforced gates pass (PR Hygiene, Code Hygiene, Test Coverage, Code Security, CI Checks, Live / E2E Tests). The Behavioral Changes Anti-Pattern gate is FAIL on the keyword heuristic, but Anti-Pattern gates are warning-level — they elevate 🔍 Gate Evaluation DetailsGate-by-Gate Analysis
Spec Comparison (Backwards Compatibility):
Author Justifications:
📚 Evidence ConsultedEvidence
❓ How to RespondProviding Context or JustificationYou can add explanations that the bot will see on the next review: Option 1: PR Description (recommended) ## AI PR Review Justification
### {Gate Name}
[Your explanation here]Option 2: PR Comment After adding your response, re-run Note: The Behavioral Changes FAIL above is a keyword-driven warning on a defensive wrapper (no real change to rate-limit/retry/timeout policy). It elevates |
|
update from main, resolve confclits |
…n test assertion Co-Authored-By: bot_apk <apk@cognition.ai>
|
Done — merged from main, resolved conflicts, and bumped version to 2.1.27. Also added a warning log to the new |
|
/ai-canary-prerelease
|
Co-Authored-By: bot_apk <apk@cognition.ai>
Canary Prerelease: StartingStarting canary prerelease testing for this PR. I'll:
|
|
Canary Prerelease: Approval RequestedConnector: Phase 2 — Breaking change & reversibility gate: PASS
Phase 3 — Proposed canary set (7 source actors, 9 connections, all TIER_2):
Coverage: 3 internal Airbyte source actors + 4 external customer actors. Destinations: GCS, ab-analytics, BigQuery, Redshift, Postgres. Regions: EU, US, US-Central. Approval requested via Slack HITL (escalation workflow run). Pinning will not begin until I receive an Approve callback. For full canary set details (workspace IDs, actor IDs, connection IDs), see the linked private internal issue. |
Canary Prerelease: Approved & PinnedApproval received via Slack HITL. All 7 source actors (9 connections) pinned to
Customer-C-EU-1/-2/-3 share a single source actor (3 connections to BigQuery), so this is 7 actor pins covering 9 connections. Now entering Phase 5 — monitoring syncs. Reminders set at +60 min (initial check-in) and +2.5h (final PASS/FAIL/BLOCKED verdict). For full mapping (workspace IDs, connection IDs, actor IDs, customer names) and rollback steps, see the linked private issue. |
Canary Monitoring Update — t+85 minPinned at approximately 01:14 UTC; sampling at approximately 02:39 UTC.
Overall status so far: HEALTHY (1 sync on Two pre-pin Customer-C-EU jobs (started 00:07 and 00:13) are still in Coverage is light because most connections are on schedules >= 2h. Extending monitoring by approximately 90 min (+4h reminder) so faster connections (Internal-B-US-2, Customer-E-US-1) accumulate multiple post-pin syncs and slower ones (Customer-C-EU-3, Customer-F-USC-1) get at least one. |
Canary Monitoring Update — t+155 minSampled at approximately 03:50 UTC.
3 successful syncs on Two pre-pin Customer-C-EU jobs (started 00:07 / 00:13) are still in Next checkpoint at t+240 min (final PASS/FAIL/BLOCKED verdict). |
Canary Prerelease: Final ReportConnector: SummaryDefensive Detailed Results
Totals: 4 successful post-pin syncs · 2 in progress · 0 failures · 0 new error patterns · record counts and bytes match baseline exactly. Canary VerdictOverall Status: PASS The prerelease performed cleanly across the connections that produced post-pin syncs in the monitoring window. The defensive Recommend proceeding to formal release. Next steps:
For full per-actor IDs, workspace links, and the long-running pre-pin job note, see the linked private internal issue. |
|
↪️ Triggering Reason: Prior AI review APPROVE was on head_sha |
Reviewing PR for connector safety and quality.
|
AI PR Review ReportReview Action: APPROVED
📋 PR DetailsConnector & PR InfoConnector(s): Risk LevelLevel: 4 — High Risk Level is reported for downstream consumers (e.g. auto-merge policy, reviewer routing). It does not change the review action — APPROVE here means "no blocking objection," not "safe to merge unattended." Review Action DetailsAPPROVED — All enforced gates pass (PR Hygiene, Test Coverage, Code Security, CI Checks, Live / E2E Tests). The Anti-Pattern Behavioral Changes gate is FAIL but is a WARNING-level gate per the playbook — it elevates Risk Level (forced ≥4) but does not block APPROVE. octavia-bot will submit the APPROVE review based on the marker above. This does NOT authorize auto-merge; the merge decision remains with humans (or a separate auto-merge policy reading 🔍 Gate Evaluation DetailsGate-by-Gate Analysis
Spec Comparison (Backwards Compatibility)No Behavioral Changes DetailKeywords matched in diff hunks (case-insensitive):
Semantic analysis: The change is purely defensive — wraps existing 📚 Evidence ConsultedEvidence
|
| try: | ||
| events_list = response.json() | ||
| except ValueError: | ||
| self.logger.warning( | ||
| "`%s` received non-JSON response (HTTP %s, first 50 chars: %r). " "Yielding base record without timeline events.", | ||
| self.name, | ||
| response.status_code, | ||
| response.text[:50], | ||
| ) | ||
| yield record | ||
| return | ||
| if not isinstance(events_list, list): | ||
| self.logger.warning( | ||
| "`%s` expected a JSON list but got %s (HTTP %s). " "Yielding base record without timeline events.", | ||
| self.name, | ||
| type(events_list).__name__, | ||
| response.status_code, | ||
| ) | ||
| yield record | ||
| return |
There was a problem hiding this comment.
this code is repeated multiple times, pull into its own private method
There was a problem hiding this comment.
Done — extracted the repeated defensive JSON parsing into GithubStream._safe_json_list(response, key=None) in 4074cc3.
The helper handles both cases:
- Keyed dict (
key="workflows"/"workflow_runs"/"jobs"): parses JSON, extractsbody[key], validates it's a list - Top-level list (
key=None): parses JSON, validates the body itself is a list
Returns None on any failure (non-JSON, missing key, wrong type) with a warning log, so callers short-circuit with a simple if items is None: return.
All 4 call sites (Workflows, WorkflowRuns, WorkflowJobs, IssueTimelineEvents) now use this helper. Net -28 lines. All 161 tests pass.
…t helper Co-Authored-By: bot_apk <apk@cognition.ai>
… dates, refine permissions section - Add 2.1.27 changelog entry with correct merge date (2026-05-03 instead of 2026-05-02) and a clearer subject describing the defensive parse_response and error-handler behavior introduced in PR #77685 - Fix 2.1.25 changelog date (2026-05-02, was 2026-05-01) and rewrite its subject to reference SAML SSO authorization - Fix 2.1.15 changelog date (2026-04-01, was 2026-03-27) - Fix 2.1.9 changelog date (2026-02-06, was 2026-01-27); the dependency-update PR was opened 2026-01-27 but merged 2026-02-06 - Reword 2.1.26 entry to describe the generic 'feature is disabled' skip behavior - Remove the duplicate 'Max Waiting Time' setup step (one copy still claimed default 10 min / max 60 min, contradicting the spec which is default 120 min / max 240 min) - Tighten the 'Start date' note for clarity and grammar - Rewrite 'Permissions and scopes' to call out the minimum scope set, OAuth-vs-PAT differences, per-stream requirements (Collaborators, Teams, Projects), and the SAML SSO authorization requirement that produces 403s on PATs that have the right scopes but are not SSO-authorized
What
Resolves https://github.com/airbytehq/airbyte-internal-issues/issues/16281:
Several
parse_responsemethods in source-github and two functions inerrors_handlers.pyassume specific JSON response shapes. When GitHub returns unexpected responses (HTML error pages from load balancers, malformed JSON, missing keys), these methods crash withTypeError/KeyError/json.JSONDecodeErrorand bypass the CDK error pipeline.How
Applied a consistent defensive pattern to each affected site:
streams.py (4 sites):
Workflows.parse_response— wrap.json()in try/exceptValueError, validateworkflowskey is a listWorkflowRuns.parse_response— same pattern forworkflow_runskeyWorkflowJobs.parse_response— same pattern forjobskey (was using["jobs"]which raisesKeyError)IssueTimelineEvents.parse_response— validate response is a list (not a dict error wrapper)errors_handlers.py (3 sites):
5.
is_conflict_with_empty_repository— wrap.json()in try/exceptValueError6.
GithubStreamABCErrorHandler.interpret_response— extract.json()call into_safe_json_check_graphql_rate_limitedhelper with try/except7.
GitHubGraphQLErrorHandler.interpret_response— extract.json().get("errors")into_safe_json_get_errorshelper with try/exceptAll sites log a warning and yield no records (or return False) instead of crashing.
Declarative-First Evaluation
Not applicable — source-github is a Python CDK connector (
cdk:python), not a declarative/manifest-only connector.Test Coverage
Added 36 new parametrized unit tests covering all 7 fixed sites:
test_workflows_parse_response_defensive— 7 cases (valid, empty list, HTML body, empty body, missing key, wrong type, None key)test_workflow_runs_parse_response_defensive— 7 casestest_workflow_jobs_parse_response_defensive— 8 cases (includes no-cursor record)test_issue_timeline_events_parse_response_defensive— 6 cases (includes dict-instead-of-list)test_is_conflict_with_empty_repository_defensive— 5 casestest_graphql_rate_limit_check_with_html_response— 1 casetest_graphql_error_handler_with_html_response— 1 casetest_graphql_error_handler_with_valid_errors— 1 caseAll 144 unit tests pass locally.
Review guide
source_github/streams.py— defensiveparse_responsefor Workflows, WorkflowRuns, WorkflowJobs, IssueTimelineEventssource_github/errors_handlers.py— safe JSON helpers and wrapped.json()callsunit_tests/test_stream.py— 36 new parametrized tests at bottom of fileUser Impact
Syncs that previously crashed with
TypeError,KeyError, orjson.JSONDecodeErrorwhen GitHub returned unexpected responses (HTML error pages, malformed JSON) will now log a warning and skip the affected page, allowing the sync to continue.Can this PR be safely reverted and rolled back?
Prior Art
Link to Devin session: https://app.devin.ai/sessions/ce182b4200d5420d8be36f435b7415a8
Important
⚡ Auto-merge enabled.
This PR is set to merge automatically when all requirements are met.