
fix: handle EOFError in monkey_patched_get_item for corrupted cache entries#984

Draft
devin-ai-integration[bot] wants to merge 1 commit into main from devin/1775605599-fix-eoferror-monkey-patch


@devin-ai-integration

Summary

Resolves https://github.com/airbytehq/oncall/issues/11909:

The monkey_patched_get_item function in http_client.py does not handle the EOFError raised when self.deserialize() encounters corrupted or truncated pickle data in the SQLite cache. This kind of corruption is possible because the cache uses fast_save=True + synchronous=OFF (as noted in existing code comments). The error propagates uncaught and crashes the sync.
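A minimal reproduction of the failure mode, as a sketch. It assumes plain `pickle` and an in-memory `sqlite3` table (hypothetical `responses` table) as stand-ins for the requests_cache serializer and cache schema; the real pipeline adds further serialization stages. An empty blob simulates a payload that never reached disk.

```python
import pickle
import sqlite3

# Hypothetical stand-in for the SQLite response cache (not the requests_cache schema).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE responses (key TEXT PRIMARY KEY, value BLOB)")

# A healthy entry round-trips normally.
con.execute("INSERT INTO responses VALUES (?, ?)", ("ok", pickle.dumps({"status": 200})))

# Simulate a write whose payload was lost (empty blob).
con.execute("INSERT INTO responses VALUES (?, ?)", ("corrupt", b""))


def unguarded_get(key: str):
    """Mirrors the pre-fix shape: deserialization errors escape to the caller."""
    row = con.execute("SELECT value FROM responses WHERE key = ?", (key,)).fetchone()
    if row is None:
        raise KeyError(key)
    return pickle.loads(row[0])


print(unguarded_get("ok"))  # {'status': 200}
try:
    unguarded_get("corrupt")
except EOFError as exc:
    # Without a guard, this exception would propagate up and abort the sync.
    print(f"EOFError: {exc!r}")
```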

The fix wraps the self.deserialize(key, row[0]) call in a try/except that catches EOFError and raises KeyError(key) instead, so the corrupted entry is treated as a cache miss and requests_cache transparently re-fetches from the upstream API. python-diskcache uses the same pattern for the same issue.
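The shape of the fix can be sketched against a simplified store. `SQLitePickleStore`, its table, and its plain-pickle `deserialize` are hypothetical stand-ins, not the CDK or requests_cache classes; only the `__getitem__` control flow (missing row or `EOFError` both become `KeyError(key)`) mirrors what the PR describes.

```python
import pickle
import sqlite3
from typing import Any


class SQLitePickleStore:
    """Hypothetical, simplified stand-in for a SQLite-backed pickle dict."""

    def __init__(self) -> None:
        self.con = sqlite3.connect(":memory:")
        self.con.execute("CREATE TABLE responses (key TEXT PRIMARY KEY, value BLOB)")

    def __setitem__(self, key: str, value: Any) -> None:
        self.con.execute(
            "INSERT OR REPLACE INTO responses VALUES (?, ?)", (key, pickle.dumps(value))
        )

    def deserialize(self, key: str, raw: bytes) -> Any:
        # Assumption: plain pickle; requests_cache uses a multi-stage serializer.
        return pickle.loads(raw)

    def __getitem__(self, key: str) -> Any:
        row = self.con.execute(
            "SELECT value FROM responses WHERE key = ?", (key,)
        ).fetchone()
        if not row:
            # Genuine miss: no row for this key.
            raise KeyError(key)
        try:
            return self.deserialize(key, row[0])
        except EOFError:
            # Truncated/empty payload: report a miss so the caller re-fetches.
            raise KeyError(key)
```

With this in place, a lookup of a corrupted key raises `KeyError` exactly like a key that was never cached, which is the signal the checklist below asks reviewers to verify.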

Observed in production via Sentry on source-hubspot v6.4.2 (stream: deal_splits), but because this is a CDK-level fix, it benefits all connectors that use HTTP caching.

Review & Testing Checklist for Human

  • Should the catch be broader than just EOFError? Other deserialization errors are possible (e.g., pickle.UnpicklingError). The Sentry traces specifically show EOFError, but consider whether a broader catch (e.g., Exception with a log warning) would be more resilient. A narrower catch is safer but may not cover all corruption modes.
  • Verify KeyError is the correct cache-miss signal. Confirm that raising KeyError from __getitem__ causes requests_cache to treat the entry as missing and re-fetch, rather than triggering unexpected behavior.
  • Run a connector that uses HTTP caching (e.g., source-hubspot) to verify no regression in normal cache hit/miss behavior.
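The first two review questions can be explored with a sketch like the following. It assumes a broader catch of `(EOFError, pickle.UnpicklingError)` with a logged warning (truncated pickle data can surface as either exception, depending on where the stream is cut), and pairs it with a hypothetical `get_or_fetch` helper that treats any `KeyError` as a miss and calls a caller-supplied `fetch`. `TolerantPickleDict`, `get_or_fetch`, and `fetch` are illustrative only; whether requests_cache's own lookup path reacts to `KeyError` the same way is exactly what the second checklist item asks reviewers to confirm.

```python
import logging
import pickle
from typing import Any, Callable, Dict

logger = logging.getLogger("cache_sketch")


class TolerantPickleDict:
    """Hypothetical in-memory dict of pickled blobs (stand-in for the SQLite table)."""

    def __init__(self) -> None:
        self.rows: Dict[str, bytes] = {}

    def __setitem__(self, key: str, value: Any) -> None:
        self.rows[key] = pickle.dumps(value)

    def __getitem__(self, key: str) -> Any:
        raw = self.rows[key]  # raises KeyError for a genuine miss
        try:
            return pickle.loads(raw)
        except (EOFError, pickle.UnpicklingError) as exc:
            # Broader variant: recognised pickle corruption becomes a logged miss.
            logger.warning("Discarding corrupted cache entry %r: %r", key, exc)
            raise KeyError(key) from exc


def get_or_fetch(cache: TolerantPickleDict, key: str, fetch: Callable[[str], Any]) -> Any:
    """Hypothetical caller: a KeyError (real miss or corrupted entry) triggers a re-fetch."""
    try:
        return cache[key]
    except KeyError:
        value = fetch(key)
        cache[key] = value  # overwrite the corrupted blob with a fresh payload
        return value
```

The trade-off matches the checklist: the narrow `EOFError` catch covers only what Sentry reported, while the tuple above also absorbs truncation that pickle reports as `UnpicklingError`, at the cost of potentially masking other unpickling bugs (hence the warning log).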

Notes

  • This is a non-breaking, CDK-level patch fix; no connector version bump is needed.
  • The existing test_assert_requests_cache_version test is unchanged and still guards against silent requests_cache upgrades.
  • Prior art: PR #725 introduced this monkey patch for sqlite3.InterfaceError; this extends it for EOFError.

Link to Devin session: https://app.devin.ai/sessions/01100afa4b594aa8a286f53cedc5984b


When the SQLite cache contains corrupted/truncated pickle data (which
can happen due to fast_save=True + synchronous=OFF), the requests_cache
deserialization pipeline raises EOFError. This was not caught, causing
the sync to crash.

Catch EOFError during deserialization and convert it to KeyError, which
requests_cache treats as a cache miss and transparently re-fetches from
the upstream API.

Co-Authored-By: bot_apk <apk@cognition.ai>
@devin-ai-integration

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions

github-actions bot commented Apr 7, 2026

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.


Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1775605599-fix-eoferror-monkey-patch#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1775605599-fix-eoferror-monkey-patch

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /prerelease - Triggers a prerelease publish with default arguments
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment

@github-actions

github-actions bot commented Apr 7, 2026

PyTest Results (Fast)

4 015 tests (+3): 4 004 passed ✅ (+3), 11 skipped 💤 (±0), 0 failed ❌ (±0)
1 suite (±0), 1 file (±0), run time 7m 5s ⏱️ (-38s)

Results for commit c1cb2c7. ± Comparison against base commit 4aaafcf.

@github-actions

github-actions bot commented Apr 8, 2026

PyTest Results (Full)

4 018 tests (+3): 4 006 passed ✅ (+3), 12 skipped 💤 (±0), 0 failed ❌ (±0)
1 suite (±0), 1 file (±0), run time 11m 14s ⏱️ (+15s)

Results for commit c1cb2c7. ± Comparison against base commit 4aaafcf.
