fix(sql): Add fallback to source_defined_primary_key in CatalogProvider#627
Conversation
- Fix CatalogProvider.get_primary_keys() to fall back to stream.source_defined_primary_key when primary_key is empty/None - Add comprehensive unit tests covering all fallback scenarios - Resolves bug where destinations ignore source-defined primary keys when configured primary key is not set - Affects all destinations using CDK SQL processor base classes Co-Authored-By: AJ Steers <aj@airbyte.io>
|
Original prompt from AJ Steers: |
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
|
Looks like our new test suite assumes all Python connectors are sources. Since this is a destination and doesn't support 'discover', the Nothing we need to solve here in this PR and not related to our change. |
|
/bump-version |
…y fix - Update airbyte-cdk dependency to use dev branch devin/1751064114-fix-primary-key-fallback - This branch contains the fix for primary key fallback logic in CatalogProvider - Allows end-to-end testing of the primary key fix with MotherDuck destination - References CDK PR: airbytehq/airbyte-python-cdk#627 Co-Authored-By: AJ Steers <aj@airbyte.io>
Address @aaronsteers feedback to use more concise approach: pks = configured_stream.primary_key or configured_stream.stream.source_defined_primary_key or [] This maintains exact same functionality while being more readable. Co-Authored-By: AJ Steers <aj@airbyte.io>
- Consolidate 7 individual test methods into 1 parametrized test with 6 scenarios - Use list[str] format for parameters with wrapping logic in test - Remove case normalization test since CatalogProvider doesn't handle normalization - Reduce code from ~150 lines to ~50 lines while maintaining full coverage Co-Authored-By: AJ Steers <aj@airbyte.io>
…behavior - Add explanation that method uses primary_key if set explicitly in configured catalog - Otherwise falls back to source_defined_primary_key if set - Addresses GitHub comment from @aaronsteers Co-Authored-By: AJ Steers <aj@airbyte.io>
- Fix Ruff Format Check CI failure - Apply proper formatting to expanded docstring in get_primary_keys method Co-Authored-By: AJ Steers <aj@airbyte.io>
… unset - Reverse priority order to favor source_defined_primary_key over primary_key - Return None when neither primary key field is set - Update all callers to handle None gracefully with 'pks or []' coalescing - Update unit tests to reflect new behavior and priority order - Addresses @aaronsteers feedback on primary key fallback logic Co-Authored-By: AJ Steers <aj@airbyte.io>
…perations - Replace 'or []' coalescing with explicit guard statements in merge methods - Raise AirbyteInternalError when no primary keys available for merge operations - Addresses @aaronsteers feedback on code clarity and error handling - Ensures merge operations fail fast when primary keys are missing Co-Authored-By: AJ Steers <aj@airbyte.io>
- Split long docstring line in get_primary_keys method across multiple lines - Addresses @aaronsteers feedback on line length issue in PR comment - Maintains readability while complying with formatting standards Co-Authored-By: AJ Steers <aj@airbyte.io>
source_defined_primary_key in CatalogProvider
…lues - Updated docstring to specify 'primary key column names' instead of just 'primary keys' - Added Returns section explaining the method returns column names or None - Addresses @dbgold17's clarifying question about terminology in GitHub comment Co-Authored-By: AJ Steers <aj@airbyte.io>
|
Hi David Gold (@dbgold17)! Great question about the terminology. I've just updated the docstring to clarify this. The method returns column names (not actual key values). Specifically:
The confusion was understandable since the original docstring just said "primary keys" which could be ambiguous. Thanks for pointing this out! |
David Gold (dbgold17)
left a comment
There was a problem hiding this comment.
discussed on a huddle. LGTM minus some variable name confusion
…ithub.com/airbytehq/airbyte-python-cdk into devin/1751064114-fix-primary-key-fallback
acb92dd
into
main
Related:
source_defined_primary_keywhen defined (CDK bump) airbyte#62133source_defined_primary_keyin CatalogProvider #627 (this pr)Fix primary key fallback in CatalogProvider for SQL destinations
Summary
This PR fixes a critical bug in the Airbyte Python CDK where the
CatalogProvider.get_primary_keys()method ignores source-defined primary keys when configured primary keys are empty or None. This affects all destinations using the CDK's SQL processor base classes, not just MotherDuck.Root Cause: The CDK's
CatalogProvider.get_primary_keys()method only usesConfiguredAirbyteStream.primary_keyand returns an empty list when it's falsy, without falling back tostream.source_defined_primary_keyas specified in the Airbyte protocol.Fix: Added fallback logic to use
stream.source_defined_primary_keywhenprimary_keyis empty/None, with comprehensive unit test coverage.Review & Testing Checklist for Human
source_defined_primary_keywhenprimary_keyis empty/Noneprimary_keystill work correctly (configured primary key should take precedence)configured_stream.stream.source_defined_primary_keylookup doesn't cause performance issuesRecommended Test Plan
primary_keybut non-emptysource_defined_primary_keyDiagram
graph TD A[airbyte_cdk/sql/shared/catalog_providers.py]:::major-edit B[unit_tests/sql/shared/test_catalog_providers.py]:::major-edit C[ConfiguredAirbyteStream.primary_key]:::context D[AirbyteStream.source_defined_primary_key]:::context E[SQL Destinations]:::context F[MotherDuck Destination]:::context G[DuckDB Destination]:::context H[Other SQL Destinations]:::context A --> C A --> D A --> E E --> F E --> G E --> H B --> A subgraph Legend L1[Major Edit]:::major-edit L2[Minor Edit]:::minor-edit L3[Context/No Edit]:::context end classDef major-edit fill:#90EE90 classDef minor-edit fill:#87CEEB classDef context fill:#FFFFFFNotes
primary_keyinConfiguredAirbyteStreamoverrides source-defined primary keys when set, andsource_defined_primary_keyprovides the default from the sourceImportant
Auto-merge enabled.
This PR is set to merge automatically when all requirements are met.