
docs: document primary key precedence in airbyte-protocol#62435

Closed
devin-ai-integration[bot] wants to merge 1 commit into master from devin/1751154418-document-primary-key-precedence

@devin-ai-integration devin-ai-integration Bot commented Jun 28, 2025

Document primary key precedence in airbyte-protocol

Summary

This PR documents primary key precedence in the Airbyte protocol specification. Based on analysis of the platform, CDK, and connector implementations, it describes the existing behavior in which source_defined_primary_key takes precedence over the user-configured primary_key.

Key Changes:

  • Added source_defined_primary_key field documentation to the AirbyteStream section
  • Added new "Logic for resolving Primary Key" section following the pattern of existing cursor field precedence documentation
  • Updated ConfiguredAirbyteStream primary_key field description to reference the new precedence rules
  • Applied changes consistently across main documentation and both versioned files (v1.6, v1.7)

Technical Context:
The investigation revealed that Airbyte intentionally prioritizes source-defined primary keys as immutable data integrity constraints rather than user-overridable defaults. This ensures data consistency by leveraging source expertise (e.g., actual database primary keys, API entity identifiers) over user preferences.
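
The precedence described above can be sketched in a few lines of Python. This is a minimal illustration, not the platform's actual code: the function name `resolve_primary_key` is hypothetical, the configured stream is modeled as a plain dictionary, and an empty or missing `source_defined_primary_key` is assumed to mean the source defines no key.

```python
from typing import Any, Dict, List

# In the protocol, a primary key is a list of field paths,
# and each path is itself a list of strings (e.g. [["id"]]).
PrimaryKey = List[List[str]]


def resolve_primary_key(configured_stream: Dict[str, Any]) -> PrimaryKey:
    """Return the effective primary key for a configured stream.

    Hypothetical helper: mirrors the precedence this PR documents,
    where the source-defined key wins over the configured one.
    """
    stream = configured_stream.get("stream", {})
    source_defined = stream.get("source_defined_primary_key") or []
    configured = configured_stream.get("primary_key") or []
    # Assumption: an empty source-defined key means "not defined",
    # so the configured key is used as the fallback.
    return source_defined if source_defined else configured


example = {
    "stream": {"name": "users", "source_defined_primary_key": [["id"]]},
    "primary_key": [["email"]],
}
print(resolve_primary_key(example))
```

In this sketch the configured `primary_key` is only consulted when the source defines none, which matches the rule stated in the Summary.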

Review & Testing Checklist for Human

  • Verify precedence logic accuracy: Test with real connectors that have both source_defined_primary_key and user-configured primary_key to confirm documented behavior matches actual system behavior
  • Check documentation consistency: Ensure all three protocol documentation files have identical changes and no copy-paste errors
  • Validate integration with existing patterns: Confirm the new "Logic for resolving Primary Key" section follows the same style and structure as the existing "Logic for resolving the Cursor Field" section
  • Test documentation build: Verify the updated documentation builds successfully in the docusaurus environment without broken links
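
The consistency check in the list above could be partly automated. The following is a rough sketch, assuming the three file paths shown in the diagram below and the section title quoted in this PR; the function name `files_missing_section` is hypothetical, and a plain substring match is only a coarse signal that the section was added.

```python
from pathlib import Path
from typing import Iterable, List

# Section title as quoted in this PR description.
SECTION_TITLE = "Logic for resolving Primary Key"

# Paths as listed in the PR's diagram (relative to the repository root).
DOC_FILES = [
    "docs/platform/understanding-airbyte/airbyte-protocol.md",
    "docusaurus/platform_versioned_docs/version-1.6/understanding-airbyte/airbyte-protocol.md",
    "docusaurus/platform_versioned_docs/version-1.7/understanding-airbyte/airbyte-protocol.md",
]


def files_missing_section(paths: Iterable[str], title: str = SECTION_TITLE) -> List[str]:
    """Return the paths that do not exist or do not contain the title."""
    missing = []
    for raw in paths:
        path = Path(raw)
        if not path.is_file() or title not in path.read_text(encoding="utf-8"):
            missing.append(str(raw))
    return missing


if __name__ == "__main__":
    for name in files_missing_section(DOC_FILES):
        print(f"missing section: {name}")
```

Run from the repository root, this prints one line per file that lacks the section. It does not verify that the three copies are word-for-word identical; a diff of the changed hunks would still be needed for that.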

Recommended Test Plan:

  1. Set up a connector with both source-defined and user-configured primary keys
  2. Verify the system uses the source-defined key and validates matching when both are present
  3. Check that the documentation builds cleanly and links resolve correctly
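
Step 2 of the plan can be illustrated with a small mismatch check. This is a sketch under stated assumptions, not the platform's validation code: `describe_primary_key_mismatch` is a hypothetical name, the configured stream is a plain dictionary, and two keys are treated as matching when they contain the same field paths regardless of order. Because the PR notes that a mismatch may or may not result in an error, the function only reports the discrepancy and never raises.

```python
from typing import Any, Dict, Optional


def describe_primary_key_mismatch(configured_stream: Dict[str, Any]) -> Optional[str]:
    """Return a message if both keys are set and differ, else None."""
    stream = configured_stream.get("stream", {})
    source_defined = stream.get("source_defined_primary_key") or []
    configured = configured_stream.get("primary_key") or []
    # Nothing to compare unless both keys are present.
    if not source_defined or not configured:
        return None
    # Assumption: order of field paths is not significant.
    if {tuple(p) for p in source_defined} == {tuple(p) for p in configured}:
        return None
    return (
        f"stream {stream.get('name')!r}: configured primary_key {configured} "
        f"differs from source_defined_primary_key {source_defined}"
    )


mismatched = {
    "stream": {"name": "users", "source_defined_primary_key": [["id"]]},
    "primary_key": [["email"]],
}
print(describe_primary_key_mismatch(mismatched))
```

A test harness could call this for every stream in a configured catalog and log the returned messages next to the key the system actually used.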

Diagram

%%{ init : { "theme" : "default" }}%%
graph TD
    MainDoc["docs/platform/understanding-airbyte/airbyte-protocol.md"]:::major-edit
    Version16["docusaurus/platform_versioned_docs/version-1.6/understanding-airbyte/airbyte-protocol.md"]:::major-edit  
    Version17["docusaurus/platform_versioned_docs/version-1.7/understanding-airbyte/airbyte-protocol.md"]:::major-edit
    
    CatalogHelper["AirbyteCatalogHelper.kt<br/>(selectPrimaryKey method)"]:::context
    CDKCode["CDK Primary Key Logic"]:::context
    ConnectorImpls["Connector Implementations<br/>(JDBC, API sources)"]:::context
    
    MainDoc --> CatalogHelper
    Version16 --> CatalogHelper  
    Version17 --> CatalogHelper
    CatalogHelper --> CDKCode
    CDKCode --> ConnectorImpls
    
    subgraph Legend
        L1["Major Edit"]:::major-edit
        L2["Minor Edit"]:::minor-edit
        L3["Context/No Edit"]:::context
    end
    
    classDef major-edit fill:#90EE90
    classDef minor-edit fill:#87CEEB  
    classDef context fill:#FFFFFF

Notes

  • This documentation change is based on comprehensive analysis of the codebase behavior rather than new feature implementation
  • The validation language was specifically updated per user feedback to be more nuanced: "Mismatches may or may not result in an error, depending upon when and where the discrepancy is identified"
  • All changes maintain consistency with existing documentation patterns, particularly following the cursor field precedence section structure
  • Session Info: Requested by Aaron ("AJ") Steers (@aaronsteers), Devin session: https://app.devin.ai/sessions/5de41b96ea294d658df32235daf66f30

- Add source_defined_primary_key field documentation to AirbyteStream
- Add 'Logic for resolving Primary Key' section following cursor field pattern
- Update ConfiguredAirbyteStream primary_key description with precedence reference
- Document that source_defined_primary_key takes precedence over configured primary_key
- Include validation logic: mismatches may or may not result in errors depending on context
- Apply changes consistently across main and versioned documentation files

This documents the existing behavior discovered through comprehensive codebase analysis
where source-defined primary keys are treated as data integrity constraints that
cannot be overridden by user configuration.

Co-Authored-By: AJ Steers <aj@airbyte.io>

vercel Bot commented Jun 28, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| airbyte-docs | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Jun 29, 2025 0:08am |

@devin-ai-integration
Contributor Author

Original prompt from AJ Steers:

@Devin - Do some research to make sure the order of precedence is correct: "We will use `primary_key` if it is set explicitly in the configured catalog, otherwise we will fall back to `source_defined_primary_key`, if set." I think the easiest way to confirm is to look at destination-mysql or destination-snowflake implementations, and or the java/kotlin destination CDK(s).


devin-ai-integration Bot commented Jun 28, 2025

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions
Contributor

👋 Greetings, Contributor!

Here are some helpful tips and reminders for your convenience.

Helpful Resources

PR Slash Commands

Airbyte Maintainers (that's you!) can execute the following slash commands on your PR:

  • /format-fix - Fixes most formatting issues.
  • /bump-version - Bumps connector versions.
    • You can specify a custom changelog by passing changelog. Example: /bump-version changelog="My cool update"
    • Leaving the changelog arg blank will auto-populate the changelog from the PR title.
  • /run-cat-tests - Runs legacy CAT tests (Connector Acceptance Tests)
  • /build-connector-images - Builds and publishes a pre-release docker image for the modified connector(s).

@aaronsteers Aaron ("AJ") Steers (aaronsteers) left a comment

This matches my understanding (after not a small amount of spelunking!) and it fills a documentation gap I've run into more than once.

Next week I'll ask for review also from someone in the Move team, to ensure this is accurate.

@devin-ai-integration
Contributor Author

Closing due to inactivity for more than 7 days.

3 participants