docs: Add Google-style docstrings to core Python CDK modules #852

Closed
Aaron ("AJ") Steers (aaronsteers) wants to merge 1 commit into main from
devin/1763581433-add-cdk-documentation

Conversation

@aaronsteers
Member

@aaronsteers Aaron ("AJ") Steers (aaronsteers) commented Nov 19, 2025

docs: Add Google-style docstrings to core Python CDK modules

Summary

Added concise Google-style docstrings to three core Python CDK modules to improve documentation for LLMs and connector developers:

  1. AbstractSource (airbyte_cdk/sources/abstract_source.py): Documented the base class and all public/abstract methods including check_connection(), streams(), discover(), check(), read(), and helper methods.

  2. HttpStream (airbyte_cdk/sources/streams/http/http.py): Documented the class and key abstract methods that connector developers must implement: url_base, path(), next_page_token(), parse_response(), plus common override methods like request_params() and request_headers().

  3. YamlDeclarativeSource (airbyte_cdk/sources/declarative/yaml_declarative_source.py): Documented the entry point class and __init__() method with full parameter descriptions.

Style changes:

  • Converted old :param: style to Google-style Args:/Returns: format
  • Used concise one-line docstrings for simple methods
  • Added multi-line docstrings with Args/Returns sections for complex methods
  • Focused on public-facing APIs used by connector developers
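To illustrate the conversion described in the style changes, here is a minimal sketch of one docstring written first in the older :param: style and then in Google style. The function names and wording are hypothetical and are not taken from the PR diff.

```python
from typing import Any, Mapping, Optional


def request_params_old(
    next_page_token: Optional[Mapping[str, Any]] = None,
) -> Mapping[str, Any]:
    """
    Build the query parameters for the next request.

    :param next_page_token: token returned by the previous page, if any
    :return: mapping of query parameter names to values
    """
    return dict(next_page_token or {})


def request_params_new(
    next_page_token: Optional[Mapping[str, Any]] = None,
) -> Mapping[str, Any]:
    """Build the query parameters for the next request.

    Args:
        next_page_token: Token returned by the previous page, if any.

    Returns:
        Mapping of query parameter names to values.
    """
    return dict(next_page_token or {})
```

Both functions behave identically; only the docstring layout differs.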

Review & Testing Checklist for Human

  • Verify Google-style format correctness: Check that docstrings follow Google style guide and will render properly in PDOCs (first line on same line as """, ends with period, blank line before Args if present)
  • Check documentation completeness: Review if HttpStream needs more methods documented beyond the key abstract methods (the file is 674 lines but only ~10 methods were documented)
  • Verify accuracy: Confirm that the docstring descriptions accurately match what the methods actually do, especially for check_connection(), parse_response(), and state management methods
  • Assess detail level: Verify the balance between concise one-liners and detailed Args/Returns sections is appropriate for the target audience (LLMs and connector developers)
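The format rules in the first checklist item (summary on the same line as the opening quotes, summary ends with a period, blank line before Args) could be spot-checked with something like the sketch below. The helper name google_style_problems is hypothetical, and the check is deliberately rough: it is not a replacement for a real docstring linter.

```python
import inspect
from typing import List


def google_style_problems(doc: str) -> List[str]:
    """Return a list of formatting problems found in a raw docstring."""
    problems = []

    # The summary must share a line with the opening quotes, so the raw
    # docstring must not start with a newline.
    summary = doc.split("\n")[0].strip()
    if not summary:
        problems.append("summary is not on the same line as the opening quotes")
    elif not summary.endswith("."):
        problems.append("summary does not end with a period")

    # Section headers such as Args: need a blank line before them.
    body = inspect.cleandoc(doc).split("\n")
    for index, line in enumerate(body):
        header = line.strip()
        if header in ("Args:", "Returns:") and index > 0 and body[index - 1].strip():
            problems.append(f"no blank line before {header}")
    return problems
```

Passing a function's `__doc__` attribute to this helper returns an empty list when all three rules hold.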

Notes

  • This is the first pass on Python CDK documentation as requested. Kotlin Bulk CDK documentation will follow in a separate PR.
  • Lint and format checks passed locally with Ruff
  • Some methods in HttpStream remain undocumented - these can be addressed in follow-up work if needed
  • No inline TODO comments were added for unclear code sections (attempted but edit tool rejected as "extraneous comments")

Session details:

Summary by CodeRabbit

  • Documentation
    • Comprehensive improvements to internal docstrings and API documentation across core source components for enhanced clarity and consistency.
    • Clarified connection validation behavior with explicit documentation of success status and error handling semantics.
    • Enhanced HTTP stream, abstract source, and YAML-based source documentation standards for better maintainability and developer guidance.

- Update AbstractSource class and methods with concise docstrings
- Update HttpStream class and abstract methods with clear documentation
- Update YamlDeclarativeSource with initialization parameter docs
- Convert old :param: style to Google-style Args/Returns format
- Focus on public-facing APIs used by connector developers

Co-Authored-By: AJ Steers <aj@airbyte.io>
Copilot AI review requested due to automatic review settings November 19, 2025 19:50
@devin-ai-integration
Contributor

Original prompt from AJ Steers
Received message in Slack channel #ask-devin-ai:

@Devin - we need to improve our code documentation across Kotlin and Python CDKs. Let's do a first pass on the Kotlin source CDK and the Python CDK. From each CDK, pick two or three modules that are public facing, meaning they are leveraged directly by connectors, and let's add the minimum documentation needed to explain the code to an LLM that will need to write code that interfaces with it. For Python CDK, these comments will render into the automated PDOCs. For the Kotlin CDK, he should render into a new auto generated KT docs website that we are not building yet but we will soon. Interview me before we start to make sure we agree on the right level of documentation.
Thread URL: https://airbytehq-team.slack.com/archives/C08BHPUMEPJ/p1763579871483699

@devin-ai-integration
Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1763581433-add-cdk-documentation#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1763581433-add-cdk-documentation

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /prerelease - Triggers a prerelease publish with default arguments
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment

@github-actions github-actions bot added the docs label Nov 19, 2025
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds Google-style docstrings to three core Python CDK modules to improve documentation clarity for LLMs and connector developers. The changes convert existing documentation from :param: style to modern Google-style format with Args: and Returns: sections.

Key changes:

  • Converted docstrings to concise Google-style format across base classes
  • Added structured Args: and Returns: sections for complex methods
  • Simplified docstrings for straightforward methods to one-line descriptions

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File | Description
airbyte_cdk/sources/streams/http/http.py | Updated HttpStream class and method docstrings to Google-style format, documenting pagination, parsing, and request configuration methods
airbyte_cdk/sources/declarative/yaml_declarative_source.py | Enhanced YamlDeclarativeSource class and __init__ method with detailed parameter descriptions
airbyte_cdk/sources/abstract_source.py | Converted AbstractSource base class docstrings to Google-style, documenting connection validation, stream management, and message handling

"""Returns the list of streams available in this source connector.

Args:
    config: User-provided configuration for initializing streams.

Copilot AI Nov 19, 2025

The docstring for streams() is missing a Returns: section to document that it returns a List[Stream], which would be helpful for developers implementing this abstract method.

Suggested change
-     config: User-provided configuration for initializing streams.
+     config: User-provided configuration for initializing streams.
+ Returns:
+     List[Stream]: The list of stream instances available for this source connector.

Member Author

aside (Devin-ignore) -

No, Copilot. No point in documenting what the function signature already declares. DRY is better.
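For context on the point above, here is a minimal sketch of how the return type is already recoverable from the annotation alone, without a Returns: section. The Stream class is a hypothetical stand-in for the CDK's class, and streams() is written as a plain function rather than a method to keep the example self-contained.

```python
import inspect
from typing import Any, List, Mapping, get_type_hints


class Stream:
    """Hypothetical stand-in for the CDK's Stream class."""


def streams(config: Mapping[str, Any]) -> List[Stream]:
    """Returns the list of streams available in this source connector.

    Args:
        config: User-provided configuration for initializing streams.
    """
    return [Stream()]


# Documentation tools and LLM tooling can read the return type from the
# annotation, so the docstring does not need to repeat it.
return_type = get_type_hints(streams)["return"]
signature = inspect.signature(streams)
print(return_type)
print(signature)
```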

@coderabbitai
Contributor

coderabbitai bot commented Nov 19, 2025

📝 Walkthrough

Walkthrough

Documentation updates across three source files refining docstrings to clarify method purposes, parameters, and return semantics. One minor behavioral change: the check() method now explicitly returns Status.SUCCEEDED instead of implicit success signaling.

Changes

Cohort / File(s) | Summary
Documentation & Docstring Refinements: airbyte_cdk/sources/abstract_source.py, airbyte_cdk/sources/declarative/yaml_declarative_source.py, airbyte_cdk/sources/streams/http/http.py | Expanded and clarified docstrings across class and method definitions to better document parameters, return values, and responsibilities. No signature changes or external API modifications.
Behavioral Update: airbyte_cdk/sources/abstract_source.py | check() method now returns an explicit Status.SUCCEEDED status upon successful connection validation, replacing previous implicit success signaling.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Note: The explicit Status.SUCCEEDED return in check() warrants verification to ensure downstream components properly handle the now-explicit success status and no callers relied on the previous implicit behavior.
  • Remaining changes are largely repetitive docstring updates following consistent patterns across files.

Suggested labels

docs

Suggested reviewers

  • maxi297
  • natikgadzhi
  • ChristoGrab

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title accurately and clearly describes the main change: adding Google-style docstrings to core CDK modules. It aligns with the changeset's primary focus across three files.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 80b7668 and c88aeed.

📒 Files selected for processing (3)
  • airbyte_cdk/sources/abstract_source.py (7 hunks)
  • airbyte_cdk/sources/declarative/yaml_declarative_source.py (2 hunks)
  • airbyte_cdk/sources/streams/http/http.py (6 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: ChristoGrab
Repo: airbytehq/airbyte-python-cdk PR: 58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
📚 Learning: 2024-11-18T23:40:06.391Z
Learnt from: ChristoGrab
Repo: airbytehq/airbyte-python-cdk PR: 58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.

Applied to files:

  • airbyte_cdk/sources/declarative/yaml_declarative_source.py
📚 Learning: 2024-11-15T01:04:21.272Z
Learnt from: aaronsteers
Repo: airbytehq/airbyte-python-cdk PR: 58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.

Applied to files:

  • airbyte_cdk/sources/declarative/yaml_declarative_source.py
🧬 Code graph analysis (2)
airbyte_cdk/sources/streams/http/http.py (2)
unit_tests/sources/mock_server_tests/mock_source_fixture.py (6)
  • request_params (129-138)
  • request_params (211-220)
  • request_params (277-283)
  • request_params (327-333)
  • next_page_token (56-62)
  • next_page_token (391-397)
unit_tests/sources/streams/http/test_http.py (8)
  • next_page_token (64-65)
  • next_page_token (136-141)
  • next_page_token (482-483)
  • next_page_token (866-867)
  • next_page_token (924-925)
  • next_page_token (972-973)
  • next_page_token (1091-1097)
  • next_page_token (1325-1326)
airbyte_cdk/sources/abstract_source.py (5)
airbyte_cdk/sources/file_based/file_based_source.py (3)
  • check_connection (145-232)
  • streams (234-331)
  • message_repository (138-143)
airbyte_cdk/sources/declarative/checks/check_stream.py (1)
  • check_connection (66-102)
airbyte_cdk/sources/streams/core.py (2)
  • logger (128-129)
  • Stream (118-703)
airbyte_cdk/entrypoint.py (2)
  • discover (260-269)
  • check (219-258)
airbyte_cdk/sources/message/repository.py (5)
  • consume_queue (59-60)
  • consume_queue (70-71)
  • consume_queue (93-95)
  • consume_queue (114-115)
  • consume_queue (138-139)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-shopify
  • GitHub Check: Pytest (Fast)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Analyze (python)
🔇 Additional comments (13)
airbyte_cdk/sources/declarative/yaml_declarative_source.py (2)

18-18: LGTM! Clear class documentation.

The class docstring clearly describes the entry point for YAML-based declarative sources and its purpose.


29-38: LGTM! Comprehensive parameter documentation.

The __init__ docstring follows Google-style conventions and clearly documents all parameters. The descriptions are concise and helpful for developers.

airbyte_cdk/sources/streams/http/http.py (5)

45-45: LGTM! Clear class documentation.

The class docstring effectively summarizes the purpose of HttpStream and highlights its key features (pagination and error handling).


109-114: LGTM! Clear property documentation.

The url_base and http_method property docstrings are concise and informative, with helpful examples and default values mentioned.


162-169: LGTM! Comprehensive pagination documentation.

The next_page_token docstring is well-structured with clear Args and Returns sections. The explanation of when to return None vs. a pagination token is particularly helpful for implementers.
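As a rough illustration of the None-versus-token contract mentioned in this comment, the sketch below shows one way a next_page_token implementation might look. FakeResponse is a hypothetical stand-in for the HTTP response object, the next_cursor field name is an assumption about an imaginary API, and the function is written without self so it runs on its own.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class FakeResponse:
    """Hypothetical stand-in for an HTTP response; only json() is modelled."""

    payload: Mapping[str, Any]

    def json(self) -> Mapping[str, Any]:
        return self.payload


def next_page_token(response: FakeResponse) -> Optional[Mapping[str, Any]]:
    """Return the token for the next page, or None when no pages remain.

    Args:
        response: The response from the most recent request.
    """
    # "next_cursor" is an assumed field name for this imaginary API.
    cursor = response.json().get("next_cursor")
    if not cursor:
        return None  # Signals that pagination is finished.
    return {"cursor": cursor}
```

Returning None stops pagination; any non-empty mapping is handed back to the stream so it can build the next request.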


179-197: LGTM! Clear method documentation.

The docstrings for path, request_params, and request_headers are concise and informative. The distinction that request_headers returns "non-authentication headers" is particularly helpful for clarity.


251-261: LGTM! Excellent method documentation.

The parse_response docstring is comprehensive and well-structured. It clearly documents all parameters and the return value, making it easy for developers to understand how to implement this abstract method.

airbyte_cdk/sources/abstract_source.py (6)

53-53: LGTM! Clear class documentation.

The class docstring effectively describes AbstractSource as the orchestrator for stream reading and state management, which accurately reflects its role in the CDK.


59-68: LGTM! Comprehensive connection validation documentation.

The check_connection docstring clearly documents the validation contract, including both the success and failure cases. The explanation of the return tuple is particularly helpful for implementers.
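The return tuple described here can be sketched as follows. This is a simplified illustration and not the CDK's code: Status and ConnectionStatus are hypothetical stand-ins for the protocol models, the api_key config field is invented, and the logger argument of the real methods is omitted.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Mapping, Optional, Tuple


class Status(Enum):
    """Hypothetical stand-in for the protocol's status enum."""

    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


@dataclass
class ConnectionStatus:
    """Hypothetical stand-in for the connection status message."""

    status: Status
    message: Optional[str] = None


def check_connection(config: Mapping[str, Any]) -> Tuple[bool, Optional[Any]]:
    """Validate the config and report (success, error).

    Args:
        config: User-provided configuration ("api_key" is an assumed field).
    """
    if not config.get("api_key"):
        return False, "api_key is missing"
    return True, None


def check(config: Mapping[str, Any]) -> ConnectionStatus:
    """Translate the check_connection tuple into a status message."""
    succeeded, error = check_connection(config)
    if not succeeded:
        return ConnectionStatus(Status.FAILED, repr(error))
    return ConnectionStatus(Status.SUCCEEDED)
```

A connector author only implements the tuple-returning function; the wrapper shows how a caller might turn that tuple into an explicit success or failure status.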


72-101: LGTM! Clear core method documentation.

The docstrings for streams, discover, check, and read are concise and accurately describe each method's purpose. The streams docstring appropriately documents the config parameter.


210-231: LGTM! Clear internal method documentation.

The docstrings for _serialize_exception, raise_exception_on_missing_stream, and _read_stream are well-written and accurately describe their purposes. The mention of "optional stream-specific error message" in _serialize_exception is a helpful detail.


288-311: LGTM! Clear message handling documentation.

The docstrings for _emit_queued_messages, _get_message, and message_repository clearly describe the message handling flow. The _get_message docstring helpfully mentions both the conversion and passthrough cases.


315-322: LGTM! Clear feature documentation with appropriate warning.

The stop_sync_on_stream_failure docstring is comprehensive and clearly explains the behavior. The WARNING about the in-development status is appropriate and helps developers understand the stability expectations. The explanation of default vs. override behavior is also helpful, wdyt?

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

PyTest Results (Fast)

3 813 tests  ±0   3 801 ✅ ±0   6m 21s ⏱️ -10s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit c88aeed. ± Comparison against base commit 80b7668.

@github-actions

PyTest Results (Full)

3 816 tests  ±0   3 804 ✅ ±0   11m 2s ⏱️ +6s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit c88aeed. ± Comparison against base commit 80b7668.

@aaronsteers
Member Author

Closing for now. No further action needed. I think this is fine as a test. Next time I might consider doing all source classes in one PR and all stream classes in a separate PR, etc.

2 participants