feat(sdm-cli): add new args --manifest-path and --components-path to SDM CLI#556
Conversation
This reverts commit bb383d9.
5b5d011 to
bd452a8
Compare
There was a problem hiding this comment.
Pull Request Overview
This PR adds two optional arguments (--manifest-path and --components-path) to the SDM CLI tool to support remote manifest injection and loading of custom components. Key changes include:
- Updates to the argument parser and tests to handle the new CLI options.
- Enhancements in the source creation logic to inject the manifest and register components.
- Minor type-hint adjustments in the connector-related modules.
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| unit_tests/test_entrypoint.py | Updated tests to validate new CLI arguments. |
| unit_tests/source_declarative_manifest/test_source_declarative_w_custom_components.py | Added tests for registering custom components from file. |
| unit_tests/source_declarative_manifest/test_source_declarative_remote_manifest.py | Added tests for parsing a manifest file. |
| unit_tests/source_declarative_manifest/conftest.py | Introduced helper functions for component verification and temporary file management. |
| airbyte_cdk/entrypoint.py | Updated argument parsers to include new optional arguments. |
| airbyte_cdk/connector.py | Modified type hints for improved clarity. |
| airbyte_cdk/cli/source_declarative_manifest/_run.py | Integrated new argument handling and adjusted parsing/injection logic for manifests and components. |
📝 WalkthroughWalkthroughThe changes introduce support for injecting external declarative manifests and Python component files into the Airbyte CLI via new Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant CLI
participant _run.py
participant ManifestLoader
participant ComponentsLoader
User->>CLI: airbyte check/discover/read --manifest-path X --components-path Y
CLI->>_run.py: parse_args()
_run.py->>ManifestLoader: _parse_manifest_from_file(X)
ManifestLoader-->>_run.py: Manifest dict
_run.py->>ComponentsLoader: _register_components_from_file(Y)
ComponentsLoader-->>_run.py: Components registered
_run.py->>_run.py: Insert manifest into config
_run.py->>_run.py: Continue with processing (e.g., create source)
Would you like me to help draft some additional tests for edge cases in manifest loading or component registration, wdyt? Note ⚡️ AI Code Reviews for VS Code, Cursor, WindsurfCodeRabbit now has a plugin for VS Code, Cursor and Windsurf. This brings AI code reviews directly in the code editor. Each commit is reviewed immediately, finding bugs before the PR is raised. Seamless context handoff to your AI code agent ensures that you can easily incorporate review feedback. Note ⚡️ Faster reviews with cachingCodeRabbit now supports caching for code and dependencies, helping speed up reviews. This means quicker feedback, reduced wait times, and a smoother review experience overall. Cached data is encrypted and stored securely. This feature will be automatically enabled for all accounts on May 16th. To opt out, configure 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
⏰ Context from checks skipped due to timeout of 90000ms (9)
🔇 Additional comments (10)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (7)
unit_tests/test_entrypoint.py (2)
120-145: Looks good – but would an explicit assertion on the new CLI defaults make the intent clearer?The new
components_pathandmanifest_pathkeys are included inexpected_argswithNonevalues, which is great.
Would you consider adding an explicit assertion that these keys are absent from the generated CLI string (because_as_arglistskips falsy values) to document that we rely on the parser defaults rather than on the presence of the flags, wdyt?
187-234: Patching the class method may hide real-world regressions – could we exercise the real parser instead?
test_parse_new_argsmonkey-patchesAirbyteEntrypoint.parse_argsto return a cannedNamespace. This verifies that the calling code accepts the new attributes, but it doesn’t validate that the actual parser wiring recognises--manifest-pathand--components-path.Would it be safer to call the real parser (as the earlier parametrised test does) and only patch if the parser isn’t fully initialised? That way the test would fail if someone removes the arguments from the CLI by accident, wdyt?
unit_tests/source_declarative_manifest/conftest.py (2)
60-69: Minor: consider adding a newline at EOF in the sample component stringSome linters complain when the embedded code lacks a trailing newline. Adding it keeps
black/flake8quiet, wdyt?
71-84: Why not assert the class path too?
verify_components_loadedchecks the function but not theSimpleClass.
Would addingassert hasattr(components, "SimpleClass")(and maybe instantiating it) give fuller confidence that class objects also survive the double-registration trick, wdyt?airbyte_cdk/cli/source_declarative_manifest/_run.py (3)
199-201: Component loader: should we validate file existence first?
_register_components_from_fileproceeds to create a spec even if the path does not exist.spec_from_file_locationwill then returnNone, but that is detected after the more cryptic call. Pre-emptively checkingcomponents_path.exists()and raising a clearFileNotFoundErrorcould improve UX, wdyt?
255-265: Return type can be tightened & error message clarified
- The function always returns a
dictor raises –| Noneisn’t needed.- The except block re-wraps every exception as
ValueError, which hides the original stack trace; chaining (raise … from error) would keep context.Would you consider:
def _parse_manifest_from_file(filepath: str) -> dict[str, Any]: try: with open(filepath, "r", encoding="utf-8") as manifest_file: manifest_content = yaml.safe_load(manifest_file) if not isinstance(manifest_content, dict): raise TypeError(f"Manifest must be a mapping, got {type(manifest_content).__name__}") return manifest_content except Exception as exc: raise ValueError(f"Failed to load manifest file {filepath}") from excwdyt?
267-289: Potential double-registration side-effectsRegistering the same module object under two names solves the dataclass issue – nice hack!
Have we considered cleaning these entries up after the sync finishes to avoid pollutingsys.modulesfor subsequent runs in the same interpreter (e.g. unit tests or local REPL sessions)? Maybe afinallyblock inrun()could delete them, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting
📒 Files selected for processing (7)
airbyte_cdk/cli/source_declarative_manifest/_run.py(6 hunks)airbyte_cdk/connector.py(2 hunks)airbyte_cdk/entrypoint.py(3 hunks)unit_tests/source_declarative_manifest/conftest.py(2 hunks)unit_tests/source_declarative_manifest/test_source_declarative_remote_manifest.py(2 hunks)unit_tests/source_declarative_manifest/test_source_declarative_w_custom_components.py(3 hunks)unit_tests/test_entrypoint.py(4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
unit_tests/source_declarative_manifest/test_source_declarative_w_custom_components.py (2)
airbyte_cdk/cli/source_declarative_manifest/_run.py (1)
_register_components_from_file(267-288)unit_tests/source_declarative_manifest/conftest.py (2)
verify_components_loaded(71-83)components_file(87-103)
🔇 Additional comments (11)
unit_tests/source_declarative_manifest/test_source_declarative_remote_manifest.py (3)
5-7: New imports look good!Clean addition of necessary imports for the new test function.
11-12: Updated imports to include new function under test.Good practice to include the specific function being tested in the imports.
36-41: Well-implemented test for manifest parsing functionality.The test effectively mocks file reading and validates the correct parsing behavior of
_parse_manifest_from_file. Great to see proper test coverage for the new functionality.unit_tests/source_declarative_manifest/test_source_declarative_w_custom_components.py (3)
19-21: Added import for new function being tested.Good addition of the function to be tested in the imports list.
37-40: Good refactoring to use shared test utilities.Moving common code to conftest.py improves maintainability. The shared utilities
SAMPLE_COMPONENTS_PY_TEXTandverify_components_loadedcan now be reused across tests.
289-295: Well-structured test for component loading functionality.The test properly checks the dynamic loading and registration of components by:
- Using the components_file fixture
- Calling the function under test
- Verifying the components were correctly registered
This complements the manifest loading tests and provides good coverage for the new CLI arguments.
airbyte_cdk/connector.py (2)
11-11: Added MutableMapping to imports.Good addition to support the type annotation change.
44-46: Changed config type from Mapping to MutableMapping.This change allows for mutation of the configuration, which is necessary for the new functionality that injects manifests and components. The change is appropriately minimal and focused.
Have you considered any backward compatibility implications? Since
MutableMappingis a subclass ofMapping, most existing code should continue to function properly, but it might be worth noting this change in documentation or release notes, wdyt?airbyte_cdk/entrypoint.py (3)
87-98: Added new CLI arguments for manifest and components paths.The implementation adds two optional arguments to the
checkcommand:
--manifest-path: For injecting a YAML manifest file--components-path: For specifying custom componentsThese align well with the PR objectives and provide proper help text.
110-121: Added manifest and components path arguments to discover command.Good consistency in adding the same arguments to the
discovercommand with identical help text.
141-152: Added manifest and components path arguments to read command.The arguments are consistently added to the
readcommand, maintaining parity with the other commands.I notice these arguments are not added to the
speccommand. According to the PR objectives, support for--manifest-pathin thespeccommand will be added in a future PR, which seems like a reasonable approach.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (3)
airbyte_cdk/cli/source_declarative_manifest/_run.py (3)
94-99: IsAirbyteEntrypoint.parse_argsguaranteed to be a classmethod?The code is calling
AirbyteEntrypoint.parse_args(args)directly as if it were a class method. From reviewing other code in the codebase, this method seems to be typically called on an instance. If it's actually an instance method, this would raise a TypeError at runtime.Would it make sense to either:
- Instantiate before calling:
AirbyteEntrypoint(MockSource()).parse_args(args)- Or verify that
parse_argsis decorated with@classmethodin theAirbyteEntrypointclass?Just want to make sure this works reliably at runtime, wdyt?
#!/bin/bash # Verify if parse_args is a classmethod or instance method in AirbyteEntrypoint rg "def parse_args" -A 3 -B 2 --context | grep -A 7 "class AirbyteEntrypoint"
181-186: Consider handling empty manifests explicitly.The current code silently skips injection if
injected_manifestis falsy (which happens if the YAML file is empty), which could lead to confusing "manifest missing" errors later.Would it be more user-friendly to check specifically if
injected_manifest is Noneafter parsing and raise a clear error immediately if the manifest file is empty or invalid? This would help users identify the root cause more easily, wdyt?- if hasattr(parsed_args, "manifest_path") and parsed_args.manifest_path: - injected_manifest = _parse_manifest_from_file(parsed_args.manifest_path) - if injected_manifest: - config["__injected_declarative_manifest"] = injected_manifest + if hasattr(parsed_args, "manifest_path") and parsed_args.manifest_path: + injected_manifest = _parse_manifest_from_file(parsed_args.manifest_path) + if injected_manifest is None: + raise ValueError(f"Manifest file at {parsed_args.manifest_path} is empty or invalid") + config["__injected_declarative_manifest"] = injected_manifest
258-258: Good inclusion of explicit UTF-8 encoding.The explicit file encoding specification addresses the previous review comment and ensures consistent behavior across different platforms.
🧹 Nitpick comments (1)
airbyte_cdk/cli/source_declarative_manifest/_run.py (1)
255-265: Well-implemented manifest parsing with proper error handling.The
_parse_manifest_from_filefunction looks good with proper error handling, validation, and explicit UTF-8 encoding.One small suggestion: in the error message (line 264), you might want to include the specific validation error message when the manifest isn't a dictionary to help users debug more quickly.
- raise ValueError(f"Failed to load manifest file from {filepath}: {error}") + if isinstance(error, ValueError) and "Manifest must be a dictionary" in str(error): + raise ValueError(f"Failed to load manifest file from {filepath}: {error}") + else: + raise ValueError(f"Failed to load manifest file from {filepath}: {error}. Ensure the file contains valid YAML and is a dictionary.")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting
📒 Files selected for processing (1)
airbyte_cdk/cli/source_declarative_manifest/_run.py(6 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (5)
- GitHub Check: Analyze (python)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: SDM Docker Image Build
🔇 Additional comments (6)
airbyte_cdk/cli/source_declarative_manifest/_run.py (6)
56-60: Consistent type change fromMappingtoMutableMapping.The change from
MappingtoMutableMappinglooks good. This aligns with the need to modify the config by injecting manifests.
175-179: Clear error handling for missing manifest.Good error message that explicitly mentions the alternative
--manifest-pathargument. This helps users understand how to provide the manifest correctly.
267-289: Dynamic component loading looks good.The implementation for dynamically loading Python components is well structured. The approach of registering the module under two different names (
componentsandsource_declarative_manifest.components) is a clever way to handle dataclass module resolution issues.The previous review comment noted that hardcoding "components" as the module name might lead to conflicts, but I see this was intentionally kept as-is due to the current contract requirements. That makes sense for consistency.
172-174: Consistent argument parsing approach.Using
AirbyteEntrypoint.parse_argsacross the codebase provides consistency in how CLI arguments are handled. This is a good refactoring.
199-201: Components path integration is clean and concise.The check for components_path and conditional registration is implemented in a clean and straightforward way. Good addition!
229-253: Clean refactoring of input parsing.The refactored
_parse_inputs_into_config_catalog_statefunction with proper typing looks good. It cleanly separates the concerns of argument parsing and input processing.
…rbytehq/airbyte-python-cdk into christo/sdm-cli-manifest-path
Aaron ("AJ") Steers (aaronsteers)
left a comment
There was a problem hiding this comment.
Looks great, Christo Grabowski (@ChristoGrab). Thanks for including tests. 👍
What
Adds two new optional args to the
source-declarative-manifestCLI tool:--manifest-path: expects a string representing the relative path to a local manifest to inject--components-path: expects a string representing the relative path to a source's custom components moduleThis allows a user to pass a valid remote (local from a dev perspective) manifest file to be serialized and injected into the provided config when running
check,discoverandread, as well as dynamically load any custom components from the designated components file.Both arguments are optional and only have an effect when a local YAML source is not detected by the CLI (ie, not a built connector image)
Note on
specSpec is handled slightly differently than the other interfaces, and is currently hard-coded to spit out the SDM spec when run outside of a built image. I intend to add
--manifest-pathto the spec command as well in a follow up PR.Issue to add this is here: https://github.com/airbytehq/airbyte-internal-issues/issues/13052
For now, this update allows devs to test
check,discoverandreadinterfaces on manifest-only connectors locally.How to test
From the root of the airbyte-python-cdk repo:
You can now pass the arguments in SDM commands, ie:
Summary by CodeRabbit
--manifest-pathand--components-path) for relevant commands.