Skip to content

feat: implement validators, validation strategies, and RemapField transformer#533

Merged
Patrick Nilan (pnilan) merged 26 commits intomainfrom
pnilan/feat/implement-validators
May 14, 2025
Merged

feat: implement validators, validation strategies, and RemapField transformer#533
Patrick Nilan (pnilan) merged 26 commits intomainfrom
pnilan/feat/implement-validators

Conversation

@pnilan
Copy link
Copy Markdown
Contributor

@pnilan Patrick Nilan (pnilan) commented May 6, 2025

What

How

  • New Validator base class
  • New DpathValidator and PredicateValidator validator classes
  • New ValidationStrategy base class
  • New ValidateAdheresToSchema validation strategy class
  • New ConfigTransformation base class
  • New RemapField config transformation class

Recommended Review Order

  • validator.py
  • dpath_validator.py and unit tests
  • predicate_validator.py and unit tests
  • validation_strategy.py
  • validate_adheres_to_schema and unit tests
  • config_transformation.py
  • remap_field.py and unit tests

Summary by CodeRabbit

  • New Features
    • Introduced transformations for configuration dictionaries: adding fields with nested and conditional support, removing fields with conditional logic and list-aware handling, and remapping field values based on static mappings.
    • Added a suite of validators for configuration data, including path-based extraction validation, predicate validation, and JSON schema adherence validation strategies.
  • Tests
    • Added comprehensive unit tests for all new transformation and validation features, covering typical, edge, and error cases to ensure robustness and correctness.

@github-actions github-actions bot added the enhancement New feature or request label May 6, 2025
@pnilan Patrick Nilan (pnilan) marked this pull request as ready for review May 8, 2025 00:46
Copilot AI review requested due to automatic review settings May 8, 2025 00:46
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds runtime components to support config migrations to low-code by introducing new validators, validation strategies, and a configuration transformer.

  • Introduces a Validator base class and implementations (ValidateAdheresToSchema, PredicateValidator, DpathValidator) with comprehensive unit tests.
  • Implements a ValidationStrategy base class and a schema-based validation strategy.
  • Adds a config transformation (RemapField) that remaps field values based on a provided mapping.

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
unit_tests/sources/declarative/validators/test_validate_adheres_to_schema.py New unit tests covering JSON schema adherence validation.
unit_tests/sources/declarative/validators/test_predicate_validator.py New unit tests for PredicateValidator validating both success and error scenarios.
unit_tests/sources/declarative/validators/test_dpath_validator.py New unit tests for DpathValidator covering valid paths, error handling, and wildcard use.
unit_tests/sources/declarative/transformations/config_transformations/test_remap_field.py New unit tests for RemapField ensuring correct field remapping behavior.
airbyte_cdk/sources/declarative/validators/*.py New validator implementations and base classes supporting unified validation strategies.
airbyte_cdk/sources/declarative/transformations/config_transformations/*.py New configuration transformation implementation for remapping field values.

Comment thread unit_tests/sources/declarative/validators/test_predicate_validator.py Outdated
Comment thread unit_tests/sources/declarative/validators/test_dpath_validator.py Outdated
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented May 8, 2025

📝 Walkthrough

Walkthrough

This change introduces a new suite of configuration transformation and validation utilities in the Airbyte CDK. It adds abstract base classes, concrete implementations for adding, removing, and remapping configuration fields, and a set of validators including schema adherence and dpath-based validation. Comprehensive unit tests for all new functionality are also included.

Changes

Files / Grouped Paths Change Summary
airbyte_cdk/sources/declarative/transformations/config_transformations/init.py New module initializer importing and exporting ConfigAddFields, ConfigRemapField, and ConfigRemoveFields.
airbyte_cdk/sources/declarative/transformations/config_transformations/config_transformation.py Introduces abstract base class ConfigTransformation for config transformations with an abstract transform method.
airbyte_cdk/sources/declarative/transformations/config_transformations/add_fields.py Adds ConfigAddFields for adding fields to config dictionaries, supporting nested paths, value interpolation, and optional conditions. Includes data classes for field definitions.
airbyte_cdk/sources/declarative/transformations/config_transformations/remove_fields.py Adds ConfigRemoveFields for removing fields from config dictionaries by path, with optional conditional removal and support for list elements.
airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py Adds ConfigRemapField for remapping field values in config dictionaries based on a static mapping and dynamic field path resolution.
airbyte_cdk/sources/declarative/validators/init.py New module initializer importing and exporting Validator, DpathValidator, ValidationStrategy, ValidateAdheresToSchema, and PredicateValidator.
airbyte_cdk/sources/declarative/validators/validator.py Adds abstract base class Validator with an abstract validate method for input validation.
airbyte_cdk/sources/declarative/validators/validation_strategy.py Adds abstract base class ValidationStrategy with an abstract validate method for value validation.
airbyte_cdk/sources/declarative/validators/dpath_validator.py Adds DpathValidator for validating values extracted from nested dictionaries using dpath paths and a validation strategy.
airbyte_cdk/sources/declarative/validators/predicate_validator.py Adds PredicateValidator, a dataclass applying a validation strategy to a given value.
airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py Adds ValidateAdheresToSchema, a validation strategy enforcing JSON schema adherence with error handling for invalid JSON and schema violations.
unit_tests/sources/declarative/transformations/config_transformations/test_config_add_fields.py Adds tests for ConfigAddFields, covering static and interpolated values, nested fields, overwrites, invalid definitions, and conditional field addition.
unit_tests/sources/declarative/transformations/config_transformations/test_config_remove_fields.py Adds tests for ConfigRemoveFields, covering top-level and nested field removal, non-existent fields, and conditional removal.
unit_tests/sources/declarative/transformations/config_transformations/test_config_remap_field.py Adds tests for ConfigRemapField, covering remapping, missing values, interpolated paths, empty maps, empty paths, and sequential remaps.
unit_tests/sources/declarative/validators/test_dpath_validator.py Adds tests for DpathValidator, covering successful validation, missing paths, validation failures, empty paths, empty input, and wildcard path support.
unit_tests/sources/declarative/validators/test_predicate_validator.py Adds tests for PredicateValidator, covering successful and failed validation, and handling of complex objects.
unit_tests/sources/declarative/validators/test_validate_adheres_to_schema.py Adds tests for ValidateAdheresToSchema, covering valid/invalid data, missing fields, invalid types, invalid schemas, nullable fields, empty schemas, and JSON string input.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant ConfigTransformation
    participant ConfigAddFields
    participant ConfigRemoveFields
    participant ConfigRemapField

    User->>ConfigAddFields: transform(config)
    ConfigAddFields->>ConfigAddFields: (Evaluate fields and condition)
    ConfigAddFields->>ConfigAddFields: (Add fields to config)

    User->>ConfigRemoveFields: transform(config)
    ConfigRemoveFields->>ConfigRemoveFields: (Evaluate condition)
    ConfigRemoveFields->>ConfigRemoveFields: (Remove fields by path)

    User->>ConfigRemapField: transform(config)
    ConfigRemapField->>ConfigRemapField: (Resolve field path)
    ConfigRemapField->>ConfigRemapField: (Remap value if in map)
Loading
sequenceDiagram
    participant User
    participant Validator
    participant DpathValidator
    participant PredicateValidator
    participant ValidateAdheresToSchema
    participant ValidationStrategy

    User->>DpathValidator: validate(input_data)
    DpathValidator->>DpathValidator: (Extract value by path)
    DpathValidator->>ValidationStrategy: validate(value)

    User->>PredicateValidator: validate()
    PredicateValidator->>ValidationStrategy: validate(value)

    User->>ValidateAdheresToSchema: validate(value)
    ValidateAdheresToSchema->>ValidateAdheresToSchema: (Parse value if JSON string)
    ValidateAdheresToSchema->>ValidateAdheresToSchema: (Validate against schema)
Loading

Suggested reviewers

  • darynaishchenko
  • maxi297
  • natikgadzhi

Would you like to consider involving additional reviewers familiar with the Airbyte CDK validation or transformation patterns, wdyt?

Tip

⚡️ Faster reviews with caching
  • CodeRabbit now supports caching for code and dependencies, helping speed up reviews. This means quicker feedback, reduced wait times, and a smoother review experience overall. Cached data is encrypted and stored securely. This feature will be automatically enabled for all accounts on May 16th. To opt out, configure Review - Disable Cache at either the organization or repository level. If you prefer to disable all data retention across your organization, simply turn off the Data Retention setting under your Organization Settings.

Enjoy the performance boost—your workflow just got faster.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a2dc105 and 833d9e7.

📒 Files selected for processing (2)
  • airbyte_cdk/sources/declarative/transformations/config_transformations/config_transformation.py (1 hunks)
  • airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • airbyte_cdk/sources/declarative/transformations/config_transformations/config_transformation.py
  • airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Analyze (python)
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (16)
airbyte_cdk/sources/declarative/transformations/config_transformations/config_transformation.py (1)

9-23: Type hint mismatch between base class and implementation, wdyt?

The transform method's parameter type is Dict[str, Any], but the implementation in remap_field.py uses MutableMapping[str, Any]. Since the method is meant to modify the config in-place, MutableMapping seems more appropriate here. Would you consider changing this to be consistent with the implementation?

-    def transform(
-        self,
-        config: Dict[str, Any],
-    ) -> None:
+    def transform(
+        self,
+        config: MutableMapping[str, Any],
+    ) -> None:

Also, you might want to add the missing import:

-from typing import Any, Dict
+from typing import Any, Dict, MutableMapping
airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (2)

23-34: Possible redundancy in field path conversion, wdyt?

The initialization logic first converts all paths to InterpolatedString using a list comprehension, then loops through them again to check if they're strings and convert them. Could we simplify this logic since we're already handling everything in the list comprehension?

    def __post_init__(self) -> None:
        if not self.field_path:
            raise Exception("field_path cannot be empty.")
-        self._field_path = [
-            InterpolatedString.create(path, parameters={}) for path in self.field_path
-        ]
-        for path_index in range(len(self.field_path)):
-            if isinstance(self.field_path[path_index], str):
-                self._field_path[path_index] = InterpolatedString.create(
-                    self.field_path[path_index], parameters={}
-                )
+        self._field_path = []
+        for path in self.field_path:
+            if isinstance(path, str):
+                self._field_path.append(InterpolatedString.create(path, parameters={}))
+            else:
+                self._field_path.append(path)

Also, consider using a more specific exception type with a clearer error message:

        if not self.field_path:
-            raise Exception("field_path cannot be empty.")
+            raise ValueError("field_path cannot be empty. Please provide at least one path component.")

35-60: Implementation seems robust!

The transform logic safely navigates the nested dictionary structure and only applies transformations when appropriate. I like the early returns when paths don't exist or aren't mappings, which makes the code cleaner and prevents errors.

One suggestion - consider adding a log statement when a remapping occurs, which could be helpful for debugging:

        if field_name in current and current[field_name] in self.map:
+            # Could add logging here if needed
+            # e.g., logger.debug(f"Remapping {field_name} from {current[field_name]} to {self.map[current[field_name]]}")
            current[field_name] = self.map[current[field_name]]
airbyte_cdk/sources/declarative/validators/predicate_validator.py (1)

1-27: Simple but effective validator implementation, wdyt about a small enhancement?

The PredicateValidator class looks well-structured with a clear single responsibility - it stores a value and delegates validation to a provided strategy. The implementation is clean and the docstrings are clear.

I notice that this class doesn't inherit from the Validator abstract base class, while it's placed in the validators package. This might be intentional based on the different method signatures (validate() vs validate(input_data: Any)), but it could be worth considering whether it should implement the common interface for consistency. Alternatively, renaming it to distinguish its different role might be helpful.

Would adding type validation for the strategy parameter in the constructor make sense to catch configuration errors earlier? Just a thought!

airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py (2)

21-31: Implementation is concise and straightforward

Nice implementation of the schema validation strategy! It's clean and effectively wraps the jsonschema library while providing a good error message.

One question: should we also catch jsonschema.SchemaError separately? It happens when the schema itself is invalid (as shown in your test_given_invalid_schema_when_validate_then_raises_error test), and we might want to distinguish between invalid schema vs invalid data in the error message, wdyt?

try:
    jsonschema.validate(instance=value, schema=self.schema)
except jsonschema.ValidationError as e:
    raise ValueError(f"JSON schema validation error: {e.message}")
+except jsonschema.SchemaError as e:
+    raise ValueError(f"Invalid JSON schema: {e.message}")

21-31: Suggestion: Add support for JSON string inputs

I noticed that your test suite includes a test for validating JSON strings (test_given_json_string_when_validate_then_succeeds), but the implementation doesn't explicitly handle JSON string inputs. The test is passing, which suggests jsonschema might handle this automatically, but it could be more robust to add explicit handling, wdyt?

def validate(self, value: Any) -> None:
    """
    Validates the value against the JSON schema.

    :param value: The value to validate
    :raises ValueError: If the value does not adhere to the schema
    """
+    # If value is a JSON string, parse it first
+    import json
+    if isinstance(value, str):
+        try:
+            value = json.loads(value)
+        except json.JSONDecodeError as e:
+            raise ValueError(f"Invalid JSON string: {e}")
+            
    try:
        jsonschema.validate(instance=value, schema=self.schema)
    except jsonschema.ValidationError as e:
        raise ValueError(f"JSON schema validation error: {e.message}")
airbyte_cdk/sources/declarative/validators/dpath_validator.py (3)

25-34: Simplify the path initialization

The current implementation of __post_init__ looks like it's doing the same conversion twice. You first create self._field_path by converting all elements, then you iterate through self.field_path again to convert any string elements. I think we can simplify this, wdyt?

def __post_init__(self) -> None:
-    self._field_path = [
-        InterpolatedString.create(path, parameters={}) for path in self.field_path
-    ]
-    for path_index in range(len(self.field_path)):
-        if isinstance(self.field_path[path_index], str):
-            self._field_path[path_index] = InterpolatedString.create(
-                self.field_path[path_index], parameters={}
-            )
+    self._field_path = [
+        InterpolatedString.create(path, parameters={}) for path in self.field_path
+    ]

47-65: Refactor to reduce code duplication

The validation logic for wildcard and non-wildcard paths contains duplicated code. We could extract the common validation logic into a helper method to make the code more DRY, wdyt?

if "*" in path:
    try:
        values = dpath.values(input_data, path)
    except KeyError as e:
        raise KeyError(f"Error validating path '{self.field_path}': {e}")
    for value in values:
-        try:
-            self.strategy.validate(value)
-        except Exception as e:
-            raise ValueError(f"Error validating value '{value}': {e}")
+        self._validate_value(value)
else:
    try:
        value = dpath.get(input_data, path)
    except KeyError as e:
        raise KeyError(f"Error validating path '{self.field_path}': {e}")
-    try:
-        self.strategy.validate(value)
-    except Exception as e:
-        raise ValueError(f"Error validating value '{value}': {e}")
+    self._validate_value(value)

+def _validate_value(self, value: Any) -> None:
+    try:
+        self.strategy.validate(value)
+    except Exception as e:
+        raise ValueError(f"Error validating value '{value}': {e}")

35-42: Consider exposing a config parameter for path interpolation

Currently, you're passing an empty dictionary to the eval method. If the field paths contain Jinja templates, they would need context to be properly evaluated. Would it make sense to accept a config parameter in the validate method and pass that to the eval calls, wdyt?

-def validate(self, input_data: dict[str, Any]) -> None:
+def validate(self, input_data: dict[str, Any], config: dict[str, Any] = None) -> None:
    """
    Extracts the value at the specified path and applies the validation strategy.

    :param input_data: Dictionary containing the data to validate
+   :param config: Optional configuration for interpolating path elements
    :raises ValueError: If the path doesn't exist or validation fails
    """
-    path = [path.eval({}) for path in self._field_path]
+    config = config or {}
+    path = [path.eval(config) for path in self._field_path]
unit_tests/sources/declarative/validators/test_predicate_validator.py (1)

1-7: Add missing copyright header

The copyright header is missing from this file. All other files in this PR have it, wdyt?

+#
+# Copyright (c) 2025 Airbyte, Inc., all rights reserved.
+#
+
from unittest import TestCase

import pytest
unit_tests/sources/declarative/validators/test_validate_adheres_to_schema.py (2)

128-130: Test could be more thorough for JSON string inputs

The test for validating JSON strings doesn't make assertions about the actual validation - it just checks that no exception is raised. Consider testing both valid and invalid JSON strings to ensure the validation is working as expected, wdyt?

def test_given_json_string_when_validate_then_succeeds(self):
    schema = {
        "type": "object",
        "properties": {
            "id": {"type": "integer"},
            "name": {"type": "string"},
        },
    }

    validator = ValidateAdheresToSchema(schema=schema)

    validator.validate('{"id": 1, "name": "John Doe"}')
+
+    # Test invalid JSON string
+    with pytest.raises(ValueError) as exc_info:
+        validator.validate('{"id": "not-an-integer", "name": "John Doe"}')
+    
+    assert "integer" in str(exc_info.value)
+
+    # Test malformed JSON string
+    with pytest.raises(ValueError) as exc_info:
+        validator.validate('{bad json}')
+    
+    assert "Invalid JSON" in str(exc_info.value)

1-7: Add missing copyright header

The copyright header is missing from this file. All other files in this PR have it, wdyt?

+#
+# Copyright (c) 2025 Airbyte, Inc., all rights reserved.
+#
+
from unittest import TestCase

import jsonschema
import pytest
unit_tests/sources/declarative/validators/test_dpath_validator.py (4)

32-33: Could the assertion be more specific?

The assertion assert strategy.validated_value only checks that the value is truthy. Would it be more precise to assert the exact expected value: assert strategy.validated_value == "test@example.com"? wdyt?


91-93: Redundant assertions?

There are two assertions checking the same thing, once with pytest's assert and once with unittest's assertIn. Would it be cleaner to stick with one assertion style for consistency? wdyt?

assert strategy.validate_called
assert strategy.validated_value in ["user1@example.com", "user2@example.com"]
-self.assertIn(strategy.validated_value, ["user1@example.com", "user2@example.com"])

Or if you prefer unittest style:

assert strategy.validate_called
-assert strategy.validated_value in ["user1@example.com", "user2@example.com"]
self.assertIn(strategy.validated_value, ["user1@example.com", "user2@example.com"])

23-94: Consider adding more wildcard test cases

The tests are comprehensive, but would it be valuable to add a test with multiple wildcards in the path, like ["users", "*", "addresses", "*", "city"]? This would verify that nested wildcards work correctly. Additionally, the current wildcard test only verifies that the strategy was called with one of the expected values, but not that it was called for all matching paths. wdyt?


23-94: Consider adding docstrings to test methods

The tests are well-structured and have descriptive names, but would adding docstrings to explain the purpose and expectations of each test help future developers understand the test cases better? wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7a81b3b and dad6100.

📒 Files selected for processing (13)
  • airbyte_cdk/sources/declarative/transformations/config_transformations/__init__.py (1 hunks)
  • airbyte_cdk/sources/declarative/transformations/config_transformations/config_transformation.py (1 hunks)
  • airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (1 hunks)
  • airbyte_cdk/sources/declarative/validators/__init__.py (1 hunks)
  • airbyte_cdk/sources/declarative/validators/dpath_validator.py (1 hunks)
  • airbyte_cdk/sources/declarative/validators/predicate_validator.py (1 hunks)
  • airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py (1 hunks)
  • airbyte_cdk/sources/declarative/validators/validation_strategy.py (1 hunks)
  • airbyte_cdk/sources/declarative/validators/validator.py (1 hunks)
  • unit_tests/sources/declarative/transformations/config_transformations/test_remap_field.py (1 hunks)
  • unit_tests/sources/declarative/validators/test_dpath_validator.py (1 hunks)
  • unit_tests/sources/declarative/validators/test_predicate_validator.py (1 hunks)
  • unit_tests/sources/declarative/validators/test_validate_adheres_to_schema.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (10)
airbyte_cdk/sources/declarative/validators/validation_strategy.py (4)
airbyte_cdk/sources/declarative/validators/dpath_validator.py (1)
  • validate (35-65)
airbyte_cdk/sources/declarative/validators/predicate_validator.py (1)
  • validate (20-26)
airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py (1)
  • validate (21-31)
airbyte_cdk/sources/declarative/validators/validator.py (1)
  • validate (11-18)
unit_tests/sources/declarative/validators/test_predicate_validator.py (3)
airbyte_cdk/sources/declarative/validators/predicate_validator.py (1)
  • PredicateValidator (12-26)
airbyte_cdk/sources/declarative/validators/validation_strategy.py (1)
  • ValidationStrategy (9-22)
unit_tests/sources/declarative/validators/test_dpath_validator.py (2)
  • MockValidationStrategy (9-20)
  • validate (16-20)
airbyte_cdk/sources/declarative/transformations/config_transformations/__init__.py (1)
airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (1)
  • RemapField (15-60)
airbyte_cdk/sources/declarative/validators/predicate_validator.py (1)
airbyte_cdk/sources/declarative/validators/validation_strategy.py (2)
  • ValidationStrategy (9-22)
  • validate (15-22)
airbyte_cdk/sources/declarative/transformations/config_transformations/config_transformation.py (1)
airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (1)
  • transform (35-60)
airbyte_cdk/sources/declarative/validators/__init__.py (5)
airbyte_cdk/sources/declarative/validators/dpath_validator.py (1)
  • DpathValidator (16-65)
airbyte_cdk/sources/declarative/validators/predicate_validator.py (1)
  • PredicateValidator (12-26)
airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py (1)
  • ValidateAdheresToSchema (14-31)
airbyte_cdk/sources/declarative/validators/validation_strategy.py (1)
  • ValidationStrategy (9-22)
airbyte_cdk/sources/declarative/validators/validator.py (1)
  • Validator (9-18)
airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py (3)
airbyte_cdk/sources/declarative/validators/validation_strategy.py (2)
  • ValidationStrategy (9-22)
  • validate (15-22)
airbyte_cdk/sources/declarative/validators/predicate_validator.py (1)
  • validate (20-26)
airbyte_cdk/sources/declarative/validators/validator.py (1)
  • validate (11-18)
airbyte_cdk/sources/declarative/validators/validator.py (4)
airbyte_cdk/sources/declarative/validators/dpath_validator.py (1)
  • validate (35-65)
airbyte_cdk/sources/declarative/validators/predicate_validator.py (1)
  • validate (20-26)
airbyte_cdk/sources/declarative/validators/validation_strategy.py (1)
  • validate (15-22)
airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py (1)
  • validate (21-31)
unit_tests/sources/declarative/transformations/config_transformations/test_remap_field.py (1)
airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (2)
  • RemapField (15-60)
  • transform (35-60)
airbyte_cdk/sources/declarative/validators/dpath_validator.py (4)
airbyte_cdk/sources/declarative/interpolation/interpolated_string.py (1)
  • InterpolatedString (13-79)
airbyte_cdk/sources/declarative/validators/validation_strategy.py (2)
  • ValidationStrategy (9-22)
  • validate (15-22)
airbyte_cdk/sources/declarative/validators/validator.py (2)
  • Validator (9-18)
  • validate (11-18)
unit_tests/sources/declarative/validators/test_dpath_validator.py (1)
  • validate (16-20)
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: SDM Docker Image Build
🔇 Additional comments (15)
airbyte_cdk/sources/declarative/transformations/config_transformations/__init__.py (1)

5-7: Imports and exports look good!

Clean organization of imports and exports using __all__ for explicitly defining the module's public API. This makes the code more maintainable and provides clear documentation of what's available from this package.

airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (1)

14-22: Class structure looks good!

Clean dataclass implementation with well-defined attributes and clear docstring. The typing is precise and allows for both interpolated strings and raw strings in the field path.

unit_tests/sources/declarative/transformations/config_transformations/test_remap_field.py (7)

11-36: Well-structured test for valid inputs!

Great test with thorough assertions including verification that the original config remains unchanged. The use of deepcopy ensures test isolation and makes the test more reliable.


37-51: Good edge case handling!

I like that you're testing the case where the value isn't in the map and verifying that the configuration remains unchanged.


52-67: Good test for non-existent paths!

Testing that the configuration remains unchanged when the path doesn't exist is important for ensuring the code behaves correctly in edge cases.


68-82: Great test for interpolated paths!

Testing the interpolation functionality demonstrates that dynamic paths work correctly.


83-92: Good edge case for empty map!

Testing with an empty map confirms that no changes are made when there's nothing to remap.


93-96: Good validation test!

Testing that an empty field path raises an exception validates the input checking in the constructor.


97-112: Excellent test for sequential transformations!

This test confirms that multiple transformations can be applied in sequence, which is important for validating how the transformations fit into a larger pipeline.

airbyte_cdk/sources/declarative/validators/validator.py (1)

1-19: Well-designed abstract base class, looks good!

This Validator ABC provides a clean foundation for the validation framework. The docstring is clear and the abstract method defines a consistent contract for concrete implementations to follow.

The interface is simple yet powerful - accepting any input data and raising ValueError for validation failures. This will make it straightforward to create various validation implementations.

airbyte_cdk/sources/declarative/validators/validation_strategy.py (1)

1-23: Clear strategy interface design, nicely done!

The ValidationStrategy abstract base class creates a clean separation of concerns between validators (which extract/provide data) and strategies (which perform actual validation logic). This design pattern will make it easier to mix and match different validation approaches.

The method signature and docstring provide a clear contract for implementations. I particularly like that this approach allows strategies to be reused across different validators, which promotes code reuse.

airbyte_cdk/sources/declarative/validators/__init__.py (1)

1-20: Well-organized package exports

The __init__.py file appropriately collects and exports all the validator classes, providing a clean public API for the package. This makes it easy for consumers to import the validators they need directly from the package.

The exports include both abstract base classes (Validator and ValidationStrategy) and concrete implementations, giving users access to the full validation framework.

unit_tests/sources/declarative/validators/test_dpath_validator.py (3)

1-7: Clean imports with good organization!

The imports are well organized, following the standard practice of stdlib, third-party, and local imports. Nice job organizing them with proper spacing!


9-21: Great mock implementation!

The MockValidationStrategy is well-designed with configurable behavior and tracking mechanisms. This makes testing much easier and provides good verification capabilities. Love the clear naming and simple implementation.


61-77: Good edge case testing!

Nice job testing the empty path list and empty input data scenarios. These edge cases are important to verify the expected behavior of the validator.

Comment thread unit_tests/sources/declarative/validators/test_predicate_validator.py Outdated
Comment thread unit_tests/sources/declarative/validators/test_dpath_validator.py Outdated
Comment thread unit_tests/sources/declarative/validators/test_dpath_validator.py Outdated
Comment thread airbyte_cdk/sources/declarative/validators/dpath_validator.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py (1)

36-40: Good error handling with descriptive error messages.

The exception handling here is clean and the error message includes the original validation error, which will be helpful for debugging.

One small thing - have you considered adding a check for empty schemas? An empty schema will validate anything, which might not be the intended behavior in some cases. wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dad6100 and 565b709.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py (4)
airbyte_cdk/sources/declarative/validators/validation_strategy.py (2)
  • ValidationStrategy (9-22)
  • validate (15-22)
airbyte_cdk/sources/declarative/validators/dpath_validator.py (1)
  • validate (35-65)
airbyte_cdk/sources/declarative/validators/predicate_validator.py (1)
  • validate (20-26)
airbyte_cdk/sources/declarative/validators/validator.py (1)
  • validate (11-18)
🪛 GitHub Actions: Linters
airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py

[error] 5-11: Ruff: Import block is un-sorted or un-formatted. Organize imports. Fixable with the --fix option.

⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Analyze (python)
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py (2)

14-21: The class definition looks clean and well-documented.

The @dataclass approach works well here, and the schema type is properly defined. The class documentation clearly explains its purpose.


22-35: Nice handling of string inputs by attempting JSON parsing.

I like how you handle the case where a string input might contain JSON data. The error message is clear and helpful.

One question though - should we consider validating the schema itself before using it? Sometimes invalid schemas can lead to unexpected validation results. wdyt?

Comment thread airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py Outdated
Copy link
Copy Markdown
Contributor

@brianjlai Brian Lai (brianjlai) left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good and everything looks good. We talked about adding the ConfigTransformation implementations for AddFields and RemoveFields. But the rest we can add as needed during migrations. And the comment about the mapping. Will approve so I don't block you, but happy to give the final look if you want to once you add the other transformations

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🔭 Outside diff range comments (2)
unit_tests/sources/declarative/transformations/config_transformations/test_remove_fields.py (1)

1-101: ⚠️ Potential issue

Fix import file mismatch error

The pipeline is failing with an import file mismatch error, which suggests there might be cached .pyc files or a naming conflict with another test file.

#!/bin/bash
# List all test files with similar names to identify potential conflicts
find unit_tests -name "test_remove_fields*.py"

# Check for __pycache__ directories that might contain conflicting .pyc files
find unit_tests -name "__pycache__" -type d

Have you considered cleaning the pycache directories before running the tests? Alternatively, you could rename this test file to something more unique, like test_remove_fields_transformation.py, wdyt?

🧰 Tools
🪛 GitHub Actions: PyTest Matrix

[error] 1-1: Import file mismatch error: imported module 'test_remove_fields' points to a different file path than the test file collected. Hint: remove pycache / .pyc files or use unique basenames for test modules.

unit_tests/sources/declarative/transformations/config_transformations/test_add_fields.py (1)

1-98: ⚠️ Potential issue

Fix import file mismatch error

Similar to the other test file, there's a pipeline failure due to an import file mismatch error.

#!/bin/bash
# List all test files with similar names to identify potential conflicts
find unit_tests -name "test_add_fields*.py"

# Check for __pycache__ directories that might contain conflicting .pyc files
find unit_tests -name "__pycache__" -type d

Could we rename this test file to something more unique like test_add_fields_transformation.py to avoid conflicts? Or maybe cleaning the pycache directories before running tests would help, wdyt?

🧰 Tools
🪛 GitHub Actions: Pytest (Fast)

[error] 1-1: Import file mismatch error: imported module 'test_add_fields' points to a different file path than the test file to collect. Hint: remove pycache / .pyc files and/or use a unique basename for your test file modules.

🪛 GitHub Actions: PyTest Matrix

[error] 1-1: Import file mismatch error: imported module 'test_add_fields' points to a different file path than the test file collected. Hint: remove pycache / .pyc files or use unique basenames for test modules.

🧹 Nitpick comments (5)
airbyte_cdk/sources/declarative/transformations/config_transformations/remove_fields.py (1)

47-49: Consider validating field_pointers

Have you considered adding validation to ensure that field_pointers is not empty? This could prevent unexpected behavior if an empty list is provided.

Maybe something like this would be helpful, wdyt?

def __post_init__(self) -> None:
+    if not self.field_pointers:
+        logger.warning("RemoveFields instantiated with empty field_pointers list. No fields will be removed.")
    self._filter_interpolator = InterpolatedBoolean(condition=self.condition, parameters={})
airbyte_cdk/sources/declarative/transformations/config_transformations/add_fields.py (4)

1-4: License header looks good.

The copyright year is set to 2025, which is in the future. Is this intentional or should it be the current year instead? wdyt?


67-72: Class attributes look well defined.

Good use of field() for internal state management. Would it make sense to add documentation for the condition parameter since it's an important part of the functionality? wdyt?


73-103: Initialization logic effectively validates and processes input.

The validation of field paths ensures that empty paths are rejected, which is good defensive programming. One potential improvement: would it be valuable to add validation for the path format beyond just checking length? For instance, ensuring the path elements are all strings or valid dictionary keys?


104-121: Transform implementation looks solid but could use a safeguard.

The transform method correctly implements the required behavior, but there's no protection against overwriting existing fields. Would it be helpful to add an optional parameter to control whether overwriting existing fields is allowed? This could prevent accidental data loss in some scenarios.

Also, have you considered adding error handling around the dpath.new call to provide more descriptive error messages if something goes wrong? This would make debugging easier for users.

@dataclass
class AddFields(ConfigTransformation):
    # ... existing attributes ...
+   allow_overwrite: bool = True  # Controls whether existing fields can be overwritten
    
    # ... rest of the class ...
    
    def transform(
        self,
        config: MutableMapping[str, Any],
    ) -> None:
        # ... existing code ...
        
        for parsed_field in self._parsed_fields:
            valid_types = (parsed_field.value_type,) if parsed_field.value_type else None
            value = parsed_field.value.eval(config, valid_types=valid_types)
            is_empty_condition = not self.condition
            if is_empty_condition or self._filter_interpolator.eval(
                config, value=value, path=parsed_field.path
            ):
+               # Check if field exists and respect allow_overwrite setting
+               try:
+                   if not self.allow_overwrite and dpath.util.search(config, parsed_field.path, yielded=True):
+                       continue  # Skip this field as it exists and overwriting is not allowed
                    dpath.new(config, parsed_field.path, value)
+               except Exception as e:
+                   raise ValueError(f"Failed to add field at path {parsed_field.path}: {str(e)}")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4727b28 and 711384c.

📒 Files selected for processing (5)
  • airbyte_cdk/sources/declarative/transformations/config_transformations/__init__.py (1 hunks)
  • airbyte_cdk/sources/declarative/transformations/config_transformations/add_fields.py (1 hunks)
  • airbyte_cdk/sources/declarative/transformations/config_transformations/remove_fields.py (1 hunks)
  • unit_tests/sources/declarative/transformations/config_transformations/test_add_fields.py (1 hunks)
  • unit_tests/sources/declarative/transformations/config_transformations/test_remove_fields.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
unit_tests/sources/declarative/transformations/config_transformations/test_add_fields.py (1)
airbyte_cdk/sources/declarative/transformations/config_transformations/add_fields.py (3)
  • AddedFieldDefinition (19-24)
  • AddFields (37-120)
  • transform (104-120)
airbyte_cdk/sources/declarative/transformations/config_transformations/__init__.py (3)
airbyte_cdk/sources/declarative/transformations/config_transformations/add_fields.py (1)
  • AddFields (37-120)
airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (1)
  • RemapField (15-60)
airbyte_cdk/sources/declarative/transformations/config_transformations/remove_fields.py (1)
  • RemoveFields (21-72)
unit_tests/sources/declarative/transformations/config_transformations/test_remove_fields.py (1)
airbyte_cdk/sources/declarative/transformations/config_transformations/remove_fields.py (2)
  • RemoveFields (21-72)
  • transform (53-72)
airbyte_cdk/sources/declarative/transformations/config_transformations/remove_fields.py (2)
airbyte_cdk/sources/declarative/interpolation/interpolated_boolean.py (1)
  • InterpolatedBoolean (29-66)
airbyte_cdk/sources/declarative/transformations/config_transformations/config_transformation.py (1)
  • ConfigTransformation (9-23)
🪛 GitHub Actions: Pytest (Fast)
unit_tests/sources/declarative/transformations/config_transformations/test_add_fields.py

[error] 1-1: Import file mismatch error: imported module 'test_add_fields' points to a different file path than the test file to collect. Hint: remove pycache / .pyc files and/or use a unique basename for your test file modules.

🪛 GitHub Actions: PyTest Matrix
unit_tests/sources/declarative/transformations/config_transformations/test_add_fields.py

[error] 1-1: Import file mismatch error: imported module 'test_add_fields' points to a different file path than the test file collected. Hint: remove pycache / .pyc files or use unique basenames for test modules.

unit_tests/sources/declarative/transformations/config_transformations/test_remove_fields.py

[error] 1-1: Import file mismatch error: imported module 'test_remove_fields' points to a different file path than the test file collected. Hint: remove pycache / .pyc files or use unique basenames for test modules.

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: SDM Docker Image Build
🔇 Additional comments (21)
airbyte_cdk/sources/declarative/transformations/config_transformations/__init__.py (1)

5-9: Export classes look good!

The module exports three transformation classes that handle different aspects of config manipulation - adding fields, remapping fields, and removing fields. This provides a comprehensive API for configuration transformations.

unit_tests/sources/declarative/transformations/config_transformations/test_remove_fields.py (5)

7-10: Clean import structure

The import looks good, directly importing the class under test.


12-30: Test case covers basic field removal correctly

This test verifies that top-level fields are properly removed while preserving unrelated fields. Good assertions to check both presence and absence of fields.


33-62: Comprehensive test for nested field removal

Good test coverage for removing nested fields at different levels of the hierarchy while preserving unrelated fields. The explicit assertion of the final config structure is helpful.


65-81: Great test for handling non-existent fields

This test verifies that the transformation gracefully handles attempts to remove fields that don't exist at various nesting levels. Nice use of copy() to verify the config remains unchanged.


84-100: Good conditional logic test

This test effectively verifies that the transformation only removes fields when the specified condition is met. The test cases cover both true and false conditions.

unit_tests/sources/declarative/transformations/config_transformations/test_add_fields.py (7)

7-10: Clean imports

Good job importing both the class under test and its supporting data class.


13-27: Test for static values looks good

This test effectively verifies that static values are correctly added to an empty config. Good use of assertions to check the resulting structure.


30-40: Nested fields test is well-structured

This test verifies that nested fields are correctly created, including all parent objects. The assertion clearly validates the expected nested structure.


43-58: Thorough interpolation test

Good test for interpolated values that covers both reference to existing fields and expression evaluation. Nice verification that the original field remains unchanged.


61-66: Good validation testing

This test correctly verifies that invalid inputs (empty path or non-string value) raise appropriate exceptions. Using pytest.raises is the right approach here.


69-79: Field overwrite test is clear

This test confirms that existing fields are overwritten rather than preserved, which is important behavior to document and verify.


82-97: Good conditional test

This test effectively verifies the conditional logic for adding fields, testing both when the condition is true and when it's false.

airbyte_cdk/sources/declarative/transformations/config_transformations/remove_fields.py (4)

4-17: Good imports and logger setup

The imports look appropriate for the functionality being implemented. Using a dedicated logger for the airbyte namespace is a good practice.


20-49: Well-documented class definition

The class documentation is thorough, explaining the behavior, edge cases (especially around list handling), and providing usage examples. The YAML example is particularly helpful.


50-52: Good initialization

The __post_init__ method correctly initializes the InterpolatedBoolean for condition evaluation.


53-72: Robust transform implementation

The transform method has good error handling, gracefully handling missing paths and properly logging other exceptions. The condition check at the beginning is efficient, avoiding unnecessary work when conditions aren't met.

airbyte_cdk/sources/declarative/transformations/config_transformations/add_fields.py (4)

5-16: Import structure looks clean and well-organized.

Good job organizing imports by category and using absolute imports. The types are properly imported as well.


18-25: Well-structured dataclass for field definitions.

Nice use of the frozen=True attribute for immutability. The type annotations are clear and the documentation explains the purpose well.


27-34: Internal parsed field representation looks good.

I like the separation between the external API (AddedFieldDefinition) and the internal representation (ParsedAddFieldDefinition). This provides clean separation of concerns.


36-66: Documentation for AddFields is thorough and helpful.

Excellent docstring with clear examples demonstrating different use cases. The YAML examples are particularly helpful for users.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (1)

20-21: Consider supporting dynamic values through InterpolatedMapping?

The static mapping might be limiting for scenarios where we want to interpolate values dynamically. For example, when remapping a field to a value that incorporates parts of the existing config like {{ config.url }}/v1/etc. A previous reviewer noted this might be useful for cases like the Amazon Seller Partner connector, wdyt?

🧹 Nitpick comments (9)
airbyte_cdk/sources/declarative/transformations/config_transformations/__init__.py (1)

5-9: Concise and well-structured exports!

The import structure and __all__ declaration look good, providing a clean public API for the config transformations package.

Have you considered also exporting the base ConfigTransformation class for users who might want to create their own custom transformations? This would make the inheritance hierarchy more discoverable, wdyt?

unit_tests/sources/declarative/transformations/config_transformations/test_config_remap_field.py (1)

68-82: Nice test for interpolated paths

The test effectively demonstrates the dynamic path interpolation capability, which is a powerful feature. Have you considered adding a test case for when the interpolated value doesn't exist or evaluates to something unexpected (like a non-string)? This would ensure robust error handling, wdyt?

unit_tests/sources/declarative/transformations/config_transformations/test_config_remove_fields.py (1)

1-99: Consider adding tests for array element removal

The tests are comprehensive for object field removal, but I noticed that according to the implementation in remove_fields.py, the transformation also supports removing elements from arrays by setting them to None. Would it be valuable to add a test case that verifies this behavior specifically, wdyt?

For example:

def test_given_array_element_field_pointer_element_is_set_to_none():
    transformation = RemoveFields(
        field_pointers=[["array", 1]]  # Remove second element
    )
    config = {"array": ["keep", "remove", "also_keep"]}
    
    transformation.transform(config)
    
    assert config["array"] == ["keep", None, "also_keep"]
airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (2)

23-34: Simplified initialization logic possible?

The post-init logic seems to be doing redundant work. You first create a new field_path list with all elements converted to InterpolatedString, then loop through again to check and convert any remaining strings. Perhaps this could be simplified to a single loop?

- self._field_path = [
-     InterpolatedString.create(path, parameters={}) for path in self.field_path
- ]
- for path_index in range(len(self.field_path)):
-     if isinstance(self.field_path[path_index], str):
-         self._field_path[path_index] = InterpolatedString.create(
-             self.field_path[path_index], parameters={}
-         )
+ self._field_path = []
+ for path in self.field_path:
+     if isinstance(path, str):
+         self._field_path.append(InterpolatedString.create(path, parameters={}))
+     else:
+         self._field_path.append(path)

48-60: Consider adding debug or error logging

The transform method silently returns in several cases where the path doesn't exist or intermediate values aren't mappings. Adding debug logging might help with troubleshooting when the transformation doesn't behave as expected. Also, you might want to provide a return value indicating if a transformation was actually applied?

airbyte_cdk/sources/declarative/transformations/config_transformations/remove_fields.py (1)

62-66: Consider adding debug logging

The error handling catches PathNotFound but doesn't log any information. For debugging purposes, it might be helpful to add debug logs when a path is not found. This could help identify potential configuration issues without causing runtime failures.

try:
    dpath.delete(config, pointer)
except dpath.exceptions.PathNotFound:
+   # Using logging here instead of print for proper integration with the framework's logging system
+   import logging
+   logging.debug(f"Path {pointer} not found in config during removal")
    pass
airbyte_cdk/sources/declarative/transformations/config_transformations/add_fields.py (3)

18-25: Well-structured field definition dataclass

The frozen dataclass design is a good choice for immutability of field definitions. One small suggestion - would it make sense to add a docstring explaining what value_type is for and how it's used? It's not immediately obvious from the context.


73-102: Post-initialization logic could be simplified

The logic for processing field values is a bit verbose with nested if-else statements. Consider refactoring to reduce nesting:

def __post_init__(self) -> None:
    self._filter_interpolator = InterpolatedBoolean(condition=self.condition, parameters={})

    for add_field in self.fields:
        if len(add_field.path) < 1:
            raise ValueError(
                f"Expected a non-zero-length path for the AddFields transformation {add_field}"
            )

-       if not isinstance(add_field.value, InterpolatedString):
-           if not isinstance(add_field.value, str):
-               raise ValueError(
-                   f"Expected a string value for the AddFields transformation: {add_field}"
-               )
-           else:
-               self._parsed_fields.append(
-                   ParsedAddFieldDefinition(
-                       add_field.path,
-                       InterpolatedString.create(add_field.value, parameters={}),
-                       value_type=add_field.value_type,
-                   )
-               )
-       else:
-           self._parsed_fields.append(
-               ParsedAddFieldDefinition(
-                   add_field.path,
-                   add_field.value,
-                   value_type=add_field.value_type,
-               )
-           )
+       # Handle the value based on its type
+       if isinstance(add_field.value, InterpolatedString):
+           value = add_field.value
+       elif isinstance(add_field.value, str):
+           value = InterpolatedString.create(add_field.value, parameters={})
+       else:
+           raise ValueError(
+               f"Expected a string value for the AddFields transformation: {add_field}"
+           )
+           
+       self._parsed_fields.append(
+           ParsedAddFieldDefinition(
+               add_field.path,
+               value,
+               value_type=add_field.value_type,
+           )
+       )

117-119: Consider field-specific conditions

Currently, the same condition is evaluated for all fields. Would it make sense to support field-specific conditions in the AddedFieldDefinition class? This would provide more granular control over when each field is added.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 711384c and c64e588.

📒 Files selected for processing (7)
  • airbyte_cdk/sources/declarative/transformations/config_transformations/__init__.py (1 hunks)
  • airbyte_cdk/sources/declarative/transformations/config_transformations/add_fields.py (1 hunks)
  • airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (1 hunks)
  • airbyte_cdk/sources/declarative/transformations/config_transformations/remove_fields.py (1 hunks)
  • unit_tests/sources/declarative/transformations/config_transformations/test_config_add_fields.py (1 hunks)
  • unit_tests/sources/declarative/transformations/config_transformations/test_config_remap_field.py (1 hunks)
  • unit_tests/sources/declarative/transformations/config_transformations/test_config_remove_fields.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
unit_tests/sources/declarative/transformations/config_transformations/test_config_remove_fields.py (1)
airbyte_cdk/sources/declarative/transformations/config_transformations/remove_fields.py (2)
  • ConfigRemoveFields (18-66)
  • transform (50-66)
airbyte_cdk/sources/declarative/transformations/config_transformations/remove_fields.py (2)
airbyte_cdk/sources/declarative/interpolation/interpolated_boolean.py (1)
  • InterpolatedBoolean (29-66)
airbyte_cdk/sources/declarative/transformations/config_transformations/config_transformation.py (1)
  • ConfigTransformation (9-23)
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: SDM Docker Image Build
🔇 Additional comments (23)
unit_tests/sources/declarative/transformations/config_transformations/test_config_remap_field.py (7)

6-8: Good import with convenient alias

The alias makes the code more readable by shortening the class name in the tests. Nice approach!


11-36: Thorough test for field remapping behavior

The test comprehensively verifies that the target field is remapped while other fields remain unchanged. Great attention to detail with the deepcopy to verify against the original state!


37-51: Proper handling of values not in mapping

Good test case ensuring that fields with values not present in the mapping dictionary remain unchanged. This verifies the transformation doesn't modify fields unnecessarily.


52-67: Robust handling of missing field paths

This test properly verifies that the configuration remains unchanged when the field path doesn't exist in the config. This ensures the transformation is non-destructive when dealing with unexpected inputs.


83-92: Empty map handling verified

Good test ensuring that an empty mapping dictionary doesn't modify the configuration. This is important for defensive programming.


93-96: Appropriate exception validation for empty field path

This test correctly verifies that an exception is raised when an empty field path is provided, preventing runtime errors.


97-113: Comprehensive sequence test

Great test demonstrating multiple transformations applied sequentially on different fields. This verifies that transformations compose well without interfering with each other.

unit_tests/sources/declarative/transformations/config_transformations/test_config_remove_fields.py (4)

10-29: Well-structured test for top-level field removal

The test effectively verifies that specified fields are removed while unrelated fields remain intact. Good explicit assertion of the final state with assert config == {...}.


31-61: Thorough test for nested field removal

This test comprehensively checks that nested fields are properly removed while maintaining the structure of the configuration. The multiple assertions provide good clarity about the expected state.


63-80: Good error tolerance verification

This test confirms that attempting to remove non-existent fields doesn't throw errors and leaves the configuration unchanged. This verifies the robustness of the transformation when handling unexpected inputs.


82-99: Effective conditional removal test

The test appropriately verifies that fields are only removed when the specified condition evaluates to true. This demonstrates the conditional application feature works correctly.

unit_tests/sources/declarative/transformations/config_transformations/test_config_add_fields.py (7)

7-12: Clean imports with useful alias

Good practice using an alias for the transformation class to make the tests more readable. The separate import for AddedFieldDefinition is also clear.


15-30: Clear test for adding multiple static fields

This test effectively verifies that multiple static fields can be added at the top level. The explicit assertion of the entire expected config is a good practice.


32-43: Good test for nested field creation

This test verifies that nested dictionaries are properly created when adding fields at deep paths. This is an important feature for working with complex configurations.


45-61: Comprehensive interpolation test

Great test that verifies both string interpolation (referencing existing config values) and expression evaluation. It demonstrates the flexibility of the transformation for dynamic value generation.


63-69: Proper validation tests

Good tests ensuring that invalid inputs (empty path or non-string value) raise appropriate exceptions. This helps prevent runtime errors from invalid configurations.


71-82: Overwrite behavior verification

This test confirms that existing fields are properly overwritten when a new value is added at the same path. This is important for understanding the precedence behavior of the transformation.


84-100: Effective conditional addition test

This test correctly verifies that fields are only added when the specified condition evaluates to true. This demonstrates the conditional application feature works properly.

airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (1)

35-61: Ensure map values are type-compatible with field values

There's currently no validation that the mapped value has the same type as the original field value, which could lead to unexpected behavior downstream. Would it make sense to add optional type checking or at least a warning if types differ significantly?

airbyte_cdk/sources/declarative/transformations/config_transformations/remove_fields.py (2)

44-46: Should empty field_pointers be validated?

There's validation for empty field paths in the AddFields transformation, but no validation that field_pointers itself is not empty. Is this intentional? Having an empty list of field_pointers would make this transformation a no-op.


1-67: Overall implementation looks clean and well-documented!

The use of dpath for handling nested fields is a good choice, and the conditional application of the transformation is well implemented. The detailed docstring explaining special handling of list elements is particularly helpful.

airbyte_cdk/sources/declarative/transformations/config_transformations/add_fields.py (2)

114-120: Consider handling overwrite behavior explicitly

The current implementation uses dpath.new without specifying how to handle the case where the field already exists. You might want to provide control over whether existing fields should be overwritten or preserved. wdyt?

if is_empty_condition or self._filter_interpolator.eval(
    config, value=value, path=parsed_field.path
):
-   dpath.new(config, parsed_field.path, value)
+   # Allow explicit control over overwriting behavior
+   dpath.new(config, parsed_field.path, value, overwrite=True)  # or maybe make this configurable

1-121: Well-structured implementation with good documentation

The overall design is clean with appropriate use of helper classes and good documentation including YAML examples. The type validation for values is a nice touch for ensuring data integrity.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🔭 Outside diff range comments (1)
unit_tests/sources/declarative/transformations/config_transformations/test_config_remap_field.py (1)

1-129: 💡 Verification agent

🧩 Analysis chain

Format the file to fix linter errors.

The pipeline failures indicate a Ruff format issue. Running ruff format on this file should resolve the formatting issues.


🏁 Script executed:

#!/bin/bash
# Check which formatting issues Ruff would fix
ruff format --diff unit_tests/sources/declarative/transformations/config_transformations/test_config_remap_field.py

Length of output: 2159


Format test_config_remap_field.py with ruff to fix trailing commas, blank lines, and line wrapping

To resolve the linter errors, you can run:

ruff format -i unit_tests/sources/declarative/transformations/config_transformations/test_config_remap_field.py

Or manually apply the following changes:

• Add a trailing comma after config=config
• Remove the extra blank line before mapping = {...}
• Wrap the two long RemapField(...).transform(...) calls in parentheses

--- a/unit_tests/sources/declarative/transformations/config_transformations/test_config_remap_field.py
+++ b/unit_tests/sources/declarative/transformations/config_transformations/test_config_remap_field.py
@@ -20,7 +20,7 @@
         remap_transform = RemapField(
             field_path=["authorization", "auth_type"],
             map={"client_credentials": "oauth2", "api_key": "key_auth"},
-            config=config
+            config=config,
         )
@@ -113,7 +113,6 @@
     def test_amazon_seller_partner_marketplace_remap_with_interpolated_mapping(self):
-
         mapping = {
             "endpoint": {
                 "ES": "{{ 'https://sellingpartnerapi' if config.environment == 'production' else 'https://sandbox.sellingpartnerapi' }}-eu.amazon.com",
@@ -121,8 +120,12 @@
         production_config = {"environment": "production", "marketplace": "ES"}
-        RemapField(field_path=["marketplace"], map=mapping["endpoint"], config=sandbox_config).transform(sandbox_config)
-        RemapField(field_path=["marketplace"], map=mapping["endpoint"], config=production_config).transform(production_config)
+        RemapField(
+            field_path=["marketplace"], map=mapping["endpoint"], config=sandbox_config
+        ).transform(sandbox_config)
+        RemapField(
+            field_path=["marketplace"], map=mapping["endpoint"], config=production_config
+        ).transform(production_config)

Wdyt?

🧰 Tools
🪛 GitHub Actions: Linters

[error] 1-1: Ruff format check failed. File would be reformatted. Run 'ruff format' to fix code style issues.

🧹 Nitpick comments (11)
airbyte_cdk/sources/declarative/transformations/config_transformations/add_fields.py (5)

1-3: Check the copyright year.

The copyright notice lists 2025, which is a future date. Should this be updated to the current year?

- # Copyright (c) 2025 Airbyte, Inc., all rights reserved.
+ # Copyright (c) 2023 Airbyte, Inc., all rights reserved.

wdyt?


73-103: Consider handling unexpected value types more gracefully.

The validation logic correctly handles InterpolatedString and str types, but what happens with other types? Currently, it would raise a ValueError, but the error message only mentions expecting a string value, which might be confusing.

        if not isinstance(add_field.value, InterpolatedString):
            if not isinstance(add_field.value, str):
                raise ValueError(
-                    f"Expected a string value for the AddFields transformation: {add_field}"
+                    f"Expected a string or InterpolatedString value for the AddFields transformation, but got {type(add_field.value).__name__}: {add_field}"
                )

wdyt?


104-120: Document overwriting behavior and consider exception handling.

The transform method looks solid! A few things to consider:

  1. dpath.new will overwrite existing values at the specified path. This behavior isn't explicitly documented.
  2. There's no exception handling for the dpath.new call, which could fail if there are issues with the path or value.

Consider updating the method docstring to mention the overwrite behavior and adding try-except handling:

    def transform(
        self,
        config: MutableMapping[str, Any],
    ) -> None:
        """
        Transforms a config by adding fields based on the provided field definitions.
+       If a field already exists at the specified path, it will be overwritten.

        :param config: The user-provided configuration to be transformed
        """
        for parsed_field in self._parsed_fields:
            valid_types = (parsed_field.value_type,) if parsed_field.value_type else None
            value = parsed_field.value.eval(config, valid_types=valid_types)
            if not self.condition or self._filter_interpolator.eval(
                config, value=value, path=parsed_field.path
            ):
+               try:
                    dpath.new(config, parsed_field.path, value)
+               except Exception as e:
+                   raise ValueError(f"Failed to add field at path {parsed_field.path}: {str(e)}")

wdyt?


67-71: Document the condition parameter and its context variables.

The condition parameter enables conditional field addition, which is powerful! However, it would be helpful to document that value and path are available as variables in the condition evaluation, as seen in the transform method.

Consider enhancing the docstring to clarify this:

    fields: List[AddedFieldDefinition]
+   # A boolean expression that determines whether a field should be added.
+   # The expression can reference the config being transformed, as well as
+   # `value` (the evaluated field value) and `path` (the field path).
    condition: str = ""
    _parsed_fields: List[ParsedAddFieldDefinition] = field(
        init=False, repr=False, default_factory=list
    )

wdyt?


36-66: Excellent documentation with helpful YAML examples!

The class documentation is thorough and the YAML examples make it easy to understand how to use this transformation. Consider also adding an example of using the conditional feature since it's a powerful capability.

        # by supplying any valid Jinja template directive or expression
        - path: ["two_times_two"]
          value: "{{ 2 * 2 }}"
+
+       # conditional field addition
+       - path: ["optional_field"]
+         value: "{{ config.source_value }}"
+         condition: "{{ value != 'skip_me' }}"

wdyt?

airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (5)

2-2: Copyright year appears to be set to the future.

The copyright year is set to 2025, but we're currently in 2024. Should we update the year to the current one? wdyt?

- # Copyright (c) 2025 Airbyte, Inc., all rights reserved.
+ # Copyright (c) 2024 Airbyte, Inc., all rights reserved.

19-20: Should docstring mention interpolated mapping support?

The docstring describes this as a "transformation that remaps a field's value based on a static map", but I can see from the implementation and tests that it also supports interpolated values in the map. Might be helpful to clarify this in the docstring to make the capability more discoverable. wdyt?

- Transformation that remaps a field's value to another value based on a static map.
+ Transformation that remaps a field's value to another value based on a map.
+ The map values can include interpolated strings that reference config values.

29-36: Simplify field_path conversion logic.

The current implementation creates _field_path and then potentially overwrites elements in a separate loop. Perhaps we could simplify this to a single operation? wdyt?

- self._field_path = [
-     InterpolatedString.create(path, parameters={}) for path in self.field_path
- ]
- for path_index in range(len(self.field_path)):
-     if isinstance(self.field_path[path_index], str):
-         self._field_path[path_index] = InterpolatedString.create(
-             self.field_path[path_index], parameters={}
-         )
+ self._field_path = [
+     path if isinstance(path, InterpolatedString) else InterpolatedString.create(path, parameters={})
+     for path in self.field_path
+ ]

27-28: Consider using a more specific exception type.

The code raises a generic Exception when the field_path is empty. Using a more specific exception type like ValueError could make error handling more consistent across the codebase. wdyt?

- raise Exception("field_path cannot be empty.")
+ raise ValueError("field_path cannot be empty.")

44-47: Docstring should mention early return behavior.

The transform method silently returns without making changes if the field path doesn't exist or if intermediate nodes aren't mappings. This is fine, but should be documented in the docstring for clarity. wdyt?

 Transforms a config by remapping a field value based on the provided map.
 If the original value is found in the map, it's replaced with the mapped value.
 If the value is not in the map, the field remains unchanged.
+If any part of the field path doesn't exist or isn't a mapping, no changes are made.
unit_tests/sources/declarative/transformations/config_transformations/test_config_remap_field.py (1)

94-97: Test could be more specific about exception.

The test verifies that an exception is raised with an empty field path, but doesn't check the exception message. Would it be helpful to be more specific about the expected exception type and message? wdyt?

- with pytest.raises(Exception):
+ with pytest.raises(Exception, match="field_path cannot be empty."):
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c64e588 and 01cf5a6.

📒 Files selected for processing (3)
  • airbyte_cdk/sources/declarative/transformations/config_transformations/add_fields.py (1 hunks)
  • airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (1 hunks)
  • unit_tests/sources/declarative/transformations/config_transformations/test_config_remap_field.py (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Linters
unit_tests/sources/declarative/transformations/config_transformations/test_config_remap_field.py

[error] 1-1: Ruff format check failed. File would be reformatted. Run 'ruff format' to fix code style issues.

⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Analyze (python)
🔇 Additional comments (9)
airbyte_cdk/sources/declarative/transformations/config_transformations/add_fields.py (1)

18-25: The class design looks great!

Good job on defining a clear, immutable data structure for field additions with appropriate type annotations.

unit_tests/sources/declarative/transformations/config_transformations/test_config_remap_field.py (8)

1-10: The imports look good!

The necessary imports for testing are present, and the style aligns with the rest of the codebase.

🧰 Tools
🪛 GitHub Actions: Linters

[error] 1-1: Ruff format check failed. File would be reformatted. Run 'ruff format' to fix code style issues.


12-37: Comprehensive basic test case.

This test thoroughly verifies the basic field remapping functionality, including checking that only the intended field is modified while others remain unchanged. Good job!


38-52: Good edge case testing.

Testing the behavior when a value isn't present in the mapping is important. This ensures the transformer doesn't modify fields unexpectedly.


53-68: Good test for non-existent field paths.

This test ensures that when the field path doesn't exist in the config, no changes are made to the config. This is important for ensuring the transformer behaves correctly in all scenarios.


69-83: Good test for interpolated path support.

This test verifies that the transformer correctly handles interpolated field paths, which is an important feature for dynamic configurations.


84-93: Good edge case testing for empty map.

Testing with an empty map is important to ensure the transformer handles this edge case correctly without modifying the config.


98-114: Good test for sequential transformations.

This test ensures that multiple transformations can be applied in sequence, which is important for complex configuration scenarios.


115-129: Excellent real-world test case.

This test with the Amazon Seller Partner use case validates that the transformer handles complex interpolated mappings with conditional logic. This is exactly the scenario mentioned in the previous review comment.

Copy link
Copy Markdown
Contributor

@brianjlai Brian Lai (brianjlai) left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a few small things, but this looks ready to go. nice work! ✅

let me know if my explanation for the runtime vs. parse time eval of the mapping makes sense too since its a somewhat subtle way that we think about interpolation processing

@pnilan Patrick Nilan (pnilan) merged commit dc10839 into main May 14, 2025
25 of 27 checks passed
@pnilan Patrick Nilan (pnilan) deleted the pnilan/feat/implement-validators branch May 14, 2025 16:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants