feat(property chunking): Allow PropertiesFromEndpoint to be defined on HttpRequester#507
Conversation
… allow for interpolation on other request options
📝 WalkthroughWalkthroughThis change introduces a new Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Manifest
participant ModelToComponentFactory
participant HttpRequester
participant QueryProperties
User->>Manifest: Defines HttpRequester with fetch_properties_from_endpoint
Manifest->>ModelToComponentFactory: Passes manifest for component creation
ModelToComponentFactory->>HttpRequester: Instantiates HttpRequester
alt fetch_properties_from_endpoint defined
ModelToComponentFactory->>QueryProperties: Dynamically creates QueryProperties from endpoint
end
ModelToComponentFactory-->>User: Returns configured retriever
Possibly related PRs
Suggested labels
Suggested reviewers
Would you like to consider adding an integration test that exercises a full end-to-end flow with the new ✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (5)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)
1981-1985: Introducefetch_properties_from_endpointon HttpRequester
The new optional field correctly referencesPropertiesFromEndpointand matches the PR objective of enabling dynamic property injection. Would it be helpful to:
- Add a schema‐level
oneOfornotconstraint to disallow defining bothfetch_properties_from_endpointandQueryPropertiesinrequest_parametersat validation time, rather than waiting for the factory error?- Include an example under
HttpRequester’sexamplesto demonstrate howfetch_properties_from_endpointis used in practice?Wdyt?
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)
2948-2964: Explicit clash detection is great – consider simplifying the second scan?Love that we now surface an error when both
request_parametersandfetch_properties_from_endpointdefine aPropertiesFromEndpoint– much clearer 👏.Today we (1) call
_query_properties_in_request_parametersto know a clash may exist, and (2) iterate a second time overrequest_parametersto collectQueryPropertiesModelinstances. That means two scans of the same dict.Might it be cleaner to have
_query_properties_in_request_parametersreturn the (key, model) of the firstQueryPropertiesModelit finds (orNone) and at the same time detect duplicates?
This would:
- avoid the second loop,
- centralise the “only one allowed” validation,
- let the caller directly obtain
query_properties_keyand the model.wdyt?
- if self._query_properties_in_request_parameters(model.requester): + qp_result = self._query_properties_in_request_parameters(model.requester) + if qp_result: ... - query_properties_definitions = [] - for key, request_parameter in model.requester.request_parameters.items(): - if isinstance(request_parameter, QueryPropertiesModel): - query_properties_key = key - query_properties_definitions.append(request_parameter) + query_properties_key, qp_model = qp_result + query_properties = self._create_component_from_model( + model=qp_model, config=config + )(Fully understand if you prefer to keep the current shape, just throwing the idea out there 😊)
2975-2986: Shouldfetch_properties_from_endpointexpose more knobs?When
fetch_properties_from_endpointis used we create a syntheticQueryPropertiesModelwithalways_include_properties=Noneandproperty_chunking=None. That mirrors the “no-chunking” requirement, but it also silently prevents a user from opting intoalways_include_properties.Would it make sense to copy over (or allow in the schema) extra fields so callers can still control
always_include_propertieswhile leavingproperty_chunkingdisabled? Something like:QueryPropertiesModel( type="QueryProperties", property_list=model.requester.fetch_properties_from_endpoint, always_include_properties=getattr(model.requester, "always_include_properties", None), property_chunking=None, )No pressure, just wondering if you foresee a need for that flexibility down the road – wdyt?
3108-3119: Minor typing nit: prefercollections.abc.Mappingovertyping.Mappingforisinstance
typing.Mappingworks today (it aliases tocollections.abc.Mappingin ≥3.9), but using it withisinstancewas never the intended public API and could become brittle.Switching to
from collections.abc import Mapping(or importing as an alias) would make the intent explicit and future-proof the runtime check – wdyt?-from typing import Mapping +from collections.abc import Mapping ... - if request_parameters and isinstance(request_parameters, Mapping): + if request_parameters and isinstance(request_parameters, Mapping):unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (1)
4445-4526: Good validation test for conflicting property definitions!This test ensures we fail early when a developer accidentally defines
PropertiesFromEndpointboth directly on the requester and within request parameters. This kind of validation prevents subtle runtime issues and makes the configuration errors more apparent.I notice you've set up a comprehensive test case that attempts to use both approaches simultaneously. Have you considered adding a more detailed error message in the actual implementation? The validation is correct, but wdyt about making the error message specify exactly what's conflicting?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml(3 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py(2 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py(3 hunks)airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py(0 hunks)unit_tests/sources/declarative/parsers/test_model_to_component_factory.py(3 hunks)
💤 Files with no reviewable changes (1)
- airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (1)
PropertiesFromEndpoint(14-40)
⏰ Context from checks skipped due to timeout of 90000ms (6)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (4)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
2257-2261: LGTM! This is a nice addition for flexibility in the HTTP requester.The new field allows dynamic property retrieval from endpoints that can be injected into outbound requests via
stream_partition.extra_fields. This seems especially useful for cases like CRMSearch streams where properties need to be referenced in POST request bodies. The implementation reuses the existingQueryPropertiescomponent while addressing the chunking limitation, which is a clean approach.airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)
1467-1467: Formatting cleanup approved: removed extraneous spaces inside the enum forFileUploaderto align with the other component definitions.
2377-2377: Formatting cleanup approved: removed extraneous spaces inside the enum forKeyTransformationfor consistency with other enums.unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (1)
4219-4299: Great addition of a test for the newfetch_properties_from_endpointfeature!This test appropriately verifies that
PropertiesFromEndpointcan be defined directly on the requester rather than nested inside request parameters. I especially like how you validate that property chunking is properly disabled when using this approach (lines 4282-4283).The test comprehensively checks all aspects of the feature:
- Correct object instantiation
- Property paths set correctly
- Underlying retriever configuration
- No property chunking by default
Maxime Carbonneau-Leclerc (maxi297)
left a comment
There was a problem hiding this comment.
LGTM! Thanks for enabling this for request body too
|
Anatolii tested the feature w/ |
Closes https://github.com/airbytehq/airbyte-internal-issues/issues/12646
While implementing CRMSearch streams for Hubspot, we found a gap where we need to be able to reference the properties retrieved dynamically to inject them into other request inputs. Specifically JSON body for POST requests.
Within these other request inputs, property chunking itself doesn't really make sense because there aren't input limits unlike request parameters.
This implementation does a clever trick where under the hood we reuse the
QueryPropertiesruntime component without any property chunking enabled. And by doing so, nearly all of the rest of the flow can be reused. And the interpolation context for the request inputs should still have access to the stream slice. For example:{{ stream_partition.extra_fields['query_properties'] }}Summary by CodeRabbit