#707 Handle Undefined keys in records#708
Conversation
📝 WalkthroughWalkthroughThe change updates the initialization logic of the Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant StreamRecordHandler
User->>StreamRecordHandler: Initialize with keys (may include empty/falsy)
StreamRecordHandler->>StreamRecordHandler: Build quick_lookup dictionary
alt Key is falsy
StreamRecordHandler->>StreamRecordHandler: Map key to "undefined"
else Key is not falsy
StreamRecordHandler->>StreamRecordHandler: Map key to normalized or display case
end
Assessment against linked issues
Assessment against linked issues: Out-of-scope changesNo out-of-scope changes were found. Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 Pylint (3.3.7)airbyte/records.py✨ Finishing Touches
🧪 Generate Unit Tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
airbyte/records.py (1)
144-152: Smart defensive approach for handling falsy keys - a few considerations though, wdyt?The logic to handle falsy keys by mapping them to "undefined" is a solid defensive programming approach that prevents the normalizer from encountering empty strings (which would raise
PyAirbyteNameNormalizationErrorbased on the relevant code snippet).However, I'm curious about a few aspects:
Potential conflicts: Could the hardcoded "undefined" string ever conflict with legitimate field names in schemas? Maybe using a more unique identifier like
"__undefined__"or"_airbyte_undefined"would be safer?Alternative approaches: Have you considered other options like:
- Filtering out falsy keys entirely instead of mapping them?
- Making the undefined placeholder configurable?
- Logging when this fallback occurs for debugging purposes?
Edge cases: What happens if a schema legitimately expects an empty string as a field name? Should this behavior be documented or configurable?
The implementation looks clean, but adding some tests around this behavior would really help validate these edge cases, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte/records.py(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
airbyte/records.py (2)
Learnt from: aaronsteers
PR: airbytehq/PyAirbyte#369
File: airbyte/_connector_base.py:0-0
Timestamp: 2024-10-08T15:34:31.026Z
Learning: In this codebase, `message.record.stream` is a required property enforced by schema, so it will not be `None`.
Learnt from: aaronsteers
PR: airbytehq/PyAirbyte#369
File: airbyte/_connector_base.py:0-0
Timestamp: 2024-09-17T21:18:12.530Z
Learning: In this codebase, `message.record.stream` is a required property enforced by schema, so it will not be `None`.
🧬 Code Graph Analysis (1)
airbyte/records.py (1)
airbyte/_util/name_normalizers.py (2)
normalize(23-25)normalize(53-87)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (No Creds)
| if self._normalize_keys | ||
| else self.to_display_case(key) | ||
| key: ( | ||
| "undefined" |
There was a problem hiding this comment.
maybe append a hash to this to avoid collisions?
ea4c758 to
6794987
Compare
This should resolve #707 by handling record field keys that are undefined. This is technically valid json, but undesirable. Not sure if this is the desired approach, however, so I'll keep this in draft and hold off on tests until I hear more.
Summary by CodeRabbit