fix(cdk): support start_date format without microseconds in file-based connectors#945
Conversation
…d connectors Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. 💡 Show Tips and TricksTesting This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1773159423-fix-start-date-format-parsing#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1773159423-fix-start-date-format-parsingPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
…TIME_FORMAT Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
…Spec pattern_descriptor Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
…ate, not cursor values Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughAdds a private Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Would you like me to suggest a brief docstring and a couple of edge-case tests for leap seconds or uncommon timezone formats, wdyt? 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…al format support Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
…llback Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
…s://git-manager.devin.ai/proxy/github.com/airbytehq/airbyte-python-cdk into devin/1773159423-fix-start-date-format-parsing
…ipping tzinfo Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
…tart_date Co-Authored-By: Daryna Ishchenko <darina.ishchenko17@gmail.com>
review comment was resolved #945 (review)
6876663
into
main
fix(cdk): support start_date format without microseconds in file-based connectors
Summary
AbstractFileBasedStreamReader.filter_files_by_globs_and_start_datecrashes at runtime whenstart_dateis provided without microseconds (e.g.2025-01-01T00:00:00Z), becausedatetime.strptimewith%frequires the fractional-seconds component. This format is commonly produced by Terraform and other API clients.The spec validator (updated in CDK v7.7.1) already accepts this format, so the failure only surfaces at runtime during file filtering — not at config validation time.
Fix: Adds a
_parse_start_dateinstance method with a three-tier fallback:self.DATE_TIME_FORMAT(%Y-%m-%dT%H:%M:%S.%fZ) — the strict format with microseconds%Y-%m-%dT%H:%M:%SZ— the shorter ISO8601 variant without microsecondsab_datetime_parse()fromairbyte_cdk.utils.datetime_helpers— handles date-only (YYYY-MM-DD), timezone offsets, and other formats supported bydateutilFor the
ab_datetime_parsefallback, the result is first converted to UTC via.astimezone(timezone.utc)and then made naive via.replace(tzinfo=None)to remain comparable withRemoteFile.last_modified(a naivedatetime). This ensures non-UTC offsets like+05:30are correctly converted before comparison.Relates to: https://github.com/airbytehq/oncall/issues/9390
Updates since last revision
ab_datetime_parsefallback: now uses.astimezone(timezone.utc).replace(tzinfo=None)instead of.replace(tzinfo=None). Previously, a non-UTC offset like2025-01-01T00:00:00+05:30would silently discard the offset and produce2025-01-01 00:00:00instead of the correct UTC-equivalent2024-12-31 18:30:00.with_timezone_offset_converted_to_utcverifying correct UTC conversion for timezone-offset inputs.Previous updates
ab_datetime_parseas a third-level fallback in_parse_start_date. If bothstrptimeformats fail, the method now delegates toab_datetime_parse()rather than raisingValueError."2025-01-01") exercising theab_datetime_parsefallback path.@staticmethoddecorator from_parse_start_date; the method now usesself.DATE_TIME_FORMATinstead of referencingAbstractFileBasedStreamReader.DATE_TIME_FORMATdirectly. This ensures subclass overrides ofDATE_TIME_FORMATare respected._parse_start_dateto document that the fallback format originates fromAbstractFileBasedSpec'spattern_descriptor:"YYYY-MM-DD, YYYY-MM-DDTHH:mm:ssZ, or YYYY-MM-DDTHH:mm:ss.SSSSSSZ".Review & Testing Checklist for Human
ab_datetime_parsefallback scope is acceptable.ab_datetime_parseis very flexible — it handles Unix timestamps, various ISO8601 variants, and anythingdateutil.parser.parse()can handle. This is broader than what the spec'spattern_descriptoradvertises. Verify that silently accepting unexpected formats won't mask config errors.strptimebranches. The first two branches parseZas a literal character (not as a UTC designator), producing naive datetimes. The third branch converts to UTC then strips tzinfo. This is consistent as long asRemoteFile.last_modifiedis always naive and effectively in UTC — confirm this assumption holds across connectors.start_dateparsing sites. The cursors (DefaultFileBasedCursor,FileBasedConcurrentCursor) also useDATE_TIME_FORMATwithstrptime— but those parse state values the CDK itself serialized, so they should always include microseconds. Confirm this assumption holds.Suggested test plan:
start_dateformats:"2025-01-01T00:00:00Z"(without microseconds)"2025-01-01T00:00:00.000000Z"(with microseconds)"2025-01-01"(date-only)"2025-01-01T00:00:00+05:30"(with non-UTC offset)last_modified >= start_date.Notes
Requested by: Daryna Ishchenko (@darynaishchenko)
Devin session
Summary by CodeRabbit
Bug Fixes
Tests