[FLINK-39401] Extend raw format to support line-delimiter option#27897
[FLINK-39401] Extend raw format to support line-delimiter option#27897RocMarshal merged 4 commits intoapache:masterfrom
Conversation
|
@featzhang Thanks for this contribution. The commit message in the PR should include the Jira ticket id. |
8205f0c to
3161013
Compare
|
Thanks for the thorough review @spuru9 and @rmetzger! Here's what was addressed in this update:
|
3161013 to
bb33242
Compare
RocMarshal
left a comment
There was a problem hiding this comment.
Thanks @featzhang for the contribution.
Left some comments.
PTAL~
bb33242 to
b219d10
Compare
Review Comments Addressed✅ Fixed comments (from @RocMarshal)
📁 Modified files
💬 Design note for line 131 (null message with delimiter)The current behavior (returning early on
@RocMarshal PTAL, thanks! |
|
Thanks @RocMarshal for pointing that out! I've now gone through the entire
The fix covers all test methods: @RocMarshal PTAL, thanks! |
RocMarshal
left a comment
There was a problem hiding this comment.
Thanks @featzhang for update.
BTW, It would be perfect if the updated summary could reply point-by-point in the comments and be a bit more concise. That way, the reviewer wouldn’t need to break things down one by one and locate the corresponding changes to verify them.
LGTM+1 if the CI passed.
Looking forward to your next contribution~
|
The CI failure in the latest build (#74134) is unrelated to this PR. The failing tests are:
These tests check that The code changes in this PR only touch:
@flinkbot run azure |
|
Hi, @featzhang The CI failure was fixed in https://issues.apache.org/jira/browse/FLINK-39457. |
…tion - Use StandardCharsets.UTF_8 instead of string literals in tests - Simplify assertions using hasToString() method - Add documentation for field nullability relationship Resolves review comments from @RocMarshal
…ut RawFormatLineDelimiterTest
8f7dea3 to
2742e22
Compare
I have fixed it by rebase the latest main branch, and CI has been successful. |
|
Thanks @featzhang . |
|
@featzhang , @RocMarshal any reason config option was added without FLIP/ML discussion? All of the following are public interfaces that people build around: |
Hi @snuyanzin, thanks for raising this concern — and apologies for not initiating the discussion beforehand. The
My understanding (which may be wrong) was that minor additive options within a format connector fall below the FLIP threshold, similar to how many other That said, I fully respect the community's position. If the consensus is that this option should have gone through FLIP/ML discussion, I'm happy to:
What would you recommend, @snuyanzin and @RocMarshal? |
|
Thanks @snuyanzin for the attention and reminder, @featzhang for the response. Please let me to add some information from my perspective at the time when reviewing the PR, and sorry for missing that earlier, which may have caused confusion: I had also considered whether we needed a new FLIP to track this parameter. However, based on the following reasoning, I proceeded with the merge:
That said, if the introduction of this parameter feels too abrupt, I’m happy to revert it immediately. We can then reintroduce this change later, strictly following the specification you outlined. To ensure consistency, I've incorporated this into my review checklist. Going forward, I'll follow the specification closely to avoid similar issues. Any input is appreciated! |
|
I don't think we need revert here however would be great to have more visibility |
|
Thanks @snuyanzin . |
Summary
This PR extends the
rawformat to support a new optionalraw.line-delimiterconfig option.When
raw.line-delimiteris set:raw.charset, split by the delimiter (String.split(Pattern.quote(delimiter), -1)), and oneRowDatais emitted per segment viadeserialize(byte[], Collector<T>).When
raw.line-delimiteris not set, all existing behavior is preserved exactly (backward compatible).Example SQL
Changes
RawFormatOptionsLINE_DELIMITERConfigOption<String>with no default valueRawFormatFactoryoptionalOptions()RawFormatDeserializationSchemadeserialize(byte[], Collector)to split by delimiter when set; addlineDelimiterfield toequals/hashCodeRawFormatSerializationSchemalineDelimiterfield toequals/hashCodeRawFormatFactoryTesttestLineDelimiterOption()RawFormatLineDelimiterTestTest Plan
RawFormatLineDelimiterTest(9 tests):\ndelimiter → 3 rows||→ 3 rows\ndelimiter → correct splitting\n→ appends\n||→ appends||RawFormatFactoryTest.testLineDelimiterOption(): verifies factory produces schemas with correct delimiter