[FLINK-39401] Extend raw format to support line-delimiter option #27897
Open
featzhang wants to merge 1 commit into apache:master
Add a new optional `raw.line-delimiter` config option to the raw format.

- `RawFormatOptions`: add `LINE_DELIMITER` (`ConfigOption<String>`) with no default value
- `RawFormatFactory`: read the option, pass it to the schema constructors, and register it in `optionalOptions()`
- `RawFormatDeserializationSchema`: override `deserialize(byte[], Collector)` to split the message by the delimiter and emit one `RowData` per part when the delimiter is set; the single-record `deserialize(byte[])` is unchanged for backward compatibility
- `RawFormatSerializationSchema`: append the delimiter bytes to the serialized value when the delimiter is set; null rows are unaffected
- `RawFormatFactoryTest`: add `testLineDelimiterOption()`, covering factory wiring with the new option
- `RawFormatLineDelimiterTest`: new test class covering deserialization splitting (newline, custom delimiter, GBK charset, null message) and serialization appending (newline, custom delimiter, null row)
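The splitting and appending described above can be sketched without Flink as follows. This is a hypothetical illustration, not the PR's code: `RawLineDelimiterSketch`, `split`, and `append` are made-up names, plain `String`s stand in for the `RowData` rows, and the sketch assumes that the delimiter is encoded with the configured charset and that a null message yields no rows. The split expression is the one quoted in the PR summary.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical, Flink-free sketch of the raw.line-delimiter behavior.
 * Plain Strings stand in for the RowData rows the real schemas emit.
 */
class RawLineDelimiterSketch {

    /** Deserialization side: decode with the charset, then emit one segment per part. */
    static List<String> split(byte[] message, String delimiter, Charset charset) {
        if (message == null) {
            // Assumption: a null message produces no rows.
            return Collections.emptyList();
        }
        String decoded = new String(message, charset);
        if (delimiter == null) {
            // Option unset: the whole message stays a single row, as before.
            return Collections.singletonList(decoded);
        }
        // Pattern.quote makes the delimiter literal (String.split takes a regex);
        // limit -1 keeps empty segments, including a trailing one.
        return Arrays.asList(decoded.split(Pattern.quote(delimiter), -1));
    }

    /** Serialization side: append the delimiter bytes to a serialized value. */
    static byte[] append(byte[] value, String delimiter, Charset charset) {
        if (value == null || delimiter == null) {
            // Null rows, and every value when the option is unset, are left unchanged.
            return value;
        }
        // Assumption: the delimiter is encoded with the same charset as the value.
        byte[] delim = delimiter.getBytes(charset);
        byte[] out = Arrays.copyOf(value, value.length + delim.length);
        System.arraycopy(delim, 0, out, value.length, delim.length);
        return out;
    }

    public static void main(String[] args) {
        Charset utf8 = StandardCharsets.UTF_8;
        System.out.println(split("a\nb\nc".getBytes(utf8), "\n", utf8)); // [a, b, c]
        byte[] encoded = append("x".getBytes(utf8), "||", utf8);
        System.out.println(new String(encoded, utf8)); // x||
    }
}
```

Because `limit = -1` keeps empty segments, a message that ends with the delimiter (for example, a single value written by the serializer) splits into that value plus one empty trailing segment in this sketch; the PR description does not say whether the real schema emits or drops such a segment.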
Summary

This PR extends the `raw` format to support a new optional `raw.line-delimiter` config option.

When `raw.line-delimiter` is set:

- Deserialization: the message is decoded using `raw.charset`, split by the delimiter (`String.split(Pattern.quote(delimiter), -1)`), and one `RowData` is emitted per segment via `deserialize(byte[], Collector<T>)`.
- Serialization: the delimiter bytes are appended to each serialized value; null rows are unaffected.

When `raw.line-delimiter` is not set, all existing behavior is preserved exactly (backward compatible).

Example SQL
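Changes

| File | Change |
|---|---|
| `RawFormatOptions` | Add `LINE_DELIMITER` (`ConfigOption<String>`) with no default value |
| `RawFormatFactory` | Read the option, pass it to the schema constructors, and register it in `optionalOptions()` |
| `RawFormatDeserializationSchema` | Override `deserialize(byte[], Collector)` to split by the delimiter when set; add the `lineDelimiter` field to `equals`/`hashCode` |
| `RawFormatSerializationSchema` | Append the delimiter bytes to the serialized value when set; add the `lineDelimiter` field to `equals`/`hashCode` |
| `RawFormatFactoryTest` | Add `testLineDelimiterOption()` |
| `RawFormatLineDelimiterTest` | New test class |

Test Plan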
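Adding `lineDelimiter` to `equals`/`hashCode`, as listed under Changes, matters when a test compares factory-built schemas with expected instances (presumably what the factory test does): without it, two schemas that differ only in the delimiter would compare equal. The sketch below uses a hypothetical `RawSchemaConfig` class, not a Flink type, to show the null-safe way to include a field that is `null` when the option is unset.

```java
import java.util.Objects;

/**
 * Hypothetical, simplified stand-in for a raw-format schema's configuration,
 * showing a nullable lineDelimiter taking part in equals/hashCode.
 */
final class RawSchemaConfig {
    private final String charsetName;
    private final String lineDelimiter; // null when raw.line-delimiter is unset

    RawSchemaConfig(String charsetName, String lineDelimiter) {
        this.charsetName = Objects.requireNonNull(charsetName);
        this.lineDelimiter = lineDelimiter;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof RawSchemaConfig)) {
            return false;
        }
        RawSchemaConfig that = (RawSchemaConfig) o;
        // Objects.equals is null-safe, so "unset" only equals "unset".
        return charsetName.equals(that.charsetName)
                && Objects.equals(lineDelimiter, that.lineDelimiter);
    }

    @Override
    public int hashCode() {
        // Objects.hash tolerates a null lineDelimiter.
        return Objects.hash(charsetName, lineDelimiter);
    }

    public static void main(String[] args) {
        RawSchemaConfig plain = new RawSchemaConfig("UTF-8", null);
        RawSchemaConfig newline = new RawSchemaConfig("UTF-8", "\n");
        System.out.println(plain.equals(new RawSchemaConfig("UTF-8", null))); // true
        System.out.println(plain.equals(newline)); // false
    }
}
```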
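- `RawFormatLineDelimiterTest` (9 tests):
  - deserialization, `\n` delimiter → 3 rows
  - deserialization, `||` delimiter → 3 rows
  - deserialization, GBK charset with `\n` delimiter → correct splitting
  - serialization, `\n` → appends `\n`
  - serialization, `||` → appends `||`
- `RawFormatFactoryTest.testLineDelimiterOption()`: verifies that the factory produces schemas with the correct delimiter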
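The splitting cases in the test plan can be reproduced with plain `String.split`, which also shows why the delimiter goes through `Pattern.quote` and why decoding comes before splitting. `String.split` takes a regular expression, so an unquoted `||` matches the empty string; and in GBK the second byte of a double-byte character can fall in the ASCII range, so splitting the raw bytes on an ASCII delimiter could cut a character apart. `DelimiterSplitChecks` and `splitLiteral` are hypothetical names; this is not the PR's test code.

```java
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.regex.Pattern;

/** Hypothetical, Flink-free checks mirroring the deserialization cases in the test plan. */
class DelimiterSplitChecks {

    /** Splits on the literal delimiter, keeping empty segments (limit -1). */
    static String[] splitLiteral(String decoded, String delimiter) {
        return decoded.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        // Newline delimiter: three segments
        System.out.println(Arrays.toString(splitLiteral("a\nb\nc", "\n"))); // [a, b, c]

        // Custom "||" delimiter: quoting makes the pipes literal, so three segments
        System.out.println(Arrays.toString(splitLiteral("a||b||c", "||"))); // [a, b, c]

        // Unquoted, "||" is a regex that matches the empty string,
        // so the input is split between characters instead of at the delimiter
        String[] wrong = "a||b||c".split("||", -1);
        System.out.println(Arrays.equals(wrong, new String[] {"a", "b", "c"})); // false

        // GBK: decode the bytes with the configured charset first, then split the text
        if (Charset.isSupported("GBK")) {
            Charset gbk = Charset.forName("GBK");
            String text = "\u4f60\u597d\n\u4e16\u754c"; // two Chinese words separated by a newline
            byte[] message = text.getBytes(gbk);
            String[] rows = splitLiteral(new String(message, gbk), "\n");
            System.out.println(rows.length == 2 && rows[1].equals("\u4e16\u754c")); // true
        }
    }
}
```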