Commit bec0da0

[doc] document disabling Stream Load compression via sink.properties.compress_type (#3638)
## Versions

- [x] dev
- [x] 4.x
- [ ] 3.x
- [ ] 2.1 or older (not covered by version/language sync gate)

## Languages

- [x] Chinese
- [x] English
- [ ] Japanese candidate translation needed

## Docs Checklist

- [x] Checked by AI
- [ ] Test Cases Built
- [x] Updated required version and language counterparts, or explained why not
- [x] If only one language changed, confirmed whether source/translation counterparts need sync

## Summary

Since Flink Doris Connector 26.1.0, gz compression is enabled by default for Stream Load (apache/doris-flink-connector#648). This PR documents how to disable it:

- Append to the 26.1.0 release note: compression can be disabled by setting `'sink.properties.compress_type' = ''`.
- Append the same note to the `sink.properties.*` description in the connector configuration table so users can find it from the main config docs.

Applied to all four locations: `docs/` (dev EN), `versioned_docs/version-4.x/` (4.x EN), and the corresponding `i18n/zh-CN` Chinese translations.
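As a minimal sketch of the setting described in the summary, a Flink SQL sink table that turns compression off might look like the following. The table name, columns, FE address, target table, label prefix, and credentials are placeholders invented for illustration; only `'sink.properties.compress_type' = ''` comes from this change.

```sql
-- Hypothetical sink table: every name, address, and credential here is a placeholder.
CREATE TABLE doris_sink (
    id INT,
    name STRING
) WITH (
    'connector' = 'doris',
    'fenodes' = '127.0.0.1:8030',
    'table.identifier' = 'example_db.example_table',
    'username' = 'root',
    'password' = '',
    'sink.label-prefix' = 'doris_label_example',
    -- Connector 26.1.0+ enables gz compression by default for Stream Load;
    -- per this change, an empty value disables it.
    'sink.properties.compress_type' = ''
);
```

On connector versions before 26.1.0 the option should not be needed, since gz compression was not the default there.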
1 parent 9c3de31 commit bec0da0

10 files changed

Lines changed: 10 additions & 10 deletions

docs-next/connection-integration/data-integration/flink-doris-connector.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -817,7 +817,7 @@ After the Flink cluster is started, you can run the corresponding command accord
817817
| Key | Default Value | Required | Comment |
818818
| --------------------------- | ------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
819819
| sink.label-prefix | -- | Y | The label prefix used for Stream Load imports. In 2pc scenarios, it must be globally unique to guarantee the EOS semantics of Flink. |
820-
| sink.properties.* | -- | N | Stream Load import parameters. For example: `'sink.properties.column_separator' = ', '` defines the column separator; `'sink.properties.escape_delimiters' = 'true'` indicates that special characters are used as separators, and `\x01` will be converted to the binary `0x01`; for JSON-format imports: `'sink.properties.format' = 'json'`, `'sink.properties.read_json_by_line' = 'true'`. For detailed parameters, see [Stream Load](../../data-operate/import/import-way/stream-load-manual.md#import-configuration-parameters). Group Commit mode: `'sink.properties.group_commit' = 'sync_mode'` sets group commit to synchronous mode. Flink Connector supports configuring group commit for imports starting from 1.6.2. For detailed usage and limitations, see [Group Commit](../../data-operate/import/load-best-practices/group-commit-manual.md). |
820+
| sink.properties.* | -- | N | Stream Load import parameters. For example: `'sink.properties.column_separator' = ', '` defines the column separator; `'sink.properties.escape_delimiters' = 'true'` indicates that special characters are used as separators, and `\x01` will be converted to the binary `0x01`; for JSON-format imports: `'sink.properties.format' = 'json'`, `'sink.properties.read_json_by_line' = 'true'`. For detailed parameters, see [Stream Load](../../data-operate/import/import-way/stream-load-manual.md#import-configuration-parameters). Group Commit mode: `'sink.properties.group_commit' = 'sync_mode'` sets group commit to synchronous mode. Flink Connector supports configuring group commit for imports starting from 1.6.2. For detailed usage and limitations, see [Group Commit](../../data-operate/import/load-best-practices/group-commit-manual.md). Since 26.1.0, gz compression is enabled by default for Stream Load; it can be disabled by setting `'sink.properties.compress_type' = ''`. |
821821
| sink.enable-delete | TRUE | N | Whether to enable deletion. This option requires the Doris table to have batch deletion enabled (enabled by default in Doris 0.15+) and only supports the Unique model. |
822822
| sink.enable-2pc | TRUE | N | Whether to enable two-phase commit (2pc). The default is true, which guarantees Exactly-Once semantics. For information on two-phase commit, see [Stream Load 2PC](../../data-operate/transaction.md#streamload-2pc). |
823823
| sink.buffer-size | 1MB | N | Buffer size for the write data cache, in bytes. Modifying this is not recommended; the default configuration is sufficient. |

docs/ecosystem/flink-doris-connector/flink-doris-connector.md

Lines changed: 1 addition & 1 deletion
@@ -833,7 +833,7 @@ After starting the Flink cluster, you can directly run the following command:
 | Key | Default Value | Required | Comment |
 | --------------------------- | ------------- | -------- | ------------------------------------------------------------ |
 | sink.label-prefix | -- | Y | The label prefix used for Stream load import. In the 2pc scenario, it is required to be globally unique to ensure the EOS semantics of Flink. |
-| sink.properties.* | -- | N | Import parameters for Stream Load. For example, 'sink.properties.column_separator' = ', ' defines the column separator, and 'sink.properties.escape_delimiters' = 'true' means that special characters as delimiters, like \x01, will be converted to binary 0x01. For JSON format import, 'sink.properties.format' = 'json', 'sink.properties.read_json_by_line' = 'true'. For detailed parameters, refer to [here](../../data-operate/import/import-way/stream-load-manual.md#load-configuration-parameters). For Group Commit mode, for example, 'sink.properties.group_commit' = 'sync_mode' sets the group commit to synchronous mode. The Flink connector has supported import configuration group commit since version 1.6.2. For detailed usage and limitations, refer to [group commit](../../data-operate/import/group-commit-manual.md). |
+| sink.properties.* | -- | N | Import parameters for Stream Load. For example, 'sink.properties.column_separator' = ', ' defines the column separator, and 'sink.properties.escape_delimiters' = 'true' means that special characters as delimiters, like \x01, will be converted to binary 0x01. For JSON format import, 'sink.properties.format' = 'json', 'sink.properties.read_json_by_line' = 'true'. For detailed parameters, refer to [here](../../data-operate/import/import-way/stream-load-manual.md#load-configuration-parameters). For Group Commit mode, for example, 'sink.properties.group_commit' = 'sync_mode' sets the group commit to synchronous mode. The Flink connector has supported import configuration group commit since version 1.6.2. For detailed usage and limitations, refer to [group commit](../../data-operate/import/group-commit-manual.md). Since 26.1.0, gz compression is enabled by default for Stream Load; it can be disabled by setting 'sink.properties.compress_type' = ''. |
 | sink.enable-delete | TRUE | N | Whether to enable deletion. This option requires the Doris table to have the batch deletion feature enabled (enabled by default in Doris 0.15+ versions), and only supports the Unique model. |
 | sink.enable-2pc | TRUE | N | Whether to enable two-phase commit (2pc). The default is true, ensuring Exactly-Once semantics. For details about two-phase commit, refer to [here](../../data-operate/transaction.md#streamload-2pc). |
 | sink.buffer-size | 1MB | N | The size of the write data cache buffer, in bytes. It is not recommended to modify it, and the default configuration can be used. |

docs/ecosystem/flink-doris-connector/release-notes.md

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@

 ### Features & Improvements

-- Enable gz compression by default for StreamLoad [#648](https://github.com/apache/doris-flink-connector/pull/648)
+- Enable gz compression by default for StreamLoad [#648](https://github.com/apache/doris-flink-connector/pull/648). Compression can be disabled by setting `'sink.properties.compress_type' = ''`.

 ### Credits

i18n/zh-CN/docusaurus-plugin-content-docs-next/current/connection-integration/data-integration/flink-doris-connector.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -817,7 +817,7 @@ Flink Doris Connector 集成了 [Flink CDC](https://nightlies.apache.org/flink/f
817817
| Key | Default Value | Required | Comment |
818818
| --------------------------- | ------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
819819
| sink.label-prefix | -- | Y | Stream Load 导入使用的 label 前缀。2pc 场景下要求全局唯一,用来保证 Flink 的 EOS 语义。 |
820-
| sink.properties.* | -- | N | Stream Load 的导入参数。例如:`'sink.properties.column_separator' = ', '` 定义列分隔符;`'sink.properties.escape_delimiters' = 'true'` 表示特殊字符作为分隔符,`\x01` 会被转换为二进制的 `0x01`;JSON 格式导入:`'sink.properties.format' = 'json'``'sink.properties.read_json_by_line' = 'true'`,详细参数参考 [Stream Load](../../data-operate/import/import-way/stream-load-manual.md#导入配置参数)。Group Commit 模式:`'sink.properties.group_commit' = 'sync_mode'` 设置 group commit 为同步模式。Flink Connector 从 1.6.2 开始支持导入配置 group commit,详细使用与限制参考 [Group Commit](../../data-operate/import/load-best-practices/group-commit-manual.md)|
820+
| sink.properties.* | -- | N | Stream Load 的导入参数。例如:`'sink.properties.column_separator' = ', '` 定义列分隔符;`'sink.properties.escape_delimiters' = 'true'` 表示特殊字符作为分隔符,`\x01` 会被转换为二进制的 `0x01`;JSON 格式导入:`'sink.properties.format' = 'json'``'sink.properties.read_json_by_line' = 'true'`,详细参数参考 [Stream Load](../../data-operate/import/import-way/stream-load-manual.md#导入配置参数)。Group Commit 模式:`'sink.properties.group_commit' = 'sync_mode'` 设置 group commit 为同步模式。Flink Connector 从 1.6.2 开始支持导入配置 group commit,详细使用与限制参考 [Group Commit](../../data-operate/import/load-best-practices/group-commit-manual.md)从 26.1.0 开始 Stream Load 默认启用 gz 压缩,可通过设置 `'sink.properties.compress_type' = ''` 关闭压缩。 |
821821
| sink.enable-delete | TRUE | N | 是否启用删除。此选项需要 Doris 表开启批量删除功能(Doris 0.15+ 版本默认开启),只支持 Unique 模型。 |
822822
| sink.enable-2pc | TRUE | N | 是否开启两阶段提交(2pc),默认为 true,保证 Exactly-Once 语义。关于两阶段提交可参考 [Stream Load 2PC](../../data-operate/transaction.md#streamload-2pc)|
823823
| sink.buffer-size | 1MB | N | 写数据缓存 buffer 大小,单位字节。不建议修改,默认配置即可 |

i18n/zh-CN/docusaurus-plugin-content-docs/current/ecosystem/flink-doris-connector/flink-doris-connector.md

Lines changed: 1 addition & 1 deletion
@@ -835,7 +835,7 @@ Flink Doris Connector integrates [Flink CDC](https://nightlies.apache.org/flink
 | Key | Default Value | Required | Comment |
 | --------------------------- | ------------- | -------- | ------------------------------------------------------------ |
 | sink.label-prefix | -- | Y | The label prefix used for Stream load imports. In 2pc scenarios, it must be globally unique to guarantee the EOS semantics of Flink. |
-| sink.properties.* | -- | N | Stream Load import parameters. For example, 'sink.properties.column_separator' = ', ' defines the column separator, and 'sink.properties.escape_delimiters' = 'true' treats special characters as delimiters, so \x01 is converted to binary 0x01. For JSON format import: 'sink.properties.format' = 'json', 'sink.properties.read_json_by_line' = 'true'. For detailed parameters, see [here](../../data-operate/import/import-way/stream-load-manual.md#导入配置参数). Group Commit mode: for example, 'sink.properties.group_commit' = 'sync_mode' sets group commit to synchronous mode. The flink connector has supported configuring group commit for imports since 1.6.2. For detailed usage and limitations, see [group commit](../../data-operate/import/group-commit-manual.md). |
+| sink.properties.* | -- | N | Stream Load import parameters. For example, 'sink.properties.column_separator' = ', ' defines the column separator, and 'sink.properties.escape_delimiters' = 'true' treats special characters as delimiters, so \x01 is converted to binary 0x01. For JSON format import: 'sink.properties.format' = 'json', 'sink.properties.read_json_by_line' = 'true'. For detailed parameters, see [here](../../data-operate/import/import-way/stream-load-manual.md#导入配置参数). Group Commit mode: for example, 'sink.properties.group_commit' = 'sync_mode' sets group commit to synchronous mode. The flink connector has supported configuring group commit for imports since 1.6.2. For detailed usage and limitations, see [group commit](../../data-operate/import/group-commit-manual.md). Since 26.1.0, gz compression is enabled by default for Stream Load; it can be disabled by setting 'sink.properties.compress_type' = ''. |
 | sink.enable-delete | TRUE | N | Whether to enable deletion. This option requires the Doris table to have batch deletion enabled (enabled by default in Doris 0.15+) and only supports the Unique model. |
 | sink.enable-2pc | TRUE | N | Whether to enable two-phase commit (2pc). The default is true, which guarantees Exactly-Once semantics. For details about two-phase commit, see [here](../../data-operate/transaction.md#streamload-2pc). |
 | sink.buffer-size | 1MB | N | Buffer size for the write data cache, in bytes. Modifying this is not recommended; the default configuration is sufficient. |

i18n/zh-CN/docusaurus-plugin-content-docs/current/ecosystem/flink-doris-connector/release-notes.md

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@

 ### Features & Improvements

-- Enable gz compression by default for StreamLoad [#648](https://github.com/apache/doris-flink-connector/pull/648)
+- Enable gz compression by default for StreamLoad [#648](https://github.com/apache/doris-flink-connector/pull/648). Compression can be disabled by setting `'sink.properties.compress_type' = ''`.

 ### Credits
