13 changes: 11 additions & 2 deletions docs/ecosystem/flink-doris-connector/flink-doris-connector.md
@@ -6,7 +6,7 @@
}
---

The [Flink Doris Connector](https://github.com/apache/doris-flink-connector) is used to read data from and write data to a Doris cluster through Flink. It also integrates [FlinkCDC](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.2/docs/connectors/flink-sources/overview/), which makes full-database synchronization from upstream databases such as MySQL more convenient.

Using the Flink Connector, you can perform the following operations:

@@ -33,7 +33,7 @@
| 24.1.0 | 1.15 - 1.20 | 1.0+ | 8 | - |
| 25.0.0 | 1.15 - 1.20 | 1.0+ | 8 | - |
| 25.1.0 | 1.15 - 1.20 | 1.0+ | 8 | - |
| 26.0.0 | 1.15 - 1.20,2.0 - 2.2 | 1.0+ | 8(1.x),17(2.x) | - |
| 26.1.0 | 1.15 - 1.20,2.0 - 2.2 | 1.0+ | 8(1.x),17(2.x) | - |

## Usage
@@ -42,7 +42,7 @@

#### Jar

You can download the corresponding version of the Flink Doris Connector Jar file [here](https://doris.apache.org/download#doris-ecosystem), then copy this file to the `classpath` of your `Flink` setup to use the `Flink-Doris-Connector`. For a `Standalone` mode Flink deployment, place this file under the `lib/` folder. For a Flink cluster running in `Yarn` mode, place the file into the pre-deployment package.

#### Maven
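
If you manage dependencies with Maven, the coordinates typically look like the following sketch (the artifact ID tracks your Flink minor version; verify the exact coordinates against the download page):

```xml
<!-- Illustrative coordinates: pick the artifact matching your Flink version -->
<dependency>
    <groupId>org.apache.doris</groupId>
    <artifactId>flink-doris-connector-1.18</artifactId>
    <version>25.1.0</version>
</dependency>
```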

@@ -93,7 +93,7 @@
| | Streaming Write | Batch Write |
|----------|----------------|-------------|
| **Trigger Condition** | Relies on Flink Checkpoints and follows Flink's checkpoint cycle to write to Doris | Periodic submission based on connector-defined time or data volume thresholds |
| **Consistency** | Exactly-Once | At-Least-Once; Exactly-Once can be ensured with the primary key model |
| **Latency** | Limited by the Flink checkpoint interval, generally higher | Independent batch mechanism with flexible adjustment |
| **Fault Tolerance & Recovery** | Fully consistent with Flink state recovery | Relies on external deduplication logic (e.g., Doris primary key deduplication) |
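
The write mode is chosen per sink table; as a hedged sketch (connection values and schema are illustrative), batch write can be selected with a single table option:

```SQL
CREATE TABLE doris_sink_batch (
    id INT,
    name STRING
) WITH (
    'connector' = 'doris',
    'fenodes' = '127.0.0.1:8030',
    'table.identifier' = 'db.tbl',
    'username' = 'root',
    'password' = '',
    -- batch write: flush by connector-side time/size thresholds instead of checkpoints
    'sink.enable.batch-mode' = 'true'
);
```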

@@ -106,7 +106,7 @@

Taking a Standalone cluster as an example:

1. Download the Flink installation package, e.g., [Flink 1.18.1](https://archive.apache.org/dist/flink/flink-1.18.1/flink-1.18.1-bin-scala_2.12.tgz);
2. After extraction, place the Flink Doris Connector package in `<FLINK_HOME>/lib`;
3. Navigate to the `<FLINK_HOME>` directory and run `bin/start-cluster.sh` to start the Flink cluster;
4. You can verify if the Flink cluster started successfully using the `jps` command.
@@ -206,8 +206,8 @@

When Flink reads data from Doris, the Doris Source is currently a bounded stream and does not support continuous reading in a CDC manner. Data can be read from Doris using Thrift or ArrowFlightSQL (supported from connector version 24.0.0 onward). Starting from Doris 2.1, ArrowFlightSQL is the recommended approach.

- **Thrift**: Data is read by calling the BE's Thrift interface. For detailed steps, refer to [Reading Data via Thrift Interface](https://github.com/apache/doris/blob/master/samples/doris-demo/doris-source-demo/README.md).
- **ArrowFlightSQL**: Based on Doris 2.1, this method allows high-speed reading of large volumes of data using the Arrow Flight SQL protocol. For more information, refer to [High-speed Data Transfer via Arrow Flight SQL](https://doris.apache.org/docs/dev/db-connect/arrow-flight-sql-connect/).

#### Using FlinkSQL to Read Data
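
As a hedged sketch of a FlinkSQL source that selects the ArrowFlightSQL read path described above (the `source.use.flight-sql` and `source.flight-sql-port` option names and the port value are assumptions; verify them against your connector release and Doris configuration):

```SQL
CREATE TABLE doris_source_flight (
    id INT,
    name STRING
) WITH (
    'connector' = 'doris',
    'fenodes' = '127.0.0.1:8030',
    'table.identifier' = 'db.tbl',
    'username' = 'root',
    'password' = '',
    -- assumed options: enable Arrow Flight SQL reads and point at its port
    'source.use.flight-sql' = 'true',
    'source.flight-sql-port' = '9040'
);
```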

@@ -275,7 +275,7 @@
env.execute("Doris Source Test");
```

For the complete code, refer to: [DorisSourceDataStream.java](https://github.com/apache/doris-flink-connector/blob/master/flink-doris-connector/src/test/java/org/apache/doris/flink/example/DorisSourceDataStream.java)

### Writing Data to Doris

@@ -289,7 +289,7 @@

#### Using FlinkSQL to Write Data

For testing, Flink's [Datagen](https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/datagen/) is used to simulate the continuously generated upstream data.

```SQL
-- enable checkpoint
@@ -383,7 +383,7 @@
env.execute("doris test");
```

For the complete code, refer to: [DorisSinkExample.java](https://github.com/apache/doris-flink-connector/blob/master/flink-doris-connector/src/test/java/org/apache/doris/flink/example/DorisSinkExample.java)

##### RowData Format

@@ -452,7 +452,7 @@
env.execute("doris test");
```

For the complete code, refer to: [DorisSinkExampleRowData.java](https://github.com/apache/doris-flink-connector/blob/master/flink-doris-connector/src/test/java/org/apache/doris/flink/example/DorisSinkExampleRowData.java)

##### Debezium Format

@@ -1130,7 +1130,7 @@

3. **errCode = 2, detailMessage = current running txns on db 10006 is 100, larger than limit 100**

This is because the concurrent imports into the same database exceed 100. It can be solved by adjusting the parameter `max_running_txn_num_per_db` in `fe.conf`. For specific details, please refer to [FE Configuration](../../admin-manual/config/fe-config).

Meanwhile, frequently modifying the label and restarting a task may also lead to this error. In the 2pc scenario (Duplicate/Aggregate models), the label of each task needs to be unique, and only when restarting from a checkpoint will the Flink task actively abort transactions that were pre-committed successfully but not yet committed. Frequent label changes combined with restarts therefore leave a large number of successfully pre-committed transactions that cannot be aborted and continue to count against the transaction limit. For the Unique model, 2pc can also be disabled to achieve idempotent writes.
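
As a hedged sketch (the value 200 is illustrative), the limit can also be raised at runtime with a Doris admin command instead of editing `fe.conf`; note that runtime changes do not persist across FE restarts:

```SQL
-- Raise the per-database concurrent transaction limit at runtime
ADMIN SET FRONTEND CONFIG ("max_running_txn_num_per_db" = "200");
```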

@@ -1148,4 +1148,13 @@

7. **stream load error: HTTP/1.1 307 Temporary Redirect**

Flink first sends the request to the FE and, after receiving a 307, retries against the redirected BE. When the FE is under pressure (Full GC, heavy load, or network delay), HttpClient by default starts sending the body if no response arrives within 3 seconds. Because the request body is an InputStream by default, the data cannot be replayed once the 307 response arrives, and an error is reported directly. There are three ways to solve this: 1. Upgrade to Connector 25.1.0 or above, which increases the default wait time; 2. Set `auto-redirect=false` to send requests directly to the BE (not applicable to some cloud scenarios); 3. For the Unique Key model, enable batch mode.
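
A hedged sketch of the second option as a sink table definition (schema and connection values are illustrative; verify the `auto-redirect` option against your connector version):

```SQL
CREATE TABLE doris_sink_direct (
    id INT,
    name STRING
) WITH (
    'connector' = 'doris',
    'fenodes' = '127.0.0.1:8030',
    'table.identifier' = 'db.tbl',
    'username' = 'root',
    'password' = '',
    -- send Stream Load requests directly to BE instead of following FE 307 redirects
    'auto-redirect' = 'false'
);
```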

8. **When using Flink CDC to sync large tables from databases such as Oracle, an `I/O exception (java.net.SocketException) ... Broken pipe` error is reported. How to handle it?**

This error usually occurs when the data volume of a single Stream Load request exceeds the limit on the BE side. You can address it in the following ways:
- Increase the `streaming_load_max_mb` parameter in `be.conf` on the BE side (default 10240, in MB), so that a single Stream Load can carry more data. The BE needs to be restarted to take effect.
- Enable batch mode (`sink.enable.batch-mode=true`), so that the Connector automatically splits the data into batches internally, avoiding too much data in a single Stream Load.
- Try to increase the parallelism of Oracle CDC by adding `--oracle-conf scan.incremental.snapshot.enabled=true` (experimental feature) to the startup command, which enables parallel reading of Oracle full data.
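
The first bullet above can be sketched as a `be.conf` change (the value is illustrative; restart the BE afterwards):

```properties
# be.conf — raise the cap on a single Stream Load request (MB)
streaming_load_max_mb = 20480
```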

For more Flink CDC related issues, please refer to [Flink CDC FAQ](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.6/docs/faq/faq/).
i18n/zh-CN/docusaurus-plugin-content-docs/current/ecosystem/flink-doris-connector/flink-doris-connector.md
@@ -34,7 +34,7 @@
| 24.1.0 | 1.15 - 1.20 | 1.0+ | 8 | - |
| 25.0.0 | 1.15 - 1.20 | 1.0+ | 8 | - |
| 25.1.0 | 1.15 - 1.20 | 1.0+ | 8 | - |
| 26.0.0 | 1.15 - 1.20,2.0 - 2.2 | 1.0+ | 8(1.x),17(2.x) | - |
| 26.1.0 | 1.15 - 1.20,2.0 - 2.2 | 1.0+ | 8(1.x),17(2.x) | - |

## 使用方式
@@ -95,7 +95,7 @@
| | 流式写入 | 批量写入 |
|----------|----------|----------|
| **触发条件** | 依赖 Flink 的 Checkpoint,跟随 Flink 的 Checkpoint 周期写入到 Doris 中 | 基于 Connector 内的时间阈值、数据量阈值进行周期性提交,写入到 Doris 中 |
| **一致性** | Exactly-Once | At-Least-Once,基于主键模型可以保证 Exactly-Once |
| **延迟** | 受 Checkpoint 时间间隔限制,通常较高 | 独立的批处理机制,灵活调整 |
| **容错与恢复** | 与 Flink 状态恢复完全一致 | 依赖外部去重逻辑(如 Doris 主键去重) |

@@ -182,7 +182,7 @@
'username' = 'root',
'password' = '',
'sink.label-prefix' = 'doris_label'
);

INSERT INTO StudentTrans SELECT id, concat('prefix_',name), age+1 FROM Student;
```
@@ -1131,7 +1131,7 @@

3. **errCode = 2, detailMessage = current running txns on db 10006 is 100, larger than limit 100**

这是因为同一个库并发导入超过了 100,可通过调整 fe.conf 的参数 `max_running_txn_num_per_db` 来解决,具体可参考 [FE 配置项](../../admin-manual/config/fe-config)。
同时,一个任务频繁修改 label 重启,也可能会导致这个错误。2pc 场景下 (Duplicate/Aggregate 模型),每个任务的 label 需要唯一,并且从 checkpoint 重启时,flink 任务才会主动 abort 掉之前已经 precommit 成功,没有 commit 的 txn,频繁修改 label 重启,会导致大量 precommit 成功的 txn 无法被 abort,占用事务。在 Unique 模型下也可关闭 2pc,可以实现幂等写入。

4. **tablet writer write failed, tablet_id=190958, txn_id=3505530, err=-235**
@@ -1149,3 +1149,12 @@
7. **stream load error: HTTP/1.1 307 Temporary Redirect**

Flink 会先向 FE 请求,收到 307 后会向重定向后的 BE 请求。当 FE 在 FullGC/压力大/网络延迟的时候,HttpClient 默认会在一定时间 (3 秒) 没有等到响应会发送数据,由于默认情况下请求体是 InputStream,当收到 307 响应时,数据无法重放,会直接报错。有三种方式可以解决:1.升级到 Connector25.1.0 以上,调长了默认时间;2.修改 auto-redirect=false,直接向 BE 发起请求(不适用部分云上场景);3.主键模型可以开启攒批模式。

8. **使用 Flink CDC 同步 Oracle 等数据库的大表时,出现 `I/O exception (java.net.SocketException) ... Broken pipe` 报错怎么办?**

该报错通常是单次 Stream Load 写入的数据量超过了 BE 端的限制导致的。可以从以下几个方面进行调整:
- 调大 BE 端 `be.conf` 中的 `streaming_load_max_mb` 参数(默认 10240,单位 MB),允许单次 Stream Load 写入更多数据,修改后需要重启 BE 生效。
- 开启攒批模式(`sink.enable.batch-mode=true`),由 Connector 内部按批次大小自动切分写入,避免单次写入数据量过大。
- 尝试调高 Oracle CDC 的并发度,在启动命令中增加 `--oracle-conf scan.incremental.snapshot.enabled=true`(实验性功能),开启后可以多并发读取 Oracle 全量数据。

更多 Flink CDC 相关问题可以参考 [Flink CDC FAQ](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.6/docs/faq/faq/)。
i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/ecosystem/flink-doris-connector/flink-doris-connector.md
@@ -34,7 +34,7 @@
| 24.1.0 | 1.15 - 1.20 | 1.0+ | 8 | - |
| 25.0.0 | 1.15 - 1.20 | 1.0+ | 8 | - |
| 25.1.0 | 1.15 - 1.20 | 1.0+ | 8 | - |
| 26.0.0 | 1.15 - 1.20,2.0 - 2.2 | 1.0+ | 8(1.x),17(2.x) | - |
| 26.1.0 | 1.15 - 1.20,2.0 - 2.2 | 1.0+ | 8(1.x),17(2.x) | - |

## 使用方式
@@ -95,7 +95,7 @@
| | 流式写入 | 批量写入 |
|----------|----------|----------|
| **触发条件** | 依赖 Flink 的 Checkpoint,跟随 Flink 的 Checkpoint 周期写入到 Doris 中 | 基于 Connector 内的时间阈值、数据量阈值进行周期性提交,写入到 Doris 中 |
| **一致性** | Exactly-Once | At-Least-Once,基于主键模型可以保证 Exactly-Once |
| **延迟** | 受 Checkpoint 时间间隔限制,通常较高 | 独立的批处理机制,灵活调整 |
| **容错与恢复** | 与 Flink 状态恢复完全一致 | 依赖外部去重逻辑(如 Doris 主键去重) |

@@ -182,7 +182,7 @@
'username' = 'root',
'password' = '',
'sink.label-prefix' = 'doris_label'
);

INSERT INTO StudentTrans SELECT id, concat('prefix_',name), age+1 FROM Student;
```
@@ -1131,7 +1131,7 @@

3. **errCode = 2, detailMessage = current running txns on db 10006 is 100, larger than limit 100**

这是因为同一个库并发导入超过了 100,可通过调整 fe.conf 的参数 `max_running_txn_num_per_db` 来解决,具体可参考 [FE 配置项](../../admin-manual/config/fe-config)。
同时,一个任务频繁修改 label 重启,也可能会导致这个错误。2pc 场景下 (Duplicate/Aggregate 模型),每个任务的 label 需要唯一,并且从 checkpoint 重启时,flink 任务才会主动 abort 掉之前已经 precommit 成功,没有 commit 的 txn,频繁修改 label 重启,会导致大量 precommit 成功的 txn 无法被 abort,占用事务。在 Unique 模型下也可关闭 2pc,可以实现幂等写入。

4. **tablet writer write failed, tablet_id=190958, txn_id=3505530, err=-235**
@@ -1149,3 +1149,12 @@
7. **stream load error: HTTP/1.1 307 Temporary Redirect**

Flink 会先向 FE 请求,收到 307 后会向重定向后的 BE 请求。当 FE 在 FullGC/压力大/网络延迟的时候,HttpClient 默认会在一定时间 (3 秒) 没有等到响应会发送数据,由于默认情况下请求体是 InputStream,当收到 307 响应时,数据无法重放,会直接报错。有三种方式可以解决:1.升级到 Connector25.1.0 以上,调长了默认时间;2.修改 auto-redirect=false,直接向 BE 发起请求(不适用部分云上场景);3.主键模型可以开启攒批模式。

8. **使用 Flink CDC 同步 Oracle 等数据库的大表时,出现 `I/O exception (java.net.SocketException) ... Broken pipe` 报错怎么办?**

该报错通常是单次 Stream Load 写入的数据量超过了 BE 端的限制导致的。可以从以下几个方面进行调整:
- 调大 BE 端 `be.conf` 中的 `streaming_load_max_mb` 参数(默认 10240,单位 MB),允许单次 Stream Load 写入更多数据,修改后需要重启 BE 生效。
- 开启攒批模式(`sink.enable.batch-mode=true`),由 Connector 内部按批次大小自动切分写入,避免单次写入数据量过大。
- 尝试调高 Oracle CDC 的并发度,在启动命令中增加 `--oracle-conf scan.incremental.snapshot.enabled=true`(实验性功能),开启后可以多并发读取 Oracle 全量数据。

更多 Flink CDC 相关问题可以参考 [Flink CDC FAQ](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.6/docs/faq/faq/)。
versioned_docs/version-4.x/ecosystem/flink-doris-connector/flink-doris-connector.md
@@ -33,7 +33,7 @@
| 24.1.0 | 1.15 - 1.20 | 1.0+ | 8 | - |
| 25.0.0 | 1.15 - 1.20 | 1.0+ | 8 | - |
| 25.1.0 | 1.15 - 1.20 | 1.0+ | 8 | - |
| 26.0.0 | 1.15 - 1.20,2.0 - 2.2 | 1.0+ | 8(1.x),17(2.x) | - |
| 26.1.0 | 1.15 - 1.20,2.0 - 2.2 | 1.0+ | 8(1.x),17(2.x) | - |

## Usage
@@ -93,7 +93,7 @@
| | Streaming Write | Batch Write |
|----------|----------------|-------------|
| **Trigger Condition** | Relies on Flink Checkpoints and follows Flink's checkpoint cycle to write to Doris | Periodic submission based on connector-defined time or data volume thresholds |
| **Consistency** | Exactly-Once | At-Least-Once; Exactly-Once can be ensured with the primary key model |
| **Latency** | Limited by the Flink checkpoint interval, generally higher | Independent batch mechanism with flexible adjustment |
| **Fault Tolerance & Recovery** | Fully consistent with Flink state recovery | Relies on external deduplication logic (e.g., Doris primary key deduplication) |

@@ -1130,7 +1130,7 @@

3. **errCode = 2, detailMessage = current running txns on db 10006 is 100, larger than limit 100**

This is because the concurrent imports into the same database exceed 100. It can be solved by adjusting the parameter `max_running_txn_num_per_db` in `fe.conf`. For specific details, please refer to [FE Configuration](../../admin-manual/config/fe-config).

Meanwhile, frequently modifying the label and restarting a task may also lead to this error. In the 2pc scenario (Duplicate/Aggregate models), the label of each task needs to be unique, and only when restarting from a checkpoint will the Flink task actively abort transactions that were pre-committed successfully but not yet committed. Frequent label changes combined with restarts therefore leave a large number of successfully pre-committed transactions that cannot be aborted and continue to count against the transaction limit. For the Unique model, 2pc can also be disabled to achieve idempotent writes.

@@ -1148,4 +1148,13 @@

7. **stream load error: HTTP/1.1 307 Temporary Redirect**

Flink first sends the request to the FE and, after receiving a 307, retries against the redirected BE. When the FE is under pressure (Full GC, heavy load, or network delay), HttpClient by default starts sending the body if no response arrives within 3 seconds. Because the request body is an InputStream by default, the data cannot be replayed once the 307 response arrives, and an error is reported directly. There are three ways to solve this: 1. Upgrade to Connector 25.1.0 or above, which increases the default wait time; 2. Set `auto-redirect=false` to send requests directly to the BE (not applicable to some cloud scenarios); 3. For the Unique Key model, enable batch mode.

8. **When using Flink CDC to sync large tables from databases such as Oracle, an `I/O exception (java.net.SocketException) ... Broken pipe` error is reported. How to handle it?**

This error usually occurs when the data volume of a single Stream Load request exceeds the limit on the BE side. You can address it in the following ways:
- Increase the `streaming_load_max_mb` parameter in `be.conf` on the BE side (default 10240, in MB), so that a single Stream Load can carry more data. The BE needs to be restarted to take effect.
- Enable batch mode (`sink.enable.batch-mode=true`), so that the Connector automatically splits the data into batches internally, avoiding too much data in a single Stream Load.
- Try to increase the parallelism of Oracle CDC by adding `--oracle-conf scan.incremental.snapshot.enabled=true` (experimental feature) to the startup command, which enables parallel reading of Oracle full data.

For more Flink CDC related issues, please refer to [Flink CDC FAQ](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.6/docs/faq/faq/).