Commit d1a42c6

[doc] add FAQ for Stream Load Broken pipe when syncing large tables via Flink CDC (#3598)
## Summary

Add a new FAQ entry to the Flink Doris Connector docs (English/Chinese, dev/4.x) for the `I/O exception (java.net.SocketException) ... Broken pipe` error that users may encounter when syncing large tables (e.g. Oracle) via Flink CDC.
1 parent a3000c1 commit d1a42c6

4 files changed

Lines changed: 42 additions & 6 deletions


docs/ecosystem/flink-doris-connector/flink-doris-connector.md

Lines changed: 11 additions & 2 deletions
@@ -1130,7 +1130,7 @@ In the whole database synchronization tool provided by the Connector, no additio
 3. **errCode = 2, detailMessage = current running txns on db 10006 is 100, larger than limit 100**

-    This is because the concurrent imports into the same database exceed 100. It can be solved by adjusting the parameter `max_running_txn_num_per_db` in `fe.conf`. For specific details, please refer to [max_running_txn_num_per_db](../../admin-manual/config/fe-config#max_running_txn_num_per_db).
+    This is because the concurrent imports into the same database exceed 100. It can be solved by adjusting the parameter `max_running_txn_num_per_db` in `fe.conf`. For specific details, please refer to [FE Configuration](../../admin-manual/config/fe-config).

     Meanwhile, frequently modifying the label and restarting a task may also lead to this error. In the 2pc scenario (for Duplicate/Aggregate models), the label of each task needs to be unique. And when restarting from a checkpoint, the Flink task will actively abort the transactions that have been pre-committed successfully but not yet committed. Frequent label modifications and restarts will result in a large number of pre-committed successful transactions that cannot be aborted and thus occupy transactions. In the Unique model, 2pc can also be disabled to achieve idempotent writes.
@@ -1148,4 +1148,13 @@ In the whole database synchronization tool provided by the Connector, no additio
 7. **stream load error: HTTP/1.1 307 Temporary Redirect**

-    Flink first sends the request to the FE and, after receiving a 307, re-sends it to the BE it is redirected to. When the FE is in Full GC, under heavy pressure, or affected by network delay, HttpClient by default sends the data after waiting a certain period (3 seconds) without a response. Since the request body is an InputStream by default, the data cannot be replayed once a 307 response has been received, and an error is reported directly. There are three ways to solve this problem: 1. Upgrade to Connector 25.1.0 or above, which increases the default wait time; 2. Set auto-redirect=false to send requests directly to the BE (not applicable to some cloud scenarios); 3. For the unique key model, enable batch mode.
+    Flink first sends the request to the FE and, after receiving a 307, re-sends it to the BE it is redirected to. When the FE is in Full GC, under heavy pressure, or affected by network delay, HttpClient by default sends the data after waiting a certain period (3 seconds) without a response. Since the request body is an InputStream by default, the data cannot be replayed once a 307 response has been received, and an error is reported directly. There are three ways to solve this problem: 1. Upgrade to Connector 25.1.0 or above, which increases the default wait time; 2. Set auto-redirect=false to send requests directly to the BE (not applicable to some cloud scenarios); 3. For the unique key model, enable batch mode.
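As a sketch of options 2 and 3 above, both settings can be passed as sink options in a Flink SQL DDL. The table schema and connection values below are hypothetical placeholders; only `auto-redirect` and `sink.enable.batch-mode` are the options named in this FAQ entry.

```sql
-- Hypothetical table and connection values; only the last two options
-- come from this FAQ entry.
CREATE TABLE doris_sink (
    id   INT,
    name STRING
) WITH (
    'connector'        = 'doris',
    'fenodes'          = '127.0.0.1:8030',
    'table.identifier' = 'db.tbl',
    'username'         = 'root',
    'password'         = '',
    'auto-redirect'    = 'false',        -- option 2: request the BE directly
    'sink.enable.batch-mode' = 'true'    -- option 3: batch mode (unique key model)
);
```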
+
+8. **When using Flink CDC to sync large tables from databases such as Oracle, an `I/O exception (java.net.SocketException) ... Broken pipe` error is reported. How to handle it?**
+
+    This error usually occurs when the data volume of a single Stream Load request exceeds the limit on the BE side. It can be adjusted in the following ways:
+
+    - Increase the `streaming_load_max_mb` parameter in `be.conf` on the BE side (default 10240, in MB) so that a single Stream Load can carry more data; the BE must be restarted for the change to take effect.
+    - Enable batch mode (`sink.enable.batch-mode=true`) so that the Connector automatically splits the data into batches internally, avoiding too much data in a single Stream Load.
+    - Increase the parallelism of Oracle CDC by adding `--oracle-conf scan.incremental.snapshot.enabled=true` (an experimental feature) to the startup command, which enables parallel reading of Oracle full data.
+
+    For more Flink CDC related issues, please refer to the [Flink CDC FAQ](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.6/docs/faq/faq/).
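A minimal sketch of how the flags above might be combined in a whole-database sync startup command. The JAR name, paths, and connection values are hypothetical placeholders; only `scan.incremental.snapshot.enabled` and `sink.enable.batch-mode` come from this FAQ entry.

```shell
# Hypothetical paths, versions, and connection values; only the
# scan.incremental.snapshot.enabled and sink.enable.batch-mode flags
# are the ones named in this FAQ entry.
<FLINK_HOME>/bin/flink run \
    -c org.apache.doris.flink.tools.cdc.CdcTools \
    lib/flink-doris-connector-1.18-25.1.0.jar \
    oracle-sync-database \
    --database test_db \
    --oracle-conf hostname=127.0.0.1 \
    --oracle-conf port=1521 \
    --oracle-conf username=flinkuser \
    --oracle-conf password='******' \
    --oracle-conf database-name=ORCL \
    --oracle-conf schema-name=ADMIN \
    --oracle-conf scan.incremental.snapshot.enabled=true \
    --sink-conf fenodes=127.0.0.1:8030 \
    --sink-conf username=root \
    --sink-conf password= \
    --sink-conf sink.enable.batch-mode=true
```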

i18n/zh-CN/docusaurus-plugin-content-docs/current/ecosystem/flink-doris-connector/flink-doris-connector.md

Lines changed: 10 additions & 1 deletion
@@ -1131,7 +1131,7 @@ from KAFKA_SOURCE;
 3. **errCode = 2, detailMessage = current running txns on db 10006 is 100, larger than limit 100**

-    This is because the concurrent imports into the same database exceed 100. It can be solved by adjusting the parameter `max_running_txn_num_per_db` in fe.conf. For details, see [max_running_txn_num_per_db](../../admin-manual/config/fe-config#max_running_txn_num_per_db).
+    This is because the concurrent imports into the same database exceed 100. It can be solved by adjusting the parameter `max_running_txn_num_per_db` in fe.conf. For details, see [FE Configuration](../../admin-manual/config/fe-config).

     Meanwhile, frequently modifying the label and restarting a task may also lead to this error. In the 2pc scenario (Duplicate/Aggregate models), the label of each task must be unique, and only when restarting from a checkpoint will the Flink task actively abort transactions that were pre-committed successfully but not yet committed. Frequent label changes and restarts leave a large number of successfully pre-committed transactions that cannot be aborted, occupying transactions. In the Unique model, 2pc can also be disabled to achieve idempotent writes.

 4. **tablet writer write failed, tablet_id=190958, txn_id=3505530, err=-235**
@@ -1149,3 +1149,12 @@ from KAFKA_SOURCE;
 7. **stream load error: HTTP/1.1 307 Temporary Redirect**

     Flink first sends the request to the FE and, after receiving a 307, re-sends it to the BE it is redirected to. When the FE is in Full GC, under heavy pressure, or affected by network delay, HttpClient by default sends the data after waiting a certain period (3 seconds) without a response. Since the request body is an InputStream by default, the data cannot be replayed once a 307 response has been received, and an error is reported directly. There are three ways to solve this: 1. Upgrade to Connector 25.1.0 or above, which increases the default wait time; 2. Set auto-redirect=false to send requests directly to the BE (not applicable to some cloud scenarios); 3. For the primary key (Unique) model, enable batch mode.
+
+8. **When using Flink CDC to sync large tables from databases such as Oracle, an `I/O exception (java.net.SocketException) ... Broken pipe` error is reported. How to handle it?**
+
+    This error is usually caused by the data volume of a single Stream Load write exceeding the limit on the BE side. It can be adjusted in the following ways:
+
+    - Increase the `streaming_load_max_mb` parameter in `be.conf` on the BE side (default 10240, in MB) to allow a single Stream Load to write more data; the BE must be restarted for the change to take effect.
+    - Enable batch mode (`sink.enable.batch-mode=true`) so the Connector automatically splits writes by batch size internally, avoiding too much data in a single write.
+    - Increase the parallelism of Oracle CDC by adding `--oracle-conf scan.incremental.snapshot.enabled=true` (an experimental feature) to the startup command, which enables parallel reading of Oracle full data.
+
+    For more Flink CDC related issues, see the [Flink CDC FAQ](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.6/docs/faq/faq/).

i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/ecosystem/flink-doris-connector/flink-doris-connector.md

Lines changed: 10 additions & 1 deletion
@@ -1131,7 +1131,7 @@ from KAFKA_SOURCE;
 3. **errCode = 2, detailMessage = current running txns on db 10006 is 100, larger than limit 100**

-    This is because the concurrent imports into the same database exceed 100. It can be solved by adjusting the parameter `max_running_txn_num_per_db` in fe.conf. For details, see [max_running_txn_num_per_db](../../admin-manual/config/fe-config#max_running_txn_num_per_db).
+    This is because the concurrent imports into the same database exceed 100. It can be solved by adjusting the parameter `max_running_txn_num_per_db` in fe.conf. For details, see [FE Configuration](../../admin-manual/config/fe-config).

     Meanwhile, frequently modifying the label and restarting a task may also lead to this error. In the 2pc scenario (Duplicate/Aggregate models), the label of each task must be unique, and only when restarting from a checkpoint will the Flink task actively abort transactions that were pre-committed successfully but not yet committed. Frequent label changes and restarts leave a large number of successfully pre-committed transactions that cannot be aborted, occupying transactions. In the Unique model, 2pc can also be disabled to achieve idempotent writes.

 4. **tablet writer write failed, tablet_id=190958, txn_id=3505530, err=-235**
@@ -1149,3 +1149,12 @@ from KAFKA_SOURCE;
 7. **stream load error: HTTP/1.1 307 Temporary Redirect**

     Flink first sends the request to the FE and, after receiving a 307, re-sends it to the BE it is redirected to. When the FE is in Full GC, under heavy pressure, or affected by network delay, HttpClient by default sends the data after waiting a certain period (3 seconds) without a response. Since the request body is an InputStream by default, the data cannot be replayed once a 307 response has been received, and an error is reported directly. There are three ways to solve this: 1. Upgrade to Connector 25.1.0 or above, which increases the default wait time; 2. Set auto-redirect=false to send requests directly to the BE (not applicable to some cloud scenarios); 3. For the primary key (Unique) model, enable batch mode.
+
+8. **When using Flink CDC to sync large tables from databases such as Oracle, an `I/O exception (java.net.SocketException) ... Broken pipe` error is reported. How to handle it?**
+
+    This error is usually caused by the data volume of a single Stream Load write exceeding the limit on the BE side. It can be adjusted in the following ways:
+
+    - Increase the `streaming_load_max_mb` parameter in `be.conf` on the BE side (default 10240, in MB) to allow a single Stream Load to write more data; the BE must be restarted for the change to take effect.
+    - Enable batch mode (`sink.enable.batch-mode=true`) so the Connector automatically splits writes by batch size internally, avoiding too much data in a single write.
+    - Increase the parallelism of Oracle CDC by adding `--oracle-conf scan.incremental.snapshot.enabled=true` (an experimental feature) to the startup command, which enables parallel reading of Oracle full data.
+
+    For more Flink CDC related issues, see the [Flink CDC FAQ](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.6/docs/faq/faq/).

versioned_docs/version-4.x/ecosystem/flink-doris-connector/flink-doris-connector.md

Lines changed: 11 additions & 2 deletions
@@ -1130,7 +1130,7 @@ In the whole database synchronization tool provided by the Connector, no additio
 3. **errCode = 2, detailMessage = current running txns on db 10006 is 100, larger than limit 100**

-    This is because the concurrent imports into the same database exceed 100. It can be solved by adjusting the parameter `max_running_txn_num_per_db` in `fe.conf`. For specific details, please refer to [max_running_txn_num_per_db](../../admin-manual/config/fe-config#max_running_txn_num_per_db).
+    This is because the concurrent imports into the same database exceed 100. It can be solved by adjusting the parameter `max_running_txn_num_per_db` in `fe.conf`. For specific details, please refer to [FE Configuration](../../admin-manual/config/fe-config).

     Meanwhile, frequently modifying the label and restarting a task may also lead to this error. In the 2pc scenario (for Duplicate/Aggregate models), the label of each task needs to be unique. And when restarting from a checkpoint, the Flink task will actively abort the transactions that have been pre-committed successfully but not yet committed. Frequent label modifications and restarts will result in a large number of pre-committed successful transactions that cannot be aborted and thus occupy transactions. In the Unique model, 2pc can also be disabled to achieve idempotent writes.
@@ -1148,4 +1148,13 @@ In the whole database synchronization tool provided by the Connector, no additio
 7. **stream load error: HTTP/1.1 307 Temporary Redirect**

-    Flink first sends the request to the FE and, after receiving a 307, re-sends it to the BE it is redirected to. When the FE is in Full GC, under heavy pressure, or affected by network delay, HttpClient by default sends the data after waiting a certain period (3 seconds) without a response. Since the request body is an InputStream by default, the data cannot be replayed once a 307 response has been received, and an error is reported directly. There are three ways to solve this problem: 1. Upgrade to Connector 25.1.0 or above, which increases the default wait time; 2. Set auto-redirect=false to send requests directly to the BE (not applicable to some cloud scenarios); 3. For the unique key model, enable batch mode.
+    Flink first sends the request to the FE and, after receiving a 307, re-sends it to the BE it is redirected to. When the FE is in Full GC, under heavy pressure, or affected by network delay, HttpClient by default sends the data after waiting a certain period (3 seconds) without a response. Since the request body is an InputStream by default, the data cannot be replayed once a 307 response has been received, and an error is reported directly. There are three ways to solve this problem: 1. Upgrade to Connector 25.1.0 or above, which increases the default wait time; 2. Set auto-redirect=false to send requests directly to the BE (not applicable to some cloud scenarios); 3. For the unique key model, enable batch mode.
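The "cannot be replayed" behavior above can be illustrated with a small sketch (a plain Python stand-in, not connector code): once a stream-style body has been consumed by the first send, a naive resend after the 307 redirect finds nothing left to transmit.

```python
import io

# Stand-in for an InputStream request body holding the load payload.
body = io.BytesIO(b"1,alice\n2,bob\n")

# First attempt: the stream is fully consumed while sending to the FE.
first_send = body.read()
assert first_send == b"1,alice\n2,bob\n"

# After the 307 redirect, a naive retry re-reads the same stream:
# it is already exhausted, so the payload cannot be replayed.
second_send = body.read()
assert second_send == b""
```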
+
+8. **When using Flink CDC to sync large tables from databases such as Oracle, an `I/O exception (java.net.SocketException) ... Broken pipe` error is reported. How to handle it?**
+
+    This error usually occurs when the data volume of a single Stream Load request exceeds the limit on the BE side. It can be adjusted in the following ways:
+
+    - Increase the `streaming_load_max_mb` parameter in `be.conf` on the BE side (default 10240, in MB) so that a single Stream Load can carry more data; the BE must be restarted for the change to take effect.
+    - Enable batch mode (`sink.enable.batch-mode=true`) so that the Connector automatically splits the data into batches internally, avoiding too much data in a single Stream Load.
+    - Increase the parallelism of Oracle CDC by adding `--oracle-conf scan.incremental.snapshot.enabled=true` (an experimental feature) to the startup command, which enables parallel reading of Oracle full data.
+
+    For more Flink CDC related issues, please refer to the [Flink CDC FAQ](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.6/docs/faq/faq/).
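A sketch of the first bullet as it might appear in `be.conf`; 20480 is an illustrative value, not a recommendation (10240 is the stated default).

```conf
# be.conf: raise the single Stream Load size cap from 10 GB to 20 GB.
# Restart the BE for the change to take effect.
streaming_load_max_mb = 20480
```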
