[test][pipeline-e2e] Shrink MySqlToHudiE2eITCase write volume to fit MOR window

leonardBang · claude · leonardBang · commit 03d52200ea9a · 2026-06-20T23:33:21.000+08:00
The products workload wrote ~20011 rows across 20 schema evolutions into a
Hudi MERGE_ON_READ table. The table could not fully materialize and be read
back within validateSinkResult's 20-minute window: rows stalled and
snapshot reads slowed sharply as log files piled up. This made the test
flaky and pushed the suite into the 90-minute CI limit. Reduce the
per-batch insert count from 1000 to 100 (~2011 rows total) while keeping
all 20 ALTER iterations, so schema-evolution coverage is unchanged but the
table stays small enough to materialize and read quickly.
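
The row totals above follow from simple arithmetic, sketched here for
reference. The class and method names are hypothetical, and the baseline
of 11 rows is an assumption inferred from the quoted totals
(20011 - 20 * 1000); it is not taken from the test source.

```java
public class BatchVolumeSketch {
    // Hypothetical helper: total rows = baseline rows + one insert batch per ALTER iteration.
    static int totalRows(int baselineRows, int alterIterations, int statementBatchCount) {
        return baselineRows + alterIterations * statementBatchCount;
    }

    public static void main(String[] args) {
        int baselineRows = 11;    // assumption: inferred from the totals in the commit message
        int alterIterations = 20; // 10 column additions + 10 column type modifications

        System.out.println("before: " + totalRows(baselineRows, alterIterations, 1000));
        System.out.println("after:  " + totalRows(baselineRows, alterIterations, 100));
    }
}
```

The number of ALTER iterations is held fixed in both calls; only the batch
size changes, which is why the schema-evolution coverage is unaffected.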

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
diff --git a/flink-cdc-e2e-tests/flink-cdc-pipeline-e2e-tests/src/test/java/org/apache/flink/cdc/pipeline/tests/MySqlToHudiE2eITCase.java b/flink-cdc-e2e-tests/flink-cdc-pipeline-e2e-tests/src/test/java/org/apache/flink/cdc/pipeline/tests/MySqlToHudiE2eITCase.java
@@ -310,12 +310,12 @@ public void testSyncWholeDatabase() throws Exception {
      * <ol>
      *   <li><b>Column Addition:</b> It sequentially adds 10 new columns, named {@code point_c_0}
      *       through {@code point_c_9}, each with a {@code VARCHAR(10)} type. After each column is
-     *       added, it executes a batch of 1000 {@code INSERT} statements, populating the columns
-     *       that exist at that point.
+     *       added, it executes a batch of {@code statementBatchCount} {@code INSERT} statements,
+     *       populating the columns that exist at that point.
      *   <li><b>Column Modification:</b> After all columns are added, it enters a second phase. In
-     *       each of the 10 iterations, it first inserts another 1000 rows and then modifies the
-     *       data type of the first new column ({@code point_c_0}), progressively increasing its
-     *       size from {@code VARCHAR(10)} to {@code VARCHAR(19)}.
+     *       each of the 10 iterations, it first inserts another {@code statementBatchCount} rows
+     *       and then modifies the data type of the first new column ({@code point_c_0}),
+     *       progressively increasing its size from {@code VARCHAR(10)} to {@code VARCHAR(19)}.
      * </ol>
      *
      * <p>Throughout this process, the method constructs and returns a list of strings. Each string
@@ -333,7 +333,12 @@ private List<String> createChangesAndValidate(Statement stat) throws SQLExceptio
 
         // Auto-increment id will start from this
         int currentId = 113;
-        final int statementBatchCount = 1000;
+        // Keep the per-batch insert count small: a Hudi MERGE_ON_READ table accumulates a log
+        // file per delta commit, and snapshot reads slow down sharply as those pile up. With 20
+        // schema evolutions below, a large count makes the table unable to fully materialize (and
+        // be read back) within validateSinkResult's window. The schema-evolution coverage comes
+        // from the 20 ALTER iterations, not from the row volume.
+        final int statementBatchCount = 100;
 
         // Step 1 - Add Column: Add 10 columns with VARCHAR(10) sequentially
         for (int addColumnRepeat = 0; addColumnRepeat < 10; addColumnRepeat++) {
@@ -368,7 +373,7 @@ private List<String> createChangesAndValidate(Statement stat) throws SQLExceptio
 
         // Step 2 - Modify type for the columns added in Step 1, increasing the VARCHAR length
         for (int modifyColumnRepeat = 0; modifyColumnRepeat < 10; modifyColumnRepeat++) {
-            // Perform 1000 inserts as a batch, continuing the ID sequence from Step 1
+            // Perform a batch of inserts, continuing the ID sequence from Step 1
             for (int statementCount = 0; statementCount < statementBatchCount; statementCount++) {
                 stat.addBatch(
                         String.format(