@@ -95,11 +95,14 @@ clustering and small-file merging must be performed exclusively via Incremental
 ## Run Incremental Clustering
 {{< hint info >}}
 
-Currently, only support running Incremental Clustering in spark, support for flink will be added in the near future.
+Currently, Incremental Clustering can only be run in batch mode.
 
 {{< /hint >}}
 
-To run a Incremental Clustering job, follow these instructions.
+To run an Incremental Clustering job, follow these instructions.
+
+You don’t need to specify any clustering-related parameters when running Incremental Clustering;
+these options are already defined as table options. If you need to change clustering settings, update the corresponding table options.
 
 {{< tabs "incremental-clustering" >}}
 
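+
+For example, clustering table options can be adjusted in Spark SQL with the standard `ALTER TABLE ... SET TBLPROPERTIES` statement. This is a sketch: the option key `clustering.columns` below is illustrative, not authoritative; use the clustering option keys your Paimon version actually defines.
+
+```sql
+-- Update clustering-related table options instead of passing parameters
+-- to the compact procedure. The key 'clustering.columns' is an assumption
+-- for illustration; substitute the keys defined for your table.
+ALTER TABLE T SET TBLPROPERTIES (
+  'clustering.columns' = 'f0,f1'
+);
+```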
@@ -117,8 +120,46 @@ CALL sys.compact(table => 'T')
 -- run incremental clustering with full mode, this will recluster all data
 CALL sys.compact(table => 'T', compact_strategy => 'full')
 ```
-You don’t need to specify any clustering-related parameters when running Incremental Clustering,
-these are already defined as table options. If you need to change clustering settings, please update the corresponding table options.
+{{< /tab >}}
+
+{{< tab "Flink Action" >}}
+
+Run the following command to submit an incremental clustering job for the table.
+
+```bash
+<FLINK_HOME>/bin/flink run \
+    /path/to/paimon-flink-action-{{< version >}}.jar \
+    compact \
+    --warehouse <warehouse-path> \
+    --database <database-name> \
+    --table <table-name> \
+    [--compact_strategy <minor / full>] \
+    [--table_conf <table_conf>] \
+    [--catalog_conf <paimon-catalog-conf> [--catalog_conf <paimon-catalog-conf> ...]]
+```
+
+Example: run incremental clustering
+
+```bash
+<FLINK_HOME>/bin/flink run \
+    /path/to/paimon-flink-action-{{< version >}}.jar \
+    compact \
+    --warehouse s3:///path/to/warehouse \
+    --database test_db \
+    --table test_table \
+    --table_conf sink.parallelism=2 \
+    --compact_strategy minor \
+    --catalog_conf s3.endpoint=https://****.com \
+    --catalog_conf s3.access-key=***** \
+    --catalog_conf s3.secret-key=*****
+```
+
+* `--compact_strategy` determines how to pick the files to be clustered; the default is `minor`.
+  * `full`: all files will be selected for clustering.
+  * `minor`: pick the set of files that need to be clustered based on specified conditions.
+
+Note: write parallelism is set by `sink.parallelism`; if it is set too high, a large number of small files may be generated.
+
+You can use `-D execution.runtime-mode=batch` or `-yD execution.runtime-mode=batch` (for the ON-YARN scenario) to run in batch mode.
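+
+For instance, a batch-mode submission might look like the following sketch; the paths and table names are placeholders, and `-D` is Flink's standard way of setting a configuration entry for a single submission:
+
+```bash
+# Submit the compact action with the runtime mode forced to batch.
+<FLINK_HOME>/bin/flink run \
+    -D execution.runtime-mode=batch \
+    /path/to/paimon-flink-action-{{< version >}}.jar \
+    compact \
+    --warehouse <warehouse-path> \
+    --database <database-name> \
+    --table <table-name>
+```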
 {{< /tab >}}
 
 {{< /tabs >}}