Commit 7e9c740 ("add docs")
1 parent f975d7d

1 file changed: docs/content/append-table/incremental-clustering.md (45 additions & 4 deletions)
@@ -95,11 +95,14 @@ clustering and small-file merging must be performed exclusively via Incremental
 ## Run Incremental Clustering

 {{< hint info >}}
-Currently, only support running Incremental Clustering in spark, support for flink will be added in the near future.
+Currently, Incremental Clustering only supports running in batch mode.

 {{< /hint >}}

-To run a Incremental Clustering job, follow these instructions.
+To run an Incremental Clustering job, follow these instructions.
+
+You don’t need to specify any clustering-related parameters when running Incremental Clustering;
+these options are already defined as table options. If you need to change clustering settings, update the corresponding table options.

 {{< tabs "incremental-clustering" >}}

@@ -117,8 +120,46 @@ CALL sys.compact(table => 'T')
 -- run incremental clustering with full mode, this will recluster all data
 CALL sys.compact(table => 'T', compact_strategy => 'full')
 ```
-You don’t need to specify any clustering-related parameters when running Incremental Clustering,
-these are already defined as table options. If you need to change clustering settings, please update the corresponding table options.
+{{< /tab >}}
+
+{{< tab "Flink Action" >}}
+
+Run the following command to submit an incremental clustering job for the table.
+
+```bash
+<FLINK_HOME>/bin/flink run \
+    /path/to/paimon-flink-action-{{< version >}}.jar \
+    compact \
+    --warehouse <warehouse-path> \
+    --database <database-name> \
+    --table <table-name> \
+    [--compact_strategy <minor / full>] \
+    [--table_conf <table_conf>] \
+    [--catalog_conf <paimon-catalog-conf> [--catalog_conf <paimon-catalog-conf> ...]]
+```
+
+Example: run incremental clustering
+
+```bash
+<FLINK_HOME>/bin/flink run \
+    /path/to/paimon-flink-action-{{< version >}}.jar \
+    compact \
+    --warehouse s3:///path/to/warehouse \
+    --database test_db \
+    --table test_table \
+    --table_conf sink.parallelism=2 \
+    --compact_strategy minor \
+    --catalog_conf s3.endpoint=https://****.com \
+    --catalog_conf s3.access-key=***** \
+    --catalog_conf s3.secret-key=*****
+```
+
+* `--compact_strategy` determines how to pick the files to be clustered; the default is `minor`.
+  * `full`: all files will be selected for clustering.
+  * `minor`: pick the set of files that need to be clustered based on specified conditions.
+
+Note: write parallelism is set by `sink.parallelism`; if it is set too large, the job may generate a large number of small files.
+
+You can use `-D execution.runtime-mode=batch` or `-yD execution.runtime-mode=batch` (for the ON-YARN scenario) to run in batch mode.
 {{< /tab >}}

 {{< /tabs >}}
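The paragraph added in this diff says clustering settings are defined as table options rather than passed to the compact call, but it does not name the option keys. A minimal SQL sketch of what updating them could look like, assuming hypothetical keys `clustering.columns` and `clustering.strategy` (not confirmed by this diff; check the table-options reference for the real names):

```sql
-- Hypothetical option keys, shown only to illustrate the workflow.
ALTER TABLE T SET (
    'clustering.columns'  = 'col1,col2',
    'clustering.strategy' = 'zorder'
);

-- Subsequent runs read the updated table options; no clustering
-- parameters are passed to the call itself.
CALL sys.compact(table => 'T');
```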
