
Commit b8726cf

[docs] Clarify that empty partition directories are not deleted by default (#8304)
1 parent 25b7b0d commit b8726cf

3 files changed

Lines changed: 17 additions & 1 deletion


docs/docs/learn-paimon/understand-files.mdx

Lines changed: 3 additions & 1 deletion
@@ -367,7 +367,9 @@ Let's say all 4 snapshots in the above diagram are about to expire. The expire p
 
 3. Finally, it deletes the snapshots themselves and writes the earliest hint file.
 
-If any directories are left empty after the deletion process, they will be deleted as well.
+If any directories are left empty after the deletion process, they will be deleted as well,
+but only when `snapshot.clean-empty-directories` is enabled (default is `false`).
+By default, empty directories are kept on disk. See [Manage Snapshots](../maintenance/manage-snapshots#expire-snapshots).
 
 
 Let's say another snapshot, `snapshot-5` is created and snapshot expiration is triggered. `snapshot-1` to `snapshot-4` are
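To make the gated cleanup concrete, here is a minimal Python sketch of the behaviour the new wording describes. It assumes a local filesystem and a Hive-style layout such as `dt=2024-01-01/bucket-0/`. `remove_empty_parents` and its `clean_empty_directories` parameter are hypothetical names chosen for illustration. This is not Paimon's implementation or API.

```python
import os
from typing import List


def remove_empty_parents(deleted_file: str, table_root: str,
                         clean_empty_directories: bool = False) -> List[str]:
    """Hypothetical helper: after a data file has been deleted, walk up from its
    directory and remove directories that became empty, never touching the
    table root itself. Returns the directories it removed, deepest first."""
    if not clean_empty_directories:
        # Mirrors the documented default: empty directories are kept on disk.
        return []
    removed = []
    root = os.path.abspath(table_root)
    current = os.path.dirname(os.path.abspath(deleted_file))
    # Only climb while we are strictly inside the table root.
    while current != root and current.startswith(root + os.sep):
        if os.listdir(current):
            break  # directory still holds files or subdirectories; stop climbing
        os.rmdir(current)
        removed.append(current)
        current = os.path.dirname(current)
    return removed
```

On HDFS or an object store, the `os.listdir` emptiness check roughly corresponds to a listing call against the directory path or key prefix. The option's description cites this kind of extra prefix operation as the reason it defaults to `false`.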

docs/docs/maintenance/manage-partitions.md

Lines changed: 5 additions & 0 deletions
@@ -49,6 +49,11 @@ __Note:__ After the partition expires, it is logically deleted and the latest sn
 files in the file system are not immediately physically deleted, it depends on when the corresponding snapshot expires.
 See [Expire Snapshots](./manage-snapshots#expire-snapshots).
 
+Also, even after the data files are physically deleted by snapshot expiration, the empty partition directories are
+**not** removed by default. To clean up empty directories, set
+`'snapshot.clean-empty-directories' = 'true'` on the table. Please note that on object stores (e.g. OSS, S3)
+this may cause performance issues, which is why the option defaults to `false`.
+
 :::
 
 An example for single partition field:

docs/docs/maintenance/manage-snapshots.mdx

Lines changed: 9 additions & 0 deletions
@@ -83,11 +83,20 @@ Snapshot expiration is controlled by the following table properties.
 <td>Integer</td>
 <td>The maximum number of snapshots allowed to expire at a time.</td>
 </tr>
+<tr>
+<td><h5>snapshot.clean-empty-directories</h5></td>
+<td>No</td>
+<td style={{wordWrap: "break-word"}}>false</td>
+<td>Boolean</td>
+<td>Whether to try to delete empty directories (e.g. partition and bucket directories) left behind after the data files are deleted during snapshot expiration. Defaults to <code>false</code>: empty directories are kept. Enabling it has caveats: HDFS may print exceptions in NameNode, and object stores (OSS/S3) may suffer performance issues due to the extra prefix operations required to list and delete directory markers.</td>
+</tr>
 </tbody>
 </table>
 
 When the number of snapshots is less than `snapshot.num-retained.min`, no snapshots will be expired(even the condition `snapshot.time-retained` meet), after which `snapshot.num-retained.max` and `snapshot.time-retained` will be used to control the snapshot expiration until the remaining snapshot meets the condition.
 
+Note that snapshot expiration is also what physically deletes data files dropped by [partition expiration](./manage-partitions#expiring-partitions). However, the empty partition and bucket directories left behind after the data files are deleted are **not** removed by default. To clean them up, enable `snapshot.clean-empty-directories` (see the option above). This is off by default because on object stores (OSS/S3) the prefix operations needed to delete directory markers can be expensive.
+
 The following example show more details(`snapshot.num-retained.min` is 2, `snapshot.time-retained` is 1h, `snapshot.num-retained.max` is 5):
 
 > snapshot item is described using tuple (snapshotId, corresponding time)
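The retention rule in the context lines above covers `snapshot.num-retained.min`, `snapshot.num-retained.max` and `snapshot.time-retained`. It can be modelled roughly as shown below. This is a simplified Python sketch, not Paimon's expiration code. It assumes snapshot ids increase with commit time and ignores the per-run limit on how many snapshots may expire at once. `snapshots_to_expire` is a hypothetical helper. The parameters match the hunk's example (min 2, time-retained 1h, max 5), but the timestamps are invented for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def snapshots_to_expire(snapshots: List[Tuple[int, datetime]], now: datetime,
                        num_retained_min: int, num_retained_max: int,
                        time_retained: timedelta) -> List[int]:
    """Simplified model of which snapshot ids one expiration pass would remove."""
    ordered = sorted(snapshots)  # oldest first, assuming ids grow with commit time
    total = len(ordered)
    # Never drop below num_retained_min, even if snapshots exceed time_retained.
    max_expirable = max(0, total - num_retained_min)
    # Snapshots beyond num_retained_max expire regardless of age.
    must_expire = max(0, total - num_retained_max)
    expired = []
    for index, (snapshot_id, commit_time) in enumerate(ordered):
        if index >= max_expirable:
            break
        if index < must_expire or now - commit_time > time_retained:
            expired.append(snapshot_id)
        else:
            break  # later snapshots are newer, so none of them is past time_retained
    return expired


now = datetime(2024, 1, 1, 12, 0)
history = [
    (1, datetime(2024, 1, 1, 10, 0)),
    (2, datetime(2024, 1, 1, 10, 30)),
    (3, datetime(2024, 1, 1, 11, 10)),
    (4, datetime(2024, 1, 1, 11, 20)),
    (5, datetime(2024, 1, 1, 11, 30)),
    (6, datetime(2024, 1, 1, 11, 50)),
]
print(snapshots_to_expire(history, now, 2, 5, timedelta(hours=1)))  # [1, 2]
```

Here snapshot 1 expires because six snapshots exceed the maximum of five. Snapshot 2 expires because it is older than one hour. Snapshot 3 is younger than one hour, so the pass stops there.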
