@@ -17,41 +17,57 @@ OCI Data Flow supports Delta Lake by default when your Applications run Spark 3.
 How to optimize your data lake using Delta Lake functions:
 Configure your preferences (see the Delta Lake documentation):
 
+```
 spark.conf.set('spark.databricks.delta.retentionDurationCheck.enabled', 'False')
 spark.conf.set('spark.databricks.delta.optimize.repartition.enabled','True')
 spark.conf.set('spark.databricks.delta.optimize.preserveInsertionOrder', 'False')
 
-Preserve vacuum history:
+# Preserve vacuum history:
 spark.conf.set('spark.databricks.delta.vacuum.logging.enabled','True')
 
-Set retention time for optimized files (ready to delete):
+# Set retention time for optimized files (ready to delete):
 spark.conf.set("spark.databricks.delta.deletedFileRetentionDuration","0")
+```
 
 
 Check existing table details (look for numFiles and sizeInBytes):
+```
 spark.sql("describe detail atm").show(truncate=False)
+```
+```
 +------+------------------------------------+--------------------------------+-----------+----------------------------------------------------+-----------------------+-------------------+------------------+--------+-----------+----------+----------------+----------------+------------------------+
 |format|id                                  |name                            |description|location                                            |createdAt              |lastModified       |partitionColumns  |numFiles|sizeInBytes|properties|minReaderVersion|minWriterVersion|tableFeatures           |
 +------+------------------------------------+--------------------------------+-----------+----------------------------------------------------+-----------------------+-------------------+------------------+--------+-----------+----------+----------------+----------------+------------------------+
 |delta |c15ad4ca-8c0f-4747-b064-1492d7b4b3c4|spark_catalog.default.hsl_trains|NULL       |oci://dataflow_app@fro8fl9kuqli/hsl_trains_data_part|2024-09-05 10:19:10.057|2024-09-06 08:45:01|[year, month, day]|2024    |16333676   |{}        |1               |2               |[appendOnly, invariants]|
 +------+------------------------------------+--------------------------------+-----------+----------------------------------------------------+-----------------------+-------------------+------------------+--------+-----------+----------+----------------+----------------+------------------------+
+```
 
 Run optimization:
+```
 spark.sql("OPTIMIZE atm").show(truncate=False)
+```
 
 Check files you can delete:
+```
 spark.sql("vacuum atm RETAIN 0 HOURS DRY RUN")
+```
 
 Delete optimized and consolidated files:
+```
 spark.sql("vacuum atm RETAIN 0 HOURS")
+```
 
 Then check the details of your table:
+```
 spark.sql("describe detail atm").show(truncate=False)
+```
+```
 +------+------------------------------------+--------------------------------+-----------+----------------------------------------------------+-----------------------+-------------------+------------------+--------+-----------+----------+----------------+----------------+------------------------+
 |format|id                                  |name                            |description|location                                            |createdAt              |lastModified       |partitionColumns  |numFiles|sizeInBytes|properties|minReaderVersion|minWriterVersion|tableFeatures           |
 +------+------------------------------------+--------------------------------+-----------+----------------------------------------------------+-----------------------+-------------------+------------------+--------+-----------+----------+----------------+----------------+------------------------+
 |delta |c15ad4ca-8c0f-4747-b064-1492d7b4b3c4|spark_catalog.default.hsl_trains|NULL       |oci://dataflow_app@fro8fl9kuqli/hsl_trains_data_part|2024-09-05 10:19:10.057|2024-09-06 08:47:48|[year, month, day]|7       |1583521    |{}        |1               |2               |[appendOnly, invariants]|
-+------+------------------------------------+--------------------------------+-----------+----------------------------------------------------+-----------------------+-------------------+------------------+--------+-----------+----------+----------------+----------------+------------------------+
++------+------------------------------------+--------------------------------+-----------+----------------------------------------------------+-----------------------+-------------------+------------------+--------+-----------+----------+----------------+----------------+------------------------+
+```
 
 Enjoy increased performance of your queries!
 
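The sequence the README walks through (describe, optimize, dry-run vacuum, vacuum, describe again) could be wrapped in one helper. The following is only a minimal sketch, assuming an active SparkSession with Delta Lake enabled is passed in as `spark`. The function name `compact_delta_table`, the `retain_hours` parameter, and the shape of the returned dictionary are hypothetical and not part of the README.

```python
def compact_delta_table(spark, table, retain_hours=0):
    """Run OPTIMIZE and VACUUM on a Delta table; return before/after file stats.

    Hypothetical helper: `spark` is assumed to be an active SparkSession
    with Delta Lake available, and `table` an existing Delta table name.
    """

    def detail():
        # DESCRIBE DETAIL returns a single row holding numFiles and sizeInBytes.
        row = spark.sql(f"DESCRIBE DETAIL {table}").collect()[0]
        return {"numFiles": row["numFiles"], "sizeInBytes": row["sizeInBytes"]}

    # The README disables the retention check so that a short retention
    # window such as RETAIN 0 HOURS is accepted by VACUUM.
    spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")

    before = detail()

    # Compact small files into larger ones.
    spark.sql(f"OPTIMIZE {table}")

    # DRY RUN only reports files that are candidates for deletion.
    candidates = spark.sql(
        f"VACUUM {table} RETAIN {retain_hours} HOURS DRY RUN"
    ).collect()

    # Actually delete the files that are no longer referenced.
    spark.sql(f"VACUUM {table} RETAIN {retain_hours} HOURS")

    after = detail()
    return {"before": before, "after": after, "vacuum_candidates": len(candidates)}


# Usage, assuming a live session named `spark` and the README's table `atm`:
# stats = compact_delta_table(spark, "atm")
# print(stats["before"], stats["after"])
```

Taking `spark` as an argument keeps the sketch free of any session setup. Note that vacuuming with a retention of 0 hours removes the files that older table versions depend on, so time travel to those versions stops working afterwards.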