44 changes: 22 additions & 22 deletions docs/get-started/VeloxIceberg.md
@@ -124,15 +124,15 @@ The "Gluten Support" column is now ready to be populated with:
| spark.sql.iceberg.check-ordering | true | Validates the write schema column order matches the table schema order |✅ |
| spark.sql.iceberg.planning.preserve-data-grouping | false | When true, co-locate scan tasks for the same partition in the same read split, used in Storage Partitioned Joins |✅ |
| spark.sql.iceberg.aggregate-push-down.enabled | true | Enables pushdown of aggregate functions (MAX, MIN, COUNT) | |
-| spark.sql.iceberg.distribution-mode | See Spark Writes | Controls distribution strategy during writes | |
+| spark.sql.iceberg.distribution-mode | See Spark Writes | Controls distribution strategy during writes | 🚫 |
| spark.wap.id | null | Write-Audit-Publish snapshot staging ID | |
| spark.wap.branch | null | WAP branch name for snapshot commit | |
-| spark.sql.iceberg.compression-codec | Table default | Write compression codec (e.g., zstd, snappy) | |
-| spark.sql.iceberg.compression-level | Table default | Compression level for Parquet/Avro | |
-| spark.sql.iceberg.compression-strategy | Table default | Compression strategy for ORC | |
+| spark.sql.iceberg.compression-codec | Table default | Write compression codec (e.g., zstd, snappy) ||
+| spark.sql.iceberg.compression-level | Table default | Compression level for Parquet/Avro ||
+| spark.sql.iceberg.compression-strategy | Table default | Compression strategy for ORC ||
| spark.sql.iceberg.data-planning-mode | AUTO | Scan planning mode for data files (AUTO, LOCAL, DISTRIBUTED) | |
| spark.sql.iceberg.delete-planning-mode | AUTO | Scan planning mode for delete files (AUTO, LOCAL, DISTRIBUTED) | |
-| spark.sql.iceberg.advisory-partition-size | Table default | Advisory size (bytes) used for writing to the Table when Spark's Adaptive Query Execution is enabled. Used to size output files | |
+| spark.sql.iceberg.advisory-partition-size | Table default | Advisory size (bytes) used for writing to the Table when Spark's Adaptive Query Execution is enabled. Used to size output files ||
| spark.sql.iceberg.locality.enabled | false | Report locality information for Spark task placement on executors |✅ |
| spark.sql.iceberg.executor-cache.enabled | true | Enables cache for executor-side (currently used to cache Delete Files) |❌|
| spark.sql.iceberg.executor-cache.timeout | 10 | Timeout in minutes for executor cache entries |❌|
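
The rows above are session-level Spark properties. As a minimal sketch of how such properties are usually passed to a job, the helper below turns a mapping into `spark-submit --conf` arguments. The helper name `to_spark_submit_args` is hypothetical, and the two properties and their values (the listed defaults) are simply picked from the table for illustration.

```python
def to_spark_submit_args(conf):
    """Turn a {property: value} mapping into spark-submit --conf arguments."""
    args = []
    for key in sorted(conf):  # sorted for a stable, reproducible command line
        args.extend(["--conf", f"{key}={conf[key]}"])
    return args


# Property names and values (the defaults) are taken from the table above.
session_conf = {
    "spark.sql.iceberg.locality.enabled": "false",
    "spark.sql.iceberg.check-ordering": "true",
}

print(" ".join(to_spark_submit_args(session_conf)))
```

The resulting list can be appended to a `spark-submit` invocation; whether Gluten offloads the affected operation still depends on the support marker in the last column.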
@@ -161,14 +161,14 @@ The "Gluten Support" column is now ready to be populated with:
| Spark option | Default | Description | Gluten Support |
| --- | --- | --- | --- |
| write-format | Table write.format.default | File format to use for this write operation; parquet, avro, or orc |⚠️ Parquet only|
-| target-file-size-bytes | As per table property | Overrides this table's write.target-file-size-bytes | |
+| target-file-size-bytes | As per table property | Overrides this table's write.target-file-size-bytes | |
| check-nullability | true | Sets the nullable check on fields | |
| snapshot-property.custom-key | null | Adds an entry with custom-key and corresponding value in the snapshot summary (the snapshot-property. prefix is only required for DSv2) | |
| fanout-enabled | false | Overrides this table's write.spark.fanout.enabled |✅|
| check-ordering | true | Checks if input schema and table schema are same | |
| isolation-level | null | Desired isolation level for Dataframe overwrite operations. null => no checks (for idempotent writes), serializable => check for concurrent inserts or deletes in destination partitions, snapshot => checks for concurrent deletes in destination partitions. | |
| validate-from-snapshot-id | null | If isolation level is set, id of base snapshot from which to check concurrent write conflicts into a table. Should be the snapshot before any reads from the table. Can be obtained via Table API or Snapshots table. If null, the table's oldest known snapshot is used. | |
-| compression-codec | Table write.(fileformat).compression-codec | Overrides this table's compression codec for this write | |
+| compression-codec | Table write.(fileformat).compression-codec | Overrides this table's compression codec for this write |⚠️ Parquet only|
| compression-level | Table write.(fileformat).compression-level | Overrides this table's compression level for Parquet and Avro tables for this write | |
| compression-strategy | Table write.orc.compression-strategy | Overrides this table's compression strategy for ORC tables for this write | |
| distribution-mode | See Spark Writes for defaults | Override this table's distribution mode for this write |🚫|
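
The options in this table apply to a single write rather than to the whole session. The sketch below shows the usual chaining pattern for such options, assuming a writer object that exposes a chainable `option(key, value)` method. `RecordingWriter` is a hypothetical stand-in so the example runs without Spark; in a real job the writer would presumably come from Spark itself (for example PySpark's `DataFrame.writeTo`), which this diff does not show.

```python
class RecordingWriter:
    """Hypothetical stand-in for a Spark writer; it only records options."""

    def __init__(self):
        self.options = {}

    def option(self, key, value):
        self.options[key] = value
        return self  # chainable, like Spark's writer builders


def apply_write_options(writer, options):
    """Apply each per-write option through the writer's option(key, value)."""
    for key, value in options.items():
        writer = writer.option(key, value)
    return writer


# Option names come from the table above; the values are illustrative.
write_options = {"write-format": "parquet", "fanout-enabled": "true"}
writer = apply_write_options(RecordingWriter(), write_options)
print(writer.options)
```

Keeping the options in a plain mapping makes it easy to review them against the "Gluten Support" column before a write is issued.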
@@ -194,42 +194,42 @@ extracted from https://iceberg.apache.org/docs/latest/configuration/

| Property | Default | Description | Gluten Support |
| --- | --- | --- | --- |
-| write.format.default | parquet | Default file format for the table; parquet, avro, or orc | |
+| write.format.default | parquet | Default file format for the table; parquet, avro, or orc |⚠️ Parquet only|
| write.delete.format.default | data file format | Default delete file format for the table; parquet, avro, or orc | |
| write.parquet.row-group-size-bytes | 134217728 (128 MB) | Parquet row group size | |
| write.parquet.page-size-bytes | 1048576 (1 MB) | Parquet page size |✅|
| write.parquet.page-row-limit | 20000 | Parquet page row limit | |
| write.parquet.dict-size-bytes | 2097152 (2 MB) | Parquet dictionary page size | |
-| write.parquet.compression-codec | zstd | Parquet compression codec: zstd, brotli, lz4, gzip, snappy, uncompressed | |
+| write.parquet.compression-codec | zstd | Parquet compression codec: zstd, brotli, lz4, gzip, snappy, uncompressed ||
| write.parquet.compression-level | null | Parquet compression level | |
| write.parquet.bloom-filter-enabled.column.col1 | (not set) | Hint to parquet to write a bloom filter for the column: 'col1' | |
| write.parquet.bloom-filter-max-bytes | 1048576 (1 MB) | The maximum number of bytes for a bloom filter bitset | |
| write.parquet.bloom-filter-fpp.column.col1 | 0.01 | The false positive probability for a bloom filter applied to 'col1' (must > 0.0 and < 1.0) | |
| write.parquet.stats-enabled.column.col1 | (not set) | Controls whether to collect parquet column statistics for column 'col1' | |
-| write.avro.compression-codec | gzip | Avro compression codec: gzip(deflate with 9 level), zstd, snappy, uncompressed | |
-| write.avro.compression-level | null | Avro compression level | |
-| write.orc.stripe-size-bytes | 67108864 (64 MB) | Define the default ORC stripe size, in bytes | |
-| write.orc.block-size-bytes | 268435456 (256 MB) | Define the default file system block size for ORC files | |
-| write.orc.compression-codec | zlib | ORC compression codec: zstd, lz4, lzo, zlib, snappy, none | |
-| write.orc.compression-strategy | speed | ORC compression strategy: speed, compression | |
-| write.orc.bloom.filter.columns | (not set) | Comma separated list of column names for which a Bloom filter must be created | |
-| write.orc.bloom.filter.fpp | 0.05 | False positive probability for Bloom filter (must > 0.0 and < 1.0) | |
+| write.avro.compression-codec | gzip | Avro compression codec: gzip(deflate with 9 level), zstd, snappy, uncompressed ||
+| write.avro.compression-level | null | Avro compression level ||
+| write.orc.stripe-size-bytes | 67108864 (64 MB) | Define the default ORC stripe size, in bytes ||
+| write.orc.block-size-bytes | 268435456 (256 MB) | Define the default file system block size for ORC files ||
+| write.orc.compression-codec | zlib | ORC compression codec: zstd, lz4, lzo, zlib, snappy, none ||
+| write.orc.compression-strategy | speed | ORC compression strategy: speed, compression ||
+| write.orc.bloom.filter.columns | (not set) | Comma separated list of column names for which a Bloom filter must be created ||
+| write.orc.bloom.filter.fpp | 0.05 | False positive probability for Bloom filter (must > 0.0 and < 1.0) ||
| write.location-provider.impl | null | Optional custom implementation for LocationProvider | |
| write.metadata.compression-codec | none | Metadata compression codec; none or gzip | |
| write.metadata.metrics.max-inferred-column-defaults | 100 | Defines the maximum number of columns for which metrics are collected. Columns are included with a pre-order traversal of the schema: top level fields first; then all elements of the first nested s... | |
| write.metadata.metrics.default | truncate(16) | Default metrics mode for all columns in the table; none, counts, truncate(length), or full | |
| write.metadata.metrics.column.col1 | (not set) | Metrics mode for column 'col1' to allow per-column tuning; none, counts, truncate(length), or full | |
| write.target-file-size-bytes | 536870912 (512 MB) | Controls the size of files generated to target about this many bytes |✅|
| write.delete.target-file-size-bytes | 67108864 (64 MB) | Controls the size of delete files generated to target about this many bytes | |
-| write.distribution-mode | not set, see engines for specific defaults, for example Spark Writes | Defines distribution of write data: none: don't shuffle rows; hash: hash distribute by partition key ; range: range distribute by partition key or sort key if table has an SortOrder | |
-| write.delete.distribution-mode | (not set) | Defines distribution of write delete data | |
-| write.update.distribution-mode | (not set) | Defines distribution of write update data | |
-| write.merge.distribution-mode | (not set) | Defines distribution of write merge data | |
+| write.distribution-mode | not set, see engines for specific defaults, for example Spark Writes | Defines distribution of write data: none: don't shuffle rows; hash: hash distribute by partition key ; range: range distribute by partition key or sort key if table has an SortOrder |🚫|
+| write.delete.distribution-mode | (not set) | Defines distribution of write delete data |🚫|
+| write.update.distribution-mode | (not set) | Defines distribution of write update data |🚫|
+| write.merge.distribution-mode | (not set) | Defines distribution of write merge data |🚫|
| write.wap.enabled | false | Enables write-audit-publish writes | |
| write.summary.partition-limit | 0 | Includes partition-level summary stats in snapshot summaries if the changed partition count is less than this limit | |
| write.metadata.delete-after-commit.enabled | false | Controls whether to delete the oldest tracked version metadata files after each table commit. See the Remove old metadata files section for additional details | |
| write.metadata.previous-versions-max | 100 | The max number of previous version metadata files to track | |
-| write.spark.fanout.enabled | false | Enables the fanout writer in Spark that does not require data to be clustered; uses more memory | |
+| write.spark.fanout.enabled | false | Enables the fanout writer in Spark that does not require data to be clustered; uses more memory ||
| write.object-storage.enabled | false | Enables the object storage location provider that adds a hash component to file paths | |
| write.object-storage.partitioned-paths | true | Includes the partition values in the file path | |
| write.data.path | table location + /data | Base location for data files | |
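
Because most of this change only fills in the last column, a quick way to review it is to group properties by their support marker. The sketch below does that for a few rows copied from the table above. It assumes four-column rows with no `|` inside a cell, and it treats the markers as opaque strings, since their legend is not part of this diff. The function name `parse_row` is hypothetical.

```python
from collections import defaultdict


def parse_row(line):
    """Return (property, support marker) for one four-column table row."""
    # Drop exactly one leading and one trailing pipe so an empty last cell survives.
    cells = [cell.strip() for cell in line.strip()[1:-1].split("|")]
    return cells[0], cells[-1]


rows = [
    "| write.parquet.page-size-bytes | 1048576 (1 MB) | Parquet page size |✅|",
    "| write.format.default | parquet | Default file format for the table; parquet, avro, or orc |⚠️ Parquet only|",
    "| write.merge.distribution-mode | (not set) | Defines distribution of write merge data |🚫|",
    "| write.wap.enabled | false | Enables write-audit-publish writes | |",
]

by_marker = defaultdict(list)
for row in rows:
    prop, marker = parse_row(row)
    by_marker[marker].append(prop)

for marker, props in by_marker.items():
    print(repr(marker), props)
```

Rows whose last cell is empty end up under the empty-string key, which makes the properties that still lack a support marker easy to spot.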