Commit c32b58d

g-talbot and claude committed
feat: add Parquet merge policy for compaction (Phase 2)
Adds a constant write amplification merge policy for Parquet splits, adapted from the existing ConstWriteAmplificationMergePolicy but using byte size instead of document count as the primary size metric.

This is Phase 2 of the Parquet compaction project: the decision layer that determines which splits to merge within each compaction scope.

Key components:
- ParquetMergePolicy trait mirroring the MergePolicy interface
- CompactionScope grouping by (index_uid, sort_fields, window_start)
- ConstWriteAmplificationParquetMergePolicy with bounded write amp
- finalize_operations() for cold window compaction
- 33 tests: unit, proptest (MC-CONSERVE/LEVEL/WA/IDEMPOTENT), simulation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
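
To make the commit message concrete, here is a minimal sketch of how a byte-size-based, constant write amplification policy could pick merges within a scope. It is not the code from this commit (the policy module's diff is not shown on this page). The struct layouts, the `PolicyConfig` fields, and the `plan_merges` function are all hypothetical; only the name `CompactionScope` and its three grouping keys come from the message above.

```rust
use std::collections::BTreeMap;

/// Hypothetical scope key: splits are only merged with splits in the same scope.
/// The three fields mirror the (index_uid, sort_fields, window_start) grouping.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct CompactionScope {
    index_uid: String,
    sort_fields: String,
    window_start: i64,
}

/// Hypothetical split metadata; the real type in the crate may differ.
#[derive(Clone, Debug)]
struct SplitMeta {
    split_id: String,
    scope: CompactionScope,
    num_bytes: u64,
    /// How many merges this split has already been through (its "level").
    num_merge_ops: u32,
}

/// Hypothetical policy knobs (assumed names, assumed semantics).
struct PolicyConfig {
    merge_factor: usize,     // minimum number of splits per merge
    max_merge_factor: usize, // maximum number of splits per merge
    max_merge_ops: u32,      // bound on write amplification
    target_split_bytes: u64, // splits at or above this size are left alone
}

/// Returns groups of splits to merge. Splits are bucketed by (scope, level),
/// so every merge combines splits that were rewritten the same number of
/// times. Since a split stops being a candidate after `max_merge_ops` merges,
/// each byte is rewritten at most that many times.
fn plan_merges(splits: Vec<SplitMeta>, cfg: &PolicyConfig) -> Vec<Vec<SplitMeta>> {
    let mut buckets: BTreeMap<(CompactionScope, u32), Vec<SplitMeta>> = BTreeMap::new();
    for split in splits {
        // Mature splits (merged enough times, or already large) are skipped.
        if split.num_merge_ops >= cfg.max_merge_ops || split.num_bytes >= cfg.target_split_bytes {
            continue;
        }
        buckets
            .entry((split.scope.clone(), split.num_merge_ops))
            .or_default()
            .push(split);
    }

    let mut operations = Vec::new();
    for (_, mut bucket) in buckets {
        // Deterministic order: smallest first, ties broken by split id.
        bucket.sort_by(|a, b| {
            a.num_bytes
                .cmp(&b.num_bytes)
                .then_with(|| a.split_id.cmp(&b.split_id))
        });
        let mut remaining = bucket.into_iter().peekable();
        loop {
            let mut batch = Vec::new();
            let mut batch_bytes = 0u64;
            while let Some(next) = remaining.peek() {
                if batch.len() >= cfg.max_merge_factor || batch_bytes >= cfg.target_split_bytes {
                    break;
                }
                batch_bytes += next.num_bytes;
                batch.push(remaining.next().unwrap());
            }
            let enough_splits = batch.len() >= cfg.merge_factor;
            let enough_bytes = batch_bytes >= cfg.target_split_bytes;
            // A merge needs at least two inputs; leftovers wait for more splits.
            if batch.len() < 2 || !(enough_splits || enough_bytes) {
                break;
            }
            operations.push(batch);
        }
    }
    operations
}
```

In this sketch, leftover splits that do not fill a batch are simply left for a later pass. A finalization step for cold windows, such as the `finalize_operations()` mentioned above, would presumably relax that threshold, but its actual behavior is not visible in this commit page.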
1 parent 3fc479f commit c32b58d

6 files changed

Lines changed: 1269 additions & 0 deletions

quickwit/Cargo.lock

Lines changed: 1 addition & 0 deletions
Some generated files are not rendered by default.

quickwit/quickwit-parquet-engine/Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ ulid = { workspace = true }
 
 [dev-dependencies]
 proptest = { workspace = true }
+rand = { workspace = true }
 regex = { workspace = true }
 tempfile = { workspace = true }
 

quickwit/quickwit-parquet-engine/src/merge/mod.rs

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@
 //! file has non-overlapping key ranges.
 
 mod merge_order;
+pub mod policy;
 mod schema;
 mod writer;
 
