[phase-31 3/4] Writer + pipeline wiring by g-talbot · Pull Request #6244 · quickwit-oss/quickwit

g-talbot · 2026-03-30T17:21:48Z

Summary

Wire TableConfig into ParquetWriter sort path and add self-describing Parquet file metadata for compaction (Phase 31 Metadata Foundation, PR 3 of 4).

Stacks on gtt/phase-31-compaction-metadata (PR #6243).

What's included

storage/writer.rs (rewritten):

ParquetWriter::new() takes &TableConfig, resolves sort field names to physical columns
sort_batch() uses resolved fields with per-column ASC/DESC direction
SS-1 debug_assert verification: re-sort output and check identity permutation
build_compaction_key_value_metadata(): embeds sort_fields, window_start, window_duration, num_merge_ops, row_keys (base64+JSON) in Parquet kv_metadata
SS-5 verify_ss5_kv_consistency(): kv entries must match source struct
write_to_file_with_metadata() replaces write_to_file()
prepare_write() shared prep for both bytes and file write paths
resolve_sort_fields(): parse sort schema, map to ParquetField, skip missing columns
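
For readers without the diff open, here is a minimal sketch of the idea behind sort_batch() and the SS-1 check: sort with a per-column direction, then re-sort the output and confirm the permutation is the identity. It is not the PR's code. It uses plain `Vec<i64>` rows instead of Arrow record batches, and `SortKey`, `sort_indices`, and `sort_rows` are hypothetical names chosen for illustration.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for a resolved sort field: column index plus direction.
#[derive(Clone, Copy, Debug)]
pub struct SortKey {
    pub column: usize,
    pub descending: bool,
}

/// Compare two rows key by key, reversing the comparison for DESC columns.
fn compare_rows(a: &[i64], b: &[i64], keys: &[SortKey]) -> Ordering {
    for key in keys {
        let ord = a[key.column].cmp(&b[key.column]);
        let ord = if key.descending { ord.reverse() } else { ord };
        if ord != Ordering::Equal {
            return ord;
        }
    }
    Ordering::Equal
}

/// Return the permutation that sorts `rows` by `keys` (stable sort).
pub fn sort_indices(rows: &[Vec<i64>], keys: &[SortKey]) -> Vec<usize> {
    let mut indices: Vec<usize> = (0..rows.len()).collect();
    indices.sort_by(|&i, &j| compare_rows(&rows[i], &rows[j], keys));
    indices
}

/// Sort `rows`, then verify in debug builds that re-sorting is a no-op.
pub fn sort_rows(rows: &[Vec<i64>], keys: &[SortKey]) -> Vec<Vec<i64>> {
    let indices = sort_indices(rows, keys);
    let sorted: Vec<Vec<i64>> = indices.iter().map(|&i| rows[i].clone()).collect();
    // SS-1-style check: a stable re-sort of sorted data yields 0, 1, 2, ...
    debug_assert!(
        sort_indices(&sorted, keys).iter().copied().eq(0..sorted.len()),
        "re-sorting sorted output must yield the identity permutation"
    );
    sorted
}
```

The check relies on the sort being stable, which holds for `slice::sort_by`; with an unstable sort, rows that compare equal could legitimately swap places and trip the assertion.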

storage/config.rs:

to_writer_properties_with_metadata(sorting_cols, kv_metadata) accepts dynamic sort columns and optional KV metadata
to_writer_properties() delegates with empty defaults
Removed static sorting_columns() method (now in writer)

storage/split_writer.rs:

ParquetSplitWriter::new() takes &TableConfig parameter

quickwit-indexing (5 files):

All ParquetSplitWriter::new() callers updated with &TableConfig::default()

Verification

cargo build -p quickwit-parquet-engine -p quickwit-indexing ✅
cargo test -p quickwit-parquet-engine -- storage:: ✅ (23 tests)
cargo clippy -p quickwit-parquet-engine --all-features --tests ✅

Test plan

Writer sorts by TableConfig-driven fields (test_write_sorts_data)
Compaction KV metadata embedded and read back (test_write_to_file_with_compaction_metadata)
No qh.* keys when metadata=None (test_write_to_file_without_metadata_has_no_qh_keys)
Pre-Phase-31 splits produce empty KV vec
RowKeys base64+proto roundtrip
META-07 self-describing Parquet roundtrip (cold file reconstruction)
Clippy clean
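
The KV-metadata items in the test plan can be pictured with a small sketch: build `qh.`-prefixed key/value pairs from a source struct, emit nothing when there is no metadata, and check the pairs against the struct in the spirit of SS-5. Only the `qh.` prefix comes from this PR. The struct, the exact key names, the field types, and the sort-fields string format below are assumptions, and the row_keys entry is left out to keep the example free of external crates.

```rust
/// Hypothetical source struct; field names and types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct CompactionMeta {
    pub sort_fields: String,
    pub window_start_secs: Option<i64>,
    pub window_duration_secs: u32,
    pub num_merge_ops: u32,
}

/// Build key/value entries; `None` yields an empty vec (no `qh.*` keys).
pub fn build_kv(meta: Option<&CompactionMeta>) -> Vec<(String, String)> {
    let Some(meta) = meta else {
        return Vec::new();
    };
    // Key names are illustrative, not the PR's actual keys.
    let mut kv = vec![
        ("qh.sort_fields".to_string(), meta.sort_fields.clone()),
        (
            "qh.window_duration_secs".to_string(),
            meta.window_duration_secs.to_string(),
        ),
        ("qh.num_merge_ops".to_string(), meta.num_merge_ops.to_string()),
    ];
    if let Some(start) = meta.window_start_secs {
        kv.push(("qh.window_start".to_string(), start.to_string()));
    }
    kv
}

/// Look up a value by key in the entry list.
fn get<'a>(kv: &'a [(String, String)], key: &str) -> Option<&'a str> {
    kv.iter()
        .find(|(k, _)| k.as_str() == key)
        .map(|(_, v)| v.as_str())
}

/// SS-5-style check: every entry must parse back to the struct's value.
pub fn kv_matches(kv: &[(String, String)], meta: &CompactionMeta) -> bool {
    get(kv, "qh.sort_fields") == Some(meta.sort_fields.as_str())
        && get(kv, "qh.window_duration_secs").and_then(|v| v.parse::<u32>().ok())
            == Some(meta.window_duration_secs)
        && get(kv, "qh.num_merge_ops").and_then(|v| v.parse::<u32>().ok())
            == Some(meta.num_merge_ops)
        && get(kv, "qh.window_start").and_then(|v| v.parse::<i64>().ok())
            == meta.window_start_secs
}
```

Reading the entries back and comparing them against the struct is what makes a file "self-describing" in the sense used here: a cold file's metadata can be reconstructed from the Parquet footer alone.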

🤖 Generated with Claude Code

Wire TableConfig-driven sort order into ParquetWriter and add self-describing Parquet file metadata for compaction:

- ParquetWriter::new() takes &TableConfig, resolves sort fields at construction via parse_sort_fields() + ParquetField::from_name()
- sort_batch() uses resolved fields with per-column direction (ASC/DESC)
- SS-1 debug_assert verification: re-sort and check identity permutation
- build_compaction_key_value_metadata(): embeds sort_fields, window_start, window_duration, num_merge_ops, row_keys (base64) in Parquet kv_metadata
- SS-5 verify_ss5_kv_consistency(): kv_metadata matches source struct
- write_to_file_with_metadata() replaces write_to_file()
- prepare_write() shared method for bytes and file paths
- ParquetWriterConfig gains to_writer_properties_with_metadata()
- ParquetSplitWriter passes TableConfig through
- All callers in quickwit-indexing updated with TableConfig::default()
- 23 storage tests pass including META-07 self-describing roundtrip

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mattmkim · 2026-04-06T18:07:18Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 76b703ad24


chatgpt-codex-connector · 2026-04-06T18:13:41Z

quickwit/quickwit-parquet-engine/src/storage/writer.rs

+                schema
+                    .index_of(sf.name.as_str())
+                    .ok()


Normalize logical tag names before sort-column lookup

resolve_sort_fields() preserves sort keys like tag_service/tag_env, and sort_batch() then does schema.index_of(sf.name) directly; with current metrics batches using physical columns such as service, env, and host, these keys are silently dropped from the actual sort. That means splits are no longer ordered by those tag dimensions (only by the remaining matched keys), which regresses clustering/pruning behavior compared to the previous SORT_ORDER path and can change compaction ordering whenever rows share the same metric/timestamp but differ by tags.


Can ignore: we're updating the sort order string to use the physical column names (service, env, etc.).
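
To make the concern in this thread concrete, here is a minimal sketch of name-based sort-column resolution that skips names missing from the schema. It is not the PR's resolve_sort_fields(). The function name `resolve_sort_columns` is hypothetical, and the schema is modeled as a plain list of column names rather than an Arrow schema.

```rust
/// Map each sort field name to its column index in `schema`.
/// Names with no matching column are skipped and reported separately,
/// mirroring the "skip missing columns" behavior described in the PR.
pub fn resolve_sort_columns(
    sort_fields: &[&str],
    schema: &[&str],
) -> (Vec<usize>, Vec<String>) {
    let mut resolved = Vec::new();
    let mut skipped = Vec::new();
    for name in sort_fields {
        match schema.iter().position(|col| col == name) {
            Some(index) => resolved.push(index),
            None => skipped.push((*name).to_string()),
        }
    }
    (resolved, skipped)
}
```

With a schema of `metric_name, service, env, host, timestamp` (the first and last column names are assumed for illustration), a logical key such as `tag_service` resolves to nothing and drops out of the sort, while the physical name `service` resolves to column 1. Returning the skipped names makes the drop visible to the caller instead of silent.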

chatgpt-codex-connector · 2026-04-06T18:13:41Z

quickwit/quickwit-parquet-engine/src/storage/split_writer.rs

-        let metadata = MetricsSplitMetadata::builder()
+        // Compute window_start from the earliest timestamp in the batch.
+        let window_duration = self.table_config.window_duration_secs;
+        let window_start_secs = if window_duration > 0 && time_range.start_secs > 0 {


Compute window_start when earliest timestamp is zero

The window_start calculation is guarded by time_range.start_secs > 0, so batches whose minimum timestamp is exactly Unix epoch 0 skip window_start even when window_duration_secs is configured. This causes Phase-31 window metadata to be omitted for those splits and, in debug builds, violates the builder’s TW-1 pairing invariant (window_start absent while duration is non-zero), leading to avoidable assertion failures.


I guess technically we should try to set window_start_secs first, but the minimum timestamp should never be epoch 0.
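
One way to sidestep the epoch-0 edge case is to guard only on the window duration, so window_start is always present whenever a duration is configured. The sketch below assumes the window start is the earliest timestamp floored to a multiple of the duration; the diff excerpt does not show the actual computation, and the `i64` types and the function name `window_start_secs` are assumptions.

```rust
/// Floor `start_secs` to the beginning of its window.
/// Returns `None` only when no window duration is configured, so a
/// non-zero duration is always paired with a window start.
pub fn window_start_secs(start_secs: i64, window_duration_secs: i64) -> Option<i64> {
    if window_duration_secs <= 0 {
        return None;
    }
    // rem_euclid keeps the remainder non-negative, so timestamps before
    // the epoch also floor toward the earlier window boundary.
    Some(start_secs - start_secs.rem_euclid(window_duration_secs))
}
```

Under these assumptions a batch starting at epoch 0 maps to window start 0 rather than to a missing value.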

mattmkim · 2026-04-06T19:31:52Z

quickwit/quickwit-parquet-engine/src/storage/writer.rs

+        let sorted_batch = take_record_batch(batch, &indices)?;
+
+        // SS-1: verify the output is actually sorted.
+        #[cfg(debug_assertions)]


[nit] do we need this assertion still? can we not rely on the tests?

g-talbot mentioned this pull request Mar 30, 2026

[phase-31 4/4] PostgreSQL metastore — migration + compaction columns #6245

Open

3 tasks

g-talbot requested review from fulmicoton-dd and mattmkim March 30, 2026 17:23

g-talbot force-pushed the gtt/phase-31-compaction-metadata branch from c012908 to 780c585 Compare March 31, 2026 20:41

g-talbot force-pushed the gtt/phase-31-writer-wiring branch 2 times, most recently from 3bbfb71 to 95c3596 Compare March 31, 2026 20:55

g-talbot force-pushed the gtt/phase-31-compaction-metadata branch 2 times, most recently from 08577b5 to 955f230 Compare March 31, 2026 21:03

g-talbot force-pushed the gtt/phase-31-writer-wiring branch 2 times, most recently from 179ccd2 to ed6d687 Compare March 31, 2026 21:08

g-talbot force-pushed the gtt/phase-31-compaction-metadata branch from 955f230 to 3e73d80 Compare March 31, 2026 21:08

g-talbot force-pushed the gtt/phase-31-writer-wiring branch from ed6d687 to a4d0d36 Compare March 31, 2026 21:26

g-talbot force-pushed the gtt/phase-31-compaction-metadata branch from 3e73d80 to 2f78fe8 Compare March 31, 2026 21:26

g-talbot force-pushed the gtt/phase-31-writer-wiring branch from a4d0d36 to f05d4e7 Compare March 31, 2026 21:31

g-talbot force-pushed the gtt/phase-31-compaction-metadata branch from 2f78fe8 to 2703ca5 Compare March 31, 2026 21:31

Base automatically changed from gtt/phase-31-compaction-metadata to gtt/phase-31-sort-schema March 31, 2026 21:31

g-talbot force-pushed the gtt/phase-31-writer-wiring branch from f05d4e7 to 4a0507e Compare March 31, 2026 21:33

g-talbot force-pushed the gtt/phase-31-sort-schema branch from 2703ca5 to 8fce718 Compare March 31, 2026 21:33

g-talbot force-pushed the gtt/phase-31-writer-wiring branch from 4a0507e to bc9458d Compare March 31, 2026 21:40

g-talbot force-pushed the gtt/phase-31-sort-schema branch from 8fce718 to 018a265 Compare March 31, 2026 21:40

g-talbot changed the base branch from gtt/phase-31-sort-schema to gtt/phase-31-compaction-metadata March 31, 2026 21:42

g-talbot force-pushed the gtt/phase-31-compaction-metadata branch from 598de1a to c95095b Compare March 31, 2026 21:50

g-talbot force-pushed the gtt/phase-31-writer-wiring branch 2 times, most recently from 2599f67 to 46903b3 Compare April 1, 2026 10:48

g-talbot force-pushed the gtt/phase-31-compaction-metadata branch from c95095b to 5b1c080 Compare April 1, 2026 10:48

g-talbot force-pushed the gtt/phase-31-writer-wiring branch from 46903b3 to 74bfd04 Compare April 1, 2026 11:02

g-talbot force-pushed the gtt/phase-31-compaction-metadata branch from 5b1c080 to 6ebf40b Compare April 1, 2026 11:02

g-talbot force-pushed the gtt/phase-31-writer-wiring branch 2 times, most recently from de0f8c6 to 2d9e6eb Compare April 1, 2026 11:30

g-talbot force-pushed the gtt/phase-31-compaction-metadata branch from 6ebf40b to 295f59c Compare April 1, 2026 11:30

g-talbot force-pushed the gtt/phase-31-compaction-metadata branch from 295f59c to f3c03dc Compare April 1, 2026 12:25

g-talbot force-pushed the gtt/phase-31-writer-wiring branch from 2d9e6eb to 00c7245 Compare April 1, 2026 12:25

g-talbot force-pushed the gtt/phase-31-compaction-metadata branch from f3c03dc to acb5d28 Compare April 1, 2026 13:50

g-talbot force-pushed the gtt/phase-31-writer-wiring branch from 00c7245 to cff2cc1 Compare April 1, 2026 13:50

g-talbot force-pushed the gtt/phase-31-compaction-metadata branch from acb5d28 to a8dc7a2 Compare April 1, 2026 16:25

g-talbot force-pushed the gtt/phase-31-writer-wiring branch 2 times, most recently from 0d97e82 to c48b0de Compare April 1, 2026 16:59

g-talbot force-pushed the gtt/phase-31-compaction-metadata branch from a8dc7a2 to 4ce16a3 Compare April 1, 2026 16:59

g-talbot force-pushed the gtt/phase-31-writer-wiring branch from c48b0de to ef0ba36 Compare April 1, 2026 19:24

g-talbot force-pushed the gtt/phase-31-compaction-metadata branch 2 times, most recently from b89e965 to b9566a6 Compare April 1, 2026 20:18

g-talbot force-pushed the gtt/phase-31-writer-wiring branch from ef0ba36 to df6e699 Compare April 1, 2026 20:18

g-talbot force-pushed the gtt/phase-31-compaction-metadata branch from b9566a6 to b6eb595 Compare April 1, 2026 20:50

g-talbot force-pushed the gtt/phase-31-writer-wiring branch from df6e699 to 76b703a Compare April 1, 2026 20:50

chatgpt-codex-connector bot reviewed Apr 6, 2026


mattmkim approved these changes Apr 6, 2026


g-talbot wants to merge 1 commit into gtt/phase-31-compaction-metadata from gtt/phase-31-writer-wiring
