
enable partitioning for metrics data#6340

Merged
mattmkim merged 10 commits into main from matthew.kim/metrics-partitioning
Apr 30, 2026

Conversation

Contributor

@mattmkim mattmkim commented Apr 24, 2026

Description

Enables partitioning of metric (points and sketches) data. A partition_id will be added to split metadata (the split_metadata_json field).

Summary of functionality (basically mirrors logs):

  • Evaluates the index partition_key routing expression against RecordBatch rows.
  • Splits incoming metrics/sketch Arrow batches into partition-local batches in ParquetDocProcessor.
  • Carries all partition-local batches from one RawDocBatch in a single ProcessedParquetBatch, with one checkpoint delta covering the whole source batch.
  • Updates ParquetIndexer to maintain per-partition accumulators and flush the whole workbench together on threshold, force commit, timeout, or shutdown. (A potential downside is ending up with small partition batches/small files, but this greatly simplifies the implementation and matches what logs does.)
  • Enforces max_num_partitions for metrics partitions and routes overflow partitions to OTHER_PARTITION_ID.
  • Updates ParquetPackager to write one split per flushed partition batch and persist partition_id into split metadata.

This PR keeps the new Arrow-backed routing and metrics pipeline changes behind the existing metrics feature. quickwit-doc-mapper now exposes its Arrow RoutingExprContext only when quickwit-doc-mapper/metrics is enabled, and quickwit-indexing/metrics wires that feature in alongside its existing optional arrow and quickwit-parquet-engine dependencies.

for @g-talbot : partition_id is not a column in postgres; it's stored in a JSON column (split_metadata_json), so we can't effectively filter/group by partition_id in postgres. But looking at how logs does compaction, it doesn't filter by partition_id in postgres either: it just groups files in memory by partition_id after fetching all the metadata.
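That logs-style in-memory grouping could look roughly like this; `SplitMeta` and `group_by_partition` are hypothetical stand-ins for the real metadata types, which carry partition_id inside split_metadata_json.

```rust
use std::collections::HashMap;

/// Simplified stand-in for split metadata fetched from the metastore.
#[derive(Debug, Clone)]
struct SplitMeta {
    split_id: &'static str,
    partition_id: u64,
}

/// Group splits by partition_id in memory after fetching all metadata,
/// since postgres can't efficiently filter on a field in a JSON column.
fn group_by_partition(splits: Vec<SplitMeta>) -> HashMap<u64, Vec<SplitMeta>> {
    let mut groups: HashMap<u64, Vec<SplitMeta>> = HashMap::new();
    for split in splits {
        groups.entry(split.partition_id).or_default().push(split);
    }
    groups
}

fn main() {
    let groups = group_by_partition(vec![
        SplitMeta { split_id: "a", partition_id: 0 },
        SplitMeta { split_id: "b", partition_id: 1 },
        SplitMeta { split_id: "c", partition_id: 0 },
    ]);
    println!("{} partition groups", groups.len());
}
```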

How was this PR tested?

Describe how you tested this PR.

@mattmkim mattmkim marked this pull request as ready for review April 28, 2026 14:33
@mattmkim mattmkim changed the title from "[draft] partitioning for metrics" to "enable partitioning for metrics data" Apr 28, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fdafdcb627

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +63 to +67
DataType::Dictionary(_, value_type) if value_type.as_ref() == &DataType::Utf8 => {
let dict = column
.as_any()
.downcast_ref::<arrow::array::DictionaryArray<Int32Type>>()
.expect("dictionary column should be DictionaryArray<Int32>");

P2: Avoid panicking on non-Int32 dictionary keys

When a referenced routing column arrives as a valid Arrow Dictionary with a non-Int32 key type, this match accepts it because it only checks the value type, but the subsequent downcast to DictionaryArray<Int32Type> returns None and the expect panics. This can take down the doc processor for otherwise string-valued tag columns encoded with UInt8, Int16, etc.; either match DataType::Dictionary(Int32, Utf8) explicitly or handle the supported key types instead of treating all dictionaries as Int32.
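The gist of the suggested fix, sketched with plain-Rust stand-ins rather than real Arrow types (`DictI32`/`DictU8` and `dict_keys_as_i64` are illustrative only): handle each supported key type explicitly and return `None` instead of `expect`-panicking on an unexpected one.

```rust
use std::any::Any;

// Stand-ins for Arrow dictionary arrays keyed by different integer widths.
struct DictI32(Vec<i32>);
struct DictU8(Vec<u8>);

/// Resolve dictionary keys to i64 for each supported concrete key type,
/// returning None (instead of panicking) on an unsupported key type so
/// the caller can fall back or surface an error gracefully.
fn dict_keys_as_i64(column: &dyn Any) -> Option<Vec<i64>> {
    if let Some(d) = column.downcast_ref::<DictI32>() {
        return Some(d.0.iter().map(|&k| k as i64).collect());
    }
    if let Some(d) = column.downcast_ref::<DictU8>() {
        return Some(d.0.iter().map(|&k| k as i64).collect());
    }
    None
}

fn main() {
    match dict_keys_as_i64(&DictU8(vec![0, 1, 1])) {
        Some(keys) => println!("keys: {:?}", keys),
        None => println!("unsupported dictionary key type"),
    }
}
```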


Contributor

@g-talbot g-talbot left a comment


LGTM'ing to keep you unblocked, but there's a codex comment and it's a real one that I had to deal with in the merger too.

g-talbot added a commit that referenced this pull request Apr 28, 2026
Now that ParquetSplitMetadata has partition_id (from Matt's PR #6340),
include it in CompactionScope so splits with different partitions are
never merged together. Adds 2 new scope tests for partition isolation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

https://github.com/quickwit-oss/quickwit/blob/0ec77d339ba7d04f2f6466a2b029421167aa0124/quickwit-indexing/src/actors/metrics_pipeline/parquet_doc_processor.rs#L414-L418
P2: Preserve partition ids across Arrow batches

When one workbench receives multiple Arrow IPC batches, applying this max_num_partitions cap independently inside partition_batch can misroute rows for partitions that are already open in the indexer. For example with max_num_partitions = 2, if the first IPC batch opens partitions A and B, and a later IPC batch first sees C, D, then A, the A rows are converted to OTHER_PARTITION_ID here before the indexer sees them; the indexer can no longer recover that they belonged to A, so the same routing key is split between its real partition and OTHER. The cap needs to be enforced with workbench/global partition state, or the doc processor should forward the true partition ids and let the indexer do the overflow routing.
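One way to enforce the cap with workbench-global state, as the comment suggests, sketched with a hypothetical `PartitionRouter` (this is not the actual fix that landed; the `OTHER_PARTITION_ID` value is an assumption):

```rust
use std::collections::HashSet;

const OTHER_PARTITION_ID: u64 = u64::MAX;

/// Tracks which partitions are already open across ALL batches of a
/// workbench, so a partition opened by an earlier batch is never demoted
/// to OTHER by a later batch that happens to see other keys first.
struct PartitionRouter {
    open: HashSet<u64>,
    max_num_partitions: usize,
}

impl PartitionRouter {
    fn new(max_num_partitions: usize) -> Self {
        Self { open: HashSet::new(), max_num_partitions }
    }

    /// Returns the effective partition id for a routed key.
    fn route(&mut self, partition_id: u64) -> u64 {
        if self.open.contains(&partition_id) {
            return partition_id; // already open: always keep its real id
        }
        if self.open.len() < self.max_num_partitions {
            self.open.insert(partition_id);
            return partition_id;
        }
        OTHER_PARTITION_ID // cap reached: overflow partition
    }
}

fn main() {
    let mut router = PartitionRouter::new(2);
    for key in [1u64, 2, 3, 1] {
        println!("key {key} -> partition {}", router.route(key));
    }
}
```

In the failure scenario from the comment (A, B open, then C, D, then A again), the last A still routes to A because the router remembers it across batches.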


mattmkim and others added 3 commits April 28, 2026 17:27
* feat: add Parquet merge policy for compaction (Phase 2)

Adds a constant write amplification merge policy for Parquet splits,
adapted from the existing ConstWriteAmplificationMergePolicy but using
byte size instead of document count as the primary size metric. This is
Phase 2 of the Parquet compaction project — the decision layer that
determines which splits to merge within each compaction scope.

Key components:
- ParquetMergePolicy trait mirroring the MergePolicy interface
- CompactionScope grouping by (index_uid, sort_fields, window_start)
- ConstWriteAmplificationParquetMergePolicy with bounded write amp
- finalize_operations() for cold window compaction
- 33 tests: unit, proptest (MC-CONSERVE/LEVEL/WA/IDEMPOTENT), simulation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add partition_id to CompactionScope, rebase on #6340

Now that ParquetSplitMetadata has partition_id (from Matt's PR #6340),
include it in CompactionScope so splits with different partitions are
never merged together. Adds 2 new scope tests for partition isolation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: rustfmt nightly comment wrapping and import ordering

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: finalize_operations must respect MC-LEVEL invariant

finalize_operations() was running single_merge_operation over all young
splits without grouping by num_merge_ops level. This could merge level-0
and level-1 splits together, stamping the output with max(levels) + 1
and prematurely maturing lower-level data.

Fix: group by num_merge_ops in finalize just like operations() does,
then apply the lower merge factor within each level independently.

Added test_finalize_respects_mc_level_invariant (unit) and
proptest_finalize_respects_mc_level (property test) — both caught the
bug before the fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
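The grouping described in that fix can be sketched as follows; `Split` and `finalize_operations` are simplified stand-ins, and the real policy additionally applies the lower merge factor within each level rather than merging each level wholesale.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
struct Split {
    id: u32,
    num_merge_ops: u32,
}

/// Group young splits by num_merge_ops level before emitting merges, so
/// level-0 and level-1 splits are never merged together (MC-LEVEL).
/// Each level yields at most one candidate merge, and only when it has
/// at least 2 inputs.
fn finalize_operations(splits: Vec<Split>) -> Vec<Vec<Split>> {
    let mut levels: BTreeMap<u32, Vec<Split>> = BTreeMap::new();
    for split in splits {
        levels.entry(split.num_merge_ops).or_default().push(split);
    }
    levels.into_values().filter(|group| group.len() >= 2).collect()
}

fn main() {
    let ops = finalize_operations(vec![
        Split { id: 1, num_merge_ops: 0 },
        Split { id: 2, num_merge_ops: 0 },
        Split { id: 3, num_merge_ops: 1 },
    ]);
    println!("{} merge operations", ops.len());
}
```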

* fix: include window_duration_secs in CompactionScope

CompactionScope only used window_start_secs, so splits with the same
start but different durations (e.g. after a window config change) would
be grouped together. The merge engine requires all inputs to agree on
both window_start and window_duration, so merging across durations would
fail validation.

Added test_different_window_duration which caught the bug before the fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add MP-1/MP-2/MP-3 runtime invariant checks for merge operations

Add three merge-policy invariants to the shared verification layer
(quickwit-dst) with check_invariant! enforcement in
ParquetMergeOperation::new():

- MP-1: all splits share the same num_merge_ops level
- MP-2: merge op has at least 2 input splits
- MP-3: all splits share the same compaction scope (sort_fields + window)

Shared pure functions in quickwit_dst::invariants::merge_policy are the
single source of truth, usable by Stateright models and production code.
Debug builds panic on violation; all builds emit metrics via the
invariant recorder.

Tests written first (4 should_panic tests failed before adding checks,
pass after). Plus 1 positive test and 3 unit tests for shared functions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
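A minimal sketch of such invariant checks as pure functions; `ScopedSplit` and `check_merge_inputs` are hypothetical simplifications of the quickwit_dst versions, and the debug-panic wiring via check_invariant! is omitted.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
struct ScopedSplit {
    num_merge_ops: u32,
    // Simplified compaction scope: (sort_fields, window_start).
    scope: (String, u64),
}

/// MP-1/MP-2/MP-3 as one pure validation, usable from both model
/// checking and production code.
fn check_merge_inputs(inputs: &[ScopedSplit]) -> Result<(), &'static str> {
    if inputs.len() < 2 {
        return Err("MP-2: merge op needs at least 2 input splits");
    }
    let first = &inputs[0];
    if inputs.iter().any(|s| s.num_merge_ops != first.num_merge_ops) {
        return Err("MP-1: all splits must share the same num_merge_ops level");
    }
    if inputs.iter().any(|s| s.scope != first.scope) {
        return Err("MP-3: all splits must share the same compaction scope");
    }
    Ok(())
}

fn main() {
    let splits = vec![
        ScopedSplit { num_merge_ops: 0, scope: ("ts".to_string(), 100) },
        ScopedSplit { num_merge_ops: 0, scope: ("ts".to_string(), 100) },
    ];
    println!("{:?}", check_merge_inputs(&splits));
}
```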

* fix: nightly rustfmt — merge imports, unwrap short line, trailing newline

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mattmkim mattmkim enabled auto-merge (squash) April 30, 2026 13:24
@mattmkim mattmkim merged commit 4d6fd21 into main Apr 30, 2026
9 checks passed
@mattmkim mattmkim deleted the matthew.kim/metrics-partitioning branch April 30, 2026 13:34