Commit 6be2cd9
test(merge-pipeline): bonus — prefix_len=1 multi-RG inputs + m:n outputs
Adds the bonus scenario: three multi-metric inputs each written with
`rg_partition_prefix_len = 1` and one row group per distinct
metric_name (via `row_group_size = ROWS_PER_METRIC_PER_INPUT` so the
writer flushes at every metric boundary after sorting). Merged with a
small `ParquetMergePipelineParams::target_split_size_bytes = 500`
that forces the executor's `num_outputs` calculation to ask the
engine for multiple outputs, exercising the m:n merge path that is
now reachable through the actor pipeline (an earlier commit in this
PR removed the `num_outputs = 1` hardcode).
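A rough sketch of the two levers this scenario relies on, under stated assumptions. `row_groups` and `num_outputs` below are hypothetical stand-ins, not the writer's or executor's real code: the first assumes the writer cuts a row group every `row_group_size` rows of the sorted input, and the second assumes the output count is a ceiling division of the total input size by `target_split_size_bytes`.

```rust
/// Hypothetical stand-in for the writer's flush behaviour: sort rows by
/// metric name, then cut a row group every `row_group_size` rows.
/// When every metric has exactly `row_group_size` rows, each row group
/// ends up holding a single metric.
fn row_groups(mut metric_names: Vec<&str>, row_group_size: usize) -> Vec<Vec<&str>> {
    assert!(row_group_size > 0, "row_group_size must be positive");
    metric_names.sort();
    metric_names
        .chunks(row_group_size)
        .map(|chunk| chunk.to_vec())
        .collect()
}

/// Hypothetical output-count rule (an assumption, not the executor's
/// actual formula): ceiling division, never fewer than one output.
fn num_outputs(total_input_bytes: u64, target_split_size_bytes: u64) -> usize {
    if target_split_size_bytes == 0 {
        return 1;
    }
    let full = total_input_bytes / target_split_size_bytes;
    let partial = u64::from(total_input_bytes % target_split_size_bytes != 0);
    (full + partial).max(1) as usize
}
```

Under this assumed rule, any input larger than 500 bytes with a 500-byte target yields at least two outputs, which is all the scenario needs to leave the n=1 path.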
Both engines covered:
- `test_prefix_aligned_multi_metric_three_input_multi_output_in_memory_engine`
- `test_prefix_aligned_multi_metric_three_input_multi_output_streaming_engine`
The streaming-engine variant also asserts
`PEAK_BODY_COL_PAGE_CACHE_LEN > 0` (under `ms7_serial_lock`) so a
silent fallback to the in-memory path would fail.
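The fallback guard follows a common pattern that can be sketched as below. All names here (`PEAK_CACHE_LEN`, `SERIAL_LOCK`, `record_cache_len`, `run_and_read_peak`) are hypothetical; how the real `PEAK_BODY_COL_PAGE_CACHE_LEN` and `ms7_serial_lock` are defined is not shown in this commit message. The idea is a process-wide peak counter that only the streaming path bumps, read under a lock so concurrently running tests cannot pollute it.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

// Hypothetical process-wide peak counter and serialisation lock.
static PEAK_CACHE_LEN: AtomicUsize = AtomicUsize::new(0);
static SERIAL_LOCK: Mutex<()> = Mutex::new(());

/// Called by the (assumed) streaming path whenever its page cache changes size.
fn record_cache_len(len: usize) {
    PEAK_CACHE_LEN.fetch_max(len, Ordering::Relaxed);
}

/// Runs `work` while holding the lock and returns the peak observed during it.
fn run_and_read_peak(work: impl FnOnce()) -> usize {
    // Recover the guard even if another test panicked while holding the lock.
    let _guard = SERIAL_LOCK.lock().unwrap_or_else(|e| e.into_inner());
    PEAK_CACHE_LEN.store(0, Ordering::Relaxed);
    work();
    PEAK_CACHE_LEN.load(Ordering::Relaxed)
}
```

A run that never touches the streaming path leaves the peak at zero, so asserting `> 0` fails on a silent fallback.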
The shared assertion helper
`assert_three_input_three_metric_multi_output_correct` checks the
m:n contract end-to-end at the pipeline level:
- All three input splits replaced.
- ≥ 2 output splits staged (proves splitting happened).
- Sum of per-output row counts = total input rows.
- Each output internally monotonic on `sorted_series`.
- Across outputs, the `sorted_series` partition is disjoint — no
two outputs share any key, which is the "non-overlapping output"
contract the engine promises.
- Union of metric_names / services across outputs = full input set.
- Every output has `num_merge_ops = 1`, `row_keys_proto`, and a
`metric_name` zonemap regex.
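The row-count, monotonicity, and disjointness checks above can be sketched roughly as follows. This is a simplified illustration, not the helper's actual code: each output is modelled as a plain `Vec<u64>` of `sorted_series` keys (the real key type is an assumption here), and "monotonic" is taken to mean non-decreasing.

```rust
use std::collections::HashSet;

/// True when keys never decrease from one row to the next.
fn is_monotonic(keys: &[u64]) -> bool {
    keys.windows(2).all(|w| w[0] <= w[1])
}

/// True when no key appears in more than one output.
fn outputs_are_disjoint(outputs: &[Vec<u64>]) -> bool {
    let mut seen: HashSet<u64> = HashSet::new();
    for output in outputs {
        // A key may repeat inside one output, so deduplicate per output first.
        let distinct: HashSet<u64> = output.iter().copied().collect();
        for key in distinct {
            if !seen.insert(key) {
                return false;
            }
        }
    }
    true
}

/// Hypothetical condensed version of the m:n checks listed above.
fn check_m_n_contract(outputs: &[Vec<u64>], total_input_rows: usize) -> Result<(), String> {
    if outputs.len() < 2 {
        return Err(format!("expected >= 2 outputs, got {}", outputs.len()));
    }
    let rows: usize = outputs.iter().map(Vec::len).sum();
    if rows != total_input_rows {
        return Err(format!("row count mismatch: {rows} != {total_input_rows}"));
    }
    if let Some(i) = outputs.iter().position(|o| !is_monotonic(o)) {
        return Err(format!("output {i} is not monotonic"));
    }
    if !outputs_are_disjoint(outputs) {
        return Err("outputs share a sorted_series key".to_string());
    }
    Ok(())
}
```

Tracking seen keys in a set catches overlap even if outputs are inspected in arbitrary order, which a simple "last key of A < first key of B" comparison would not.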
To pin the test to exactly one merge (not a cascade of merges over
the now-multiple staged outputs), `make_pipeline_params` now takes
`max_merge_ops` and the bonus tests set it to `1`: outputs land at
`num_merge_ops = 1`, equal to the policy ceiling, and the planner
refuses to merge them again. The existing n=1 tests keep
`max_merge_ops = 5` as headroom: each produces a single output, which
cannot trigger another merge anyway, since `merge_factor = 3`.
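To illustrate why `max_merge_ops = 1` stops a cascade, here is a toy planner. `SplitMeta` and `plan_merge` are hypothetical and much simpler than the real merge policy; the sketch only assumes that splits at or above the ceiling are excluded and that a merge needs at least `merge_factor` candidates.

```rust
/// Hypothetical, minimal split metadata.
#[derive(Debug, Clone, Copy)]
struct SplitMeta {
    num_merge_ops: usize,
}

/// Toy planner: returns the splits to merge, or `None` if no merge is planned.
fn plan_merge(
    splits: &[SplitMeta],
    merge_factor: usize,
    max_merge_ops: usize,
) -> Option<Vec<SplitMeta>> {
    // Splits that already reached the ceiling are never merged again.
    let candidates: Vec<SplitMeta> = splits
        .iter()
        .copied()
        .filter(|s| s.num_merge_ops < max_merge_ops)
        .collect();
    if candidates.len() >= merge_factor {
        Some(candidates.into_iter().take(merge_factor).collect())
    } else {
        None
    }
}
```

In this model, three fresh inputs merge once; three outputs at `num_merge_ops = 1` are rejected when the ceiling is 1 but would be merged again with a ceiling of 5, while a lone output is never enough to reach `merge_factor = 3`.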
Updates the module doc to drop the now-stale scope note about m:n
not being reachable through the pipeline.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 81c6514 commit 6be2cd9
1 file changed
Lines changed: 470 additions & 27 deletions
File tree
- quickwit/quickwit-indexing/src/actors/parquet_pipeline