Skip to content

Commit 090e649

Browse files
g-talbotclaude
andcommitted
style: nightly rustfmt + drop useless borrows in assert!
Reformats doc comments / format strings under nightly rustfmt (`wrap_comments`, `format_strings`), and removes two redundant `&` in `assert!` arguments flagged by clippy. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent c8889b8 commit 090e649

5 files changed

Lines changed: 66 additions & 85 deletions

File tree

quickwit/quickwit-config/src/node_config/mod.rs

Lines changed: 1 addition & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -453,9 +453,7 @@ impl CacheConfig {
453453
fn deserialize_with_default<'de, D, const DEFAULT_CAPACITY: u64>(
454454
deserializer: D,
455455
) -> Result<CacheConfig, D::Error>
456-
where
457-
D: Deserializer<'de>,
458-
{
456+
where D: Deserializer<'de> {
459457
use serde_with::{DeserializeAs, FromInto, PickFirst, Same};
460458

461459
let mut cache_config: CacheConfig =

quickwit/quickwit-indexing/src/actors/parquet_pipeline/parquet_merge_executor.rs

Lines changed: 9 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -16,21 +16,19 @@
1616
//!
1717
//! Receives a `ParquetMergeScratch` from the downloader. Two engines are available:
1818
//!
19-
//! - **Streaming engine** (`execute_merge_operation`): column-major, page-bounded body cache.
20-
//! Used unconditionally for promotion merges (the in-memory path can't handle mixed
21-
//! prefix lengths). Optionally used for regular merges when the node-level
19+
//! - **Streaming engine** (`execute_merge_operation`): column-major, page-bounded body cache. Used
20+
//! unconditionally for promotion merges (the in-memory path can't handle mixed prefix lengths).
21+
//! Optionally used for regular merges when the node-level
2222
//! `IndexerConfig::parquet_merge_use_streaming_engine` flag is true.
23-
//! - **In-memory engine** (`merge_sorted_parquet_files`): buffers all inputs in memory and
24-
//! runs inside `run_cpu_intensive`. Kept as the runtime fallback so production can flip
25-
//! back via YAML config if the streaming engine hits a bug. To be removed once the
26-
//! streaming path has soaked.
23+
//! - **In-memory engine** (`merge_sorted_parquet_files`): buffers all inputs in memory and runs
24+
//! inside `run_cpu_intensive`. Kept as the runtime fallback so production can flip back via YAML
25+
//! config if the streaming engine hits a bug. To be removed once the streaming path has soaked.
2726
//!
2827
//! Routing in `handle()`:
2928
//!
30-
//! - `target_prefix_len_override.is_some()` → streaming engine. Promotion is the whole point
31-
//! of `target_prefix_len_override`, and the in-memory path's
32-
//! `extract_and_validate_input_metadata` would bail on the mixed `rg_partition_prefix_len`
33-
//! before any output is produced.
29+
//! - `target_prefix_len_override.is_some()` → streaming engine. Promotion is the whole point of
30+
//! `target_prefix_len_override`, and the in-memory path's `extract_and_validate_input_metadata`
31+
//! would bail on the mixed `rg_partition_prefix_len` before any output is produced.
3432
//! - Else `use_streaming_engine == true` → streaming engine (the new default once soaked).
3533
//! - Else → in-memory engine (the runtime fallback).
3634
//!

quickwit/quickwit-indexing/src/actors/parquet_pipeline/parquet_merge_pipeline_multi_metric_test.rs

Lines changed: 29 additions & 36 deletions
Original file line numberDiff line numberDiff line change
@@ -18,27 +18,24 @@
1818
//! (which covers the simpler two-input, one-metric-per-input case) with
1919
//! the harder scenarios:
2020
//!
21-
//! - **Three inputs**, each carrying **three metrics** (`aaa.alpha`, `bbb.beta`,
22-
//! `ccc.gamma`). Across inputs, the metrics overlap and the per-metric
23-
//! timeseries IDs collide (each row's `timeseries_id` is derived from the
24-
//! metric name, so input-x, input-y, input-z all share the same set of IDs
25-
//! per metric). Timestamps within each (metric, timeseries) overlap across
26-
//! inputs but are unique — the merge must interleave rows from all three
27-
//! inputs heavily, not concatenate them.
28-
//! - **Multi-row-group output** via `ParquetWriterConfig::row_group_size = 50`
29-
//! on the n=1 tests, so the 180-row merge output breaks into 4 row groups.
30-
//! Exercises the writer's multi-RG path in both engines.
31-
//! - **Multi-row-group inputs with `rg_partition_prefix_len = 1`** in the
32-
//! bonus tests (`write_prefix_aligned_input`): the writer flushes one row
33-
//! group per distinct `metric_name`, so each input file carries three row
34-
//! groups in alignment with the sort prefix. The streaming engine reads
35-
//! these through its prefix-aware fast path.
21+
//! - **Three inputs**, each carrying **three metrics** (`aaa.alpha`, `bbb.beta`, `ccc.gamma`).
22+
//! Across inputs, the metrics overlap and the per-metric timeseries IDs collide (each row's
23+
//! `timeseries_id` is derived from the metric name, so input-x, input-y, input-z all share the
24+
//! same set of IDs per metric). Timestamps within each (metric, timeseries) overlap across inputs
25+
//! but are unique — the merge must interleave rows from all three inputs heavily, not concatenate
26+
//! them.
27+
//! - **Multi-row-group output** via `ParquetWriterConfig::row_group_size = 50` on the n=1 tests, so
28+
//! the 180-row merge output breaks into 4 row groups. Exercises the writer's multi-RG path in
29+
//! both engines.
30+
//! - **Multi-row-group inputs with `rg_partition_prefix_len = 1`** in the bonus tests
31+
//! (`write_prefix_aligned_input`): the writer flushes one row group per distinct `metric_name`,
32+
//! so each input file carries three row groups in alignment with the sort prefix. The streaming
33+
//! engine reads these through its prefix-aware fast path.
3634
//! - **m:n merges** in the bonus tests: a small
37-
//! `ParquetMergePipelineParams::target_split_size_bytes` forces the
38-
//! executor to ask the engine for `num_outputs > 1`. The bonus
39-
//! assertions cover the multi-output contract — sum-equals-total,
40-
//! internal monotonicity, inter-output `sorted_series` disjointness,
41-
//! and union-equals-full-set on metrics/services.
35+
//! `ParquetMergePipelineParams::target_split_size_bytes` forces the executor to ask the engine
36+
//! for `num_outputs > 1`. The bonus assertions cover the multi-output contract —
37+
//! sum-equals-total, internal monotonicity, inter-output `sorted_series` disjointness, and
38+
//! union-equals-full-set on metrics/services.
4239
//!
4340
//! Both `ParquetMergePipelineParams::use_streaming_engine = false` (the
4441
//! in-memory engine) and `= true` (the streaming engine) are exercised
@@ -689,14 +686,11 @@ async fn assert_three_input_three_metric_single_output_correct(
689686
/// - The pipeline staged at least two output splits (proves splitting happened).
690687
/// - The sum of per-output row counts equals the total input row count.
691688
/// - Each output is internally sorted on `sorted_series`.
692-
/// - Across outputs, the `sorted_series` partition is **disjoint** (no two
693-
/// outputs share any `sorted_series` value — the merge engine splits at
694-
/// series boundaries, never inside).
695-
/// - The union of metric_names / services across outputs covers the full
696-
/// input set.
697-
/// - Every output declares `num_merge_ops = 1` (first merge over level-0
698-
/// inputs) and has `row_keys_proto` + `metric_name` zonemap regex
699-
/// populated.
689+
/// - Across outputs, the `sorted_series` partition is **disjoint** (no two outputs share any
690+
/// `sorted_series` value — the merge engine splits at series boundaries, never inside).
691+
/// - The union of metric_names / services across outputs covers the full input set.
692+
/// - Every output declares `num_merge_ops = 1` (first merge over level-0 inputs) and has
693+
/// `row_keys_proto` + `metric_name` zonemap regex populated.
700694
async fn assert_three_input_three_metric_multi_output_correct(
701695
staged_metadata: &Arc<std::sync::Mutex<Vec<ParquetSplitMetadata>>>,
702696
replaced_ids: &Arc<std::sync::Mutex<Vec<String>>>,
@@ -749,8 +743,8 @@ async fn assert_three_input_three_metric_multi_output_correct(
749743
let series = extract_binary_column(&batch, SORTED_SERIES_COLUMN);
750744
assert!(
751745
!series.is_empty(),
752-
"every output must have at least one row (empty outputs should be dropped \
753-
by the engine)"
746+
"every output must have at least one row (empty outputs should be dropped by the \
747+
engine)"
754748
);
755749
for i in 1..series.len() {
756750
assert!(
@@ -773,8 +767,7 @@ async fn assert_three_input_three_metric_multi_output_correct(
773767
let (right_min, _, right_file) = &window[1];
774768
assert!(
775769
left_max < right_min,
776-
"outputs {} and {} overlap on sorted_series: \
777-
left max = {:?}, right min = {:?}",
770+
"outputs {} and {} overlap on sorted_series: left max = {:?}, right min = {:?}",
778771
left_file,
779772
right_file,
780773
left_max,
@@ -929,8 +922,8 @@ async fn test_multi_metric_three_input_single_output_streaming_engine() {
929922
|staged, replaced, storage| async move {
930923
assert!(
931924
PEAK_BODY_COL_PAGE_CACHE_LEN.load(Ordering::Relaxed) > 0,
932-
"streaming engine did not write to PEAK_BODY_COL_PAGE_CACHE_LEN — \
933-
routing may have silently fallen back to the in-memory engine",
925+
"streaming engine did not write to PEAK_BODY_COL_PAGE_CACHE_LEN — routing may \
926+
have silently fallen back to the in-memory engine",
934927
);
935928
assert_three_input_three_metric_single_output_correct(&staged, &replaced, &storage)
936929
.await;
@@ -1052,8 +1045,8 @@ async fn test_prefix_aligned_multi_metric_three_input_multi_output_streaming_eng
10521045
|staged, replaced, storage| async move {
10531046
assert!(
10541047
PEAK_BODY_COL_PAGE_CACHE_LEN.load(Ordering::Relaxed) > 0,
1055-
"streaming engine did not write to PEAK_BODY_COL_PAGE_CACHE_LEN — \
1056-
routing may have silently fallen back to the in-memory engine",
1048+
"streaming engine did not write to PEAK_BODY_COL_PAGE_CACHE_LEN — routing may \
1049+
have silently fallen back to the in-memory engine",
10571050
);
10581051
assert_three_input_three_metric_multi_output_correct(
10591052
&staged,

quickwit/quickwit-indexing/src/actors/parquet_pipeline/parquet_merge_pipeline_test.rs

Lines changed: 22 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -447,19 +447,17 @@ async fn test_merge_pipeline_end_to_end() {
447447
///
448448
/// Verifies, in order:
449449
/// - **Step 5**: `replaced_split_ids = [split-a, split-b]`.
450-
/// - **Step 6**: staged `ParquetSplitMetadata` — `num_rows = 100`,
451-
/// `time_range = [100, 250)`, `metric_names = {cpu.usage, mem.usage}`,
452-
/// `num_merge_ops = 1`, `sort_fields` matches the table config,
453-
/// `row_keys_proto` present + non-empty, `zonemap_regexes` contains
454-
/// `metric_name`, `low_cardinality_tags["service"] = {web, api}`.
455-
/// - **Step 7**: merged Parquet file content — row count, all 100 timestamps
456-
/// match expected and metadata `time_range`, both service / host values
457-
/// survive, `sorted_series` is monotonically non-decreasing, and
458-
/// `cpu.usage` rows precede `mem.usage` rows (the global sort order
450+
/// - **Step 6**: staged `ParquetSplitMetadata` — `num_rows = 100`, `time_range = [100, 250)`,
451+
/// `metric_names = {cpu.usage, mem.usage}`, `num_merge_ops = 1`, `sort_fields` matches the table
452+
/// config, `row_keys_proto` present + non-empty, `zonemap_regexes` contains `metric_name`,
453+
/// `low_cardinality_tags["service"] = {web, api}`.
454+
/// - **Step 7**: merged Parquet file content — row count, all 100 timestamps match expected and
455+
/// metadata `time_range`, both service / host values survive, `sorted_series` is monotonically
456+
/// non-decreasing, and `cpu.usage` rows precede `mem.usage` rows (the global sort order
459457
/// semantics).
460-
/// - **Step 8**: Parquet KV metadata — `qh.sort_fields`, `qh.num_merge_ops`,
461-
/// `qh.row_keys` (non-empty), `qh.zonemap_regexes` parses as JSON and
462-
/// contains `metric_name`, and cross-validates with the staged metadata.
458+
/// - **Step 8**: Parquet KV metadata — `qh.sort_fields`, `qh.num_merge_ops`, `qh.row_keys`
459+
/// (non-empty), `qh.zonemap_regexes` parses as JSON and contains `metric_name`, and
460+
/// cross-validates with the staged metadata.
463461
async fn assert_cpu_mem_merge_outputs_correct(
464462
staged_metadata: &Arc<std::sync::Mutex<Vec<ParquetSplitMetadata>>>,
465463
replaced_ids: &Arc<std::sync::Mutex<Vec<String>>>,
@@ -639,8 +637,8 @@ async fn assert_cpu_mem_merge_outputs_correct(
639637
sorted_series[i] >= sorted_series[i - 1],
640638
"sorted_series must be monotonically non-decreasing at row {}: {:?} < {:?}",
641639
i,
642-
&sorted_series[i],
643-
&sorted_series[i - 1]
640+
sorted_series[i],
641+
sorted_series[i - 1]
644642
);
645643
}
646644

@@ -718,20 +716,18 @@ async fn assert_cpu_mem_merge_outputs_correct(
718716
/// reachable path — there is no silent fallback. The test confirms
719717
/// this by:
720718
///
721-
/// 1. Asserting the merge published with the right `replaced_split_ids`
722-
/// (the merge actually ran end-to-end through the executor).
723-
/// 2. Reading `PEAK_BODY_COL_PAGE_CACHE_LEN` and asserting it is
724-
/// non-zero (the streaming engine writes to this atomic on every
725-
/// body-col page assembly; if the in-memory engine had run instead
726-
/// the counter would stay at zero).
727-
/// 3. Asserting row count and metric names on the output match the
728-
/// inputs (the streaming engine produces correct results, not just
729-
/// "something").
719+
/// 1. Asserting the merge published with the right `replaced_split_ids` (the merge actually ran
720+
/// end-to-end through the executor).
721+
/// 2. Reading `PEAK_BODY_COL_PAGE_CACHE_LEN` and asserting it is non-zero (the streaming engine
722+
/// writes to this atomic on every body-col page assembly; if the in-memory engine had run
723+
/// instead the counter would stay at zero).
724+
/// 3. Asserting row count and metric names on the output match the inputs (the streaming engine
725+
/// produces correct results, not just "something").
730726
#[allow(
731727
clippy::await_holding_lock,
732-
reason = "the lock is `std::sync::Mutex` and the `#[tokio::test]` runtime is \
733-
single-threaded, so holding the guard across await won't deadlock another \
734-
thread — see `ms7_serial_lock` rationale"
728+
reason = "the lock is `std::sync::Mutex` and the `#[tokio::test]` runtime is single-threaded, \
729+
so holding the guard across await won't deadlock another thread — see \
730+
`ms7_serial_lock` rationale"
735731
)]
736732
#[tokio::test]
737733
async fn test_merge_pipeline_end_to_end_with_streaming_engine_flag() {

quickwit/quickwit-parquet-engine/src/merge/tests.rs

Lines changed: 5 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -1923,13 +1923,10 @@ mod parity {
19231923

19241924
/// Verify the m:n merge contract on a single engine's outputs:
19251925
///
1926-
/// 1. Sum of per-output row counts equals the total input row count
1927-
/// (no duplication, no loss).
1928-
/// 2. Within each output, the `sorted_series` column is monotonically
1929-
/// non-decreasing.
1930-
/// 3. Across outputs, after sorting by min `sorted_series`, every
1931-
/// output's max sorted_series is strictly less than the next
1932-
/// output's min — the partition is disjoint on the keyspace.
1926+
/// 1. Sum of per-output row counts equals the total input row count (no duplication, no loss).
1927+
/// 2. Within each output, the `sorted_series` column is monotonically non-decreasing.
1928+
/// 3. Across outputs, after sorting by min `sorted_series`, every output's max sorted_series is
1929+
/// strictly less than the next output's min — the partition is disjoint on the keyspace.
19331930
///
19341931
/// Holds for any merge with `num_outputs >= 1`. Trivial for n=1
19351932
/// (only invariant 2 is non-trivial there).
@@ -1976,8 +1973,7 @@ mod parity {
19761973
let (right_min, _, right_path) = &window[1];
19771974
assert!(
19781975
left_max < right_min,
1979-
"outputs {} and {} overlap on sorted_series: \
1980-
left max = {:?}, right min = {:?}",
1976+
"outputs {} and {} overlap on sorted_series: left max = {:?}, right min = {:?}",
19811977
left_path.display(),
19821978
right_path.display(),
19831979
left_max,

0 commit comments

Comments
 (0)