Commit a255ae3
fix(merge): adapter rejects unsorted input; consumer honors SS-3; stronger test verifiers
Three adversarial-review findings on the prefix/RG machinery, bundled
because they touch the same producer/consumer contract:
**F8: Legacy adapter rejects SS-1-violating input upfront.**
The adapter walked rows in physical order and emitted one RG per
prefix-value run. An unsorted legacy input (rows `[A,A,B,B,A,A]`)
produced a 3-RG file where two RGs shared prefix `A`, violating PA-3.
The streaming merge engine would reject it later, mid-merge, but only
after a silently bad file had already been built. Now `compute_prefix_value_slices`
tracks each slice's composite prefix-value bytes and bails with
`LegacyAdapterError::InputNotSorted` on duplicates, surfacing the
SS-1 violation before any file lands on disk.
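A minimal sketch of the duplicate check, assuming each row's composite prefix-value bytes are already materialized as a `Vec<u8>`. The real `compute_prefix_value_slices` works on Arrow columns and its signature is not shown in this commit, so the simplified signature and the `duplicate_key`/`row` fields on the error are hypothetical. The idea is to close a slice whenever the key changes and record each closed slice's key in a `HashSet`. A key seen twice means a value run resumed after a different value, so the input is not sorted.

```rust
use std::collections::HashSet;
use std::ops::Range;

/// Hypothetical error mirroring `LegacyAdapterError::InputNotSorted`.
#[derive(Debug, PartialEq, Eq)]
pub enum LegacyAdapterError {
    /// `duplicate_key` reappeared as a new run starting at `row`.
    InputNotSorted { duplicate_key: Vec<u8>, row: usize },
}

/// Split rows into runs of equal composite prefix keys (one future RG per run).
/// Fails if a key that already closed a run starts another one (SS-1 violation).
pub fn compute_prefix_value_slices(
    row_keys: &[Vec<u8>],
) -> Result<Vec<Range<usize>>, LegacyAdapterError> {
    let mut slices = Vec::new();
    let mut seen: HashSet<&[u8]> = HashSet::new();
    let mut start = 0;
    for row in 1..=row_keys.len() {
        // A run ends at the end of input or where the key changes.
        let run_ends = row == row_keys.len() || row_keys[row] != row_keys[start];
        if run_ends {
            let key = row_keys[start].as_slice();
            if !seen.insert(key) {
                return Err(LegacyAdapterError::InputNotSorted {
                    duplicate_key: key.to_vec(),
                    row: start,
                });
            }
            slices.push(start..row);
            start = row;
        }
    }
    Ok(slices)
}
```

With the `[A,A,B,B,A,A]` input from above, the third run reuses `A` and the function returns `InputNotSorted` before any slice is handed to the writer.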
**F12: Consumer-side SS-3 (cross-layer divergence, discovered while
wiring F2's chunk-level verifier into the SS-3 test).** The adapter
implements SS-3 correctly (missing-from-schema → synthesized NullArray
during slice computation, file stamps `prefix_len = N`). The streaming
engine's reader did not: `find_prefix_parquet_col_indices` hard-required
every named prefix column to be physically present, so a file the
adapter produced from an SS-3 input was unreadable by the merge engine.
Now `find_prefix_parquet_col_indices` returns `Vec<Option<PrefixColumn>>`
and `extract_rg_composite_prefix_key` emits a constant null marker
(`encode_byte_array_prefix(&[])`) for None slots. The column contributes
no cross-RG ordering signal (it is constant everywhere), so region boundaries
are driven entirely by the present columns. Both halves of SS-3 now
agree end-to-end.
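A minimal sketch of the consumer-side null handling. It assumes an order-preserving byte encoding in which each `0x00` byte is escaped as `[0x00, 0xFF]` and every value ends with `[0x00, 0x00]`. That is consistent with the commit's note that the empty value encodes to `[0x00, 0x00]`, but it is an assumption about `encode_byte_array_prefix`, not its actual body. `PrefixColumn` is reduced here to a hypothetical struct holding the row group's single (per PA-1) value; the real type presumably carries a Parquet column index instead.

```rust
/// Hypothetical escape-and-terminate encoding: 0x00 -> [0x00, 0xFF],
/// then a [0x00, 0x00] terminator. Byte-wise order of the output matches
/// byte-wise order of the input, and no encoding is a prefix of another.
pub fn encode_byte_array_prefix(value: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(value.len() + 2);
    for &b in value {
        out.push(b);
        if b == 0x00 {
            out.push(0xFF);
        }
    }
    out.extend_from_slice(&[0x00, 0x00]);
    out
}

/// Hypothetical simplification: the value this RG holds for a present column.
pub struct PrefixColumn {
    pub rg_value: Vec<u8>,
}

/// Build one RG's composite key; `None` means the column is absent from the
/// file's schema (SS-3) and contributes a constant null marker.
pub fn extract_rg_composite_prefix_key(columns: &[Option<PrefixColumn>]) -> Vec<u8> {
    let mut key = Vec::new();
    for slot in columns {
        match slot {
            Some(col) => key.extend(encode_byte_array_prefix(&col.rg_value)),
            None => key.extend(encode_byte_array_prefix(&[])),
        }
    }
    key
}
```

Because a `None` slot always appends the same bytes, row groups within one file compare exactly as they would if the missing column were dropped from the key.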
Known limitation: cross-file SS-3 — where some inputs have a sort
column and others don't — uses [0x00, 0x00] for the null contribution,
which sorts BEFORE non-null per the encoded-empty-string convention.
That weakly violates SS-2 (nulls sort last). Single-file SS-3 is
correct because every RG in such a file contributes the same constant.
If cross-file SS-3 becomes a production scenario, the encoding needs
a leading-0xff sentinel instead. Not exercised today.
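The ordering problem can be made concrete. This sketch reuses the same assumed escape-and-terminate encoding (a hypothetical stand-in for `encode_byte_array_prefix`). `encode_slot_current` models today's behaviour: the null marker is the encoded empty string, so it sorts before every non-null value and also ties with a genuine empty string. `encode_slot_nulls_last` is one hypothetical shape for the 0xff-sentinel fix: every non-null slot gets a `0x01` tag, so a lone `0xFF` null marker compares greater than any non-null value, even one whose own bytes begin with `0xFF`. It is a design sketch, not what the code does.

```rust
/// Hypothetical stand-in for `encode_byte_array_prefix` (see assumptions above).
fn encode_byte_array_prefix(value: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(value.len() + 2);
    for &b in value {
        out.push(b);
        if b == 0x00 {
            out.push(0xFF); // escape embedded zero bytes
        }
    }
    out.extend_from_slice(&[0x00, 0x00]); // terminator
    out
}

/// Current behaviour: a missing column contributes the encoded empty string.
pub fn encode_slot_current(value: Option<&[u8]>) -> Vec<u8> {
    encode_byte_array_prefix(value.unwrap_or(&[]))
}

/// Hypothetical nulls-last variant: tag non-null slots with 0x01 and encode
/// null as a lone 0xFF, which is greater than any 0x01-tagged encoding.
pub fn encode_slot_nulls_last(value: Option<&[u8]>) -> Vec<u8> {
    match value {
        Some(v) => {
            let mut out = vec![0x01u8];
            out.extend(encode_byte_array_prefix(v));
            out
        }
        None => vec![0xFF],
    }
}
```

An untagged `[0xFF]` marker alone would not be enough: a non-null value starting with `0xFF` encodes to bytes that begin with `[0xFF, ...]`, and the shorter `[0xFF]` would compare less than it.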
**F2/F9/F11: Wire `assert_unique_rg_prefix_keys` into prefix-claiming
tests.** Tests asserting only `num_row_groups == N` plus KV metadata stamped
with N would have passed even with an off-by-one in slice-boundary detection
or scrambled column contents. The verifier reads chunk-level statistics
directly: PA-1 (intra-RG `min == max`) + PA-3 (inter-RG uniqueness)
on the composite key. Wired into six tests:
- streaming engine: `test_streaming_merge_with_prefix_len_two`,
`test_multi_rg_metric_aligned_input_produces_multi_rg_output`,
`test_streaming_merge_with_desc_prefix_col`
- legacy adapter: `test_target_prefix_len_two_splits_by_metric_and_service`,
`test_legacy_input_with_sort_fields_produces_prefix_aligned_multi_rg`,
`test_missing_prefix_col_treated_as_null_satisfies_alignment` (now
passes thanks to F12).
Also: `assert_unique_rg_prefix_keys` no longer short-circuits on
single-RG files — they still go through PA-1 because an unsorted
single-RG file CAN have `min != max` on a prefix column.
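A sketch of the verifier's two checks, written over hypothetical in-memory `ChunkStats` (min/max bytes per prefix column per row group) rather than the parquet crate's statistics types. It returns `Result` instead of panicking so it is easy to exercise, and its name differs from `assert_unique_rg_prefix_keys` to make clear it is not that helper. The composite key here length-prefixes each column value so concatenation stays unambiguous; that choice is an assumption, not the real key construction. Note there is no early return for a single row group.

```rust
use std::collections::HashSet;

/// Hypothetical chunk-level stats for one prefix column in one row group.
pub struct ChunkStats {
    pub min: Vec<u8>,
    pub max: Vec<u8>,
}

/// PA-1: every prefix column has min == max within each RG.
/// PA-3: no two RGs share the same composite prefix key.
pub fn check_unique_rg_prefix_keys(row_groups: &[Vec<ChunkStats>]) -> Result<(), String> {
    let mut seen = HashSet::new();
    // Deliberately no `row_groups.len() <= 1` short-circuit: a lone RG
    // must still pass PA-1.
    for (rg_idx, columns) in row_groups.iter().enumerate() {
        let mut key = Vec::new();
        for (col_idx, stats) in columns.iter().enumerate() {
            if stats.min != stats.max {
                return Err(format!("PA-1 violated: rg {rg_idx} col {col_idx} has min != max"));
            }
            // Length-prefix each value so ("ab","c") and ("a","bc") differ.
            key.extend_from_slice(&(stats.min.len() as u64).to_be_bytes());
            key.extend_from_slice(&stats.min);
        }
        if !seen.insert(key) {
            return Err(format!("PA-3 violated: rg {rg_idx} repeats an earlier prefix key"));
        }
    }
    Ok(())
}
```

In this shape, a lone row group whose stats span `a..z` on a prefix column fails PA-1 instead of passing vacuously.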
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1 parent 61ad48c
3 files changed (254 additions, 32 deletions), under:
- quickwit/quickwit-parquet-engine/src
- merge
- streaming
- storage
Lines changed: 64 additions & 28 deletions