Commit 57e0a21
feat(legacy-adapter): parameterize on target_prefix_len with composite-prefix support
`LegacyInputAdapter::try_open` now takes `target_prefix_len: u32`
chosen by the caller, matching the merge plan's consensus prefix
length. The adapter slices the consolidated batch at every transition
of the first N sort columns (composite key, via `RowConverter` over
all N fields) and emits one output row group per slice, stamping the
output with `qh.rg_partition_prefix_len = target_prefix_len`. With
`target_prefix_len = 0` the adapter takes the original single-RG
passthrough path with no prefix-alignment claim.
A sort column that is named in `qh.sort_fields` but missing from the
file's arrow schema is treated as implicitly null at every row per
SS-3. A constantly-null column trivially satisfies alignment on that
column (null == null) and contributes no transitions, so the split
boundaries are driven by the columns that are present. This matches
the merge engine's compaction-time treatment of missing columns and
keeps a legacy file with an evolved schema usable as a prefix-aligned
input.
`PrefixUnresolvable` now fires only on cases where the file doesn't
advertise enough sort *names* to honor the request:
- `qh.sort_fields` absent or unparseable
- `qh.sort_fields` declares fewer sort columns than `target_prefix_len`
A column missing from the arrow schema no longer counts as
unresolvable; the adapter materialises a `NullArray` of the batch's
length in that slot and proceeds.
Tests:
- `test_target_prefix_len_zero_passes_through_as_single_rg` — explicit
N=0 fallback, no prefix KV stamped.
- `test_target_prefix_len_two_splits_by_metric_and_service` — composite
prefix (`metric_name`, `service`) → 4 RGs, KV declares prefix_len=2.
- `test_target_prefix_len_one_without_sort_fields_returns_unresolvable`
— no `qh.sort_fields` KV → `PrefixUnresolvable`.
- `test_target_prefix_len_exceeds_declared_sort_cols_returns_unresolvable`
— sort schema declares 2 cols, caller asks 3 → `PrefixUnresolvable`.
- `test_missing_prefix_col_treated_as_null_satisfies_alignment` —
sort schema declares `metric_name|env|-timestamp_secs` but `env`
is absent from the arrow schema → no error, only metric_name
transitions split RGs, KV still stamps prefix_len=2.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>1 parent 347b7bd commit 57e0a21
1 file changed
Lines changed: 470 additions & 109 deletions
0 commit comments