Commit 57e0a21

g-talbot and claude committed
feat(legacy-adapter): parameterize on target_prefix_len with composite-prefix support
`LegacyInputAdapter::try_open` now takes `target_prefix_len: u32` chosen by the caller, matching the merge plan's consensus prefix length. The adapter slices the consolidated batch at every transition of the first N sort columns (composite key, via `RowConverter` over all N fields) and emits one output row group per slice, stamping the output with `qh.rg_partition_prefix_len = target_prefix_len`. With `target_prefix_len = 0` the adapter takes the original single-RG passthrough path with no prefix-alignment claim.

A sort column that is named in `qh.sort_fields` but missing from the file's arrow schema is treated as implicitly null at every row per SS-3. A constantly-null column trivially satisfies alignment on that column (null == null) and contributes no transitions, so the split boundaries are driven by the columns that are present. This matches the merge engine's compaction-time treatment of missing columns and keeps a legacy file with an evolved schema usable as a prefix-aligned input.

`PrefixUnresolvable` now fires only on cases where the file doesn't advertise enough sort *names* to honor the request:

- `qh.sort_fields` absent or unparseable
- `qh.sort_fields` declares fewer sort columns than `target_prefix_len`

A column missing from the arrow schema no longer counts as unresolvable; the adapter materialises a `NullArray` of the batch's length in that slot and proceeds.

Tests:

- `test_target_prefix_len_zero_passes_through_as_single_rg`: explicit N=0 fallback, no prefix KV stamped.
- `test_target_prefix_len_two_splits_by_metric_and_service`: composite prefix (`metric_name`, `service`) → 4 RGs, KV declares prefix_len=2.
- `test_target_prefix_len_one_without_sort_fields_returns_unresolvable`: no `qh.sort_fields` KV → `PrefixUnresolvable`.
- `test_target_prefix_len_exceeds_declared_sort_cols_returns_unresolvable`: sort schema declares 2 cols, caller asks 3 → `PrefixUnresolvable`.
- `test_missing_prefix_col_treated_as_null_satisfies_alignment`: sort schema declares `metric_name|env|-timestamp_secs` but `env` is absent from the arrow schema → no error, only metric_name transitions split RGs, KV still stamps prefix_len=2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
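For readers who want the slicing rule in isolation, here is a minimal std-only sketch of the behaviour described in this message. It is not the adapter's code: the real implementation works on arrow batches through `RowConverter`, whereas this sketch models columns as plain vectors. The names `split_by_prefix`, `SliceError`, `Column`, and `col` are hypothetical, and the sketch assumes every present column holds at least `num_rows` values. It covers the N=0 passthrough, the composite-key transitions, the missing-column-as-null rule, and the "too few declared sort columns" error; it does not model an absent or unparseable `qh.sort_fields`.

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Hypothetical stand-in for the adapter's `PrefixUnresolvable` error.
#[derive(Debug, PartialEq)]
pub enum SliceError {
    PrefixUnresolvable { declared: usize, requested: usize },
}

/// Values for one column; `None` models a null cell.
pub type Column = Vec<Option<String>>;

/// Illustration helper: builds a fully non-null column from string literals.
pub fn col(values: &[&str]) -> Column {
    values.iter().map(|v| Some(v.to_string())).collect()
}

/// Returns the row ranges that would each become one output row group.
/// `sort_fields` plays the role of the names declared in `qh.sort_fields`;
/// `columns` plays the role of the columns actually present in the file.
pub fn split_by_prefix(
    sort_fields: &[&str],
    columns: &HashMap<&str, Column>,
    num_rows: usize,
    target_prefix_len: usize,
) -> Result<Vec<Range<usize>>, SliceError> {
    // N = 0: single passthrough slice, no prefix-alignment claim.
    if target_prefix_len == 0 {
        return Ok(vec![0..num_rows]);
    }
    // Not enough declared sort names to honor the request.
    if sort_fields.len() < target_prefix_len {
        return Err(SliceError::PrefixUnresolvable {
            declared: sort_fields.len(),
            requested: target_prefix_len,
        });
    }
    let prefix = &sort_fields[..target_prefix_len];
    // Composite key for one row. A declared-but-missing column yields `None`
    // at every row, so it never produces a transition by itself.
    let key = |row: usize| {
        prefix
            .iter()
            .map(|name| columns.get(*name).and_then(|c| c[row].as_deref()))
            .collect::<Vec<_>>()
    };
    let mut slices = Vec::new();
    let mut start = 0;
    for row in 1..num_rows {
        // Cut a new slice wherever the composite key changes.
        if key(row) != key(row - 1) {
            slices.push(start..row);
            start = row;
        }
    }
    if num_rows > 0 {
        slices.push(start..num_rows);
    }
    Ok(slices)
}
```

Calling `split_by_prefix` with columns `metric_name` and `service` and a prefix length of 2 cuts a slice at each change of the pair, while declaring an `env` field that has no backing column leaves only the `metric_name` transitions. The sketch rebuilds the key for each comparison to stay short; a real implementation would convert rows once.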
1 parent 347b7bd commit 57e0a21

1 file changed

Lines changed: 470 additions & 109 deletions
