feat: compute per-output split metadata in merge engine
The merge engine now extracts metric_names, time_range, and
low_cardinality_tags from each output file's actual rows during the
merge write pass.
Previously, MergeOutputFile only contained physical metadata (num_rows,
size_bytes, row_keys, zonemaps). The downstream metadata_aggregation
function inferred logical metadata by unioning all input splits, which
is incorrect when num_outputs > 1, since each output contains only a
subset of the globally sorted rows.
Now each MergeOutputFile carries:
- metric_names: distinct metrics in this output's rows
- time_range: min/max timestamp_secs from this output's rows
- low_cardinality_tags: service names from this output's rows
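
As a rough illustration of the per-output computation described above, the sketch below derives the three fields from one output's rows only. The `Row` and `OutputMetadata` types and the `metadata_for_rows` and `row` helpers are hypothetical stand-ins, not the engine's actual types or the split_writer functions, and the inclusive (min, max) time range is an assumption.

```rust
use std::collections::BTreeSet;

// Hypothetical row shape for illustration; the real engine's row
// representation is not shown in this commit message.
#[derive(Clone, Debug)]
struct Row {
    metric_name: String,
    timestamp_secs: i64,
    service: String,
}

// Hypothetical stand-in for the logical fields carried by an output file.
#[derive(Debug, PartialEq)]
struct OutputMetadata {
    metric_names: BTreeSet<String>,
    // Assumed inclusive (min, max); None when the output has no rows.
    time_range: Option<(i64, i64)>,
    low_cardinality_tags: BTreeSet<String>,
}

// Derive metadata from the rows written to ONE output, rather than
// unioning the metadata of every input split.
fn metadata_for_rows(rows: &[Row]) -> OutputMetadata {
    let metric_names = rows.iter().map(|r| r.metric_name.clone()).collect();
    let low_cardinality_tags = rows.iter().map(|r| r.service.clone()).collect();
    let min = rows.iter().map(|r| r.timestamp_secs).min();
    let max = rows.iter().map(|r| r.timestamp_secs).max();
    OutputMetadata {
        metric_names,
        time_range: min.zip(max),
        low_cardinality_tags,
    }
}

// Small constructor to keep example data readable.
fn row(metric: &str, ts: i64, service: &str) -> Row {
    Row {
        metric_name: metric.to_string(),
        timestamp_secs: ts,
        service: service.to_string(),
    }
}
```

With two outputs whose rows carry different metric names, each output's metadata lists only its own metric, whereas a union over the inputs would attribute both metrics to both outputs.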
Reuses existing extract_metric_names, extract_service_names, and
extract_time_range from split_writer (made pub(crate)).
Includes a test that verifies per-output metadata is computed from actual
rows when merging 2 inputs into 2 outputs with different metric names.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>