Skip to content

Commit d8188cb

Browse files
g-talbotclaude
andauthored
fix: encode descending sort columns correctly in sorted_series key (#6338)
* fix: encode descending sort columns correctly in sorted_series key The sorted_series key uses storekey for order-preserving binary encoding, but storekey always encodes ascending. For sort schemas with descending pre-timestamp columns (e.g. "-service"), the encoded bytes must be bitwise-NOTed so that ascending memcmp on the composite key gives the correct descending order. This is the standard ordered-code technique (see Google's OrderedCode). Without this fix, a descending pre-timestamp column would cause sorted_series bytes to not be monotonically ascending in the file, breaking DataFusion streaming aggregation assumptions and merge correctness. Changes: - KeyColumn now carries `descending: bool` from the parsed sort schema - encode_row_key inverts storekey bytes for descending columns (ordinal + value together) - Added invert_bytes helper - timeseries_id is always ascending (it's a hash tiebreaker) Test: ascending schema gives alpha < zebra in key bytes; descending schema gives zebra < alpha (inverted). Keys from different schemas differ for the same data. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: only invert value bytes for descending columns, not ordinal When a descending sort column has a null value, the null row skips the column entirely in the key. If the ordinal bytes were inverted (!ordinal = 0xFE), null rows (whose next byte is the small timeseries_id ordinal, e.g. 0x02) would sort before non-null rows, violating the writer's nulls_first=false behavior. Fix: write the ordinal normally (ascending) and only invert the storekey-encoded value bytes. This ensures: - Non-null descending: ordinal(small) + !value - Null (skipped): next ordinal(slightly larger) - So non-null keys < null keys, matching physical nulls-last order Added test: descending service with null — verifies key(beta) < key(alpha) < key(null). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: reject sort schemas with tag columns after timeseries_id The sorted_series key encodes columns up to (and including) timeseries_id. Any tag column after timeseries_id would be in the physical sort order but not in the key, breaking merge correctness — two rows with the same timeseries_id but different extra_tag values would have identical sorted_series keys. Added validation in validate_schema: after timeseries_id, only timestamp and tiebreaker may follow. Any other column is rejected at parse time. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 3fc479f commit d8188cb

0 file changed

File tree

    0 commit comments

    Comments
     (0)