Affected Version
Druid 36.0.0
Description
Range partition pruning produces incorrect query results (data loss) when filtering on LONG-typed dimensions that were used as partition keys, if those LONG values have different string widths (different digit counts). The bug causes segments containing matching rows to be incorrectly pruned from the query plan.
Although the docs (and examples) mention string dimensions in the context of secondary partitioning, they do not explicitly state that range partitioning does not work for non-string dimensions, nor do index_parallel / index_hadoop reject non-string dimensions at runtime. I only found this issue because MSQ ingestion's CLUSTERED BY fails on non-string dimensions.
Root Cause
- At ingestion time: Range partitioning (DimensionRangePartitionsSpec) builds partition boundaries by stringifying dimension values via InputRow.getDimension(), which always returns List<String>. This produces StringTuple boundaries stored in DimensionRangeShardSpec (e.g., start=["100", "200"]).
- At query time: When the SQL planner or native query engine issues a numeric range filter like WHERE id >= 80 on a LONG dimension:
  - SQL generates a RangeFilter("id", ColumnType.LONG, 80L, null, ...) via Expressions.toLeafFilter().
  - RangeFilter.getDimensionRangeSet() converts the LONG bound to a string via lowerEval.asString() → produces Range.atLeast("80").
  - This range is compared lexicographically against the segment's stringified StringTuple bounds in DimensionRangeShardSpec.possibleInDomain().
- The bug: Lexicographic comparison of "80" vs ["100", "200"]:
  - '1' < '8' (lexicographically) → segment is pruned.
  - But numerically: every value in [100, 200] satisfies >= 80 → segment should be kept.
  - Result: Rows matching the filter are silently dropped from query results.
The code comment in RangeFilter.getDimensionRangeSet() even acknowledges this is wrong:
// We need to return a RangeSet<String>, but we have Object, not String. We align with the interface by
// converting things to String, but we'd probably be better off adjusting the interface to something that is
// more type aware in the future
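The faulty comparison is reproducible without any Druid classes. The following sketch (illustrative, not Druid code) shows the disagreement between lexicographic and numeric ordering that drives the mispruning:

```java
public class LexVsNumericDemo {
    public static void main(String[] args) {
        // Filter bound from `WHERE id >= 80`, stringified by the query path.
        String filterLower = "80";
        // Segment boundary start value, stringified at ingestion time.
        String segmentStart = "100";

        // Lexicographic: first chars compared, '1' < '8', so "100" sorts before "80".
        boolean lexBefore = segmentStart.compareTo(filterLower) < 0;
        // Numeric: 100 >= 80, so the segment's rows actually match the filter.
        boolean numMatches = Long.parseLong(segmentStart) >= Long.parseLong(filterLower);

        System.out.println("lexicographically below bound: " + lexBefore);   // true
        System.out.println("numerically matches filter:    " + numMatches);  // true
    }
}
```

Both checks print true: the segment sorts entirely "below" the stringified bound while every row in it numerically satisfies the filter.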
Steps to Reproduce
1. Add failing unit test to demonstrate the bug:
File: processing/src/test/java/org/apache/druid/query/filter/FilterSegmentPrunerTest.java
@Test
void testNumericRangePruning_mixedWidthLongs_falseNegative()
{
  final String interval = "2026-01-01T00:00:00Z/2026-01-02T00:00:00Z";
  DataSegment segLargeIds = makeDataSegment(
      interval,
      makeRange("id", 0, "100", "200")
  );
  DimFilter filter =
      new RangeFilter("id", ColumnType.LONG, 80L, null, false, false, null);
  FilterSegmentPruner pruner = new FilterSegmentPruner(filter, null, null);

  // BUG: rows with id in [100, 200] all match `id >= 80`, but the segment
  // is wrongly pruned because '100' and '200' are lex-less-than '80'.
  Assertions.assertTrue(
      pruner.include(segLargeIds),
      "Segment with numeric ids [100,200] must be kept for `id >= 80`."
  );
}
Run the test:
mvn test -pl processing -am \
-Dtest="org.apache.druid.query.filter.FilterSegmentPrunerTest#testNumericRangePruning_mixedWidthLongs_falseNegative" \
-Dsurefire.failIfNoSpecifiedTests=false \
-Pskip-static-checks -Dweb.console.skip=true -T1C
Expected: Test passes (segment is kept).
Actual: Test fails with:
AssertionFailedError: Segment with numeric ids [100,200] must be kept for `id >= 80`.
==> expected: <true> but was: <false>
2. Contrasting working test (same-width LONGs):
@Test
void testNumericRangePruning_sameWidthLongs_works()
{
  final String interval = "2026-01-01T00:00:00Z/2026-01-02T00:00:00Z";
  DataSegment segOlder = makeDataSegment(
      interval,
      makeRange("id", 0, "1000000000", "1500000000")
  );
  DataSegment segNewer = makeDataSegment(
      interval,
      makeRange("id", 1, "1500000000", "2000000000")
  );
  DimFilter filter =
      new RangeFilter("id", ColumnType.LONG, 1_700_000_000L, null, false, false, null);
  FilterSegmentPruner pruner = new FilterSegmentPruner(filter, null, null);

  Assertions.assertFalse(pruner.include(segOlder)); // Correctly pruned
  Assertions.assertTrue(pruner.include(segNewer));  // Correctly kept
}
This test passes because all 10-digit values have the same string width, so lexicographic order coincides with numeric order.
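The width dependence is easy to verify directly with String.compareTo: same-width strings sort in numeric order, mixed-width strings do not. The zero-padding shown last is a possible ingestion-side workaround idea, not something Druid does today:

```java
public class SameWidthDemo {
    public static void main(String[] args) {
        // Same-width 10-digit values: lexicographic order == numeric order.
        String bound = "1700000000";
        System.out.println("1500000000".compareTo(bound) < 0); // true: older segment correctly pruned
        System.out.println("2000000000".compareTo(bound) > 0); // true: newer segment correctly kept

        // Mixed widths break the agreement unless values are zero-padded
        // to a fixed width before they become partition keys:
        String padded80  = String.format("%019d", 80L);   // "0000000000000000080"
        String padded100 = String.format("%019d", 100L);  // "0000000000000000100"
        System.out.println(padded100.compareTo(padded80) > 0); // true: numeric order restored
    }
}
```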
Impact
Affected queries:
- SQL queries with numeric comparison operators on LONG partition dimensions:
  SELECT * FROM datasource WHERE account_id >= 100
- Native queries with RangeFilter on LONG partition dimensions.

Affected scenarios:
- Variable-width LONG values used as partition dimensions:
  - IDs spanning multiple magnitude ranges (e.g., 1-digit to 6-digit account IDs).
  - Any range including values with different digit counts.
- Negative LONG values (the lexicographic sign ordering '-' < '0'...'9' differs from numeric ordering).
- Range filters on the first partition dimension (or any dimension when the leading dimensions match as singletons, enabling deeper pruning).

Safe scenarios (no bug):
- Equality filters (=, IN, NOT IN) work correctly (they also use stringified values for comparison).
- Same-width LONG values (e.g., 10-digit Unix timestamps in seconds, zero-padded IDs) work correctly because lexicographic order happens to match numeric order.
- Partitioning by STRING dimensions (already lexicographic).
Example Real-World Scenario
Setup:
- Datasource partitioned by partitionDimensions: ["account_id"] (LONG type).
- Ingestion produces segments with boundaries like:
  {"start": ["100"],  "end": ["200"]}    // 3-digit IDs
  {"start": ["200"],  "end": ["1000"]}   // 3-4 digit IDs
  {"start": ["1000"], "end": ["10000"]}  // 4-5 digit IDs
Query:
SELECT COUNT(*) FROM datasource WHERE account_id >= 80
Expected: All rows with account_id >= 80 (includes segments with IDs 100-200, 200-1000, 1000-10000, etc.).
Actual: Segment ["100", "200"] is incorrectly pruned because lexicographically "100" < "80" and "200" < "80" (comparing first character: '1' < '8', '2' < '8'). Query returns 0 rows from that segment even though numerically all rows match.
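The pruning decision for this scenario can be modeled with plain string comparison. The helper below is a simplified, illustrative stand-in for DimensionRangeShardSpec.possibleInDomain() (the real implementation intersects Guava Range<String> objects), not the actual Druid code:

```java
public class PruneSimulation {
    /**
     * Simplified stand-in for the shard-spec check: can the segment's
     * [start, end] string range intersect the filter range [lower, +inf)?
     * The segment survives only if its end is not lexicographically below
     * the filter's lower bound. Illustrative only.
     */
    static boolean possibleInDomain(String segStart, String segEnd, String filterLower) {
        return segEnd.compareTo(filterLower) >= 0;
    }

    public static void main(String[] args) {
        // Segment ["100", "200"], filter `id >= 80` stringified to "80".
        boolean kept = possibleInDomain("100", "200", "80");
        System.out.println("segment kept: " + kept); // false -> wrongly pruned

        // Numerically, every id in [100, 200] satisfies id >= 80,
        // so the correct answer is to keep the segment.
        boolean shouldKeep = 200L >= 80L;
        System.out.println("should be kept: " + shouldKeep); // true
    }
}
```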
Debugging Already Done
- Code review:
  - Traced ingestion path: RangePartitionIndexTaskInputRowIteratorBuilder → InputRow.getDimension() → stringified values → StringSketch → StringTuple boundaries.
  - Traced query path: Expressions.toLeafFilter() → RangeFilter → getDimensionRangeSet() → lowerEval.asString() → lexicographic Range<String>.
  - Confirmed DimensionRangeShardSpec.possibleInDomain() performs lexicographic intersection via Guava Range<String> operations.
- Test coverage gap:
  - No existing tests in FilterSegmentPrunerTest, DimensionRangeShardSpecTest, or RangeFilterTests exercise a numeric RangeFilter against DimensionRangeShardSpec with variable-width values.
  - All existing tests use STRING-typed filters or same-width numeric values.
- Related findings:
  - BoundDimFilter.getDimensionRangeSet() returns null for non-lexicographic orderings (safer, but an older code path).
  - SQL planner always generates RangeFilter (not BoundDimFilter) for >=, >, <, <= operators since the "always use RangeFilter" change in Expressions.toLeafFilter().
Configuration Details
Cluster size: N/A (reproducible in unit tests).
Ingestion spec - index_parallel (relevant excerpt):
{
"partitionsSpec": {
"type": "range",
"targetRowsPerSegment": 5000000,
"partitionDimensions": ["account_id", "secondary_timestamp_ms_unix"]
}
}
Schema (relevant excerpt):
{
"dimensionsSpec": {
"dimensions": [
{"name": "account_id", "type": "long"},
{"name": "secondary_timestamp_ms_unix", "type": "long"}
]
}
}
Query context: Default (no special settings).
Short-term fix
As a workaround, we can set "secondaryPartitionPruning": false in the query context to disable secondary partition pruning for affected queries.
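For example, via the SQL API payload (the query text here is illustrative):

```json
{
  "query": "SELECT COUNT(*) FROM datasource WHERE account_id >= 80",
  "context": {
    "secondaryPartitionPruning": false
  }
}
```

This trades query performance (all segments in the interval are scanned) for correctness until the pruning logic is fixed.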