Skip to content

Range partition pruning incorrectly prunes segments when using numeric filters on LONG dimensions and index_parallel ingestion #19408

@rzepinskip

Description

@rzepinskip

Affected Version

Druid 36.0.0

Description

Range partition pruning produces incorrect query results (data loss) when filtering on LONG-typed dimensions that were used as partition keys, if those LONG values have different string widths (different digit counts). The bug causes segments containing matching rows to be incorrectly pruned from the query plan.

Although docs (and example) mentions string dimensions in context of secondary partitioning, it is NOT explicitly said that it does not work for non-string dimensions nor index_parall / index_hadoop reject non-string dimensions during runtime. I only found this issue because MSQ ingestion CLUSTERED_BY fails on non-string dimensions.

Root Cause

  1. At ingestion time: Range partitioning (DimensionRangePartitionsSpec) builds partition boundaries by stringifying dimension values via InputRow.getDimension(), which always returns List<String>. This produces StringTuple boundaries stored in DimensionRangeShardSpec (e.g., start=["100", "200"]).

  2. At query time: When the SQL planner or native query engine issues a numeric range filter like WHERE id >= 80 on a LONG dimension:

    • SQL generates a RangeFilter("id", ColumnType.LONG, 80L, null, ...) via Expressions.toLeafFilter().
    • RangeFilter.getDimensionRangeSet() converts the LONG bound to a string via lowerEval.asString() → produces Range.atLeast("80").
    • This range is compared lexicographically against the segment's stringified StringTuple bounds in DimensionRangeShardSpec.possibleInDomain().
  3. The bug: Lexicographic comparison of "80" vs ["100", "200"]:

    • '1' < '8' (lexicographically) → segment is pruned.
    • But numerically: every value in [100, 200] satisfies >= 80 → segment should be kept.
    • Result: Rows matching the filter are silently dropped from query results.

The code comment in RangeFilter.getDimensionRangeSet() even acknowledges this is wrong:

// We need to return a RangeSet<String>, but we have Object, not String. We align with the interface by
// converting things to String, but we'd probably be better off adjusting the interface to something that is
// more type aware in the future

Steps to Reproduce

1. Add failing unit test to demonstrate the bug:

File: processing/src/test/java/org/apache/druid/query/filter/FilterSegmentPrunerTest.java

@Test
void testNumericRangePruning_mixedWidthLongs_falseNegative()
{
  final String interval = "2026-01-01T00:00:00Z/2026-01-02T00:00:00Z";

  DataSegment segLargeIds = makeDataSegment(
      interval,
      makeRange("id", 0, "100", "200")
  );

  DimFilter filter =
      new RangeFilter("id", ColumnType.LONG, 80L, null, false, false, null);
  FilterSegmentPruner pruner = new FilterSegmentPruner(filter, null, null);

  // BUG: rows with id in [100, 200] all match `id >= 80`, but the segment
  //      is wrongly pruned because '100' and '200' are lex-less-than '80'.
  Assertions.assertTrue(
      pruner.include(segLargeIds),
      "Segment with numeric ids [100,200] must be kept for `id >= 80`."
  );
}

Run the test:

mvn test -pl processing -am \
  -Dtest="org.apache.druid.query.filter.FilterSegmentPrunerTest#testNumericRangePruning_mixedWidthLongs_falseNegative" \
  -Dsurefire.failIfNoSpecifiedTests=false \
  -Pskip-static-checks -Dweb.console.skip=true -T1C

Expected: Test passes (segment is kept).
Actual: Test fails with:

AssertionFailedError: Segment with numeric ids [100,200] must be kept for `id >= 80`.
==> expected: <true> but was: <false>

2. Contrasting working test (same-width LONGs):

@Test
void testNumericRangePruning_sameWidthLongs_works()
{
  final String interval = "2026-01-01T00:00:00Z/2026-01-02T00:00:00Z";

  DataSegment segOlder = makeDataSegment(
      interval,
      makeRange("id", 0, "1000000000", "1500000000")
  );
  DataSegment segNewer = makeDataSegment(
      interval,
      makeRange("id", 1, "1500000000", "2000000000")
  );

  DimFilter filter =
      new RangeFilter("id", ColumnType.LONG, 1_700_000_000L, null, false, false, null);
  FilterSegmentPruner pruner = new FilterSegmentPruner(filter, null, null);

  Assertions.assertFalse(pruner.include(segOlder));  // Correctly pruned
  Assertions.assertTrue(pruner.include(segNewer));   // Correctly kept
}

This test passes because all 10-digit values have the same string width, so lexicographic order coincides with numeric order.


Impact

Affected queries:

  • SQL queries with numeric comparison operators on LONG partition dimensions:
    SELECT * FROM datasource WHERE account_id >= 100
  • Native queries with RangeFilter on LONG partition dimensions.

Affected scenarios:

  1. Variable-width LONG values used as partition dimensions:

    • IDs spanning multiple magnitude ranges (e.g., 1-digit to 6-digit account IDs).
    • Any range including values with different digit counts.
  2. Negative LONG values (lexicographic sign ordering '-' < '0'...'9' differs from numeric).

  3. Range filters on the first partition dimension (or any dimension when the leading dimensions match as singletons, enabling deeper pruning).

Safe scenarios (no bug):

  • Equality filters (=, IN, NOT IN) work correctly (they also use stringified values for comparison).
  • Same-width LONG values (e.g., 10-digit Unix timestamps in seconds, zero-padded IDs) work correctly because lexicographic order happens to match numeric order.
  • Partitioning by STRING dimensions (already lexicographic).

Example Real-World Scenario

Setup:

  • Datasource partitioned by partitionDimensions: ["account_id"] (LONG type).
  • Ingestion produces segments with boundaries like:
    {"start": ["100"], "end": ["200"]}     // 3-digit IDs
    {"start": ["200"], "end": ["1000"]}    // 3-4 digit IDs
    {"start": ["1000"], "end": ["10000"]}  // 4-5 digit IDs

Query:

SELECT COUNT(*) FROM datasource WHERE account_id >= 80

Expected: All rows with account_id >= 80 (includes segments with IDs 100-200, 200-1000, 1000-10000, etc.).

Actual: Segment ["100", "200"] is incorrectly pruned because lexicographically "100" < "80" and "200" < "80" (comparing first character: '1' < '8', '2' < '8'). Query returns 0 rows from that segment even though numerically all rows match.


Debugging Already Done

  1. Code review:

    • Traced ingestion path: RangePartitionIndexTaskInputRowIteratorBuilderInputRow.getDimension() → stringified values → StringSketchStringTuple boundaries.
    • Traced query path: Expressions.toLeafFilter()RangeFiltergetDimensionRangeSet()lowerEval.asString() → lexicographic Range<String>.
    • Confirmed DimensionRangeShardSpec.possibleInDomain() performs lexicographic intersection via Guava Range<String> operations.
  2. Test coverage gap:

    • No existing tests in FilterSegmentPrunerTest, DimensionRangeShardSpecTest, or RangeFilterTests exercise numeric RangeFilter against DimensionRangeShardSpec with variable-width values.
    • All existing tests use STRING-typed filters or same-width numeric values.
  3. Related findings:

    • BoundDimFilter.getDimensionRangeSet() returns null for non-lexicographic orderings (safer, but older code path).
    • SQL planner always generates RangeFilter (not BoundDimFilter) for >=, >, <, <= operators since the "always use RangeFilter" change in Expressions.toLeafFilter().

Configuration Details

Cluster size: N/A (reproducible in unit tests).

Ingestion spec - index_parallel (relevant excerpt):

{
  "partitionsSpec": {
    "type": "range",
    "targetRowsPerSegment": 5000000,
    "partitionDimensions": ["account_id", "secondary_timestamp_ms_unix"]
  }
}

Schema (relevant excerpt):

{
  "dimensionsSpec": {
    "dimensions": [
      {"name": "account_id", "type": "long"},
      {"name": "secondary_timestamp_ms_unix", "type": "long"}
    ]
  }
}

Query context: Default (no special settings).

Short term fix

We can use "secondaryPartitionPruning": false to disable pruning in affected queries.

Metadata

Metadata

Assignees

No one assigned

    Type

    No fields configured for Bug.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions