[SPARK-56402][SS] Apply rangeScan API in stream-stream join format version 4 #55267
HeartSaVioR wants to merge 9 commits into apache:master
Conversation
Only the last commit is related to this PR. Once #55226 is merged, I'll rebase.
    InternalRow.fromSeq(schema.map(f => defaultValueForType(f.dataType)))
  }

  private def defaultValueForType(dt: DataType): Any = dt match {
How can we make sure that this list is comprehensive? I am surprised that Spark does not have a native utility for this. For example, I'm not sure whether missing DecimalType, ArrayType, or MapType is fine here.

Will returning null lead to a crash in the encoder? I wonder if we should just throw an UnsupportedOperationException for the wildcard case.

I think I simply missed the utility method since it was indirect. The Literal object has a default method, and a Literal instance can give the value via .value. I'll update the code here.
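For context, a standalone sketch of the pattern under discussion — hypothetical stand-in types, not Spark's actual DataType hierarchy. In the PR the whole method goes away in favor of Catalyst's Literal.default(dt).value, which covers every DataType:

```scala
// Standalone sketch (hypothetical types, not Spark's DataType hierarchy) of the
// hand-rolled default-value match that Literal.default(dt).value replaces.
sealed trait DataType
case object IntegerType extends DataType
case object LongType extends DataType
case object DoubleType extends DataType
case object StringType extends DataType

def defaultValueForType(dt: DataType): Any = dt match {
  case IntegerType => 0
  case LongType    => 0L
  case DoubleType  => 0.0
  case StringType  => ""
  // Throwing for unhandled types (as suggested in review) beats silently
  // returning null, which may crash later in the encoder.
  case other => throw new UnsupportedOperationException(s"No default for $other")
}
```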
    // startTimestamp is exclusive (already evicted), so we seek from st + 1.
    val startKeyRow = startTimestamp.flatMap { st =>
      if (st < Long.MaxValue) Some(createScanBoundaryRow(st + 1))
      else None
Can we just return an empty iterator here? That avoids a silent full scan of the state.

This just means we do not know clearly where to start from (e.g. the first batch), not that we can skip scanning. Regarding the concern about a full scan, we will still be guarding the range with the column family.

You mean startTimestamp == Long.MaxValue means we are at the first batch? I thought it should be None.

Oh yes, you raise a good point. start being Long.MaxValue should not match anything.
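A hedged sketch of the boundary logic the thread settles on — illustrative names, not the actual Spark code. startTimestamp is an exclusive lower bound, so we seek from st + 1; if it is already Long.MaxValue, st + 1 would overflow and nothing can match, so the caller should produce an empty iterator rather than fall back to an unbounded scan:

```scala
// Three distinct outcomes for the lower bound of the eviction scan.
sealed trait ScanStart
case object EmptyScan extends ScanStart              // st == Long.MaxValue: nothing can match
case class FromTimestamp(ts: Long) extends ScanStart // exclusive bound -> seek from st + 1
case object NoLowerBound extends ScanStart           // first batch: range guarded by column family only

def scanLowerBound(startTimestamp: Option[Long]): ScanStart = startTimestamp match {
  case Some(Long.MaxValue) => EmptyScan
  case Some(st)            => FromTimestamp(st + 1)
  case None                => NoLowerBound
}
```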
    // Filter out entries outside [minTs, maxTs]. This is essential when using
    // prefixScan (which returns all timestamps for the key) and serves as a
    // safety guard for rangeScan as well.
    if (ts < minTs || ts > maxTs) {
If it is a safety guard, maybe we should throw an error? IIUC, ts should never be greater than maxTs in either case (rangeScan or prefixScan) based on the current code. ts < minTs may be valid for prefixScan but not for rangeScan.

It's valid logic for prefixScan and a safety guard for rangeScan. It's more about generalizing the code, but we could branch on prefixScan vs rangeScan and only apply the assertion for rangeScan - do you think that'd be preferable?
Btw, I realize we should not change this part of the codebase if we want to generalize the code. The pastUpperBound flag is still useful for prefixScan. Maybe I'll need to revert the code except for applying rangeScan. Still, if you prefer an assertion for rangeScan, we can add it.

I agree we should generalize the common logic, but I think this check is not part of it. It is better to push this range check into the separate methods for prefixScan and rangeScan, since the two come with different guarantees. We can assert in rangeScan and do an early exit in prefixScan this way.

Let's just assert the case for range scan rather than having duplicated code.
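A hedged sketch of the agreed resolution (illustrative standalone code, not the Spark implementation): rangeScan already guarantees results in [minTs, maxTs], so an out-of-range timestamp there is a bug and should assert, while prefixScan returns every timestamp for the key, so out-of-range entries are legitimately filtered out:

```scala
// Decide whether an entry's timestamp should be kept, per scan mode.
def keepTimestamp(ts: Long, minTs: Long, maxTs: Long, useRangeScan: Boolean): Boolean = {
  val inRange = ts >= minTs && ts <= maxTs
  if (useRangeScan) {
    // rangeScan bounds the store-side scan; a violation indicates a bug.
    assert(inRange, s"rangeScan returned ts=$ts outside [$minTs, $maxTs]")
    true
  } else {
    inRange // prefixScan: skipping here is the expected filtering path
  }
}
```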
   * timestamp matters for ordering in the prefix encoder.
   */
  private def createScanBoundaryRow(timestamp: Long): UnsafeRow = {
    val defaultKey = UnsafeProjection.create(keySchema)
nit: we are creating a new UnsafeRow on each call; consider having a reusable row?

Yeah, probably defaultKey can simply be reused. Nice find!
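The nit amounts to hoisting the per-call allocation into a field. A hypothetical sketch of that pattern in plain Scala — the array here stands in for the projection/row pair; names are illustrative:

```scala
// Hypothetical sketch of the reuse pattern suggested above: allocate the
// projection/row once as a field instead of on every createScanBoundaryRow call.
class BoundaryRowFactory {
  // Created once; stands in for UnsafeProjection.create(keySchema).
  private val reusableRow = new Array[Long](1)

  def createScanBoundaryRow(timestamp: Long): Array[Long] = {
    reusableRow(0) = timestamp // mutate the shared buffer instead of allocating
    reusableRow
  }
}
```

Callers must then not retain the returned row across calls — the same caveat that reusing a projection-produced UnsafeRow carries.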
@@ -1105,6 +1105,67 @@ class SymmetricHashJoinStateManagerEventTimeInValueSuite
  }
}
nit: Can we test the overflow boundaries for eviction as well?

We have a test providing a range on Long.MinValue and Long.MaxValue. Would you mind being more specific?

Ah, looks like we don't have it for eviction. Working on it.
I reviewed the last two commits of this PR and they look good. Thanks for making the change. This could greatly improve stream-stream join performance. Please make sure to address this comment - it may be hard to find:
#55267 (comment)
Force-pushed f3a3213 to e976887.
cc. @viirya
  }
  /** Predicate for watermark on state keys */
  case class JoinStateKeyWatermarkPredicate(expr: Expression, stateWatermark: Long) // before
  case class JoinStateKeyWatermarkPredicate(                                        // after (this PR)
Can we add a high-level comment to explain why prevStateWatermark is passed here?

Good suggestion! Done.
    val ts = TimestampKeyStateEncoder.extractTimestamp(unsafeRowPair.key)

    if (useRangeScan) {
      assert(ts >= minTs && ts <= maxTs,
Could we add an error class for this?
   * @param cfName The column family name.
   * @return An iterator of ByteArrayPairs in the given range.
   */
  def scan(
nit: should we call it scanRange?
    val kvEncoder = keyValueEncoderMap.get(colFamilyName)
    require(kvEncoder._1.supportsRangeScan,
      "Range scan requires an encoder that supports range scanning!")
nit: should be "Range scan with multiple values requires ..."
Force-pushed e976887 to ef05991.
    private val iter = if (useRangeScan) {
      val startKey = createKeyRow(key, minTs).copy()
      // rangeScanWithMultiValues endKey is exclusive, so use maxTs + 1
      val endKey = Some(createKeyRow(key, maxTs + 1))
Do we need to copy it like startKey?

We don't need to copy endKey: startKey and endKey must co-exist at the same time (which is why startKey is copied), but once we call rangeScanWithMultiValues, neither startKey nor endKey is used again.
I'm leaving a code comment instead.
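A hedged standalone sketch of why startKey needs .copy() but endKey does not: when the key-row factory reuses one mutable buffer (as UnsafeProjection-backed helpers do), producing endKey would overwrite an uncopied startKey, while endKey itself is consumed immediately by the scan call. Names here are illustrative, not Spark's:

```scala
// A factory that reuses a single buffer across calls, like UnsafeProjection.
class KeyRowFactory {
  private val buf = Array(0L, 0L) // shared mutable buffer
  def createKeyRow(key: Long, ts: Long): Array[Long] = {
    buf(0) = key; buf(1) = ts; buf
  }
}

val factory = new KeyRowFactory
val startKey = factory.createKeyRow(1L, 10L).clone() // copy: must survive the next call
val endKey = factory.createKeyRow(1L, 21L)           // no copy: consumed right away
```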
The last commit (before the empty commit) is the same as #55265.
Force-pushed 6ec5336 to 05eeb66.
Use bounded scan ranges in stream-stream join V4 operators to narrow the iteration scope during eviction and value lookup:
- scanEvictedKeys (TsWithKeyTypeStore): use scanWithMultiValues with startKey derived from the previous batch's state watermark and endKey from the current eviction threshold. Thread prevBatchStateWatermark through JoinStateWatermarkPredicate -> SupportsEvictByTimestamp.
- getValuesInRange (KeyWithTsToValuesStore): use scanWithMultiValues for bounded timestamp ranges, falling back to prefixScan for the full range. Create default-valued boundary rows to avoid NullPointerException when the join key schema contains non-nullable fields (e.g. window structs).
Force-pushed 05eeb66 to b3e0380.
https://github.com/HeartSaVioR/spark/runs/72040646689 - CI only failed in SparkR, which is unrelated.
Thanks! Merging to master.
What changes were proposed in this pull request?
This PR proposes to apply the rangeScan API in stream-stream join format version 4, which improves the scan for matching rows in time-interval joins and during eviction.
The main idea for eviction is to scan the secondary index over [the end timestamp of the previous scan + 1, new end timestamp], where it was previously [None, new end timestamp]. Previously the scan had to go through tombstones left by prior batches' evictions (until compaction happens); with this change we can skip those tombstones.
The idea for time-interval join is straightforward: we know the timestamp range of matching rows and use it to scope the scan. Previously we scanned all timestamps for the key from RocksDB and applied a filter. We now push the filtering down to RocksDB, gaining the same effect as above (skipping tombstones).
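The eviction-range change can be sketched as follows — illustrative names, not the actual Spark code. The scan over the secondary index moves from [no lower bound, newWm] to [prevWm + 1, newWm], so tombstones written by prior batches' evictions are skipped instead of iterated until compaction removes them:

```scala
// Compute the bounded eviction scan range from the previous and new watermarks.
def evictionScanRange(
    prevBatchStateWatermark: Option[Long],
    newStateWatermark: Long): (Option[Long], Long) = {
  // Entries at or below the previous watermark were already evicted,
  // so start one past it; Long.MaxValue + 1 would overflow, so guard it.
  val lower = prevBatchStateWatermark.collect {
    case prev if prev < Long.MaxValue => prev + 1
  }
  (lower, newStateWatermark)
}
```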
Why are the changes needed?
This change gives RocksDB a hint about the exact range to scan, greatly reducing the chance of reading tombstones.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
New UTs, and existing UTs.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude 4.6 Opus