
feat: Native Broadcast nested loop join support#4429

Open
coderfender wants to merge 17 commits into
apache:mainfrom
coderfender:broadcast_nested_join_loop_support


@coderfender
Contributor

@coderfender coderfender commented May 25, 2026

Which issue does this PR close?

Closes #198

Wires Spark's BroadcastNestedLoopJoinExec to DataFusion's NestedLoopJoinExec.
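For readers less familiar with the operator, here is a minimal, self-contained sketch of what a broadcast nested loop join computes. It is not the code in this PR and not DataFusion's implementation; `JoinKind`, `nested_loop_join`, and the sample data are hypothetical names chosen for illustration. One side is fully buffered (the broadcast side), and every streamed row is compared against it with an arbitrary predicate, which is why this join can serve non-equi conditions such as ranges and inequalities.

```rust
/// Join flavours covered by this sketch (real operators support more).
#[derive(Clone, Copy, PartialEq, Debug)]
enum JoinKind {
    Inner,
    LeftOuter,
}

/// Nested loop join: for each streamed (left) row, scan the whole buffered
/// (broadcast) side and keep the pairs that satisfy an arbitrary predicate.
/// With `LeftOuter`, a streamed row without any match is emitted once with `None`.
fn nested_loop_join<L: Clone, R: Clone>(
    streamed: &[L],
    broadcast: &[R],
    kind: JoinKind,
    predicate: impl Fn(&L, &R) -> bool,
) -> Vec<(L, Option<R>)> {
    let mut out = Vec::new();
    for l in streamed {
        let mut matched = false;
        for r in broadcast {
            if predicate(l, r) {
                matched = true;
                out.push((l.clone(), Some(r.clone())));
            }
        }
        if !matched && kind == JoinKind::LeftOuter {
            out.push((l.clone(), None));
        }
    }
    out
}

fn main() {
    // Streamed side: plain values. Broadcast side: (low, high) ranges.
    let values = [5, 15, 40];
    let ranges = [(0, 10), (10, 20)];

    // Range condition in the spirit of `v BETWEEN low AND high`.
    let between = |v: &i32, r: &(i32, i32)| *v >= r.0 && *v <= r.1;

    let inner = nested_loop_join(&values, &ranges, JoinKind::Inner, between);
    let outer = nested_loop_join(&values, &ranges, JoinKind::LeftOuter, between);

    println!("inner: {:?}", inner);
    println!("left outer: {:?}", outer);
}
```

The cost is proportional to the product of the two input sizes, so keeping the buffered side small (the broadcast side) matters for this kind of join.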

Rationale for this change

Adds native support for BroadcastNestedLoopJoin so that these joins run in Comet's native engine, improving performance.

What changes are included in this PR?

How are these changes tested?

  1. Unit tests in CometJoinSuite
  2. Benchmarks (results are posted in a comment on this PR)

@coderfender coderfender changed the title feat: Broadcast nested join loop support feat: Native Broadcast nested join loop support May 25, 2026
@coderfender coderfender force-pushed the broadcast_nested_join_loop_support branch from 1ef4611 to 57c26b8 Compare May 25, 2026 17:40
@coderfender coderfender changed the title feat: Native Broadcast nested join loop support feat: Native Broadcast nested loop join support May 25, 2026
@coderfender
Contributor Author

A number of TPC-DS queries appear to use BNLJ, so the golden files need to be regenerated.

@coderfender coderfender marked this pull request as ready for review May 28, 2026 22:17
@coderfender
Contributor Author

coderfender commented May 28, 2026

Benchmarks (local M5 Pro; edited after re-running the benchmarks with JIT disabled):

Running benchmark: range join (BETWEEN)
  Running case: Spark
  Stopped after 23 iterations, 2043 ms
  Running case: Comet
  Stopped after 33 iterations, 2051 ms

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 26.3.2
Apple M5 Pro
range join (BETWEEN):                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                73             89          10         14.4          69.7       1.0X
Comet                                                49             62          11         21.4          46.8       1.5X

Running benchmark: inequality join (>)
  Running case: Spark
  Stopped after 21 iterations, 2038 ms
  Running case: Comet
  Stopped after 34 iterations, 2028 ms

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 26.3.2
Apple M5 Pro
inequality join (>):                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                78             97          19         13.4          74.4       1.0X
Comet                                                48             60           6         21.8          46.0       1.6X

Running benchmark: left outer non-equi
  Running case: Spark
  Stopped after 27 iterations, 2028 ms
  Running case: Comet
  Stopped after 35 iterations, 2028 ms

OpenJDK 64-Bit Server VM 17.0.16+8-LTS on Mac OS X 26.3.2
Apple M5 Pro
left outer non-equi:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                63             75          11         16.6          60.1       1.0X
Comet                                                42             58          16         25.2          39.7       1.5X
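As a sanity check on the `Relative` column, the snippet below recomputes the speedups from the `Best Time(ms)` values in the three tables above. It assumes that `Relative` is the ratio of the baseline's best time to each case's best time; `relative_speedup` is a hypothetical helper, and because the inputs are the rounded millisecond values printed in the tables, the results are approximate.

```rust
/// Speedup of a candidate over a baseline, given best wall-clock times in ms.
/// Assumption: this mirrors how the `Relative` column is derived.
fn relative_speedup(baseline_best_ms: f64, candidate_best_ms: f64) -> f64 {
    baseline_best_ms / candidate_best_ms
}

fn main() {
    // (benchmark name, Spark best ms, Comet best ms), copied from the tables.
    let runs = [
        ("range join (BETWEEN)", 73.0, 49.0),
        ("inequality join (>)", 78.0, 48.0),
        ("left outer non-equi", 63.0, 42.0),
    ];
    for (name, spark, comet) in runs {
        println!("{}: {:.3}X", name, relative_speedup(spark, comet));
    }
}
```

The ratios come out at roughly 1.5, 1.6, and 1.5, which agrees with the reported 1.5X, 1.6X, and 1.5X.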

@coderfender
Contributor Author

TODO: build vs. probe side swap for DataFusion's BNLJ operator

@coderfender
Contributor Author

Investigating test failures

@coderfender coderfender force-pushed the broadcast_nested_join_loop_support branch from eed19ae to eb3fe00 Compare May 29, 2026 22:29
@coderfender coderfender force-pushed the broadcast_nested_join_loop_support branch 2 times, most recently from 7221f56 to d5a4bf5 Compare May 29, 2026 23:35


Development

Successfully merging this pull request may close these issues.

Support BroadcastNestedLoopJoinExec

1 participant