feat: Add memory-limited execution for NestedLoopJoinExec #21448
Open
viirya wants to merge 3 commits into apache:main
Implement multi-pass execution strategy for NestedLoopJoinExec when the left (build) side exceeds the memory budget. Instead of failing with OOM, the operator now:

1. Buffers left-side data in chunks that fit within memory limits
2. Spills the right side to disk on the first pass via SpillManager
3. Re-reads the right side from the spill file for each subsequent left chunk

This is enabled automatically when disk spilling is available and the right side has a single partition. Multi-partition right side falls back to the existing OnceFut-based path.

Phase 1 supports INNER, LEFT, LEFT SEMI, LEFT ANTI, and LEFT MARK join types. RIGHT/FULL joins with global right bitmap tracking will follow in a later phase.

Tracking issue: apache#15760

Co-authored-by: Isaac
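The three steps in this commit message can be sketched in isolation. The following is a minimal illustration, not the PR's code: rows are plain `i32` values, the "spill file" is an in-memory `Vec`, and the name `chunked_inner_join` and its parameters are hypothetical. It only shows the shape of the multi-pass loop: the first left chunk probes the live right stream while recording it, and later chunks replay the recorded copy.

```rust
/// Hypothetical, simplified multi-pass inner join.
/// `chunk_rows` stands in for "how many left rows fit in the memory budget".
pub fn chunked_inner_join<F>(
    left: &[i32],
    right: impl Iterator<Item = i32>,
    chunk_rows: usize,
    predicate: F,
) -> Vec<(i32, i32)>
where
    F: Fn(i32, i32) -> bool,
{
    assert!(chunk_rows > 0, "chunk size must be positive");
    let mut output = Vec::new();
    // The live right stream can be consumed only once.
    let mut right_stream = Some(right);
    // Assumption: an in-memory Vec stands in for the on-disk spill file.
    let mut spilled: Vec<i32> = Vec::new();

    for chunk in left.chunks(chunk_rows) {
        match right_stream.take() {
            // First pass: probe the live stream and spill every right row.
            Some(stream) => {
                for r in stream {
                    spilled.push(r);
                    for &l in chunk {
                        if predicate(l, r) {
                            output.push((l, r));
                        }
                    }
                }
            }
            // Later passes: re-read the right side from the spill.
            None => {
                for &r in &spilled {
                    for &l in chunk {
                        if predicate(l, r) {
                            output.push((l, r));
                        }
                    }
                }
            }
        }
    }
    output
}
```

Calling `chunked_inner_join(&[1, 2, 3], vec![2, 3, 4].into_iter(), 2, |l, r| l < r)` yields the same set of pairs as a single-pass nested loop, although the row order follows the chunk boundaries.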
RIGHT, FULL, RIGHT SEMI, RIGHT ANTI, and RIGHT MARK joins require tracking which right-side rows have been matched across ALL left chunks. The current implementation only tracks right-side matches per-batch within a single left chunk, which would silently produce incorrect results in multi-pass mode.

Gate the memory-limited path on `!need_produce_right_in_final()` so these join types fall back to the standard OnceFut path. A global right bitmap spanning all left chunks will be added in Phase 3.

Co-authored-by: Isaac
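A small example may make the correctness problem concrete. The sketch below is illustrative only: `JoinKind`, `needs_global_right_tracking`, and the two `right_anti_*` functions are hypothetical names, rows are plain `i32` values, and the join condition is assumed to be equality. The first function decides "unmatched" separately for each left chunk, which is the behavior this commit guards against. The second keeps one match bitmap across all chunks, which is one possible shape for the tracking deferred to a later phase.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum JoinKind {
    Inner, Left, LeftSemi, LeftAnti, LeftMark,
    Right, Full, RightSemi, RightAnti, RightMark,
}

/// Hypothetical mirror of the gate described above: these join types need to
/// know, after ALL left chunks, which right rows were ever matched.
pub fn needs_global_right_tracking(kind: JoinKind) -> bool {
    matches!(
        kind,
        JoinKind::Right | JoinKind::Full | JoinKind::RightSemi
            | JoinKind::RightAnti | JoinKind::RightMark
    )
}

/// Incorrect in multi-pass mode: a right row is emitted as "unmatched"
/// whenever the current left chunk alone has no match for it.
pub fn right_anti_per_chunk(left: &[i32], right: &[i32], chunk_rows: usize) -> Vec<i32> {
    let mut out = Vec::new();
    for chunk in left.chunks(chunk_rows) {
        for &r in right {
            if !chunk.contains(&r) {
                out.push(r);
            }
        }
    }
    out
}

/// Correct: one bitmap over the right side, updated by every left chunk,
/// and consulted only after the last chunk has been processed.
pub fn right_anti_global(left: &[i32], right: &[i32], chunk_rows: usize) -> Vec<i32> {
    let mut matched = vec![false; right.len()];
    for chunk in left.chunks(chunk_rows) {
        for (i, r) in right.iter().enumerate() {
            if chunk.contains(r) {
                matched[i] = true;
            }
        }
    }
    right
        .iter()
        .zip(&matched)
        .filter(|(_, m)| !**m)
        .map(|(r, _)| *r)
        .collect()
}
```

With `left = [1, 2, 3, 4]`, `right = [1, 4, 9]`, and a chunk size of 2, the per-chunk version reports rows 1 and 4 as unmatched even though each has a match in the other chunk, and it reports 9 twice. The global version reports only 9.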
Instead of deciding the execution path at execute() time, always attempt to load all left data in memory via OnceFut first. If that fails with ResourcesExhausted, each partition independently falls back to memory-limited mode by:

1. Re-executing the left child to get a fresh stream
2. Setting up SpillManager for right-side spilling
3. Switching to incremental chunked loading

This removes the right_partition_count == 1 restriction: fallback now works regardless of how many right partitions exist. Each partition independently re-executes the left child on OOM.

The fallback is gated on:

- Disk manager supports temp files
- Join type supports multi-pass (!need_produce_right_in_final)

Co-authored-by: Isaac
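The try-first, fall-back-on-error control flow described in this commit can be sketched as follows. This is a simplified model under stated assumptions, not DataFusion's API: `LoadError`, `Plan`, `collect_left`, and `choose_plan` are hypothetical names, the memory budget is counted in rows, and the two boolean parameters stand in for the two gates listed above.

```rust
#[derive(Debug, PartialEq)]
pub enum LoadError {
    ResourcesExhausted,
}

#[derive(Debug, PartialEq)]
pub enum Plan {
    /// All left rows fit in memory: the existing single-pass path.
    InMemory(Vec<i32>),
    /// Fallback: load the left side incrementally, `chunk_rows` at a time.
    Chunked { chunk_rows: usize },
}

/// Hypothetical stand-in for buffering the whole left side under a budget.
fn collect_left(left: &[i32], budget_rows: usize) -> Result<Vec<i32>, LoadError> {
    if left.len() > budget_rows {
        Err(LoadError::ResourcesExhausted)
    } else {
        Ok(left.to_vec())
    }
}

/// Always try the in-memory path first; fall back only when it fails with
/// ResourcesExhausted AND both gates hold. Otherwise the error propagates.
pub fn choose_plan(
    left: &[i32],
    budget_rows: usize,
    disk_available: bool,
    join_supports_multipass: bool,
) -> Result<Plan, LoadError> {
    match collect_left(left, budget_rows) {
        Ok(rows) => Ok(Plan::InMemory(rows)),
        Err(LoadError::ResourcesExhausted)
            if disk_available && join_supports_multipass && budget_rows > 0 =>
        {
            Ok(Plan::Chunked { chunk_rows: budget_rows })
        }
        Err(e) => Err(e),
    }
}
```

In this model the decision is made per call, which loosely corresponds to each partition deciding independently. A join type that fails the gate still surfaces the original `ResourcesExhausted` error.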
Which issue does this PR close?
Rationale for this change
NestedLoopJoinExec currently fails with an OOM error when the left (build) side exceeds the memory budget. This PR adds a spill-to-disk fallback so the query can complete instead of crashing.
What changes are included in this PR?
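When collect_left_input via OnceFut fails with ResourcesExhausted, each partition independently falls back to a memory-limited multi-pass strategy: it re-executes the left child, buffers the left side in chunks that fit the memory budget, spills the right side to disk on the first pass, and re-reads the spilled right side for each subsequent left chunk.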
The fallback is transparent: if memory is sufficient, the existing OnceFut path is used with zero overhead. It is currently gated to join types that don't require global right-side bitmap tracking (INNER, LEFT, LEFT SEMI, LEFT ANTI, LEFT MARK). RIGHT, FULL, RIGHT SEMI, RIGHT ANTI, and RIGHT MARK joins retain the existing OOM behavior until a cross-chunk right bitmap is added.
Are these changes tested?
Unit tests
Are there any user-facing changes?
No