Skip to content

Commit 7661629

Browse files
Correct stream ordered deallocation in Join (rapidsai#20981)
Our `Join.do_evaluate` function had a stream-ordering bug. We have 3 streams at play 1. left 2. right 3. result The result depends on the left / right inputs, so we need to ensure that the deallocation of left / right happen after `result` is ready. We included a `join_cuda_streams` just before the `return` to ensure that it works. Unfortunately, we had reassigned `left` and `right` to new dataframes which were on the `result` stream, so all the streams were the same, so we didn't actually accomplish what we wanted. The fix is to hold a reference to the original streams and use that. This hopefully fixes a flaky test we've observed in cudf-polars with the rapidsmpf runtime (https://github.com/rapidsai/cudf/actions/runs/20222481433/job/58046997204#step:12:1639). Authors: - Tom Augspurger (https://github.com/TomAugspurger) Approvers: - Bradley Dice (https://github.com/bdice) - Richard (Rick) Zamora (https://github.com/rjzamora) - Lawrence Mitchell (https://github.com/wence-) URL: rapidsai#20981
1 parent 0b65607 commit 7661629

1 file changed

Lines changed: 11 additions & 4 deletions

File tree

  • python/cudf_polars/cudf_polars/dsl

python/cudf_polars/cudf_polars/dsl/ir.py

Lines changed: 11 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
# SPDX-FileCopyrightText: Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES.
1+
# SPDX-FileCopyrightText: Copyright (c) 2024-2026, NVIDIA CORPORATION & AFFILIATES.
22
# SPDX-License-Identifier: Apache-2.0
33
"""
44
DSL nodes for the LogicalPlan of polars.
@@ -2384,8 +2384,14 @@ def do_evaluate(
23842384
context: IRExecutionContext,
23852385
) -> DataFrame:
23862386
"""Evaluate and return a dataframe."""
2387+
# Save the original streams before any reassignments, since we need
2388+
# them for the final join_cuda_streams call to ensure proper stream
2389+
# ordering for deallocations.
2390+
original_left_stream = left.stream
2391+
original_right_stream = right.stream
23872392
stream = get_joined_cuda_stream(
2388-
context.get_cuda_stream, upstreams=(left.stream, right.stream)
2393+
context.get_cuda_stream,
2394+
upstreams=(original_left_stream, original_right_stream),
23892395
)
23902396
how, nulls_equal, zlice, suffix, coalesce, maintain_order = options
23912397
if how == "Cross":
@@ -2536,9 +2542,10 @@ def do_evaluate(
25362542
result = result.slice(zlice)
25372543

25382544
# Join the original streams back into the result stream to ensure that the
2539-
# deallocations (on the original streams) happen after the result is ready
2545+
# deallocations (on the original streams) happen after the result is ready.
25402546
join_cuda_streams(
2541-
downstreams=(left.stream, right.stream), upstreams=(result.stream,)
2547+
downstreams=(original_left_stream, original_right_stream),
2548+
upstreams=(result.stream,),
25422549
)
25432550

25442551
return result

0 commit comments

Comments
 (0)