Forward-merge main into pandas3 #22406

Merged
galipremsagar merged 12 commits into pandas3 from main
May 7, 2026

@AyodeAwe
Contributor

@AyodeAwe AyodeAwe commented May 7, 2026

Forward-merge triggered by automated cron job to keep pandas3 up-to-date with main.

If this PR has conflicts, it will remain open for manual resolution.

See forward-merger docs for more info.

vuule and others added 10 commits May 6, 2026 20:22
…der (#22387)

When the number of elements in the Avro block is stored as a negative number, the block also includes its size in bytes. This PR allows the reader to correctly parse such files.
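For illustration, the block-header rule described above could be parsed as in the following minimal Python sketch. It assumes the Avro binary encoding of `long` values (zigzag plus variable-length bytes). The helper names `read_long` and `read_block_header` are hypothetical and are not taken from the libcudf reader.

```python
def read_long(buf, pos):
    """Decode one zigzag, variable-length Avro long; return (value, new_pos)."""
    raw = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        raw |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of this value
            break
        shift += 7
    # Undo zigzag: even raw values are non-negative, odd ones negative.
    return (raw >> 1) ^ -(raw & 1), pos


def read_block_header(buf, pos):
    """Return (item_count, byte_size_or_None, new_pos) for one block.

    Hypothetical helper: a negative count means the true count is its
    absolute value and a long with the block's size in bytes follows.
    """
    count, pos = read_long(buf, pos)
    byte_size = None
    if count < 0:
        count = -count
        byte_size, pos = read_long(buf, pos)
    return count, byte_size, pos
```

A reader that knows the byte size can skip the block without decoding its items; a reader that ignores the extra field would misinterpret the size as item data.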

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)
  - Muhammad Haseeb (https://github.com/mhaseeb123)

URL: #22387
…ainers (#22338)

Set AWS_IDP_URL and update AWS_ROLE_ARN to use `token.rapids.nvidia.com`

Authors:
  - Paul Taylor (https://github.com/trxcllnt)

Approvers:
  - Gil Forsyth (https://github.com/gforsyth)

URL: #22338
Fixes #22136

This PR guards the homogeneous numeric `DataFrame.to_cupy` fast path so that it only uses `table_to_array` when `dtype` is `None` or exactly matches the source column `dtype`.
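The guard condition can be sketched roughly as follows. This is an illustrative stand-in, not the cuDF implementation: the function name `can_use_table_to_array` is hypothetical, dtypes are represented by any objects that support `==`, and the check that the columns are numeric is assumed to happen elsewhere.

```python
def can_use_table_to_array(column_dtypes, dtype=None):
    """Decide whether the fast path is safe (hypothetical helper).

    column_dtypes: dtypes of the source columns, in order.
    dtype: the dtype requested by the caller, or None.
    """
    if not column_dtypes:
        return False
    first = column_dtypes[0]
    # The fast path only applies when every column shares one dtype.
    if any(dt != first for dt in column_dtypes[1:]):
        return False
    # No conversion requested, or the request matches the data exactly.
    return dtype is None or dtype == first
```

Any other requested `dtype` would fall through to a slower path that performs the conversion.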

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - https://github.com/apps/pre-commit-ci

Approvers:
  - Matthew Roeschke (https://github.com/mroeschke)

URL: #22342
…22384)

The `cudf-polars-ir-signatures` pre-commit hook uses `language: python` but is just a local script (`./ci/check_cudf_polars_ir.py`) that only depends on stdlib modules (`ast`, `argparse`, `sys`, `typing`) and has a `#!/usr/bin/env python3` shebang.

With `language: python`, pre-commit unnecessarily creates a virtualenv for this hook. `language: script` is the correct setting — it runs the entry point directly as an executable, relying on the shebang for interpreter selection, with no virtualenv overhead.

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #22384
This PR fixes a potential infinite loop in the parquet page header count/decode kernels in case of malformed input.

Authors:
  - Muhammad Haseeb (https://github.com/mhaseeb123)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Paul Mattione (https://github.com/pmattione-nvidia)

URL: #22274
…#22281)

closes #21466
closes #21767

Waiting for #22212

* Makes rapidsmpf a required dependency of cudf_polars
* Removes the following `StreamingExecutor` options, which were "experimental", together with their associated code paths:
    * `StreamingExecutor.runtime`
    * `StreamingExecutor.shuffle_method`
    * `StreamingExecutor.unique_fraction`
    * `StreamingExecutor.groupby_n_ary`
    * `StreamingExecutor.rapidsmpf_spill`
* Removes the task runtime and associated tests
* Some tests were modified to run only one specific configuration because of #22346, so that they pass for now. This will be revisited once rapidsmpf becomes the default.

Ops-Bot-Merge-Barrier: true

Authors:
  - Matthew Roeschke (https://github.com/mroeschke)

Approvers:
  - Mads R. B. Kristensen (https://github.com/madsbk)
  - Bradley Dice (https://github.com/bdice)
  - Matthew Murray (https://github.com/Matt711)
  - Lawrence Mitchell (https://github.com/wence-)

URL: #22281
This PR uses the host worker pool to submit hybrid scan's host-read IO tasks so that the mutex can be safely released after submission.
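The general pattern, holding a lock only while tasks are submitted and waiting for results after it is released, might look like this in Python. This is a sketch of the idea only: the actual change is in libcudf's C++ code, and `read_ranges` and `_read` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def _read(source, offset, size):
    """Stand-in for a host read of one byte range."""
    return source[offset:offset + size]


def read_ranges(pool, lock, source, ranges):
    """Submit one read task per (offset, size) pair and gather the results."""
    with lock:
        # Only submission happens under the lock; the reads themselves
        # run on the pool's worker threads.
        futures = [pool.submit(_read, source, off, size) for off, size in ranges]
    # The lock is already released here, so waiting cannot block other users.
    return [f.result() for f in futures]
```

For example, `read_ranges(pool, threading.Lock(), b"abcdefgh", [(0, 2), (4, 3)])` with a `ThreadPoolExecutor` returns the two requested slices in submission order.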

Authors:
  - Muhammad Haseeb (https://github.com/mhaseeb123)

Approvers:
  - Tianyu Liu (https://github.com/kingcrimsontianyu)
  - Shruti Shivakumar (https://github.com/shrshi)

URL: #21992
Follow-up to #22144

Adds Python bindings for the `cudf::apply_deletion_mask` API and adds pytests for stream compaction.

Authors:
  - Muhammad Haseeb (https://github.com/mhaseeb123)
  - Matthew Murray (https://github.com/Matt711)

Approvers:
  - Matthew Roeschke (https://github.com/mroeschke)
  - Bradley Dice (https://github.com/bdice)
  - Matthew Murray (https://github.com/Matt711)

URL: #22145
@AyodeAwe AyodeAwe requested review from a team as code owners May 7, 2026 10:58
@AyodeAwe AyodeAwe requested review from KyleFromNVIDIA and rjzamora and removed request for a team May 7, 2026 10:58
@AyodeAwe
Contributor Author

AyodeAwe commented May 7, 2026

FAILURE - Unable to forward-merge automatically, manual merge is necessary.

cc @Matt711 @galipremsagar @mroeschke

Do not use the Resolve conflicts option in this PR. Follow these instructions: https://docs.rapids.ai/maintainers/forward-merger/

IMPORTANT: When merging this PR, do not use the auto-merger (i.e. the /merge comment). Instead, an admin must manually merge by changing the merging strategy to Create a Merge Commit. Otherwise, history will be lost and the branches become incompatible.

@AyodeAwe AyodeAwe requested review from pmattione-nvidia and wence- and removed request for a team May 7, 2026 10:58
@github-actions github-actions Bot added the libcudf, Python, CMake, and Java labels May 7, 2026
@github-actions github-actions Bot added the cudf-polars and pylibcudf labels May 7, 2026
@GPUtester GPUtester moved this to In Progress in cuDF Python May 7, 2026
rjzamora and others added 2 commits May 7, 2026 13:52
- Follow up to #22315 - Further revises `sort_actor` in preparation for rapidsai/rapidsmpf#853
- Part of #22128
- Breaks apart `sort_actor` logic into modular steps, so we can avoid collecting boundaries when we already know the boundaries (future work).
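To illustrate why splitting the sort into steps helps, here is a toy, single-process Python sketch in which the boundary-collection step is skipped when boundaries are supplied. It is not the `sort_actor` code: all function names are hypothetical, and plain lists stand in for distributed chunks.

```python
import bisect


def collect_boundaries(chunks, n_parts):
    """Step 1 (optional): derive n_parts - 1 split points from the data."""
    values = sorted(v for chunk in chunks for v in chunk)
    if not values:
        return []
    return [values[(i * len(values)) // n_parts] for i in range(1, n_parts)]


def partition(chunks, boundaries):
    """Step 2: route every value to the output partition it belongs to."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for chunk in chunks:
        for v in chunk:
            parts[bisect.bisect_right(boundaries, v)].append(v)
    return parts


def sort_chunks(chunks, n_parts, boundaries=None):
    """Run the steps, collecting boundaries only when none are given."""
    if boundaries is None:
        boundaries = collect_boundaries(chunks, n_parts)
    # Step 3: sort each partition locally.
    return [sorted(part) for part in partition(chunks, boundaries)]
```

Calling `sort_chunks(chunks, 2, boundaries=[3])` skips the first step entirely, which is the saving the modular structure is meant to allow.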

Authors:
  - Richard (Rick) Zamora (https://github.com/rjzamora)

Approvers:
  - Matthew Murray (https://github.com/Matt711)
  - Matthew Roeschke (https://github.com/mroeschke)

URL: #22350
…22381)

Builds on the cached `streaming_engines` fixture from #22364, which amortizes SPMD bootstrap via `_reset()`, and extends the same pattern to Dask and Ray.

With this change, the test matrix runs against:

`["in-memory", "spmd", "spmd-small", "dask", "ray"]`

subject to package availability and `rrun` gating.

We might change the different setups later, but for now CI runs:

| Engine       | Block Size(s)         | GPU Configuration |
|--------------|-----------------------|-------------------|
| `SPMDEngine` | `"medium"`, `"small"` | Single GPU        |
| `DaskEngine` | `"medium"`            | Single GPU        |
| `RayEngine`  | `"medium"`            | Two GPUs          |
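The caching idea behind the fixture, paying the bootstrap cost once per engine and calling `_reset()` on reuse, can be sketched without any test framework as follows. `StubEngine`, `get_engine`, and the cache dictionary are hypothetical stand-ins; only the engine identifiers and the `_reset()` name come from the description above.

```python
ENGINE_IDS = ["in-memory", "spmd", "spmd-small", "dask", "ray"]


class StubEngine:
    """Stand-in for an engine whose construction is expensive."""

    def __init__(self, name):
        self.name = name
        self.state = {}
        self.resets = 0

    def _reset(self):
        # Drop per-test state but keep the (expensive) engine alive.
        self.state.clear()
        self.resets += 1


_ENGINE_CACHE = {}


def get_engine(name):
    """Return a cached engine, resetting it if it was used before."""
    if name not in ENGINE_IDS:
        raise ValueError(f"unknown engine id: {name!r}")
    engine = _ENGINE_CACHE.get(name)
    if engine is None:
        engine = StubEngine(name)  # bootstrap happens only once per id
        _ENGINE_CACHE[name] = engine
    else:
        engine._reset()
    return engine
```

In a real test suite this lookup would sit inside a parametrized fixture, with each identifier skipped when its package or launcher is unavailable.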

Authors:
  - Mads R. B. Kristensen (https://github.com/madsbk)
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Matthew Murray (https://github.com/Matt711)
  - Bradley Dice (https://github.com/bdice)
  - Peter Andreas Entschev (https://github.com/pentschev)
  - Matthew Roeschke (https://github.com/mroeschke)

URL: #22381
@galipremsagar galipremsagar merged commit c93cf14 into pandas3 May 7, 2026
193 of 198 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to Done in cuDF Python May 7, 2026

Labels

CMake, cudf-polars, Java, libcudf, pylibcudf, Python

Projects

Status: Done
