restore conda-python-tests on CUDA 13 #395

Merged
rapids-bot[bot] merged 17 commits into rapidsai:main from jameslamb:cuda13 on Feb 3, 2026

@jameslamb
Member

@jameslamb jameslamb commented Jan 30, 2026

Closes #296

Restores CUDA 13 conda test CI jobs, now that there are conda-forge PyTorch packages with CUDA 13 support (conda-forge/pytorch-cpu-feedstock#477)

Also modifies pytorch conda dependency to meet these requirements:

  • cugraph-pyg must be installable on a system without a GPU
  • cugraph-pyg's tests require CUDA-enabled builds of PyTorch

With the following mix of things:

  • add a require_gpu matrix filter in dependencies.yaml that pulls in pytorch-gpu only where it is opted into (the test CI jobs) and nowhere else
    • conda-forge::pytorch-gpu is a metapackage that forces the installation of CUDA variants of conda-forge::pytorch... that should replace the "accidentally pulled in a CPU-only variant" case with a loud, clear conda solver error
  • depend on mkl in the test x86_64 environment but without version constraints
    • allow pytorch to declare its range of compatible mkl versions
    • this still prevents OpenBLAS variants from getting installed, which I think was part of the goal of [FIX] Add mkl version, limit tensordict to 0.6.2 #161
    • keeping this out of cugraph-pyg's dependencies still makes it possible to install alongside nomkl, even though that combination is untested
  • add comments in the cugraph-pyg conda recipe explaining why it doesn't depend on pytorch-gpu

I hope this will be a relatively future-proof way to guarantee that CI here keeps picking up the PyTorch versions this project wants to test against.

@jameslamb jameslamb added the improvement (Improves an existing functionality) and non-breaking (Introduces a non-breaking change) labels Jan 30, 2026
@copy-pr-bot

copy-pr-bot Bot commented Jan 30, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Comment thread conda/recipes/cugraph-pyg/recipe.yaml Outdated
- pylibwholegraph =${{ minor_version }}
- python
- pytorch >=2.3
- pytorch-gpu >=2.3
Member Author

Starting a thread here.

I'd tried this because the first CI run pulled in CPU-only packages, which failed like this:

libtorch                                 2.5.1  cpu_mkl_h791ef64_107          conda-forge              53MB
...
E           AttributeError: module 'torch._C' has no attribute '_cuda_setDevice'
...
==== 385 failed, 1 passed, 468 skipped, 1 warning in 193.01s (0:03:13) ===

(build link)

Recall that we discussed a while ago not wanting to depend on pytorch-gpu: #99 (comment)

The reason is that pytorch-gpu requires the __cuda virtual package in the install environment, which won't be present in devcontainers or generally when people build container images (just 2 examples; there are others).

At a minimum we'll probably want to constrain pytorch to GPU builds here in CI. I think it would also be helpful to raise a more informative error in this case, as I recommended back in #99 (comment)

I can put up an issue or PR about that.
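
As a minimal sketch of what such a more informative error could look like: the check below relies on `torch.version.cuda`, which is `None` in CPU-only PyTorch builds. The helper name `require_cuda_torch` and the message wording are hypothetical and are not something cugraph-pyg ships today; the stand-in objects exist only so the sketch runs without PyTorch installed.

```python
from types import SimpleNamespace


def require_cuda_torch(torch_module):
    """Raise a clear ImportError if `torch_module` is a CPU-only build.

    Hypothetical helper for illustration; not part of cugraph-pyg.
    """
    # torch.version.cuda is a version string for CUDA builds and None for CPU-only builds.
    cuda_version = getattr(getattr(torch_module, "version", None), "cuda", None)
    if cuda_version is None:
        raise ImportError(
            "cugraph-pyg requires a CUDA-enabled build of PyTorch, but the installed "
            "'torch' was built without CUDA support. With conda, installing "
            "'pytorch-gpu' forces a CUDA variant."
        )
    return cuda_version


# Stand-ins for the real `torch` module (assumption: only `.version.cuda` is inspected).
cpu_torch = SimpleNamespace(version=SimpleNamespace(cuda=None))
gpu_torch = SimpleNamespace(version=SimpleNamespace(cuda="13.0"))
```

In real use the argument would be the imported `torch` module itself, so a CPU-only install would fail at import time with this message rather than with the `_cuda_setDevice` AttributeError shown above.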

Member Author

The conda devcontainers failed in exactly the way described in those comments, as expected.

Could not solve for environment specs
The following package could not be installed
└─ pytorch-gpu >=2.10 * is not installable because it requires
   └─ pytorch ==2.10.0 cuda*_generic*200, which requires
      └─ __cuda =* *, which is missing on the system.

(build link)

That's ok, I knew explicitly adding pytorch-gpu here wasn't going to be exactly what we wanted. Just confirming that that's still the behavior.

Comment thread dependencies.yaml
common:
- output_types: conda
packages:
- mkl<2024.1.0
Member Author

This had been added back in #161 to fix another "getting CPU PyTorch build when we don't want that" type of problem, but it's causing issues for PyTorch 2.10 (the first version on conda-forge to support CUDA 13):

error    libmamba Could not solve for environment specs
    The following packages are incompatible
    ├─ cuda-version =13.1 * is installable and it requires
    │  └─ cudatoolkit ==13.1|=13.1 *, which can be installed;
    ├─ cugraph-pyg =26.4,>=0.0.0a0 * is installable with the potential options
    │  ├─ cugraph-pyg 26.04.00a10 would require
    │  │  └─ pytorch-gpu >=2.3 * with the potential options
    │  │     ├─ pytorch-gpu [2.10.0|2.7.1|2.8.0|2.9.1] would require
    │  │     │  └─ pytorch [==2.10.0 cuda*_generic*200|==2.10.0 cuda*_mkl*300|...|==2.9.1 cuda*_mkl*303], which requires
    │  │     │     └─ cuda-version >=12.9,<13 *, which conflicts with any installable versions previously reported;
    │  │     ├─ pytorch-gpu 2.10.0 would require
    │  │     │  └─ pytorch ==2.10.0 cuda*_generic*201 with the potential options
    │  │     │     ├─ pytorch [2.10.0|2.7.1|2.8.0|2.9.1], which cannot be installed (as previously explained);
    │  │     │     └─ pytorch 2.10.0 would require
    │  │     │        └─ nomkl =* *, which requires
    │  │     │           └─ mkl <0.a0 *, which can be installed;
    │  │     ├─ pytorch-gpu 2.10.0 would require
    │  │     │  └─ pytorch ==2.10.0 cuda*_mkl*301 with the potential options
    │  │     │     ├─ pytorch [2.10.0|2.7.1|2.8.0|2.9.1], which cannot be installed (as previously explained);
    │  │     │     └─ pytorch 2.10.0 would require
    │  │     │        └─ mkl >=2025.3.0,<2026.0a0 *, which can be installed;
    │  │     ├─ pytorch-gpu [2.3.0|2.3.1|2.4.0|2.4.1|2.5.1] would require
    │  │     │  └─ pytorch [==2.3.0 cuda118_py311h4ee7bbc_301|==2.3.0 cuda118_py39hd44be3b_300|...|==2.5.1 cuda118_py313h0a01257_300], which requires
    │  │     │     └─ cudatoolkit [>=11.8,<12 *|>=11.8.0,<12.0a0 *], which conflicts with any installable versions previously reported;
    │  │     ├─ pytorch-gpu [2.3.0|2.3.1|2.4.0|2.4.1|2.5.1] would require
    │  │     │  └─ pytorch [==2.3.0 cuda120_py312h26b3cf7_301|==2.3.0 cuda120_py38heb61fd4_300|...|==2.5.1 cuda120_py312h6defd05_300], which requires
    │  │     │     └─ cuda-version >=12.0,<13 *, which conflicts with any installable versions previously reported;
    │  │     ├─ pytorch-gpu [2.4.1|2.5.1] would require
    │  │     │  └─ pytorch [==2.4.1 cuda*_mkl*306|==2.5.1 cuda*302|==2.5.1 cuda*303] but there are no viable options
    │  │     │     ├─ pytorch [2.3.0|2.3.1|2.4.0|2.4.1|2.5.1], which cannot be installed (as previously explained);
    │  │     │     ├─ pytorch [2.3.0|2.3.1|2.4.0|2.4.1|2.5.1], which cannot be installed (as previously explained);
    │  │     │     └─ pytorch [2.5.1|2.6.0|2.7.0|2.7.1] would require
    │  │     │        └─ cuda-version >=12.6,<13 *, which conflicts with any installable versions previously reported;
    │  │     └─ pytorch-gpu [2.5.1|2.6.0|2.7.0|2.7.1] would require
    │  │        └─ pytorch [==2.5.1 cuda*304|==2.5.1 cuda*305|...|==2.7.1 cuda*_mkl*300], which cannot be installed (as previously explained);
    │  └─ cugraph-pyg [26.04.00a3|26.04.00a4|26.04.00a5|26.04.00a6] conflicts with any installable versions previously reported;
    └─ mkl <2024.1.0 * is not installable because it conflicts with any installable versions previously reported.

(build link)

Let's see what happens when it's removed.

Member Author

@jameslamb jameslamb Jan 30, 2026

Loosening this up did allow the GPU-enabled pytorch to be pulled in and all tests to pass on CUDA 13.

  + cuda-version                              13.1  h2ff5cdb_3                      conda-forge              22kB
...
  + libtorch                                2.10.0  cuda130_mkl_hfedd1fc_301        conda-forge             491MB
...
  + mkl                                   2025.3.0  h0e700b2_463                    conda-forge             126MB
...
  + pytorch                                 2.10.0  cuda130_mkl_py313_h06a2bf6_301  conda-forge              28MB
  + pytorch-gpu                             2.10.0  cuda129_mkl_h0d04637_301        conda-forge              52kB
  + pytorch_geometric                        2.6.1  pyhecae5ae_2                    conda-forge             598kB
...
  + torchdata                               0.11.0  py313h9d3d25e_0                 conda-forge             151kB
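
The build strings in these logs carry the information that matters here: whether a package is a CPU or CUDA variant, and whether it was built against mkl or the `generic` (nomkl) BLAS. A rough sketch of pulling that out of a build string follows. The function name is hypothetical, and the pattern is an assumption based only on the conda-forge `pytorch` / `libtorch` build strings that appear in this thread; other packages or older builds may not follow it.

```python
import re


def classify_build(build_string):
    """Split a pytorch/libtorch build string into (device, blas).

    Returns None if the string does not start with 'cpu_' or 'cuda<digits>_'.
    `blas` is None when the build string does not name mkl or generic.
    """
    match = re.match(r"^(cpu|cuda\d+)_(?:(mkl|generic)_)?", build_string)
    if match is None:
        return None
    return match.group(1), match.group(2)


# Build strings copied from the logs in this thread.
print(classify_build("cpu_mkl_h791ef64_107"))
print(classify_build("cuda130_mkl_hfedd1fc_301"))
print(classify_build("cuda130_generic_py312_hbdc3359_201"))
```

A CI job could use a check like this on the solved environment to fail early when the device part is `cpu`, instead of waiting for hundreds of test failures.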

@jameslamb
Member Author

/ok to test

Comment thread dependencies.yaml
- matrix:
require_gpu: "true"
packages:
- pytorch-gpu
Member Author

Intentionally not adding a pin here, so this only acts to guarantee a CUDA variant of pytorch is installed.

Limiting the range of pytorch versions is handled by the run: dependencies of cugraph-pyg.
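
To make the opt-in behavior concrete, here is a simplified model of how a matrix filter like this could select packages. It is a sketch under assumptions, not the actual implementation of the tool that reads dependencies.yaml: the function `select_packages`, the "first matching entry wins" rule, and the empty fallback entry are all hypothetical illustrations.

```python
def select_packages(matrices, requested):
    """Return packages from the first entry whose matrix is a subset of `requested`.

    Simplified, hypothetical model of matrix filtering. An entry whose
    matrix is None matches everything and so acts as a fallback.
    """
    for entry in matrices:
        matrix = entry["matrix"] or {}
        if all(requested.get(key) == value for key, value in matrix.items()):
            return entry["packages"]
    return []


# Mirrors the shape of the entry under discussion; the fallback entry is an assumption.
matrices = [
    {"matrix": {"require_gpu": "true"}, "packages": ["pytorch-gpu"]},
    {"matrix": None, "packages": []},
]

# A test CI job opts in; anything else gets no extra packages.
print(select_packages(matrices, {"cuda": "13.1", "require_gpu": "true"}))
print(select_packages(matrices, {"cuda": "13.1"}))
```

Because the entry carries no version pin, the only effect of opting in is that the solver must pick a CUDA variant of pytorch; the allowed version range still comes from cugraph-pyg itself.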

@jameslamb jameslamb changed the title from "WIP: restore conda-python-tests on CUDA 13" to "restore conda-python-tests on CUDA 13" Feb 1, 2026
@jameslamb jameslamb requested a review from jakirkham February 1, 2026 04:59
@jameslamb jameslamb marked this pull request as ready for review February 1, 2026 04:59
@jameslamb jameslamb requested review from a team as code owners February 1, 2026 04:59
@greptile-apps

This comment was marked as spam.

Contributor

@greptile-apps greptile-apps Bot left a comment

5 files reviewed, no comments

Contributor

@greptile-apps greptile-apps Bot left a comment

4 files reviewed, no comments

Comment thread dependencies.yaml Outdated
Comment on lines +354 to +355
# prefer 'mkl' to 'openblas' on x86_64... this helps constrain
# compatible pytorch and other libraries together
Contributor

I saw the comment below about removing mkl upper bounds. What happens if mkl is removed completely?

Member Author

@jameslamb jameslamb Feb 2, 2026

The environment happens to solve to the mkl variants anyway and CI passes here.

I say "happens to" because nothing in this project is directly constraining that... some change in some dependency tomorrow could lead to a solve with the nomkl / OpenBLAS line of packages instead.

And I don't know whether it'd be acceptable for nomkl / OpenBLAS sets of packages to get pulled in here, as the changes from #161 mean that hasn't been tested in CI here for at least 10 months (and I think never directly tested).

Contributor

I'm not 100% sure but I think we want to accept the nomkl variants here too. Maybe we can do one ad-hoc CI run here with that constraint to check, and then relax it. It's okay to default to the mkl variants, but I don't think we should have a strong preference between them.

Member Author

Ok yeah happy to try that, let's see what happens: 6a94b62

Member Author

@jameslamb jameslamb Feb 2, 2026

It looks to me like cugraph-pyg and pytorch-gpu are installable with nomkl for CUDA 12.9, 13.0 and 13.1 and that the tests here pass.

For example, on CUDA 13.0.2 (x86_64) I see the following for cugraph-pyg:

  + libtorch                                2.10.0  cuda130_generic_hdd464c9_201        conda-forge             491MB
...
  + nomkl                                      1.0  h5ca1d4c_0                          conda-forge               4kB
...
  + pytorch                                 2.10.0  cuda130_generic_py312_hbdc3359_201  conda-forge              27MB
...
=== 139 passed, 1 skipped, 19 warnings in 286.58s (0:04:46) ===

(build link)

In CUDA 12.2.2 x86_64 environments, the solver isn't able to find a compatible mix of packages:

Full solver output:
error    libmamba Could not solve for environment specs
    The following packages are incompatible
    ├─ cuda-version =12.2 * is requested and can be installed;
    ├─ nomkl =* * is installable and it requires
    │  └─ mkl <0.a0 *, which can be installed;
    └─ pytorch-gpu =* * is not installable because there are no viable options
       ├─ pytorch-gpu [1.13.0|1.13.1|2.0.0|2.1.0] would require
       │  └─ pytorch [==1.13.0 cuda112py310he33e0d6_200|==1.13.0 cuda112py311h13fee9e_200|...|==2.1.0 cuda120py39hf872c3d_301], which requires
       │     └─ mkl >=2022.2.1,<2023.0a0 *, which conflicts with any installable versions previously reported;
       ├─ pytorch-gpu [2.1.0|2.1.2|...|2.5.1] would require
       │  └─ pytorch [==2.1.0 cuda112_py310hce1e03f_302|==2.1.0 cuda112_py39h53f755c_303|...|==2.5.1 cuda126_py313ha14af55_301], which requires
       │     └─ mkl >=2023.2.0,<2024.0a0 *, which conflicts with any installable versions previously reported;
       ├─ pytorch-gpu 2.1.0 would require
       │  └─ pytorch ==2.1.0 cuda120py310ha3a684c_300, which does not exist (perhaps a missing channel);
       ├─ pytorch-gpu 2.1.0 would require
       │  └─ pytorch ==2.1.0 cuda120py311h513d03c_300, which does not exist (perhaps a missing channel);
       ├─ pytorch-gpu 2.1.0 would require
       │  └─ pytorch ==2.1.0 cuda120py312hfe5e8c6_300, which does not exist (perhaps a missing channel);
       ├─ pytorch-gpu 2.1.0 would require
       │  └─ pytorch ==2.1.0 cuda120py38h1932296_300, which does not exist (perhaps a missing channel);
       ├─ pytorch-gpu 2.1.0 would require
       │  └─ pytorch ==2.1.0 cuda120py39hf872c3d_300, which does not exist (perhaps a missing channel);
       ├─ pytorch-gpu [2.10.0|2.7.1|2.8.0|2.9.1] would require
       │  └─ pytorch [==2.10.0 cuda*_generic*200|==2.7.1 cuda*_generic*201|...|==2.9.1 cuda*_generic*204], which requires
       │     └─ cuda-version >=12.9,<13 *, which conflicts with any installable versions previously reported;
       ├─ pytorch-gpu 2.10.0 would require
       │  └─ pytorch ==2.10.0 cuda*_generic*201 but there are no viable options
       │     ├─ pytorch [2.10.0|2.7.1|2.8.0|2.9.1], which cannot be installed (as previously explained);
       │     └─ pytorch 2.10.0 would require
       │        └─ cuda-version >=13.0,<14 *, which conflicts with any installable versions previously reported;
       ├─ pytorch-gpu [2.10.0|2.8.0|2.9.1] would require
       │  └─ pytorch [==2.10.0 cuda*_mkl*300|==2.10.0 cuda*_mkl*301|...|==2.9.1 cuda*_mkl*304], which requires
       │     └─ mkl >=2025.3.0,<2026.0a0 *, which conflicts with any installable versions previously reported;
       ├─ pytorch-gpu [2.4.1|2.5.1|...|2.8.0] would require
       │  └─ pytorch [==2.4.1 cuda*_mkl*306|==2.4.1 cuda118_*_305|...|==2.8.0 cuda*_mkl*301], which requires
       │     └─ mkl >=2024.2.2,<2025.0a0 *, which conflicts with any installable versions previously reported;
       ├─ pytorch-gpu [2.5.1|2.6.0|2.7.0|2.7.1] would require
       │  └─ pytorch [==2.5.1 cuda*_generic*207|==2.5.1 cuda*_generic*208|...|==2.7.1 cuda*_generic*200], which requires
       │     └─ cuda-version >=12.6,<13 *, which conflicts with any installable versions previously reported;
       ├─ pytorch-gpu 1.10.0 would require
       │  └─ pytorch [==1.10.0 cuda102py37h689c94d_1|==1.10.0 cuda102py37h98b7ee3_0|...|==1.10.0 cuda112py39h4e14dd4_0], which requires
       │     └─ mkl >=2021.4.0,<2022.0a0 *, which conflicts with any installable versions previously reported;
       ├─ pytorch-gpu [1.10.1|1.10.2|1.11.0|1.12.0] would require
       │  └─ pytorch [==1.10.1 cuda102py37hc804c4d_0|==1.10.1 cuda102py38h9fb240c_0|...|==1.12.0 cuda112py39ha0cca9b_200], which requires
       │     └─ mkl [=2022 *|>=2022.0.1,<2023.0a0 *], which conflicts with any installable versions previously reported;
       ├─ pytorch-gpu 1.10.2 would require
       │  └─ pytorch ==1.10.2 cuda102py37hc804c4d_0, which does not exist (perhaps a missing channel);
       ├─ pytorch-gpu 1.10.2 would require
       │  └─ pytorch ==1.10.2 cuda102py38h9fb240c_0, which does not exist (perhaps a missing channel);
       ├─ pytorch-gpu 1.10.2 would require
       │  └─ pytorch ==1.10.2 cuda102py39hfe0cb5b_0, which does not exist (perhaps a missing channel);
       ├─ pytorch-gpu 1.10.2 would require
       │  └─ pytorch ==1.10.2 cuda110py37h4121e64_0, which does not exist (perhaps a missing channel);
       ├─ pytorch-gpu 1.10.2 would require
       │  └─ pytorch ==1.10.2 cuda110py38hf0a79ac_0, which does not exist (perhaps a missing channel);
       ├─ pytorch-gpu 1.10.2 would require
       │  └─ pytorch ==1.10.2 cuda110py39he47eb21_0, which does not exist (perhaps a missing channel);
       ├─ pytorch-gpu 1.10.2 would require
       │  └─ pytorch ==1.10.2 cuda111py37hc0ce48b_0, which does not exist (perhaps a missing channel);
       ├─ pytorch-gpu 1.10.2 would require
       │  └─ pytorch ==1.10.2 cuda111py38hc64aeea_0, which does not exist (perhaps a missing channel);
       ├─ pytorch-gpu 1.10.2 would require
       │  └─ pytorch ==1.10.2 cuda111py39h930882a_0, which does not exist (perhaps a missing channel);
       ├─ pytorch-gpu 1.10.2 would require
       │  └─ pytorch ==1.10.2 cuda112py37hc1ee5ce_0, which does not exist (perhaps a missing channel);
       ├─ pytorch-gpu 1.10.2 would require
       │  └─ pytorch ==1.10.2 cuda112py38h6425f36_0, which does not exist (perhaps a missing channel);
       ├─ pytorch-gpu 1.10.2 would require
       │  └─ pytorch ==1.10.2 cuda112py39h4de5995_0, which does not exist (perhaps a missing channel);
       ├─ pytorch-gpu [1.12.0|1.12.1] would require
       │  └─ pytorch [==1.12.0 cuda102py310hdf4a2db_202|==1.12.0 cuda102py37haad9b4f_202|...|==1.12.1 cuda112py39hb0b7ed5_201], which requires
       │     └─ mkl >=2022.1.0,<2023.0a0 *, which conflicts with any installable versions previously reported;
       ├─ pytorch-gpu [1.6.0|1.7.1|1.8.0] would require
       │  └─ pytorch [==1.6.0 cuda100py36hd82b6f9_1|==1.6.0 cuda100py37h50b9e00_1|...|==1.8.0 cuda112py39h716d6ff_1], which requires
       │     └─ mkl >=2020.4,<2021.0a0 *, which conflicts with any installable versions previously reported;
       └─ pytorch-gpu [1.9.0|1.9.1] would require
          └─ pytorch [==1.9.0 cuda102py36he3537ca_1|==1.9.0 cuda102py37h92fd811_1|...|==1.9.1 cuda112py39h4e14dd4_3], which requires
             └─ mkl >=2021.3.0,<2022.0a0 *, which conflicts with any installable versions previously reported.
critical libmamba Could not solve for environment specs

(build link)

But that's fine. I think the tests passing in environments with nomkl on CUDA 12.9, 13.0, and 13.1 is enough justification to remove this mkl constraint entirely here and let either family of packages be chosen. It seems cugraph-pyg isn't sensitive to that choice (at least not in any way observable in its unit tests).

Pushed 727d89e, now there won't be any mkl / nomkl constraints explicitly in this project at all.

Comment thread dependencies.yaml Outdated
Contributor

@greptile-apps greptile-apps Bot left a comment

4 files reviewed, no comments

Co-authored-by: Bradley Dice <bdice@bradleydice.com>
Contributor

@greptile-apps greptile-apps Bot left a comment

5 files reviewed, no comments

Contributor

@greptile-apps greptile-apps Bot left a comment

3 files reviewed, 1 comment

Comment thread conda/recipes/cugraph-pyg/recipe.yaml Outdated
# because we want it to be possible to at least install `cugraph-pyg` in an environment without a GPU,
# to support use cases like building container images.
- pytorch >=2.3
- pytorch_geometric >=2.5,<2.7
Contributor

Version constraint mismatch: dependencies.yaml:417 specifies pytorch_geometric>=2.5,<2.8 but this file uses <2.7

Suggested change
-  - pytorch_geometric >=2.5,<2.7
+  - pytorch_geometric >=2.5,<2.8

Member Author

You're right, looks like we missed that in #360

Updated in e67bef1
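
Mismatches like this one are easy to miss by eye. A small check along the following lines could catch them, as a rough sketch: it compares constraint strings textually (it does not understand version semantics), and the function names and the two input lists are hypothetical stand-ins for the relevant lines of recipe.yaml and dependencies.yaml.

```python
import re


def parse_spec(line):
    """Parse a line such as '- pytorch_geometric >=2.5,<2.7' into (name, constraints)."""
    text = line.strip().lstrip("-").strip()
    match = re.match(r"^([A-Za-z0-9_.\-]+)\s*(.*)$", text)
    if match is None:
        raise ValueError(f"cannot parse dependency line: {line!r}")
    name, constraint = match.group(1), match.group(2)
    parts = frozenset(part.strip() for part in constraint.split(",") if part.strip())
    return name, parts


def find_mismatches(lines_a, lines_b):
    """Return sorted names that appear in both lists with different constraints."""
    specs_a = dict(parse_spec(line) for line in lines_a)
    specs_b = dict(parse_spec(line) for line in lines_b)
    shared = specs_a.keys() & specs_b.keys()
    return sorted(name for name in shared if specs_a[name] != specs_b[name])


# Illustrative inputs based on the lines quoted in this thread.
recipe_lines = ["- pytorch >=2.3", "- pytorch_geometric >=2.5,<2.7"]
dependencies_lines = ["- pytorch>=2.3", "- pytorch_geometric>=2.5,<2.8"]
print(find_mismatches(recipe_lines, dependencies_lines))
```

Comparing sets of constraint fragments means whitespace and ordering differences are ignored, while a changed upper bound such as `<2.7` versus `<2.8` is reported.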

Contributor

@greptile-apps greptile-apps Bot left a comment

3 files reviewed, no comments

@jameslamb jameslamb requested a review from bdice February 2, 2026 17:38
Member

@alexbarghi-nv alexbarghi-nv left a comment

Approved, thanks for getting this done!

@jameslamb
Member Author

/merge

@rapids-bot rapids-bot Bot merged commit b578a28 into rapidsai:main Feb 3, 2026
77 checks passed
@jameslamb jameslamb deleted the cuda13 branch February 3, 2026 15:47

Labels

improvement Improves an existing functionality non-breaking Introduces a non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CI: add CUDA 13 conda python tests

3 participants