restore conda-python-tests on CUDA 13 #395
| - pylibwholegraph =${{ minor_version }} | ||
| - python | ||
| - pytorch >=2.3 | ||
| - pytorch-gpu >=2.3 |
Starting a thread here.
I'd tried this because the first CI run pulled in CPU-only packages, which failed like this:
libtorch 2.5.1 cpu_mkl_h791ef64_107 conda-forge 53MB
...
E AttributeError: module 'torch._C' has no attribute '_cuda_setDevice'
...
==== 385 failed, 1 passed, 468 skipped, 1 warning in 193.01s (0:03:13) ===
Recall that we discussed a while ago not wanting to depend on pytorch-gpu (#99 (comment)), because it requires the __cuda virtual package in the install environment. That won't be present in devcontainers or generally when people build container images (just 2 examples, there are others).
At a minimum we'll probably want to constrain pytorch to GPU builds here in CI. I think it would also be helpful to raise a more informative error in this case, as I recommended back in #99 (comment)
I can put up an issue or PR about that.
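A minimal sketch of what such a more informative error could look like. `require_cuda_torch` is a hypothetical helper, not an existing cugraph-pyg function. It relies on `torch.version.cuda` being `None` in CPU-only PyTorch builds, and it takes the module as an argument so it can be exercised without PyTorch installed:

```python
from types import SimpleNamespace


def require_cuda_torch(torch_module):
    """Raise a clear error if `torch_module` looks like a CPU-only build.

    Hypothetical helper for illustration; not part of cugraph-pyg.
    """
    # CPU-only PyTorch builds report `torch.version.cuda` as None.
    cuda_version = getattr(getattr(torch_module, "version", None), "cuda", None)
    if cuda_version is None:
        raise ImportError(
            "cugraph-pyg requires a CUDA-enabled build of PyTorch, but the "
            "installed build appears to be CPU-only (torch.version.cuda is None). "
            "With conda, installing 'pytorch-gpu' forces a CUDA variant."
        )
    return cuda_version


# Stand-ins for CPU-only and CUDA-enabled torch modules (made-up objects).
cpu_torch = SimpleNamespace(version=SimpleNamespace(cuda=None))
gpu_torch = SimpleNamespace(version=SimpleNamespace(cuda="13.0"))

print(require_cuda_torch(gpu_torch))
try:
    require_cuda_torch(cpu_torch)
except ImportError as err:
    print(f"ImportError: {err}")
```

A check along these lines at import time would replace the `AttributeError` on `torch._C` with a message that names the actual cause.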
The conda devcontainers failed in exactly the way described in those comments, as expected.
Could not solve for environment specs
The following package could not be installed
└─ pytorch-gpu >=2.10 * is not installable because it requires
└─ pytorch ==2.10.0 cuda*_generic*200, which requires
└─ __cuda =* *, which is missing on the system.
That's ok, I knew explicitly adding pytorch-gpu here wasn't going to be exactly what we wanted. Just confirming that that's still the behavior.
| common: | ||
| - output_types: conda | ||
| packages: | ||
| - mkl<2024.1.0 |
This had been added back in #161 to fix another "getting CPU PyTorch build when we don't want that" type of problem, but it's causing issues for PyTorch 2.10 (the first version on conda-forge to support CUDA 13):
error libmamba Could not solve for environment specs
The following packages are incompatible
├─ cuda-version =13.1 * is installable and it requires
│ └─ cudatoolkit ==13.1|=13.1 *, which can be installed;
├─ cugraph-pyg =26.4,>=0.0.0a0 * is installable with the potential options
│ ├─ cugraph-pyg 26.04.00a10 would require
│ │ └─ pytorch-gpu >=2.3 * with the potential options
│ │ ├─ pytorch-gpu [2.10.0|2.7.1|2.8.0|2.9.1] would require
│ │ │ └─ pytorch [==2.10.0 cuda*_generic*200|==2.10.0 cuda*_mkl*300|...|==2.9.1 cuda*_mkl*303], which requires
│ │ │ └─ cuda-version >=12.9,<13 *, which conflicts with any installable versions previously reported;
│ │ ├─ pytorch-gpu 2.10.0 would require
│ │ │ └─ pytorch ==2.10.0 cuda*_generic*201 with the potential options
│ │ │ ├─ pytorch [2.10.0|2.7.1|2.8.0|2.9.1], which cannot be installed (as previously explained);
│ │ │ └─ pytorch 2.10.0 would require
│ │ │ └─ nomkl =* *, which requires
│ │ │ └─ mkl <0.a0 *, which can be installed;
│ │ ├─ pytorch-gpu 2.10.0 would require
│ │ │ └─ pytorch ==2.10.0 cuda*_mkl*301 with the potential options
│ │ │ ├─ pytorch [2.10.0|2.7.1|2.8.0|2.9.1], which cannot be installed (as previously explained);
│ │ │ └─ pytorch 2.10.0 would require
│ │ │ └─ mkl >=2025.3.0,<2026.0a0 *, which can be installed;
│ │ ├─ pytorch-gpu [2.3.0|2.3.1|2.4.0|2.4.1|2.5.1] would require
│ │ │ └─ pytorch [==2.3.0 cuda118_py311h4ee7bbc_301|==2.3.0 cuda118_py39hd44be3b_300|...|==2.5.1 cuda118_py313h0a01257_300], which requires
│ │ │ └─ cudatoolkit [>=11.8,<12 *|>=11.8.0,<12.0a0 *], which conflicts with any installable versions previously reported;
│ │ ├─ pytorch-gpu [2.3.0|2.3.1|2.4.0|2.4.1|2.5.1] would require
│ │ │ └─ pytorch [==2.3.0 cuda120_py312h26b3cf7_301|==2.3.0 cuda120_py38heb61fd4_300|...|==2.5.1 cuda120_py312h6defd05_300], which requires
│ │ │ └─ cuda-version >=12.0,<13 *, which conflicts with any installable versions previously reported;
│ │ ├─ pytorch-gpu [2.4.1|2.5.1] would require
│ │ │ └─ pytorch [==2.4.1 cuda*_mkl*306|==2.5.1 cuda*302|==2.5.1 cuda*303] but there are no viable options
│ │ │ ├─ pytorch [2.3.0|2.3.1|2.4.0|2.4.1|2.5.1], which cannot be installed (as previously explained);
│ │ │ ├─ pytorch [2.3.0|2.3.1|2.4.0|2.4.1|2.5.1], which cannot be installed (as previously explained);
│ │ │ └─ pytorch [2.5.1|2.6.0|2.7.0|2.7.1] would require
│ │ │ └─ cuda-version >=12.6,<13 *, which conflicts with any installable versions previously reported;
│ │ └─ pytorch-gpu [2.5.1|2.6.0|2.7.0|2.7.1] would require
│ │ └─ pytorch [==2.5.1 cuda*304|==2.5.1 cuda*305|...|==2.7.1 cuda*_mkl*300], which cannot be installed (as previously explained);
│ └─ cugraph-pyg [26.04.00a3|26.04.00a4|26.04.00a5|26.04.00a6] conflicts with any installable versions previously reported;
└─ mkl <2024.1.0 * is not installable because it conflicts with any installable versions previously reported.
Let's see what happens when it's removed.
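The core of the conflict above is that this project's `mkl<2024.1.0` pin cannot overlap with the `mkl >=2025.3.0` requirement of the CUDA 13 PyTorch builds. A rough sketch of that check follows. `satisfies` is a hypothetical helper that handles only plain numeric versions, so it is far simpler than conda's real version matching:

```python
import operator

# Comparison operators supported by this simplified matcher.
# Two-character operators come first so ">=" is not mistaken for ">".
_OPS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    (">", operator.gt),
    ("<", operator.lt),
]


def _as_tuple(version):
    """Turn '2025.3.0' into (2025, 3, 0); numeric components only."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, spec):
    """Return True if `version` meets every comma-separated bound in `spec`."""
    for bound in spec.split(","):
        bound = bound.strip()
        for symbol, compare in _OPS:
            if bound.startswith(symbol):
                if not compare(_as_tuple(version), _as_tuple(bound[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported bound: {bound!r}")
    return True


# The lowest mkl accepted by the cuda*_mkl*301 build versus the old pin here.
print(satisfies("2025.3.0", "<2024.1.0"))           # False
# Upper bound simplified from "<2026.0a0" because pre-releases are not handled.
print(satisfies("2025.3.0", ">=2025.3.0,<2026.0"))  # True
```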
Loosening this up did allow the GPU-enabled pytorch to be pulled in and all tests to pass on CUDA 13.
+ cuda-version 13.1 h2ff5cdb_3 conda-forge 22kB
...
+ libtorch 2.10.0 cuda130_mkl_hfedd1fc_301 conda-forge 491MB
...
+ mkl 2025.3.0 h0e700b2_463 conda-forge 126MB
...
+ pytorch 2.10.0 cuda130_mkl_py313_h06a2bf6_301 conda-forge 28MB
+ pytorch-gpu 2.10.0 cuda129_mkl_h0d04637_301 conda-forge 52kB
+ pytorch_geometric 2.6.1 pyhecae5ae_2 conda-forge 598kB
...
+ torchdata 0.11.0 py313h9d3d25e_0 conda-forge 151kB
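A check like the following could flag a CPU-only solve directly from the install log instead of waiting for hundreds of test failures. This is only a sketch: `find_cpu_only_torch` is a hypothetical helper, and it assumes the log line layout and the `cpu`/`cuda` build-string prefixes seen in the excerpts in this thread.

```python
def find_cpu_only_torch(log_text, packages=("libtorch", "pytorch")):
    """Return (name, version, build) for watched packages without a CUDA build."""
    cpu_only = []
    for line in log_text.splitlines():
        # Drop the leading "+" that the solver prints for newly installed packages.
        fields = line.strip().lstrip("+").split()
        if len(fields) < 3:
            continue  # skips "..." and blank lines
        name, version, build = fields[:3]
        if name in packages and not build.startswith("cuda"):
            cpu_only.append((name, version, build))
    return cpu_only


# Lines copied from the logs quoted in this thread.
failing_log = "libtorch 2.5.1 cpu_mkl_h791ef64_107 conda-forge 53MB"
passing_log = """\
+ libtorch 2.10.0 cuda130_mkl_hfedd1fc_301 conda-forge 491MB
...
+ pytorch 2.10.0 cuda130_mkl_py313_h06a2bf6_301 conda-forge 28MB
"""

print(find_cpu_only_torch(failing_log))
print(find_cpu_only_torch(passing_log))
```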
/ok to test
| - matrix: | ||
| require_gpu: "true" | ||
| packages: | ||
| - pytorch-gpu |
Intentionally not adding a pin here, so this only acts to guarantee a CUDA variant of pytorch is installed.
Limiting the range of pytorch versions is handled by the run: dependencies of cugraph-pyg.
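To illustrate the idea, here is a toy model of matrix-based package selection. It is not the actual logic of the tool that reads dependencies.yaml. `select_packages` and the entry layout are assumptions that loosely mirror the snippet above, with an empty matrix assumed as the fallback.

```python
def select_packages(entries, job_matrix):
    """Return packages from the first entry whose matrix is a subset of `job_matrix`."""
    for entry in entries:
        wanted = entry.get("matrix") or {}
        if all(job_matrix.get(key) == value for key, value in wanted.items()):
            return entry["packages"]
    return []


# Assumed structure: a GPU-only entry followed by an empty fallback.
entries = [
    {"matrix": {"require_gpu": "true"}, "packages": ["pytorch-gpu"]},
    {"matrix": None, "packages": []},
]

# Test CI jobs opt in; other environments (e.g. devcontainers) do not.
print(select_packages(entries, {"require_gpu": "true", "cuda": "13.1"}))
print(select_packages(entries, {"cuda": "13.1"}))
```

Because the selected entry carries no version pin, it only narrows the solve to CUDA variants and leaves the version range to cugraph-pyg's own run dependencies.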
| # prefer 'mkl' to 'openblas' on x86_64... this helps constrain | ||
| # compatible pytorch and other libraries together |
I saw the comment below about removing mkl upper bounds. What happens if mkl is removed completely?
The environment happens to solve to the mkl variants anyway and CI passes here.
I say "happens to" because nothing in this project is directly constraining that... some change in some dependency tomorrow could lead to a solve with the nomkl / OpenBLAS line of packages instead.
And I don't know whether it'd be acceptable for nomkl / OpenBLAS sets of packages to get pulled in here, as the changes from #161 mean that hasn't been tested in CI here for at least 10 months (and I think never directly tested).
I'm not 100% sure but I think we want to accept the nomkl variants here too. Maybe we can do one ad-hoc CI run here with that constraint to check, and then relax it. It's okay to default to the mkl variants, but I don't think we should have a strong preference between them.
Ok yeah happy to try that, let's see what happens: 6a94b62
It looks to me like cugraph-pyg and pytorch-gpu are installable with nomkl for CUDA 12.9, 13.0 and 13.1 and that the tests here pass.
For example, on CUDA 13.0.2 (x86_64) I see the following for cugraph-pyg:
+ libtorch 2.10.0 cuda130_generic_hdd464c9_201 conda-forge 491MB
...
+ nomkl 1.0 h5ca1d4c_0 conda-forge 4kB
...
+ pytorch 2.10.0 cuda130_generic_py312_hbdc3359_201 conda-forge 27MB
...
=== 139 passed, 1 skipped, 19 warnings in 286.58s (0:04:46) ===
In CUDA 12.2.2 x86_64 environments, the solver isn't able to find a compatible mix of packages:
Full solver output:
error libmamba Could not solve for environment specs
The following packages are incompatible
├─ cuda-version =12.2 * is requested and can be installed;
├─ nomkl =* * is installable and it requires
│ └─ mkl <0.a0 *, which can be installed;
└─ pytorch-gpu =* * is not installable because there are no viable options
├─ pytorch-gpu [1.13.0|1.13.1|2.0.0|2.1.0] would require
│ └─ pytorch [==1.13.0 cuda112py310he33e0d6_200|==1.13.0 cuda112py311h13fee9e_200|...|==2.1.0 cuda120py39hf872c3d_301], which requires
│ └─ mkl >=2022.2.1,<2023.0a0 *, which conflicts with any installable versions previously reported;
├─ pytorch-gpu [2.1.0|2.1.2|...|2.5.1] would require
│ └─ pytorch [==2.1.0 cuda112_py310hce1e03f_302|==2.1.0 cuda112_py39h53f755c_303|...|==2.5.1 cuda126_py313ha14af55_301], which requires
│ └─ mkl >=2023.2.0,<2024.0a0 *, which conflicts with any installable versions previously reported;
├─ pytorch-gpu 2.1.0 would require
│ └─ pytorch ==2.1.0 cuda120py310ha3a684c_300, which does not exist (perhaps a missing channel);
├─ pytorch-gpu 2.1.0 would require
│ └─ pytorch ==2.1.0 cuda120py311h513d03c_300, which does not exist (perhaps a missing channel);
├─ pytorch-gpu 2.1.0 would require
│ └─ pytorch ==2.1.0 cuda120py312hfe5e8c6_300, which does not exist (perhaps a missing channel);
├─ pytorch-gpu 2.1.0 would require
│ └─ pytorch ==2.1.0 cuda120py38h1932296_300, which does not exist (perhaps a missing channel);
├─ pytorch-gpu 2.1.0 would require
│ └─ pytorch ==2.1.0 cuda120py39hf872c3d_300, which does not exist (perhaps a missing channel);
├─ pytorch-gpu [2.10.0|2.7.1|2.8.0|2.9.1] would require
│ └─ pytorch [==2.10.0 cuda*_generic*200|==2.7.1 cuda*_generic*201|...|==2.9.1 cuda*_generic*204], which requires
│ └─ cuda-version >=12.9,<13 *, which conflicts with any installable versions previously reported;
├─ pytorch-gpu 2.10.0 would require
│ └─ pytorch ==2.10.0 cuda*_generic*201 but there are no viable options
│ ├─ pytorch [2.10.0|2.7.1|2.8.0|2.9.1], which cannot be installed (as previously explained);
│ └─ pytorch 2.10.0 would require
│ └─ cuda-version >=13.0,<14 *, which conflicts with any installable versions previously reported;
├─ pytorch-gpu [2.10.0|2.8.0|2.9.1] would require
│ └─ pytorch [==2.10.0 cuda*_mkl*300|==2.10.0 cuda*_mkl*301|...|==2.9.1 cuda*_mkl*304], which requires
│ └─ mkl >=2025.3.0,<2026.0a0 *, which conflicts with any installable versions previously reported;
├─ pytorch-gpu [2.4.1|2.5.1|...|2.8.0] would require
│ └─ pytorch [==2.4.1 cuda*_mkl*306|==2.4.1 cuda118_*_305|...|==2.8.0 cuda*_mkl*301], which requires
│ └─ mkl >=2024.2.2,<2025.0a0 *, which conflicts with any installable versions previously reported;
├─ pytorch-gpu [2.5.1|2.6.0|2.7.0|2.7.1] would require
│ └─ pytorch [==2.5.1 cuda*_generic*207|==2.5.1 cuda*_generic*208|...|==2.7.1 cuda*_generic*200], which requires
│ └─ cuda-version >=12.6,<13 *, which conflicts with any installable versions previously reported;
├─ pytorch-gpu 1.10.0 would require
│ └─ pytorch [==1.10.0 cuda102py37h689c94d_1|==1.10.0 cuda102py37h98b7ee3_0|...|==1.10.0 cuda112py39h4e14dd4_0], which requires
│ └─ mkl >=2021.4.0,<2022.0a0 *, which conflicts with any installable versions previously reported;
├─ pytorch-gpu [1.10.1|1.10.2|1.11.0|1.12.0] would require
│ └─ pytorch [==1.10.1 cuda102py37hc804c4d_0|==1.10.1 cuda102py38h9fb240c_0|...|==1.12.0 cuda112py39ha0cca9b_200], which requires
│ └─ mkl [=2022 *|>=2022.0.1,<2023.0a0 *], which conflicts with any installable versions previously reported;
├─ pytorch-gpu 1.10.2 would require
│ └─ pytorch ==1.10.2 cuda102py37hc804c4d_0, which does not exist (perhaps a missing channel);
├─ pytorch-gpu 1.10.2 would require
│ └─ pytorch ==1.10.2 cuda102py38h9fb240c_0, which does not exist (perhaps a missing channel);
├─ pytorch-gpu 1.10.2 would require
│ └─ pytorch ==1.10.2 cuda102py39hfe0cb5b_0, which does not exist (perhaps a missing channel);
├─ pytorch-gpu 1.10.2 would require
│ └─ pytorch ==1.10.2 cuda110py37h4121e64_0, which does not exist (perhaps a missing channel);
├─ pytorch-gpu 1.10.2 would require
│ └─ pytorch ==1.10.2 cuda110py38hf0a79ac_0, which does not exist (perhaps a missing channel);
├─ pytorch-gpu 1.10.2 would require
│ └─ pytorch ==1.10.2 cuda110py39he47eb21_0, which does not exist (perhaps a missing channel);
├─ pytorch-gpu 1.10.2 would require
│ └─ pytorch ==1.10.2 cuda111py37hc0ce48b_0, which does not exist (perhaps a missing channel);
├─ pytorch-gpu 1.10.2 would require
│ └─ pytorch ==1.10.2 cuda111py38hc64aeea_0, which does not exist (perhaps a missing channel);
├─ pytorch-gpu 1.10.2 would require
│ └─ pytorch ==1.10.2 cuda111py39h930882a_0, which does not exist (perhaps a missing channel);
├─ pytorch-gpu 1.10.2 would require
│ └─ pytorch ==1.10.2 cuda112py37hc1ee5ce_0, which does not exist (perhaps a missing channel);
├─ pytorch-gpu 1.10.2 would require
│ └─ pytorch ==1.10.2 cuda112py38h6425f36_0, which does not exist (perhaps a missing channel);
├─ pytorch-gpu 1.10.2 would require
│ └─ pytorch ==1.10.2 cuda112py39h4de5995_0, which does not exist (perhaps a missing channel);
├─ pytorch-gpu [1.12.0|1.12.1] would require
│ └─ pytorch [==1.12.0 cuda102py310hdf4a2db_202|==1.12.0 cuda102py37haad9b4f_202|...|==1.12.1 cuda112py39hb0b7ed5_201], which requires
│ └─ mkl >=2022.1.0,<2023.0a0 *, which conflicts with any installable versions previously reported;
├─ pytorch-gpu [1.6.0|1.7.1|1.8.0] would require
│ └─ pytorch [==1.6.0 cuda100py36hd82b6f9_1|==1.6.0 cuda100py37h50b9e00_1|...|==1.8.0 cuda112py39h716d6ff_1], which requires
│ └─ mkl >=2020.4,<2021.0a0 *, which conflicts with any installable versions previously reported;
└─ pytorch-gpu [1.9.0|1.9.1] would require
└─ pytorch [==1.9.0 cuda102py36he3537ca_1|==1.9.0 cuda102py37h92fd811_1|...|==1.9.1 cuda112py39h4e14dd4_3], which requires
└─ mkl >=2021.3.0,<2022.0a0 *, which conflicts with any installable versions previously reported.
critical libmamba Could not solve for environment specs
But that's fine. I think the tests passing in environments with nomkl on CUDA 12.9, 13.0, and 13.1 is enough justification to remove this mkl constraint entirely and let either family of packages be chosen. It seems cugraph-pyg isn't sensitive to that choice (at least not in any way observable in its unit tests).
Pushed 727d89e, now there won't be any mkl / nomkl constraints explicitly in this project at all.
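With no explicit constraint left, it may be worth recording which family a given CI run actually solved to. A small sketch, assuming the build-string convention visible in the logs above (`_mkl_` for MKL builds, `_generic_` for the builds that pair with nomkl). `blas_flavor` is a hypothetical helper:

```python
def blas_flavor(build_string):
    """Guess the BLAS family of a conda-forge PyTorch build from its build string."""
    parts = build_string.split("_")
    if "mkl" in parts:
        return "mkl"
    if "generic" in parts:
        return "nomkl"
    # Older build strings (e.g. "cuda118_py311h4ee7bbc_301") carry no marker.
    return "unknown"


# Build strings copied from the solver output quoted in this thread.
print(blas_flavor("cuda130_mkl_py313_h06a2bf6_301"))
print(blas_flavor("cuda130_generic_py312_hbdc3359_201"))
```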
Co-authored-by: Bradley Dice <bdice@bradleydice.com>
| # because we want it to be possible to at least install `cugraph-pyg` in an environment without a GPU, | ||
| # to support use cases like building container images. | ||
| - pytorch >=2.3 | ||
| - pytorch_geometric >=2.5,<2.7 |
Version constraint mismatch: dependencies.yaml:417 specifies pytorch_geometric>=2.5,<2.8 but this file uses <2.7
| - pytorch_geometric >=2.5,<2.7 | |
| - pytorch_geometric >=2.5,<2.8 |
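Mismatches like this one could be caught mechanically. A rough sketch, assuming each file's requirements have already been extracted as plain strings. `parse_requirement` and `find_mismatches` are hypothetical helpers, the comparison is purely textual, and the `pytorch` entries are illustrative:

```python
import re


def parse_requirement(requirement):
    """Split 'name >=a,<b' into (name, frozenset of bounds)."""
    match = re.match(r"\s*([A-Za-z0-9_.\-]+)\s*(.*)", requirement)
    name, bounds = match.group(1), match.group(2)
    return name, frozenset(b.strip() for b in bounds.split(",") if b.strip())


def find_mismatches(first, second):
    """Return names whose bounds differ between two lists of requirement strings."""
    left = dict(parse_requirement(r) for r in first)
    right = dict(parse_requirement(r) for r in second)
    return sorted(n for n in left.keys() & right.keys() if left[n] != right[n])


# The pytorch_geometric values come from the review comment above.
recipe = ["pytorch >=2.3", "pytorch_geometric >=2.5,<2.7"]
dependencies_yaml = ["pytorch>=2.3", "pytorch_geometric>=2.5,<2.8"]

print(find_mismatches(recipe, dependencies_yaml))
```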
alexbarghi-nv left a comment:
Approved, thanks for getting this done!
/merge
Closes #296
Restores CUDA 13 conda test CI jobs, now that there are conda-forge PyTorch packages with CUDA 13 support (conda-forge/pytorch-cpu-feedstock#477)
Also modifies the pytorch conda dependency to meet these requirements:
- cugraph-pyg must be installable on a system without a GPU
- cugraph-pyg's tests require CUDA-enabled builds of PyTorch

With the following mix of things:
- a require_gpu matrix filter in dependencies.yaml, which pulls in pytorch-gpu
  - opted into in test CI jobs but otherwise not
  - conda-forge::pytorch-gpu is a metapackage that forces the installation of CUDA variants of conda-forge::pytorch... that should replace the "accidentally pulled in a CPU-only variant" case with a loud, clear conda solver error
- mkl in the test x86_64 environment, but without version constraints
  - relying on pytorch to declare its range of compatible mkl versions
  - cugraph-pyg's dependencies still make it possible to install alongside nomkl, even though that combination is untested
- a comment in the cugraph-pyg conda recipe explaining why it doesn't depend on pytorch-gpu

I hope this will be a relatively future-proof way to guarantee CI here keeps picking up the PyTorch versions this project wants to test against.