Triton bump, py3.14 + CUDA 13.0 #477

Merged
h-vetinari merged 14 commits into conda-forge:main from mgorny:triton-py314-cuda13
Jan 29, 2026

@mgorny
Contributor

@mgorny mgorny commented Jan 26, 2026

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

Fixes #457
Fixes #420

mgorny and others added 10 commits January 26, 2026 13:27
Signed-off-by: Michał Górny <mgorny@quansight.com>
CUDA 13.0 requires architecture `sm_75` or higher, and renamed `sm_101` to
`sm_110`. To build for these, maintainers will need to modify their existing list of
specified architectures (e.g. `CMAKE_CUDA_ARCHITECTURES`, `TORCH_CUDA_ARCH_LIST`, etc.)
for their package.
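
As a rough illustration of the two changes described above, the sketch below filters and renames a list of compute capabilities. The helper name `adapt_archs_for_cuda13` and the sample list are hypothetical; an actual recipe would normally just edit the architecture list in its build script.

```python
def adapt_archs_for_cuda13(archs):
    """Return a CUDA 13.0-compatible copy of a list of integer compute capabilities.

    Hypothetical helper: drops everything below sm_75 and renames sm_101 to sm_110.
    """
    renamed = {101: 110}
    adapted = []
    for arch in archs:
        if arch < 75:
            continue  # below the CUDA 13.0 minimum of sm_75
        arch = renamed.get(arch, arch)
        if arch not in adapted:  # avoid duplicates if 110 was already listed
            adapted.append(arch)
    return adapted


# Sample CUDA 12.x list (illustrative only, not taken from any recipe).
cuda12_archs = [50, 60, 70, 75, 80, 86, 89, 90, 100, 101, 120]
cuda13_archs = adapt_archs_for_cuda13(cuda12_archs)
print(cuda13_archs)
```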

Since CUDA 12.8, the conda-forge nvcc package now sets `CUDAARCHS` and
`TORCH_CUDA_ARCH_LIST` in its activation script to a string containing all
of the supported real architectures plus the virtual architecture of the
latest. Recipes for packages that use these variables to control their build
but do not want to build for all supported architectures will need to override
these variables in their build script.
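
A minimal sketch of what such an override could look like, assuming a recipe only wants a reduced set of architectures. The helper name `format_arch_lists` and the chosen subset are hypothetical. The output follows the CMake convention (`NN-real` for device code only, a bare `NN` for device code plus PTX) and the PyTorch convention (`N.N`, with `+PTX` on the newest entry).

```python
def format_arch_lists(archs):
    """Format integer compute capabilities for CMake and for PyTorch.

    Every architecture gets real (device) code; only the newest one also
    gets virtual (PTX) code, mirroring the layout described above.
    """
    if not archs:
        raise ValueError("at least one architecture is required")
    archs = sorted(archs)

    def dotted(arch):
        # 75 -> "7.5", 120 -> "12.0"
        return f"{arch // 10}.{arch % 10}"

    cmake = [f"{a}-real" for a in archs[:-1]] + [str(archs[-1])]
    torch = [dotted(a) for a in archs[:-1]] + [dotted(archs[-1]) + "+PTX"]
    return {
        "CUDAARCHS": ";".join(cmake),
        "TORCH_CUDA_ARCH_LIST": ";".join(torch),
    }


# Hypothetical reduced subset; pick whatever the package actually needs.
overrides = format_arch_lists([80, 90, 120])
for name, value in overrides.items():
    print(f'export {name}="{value}"')
```

The printed `export` lines are what a build script would set before invoking the build, replacing the values provided by the activation script.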

ref: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#new-features

> [!IMPORTANT]
> Remember to update any CUDA 11/12 specific selector syntax in the recipe to include
> CUDA 13. For example `# [(cuda_compiler_version or "None").startswith("12")]`
> might be replaced with `# [cuda_compiler_version != "None"]`.
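
To see why the old selector silently skips CUDA 13, the two expressions quoted above can be evaluated as plain Python. This is only a sketch: the function names are hypothetical, and it assumes that CPU-only variants carry the string `"None"` as their `cuda_compiler_version`.

```python
def old_selector(cuda_compiler_version):
    # [(cuda_compiler_version or "None").startswith("12")]
    return (cuda_compiler_version or "None").startswith("12")


def new_selector(cuda_compiler_version):
    # [cuda_compiler_version != "None"]
    return cuda_compiler_version != "None"


# Evaluate both selectors for a CPU variant and two CUDA variants.
results = {
    version: (old_selector(version), new_selector(version))
    for version in ("None", "12.9", "13.0")
}
for version, (old, new) in results.items():
    print(f"{version:>5}: old={old} new={new}")
```

The old selector is false for `13.0`, so any line guarded by it would be dropped from CUDA 13 builds; the replacement matches every CUDA variant.
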
Thanks to @carterbox for the patch:
conda-forge#457 (comment)

Signed-off-by: Michał Górny <mgorny@quansight.com>
…6.01.26.08.52.07

Other tools:
- conda-build 25.11.1
- rattler-build 0.55.1
- rattler-build-conda-compat 1.4.10
@conda-forge-admin
Contributor

conda-forge-admin commented Jan 26, 2026

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/meta.yaml:

  • ℹ️ The magma output has been superseded by libmagma-devel.
  • ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). This parser is not currently used by conda-forge, but may be in the future. We are collecting information to see which recipes are compatible with grayskull.
  • ℹ️ The recipe is not parsable by parser conda-recipe-manager. The recipe can only be automatically migrated to the new v1 format if it is parseable by conda-recipe-manager.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/21452160226. Examine the logs at this URL for more detail.

@mgorny
Contributor Author

mgorny commented Jan 26, 2026

Reminder to self: once this is merged, enable triton tests on py3.14.

@mgorny
Contributor Author

mgorny commented Jan 26, 2026

Minimal test run passed. Now let's do the full thing…

@mgorny mgorny force-pushed the triton-py314-cuda13 branch from f5e1df4 to cfb7619 Compare January 26, 2026 17:47
@h-vetinari
Member

@mgorny, this is still using GPU agents to build the CPU versions, c.f. my comment from #475

> @h-vetinari, do we want to include CUDA 13 migration for when the final is released?

As long as you use a development install of smithy (combined with the skip from #332, so that CPU builds run on non-GPU agents), that's OK for me.

Member

@jakirkham jakirkham left a comment

Thanks Michał! 🙏

Had a question below about MAGMA usage

Comment thread recipe/meta.yaml
Comment thread recipe/meta.yaml
@h-vetinari h-vetinari changed the title Triton bump, py3.14 + CUDA Triton bump, py3.14 + CUDA 13.0 Jan 26, 2026
@mgorny
Contributor Author

mgorny commented Jan 26, 2026

> @mgorny, this is still using GPU agents to build the CPU versions, c.f. my comment from #475
>
> > @h-vetinari, do we want to include CUDA 13 migration for when the final is released?
>
> As long as you use a development install of smithy (combined with the skip from #332, so that CPU builds run on non-GPU agents), that's OK for me.

Ah, sorry, I was missing #332. I was wondering why the git version of conda-smithy didn't produce any differences, and figured the relevant changes must've been released already.

@mgorny
Contributor Author

mgorny commented Jan 26, 2026

Looks like the libmagma-devel change broke Windows.

h-vetinari and others added 2 commits January 27, 2026 08:34
Co-Authored-By: Isuru Fernando <isuruf@gmail.com>
…forge-pinning 2026.01.26.08.52.07

Other tools:
- conda-build 25.11.1.dev19+dirty
- rattler-build 0.55.1
- rattler-build-conda-compat 1.4.10
@h-vetinari
Member

It looks like we might have a new must-fix issue for any new PRs here: #478

@mgorny mgorny marked this pull request as ready for review January 27, 2026 09:51
Signed-off-by: Michał Górny <mgorny@quansight.com>
Fixes conda-forge#479

Signed-off-by: Michał Górny <mgorny@quansight.com>
@mgorny
Contributor Author

mgorny commented Jan 28, 2026

Added the TorchConfig.cmake fixes tested in #480, since we'd be having another build round anyway.

@h-vetinari
Copy link
Copy Markdown
Member

Your fix looks obviously correct™️, so I'm going to cancel CI. The server is heavily congested right now, so this has too little marginal benefit IMO. We can merge this without rerunning CI once the congestion has cleared a bit.

@h-vetinari
Member

OK, flash-attn is through (well, at least has stopped consuming agents), tensorflow is down to one job that's almost done, and while there's a stray webkit still around, that shouldn't stop us from merging this one.

Bombs away!

@h-vetinari h-vetinari merged commit 238fe50 into conda-forge:main Jan 29, 2026
13 of 32 checks passed
@bdice

bdice commented Jan 29, 2026

Congrats and great thanks to @mgorny and everyone who helped with this effort!

@Tobias-Fischer
Contributor

Not sure where the best place is to report, but in conda-forge/theseus-ai-feedstock#32 I get

  Theseus CUDA support: True (forced by THESEUS_FORCE_CUDA env var)
  Traceback (most recent call last):
    File "/home/conda/feedstock_root/build_artifacts/theseus-ai_1769725632963/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 389, in <module>
      main()
    File "/home/conda/feedstock_root/build_artifacts/theseus-ai_1769725632963/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 373, in main
      json_out["return_val"] = hook(**hook_input["kwargs"])
    File "/home/conda/feedstock_root/build_artifacts/theseus-ai_1769725632963/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 175, in prepare_metadata_for_build_wheel
      return hook(metadata_directory, config_settings)
    File "/home/conda/feedstock_root/build_artifacts/theseus-ai_1769725632963/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.10/site-packages/setuptools/build_meta.py", line 378, in prepare_metadata_for_build_wheel
      self.run_setup()
    File "/home/conda/feedstock_root/build_artifacts/theseus-ai_1769725632963/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.10/site-packages/setuptools/build_meta.py", line 518, in run_setup
      super().run_setup(setup_script=setup_script)
    File "/home/conda/feedstock_root/build_artifacts/theseus-ai_1769725632963/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.10/site-packages/setuptools/build_meta.py", line 317, in run_setup
      exec(code, locals())
    File "<string>", line 136, in <module>
    File "/home/conda/feedstock_root/build_artifacts/theseus-ai_1769725632963/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1408, in CUDAExtension
      library_dirs += library_paths(device_type="cuda")
    File "/home/conda/feedstock_root/build_artifacts/theseus-ai_1769725632963/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1626, in library_paths
      if (not os.path.exists(_join_cuda_home(lib_dir)) and
    File "/home/conda/feedstock_root/build_artifacts/theseus-ai_1769725632963/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 3094, in _join_cuda_home
      raise OSError('CUDA_HOME environment variable is not set. '
  OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
  error: subprocess-exited-with-error

@h-vetinari
Member

None of the CUDA builds has been uploaded yet. Your PR ends up using CPU pytorch.

(I noticed too late that this PR closed #457 and #420; given the enormous amount of time necessary to build out pytorch, we should have waited until builds are online before giving the migrator the sign to move on)
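
For downstream builds that hit the CPU-only case described above, one option is to fail early with a clearer message than the `CUDA_HOME` error. The sketch below is an assumption-laden illustration, not something from the feedstock: `require_cuda_torch` is a hypothetical helper, and the `SimpleNamespace` objects stand in for `import torch` so the example runs without PyTorch installed. It relies on `torch.version.cuda` being a version string on CUDA builds and `None` on CPU builds.

```python
from types import SimpleNamespace


def require_cuda_torch(torch_module):
    """Return the CUDA version of a torch build, or raise if it is CPU-only."""
    cuda_version = getattr(torch_module.version, "cuda", None)
    if cuda_version is None:
        raise RuntimeError(
            "CPU-only PyTorch found; CUDA extensions cannot be built against it"
        )
    return cuda_version


# Stand-ins for the real module; in a build script, pass the imported `torch`.
cpu_torch = SimpleNamespace(version=SimpleNamespace(cuda=None))
cuda_torch = SimpleNamespace(version=SimpleNamespace(cuda="12.8"))

print(require_cuda_torch(cuda_torch))
try:
    require_cuda_torch(cpu_torch)
except RuntimeError as exc:
    print(f"error: {exc}")
```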

@jakirkham
Member

Alternatively we could configure downstream feedstocks to wait until the PyTorch packages are available before attempting migration

bot:
    # only open PRs if resulting environment is solvable, useful for tightly coupled packages
    check_solvable: true

@h-vetinari
Member

I much prefer PRs to be opened even if they don't pass yet. That's infinitely more visible than something lost in the bowels of the bot infrastructure. It's only a minor inconvenience if CI on those PRs has to be restarted, which would have been nice to avoid, but it's ultimately not a big deal IMO.

@h-vetinari
Member

win+CUDA12.8:

$ gh run download 21475924468 --repo conda-forge/pytorch-cpu-feedstock --name conda_artifacts_21475924468_win_64_channel_targetsconda-forge_maincu_hca575dce
$ unzip pytorch-cpu-feedstock_conda_artifacts_.zip
$ cd bld/win-64 && rm current_repodata.json index.html repodata*
$ ls
libtorch-2.10.0-cuda128_mkl_h97e3598_301.conda       pytorch-gpu-2.10.0-cuda128_mkl_hc88b545_301.conda
pytorch-2.10.0-cuda128_mkl_py310_hdd2a298_301.conda  pytorch-tests-2.10.0-cuda128_mkl_py310_hf0eca92_301.conda
pytorch-2.10.0-cuda128_mkl_py311_h0cb71aa_301.conda  pytorch-tests-2.10.0-cuda128_mkl_py311_hc85c64c_301.conda
pytorch-2.10.0-cuda128_mkl_py312_hc4f88d7_301.conda  pytorch-tests-2.10.0-cuda128_mkl_py312_hb3d0777_301.conda
pytorch-2.10.0-cuda128_mkl_py313_h716786b_301.conda  pytorch-tests-2.10.0-cuda128_mkl_py313_hd85d54a_301.conda
pytorch-2.10.0-cuda128_mkl_py314_hc058aa6_301.conda  pytorch-tests-2.10.0-cuda128_mkl_py314_hfe9566a_301.conda
$ ls | xargs anaconda upload
$ DELEGATE=h-vetinari
PACKAGE_VERSION=2.10.0
for package in libtorch pytorch pytorch-gpu pytorch-tests; do
  anaconda copy --from-label main --to-label main --to-owner conda-forge ${DELEGATE}/${package}/${PACKAGE_VERSION}
done

The CUDA 13.0 build failed due to losing connection with the agent; if we're lucky the reduction in GPU arches means that libtorch will be small enough to succeed uploading upon restart.
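
Before a manual upload like the one above, the artifact listing can be sanity-checked for missing Python variants. This is a hedged sketch with hypothetical helper names; it assumes the usual `<name>-<version>-<build>.conda` file naming and uses a few of the file names from the listing as sample input.

```python
import re
from collections import defaultdict


def parse_conda_filename(filename):
    """Split `<name>-<version>-<build>.conda` into its three parts."""
    stem = filename.removesuffix(".conda")
    name, version, build = stem.rsplit("-", 2)
    return name, version, build


def python_tags_by_package(filenames):
    """Map each package name to the sorted `pyXYZ` tags found in its build strings."""
    tags = defaultdict(set)
    for filename in filenames:
        name, _version, build = parse_conda_filename(filename)
        match = re.search(r"py\d+", build)
        if match:  # outputs such as libtorch carry no Python tag
            tags[name].add(match.group())
    return {name: sorted(found) for name, found in tags.items()}


sample = [
    "libtorch-2.10.0-cuda128_mkl_h97e3598_301.conda",
    "pytorch-2.10.0-cuda128_mkl_py310_hdd2a298_301.conda",
    "pytorch-2.10.0-cuda128_mkl_py314_hc058aa6_301.conda",
    "pytorch-tests-2.10.0-cuda128_mkl_py314_hfe9566a_301.conda",
]
print(python_tags_by_package(sample))
```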

@mgorny mgorny deleted the triton-py314-cuda13 branch January 30, 2026 10:11
@h-vetinari
Member

h-vetinari commented Jan 30, 2026

Obviously, with an extra python version to build & test for pytorch & pytorch-tests, our runtime for the CUDA 12.9 builds has blown up further still - 22h30 for the longest single job.

I also noticed that the libtorch builds for 12.9 and 13.0 have a massive size difference; ~470MB for 13.0 and ~850MB for 12.9. Part of that is explained by -compress-mode=size (5c1be2d), but we should perhaps consider thinning out the GPU arches a bit also for 12.9...

@mgorny
Contributor Author

mgorny commented Jan 30, 2026

I suppose we could start considering removing some of the targets common to CUDA 12.x and 13.x, but we probably need to be careful (though I think PTX should make this less harmful?)

@jakirkham
Member

Perhaps this would be worth discussing in a new issue?

@h-vetinari
Member

Feel free to open an issue!

@h-vetinari
Member

All builds are online now. I've just started CI for d392c50, which is the backport of the CMake fix to v2.9.x.

rapids-bot Bot pushed a commit to rapidsai/cugraph-gnn that referenced this pull request Feb 3, 2026
Closes #296 

Restores CUDA 13 conda test CI jobs, now that there are conda-forge PyTorch packages with CUDA 13 support (conda-forge/pytorch-cpu-feedstock#477)

Also modifies `pytorch` conda dependency to meet these requirements:

* `cugraph-pyg` must be installable on a system without a GPU
* `cugraph-pyg`'s tests require CUDA-enabled builds of PyTorch

With the following mix of things:

* add a `require_gpu` matrix filter in `dependencies.yaml` that pulls in `pytorch-gpu`; test CI jobs opt into it, other jobs do not
  - *`conda-forge::pytorch-gpu` is a metapackage that forces the installation of CUDA variants of `conda-forge::pytorch`... that should replace the "accidentally pulled in a CPU-only variant" case with a loud, clear conda solver error*
* depend on `mkl` in the test x86_64 environment but without version constraints
  - *allow `pytorch` to declare its range of compatible `mkl` versions*
  - *this still prevents OpenBLAS variants from getting installed, which I think was part of the goal of #161*
  - *keeping this out of `cugraph-pyg`'s dependencies still makes it possible to install alongside `nomkl`, even though that combination is untested*
* add comments in the `cugraph-pyg` conda recipe explaining why it doesn't depend on `pytorch-gpu`

I hope this will be a relatively future-proof way to guarantee CI here keeps picking up the PyTorch versions this project wants to test against.

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Alex Barghi (https://github.com/alexbarghi-nv)
  - Bradley Dice (https://github.com/bdice)

URL: #395
rapids-bot Bot pushed a commit to rapidsai/cudf that referenced this pull request Mar 5, 2026
There are now `pytorch` CUDA 13 packages (started with `pytorch` 2.10: conda-forge/pytorch-cpu-feedstock#477)

This adds them to the test environment so they'll be tested in CUDA 13 integration testing jobs.

More details on the history of PyTorch in those jobs: #20748 (comment)

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #21663