RuntimeError: operator torchvision::nms does not exist #9435

@mgiessing

🐛 Describe the bug

I built PyTorch and torchvision from source against CUDA 12.4.1, because that is the latest CUDA version available for IBM POWER9 (ppc64le) with V100 GPUs.

python3 -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'CUDA version: {torch.version.cuda}'); print(f'CUDA available: {torch.cuda.is_available()}')"

PyTorch version: 2.10.0
CUDA version: 12.4
CUDA available: True

How to build the wheel isn't clearly documented, so I tried to recreate the process from the GitHub Actions log:

Inside a manylinux_2_28_ppc64le container with CUDA 12.4.1 installed:

curl -L https://micro.mamba.pm/api/micromamba/linux-ppc64le/latest | tar -xvj
export MAMBA_ROOT_PREFIX=$HOME/.micromamba  # optional, defaults to ~/micromamba
eval "$(./bin/micromamba shell hook -s posix)"

yum install -y libjpeg-turbo-devel libwebp-devel freetype gnutls zip

export rel=v0.25.0
export ver=3.12  # dotted version: used below as python=${ver} and cp${ver//./}

export BUILD_VERSION=${rel:1}
export PYTORCH_BUILD_NUMBER=0
export PYTORCH_BUILD_VERSION=${rel:1}
export FORCE_CUDA=1

git clone --depth 1 -b ${rel} --recursive https://github.com/pytorch/vision.git && cd vision
curl -LO https://raw.githubusercontent.com/pytorch/test-infra/refs/heads/main/.github/scripts/repair_manylinux_2_28.sh
chmod +x repair_manylinux_2_28.sh
sed -i "s/aarch64/ppc64le/g" packaging/post_build_script.sh

micromamba create -n py-${ver} -c conda-forge python=${ver} conda libwebp libjpeg-turbo -y
micromamba activate py-${ver}
bash packaging/pre_build_script.sh
pip3 install Cython "auditwheel<6.3" numpy future ninja pyyaml http://10.x.x.x/whl/torch/cu124/torch-2.10.0-cp${ver//./}-cp${ver//./}-manylinux_2_28_ppc64le.whl --upgrade setuptools==72.1.0 

python3 setup.py clean
python3 setup.py bdist_wheel
./repair_manylinux_2_28.sh /vision/$(ls dist/*whl)

bash packaging/post_build_script.sh

The wheel is then uploaded to the 10.x.x.x server, from which I install it.
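
One check that could be run on the repaired wheel before uploading, as a sketch only: list the archive members and look for the compiled extension. This assumes the C++/CUDA ops are packaged as a `torchvision/_C*.so` file; the helper name `list_extension_members` is hypothetical and not part of torchvision or its packaging scripts.

```python
import sys
import zipfile


def list_extension_members(wheel_path, prefix="torchvision/_C"):
    """Return wheel members under `prefix` that look like shared libraries.

    Assumption: the compiled ops ship as torchvision/_C*.so inside the wheel.
    """
    with zipfile.ZipFile(wheel_path) as wheel:
        return [
            name
            for name in wheel.namelist()
            if name.startswith(prefix) and name.endswith((".so", ".pyd"))
        ]


if __name__ == "__main__":
    # Usage: python check_wheel.py dist/<wheel file>.whl
    for wheel_path in sys.argv[1:]:
        if not zipfile.is_zipfile(wheel_path):
            continue  # skip anything that is not a wheel/zip archive
        members = list_extension_members(wheel_path)
        if members:
            print(f"{wheel_path}: found {', '.join(members)}")
        else:
            print(f"{wheel_path}: no torchvision/_C*.so, ops may not have been built")
```

If the list is empty, the problem is most likely in the build step (for example the extension was not compiled) and not in the runtime container.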

When I install it into a python:3.12-slim container and import torchvision, I get this error:

export TORCH_VER=2.10.0
export PY_VER=cp312-cp312

pip3 install numpy \
  http://10.x.x.x/whl/torch/cu124/torch-${TORCH_VER}-${PY_VER}-manylinux_2_28_ppc64le.whl \
  http://10.x.x.x/whl/torchvision/cu124/torchvision-0.25.0-${PY_VER}-manylinux_2_28_ppc64le.whl

python3 -c "import torchvision"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.12/site-packages/torchvision/__init__.py", line 10, in <module>
    from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils  # usort:skip
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torchvision/_meta_registrations.py", line 163, in <module>
    @torch.library.register_fake("torchvision::nms")
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/library.py", line 1073, in register
    use_lib._register_fake(
  File "/usr/local/lib/python3.12/site-packages/torch/library.py", line 203, in _register_fake
    handle = entry.fake_impl.register(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/torch/_library/fake_impl.py", line 50, in register
    if torch._C._dispatch_has_kernel_for_dispatch_key(self.qualname, "Meta"):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: operator torchvision::nms does not exist
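
The message says that `torchvision::nms` was never registered, which commonly means the compiled extension is either missing from the install or failed to load. A sketch of how this could be narrowed down inside the python:3.12-slim container: locate the extension file without importing torchvision (the import itself fails), then load it directly so the underlying loader error becomes visible. The helper name `find_extension_libs` is hypothetical, and the `_C*.so` file name is an assumption about how the ops are packaged.

```python
import importlib.util
from pathlib import Path


def find_extension_libs(package_name, stem="_C"):
    """Return <stem>*.so / .pyd files inside a package without importing it."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        return []  # package not installed, or not a regular package
    found = []
    for location in spec.submodule_search_locations:
        for path in Path(location).glob(f"{stem}*"):
            if path.suffix in {".so", ".pyd"}:
                found.append(path)
    return sorted(found)


if __name__ == "__main__":
    libs = find_extension_libs("torchvision")
    if not libs:
        print("no torchvision/_C*.so found (or torchvision is not installed)")
    for lib in libs:
        try:
            import torch

            # Loading the library directly surfaces errors such as a missing
            # shared-library dependency or an undefined symbol.
            torch.ops.load_library(str(lib))
            print(f"loaded {lib}")
        except Exception as exc:
            print(f"failed to load {lib}: {exc}")
```

A "failed to load" line naming a missing `.so` would point at a runtime dependency absent from the slim image, while an empty result would point back at the wheel contents.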

Versions

python collect_env.py
Collecting environment information...
PyTorch version: 2.10.0
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A

OS: Debian GNU/Linux 13 (trixie) (ppc64le)
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.41

Python version: 3.12.13 (main, Mar  3 2026, 20:38:43) [GCC 14.2.0] (64-bit runtime)
Python platform: Linux-4.18.0-553.36.1.el8_10.ppc64le-ppc64le-with-glibc2.41
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: 
GPU models and configuration: GPU 0: Tesla V100-SXM2-32GB
Nvidia driver version: 550.54.15
cuDNN version: Could not collect
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: False
Caching allocator config: N/A

CPU:
Architecture:                         ppc64le
Byte Order:                           Little Endian
CPU(s):                               160
On-line CPU(s) list:                  0-159
Model name:                           POWER9, altivec supported
Model:                                2.3 (pvr 004e 1203)
Thread(s) per core:                   4
Core(s) per socket:                   20
Socket(s):                            2
Frequency boost:                      enabled
CPU(s) scaling MHz:                   100%
CPU max MHz:                          3800.0000
CPU min MHz:                          2300.0000
L1d cache:                            1.3 MiB (40 instances)
L1i cache:                            1.3 MiB (40 instances)
L2 cache:                             10 MiB (20 instances)
L3 cache:                             200 MiB (20 instances)
NUMA node(s):                         6
NUMA node0 CPU(s):                    0-79
NUMA node8 CPU(s):                    80-159
NUMA node252 CPU(s):                  
NUMA node253 CPU(s):                  
NUMA node254 CPU(s):                  
NUMA node255 CPU(s):                  
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Mitigation; RFI Flush, L1D private per thread
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Not affected
Vulnerability Spec rstack overflow:   Not affected
Vulnerability Spec store bypass:      Mitigation; Kernel entry/exit barrier (eieio)
Vulnerability Spectre v1:             Mitigation; __user pointer sanitization, ori31 speculation barrier enabled
Vulnerability Spectre v2:             Mitigation; Software count cache flush (hardware accelerated), Software link stack flush
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected

Versions of relevant libraries:
[pip3] numpy==2.4.3
[pip3] torch==2.10.0
[pip3] torchvision==0.25.0
[conda] Could not collect
