Fix: Native CUDA 13 support and dynamic CUDA_HOME fetching for CuPy#136

Open
wzgrx wants to merge 10 commits into Fannovel16:main from wzgrx:main

@wzgrx wzgrx commented Mar 28, 2026

Fix: Native CUDA 13 support and dynamic CUDA_HOME fetching for CuPy

Description
Motivation & Context
With the release of the new RTX 50-series GPUs and the adoption of CUDA 13.x, users are experiencing installation and runtime failures with CuPy. Previously, the installation script either attempted a risky fallback to CUDA 12.x or forced a source compilation that often failed. Additionally, CuPy's JIT compiler (RawModule) frequently failed to locate the necessary CUDA headers in CUDA 13 environments when CUDA_HOME was not explicitly set.

Since CuPy now officially provides cupy-cuda13x wheels, this PR updates the installation logic and JIT compiler paths to natively support CUDA 13.

Changes Included:

install.py:

Updated get_cuda_ver_from_dir to correctly detect CUDA 13 and return '13x', fetching the official cupy-cuda13x wheel instead of falling back to 12.x.

Added cupy-cuda13x to the pip uninstall cleanup list to prevent environment conflicts.
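A minimal sketch of the version-to-wheel mapping described above, for reviewers who want to sanity-check the logic. The helper name `cupy_wheel_for_cuda` and the `CUPY_PACKAGES_TO_UNINSTALL` list are hypothetical and are not the actual `get_cuda_ver_from_dir` code in install.py; the sketch only assumes that a CUDA version string such as "13.0" is already available.

```python
import re
from typing import Optional

# Hypothetical mapping for illustration: CUDA major version -> CuPy wheel name.
CUPY_WHEELS = {12: "cupy-cuda12x", 13: "cupy-cuda13x"}

# Hypothetical cleanup list: packages to uninstall before installing a new wheel,
# so that two CuPy builds never coexist in the same environment.
CUPY_PACKAGES_TO_UNINSTALL = ["cupy-wheel", "cupy-cuda12x", "cupy-cuda13x"]


def cupy_wheel_for_cuda(version: str) -> Optional[str]:
    """Return the CuPy wheel name for a CUDA version string, or None if unknown."""
    match = re.match(r"^\s*(\d+)\.(\d+)", version)
    if match is None:
        return None
    major = int(match.group(1))
    # Unknown majors return None instead of silently falling back to 12.x.
    return CUPY_WHEELS.get(major)


if __name__ == "__main__":
    for ver in ("12.8", "13.0", "not-a-version"):
        print(ver, "->", cupy_wheel_for_cuda(ver))
```

Returning `None` for an unrecognized version leaves the decision (source build, error message, or skip) to the caller rather than guessing a wheel.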

requirements-with-cupy.txt:

Replaced the generic cupy-wheel (which acts as an empty shell) with cupy-cuda13x to streamline manual installations on modern setups.

vfi_models/ops/cupy_ops/utils.py:

Enhanced cuda_launch to dynamically fetch and inject CUDA_HOME and CUDA_PATH using get_cuda_home_path(). This ensures that CuPy's NVRTC compiler can successfully locate the PyTorch-bundled or system CUDA components, preventing RawModule runtime crashes on unconfigured environments.
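The environment injection can be sketched roughly as follows. This is an illustration under assumptions, not the code in utils.py: `inject_cuda_env` is a hypothetical name, the lookup is passed in as a callable because the real signature of `get_cuda_home_path()` is not shown here, and the choice to leave already-set variables untouched is an assumption.

```python
import os
from typing import Callable, Optional


def inject_cuda_env(get_cuda_home_path: Callable[[], Optional[str]]) -> Optional[str]:
    """Populate CUDA_HOME and CUDA_PATH from a lookup function if they are unset.

    Returns the discovered path, or None when no usable directory was found.
    """
    cuda_home = get_cuda_home_path()
    if not cuda_home or not os.path.isdir(cuda_home):
        return None
    for var in ("CUDA_HOME", "CUDA_PATH"):
        # Assumption: respect a value the user already exported.
        os.environ.setdefault(var, cuda_home)
    return cuda_home


if __name__ == "__main__":
    # Stand-in lookup for demonstration; the real lookup would return a CUDA directory.
    found = inject_cuda_env(lambda: os.getcwd())
    print("CUDA_HOME =", os.environ.get("CUDA_HOME"), "(discovered:", found, ")")
```

Calling such a helper before the first kernel compilation means the variables are in place by the time CuPy looks for the CUDA headers.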

How to Test

Run install.py on a machine with CUDA 13.x (e.g., equipped with an RTX 5090).

Verify that cupy-cuda13x is installed automatically without triggering a source build.

Run a frame interpolation node requiring CuPy (e.g., RIFE) and verify that the CUDA kernels compile and launch successfully without missing header errors.
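For the second step, one possible way to confirm which CuPy distribution ended up in the environment is to list installed packages with the standard library. This is only a sketch; `installed_cupy_distributions` is a hypothetical helper and is not part of this PR.

```python
from importlib import metadata
from typing import List


def installed_cupy_distributions() -> List[str]:
    """Return the sorted names of installed distributions that look like CuPy builds."""
    names = set()
    for dist in metadata.distributions():
        raw_name = dist.metadata.get("Name") or ""
        name = raw_name.lower().replace("_", "-")
        if name == "cupy" or name.startswith("cupy-"):
            names.add(name)
    return sorted(names)


if __name__ == "__main__":
    found = installed_cupy_distributions()
    print("CuPy distributions:", found or "none")
    if len(found) > 1:
        print("Warning: more than one CuPy build is installed; they may conflict.")
```

After running install.py on a CUDA 13.x machine, the expected result would be a single entry, cupy-cuda13x. The kernel compilation check in the third step still needs a real GPU and is not covered by this sketch.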

@wzgrx changed the title from "fix cu130" to "Fix: Native CUDA 13 support and dynamic CUDA_HOME fetching for CuPy" on Apr 2, 2026
essence25 commented Apr 22, 2026

And... it is still not fixed/updated into the main release. I posted this exact fix in the issues section also.

@wzgrx
Inside "vfi_utils.py" please also change this (the first 2 links are dead):

BASE_MODEL_DOWNLOAD_URLS = [
    "https://github.com/styler00dollar/VSGAN-tensorrt-docker/releases/download/models/",
    "https://github.com/Fannovel16/ComfyUI-Frame-Interpolation/releases/download/models/",
    "https://github.com/dajes/frame-interpolation-pytorch/releases/download/v1.0.0/"
]

To this:

BASE_MODEL_DOWNLOAD_URLS = [
    "https://github.com/dajes/frame-interpolation-pytorch/releases/download/v1.0.2/"
]

That will update film_net_fp32.pt to v1.0.2 and get rid of this warning that we always get with v1.0.0:

python_embeded\Lib\site-packages\torch\nn\modules\module.py:1790: UserWarning: Using padding='same' with even kernel lengths and odd dilation may require a zero-padded copy of the input be created (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\Convolution.cpp:1025.)

Here are the details for the updated model:

v1.0.2: Newer torch version, improved support for dt argument
The model is re-exported with torch 2.1.1

Includes minor fixes:

Fix UserWarning: Using padding='same' with even kernel lengths and odd dilation may require a zero-padded copy of the input be created
Now it doesn't ignore dt argument thanks to @niqodea (however it seems that authors of the model recommend to stick with .5)
Supports batched inference

https://github.com/dajes/frame-interpolation-pytorch/releases
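To check which entries of BASE_MODEL_DOWNLOAD_URLS still resolve for a given model file, something like the sketch below could be used. The function names `url_is_alive` and `first_live_url` are hypothetical, and this is not how vfi_utils.py downloads models; it only probes each base URL in order with a HEAD request.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional


def url_is_alive(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to the URL succeeds (redirects are followed)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        return False


def first_live_url(
    base_urls: Iterable[str],
    filename: str,
    probe: Callable[[str], bool] = url_is_alive,
) -> Optional[str]:
    """Return the first base URL + filename that the probe reports as reachable."""
    for base in base_urls:
        candidate = base + filename
        if probe(candidate):
            return candidate
    return None


if __name__ == "__main__":
    bases = [
        "https://github.com/dajes/frame-interpolation-pytorch/releases/download/v1.0.2/",
    ]
    print(first_live_url(bases, "film_net_fp32.pt"))
```

The `probe` parameter is injectable so the ordering logic can be exercised without network access.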
