Thanks for improving torch-cudagraph-debug.
Install a CUDA-enabled PyTorch build first, then install this package from the source checkout:
python -m pip install --upgrade pip
python -m pip install --no-build-isolation -e .
python -m pip install pytest build twine
CPU-only environments can run Python-level tests. CUDA graph behavior requires a CUDA-enabled PyTorch runtime and a GPU.
Run local checks before opening a pull request:
python -m py_compile $(find src tests examples -name '*.py')
python -m pytest -q tests
python -m build --sdist
python -m twine check dist/*
GPU tests are marked with pytest.mark.gpu but are included in the default test suite; they skip automatically when CUDA or the native extension is unavailable.
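As a rough illustration of the automatic skip described above, the following sketch shows one way a test helper might decide whether GPU tests can run. It is not taken from this project's test suite: the helper names (module_available, cuda_available, gpu_skip_reason) are hypothetical, and the name of the native extension module is left as a parameter because it is not stated here.

```python
import importlib.util
from typing import Optional


def module_available(name: str) -> bool:
    """Return True if `name` can be located without fully importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A missing parent package raises ImportError; a module with no
        # spec raises ValueError. Treat both as "not available".
        return False


def cuda_available() -> bool:
    """Return True only if PyTorch is installed and reports a CUDA device."""
    if not module_available("torch"):
        return False
    import torch  # imported lazily so CPU-only tooling can load this helper

    return torch.cuda.is_available()


def gpu_skip_reason(has_cuda: bool, has_extension: bool) -> Optional[str]:
    """Return a skip reason for GPU tests, or None if they can run.

    Hypothetical helper: the real suite may structure this differently.
    """
    if not has_cuda:
        return "CUDA is not available"
    if not has_extension:
        return "native extension is not built"
    return None
```

In a test module, the returned reason could be passed to pytest.mark.skipif (skipping when the reason is not None) alongside pytest.mark.gpu, so the same test file runs unchanged on CPU-only and GPU machines.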
The host callback path must not call Python or CUDA APIs. Keep callback work to native CPU operations on already-allocated host memory. If a change needs CUDA work, enqueue it before the host callback so it becomes part of graph capture.
This project publishes source distributions first. Do not add prebuilt CUDA wheels unless the release process also covers PyTorch, CUDA, Python, and platform compatibility for those wheels.