Many qdp tests instantiate QdpEngine or use device="cuda" without checking for GPU first #1331

@andrewmusselman

Summary

Two related test-hygiene patterns cause spurious failures on environments where the test suite previously skipped cleanly:

Pattern A: No skipif on QdpEngine(0) instantiation. ~12 tests instantiate QdpEngine(0) (directly or via QdpBenchmark / QuantumDataLoader) without first checking that CUDA is available. When CUDA is masked by setting CUDA_VISIBLE_DEVICES= to an empty value, they error with CUDA_ERROR_NO_DEVICE instead of skipping.

Pattern B: Hardcoded device="cuda" without arch check. ~68 tests in testing/qdp/test_bindings.py use torch.tensor(..., device="cuda") or equivalent, then fail with cudaErrorNoKernelImageForDevice on GPUs whose compute capability isn't in the PyTorch wheel's compiled arch list. PR #1323 added a capability-aware fallback in qumat_qdp.loader._select_torch_device for production code, but tests don't use it.

Reproducer

On a host with a Pascal GPU (sm_61) and CUDA Toolkit installed (also requires the LD_PRELOAD workaround from Issue [N1] for the wheel quirk):

git checkout mahout-qumat-0.6.0-RC2
export LD_PRELOAD=$VIRTUAL_ENV/lib/python3.12/site-packages/nvidia/cuda_runtime/lib/libcudart.so.12
 
# Pattern A: CUDA masked → 12 ERROR instead of SKIPPED
CUDA_VISIBLE_DEVICES= uv run pytest -v   # 12 failed, 719 passed, 142 skipped
 
# Patterns A+B combined: CUDA visible but arch incompatible → 80 ERROR
uv run pytest -v                          # 80 failed, 785 passed, 8 skipped

Specific failing tests (Pattern A, CUDA masked)

  • testing/qdp/test_benchmark_api.py::test_qdp_benchmark_run_throughput
  • testing/qdp/test_benchmark_api.py::test_qdp_benchmark_run_latency
  • testing/qdp/test_benchmark_api.py::test_qdp_benchmark_device_id_propagated
  • testing/qdp/test_bindings.py::test_encode
  • testing/qdp/test_bindings.py::test_dlpack_device
  • testing/qdp/test_bindings.py::test_dlpack_single_use
  • testing/qdp/test_bindings.py::test_dlpack_with_stream[stream_legacy]
  • testing/qdp/test_bindings.py::test_dlpack_with_stream[stream_per_thread]
  • testing/qdp/test_bindings.py::test_pytorch_integration
  • testing/qdp/test_bindings.py::test_precision[float32-complex64]
  • testing/qdp/test_bindings.py::test_precision[float64-complex128]
  • testing/qdp_python/test_quantum_data_loader.py::test_synthetic_loader_batch_count

All twelve produce the same error: Failed to initialize CUDA device 0: DriverError(CUDA_ERROR_NO_DEVICE, "no CUDA-capable device is detected").

Pattern B failures (CUDA-visible) are mostly in testing/qdp/test_bindings.py::test_encode_cuda_tensor*, test_angle_encode_cuda_tensor*, test_basis_encode_*, etc.

Comparison to RC1

Same hardware/setup with RC1 + CUDA_VISIBLE_DEVICES= produced 4 failures; RC2 produces 12. The delta is mostly new tests added since RC1 (the test_benchmark_api.py set, and a few new test_bindings.py cases) which inherited the no-skipif pattern.

Suggested fix

Standardize on one guard per pattern:

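import pytest

# _cuda_available() and _cuda_compatible_with_torch() are proposed helpers
# that do not exist yet. Assign pytestmark once per module; to apply both
# guards in one module, assign a list of marks instead.

# Pattern A guard
pytestmark = pytest.mark.skipif(
    not _cuda_available(), reason="CUDA-capable device required"
)

# Pattern B guard: mirror qumat_qdp.loader._select_torch_device
pytestmark = pytest.mark.skipif(
    not _cuda_compatible_with_torch(), reason="GPU arch not supported by this PyTorch build"
)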

A shared fixture / helper in testing/conftest.py would avoid copy-pasting the check. The capability-list check from PR #1323 (get_device_capability, get_arch_list) can be reused verbatim.
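
A rough sketch of what such shared helpers could look like follows. It is not the code from PR #1323: the function names (arch_supported, cuda_available, cuda_compatible_with_torch) are hypothetical, and the matching rule is an assumption, namely that a kernel image built for sm_XY runs on a device with the same major version and a minor version of at least Y. PTX (compute_XY) entries are ignored for simplicity. Only torch.cuda.is_available, torch.cuda.get_device_capability and torch.cuda.get_arch_list are real PyTorch calls.

```python
"""Hypothetical helpers for testing/conftest.py (all names are illustrative)."""


def arch_supported(capability, arch_list):
    """Return True if some `sm_XY` entry in `arch_list` can run on a GPU
    whose compute capability is `capability` = (major, minor).

    Assumption: same major version, compiled minor <= device minor.
    `compute_XY` (PTX) entries are skipped.
    """
    major, minor = capability
    for arch in arch_list:
        if not arch.startswith("sm_"):
            continue
        # Drop arch-specific suffix letters, e.g. "90a" -> "90".
        digits = arch[len("sm_"):].rstrip("abcdefghijklmnopqrstuvwxyz")
        if not digits.isdigit():
            continue
        arch_major, arch_minor = divmod(int(digits), 10)
        if arch_major == major and arch_minor <= minor:
            return True
    return False


def cuda_available():
    """Pattern A check: is any CUDA device visible to this process?"""
    try:
        import torch  # imported lazily so CPU-only environments still collect
    except ImportError:
        return False
    return torch.cuda.is_available()


def cuda_compatible_with_torch(device_index=0):
    """Pattern B check: can this PyTorch build run kernels on the device?"""
    if not cuda_available():
        return False
    import torch

    capability = torch.cuda.get_device_capability(device_index)
    return arch_supported(capability, torch.cuda.get_arch_list())
```

With helpers along these lines, a test module could pass `not cuda_available()` or `not cuda_compatible_with_torch()` as the skipif condition. Keeping the arch comparison in a pure function also lets it be unit-tested on machines without a GPU.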

Environment

  • OS: Ubuntu 24.04
  • CUDA Toolkit: 12.4 (apt nvidia-cuda-toolkit)
  • GPU: NVIDIA GeForce GTX 1060 with Max-Q Design (sm_61)
  • Python: 3.12.12 (uv-managed)
  • PyTorch: 2.9.0+cu128 (resolved by RC2 lockfile)
