Many qdp tests instantiate QdpEngine or use device="cuda" without checking for GPU first

### Summary
 
Two related test-hygiene patterns cause spurious failures on environments where the test suite previously skipped cleanly:
 
**Pattern A: No skipif on `QdpEngine(0)` instantiation.** ~12 tests instantiate `QdpEngine(0)` (directly or via `QdpBenchmark` / `QuantumDataLoader`) without first checking that CUDA is available. On systems with `CUDA_VISIBLE_DEVICES=`, they error with `CUDA_ERROR_NO_DEVICE` rather than skipping.
 
**Pattern B: Hardcoded `device="cuda"` without arch check.** ~68 tests in `testing/qdp/test_bindings.py` use `torch.tensor(..., device="cuda")` or equivalent, then fail with `cudaErrorNoKernelImageForDevice` on GPUs whose compute capability isn't in the PyTorch wheel's compiled arch list. PR #1323 added a capability-aware fallback in `qumat_qdp.loader._select_torch_device` for production code, but tests don't use it.
 
### Reproducer
 
On a host with a Pascal GPU (sm_61) and CUDA Toolkit installed (also requires the `LD_PRELOAD` workaround from Issue [N1] for the wheel quirk):
 
```bash
git checkout mahout-qumat-0.6.0-RC2
export LD_PRELOAD=$VIRTUAL_ENV/lib/python3.12/site-packages/nvidia/cuda_runtime/lib/libcudart.so.12
 
# Pattern A: CUDA masked → 12 ERROR instead of SKIPPED
CUDA_VISIBLE_DEVICES= uv run pytest -v   # 12 failed, 719 passed, 142 skipped
 
# Patterns A+B combined: CUDA visible but arch incompatible → 80 ERROR
uv run pytest -v                          # 80 failed, 785 passed, 8 skipped
```
 
### Specific failing tests (Pattern A, CUDA masked)
 
- `testing/qdp/test_benchmark_api.py::test_qdp_benchmark_run_throughput`
- `testing/qdp/test_benchmark_api.py::test_qdp_benchmark_run_latency`
- `testing/qdp/test_benchmark_api.py::test_qdp_benchmark_device_id_propagated`
- `testing/qdp/test_bindings.py::test_encode`
- `testing/qdp/test_bindings.py::test_dlpack_device`
- `testing/qdp/test_bindings.py::test_dlpack_single_use`
- `testing/qdp/test_bindings.py::test_dlpack_with_stream[stream_legacy]`
- `testing/qdp/test_bindings.py::test_dlpack_with_stream[stream_per_thread]`
- `testing/qdp/test_bindings.py::test_pytorch_integration`
- `testing/qdp/test_bindings.py::test_precision[float32-complex64]`
- `testing/qdp/test_bindings.py::test_precision[float64-complex128]`
- `testing/qdp_python/test_quantum_data_loader.py::test_synthetic_loader_batch_count`
All produce the same error: `Failed to initialize CUDA device 0: DriverError(CUDA_ERROR_NO_DEVICE, "no CUDA-capable device is detected")`.
 
Pattern B failures (CUDA-visible) are mostly in `testing/qdp/test_bindings.py::test_encode_cuda_tensor*`, `test_angle_encode_cuda_tensor*`, `test_basis_encode_*`, etc.
 
### Comparison to RC1
 
Same hardware/setup with RC1 + `CUDA_VISIBLE_DEVICES=` produced 4 failures; RC2 produces 12. The delta is mostly new tests added since RC1 (the `test_benchmark_api.py` set, and a few new `test_bindings.py` cases) which inherited the no-skipif pattern.
 
### Suggested fix
 
Standardize on one of:
 
```python
# Pattern A guard
pytestmark = pytest.mark.skipif(
    not _cuda_available(), reason="CUDA-capable device required"
)
 
# Pattern B guard — mirror qumat_qdp.loader._select_torch_device
pytestmark = pytest.mark.skipif(
    not _cuda_compatible_with_torch(), reason="GPU arch not supported by this PyTorch build"
)
```
 
A shared fixture / helper in `testing/conftest.py` would avoid copy-pasting the check. The capability-list check from PR #1323 (`get_device_capability` ∩ `get_arch_list`) can be reused verbatim.
 
### Environment
 
- OS: Ubuntu 24.04
- CUDA Toolkit: 12.4 (apt `nvidia-cuda-toolkit`)
- GPU: NVIDIA GeForce GTX 1060 with Max-Q Design (sm_61)
- Python: 3.12.12 (uv-managed)
- PyTorch: 2.9.0+cu128 (resolved by RC2 lockfile)


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Many qdp tests instantiate QdpEngine or use device="cuda" without checking for GPU first #1331

Summary

Reproducer

Specific failing tests (Pattern A, CUDA masked)

Comparison to RC1

Suggested fix

Environment

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Many qdp tests instantiate QdpEngine or use device="cuda" without checking for GPU first #1331

Description

Summary

Reproducer

Specific failing tests (Pattern A, CUDA masked)

Comparison to RC1

Suggested fix

Environment

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions