Summary
test_encode_from_gpu_ptr_f32_with_stream_non_default_success (in qdp/qdp-core/tests/gpu_ptr_encoding.rs:795) fails reproducibly during full make test_rust runs, but passes 5/5 times when run in isolation. This suggests a test-isolation issue (GPU state pollution from preceding tests, or a synchronization gap exposed only under specific test ordering) rather than a defect in the f32 non-default-stream code path itself.
Failure observed
thread 'test_encode_from_gpu_ptr_f32_with_stream_non_default_success' panicked at
qdp-core/tests/gpu_ptr_encoding.rs:795:14:
encode_from_gpu_ptr_f32_with_stream (non-default stream):
InvalidInput("Input data (f32) has zero or non-finite norm
(contains NaN, Inf, or all zeros)")
The other 63 tests in gpu_ptr_encoding (including test_encode_from_gpu_ptr_f32_with_stream_success on the default stream) pass.
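For reference, the error text implies that the input failed a norm validity check. Below is a minimal host-side sketch of such a check, assuming an L2 norm; has_valid_norm is a hypothetical helper and is not taken from qdp-core. It may be useful for copying the device buffer back to the host just before the encode call and confirming whether the data was actually populated.

```rust
/// Hypothetical helper (not part of qdp-core): returns true when the L2 norm
/// of `data` is finite and strictly positive, mirroring the condition named
/// in the error message (no NaN, no Inf, not all zeros).
fn has_valid_norm(data: &[f32]) -> bool {
    // Accumulate in f64 to reduce overflow and rounding risk for f32 inputs.
    let sum_sq: f64 = data.iter().map(|&x| (x as f64) * (x as f64)).sum();
    let norm = sum_sq.sqrt();
    norm.is_finite() && norm > 0.0
}
```

If a host copy taken right before the call passes this check while the encoder still reports a zero norm, that would point toward a stream-ordering problem rather than bad input data.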
Reproducer
git checkout mahout-qumat-0.6.0-RC2
export CXX=g++-13 QDP_CUDA_ARCH_LIST=61+PTX
# Fails consistently as part of the full Rust suite
make test_rust # 1 failure: test_encode_from_gpu_ptr_f32_with_stream_non_default_success
# Passes consistently in isolation
cd qdp
for i in 1 2 3 4 5; do
  cargo test -p qdp-core --test gpu_ptr_encoding -- \
    test_encode_from_gpu_ptr_f32_with_stream_non_default_success --exact 2>&1 | tail -1
done
# test result: ok. 1 passed; 0 failed (×5)
Likely causes (worth investigating in order)
- Test ordering / state pollution. A preceding test in gpu_ptr_encoding (or in another test binary linked into the same process if cargo-test reuses one) leaves GPU memory or stream state in a configuration that breaks the non-default-stream input setup. Try running gpu_ptr_encoding with --test-threads=1 and bisecting via --skip to find the polluting test.
- Missing synchronization in the test harness. The test creates an input buffer on a user-provided stream, then calls the encoder which reads the norm (possibly on a different stream) without a cudaStreamSynchronize or event ordering between the two. Under high GPU load (full suite), the encoder could read the buffer before it is populated.
- Encoder bug with non-default stream. Less likely given that the test passes in isolation, but if the first two causes are ruled out, the encoder may be assuming default-stream semantics in some f32 path.
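The --skip bisection suggested for the first cause can be sketched as a small helper. This is only an illustration under stated assumptions: find_polluter and its fails_with callback are hypothetical names, the failure is assumed to be deterministic, and exactly one preceding test is assumed to be responsible.

```rust
/// Hypothetical bisection helper (not part of the repository).
/// `fails_with(subset)` should return true when the target test fails after
/// only the tests in `subset` have been allowed to run alongside it.
/// Assumes a deterministic failure caused by a single candidate.
fn find_polluter<'a, F>(candidates: &[&'a str], mut fails_with: F) -> Option<&'a str>
where
    F: FnMut(&[&'a str]) -> bool,
{
    // If the full candidate set does not reproduce the failure, give up.
    if candidates.is_empty() || !fails_with(candidates) {
        return None;
    }
    let mut suspects = candidates;
    while suspects.len() > 1 {
        // Keep whichever half still reproduces the failure.
        let (left, right) = suspects.split_at(suspects.len() / 2);
        suspects = if fails_with(left) { left } else { right };
    }
    Some(suspects[0])
}
```

In practice, fails_with might shell out to cargo test -p qdp-core --test gpu_ptr_encoding -- --test-threads=1 with one --skip argument for every candidate that is not in the subset, and return true when the run reports a failure. Note that --skip matches by substring unless --exact is also passed, so test names that are prefixes of other names need care.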
Environment
- OS: Ubuntu 24.04
- CUDA Toolkit: 12.4
- GPU: NVIDIA GeForce GTX 1060 with Max-Q Design (sm_61)
- Rust: stable, compiled with CXX=g++-13 (CUDA 12.4 rejects gcc 14)
- Arch list: QDP_CUDA_ARCH_LIST=61+PTX