Skip to content

gpu_ptr_encoding test fails during make test_rust but passes 5/5 times in isolation — test-ordering or GPU-state-pollution issue #1332

@andrewmusselman

Description

@andrewmusselman

Summary

test_encode_from_gpu_ptr_f32_with_stream_non_default_success (in qdp/qdp-core/tests/gpu_ptr_encoding.rs:795) fails reproducibly during full make test_rust runs, but passes 5/5 times when run in isolation. Suggests a test-isolation issue (GPU state pollution from preceding tests, or a synchronization gap exposed only under specific test ordering) rather than a defect in the f32-non-default-stream code path itself.

Failure observed

thread 'test_encode_from_gpu_ptr_f32_with_stream_non_default_success' panicked at
qdp-core/tests/gpu_ptr_encoding.rs:795:14:
encode_from_gpu_ptr_f32_with_stream (non-default stream):
    InvalidInput("Input data (f32) has zero or non-finite norm
                  (contains NaN, Inf, or all zeros)")

The other 63 tests in gpu_ptr_encoding (including test_encode_from_gpu_ptr_f32_with_stream_success on the default stream) pass.

Reproducer

git checkout mahout-qumat-0.6.0-RC2
export CXX=g++-13 QDP_CUDA_ARCH_LIST=61+PTX
 
# Fails consistently as part of the full Rust suite
make test_rust    # 1 failure: test_encode_from_gpu_ptr_f32_with_stream_non_default_success
 
# Passes consistently in isolation
cd qdp
for i in 1 2 3 4 5; do
  cargo test -p qdp-core --test gpu_ptr_encoding -- \
    test_encode_from_gpu_ptr_f32_with_stream_non_default_success --exact 2>&1 | tail -1
done
# test result: ok. 1 passed; 0 failed (×5)

Likely causes (worth investigating in order)

  1. Test ordering / state pollution. A preceding test in gpu_ptr_encoding (or in another test binary linked into the same process if cargo-test reuses one) leaves GPU memory or stream state in a configuration that breaks the non-default-stream input setup. Try running gpu_ptr_encoding with --test-threads=1 and bisecting via --skip to find the polluting test.
  2. Missing synchronization in the test harness. The test creates an input buffer on a user-provided stream, then calls the encoder which reads the norm — possibly on a different stream — without a cudaStreamSynchronize or event ordering between the two. Under high GPU load (full suite), the encoder reads before the buffer is populated.
  3. Encoder bug with non-default stream. Less likely given isolation passes, but if (1) and (2) rule out, the encoder may be assuming default-stream semantics in some f32 path.

Environment

  • OS: Ubuntu 24.04
  • CUDA Toolkit: 12.4
  • GPU: NVIDIA GeForce GTX 1060 with Max-Q Design (sm_61)
  • Rust: stable, compiled with CXX=g++-13 (CUDA 12.4 rejects gcc 14)
  • Arch list: QDP_CUDA_ARCH_LIST=61+PTX

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions