runner JAX/PJRT abort is understood. The same test passes on a
local GPU, but the GitHub Actions CUDA job can terminate with
CUDA_ERROR_LAUNCH_FAILED while PJRT releases device buffers.
Line: 120
    if os.environ.get("GITHUB_ACTIONS") == "true" and os.environ.get(
        "CUDA_VISIBLE_DEVICES"
    ):
        # TODO: Re-enable this in GitHub CUDA CI once the hosted/self-hosted
        # runner JAX/PJRT abort is understood. The same test passes on a
        # local GPU, but the GitHub Actions CUDA job can terminate with
        # CUDA_ERROR_LAUNCH_FAILED while PJRT releases device buffers.
        self.skipTest(
            "JAX training is temporarily skipped on GitHub Actions CUDA runners"
        )
deepmd-kit/source/tests/jax/test_training.py
Lines 117 to 127 in 2087416
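
The skip condition quoted from test_training.py could also be factored into a small helper so the environment check can be exercised on its own. The following is only a sketch of that idea, not the code in the repository: the helper name `on_github_cuda_runner`, its optional `env` parameter, and the class `TestJaxTrainingSketch` are hypothetical and exist purely for illustration. It mirrors the original logic, which treats a non-empty `CUDA_VISIBLE_DEVICES` under `GITHUB_ACTIONS == "true"` as a GitHub Actions CUDA runner.

```python
import os
import unittest

SKIP_REASON = "JAX training is temporarily skipped on GitHub Actions CUDA runners"


def on_github_cuda_runner(env=None):
    """Hypothetical helper: True under GitHub Actions with a CUDA device exposed.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    # Same condition as the quoted test: GITHUB_ACTIONS is "true" and
    # CUDA_VISIBLE_DEVICES is set to a non-empty value.
    return env.get("GITHUB_ACTIONS") == "true" and bool(
        env.get("CUDA_VISIBLE_DEVICES")
    )


class TestJaxTrainingSketch(unittest.TestCase):
    """Illustrative stand-in for the real training test case."""

    def test_training(self):
        # Evaluate the condition at run time, as the quoted code does.
        if on_github_cuda_runner():
            self.skipTest(SKIP_REASON)
        # The real training assertions would go here.
        self.assertTrue(True)
```

Keeping the check inside the test body, rather than in a `unittest.skipIf` decorator, means the environment is read when the test runs instead of when the module is imported, which matches the behavior of the quoted code.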