Reduce allocation overhead in quantized sdpa #4137

Triggered via pull request on December 6, 2025, 07:20
Status: Failure
Total duration: 41m 46s
Artifacts: 10

cuda.yml

on: pull_request
Matrix: export-model-cuda-artifact
Matrix: test-cuda-builds
Matrix: test-models-cuda
Matrix: test-model-cuda-e2e
Waiting for pending jobs
check-all-cuda-builds (2s)

Annotations

1 error and 1 warning
export-model-cuda-artifact (openai, whisper-large-v3-turbo, quantized-int4-weight-only) / linux-job
No files were found with the provided path: /home/ec2-user/actions-runner/_work/_temp/artifacts/. No artifacts will be uploaded.

Artifacts

Produced during runtime
Name | Status | Size | Digest
google-gemma-3-4b-it-cuda-non-quantized | Expired | 7.22 GB | sha256:a9ae9c704d05e1f1293127d3e7690e84315bc51526cd8a3dddca5977226b3c78
google-gemma-3-4b-it-cuda-quantized-int4-tile-packed | Expired | 4.03 GB | sha256:c593318a822b6f67fc954724c8b9c427a397e1c2ae1a844081a56a34d8b28b87
mistralai-Voxtral-Mini-3B-2507-cuda-non-quantized | Expired | 6.82 GB | sha256:660821f5bae159e8874ecdf63b0d295a3772a5f126ecc44602b40aea57cd9820
mistralai-Voxtral-Mini-3B-2507-cuda-quantized-int4-tile-packed | Expired | 2.89 GB | sha256:e2b0d79ad0f55c07e476288cfae310795abb3b868468acebf910ec2ce8d6627d
mistralai-Voxtral-Mini-3B-2507-cuda-quantized-int4-weight-only | Expired | 6.14 GB | sha256:d0acf04c35885a6459c9b78d3bb0853bb90bdd2ebb129591c0d4111e82605798
openai-whisper-large-v3-turbo-cuda-non-quantized | Expired | 1.17 GB | sha256:c2cace7fc5b30bfec60fa8827b7fc33ad9259d95cc6d9e5043b39d7699f242a4
openai-whisper-large-v3-turbo-cuda-quantized-int4-tile-packed | Expired | 490 MB | sha256:51de28f1455c1ac777f4395f89910184fb442b279c92d870b55a998af409bb4f
openai-whisper-small-cuda-non-quantized | Expired | 361 MB | sha256:21143d20dbbdc0edbb0487b6dfa7c70ee6dd490127d634e4e93af3511764de20
openai-whisper-small-cuda-quantized-int4-tile-packed | Expired | 172 MB | sha256:6a8da5b62cc4110182e990b6a48bf43c32ad38df7c3915c1a8477d709167f8a6
openai-whisper-small-cuda-quantized-int4-weight-only | Expired | 270 MB | sha256:6a5b538e944afbb9400cc0d8a45237338820121ec189c357ed9290c885820111