Reduce allocation overhead in quantized sdpa #4137
cuda.yml
on: pull_request
Matrix: export-model-cuda-artifact
Matrix: test-cuda-builds
Matrix: test-models-cuda
Matrix: test-model-cuda-e2e
Waiting for pending jobs
check-all-cuda-builds
2s
Annotations
1 error and 1 warning
export-model-cuda-artifact (openai, whisper-large-v3-turbo, quantized-int4-weight-only) / linux-job
Process completed with exit code 1.

export-model-cuda-artifact (openai, whisper-large-v3-turbo, quantized-int4-weight-only) / linux-job
No files were found with the provided path: /home/ec2-user/actions-runner/_work/_temp/artifacts/. No artifacts will be uploaded.
Artifacts
Produced during runtime
| Name | Status | Size | Digest |
|---|---|---|---|
| google-gemma-3-4b-it-cuda-non-quantized | Expired | 7.22 GB | sha256:a9ae9c704d05e1f1293127d3e7690e84315bc51526cd8a3dddca5977226b3c78 |
| google-gemma-3-4b-it-cuda-quantized-int4-tile-packed | Expired | 4.03 GB | sha256:c593318a822b6f67fc954724c8b9c427a397e1c2ae1a844081a56a34d8b28b87 |
| mistralai-Voxtral-Mini-3B-2507-cuda-non-quantized | Expired | 6.82 GB | sha256:660821f5bae159e8874ecdf63b0d295a3772a5f126ecc44602b40aea57cd9820 |
| mistralai-Voxtral-Mini-3B-2507-cuda-quantized-int4-tile-packed | Expired | 2.89 GB | sha256:e2b0d79ad0f55c07e476288cfae310795abb3b868468acebf910ec2ce8d6627d |
| mistralai-Voxtral-Mini-3B-2507-cuda-quantized-int4-weight-only | Expired | 6.14 GB | sha256:d0acf04c35885a6459c9b78d3bb0853bb90bdd2ebb129591c0d4111e82605798 |
| openai-whisper-large-v3-turbo-cuda-non-quantized | Expired | 1.17 GB | sha256:c2cace7fc5b30bfec60fa8827b7fc33ad9259d95cc6d9e5043b39d7699f242a4 |
| openai-whisper-large-v3-turbo-cuda-quantized-int4-tile-packed | Expired | 490 MB | sha256:51de28f1455c1ac777f4395f89910184fb442b279c92d870b55a998af409bb4f |
| openai-whisper-small-cuda-non-quantized | Expired | 361 MB | sha256:21143d20dbbdc0edbb0487b6dfa7c70ee6dd490127d634e4e93af3511764de20 |
| openai-whisper-small-cuda-quantized-int4-tile-packed | Expired | 172 MB | sha256:6a8da5b62cc4110182e990b6a48bf43c32ad38df7c3915c1a8477d709167f8a6 |
| openai-whisper-small-cuda-quantized-int4-weight-only | Expired | 270 MB | sha256:6a5b538e944afbb9400cc0d8a45237338820121ec189c357ed9290c885820111 |