## Summary
This PR extends CUDA support for text-only LLM workflows and adds CI
coverage for Qwen3-0.6B artifacts and pybind execution.
## Why
We already validate CUDA multimodal paths, but text-generation CUDA
coverage (especially Qwen3) was incomplete.
This change adds export/run support and CI wiring so CUDA
text-generation artifacts are exercised in automated tests.
## What changed
### CUDA LLM runner/build support
- Added `llama-cuda` and `llama-cuda-debug` Makefile targets.
- Added CUDA presets/workflow presets in
  `examples/models/llama/CMakePresets.json`.
- Updated `examples/models/llama/CMakeLists.txt` to link CUDA backend
  when `EXECUTORCH_BUILD_CUDA=ON`.
- Updated `examples/models/llama/main.cpp`:
  - Added `--data_path` convenience flag (single PTD path).
  - Added `--prompt_file` support for file-based prompts.
### Gemma3 runner usability
- Updated `examples/models/gemma3/e2e_runner.cpp`:
  - Added `--max_new_tokens`.
  - Added `--stop_sequence` early-stop behavior.
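
The interaction between a token budget and a stop sequence can be illustrated with a minimal Python sketch. The runner itself is C++ and its exact semantics are not described here; the helper name `generate_until_stop` and the choice to trim the stop sequence from the returned text are assumptions made for illustration only.

```python
from typing import Iterable


def generate_until_stop(
    pieces: Iterable[str], stop_sequence: str, max_new_tokens: int
) -> str:
    """Accumulate decoded token pieces until a stop sequence or budget is hit.

    Hypothetical helper, not part of the gemma3 runner.
    """
    text = ""
    for count, piece in enumerate(pieces, start=1):
        text += piece
        # Search the accumulated text, because a stop sequence may span
        # several token pieces.
        if stop_sequence:
            idx = text.find(stop_sequence)
            if idx != -1:
                # Assumption: the stop sequence itself is not returned.
                return text[:idx]
        if count >= max_new_tokens:
            break
    return text
```

Checking the accumulated text rather than each piece matters because a stop string such as `<end>` can be split across two or more tokens.
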
### Optimum exporter integration and CI pin
- Bumped optimum-executorch CI pin to:
  - `a9592258daacad7423fd5f39aaa59c6e36471520`
- Added `Qwen/Qwen3-0.6B` handling in
  `.ci/scripts/export_model_artifact.sh` for `text-generation`.
### HuggingFace optimum CUDA test path
- Updated `.ci/scripts/test_huggingface_optimum_model.py`
  (`test_text_generation`):
  - Supports `recipe=cuda` export (`--device cuda --dtype bfloat16`).
  - Supports CUDA quantization for this path:
    - `--qlinear 4w`
    - `--qlinear_packing_format tile_packed_to_4d`
    - `--qembedding 8w`
  - Validates presence of `aoti_cuda_blob.ptd`.
  - Passes blob path into `TextLLMRunner`.
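
As a rough sketch of the flag assembly and blob check listed above: the flag values and the file name `aoti_cuda_blob.ptd` come from this description, while the helper names `cuda_export_flags` and `find_cuda_blob`, and the assumption that the blob sits directly in the export output directory, are hypothetical and may not match the actual test script.

```python
from pathlib import Path
from typing import List, Union

# File name taken from the PR description.
BLOB_NAME = "aoti_cuda_blob.ptd"


def cuda_export_flags(quantized: bool) -> List[str]:
    """Return the extra export flags for the CUDA recipe (hypothetical helper)."""
    flags = ["--device", "cuda", "--dtype", "bfloat16"]
    if quantized:
        flags += [
            "--qlinear", "4w",
            "--qlinear_packing_format", "tile_packed_to_4d",
            "--qembedding", "8w",
        ]
    return flags


def find_cuda_blob(model_dir: Union[str, Path]) -> Path:
    """Return the blob path, failing early if the export did not produce it.

    Assumption: the blob is written directly into the export output directory.
    """
    blob = Path(model_dir) / BLOB_NAME
    if not blob.is_file():
        raise FileNotFoundError(f"Expected CUDA blob at {blob}")
    return blob
```

Failing before the runner is constructed keeps a missing export artifact from surfacing later as a less specific load error.
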
### CUDA workflow updates
- Updated `.github/workflows/cuda.yml`:
  - Added `Qwen/Qwen3-0.6B` to CUDA export matrix.
  - Updated `test-cuda-pybind` matrix to explicit artifact mapping.
  - Added Qwen non-quantized and quantized-int4-tile-packed artifact runs
    in pybind test.
  - Switched `download-artifact` to matrix-provided artifact name.
## Validation
Validation relies on the new CUDA CI jobs added in this PR.