Commit df72cc1

[Parakeet] Add Vulkan backend documentation and fix CMake build
Summary:

Add Vulkan backend documentation to the Parakeet README covering export commands, quantization options, build instructions, and runner examples.

Guard `quantized_ops_lib` and `custom_ops` link targets with `if(TARGET ...)` in CMakeLists.txt. These targets don't exist in Vulkan-only or XNNPACK-only builds, causing a hard CMake configure error from `target_link_options()`. This matches the existing pattern used for `optimized_native_cpu_ops_lib`.

Validated on Samsung S24 (Adreno 750), 8da4w quantization, test_audio.wav (7.2s):

| Metric         | XNNPACK (686 MB) | Vulkan (781 MB) | Vulkan fp16 (550 MB) |
|----------------|------------------|-----------------|----------------------|
| Inference      | 0.56s            | 0.46s           | 0.32s                |
| Encoder speed  | 188 tok/s        | 275 tok/s       | 360 tok/s            |
| Decoder speed  | 657 tok/s        | 373 tok/s       | 746 tok/s            |

Authored by Claude (Anthropic)
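
The validation figures in the commit message are easier to compare as ratios. The following is a minimal sketch that only does arithmetic on the numbers quoted above; the helper name `ratio_to_baseline` is hypothetical and nothing here is part of the Parakeet tooling.

```python
# Figures copied from the validation table in the commit message
# (Samsung S24, 8da4w quantization, 7.2 s test clip).
AUDIO_SECONDS = 7.2

inference_seconds = {"XNNPACK": 0.56, "Vulkan": 0.46, "Vulkan fp16": 0.32}
decoder_tok_per_s = {"XNNPACK": 657, "Vulkan": 373, "Vulkan fp16": 746}


def ratio_to_baseline(values, baseline="XNNPACK", higher_is_better=True):
    """Return each entry's improvement factor relative to the baseline."""
    base = values[baseline]
    if higher_is_better:
        return {name: v / base for name, v in values.items()}
    # For latencies, a smaller value is better, so invert the ratio.
    return {name: base / v for name, v in values.items()}


latency_speedup = ratio_to_baseline(inference_seconds, higher_is_better=False)
decoder_ratio = ratio_to_baseline(decoder_tok_per_s)
# How many seconds of audio are processed per second of inference.
realtime_factor = {name: AUDIO_SECONDS / t for name, t in inference_seconds.items()}

for name in inference_seconds:
    print(
        f"{name:12s} latency x{latency_speedup[name]:.2f}  "
        f"decoder x{decoder_ratio[name]:.2f}  "
        f"real-time x{realtime_factor[name]:.1f}"
    )
```

On these numbers, Vulkan fp16 is about 1.75x faster than XNNPACK end to end, while the fp32 Vulkan decoder is slower than XNNPACK (373 vs 657 tok/s, roughly 0.57x).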
1 parent fcccda3 commit df72cc1

2 files changed

Lines changed: 39 additions & 5 deletions

examples/models/parakeet/CMakeLists.txt

Lines changed: 8 additions & 3 deletions
@@ -42,9 +42,14 @@ endif()
 
 # CPU-only builds need quantized and custom ops
 if(NOT EXECUTORCH_BUILD_CUDA)
-  list(APPEND link_libraries quantized_ops_lib custom_ops)
-  executorch_target_link_options_shared_lib(quantized_ops_lib)
-  executorch_target_link_options_shared_lib(custom_ops)
+  if(TARGET quantized_ops_lib)
+    list(APPEND link_libraries quantized_ops_lib)
+    executorch_target_link_options_shared_lib(quantized_ops_lib)
+  endif()
+  if(TARGET custom_ops)
+    list(APPEND link_libraries custom_ops)
+    executorch_target_link_options_shared_lib(custom_ops)
+  endif()
 endif()
 
 # XNNPACK

examples/models/parakeet/README.md

Lines changed: 31 additions & 2 deletions
@@ -25,7 +25,7 @@ python export_parakeet_tdt.py --audio /path/to/audio.wav
 | Argument | Description |
 |----------|-------------|
 | `--output-dir` | Output directory for exports (default: `./parakeet_tdt_exports`) |
-| `--backend` | Backend for acceleration: `portable`, `xnnpack`, `metal`, `cuda`, `cuda-windows` (default: `xnnpack`) |
+| `--backend` | Backend for acceleration: `portable`, `xnnpack`, `vulkan`, `metal`, `cuda`, `cuda-windows` (default: `xnnpack`) |
 | `--dtype` | Data type: `fp32`, `bf16`, `fp16` (default: `fp32`). Metal backend supports `fp32` and `bf16` only (no `fp16`). |
 | `--audio` | Path to audio file for transcription test |
 
@@ -54,7 +54,7 @@ The export script supports quantizing encoder and decoder linear layers using [t
 |--------|-------------|----------|
 | `4w` | 4-bit weight only quantization | CUDA |
 | `8w` | 8-bit weight only quantization | CUDA |
-| `8da4w` | 8-bit dynamic activation, 4-bit weight | CUDA |
+| `8da4w` | 8-bit dynamic activation, 4-bit weight | Vulkan, CUDA |
 | `8da8w` | 8-bit dynamic activation, 8-bit weight | CUDA |
 | `fpa4w` | Floating point activation, 4-bit weight | Metal |
 
@@ -70,6 +70,26 @@ python export_parakeet_tdt.py \
     --output-dir ./parakeet_quantized_xnnpack
 ```
 
+#### Example: Dynamic Quantization for Vulkan
+
+```bash
+python export_parakeet_tdt.py \
+    --backend vulkan \
+    --qlinear_encoder 8da4w \
+    --qlinear_encoder_group_size 32 \
+    --qlinear 8da4w \
+    --qlinear_group_size 32 \
+    --vulkan_force_fp16 \
+    --output-dir ./parakeet_quantized_vulkan
+```
+
+An additional `--vulkan_force_fp16` flag is available to have the Vulkan backend
+internally downcast FP32 tensors to FP16 within the Vulkan backend, forcing
+half-precision computation. Note that input/output tensors are still FP32, and
+the delegate will automatically convert them to/from FP16 upon entering and
+exiting the delegate. This will significantly improve latency but may slightly
+reduce transcription accuracy.
+
 #### Example: 4-bit Weight Quantization with Tile Packing for CUDA
 
 ```bash
@@ -186,6 +206,9 @@ make parakeet-cpu
 # Metal build (macOS)
 make parakeet-metal
 
+# Vulkan build (Linux / Android)
+make parakeet-vulkan
+
 # CUDA build (Linux)
 make parakeet-cuda
 ```
@@ -216,6 +239,12 @@ DYLD_LIBRARY_PATH=/usr/lib ./cmake-out/examples/models/parakeet/parakeet_runner
   --audio_path /path/to/audio.wav \
   --tokenizer_path examples/models/parakeet/parakeet_metal/tokenizer.model
 
+# Vulkan
+./cmake-out/examples/models/parakeet/parakeet_runner \
+  --model_path examples/models/parakeet/parakeet_vulkan/model.pte \
+  --audio_path /path/to/audio.wav \
+  --tokenizer_path examples/models/parakeet/parakeet_vulkan/tokenizer.model
+
 # CUDA (include .ptd data file)
 ./cmake-out/examples/models/parakeet/parakeet_runner \
   --model_path examples/models/parakeet/parakeet_cuda/model.pte \
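
The README text added in this commit says that `--vulkan_force_fp16` downcasts FP32 tensors to FP16 inside the delegate and may slightly reduce accuracy. As a rough illustration of how much rounding a single FP32 to FP16 to FP32 round trip introduces, here is a small sketch using only the Python standard library. It does not reflect how the Vulkan delegate is implemented; the function name and the sample values are made up for the example.

```python
import struct


def to_fp16_and_back(x: float) -> float:
    """Round x to IEEE 754 half precision, then widen it again."""
    # The "e" format code packs a binary16 (half-precision) float.
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Hypothetical sample values; all lie inside the normal FP16 range.
samples = [0.1, 1.0, 3.14159, 1000.5]

# Half precision keeps 11 significant bits, so round-to-nearest gives a
# relative error of at most 2**-11 for values in the normal range.
MAX_REL_ERROR = 2.0 ** -11

for x in samples:
    y = to_fp16_and_back(x)
    rel_err = abs(y - x) / abs(x)
    print(f"{x!r} -> {y!r} (relative error {rel_err:.2e})")
```

Each individual conversion is accurate to roughly three decimal digits. How that accumulates across a whole model depends on the network, which is consistent with the commit's hedged wording that accuracy "may slightly" drop.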
