@@ -25,7 +25,7 @@ python export_parakeet_tdt.py --audio /path/to/audio.wav
 | Argument | Description |
 |----------|-------------|
 | `--output-dir` | Output directory for exports (default: `./parakeet_tdt_exports`) |
-| `--backend` | Backend for acceleration: `portable`, `xnnpack`, `metal`, `mlx`, `cuda`, `cuda-windows` (default: `xnnpack`) |
+| `--backend` | Backend for acceleration: `portable`, `xnnpack`, `vulkan`, `metal`, `mlx`, `cuda`, `cuda-windows` (default: `xnnpack`) |
 | `--dtype` | Data type: `fp32`, `bf16`, `fp16` (default: `fp32`). Metal backend supports `fp32` and `bf16` only (no `fp16`). |
 | `--audio` | Path to audio file for transcription test |
 
@@ -54,7 +54,7 @@ The export script supports quantizing encoder and decoder linear layers using [t
 |--------|-------------|----------|
 | `4w` | 4-bit weight only quantization | CUDA, MLX, XNNPACK (embedding only) |
 | `8w` | 8-bit weight only quantization | CUDA, MLX, XNNPACK (embedding only) |
-| `8da4w` | 8-bit dynamic activation, 4-bit weight | XNNPACK |
+| `8da4w` | 8-bit dynamic activation, 4-bit weight | Vulkan, XNNPACK |
 | `8da8w` | 8-bit dynamic activation, 8-bit weight | XNNPACK |
 | `fpa4w` | Floating point activation, 4-bit weight | Metal |
 | `nvfp4` | 4-bit weight only quantization using NVIDIA's FP4 dtype | MLX |
@@ -71,6 +71,21 @@ python export_parakeet_tdt.py \
   --output-dir ./parakeet_quantized_xnnpack
 ```
 
+#### Example: Dynamic Quantization for Vulkan
+
+```bash
+python export_parakeet_tdt.py \
+  --backend vulkan \
+  --qlinear_encoder 8da4w \
+  --qlinear_encoder_group_size 32 \
+  --qlinear 8da4w \
+  --qlinear_group_size 32 \
+  --vulkan_force_fp16 \
+  --output-dir ./parakeet_quantized_vulkan
+```
+
+The additional `--vulkan_force_fp16` flag instructs the Vulkan backend to downcast FP32 tensors to FP16 internally, forcing half-precision computation. Input/output tensors remain FP32; the delegate converts them to and from FP16 automatically on entry and exit. This significantly improves latency but may slightly reduce transcription accuracy.
+
 #### Example: 4-bit Weight Quantization with Tile Packing for CUDA
 
 ```bash
@@ -217,6 +232,9 @@ make parakeet-cpu
 # Metal build (macOS)
 make parakeet-metal
 
+# Vulkan build (Linux / Android)
+make parakeet-vulkan
+
 # CUDA build (Linux)
 make parakeet-cuda
 
@@ -250,6 +268,12 @@ DYLD_LIBRARY_PATH=/usr/lib ./cmake-out/examples/models/parakeet/parakeet_runner
   --audio_path /path/to/audio.wav \
   --tokenizer_path examples/models/parakeet/parakeet_metal/tokenizer.model
 
+# Vulkan
+./cmake-out/examples/models/parakeet/parakeet_runner \
+  --model_path examples/models/parakeet/parakeet_vulkan/model.pte \
+  --audio_path /path/to/audio.wav \
+  --tokenizer_path examples/models/parakeet/parakeet_vulkan/tokenizer.model
+
 # CUDA (include .ptd data file)
 ./cmake-out/examples/models/parakeet/parakeet_runner \
   --model_path examples/models/parakeet/parakeet_cuda/model.pte \
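The `--vulkan_force_fp16` behavior documented in the diff (FP32 tensors at the delegate boundary, FP16 computation inside) can be illustrated with a minimal NumPy sketch. This is an independent illustration of the boundary cast and its precision cost, not the Vulkan delegate's actual implementation:

```python
import numpy as np

# Illustrative sketch only: emulate the delegate boundary described above.
# FP32 inputs are downcast to FP16 on entering the delegate, computed in
# half precision, then upcast back to FP32 on exit.
def delegate_boundary(x_fp32: np.ndarray) -> np.ndarray:
    x_fp16 = x_fp32.astype(np.float16)   # entering the delegate
    y_fp16 = x_fp16 * np.float16(2.0)    # stand-in for FP16 computation
    return y_fp16.astype(np.float32)     # exiting the delegate

x = np.array([0.1, 1.0 / 3.0], dtype=np.float32)
y = delegate_boundary(x)

# FP16 keeps ~11 bits of mantissa, so results drift slightly from pure FP32.
print(np.abs(y - np.float32(2.0) * x).max())
```

The caller only ever sees FP32 arrays, which is why the flag trades a small accuracy loss for latency without changing the model's I/O contract.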