Commit 1a43ae6
tts-cpp: chatterbox-mtl — fall back to f32 KV on GPU (no quantized CONT kernel)
The multilingual (MTL) Chatterbox variant SIGABRTs during synthesis on the
Metal backend with a quantized (q8_0/q4_0) KV cache:
ggml_metal_op_encode_impl: error: unsupported op 'CONT'
...ggml-metal-ops.cpp:203: unsupported op (in eval_step_mtl)
Root cause: the MTL variant's batched-CFG (B=2) decode reads the token-major
K/V cache as a 4D strided view (build_llama_block), which the GPU flash-attn
path materialises through a CONT. ggml-metal has no CONT kernel for quantized
tensors, so any quantized KV cache crashes on Metal. The capability probe in
chatterbox_resolve_kv_type only validates flash_attn_ext, not the downstream
CONT, so it cannot catch this — which is why ggml-org#2527 (q8 KV as the default)
shipped a broken MTL Metal path undetected.
Fix: for the MTL variant, restrict a quantized KV cache to the CPU backend
(where the quantized CONT is supported) and fall back to f32 on any GPU
backend. Gating on "not CPU" rather than a backend name is deliberately robust
across ggml builds whose Metal registry name differs ("Metal" vs "MTL"), and
composes with the existing Vulkan force-f32 inside chatterbox_resolve_kv_type.
Scope: MTL variant only. The Turbo variant (separate eval path, in the gated
SDK e2e) is unaffected and keeps quantized KV on GPU — verified on Metal.
This is a correctness stopgap: it stops the crash but gives back the q8 KV
memory saving on MTL-GPU. f32 takes roughly 3.8x the q8_0 footprint (4 bytes
per element vs. 34 bytes per 32-element block) and is allocated eagerly at
n_ctx. A follow-up will recover GPU q8 for MTL by reworking the batched-CFG
attention to avoid the strided-quantized-view CONT, together with the missing
MTL x backend x kv-type e2e coverage.
Repro (local): eng_footprint_driver chatterbox-t3-mtl.gguf chatterbox-s3gen-mtl.gguf
ref.wav 99 q8_0 -> SIGABRT before fix, synthesizes (f32 fallback) after.
Refs QVAC-19557
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

1 parent 586268b
1 file changed
Lines changed: 21 additions & 0 deletions
Diff hunk: new lines 1833-1853 inserted after original line 1832 (file name and line contents not captured).