Commit 55ee8d0

ogad-tether and claude committed
tts-cpp: chatterbox-mtl — q8 KV-cache e2e coverage in test_multilingual_synth
Adds end-to-end regression coverage for the q8-on-GPU align-cast fix, on top of the op-level test_metal_ops sentinel:

- test_multilingual_synth: new --kv-cache-type passthrough (+ CHATTERBOX_KV_CACHE_TYPE env fallback), forwarded to the CLI. Lets the existing MTL synth harness, which already synthesizes on the GPU, writes a WAV, and validates audio sanity (RMS/peak/clipping/silence), exercise the quantized KV path.
- CMake: register mtl-synth-q8-<lang> for a diverse script subset (en/ar/ru/hi = Latin/Arabic/Cyrillic/Devanagari), labelled `mtl-q8`. On a Metal fleet these hit the exact path that used to SIGABRT ("unsupported op 'CONT'") before the fix; the script diversity doubles as cross-language alignment coverage under quantization. Same env model wiring (CHATTERBOX_T3_MTL / CHATTERBOX_S3GEN) as the f32 mtl-synth-* tests.

Local validation (stock ggml Metal, M2) via an Engine-level WAV harness: q8 on Metal produces real audio across en/es/fr/de/ar/ru/hi/ko (no crash, finite, non-clipped); q8-vs-f32 alignment tracks closely (e.g. ar q8 5.12s/rms .051 vs f32 5.00s/rms .051).

Refs QVAC-19557

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent c91d49f commit 55ee8d0
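
The "tracks closely" claim in the message can be made concrete with a small relative-drift check. The following is a minimal sketch only: `SynthSummary`, `rel_diff`, `tracks_closely`, and both tolerances are hypothetical names and values chosen for illustration, not part of the commit or the harness. With the reported Arabic figures, duration drifts by 0.12 s / 5.00 s = 2.4% and RMS does not drift at all.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical per-run summary: WAV duration in seconds and RMS level.
struct SynthSummary {
    double duration_s;
    double rms;
};

// Relative difference of `value` against `reference`, as a fraction of |reference|.
// Assumes reference != 0 (a silent or empty f32 baseline is a separate failure).
double rel_diff(double reference, double value) {
    return std::fabs(value - reference) / std::fabs(reference);
}

// Tolerances are illustrative assumptions, not thresholds taken from the commit.
bool tracks_closely(const SynthSummary & f32, const SynthSummary & q8,
                    double max_duration_drift = 0.05,
                    double max_rms_drift      = 0.10) {
    return rel_diff(f32.duration_s, q8.duration_s) <= max_duration_drift
        && rel_diff(f32.rms, q8.rms) <= max_rms_drift;
}
```

Calling `tracks_closely({5.00, 0.051}, {5.12, 0.051})` passes under these assumed tolerances, while a run that came out a full second longer would not.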

2 files changed

Lines changed: 45 additions & 0 deletions

tts-cpp/CMakeLists.txt

Lines changed: 35 additions & 0 deletions
@@ -745,6 +745,41 @@ if (TTS_CPP_BUILD_TESTS)
         endif()
     endforeach()
 
+    # QVAC-19557: q8_0 KV-cache regression for the MTL variant. A quantized KV
+    # cache used to SIGABRT on Metal ("unsupported op 'CONT'") via the alignment
+    # probe; the dequantizing-cast fix recovers q8 KV on the GPU. These run the
+    # full synth with --kv-cache-type q8_0 on whatever backend the fleet has
+    # (Metal fleets exercise the original crash path) and validate the WAV is
+    # real audio (RMS/peak/clipping/duration), i.e. an end-to-end regression
+    # the op-level test_metal_ops sentinel can't cover. A diverse script subset
+    # (Latin/Arabic/Cyrillic/Devanagari) doubles as cross-language alignment
+    # coverage under quantization. Labelled `mtl-q8` so a Metal fleet can select
+    # them; same env model wiring (CHATTERBOX_T3_MTL / CHATTERBOX_S3GEN) as the
+    # f32 mtl-synth-* tests above.
+    set(_mtl_q8_phrases
+        "en|Hello, this is a multilingual text-to-speech test."
+        "ar|مرحبًا، هذا اختبار متعدد اللغات لتحويل النص إلى كلام."
+        "ru|Привет, это многоязычный тест синтеза речи."
+        "hi|नमस्ते, यह एक बहुभाषी वाक् संश्लेषण परीक्षण है।"
+    )
+    foreach(_entry IN LISTS _mtl_q8_phrases)
+        string(REPLACE "|" ";" _parts "${_entry}")
+        list(GET _parts 0 _lang)
+        list(GET _parts 1 _text)
+        add_test(
+            NAME mtl-synth-q8-${_lang}
+            COMMAND $<TARGET_FILE:test-multilingual-synth>
+                --lang "${_lang}"
+                --text "${_text}"
+                --kv-cache-type q8_0
+                --out "${_mtl_out_dir}/${_lang}-q8.wav"
+        )
+        set_tests_properties(mtl-synth-q8-${_lang} PROPERTIES
+            LABELS "multilingual;mtl-q8"
+            TIMEOUT 180
+        )
+    endforeach()
+
     # End-to-end EOS round-trip regression. Drives tts-cli to
     # synthesize a set of English phrases, transcribes with whisper-cli, and
     # asserts the transcription is close to the input (CER guard -> catches
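
The CMake comment says the tests validate that the WAV is "real audio (RMS/peak/clipping/duration)". The harness's actual checks are not part of this diff, so the following is only a sketch of what such a sanity pass might look like on decoded mono float samples. `AudioStats`, `measure`, `looks_like_speech`, and every threshold are hypothetical and chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical summary of a mono float PCM buffer nominally in [-1, 1].
struct AudioStats {
    double rms          = 0.0;   // root-mean-square level
    double peak         = 0.0;   // largest absolute finite sample
    double clipped_frac = 0.0;   // fraction of samples at or above clip_level
    bool   finite       = true;  // false if any sample is NaN or Inf
};

AudioStats measure(const std::vector<float> & pcm, double clip_level = 0.999) {
    AudioStats s;
    if (pcm.empty()) return s;  // all-zero stats; rejected below as silence
    double      sum_sq  = 0.0;
    std::size_t clipped = 0;
    for (float x : pcm) {
        if (!std::isfinite(x)) { s.finite = false; continue; }
        const double a = std::fabs(static_cast<double>(x));
        sum_sq += a * a;
        s.peak = std::max(s.peak, a);
        if (a >= clip_level) ++clipped;
    }
    s.rms          = std::sqrt(sum_sq / static_cast<double>(pcm.size()));
    s.clipped_frac = static_cast<double>(clipped) / static_cast<double>(pcm.size());
    return s;
}

// Thresholds are illustrative assumptions, not the values the harness uses.
bool looks_like_speech(const AudioStats & s) {
    return s.finite              // no NaN/Inf from a broken kernel
        && s.rms > 1e-3          // not silence
        && s.peak <= 1.0         // within nominal range
        && s.clipped_frac < 0.01;  // not saturated
}
```

A check of this shape catches the failure modes an op-level test cannot see end to end: an all-zero buffer fails on RMS, a saturated buffer fails on the clipped fraction, and NaN output fails on finiteness.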

tts-cpp/test/test_multilingual_synth.cpp

Lines changed: 10 additions & 0 deletions
@@ -170,6 +170,7 @@ void usage(const char * prog) {
 struct Args {
     std::string t3_path, s3gen_path, lang, text, out_path;
     std::string mecab_dict, cangjie_tsv;
+    std::string kv_cache_type; // "" -> CLI default (f32); "q8_0"/"f16" exercise the quantized KV path
     int seed = 42;
     int n_gpu_layers = 99;
     bool verbose = false;
@@ -189,6 +190,7 @@ bool parse_args(int argc, char ** argv, Args & args) {
         else if (a == "--out") { auto v = next("--out"); if (!v) return false; args.out_path = v; }
         else if (a == "--seed") { auto v = next("--seed"); if (!v) return false; args.seed = std::atoi(v); }
         else if (a == "--n-gpu-layers") { auto v = next("--n-gpu-layers"); if (!v) return false; args.n_gpu_layers = std::atoi(v); }
+        else if (a == "--kv-cache-type") { auto v = next("--kv-cache-type"); if (!v) return false; args.kv_cache_type = v; }
         else if (a == "--verbose" || a == "-v") { args.verbose = true; }
         else if (a == "--mecab-dict") { auto v = next("--mecab-dict"); if (!v) return false; args.mecab_dict = v; }
         else if (a == "--cangjie-tsv") { auto v = next("--cangjie-tsv"); if (!v) return false; args.cangjie_tsv = v; }
@@ -215,6 +217,10 @@ void resolve_env_fallbacks(Args & args) {
         const char * env = std::getenv("CHATTERBOX_CANGJIE_TSV");
         if (env && *env) args.cangjie_tsv = env;
     }
+    if (args.kv_cache_type.empty()) {
+        const char * env = std::getenv("CHATTERBOX_KV_CACHE_TYPE");
+        if (env && *env) args.kv_cache_type = env;
+    }
 }
 
 int check_language_registry(const std::string & lang) {
@@ -243,6 +249,10 @@ int run_synthesis(const Args & args) {
         "--n-gpu-layers", std::to_string(args.n_gpu_layers),
     };
     if (args.verbose) cli_args.push_back("--verbose");
+    if (!args.kv_cache_type.empty()) {
+        cli_args.push_back("--kv-cache-type");
+        cli_args.push_back(args.kv_cache_type);
+    }
     if (!args.mecab_dict.empty()) {
         cli_args.push_back("--mecab-dict");
         cli_args.push_back(args.mecab_dict);
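
Taken together, the three hunks above give the option a simple precedence: an explicit `--kv-cache-type` wins, a non-empty `CHATTERBOX_KV_CACHE_TYPE` is used otherwise, and when neither is set nothing is forwarded so the CLI keeps its own default. Below is a standalone sketch of that precedence, written with the environment value passed in as a parameter so it can be exercised without touching the process environment. The function names are hypothetical and do not exist in the test file.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Explicit CLI value first, then a non-empty environment value, else "".
// An empty result means "do not pass the flag at all".
std::string resolve_kv_cache_type(const std::string & cli_value, const char * env_value) {
    if (!cli_value.empty()) return cli_value;
    if (env_value && *env_value) return env_value;
    return "";
}

// Convenience wrapper that reads the same variable the commit introduces.
std::string kv_cache_type_from_process(const std::string & cli_value) {
    return resolve_kv_cache_type(cli_value, std::getenv("CHATTERBOX_KV_CACHE_TYPE"));
}

// Appends the flag and its value only when a type was resolved.
void append_kv_cache_args(std::vector<std::string> & cli_args, const std::string & kv_cache_type) {
    if (kv_cache_type.empty()) return;
    cli_args.push_back("--kv-cache-type");
    cli_args.push_back(kv_cache_type);
}
```

Treating an empty environment variable the same as an unset one mirrors the `env && *env` guard in the diff, so exporting `CHATTERBOX_KV_CACHE_TYPE=` does not forward an empty argument to the CLI.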
