Commit ce9ee96
QVAC-21483 feat: add output-frequency selection (output_sample_rate) (#69)
* QVAC-21483 feat: add output-frequency selection (output_sample_rate)
Add an output_sample_rate knob to the Chatterbox and Supertonic engines
and the tts-cli / supertonic-cli CLIs so callers can request an output
frequency other than the engine's native rate (24 kHz Chatterbox / the
model metadata rate for Supertonic). The final PCM is resampled with the
existing Kaiser-windowed sinc resampler and SynthesisResult::sample_rate
reports the actual rate. 0 keeps the native rate (default; no behaviour
change); explicit rates are validated to [8000, 192000] Hz to match the
@qvac/tts-ggml addon's JS-side window.
- voice_features: validate_output_sample_rate + resample_for_output helpers
- chatterbox::Engine: batch resamples the whole utterance once, streaming
  resamples per chunk (mel-hop bookkeeping stays native, so the documented
  result.pcm == concat(chunks) invariant still holds)
- supertonic::Engine: resample in run_single_chunk (covers batch + streaming)
- CLIs: --output-sample-rate HZ on tts-cli (chatterbox egress + supertonic
  path) and supertonic-cli
- tests: output-rate unit checks in test_resample + a new MTL-gated
  test_output_sample_rate (batch / streaming / out-of-range validation)
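
The rate rule described above (0 keeps the native rate, anything else must fall in [8000, 192000] Hz) can be sketched as follows. This is a minimal illustration, not the helper from voice_features: the function name resolve_output_sample_rate, its signature, and the constant types are assumptions; only the bounds and the meaning of 0 come from the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Bounds taken from the commit message. The constant names echo the
// kOutputSampleRateMin/Max mentioned in the review follow-up; their
// types here are assumed.
constexpr std::uint32_t kOutputSampleRateMin = 8000;
constexpr std::uint32_t kOutputSampleRateMax = 192000;

// Hypothetical helper: returns the rate the engine should emit at.
// requested == 0 means "keep the engine's native rate".
inline std::uint32_t resolve_output_sample_rate(std::uint32_t requested,
                                                std::uint32_t native_rate) {
    if (requested == 0) {
        return native_rate;  // default: no behaviour change
    }
    if (requested < kOutputSampleRateMin || requested > kOutputSampleRateMax) {
        throw std::invalid_argument(
            "output_sample_rate " + std::to_string(requested) +
            " is outside [8000, 192000] Hz");
    }
    return requested;
}
```

Rejecting an out-of-range rate by throwing lets a CLI turn the failure into a clean error message at parse time, while an engine constructor can let the same exception propagate.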
* QVAC-21483 fix: make streaming output-rate conversion seam-free (batch-exact)
Code-review follow-up to the output_sample_rate feature. Streaming
resampled each chunk independently with the stateless whole-buffer
resample_sinc, which restarts the output grid and truncates the sinc
window at every chunk edge. That is only artifact-free when each chunk
length is an exact multiple of the resample ratio's denominator -- which
the pipeline does not guarantee (the trim-faded first chunk, the
finalized last chunk, or rates such as 11025 Hz break it). When violated
it injects a discontinuity at every seam (streamed-vs-batch SNR collapses
below 10 dB) plus a per-chunk length drift.
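
To make the alignment condition concrete, the sketch below reduces the output/input rate ratio to lowest terms and checks whether a chunk length is a multiple of the denominator. It covers only the output-grid part of the problem (not the truncated sinc window), requires C++17 for std::gcd, and uses hypothetical names (Ratio, reduced_ratio, chunk_is_aligned) that do not appear in the codebase.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>

// out_rate / in_rate in lowest terms: `up` output samples per `down` input samples.
struct Ratio {
    std::uint32_t up;
    std::uint32_t down;
};

inline Ratio reduced_ratio(std::uint32_t in_rate, std::uint32_t out_rate) {
    const std::uint32_t g = std::gcd(in_rate, out_rate);
    return {out_rate / g, in_rate / g};
}

// A chunk of n input samples maps to a whole number of output samples
// only when n is a multiple of the reduced denominator.
inline bool chunk_is_aligned(std::uint64_t n, Ratio r) {
    return n % r.down == 0;
}
```

For 24000 Hz to 16000 Hz the reduced ratio is 2/3, so any chunk length divisible by 3 stays on the grid. For 24000 Hz to 11025 Hz it is 147/320, so a chunk must be a multiple of 320 samples, which a trimmed first chunk or a finalized last chunk will rarely be.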
Add a stateful OutputResampler that buffers native PCM and emits each
output sample only once its sinc window is fully covered, flushing the
tail on finish(). A stable sample's window is identical whether computed
on the prefix or the full buffer, so concatenating every process() result
+ finish() is bit-for-bit identical to resampling the whole utterance
once: the streamed output now equals the batch resample, with no seams
and no drift, while result.pcm == concat(chunks) still holds.
- voice_features: OutputResampler (passthrough when target is 0/native)
- chatterbox::Engine streaming: one utterance-spanning resampler
- tts-cli streaming->stdout: same resampler, with inter-segment silence
  fed through it so it lands in order; fix the stale "@ 24 kHz" banner
- engine.h: correct the "resampled independently / inaudible" comment
- test_resample: model-free checks that streamed == batch (bit-exact) for
  misaligned / trim-faded / hostile-rate (11025) chunkings + passthrough
Verified: test-resample (24 checks) passes; the MTL-gated
test-output-sample-rate passes on real GGUFs; tts-cli streaming to a file
vs to stdout at 16 kHz now produce byte-identical PCM.
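
The buffering idea above can be illustrated with a small self-contained sketch. It is not the project's OutputResampler: the class name StreamingResampler, the 8-tap half-width, and the Hann window (swapped in for the Kaiser window to avoid a Bessel function) are all assumptions, the kernel is not tuned for audio quality, and the buffer is never trimmed. What it does show is the property the commit relies on: an output sample is emitted only once every tap it needs is buffered, so the same function sees the same inputs as in a whole-buffer pass.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch {

constexpr int kHalfWidth = 8;  // taps on each side of the output instant (assumed)
constexpr double kPi = 3.14159265358979323846;

// Number of output instants k with k * in_rate / out_rate < n_in.
inline std::uint64_t out_count(std::uint64_t n_in, std::uint32_t in_rate,
                               std::uint32_t out_rate) {
    return (n_in * out_rate + in_rate - 1) / in_rate;
}

// Output sample k from the input in `in`; taps outside the buffer count as zero.
inline float sample_at(const std::vector<float>& in, std::uint64_t k,
                       std::uint32_t in_rate, std::uint32_t out_rate) {
    const std::uint64_t num = k * in_rate;
    const std::int64_t i0 = static_cast<std::int64_t>(num / out_rate);
    const double frac = static_cast<double>(num % out_rate) / out_rate;
    const double fc = std::min(1.0, static_cast<double>(out_rate) / in_rate);
    double acc = 0.0;
    for (int j = -kHalfWidth + 1; j <= kHalfWidth; ++j) {
        const std::int64_t idx = i0 + j;
        if (idx < 0 || idx >= static_cast<std::int64_t>(in.size())) continue;
        const double x = j - frac;  // offset from the output instant, in input samples
        const double s = (x == 0.0) ? 1.0 : std::sin(kPi * fc * x) / (kPi * fc * x);
        const double w = 0.5 * (1.0 + std::cos(kPi * x / kHalfWidth));  // Hann window
        acc += in[static_cast<std::size_t>(idx)] * fc * s * w;
    }
    return static_cast<float>(acc);
}

// Stateless whole-buffer resample (the "batch" path).
inline std::vector<float> resample_whole(const std::vector<float>& in,
                                         std::uint32_t in_rate, std::uint32_t out_rate) {
    std::vector<float> out(out_count(in.size(), in_rate, out_rate));
    for (std::uint64_t k = 0; k < out.size(); ++k) {
        out[k] = sample_at(in, k, in_rate, out_rate);
    }
    return out;
}

// Stateful path: buffer input, emit only samples whose window is fully covered.
class StreamingResampler {
public:
    StreamingResampler(std::uint32_t in_rate, std::uint32_t out_rate)
        : in_rate_(in_rate), out_rate_(out_rate) {}

    std::vector<float> process(const std::vector<float>& chunk) {
        buf_.insert(buf_.end(), chunk.begin(), chunk.end());
        return drain(false);
    }

    // Flush the tail: remaining samples see zeros past the end, as in batch.
    std::vector<float> finish() { return drain(true); }

private:
    std::vector<float> drain(bool final_call) {
        std::vector<float> out;
        const std::uint64_t total = out_count(buf_.size(), in_rate_, out_rate_);
        for (;;) {
            const std::uint64_t last_tap = next_ * in_rate_ / out_rate_ + kHalfWidth;
            const bool stable = last_tap < buf_.size();
            if (final_call ? next_ >= total : !stable) break;
            out.push_back(sample_at(buf_, next_, in_rate_, out_rate_));
            ++next_;
        }
        return out;
    }

    std::uint32_t in_rate_;
    std::uint32_t out_rate_;
    std::vector<float> buf_;   // all native PCM seen so far (never trimmed here)
    std::uint64_t next_ = 0;   // index of the next output sample to emit
};

}  // namespace sketch
```

Feeding the same PCM through process() in arbitrary chunk sizes and then calling finish() should yield the same samples as resample_whole(), because each emitted sample is computed by the same function over the same taps. A production version would also discard input that no future window can reach.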
* QVAC-21483 fix: address review — Supertonic streaming output-rate batch-exact + engine coverage
Resolves GustavoA1604's review on PR #69:
1. supertonic_engine.cpp: run_single_chunk now returns NATIVE PCM. Batch
   synthesize() resamples the whole utterance once at the end; streaming
   synthesize_streaming() drives a single utterance-spanning OutputResampler
   (seam-fade moved to the native domain). The previous per-chunk resampling
   restarted the output grid at every chunk edge and injected seams + length
   drift at non-native rates. The new flow mirrors chatterbox::Engine, so
   streamed output is bit-identical to the batch resample.
2. supertonic_cli.cpp: validate --output-sample-rate at parse time against
   kOutputSampleRateMin/Max (throws -> clean "error:" + exit 2), matching
   chatterbox_cli / tts-cli instead of aborting later at the engine ctor.
3. test_output_sample_rate.cpp: assert streamed-16k == whole-buffer resample
   of streamed-native (the OutputResampler property). This form isolates
   resampling and would catch a per-chunk regression.
4. test_output_sample_rate_supertonic.cpp (new) + CMake: sibling Supertonic
   engine coverage: native rate, 16 kHz batch ratio, construction rejection,
   streaming result.pcm == concat, and the streaming==batch-resample invariant.
---------
Co-authored-by: Zbigniew Herman <212399199+Zbig9000@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: GustavoA1604 <54457676+GustavoA1604@users.noreply.github.com>
1 parent 28f37ea commit ce9ee96
13 files changed
Lines changed: 903 additions & 35 deletions