Commit e34d1a3
fix(kokoro): voice loading, method selection, padding, and audio safety fixes (#943)
Fixes for Kokoro TTS native code. Addresses voice data truncation,
missing Synthesizer method selection, progressive speed-up on longer
inputs, phoneme token reordering, and several additional safety fixes.
### Voice loading reads only 128 of 510 rows
`voice_` was a fixed `std::array<..., kMaxInputTokens>` (128 elements),
but `hexgrad/Kokoro-82M` voice files contain 510 rows. The remaining 382
rows were silently dropped.
Changed `voice_` to `std::vector`, sized dynamically from the file. Also
fixed an out-of-bounds index in `voiceID`: upstream used
`std::min(phonemes.size() - 1, noTokens)` where `noTokens` could equal 128,
indexing past the end of a 128-element array. Now uses a three-way
`std::min({phonemes.size() - 1, noTokens - 1, voice_.size() - 1})`.
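A minimal sketch of the three-way clamp, for illustration only. The helper name `voiceRowIndex` and its parameters are hypothetical and not taken from the Kokoro sources. The early return for zero-sized inputs is an added assumption, since `x - 1` on an unsigned zero would wrap around.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: picks the voice row for a chunk of phonemes.
// All three bounds are converted to "last valid index" before comparing,
// so the result can never exceed any of them.
std::size_t voiceRowIndex(std::size_t phonemeCount, std::size_t noTokens,
                          std::size_t voiceRows) {
  // Assumption: treat empty inputs as row 0 so that `x - 1` never wraps.
  // A caller would still need to reject an empty voice table separately.
  if (phonemeCount == 0 || noTokens == 0 || voiceRows == 0) {
    return 0;
  }
  return std::min({phonemeCount - 1, noTokens - 1, voiceRows - 1});
}
```

With 200 phonemes, `noTokens == 128`, and a 128-row table, this yields 127 rather than 128, which is the off-by-one the fix addresses.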
### Synthesizer doesn't do method selection
`DurationPredictor` discovers and selects from
`forward_8`/`forward_32`/`forward_64`/`forward_128` based on input size,
but `Synthesizer` only knew about `forward`. Added the same discovery
and selection logic. Falls back to `"forward"` if no `forward_N` methods
exist, so older models still work.
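The selection rule could look roughly like the sketch below. It assumes the smallest `forward_N` that fits the input is preferred, and that an input larger than every variant falls back to the largest one; the commit message only says the choice is "based on input size", so both rules are assumptions. `selectForwardMethod` and `availableSizes` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: `availableSizes` holds the N of every `forward_N`
// method discovered in the model, in any order.
std::string selectForwardMethod(std::vector<std::size_t> availableSizes,
                                std::size_t inputTokens) {
  // Older models export only "forward", so use it when nothing was found.
  if (availableSizes.empty()) {
    return "forward";
  }
  std::sort(availableSizes.begin(), availableSizes.end());
  // First variant whose capacity is >= the number of input tokens.
  auto it = std::lower_bound(availableSizes.begin(), availableSizes.end(),
                             inputTokens);
  // Assumption: if the input exceeds every variant, use the largest one.
  if (it == availableSizes.end()) {
    --it;
  }
  return "forward_" + std::to_string(*it);
}
```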
### Audio progressively speeds up on longer inputs
The Synthesizer's attention mechanism drifts on longer input sequences
(60+ tokens), causing later phonemes to be spoken progressively faster
than the Duration Predictor intended. The DP's timing predictions are
correct, but the Synthesizer compresses later phonemes into fewer
samples.
Fixed by capping `inputTokensLimit` to 60, which forces the Partitioner
to split text into shorter chunks that the Synthesizer can render
faithfully. Each chunk is roughly one sentence (~15-20 words).
### `tokenize()` scrambles phoneme order on invalid tokens
`std::partition` was used to filter out invalid (unrecognized) phoneme
tokens, but `partition` does not preserve relative order. When any
phonemes fall outside the vocabulary, the remaining valid tokens could
be reordered, producing garbled audio.
Changed to `std::stable_partition` which preserves relative order.
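To illustrate the ordering guarantee, here is a small sketch that filters out invalid tokens with `std::stable_partition`. The `Token` alias, the `kInvalidToken` marker, and `keepValidTokens` are hypothetical stand-ins, not the names used in `Utils.cpp`.

```cpp
#include <algorithm>
#include <vector>

using Token = int;
constexpr Token kInvalidToken = -1;  // hypothetical marker for unknown phonemes

// Moves valid tokens to the front without changing their relative order,
// then drops the invalid tail.
std::vector<Token> keepValidTokens(std::vector<Token> tokens) {
  auto firstInvalid = std::stable_partition(
      tokens.begin(), tokens.end(),
      [](Token t) { return t != kInvalidToken; });
  tokens.erase(firstInvalid, tokens.end());
  return tokens;
}
```

With plain `std::partition`, the surviving tokens would still all be valid, but their order would be unspecified.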
### `stripAudio` unsigned integer underflow
`lbound - margin` wraps `size_t` to ~2^64 when the audio's first
non-silent sample is near the start of the buffer (i.e., `lbound <
margin`). `std::max(huge_value, 0)` returns the huge value, and
`audio.subspan()` reads out-of-bounds. This is most likely in streaming
mode, where `paddingMs=15` (margin = 360 samples) is applied to short
chunks.
Fixed by guarding the subtraction: `lbound > margin ? lbound - margin :
0`. Also guarded `audio.size() - 1` against empty spans.
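The guards can be sketched as two small helpers. `lowerWithMargin` mirrors the expression quoted above; `upperWithMargin` is an assumed counterpart for the right-hand bound and the empty-buffer case, since the commit message does not show that code. Both names are hypothetical.

```cpp
#include <cstddef>

// Subtracts `margin` from `lbound`, saturating at 0 instead of wrapping
// around to a huge unsigned value.
std::size_t lowerWithMargin(std::size_t lbound, std::size_t margin) {
  return lbound > margin ? lbound - margin : 0;
}

// Assumed counterpart: extends `rbound` by `margin`, capped at the last
// valid index. Returns 0 for an empty buffer so `size - 1` never wraps.
std::size_t upperWithMargin(std::size_t rbound, std::size_t margin,
                            std::size_t size) {
  if (size == 0) {
    return 0;
  }
  const std::size_t last = size - 1;
  if (rbound > last || margin > last - rbound) {
    return last;
  }
  return rbound + margin;
}
```

For example, with the first non-silent sample at index 100 and a 360-sample margin, the lower bound becomes 0 rather than a value near 2^64.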
### `isStreaming_` data race
`isStreaming_` is a plain `bool` read by `stream()` on a background
thread and written by `streamStop()` from the JS thread. Unsynchronized
access to a non-atomic variable is a data race and therefore undefined
behavior: the compiler may optimize away the read, making `streamStop()`
ineffective.
Changed to `std::atomic<bool>`.
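A simplified sketch of the stop-flag pattern follows. The class `StreamSketch` and its `begin()` method are hypothetical; only the member name `isStreaming_` and the `stream()`/`streamStop()` roles come from the description above, and the loop body is a placeholder for real chunk synthesis.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>

class StreamSketch {
public:
  // Marks the stream as active; call before launching the worker.
  void begin() { isStreaming_.store(true); }

  // Worker loop: runs until streamStop() clears the flag. Because the flag
  // is atomic, each iteration performs a real load that other threads can
  // influence. Returns the number of (placeholder) chunks processed.
  std::size_t stream() {
    std::size_t chunks = 0;
    while (isStreaming_.load()) {
      ++chunks;  // placeholder for synthesizing one chunk
      std::this_thread::yield();
    }
    return chunks;
  }

  // Safe to call from another thread while stream() is running.
  void streamStop() { isStreaming_.store(false); }

  bool isStreaming() const { return isStreaming_.load(); }

private:
  std::atomic<bool> isStreaming_{false};
};
```

In use, one thread would call `begin()` and then run `stream()` on a worker, while another thread calls `streamStop()` to end the loop.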
### `scaleDurations` drops phonemes
When aggressively shrinking durations (many tokens, few total ticks),
individual token durations can be driven to 0 by the correction loop. A
zero-duration token is skipped by `repeatInterleave`, effectively
deleting that phoneme from the output.
Fixed by clamping each scaled duration to a minimum of 1, and guarding
the correction loop so it never drives a duration below 1. Without the
correction loop guard, the clamp is immediately undone — the priority
queue picks clamped entries (they have high remainders) and subtracts 1,
defeating the purpose.
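The interaction between the clamp and the correction loop can be sketched as below. This is not the code from `DurationPredictor.cpp`: instead of a remainder-based priority queue, this simplified version removes surplus ticks from the longest entry, which is a swapped-in technique chosen to keep the example short. The function name `scaleDurationsSketch` is hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Scales `durations` so they sum to roughly `targetTotal`, while keeping
// every entry at 1 or more so no phoneme disappears.
std::vector<int> scaleDurationsSketch(const std::vector<int>& durations,
                                      int targetTotal) {
  const long long sum =
      std::accumulate(durations.begin(), durations.end(), 0LL);
  // Degenerate input: give every token the 1-tick floor.
  if (durations.empty() || sum <= 0) {
    return std::vector<int>(durations.size(), 1);
  }
  const double factor =
      static_cast<double>(targetTotal) / static_cast<double>(sum);
  std::vector<int> scaled(durations.size());
  for (std::size_t i = 0; i < durations.size(); ++i) {
    // Clamp: rounding may produce 0, which would delete the phoneme.
    scaled[i] =
        std::max(1, static_cast<int>(std::lround(durations[i] * factor)));
  }
  long long excess =
      std::accumulate(scaled.begin(), scaled.end(), 0LL) - targetTotal;
  // Correction with guard: only shrink entries that are still above 1.
  while (excess > 0) {
    auto longest = std::max_element(scaled.begin(), scaled.end());
    if (*longest <= 1) {
      break;  // every token is at the floor; the floor wins over the total
    }
    --*longest;
    --excess;
  }
  // If rounding left a deficit, give the missing ticks to the longest entry.
  while (excess < 0) {
    ++*std::max_element(scaled.begin(), scaled.end());
    ++excess;
  }
  return scaled;
}
```

When the target is smaller than the number of tokens, the result sums to more than the target, because keeping each phoneme audible takes priority in this sketch.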
### Misc perf
- Replace temporary pause zero-vectors with `resize()` directly on the
output
- Move-capture audio in the streaming callback instead of copying
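
Both items can be illustrated with a short sketch. The function names `appendPause` and `makeAudioCallback` are hypothetical, and the callback body is a placeholder that only reports the buffer size.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Appends `pauseSamples` of silence by growing the output in place,
// rather than building a temporary zero vector and inserting it.
void appendPause(std::vector<float>& output, std::size_t pauseSamples) {
  output.resize(output.size() + pauseSamples, 0.0f);
}

// Takes ownership of `audio` and moves it into the lambda capture, so the
// sample buffer is not copied when the callback is created.
std::function<std::size_t()> makeAudioCallback(std::vector<float> audio) {
  return [audio = std::move(audio)]() { return audio.size(); };
}
```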
## Changes
- `Kokoro.h` — `voice_` from fixed array to vector, `isStreaming_` to
`std::atomic<bool>`
- `Kokoro.cpp` — `loadVoice()`, `synthesize()`, `generate()`,
`stream()`, constructor token limit cap
- `DurationPredictor.cpp` — `scaleDurations()` min-1 clamp with
correction loop guard
- `Synthesizer.h` — `forwardMethods_` member
- `Synthesizer.cpp` — method discovery and selection
- `Utils.cpp` — `stable_partition` in `tokenize()`, `stripAudio`
underflow guard

1 parent cc11c3e commit e34d1a3
6 files changed, +78 -58 lines changed, under packages/react-native-executorch/common/rnexecutorch/models/text_to_speech/kokoro