
Commit e66f1f8

fix(kokoro): stripAudio underflow, atomic streaming flag, duration floor
Three bugs found via adversarial audit:

- stripAudio: when lbound < margin, the unsigned subtraction wraps the size_t result to nearly 2^64, causing an out-of-bounds subspan. Guard the subtraction with a comparison first.
- isStreaming_: a plain bool read and written from two threads (the stream loop vs. streamStop called from JS), which is a data race. Changed to std::atomic<bool>.
- scaleDurations: aggressive shrinking can drive individual token durations to zero, dropping phonemes from repeatInterleave. Floor each duration at 1 after scaling.
1 parent e7b1c1b commit e66f1f8


3 files changed (+7, -5 lines)


packages/react-native-executorch/common/rnexecutorch/models/text_to_speech/kokoro/DurationPredictor.cpp

Lines changed: 3 additions & 2 deletions
@@ -174,8 +174,9 @@ void DurationPredictor::scaleDurations(Tensor &durations, size_t nTokens,
     float remainder =
         shrinking ? std::ceil(scaled) - scaled : scaled - std::floor(scaled);
 
-    durationsPtr[i] = static_cast<int64_t>(shrinking ? std::ceil(scaled)
-                                                     : std::floor(scaled));
+    durationsPtr[i] = std::max(1LL,
+        static_cast<int64_t>(shrinking ? std::ceil(scaled)
+                                       : std::floor(scaled)));
     scaledSum += durationsPtr[i];
 
     // Keeps the entries sorted by the remainders
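
The effect of a zero duration is easiest to see on a toy repeat-interleave. This is a minimal sketch, assuming durations are per-token frame counts held in a std::vector<std::int64_t>; repeatInterleave and floorDurations are hypothetical helpers, not the library's code. The sketch writes the clamp as std::max<std::int64_t>(1, d): std::max deduces one type from both arguments, so the committed std::max(1LL, static_cast<int64_t>(...)) only compiles where int64_t is long long, and fails deduction where int64_t is long (as on LP64 Linux, for example).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the repeat-interleave step: token i is emitted
// durations[i] times, so a duration of 0 silently removes that token.
std::vector<std::int64_t> repeatInterleave(
    const std::vector<std::int64_t> &tokens,
    const std::vector<std::int64_t> &durations) {
  assert(tokens.size() == durations.size());
  std::vector<std::int64_t> out;
  for (std::size_t i = 0; i < tokens.size(); ++i) {
    for (std::int64_t r = 0; r < durations[i]; ++r) {
      out.push_back(tokens[i]);
    }
  }
  return out;
}

// Clamp every scaled duration to at least one frame, mirroring the floor the
// commit adds. The explicit template argument makes both operands
// std::int64_t, whichever builtin type int64_t aliases on the target.
void floorDurations(std::vector<std::int64_t> &durations) {
  for (auto &d : durations) {
    d = std::max<std::int64_t>(1, d);
  }
}
```

For tokens {10, 20, 30} with scaled durations {2, 0, 1}, repeatInterleave yields {10, 10, 30}; after floorDurations the durations become {2, 1, 1} and the output is {10, 10, 20, 30}, so every phoneme survives.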

packages/react-native-executorch/common/rnexecutorch/models/text_to_speech/kokoro/Kokoro.h

Lines changed: 2 additions & 1 deletion
@@ -1,6 +1,7 @@
 #pragma once
 
 #include <array>
+#include <atomic>
 #include <memory>
 #include <optional>
 #include <string>
@@ -80,7 +81,7 @@ class Kokoro {
   std::vector<std::array<float, constants::kVoiceRefSize>> voice_;
 
   // Extra control variables
-  bool isStreaming_ = false;
+  std::atomic<bool> isStreaming_{false};
 };
 } // namespace models::text_to_speech::kokoro
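
Why the flag must be atomic: in C++, one thread writing a plain bool while another reads it without synchronization is a data race, which is undefined behavior; a compiler may, for instance, hoist the load out of the loop so the stop request is never observed. This is a minimal sketch of the pattern, assuming a single generation loop on a worker thread and a stop call from another thread; StreamingSketch, stream() and stop() are hypothetical stand-ins for Kokoro's stream loop and streamStop.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>

// Hypothetical stand-in for Kokoro's streaming state: one thread runs the
// loop, another clears the flag. std::atomic<bool> makes the concurrent
// load/store well-defined (default ordering: memory_order_seq_cst).
class StreamingSketch {
public:
  // Runs until stop() is observed; returns how many chunks were "generated".
  std::size_t stream() {
    isStreaming_.store(true);
    std::size_t chunks = 0;
    while (isStreaming_.load()) {
      ++chunks;  // placeholder for synthesizing and emitting one audio chunk
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return chunks;
  }

  // Safe to call from any thread.
  void stop() { isStreaming_.store(false); }

  bool isStreaming() const { return isStreaming_.load(); }

private:
  std::atomic<bool> isStreaming_{false};
};
```

Because stream() sets the flag on entry, a stop() issued before the loop starts would be overwritten in this sketch; a caller should wait until isStreaming() reports true before stopping, as the usage below does.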

packages/react-native-executorch/common/rnexecutorch/models/text_to_speech/kokoro/Utils.cpp

Lines changed: 2 additions & 2 deletions
@@ -55,8 +55,8 @@ std::span<const float> stripAudio(std::span<const float> audio, size_t margin) {
   auto lbound = findAudioBound<false>(audio);
   auto rbound = findAudioBound<true>(audio);
 
-  lbound = std::max(lbound - margin, size_t(0));
-  rbound = std::min(rbound + margin, audio.size() - 1);
+  lbound = lbound > margin ? lbound - margin : 0;
+  rbound = std::min(rbound + margin, audio.size() > 0 ? audio.size() - 1 : 0);
 
   return audio.subspan(lbound, rbound >= lbound ? rbound - lbound + 1 : 0);
 }
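
The underflow comes from size_t arithmetic wrapping modulo 2^N, so once lbound - margin has wrapped, std::max(..., size_t(0)) is a no-op and cannot bring the value back to 0. This is a minimal sketch of the guarded trim, assuming a std::vector<float> buffer in place of std::span (to stay within C++17) and a simple amplitude threshold in place of findAudioBound; findBound, trimRange, TrimRange and the 1e-3 threshold are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Unsigned subtraction wraps: 3 - 10 becomes SIZE_MAX - 6, not -7.
static_assert(std::size_t{3} - std::size_t{10} ==
                  std::numeric_limits<std::size_t>::max() - 6,
              "size_t subtraction wraps modulo 2^N");

// Offset and length of the kept region (the real code calls subspan).
struct TrimRange {
  std::size_t offset;
  std::size_t length;
};

// Hypothetical stand-in for findAudioBound: index of the first (or last)
// sample above `threshold`. Precondition: audio is non-empty.
std::size_t findBound(const std::vector<float> &audio, bool fromEnd,
                      float threshold = 1e-3f) {
  assert(!audio.empty());
  if (!fromEnd) {
    for (std::size_t i = 0; i < audio.size(); ++i)
      if (std::fabs(audio[i]) > threshold) return i;
    return 0;  // all silent: keep from the start
  }
  for (std::size_t i = audio.size(); i-- > 0;)
    if (std::fabs(audio[i]) > threshold) return i;
  return audio.size() - 1;  // all silent: keep to the end
}

TrimRange trimRange(const std::vector<float> &audio, std::size_t margin) {
  if (audio.empty()) return {0, 0};  // nothing to keep
  std::size_t lbound = findBound(audio, false);
  std::size_t rbound = findBound(audio, true);

  // Compare before subtracting so lbound - margin can never wrap.
  lbound = lbound > margin ? lbound - margin : 0;
  rbound = std::min(rbound + margin, audio.size() - 1);

  return {lbound, rbound >= lbound ? rbound - lbound + 1 : 0};
}
```

With samples {0, 0, 0, 0.5, 0.7, 0, 0, 0}, a margin of 1 keeps offset 2 and length 4, while a margin of 10 clamps to the whole buffer (offset 0, length 8) instead of wrapping. The early return for empty input is a choice made for this sketch: without it, clamping rbound to index 0 would report a length of 1 for a zero-length buffer.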
