8 changes: 4 additions & 4 deletions apps/speech/screens/SpeechToTextScreen.tsx
@@ -50,7 +50,7 @@ export const SpeechToTextScreen = ({ onBack }: { onBack: () => void }) => {
const [liveTranscribing, setLiveTranscribing] = useState(false);
const scrollViewRef = useRef<ScrollView>(null);

-const recorder = new AudioRecorder();
+const recorder = useRef(new AudioRecorder());

useEffect(() => {
AudioManager.setAudioSessionOptions({
@@ -115,7 +115,7 @@ export const SpeechToTextScreen = ({ onBack }: { onBack: () => void }) => {

const sampleRate = 16000;

-recorder.onAudioReady(
+recorder.current.onAudioReady(
{
sampleRate,
bufferLength: 0.1 * sampleRate,
@@ -131,7 +131,7 @@ export const SpeechToTextScreen = ({ onBack }: { onBack: () => void }) => {
if (!success) {
console.warn('Cannot start audio session correctly');
}
-const result = recorder.start();
+const result = recorder.current.start();
if (result.status === 'error') {
console.warn('Recording problems: ', result.message);
}
@@ -177,7 +177,7 @@ export const SpeechToTextScreen = ({ onBack }: { onBack: () => void }) => {
const handleStopTranscribeFromMicrophone = () => {
isRecordingRef.current = false;

-recorder.stop();
+recorder.current.stop();
model.streamStop();
console.log('Live transcription stopped');
setLiveTranscribing(false);
@@ -66,7 +66,7 @@ Since speech-to-text models can only process audio segments up to 30 seconds lon

`useSpeechToText` takes [`SpeechToTextProps`](../../06-api-reference/interfaces/SpeechToTextProps.md) that consists of:

-- `model` of type [`SpeechToTextConfig`](../../06-api-reference/interfaces/SpeechToTextModelConfig.md), containing the [`isMultilingual` flag](../../06-api-reference/interfaces/SpeechToTextModelConfig.md#ismultilingual), [tokenizer source](../../06-api-reference/interfaces/SpeechToTextModelConfig.md#tokenizersource), [encoder source](../../06-api-reference/interfaces/SpeechToTextModelConfig.md#encodersource), and [decoder source](../../06-api-reference/interfaces/SpeechToTextModelConfig.md#decodersource).
+- `model` of type [`SpeechToTextConfig`](../../06-api-reference/interfaces/SpeechToTextModelConfig.md), containing the [`isMultilingual` flag](../../06-api-reference/interfaces/SpeechToTextModelConfig.md#ismultilingual), [tokenizer source](../../06-api-reference/interfaces/SpeechToTextModelConfig.md#tokenizersource) and [model source](../../06-api-reference/interfaces/SpeechToTextModelConfig.md#modelsource).
- An optional flag [`preventLoad`](../../06-api-reference/interfaces/SpeechToTextProps.md#preventload) which prevents auto-loading of the model.

You need more details? Check the following resources:
@@ -45,9 +45,7 @@ Create an instance of [`SpeechToTextModule`](../../06-api-reference/classes/Spee
- [`model`](../../06-api-reference/classes/SpeechToTextModule.md#model) - Object containing:
- [`isMultilingual`](../../06-api-reference/interfaces/SpeechToTextModelConfig.md#ismultilingual) - Flag indicating if model is multilingual.

-- [`encoderSource`](../../06-api-reference/interfaces/SpeechToTextModelConfig.md#encodersource) - The location of the used encoder.
-
-- [`decoderSource`](../../06-api-reference/interfaces/SpeechToTextModelConfig.md#decodersource) - The location of the used decoder.
+- [`modelSource`](../../06-api-reference/interfaces/SpeechToTextModelConfig.md#modelsource) - The location of the used model (bundled encoder + decoder functionality).

- [`tokenizerSource`](../../06-api-reference/interfaces/SpeechToTextModelConfig.md#tokenizersource) - The location of the used tokenizer.

32 changes: 32 additions & 0 deletions docs/docs/06-api-reference/interfaces/TextToSpeechPhonemeInput.md
@@ -0,0 +1,32 @@
# Interface: TextToSpeechPhonemeInput

Defined in: [types/tts.ts:103](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/types/tts.ts#L103)

Text to Speech module input for pre-computed phonemes.
Use this when you have your own phonemizer (e.g. the Python `phonemizer`
library, espeak-ng, or any custom G2P system) and want to bypass the
built-in phonemizer pipeline.

## Extended by

- [`TextToSpeechStreamingPhonemeInput`](TextToSpeechStreamingPhonemeInput.md)

## Properties

### phonemes

> **phonemes**: `string`

Defined in: [types/tts.ts:104](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/types/tts.ts#L104)

pre-computed IPA phoneme string

---

### speed?

> `optional` **speed**: `number`

Defined in: [types/tts.ts:105](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/types/tts.ts#L105)

optional speed argument - the higher it is, the faster the speech becomes
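
For illustration only, here is a minimal sketch of how a caller might build a `TextToSpeechPhonemeInput` value. The interface is re-declared locally from the fields documented above so the snippet stands alone; `makePhonemeInput` is a hypothetical helper, not part of the library, and the IPA string is just sample data.

```typescript
// Local re-declaration of the documented shape (in an app it would be
// imported from the library instead).
interface TextToSpeechPhonemeInput {
  phonemes: string; // pre-computed IPA phoneme string
  speed?: number; // optional; the higher it is, the faster the speech
}

// Hypothetical helper (not a library API): rejects empty phoneme strings
// and leaves `speed` out entirely when the caller does not supply it.
function makePhonemeInput(
  phonemes: string,
  speed?: number
): TextToSpeechPhonemeInput {
  if (phonemes.trim().length === 0) {
    throw new Error("phonemes must be a non-empty IPA string");
  }
  return speed === undefined ? { phonemes } : { phonemes, speed };
}

// Sample input produced by some external phonemizer (assumed).
const phonemeInput = makePhonemeInput("həˈloʊ wɜːld", 1.2);
console.log(phonemeInput.phonemes, phonemeInput.speed);
```

Leaving `speed` unset, rather than writing a default, lets the module apply whatever default it uses internally.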
@@ -0,0 +1,58 @@
# Interface: TextToSpeechStreamingCallbacks

Defined in: [types/tts.ts:189](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/types/tts.ts#L189)

Shared streaming lifecycle callbacks for TTS streaming modes.

## Extended by

- [`TextToSpeechStreamingInput`](TextToSpeechStreamingInput.md)
- [`TextToSpeechStreamingPhonemeInput`](TextToSpeechStreamingPhonemeInput.md)

## Properties

### onBegin()?

> `optional` **onBegin**: () => `void` \| `Promise`\<`void`\>

Defined in: [types/tts.ts:190](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/types/tts.ts#L190)

Called when streaming begins

#### Returns

`void` \| `Promise`\<`void`\>

---

### onEnd()?

> `optional` **onEnd**: () => `void` \| `Promise`\<`void`\>

Defined in: [types/tts.ts:192](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/types/tts.ts#L192)

Called when streaming ends

#### Returns

`void` \| `Promise`\<`void`\>

---

### onNext()?

> `optional` **onNext**: (`audio`) => `void` \| `Promise`\<`void`\>

Defined in: [types/tts.ts:191](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/types/tts.ts#L191)

Called after each audio chunk gets calculated.

#### Parameters

##### audio

`Float32Array`

#### Returns

`void` \| `Promise`\<`void`\>
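
As a rough sketch of how these callbacks could be used, the snippet below collects every chunk passed to `onNext` and merges them into one buffer once streaming ends. The interface is re-declared locally from the signatures documented above; `createChunkCollector` is a hypothetical helper, and the direct callback calls at the end only simulate what a streaming engine would do.

```typescript
// Local re-declaration of the documented callback shape.
interface TextToSpeechStreamingCallbacks {
  onBegin?: () => void | Promise<void>;
  onNext?: (audio: Float32Array) => void | Promise<void>;
  onEnd?: () => void | Promise<void>;
}

// Hypothetical helper (not a library API): records lifecycle events and
// audio chunks, and can merge the chunks into a single Float32Array.
function createChunkCollector() {
  const chunks: Float32Array[] = [];
  const events: string[] = [];

  const callbacks: TextToSpeechStreamingCallbacks = {
    onBegin: () => {
      events.push("begin");
    },
    onNext: (audio) => {
      chunks.push(audio);
      events.push("next");
    },
    onEnd: () => {
      events.push("end");
    },
  };

  // Concatenate all collected chunks in arrival order.
  const merged = (): Float32Array => {
    const total = chunks.reduce((n, c) => n + c.length, 0);
    const out = new Float32Array(total);
    let offset = 0;
    for (const c of chunks) {
      out.set(c, offset);
      offset += c.length;
    }
    return out;
  };

  return { callbacks, events, merged };
}

// Simulated stream: a real engine would invoke these callbacks itself.
const collector = createChunkCollector();
collector.callbacks.onBegin?.();
collector.callbacks.onNext?.(new Float32Array([0.1, 0.2]));
collector.callbacks.onNext?.(new Float32Array([0.3]));
collector.callbacks.onEnd?.();
console.log(collector.events.join(","), collector.merged().length);
```

Because every callback is optional, the sketch invokes them with optional chaining, which is also how a consumer that only cares about `onNext` could be handled.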
@@ -0,0 +1,98 @@
# Interface: TextToSpeechStreamingPhonemeInput

Defined in: [types/tts.ts:214](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/types/tts.ts#L214)

Streaming input definition for pre-computed phonemes.
Same as `TextToSpeechStreamingInput` but accepts `phonemes` instead of `text`.

## Extends

- [`TextToSpeechPhonemeInput`](TextToSpeechPhonemeInput.md).[`TextToSpeechStreamingCallbacks`](TextToSpeechStreamingCallbacks.md)

## Properties

### onBegin()?

> `optional` **onBegin**: () => `void` \| `Promise`\<`void`\>

Defined in: [types/tts.ts:190](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/types/tts.ts#L190)

Called when streaming begins

#### Returns

`void` \| `Promise`\<`void`\>

#### Inherited from

[`TextToSpeechStreamingCallbacks`](TextToSpeechStreamingCallbacks.md).[`onBegin`](TextToSpeechStreamingCallbacks.md#onbegin)

---

### onEnd()?

> `optional` **onEnd**: () => `void` \| `Promise`\<`void`\>

Defined in: [types/tts.ts:192](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/types/tts.ts#L192)

Called when streaming ends

#### Returns

`void` \| `Promise`\<`void`\>

#### Inherited from

[`TextToSpeechStreamingCallbacks`](TextToSpeechStreamingCallbacks.md).[`onEnd`](TextToSpeechStreamingCallbacks.md#onend)

---

### onNext()?

> `optional` **onNext**: (`audio`) => `void` \| `Promise`\<`void`\>

Defined in: [types/tts.ts:191](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/types/tts.ts#L191)

Called after each audio chunk gets calculated.

#### Parameters

##### audio

`Float32Array`

#### Returns

`void` \| `Promise`\<`void`\>

#### Inherited from

[`TextToSpeechStreamingCallbacks`](TextToSpeechStreamingCallbacks.md).[`onNext`](TextToSpeechStreamingCallbacks.md#onnext)

---

### phonemes

> **phonemes**: `string`

Defined in: [types/tts.ts:104](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/types/tts.ts#L104)

pre-computed IPA phoneme string

#### Inherited from

[`TextToSpeechPhonemeInput`](TextToSpeechPhonemeInput.md).[`phonemes`](TextToSpeechPhonemeInput.md#phonemes)

---

### speed?

> `optional` **speed**: `number`

Defined in: [types/tts.ts:105](https://github.com/software-mansion/react-native-executorch/blob/main/packages/react-native-executorch/src/types/tts.ts#L105)

optional speed argument - the higher it is, the faster the speech becomes

#### Inherited from

[`TextToSpeechPhonemeInput`](TextToSpeechPhonemeInput.md).[`speed`](TextToSpeechPhonemeInput.md#speed)
@@ -17,11 +17,11 @@
#include <rnexecutorch/metaprogramming/TypeConcepts.h>
#include <rnexecutorch/models/object_detection/Types.h>
#include <rnexecutorch/models/ocr/Types.h>
-#include <rnexecutorch/models/speech_to_text/types/Segment.h>
-#include <rnexecutorch/models/speech_to_text/types/TranscriptionResult.h>
+#include <rnexecutorch/models/speech_to_text/common/types/Segment.h>
+#include <rnexecutorch/models/speech_to_text/common/types/TranscriptionResult.h>
#include <rnexecutorch/models/voice_activity_detection/Types.h>

-using namespace rnexecutorch::models::speech_to_text::types;
+using namespace rnexecutorch::models::speech_to_text;

namespace rnexecutorch::jsi_conversion {

@@ -513,7 +513,8 @@ inline jsi::Value getJsiValue(const Segment &seg, jsi::Runtime &runtime) {
jsi::Object wordObj(runtime);
wordObj.setProperty(
runtime, "word",
-jsi::String::createFromUtf8(runtime, seg.words[i].content));
+jsi::String::createFromUtf8(runtime, seg.words[i].content +
+seg.words[i].punctations));
wordObj.setProperty(runtime, "start",
static_cast<double>(seg.words[i].start));
wordObj.setProperty(runtime, "end", static_cast<double>(seg.words[i].end));
@@ -3,12 +3,12 @@
#include <string>
#include <vector>

#include "rnexecutorch/metaprogramming/ConstructorHelpers.h"
#include <ReactCommon/CallInvoker.h>
#include <executorch/extension/module/module.h>
#include <jsi/jsi.h>
#include <rnexecutorch/host_objects/JSTensorViewIn.h>
#include <rnexecutorch/host_objects/JSTensorViewOut.h>
#include <rnexecutorch/metaprogramming/ConstructorHelpers.h>

namespace rnexecutorch {
namespace models {