2 changes: 1 addition & 1 deletion README.md
@@ -258,7 +258,7 @@ The first startup may spend extra time downloading assets if `models/` does not

An Android ONNX Runtime smoke example is available under [`examples/android_onnx_runtime`](./examples/android_onnx_runtime).

The example loads the exported MOSS-TTS-Nano ONNX graphs and the MOSS-Audio-Tokenizer-Nano ONNX decoder on device, synthesizes short pre-tokenized demo prompts, and writes a WAV file from Android. It is intentionally minimal and keeps model files outside the APK for local testing.
The example loads the exported MOSS-TTS-Nano ONNX graphs and the MOSS-Audio-Tokenizer-Nano ONNX decoder on device, tokenizes custom text with a small Kotlin tokenizer, and writes a WAV file from Android. It is intentionally minimal and keeps model files outside the APK for local testing.

### Export TTS-only ONNX Weights

2 changes: 1 addition & 1 deletion README_zh.md
@@ -253,7 +253,7 @@ python app_onnx.py \

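An Android ONNX Runtime smoke example is located at [`examples/android_onnx_runtime`](./examples/android_onnx_runtime).

The example loads the exported MOSS-TTS-Nano ONNX graphs and the MOSS-Audio-Tokenizer-Nano ONNX decoder on the Android device, synthesizes short pre-tokenized demo prompts, and writes a WAV file. It is intentionally minimal and keeps model files outside the APK for local testing.
The example loads the exported MOSS-TTS-Nano ONNX graphs and the MOSS-Audio-Tokenizer-Nano ONNX decoder on the Android device, tokenizes custom text with a small Kotlin tokenizer, and writes a WAV file. It is intentionally minimal and keeps model files outside the APK for local testing.

### Export TTS-only ONNX Weights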

25 changes: 17 additions & 8 deletions examples/android_onnx_runtime/README.md
@@ -9,7 +9,7 @@ It intentionally stays minimal:
- no model files committed to git
- no app-specific business logic

The demo synthesizes two pre-tokenized prompts so the ONNX path can be tested without adding a large SentencePiece JNI dependency to the first Android example.
The demo includes a small pure Kotlin tokenizer for `tokenizer.model`, so you can synthesize custom text directly on Android without adding a SentencePiece JNI dependency.

## Model Files

@@ -58,7 +58,7 @@ adb push MOSS-Audio-Tokenizer-Nano-ONNX \

Open `examples/android_onnx_runtime` in Android Studio, connect a device, and run the `app` configuration.

Tap either demo button. The app writes a WAV file to its cache directory and prints the output path on screen.
Type custom text and tap `Generate custom text WAV`, or tap either pre-tokenized demo button. The app writes a WAV file to its cache directory and prints the output path on screen.

The sample uses:

@@ -69,19 +69,28 @@ The sample uses:

## Custom Text

For custom text input, tokenize with `tokenizer.model` using the same SentencePiece model used by the Python ONNX runtime, then pass the resulting token ids into `MossOnnxDemoEngine.synthesize`.
Custom text is handled by `SimpleSentencePieceTokenizer`, which reads the exported `tokenizer.model` and returns the text token ids used by `MossOnnxDemoEngine.synthesize`.

For a production Android app, add one of the following tokenizer paths:
You can also call the engine directly:

- a small SentencePiece JNI wrapper
- a pre-tokenization service or build step
- another Android-compatible SentencePiece implementation
```kotlin
MossOnnxDemoEngine(
    modelRoot = modelRoot,
    outputDir = cacheDir,
).use { engine ->
    engine.synthesizeText(
        text = "Hello world!",
        outputFile = File(cacheDir, "custom.wav"),
    )
}
```

The ONNX Runtime code is independent from the tokenizer as long as it receives the correct `IntArray` token ids.
The tokenizer intentionally implements only the inference-time pieces needed by the exported Nano `tokenizer.model`: Java NFKC-style normalization, whitespace escaping, Unigram segmentation, and BPE merge ranking. It does not interpret the full SentencePiece `precompiled_charsmap`, so compare its output against the Python tokenizer first if you replace the tokenizer model or rely on unusual normalization rules.
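
As a rough illustration of the pieces listed above, the following Kotlin sketch shows NFKC normalization with `java.text.Normalizer`, whitespace escaping with the `▁` (U+2581) marker, and a Viterbi search over Unigram piece scores. It is not the code of `SimpleSentencePieceTokenizer`: the function names, the dummy-prefix and whitespace-collapsing behavior, and the piece scores are assumptions made for the example, and BPE merge ranking and unknown-character fallback are left out.

```kotlin
import java.text.Normalizer

// U+2581 is the marker SentencePiece uses in place of spaces.
val SPACE_MARKER = '\u2581'

// NFKC-normalizes the text with the JDK, then escapes whitespace.
// Assumption: extra whitespace is collapsed and a leading marker is added,
// as in SentencePiece's default settings; a given tokenizer.model may differ.
fun normalizeAndEscape(text: String): String {
    val normalized = Normalizer.normalize(text, Normalizer.Form.NFKC)
    val collapsed = normalized.trim().replace(Regex("\\s+"), " ")
    return "$SPACE_MARKER${collapsed.replace(' ', SPACE_MARKER)}"
}

// Viterbi search for the highest-scoring segmentation under a Unigram model.
// `scores` maps each piece to its log probability (hypothetical values).
// Offsets are UTF-16 indices, which is enough for this illustration.
fun unigramSegment(
    input: String,
    scores: Map<String, Double>,
    maxPieceLength: Int = 16,
): List<String> {
    val n = input.length
    val best = DoubleArray(n + 1) { Double.NEGATIVE_INFINITY }
    val prev = IntArray(n + 1) { -1 }
    best[0] = 0.0
    for (end in 1..n) {
        for (start in maxOf(0, end - maxPieceLength) until end) {
            if (best[start] == Double.NEGATIVE_INFINITY) continue
            val score = scores[input.substring(start, end)] ?: continue
            if (best[start] + score > best[end]) {
                best[end] = best[start] + score
                prev[end] = start
            }
        }
    }
    require(n == 0 || prev[n] >= 0) { "no segmentation covers the input" }
    // Walk the back-pointers from the end to recover the pieces in order.
    val pieces = ArrayDeque<String>()
    var end = n
    while (end > 0) {
        val start = prev[end]
        pieces.addFirst(input.substring(start, end))
        end = start
    }
    return pieces.toList()
}
```

With a hypothetical score table that contains `▁he` and `llo`, `unigramSegment(normalizeAndEscape("hello"), scores)` selects those two pieces over single characters whenever their combined log probability is higher. Comparing such output against the Python tokenizer remains the reliable check.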

## Notes

- Start with `cpuThreads = 2` or `cpuThreads = 4`; device thermal behavior varies.
- The demo caps generation to `maxFrames = 160` for faster smoke testing.
- The decoded ONNX codec output is stereo; this example averages channels and writes a mono WAV for simplicity.
- Keep the model files outside the APK for local testing. Bundling them into app assets is possible but increases APK size substantially.
- Unit tests use a handcrafted tokenizer fixture by default. To compare against a real Nano tokenizer locally, run `MOSS_TOKENIZER_MODEL=/path/to/tokenizer.model ./gradlew :app:testDebugUnitTest --rerun-tasks`.
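
The channel averaging mentioned in the notes can be sketched as follows. This is a minimal illustration, not the engine's actual code: `downmixToMono` and `encodeMonoWav` are hypothetical helpers, and the interleaved float layout and 16-bit PCM output are assumptions.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Averages interleaved stereo samples [L0, R0, L1, R1, ...] into mono.
// Assumption: the codec output is interleaved float PCM in [-1, 1];
// the real engine may use a planar layout instead.
fun downmixToMono(interleavedStereo: FloatArray): FloatArray {
    require(interleavedStereo.size % 2 == 0) { "expected an even number of samples" }
    return FloatArray(interleavedStereo.size / 2) { i ->
        (interleavedStereo[2 * i] + interleavedStereo[2 * i + 1]) / 2f
    }
}

// Encodes mono float samples as 16-bit PCM WAV bytes (44-byte header plus data).
fun encodeMonoWav(samples: FloatArray, sampleRate: Int): ByteArray {
    val dataSize = samples.size * 2
    val buffer = ByteBuffer.allocate(44 + dataSize).order(ByteOrder.LITTLE_ENDIAN)
    buffer.put("RIFF".toByteArray(Charsets.US_ASCII))
    buffer.putInt(36 + dataSize)          // remaining file size after this field
    buffer.put("WAVE".toByteArray(Charsets.US_ASCII))
    buffer.put("fmt ".toByteArray(Charsets.US_ASCII))
    buffer.putInt(16)                     // fmt chunk size for PCM
    buffer.putShort(1.toShort())          // audio format 1 = PCM
    buffer.putShort(1.toShort())          // channel count
    buffer.putInt(sampleRate)
    buffer.putInt(sampleRate * 2)         // byte rate = rate * channels * 2 bytes
    buffer.putShort(2.toShort())          // block align
    buffer.putShort(16.toShort())         // bits per sample
    buffer.put("data".toByteArray(Charsets.US_ASCII))
    buffer.putInt(dataSize)
    for (sample in samples) {
        // Clamp before scaling so out-of-range values do not wrap around.
        val clamped = sample.coerceIn(-1f, 1f)
        buffer.putShort((clamped * 32767f).toInt().toShort())
    }
    return buffer.array()
}
```

Averaging halves the sum of the two channels, so it cannot clip, at the cost of discarding any stereo image the codec produced.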
8 changes: 8 additions & 0 deletions examples/android_onnx_runtime/app/build.gradle.kts
@@ -1,3 +1,5 @@
import org.gradle.api.tasks.testing.Test

plugins {
    id("com.android.application")
    id("org.jetbrains.kotlin.android")
@@ -27,4 +29,10 @@ android {

dependencies {
    implementation("com.microsoft.onnxruntime:onnxruntime-android:1.20.0")

    testImplementation("junit:junit:4.13.2")
}

tasks.withType<Test>().configureEach {
    inputs.property("MOSS_TOKENIZER_MODEL", System.getenv("MOSS_TOKENIZER_MODEL").orEmpty())
}
@@ -4,8 +4,10 @@ import android.app.Activity
import android.os.Bundle
import android.os.Handler
import android.os.Looper
import android.text.InputType
import android.view.ViewGroup
import android.widget.Button
import android.widget.EditText
import android.widget.LinearLayout
import android.widget.ScrollView
import android.widget.TextView
@@ -14,6 +16,8 @@ import java.io.File
class MainActivity : Activity() {
    private val mainHandler = Handler(Looper.getMainLooper())
    private lateinit var logView: TextView
    private lateinit var customTextInput: EditText
    private lateinit var generateCustomButton: Button
    private lateinit var generateEnglishButton: Button
    private lateinit var generateChineseButton: Button

@@ -25,6 +29,18 @@ class MainActivity : Activity() {
            textSize = 14f
            setTextIsSelectable(true)
        }
        customTextInput = EditText(this).apply {
            setText("Hello world!")
            hint = "Custom text"
            inputType = InputType.TYPE_CLASS_TEXT or InputType.TYPE_TEXT_FLAG_MULTI_LINE
            minLines = 2
        }
        generateCustomButton = Button(this).apply {
            text = "Generate custom text WAV"
            setOnClickListener {
                runCustomText(customTextInput.text.toString())
            }
        }
        generateEnglishButton = Button(this).apply {
            text = "Generate English demo WAV"
            setOnClickListener {
@@ -41,6 +57,8 @@ class MainActivity : Activity() {
        val content = LinearLayout(this).apply {
            orientation = LinearLayout.VERTICAL
            setPadding(32, 32, 32, 32)
            addView(customTextInput)
            addView(generateCustomButton)
            addView(generateEnglishButton)
            addView(generateChineseButton)
            addView(
@@ -53,7 +71,46 @@ class MainActivity : Activity() {
        }
        setContentView(ScrollView(this).apply { addView(content) })
        appendLog("Place model files under:\n${modelRoot().absolutePath}")
        appendLog("Tap a button to synthesize a short pre-tokenized demo prompt.")
        appendLog("Enter text or tap a pre-tokenized demo prompt.")
    }

    private fun runCustomText(text: String) {
        val trimmedText = text.trim()
        if (trimmedText.isEmpty()) {
            appendLog("[custom] text is empty")
            return
        }
        setButtonsEnabled(false)
        appendLog("\n[custom] starting synthesis: $trimmedText")
        Thread {
            try {
                val outputFile = File(cacheDir, "moss_tts_nano_android_custom.wav")
                MossOnnxDemoEngine(
                    modelRoot = modelRoot(),
                    outputDir = cacheDir,
                    cpuThreads = 2,
                ).use { engine ->
                    val result = engine.synthesizeText(
                        text = trimmedText,
                        outputFile = outputFile,
                        voice = "Junhao",
                        maxFrames = 160,
                        seed = 1234L,
                    )
                    appendLogFromWorker(
                        "[custom] done: ${result.outputFile.absolutePath}\n" +
                            "frames=${result.generatedFrames} " +
                            "sampleRate=${result.sampleRate}Hz " +
                            "durationMs=${result.durationMs} " +
                            "elapsedMs=${result.elapsedMs}",
                    )
                }
            } catch (error: Throwable) {
                appendLogFromWorker("[custom] failed: ${error.javaClass.simpleName}: ${error.message}")
            } finally {
                mainHandler.post { setButtonsEnabled(true) }
            }
        }.start()
    }

    private fun runDemo(label: String, textTokenIds: IntArray) {
@@ -95,6 +152,7 @@ class MainActivity : Activity() {
    }

    private fun setButtonsEnabled(enabled: Boolean) {
        generateCustomButton.isEnabled = enabled
        generateEnglishButton.isEnabled = enabled
        generateChineseButton.isEnabled = enabled
    }
@@ -30,6 +30,9 @@ class MossOnnxDemoEngine(
    private val codecMeta = CodecMeta.fromJson(readJson(codecMetaPath))
    private val ttsDir = ttsMetaPath.parentFile ?: manifestDir
    private val codecDir = codecMetaPath.parentFile ?: manifestDir
    private val textTokenizer by lazy {
        SimpleSentencePieceTokenizer.fromFile(File(ttsDir, "tokenizer.model"))
    }
    private val sessionOptions = OrtSession.SessionOptions().apply {
        setOptimizationLevel(OrtSession.SessionOptions.OptLevel.ALL_OPT)
        setIntraOpNumThreads(cpuThreads.coerceAtLeast(1))
@@ -65,6 +68,22 @@ class MossOnnxDemoEngine(
        )
    }

    fun synthesizeText(
        text: String,
        outputFile: File = File(outputDir, "moss_tts_nano_android_custom.wav"),
        voice: String = "Junhao",
        maxFrames: Int = 160,
        seed: Long = 1234L,
    ): SynthesisResult {
        return synthesize(
            textTokenIds = textTokenizer.encode(text),
            outputFile = outputFile,
            voice = voice,
            maxFrames = maxFrames,
            seed = seed,
        )
    }

    override fun close() {
        codecDecodeSession.close()
        localFixedFrameSession.close()