OpenMOSS · werran2 · Jun 12, 2026
diff --git a/README.md b/README.md
@@ -258,7 +258,7 @@ The first startup may spend extra time downloading assets if `models/` does not
 
 An Android ONNX Runtime smoke example is available under [`examples/android_onnx_runtime`](./examples/android_onnx_runtime).
 
-The example loads the exported MOSS-TTS-Nano ONNX graphs and the MOSS-Audio-Tokenizer-Nano ONNX decoder on device, synthesizes short pre-tokenized demo prompts, and writes a WAV file from Android. It is intentionally minimal and keeps model files outside the APK for local testing.
+The example loads the exported MOSS-TTS-Nano ONNX graphs and the MOSS-Audio-Tokenizer-Nano ONNX decoder on device, tokenizes custom text with a small Kotlin tokenizer, and writes a WAV file from Android. It is intentionally minimal and keeps model files outside the APK for local testing.
 
 ### Export TTS-only ONNX Weights
 

diff --git a/README_zh.md b/README_zh.md
@@ -253,7 +253,7 @@ python app_onnx.py \
 
 Android ONNX Runtime smoke 示例位于 [`examples/android_onnx_runtime`](./examples/android_onnx_runtime)。
 
-该示例会在 Android 设备端加载导出的 MOSS-TTS-Nano ONNX 图和 MOSS-Audio-Tokenizer-Nano ONNX 解码器，合成短的预分词 demo prompt，并写出 WAV 文件。示例刻意保持最小化，并将模型文件保留在 APK 外部，便于本地测试。
+该示例会在 Android 设备端加载导出的 MOSS-TTS-Nano ONNX 图和 MOSS-Audio-Tokenizer-Nano ONNX 解码器，通过小型 Kotlin tokenizer 对自定义文本分词，并写出 WAV 文件。示例刻意保持最小化，并将模型文件保留在 APK 外部，便于本地测试。
 
 ### 导出仅 TTS 的 ONNX 权重
 

diff --git a/examples/android_onnx_runtime/README.md b/examples/android_onnx_runtime/README.md
@@ -9,7 +9,7 @@ It intentionally stays minimal:
 - no model files committed to git
 - no app-specific business logic
 
-The demo synthesizes two pre-tokenized prompts so the ONNX path can be tested without adding a large SentencePiece JNI dependency to the first Android example.
+The demo includes a small pure Kotlin tokenizer for `tokenizer.model`, so you can synthesize custom text directly on Android without adding a SentencePiece JNI dependency.
 
 ## Model Files
 
@@ -58,7 +58,7 @@ adb push MOSS-Audio-Tokenizer-Nano-ONNX \
 
 Open `examples/android_onnx_runtime` in Android Studio, connect a device, and run the `app` configuration.
 
-Tap either demo button. The app writes a WAV file to its cache directory and prints the output path on screen.
+Type custom text and tap `Generate custom text WAV`, or tap either pre-tokenized demo button. The app writes a WAV file to its cache directory and prints the output path on screen.
 
 The sample uses:
 
@@ -69,19 +69,28 @@ The sample uses:
 
 ## Custom Text
 
-For custom text input, tokenize with `tokenizer.model` using the same SentencePiece model used by the Python ONNX runtime, then pass the resulting token ids into `MossOnnxDemoEngine.synthesize`.
+Custom text is handled by `SimpleSentencePieceTokenizer`, which reads the exported `tokenizer.model` and returns the text token ids used by `MossOnnxDemoEngine.synthesize`.
 
-For a production Android app, add one of the following tokenizer paths:
+You can also call the engine directly:
 
-- a small SentencePiece JNI wrapper
-- a pre-tokenization service or build step
-- another Android-compatible SentencePiece implementation
+```kotlin
+MossOnnxDemoEngine(
+    modelRoot = modelRoot,
+    outputDir = cacheDir,
+).use { engine ->
+    engine.synthesizeText(
+        text = "Hello world!",
+        outputFile = File(cacheDir, "custom.wav"),
+    )
+}
+```
 
-The ONNX Runtime code is independent from the tokenizer as long as it receives the correct `IntArray` token ids.
+The tokenizer intentionally implements only the inference-time pieces needed by the exported Nano `tokenizer.model`: Java NFKC-style normalization, whitespace escaping, Unigram segmentation, and BPE merge ranking. It does not interpret the full SentencePiece `precompiled_charsmap`, so compare its output against the Python tokenizer first if you replace the tokenizer model or rely on unusual normalization rules.
 
 ## Notes
 
 - Start with `cpuThreads = 2` or `cpuThreads = 4`; device thermal behavior varies.
 - The demo caps generation to `maxFrames = 160` for faster smoke testing.
 - The decoded ONNX codec output is stereo; this example averages channels and writes a mono WAV for simplicity.
 - Keep the model files outside the APK for local testing. Bundling them into app assets is possible but increases APK size substantially.
+- Unit tests use a handcrafted tokenizer fixture by default. To compare against a real Nano tokenizer locally, run `MOSS_TOKENIZER_MODEL=/path/to/tokenizer.model ./gradlew :app:testDebugUnitTest --rerun-tasks`.
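The whitespace escaping and Unigram segmentation steps named in the README hunk above can be illustrated with a small standalone sketch. This is not the code of `SimpleSentencePieceTokenizer`: the function names, the toy vocabulary, its scores, and the `maxPieceLength` parameter are all made up for illustration, and the leading `▁` assumes the tokenizer model uses SentencePiece's default dummy prefix. Normalization, unknown-piece fallback, and BPE merge ranking are left out.

```kotlin
// Illustrative sketch only; names, vocabulary, and scores are hypothetical.
const val WORD_BOUNDARY = '\u2581' // "▁", the SentencePiece whitespace marker

// Toy Unigram vocabulary: piece -> log-probability-like score (higher is better).
val toyScores: Map<String, Double> = mapOf(
    "\u2581hello" to -2.0,
    "\u2581world" to -2.5,
    "\u2581" to -1.0,
    "hello" to -3.0,
    "world" to -3.5,
    "!" to -1.5,
)

/** Replaces spaces with "▁" and prepends one "▁" (assumed dummy-prefix behavior). */
fun escapeWhitespace(text: String): String =
    "$WORD_BOUNDARY${text.replace(' ', WORD_BOUNDARY)}"

/**
 * Viterbi search for the highest-scoring split of [escaped] into vocabulary pieces.
 * Works on UTF-16 chars for brevity, so it does not treat surrogate pairs specially.
 */
fun segmentUnigram(
    escaped: String,
    scores: Map<String, Double>,
    maxPieceLength: Int = 8,
): List<String> {
    val n = escaped.length
    val best = DoubleArray(n + 1) { Double.NEGATIVE_INFINITY } // best score ending at i
    val backStart = IntArray(n + 1) { -1 }                      // start of the last piece
    best[0] = 0.0
    for (end in 1..n) {
        for (start in maxOf(0, end - maxPieceLength) until end) {
            if (best[start] == Double.NEGATIVE_INFINITY) continue
            val score = scores[escaped.substring(start, end)] ?: continue
            if (best[start] + score > best[end]) {
                best[end] = best[start] + score
                backStart[end] = start
            }
        }
    }
    // A real tokenizer would fall back to an unknown or byte piece instead of failing.
    require(n == 0 || backStart[n] >= 0) { "no segmentation found for input" }
    val pieces = ArrayList<String>()
    var end = n
    while (end > 0) {
        val start = backStart[end]
        pieces.add(escaped.substring(start, end))
        end = start
    }
    return pieces.reversed()
}

fun main() {
    val escaped = escapeWhitespace("hello world!")
    println(segmentUnigram(escaped, toyScores)) // prints [▁hello, ▁world, !]
}
```

With the toy scores, `▁hello` (-2.0) beats `▁` plus `hello` (-4.0), so the search prefers the longer piece. A real tokenizer would then map each piece to its id in `tokenizer.model`.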
diff --git a/examples/android_onnx_runtime/app/build.gradle.kts b/examples/android_onnx_runtime/app/build.gradle.kts
@@ -1,3 +1,5 @@
+import org.gradle.api.tasks.testing.Test
+
 plugins {
     id("com.android.application")
     id("org.jetbrains.kotlin.android")
@@ -27,4 +29,10 @@ android {
 
 dependencies {
     implementation("com.microsoft.onnxruntime:onnxruntime-android:1.20.0")
+
+    testImplementation("junit:junit:4.13.2")
+}
+
+tasks.withType<Test>().configureEach {
+    inputs.property("MOSS_TOKENIZER_MODEL", System.getenv("MOSS_TOKENIZER_MODEL").orEmpty())
 }
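Registering `MOSS_TOKENIZER_MODEL` as a task input means Gradle no longer treats the test task as up to date when the variable changes. On the test side, a helper along the following lines could choose between the handcrafted fixture and a real model. This is a sketch under assumptions: `resolveRealTokenizerModel` is a hypothetical name, and the PR's actual test code is not shown in this diff.

```kotlin
import java.io.File

/**
 * Hypothetical helper: returns the file named by MOSS_TOKENIZER_MODEL when it
 * points at an existing regular file, or null so the caller can fall back to
 * the handcrafted fixture. The environment is a parameter so it can be faked.
 */
fun resolveRealTokenizerModel(env: Map<String, String> = System.getenv()): File? {
    val path = env["MOSS_TOKENIZER_MODEL"].orEmpty().trim()
    if (path.isEmpty()) return null
    return File(path).takeIf { it.isFile }
}
```

Passing the environment as a map keeps the helper testable without mutating process-wide state.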
diff --git a/...s/android_onnx_runtime/app/src/main/java/com/openmoss/ttsnano/onnxruntime/MainActivity.kt b/...s/android_onnx_runtime/app/src/main/java/com/openmoss/ttsnano/onnxruntime/MainActivity.kt
@@ -4,8 +4,10 @@ import android.app.Activity
 import android.os.Bundle
 import android.os.Handler
 import android.os.Looper
+import android.text.InputType
 import android.view.ViewGroup
 import android.widget.Button
+import android.widget.EditText
 import android.widget.LinearLayout
 import android.widget.ScrollView
 import android.widget.TextView
@@ -14,6 +16,8 @@ import java.io.File
 class MainActivity : Activity() {
     private val mainHandler = Handler(Looper.getMainLooper())
     private lateinit var logView: TextView
+    private lateinit var customTextInput: EditText
+    private lateinit var generateCustomButton: Button
     private lateinit var generateEnglishButton: Button
     private lateinit var generateChineseButton: Button
 
@@ -25,6 +29,18 @@ class MainActivity : Activity() {
             textSize = 14f
             setTextIsSelectable(true)
         }
+        customTextInput = EditText(this).apply {
+            setText("Hello world!")
+            hint = "Custom text"
+            inputType = InputType.TYPE_CLASS_TEXT or InputType.TYPE_TEXT_FLAG_MULTI_LINE
+            minLines = 2
+        }
+        generateCustomButton = Button(this).apply {
+            text = "Generate custom text WAV"
+            setOnClickListener {
+                runCustomText(customTextInput.text.toString())
+            }
+        }
         generateEnglishButton = Button(this).apply {
             text = "Generate English demo WAV"
             setOnClickListener {
@@ -41,6 +57,8 @@ class MainActivity : Activity() {
         val content = LinearLayout(this).apply {
             orientation = LinearLayout.VERTICAL
             setPadding(32, 32, 32, 32)
+            addView(customTextInput)
+            addView(generateCustomButton)
             addView(generateEnglishButton)
             addView(generateChineseButton)
             addView(
@@ -53,7 +71,46 @@ class MainActivity : Activity() {
         }
         setContentView(ScrollView(this).apply { addView(content) })
         appendLog("Place model files under:\n${modelRoot().absolutePath}")
-        appendLog("Tap a button to synthesize a short pre-tokenized demo prompt.")
+        appendLog("Enter custom text, or tap a button to synthesize a pre-tokenized demo prompt.")
+    }
+
+    private fun runCustomText(text: String) {
+        val trimmedText = text.trim()
+        if (trimmedText.isEmpty()) {
+            appendLog("[custom] text is empty")
+            return
+        }
+        setButtonsEnabled(false)
+        appendLog("\n[custom] starting synthesis: $trimmedText")
+        Thread {
+            try {
+                val outputFile = File(cacheDir, "moss_tts_nano_android_custom.wav")
+                MossOnnxDemoEngine(
+                    modelRoot = modelRoot(),
+                    outputDir = cacheDir,
+                    cpuThreads = 2,
+                ).use { engine ->
+                    val result = engine.synthesizeText(
+                        text = trimmedText,
+                        outputFile = outputFile,
+                        voice = "Junhao",
+                        maxFrames = 160,
+                        seed = 1234L,
+                    )
+                    appendLogFromWorker(
+                        "[custom] done: ${result.outputFile.absolutePath}\n" +
+                            "frames=${result.generatedFrames} " +
+                            "sampleRate=${result.sampleRate}Hz " +
+                            "durationMs=${result.durationMs} " +
+                            "elapsedMs=${result.elapsedMs}",
+                    )
+                }
+            } catch (error: Throwable) {
+                appendLogFromWorker("[custom] failed: ${error.javaClass.simpleName}: ${error.message}")
+            } finally {
+                mainHandler.post { setButtonsEnabled(true) }
+            }
+        }.start()
     }
 
     private fun runDemo(label: String, textTokenIds: IntArray) {
@@ -95,6 +152,7 @@ class MainActivity : Activity() {
     }
 
     private fun setButtonsEnabled(enabled: Boolean) {
+        generateCustomButton.isEnabled = enabled
         generateEnglishButton.isEnabled = enabled
         generateChineseButton.isEnabled = enabled
     }

diff --git a/...oid_onnx_runtime/app/src/main/java/com/openmoss/ttsnano/onnxruntime/MossOnnxDemoEngine.kt b/...oid_onnx_runtime/app/src/main/java/com/openmoss/ttsnano/onnxruntime/MossOnnxDemoEngine.kt
@@ -30,6 +30,9 @@ class MossOnnxDemoEngine(
     private val codecMeta = CodecMeta.fromJson(readJson(codecMetaPath))
     private val ttsDir = ttsMetaPath.parentFile ?: manifestDir
     private val codecDir = codecMetaPath.parentFile ?: manifestDir
+    private val textTokenizer by lazy {
+        SimpleSentencePieceTokenizer.fromFile(File(ttsDir, "tokenizer.model"))
+    }
     private val sessionOptions = OrtSession.SessionOptions().apply {
         setOptimizationLevel(OrtSession.SessionOptions.OptLevel.ALL_OPT)
         setIntraOpNumThreads(cpuThreads.coerceAtLeast(1))
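@@ -65,6 +68,26 @@ class MossOnnxDemoEngine(
         )
     }
 
+    /**
+     * Tokenizes [text] with the lazily loaded `tokenizer.model` tokenizer and
+     * passes the resulting token ids to [synthesize].
+     */
+    fun synthesizeText(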
+        text: String,
+        outputFile: File = File(outputDir, "moss_tts_nano_android_custom.wav"),
+        voice: String = "Junhao",
+        maxFrames: Int = 160,
+        seed: Long = 1234L,
+    ): SynthesisResult {
+        return synthesize(
+            textTokenIds = textTokenizer.encode(text),
+            outputFile = outputFile,
+            voice = voice,
+            maxFrames = maxFrames,
+            seed = seed,
+        )
+    }
+
     override fun close() {
         codecDecodeSession.close()
         localFixedFrameSession.close()