Commit c29e23d

update readme
1 parent 2905d28 commit c29e23d

2 files changed

Lines changed: 12 additions & 14 deletions

File tree

assets/moss-tts-nano.png

1.18 MB

index.html

Lines changed: 12 additions & 14 deletions
@@ -234,17 +234,20 @@ <h3>MOSS-Audio-Tokenizer</h3>
 
 <h3>MOSS TTS Nano</h3>
 <p class="arch-copy">
-On top of the tokenizer, MOSS-TTS-Nano uses a single Transformer
-backbone with RVQ-aware delayed alignment to autoregressively
-predict text and audio tokens together. Each delayed step sums the
-embeddings from all RVQ layers, and the backbone output is sent
-directly to <strong>17 prediction heads</strong>: one text-or-pad
-head plus 16 audio heads.
+On top of the tokenizer, MOSS-TTS-Nano can adopt a hierarchical
+token modeling design built around a Local Transformer. Instead of
+using RVQ-aware temporal delays, the model sums the embeddings from
+all RVQ layers at each aligned time step and feeds that hidden
+state into a single Transformer backbone. The backbone then
+produces one global latent per step, which a lightweight
+autoregressive <strong>Local Transformer</strong> expands into the
+within-step token block, sequentially predicting one text-or-pad
+token and 16 RVQ audio tokens.
 </p>
 <div class="arch-chip-row">
-<span class="arch-chip">1 backbone</span>
-<span class="arch-chip">17 heads</span>
-<span class="arch-chip">simple decode path</span>
+<span class="arch-chip">100 M params</span>
+<span class="arch-chip">Local Transformer</span>
+<span class="arch-chip">Tiny, Fast and Powerful</span>
 </div>
 </div>

@@ -254,11 +257,6 @@ <h3>MOSS TTS Nano</h3>
 <section class="paper-section" id="demo">
 <h2>Demo</h2>
 
-<p class="section-note">
-Each card shows the prompt speech, the text to be spoken, and the
-generated output side-by-side.
-</p>
-
 <!-- Tab bar -->
 <div class="demo-tabs" role="tablist" aria-label="Language category">
 <button
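
Note on the architecture paragraph added in index.html: the data flow it describes (sum the RVQ-layer embeddings per aligned step, run one backbone to get a global latent per step, then let a local autoregressive decoder emit one text-or-pad token followed by 16 RVQ audio tokens) can be sketched in a few lines. This is only a toy illustration under stated assumptions, not the MOSS-TTS-Nano implementation: the hidden width, vocabulary size, random weights, the running-mean stand-in for the backbone, and the greedy stand-in for the Local Transformer are all hypothetical.

```python
import random

NUM_RVQ_LAYERS = 16                    # from the commit text: 16 RVQ audio tokens per step
TOKENS_PER_STEP = 1 + NUM_RVQ_LAYERS   # one text-or-pad token plus 16 audio tokens
HIDDEN = 8                             # hypothetical toy hidden width
VOCAB = 32                             # hypothetical toy vocabulary, shared by all positions

_rng = random.Random(0)                # fixed seed so the sketch is deterministic


def _matrix(rows, cols):
    """Random weight matrix standing in for trained parameters."""
    return [[_rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


RVQ_EMBED = [_matrix(VOCAB, HIDDEN) for _ in range(NUM_RVQ_LAYERS)]
LOCAL_EMBED = _matrix(VOCAB, HIDDEN)
LOCAL_HEADS = [_matrix(VOCAB, HIDDEN) for _ in range(TOKENS_PER_STEP)]


def step_input(codes):
    """Sum the embeddings of all RVQ layers for one aligned time step."""
    assert len(codes) == NUM_RVQ_LAYERS
    vec = [0.0] * HIDDEN
    for layer, code in enumerate(codes):
        for d in range(HIDDEN):
            vec[d] += RVQ_EMBED[layer][code][d]
    return vec


def backbone(inputs):
    """Placeholder for the Transformer backbone (assumption): a causal
    running mean that yields one global latent per time step."""
    latents, running = [], [0.0] * HIDDEN
    for t, x in enumerate(inputs, start=1):
        running = [r + v for r, v in zip(running, x)]
        latents.append([r / t for r in running])
    return latents


def local_decode(latent):
    """Placeholder for the Local Transformer (assumption): greedily emit the
    within-step block one token at a time, feeding each choice back in."""
    state = list(latent)
    block = []
    for pos in range(TOKENS_PER_STEP):
        logits = [sum(w * s for w, s in zip(row, state)) for row in LOCAL_HEADS[pos]]
        token = max(range(VOCAB), key=logits.__getitem__)
        block.append(token)
        # Condition the next position on the token just predicted.
        state = [s + e for s, e in zip(state, LOCAL_EMBED[token])]
    return block


# Three toy frames of RVQ codes, then one token block per step.
frames = [[_rng.randrange(VOCAB) for _ in range(NUM_RVQ_LAYERS)] for _ in range(3)]
latents = backbone([step_input(f) for f in frames])
blocks = [local_decode(z) for z in latents]
```

Each entry of `blocks` holds 17 token ids: position 0 plays the role of the text-or-pad token and positions 1 to 16 the RVQ audio tokens. Compared with the removed "17 prediction heads" wording, the point of the sketch is that the tokens within a step are produced sequentially and depend on one another, rather than being read off the backbone output in parallel.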
