@@ -91,7 +91,7 @@ <h1>MOSS-TTS-Nano</h1>
       <div class="meta-grid">
         <div class="meta-item">
           <p class="meta-label">Parameters</p>
-          <p class="meta-value">100M</p>
+          <p class="meta-value">~100M</p>
         </div>
         <div class="meta-item">
           <p class="meta-label">Audio Supported</p>
@@ -150,7 +150,7 @@ <h2>Key Features</h2>
       <div class="feature-item">
         <p class="feature-label">Audio Quality</p>
         <p class="feature-value">48 kHz Stereo</p>
-        <p class="feature-desc">Native 2-channel output at full 48 kHz sample rate.</p>
+        <p class="feature-desc">Native 2-channel input and output at full 48 kHz sample rate.</p>
       </div>
       <div class="feature-item">
         <p class="feature-label">Languages</p>
@@ -202,11 +202,11 @@ <h2>Architecture</h2>
       <div class="body-text">
         <h3>MOSS-Audio-Tokenizer</h3>
         <p class="arch-copy">
-          The tokenizer is a causal Transformer codec that compresses
+          The tokenizer is a causal Transformer audio codec that compresses
           <strong>48 kHz stereo</strong> audio into a <strong>12.5 fps</strong>
           RVQ token stream for scalable autoregressive modeling. In the
           report, the encoder and decoder each contain 12 causal
-          Transformer blocks with 10-second sliding-window attention, and
+          Transformer blocks with sliding-window attention, and
           the quantizer uses 16 RVQ layers so the token sequence remains
           compact enough for long-context generation.
         </p>
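The architecture paragraph in the last hunk gives two concrete rates: 12.5 codec frames per second and 16 RVQ layers per frame. A minimal sketch of the implied token budget follows; only those two figures come from the diff, while the helper name and the 60-second example are illustrative:

```python
# Back-of-envelope token budget for the tokenizer described above.
# Figures from the diff: 12.5 frames/s and 16 RVQ layers per frame.
# The function name and the 60 s example are illustrative, not from the repo.

FRAME_RATE_FPS = 12.5   # codec frames per second of audio
NUM_RVQ_LAYERS = 16     # residual quantizer depth per frame

def token_budget(duration_s: float) -> tuple[int, int]:
    """Return (frames, total RVQ tokens) for a clip of the given length."""
    frames = int(duration_s * FRAME_RATE_FPS)
    return frames, frames * NUM_RVQ_LAYERS

frames, tokens = token_budget(60.0)
print(frames, tokens)  # 750 frames -> 12000 tokens for one minute of audio
```

This is why the low frame rate matters: one minute of 48 kHz stereo audio becomes only 12,000 tokens, which keeps long clips within a practical autoregressive context window.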