5 changes: 5 additions & 0 deletions .github/workflows/tests.yml
@@ -35,6 +35,11 @@ jobs:
run: |
python -m pip install pytest
python -m pip install -e .
python -m pip install -r requirements-dev.txt

- name: Build documentation
run: |
mkdocs build --strict

- name: Run Python tests (TTS)
run: |
12 changes: 11 additions & 1 deletion README.md
@@ -217,7 +217,16 @@ mx.save_safetensors("./8bit/kokoro-v1_0.safetensors", weights, metadata={"format
- For the web interface and API:
- FastAPI
- Uvicorn


## Documentation

To build the documentation locally:

```bash
pip install -r requirements-dev.txt
mkdocs build --strict
```

## License

[MIT License](LICENSE)
@@ -227,3 +236,4 @@ mx.save_safetensors("./8bit/kokoro-v1_0.safetensors", weights, metadata={"format
- Thanks to the Apple MLX team for providing a great framework for building TTS and STS models.
- This project uses the Kokoro model architecture for text-to-speech synthesis.
- The 3D visualization uses Three.js for rendering.

8 changes: 8 additions & 0 deletions docs/api.md
@@ -0,0 +1,8 @@
# API Overview

MLX-Audio exposes several modules for generating speech and running a web server.

- `mlx_audio.tts.generate` – command-line entry point and Python functions for TTS generation.
- `mlx_audio.server` – launches the interactive web interface and REST API.

For full details, see the source code and docstrings.
13 changes: 13 additions & 0 deletions docs/index.md
@@ -0,0 +1,13 @@
# MLX-Audio Documentation

Welcome to **MLX-Audio**, a text-to-speech (TTS) and speech-to-speech (STS) library built on Apple's MLX framework.

## Features

- Fast inference on Apple Silicon
- Multiple language and voice options
- Adjustable speaking speed from 0.5x to 2.0x
- Interactive web interface with 3D visualization
- REST API for TTS generation
- Quantization support for optimized performance

33 changes: 33 additions & 0 deletions docs/usage.md
@@ -0,0 +1,33 @@
# Usage

## Installation

```bash
pip install mlx-audio

# For web interface and API dependencies
pip install -r requirements.txt
```

## Command Line Example

```bash
# Basic usage
mlx_audio.tts.generate --text "Hello, world"

# Specify prefix for output file
mlx_audio.tts.generate --text "Hello, world" --file_prefix hello

# Adjust speaking speed
mlx_audio.tts.generate --text "Hello, world" --speed 1.4
```

## Python Example

```python
from mlx_audio.tts.generate import generate_audio

text = "The MLX King lives. Let him cook!"
generate_audio(text=text)
```
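
The command-line flags shown above can also be assembled from Python before handing them to a process runner. The sketch below is illustrative only: `build_tts_command` is a hypothetical helper, not part of MLX-Audio, and it uses only the `--text`, `--speed`, and `--file_prefix` flags that appear in the examples above. The range check mirrors the documented 0.5x to 2.0x speed range; whether the CLI itself enforces that range is an assumption not confirmed here.

```python
import shlex

# Speed range as documented in the feature list (0.5x to 2.0x).
MIN_SPEED, MAX_SPEED = 0.5, 2.0


def build_tts_command(text, speed=None, file_prefix=None):
    """Return an argv list for the mlx_audio.tts.generate CLI.

    Hypothetical helper for illustration; not part of MLX-Audio.
    """
    if not text:
        raise ValueError("text must be a non-empty string")
    cmd = ["mlx_audio.tts.generate", "--text", text]
    if speed is not None:
        # Reject speeds outside the documented range before running anything.
        if not MIN_SPEED <= speed <= MAX_SPEED:
            raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
        cmd += ["--speed", str(speed)]
    if file_prefix is not None:
        cmd += ["--file_prefix", file_prefix]
    return cmd


if __name__ == "__main__":
    cmd = build_tts_command("Hello, world", speed=1.4, file_prefix="hello")
    # Print a shell-safe form of the command; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems when the text contains spaces or punctuation.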

10 changes: 10 additions & 0 deletions mkdocs.yml
@@ -0,0 +1,10 @@
site_name: MLX-Audio
nav:
- Home: index.md
- Usage: usage.md
- API: api.md
markdown_extensions:
- toc
- tables
plugins:
- search
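
Because the workflow runs `mkdocs build --strict`, which turns warnings into build failures, a `nav` entry that points at a missing page is likely to break CI. A minimal pre-flight check might look like the sketch below. The helper name `missing_nav_pages` is hypothetical, the page list is copied by hand from the `nav` section above rather than parsed from `mkdocs.yml`, and the `docs` directory is assumed to be the MkDocs default.

```python
from pathlib import Path

# Nav entries copied by hand from the mkdocs.yml in this change.
NAV_PAGES = ["index.md", "usage.md", "api.md"]


def missing_nav_pages(docs_dir, pages=NAV_PAGES):
    """Return the nav pages that do not exist under docs_dir (hypothetical helper)."""
    root = Path(docs_dir)
    return [page for page in pages if not (root / page).is_file()]


if __name__ == "__main__":
    # "docs" is assumed to be the documentation directory (the MkDocs default).
    missing = missing_nav_pages("docs")
    if missing:
        raise SystemExit(f"Missing nav pages: {', '.join(missing)}")
    print("All nav pages found.")
```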
1 change: 0 additions & 1 deletion mlx_audio/tts/models/base.py
@@ -75,7 +75,6 @@ class GenerationResult:
sample_rate: int
segment_idx: int
token_count: int
audio_samples: int
audio_duration: str
real_time_factor: float
prompt: dict
3 changes: 3 additions & 0 deletions requirements-dev.txt
@@ -0,0 +1,3 @@
mkdocs>=1.6.0
mkdocs-material>=9.5.13

2 changes: 1 addition & 1 deletion requirements.txt
@@ -23,4 +23,4 @@ einops==0.8.1
einx==0.3.0
fastrtc[vad, stt]
webrtcvad>=2.0.10
dacite>=1.9.2
dacite>=1.9.2