**Objective:** Add native Apple Silicon (MLX) inference support to the RVC CLI.

## Accomplishments
### MLX Pipeline (`--backend mlx`) ✅ COMPLETE
1. **Core Components** in `rvc/lib/mlx/`:
   * `modules.py`: WaveNet
   * `attentions.py`: MultiHeadAttention, FFN
   * `residuals.py`: ResBlock, ResidualCouplingBlock
   * `generators.py`: HiFiGANNSFGenerator, SineGenerator
   * `encoders.py`: TextEncoder, PosteriorEncoder
   * `synthesizers.py`: Synthesizer (the main generator model)
   * `hubert.py`: Full HuBERT encoder
   * `rmvpe.py`: End-to-end pitch detection with DeepUnet

2. **Weight Converters** (each loads standard PyTorch weights, fuses `weight_norm` layers, and transposes tensors to MLX's channels-last layout):
   * `convert.py`: RVC Synthesizer weights
   * `convert_hubert.py`: HuBERT embedder weights
   * `convert_rmvpe.py`: RMVPE pitch predictor weights

3. **Custom Implementations** (MLX lacks native equivalents):
   * `BiGRU`: Bidirectional GRU wrapper
   * `ConvTranspose1d` / `ConvTranspose2d`: Zero-insertion followed by a regular convolution

4. **Performance**: ~2.97 s per inference on Apple Silicon (comparable to PyTorch MPS)
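
The custom transposed convolutions above follow the standard zero-insertion construction. A minimal single-channel numpy sketch of the idea (illustrative only; the actual MLX modules are multi-channel and live in `rvc/lib/mlx/`):

```python
import numpy as np

def conv_transpose1d(x, w, stride):
    """Transposed 1-D convolution built from zero-insertion + convolution.

    x: (L,) input signal, w: (K,) kernel, stride: upsampling factor.
    Inserting (stride - 1) zeros between input samples and then running a
    "full" convolution reproduces ConvTranspose1d (no padding, dilation 1):
    output length = (L - 1) * stride + K.
    """
    L = len(x)
    up = np.zeros((L - 1) * stride + 1)
    up[::stride] = x            # zero-insertion upsampling
    return np.convolve(up, w)   # full convolution with the kernel
```

For example, `x = [1, 2]`, `w = [1, 1, 1]`, `stride = 2` yields `[1, 1, 3, 2, 2]`, matching a single-channel PyTorch `ConvTranspose1d` with the same kernel.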

## Critical "Tidbits" for Future Sessions

### 1. Model Locations
The user's test models are located at:

> **`/Users/mcruz/Library/Application Support/Replay/com.replay.Replay/models`**

Verify that models are available here before running tests.

### 2. Environment Variables
* **`export OMP_NUM_THREADS=1`**: **MANDATORY** on macOS; without it, `faiss` crashes the process with a segmentation fault.

### 3. Runtime Environment
* **Conda Environment**: all commands must run inside the `rvc` Conda environment, e.g. `conda run -n rvc python rvc_cli.py ...` (or `source activate rvc` first).

### 4. Weight Conversion Commands
```bash
# Convert HuBERT weights (one-time)
python rvc/lib/mlx/convert_hubert.py

# Convert RMVPE weights (one-time)
python rvc/lib/mlx/convert_rmvpe.py
```
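
These converters also fold `weight_norm` parametrizations into plain tensors. A hedged numpy sketch of that fusion (assuming PyTorch's `weight_g`/`weight_v` split with the norm taken per output channel, i.e. the default `dim=0`):

```python
import numpy as np

def fuse_weight_norm(weight_v, weight_g):
    """Fold a weight_norm parametrization into a single weight tensor.

    weight_v: (Out, In, K) direction tensor; weight_g: (Out, 1, 1) magnitude.
    Returns w = g * v / ||v||, with the L2 norm computed per output channel
    (matching PyTorch weight_norm with dim=0).
    """
    norm = np.sqrt((weight_v ** 2).sum(axis=(1, 2), keepdims=True))
    return weight_g * weight_v / norm
```

After fusion the layer behaves identically but needs no runtime normalization, which is why the converters run it once at export time.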

### 5. Backend Selection
Select the backend with `--backend`, e.g. `python rvc_cli.py infer ... --backend mlx`.

| Backend | Description |
|---------|-------------|
| `torch` | Pure PyTorch with MPS (default) |
| `mlx` | Full MLX inference (HuBERT, RMVPE, Synthesizer) |

### 6. Implementation Details
* **Data Layout**: PyTorch is channels-first `(N, C, L)`; the MLX port is channels-last `(N, L, C)`. The converters handle the transposition.
* **Weight Transposition**:
   * Regular `Conv1d`: PyTorch `(Out, In, K)` -> MLX `(Out, K, In)`, i.e. transpose `(0, 2, 1)`.
   * `ConvTranspose1d`: PyTorch `(In, Out, K)` -> MLX `(Out, K, In)`, i.e. transpose `(1, 2, 0)`.
* **GRU Bias**: MLX's GRU carries a combined bias `b` of size `3*H` plus a separate `bhn` of size `H`; `bhn` is obtained by slicing the PyTorch `bias_hh` tensor.
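
These layout and bias rules are mechanical; a numpy sketch of the conversions (axis orders as in the notes above; the `bias_hh` slice assumes PyTorch's r, z, n gate ordering):

```python
import numpy as np

def convert_conv1d(w):
    """PyTorch Conv1d weight (Out, In, K) -> MLX layout (Out, K, In)."""
    return w.transpose(0, 2, 1)

def convert_conv_transpose1d(w):
    """PyTorch ConvTranspose1d weight (In, Out, K) -> MLX layout (Out, K, In)."""
    return w.transpose(1, 2, 0)

def slice_bhn(bias_hh, hidden_size):
    """Take the n-gate slice of PyTorch's bias_hh (gates ordered r, z, n)
    to obtain the separate `bhn` bias of size H."""
    return bias_hh[2 * hidden_size : 3 * hidden_size]
```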

## Next Steps
* **Numerical Validation**: Compare output quality between the `torch` and `mlx` backends.
* **Optimization**: Profile and optimize MLX kernels if needed.
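
For the numerical-validation step, a simple per-sample metric is enough to start; a sketch (the function name and threshold are illustrative, not part of the codebase):

```python
import numpy as np

def snr_db(reference, test):
    """Signal-to-noise ratio (dB) of `test` relative to `reference`.
    Useful for comparing waveforms produced by the torch and mlx backends;
    higher means closer (identical outputs -> infinity)."""
    noise_power = np.sum((reference - test) ** 2)
    if noise_power == 0:
        return np.inf
    return 10 * np.log10(np.sum(reference ** 2) / noise_power)
```

Backends rarely match bit-for-bit; as a rough rule of thumb, an SNR above ~40 dB suggests the differences are numerical noise rather than a conversion bug.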