Speech Separation

Introduction

A Mel-Band-Roformer vocal model. It performs slightly better than the equivalent model from the paper because it was trained on more data.

Usage

Quick Start

GPU version (recommended)

python run.py --root datasets/clean/zh --gpus 0 1 2 3

To run on CPU with multithreading instead, omit the --gpus option. Note that this runs very slowly:

python run.py --root datasets/clean/zh

Config description: inference parameters.

num_overlap - Increasing this value can improve output quality by reducing the artifacts created where chunks are stitched back together, at the cost of longer inference times. There is no need to go higher than 8.

chunk_size - The length, in samples, of the audio fed into the model at once. The default is 352800 (8 seconds), which is also the chunk length used to train the model.
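
To make the relationship between these two parameters concrete, here is a minimal sketch of how chunk positions could be derived. It is not taken from run.py: the function names are hypothetical, the 44100 Hz sample rate is inferred from 352800 samples being 8 seconds, and the step of chunk_size // num_overlap is an assumption about how overlapping chunks are laid out.

```python
CHUNK_SIZE = 352800   # default chunk length in samples (from the config)
SAMPLE_RATE = 44100   # assumption: inferred from 352800 samples == 8 seconds


def chunk_seconds(chunk_size=CHUNK_SIZE, sample_rate=SAMPLE_RATE):
    """Duration of one chunk in seconds."""
    return chunk_size / sample_rate


def chunk_starts(total_samples, chunk_size=CHUNK_SIZE, num_overlap=4):
    """Start offsets of the chunks covering the audio.

    Assumes consecutive chunks are shifted by chunk_size // num_overlap,
    so a larger num_overlap means more chunks and more model calls.
    """
    step = chunk_size // num_overlap
    if total_samples <= chunk_size:
        return [0]
    starts = list(range(0, total_samples - chunk_size + 1, step))
    # Add a final chunk aligned to the end if the tail is not covered yet.
    if starts[-1] + chunk_size < total_samples:
        starts.append(total_samples - chunk_size)
    return starts


if __name__ == "__main__":
    sixteen_seconds = 2 * CHUNK_SIZE
    print(chunk_seconds())
    for n in (1, 2, 4, 8):
        print(n, len(chunk_starts(sixteen_seconds, num_overlap=n)))
```

Under these assumptions, each sample in the middle of the audio is covered by roughly num_overlap chunks, which is why raising the value smooths the seams but multiplies the number of forward passes.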
