Contributing to SenseVoice

Thank you for your interest in contributing to SenseVoice! This guide will help you get started.

Getting Started

Prerequisites

Python 3.8 or higher
Git
(Optional) CUDA-compatible GPU for faster inference

Setting Up the Development Environment

Fork and clone the repository:

git clone https://github.com/<your-username>/SenseVoice.git
cd SenseVoice

Create a virtual environment:

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Verify the installation:

python -c "from funasr import AutoModel; print('Installation successful')"

Running on CPU

If you don't have a GPU, you can run SenseVoice on CPU by setting the device to "cpu":

model = AutoModel(
    model="iic/SenseVoiceSmall",
    trust_remote_code=True,
    device="cpu",
)

For the FastAPI server:

export SENSEVOICE_DEVICE=cpu
fastapi run --port 50000
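
If a script should work on both GPU and CPU machines, the device string can be chosen at runtime instead of hard-coded. The following is a minimal sketch, not code from the repository: `resolve_device` is a hypothetical helper, and it assumes that an explicit `SENSEVOICE_DEVICE` value should take priority and that `"cuda:0"` is an acceptable default when PyTorch reports a usable GPU.

```python
import importlib.util
import os
from typing import Mapping, Optional


def resolve_device(env: Optional[Mapping[str, str]] = None) -> str:
    """Return a device string suitable for AutoModel(device=...).

    Hypothetical helper: the priority order below is an assumption,
    not behaviour confirmed by the SenseVoice code base.
    """
    env = os.environ if env is None else env

    # An explicit SENSEVOICE_DEVICE setting always wins.
    explicit = env.get("SENSEVOICE_DEVICE", "").strip()
    if explicit:
        return explicit

    # Otherwise use the first GPU when PyTorch is installed and can see one.
    if importlib.util.find_spec("torch") is not None:
        import torch

        if torch.cuda.is_available():
            return "cuda:0"

    # Fall back to CPU in every other case.
    return "cpu"
```

The result can then be passed straight through, for example `AutoModel(model="iic/SenseVoiceSmall", trust_remote_code=True, device=resolve_device())`.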

How to Contribute

Reporting Bugs

Search existing issues to avoid duplicates.
Use the Bug Report template.
Include your environment details (OS, Python version, PyTorch version, GPU, CUDA version).
Provide a minimal code sample to reproduce the issue.
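
The environment details requested above can be gathered with a short script and pasted into the issue. This is a sketch under stated assumptions: `collect_environment` is a hypothetical helper that is not part of the repository, and it treats PyTorch as optional so that it still runs on a broken installation.

```python
import platform
import sys
from typing import Dict


def collect_environment() -> Dict[str, str]:
    """Collect OS, Python, PyTorch, CUDA and GPU details for a bug report."""
    info = {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "pytorch": "not installed",
        "cuda": "n/a",
        "gpu": "n/a",
    }

    # PyTorch may be missing or broken, which is itself useful to report.
    try:
        import torch
    except ImportError:
        return info

    info["pytorch"] = torch.__version__
    # torch.version.cuda is None on CPU-only builds.
    info["cuda"] = torch.version.cuda or "n/a (CPU-only build)"
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```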

Submitting Pull Requests

Fork the repository and create a new branch from main:

git checkout -b your-branch-name

Make your changes. Keep commits focused and atomic.
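Test your changes before pushing, for example by rerunning the demo scripts (demo1.py, demo2.py) that exercise the code you touched, to make sure nothing is broken.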
Push to your fork and open a Pull Request against FunAudioLLM/SenseVoice:main.
Describe your changes clearly in the PR description. Explain what changed and why.

Types of Contributions We Welcome

Bug fixes — Check open issues labeled bug for known problems.
Documentation improvements — Typo fixes, clarifications, additional examples, translations.
New examples — Demo scripts showing different use cases (emotion detection, event detection, multilingual transcription).
Performance improvements — Optimizations for inference speed or memory usage.
Test coverage — Unit tests and integration tests are very welcome.

Code Style

Follow existing code patterns and conventions in the repository.
Use type hints where applicable.
Add docstrings to new functions and classes.
Keep lines under 120 characters.
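
As an illustration of these guidelines, here is a small function with type hints, a docstring, and lines under 120 characters. It is a sketch only: `format_timestamp` is a hypothetical helper that does not exist in the repository, and the docstring layout (Args/Returns/Raises) is an assumption, so match whatever layout the surrounding file already uses.

```python
def format_timestamp(seconds: float) -> str:
    """Format a duration in seconds as ``HH:MM:SS.mmm``.

    Args:
        seconds: Non-negative duration in seconds.

    Returns:
        The duration as a zero-padded timestamp string.

    Raises:
        ValueError: If ``seconds`` is negative.
    """
    if seconds < 0:
        raise ValueError("seconds must be non-negative")

    # Work in whole milliseconds to avoid floating-point drift in the fields.
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"
```

For example, `format_timestamp(3661.5)` returns `"01:01:01.500"`.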

Project Structure

SenseVoice/
├── model.py          # Core SenseVoiceSmall model (encoder, CTC decoder, emotion/event embeddings)
├── api.py            # FastAPI server for inference
├── webui.py          # Gradio web interface
├── demo1.py          # Inference example using FunASR AutoModel
├── demo2.py          # Direct model inference with timestamp support
├── export.py         # ONNX model export
├── export_meta.py    # Export utilities for model rebuilding
├── finetune.sh       # Fine-tuning script with DeepSpeed support
├── requirements.txt  # Python dependencies
├── Dockerfile        # Docker build configuration
├── data/             # Example training and validation data
├── utils/            # Utilities (frontend, ONNX inference, CTC alignment, export)
└── image/            # Documentation images

Understanding the Model

SenseVoice is a non-autoregressive encoder-only model that outputs:

Speech transcription (ASR) across 50+ languages
Emotion labels: HAPPY, SAD, ANGRY, NEUTRAL, FEARFUL, DISGUSTED, SURPRISED
Audio event labels: BGM, Speech, Applause, Laughter, Cry, Sneeze, Breath, Cough
Language identification for Mandarin, English, Cantonese, Japanese, and Korean

The model uses a SANM (memory-equipped self-attention) encoder architecture with CTC decoding. Emotion and event labels are predicted from the first 4 encoder output tokens, while the remaining tokens produce the transcription.
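
The split between label positions and transcription positions can be sketched on toy data. This is an illustration, not the code in model.py: `split_and_decode` is a hypothetical function, the token ids are made up, and the sketch assumes the CTC blank has id 0 and that the label positions are read with a plain argmax. The transcription part uses standard greedy CTC decoding (collapse consecutive repeats, then drop blanks).

```python
from typing import List, Optional, Sequence, Tuple

NUM_LABEL_SLOTS = 4  # from the text above: the first 4 encoder outputs carry labels
BLANK_ID = 0         # assumption: the CTC blank token has id 0 in this toy setup


def argmax(row: Sequence[float]) -> int:
    """Return the index of the largest score in a row."""
    return max(range(len(row)), key=lambda i: row[i])


def split_and_decode(scores: List[Sequence[float]]) -> Tuple[List[int], List[int]]:
    """Split per-position scores into label ids and a greedy CTC transcription."""
    best = [argmax(row) for row in scores]
    label_ids = best[:NUM_LABEL_SLOTS]

    # Greedy CTC: keep a token only if it differs from the previous frame
    # and is not the blank symbol.
    text_ids: List[int] = []
    previous: Optional[int] = None
    for token_id in best[NUM_LABEL_SLOTS:]:
        if token_id != previous and token_id != BLANK_ID:
            text_ids.append(token_id)
        previous = token_id
    return label_ids, text_ids
```

Mapping the resulting ids back to label names and text depends on the model's real vocabulary, which this sketch does not attempt to reproduce.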

Docker

You can also run SenseVoice using Docker:

# Build
docker build -t sensevoice .

# Run with GPU
docker run --gpus all -p 50000:50000 sensevoice

# Run on CPU
docker run -e SENSEVOICE_DEVICE=cpu -p 50000:50000 sensevoice
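
After starting the container, it can be useful to confirm that something is listening on the published port before sending requests. The sketch below only checks that a TCP connection to port 50000 succeeds; `is_port_open` is a hypothetical helper, and the check says nothing about whether the model has finished loading or which HTTP routes the server exposes.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 50000, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    state = "reachable" if is_port_open() else "not reachable"
    print(f"Port 50000 on localhost is {state}")
```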

Questions?

Open an issue using the Questions template.
Join the community via the DingTalk group (see README).

License

By contributing to SenseVoice, you agree that your contributions will be licensed under the same license as the project. See FunASR License for details.
