Feature: FunASR as alternative ASR engine (faster + multilingual + runs on CPU) #185

@LauraGPT

Context

RTranslator does real-time on-device translation, which requires fast speech recognition. FunASR (16.5K stars, MIT) provides models that are significantly faster than Whisper, especially on CPU — which matters for mobile devices.

Key advantages for mobile/on-device use

| Metric | Whisper | FunASR SenseVoice | FunASR Paraformer |
| --- | --- | --- | --- |
| CPU speed | Slow (~1x realtime) | 17x realtime | 15x realtime |
| Model size | 1.5GB (large) | 234MB | 220MB |
| Languages | 57 | 50+ | zh/en |
| Architecture | Autoregressive | Non-autoregressive | Non-autoregressive |
| Latency per 5s of audio | ~5s on CPU | <300ms on CPU | <400ms on CPU |
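
As a quick sanity check, the latency row follows from the CPU speed row: processing time is segment length divided by the realtime factor. The sketch below only redoes that arithmetic with the figures quoted in the table. The helper name `latency_ms` is made up for illustration, and the code does not measure anything on a real device.

```python
def latency_ms(segment_seconds: float, realtime_factor: float) -> float:
    """Processing time in ms for a segment, given speed as a multiple of real time."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return segment_seconds / realtime_factor * 1000.0


# Speed figures copied from the comparison table (not measured here).
speeds = {
    "Whisper": 1.0,
    "FunASR SenseVoice": 17.0,
    "FunASR Paraformer": 15.0,
}

for name, factor in speeds.items():
    print(f"{name}: {latency_ms(5.0, factor):.0f} ms per 5 s segment")
```

At 17x and 15x realtime, a 5 s segment works out to roughly 294 ms and 333 ms, which is consistent with the <300ms and <400ms entries.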

The non-autoregressive architecture is the key difference — it processes the entire audio segment in a single forward pass, making it inherently faster and more suitable for real-time applications on resource-constrained devices.
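
To make the difference concrete, here is a deliberately simplified count of sequential model passes for one audio segment. It assumes an autoregressive model runs its encoder once and then one decoder step per output token, while a non-autoregressive model produces all tokens in one pass. The function names and the token count are hypothetical, and real per-pass costs differ between models, so treat this as an illustration of scaling rather than a benchmark.

```python
def autoregressive_passes(num_output_tokens: int) -> int:
    """Sequential passes: one encoder pass, then one decoder step per token."""
    if num_output_tokens < 0:
        raise ValueError("num_output_tokens must be non-negative")
    return 1 + num_output_tokens


def non_autoregressive_passes(num_output_tokens: int) -> int:
    """Sequential passes: a single forward pass, independent of output length."""
    if num_output_tokens < 0:
        raise ValueError("num_output_tokens must be non-negative")
    return 1


# Hypothetical output lengths for short and longer utterances.
for tokens in (5, 20, 80):
    ar = autoregressive_passes(tokens)
    nar = non_autoregressive_passes(tokens)
    print(f"{tokens} tokens: autoregressive={ar} passes, non-autoregressive={nar} pass")
```

The autoregressive count grows with the length of the transcript, while the non-autoregressive count stays constant, which is why the latter tends to suit low-latency use.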

Model options for mobile

  • SenseVoice-Small (234MB): 50+ languages, best for general multilingual use
  • Paraformer-zh (220MB): Best Chinese + English accuracy

Both models have ONNX exports available for mobile deployment:

  • ONNX Runtime Mobile compatible
  • funasr-onnx package for Python
  • C++ runtime available

Resources

Happy to help with integration details if interested.
