
feat: add WebM audio format support#635

Merged
Blaizzy merged 1 commit into Blaizzy:main from regcs:feature/webm_support on Apr 7, 2026


@regcs
Contributor

@regcs commented on Apr 5, 2026

Enable processing of .webm audio files/blobs (commonly produced by browser MediaRecorder APIs) by routing them through the existing ffmpeg decode/encode path. No new dependencies required.

Context

This commit implements the feature I requested in #631. WebM is a common format and is used by other open-source tools, such as OpenWhispr, to send audio files/blobs to OpenAI-compatible endpoints.

Description

WebM is a Matroska-based container format that typically holds Opus or Vorbis audio streams. It is the default audio output format of the MediaRecorder API in Chromium-based browsers, so many web frontends that record audio produce .webm blobs. Currently, mlx-audio rejects these files because audio_io.py does not recognize the format.

This PR adds WebM support by routing .webm files and byte streams through the existing ffmpeg decode/encode path — the same mechanism already used for M4A, AAC, OGG, and Opus. ffmpeg natively supports WebM/Matroska demuxing and Opus/Vorbis decoding, so no new dependencies or external services are required.

The key technical details:

  • Reading (decoding): .webm file extensions and EBML magic bytes (\x1a\x45\xdf\xa3) are detected and routed to _decode_ffmpeg(), which uses ffmpeg to extract raw PCM audio.
  • Writing (encoding): WebM output uses the libopus codec inside a WebM container via _encode_ffmpeg().
  • BytesIO path: The EBML header is detected from the first 4 bytes of in-memory byte streams, enabling direct processing of browser-uploaded blobs (e.g., via the /v1/audio/transcriptions endpoint) without any intermediate conversion.
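The byte-stream detection described above can be sketched as follows. This is a minimal, stdlib-only illustration, not the PR's code: `sniff_container` is a hypothetical name rather than mlx-audio's `_detect_format_from_bytes()`, and the OggS and RIFF/WAVE checks are well-known signatures included only for contrast. The detail worth noting is that the stream position is restored after peeking, so the decoder still receives the whole blob.

```python
import io
from typing import Optional

# EBML header magic; this is the signature the PR adds for WebM detection.
EBML_MAGIC = b"\x1a\x45\xdf\xa3"


def sniff_container(buf: io.BytesIO) -> Optional[str]:
    """Guess the container of an in-memory audio blob from its leading bytes.

    Hypothetical helper for illustration only. The stream position is
    restored afterwards so a downstream decoder (e.g. ffmpeg) sees every byte.
    """
    pos = buf.tell()
    head = buf.read(12)
    buf.seek(pos)  # undo the peek
    if head[:4] == EBML_MAGIC:
        # Shared by .mkv and .webm; ffmpeg's matroska demuxer reads both.
        return "webm"
    if head[:4] == b"OggS":
        return "ogg"
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "wav"
    return None  # unknown: caller decides whether to fall back or reject
```

Because the EBML header also starts ordinary Matroska (.mkv) files, a match really means "Matroska family". Routing such input to ffmpeg is harmless, since the same demuxer handles both.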

Changes in the codebase

All changes are in two files:

mlx_audio/audio_io.py

  • Added "webm": "webm" to _FORMAT_MAP.
  • Added EBML magic byte detection (\x1a\x45\xdf\xa3) in _detect_format_from_bytes() to identify WebM from raw bytes.
  • Added "webm" to the ffmpeg extension routing set in read() for file path inputs.
  • Added EBML header check in the read() BytesIO branch so in-memory WebM blobs are correctly routed to ffmpeg.
  • Added a "webm" case in _encode_ffmpeg() using the libopus codec.
  • Added "webm" to the ffmpeg format set in write().
  • Updated docstrings and error messages throughout to mention WebM.
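A rough sketch of the extension routing and the ffmpeg invocations, assuming the encoder reads raw little-endian float32 PCM on stdin and writes the container to stdout. The helper names, the exact extension set, and the `f32le` pipe format are assumptions for illustration; only the WebM-to-libopus mapping comes from the PR. All flags shown are standard ffmpeg CLI options.

```python
from pathlib import Path
from typing import List

# Hypothetical set of extensions routed to ffmpeg on read; mirrors the
# formats named in this PR (M4A, AAC, OGG, Opus) plus the new "webm".
FFMPEG_READ_EXTS = {"m4a", "aac", "ogg", "opus", "webm"}

# Per-format output settings; only the WebM entry reflects the PR.
ENCODER_ARGS = {
    "webm": ["-c:a", "libopus", "-f", "webm"],  # Opus audio in a WebM container
}

_QUIET = ["ffmpeg", "-hide_banner", "-loglevel", "error"]


def needs_ffmpeg(path: str) -> bool:
    """True if a file path should be decoded via ffmpeg, judged by extension."""
    return Path(path).suffix.lower().lstrip(".") in FFMPEG_READ_EXTS


def ffmpeg_encode_args(fmt: str, sample_rate: int, channels: int) -> List[str]:
    """argv that reads raw float32 PCM from stdin and writes `fmt` to stdout."""
    if fmt not in ENCODER_ARGS:
        raise ValueError(f"no encoder settings for {fmt!r} in this sketch")
    return [
        *_QUIET,
        "-f", "f32le", "-ar", str(sample_rate), "-ac", str(channels),
        "-i", "pipe:0",
        *ENCODER_ARGS[fmt],
        "pipe:1",
    ]


def ffmpeg_decode_args(sample_rate: int, channels: int) -> List[str]:
    """argv that demuxes/decodes any container on stdin to float32 PCM on stdout."""
    return [
        *_QUIET,
        "-i", "pipe:0",
        "-f", "f32le", "-ar", str(sample_rate), "-ac", str(channels),
        "pipe:1",
    ]
```

Keeping the decode command container-agnostic is what lets WebM reuse the existing path: ffmpeg probes the input itself, so only detection and the encoder table need a new entry.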

mlx_audio/tests/test_audio_io.py

  • test_write_read_webm — mono file write/read round-trip.
  • test_write_read_webm_stereo — stereo file write/read round-trip.
  • test_write_bytesio_webm — BytesIO write/read round-trip (simulates a browser blob upload).
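A stdlib-only round-trip check in the spirit of `test_write_bytesio_webm`, offered as a sketch rather than the PR's actual test. It pipes a 1-second tone through ffmpeg into WebM/Opus entirely in memory, checks for the EBML header, and decodes it back. It assumes an ffmpeg build that includes the libopus encoder and does nothing otherwise; lengths are compared with a tolerance because Opus is lossy and may leave encoder padding.

```python
import math
import shutil
import struct
import subprocess

EBML_MAGIC = b"\x1a\x45\xdf\xa3"


def ffmpeg_has_libopus() -> bool:
    """True if ffmpeg is on PATH and was built with the libopus encoder."""
    if shutil.which("ffmpeg") is None:
        return False
    listing = subprocess.run(
        ["ffmpeg", "-hide_banner", "-encoders"], capture_output=True, check=False
    ).stdout
    return b"libopus" in listing


def _ffmpeg(args, data: bytes) -> bytes:
    """Run ffmpeg with `data` on stdin and return its stdout (raises on failure)."""
    cmd = ["ffmpeg", "-hide_banner", "-loglevel", "error", *args]
    return subprocess.run(cmd, input=data, capture_output=True, check=True).stdout


def webm_roundtrip(samples, sample_rate: int = 16000):
    """Encode mono float samples to WebM/Opus in memory, then decode them back."""
    pcm = struct.pack(f"<{len(samples)}f", *samples)  # little-endian float32
    webm = _ffmpeg(
        ["-f", "f32le", "-ar", str(sample_rate), "-ac", "1", "-i", "pipe:0",
         "-c:a", "libopus", "-f", "webm", "pipe:1"],
        pcm,
    )
    decoded = _ffmpeg(
        ["-i", "pipe:0", "-f", "f32le", "-ar", str(sample_rate), "-ac", "1",
         "pipe:1"],
        webm,
    )
    n = len(decoded) // 4
    return webm, list(struct.unpack(f"<{n}f", decoded[: 4 * n]))


if __name__ == "__main__" and ffmpeg_has_libopus():
    sr = 16000
    tone = [0.5 * math.sin(2 * math.pi * 440 * i / sr) for i in range(sr)]  # 1 s, 440 Hz
    blob, back = webm_roundtrip(tone, sr)
    print("EBML header:", blob[:4] == EBML_MAGIC, "samples in/out:", len(tone), len(back))
```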

Changes outside the codebase

None. ffmpeg is already a de facto requirement for the project (used by M4A/AAC/OGG/Opus support). No new Python packages, external services, or infrastructure changes are needed.

Additional information

  • Performance: There is no additional overhead compared to the existing OGG/Opus path — both use the same ffmpeg subprocess pipeline.
  • Design choice: WebM encoding defaults to Opus (libopus) since Opus is the standard and most widely used audio codec in WebM containers. This matches what browsers produce by default.
  • Scope: Model-specific audio loaders that bypass audio_io.py (e.g., voxtral_realtime using soundfile directly, chatterbox using librosa directly) are not modified. These models typically receive pre-decoded arrays from higher-level pipeline functions that already call audio_io.read(), so they benefit from this change indirectly.

Owner

@Blaizzy Blaizzy left a comment


LGTM, thanks!

@Blaizzy merged commit d78c851 into Blaizzy:main on Apr 7, 2026
10 checks passed


Development

Successfully merging this pull request may close these issues.

Feature request: Support .webm format

2 participants