feat: add WebM audio format support #635
Merged
Enable processing of .webm audio files/blobs (commonly produced by browser MediaRecorder APIs) by routing them through the existing ffmpeg decode/encode path. No new dependencies required.
Context
This is a commit to add my requested feature (#631).
`.webm` is a common format, used by other open-source tools such as OpenWhispr to send audio files/blobs to OpenAI-compatible endpoints.
Description
WebM is a Matroska-based container format that typically holds Opus or Vorbis audio streams. It is a common default output format of the browser `MediaRecorder` API, so web frontends that record audio frequently produce `.webm` blobs. Currently, `mlx-audio` rejects these files because `audio_io.py` does not recognize the format.
This PR adds WebM support by routing `.webm` files and byte streams through the existing ffmpeg decode/encode path, the same mechanism already used for M4A, AAC, OGG, and Opus. ffmpeg natively supports WebM/Matroska demuxing and Opus/Vorbis decoding, so no new dependencies or external services are required.
The key technical details:
- `.webm` file extensions and EBML magic bytes (`\x1a\x45\xdf\xa3`) are detected and routed to `_decode_ffmpeg()`, which uses ffmpeg to extract raw PCM audio.
- Output is encoded with the `libopus` codec inside a WebM container via `_encode_ffmpeg()`.
- WebM audio can be uploaded directly (e.g., to the `/v1/audio/transcriptions` endpoint) without any intermediate conversion.
Changes in the codebase
All changes are in two files:
`mlx_audio/audio_io.py`
- Add `"webm": "webm"` to `_FORMAT_MAP`.
- Detect the EBML magic bytes (`\x1a\x45\xdf\xa3`) in `_detect_format_from_bytes()` to identify WebM from raw bytes.
- Add `"webm"` to the ffmpeg extension routing set in `read()` for file path inputs.
- Handle WebM in the `read()` BytesIO branch so in-memory WebM blobs are correctly routed to ffmpeg.
- Add a `"webm"` case in `_encode_ffmpeg()` using the `libopus` codec.
- Add `"webm"` to the ffmpeg format set in `write()`.
`mlx_audio/tests/test_audio_io.py`
- `test_write_read_webm`: mono file write/read round-trip.
- `test_write_read_webm_stereo`: stereo file write/read round-trip.
- `test_write_bytesio_webm`: BytesIO write/read round-trip (simulates a browser blob upload).
Changes outside the codebase
None. ffmpeg is already a de facto requirement for the project (used by M4A/AAC/OGG/Opus support). No new Python packages, external services, or infrastructure changes are needed.
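For reviewers who want a feel for the detection and routing step described under the key technical details, here is a minimal, self-contained sketch. It is not the code in `audio_io.py`: the names `sniff_format`, `needs_ffmpeg`, and `FFMPEG_FORMATS` are hypothetical, and the set of ffmpeg-routed formats is assumed from the formats this PR mentions (M4A, AAC, OGG, Opus, WebM).

```python
import io
from pathlib import Path

# EBML header signature that opens every Matroska/WebM file.
EBML_MAGIC = b"\x1a\x45\xdf\xa3"

# Assumption: formats handed to ffmpeg, taken from the PR description.
FFMPEG_FORMATS = {"m4a", "aac", "ogg", "opus", "webm"}


def sniff_format(head: bytes):
    """Guess a container format from the first bytes of a stream (hypothetical helper)."""
    if head[:4] == EBML_MAGIC:
        # The signature is shared by all Matroska files; treating it as
        # WebM is a simplification that is fine when ffmpeg does the demuxing.
        return "webm"
    if head[:4] == b"OggS":
        return "ogg"
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "wav"
    return None


def needs_ffmpeg(source) -> bool:
    """Return True if a path or binary stream should go through ffmpeg."""
    if isinstance(source, (str, Path)):
        fmt = Path(source).suffix.lower().lstrip(".")
    else:
        # Peek at the header, then restore the stream position.
        pos = source.tell()
        head = source.read(12)
        source.seek(pos)
        fmt = sniff_format(head)
    return fmt in FFMPEG_FORMATS
```

A path such as `"recording.webm"` is routed by extension, while an in-memory blob (for example an `io.BytesIO` wrapping a browser upload) is routed by its leading bytes, which mirrors the two `read()` branches listed above.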
Additional information
- WebM output is encoded with Opus (`libopus`) since Opus is the standard and most widely used audio codec in WebM containers. This matches what browsers produce by default.
- Models that bypass `audio_io.py` (e.g., `voxtral_realtime` using `soundfile` directly, `chatterbox` using `librosa` directly) are not modified. These models typically receive pre-decoded arrays from higher-level pipeline functions that already call `audio_io.read()`, so they benefit from this change indirectly.
Checklist