I'm doing some testing for ASR, using the new Cohere model that was added recently (#605) - thanks!
However, I seem to be hitting some kind of memory issue, and would love to confirm whether that's really the cause.
I installed mlx-audio using uv tools:
uv tool install --force git+https://github.com/Blaizzy/mlx-audio.git --prerelease=allow
I also set up the HF token using uvx hf auth login (needed so that mlx-audio could download the cohere-transcribe model).
I ran mlx-audio like so:
❯ mlx_audio.stt.generate --model CohereLabs/cohere-transcribe-03-2026 --audio `~/Downloads/test.m4a` --output-path ~/Downloads/cohere/ --language en
Fetching 14 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:00<00:00, 9998.34it/s]
Download complete: : 0.00B [00:00, ?B/s] | 0/14 [00:00<?, ?it/s]
libc++abi: terminating due to uncaught exception of type std::runtime_error: [METAL] Command buffer execution failed: Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault)
Error: nu::shell::terminated_by_signal
× External command was terminated by a signal
╭─[repl_entry #11:1:1]
1 │ mlx_audio.stt.generate --model CohereLabs/cohere-transcribe-03-2026 --audio `~/Downloads/test.m4a` --output-path ~/Downloads/cohere/ --language en
· ───────────┬──────────
· ╰── terminated by SIGABRT (6)
╰────
/Users/victorhooi/.local/share/uv/python/cpython-3.10-macos-aarch64-none/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
dotfiles on main [!] took 44s
In this case, test.m4a is a 32MB audio file - here is the ffprobe output:
✦ ❯ ffprobe test.m4a
ffprobe version 8.0.1 Copyright (c) 2007-2025 the FFmpeg developers
built with clang version 21.1.8
configuration: --disable-static --prefix=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-ffmpeg-8.0.1 --target_os=darwin --arch=aarch64 --pkg-config=pkg-config --enable-gpl --enable-version3 --disable-nonfree --disable-static --enable-shared --enable-pic --disable-thumb --disable-small --enable-runtime-cpudetect --disable-gray --enable-swscale-alpha --enable-hardcoded-tables --enable-safe-bitstream-reader --enable-pthreads --disable-w32threads --disable-os2threads --enable-network --enable-pixelutils --datadir=/nix/store/59jc9i19x0pggr30r80caxv9hlfwfmmm-ffmpeg-8.0.1-data/share/ffmpeg --enable-ffmpeg --enable-ffplay --enable-ffprobe --bindir=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-ffmpeg-8.0.1-bin/bin --enable-avcodec --enable-avdevice --enable-avfilter --enable-avformat --enable-avutil --enable-swresample --enable-swscale --libdir=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-ffmpeg-8.0.1-lib/lib --incdir=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-ffmpeg-8.0.1-dev/include --enable-doc --enable-htmlpages --enable-manpages --mandir=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-ffmpeg-8.0.1-man/share/man --enable-podpages --enable-txtpages --docdir=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-ffmpeg-8.0.1-doc/share/doc/ffmpeg --disable-alsa --disable-amf --enable-libaom --disable-libaribb24 --disable-libaribcaption --enable-libass --disable-avisynth --enable-libbluray --disable-libbs2b --enable-bzlib --disable-libcaca --disable-libcdio --disable-libcelt --disable-chromaprint --disable-libcodec2 --disable-cuda --enable-cuda-llvm --disable-cuda-nvcc --disable-cuvid --enable-libdav1d --disable-libdavs2 --disable-libdc1394 --disable-libdrm --disable-libdvdnav --disable-libdvdread --disable-libfdk-aac --disable-ffnvcodec --disable-libflite --enable-fontconfig --enable-libfontconfig --enable-libfreetype --disable-frei0r --enable-libfribidi --disable-libgme --enable-gmp --enable-gnutls --disable-libgsm --enable-libharfbuzz --enable-iconv --disable-libilbc --disable-libjack 
--disable-libjxl --disable-libkvazaar --disable-ladspa --disable-liblc3 --disable-liblcevc-dec --disable-lcms2 --enable-lzma --disable-metal --disable-libmfx --disable-libmodplug --enable-libmp3lame --disable-libmysofa --disable-libnpp --disable-nvdec --disable-nvenc --disable-openal --enable-liboapv --enable-opencl --disable-libopencore-amrnb --disable-libopencore-amrwb --disable-opengl --disable-libopenh264 --enable-libopenjpeg --enable-libopenmpt --enable-libopus --disable-libplacebo --disable-libpulse --disable-libqrencode --disable-libquirc --disable-librav1e --enable-librist --disable-librtmp --disable-librubberband --disable-libsmbclient --enable-sdl2 --disable-libshaderc --disable-libshine --disable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --disable-librsvg --enable-libsvtav1 --disable-libtensorflow --enable-libtheora --disable-libtwolame --disable-libuavs3d --disable-libv4l2 --disable-v4l2-m2m --disable-vaapi --disable-vdpau --disable-libvpl --enable-libvidstab --disable-libvmaf --disable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --disable-vulkan --disable-libvvenc --enable-libwebp --disable-whisper --enable-libx264 --enable-libx265 --disable-libxavs --disable-libxavs2 --disable-libxcb --disable-libxcb-shape --disable-libxcb-shm --disable-libxcb-xfixes --disable-libxevd --disable-libxeve --disable-xlib --enable-libxml2 --enable-libxvid --enable-libzimg --enable-zlib --disable-libzmq --enable-libzvbi --disable-debug --enable-optimizations --disable-extra-warnings --disable-stripping --cc=clang --cxx=clang++
libavutil 60. 8.100 / 60. 8.100
libavcodec 62. 11.100 / 62. 11.100
libavformat 62. 3.100 / 62. 3.100
libavdevice 62. 1.100 / 62. 1.100
libavfilter 11. 4.100 / 11. 4.100
libswscale 9. 1.100 / 9. 1.100
libswresample 6. 1.100 / 6. 1.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'test.m4a':
Metadata:
major_brand : mp42
minor_version : 0
compatible_brands: isommp42
creation_time : 2026-04-28T08:57:00.000000Z
com.android.version: 16
Duration: 01:24:41.10, start: 0.000000, bitrate: 50 kb/s
Stream #0:0[0x3](eng): Audio: aac (HE-AAC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 48 kb/s (default)
Metadata:
creation_time : 2026-04-28T08:57:00.000000Z
handler_name : Soun
vendor_id : [0][0][0][0]
Stream #0:1[0x5](eng): Data: none (mett / 0x7474656D), 0 kb/s, start 9.900000 (default)
Metadata:
creation_time : 2026-04-28T08:57:00.000000Z
handler_name : Meta
Stream #0:2[0x6](eng): Data: none (mett / 0x7474656D), 0 kb/s, start 10.156000 (default)
Metadata:
creation_time : 2026-04-28T08:57:00.000000Z
handler_name : Meta
Stream #0:3[0x7](eng): Data: none (mett / 0x7474656D), 0 kb/s (default)
Metadata:
creation_time : 2026-04-28T08:57:00.000000Z
handler_name : Meta
Unsupported codec with id 0 for input stream 1
Unsupported codec with id 0 for input stream 2
Unsupported codec with id 0 for input stream 3
I'm testing this on a MacBook with an M5 Max and 48GB RAM.
My guess is that the .m4a file is heavily compressed, while mlx-audio has to work on the decoded (uncompressed) audio. Or are there other factors, such as the audio duration (1 hr 24 min here), that drive mlx-audio's memory usage?
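For a rough sense of scale, here is a back-of-the-envelope estimate of the decoded audio size, using the duration, sample rate and channel count from the ffprobe output above. The float32 sample format and the 16 kHz mono resampling target are assumptions on my part (common for ASR pipelines, but I have not verified what mlx-audio or the Cohere model actually do), and `pcm_bytes` is just a hypothetical helper name.

```python
def pcm_bytes(duration_s, sample_rate, channels, bytes_per_sample=4):
    """Size in bytes of raw PCM audio (float32 samples by default)."""
    return round(duration_s * sample_rate * channels * bytes_per_sample)


# Duration from ffprobe: 01:24:41.10
duration_s = 1 * 3600 + 24 * 60 + 41.10

# Decoded at the file's native format: 48 kHz stereo (from ffprobe).
native_bytes = pcm_bytes(duration_s, 48_000, 2)

# ASSUMPTION: resampled to 16 kHz mono before reaching the model.
mono16k_bytes = pcm_bytes(duration_s, 16_000, 1)

print(f"48 kHz stereo float32: {native_bytes / 1e9:.2f} GB")
print(f"16 kHz mono float32:   {mono16k_bytes / 1e9:.2f} GB")
```

Under those assumptions the raw waveform comes to roughly 2 GB at the native format and about 0.3 GB at 16 kHz mono, so the decoded audio on its own should fit comfortably in 48GB. If memory is the problem, it would more likely come from something that grows with duration inside the model (features, activations, attention) than from the waveform itself, but that part is speculation.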
Does anybody know if the above error messages are actually due to memory? (I thought it might be due to ml-explore/mlx-examples#1366, which looks similar - but I could be off base here).
Or are there other ways I can make this work with the Cohere model?
(I did use ffmpeg to split the audio up into 10 minute chunks - and that seemed to work).
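For anyone wanting to reproduce the workaround, below is a sketch of one way to do the split with ffmpeg's segment muxer. It is not necessarily the exact command used here. The helper names `build_split_command` and `split_audio` and the output naming pattern are hypothetical, and `-map 0:a:0` is included on the assumption that the three `mett` data streams shown by ffprobe should be left out of the chunks.

```python
import subprocess
from pathlib import Path


def build_split_command(src, out_dir, chunk_seconds=600):
    """Build an ffmpeg command that splits `src` into fixed-length chunks."""
    src = Path(src)
    # e.g. chunks/test_000.m4a, chunks/test_001.m4a, ...
    pattern = Path(out_dir) / f"{src.stem}_%03d{src.suffix}"
    return [
        "ffmpeg",
        "-i", str(src),
        "-map", "0:a:0",          # first audio stream only (skip data streams)
        "-c", "copy",             # no re-encode; cuts land on packet boundaries
        "-f", "segment",
        "-segment_time", str(chunk_seconds),
        "-reset_timestamps", "1",  # each chunk starts at t=0
        str(pattern),
    ]


def split_audio(src, out_dir, chunk_seconds=600):
    """Create the output directory and run ffmpeg (must be on PATH)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_split_command(src, out_dir, chunk_seconds), check=True)
```

Calling `split_audio("test.m4a", "chunks")` would then produce 10-minute chunks that can be passed to mlx_audio.stt.generate one at a time. Because the audio is stream-copied, chunk boundaries may not fall exactly on the 600-second marks.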