
Testing new Cohere model - memory issues with 32MB .m4a file? #688

@victorhooi

I'm doing some ASR testing with the new Cohere model that was added recently (#605) - thanks!

However, I seem to be hitting some kind of memory issue, and would love to confirm whether that's really the cause.

I installed mlx-audio as a uv tool:

uv tool install --force git+https://github.com/Blaizzy/mlx-audio.git --prerelease=allow

I also set up the HF token using uvx hf auth login (needed so that mlx-audio could download the cohere-transcribe model).

I ran mlx-audio like so:

❯ mlx_audio.stt.generate --model CohereLabs/cohere-transcribe-03-2026 --audio `~/Downloads/test.m4a` --output-path ~/Downloads/cohere/ --language en
Fetching 14 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:00<00:00, 9998.34it/s]
Download complete: : 0.00B [00:00, ?B/s]                                                                                                                                                        | 0/14 [00:00<?, ?it/s]
libc++abi: terminating due to uncaught exception of type std::runtime_error: [METAL] Command buffer execution failed: Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault)
Error: nu::shell::terminated_by_signal

  × External command was terminated by a signal
   ╭─[repl_entry #11:1:1]
 1 │ mlx_audio.stt.generate --model CohereLabs/cohere-transcribe-03-2026 --audio `~/Downloads/test.m4a` --output-path ~/Downloads/cohere/ --language en
   · ───────────┬──────────
   ·            ╰── terminated by SIGABRT (6)
   ╰────

/Users/victorhooi/.local/share/uv/python/cpython-3.10-macos-aarch64-none/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

dotfiles on  main [!] took 44s

In this case, test.m4a is a 32MB audio file - here is the ffprobe output:

✦ ❯ ffprobe test.m4a
ffprobe version 8.0.1 Copyright (c) 2007-2025 the FFmpeg developers
  built with clang version 21.1.8
  configuration: --disable-static --prefix=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-ffmpeg-8.0.1 --target_os=darwin --arch=aarch64 --pkg-config=pkg-config --enable-gpl --enable-version3 --disable-nonfree --disable-static --enable-shared --enable-pic --disable-thumb --disable-small --enable-runtime-cpudetect --disable-gray --enable-swscale-alpha --enable-hardcoded-tables --enable-safe-bitstream-reader --enable-pthreads --disable-w32threads --disable-os2threads --enable-network --enable-pixelutils --datadir=/nix/store/59jc9i19x0pggr30r80caxv9hlfwfmmm-ffmpeg-8.0.1-data/share/ffmpeg --enable-ffmpeg --enable-ffplay --enable-ffprobe --bindir=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-ffmpeg-8.0.1-bin/bin --enable-avcodec --enable-avdevice --enable-avfilter --enable-avformat --enable-avutil --enable-swresample --enable-swscale --libdir=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-ffmpeg-8.0.1-lib/lib --incdir=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-ffmpeg-8.0.1-dev/include --enable-doc --enable-htmlpages --enable-manpages --mandir=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-ffmpeg-8.0.1-man/share/man --enable-podpages --enable-txtpages --docdir=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-ffmpeg-8.0.1-doc/share/doc/ffmpeg --disable-alsa --disable-amf --enable-libaom --disable-libaribb24 --disable-libaribcaption --enable-libass --disable-avisynth --enable-libbluray --disable-libbs2b --enable-bzlib --disable-libcaca --disable-libcdio --disable-libcelt --disable-chromaprint --disable-libcodec2 --disable-cuda --enable-cuda-llvm --disable-cuda-nvcc --disable-cuvid --enable-libdav1d --disable-libdavs2 --disable-libdc1394 --disable-libdrm --disable-libdvdnav --disable-libdvdread --disable-libfdk-aac --disable-ffnvcodec --disable-libflite --enable-fontconfig --enable-libfontconfig --enable-libfreetype --disable-frei0r --enable-libfribidi --disable-libgme --enable-gmp --enable-gnutls --disable-libgsm --enable-libharfbuzz --enable-iconv --disable-libilbc --disable-libjack 
--disable-libjxl --disable-libkvazaar --disable-ladspa --disable-liblc3 --disable-liblcevc-dec --disable-lcms2 --enable-lzma --disable-metal --disable-libmfx --disable-libmodplug --enable-libmp3lame --disable-libmysofa --disable-libnpp --disable-nvdec --disable-nvenc --disable-openal --enable-liboapv --enable-opencl --disable-libopencore-amrnb --disable-libopencore-amrwb --disable-opengl --disable-libopenh264 --enable-libopenjpeg --enable-libopenmpt --enable-libopus --disable-libplacebo --disable-libpulse --disable-libqrencode --disable-libquirc --disable-librav1e --enable-librist --disable-librtmp --disable-librubberband --disable-libsmbclient --enable-sdl2 --disable-libshaderc --disable-libshine --disable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --disable-librsvg --enable-libsvtav1 --disable-libtensorflow --enable-libtheora --disable-libtwolame --disable-libuavs3d --disable-libv4l2 --disable-v4l2-m2m --disable-vaapi --disable-vdpau --disable-libvpl --enable-libvidstab --disable-libvmaf --disable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --disable-vulkan --disable-libvvenc --enable-libwebp --disable-whisper --enable-libx264 --enable-libx265 --disable-libxavs --disable-libxavs2 --disable-libxcb --disable-libxcb-shape --disable-libxcb-shm --disable-libxcb-xfixes --disable-libxevd --disable-libxeve --disable-xlib --enable-libxml2 --enable-libxvid --enable-libzimg --enable-zlib --disable-libzmq --enable-libzvbi --disable-debug --enable-optimizations --disable-extra-warnings --disable-stripping --cc=clang --cxx=clang++
  libavutil      60.  8.100 / 60.  8.100
  libavcodec     62. 11.100 / 62. 11.100
  libavformat    62.  3.100 / 62.  3.100
  libavdevice    62.  1.100 / 62.  1.100
  libavfilter    11.  4.100 / 11.  4.100
  libswscale      9.  1.100 /  9.  1.100
  libswresample   6.  1.100 /  6.  1.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'test.m4a':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: isommp42
    creation_time   : 2026-04-28T08:57:00.000000Z
    com.android.version: 16
  Duration: 01:24:41.10, start: 0.000000, bitrate: 50 kb/s
  Stream #0:0[0x3](eng): Audio: aac (HE-AAC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 48 kb/s (default)
    Metadata:
      creation_time   : 2026-04-28T08:57:00.000000Z
      handler_name    : Soun
      vendor_id       : [0][0][0][0]
  Stream #0:1[0x5](eng): Data: none (mett / 0x7474656D), 0 kb/s, start 9.900000 (default)
    Metadata:
      creation_time   : 2026-04-28T08:57:00.000000Z
      handler_name    : Meta
  Stream #0:2[0x6](eng): Data: none (mett / 0x7474656D), 0 kb/s, start 10.156000 (default)
    Metadata:
      creation_time   : 2026-04-28T08:57:00.000000Z
      handler_name    : Meta
  Stream #0:3[0x7](eng): Data: none (mett / 0x7474656D), 0 kb/s (default)
    Metadata:
      creation_time   : 2026-04-28T08:57:00.000000Z
      handler_name    : Meta
Unsupported codec with id 0 for input stream 1
Unsupported codec with id 0 for input stream 2
Unsupported codec with id 0 for input stream 3

I'm testing this on a MacBook with an M5 Max and 48GB RAM.

My guess is that the .m4a file is heavily compressed (ffprobe reports about 50 kb/s), whereas mlx-audio has to work on the decoded, uncompressed audio. Or are there other factors, such as audio duration, that drive memory usage for mlx-audio here? (The duration is 1 hour 24 minutes.)
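For a rough sense of scale, here is a back-of-envelope sketch of how large the decoded waveform would be. The duration, sample rate, and channel count come from the ffprobe output above; the float32 sample format and the 16 kHz mono resample are assumptions about how the audio might be held in memory, not something confirmed for mlx-audio or the Cohere model.

```python
# Back-of-envelope size of the decoded audio, based on the ffprobe output.
# ASSUMPTIONS: samples are stored as float32 (4 bytes each), and the
# 16 kHz mono case is a guess at a typical ASR input format.

def decoded_bytes(duration_s: float, sample_rate: int, channels: int,
                  bytes_per_sample: int = 4) -> int:
    """Bytes needed to hold the fully decoded waveform in memory."""
    frames = round(duration_s * sample_rate)
    return frames * channels * bytes_per_sample

# Duration reported by ffprobe: 01:24:41.10
duration_s = 1 * 3600 + 24 * 60 + 41.10

native_size = decoded_bytes(duration_s, 48_000, 2)   # 48 kHz stereo, as stored
mono16k_size = decoded_bytes(duration_s, 16_000, 1)  # hypothetical 16 kHz mono

print(f"48 kHz stereo float32: {native_size / 1e9:.2f} GB")
print(f"16 kHz mono float32:   {mono16k_size / 1e6:.0f} MB")
```

Under these assumptions the raw waveform is on the order of 2 GB at most, which on its own is well below 48GB. If memory is the cause, it would more likely come from intermediate buffers that grow with audio duration than from the decoded samples themselves, though that is speculation without profiling.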

Does anybody know if the above error messages are actually due to memory? (I thought it might be related to ml-explore/mlx-examples#1366, which looks similar, but I could be off base here.)

Or are there other ways I can make this work with the Cohere model?

(I did use ffmpeg to split the audio into 10-minute chunks, and that seemed to work.)
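The chunking workaround can be sketched roughly as below. This is not necessarily the exact command that was used: it assumes ffmpeg's segment muxer with stream copy, and the helper names (`chunk_ranges`, `ffmpeg_split_cmd`) and the output pattern are hypothetical. The code only builds and prints the command; it does not run ffmpeg.

```python
import math

def chunk_ranges(duration_s: float, chunk_s: float = 600.0):
    """Return (start, end) times in seconds for fixed-length chunks."""
    n_chunks = math.ceil(duration_s / chunk_s)
    return [(i * chunk_s, min((i + 1) * chunk_s, duration_s))
            for i in range(n_chunks)]

def ffmpeg_split_cmd(src: str, out_pattern: str, chunk_s: int = 600):
    """Build an ffmpeg command that splits the first audio stream into chunks.

    ASSUMPTION: stream copy (-c copy) with the segment muxer is acceptable,
    so chunks are cut on packet boundaries without re-encoding.
    """
    return [
        "ffmpeg", "-i", src,
        "-map", "0:a:0",              # first audio stream only, skip data streams
        "-c", "copy",                 # no re-encode
        "-f", "segment",
        "-segment_time", str(chunk_s),
        "-reset_timestamps", "1",     # each chunk starts at t=0
        out_pattern,
    ]

# 01:24:41.10 from the ffprobe output, in seconds
ranges = chunk_ranges(5081.1)
cmd = ffmpeg_split_cmd("test.m4a", "chunk_%03d.m4a")  # hypothetical file names

print(len(ranges), "chunks; last one covers", ranges[-1])
print(" ".join(cmd))
```

For the file above this gives nine chunks, the last one shorter than ten minutes. Each chunk can then be passed to mlx_audio.stt.generate separately with the same flags as the original invocation.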
