Unable to use diarization model (restricted access)

I followed the instructions on the website to install Scriberr with Homebrew on my M3 MacBook Pro. When I try to transcribe a sample file, I get the errors shown in the log below.

```
sample-speech-5m
APR 28, 2026

Chat

Unable to load audio stream.
Transcription Logs
System output and processing events.

/Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/pyannote/audio/core/io.py:47: UserWarning: 
torchcodec is not installed correctly so built-in audio decoding will fail. Solutions are:
* use audio preloaded in-memory as a {'waveform': (channel, time) torch.Tensor, 'sample_rate': int} dictionary;
* fix torchcodec installation. Error message was:

Could not load libtorchcodec. Likely causes:
          1. FFmpeg is not properly installed in your environment. We support
             versions 4, 5, 6 and 7.
          2. The PyTorch version (2.8.0) is not compatible with
             this version of TorchCodec. Refer to the version compatibility
             table:
             https://github.com/pytorch/torchcodec?tab=readme-ov-file#installing-torchcodec.
          3. Another runtime dependency; see exceptions below.
        The following exceptions were raised as we tried to load libtorchcodec:
        
[start of libtorchcodec loading traceback]
FFmpeg version 7: dlopen(/Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/torchcodec/libtorchcodec_core7.dylib, 0x0006): Library not loaded: @rpath/libavutil.59.dylib
  Referenced from: <AF25BC1A-0047-3DCC-94F1-9C5BC79D8CCF> /Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/torchcodec/libtorchcodec_core7.dylib
  Reason: tried: '/Users/kadar/.local/share/uv/python/cpython-3.10.20-macos-aarch64-none/lib/libavutil.59.dylib' (no such file)
FFmpeg version 6: dlopen(/Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/torchcodec/libtorchcodec_core6.dylib, 0x0006): Library not loaded: @rpath/libavutil.58.dylib
  Referenced from: <31E243C2-D449-342E-8CD7-60F6DC3B2C77> /Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/torchcodec/libtorchcodec_core6.dylib
  Reason: tried: '/Users/kadar/.local/share/uv/python/cpython-3.10.20-macos-aarch64-none/lib/libavutil.58.dylib' (no such file)
FFmpeg version 5: dlopen(/Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/torchcodec/libtorchcodec_core5.dylib, 0x0006): Library not loaded: @rpath/libavutil.57.dylib
  Referenced from: <FC60EC29-499A-3249-BB93-34655B4585B6> /Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/torchcodec/libtorchcodec_core5.dylib
  Reason: tried: '/Users/kadar/.local/share/uv/python/cpython-3.10.20-macos-aarch64-none/lib/libavutil.57.dylib' (no such file)
FFmpeg version 4: dlopen(/Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/torchcodec/libtorchcodec_core4.dylib, 0x0006): Library not loaded: @rpath/libavutil.56.dylib
  Referenced from: <3F006FCE-1FF7-3B9B-9FB3-997372F03F08> /Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/torchcodec/libtorchcodec_core4.dylib
  Reason: tried: '/Users/kadar/.local/share/uv/python/cpython-3.10.20-macos-aarch64-none/lib/libavutil.56.dylib' (no such file)
[end of libtorchcodec loading traceback].
  warnings.warn(
Traceback (most recent call last):
  File "/Users/kadar/.local/share/uv/python/cpython-3.10-macos-aarch64-none/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/kadar/.local/share/uv/python/cpython-3.10-macos-aarch64-none/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/kadar/data/whisperx-env/WhisperX/whisperx/__main__.py", line 102, in <module>
    cli()
  File "/Users/kadar/data/whisperx-env/WhisperX/whisperx/__main__.py", line 98, in cli
    transcribe_task(args, parser)
  File "/Users/kadar/data/whisperx-env/WhisperX/whisperx/transcribe.py", line 127, in transcribe_task
    model = load_model(
  File "/Users/kadar/data/whisperx-env/WhisperX/whisperx/asr.py", line 357, in load_model
    model = model or WhisperModel(whisper_arch,
  File "/Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/faster_whisper/transcribe.py", line 663, in __init__
    self.model = ctranslate2.models.Whisper(
ValueError: Requested float16 compute type, but the target device or backend do not support efficient float16 computation.
/Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/pyannote/audio/core/io.py:47: UserWarning: 
torchcodec is not installed correctly so built-in audio decoding will fail. Solutions are:
* use audio preloaded in-memory as a {'waveform': (channel, time) torch.Tensor, 'sample_rate': int} dictionary;
* fix torchcodec installation. Error message was:

Could not load libtorchcodec. Likely causes:
          1. FFmpeg is not properly installed in your environment. We support
             versions 4, 5, 6 and 7.
          2. The PyTorch version (2.8.0) is not compatible with
             this version of TorchCodec. Refer to the version compatibility
             table:
             https://github.com/pytorch/torchcodec?tab=readme-ov-file#installing-torchcodec.
          3. Another runtime dependency; see exceptions below.
        The following exceptions were raised as we tried to load libtorchcodec:
        
[start of libtorchcodec loading traceback]
FFmpeg version 7: dlopen(/Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/torchcodec/libtorchcodec_core7.dylib, 0x0006): Library not loaded: @rpath/libavutil.59.dylib
  Referenced from: <AF25BC1A-0047-3DCC-94F1-9C5BC79D8CCF> /Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/torchcodec/libtorchcodec_core7.dylib
  Reason: tried: '/Users/kadar/.local/share/uv/python/cpython-3.10.20-macos-aarch64-none/lib/libavutil.59.dylib' (no such file)
FFmpeg version 6: dlopen(/Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/torchcodec/libtorchcodec_core6.dylib, 0x0006): Library not loaded: @rpath/libavutil.58.dylib
  Referenced from: <31E243C2-D449-342E-8CD7-60F6DC3B2C77> /Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/torchcodec/libtorchcodec_core6.dylib
  Reason: tried: '/Users/kadar/.local/share/uv/python/cpython-3.10.20-macos-aarch64-none/lib/libavutil.58.dylib' (no such file)
FFmpeg version 5: dlopen(/Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/torchcodec/libtorchcodec_core5.dylib, 0x0006): Library not loaded: @rpath/libavutil.57.dylib
  Referenced from: <FC60EC29-499A-3249-BB93-34655B4585B6> /Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/torchcodec/libtorchcodec_core5.dylib
  Reason: tried: '/Users/kadar/.local/share/uv/python/cpython-3.10.20-macos-aarch64-none/lib/libavutil.57.dylib' (no such file)
FFmpeg version 4: dlopen(/Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/torchcodec/libtorchcodec_core4.dylib, 0x0006): Library not loaded: @rpath/libavutil.56.dylib
  Referenced from: <3F006FCE-1FF7-3B9B-9FB3-997372F03F08> /Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/torchcodec/libtorchcodec_core4.dylib
  Reason: tried: '/Users/kadar/.local/share/uv/python/cpython-3.10.20-macos-aarch64-none/lib/libavutil.56.dylib' (no such file)
[end of libtorchcodec loading traceback].
  warnings.warn(
Traceback (most recent call last):
  File "/Users/kadar/.local/share/uv/python/cpython-3.10-macos-aarch64-none/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/kadar/.local/share/uv/python/cpython-3.10-macos-aarch64-none/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/kadar/data/whisperx-env/WhisperX/whisperx/__main__.py", line 102, in <module>
    cli()
  File "/Users/kadar/data/whisperx-env/WhisperX/whisperx/__main__.py", line 98, in cli
    transcribe_task(args, parser)
  File "/Users/kadar/data/whisperx-env/WhisperX/whisperx/transcribe.py", line 127, in transcribe_task
    model = load_model(
  File "/Users/kadar/data/whisperx-env/WhisperX/whisperx/asr.py", line 357, in load_model
    model = model or WhisperModel(whisper_arch,
  File "/Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/faster_whisper/transcribe.py", line 663, in __init__
    self.model = ctranslate2.models.Whisper(
ValueError: This CTranslate2 package was not compiled with CUDA support
/Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/pyannote/audio/core/io.py:47: UserWarning: 
torchcodec is not installed correctly so built-in audio decoding will fail. Solutions are:
* use audio preloaded in-memory as a {'waveform': (channel, time) torch.Tensor, 'sample_rate': int} dictionary;
* fix torchcodec installation. Error message was:

Could not load libtorchcodec. Likely causes:
          1. FFmpeg is not properly installed in your environment. We support
             versions 4, 5, 6 and 7.
          2. The PyTorch version (2.8.0) is not compatible with
             this version of TorchCodec. Refer to the version compatibility
             table:
             https://github.com/pytorch/torchcodec?tab=readme-ov-file#installing-torchcodec.
          3. Another runtime dependency; see exceptions below.
        The following exceptions were raised as we tried to load libtorchcodec:
        
[start of libtorchcodec loading traceback]
FFmpeg version 7: dlopen(/Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/torchcodec/libtorchcodec_core7.dylib, 0x0006): Library not loaded: @rpath/libavutil.59.dylib
  Referenced from: <AF25BC1A-0047-3DCC-94F1-9C5BC79D8CCF> /Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/torchcodec/libtorchcodec_core7.dylib
  Reason: tried: '/Users/kadar/.local/share/uv/python/cpython-3.10.20-macos-aarch64-none/lib/libavutil.59.dylib' (no such file)
FFmpeg version 6: dlopen(/Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/torchcodec/libtorchcodec_core6.dylib, 0x0006): Library not loaded: @rpath/libavutil.58.dylib
  Referenced from: <31E243C2-D449-342E-8CD7-60F6DC3B2C77> /Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/torchcodec/libtorchcodec_core6.dylib
  Reason: tried: '/Users/kadar/.local/share/uv/python/cpython-3.10.20-macos-aarch64-none/lib/libavutil.58.dylib' (no such file)
FFmpeg version 5: dlopen(/Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/torchcodec/libtorchcodec_core5.dylib, 0x0006): Library not loaded: @rpath/libavutil.57.dylib
  Referenced from: <FC60EC29-499A-3249-BB93-34655B4585B6> /Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/torchcodec/libtorchcodec_core5.dylib
  Reason: tried: '/Users/kadar/.local/share/uv/python/cpython-3.10.20-macos-aarch64-none/lib/libavutil.57.dylib' (no such file)
FFmpeg version 4: dlopen(/Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/torchcodec/libtorchcodec_core4.dylib, 0x0006): Library not loaded: @rpath/libavutil.56.dylib
  Referenced from: <3F006FCE-1FF7-3B9B-9FB3-997372F03F08> /Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/torchcodec/libtorchcodec_core4.dylib
  Reason: tried: '/Users/kadar/.local/share/uv/python/cpython-3.10.20-macos-aarch64-none/lib/libavutil.56.dylib' (no such file)
[end of libtorchcodec loading traceback].
  warnings.warn(
2026-04-28 08:52:16 - whisperx.vads.pyannote - INFO - Performing voice activity detection using Pyannote...
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.5.5. To apply the upgrade to your files permanently, run `python -m lightning.pytorch.utilities.upgrade_checkpoint data/whisperx-env/WhisperX/whisperx/assets/pytorch_model.bin`
2026-04-28 08:52:16 - whisperx.transcribe - INFO - Performing transcription...
Transcript: [0.031 --> 24.196]  Welcome to Samplilid.com, a free online resource for downloading sample files in a wide variety of digital formats. Whether you are a software developer testing file upload functionality, a quality assurance engineer validating media players, a student learning about digital formats, or simply someone who needs a quick test file, Samplilid provides ready-to-use files that you can download instantly.
Transcript: [24.196 --> 48.698]  completely free of charge. In this recording, we will walk you through every aspect of the SampleLib platform, exploring the formats we offer, the technical details behind each one, and the many ways these test files can be used in your projects and workflows. Let us begin with image formats. Images are perhaps the most fundamental type of digital media, and SampleLib offers test files in five major image formats.
Transcript: [48.698 --> 68.948]  JPEG, PNG, SVG, GIF, and Web. Each format has its own strengths, trade-offs, and ideal use cases, and understanding these differences is essential for anyone working with digital images. JPEG, which stands for Joint Photographic Experts Group, is the most widely used image format on the Web.
Transcript: [68.948 --> 86.06]  It uses lossy compression, meaning that some image data is discarded during encoding to achieve smaller file sizes. This makes JPEG ideal for photographs and images with complex color gradients, where minor quality loss is acceptable in exchange for significantly reduced file sizes.
Transcript: [86.06 --> 110.731]  Our JPEG test files come in various resolutions, from small 100x75 pixel thumbnails to larger 400x300 pixel images. We offer both photographic samples, such as landscapes and nature scenes, and solid color fills in red, green, and blue. These solid color fills are particularly useful for testing color rendering accuracy across different displays and software.
Transcript: [110.731 --> 126.627]  The JPEG format supports up to 24 bits of color depth, which translates to approximately 16.7 million possible colors. However, JPEG does not support transparency or animation, which is why other formats like PNG and GIF exist.
Transcript: [126.627 --> 141.781]  PNG, or Portable Network Graphics, is another essential image format that Samplerlib supports extensively. Unlike JPEG, PNG uses lossless compression, which means every single pixel is preserved exactly as it was in the original image.
Transcript: [141.781 --> 158.42]  This makes PNG the preferred choice for images that require pixel-perfect accuracy, such as screenshots, technical diagrams, logos, and user interface elements. One of the most important features of PNG is its support for transparency through the alpha channel.
Transcript: [158.42 --> 188.373]  The alpha channel adds an additional 8 bits of information per pixel, allowing each pixel to have a transparency value ranging from fully opaque to fully transparent, with 256 possible levels in between. Our PNG collection includes several transparency demonstrations. A semi-transparent red fill with an alpha value of 128. A horizontal transparency gradient that fades from fully opaque to fully transparent. A radial transparency gradient that creates a vignette effect.
Transcript: [188.373 --> 206.733]  an opaque circle on a fully transparent background demonstrating binary alpha, and a checkerboard transparency pattern. These files are invaluable for testing how your software handles alpha channel data, SVG, or scalable vector graphics. Represents a fundamentally different approach to digital images.
Transcript: [206.733 --> 228.856]  while JPEG and PNG are raster formats that store images as grids of pixels. SVG is a vector format that describes images using mathematical shapes, paths, and transformations. This means SVG images can be scaled to any size without losing quality, making them perfect for logos, icons, and illustrations that need to look sharp at any resolution.
Transcript: [228.856 --> 254.742]  SVG files are actually XML documents, which means they can be edited with a text editor and manipulated with CSS and JavaScript. Our SVG test files include simple geometric shapes, more complex illustrations, and examples of SVG features like gradients and text rendering. Because SVG files are text-based, they are typically much smaller than equivalent raster images, especially for simple graphics.
Transcript: [254.742 --> 270.824]  GIF, the graphics interchange format, is one of the oldest image formats still in common use today. Originally developed in 1987, GIF supports a maximum of 256 colors per frame and uses lossless compression within that limited color palette.
Transcript: [270.824 --> 300.035]  What makes GIF unique and enduringly popular is its support for animation. A single GIF file can contain multiple frames displayed in sequence, creating simple animations without requiring any video codec or player. Our GIF test files include static images in various color depths, as well as animated examples with different frame counts and timing. While GIF has largely been superseded by more efficient formats for both static images and animations, it remains important for compatibility testing.
Downloading: "https://download.pytorch.org/torchaudio/models/wav2vec2_fairseq_base_ls960_asr_ls960.pth" to /Users/kadar/.cache/torch/hub/checkpoints/wav2vec2_fairseq_base_ls960_asr_ls960.pth

100%|██████████| 360M/360M [00:14<00:00, 25.5MB/s]
2026-04-28 08:53:29 - whisperx.transcribe - INFO - Performing alignment...
2026-04-28 08:53:37 - whisperx.transcribe - WARNING - No --hf_token provided, needs to be saved in environment variable, otherwise will throw error loading diarization model
2026-04-28 08:53:37 - whisperx.transcribe - INFO - Performing diarization...
2026-04-28 08:53:37 - whisperx.transcribe - INFO - Using model: pyannote/speaker-diarization-3.1
2026-04-28 08:53:37 - whisperx.diarize - INFO - Loading diarization model: pyannote/speaker-diarization-3.1

Could not download Pipeline from pyannote/speaker-diarization-3.1.
It might be because the repository is private or gated:

* visit https://hf.co/pyannote/speaker-diarization-3.1 to accept user conditions
* visit https://hf.co/settings/tokens to create an authentication token
* load the Pipeline with the `token` argument:
    >>> Pipeline.from_pretrained('pyannote/speaker-diarization-3.1', token='hf_....')

Traceback (most recent call last):
  File "/Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 407, in hf_raise_for_status
    response.raise_for_status()
  File "/Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/requests/models.py", line 1026, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/pyannote/speaker-diarization-3.1/resolve/main/config.yaml

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/kadar/.local/share/uv/python/cpython-3.10-macos-aarch64-none/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/kadar/.local/share/uv/python/cpython-3.10-macos-aarch64-none/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/kadar/data/whisperx-env/WhisperX/whisperx/__main__.py", line 102, in <module>
    cli()
  File "/Users/kadar/data/whisperx-env/WhisperX/whisperx/__main__.py", line 98, in cli
    transcribe_task(args, parser)
  File "/Users/kadar/data/whisperx-env/WhisperX/whisperx/transcribe.py", line 218, in transcribe_task
    diarize_model = DiarizationPipeline(model_name=diarize_model_name, token=hf_token, device=device, cache_dir=model_dir)
  File "/Users/kadar/data/whisperx-env/WhisperX/whisperx/diarize.py", line 103, in __init__
    self.model = Pipeline.from_pretrained(model_config, token=token, cache_dir=cache_dir).to(device)
  File "/Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/pyannote/audio/core/pipeline.py", line 198, in from_pretrained
    config_yml = download_from_hf_hub(
  File "/Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/pyannote/audio/utils/hf_hub.py", line 80, in download_from_hf_hub
    return hf_hub_download(
  File "/Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1010, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
  File "/Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1117, in _hf_hub_download_to_cache_dir
    _raise_on_head_call_error(head_call_error, force_download, local_files_only)
  File "/Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1658, in _raise_on_head_call_error
    raise head_call_error
  File "/Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1546, in _get_metadata_or_catch_error
    metadata = get_hf_file_metadata(
  File "/Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1463, in get_hf_file_metadata
    r = _request_wrapper(
  File "/Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 286, in _request_wrapper
    response = _request_wrapper(
  File "/Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 310, in _request_wrapper
    hf_raise_for_status(response)
  File "/Users/kadar/data/whisperx-env/WhisperX/.venv/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 424, in hf_raise_for_status
    raise _format(GatedRepoError, message, response) from e
huggingface_hub.errors.GatedRepoError: 401 Client Error. (Request ID: Root=1-69f0add1-0c80fb41200bed2311cede03;305872e1-2677-4c4d-8300-46263083212a)

Cannot access gated repo for url https://huggingface.co/pyannote/speaker-diarization-3.1/resolve/main/config.yaml.
Access to model pyannote/speaker-diarization-3.1 is restricted. You must have access to it and be authenticated to access it. Please log in.
```
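The `No --hf_token provided` warning near the end of the log says the token has to be available in an environment variable. Below is a minimal pre-flight sketch for checking that from the same shell that launches the app. It assumes the token is exported as `HF_TOKEN` or the older `HUGGING_FACE_HUB_TOKEN`, the variable names `huggingface_hub` reads. The helper names `find_hf_token` and `describe_token_state` are hypothetical and not part of Scriberr or WhisperX.

```python
import os

# Environment variables that huggingface_hub reads for authentication.
# HF_TOKEN is the current name; HUGGING_FACE_HUB_TOKEN is the older one.
TOKEN_ENV_VARS = ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN")


def find_hf_token(env=None):
    """Return (variable_name, token) for the first non-empty token, else None."""
    env = os.environ if env is None else env
    for name in TOKEN_ENV_VARS:
        value = env.get(name, "").strip()
        if value:
            return name, value
    return None


def describe_token_state(env=None):
    """Hypothetical helper: one-line summary that never prints the full token."""
    found = find_hf_token(env)
    if found is None:
        return "no Hugging Face token set; gated models will likely fail to download"
    name, value = found
    return f"token found in {name} (starts with {value[:3]}...)"


if __name__ == "__main__":
    print(describe_token_state())
```

A token on its own may not be enough: per the message in the log, the user conditions at https://hf.co/pyannote/speaker-diarization-3.1 also have to be accepted with the same account.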

> Cannot access gated repo for url https://huggingface.co/pyannote/speaker-diarization-3.1/resolve/main/config.yaml.
> Access to model pyannote/speaker-diarization-3.1 is restricted. You must have access to it and be authenticated to access it. Please log in.

Related to #424. I believe this happens because the job runs transcription and then immediately runs diarization, so a failure in either step fails the whole job. In this case, transcription succeeded but diarization did not.
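To illustrate the behavior I would expect instead, here is a rough sketch in which a diarization failure is recorded as a warning and the finished transcript is still returned. This is not Scriberr's or WhisperX's actual code: `run_job`, `JobResult`, and the `transcribe` / `diarize` callables are hypothetical stand-ins for the real pipeline stages.

```python
from dataclasses import dataclass, field


@dataclass
class JobResult:
    segments: list
    diarized: bool = False
    warnings: list = field(default_factory=list)


def run_job(audio_path, transcribe, diarize, hf_token=None):
    """Transcribe first, then attempt diarization without discarding the transcript.

    `transcribe` and `diarize` are hypothetical callables standing in for the
    real pipeline stages.
    """
    segments = transcribe(audio_path)
    result = JobResult(segments=segments)

    # Without a token the gated model cannot be downloaded, so skip the step.
    if not hf_token:
        result.warnings.append("diarization skipped: no Hugging Face token")
        return result

    try:
        result.segments = diarize(audio_path, segments, hf_token)
        result.diarized = True
    except Exception as exc:  # e.g. a gated-repo 401 during the model download
        result.warnings.append(f"diarization failed: {exc}")
    return result
```

With this shape, the job in the log above would finish with the transcript plus a warning about the gated model, rather than failing as a whole.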



Unable to use diarization model (restricted access) #448
