[whisper] KeyError: 'words' on transcription (with --task translate) #1418

@uogbuji

I ran into the error below trying to create .srt subtitles for an MP4.

❯ mlx_whisper '20260416_170014.mp4'  --task translate  --model mlx-community/whisper-large-v3-mlx --output-format srt --verbose False --condition-on-previous-text False
…
Traceback (most recent call last):
  File "lib/python3.13/site-packages/mlx_whisper/cli.py", line 249, in main
    writer(result, output_name, **writer_args)
    ~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "lib/python3.13/site-packages/mlx_whisper/writers.py", line 51, in __call__
    self.write_result(result, file=f, options=options, **kwargs)
    ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "lib/python3.13/site-packages/mlx_whisper/writers.py", line 209, in write_result
    for i, (start, end, text) in enumerate(
                                 ~~~~~~~~~^
        self.iterate_result(result, options, **kwargs), start=1
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ):
    ^
  File "lib/python3.13/site-packages/mlx_whisper/writers.py", line 147, in iterate_result
    for subtitle in iterate_subtitles():
                    ~~~~~~~~~~~~~~~~~^^
  File "lib/python3.13/site-packages/mlx_whisper/writers.py", line 97, in iterate_subtitles
    last: float = get_start(result["segments"]) or 0.0
                  ~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
  File "lib/python3.13/site-packages/mlx_whisper/writers.py", line 31, in get_start
    return next(
        (w["start"] for s in segments for w in s["words"]),
        segments[0]["start"] if segments else None,
    )
  File "lib/python3.13/site-packages/mlx_whisper/writers.py", line 32, in <genexpr>
    (w["start"] for s in segments for w in s["words"]),
                                           ~^^^^^^^^^
KeyError: 'words'
Skipping 20260416_170014.mp4 due to KeyError: 'words'

Luckily, it was easy to investigate and put together a fix for my use case, though I admit I didn't dig very deeply into the whisper demo, so my fix might be flawed. I'll share the PR; either way, the bug is real.

Basically, subtitle generation enters word-level mode whenever segments[0] has a "words" key, even if it is the only segment that does. get_start() then indexes s["words"] on every segment it visits, which raises KeyError as soon as it reaches a segment that omits "words", as is common with --task translate.
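
To make the failure mode concrete, here is a minimal sketch of a more tolerant get_start(), based only on the function body visible in the traceback. It is not necessarily what the PR does. The sample segments are hypothetical data shaped to trigger the original KeyError (an empty "words" list on the first segment, no "words" key on the second), not output captured from a real run.

```python
from typing import List, Optional


def get_start(segments: List[dict]) -> Optional[float]:
    """Return the first word start time, else the first segment's start."""
    return next(
        # .get() tolerates segments that carry no "words" key at all
        (w["start"] for s in segments for w in s.get("words", [])),
        segments[0]["start"] if segments else None,
    )


# Hypothetical data: with s["words"] instead of s.get("words", []),
# the second segment would raise KeyError: 'words'.
segments = [
    {"start": 1.5, "end": 3.0, "text": " Hello.", "words": []},
    {"start": 3.0, "end": 5.0, "text": " World."},  # no "words" key
]

print(get_start(segments))  # falls back to the first segment's start
print(get_start([]))        # no segments at all
```

This only covers get_start(). Other code in writers.py that indexes segment["words"] directly may need the same treatment; I have not checked that here.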
