abort_callback and encoder_begin_callback do not interrupt mid-computation (mid-transcription)

## Problem
The `abort_callback` field in `whisper_full_params` does not actually interrupt an in-progress transcription. It is only checked after each encode/decode step completes, which makes it ineffective for real-time cancellation. As a result, it is impossible to build a responsive *stop* feature in applications that use `whisper_full()`. In some applications it can also leave ghost processes behind if we rely purely on the operating system to clean up (e.g. on macOS).

## Root Cause
There are two overloads of the internal `ggml_graph_compute_helper`:
1. The `ggml_cgraph *` version (line 169): Correctly accepts `abort_callback` and wires it into the ggml backend.
2. The `ggml_backend_sched_t` version (line 191): Has no `abort_callback` parameter at all.

All actual encoder and decoder compute calls inside `whisper_encode_internal` and `whisper_decode_internal` use the second (sched) overload exclusively. As a result, `abort_callback` is never passed to the ggml backend during computation. The only places it fires are the post-hoc checks at the very end of those functions (lines 2447 and 2977), after all the work is already done.

Additionally, the main token sampling loop (`whisper_full_with_state`, line 6783) has no abort check at all. It runs up to `n_text_ctx / 2` iterations with no opportunity to exit early.
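
To make the effect concrete, here is a minimal self-contained model of the behavior described above. It does not use whisper.cpp or ggml; `stop_state`, `should_abort` and `simulate_current_behavior` are hypothetical names made up for illustration, and the "graph" is just a loop over nodes. The only thing taken from the real code is the callback shape (return `true` to request an abort).

```cpp
#include <atomic>

// Same shape as the abort callback: return true to request an abort.
using abort_cb = bool (*)(void * user_data);

// Hypothetical application state shared with the callback.
struct stop_state {
    std::atomic<bool> stop_requested{false};
    int polls = 0; // how many times the callback was consulted
};

static bool should_abort(void * user_data) {
    auto * s = static_cast<stop_state *>(user_data);
    s->polls++;
    return s->stop_requested.load();
}

struct run_result {
    int  nodes_executed;
    bool aborted;
    int  polls;
};

// Models an encode/decode step as it behaves today: the compute loop
// never sees the callback, and the only check happens afterwards.
static run_result simulate_current_behavior(int n_nodes) {
    stop_state s;
    abort_cb cb = should_abort;

    int executed = 0;
    for (int i = 0; i < n_nodes; ++i) {
        if (i == 1) {
            s.stop_requested = true; // the user presses "stop" early on
        }
        ++executed;                  // the node still runs to completion
    }

    const bool aborted = cb(&s);     // post-hoc check, after all the work
    return { executed, aborted, s.polls };
}
```

Calling `simulate_current_behavior(100)` in this model reports all 100 nodes executed and a single poll of the callback, even though the stop was requested at the second node.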

`encoder_begin_callback` does work correctly: it fires before each audio chunk. However, this only helps with multi-chunk audio. For short clips processed as a single chunk, by the time a user requests a stop, that chunk is already being processed and `encoder_begin_callback` will not fire again.
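
The single-chunk limitation can be illustrated with a small model. This is only a sketch: `chunk_begin_cb`, `proceed_unless_stopped` and `chunks_encoded` are hypothetical names, and the only behavior borrowed from whisper.cpp is that the begin callback runs once before each chunk and that returning `false` aborts.

```cpp
// Hypothetical stand-in for an encoder-begin style callback:
// called once before each chunk, returning false aborts.
using chunk_begin_cb = bool (*)(void * user_data);

static bool proceed_unless_stopped(void * user_data) {
    const bool * stop = static_cast<const bool *>(user_data);
    return !*stop;
}

// Processes up to n_chunks chunks. The stop request arrives while chunk
// `stop_during_chunk` is being processed. Returns how many chunks were
// fully encoded before the callback could act on the request.
static int chunks_encoded(int n_chunks, int stop_during_chunk) {
    bool stop = false;
    chunk_begin_cb cb = proceed_unless_stopped;

    int encoded = 0;
    for (int i = 0; i < n_chunks; ++i) {
        if (!cb(&stop)) {
            break;           // only reachable between chunks
        }
        if (i == stop_during_chunk) {
            stop = true;     // user requests a stop mid-chunk
        }
        ++encoded;           // the chunk itself is not interruptible here
    }
    return encoded;
}
```

In this model `chunks_encoded(4, 1)` stops after two chunks, while `chunks_encoded(1, 0)` still encodes the whole clip because there is no later chunk boundary at which the callback could run.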

## Proposed Fix
I propose 3 changes to `whisper.cpp` with no API changes:
1. Add `abort_callback` support to the `sched` overload of `ggml_graph_compute_helper`, using the same `ggml_backend_set_abort_callback` pattern already present in the non-sched overload.
2. Pass `abort_callback` and `abort_callback_user_data` through the `ggml_graph_compute_helper(sched, ...)` calls inside `whisper_encode_internal` (lines 2406, 2431, 2447) and inside `whisper_decode_internal` (line 2944). Note that `whisper_decode_internal` is called from 4 external sites; the calls at lines 3940 and 8847 already pass `nullptr` and are unrelated to user-initiated abort.
3. Add an `abort_callback` check at the top of the token sampling loop in `whisper_full_with_state` so it can exit between token generations.
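
As a rough illustration of the intended behavior after these three changes, here is a self-contained model (not the actual patch). It assumes the backend consults the callback between graph nodes once it has been wired in; `compute_graph_abortable`, `sample_tokens_abortable`, `stop_after` and `abort_after_n_polls` are hypothetical names used only for this sketch.

```cpp
// Same shape as the abort callback: return true to request an abort.
using abort_cb = bool (*)(void * user_data);

// Hypothetical test helper: lets `polls_left` polls pass, then aborts.
struct stop_after {
    int polls_left;
};

static bool abort_after_n_polls(void * user_data) {
    auto * s = static_cast<stop_after *>(user_data);
    return s->polls_left-- <= 0;
}

// Models changes 1 and 2: the compute helper receives the callback and
// it is consulted before each node, so compute can stop part-way.
// Returns false if the computation was aborted.
static bool compute_graph_abortable(int n_nodes, abort_cb cb, void * ud, int & executed) {
    for (int i = 0; i < n_nodes; ++i) {
        if (cb && cb(ud)) {
            return false;
        }
        ++executed;
    }
    return true;
}

// Models change 3: the sampling loop checks the callback at the top of
// each iteration, so it can exit between token generations.
static int sample_tokens_abortable(int max_tokens, abort_cb cb, void * ud) {
    int generated = 0;
    for (int i = 0; i < max_tokens; ++i) {
        if (cb && cb(ud)) {
            break;
        }
        ++generated;
    }
    return generated;
}
```

Passing `nullptr` as the callback keeps the existing behavior in this model (everything runs to completion), which mirrors the call sites that already pass `nullptr` today.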

## Discussion
Before implementing these, I wanted to check whether this is the right layer to fix it, or whether you would prefer the abort mechanism to live deeper in ggml. Also, do you have any concerns with the proposed approach?
