examples: resolve whisper-cli models from the HuggingFace cache (-hf/-hff) by djd0723 · Pull Request #3922 · ggml-org/whisper.cpp

djd0723 · 2026-07-02T20:39:25Z

What problems was I solving

llama.cpp resolves default model paths from the HuggingFace hub cache, so llama-cli -hf org/repo "just works" for a model already pulled by the hf CLI. whisper-cli had no equivalent: apart from the fixed default models/ggml-base.en.bin, every invocation required an explicit -m models/ggml-*.bin, and a model already sitting in ~/.cache/huggingface/hub could not be referenced by its repo id.

This PR closes that gap: whisper-cli can now resolve (and, on a cold cache, download) models straight out of the shared HuggingFace hub cache, matching llama.cpp's -hf one-liner UX, while keeping existing -m behavior 100% backward-compatible.

Success looks like:

whisper-cli -hf ggerganov/whisper.cpp --hf-file ggml-base.en.bin -f samples/jfk.wav transcribes with no -m — resolving a model from the shared cache (warm) or downloading it into the same models--org--repo/{blobs,snapshots,refs} layout (cold).
Users who never pass -hf see zero behavior change; an explicit -m /path still wins; the no-args default stays models/ggml-base.en.bin.
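
To make the models--org--repo layout concrete, here is a minimal sketch of how a repo id and file name map to a snapshot path in the hub cache. The helper names (hf_repo_folder, hf_snapshot_path) are hypothetical and are not the PR's functions. The commit hash is passed in by the caller; in a real cache it would come from refs/<revision>.

```cpp
#include <string>

// Hypothetical helper (not from the PR): map "org/repo" to the hub cache
// folder name "models--org--repo" by replacing each '/' with "--".
std::string hf_repo_folder(const std::string & repo_id) {
    std::string out = "models--";
    for (char c : repo_id) {
        if (c == '/') {
            out += "--";
        } else {
            out += c;
        }
    }
    return out;
}

// Hypothetical helper: path of one file inside one snapshot of a cached repo.
// Assumes '/' as the path separator and a caller-supplied commit hash.
std::string hf_snapshot_path(const std::string & cache_dir,
                             const std::string & repo_id,
                             const std::string & commit,
                             const std::string & file) {
    return cache_dir + "/" + hf_repo_folder(repo_id) + "/snapshots/" + commit + "/" + file;
}
```

For example, hf_repo_folder("ggerganov/whisper.cpp") yields models--ggerganov--whisper.cpp, the directory that both the hf CLI and this resolver would share.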

What user-facing changes did I ship

New -hf/--hf-repo and -hff/--hf-file flags on whisper-cli — examples/cli/cli.cpp
-hf org/repo --hf-file file.bin resolves from the HF cache, or downloads it on a cold cache (OpenSSL builds) — examples/common-whisper.cpp
Bare -hf org/repo (no --hf-file) resolves a single cached model, or fails fast with a sorted list when the choice is ambiguous (exit 3) — examples/common-whisper.cpp
New opt-in WHISPER_OPENSSL CMake option gating HTTPS downloads; OFF by default (an https:// attempt without it prints a rebuild hint) — examples/CMakeLists.txt
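
The bare -hf rule above (resolve only when exactly one model is cached, otherwise fail with a sorted list and exit 3) can be sketched as follows. This is an illustration under assumptions, not the PR's code: the sketch_* names are hypothetical, and the ggml-*.bin check is a plain prefix/suffix test on the file name, which may differ from what whisper_hf_is_ggml_bin actually does.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Assumption: a model file is any name of the form "ggml-*.bin".
static bool sketch_is_ggml_bin(const std::string & name) {
    const std::string prefix = "ggml-";
    const std::string suffix = ".bin";
    return name.size() >= prefix.size() + suffix.size() &&
           name.compare(0, prefix.size(), prefix) == 0 &&
           name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

struct sketch_pick {
    int exit_code;                        // 0 = resolved, 3 = refused
    std::string file;                     // set only when exit_code == 0
    std::vector<std::string> candidates;  // sorted ggml-*.bin names
};

// Resolve only when exactly one candidate is cached; otherwise refuse and
// hand back the sorted candidate list so the caller can print it.
sketch_pick sketch_pick_cached(const std::vector<std::string> & cached) {
    sketch_pick out{3, "", {}};
    for (const auto & name : cached) {
        if (sketch_is_ggml_bin(name)) {
            out.candidates.push_back(name);
        }
    }
    std::sort(out.candidates.begin(), out.candidates.end());
    if (out.candidates.size() == 1) {
        out.exit_code = 0;
        out.file = out.candidates.front();
    }
    return out;
}
```

In this sketch an empty cache and an ambiguous cache both map to exit code 3; the caller would tell them apart by whether candidates is empty.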

How I implemented it

The work is three vertical slices (one commit each), sharing a single resolver ported from llama.cpp's self-contained HF-cache subsystem into whisper's shared examples/common static library.

Cache subsystem (ported into `examples/common`)

examples/hf-cache.h — the hf_cache API (hf_file, get_repo_files, get_cached_files, download_file, finalize_file, remove_cached_repo), copied from llama.cpp/common/hf-cache.h.
examples/hf-cache.cpp (670 lines) — ported from llama.cpp/common/hf-cache.cpp with whisper dependency substitutions: whisper_version() for the User-Agent, local LOG_WRN/LOG_ERR → fprintf(stderr, …) (whisper has no log.h), vendored "json.hpp", a local get_model_endpoint() reading MODEL_ENDPOINT/HF_ENDPOINT, and small local string_* helpers. The get_cache_directory() env-var chain and all validation helpers are copied unchanged so the on-disk contract with huggingface_hub holds.
examples/http.h — common_http_parse_url/common_http_client over whisper's vendored httplib.h, keeping the CPPHTTPLIB_OPENSSL_SUPPORT gate + rebuild hint.
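
For readers unfamiliar with the cache-directory lookup, the sketch below shows one plausible env-var chain. The order used here (HF_HUB_CACHE, then HF_HOME/hub, then XDG_CACHE_HOME/huggingface/hub, then HOME/.cache/huggingface/hub) follows the variables documented by huggingface_hub; it is an assumption that get_cache_directory() checks exactly these in exactly this order. The sketch_* names are hypothetical, and the environment lookup is injected so the logic can be exercised without touching the real environment.

```cpp
#include <cstdlib>
#include <functional>
#include <string>

// Lookup callback: returns the variable's value, or "" when unset.
using sketch_env_fn = std::function<std::string(const std::string &)>;

// Default lookup backed by the process environment.
std::string sketch_getenv(const std::string & key) {
    const char * v = std::getenv(key.c_str());
    return v ? std::string(v) : std::string();
}

// Assumed resolution order, most specific variable first.
// Returns "" when none of the variables is set.
std::string sketch_cache_dir(const sketch_env_fn & env) {
    std::string v = env("HF_HUB_CACHE");
    if (!v.empty()) {
        return v;
    }
    v = env("HF_HOME");
    if (!v.empty()) {
        return v + "/hub";
    }
    v = env("XDG_CACHE_HOME");
    if (!v.empty()) {
        return v + "/huggingface/hub";
    }
    v = env("HOME");
    if (!v.empty()) {
        return v + "/.cache/huggingface/hub";
    }
    return "";
}
```

Calling sketch_cache_dir(sketch_getenv) reads the real environment; a test can pass a lambda instead, which is how an offline harness could pin the cache to a temp directory.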

Resolver + CLI wiring

examples/common-whisper.cpp — whisper_hf_resolve_model plus helpers (whisper_hf_is_ggml_bin, whisper_hf_pick_primary, whisper_hf_ggml_candidates, whisper_hf_print_candidates). Explicit --hf-file is download-first with cache fall-back; bare -hf is cache-first and refuses ambiguity. HF_HUB_OFFLINE short-circuits the network; HF_TOKEN drives the Authorization: Bearer header.
examples/common-whisper.h — declares the resolver.
examples/cli/cli.cpp — new params, arg parsing, usage lines, and a pre-init resolve that only fires when -hf is set and -m is still at its default string, exiting 3 with a diagnostic on failure.
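
The precedence described for cli.cpp (explicit -m wins, -hf only applies while -m is at its default, failure exits 3) could look roughly like this. It is a sketch under assumptions: sketch_params and sketch_pre_init_resolve are hypothetical names, and the resolver is passed in as a callback that returns an empty string on failure rather than calling whisper_hf_resolve_model, whose real signature is not shown here.

```cpp
#include <functional>
#include <string>

// Hypothetical subset of the CLI parameters relevant to model selection.
struct sketch_params {
    std::string model = "models/ggml-base.en.bin";  // default from the PR text
    std::string hf_repo;                             // -hf / --hf-repo
    std::string hf_file;                             // -hff / --hf-file
};

// Callback: (repo, file) -> resolved path, or "" on failure.
using sketch_resolver = std::function<std::string(const std::string &, const std::string &)>;

// Returns 0 to continue with p.model, or 3 when -hf resolution failed.
int sketch_pre_init_resolve(sketch_params & p, const sketch_resolver & resolve) {
    const std::string default_model = "models/ggml-base.en.bin";
    if (p.hf_repo.empty()) {
        return 0;  // -hf not used: behavior unchanged
    }
    if (p.model != default_model) {
        return 0;  // explicit -m wins over -hf
    }
    const std::string path = resolve(p.hf_repo, p.hf_file);
    if (path.empty()) {
        return 3;  // the resolver is expected to have printed the diagnostic
    }
    p.model = path;
    return 0;
}
```

One consequence of comparing against the default string in this sketch: passing -m models/ggml-base.en.bin explicitly is indistinguishable from not passing -m at all, so -hf would still take effect in that case.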

Build

examples/CMakeLists.txt — adds the new sources to the common STATIC lib, links json_cpp, requires cxx_std_17, adds examples/server/ to the include path for httplib.h, and adds the WHISPER_OPENSSL option (OpenSSL + CPPHTTPLIB_OPENSSL_SUPPORT when ON).

Design notes

Refuse ambiguity instead of guessing. Unlike llama.cpp — whose -hf user/model[:quant] picks a default quant because its repos are one-model-many-quants — whisper repos are many-models-one-repo with no meaningful default. So bare -hf resolves only when the choice is unambiguous (a single cached model) and otherwise errors with an actionable, sorted list rather than silently downloading a multi-GB guess.
Slim downloader, not a full port. whisper models are one file per name, so download_file streams the single primary file to blobs/<oid> and finalizes it, rather than porting llama.cpp's split/quant/preset download machinery. It follows cross-host CDN redirects manually and disables httplib URL-encoding, because cpp-httplib 0.20 (whisper's vendored version) mishandles cross-host redirects and re-encodes the already-encoded presigned URL, which corrupts the query string and yields a 403. It never forwards the HF bearer token to a different host.
HTTPS is opt-in. WHISPER_OPENSSL defaults OFF, so no new hard dependency on OpenSSL for existing builds; the cache-only path works without it.
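
The token rule in the downloader note (send Authorization: Bearer only to the original host, drop it on a cross-host redirect) can be illustrated with a small sketch. The sketch_* names are hypothetical, the URL parsing is deliberately simplistic (it ignores userinfo and treats host:port as the host), and cdn.example.com in the usage note is a placeholder, not a real CDN host.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Extract the lowercased host (including any port) from a URL.
// Simplification: no userinfo handling, no validation.
std::string sketch_url_host(const std::string & url) {
    const std::size_t scheme_end = url.find("://");
    const std::size_t start = scheme_end == std::string::npos ? 0 : scheme_end + 3;
    const std::size_t end = url.find_first_of("/?#", start);
    std::string host = url.substr(start, end == std::string::npos ? std::string::npos : end - start);
    std::transform(host.begin(), host.end(), host.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return host;
}

using sketch_headers = std::vector<std::pair<std::string, std::string>>;

// Build the headers for one hop of a manually followed redirect chain.
// The bearer token is attached only when the hop stays on the origin host.
sketch_headers sketch_headers_for_hop(const std::string & token,
                                      const std::string & origin_url,
                                      const std::string & hop_url) {
    sketch_headers headers;
    if (!token.empty() && sketch_url_host(origin_url) == sketch_url_host(hop_url)) {
        headers.emplace_back("Authorization", "Bearer " + token);
    }
    return headers;
}
```

With an origin of https://huggingface.co/... and a redirect target on cdn.example.com, the returned header list is empty, so the presigned URL is fetched without credentials.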

How to verify it

git fetch origin pull/3922/head:pr-3922 && git checkout pr-3922
# (or check out the branch directly from the fork)

# build (cache-only path needs no OpenSSL; add -DWHISPER_OPENSSL=ON for downloads)
cmake -B build -DWHISPER_BUILD_EXAMPLES=ON
cmake --build build --target whisper-cli -j

# offline seeded-cache suite (no network required)
./tests/test-hf-resolve.sh

Manual Testing

hf download ggerganov/whisper.cpp ggml-base.en.bin, then whisper-cli -hf ggerganov/whisper.cpp --hf-file ggml-base.en.bin -f samples/jfk.wav transcribes with no -m.
With only one model cached, bare whisper-cli -hf ggerganov/whisper.cpp -f samples/jfk.wav transcribes; after caching a second, it errors, lists both, and exits 3.
whisper-cli -m models/for-tests-ggml-base.en.bin -f samples/jfk.wav and bare whisper-cli -f samples/jfk.wav are unchanged (regression).
A no-OpenSSL build attempting an https resolve against an empty cache prints the -DWHISPER_OPENSSL=ON rebuild hint and exits non-zero.

Automated Tests

./tests/test-hf-resolve.sh

tests/test-hf-resolve.sh is a self-contained offline harness: it seeds a temp HF_HUB_CACHE with the models--org--repo/{refs,snapshots} layout (using an existing for-tests model as the payload) and, with HF_HUB_OFFLINE=1, asserts cache hit, missing-file error (exit 3), bare -hf single/multi-cached, the -m regression, the unchanged bare default, and (optionally) the no-OpenSSL rebuild hint.

Description for the changelog

whisper-cli: resolve/download models from the HuggingFace hub cache via -hf org/repo [--hf-file file.bin] (no -m required), backward-compatible with -m.

Port llama.cpp's HuggingFace hub-cache subsystem (http.h, hf-cache.{h,cpp}) into whisper.cpp's shared common library and wire -hf/--hf-repo + -hff/--hf-file into whisper-cli. Phase 1 is cache-only: whisper_hf_resolve_model scans the on-disk HF hub cache (get_cached_files + finalize_file) and maps org/repo (+ optional file) to a concrete snapshot path, so a model already pulled by the hf CLI resolves with no -m path. An explicit -m still wins and the no-args default stays models/ggml-base.en.bin. The network download path is compiled but unused this phase (enabled in Phase 2). Adds tests/test-hf-resolve.sh covering cache hit, missing-file error (exit 3), -m regression, and the default path. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Enable the network half of the HF cache resolver. whisper_hf_resolve_model now (unless HF_HUB_OFFLINE is set) lists the repo over the HF API, picks the primary file (exact --hf-file, else first ggml-*.bin), downloads it into the blobs/<oid> + snapshots/<commit> layout via a slim hf_cache::download_file, and falls back to the Phase 1 on-disk cache scan on empty listing or network failure. HF_TOKEN is honored for the Authorization: Bearer header. HTTPS is gated behind a new WHISPER_OPENSSL CMake option (find_package OpenSSL, CPPHTTPLIB_OPENSSL_SUPPORT, link OpenSSL::SSL/Crypto); an https attempt in a non-SSL build prints the rebuild hint. download_file follows redirects manually and disables httplib url-encoding: cpp-httplib 0.20 (whisper's vendored version) both mishandles cross-host redirects and re-encodes the already-encoded signed xet CDN URL, corrupting the presigned query string into a 403. The bearer token is dropped on cross-host (CDN) redirects. tests/test-hf-resolve.sh gains an offline fall-back case (HF_HUB_OFFLINE=1), a WHISPER_CLI override, and an optional no-OpenSSL rebuild-hint check. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Change the empty --hf-file branch of whisper_hf_resolve_model to be cache-first and refuse ambiguity rather than pick the repo's first ggml-*.bin. With no -hff: exactly one cached ggml-*.bin resolves it (no network); more than one errors and lists the cached files; a cold cache errors and lists the repo's available models instead of silently downloading. This makes "download once with -hff, then just -hf" work. Unlike llama.cpp's -hf <user>/<model>[:quant] default-quant pick (find_best_model), whisper repos are many-models-one-repo with no meaningful default, so we key off the cache and error+list on ambiguity. The explicit -hff path (download-first, cache fall-back) is unchanged. whisper_hf_resolve_model now prints a specific diagnostic for every failure mode, so cli.cpp no longer prints its own generic (and now inaccurate) "not found in HF cache" line; it just returns exit 3. tests/test-hf-resolve.sh gains single-cached (-hf alone -> exit 0) and multi-cached (-hf alone -> exit 3 + "multiple models cached" + list) cases, and the missing-file assertion matches the new message. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

djd0723 and others added 3 commits July 2, 2026 15:52

djd0723 wants to merge 3 commits into ggml-org:master from djd0723:add-huggingface-cache-default-model-path-to-whispercpp
