examples: resolve whisper-cli models from the HuggingFace cache (-hf/-hff) #3922

Open
djd0723 wants to merge 3 commits into ggml-org:master from djd0723:add-huggingface-cache-default-model-path-to-whispercpp
djd0723 commented Jul 2, 2026

What problems was I solving

llama.cpp resolves default model paths from the HuggingFace hub cache, so llama-cli -hf org/repo "just works" for a model already pulled by the hf CLI. whisper-cli had no default model path story at all — every invocation required an explicit -m models/ggml-*.bin, and a model already sitting in ~/.cache/huggingface/hub could not be referenced by its repo id.

This PR closes that gap: whisper-cli can now resolve (and, on a cold cache, download) models straight out of the shared HuggingFace hub cache, matching llama.cpp's -hf one-liner UX, while keeping existing -m behavior 100% backward-compatible.

Success looks like:

  • whisper-cli -hf ggerganov/whisper.cpp --hf-file ggml-base.en.bin -f samples/jfk.wav transcribes with no -m — resolving a model from the shared cache (warm) or downloading it into the same models--org--repo/{blobs,snapshots,refs} layout (cold).
  • Users who never pass -hf see zero behavior change; an explicit -m /path still wins; the no-args default stays models/ggml-base.en.bin.
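
The precedence described above (an explicit -m wins, -hf applies only while -m is still the default, otherwise the built-in default is used) can be sketched as a small pure function. This is an illustrative sketch, not the code in cli.cpp: the struct, enum, and function names are hypothetical, and only the default path string comes from the PR text.

```cpp
#include <string>

// Hypothetical parameter struct; field names are illustrative, not taken from cli.cpp.
struct cli_params {
    std::string model = "models/ggml-base.en.bin"; // no-args default named in the PR text
    std::string hf_repo;                           // would be set by -hf/--hf-repo
    std::string hf_file;                           // would be set by -hff/--hf-file
};

enum class model_source { explicit_path, hf_cache, builtin_default };

// Decide where the model comes from. An -m value that differs from the
// default string always wins; -hf is consulted only when -m was left alone.
inline model_source pick_model_source(const cli_params & p) {
    static const std::string k_default = "models/ggml-base.en.bin";
    if (p.model != k_default) {
        return model_source::explicit_path;
    }
    if (!p.hf_repo.empty()) {
        return model_source::hf_cache;
    }
    return model_source::builtin_default;
}
```

One consequence of comparing against the default string is that passing -m with exactly the default path is indistinguishable from not passing -m at all, which matches the "still at its default string" wording used later in this description.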

What user-facing changes did I ship

  • New -hf/--hf-repo and -hff/--hf-file flags on whisper-cli, in examples/cli/cli.cpp
  • -hf org/repo --hf-file file.bin resolves from the HF cache, or downloads it on a cold cache (OpenSSL builds) — examples/common-whisper.cpp
  • Bare -hf org/repo (no --hf-file) resolves a single cached model, or fails fast with a sorted list when the choice is ambiguous (exit 3) — examples/common-whisper.cpp
  • New opt-in WHISPER_OPENSSL CMake option gating HTTPS downloads; OFF by default (an https:// attempt without it prints a rebuild hint) — examples/CMakeLists.txt

How I implemented it

The work is three vertical slices (one commit each), sharing a single resolver ported from llama.cpp's self-contained HF-cache subsystem into whisper's shared examples/common static library.

Cache subsystem (ported into examples/common)

  • examples/hf-cache.h — the hf_cache API (hf_file, get_repo_files, get_cached_files, download_file, finalize_file, remove_cached_repo), copied from llama.cpp/common/hf-cache.h.
  • examples/hf-cache.cpp (670 lines): ported from llama.cpp/common/hf-cache.cpp with whisper dependency substitutions: whisper_version() for the User-Agent, local LOG_WRN/LOG_ERR → fprintf(stderr, …) (whisper has no log.h), vendored "json.hpp", a local get_model_endpoint() reading MODEL_ENDPOINT/HF_ENDPOINT, and small local string_* helpers. The get_cache_directory() env-var chain and all validation helpers are copied unchanged so the on-disk contract with huggingface_hub holds.
  • examples/http.h: common_http_parse_url/common_http_client over whisper's vendored httplib.h, keeping the CPPHTTPLIB_OPENSSL_SUPPORT gate + rebuild hint.
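
For readers unfamiliar with the on-disk contract, the following sketch shows how a repo id maps to a cache folder and a snapshot path. It is a simplified illustration, not the ported get_cache_directory(): the function names are hypothetical, the environment is passed in as a map to keep the code testable, and the lookup order (HF_HUB_CACHE, then HF_HOME, then XDG_CACHE_HOME, then the home directory) is an assumption based on huggingface_hub's documented behaviour that may not match every step of the ported chain.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, std::string> env_map;

// "org/repo" -> "models--org--repo" (the hub cache folder naming convention).
inline std::string repo_folder_name(const std::string & repo_id) {
    std::string out = "models--";
    for (char c : repo_id) {
        if (c == '/') {
            out += "--";
        } else {
            out += c;
        }
    }
    return out;
}

// Look up a variable in the supplied environment; empty string when unset.
inline std::string env_get(const env_map & env, const std::string & key) {
    env_map::const_iterator it = env.find(key);
    return it == env.end() ? std::string() : it->second;
}

// Assumed lookup order; see the lead-in for caveats.
inline std::string hub_cache_dir(const env_map & env) {
    std::string v = env_get(env, "HF_HUB_CACHE");
    if (!v.empty()) return v;
    v = env_get(env, "HF_HOME");
    if (!v.empty()) return v + "/hub";
    v = env_get(env, "XDG_CACHE_HOME");
    if (!v.empty()) return v + "/huggingface/hub";
    return env_get(env, "HOME") + "/.cache/huggingface/hub";
}

// <cache>/models--org--repo/snapshots/<commit>/<file>
inline std::string snapshot_path(const std::string & cache_dir,
                                 const std::string & repo_id,
                                 const std::string & commit,
                                 const std::string & file) {
    return cache_dir + "/" + repo_folder_name(repo_id) + "/snapshots/" + commit + "/" + file;
}
```

In the real layout the snapshot entry typically points at the content-addressed file under blobs/, and refs/main records which commit the snapshot directory belongs to.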

Resolver + CLI wiring

  • examples/common-whisper.cpp: whisper_hf_resolve_model plus helpers (whisper_hf_is_ggml_bin, whisper_hf_pick_primary, whisper_hf_ggml_candidates, whisper_hf_print_candidates). Explicit --hf-file is download-first with cache fall-back; bare -hf is cache-first and refuses ambiguity. HF_HUB_OFFLINE short-circuits the network; HF_TOKEN drives the Authorization: Bearer header.
  • examples/common-whisper.h — declares the resolver.
  • examples/cli/cli.cpp — new params, arg parsing, usage lines, and a pre-init resolve that only fires when -hf is set and -m is still at its default string, exiting 3 with a diagnostic on failure.
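
The cache-first, refuse-ambiguity behaviour of bare -hf can be illustrated with a minimal sketch. This is not whisper_hf_resolve_model: the names below are hypothetical, the "ggml-*.bin" filter is an assumed stand-in for whisper_hf_is_ggml_bin, and the input is a plain list of cached file names rather than a real cache scan.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Assumed filter: a file counts as a model when it is named "ggml-<something>.bin".
inline bool is_ggml_bin(const std::string & name) {
    const std::string prefix = "ggml-";
    const std::string suffix = ".bin";
    return name.size() > prefix.size() + suffix.size() &&
           name.compare(0, prefix.size(), prefix) == 0 &&
           name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

struct resolve_result {
    int exit_code;                       // 0 on success, 3 on failure (exit code from the PR text)
    std::string chosen;                  // set only when exactly one candidate exists
    std::vector<std::string> candidates; // sorted, for the diagnostic listing
};

// Resolve only when the cache holds exactly one model; never guess.
inline resolve_result resolve_bare_hf(const std::vector<std::string> & cached_files) {
    resolve_result r{3, "", {}};
    for (const std::string & f : cached_files) {
        if (is_ggml_bin(f)) {
            r.candidates.push_back(f);
        }
    }
    std::sort(r.candidates.begin(), r.candidates.end());
    if (r.candidates.size() == 1) {
        r.exit_code = 0;
        r.chosen = r.candidates[0];
    }
    return r;
}
```

A caller would print the sorted candidates whenever exit_code is 3 and more than one candidate was found, so the user can rerun with an explicit --hf-file.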

Build

  • examples/CMakeLists.txt — adds the new sources to the common STATIC lib, links json_cpp, requires cxx_std_17, adds examples/server/ to the include path for httplib.h, and adds the WHISPER_OPENSSL option (OpenSSL + CPPHTTPLIB_OPENSSL_SUPPORT when ON).
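
The compile-time gate that WHISPER_OPENSSL controls can be pictured roughly as follows. This is a sketch under assumptions: the helper names and the exact wording of the hint are hypothetical; only the CPPHTTPLIB_OPENSSL_SUPPORT macro and the -DWHISPER_OPENSSL=ON flag come from the PR text.

```cpp
#include <string>

// True when the URL uses the https scheme.
inline bool is_https(const std::string & url) {
    return url.compare(0, 8, "https://") == 0;
}

// Returns an empty string when the request may proceed, otherwise a rebuild hint.
// The hint text is illustrative, not the message printed by the PR.
inline std::string https_gate(const std::string & url) {
#ifdef CPPHTTPLIB_OPENSSL_SUPPORT
    (void) url; // TLS support is compiled in, so every scheme is allowed
    return "";
#else
    if (is_https(url)) {
        return "HTTPS is not supported in this build; rebuild with -DWHISPER_OPENSSL=ON";
    }
    return "";
#endif
}
```

Because the check happens only when a network request is about to be made, a build without OpenSSL can still resolve models that are already in the cache.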

Design notes

  • Refuse ambiguity instead of guessing. Unlike llama.cpp — whose -hf user/model[:quant] picks a default quant because its repos are one-model-many-quants — whisper repos are many-models-one-repo with no meaningful default. So bare -hf resolves only when the choice is unambiguous (a single cached model) and otherwise errors with an actionable, sorted list rather than silently downloading a multi-GB guess.
  • Slim downloader, not a full port. whisper models are one file per name, so download_file streams the single primary file to blobs/<oid> and finalizes it, rather than porting llama.cpp's split/quant/preset download machinery. It follows cross-host CDN redirects manually because cpp-httplib 0.20 (whisper's vendored version) mishandles them and re-encodes the already-encoded signed URL, which corrupts the presigned query string and yields a 403. It never forwards the HF bearer token to a different host.
  • HTTPS is opt-in. WHISPER_OPENSSL defaults OFF, so no new hard dependency on OpenSSL for existing builds; the cache-only path works without it.
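
The "never forward the bearer token to a different host" rule for manually followed redirects can be sketched as below. This is an illustration only, not the logic in download_file: the helper names are hypothetical, and the host extraction is deliberately naive (it keeps any port or userinfo as part of the host).

```cpp
#include <map>
#include <string>

// Naive host extraction: everything between "://" and the next '/', '?' or '#'.
inline std::string url_host(const std::string & url) {
    std::string::size_type start = url.find("://");
    start = (start == std::string::npos) ? 0 : start + 3;
    const std::string::size_type end = url.find_first_of("/?#", start);
    return url.substr(start, end == std::string::npos ? std::string::npos : end - start);
}

// Build the request headers for one hop of a redirect chain.
// The Authorization header is attached only while the hop stays on the origin host.
inline std::map<std::string, std::string> headers_for_hop(const std::string & origin_url,
                                                          const std::string & hop_url,
                                                          const std::string & token) {
    std::map<std::string, std::string> headers;
    if (!token.empty() && url_host(origin_url) == url_host(hop_url)) {
        headers["Authorization"] = "Bearer " + token;
    }
    return headers;
}
```

When following a redirect by hand, the Location URL would be used exactly as received, since its query string is already encoded and signed.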

How to verify it

git fetch origin pull/3922/head:pr-3922 && git checkout pr-3922
# (or check out the branch directly from the fork)

# build (cache-only path needs no OpenSSL; add -DWHISPER_OPENSSL=ON for downloads)
cmake -B build -DWHISPER_BUILD_EXAMPLES=ON
cmake --build build --target whisper-cli -j

# offline seeded-cache suite (no network required)
./tests/test-hf-resolve.sh

Manual Testing

  • hf download ggerganov/whisper.cpp ggml-base.en.bin, then whisper-cli -hf ggerganov/whisper.cpp --hf-file ggml-base.en.bin -f samples/jfk.wav transcribes with no -m.
  • With only one model cached, bare whisper-cli -hf ggerganov/whisper.cpp -f samples/jfk.wav transcribes; after caching a second, it errors, lists both, and exits 3.
  • whisper-cli -m models/for-tests-ggml-base.en.bin -f samples/jfk.wav and bare whisper-cli -f samples/jfk.wav are unchanged (regression).
  • A no-OpenSSL build attempting an https resolve against an empty cache prints the -DWHISPER_OPENSSL=ON rebuild hint and exits non-zero.

Automated Tests

./tests/test-hf-resolve.sh

tests/test-hf-resolve.sh is a self-contained offline harness: it seeds a temp HF_HUB_CACHE with the models--org--repo/{refs,snapshots} layout (using an existing for-tests model as the payload) and, with HF_HUB_OFFLINE=1, asserts cache hit, missing-file error (exit 3), bare -hf single/multi-cached, the -m regression, the unchanged bare default, and (optionally) the no-OpenSSL rebuild hint.

Description for the changelog

whisper-cli: resolve/download models from the HuggingFace hub cache via -hf org/repo [--hf-file file.bin] (no -m required), backward-compatible with -m.

djd0723 and others added 3 commits July 2, 2026 15:52
Port llama.cpp's HuggingFace hub-cache subsystem (http.h, hf-cache.{h,cpp})
into whisper.cpp's shared common library and wire -hf/--hf-repo +
-hff/--hf-file into whisper-cli. Phase 1 is cache-only: whisper_hf_resolve_model
scans the on-disk HF hub cache (get_cached_files + finalize_file) and maps
org/repo (+ optional file) to a concrete snapshot path, so a model already
pulled by the hf CLI resolves with no -m path. An explicit -m still wins and
the no-args default stays models/ggml-base.en.bin.

The network download path is compiled but unused this phase (enabled in
Phase 2). Adds tests/test-hf-resolve.sh covering cache hit, missing-file
error (exit 3), -m regression, and the default path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Enable the network half of the HF cache resolver. whisper_hf_resolve_model
now (unless HF_HUB_OFFLINE is set) lists the repo over the HF API, picks the
primary file (exact --hf-file, else first ggml-*.bin), downloads it into the
blobs/<oid> + snapshots/<commit> layout via a slim hf_cache::download_file,
and falls back to the Phase 1 on-disk cache scan on empty listing or network
failure. HF_TOKEN is honored for the Authorization: Bearer header.

HTTPS is gated behind a new WHISPER_OPENSSL CMake option (find_package OpenSSL,
CPPHTTPLIB_OPENSSL_SUPPORT, link OpenSSL::SSL/Crypto); an https attempt in a
non-SSL build prints the rebuild hint.

download_file follows redirects manually and disables httplib url-encoding:
cpp-httplib 0.20 (whisper's vendored version) both mishandles cross-host
redirects and re-encodes the already-encoded signed xet CDN URL, corrupting
the presigned query string into a 403. The bearer token is dropped on
cross-host (CDN) redirects.

tests/test-hf-resolve.sh gains an offline fall-back case (HF_HUB_OFFLINE=1),
a WHISPER_CLI override, and an optional no-OpenSSL rebuild-hint check.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Change the empty --hf-file branch of whisper_hf_resolve_model to be
cache-first and refuse ambiguity rather than pick the repo's first
ggml-*.bin. With no -hff: exactly one cached ggml-*.bin resolves it
(no network); more than one errors and lists the cached files; a cold
cache errors and lists the repo's available models instead of silently
downloading. This makes "download once with -hff, then just -hf" work.

Unlike llama.cpp's -hf <user>/<model>[:quant] default-quant pick
(find_best_model), whisper repos are many-models-one-repo with no
meaningful default, so we key off the cache and error+list on ambiguity.
The explicit -hff path (download-first, cache fall-back) is unchanged.

whisper_hf_resolve_model now prints a specific diagnostic for every
failure mode, so cli.cpp no longer prints its own generic (and now
inaccurate) "not found in HF cache" line; it just returns exit 3.

tests/test-hf-resolve.sh gains single-cached (-hf alone -> exit 0) and
multi-cached (-hf alone -> exit 3 + "multiple models cached" + list)
cases, and the missing-file assertion matches the new message.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>