examples: resolve whisper-cli models from the HuggingFace cache (-hf/-hff) #3922
Open
djd0723 wants to merge 3 commits into
Port llama.cpp's HuggingFace hub-cache subsystem (http.h, hf-cache.{h,cpp})
into whisper.cpp's shared common library and wire -hf/--hf-repo +
-hff/--hf-file into whisper-cli. Phase 1 is cache-only: whisper_hf_resolve_model
scans the on-disk HF hub cache (get_cached_files + finalize_file) and maps
org/repo (+ optional file) to a concrete snapshot path, so a model already
pulled by the hf CLI resolves with no -m path. An explicit -m still wins and
the no-args default stays models/ggml-base.en.bin.
The network download path is compiled but unused this phase (enabled in
Phase 2). Adds tests/test-hf-resolve.sh covering cache hit, missing-file
error (exit 3), -m regression, and the default path.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
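A minimal sketch of the Phase 1 lookup, assuming the standard `huggingface_hub` cache layout: `models--<org>--<repo>/refs/<revision>` holds a commit hash, and files live under `snapshots/<commit>/`. The names `repo_cache_dir_name`, `resolve_cached_file`, and `should_resolve_from_hf` are hypothetical illustrations, not the PR's `hf_cache::get_cached_files`/`finalize_file` API. The last function shows the `-m` precedence rule as a plain comparison against the default model string.

```cpp
#include <filesystem>
#include <fstream>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: "org/repo" -> "models--org--repo", the directory name
// huggingface_hub uses for a model repo inside the hub cache.
std::string repo_cache_dir_name(const std::string & repo_id) {
    std::string out = "models--";
    for (char c : repo_id) {
        if (c == '/') {
            out += "--";
        } else {
            out += c;
        }
    }
    return out;
}

// Hypothetical helper: map (repo, file) to
// <cache_root>/models--org--repo/snapshots/<commit>/<file>, reading <commit>
// from refs/<revision>. Returns std::nullopt on any cache miss.
std::optional<fs::path> resolve_cached_file(const fs::path & cache_root,
                                            const std::string & repo_id,
                                            const std::string & file,
                                            const std::string & revision = "main") {
    const fs::path repo_dir = cache_root / repo_cache_dir_name(repo_id);

    // refs/<revision> holds the commit hash the revision currently points at.
    std::ifstream ref(repo_dir / "refs" / revision);
    std::string commit;
    if (!ref || !std::getline(ref, commit) || commit.empty()) {
        return std::nullopt; // repo (or this revision) was never pulled
    }

    const fs::path candidate = repo_dir / "snapshots" / commit / file;
    std::error_code ec;
    if (!fs::exists(candidate, ec)) {
        return std::nullopt; // repo is cached, but not this file
    }
    return candidate;
}

// Hypothetical precedence rule: consult the HF cache only when -hf is set and
// -m is still at its default, so an explicit -m always wins.
bool should_resolve_from_hf(const std::string & model_arg,
                            const std::string & default_model,
                            const std::string & hf_repo) {
    return !hf_repo.empty() && model_arg == default_model;
}
```

In a real cache, snapshot entries are usually symlinks into `blobs/`; `std::filesystem::exists` follows symlinks, so this sketch treats a dangling link as a miss.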
Enable the network half of the HF cache resolver. whisper_hf_resolve_model now (unless HF_HUB_OFFLINE is set) lists the repo over the HF API, picks the primary file (exact --hf-file, else first ggml-*.bin), downloads it into the blobs/<oid> + snapshots/<commit> layout via a slim hf_cache::download_file, and falls back to the Phase 1 on-disk cache scan on empty listing or network failure. HF_TOKEN is honored for the Authorization: Bearer header.

HTTPS is gated behind a new WHISPER_OPENSSL CMake option (find_package OpenSSL, CPPHTTPLIB_OPENSSL_SUPPORT, link OpenSSL::SSL/Crypto); an https attempt in a non-SSL build prints the rebuild hint.

download_file follows redirects manually and disables httplib url-encoding: cpp-httplib 0.20 (whisper's vendored version) both mishandles cross-host redirects and re-encodes the already-encoded signed xet CDN URL, corrupting the presigned query string into a 403. The bearer token is dropped on cross-host (CDN) redirects.

tests/test-hf-resolve.sh gains an offline fall-back case (HF_HUB_OFFLINE=1), a WHISPER_CLI override, and an optional no-OpenSSL rebuild-hint check.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
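The redirect policy in this commit can be sketched in isolation. This is an illustration under assumptions, not the PR's `download_file`: `url_host`, `next_url`, and `keep_bearer_token` are hypothetical names, URL parsing is deliberately minimal (no IPv6 literals or user-info), and only absolute URLs and absolute-path (`/...`) `Location` values are handled. The point is that the `Location` string is reused byte-for-byte, so an already percent-encoded presigned query survives, and the bearer token is kept only while the host stays the same.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical minimal parser: returns the lowercased host of an absolute
// http(s) URL ("https://host[:port]/path?query"), or "" otherwise.
// Deliberately ignores IPv6 literals and user-info.
std::string url_host(const std::string & url) {
    const auto scheme_end = url.find("://");
    if (scheme_end == std::string::npos) {
        return "";
    }
    const std::string scheme = url.substr(0, scheme_end);
    if (scheme != "http" && scheme != "https") {
        return "";
    }
    const auto host_begin = scheme_end + 3;
    auto host_end = url.find_first_of(":/?#", host_begin);
    if (host_end == std::string::npos) {
        host_end = url.size();
    }
    std::string host = url.substr(host_begin, host_end - host_begin);
    std::transform(host.begin(), host.end(), host.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return host;
}

// Hypothetical: build the next request URL from a Location header WITHOUT
// decoding or re-encoding it, so a presigned query string is sent byte-for-byte.
// Absolute URLs are used verbatim; absolute paths ("/...") are joined to the
// current origin; any other form is returned unchanged for the caller to reject.
std::string next_url(const std::string & current, const std::string & location) {
    if (location.rfind("http://", 0) == 0 || location.rfind("https://", 0) == 0) {
        return location;
    }
    const auto scheme_end = current.find("://");
    if (location.empty() || location[0] != '/' || scheme_end == std::string::npos) {
        return location;
    }
    // Keep "scheme://host[:port]" from the current URL and append the new path.
    const auto origin_end = current.find_first_of("/?#", scheme_end + 3);
    return current.substr(0, origin_end) + location;
}

// Hypothetical: keep "Authorization: Bearer <HF_TOKEN>" only while the redirect
// stays on the same host; a cross-host hop (e.g. to a CDN) drops it.
bool keep_bearer_token(const std::string & from, const std::string & to) {
    const std::string from_host = url_host(from);
    return !from_host.empty() && from_host == url_host(to);
}
```

Here `cdn.example.com` would stand in for the real CDN host; a production client would also cap the number of redirect hops.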
Change the empty --hf-file branch of whisper_hf_resolve_model to be cache-first and refuse ambiguity rather than pick the repo's first ggml-*.bin. With no -hff: exactly one cached ggml-*.bin resolves it (no network); more than one errors and lists the cached files; a cold cache errors and lists the repo's available models instead of silently downloading. This makes "download once with -hff, then just -hf" work.

llama.cpp's -hf <user>/<model>[:quant] picks a default quant (find_best_model), but whisper repos are many-models-one-repo with no meaningful default, so we key off the cache and error+list on ambiguity. The explicit -hff path (download-first, cache fall-back) is unchanged.

whisper_hf_resolve_model now prints a specific diagnostic for every failure mode, so cli.cpp no longer prints its own generic (and now inaccurate) "not found in HF cache" line; it just returns exit 3.

tests/test-hf-resolve.sh gains single-cached (-hf alone -> exit 0) and multi-cached (-hf alone -> exit 3 + "multiple models cached" + list) cases, and the missing-file assertion matches the new message.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
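The cache-first rule for bare `-hf` reduces to a small decision over cached file names. The sketch below is hypothetical: `is_ggml_bin`, `pick_cached_model`, `PickStatus`, and `PickResult` are illustrative names, not the PR's `whisper_hf_ggml_candidates`/`whisper_hf_pick_primary`, and it takes the list of cached file names as input instead of scanning the cache itself.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical filter: does a file name match "ggml-*.bin"?
bool is_ggml_bin(const std::string & name) {
    const std::string prefix = "ggml-";
    const std::string suffix = ".bin";
    return name.size() >= prefix.size() + suffix.size() &&
           name.compare(0, prefix.size(), prefix) == 0 &&
           name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

enum class PickStatus { Resolved, Ambiguous, NoneCached };

struct PickResult {
    PickStatus status = PickStatus::NoneCached;
    std::string file;                    // set only when status == Resolved
    std::vector<std::string> candidates; // sorted ggml-*.bin names found in the cache
};

// Hypothetical cache-first rule for bare -hf (no -hff):
//   exactly one cached ggml-*.bin -> resolve it, no network;
//   more than one                 -> ambiguous, caller lists `candidates`;
//   none                          -> caller lists the repo's available models.
PickResult pick_cached_model(const std::vector<std::string> & cached_files) {
    PickResult result;
    for (const auto & name : cached_files) {
        if (is_ggml_bin(name)) {
            result.candidates.push_back(name);
        }
    }
    std::sort(result.candidates.begin(), result.candidates.end());

    if (result.candidates.size() == 1) {
        result.status = PickStatus::Resolved;
        result.file   = result.candidates.front();
    } else if (result.candidates.size() > 1) {
        result.status = PickStatus::Ambiguous;
    }
    return result;
}
```

A caller would print `candidates` for `Ambiguous`, list the repo's models for `NoneCached`, and exit 3 in both cases, matching the behavior described in this commit.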
What problems was I solving
`llama.cpp` resolves default model paths from the HuggingFace hub cache, so `llama-cli -hf org/repo` "just works" for a model already pulled by the `hf` CLI. `whisper-cli` had no default-model-path story at all: every invocation required an explicit `-m models/ggml-*.bin`, and a model already sitting in `~/.cache/huggingface/hub` could not be referenced by its repo id. This PR closes that gap:
`whisper-cli` can now resolve (and, on a cold cache, download) models straight out of the shared HuggingFace hub cache, matching `llama.cpp`'s `-hf` one-liner UX, while keeping existing `-m` behavior 100% backward-compatible.

Success looks like:
- `whisper-cli -hf ggerganov/whisper.cpp --hf-file ggml-base.en.bin -f samples/jfk.wav` transcribes with no `-m`, resolving the model from the shared cache (warm) or downloading it into the same `models--org--repo/{blobs,snapshots,refs}` layout (cold).
- Invocations without `-hf` see zero behavior change; an explicit `-m /path` still wins; the no-args default stays `models/ggml-base.en.bin`.

What user-facing changes did I ship
- `-hf/--hf-repo` and `-hff/--hf-file` flags on `whisper-cli` (in `examples/cli/cli.cpp`)
- `-hf org/repo --hf-file file.bin` resolves from the HF cache, or downloads the file on a cold cache in OpenSSL builds (in `examples/common-whisper.cpp`)
- `-hf org/repo` (no `--hf-file`) resolves a single cached model, or fails fast with a sorted list and exit code 3 when the choice is ambiguous (in `examples/common-whisper.cpp`)
- `WHISPER_OPENSSL` CMake option gating HTTPS downloads, OFF by default; an `https://` attempt without it prints a rebuild hint (in `examples/CMakeLists.txt`)

How I implemented it
The work is three vertical slices (one commit each) that share a single resolver, built on `llama.cpp`'s self-contained HF-cache subsystem ported into whisper's shared `examples/common` static library.

Cache subsystem (ported into `examples/common`)

- `examples/hf-cache.h`: the `hf_cache` API (`hf_file`, `get_repo_files`, `get_cached_files`, `download_file`, `finalize_file`, `remove_cached_repo`), copied from `llama.cpp/common/hf-cache.h`.
- `examples/hf-cache.cpp` (670 lines): ported from `llama.cpp/common/hf-cache.cpp` with whisper dependency substitutions: `whisper_version()` for the User-Agent, local `LOG_WRN`/`LOG_ERR` → `fprintf(stderr, …)` (whisper has no `log.h`), the vendored `"json.hpp"`, a local `get_model_endpoint()` reading `MODEL_ENDPOINT`/`HF_ENDPOINT`, and small local `string_*` helpers. The `get_cache_directory()` env-var chain and all validation helpers are copied unchanged so the on-disk contract with `huggingface_hub` holds.
- `examples/http.h`: `common_http_parse_url`/`common_http_client` over whisper's vendored `httplib.h`, keeping the `CPPHTTPLIB_OPENSSL_SUPPORT` gate and rebuild hint.

Resolver + CLI wiring
- `examples/common-whisper.cpp`: `whisper_hf_resolve_model` plus helpers (`whisper_hf_is_ggml_bin`, `whisper_hf_pick_primary`, `whisper_hf_ggml_candidates`, `whisper_hf_print_candidates`). Explicit `--hf-file` is download-first with cache fall-back; bare `-hf` is cache-first and refuses ambiguity. `HF_HUB_OFFLINE` short-circuits the network; `HF_TOKEN` drives the `Authorization: Bearer` header.
- `examples/common-whisper.h`: declares the resolver.
- `examples/cli/cli.cpp`: new params, arg parsing, usage lines, and a pre-init resolve that only fires when `-hf` is set and `-m` is still at its default string, exiting 3 with a diagnostic on failure.

Build
- `examples/CMakeLists.txt`: adds the new sources to the `common` STATIC lib, links `json_cpp`, requires `cxx_std_17`, adds `examples/server/` to the include path for `httplib.h`, and adds the `WHISPER_OPENSSL` option (OpenSSL + `CPPHTTPLIB_OPENSSL_SUPPORT` when ON).

Design notes
- `llama.cpp`'s `-hf user/model[:quant]` picks a default quant because its repos are one-model-many-quants; whisper repos are many-models-one-repo with no meaningful default. So bare `-hf` resolves only when the choice is unambiguous (a single cached model) and otherwise errors with an actionable, sorted list rather than silently downloading a multi-GB guess.
- `download_file` streams the single primary file to `blobs/<oid>` and finalizes it, rather than porting `llama.cpp`'s split/quant/preset download machinery. It follows cross-host CDN redirects manually because cpp-httplib 0.20 mishandles them and re-encodes the already-encoded presigned CDN URL, corrupting the signed query string into a 403. It never forwards the HF bearer token to a different host.
- `WHISPER_OPENSSL` defaults OFF, so existing builds gain no new hard dependency on OpenSSL; the cache-only path works without it.

How to verify it
Manual Testing
- `hf download ggerganov/whisper.cpp ggml-base.en.bin`, then `whisper-cli -hf ggerganov/whisper.cpp --hf-file ggml-base.en.bin -f samples/jfk.wav` transcribes with no `-m`.
- With only that one model cached, `whisper-cli -hf ggerganov/whisper.cpp -f samples/jfk.wav` transcribes; after caching a second, it errors, lists both, and exits 3.
- `whisper-cli -m models/for-tests-ggml-base.en.bin -f samples/jfk.wav` and bare `whisper-cli -f samples/jfk.wav` are unchanged (regression).
- In a build without `WHISPER_OPENSSL`, an `https` resolve against an empty cache prints the `-DWHISPER_OPENSSL=ON` rebuild hint and exits non-zero.

Automated Tests
`tests/test-hf-resolve.sh` is a self-contained offline harness: it seeds a temp `HF_HUB_CACHE` with the `models--org--repo/{refs,snapshots}` layout (using an existing `for-tests` model as the payload) and, with `HF_HUB_OFFLINE=1`, asserts cache hit, missing-file error (exit 3), bare `-hf` single/multi-cached, the `-m` regression, the unchanged bare default, and (optionally) the no-OpenSSL rebuild hint.

Description for the changelog
`whisper-cli`: resolve/download models from the HuggingFace hub cache via `-hf org/repo [--hf-file file.bin]` (no `-m` required), backward-compatible with `-m`.