Fix duplicate model file downloads when progress_callback is active (#1663) #1664
Open
anishesg wants to merge 1 commit into huggingface:main
…uggingface#1663) When `progress_callback` is provided to `pipeline()`, model files are fetched from the server more times than necessary — `config.json` appears 3× and `tokenizer.json` 2× in the nginx logs, compared to once each without a callback. Signed-off-by: anish k <ak8686@princeton.edu>
xenova (Collaborator) reviewed on Apr 26, 2026 and left a comment:
thanks for the PR! It looks good to me (would be nice to find a way so that these defaults are defined only once, but this works for now).
cc @nico-martin for final review.
When `progress_callback` is provided to `pipeline()`, model files are fetched from the server more times than necessary: `config.json` appears 3× and `tokenizer.json` 2× in the nginx logs, compared to once each without a callback.

The root cause is a memoize key mismatch in `get_file_metadata` (`src/utils/model_registry/get_file_metadata.js`). The key is built from `options.revision`, `options.cache_dir`, and `options.local_files_only` without normalizing defaults, so a call from `pipelines.js` with no options (`revision=undefined`, `cache_dir=undefined`, `local_files_only=undefined`) produces a different key than the call from `loadResourceFile` in `hub.js` with explicit defaults (`revision='main'`, `cache_dir=null`, `local_files_only=false`). Both represent the same logical request, but `memoizePromise` sees two distinct keys and invokes `_get_file_metadata` twice, each triggering an extra HTTP GET to the local server.

The fix normalizes the three options in the memoize key using `?? default` so that callers with different representations of the same default share the same memoize entry and the underlying fetch runs only once.

Fixes #1663
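For illustration, the key mismatch and its normalization can be reproduced in isolation. This is a minimal sketch, not the code from this PR: `memoizePromise` here is a simplified stand-in with an assumed signature, and `rawKey`, `normalizedKey`, and `makeMetadataGetter` are hypothetical helpers. Only the three option names and their defaults (`'main'`, `null`, `false`) come from the description above.

```javascript
// Simplified stand-in for a promise memoizer (assumed signature, not the library's):
// the promise returned by `fn` is cached under `key` and reused on later calls.
function memoizePromise(cache, key, fn) {
  if (!cache.has(key)) {
    cache.set(key, fn());
  }
  return cache.get(key);
}

// Key built from the raw option values: undefined and explicit defaults differ.
function rawKey(path, options = {}) {
  return `${path}:${options.revision}:${options.cache_dir}:${options.local_files_only}`;
}

// Key built after normalizing each option with `?? default`.
function normalizedKey(path, options = {}) {
  const revision = options.revision ?? 'main';
  const cache_dir = options.cache_dir ?? null;
  const local_files_only = options.local_files_only ?? false;
  return `${path}:${revision}:${cache_dir}:${local_files_only}`;
}

// Hypothetical metadata getter that counts how often the underlying fetch runs.
function makeMetadataGetter(keyFn) {
  const cache = new Map();
  let fetchCount = 0;
  const fakeFetch = (path) => {
    fetchCount += 1; // stands in for one HTTP request to the server
    return Promise.resolve({ path, exists: true });
  };
  return {
    get: (path, options) =>
      memoizePromise(cache, keyFn(path, options), () => fakeFetch(path)),
    fetchCount: () => fetchCount,
  };
}

const explicitDefaults = { revision: 'main', cache_dir: null, local_files_only: false };

const withRawKey = makeMetadataGetter(rawKey);
withRawKey.get('config.json');                   // caller passing no options
withRawKey.get('config.json', explicitDefaults); // caller passing explicit defaults

const withNormalizedKey = makeMetadataGetter(normalizedKey);
withNormalizedKey.get('config.json');
withNormalizedKey.get('config.json', explicitDefaults);

console.log(withRawKey.fetchCount(), withNormalizedKey.fetchCount()); // prints "2 1"
```

With the raw key, the two callers produce `config.json:undefined:undefined:undefined` and `config.json:main:null:false`, so the fetch runs twice. With the normalized key, both map to the same entry and the fetch runs once.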