Fix duplicate model file downloads when progress_callback is active (#1663) #1664

Open
anishesg wants to merge 1 commit into huggingface:main from proudhare:fix/ph-issue-1663
@anishesg

When progress_callback is provided to pipeline(), model files are fetched from the server more times than necessary — config.json appears 3× and tokenizer.json 2× in the nginx logs, compared to once each without a callback.

The root cause is a memoize key mismatch in get_file_metadata (src/utils/model_registry/get_file_metadata.js). The key is built from options.revision, options.cache_dir, and options.local_files_only without normalizing defaults. A call from pipelines.js with no options (revision=undefined, cache_dir=undefined, local_files_only=undefined) therefore produces a different key than the call from loadResourceFile in hub.js, which passes explicit defaults (revision='main', cache_dir=null, local_files_only=false). Both represent the same logical request, but memoizePromise sees two distinct keys and invokes _get_file_metadata twice, with each invocation triggering an extra HTTP GET to the local server.
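To illustrate the mismatch, here is a minimal sketch. The `buildKey` helper, the key format, and the `org/model` repo id are hypothetical stand-ins, not the actual code in get_file_metadata.js; only the three option names and their default values come from the description above.

```javascript
// Hypothetical key builder that reads the options as-is, without normalizing defaults.
function buildKey(repoId, filename, options = {}) {
  return `${repoId}:${filename}:${options.revision}:${options.cache_dir}:${options.local_files_only}`;
}

// Caller that passes no options (as described for pipelines.js).
const keyWithoutOptions = buildKey('org/model', 'config.json');

// Caller that passes explicit defaults (as described for loadResourceFile in hub.js).
const keyWithDefaults = buildKey('org/model', 'config.json', {
  revision: 'main',
  cache_dir: null,
  local_files_only: false,
});

// Same logical request, but the two keys differ, so a memoizer treats them as separate entries.
console.log(keyWithoutOptions);
console.log(keyWithDefaults);
console.log(keyWithoutOptions === keyWithDefaults);
```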

The fix normalizes the three options in the memoize key with the nullish coalescing operator (?? default), so that callers using different representations of the same default share one memoize entry and the underlying fetch runs only once.
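The normalization can be sketched roughly as follows. This is an illustration under assumptions, not the PR's diff: `normalizedKey`, `memoizeByKey`, `fakeFetchMetadata`, and `getFileMetadataSketch` are hypothetical names, and the memoizer is a simplified stand-in for memoizePromise.

```javascript
// Hypothetical key builder that applies the defaults via `??` before building the key.
function normalizedKey(repoId, filename, options = {}) {
  const revision = options.revision ?? 'main';
  const cacheDir = options.cache_dir ?? null;
  const localFilesOnly = options.local_files_only ?? false;
  return `${repoId}:${filename}:${revision}:${cacheDir}:${localFilesOnly}`;
}

// Simplified stand-in for a promise memoizer: one shared promise per key.
const pendingByKey = new Map();
function memoizeByKey(key, factory) {
  if (!pendingByKey.has(key)) {
    pendingByKey.set(key, factory());
  }
  return pendingByKey.get(key);
}

// Counts how many times the (fake) network request would run.
let fetchCount = 0;
async function fakeFetchMetadata() {
  fetchCount += 1;
  return { exists: true };
}

function getFileMetadataSketch(repoId, filename, options = {}) {
  return memoizeByKey(normalizedKey(repoId, filename, options), fakeFetchMetadata);
}

// Both call styles now resolve to the same key, so the fetch runs once.
const first = getFileMetadataSketch('org/model', 'config.json');
const second = getFileMetadataSketch('org/model', 'config.json', {
  revision: 'main',
  cache_dir: null,
  local_files_only: false,
});
console.log(first === second, fetchCount);
```

Because `??` only falls back on null or undefined, an explicit `local_files_only: false` is preserved rather than being replaced, which `||` would not guarantee for other falsy values.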

Fixes #1663

…uggingface#1663)

When `progress_callback` is provided to `pipeline()`, model files are fetched from the server more times than necessary — `config.json` appears 3× and `tokenizer.json` 2× in the nginx logs, compared to once each without a callback.

Signed-off-by: anish k <ak8686@princeton.edu>
Collaborator

@xenova left a comment

Thanks for the PR! It looks good to me (it would be nice to find a way to define these defaults only once, but this works for now).

cc @nico-martin for final review.

Development

Successfully merging this pull request may close these issues.

Model files are downloaded multiple times if progress_callback is active