Skip to content

feat: callback-based progress tracking for uploads and downloads#19

Open
assafvayner wants to merge 20 commits intomainfrom
assaf/progress
Open

feat: callback-based progress tracking for uploads and downloads#19
assafvayner wants to merge 20 commits intomainfrom
assaf/progress

Conversation

@assafvayner
Copy link
Copy Markdown
Collaborator

@assafvayner assafvayner commented Apr 6, 2026

Summary

  • Adds a ProgressHandler trait and structured ProgressEvent types (UploadEvent, DownloadEvent) that allow callers to receive real-time progress callbacks during file transfers
  • Upload events carry phase information (PreparingCheckingUploadModeUploadingCommitting) plus aggregate byte progress
  • Download events carry per-file byte progress (StartedInProgressComplete) and aggregate xet progress
  • All five transfer params structs gain an optional progress: Progress field (Option<Arc<dyn ProgressHandler>>) that defaults to None — existing callers are unaffected
  • The hfrs CLI wires up an indicatif-based handler (CliProgressHandler) that renders multi-progress bars, respects --quiet and HF_HUB_DISABLE_PROGRESS_BARS
  • Xet uploads emit progress before/after commit via GroupProgressReport
  • Removes the polling-based progress tracking design spec (superseded by the callback approach)

New files

File Purpose
huggingface_hub/src/types/progress.rs Core types: ProgressHandler, ProgressEvent, UploadEvent, DownloadEvent, FileProgress, FileStatus, UploadPhase, Progress, emit()
huggingface_hub/src/bin/hfrs/progress.rs CliProgressHandler with indicatif MultiProgress

Key design decisions

  • Nested enum (ProgressEvent::Upload(UploadEvent)) separates upload/download concerns at the type level
  • Phase on every upload event so consumers always know the current phase from any single event — no state tracking needed
  • Delta-only download progress — only files whose state changed since last event
  • Progress type alias (Option<Arc<dyn ProgressHandler>>) keeps params structs clean
  • Blocking API unaffected — progress passes through as part of the params struct via existing sync_api! macros

Test plan

  • Unit tests in types/progress.rs — event recording, phase progression, batched file complete, None handler is no-op, Send+Sync compile check (6 tests)
  • Integration tests in tests/download_test.rs — download with progress to local dir, download with progress to cache, download without progress handler (3 tests)
  • cargo +nightly fmt --check passes
  • cargo clippy --all-features -D warnings passes
  • cargo test -p huggingface-hub --lib — 67 tests pass
  • cargo build --release succeeds
  • CI runs cargo test -p huggingface-hub --all-features which picks up all new unit and integration tests automatically

Callback-based progress system for upload/download with ProgressHandler
trait, ProgressEvent enum, xet polling bridge, and indicatif CLI bars
matching the Python hf CLI style.
Design for adding TransferTask<T> with ProgressHandle to download/upload
APIs, using IntoFuture for backward-compatible .await and lazy polling of
hf-xet progress objects.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

poll not push

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

push, not poll

Account for blocking API macro refactor (sync_api!/sync_api_stream!/
sync_api_async_stream! in macros.rs), new download methods
(download_file_stream, download_file_to_bytes), and lib.rs glob
re-export pattern. The polling design gains a sync_api_transfer! macro
variant and updated blocking API section. The callback design notes
that macros forward progress transparently via params structs.
Add a ProgressHandler trait and structured ProgressEvent types that
allow callers to receive real-time progress updates during file
transfers. The library emits events for upload phases (preparing,
checking upload mode, uploading, committing) and per-file download
byte progress. The hfrs CLI wires up an indicatif-based handler
that renders progress bars matching the Python hf CLI style.
@assafvayner assafvayner changed the title progress plan feat: callback-based progress tracking for uploads and downloads Apr 9, 2026
- Move inline `use` statements to top of file (xet.rs, download_test.rs)
- Update Project Layout in CLAUDE.md with new progress.rs module
- Add examples/progress.rs demonstrating ProgressHandler usage
- Replace unwrap() with expect()/unwrap_or_else() in CLI progress.rs
- Remove trivial doc comments that restate type names
- Wire progress through snapshot_download (was being ignored)
Doc comments (///) on public types generate cargo doc output and
should not be stripped as trivial comments. Updated CLAUDE.md to
distinguish inline comments (no restating code) from doc comments
(required on all public API surface).
Add doc comments to ProgressHandler::on_progress(), FileStatus
variants, CliProgressHandler, and progress_disabled().
@assafvayner assafvayner marked this pull request as ready for review April 9, 2026 07:27
- Fix nested lifecycle: snapshot_download no longer emits duplicate
  Start/Complete through child download_file calls. Introduced
  download_file_inner() without lifecycle events; download_concurrently
  uses it while the public download_file() wraps with Start/Complete.

- Add xet download progress: all three xet download methods now accept
  a Progress handle and spawn a polling task that emits AggregateProgress
  every 100ms during group.finish(). Cache path also emits per-file
  Complete (was missing entirely).

- Add xet upload progress: replace single-shot before/after emit with
  a polling task that reports UploadEvent::Progress{Uploading} every
  100ms during commit().await.

- Emit per-file Complete events for xet batch downloads in
  snapshot_download so the CLI files_bar increments correctly.
Resolve conflicts in repo_params (params moved to types/repo_params.rs
on main), add progress fields to the new param location.

Update download_test.rs:
- Read-only tests use prod_api() (HF_PROD_TOKEN in CI)
- Upload progress tests use hub_ci_api() (HF_CI_TOKEN in CI)
- New tests: upload_file_with_progress, create_commit_with_progress
  (multiple files), upload_with_no_progress_handler
- All write tests create temporary repos on hub-ci and clean up
@assafvayner
Copy link
Copy Markdown
Collaborator Author

Code review

Found 2 issues:

  1. download_file emits Start { total_bytes: 0 } before the HEAD request, so the CLI bytes progress bar is never created for single-file non-xet downloads. CliProgressHandler only creates the bar when total_bytes > 0 && total_files == 1, which never holds. The design spec says total_bytes should be known at Start (from the HEAD response).

pub async fn download_file(&self, params: &RepoDownloadFileParams) -> Result<PathBuf> {
progress::emit(
&params.progress,
ProgressEvent::Download(DownloadEvent::Start {
total_files: 1,
total_bytes: 0,
}),
);
let result = self.download_file_inner(params).await;
if result.is_ok() {

  1. For inline (non-xet/non-LFS) uploads, the bytes_bar is created in the Start handler but never updated. bytes_bar.set_position() is only called when phase == UploadPhase::Uploading (line 213), but inline uploads skip the Uploading phase entirely -- they go Preparing -> CheckingUploadMode -> Committing. The bar stays at position 0 for the entire upload.

if *phase == UploadPhase::Uploading
&& let Some(ref bar) = state.bytes_bar
{
bar.set_length(*total_bytes);
bar.set_position(*bytes_completed);
}

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

…s flag

- Route tracing output through MultiProgress::println() so log lines
  from hf-xet appear above progress bars without visual corruption
- Add --disable-progress-bars global CLI flag
- Tighten HF_HUB_DISABLE_PROGRESS_BARS env var to only match "1" or "true"
- Create MultiProgress once in main and share with tracing and progress handler
- Remove "LFS" from upload progress bar message
MultiProgress::println() swallows output when stderr is not a tty,
which breaks tracing output in tests and piped contexts. Fall back
to direct stderr when not a terminal.
- Retain XetFileDownload handles and poll per-file progress every 100ms
- Show up to 10 concurrent per-file progress bars, queue overflow with
  promotion on completion
- Emit FileStatus::Complete as individual xet files finish, not after
  the entire batch completes
- Use AtomicBool to guarantee exactly one Complete event per file
  between the poller and cleanup sweep
- Add filename field to XetBatchFile and filename param to
  xet_download_to_blob for progress display
- Emit Complete for files already on disk in local_dir mode
  (mirrors the existing cache mode fix)
- Treat bytes_completed >= total_bytes as Complete to handle the gap
  between SDK progress reporting and task finalization
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant