55 changes: 55 additions & 0 deletions docs/pull_hf_models.md
@@ -93,3 +93,58 @@ Check [parameters page](./parameters.md) for detailed descriptions of configurat
To set up the model and start the server in one step, follow these [instructions](./starting_server.md).

> **Note:** Pull mode requires both read and write access to the model repository.

## Resuming an interrupted pull

Pulling Generative AI models from Hugging Face often involves transferring multi-gigabyte LFS files (e.g. `openvino_model.bin`). To make this robust against network errors and operator interruptions, OVMS pull mode persists the state of in-progress downloads on disk and resumes from where it stopped on the next `--pull` invocation. No extra flags are required: re-run the same `--pull` command against the same `--model_repository_path`, and OVMS continues any partially downloaded LFS files instead of starting from scratch.

### What is persisted

While a pull is in flight, OVMS / libgit2 keeps the following on disk under your `--model_repository_path`:

| Artifact | Purpose |
|---|---|
| `<repo>.lfswip` (sibling of the repository directory) | Marker file indicating that an LFS download is work-in-progress. |
| `<file>.lfs_part` (next to each LFS-tracked file) | Partially downloaded LFS object. The next `--pull` resumes the HTTP transfer from the existing byte offset. |
| LFS pointer file (in place of the final binary) | Standard `version https://git-lfs.github.com/spec/v1` pointer that allows libgit2 to identify which OID still needs to be fetched. |

When the LFS transfer for a file completes successfully, the `.lfs_part` file is renamed to its final name and the pointer is replaced. Once **all** LFS files are present, the `.lfswip` marker is removed and the repository is considered clean.
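To check what a previous run left behind, the artifacts in the table above can be listed with a short script. This is a minimal sketch, not part of OVMS: the helper names `find_resume_artifacts` and `parse_lfs_pointer` are hypothetical, the recursive directory walk is an assumption about where the files live, and the pointer parser relies only on the standard Git LFS v1 pointer layout (`version`, `oid`, `size` lines).

```python
from pathlib import Path


def find_resume_artifacts(model_repository_path):
    """Return (markers, partials) found under the given directory.

    The suffixes come from the table above; walking the tree recursively
    is an assumption of this sketch, not necessarily what OVMS does.
    """
    root = Path(model_repository_path)
    markers = sorted(root.rglob("*.lfswip"))
    partials = sorted(root.rglob("*.lfs_part"))
    return markers, partials


def parse_lfs_pointer(text):
    """Parse a Git LFS v1 pointer file into its hash algorithm, OID and size."""
    # Each pointer line has the form "<key> <value>".
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    if fields.get("version") != "https://git-lfs.github.com/spec/v1":
        raise ValueError("not a Git LFS v1 pointer")
    algo, _, digest = fields["oid"].partition(":")
    return {"algo": algo, "oid": digest, "size": int(fields["size"])}
```

An empty result from `find_resume_artifacts` suggests that no pull is pending for that path; a non-empty one lists the files the next `--pull` would be expected to continue.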

### Resume after Ctrl+C / SIGINT (graceful cancel)

Pressing **Ctrl+C** (or sending `SIGINT` / `SIGTERM` on Linux, `CTRL_BREAK_EVENT` on Windows) while `ovms --pull` is running triggers a graceful cancellation:

1. OVMS marks the server as shutting down.
2. libgit2 clone / LFS callbacks observe the cancellation request and abort the in-flight HTTP transfer cleanly.
3. The process exits with a non-zero status code. Partial `.lfs_part` files and the `.lfswip` marker are left on disk on purpose.
4. Re-running the **same** `--pull` command resumes each partial file using HTTP `Range` requests and finishes the remaining downloads.

This is the recommended way to interrupt a pull: it avoids corrupted partial data and lets you resume without re-downloading completed files.
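The cancellation flow described above follows a common cooperative pattern: the signal handler only records the request, and the transfer loop checks it between chunks so that the partial file always ends on a complete write. The sketch below illustrates that pattern in Python under stated assumptions; it is not the OVMS or libgit2 implementation, and `cancel_requested`, `install_handlers`, and `copy_until_cancelled` are hypothetical names.

```python
import signal
import threading

# Set by the signal handler, polled by the transfer loop.
cancel_requested = threading.Event()


def _on_signal(signum, frame):
    # Only record the request; the transfer loop decides when to stop.
    cancel_requested.set()


def install_handlers():
    """Route SIGINT and SIGTERM to the cooperative cancel flag."""
    signal.signal(signal.SIGINT, _on_signal)
    signal.signal(signal.SIGTERM, _on_signal)


def copy_until_cancelled(chunks, part_file):
    """Append chunks to part_file; return True if finished, False if cancelled."""
    for chunk in chunks:
        if cancel_requested.is_set():
            # Leave what was written so far on disk for a later resume.
            return False
        part_file.write(chunk)
    return True
```

Because the flag is checked before each write, a cancelled transfer never leaves a half-written chunk, which is what makes the leftover partial file safe to continue from.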

### Resume after process termination (forced kill / power loss)

If the OVMS process is killed forcibly (`SIGKILL`, OOM killer, container stop with no grace period, host crash, power loss), the on-disk state normally matches that of a graceful cancel: any LFS files that were in flight remain as `<file>.lfs_part` plus an LFS pointer file, and the `.lfswip` marker is still present. The next `--pull` invocation:

1. Detects the `.lfswip` marker and the leftover LFS pointer files.
2. For each affected file, opens an HTTP `Range` request starting at the current size of the corresponding `.lfs_part` file and continues the transfer.
3. Cleans up the marker once every LFS file is fully present.
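Step 2 above can be sketched as follows: the size of the partial file becomes the start offset of an HTTP `Range` header, the remaining bytes are appended, and the finished file is renamed over its final name. This is an illustrative sketch only; `range_header`, `resume_into`, and the injected `fetch` callable are hypothetical and do not reflect the actual OVMS or libgit2 code.

```python
import os


def range_header(part_path):
    """Build the Range header that continues after the bytes already on disk."""
    offset = os.path.getsize(part_path) if os.path.exists(part_path) else 0
    return offset, {"Range": f"bytes={offset}-"}


def resume_into(part_path, final_path, fetch):
    """Append the remaining bytes to part_path, then rename it to final_path.

    `fetch(headers)` is a hypothetical callable that returns an iterable of
    byte chunks for the requested range, for example a thin wrapper around
    an HTTP client.
    """
    offset, headers = range_header(part_path)
    with open(part_path, "ab") as part:  # append mode keeps existing bytes
        for chunk in fetch(headers):
            part.write(chunk)
    # Replaces whatever sits at final_path (such as an LFS pointer file).
    os.replace(part_path, final_path)
    return offset
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP library and makes the offset logic easy to exercise without network access.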

If a forced termination corrupted an in-progress write, the resumed transfer will detect the size/hash mismatch on completion and the file will be re-downloaded on a subsequent attempt.

User-edited or user-deleted files are **not** restored automatically. Once a `--pull` has finished successfully, OVMS treats the local repository as authoritative and will not overwrite or re-fetch files that you have modified or removed.

To force OVMS to re-download a model from scratch, pass `--overwrite_models` on the next `--pull` invocation; the existing model directory under `--model_repository_path` will be replaced with a fresh download.
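The size/hash check mentioned above can also be reproduced by hand when a downloaded file looks suspect. Git LFS object IDs are SHA-256 digests of the file content, so a file can be compared against the `oid` and `size` recorded in its pointer. The sketch below assumes you still have those two values; `matches_pointer` is a hypothetical helper, and how OVMS performs its own check internally is not specified here.

```python
import hashlib
import os


def matches_pointer(path, expected_oid, expected_size, chunk_size=1 << 20):
    """Return True if the file has the expected size and SHA-256 digest."""
    # Cheap check first: a size mismatch makes hashing unnecessary.
    if os.path.getsize(path) != expected_size:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so multi-gigabyte files do not need to fit in memory.
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest() == expected_oid
```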

### Tuning resume behavior

The number of resume attempts per LFS file and the interval between them can be tuned via environment variables that are read once at process start (defaults shown):

| Environment variable | Default | Description |
|---|---|---|
| `GIT_LFS_RESUME_ATTEMPTS` | `5` | Maximum number of resume attempts for a single LFS file before giving up. `0` disables resume. |
| `GIT_LFS_RESUME_INTERVAL_SECONDS` | `10` | Delay between consecutive resume attempts. |

On startup OVMS logs the resolved configuration, e.g.:

```text
[INFO] LFS resume: attempts=5 interval=10 s
```
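The way these two settings interact can be illustrated with a small retry loop. This is a sketch under assumptions, not the OVMS implementation: `load_resume_config` and `run_with_resume` are hypothetical names, falling back to the default on an invalid or negative value is an assumption, and the loop interprets "resume attempts" as retries after one initial try, so that `0` means no retries.

```python
import os
import time


def _non_negative_int(env, name, default):
    """Read an integer setting; fall back to the default if unusable (assumption)."""
    raw = env.get(name)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value >= 0 else default


def load_resume_config(env=os.environ):
    """Return (attempts, interval_seconds) using the documented defaults."""
    attempts = _non_negative_int(env, "GIT_LFS_RESUME_ATTEMPTS", 5)
    interval = _non_negative_int(env, "GIT_LFS_RESUME_INTERVAL_SECONDS", 10)
    return attempts, interval


def run_with_resume(transfer, attempts, interval, sleep=time.sleep):
    """Call transfer() once, then up to `attempts` more times on failure."""
    for attempt in range(attempts + 1):
        if transfer():
            return True
        if attempt < attempts:
            sleep(interval)  # wait before the next resume attempt
    return False
```

With the defaults, a persistently failing file would be tried once and resumed up to five more times, with ten seconds between tries, before the loop gives up.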

> **Note:** Resume relies on the remote server honoring HTTP `Range` requests. Hugging Face Hub supports this by default; private mirrors must allow ranged GETs for resume to work.