
Commit c07251d

Hf pull readme update (#4196)
### 🛠 Summary

README update only: documents the resume, terminate, and cancel features of pull mode.

### 🧪 Checklist

- [ ] Unit tests added.
- [ ] The documentation updated.
- [ ] Change follows security best practices.
1 parent 202f3f7 commit c07251d

1 file changed

Lines changed: 55 additions & 0 deletions


docs/pull_hf_models.md

@@ -93,3 +93,58 @@ Check [parameters page](./parameters.md) for detailed descriptions of configurat
If you want to set up the model and start the server in one step, follow the [instructions](./starting_server.md).

> **Note:** Pull mode requires both read and write access to the models repository.

## Resuming an interrupted pull

Pulling Generative AI models from Hugging Face often involves transferring multi-gigabyte LFS files (e.g. `openvino_model.bin`). To make this robust against network errors and operator interventions, OVMS pull mode persists the in-progress download state on disk and resumes from where it stopped on the next `--pull` invocation. No extra flags are required: re-run the same `--pull` command against the same `--model_repository_path`, and OVMS continues any partially downloaded LFS files instead of starting from scratch.

### What is persisted

While a pull is in flight, OVMS / libgit2 keeps the following on disk under your `--model_repository_path`:

| Artifact | Purpose |
|---|---|
| `<repo>.lfswip` (sibling of the repository directory) | Marker file indicating that an LFS download is work-in-progress. |
| `<file>.lfs_part` (next to each LFS-tracked file) | Partially downloaded LFS object. The next `--pull` resumes the HTTP transfer from the existing byte offset. |
| LFS pointer file (in place of the final binary) | Standard `version https://git-lfs.github.com/spec/v1` pointer that allows libgit2 to identify which OID still needs to be fetched. |

When the LFS transfer for a file completes successfully, the `.lfs_part` file is renamed to its final name and the pointer is replaced. Once **all** LFS files are present, the `.lfswip` marker is removed and the repository is considered clean.

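To check what a previous pull left behind, the artifacts in the table above can be listed with a few lines of Python. This is only a sketch: `pull_state` is a hypothetical helper, not part of OVMS, and it assumes the marker is literally named `<repo>.lfswip` next to the repository directory and that partial files end in `.lfs_part`, as described above.

```python
from pathlib import Path


def pull_state(repo_dir: Path) -> dict:
    """Summarize leftover resume artifacts for one model repository directory."""
    # Assumption: the marker is "<repo>.lfswip", a sibling of the repository directory.
    marker = repo_dir.with_name(repo_dir.name + ".lfswip")
    # Assumption: partial downloads are named "<file>.lfs_part" inside the repository.
    partials = {
        str(p.relative_to(repo_dir)): p.stat().st_size
        for p in sorted(repo_dir.rglob("*.lfs_part"))
    }
    return {"in_progress": marker.exists(), "partials": partials}
```

An empty `partials` mapping together with `in_progress == False` corresponds to the "clean" state described above.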
### Resume after Ctrl+C / SIGINT (graceful cancel)

Pressing **Ctrl+C** (or sending `SIGINT` / `SIGTERM` on Linux, `CTRL_BREAK_EVENT` on Windows) while `ovms --pull` is running triggers a graceful cancellation:

1. OVMS marks the server as shutting down.
2. libgit2 clone / LFS callbacks observe the cancellation request and abort the in-flight HTTP transfer cleanly.
3. The process exits with a non-zero status code. Partial `.lfs_part` files and the `.lfswip` marker are left on disk on purpose.
4. Re-running the **same** `--pull` command resumes each partial file using HTTP `Range` requests and finishes the remaining downloads.

This is the recommended way to interrupt a pull: it avoids corrupted partial data and lets you resume without re-downloading completed files.

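As a rough illustration of step 4 above, the following Python sketch resumes a download by asking the server for the bytes that follow what is already on disk. It is not the OVMS / libgit2 implementation; `range_header` and `resume_download` are hypothetical names, and the URL is supplied by the caller.

```python
import os
import urllib.request


def range_header(part_path: str) -> dict:
    """Build the Range header that continues from the bytes already on disk."""
    offset = os.path.getsize(part_path) if os.path.exists(part_path) else 0
    return {"Range": f"bytes={offset}-"} if offset else {}


def resume_download(url: str, part_path: str, final_path: str, chunk: int = 1 << 20) -> None:
    """Append the missing bytes to part_path, then rename it to final_path."""
    request = urllib.request.Request(url, headers=range_header(part_path))
    with urllib.request.urlopen(request) as response:
        # 206 means the server honored the range; 200 means it sent the whole file,
        # so the partial data must be overwritten rather than appended to.
        mode = "ab" if response.status == 206 else "wb"
        with open(part_path, mode) as out:
            while True:
                block = response.read(chunk)
                if not block:
                    break
                out.write(block)
    # Rename only after the transfer finished without raising.
    os.replace(part_path, final_path)
```

The sketch deliberately omits integrity checks and the case where the partial file is already complete, in which a server would typically answer with `416 Range Not Satisfiable`.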
### Resume after process termination (forced kill / power loss)

If the OVMS process is killed forcibly (`SIGKILL`, OOM killer, container stop with no grace period, host crash, power loss), the on-disk state is the same as for a graceful cancel: any LFS files that were in flight remain as `<file>.lfs_part` plus an LFS pointer file, and the `.lfswip` marker is still present. The next `--pull` invocation:

1. Detects the `.lfswip` marker and the leftover LFS pointer files.
2. For each affected file, opens an HTTP `Range` request starting at the current size of the corresponding `.lfs_part` file and continues the transfer.
3. Cleans up the marker once every LFS file is fully present.

If a forced termination corrupted an in-progress write, the resumed transfer detects the size/hash mismatch on completion, and the file is re-downloaded on a subsequent attempt.

User-edited or user-deleted files are **not** restored automatically. Once a `--pull` has finished successfully, OVMS treats the local repository as authoritative and will not overwrite or re-fetch files that you have modified or removed. To force OVMS to re-download a model from scratch, pass `--overwrite_models` on the next `--pull` invocation; the existing model directory under `--model_repository_path` will be replaced with a fresh download.

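The size/hash check mentioned above can be reproduced by hand from an LFS pointer file, which records a `sha256` OID and a byte size. The sketch below is an independent illustration of that check, not the code OVMS runs; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> tuple[str, int]:
    """Return (sha256 hex digest, size in bytes) from a Git LFS v1 pointer."""
    # Each pointer line has the form "<key> <value>".
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    if fields.get("version") != "https://git-lfs.github.com/spec/v1":
        raise ValueError("not a Git LFS v1 pointer")
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid type: {algo}")
    return digest, int(fields["size"])


def matches_pointer(path: Path, digest: str, size: int) -> bool:
    """Check a downloaded file against the size and hash from its pointer."""
    if path.stat().st_size != size:
        return False  # cheap check first: wrong size means truncated or corrupt
    sha = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            sha.update(block)
    return sha.hexdigest() == digest
```

Comparing the size before hashing avoids reading a multi-gigabyte file when it is obviously incomplete.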
### Tuning resume behavior

The number of resume attempts per LFS file and the interval between them can be tuned via environment variables, which are read once at process start (defaults shown):

| Environment variable | Default | Description |
|---|---|---|
| `GIT_LFS_RESUME_ATTEMPTS` | `5` | Maximum number of resume attempts for a single LFS file before giving up. `0` disables resume. |
| `GIT_LFS_RESUME_INTERVAL_SECONDS` | `10` | Delay between consecutive resume attempts. |

On startup OVMS logs the resolved configuration, e.g.:

```text
[INFO] LFS resume: attempts=5 interval=10 s
```

> **Note:** Resume relies on the remote server honoring HTTP `Range` requests. Hugging Face Hub supports this by default; private mirrors must allow ranged GETs for resume to work.

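To make the meaning of the two tuning variables concrete, here is a small Python model of a bounded retry loop driven by them. It is a sketch under assumptions, not the OVMS logic: it assumes one initial attempt followed by up to `GIT_LFS_RESUME_ATTEMPTS` resumes, and `resume_settings`, `download_with_resume`, and the `transfer` callable are all hypothetical.

```python
import os
import time


def resume_settings(env=os.environ) -> tuple[int, float]:
    """Read the two tuning variables, falling back to the documented defaults."""
    attempts = int(env.get("GIT_LFS_RESUME_ATTEMPTS", "5"))
    interval = float(env.get("GIT_LFS_RESUME_INTERVAL_SECONDS", "10"))
    return max(attempts, 0), max(interval, 0.0)


def download_with_resume(transfer, attempts: int, interval: float, sleep=time.sleep) -> bool:
    """Run transfer() once, then retry up to `attempts` more times on failure."""
    for attempt in range(attempts + 1):
        try:
            transfer()  # assumed to continue from the partial file on each call
            return True
        except OSError:
            if attempt < attempts:
                sleep(interval)  # wait before the next resume attempt
    return False
```

With `attempts` set to `0` the loop makes a single try and never resumes, which mirrors the "`0` disables resume" behavior in the table.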
