Disable HF Xet storage across all CI scripts #19371
HuggingFace's Xet storage backend stalls mid-download on CI runners, causing 90-minute job timeouts. Set HF_HUB_DISABLE_XET=1 in every CI script and workflow that downloads from HuggingFace to force standard HTTP downloads instead.
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19371
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures, 1 Cancelled Job, 1 Pending, 3 Unrelated Failures
As of commit 23b6acb with merge base 1414bc1:
NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Pull request overview
This PR aims to prevent CI timeouts caused by HuggingFace Hub’s Xet storage backend stalling mid-download by forcing standard HTTP downloads via HF_HUB_DISABLE_XET=1 in CI entrypoints that fetch models.
Changes:
- Export `HF_HUB_DISABLE_XET=1` in multiple CI shell scripts and in the MLX GitHub Actions workflow job scripts.
- Set `HF_HUB_DISABLE_XET` early in `.ci/scripts/test_huggingface_optimum_model.py` to cover downloads triggered by Python-based HF/Optimum flows.
- Add the env var in `.ci/scripts/download_hf_hub.sh` to cover callers of the shared HF download helper.
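As a rough illustration of why a single `export` near the top of a script is enough, the sketch below sets the variable and then checks that a child process inherits it. The `bash -c` child is only a stand-in for whatever download tool a real script would invoke; it is an assumption for demonstration and not part of this PR.

```shell
#!/bin/bash
# Disable HF Xet storage before any download step runs.
export HF_HUB_DISABLE_XET=1
set -eu

# Exported variables are inherited by child processes. The `bash -c` child
# below is a hypothetical stand-in for a real download command.
child_value="$(bash -c 'printf "%s" "${HF_HUB_DISABLE_XET:-unset}"')"
echo "child sees HF_HUB_DISABLE_XET=${child_value}"
```

Placing the export in a shared helper such as `download_hf_hub.sh` covers that helper's own commands; a script that downloads through another path still needs its own export.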
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| .github/workflows/mlx.yml | Exports HF_HUB_DISABLE_XET=1 in several MLX workflow job scripts before model downloads. |
| .ci/scripts/test_phi_3_mini.sh | Disables Xet to avoid stalled HF downloads during Phi-3 mini CI flows. |
| .ci/scripts/test_lora.sh | Disables Xet for LoRA tests that download from HuggingFace Hub. |
| .ci/scripts/test_lora_multimethod.sh | Disables Xet for multimethod LoRA tests that download from HuggingFace Hub. |
| .ci/scripts/test_huggingface_optimum_model.py | Sets HF_HUB_DISABLE_XET in-process before importing libs that may trigger HF downloads. |
| .ci/scripts/export_model_artifact.sh | Disables Xet for model export flows that snapshot-download from HuggingFace Hub. |
| .ci/scripts/download_hf_hub.sh | Disables Xet for all HF downloads performed via this helper script. |
Comments suppressed due to low confidence (1)
.ci/scripts/export_model_artifact.sh:74
- This script exports `HF_HUB_DISABLE_XET` before enabling `set -u`, but it only validates `$1` and then later reads `$2` (HF model) unconditionally. Running with a missing hf_model will fail with an unbound variable error, and the earlier error message also refers to the wrong argument. Consider validating both required args (device + hf_model) before `set -u` and updating the error message accordingly.
# Disable HF Xet storage to avoid stalled downloads on CI runners
export HF_HUB_DISABLE_XET=1
set -eux
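One way the suggested validation could look is sketched below. The `validate_args` helper, the usage message wording, and the sample values `cuda` and `some-org/some-model` are all hypothetical; the actual argument handling in `export_model_artifact.sh` is not shown in this review, so treat this as an assumption-laden sketch and not the script's real logic.

```shell
#!/bin/bash
# Hypothetical helper: checks that both required positional arguments
# (device and hf_model) are present and non-empty, so a missing one yields a
# clear usage message and not an "unbound variable" error under `set -u`.
validate_args() {
  if [ "$#" -lt 2 ] || [ -z "$1" ] || [ -z "$2" ]; then
    echo "Usage: export_model_artifact.sh <device> <hf_model>" >&2
    return 1
  fi
}

# Demonstration with placeholder values (not real CI inputs).
if validate_args cuda 2>/dev/null; then result_missing="accepted"; else result_missing="rejected"; fi
if validate_args cuda some-org/some-model; then result_full="accepted"; else result_full="rejected"; fi
echo "missing hf_model: ${result_missing}, both args: ${result_full}"
```

In a real script the call would presumably be `validate_args "$@" || exit 1`, placed before `set -eux` and before the first read of `$2`.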
script: |
  set -eux
  # Disable HF Xet storage to avoid stalled downloads on CI runners
  export HF_HUB_DISABLE_XET=1

#!/bin/bash

# Disable HF Xet storage to avoid stalled downloads on CI runners
export HF_HUB_DISABLE_XET=1
Failures seem unrelated.