ci: fix macOS PyTorch wheel cache key for branch-ref pins #19350
rascani merged 3 commits into pytorch:main
`install_pytorch_and_domains` constructs the cached-wheel URL using
`${TORCH_VERSION:0:7}`, which gives "release" when the pin is a
branch ref like `release/2.11`. The upload code uses the basename of
`dist/*.whl`, which is whatever PyTorch's setup.py wrote — always the
resolved commit hash (e.g. `+git70d99e9`). The two never match, so
every macOS run misses the cache and does a ~30-minute source build
even though the wheel for the current pin's HEAD is already in S3.
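To make the mismatch concrete, here is a minimal sketch of the old key derivation: take the first seven characters of whatever string `TORCH_VERSION` holds. The `old_key_suffix` helper and the example pins are illustrative assumptions, not code from the CI script.

```shell
#!/usr/bin/env bash
# Illustrative only: mirrors the old ${TORCH_VERSION:0:7} slicing.
# The helper name and the example pins are hypothetical.
set -euo pipefail

old_key_suffix() {
  local pin="$1"
  # Bash substring expansion ${var:offset:length}: first 7 characters.
  printf '+git%s\n' "${pin:0:7}"
}

for pin in "release/2.11" "0123456789abcdef0123456789abcdef01234567"; do
  echo "${pin} -> $(old_key_suffix "$pin")"
done
```

For a commit-hash pin the slice happens to equal the abbreviated hash that ends up in the wheel name. For a branch ref it yields the first seven letters of the branch name, i.e. a `+gitrelease` suffix that no uploaded wheel will ever carry.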
Resolve the hash via `git rev-parse --short=7 HEAD` after `git
checkout`, so download and upload agree. Commit-hash pins are
unchanged (the first 7 chars already equaled the resolved hash).
Authored with Claude Code.
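A minimal sketch of the fix, assuming the same `+git<hash>` suffix: resolve the abbreviated hash from the checked-out tree instead of slicing the pin string. The throwaway repository, the `resolved_key_suffix` helper, and the demo identity are hypothetical; they exist only so the example runs anywhere `git` is installed.

```shell
#!/usr/bin/env bash
# Sketch: derive the cache-key suffix from the checked-out commit.
# The temporary repo, branch, and helper name are hypothetical.
set -euo pipefail

resolved_key_suffix() {
  # $1: path to a git checkout (e.g. the PyTorch tree after `git checkout`).
  # --short=7 asks for at least 7 hex digits; git lengthens the
  # abbreviation only if 7 would be ambiguous in that repository.
  printf '+git%s\n' "$(git -C "$1" rev-parse --short=7 HEAD)"
}

# Throwaway repo with a branch named like a release pin.
repo="$(mktemp -d)"
trap 'rm -rf "$repo"' EXIT
git -C "$repo" init -q
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
  -c commit.gpgsign=false commit -q --allow-empty -m "demo commit"
git -C "$repo" checkout -q -b release/2.11

suffix="$(resolved_key_suffix "$repo")"
full="$(git -C "$repo" rev-parse HEAD)"
echo "branch pin release/2.11 -> ${suffix}"
# A commit-hash pin sliced the old way lands on the same value.
echo "commit pin ${full} -> +git${full:0:7}"
```

Both lines print the same suffix, which is the property the PR relies on: download and upload now key off the resolved commit regardless of how the pin is spelled.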
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19350
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 3 Cancelled Jobs
As of commit 9df736e with merge base 1643611:
NEW FAILURE - The following job has failed:
CANCELLED JOBS - The following jobs were cancelled. Please retry:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
```bash
# Found no such wheel, we will build it from source then
if [[ "${torch_wheel_not_found}" == "1" ]]; then
  echo "No cached wheel found, continue with building PyTorch at ${TORCH_VERSION}"
  # ... (rest of the source-build branch not shown in this review excerpt)
fi
```
I guess we don't hit this anymore?
It's still possible to hit this if there is no cached wheel yet for a given version, or when running on a GitHub-hosted runner that does not have AWS access.
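The fallback described in that reply can be sketched as a probe that treats any failed fetch as a cache miss. This is a hypothetical illustration: the use of `curl`, the `probe_cached_wheel` helper, and the `file://` URL in the demo are assumptions; only the `torch_wheel_not_found` flag name comes from the snippet under review.

```shell
#!/usr/bin/env bash
# Hypothetical download-or-build probe. curl and the helper name are
# assumptions; torch_wheel_not_found mirrors the flag in the review snippet.
set -euo pipefail

probe_cached_wheel() {
  local url="$1" dest="$2"
  # -f fails on HTTP errors; any failure (object not uploaded yet,
  # no access, no network) is treated the same way: as a cache miss.
  if curl -fsL -o "$dest" "$url"; then
    torch_wheel_not_found=0
  else
    torch_wheel_not_found=1
  fi
}

# Demo with a local file:// URL that does not exist, i.e. a miss.
tmp="$(mktemp -d)"
trap 'rm -rf "$tmp"' EXIT
probe_cached_wheel "file://${tmp}/missing.whl" "${tmp}/out.whl"
echo "torch_wheel_not_found=${torch_wheel_not_found}"
```

Folding every failure mode into one flag keeps the caller simple: whether the wheel was never uploaded or the runner simply can't reach the bucket, the script falls through to the source build.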
digantdesai left a comment:
Any way to make sure we don't accidentally regress in the future?
Add a sanity check that runs after `python setup.py bdist_wheel` and compares the built wheel basename against the cache URL we'd reconstruct on the next run. If they diverge — e.g. someone changes the torch_wheel_name template, or PyTorch's setup.py renames its wheels — fail loudly with a pointer to the function that needs fixing, rather than silently miss the cache forever. Catches the same class of regression that produced the original "+gitrelease" bug fixed in the parent commit (PR pytorch#19350), on the very next CI run that hits the source-build path. Authored with Claude Code.
Added a sanity check to make sure the wheel names match.
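A check of the kind described in that commit could look roughly like the following. This is a hypothetical sketch: the function name, the example wheel filename, and passing the expected name as an argument are assumptions, since the actual wheel-name template in `install_pytorch_and_domains` isn't shown here.

```shell
#!/usr/bin/env bash
# Hypothetical post-build guard: compare the wheel setup.py produced with the
# name the next run will look up in the cache. Names below are examples.
set -euo pipefail

check_cached_wheel_name() {
  local dist_dir="$1" expected="$2"
  # Without nullglob, an unmatched glob stays literal and fails the -e test.
  local wheels=("$dist_dir"/*.whl)
  if [[ ${#wheels[@]} -ne 1 || ! -e "${wheels[0]}" ]]; then
    echo "error: expected exactly one wheel in ${dist_dir}" >&2
    return 1
  fi
  local built
  built="$(basename "${wheels[0]}")"
  if [[ "$built" != "$expected" ]]; then
    echo "error: built ${built} but the cache lookup expects ${expected};" \
         "fix the wheel-name logic in install_pytorch_and_domains" >&2
    return 1
  fi
}

# Demo with an empty placeholder file standing in for a real build output.
dist="$(mktemp -d)"
trap 'rm -rf "$dist"' EXIT
wheel="torch-2.11.0a0+git0123456-cp311-cp311-macosx_11_0_arm64.whl"
touch "${dist}/${wheel}"
if check_cached_wheel_name "$dist" "$wheel"; then
  echo "ok: ${wheel} matches the cache lookup"
fi
```

Failing at build time, rather than on the next lookup, turns a silent 30-minute cache miss into an immediate, actionable CI error.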
Test plan
CI