
ci: fix macOS PyTorch wheel cache key for branch-ref pins #19350

Merged
rascani merged 3 commits into pytorch:main from rascani:fix-macos-torch-cache-key on May 7, 2026

Conversation

@rascani (Contributor) commented May 6, 2026

Summary

`install_pytorch_and_domains` constructs the cached-wheel URL using `${TORCH_VERSION:0:7}`, which gives "release" when the pin is a branch ref like `release/2.11`. The upload code uses the basename of `dist/*.whl`, which is whatever PyTorch's `setup.py` wrote: always the resolved commit hash (e.g. `+git70d99e9`). The two never match, so every macOS run misses the cache and does a ~30-minute source build even though the wheel for the current pin's HEAD is already in S3.

Resolve the hash via `git rev-parse --short=7 HEAD` after `git checkout`, so download and upload agree. Commit-hash pins are unchanged (the first 7 chars already equaled the resolved hash).
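The mismatch can be sketched in bash; the version strings below are illustrative stand-ins, not values taken from the actual pin file:

```shell
# Branch-ref pin: the old key logic truncates the ref string, not a commit hash.
TORCH_VERSION="release/2.11"
old_key="${TORCH_VERSION:0:7}"        # "release" -- never appears in the wheel filename
echo "old cache key: ${old_key}"

# The fix (sketch): after `git checkout "${TORCH_VERSION}"`, resolve the commit
# that setup.py will actually embed in the wheel name:
#   TORCH_VERSION=$(git rev-parse --short=7 HEAD)

# For a commit-hash pin the truncation already matched the resolved hash:
TORCH_VERSION="70d99e9f1a2b3c4d"
echo "hash-pin key: ${TORCH_VERSION:0:7}"   # "70d99e9"
```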

Authored with Claude Code.

Test plan

CI

pytorch-bot Bot commented May 6, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19350

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 3 Cancelled Jobs

As of commit 9df736e with merge base 1643611:

NEW FAILURE - The following job has failed:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 6, 2026
github-actions Bot commented May 6, 2026

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Comment thread: .ci/scripts/utils.sh

# Found no such wheel, we will build it from source then
if [[ "${torch_wheel_not_found}" == "1" ]]; then
echo "No cached wheel found, continue with building PyTorch at ${TORCH_VERSION}"
Contributor


I guess we don't hit this anymore?

Contributor Author


It's still possible to hit this if there is no cached wheel yet for a given version, or when executing on a GitHub-hosted runner that does not have AWS access.

Contributor

@digantdesai digantdesai left a comment


Any way to make sure we don't accidentally regress in the future?

Add a sanity check that runs after `python setup.py bdist_wheel` and compares the built wheel basename against the cache URL we'd reconstruct on the next run. If they diverge (e.g. someone changes the `torch_wheel_name` template, or PyTorch's `setup.py` renames its wheels), fail loudly with a pointer to the function that needs fixing, rather than silently missing the cache forever.

Catches the same class of regression that produced the original "+gitrelease" bug fixed in the parent commit (PR pytorch#19350), on the very next CI run that hits the source-build path.

Authored with Claude Code.
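A minimal sketch of such a post-build check, with hypothetical stand-in values for the resolved hash and wheel name (the real check would read them from `git rev-parse --short=7 HEAD` and `basename dist/*.whl`):

```shell
resolved_hash="70d99e9"   # stand-in for: git rev-parse --short=7 HEAD
# stand-in for: basename dist/*.whl
built_wheel="torch-2.11.0a0+git70d99e9-cp310-cp310-macosx_11_0_arm64.whl"

# Fail loudly if the built wheel's name would not match the cache key
# reconstructed on the next run.
case "${built_wheel}" in
  *"git${resolved_hash}"*)
    echo "OK: built wheel embeds ${resolved_hash}; next run will hit the cache"
    ;;
  *)
    echo "ERROR: '${built_wheel}' does not contain git${resolved_hash}." >&2
    echo "Cache key and wheel name diverged; fix install_pytorch_and_domains in .ci/scripts/utils.sh" >&2
    exit 1
    ;;
esac
```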

rascani commented May 7, 2026

Any way to make sure we don't accidentally regress in the future?

Added a sanity check to make sure the wheel names match.

@rascani rascani merged commit fa857bd into pytorch:main May 7, 2026
324 of 328 checks passed
@rascani rascani deleted the fix-macos-torch-cache-key branch May 7, 2026 21:57