* Add MI355X GPU support for AMD GitHub runner
Add MI355X to the GitHubGPU enum, the GPU_TO_SM mapping, and the GitHub
launcher's runner routing, using runner label mia1-p02-g29.
* Use amd-runner Docker container for MI355X workflow
Add container image ghcr.io/gpu-mode/amd-runner:main, with GPU device
passthrough, to amd_workflow.yml. Add numpy to AMD_REQUIREMENTS.
* Update AMD Dockerfile: ROCm 7.2, latest aiter, remove multi-GPU deps
- Upgrade ROCm from 6.3.1 to 7.2
- Upgrade PyTorch to nightly rocm7.2
- Update aiter to latest commit (f3be04a) for recent FP4 kernel APIs
- Remove UCX, OpenMPI, and rocSHMEM builds (no longer needed)
* Update AMD_REQUIREMENTS to use ROCm 7.2 nightly index
* Fix container permissions: run as root for GitHub Actions compatibility
* Revert "Update AMD_REQUIREMENTS to use ROCm 7.2 nightly index"
This reverts commit bb5f2ee.
* Revert "Update AMD Dockerfile: ROCm 7.2, latest aiter, remove multi-GPU deps"
This reverts commit bdc4523.
* Simplify AMD workflow for MI355X: use container deps, skip requirements install
* Reapply "Update AMD Dockerfile: ROCm 7.2, latest aiter, remove multi-GPU deps"
This reverts commit e09a2cd.
* Update AMD Dockerfile to ROCm 7.1 stable, latest aiter, remove multi-GPU deps
- Upgrade ROCm from 6.3.1 to 7.1 (stable, matches host ROCm 7.0.1)
- Use stable torch 2.10.0+rocm7.1 instead of nightly
- Update aiter to latest commit (f3be04a) for recent FP4 kernel APIs
- Remove UCX, OpenMPI, and rocSHMEM builds
* Use mia1-p02-g29 runner to build AMD Docker image
* Add workspace cleanup step before checkout in AMD Docker build
Fixes EACCES errors from root-owned files left by previous container runs.
* Remove workspace cleanup step from AMD Docker build
* Use GITHUB_TOKEN instead of PUBLISH_TOKEN for ghcr.io login
* Fix Dockerfile for Ubuntu 24.04 (Noble) base image
- Replace python3.10 packages with python3 equivalents
- Use noble ROCm package instead of jammy
- Add --break-system-packages for pip on Noble
- Remove git-core PPA (not needed on Noble)
- Remove linux-headers install (not available during build)
* Remove pip upgrade step (incompatible with Noble system pip)
* Use amd-runner:mi355 Docker image with working aiter + ROCm
* Fix pip install: add --break-system-packages for container environment
* Update amd-docker.Dockerfile
* Set minimum GitHub timeout to DEFAULT_GITHUB_TIMEOUT_MINUTES
Ensures the workflow timeout is at least 30 minutes to account for
Docker image pulls and container initialization on new runners.
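
The timeout floor described in the last item can be sketched as a simple clamp. This is a minimal illustration, not the repository's actual code: the helper name `effective_timeout_minutes` is hypothetical, and the value 30 for `DEFAULT_GITHUB_TIMEOUT_MINUTES` is assumed from the "at least 30 minutes" wording above.

```python
import math
from typing import Optional

# Assumed value, taken from the "at least 30 minutes" wording in the commit message.
DEFAULT_GITHUB_TIMEOUT_MINUTES = 30


def effective_timeout_minutes(requested_minutes: Optional[float]) -> int:
    """Hypothetical helper: never return less than the default timeout.

    A missing request falls back to the default; a fractional request is
    rounded up so the workflow is never given less time than was asked for.
    """
    if requested_minutes is None:
        return DEFAULT_GITHUB_TIMEOUT_MINUTES
    return max(math.ceil(requested_minutes), DEFAULT_GITHUB_TIMEOUT_MINUTES)
```

Under these assumptions, a 5-minute request is raised to 30 minutes, while a 45-minute request passes through unchanged, so short submissions still leave room for the image pull and container start-up on a fresh runner.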