[ROCm] Fix gpt_oss test suite: AITER attention fallback, SonicMoE guard, and distributed test fixes#46160
Open
Abdennacer-Badaoui wants to merge 4 commits into
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
Member
Author
|
I also fixed the distributed worker for the current TP API.
|
Contributor
|
[For maintainers] Suggested jobs to run (before merge) run-slow: gpt_oss |
Contributor
|
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=46160&sha=089a66 |
Description:

Hub kernels like `kernels-community/vllm-flash-attn3` ship CUDA-only wheels. On ROCm, loading them raises `ValueError: Cannot find a build variant`, which caused `GptOssModelTest::test_eager_matches_fa2_generate` (among many others) to crash.

This PR adds a ROCm-specific fallback in `_lazy_imports`: when a model routes to a hub kernel and we're on ROCm with AITER installed, we use AMD's AITER Triton MHA kernel instead of attempting to fetch the hub wheel. AITER supports the same interface, including the learnable attention sinks (`s_aux`) used by `gpt_oss`. The CUDA path is completely unchanged.

Changes:

- `integrations/aiter_flash_attention.py`: thin wrappers around `aiter.ops.triton.attention.mha.flash_attn_func` / `flash_attn_varlen_func`, mapping `s_aux -> sink`
- `modeling_flash_attention_utils.py`: ROCm+AITER branch in the kernel fallback section of `_lazy_imports`; registers the attention function under the hub kernel name so the model forward resolves it correctly
- `utils/import_utils.py`: adds `is_aiter_available()`
- `docker/transformers-pytorch-amd-gpu/Dockerfile`: pins the AITER wheel matching the ROCm 7.2 base image
- `tests/models/gpt_oss/test_modeling_gpt_oss.py`: skips the hub-kernel availability check on ROCm+AITER in `test_default_flash_implementation_auto_correction`

SonicMoE ROCm guard: on AMD MI300X GPUs, the check `torch.cuda.get_device_capability() >= (9, 0)` passes via the CUDA compatibility layer, causing `sonicmoe` to be included in the expert implementations on ROCm even though `kernels-community/sonic-moe` has no ROCm build. This PR adds `is_rocm_platform()` guards in `_load_sonicmoe_kernel()` and `test_modeling_common.py` to fully disable SonicMoE on ROCm.

Tested on: ROCm 7.2.2 / MI300X / Python 3.10 / `torch 2.10.0+rocm7.2.2`

This fixes ~140 failing `gpt_oss` tests on ROCm.
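
For readers skimming the description, the three decisions it describes (platform guard, backend choice, and the `s_aux -> sink` argument mapping) can be sketched roughly as below. This is an illustration under stated assumptions, not the code in this PR: `sonicmoe_allowed`, `choose_attention_backend`, and `adapt_s_aux` are hypothetical helper names, the body of `is_aiter_available()` is a guess at one plausible implementation, and the behaviour on ROCm without AITER (falling through to the existing hub-kernel path) is assumed rather than taken from the diff. The platform and capability are passed in as plain values so the sketch runs without `torch` or `aiter` installed.

```python
import importlib.util
from typing import Any, Callable, Tuple


def is_aiter_available() -> bool:
    """Hypothetical stand-in: True if a top-level `aiter` package is importable."""
    return importlib.util.find_spec("aiter") is not None


def sonicmoe_allowed(capability: Tuple[int, int], on_rocm: bool) -> bool:
    """Decide whether SonicMoE may be offered as an expert implementation.

    A capability check alone is not enough: per the description, MI300X also
    passes `>= (9, 0)`, so the platform is checked first.
    """
    if on_rocm:
        return False
    return capability >= (9, 0)


def choose_attention_backend(on_rocm: bool, aiter_installed: bool) -> str:
    """Pick where the attention function comes from (labels are illustrative)."""
    if on_rocm and aiter_installed:
        return "aiter"  # skip fetching the CUDA-only hub wheel
    # Assumption: every other case keeps the existing hub-kernel path.
    return "hub_kernel"


def adapt_s_aux(attn_fn: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a kernel that expects `sink=` so callers can keep passing `s_aux=`."""

    def wrapper(q, k, v, *, s_aux=None, **kwargs):
        # Forward the learnable attention sinks under the name the kernel expects.
        return attn_fn(q, k, v, sink=s_aux, **kwargs)

    return wrapper
```

With a stand-in kernel such as `def fake(q, k, v, sink=None, causal=False): ...`, calling `adapt_s_aux(fake)(q, k, v, s_aux=sinks, causal=True)` forwards `sinks` as `sink`, which is the shape of the mapping the wrappers in `integrations/aiter_flash_attention.py` are described as performing.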