Skip to content

[ROCm] Drop gfx900 (MI25) and gfx906 (MI50/MI60) support#847

Open
nurmukhametov wants to merge 6 commits into
mainfrom
anurmukh/remove-gfx900-gfx906
Open

[ROCm] Drop gfx900 (MI25) and gfx906 (MI50/MI60) support#847
nurmukhametov wants to merge 6 commits into
mainfrom
anurmukh/remove-gfx900-gfx906

Conversation

@nurmukhametov
Copy link
Copy Markdown
Member

@nurmukhametov nurmukhametov commented May 6, 2026

[Do not merge]

Motivation

For internal review.

Submission Checklist

static constexpr absl::string_view kList[] = {"gfx900", "gfx906"};
return !IsThisGfxInAnyList(kList);
}
bool fence_before_barrier() const { return true; }
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this anymore. Regarding in TF, please have a grep check there. We shouldn't use it anymore neither IIRC.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed it.

desc.set_fpus_per_core(fpus_per_core(gcn_arch_name));
// Source:
// https://www.amd.com/content/dam/amd/en/documents/instinct-business-docs/white-papers/amd-cdna2-white-paper.pdf
desc.set_fpus_per_core(128);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of hard coding, I guess you can adjust this properly. @Eetusjo investigated this before https://github.com/ROCm/frameworks-internal/issues/15846#issuecomment-4073834863

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added todo here, because I think a proper fix is out of scope of this PR.

@nurmukhametov nurmukhametov requested a review from i-chaochen May 7, 2026 14:34
@nurmukhametov nurmukhametov added the claude-review Request a Claude AI code review for this PR label May 8, 2026
@draganmladjenovic
Copy link
Copy Markdown

@i-chaochen What bothers me is that TheRock seems to have gfx900 and gfx906 builds https://therock-hud-dev.amd.com/

Copy link
Copy Markdown

@draganmladjenovic draganmladjenovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Until we figure it out if TheRock needs this.

@i-chaochen
Copy link
Copy Markdown
Collaborator

i-chaochen commented May 8, 2026

@i-chaochen What bothers me is that TheRock seems to have gfx900 and gfx906 builds https://therock-hud-dev.amd.com/

No, but JAX and us never commit we will have gfx900 and gfx906 build, there is no gfx900 and gfx906 on JAX/Pytorch build neither.

https://rocm.docs.amd.com/en/latest/compatibility/compatibility-matrix.html
https://github.com/ROCm/rocm-jax/blob/master/stack.py#L20
https://github.com/ROCm/TheRock/blob/main/.github/workflows/test_pytorch_wheels_full.yml

@i-chaochen
Copy link
Copy Markdown
Collaborator

i-chaochen commented May 11, 2026

Yes, thanks for @jamestangg found this table https://github.com/ROCm/TheRock/blob/main/SUPPORTED_GPUS.md

image

Since they just need a build pass, I don't think it's related to what we rm the support here in XLA.

We should be no problem to upstream this drop?

@draganmladjenovic @nurmukhametov

@nurmukhametov
Copy link
Copy Markdown
Member Author

Since they just need a build pass, I don't think it's related to what we rm the support here in XLA.

We should be no problem to upstream this drop?

@draganmladjenovic @nurmukhametov

I am ready to upstream this. Any objections, @draganmladjenovic ?

@nurmukhametov nurmukhametov added claude-review Request a Claude AI code review for this PR and removed claude-review Request a Claude AI code review for this PR labels May 13, 2026
Comment on lines +1183 to +1186
// TODO(ROCm): replace this hardcoded value with a per-arch lookup table and
// populate scalar_unit_description / matrix_unit_description so the perf
// model picks the right FP32 path (vector vs. matrix).
desc.set_fpus_per_core(128);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (pre-existing, not introduced by this PR): The hardcoded 128 is correct for all remaining CDNA architectures (gfx908, gfx90a), but RDNA architectures (gfx1030, gfx1100, gfx1101, gfx1200, gfx1201, gfx1250) have 64 FP32 stream processors per CU. The old fpus_per_core() function also defaulted to 128 for everything except gfx906, so this preserves existing behavior — just noting that the TODO here is still load-bearing for RDNA correctness.

@claude
Copy link
Copy Markdown

claude Bot commented May 13, 2026

Claude Review Summary

Clean, well-scoped PR. Drops gfx900 (MI25) and gfx906 (MI50/MI60) support consistently across the codebase — supported versions list, fence_before_barrier() removal, fpus_per_core() inlining, and test updates are all complete with no orphaned references remaining. The removal aligns with AMD's current ROCm compatibility matrix.

One informational note left inline regarding pre-existing technical debt in the fpus_per_core hardcoding for RDNA architectures.

@github-actions github-actions Bot removed the claude-review Request a Claude AI code review for this PR label May 13, 2026
KanishAnand and others added 6 commits May 14, 2026 03:34
PiperOrigin-RevId: 915336621
PiperOrigin-RevId: 915338077
- Introduce `hlo_query::IsStandardAssociativeScan` to identify standard forward associative scans (single input/init, returning an `(output, carry)` tuple where the `carry` output is unused).
- Use this new helper in `ReduceWindowRewriter` and GPU `ScanRewriter` to simplify scan matching.
- In `ReduceWindowRewriter`, instead of skipping short scans, rewrite them directly into a single `kReduceWindow` operation.
- Remove the complex carry computation logic in `ReduceWindowRewriter`, as `IsStandardAssociativeScan` guarantees the carry output is dead.

PiperOrigin-RevId: 915338116
PiperOrigin-RevId: 915357414
When LayoutAssignment clones computations for conditional branches, the internal call graph needs to be rebuilt to include the newly cloned computations.

PiperOrigin-RevId: 915366372
@nurmukhametov nurmukhametov force-pushed the anurmukh/remove-gfx900-gfx906 branch from d4689d4 to df45abc Compare May 14, 2026 12:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants