[ROCm] Drop gfx900 (MI25) and gfx906 (MI50/MI60) support#847
[ROCm] Drop gfx900 (MI25) and gfx906 (MI50/MI60) support#847nurmukhametov wants to merge 6 commits into
Conversation
| static constexpr absl::string_view kList[] = {"gfx900", "gfx906"}; | ||
| return !IsThisGfxInAnyList(kList); | ||
| } | ||
| bool fence_before_barrier() const { return true; } |
There was a problem hiding this comment.
I don't think we need this anymore. Regarding in TF, please have a grep check there. We shouldn't use it anymore neither IIRC.
| desc.set_fpus_per_core(fpus_per_core(gcn_arch_name)); | ||
| // Source: | ||
| // https://www.amd.com/content/dam/amd/en/documents/instinct-business-docs/white-papers/amd-cdna2-white-paper.pdf | ||
| desc.set_fpus_per_core(128); |
There was a problem hiding this comment.
Instead of hard coding, I guess you can adjust this properly. @Eetusjo investigated this before https://github.com/ROCm/frameworks-internal/issues/15846#issuecomment-4073834863
There was a problem hiding this comment.
I have added todo here, because I think a proper fix is out of scope of this PR.
|
@i-chaochen What bothers me is that TheRock seems to have gfx900 and gfx906 builds https://therock-hud-dev.amd.com/ |
draganmladjenovic
left a comment
There was a problem hiding this comment.
Until we figure it out if TheRock needs this.
No, but JAX and us never commit we will have gfx900 and gfx906 build, there is no gfx900 and gfx906 on JAX/Pytorch build neither. https://rocm.docs.amd.com/en/latest/compatibility/compatibility-matrix.html |
|
Yes, thanks for @jamestangg found this table https://github.com/ROCm/TheRock/blob/main/SUPPORTED_GPUS.md
Since they just need a build pass, I don't think it's related to what we rm the support here in XLA. We should be no problem to upstream this drop? |
I am ready to upstream this. Any objections, @draganmladjenovic ? |
| // TODO(ROCm): replace this hardcoded value with a per-arch lookup table and | ||
| // populate scalar_unit_description / matrix_unit_description so the perf | ||
| // model picks the right FP32 path (vector vs. matrix). | ||
| desc.set_fpus_per_core(128); |
There was a problem hiding this comment.
nit (pre-existing, not introduced by this PR): The hardcoded 128 is correct for all remaining CDNA architectures (gfx908, gfx90a), but RDNA architectures (gfx1030, gfx1100, gfx1101, gfx1200, gfx1201, gfx1250) have 64 FP32 stream processors per CU. The old fpus_per_core() function also defaulted to 128 for everything except gfx906, so this preserves existing behavior — just noting that the TODO here is still load-bearing for RDNA correctness.
Claude Review SummaryClean, well-scoped PR. Drops gfx900 (MI25) and gfx906 (MI50/MI60) support consistently across the codebase — supported versions list, One informational note left inline regarding pre-existing technical debt in the |
- Introduce `hlo_query::IsStandardAssociativeScan` to identify standard forward associative scans (single input/init, returning an `(output, carry)` tuple where the `carry` output is unused). - Use this new helper in `ReduceWindowRewriter` and GPU `ScanRewriter` to simplify scan matching. - In `ReduceWindowRewriter`, instead of skipping short scans, rewrite them directly into a single `kReduceWindow` operation. - Remove the complex carry computation logic in `ReduceWindowRewriter`, as `IsStandardAssociativeScan` guarantees the carry output is dead. PiperOrigin-RevId: 915338116
PiperOrigin-RevId: 915357414
When LayoutAssignment clones computations for conditional branches, the internal call graph needs to be rebuilt to include the newly cloned computations. PiperOrigin-RevId: 915366372
d4689d4 to
df45abc
Compare

[Do not merge]
Motivation
For internal review.
Submission Checklist