Skip to content

[Codegen] Add phase group information into a new enum for shared mem info#24471

Open
Muzammiluddin-Syed-ECE wants to merge 10 commits into
mainfrom
users/muzasyed/sharedmem
Open

[Codegen] Add phase group information into a new enum for shared mem info#24471
Muzammiluddin-Syed-ECE wants to merge 10 commits into
mainfrom
users/muzasyed/sharedmem

Conversation

@Muzammiluddin-Syed-ECE
Copy link
Copy Markdown
Contributor

This is the first of two PRs meant to improve the xor shuffling strategy for resolving bank conflicts. See: #24428

This change does two things:

  1. Adds phase group information for CDNA4 arch, allowing us to more rigorously simulate bank conflicts.
  2. Introduces a new enum "SharedMemoryModel" behind which we move all shared memory related information such as number of banks and phase group info.

Copy link
Copy Markdown
Member

@kuhar kuhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the shared mem model so coarse-grained? For example, the number of banks changed between cdna3 and cdna4.

@Yu-Zhewen
Copy link
Copy Markdown
Contributor

I was reading the original PR comment: #24428 (comment). Just wondering, would it be a better choice to keep bank_count, bank_width, and phase_schedule as orthogonal fields? Bundling them into one enum would mean adding a new SharedMemoryModel case for every change, which might feel like a bit ofburden

@Muzammiluddin-Syed-ECE
Copy link
Copy Markdown
Contributor Author

Muzammiluddin-Syed-ECE commented May 13, 2026

Why is the shared mem model so coarse-grained? For example, the number of banks changed between cdna3 and cdna4.

The coarseness reflects how little the shared mem model changes in the features it tracks across architectures.

// CDNA1-3: 32 banks, contiguous phase scheduling.
def IREEGPU_SharedMemModelCDNA
    : I32EnumAttrCase<"CDNA", 1, "cdna">;
// CDNA4 (gfx950): 64 banks, non-contiguous 4-phase scheduling for
// ds_read_b128. Phase table determined by empirical measurement.
def IREEGPU_SharedMemModelCDNA4
    : I32EnumAttrCase<"CDNA4", 2, "cdna4">;

For your example, there is only a real distinction in the cdna line between cdna1-3 and cdna4 so only cdna4 is given its own enum.

edit: Thinking on it longer, we don't lose anything by having an entry per arch and gain clarity. ill just make it more fine-grained.

@Muzammiluddin-Syed-ECE
Copy link
Copy Markdown
Contributor Author

I was reading the original PR comment: #24428 (comment). Just wondering, would it be a better choice to keep bank_count, bank_width, and phase_schedule as orthogonal fields? Bundling them into one enum would mean adding a new SharedMemoryModel case for every change, which might feel like a bit ofburden

Because these fields represent hardware details, they are extremely stable and shouldn't change beyond adding new entries for every new piece of hardware.

Comment thread compiler/src/iree/compiler/Codegen/Dialect/GPU/IR/IREEGPUEnums.td
Comment thread compiler/src/iree/compiler/Codegen/Dialect/GPU/TargetUtils/KnownTargets.cpp Outdated
Comment thread compiler/src/iree/compiler/Codegen/Dialect/GPU/TargetUtils/KnownTargets.cpp Outdated
// Write operations (ds_write_*) may have different phase scheduling, but
// we haven't needed to handle write-side bank conflicts yet.
std::optional<SmallVector<SmallVector<int64_t>>>
getPhaseGroups(SharedMemoryModel model, int64_t readBytes, int64_t numThreads) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you rename it to indicate these are read phase groups? Writes have their own groups.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather add a todo to add an access type function argument like AccessType rw where AccessType is an enum with values {READ, WRITE}.

getPhaseGroups(SharedMemoryModel model, int64_t readBytes, int64_t numThreads, AccessType rw)

…ry information

Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants