Commit 2e578c6

rebased
Signed-off-by: xintin <gaurav.verma@amd.com>
1 parent 2c03c78 commit 2e578c6

1 file changed

Lines changed: 5 additions & 0 deletions

wave_lang/kernel/compiler/wave_codegen/read_write.py

@@ -1382,6 +1382,11 @@ def _write_permlane_pack_to_global(
     duplicate store), avoiding divergent control flow. The buffer
     descriptor's ``valid_bytes`` handles out-of-bounds suppression.
 
+    TODO: Eliminate duplicate stores by using both outputs of
+    ``permlane16_swap``, letting each lane write the partner's assembled
+    data to the partner's destination address so every lane performs a
+    unique store.
+
     Preconditions:
     - The kernel must use swapped MFMA operands (B as LHS, A as RHS)
       so the accumulator's 4-contiguous values align with the output
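
The idea behind the TODO can be illustrated with a toy Python model of a 64-lane wave. This is only a sketch under stated assumptions, not code from `read_write.py`: the swap is modeled as exchanging odd rows of the first operand with even rows of the second (one row being 16 lanes), and the per-lane data layout, the `even_partner` helper, and the use of lane indices as stand-in addresses are all hypothetical.

```python
# Toy model of a 64-lane wave; NOT the Wave compiler's implementation.
WAVE_SIZE = 64
ROW = 16  # one "row" is 16 lanes


def permlane16_swap(a, b):
    """Model the swap: odd rows of `a` trade places with even rows of `b`.

    Returns (new_a, new_b), each holding one value per lane.
    """
    new_a, new_b = list(a), list(b)
    for lane in range(WAVE_SIZE):
        if (lane // ROW) % 2 == 1:   # lane sits in an odd row
            partner = lane - ROW      # same column in the preceding even row
            new_a[lane] = b[partner]
            new_b[partner] = a[lane]
    return new_a, new_b


def even_partner(lane):
    """Hypothetical helper: the even-row lane of the pair `lane` belongs to."""
    return lane - ROW if (lane // ROW) % 2 == 1 else lane


# Assumed layout: every lane holds two values, tagged (lane, slot).
a = [(lane, 0) for lane in range(WAVE_SIZE)]
b = [(lane, 1) for lane in range(WAVE_SIZE)]
new_a, new_b = permlane16_swap(a, b)

# Using both outputs, each lane ends up owning a distinct assembled pair.
packets = [(new_a[lane], new_b[lane]) for lane in range(WAVE_SIZE)]

# Duplicate-store scheme (assumed): both lanes of a pair write to one address,
# represented here by the even-row lane's index.
duplicate_addresses = [even_partner(lane) for lane in range(WAVE_SIZE)]

print(len(set(duplicate_addresses)), "distinct addresses across", WAVE_SIZE, "stores")
print(len(set(packets)), "distinct packets when both swap outputs are used")
```

In this model the duplicate-store scheme touches only 32 distinct addresses with 64 stores, while using both swap outputs leaves every lane holding a different packet, which is the precondition for every lane performing a unique store.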
