1 parent 2c03c78 commit 2e578c6
1 file changed
wave_lang/kernel/compiler/wave_codegen/read_write.py
@@ -1382,6 +1382,11 @@ def _write_permlane_pack_to_global(
  duplicate store), avoiding divergent control flow. The buffer
  descriptor's ``valid_bytes`` handles out-of-bounds suppression.

+ TODO: Eliminate duplicate stores by using both outputs of
+ ``permlane16_swap``, letting each lane write the partner's assembled
+ data to the partner's destination address so every lane performs a
+ unique store.
+
  Preconditions:
  - The kernel must use swapped MFMA operands (B as LHS, A as RHS)
    so the accumulator's 4-contiguous values align with the output
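
The goal stated in the TODO can be illustrated with a small host-side simulation. This is only a toy model, not the code in `read_write.py` and not the real semantics of ``permlane16_swap``: the 32-lane wave, the `lane ^ 16` pairing, the two-fragments-per-lane layout, and the helpers `toy_swap`, `duplicate_stores`, and `unique_stores` are all assumptions made for illustration. It compares a scheme where both lanes of a pair store the same assembled data with one where a two-output exchange leaves each lane holding a different assembled row.

```python
ROW = 16          # lanes per row in this toy model (assumption)
LANES = 2 * ROW   # toy wave made of two rows (assumption)


def partner(lane):
    """Lane in the other row at the same position (assumed pairing)."""
    return lane ^ ROW


def toy_swap(a, b):
    """Hypothetical two-output exchange, NOT the real instruction.

    Lanes in the first row end up with (a[lane], a[partner]);
    lanes in the second row end up with (b[partner], b[lane]).
    """
    out0, out1 = [], []
    for lane in range(LANES):
        if lane < ROW:
            out0.append(a[lane])
            out1.append(a[partner(lane)])
        else:
            out0.append(b[partner(lane)])
            out1.append(b[lane])
    return out0, out1


def duplicate_stores(a, b):
    """Every lane assembles and stores both rows of its pair."""
    stores = []
    for lane in range(LANES):
        lo, hi = min(lane, partner(lane)), max(lane, partner(lane))
        for tile, frag in enumerate((a, b)):
            stores.append(((tile, lo), (frag[lo], frag[hi])))
    return stores


def unique_stores(a, b):
    """Each lane stores only the one row it holds after the exchange."""
    out0, out1 = toy_swap(a, b)
    stores = []
    for lane in range(LANES):
        tile = 0 if lane < ROW else 1
        stores.append(((tile, lane % ROW), (out0[lane], out1[lane])))
    return stores


# Placeholder fragments: each lane owns one fragment of tile "a" and one of "b".
frag_a = [("a", lane) for lane in range(LANES)]
frag_b = [("b", lane) for lane in range(LANES)]
```

In this model both schemes leave the same contents in memory, but the second issues half as many stores and never writes an address twice, with no per-lane branching in either case.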