Skip to content

[stinkytofu] Expand RawAsmParser coverage for custom operands, MX modifiers, and DPP#7097

Open
darrenhsieh-amd wants to merge 3 commits into
developfrom
users/darrenh/rawasmparser-coverage
Open

[stinkytofu] Expand RawAsmParser coverage for custom operands, MX modifiers, and DPP#7097
darrenhsieh-amd wants to merge 3 commits into
developfrom
users/darrenh/rawasmparser-coverage

Conversation

@darrenhsieh-amd
Copy link
Copy Markdown
Contributor

@darrenhsieh-amd darrenhsieh-amd commented May 6, 2026

Motivation

Instructions with custom operand syntax, MX matrix format modifiers, or DPP encodings silently fall through to TEXTBLOCK in RawAsmParser. As TEXTBLOCK, they are opaque to all optimization passes. This PR fixes the parser to recognize these instruction patterns so they can be properly analyzed and transformed.

Technical Details

Per-field operand dispatch for custom syntax instructions

  • Refactor operand parsing from a generic while-loop to a per-field for-loop over operandFields, dispatching by FieldType
  • Add dedicated parsers: parseDelayAluSyntax(), parseWaitAluSyntax(), parseHwregOperand()
  • Enables mixed-type instructions (e.g. s_setreg_IMM32_b32 with hwreg + simm32 operands)
  • Unhandled custom FieldTypes fall back to TEXTBLOCK gracefully

Wire MX matrix format modifiers

  • Map MC_VOP3PX2/MC_VOP3PX3 microcode formats to mod.mfma modifier namespace
  • Store key:Identifier modifier values (e.g. matrix_a_fmt:MATRIX_FMT_FP8) in generic fields collection instead of marking as unrepresentable
  • Add mod.mfma deserialize support for matrix_a_fmt, matrix_b_fmt, matrix_a_scale_fmt, matrix_b_scale_fmt, matrix_a_reuse, matrix_b_reuse

Add DPP modifier support

  • Add field-based DPP detection via hasDPPFields() — after collecting modifier tokens, assigns mod.dpp when DPP keys are present, avoiding interference with non-DPP VOP modifiers
  • Add dppCtrl string field to DPPModifiers for unmodelled DPP modes (row_xmask, row_shl, row_ror, etc.)
  • Add mod.dpp deserialize support in ModifierSerializer

Test Plan

ctest pass
filechecks pass
stinkytofu-opt --arch gfx1250 --emit-asm round-trip correct

Submission Checklist

…tions

Instructions with custom operand syntax (s_delay_alu, s_wait_alu,
s_setreg_IMM32_b32) were falling to TEXTBLOCK because the operand
parser used a generic while-loop that called parseOneRegister on
every operand, regardless of FieldType.

Refactor the operand parsing from a while-loop to a per-field for-loop
over operandFields, dispatching each field by FieldType via
parseCustomOperand():
- FieldType::delay -> parseDelayAluSyntax()
- FieldType::wait_alu -> parseWaitAluSyntax()
- FieldType::hwreg -> parseHwregOperand()
- register/immediate -> parseOneRegister with existing recovery logic

Custom operand parsers are called directly in the operand loop,
consistent with their role as operands rather than trailing modifiers.
Uses field.isDest directly from operandFields instead of pre-counted
numDest.

This also enables mixed-type instructions like s_setreg_IMM32_b32
(hwreg + simm32) where custom and register operands coexist.
v_wmma_scale_* instructions with matrix_a_fmt/matrix_b_fmt modifiers
were falling to TEXTBLOCK because their microcode format (MC_VOP3PX2/
MC_VOP3PX3) had no modKey mapping, and key:Identifier modifier values
(e.g. matrix_a_fmt:MATRIX_FMT_FP8) were unconditionally marked as
sawUnrepresentable.

Add MC_VOP3PX2/MC_VOP3PX3 to the parseModifiers microcode switch,
mapping to "mod.mfma" (same modifier path as the rocisa pipeline).
Store key:Identifier modifier values in the generic fields collection
instead of marking unrepresentable — formats without a modifier
namespace are caught by the existing modKey.empty() check.

Add mod.mfma handler for matrix_a_fmt, matrix_b_fmt, matrix_a_reuse,
matrix_b_reuse. Build inputPermute from individual fields in
ModifierSerializer::deserialize, consistent with the rocisa pipeline
path (ToStinkyTofuUtils::extractMatrixFormatStr).
VOP instructions with DPP modifiers (row_xmask, row_shl, row_ror,
etc.) were falling to TEXTBLOCK because VOP microcode formats have
no modKey mapping. DPP is an add-on encoding variant layered on top
of the base VOP format — the assembler picks a wider encoding when
DPP modifiers are present.

Add field-based DPP detection after modifier token collection via
hasDPPFields() helper. When modKey is empty and DPP fields are
present, assign modKey = "mod.dpp" post-hoc. This avoids mapping all
VOP formats to DPP which would interfere with non-DPP VOP modifiers.

Add dppCtrl string field to DPPModifiers for unmodelled DPP modes
(row_xmask, row_shl, etc.). Existing typed fields (row_shr,
row_bcast, bound_ctrl) preserved for rocisa pipeline compatibility.

Add mod.dpp deserialize support in ModifierSerializer.
@codecov-commenter
Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

❌ Your project status has failed because the head coverage (77.83%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #7097   +/-   ##
========================================
  Coverage    65.26%   65.26%           
========================================
  Files         2083     2083           
  Lines       323506   323506           
  Branches     42438    42438           
========================================
  Hits        211111   211111           
  Misses       94821    94821           
  Partials     17574    17574           
Flag Coverage Δ *Carryforward flag
hipBLAS 90.65% <ø> (ø) Carriedforward from 352e78c
hipBLASLt 39.86% <ø> (ø)
hipCUB 82.21% <ø> (ø) Carriedforward from 352e78c
hipDNN 85.51% <ø> (ø) Carriedforward from 352e78c
hipFFT 56.25% <ø> (ø) Carriedforward from 352e78c
hipRAND 76.12% <ø> (ø) Carriedforward from 352e78c
hipSOLVER 69.24% <ø> (ø) Carriedforward from 352e78c
hipSPARSE 84.70% <ø> (ø) Carriedforward from 352e78c
rocBLAS 48.11% <ø> (ø) Carriedforward from 352e78c
rocFFT 48.18% <ø> (ø) Carriedforward from 352e78c
rocPRIM 38.99% <ø> (ø) Carriedforward from 352e78c
rocRAND 57.03% <ø> (ø) Carriedforward from 352e78c
rocSOLVER 77.83% <ø> (ø) Carriedforward from 352e78c
rocSPARSE 72.82% <ø> (ø) Carriedforward from 352e78c

*This pull request uses carry forward flags. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@darrenhsieh-amd
Copy link
Copy Markdown
Contributor Author

testing asm for the stinkytofu-opt round-trip:

/******************************************/
/* Begin Kernel                           */
/******************************************/
.amdgcn_target "amdgcn-amd-amdhsa--gfx1250"
.text
.protected test_3commits
.globl test_3commits
.p2align 8
.type test_3commits,@function
.section .rodata,#alloc
.p2align 6
.amdhsa_kernel test_3commits
  .amdhsa_user_sgpr_kernarg_segment_ptr 1
  .amdhsa_next_free_vgpr 256
  .amdhsa_next_free_sgpr 32
  .amdhsa_group_segment_fixed_size 32768
  .amdhsa_wavefront_size32 1
  .amdhsa_private_segment_fixed_size 0
  .amdhsa_system_sgpr_workgroup_id_x 1
  .amdhsa_system_sgpr_workgroup_id_y 1
  .amdhsa_system_sgpr_workgroup_id_z 1
  .amdhsa_system_vgpr_workitem_id 0
  .amdhsa_float_denorm_mode_32 3
  .amdhsa_float_denorm_mode_16_64 3
.end_amdhsa_kernel
.text

test_3commits:

label_Begin:

/******************************************/
/* Commit 1: s_delay_alu custom operands  */
/******************************************/
s_mov_b32 s0, 0
s_delay_alu instid0(SALU_CYCLE_1)
s_mov_b32 s1, s0
s_delay_alu instid0(SALU_CYCLE_2)
s_add_u32 s2, s0, s1
v_mov_b32 v0, s0
s_delay_alu instid0(VALU_DEP_1)
v_add_nc_u32 v1, v0, 1
s_delay_alu instid0(VALU_DEP_2)
v_add_nc_u32 v2, v1, v0
s_delay_alu instid0(VALU_DEP_3)
v_mov_b32 v3, v2
s_delay_alu instid0(VALU_DEP_4)
v_mov_b32 v4, v3

/******************************************/
/* Commit 1: s_wait_alu custom operands   */
/******************************************/
s_wait_alu depctr_va_vdst(0)
s_wait_alu depctr_sa_sdst(0)
s_wait_alu depctr_vm_vsrc(0)

/******************************************/
/* Commit 1: s_setreg hwreg custom opnd   */
/******************************************/
s_setreg_IMM32_b32 hwreg(26,4,1), 1
s_setreg_IMM32_b32 hwreg(26,0,2), 1
s_setreg_IMM32_b32 hwreg(26,0,2), 0

/******************************************/
/* Commit 2: MX matrix format modifiers   */
/******************************************/
v_wmma_scale_f32_16x16x128_f8f6f4 v[24:31], v[32:47], v[48:55], 0, v0, v1 matrix_a_fmt:MATRIX_FMT_FP8 matrix_b_fmt:MATRIX_FMT_FP4
v_wmma_scale_f32_16x16x128_f8f6f4 v[56:63], v[64:79], v[80:87], 0, v2, v3 matrix_a_fmt:MATRIX_FMT_FP8 matrix_b_fmt:MATRIX_FMT_FP4 matrix_b_reuse
v_wmma_scale_f32_16x16x128_f8f6f4 v[88:95], v[96:111], v[112:119], 0, v4, v5 matrix_a_fmt:MATRIX_FMT_FP8 matrix_b_fmt:MATRIX_FMT_FP8 matrix_a_reuse

/******************************************/
/* Commit 3: DPP modifiers                */
/******************************************/
v_mov_b32 v10, v11 row_xmask:8
v_mov_b32 v12, v13 row_shr:4
v_add_f32 v14, v15, v16 row_shr:1
v_mov_b32 v17, v18 row_shl:2
v_mov_b32 v19, v20 row_bcast:15

/******************************************/
/* Mixed: normal instructions around them */
/******************************************/
s_mov_b32 s3, 42
v_add_nc_u32 v21, v10, v12
s_delay_alu instid0(VALU_DEP_1)
v_add_nc_u32 v22, v21, v14

s_endpgm

label_End:

Comment thread shared/stinkytofu/src/serialization/asm/ModifierSerializer.cpp
Comment thread shared/stinkytofu/src/serialization/asm/ModifierSerializer.cpp
Copy link
Copy Markdown
Contributor

@cycheng cycheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some notes.

Comment thread shared/stinkytofu/include/stinkytofu/ir/asm/StinkyModifiers.hpp
if (fields.count("glc")) modFields["glc"] = "true";
if (fields.count("nv")) modFields["nv"] = "true";

} else if (modKey == "mod.dpp") {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel there is a future TODO:
We now have 4 conversions for modifier:

  1. StinkyModifier to assembly string (asm emitter)
  2. StinkyModifier to stinkytofu IR string (modifier serializer)
  3. assembly string to StinkyModifier (raw asm parser)
  4. stinkytofu IR string to StinkyModifier (modifier de-serializer)

4 for instruction:

  1. StinkyInstruction to assembly string (asm emitter)
  2. StinkyInstruction to stinkytofu IR string (IRConverter)
  3. assembly string to StinkyInstruction (raw asm parser)
  4. stinkytofu IR string to StinkyInstruction (IRConverter)

Maybe we could define a way that can handle serializer/de-serializer in a unified way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants