Test coverage gaps: recent MIGraphX lowering passes and CI flakiness (Mar 2026) #2297

@cursor

Context

This is an automated test-coverage analysis triggered by PR #2295 (marking large-kernel-no-scavenge.mlir as XFAIL due to intermittent lowering failure). While that PR is a CI workaround, it and the surrounding recent merges expose several meaningful test coverage gaps across new lowering passes and bug fixes. Below are the areas most worth hardening, ordered by blast radius.


1. 🔥 arith.maximumf/minimumf — No behavioral test for the "don't expand" pipeline option

Commit: e40a31807f51 [EXTERNAL] Stop expanding float min/max ops

What changed: Pipelines.cpp now sets includeFloatMinMax = false so that arith.maximumf / arith.minimumf / arith.maxnumf / arith.minnumf are not expanded into compare-and-select sequences, relying instead on the AMDGPU backend's native v_max_*/v_min_* instructions.

Current test coverage: mlir/test/rocmlir-driver/pipelines.mlir only checks that the printed pipeline string contains include-float-min-max=false. There is no test that:

  • Verifies arith.maximumf is preserved (not expanded) when flowing through the rocMLIR full pipeline.
  • Validates the NaN semantics difference: the old expand path propagates NaN from either operand; the native instruction may have different IEEE handling on specific GFX targets.
  • Checks arith.minnumf / arith.maxnumf (which have "num" semantics: when exactly one operand is NaN, the non-NaN operand is returned).

Risk: If the backend does not handle these ops for a target, compilation fails silently or emits wrong code. NaN-in/NaN-out behavior is a correctness concern for attention masking and fusion kernels.
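The NaN distinction above can be pinned down with a small reference model. This is a hedged sketch in Python, not rocMLIR code: `maximumf` and `maxnumf` model the documented semantics of the two ops, while `naive_select_max` is a hypothetical compare-and-select lowering included only to show why NaN inputs need their own test. Signed-zero ordering is deliberately not modeled.

```python
import math


def maximumf(a: float, b: float) -> float:
    """Model of arith.maximumf: the result is NaN if either operand is NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return max(a, b)


def maxnumf(a: float, b: float) -> float:
    """Model of arith.maxnumf: a single NaN operand is ignored."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return max(a, b)


def naive_select_max(a: float, b: float) -> float:
    """Hypothetical naive lowering: select(cmpf ogt(a, b), a, b).

    An ordered comparison is false whenever an operand is NaN, so the
    result depends on which side the NaN is on.
    """
    return a if a > b else b


if __name__ == "__main__":
    nan = math.nan
    for name, fn in [("maximumf", maximumf),
                     ("maxnumf", maxnumf),
                     ("naive_select_max", naive_select_max)]:
        print(name, fn(nan, 1.0), fn(1.0, nan))
```

In this model the naive select returns 1.0 for (NaN, 1.0) but NaN for (1.0, NaN), which matches neither op. A behavioral test should therefore place the NaN in both operand positions.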

Suggested tests to add:

File: mlir/test/rocmlir-driver/large-kernel-float-minmax.mlir (new)

// RUN: rocmlir-gen ... | rocmlir-driver --kernel-pipeline=full | FileCheck %s
// CHECK-NOT: arith.cmpf
// CHECK-NOT: arith.select
// CHECK: arith.maximumf

File: mlir/test/Dialect/Arith/expand-ops-amdgpu.mlir (new)

  • A test running arith-expand="include-float-min-max=false" on a function containing all four float min/max ops and verifying they pass through unchanged.
  • A test with a NaN input verifying arith.maximumf(NaN, x) == NaN vs arith.maxnumf(NaN, x) == x when running with include-float-min-max=false.

2. 🔥 Flaky large-kernel-no-scavenge test — Root cause untested

Commit: 5e908079c7da Mark large-kernel-no-scavenge as XFAIL

What happened: rocmlir-driver --kernel-pipeline=full intermittently produces empty output (printing "Lowering failed" to stderr) for a specific conv_bwd_data configuration with --perf_config 'v3:128,64,8,128,64,16,1,1,2,1,1'. The test was marked XFAIL rather than fixed.

Risk: The flakiness signal is being suppressed, not resolved. If the underlying lowering failure is non-deterministic (e.g. a resource race, register pressure corner case, or unhandled fallback in the scavenger-disabled path), it could affect other large convolution or attention kernels in production.

Suggested tests to add:

File: mlir/test/rocmlir-driver/large-kernel-no-scavenge-error.mlir (new)

  • A test that explicitly invokes the same gen command and verifies it does not print "Lowering failed" to stderr (using FileCheck --implicit-check-not).
  • A deterministic stress test that runs the same command 3 times and checks all three succeed (a shell RUN loop), to surface flakiness early in CI rather than masking it.
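Outside of lit, the same repeat-and-check idea can be sketched as a small Python harness. This is a minimal illustration under stated assumptions: `run_repeatedly` is a hypothetical helper, the command passed to it is a stand-in (the real rocmlir-gen / rocmlir-driver invocation is elided in this report and is not reproduced here), and the three failure conditions mirror the symptoms described above (non-zero exit, "Lowering failed" on stderr, empty output).

```python
import subprocess
import sys


def run_repeatedly(cmd, runs=3, marker="Lowering failed"):
    """Run `cmd` `runs` times and return a list of (run_index, reason) failures."""
    failures = []
    for i in range(runs):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode != 0:
            failures.append((i, f"exit code {proc.returncode}"))
        elif marker in proc.stderr:
            failures.append((i, f"stderr contains {marker!r}"))
        elif not proc.stdout.strip():
            failures.append((i, "empty stdout"))
    return failures


if __name__ == "__main__":
    # Stand-in command; replace with the actual pipeline invocation.
    cmd = [sys.executable, "-c", "print('module {}')"]
    failures = run_repeatedly(cmd)
    print("failures:", failures)
    sys.exit(1 if failures else 0)
```

Reporting every failing run, instead of stopping at the first, gives a rough failure rate that helps distinguish a rare race from a near-deterministic failure.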

Additionally, the "Lowering failed" error path in rocmlir-driver itself should be tested:

  • Verify that when lowering fails, the driver exits with a non-zero code and prints a diagnostic that names the operation that failed (not just "Lowering failed" with no context).

3. migraphx.shaped parser crash fix — Parse-level errors untested

Commit: 839eb350e187 AIROCMLIR-546 Fixed parser crash from invalid !migraphx.shaped

What changed: MIXRShapedType::parse() in MIGraphX.cpp now calls parser.emitError() in three places instead of crashing via get() when stride/shape counts mismatch.

Current test coverage: mlir/test/Dialect/MIGraphX/invalid.mlir tests only the verifier (verify()), not the parser (parse()). The three new emitError() call sites are completely untested:

  1. Failure to parse <, dimension list, or element type.
  2. Failure to parse the stride dimension list in a non-scalar shaped type.
  3. Failure to parse the closing >.

Suggested tests to add in mlir/test/Dialect/MIGraphX/invalid.mlir:

// -----
// expected-error @+1 {{expected shaped dimension list with type}}
func.func @bad_parse_missing_gt(%arg: !migraphx.shaped<1xf32) { func.return }

// -----
// expected-error @+1 {{expected `,` and a `x`-separated list}}
func.func @bad_parse_missing_stride(%arg: !migraphx.shaped<1xf32>) { func.return }

// -----
// expected-error @+1 {{expected shaped dimension list with type}}
func.func @bad_parse_garbage(%arg: !migraphx.shaped<garbage>) { func.return }

Why it matters: Without parsing tests, a refactor of parse() could silently remove the error handling and restore the crashing behavior.
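To make the three parse-level error sites concrete, here is a toy Python model of a parser that reports errors instead of crashing. It is a sketch only: the grammar is a simplified assumption about `!migraphx.shaped<dims x type, strides>`, `parse_shaped` and `ParseError` are hypothetical names, and the message for the closing `>` and for the count mismatch are invented for illustration. It does not reproduce MIXRShapedType::parse().

```python
import re


class ParseError(Exception):
    """Stands in for a diagnostic emitted via parser.emitError()."""


# Zero or more `<int>x` dimensions followed by an element type such as f32 or si32.
_DIMS_AND_TYPE = re.compile(r"((?:\d+x)*)([a-z]+\d+)")
# A comma followed by an `x`-separated list of integer strides.
_STRIDES = re.compile(r",\s*(\d+(?:x\d+)*)")


def parse_shaped(text):
    """Parse a simplified shaped type into (dims, element_type, strides)."""
    prefix = "!migraphx.shaped"
    if not text.startswith(prefix):
        raise ParseError("expected !migraphx.shaped")
    rest = text[len(prefix):]

    # Site 1: `<`, dimension list, and element type.
    match = _DIMS_AND_TYPE.match(rest, 1) if rest.startswith("<") else None
    if match is None:
        raise ParseError("expected shaped dimension list with type")
    dims = [int(d) for d in match.group(1).split("x") if d]
    elem_type = match.group(2)
    rest = rest[match.end():]

    # Site 2: stride list, required only for non-scalar types.
    strides = []
    if dims:
        match = _STRIDES.match(rest)
        if match is None:
            raise ParseError("expected `,` and a `x`-separated list")
        strides = [int(s) for s in match.group(1).split("x")]
        rest = rest[match.end():]

    # Site 3: closing `>` (message text is hypothetical).
    if rest != ">":
        raise ParseError("expected `>`")

    # Count mismatch is reported, not asserted (message text is hypothetical).
    if len(strides) != len(dims):
        raise ParseError("stride count does not match shape rank")
    return dims, elem_type, strides
```

Each `raise` corresponds to one input that a negative test should exercise; a well-formed input such as `!migraphx.shaped<2x3xf32, 3x1>` parses to `([2, 3], "f32", [3, 1])` in this model.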


4. Broadcasting Linalg lowering — Error paths and edge cases untested

Commit: a8ae8acacbd0 [AIROCMLIR-552] Added Broadcasting Linalg Lowering Path

What changed: BroadcastConverter and MultiBroadcastConverter were rewritten in MIGraphXToLinalg.cpp to use linalg.broadcast instead of TOSA.

Current test coverage: Four tests in mixr-to-linalg-ops.mlir: axis=0, 4D multibroadcast, scalar multibroadcast, scalar broadcast.

Gaps:

| Missing case | Why it matters |
| --- | --- |
| broadcastDimensions.empty() in MultiBroadcastConverter (reshape-only, no broadcast needed) | This branch is taken when the input and output have the same non-unit dims; never tested |
| arith::ConstantOp + DenseElementsAttr::isSplat() fast path in MultiBroadcastConverter | Splat-constant optimization silently broken if isSplat() ever returns false for a constant |
| BroadcastConverter with axis > 0 and multi-dimensional input | Code conditionally strips trailing 1 dims; only axis=0 and scalar tested |
| Error path: "cannot convert output type to ranked tensor type" | No negative test |
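The empty-broadcastDimensions case can be illustrated with a small Python model. This is a sketch under an assumption: it treats a dimension as broadcast when the input extent is 1 and the output extent differs, for same-rank shapes. How MultiBroadcastConverter actually computes broadcastDimensions is not stated in this report, and `broadcast_dimensions` is a hypothetical helper.

```python
def broadcast_dimensions(in_shape, out_shape):
    """Return the indices of dimensions that must be broadcast.

    Assumes same-rank shapes where each input extent either equals the
    output extent or is 1. An empty result models the reshape-only branch.
    """
    if len(in_shape) != len(out_shape):
        raise ValueError("rank mismatch")
    dims = []
    for i, (src, dst) in enumerate(zip(in_shape, out_shape)):
        if src == dst:
            continue  # extent already matches, nothing to broadcast
        if src != 1:
            raise ValueError(f"dim {i}: cannot broadcast {src} to {dst}")
        dims.append(i)
    return dims


if __name__ == "__main__":
    print(broadcast_dimensions([1, 4, 1, 8], [2, 4, 3, 8]))
    print(broadcast_dimensions([1, 4, 1, 8], [1, 4, 1, 8]))
```

Under this model, the second call returns an empty list, which is the kind of input a test for the untested branch would need.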

Suggested file to update: mlir/test/Conversion/MIGraphXToLinalg/mixr-to-linalg-ops.mlir


5. migraphx.greater / migraphx.equal — Missing type and error coverage

Commit: 712f49ed5447 [AIROCMIR-446] Lower migraphx.greater/equal into linalg.generic

What changed: New BooleanElementwiseConverter<Greater> and BooleanElementwiseConverter<Equal> in MIGraphXToLinalg.cpp.

Current test coverage: 5 tests in migraphx-to-linalg-boolean.mlir covering i32, si32 (greater only), f32 for both ops.

Gaps:

  • f16 and bf16 types: these are the dominant compute types in rocMLIR attention and GEMM pipelines; no test verifies arith.cmpf ogt + arith.uitofp works correctly for f16 output.
  • migraphx.equal with si32 input.
  • No test for mismatched input types (should the converter reject or convert?). The code assumes operands share the same element type; if they don't, the linalg.generic body would emit a type error deep in lowering rather than a clear diagnostic.
  • No rank variation tests (rank-1 and rank-4 tensors).
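The op/type gaps above follow directly from a coverage matrix. As a minimal sketch, the Python below enumerates op and element-type combinations and subtracts the five combinations this report lists as covered; the names `OPS`, `TYPES`, `COVERED`, and `missing_cases` are illustrative, and rank variation and mismatched-type cases are left out because the report does not say which ranks the existing tests use.

```python
from itertools import product

OPS = ["greater", "equal"]
TYPES = ["i32", "si32", "f32", "f16", "bf16"]

# Combinations reported as covered: i32 and f32 for both ops, si32 for greater only.
COVERED = {
    ("greater", "i32"),
    ("greater", "si32"),
    ("greater", "f32"),
    ("equal", "i32"),
    ("equal", "f32"),
}


def missing_cases(ops=OPS, types=TYPES, covered=COVERED):
    """Return (op, type) pairs with no test, in op-major order."""
    return [case for case in product(ops, types) if case not in covered]


if __name__ == "__main__":
    for op, ty in missing_cases():
        print(f"migraphx.{op} with {ty}")
```

Keeping the covered set as data makes it cheap to re-run the check whenever a test is added to migraphx-to-linalg-boolean.mlir.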

Suggested file to update: mlir/test/Conversion/MIGraphXToLinalg/migraphx-to-linalg-boolean.mlir


6. Reshape helper — No-op and collapse-only paths untested

Commit: 529789d99c07 [AIROCMLIR-564] Lower migraphx.reshape using helper function

What changed: The reshapeValue() helper in MIGraphXToLinalg.cpp has three code paths:

  1. Same-shape early return (no-op).
  2. Collapse-only (single CollapseShapeOp).
  3. Collapse + expand (general case, tested).

Current test coverage: Only a 2D→3D reshape (expand) and a 3D→2D reshape (collapse) are tested.

Suggested tests in mixr-to-linalg-ops.mlir:

  • migraphx.reshape with identical input/output shape — should return the input value unchanged (no new ops).
  • migraphx.reshape that only requires a tensor.collapse_shape (e.g., 4x4xf32 → 16xf32).
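The three paths can be modeled in Python to help pick test shapes for each one. This is a hedged sketch: `classify_reshape` and `collapse_groups` are hypothetical names, and the rule used here (a reshape is collapse-only when the output dims are products of contiguous groups of input dims) is an assumption about how reshapeValue() decides, not a description of its code.

```python
import math


def collapse_groups(in_shape, out_shape):
    """Return reassociation groups if out_shape merges contiguous input dims, else None."""
    groups, i = [], 0
    for target in out_shape:
        group, prod = [], 1
        # Consume input dims until their product reaches the target extent.
        while i < len(in_shape) and (not group or prod != target):
            prod *= in_shape[i]
            group.append(i)
            i += 1
        if not group or prod != target:
            return None
        groups.append(group)
    # Fold any trailing unit dims into the last group.
    while groups and i < len(in_shape) and in_shape[i] == 1:
        groups[-1].append(i)
        i += 1
    return groups if i == len(in_shape) else None


def classify_reshape(in_shape, out_shape):
    """Classify a static reshape as 'no-op', 'collapse-only', or 'collapse+expand'."""
    in_shape, out_shape = list(in_shape), list(out_shape)
    if math.prod(in_shape) != math.prod(out_shape):
        raise ValueError("element counts differ")
    if in_shape == out_shape:
        return "no-op"
    if collapse_groups(in_shape, out_shape) is not None:
        return "collapse-only"
    return "collapse+expand"


if __name__ == "__main__":
    for src, dst in [([4, 4], [4, 4]), ([4, 4], [16]), ([2, 8], [2, 2, 4])]:
        print(src, "->", dst, ":", classify_reshape(src, dst))
```

In this model, 4x4 → 4x4 is the no-op path and 4x4 → 16 is the collapse-only path, matching the two suggested tests.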

Summary Table

| Area | File to Add/Update | Priority |
| --- | --- | --- |
| arith.maximumf preservation in full pipeline | mlir/test/rocmlir-driver/large-kernel-float-minmax.mlir (new) | High |
| arith-expand with include-float-min-max=false behavioral test | external/llvm-project/mlir/test/Dialect/Arith/expand-ops.mlir | High |
| large-kernel-no-scavenge deterministic stress + error path | mlir/test/rocmlir-driver/large-kernel-no-scavenge-error.mlir (new) | High |
| Parser crash fix: parse-level error paths | mlir/test/Dialect/MIGraphX/invalid.mlir | Medium |
| Broadcasting edge cases (empty broadcastDims, splat const, axis>0) | mlir/test/Conversion/MIGraphXToLinalg/mixr-to-linalg-ops.mlir | Medium |
| migraphx.greater/equal: f16, bf16, equal si32, negative | mlir/test/Conversion/MIGraphXToLinalg/migraphx-to-linalg-boolean.mlir | Medium |
| reshapeValue same-shape no-op + collapse-only | mlir/test/Conversion/MIGraphXToLinalg/mixr-to-linalg-ops.mlir | Low |

Generated by automated regression-test coverage analysis on 2026-03-13, triggered by PR #2295.
