Updated client code to handle A/B=0 for General Batched GEMM and added corresponding tests #8412

Merged
mpanoop merged 2 commits into ROCm:develop from mpanoop:general_batched_handle_A_B_zero
Jun 16, 2026
@mpanoop mpanoop commented Jun 15, 2026

Contributor

Motivation

The GEMM kernels in hipBLASLt initialize SrdA and SrdB in every workgroup from the AddressA and AddressB values passed in as kernel arguments, but the addresses held in SrdA and SrdB are only dereferenced when alpha != 0. With General Batched GEMM, however, AddressA and AddressB each point to a pointer array on the device, so they must themselves be dereferenced to initialize SrdA and SrdB, and that happens before the alpha != 0 check. The rocBLAS batched GEMM bad-args tests exposed this gap. The PR at #8082 depends on this fix.
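The extra level of indirection can be illustrated with a small Python model. This is only a sketch, not the generated kernel assembly: device memory is modeled as a dictionary, and `load_batch_base`, `DeviceMemory`, and the 8-byte pointer size are assumptions introduced for illustration.

```python
from typing import Dict, Optional

# Hypothetical model of device memory: address -> stored 64-bit value.
DeviceMemory = Dict[int, int]

POINTER_SIZE = 8  # assumed size of one device pointer, in bytes


def load_batch_base(mem: DeviceMemory, ptr_array_addr: int, batch: int) -> Optional[int]:
    """Return the base address of one batch's matrix, or None if no array was passed.

    In general batched mode the kernel argument is the address of a device
    array of pointers, so one load is needed before the descriptor can be
    initialized. The zero check models a guard placed before that load.
    """
    if ptr_array_addr == 0:
        return None  # skip the dereference entirely
    return mem[ptr_array_addr + POINTER_SIZE * batch]
```

In this model, dropping the zero check would make `mem[0 + POINTER_SIZE * batch]` raise a `KeyError`, which stands in for the invalid device read.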

Technical Details

When K = 0, the A and B matrices have size 0 as well. The bad-args tests then exercise two scenarios:

  1. A = 0, B = 0 and alpha = 0: As long as beta != 0 and C/D != 0, this is still a valid problem. The previous implementation failed here because initializing SrdA and SrdB dereferenced the pointer arrays passed in AddressA and AddressB before the alpha != 0 check was reached. This PR adds an AddressA/B != 0 check before that dereference in the kernel assembly.
  2. A = 0, B = 0 and alpha != 0: This results in an invalid data pointer error in the rocblaslt_matmul() function (handled on the host side).
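The two scenarios can be summarized as a small decision function. This is a sketch of the rules as stated in this description only, not the actual validation in rocblaslt_matmul(); the names `Outcome` and `classify_null_ab` are hypothetical, and combinations the description does not mention are reported as not covered rather than guessed.

```python
from enum import Enum


class Outcome(Enum):
    VALID = "valid"
    INVALID_POINTER = "invalid data pointer"
    NOT_COVERED = "not covered by this description"


def classify_null_ab(alpha: float, beta: float, c_ptr: int, d_ptr: int) -> Outcome:
    """Classify a K = 0 problem in which both A and B are null (0)."""
    if alpha != 0:
        # Scenario 2: rejected on the host side.
        return Outcome.INVALID_POINTER
    if beta != 0 and c_ptr != 0 and d_ptr != 0:
        # Scenario 1: the kernel still runs, so it must not dereference A/B.
        return Outcome.VALID
    return Outcome.NOT_COVERED
```

Only the `VALID` case reaches the kernel, which is why the null check has to live in the kernel assembly rather than on the host.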

Test Plan

hipBLASLt previously had no test for this case. This PR adds a new test with K = 0, and the client code for General Batched GEMM now checks the sizes of the A and B matrices to decide whether to pass the pointer array or just nullptr to the kernel. This mimics the behavior of the rocBLAS test.
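The client-side decision described above can be sketched as follows. The real client code is not shown in this description, so `batched_pointer_arg` and its parameters are hypothetical, and Python's `None` stands in for nullptr.

```python
from typing import List, Optional


def batched_pointer_arg(rows: int, cols: int, ptr_array: List[int]) -> Optional[List[int]]:
    """Return the pointer array to pass to the kernel, or None for an empty matrix.

    A matrix with zero elements has nothing to point at, so the client
    passes a null argument instead of a pointer array.
    """
    if rows * cols == 0:
        return None
    return ptr_array


# Usage: with M = N = 4 and K = 0, A (M x K) and B (K x N) are both empty.
m, n, k = 4, 4, 0
a_arg = batched_pointer_arg(m, k, [0xA000, 0xA100])
b_arg = batched_pointer_arg(k, n, [0xB000, 0xB100])
```

Here both `a_arg` and `b_arg` are `None`, which matches the A = 0, B = 0 input that the new K = 0 test feeds to the kernel.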

Test Result

All tests pass locally.

Submission Checklist

codecov-commenter commented Jun 15, 2026

Codecov Report

❌ Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...blaslt/tensilelite/Tensile/KernelWriterAssembly.py 0.00% 2 Missing ⚠️

❌ Your project status has failed because the head coverage (77.83%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8412      +/-   ##
===========================================
+ Coverage    65.05%   65.17%   +0.12%     
===========================================
  Files         2597     2597              
  Lines       403820   404619     +799     
  Branches     60162    60333     +171     
===========================================
+ Hits        262683   263693    +1010     
+ Misses      121909   121088     -821     
- Partials     19228    19838     +610     
Flag Coverage Δ *Carryforward flag
TensileLite 31.04% <0.00%> (+1.35%) ⬆️
hipBLAS 90.65% <ø> (ø) Carriedforward from 08cd7c6
hipBLASLt 41.20% <ø> (+0.02%) ⬆️
hipCUB 82.68% <ø> (ø) Carriedforward from 08cd7c6
hipDNN 86.68% <ø> (ø) Carriedforward from 08cd7c6
hipFFT 50.97% <ø> (ø) Carriedforward from 08cd7c6
hipRAND 76.12% <ø> (ø) Carriedforward from 08cd7c6
hipSOLVER 69.18% <ø> (ø) Carriedforward from 08cd7c6
hipSPARSE 86.55% <ø> (ø) Carriedforward from 08cd7c6
rocBLAS 48.10% <ø> (ø) Carriedforward from 08cd7c6
rocFFT 49.48% <ø> (ø) Carriedforward from 08cd7c6
rocRAND 57.02% <ø> (ø) Carriedforward from 08cd7c6
rocSOLVER 77.83% <ø> (ø) Carriedforward from 08cd7c6
rocSPARSE 72.64% <ø> (ø) Carriedforward from 08cd7c6
rocThrust 91.34% <ø> (ø) Carriedforward from 08cd7c6

*This pull request uses carry forward flags.

Files with missing lines Coverage Δ
...blaslt/tensilelite/Tensile/KernelWriterAssembly.py 7.98% <0.00%> (+0.30%) ⬆️

... and 20 files with indirect coverage changes


@mpanoop mpanoop marked this pull request as ready for review June 15, 2026 05:47
@mpanoop mpanoop requested a review from a team as a code owner June 15, 2026 05:47
Comment thread projects/hipblaslt/tensilelite/Tensile/KernelWriterAssembly.py Outdated

@nakajee nakajee left a comment

Looks good

@mpanoop mpanoop merged commit 63d4e3b into ROCm:develop Jun 16, 2026
54 checks passed