Skip to content

New rocblas hipblaslt integration#8082

Merged
mpanoop merged 21 commits into
ROCm:developfrom
mpanoop:rocblas_hipblaslt_new_integration
Jun 23, 2026
Merged

New rocblas hipblaslt integration#8082
mpanoop merged 21 commits into
ROCm:developfrom
mpanoop:rocblas_hipblaslt_new_integration

Conversation

@mpanoop

@mpanoop mpanoop commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Motivation

rocBLAS integration to hipBLASLt currently uses the hipBLASLt extension APIs. This was because unlike non-strided and strided batched GEMM, the batched GEMM wasn't supported using standard hipBLASLt APIs. Instead, the batched GEMM in rocBLAS was routed to hipBLASLt extension Grouped GEMM APIs. In order to keep the integration code consistent across all 3 categories of GEMM, hipBLASLt extension API based integration was chosen.

With hipBLASLt version 1.3.0, General Batched GEMM support is introduced. PR at #7464 adds missing support for StreamK = 3 + Parallel Reduction as well along with new modifications needed in hipBLASLt side to make the rocblas hipblaslt integration work. The new integration also has a dependency on a newly introduced hipblaslt-ext API isSupportSolution() will be invoked from the rocBLAS side. Hence the new hipblaslt integration is currently guarded for compile time dependency on hipblaslt version 1.4.1 or above. Otherwise, the older hipblaslt integration will be exercised.

Technical Details

With the newly introduced support for hipblasLtBatchMode_t enum which enables support for General Batched GEMM workflow using the standard hipBLASLt APIs, the new hipBLASLt integration code will mimic any other hipBLASLt customer code. This is also a performant alternative for General Batched GEMM since the previous approach was routing to Grouped GEMM which wasn't taking advantage of the properties of the General Batched GEMM, the solution space wasn't exhaustive to cover all data types across GPUs, and the solution selection approach was getAllSolutions() instead of heuristic approach with no CacheLibrary support. The new approach mitigates these performance bottlenecks.

Test Plan

No new tests are added. Existing testcases will exercise the new integration when the hipBLASLt version is 1.4.1 or above.

Test Result

All the tests were passing when run on MI350 node where the BF16 and FP16 GEMMs are defaulted to hipBLASLt backend by default.

Submission Checklist

@mpanoop mpanoop marked this pull request as ready for review June 4, 2026 21:58
@mpanoop mpanoop requested a review from a team as a code owner June 4, 2026 21:58
@codecov-commenter

codecov-commenter commented Jun 4, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

❌ Your project status has failed because the head coverage (77.83%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8082      +/-   ##
===========================================
+ Coverage    61.53%   61.58%   +0.05%     
===========================================
  Files         2095     2095              
  Lines       361435   361501      +66     
  Branches     54717    54717              
===========================================
+ Hits        222391   222625     +234     
+ Misses      120198   119997     -201     
- Partials     18846    18879      +33     
Flag Coverage Δ *Carryforward flag
TensileLite 28.63% <ø> (ø) Carriedforward from 7a80d00
hipBLAS 90.65% <ø> (ø) Carriedforward from 7a80d00
hipBLASLt 41.17% <ø> (ø) Carriedforward from 7a80d00
hipCUB 82.68% <ø> (ø) Carriedforward from 7a80d00
hipDNN 86.62% <ø> (ø) Carriedforward from 7a80d00
hipFFT 50.97% <ø> (ø) Carriedforward from 7a80d00
hipRAND 76.12% <ø> (ø) Carriedforward from 7a80d00
hipSOLVER 69.18% <ø> (ø) Carriedforward from 7a80d00
hipSPARSE 86.55% <ø> (ø) Carriedforward from 7a80d00
rocBLAS 48.49% <ø> (+0.39%) ⬆️
rocFFT 47.15% <ø> (ø) Carriedforward from 7a80d00
rocRAND 57.02% <ø> (ø) Carriedforward from 7a80d00
rocSOLVER 77.83% <ø> (ø) Carriedforward from 7a80d00
rocSPARSE 72.31% <ø> (ø) Carriedforward from 7a80d00

*This pull request uses carry forward flags. Click here to find out more.
see 16 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@mpanoop mpanoop requested a review from TorreZuk June 8, 2026 19:58

@TorreZuk TorreZuk left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some style comments from earlier

@TorreZuk TorreZuk requested review from a team and Copilot June 8, 2026 22:42

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates rocBLAS’s hipBLASLt backend integration to use the standard hipBLASLt matmul APIs for General Batched GEMM when hipBLASLt is new enough, while retaining the existing extension-API-based integration as a fallback for older hipBLASLt versions.

Changes:

  • Adds a new hipBLASLt matmul code path guarded by a hipBLASLt version check (intended for >= 1.4.1).
  • Introduces helper macros/utilities for hipBLASLt status checking, solution selection, and workspace size validation.
  • Adds an alpha/beta type-mapping helper to match hipBLASLt expectations.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated
Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated
Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated
Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated
Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated
Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated
Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated
Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated
Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated
@mpanoop mpanoop force-pushed the rocblas_hipblaslt_new_integration branch from a233172 to 46c697c Compare June 10, 2026 15:59

@TorreZuk TorreZuk left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My EXPECT to simpler CHECK_ was not addressed, all these that are just success remove the status and just use the CHECK_ pattern we use elsewhere in the library

@evedovelli evedovelli left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few comments. Some might be too nitpicking so I'll let you decide what to change or keep as is.

Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated
Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated
Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated
Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated
Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated
Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated
@mpanoop mpanoop force-pushed the rocblas_hipblaslt_new_integration branch from b8170a5 to b76e86c Compare June 12, 2026 01:09
@mpanoop mpanoop added the helpWanted Extra attention is needed label Jun 12, 2026

@TorreZuk TorreZuk left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More questions on design

Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated
Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated
Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated
Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated
mpanoop added a commit that referenced this pull request Jun 16, 2026
…d corresponding tests (#8412)

## Motivation

The GEMM kernel implementation in hipBLASLt initializes the SrdA and
SrdB from AddressA and AddressB values passed into the kernel as
arguments for each workgroup but the dereferencing of the addresses in
SrdA and SrdB only happens when alpha != 0. But with General Batched
GEMM, the AddressA and AddressB is pointer to a pointer array on the
device side. In order to initialize the SrdA and SrdB, the AddressA and
AddressB needs to be dereferenced before the alpha != 0 check. This gap
was exposed by rocblas batched gemm bad args tests. The PR at
#8082 has a dependency on
this fix.

## Technical Details

When K = 0, the size of A and B matrices will be 0 as well. We have two
scenarios which can happen here for the bad args:
1. A = 0, B = 0 and alpha = 0: This is the case where as long as beta !=
0, C/D != 0, then it's still a valid problem. The current implementation
failed here since the SrdA and SrdB initialization was involving
dereferencing of the pointer array passed in AddressA and AddressB even
before the alpha != 0 check. This PR adds this additional check of
AddressA/B != 0 before dereferencing in Kernel Assembly.
2. A = 0, B = 0 and alpha != 0: This will result invalid data pointer
error in rocblaslt_matmul() function (handled on the host side itself).

## Test Plan

hipBLASLt didn't have a test for this earlier. Added a new test with K=0
and the client code for General Batched GEMM checks the size of A and B
matrices to decide if it should pass the pointer array or just nullptr
into the kernel. This mimics the rocblas test's behavior.

## Test Result

All tests are passing locally

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Madhusoodhanan Prabha <amadhuso@ctr2-alola-ctrl-01.amd.com>
Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated
Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated

@evedovelli evedovelli left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mpanoop thanks for the changes! I left a few more suggestions in the PR.

Comment thread projects/rocblas/library/src/hipblaslt_host.cpp
Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated
Comment thread projects/rocblas/library/src/hipblaslt_host.cpp

@TorreZuk TorreZuk left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking better, our we getting downstream testing on these changes?

Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated
Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated
Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated
Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated
Comment thread projects/rocblas/library/src/hipblaslt_host.cpp

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 5 comments.

Comment thread projects/rocblas/library/src/hipblaslt_host.cpp
Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated
Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated
Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated
Comment thread projects/rocblas/library/src/hipblaslt_host.cpp
Comment thread projects/rocblas/library/src/hipblaslt_host.cpp
Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated
…_HIPBLASLT_ERROR to not overwrite previous error status
Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated

@TorreZuk TorreZuk left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I think all concerns now been addressed. Need to discuss timeline to merge in meeting

Comment thread projects/rocblas/library/src/hipblaslt_host.cpp
@mpanoop mpanoop merged commit caf307b into ROCm:develop Jun 23, 2026
72 of 76 checks passed
assistant-librarian Bot pushed a commit to ROCm/rocBLAS that referenced this pull request Jun 23, 2026
New rocblas hipblaslt integration

## Motivation

rocBLAS integration to hipBLASLt currently uses the hipBLASLt extension
APIs. This was because unlike non-strided and strided batched GEMM, the
batched GEMM wasn't supported using standard hipBLASLt APIs. Instead,
the batched GEMM in rocBLAS was routed to hipBLASLt extension Grouped
GEMM APIs. In order to keep the integration code consistent across all 3
categories of GEMM, hipBLASLt extension API based integration was
chosen.

With hipBLASLt version 1.3.0, General Batched GEMM support is
introduced. PR at ROCm/rocm-libraries#7464 adds
missing support for StreamK = 3 + Parallel Reduction as well along with
new modifications needed in hipBLASLt side to make the rocblas hipblaslt
integration work. The new integration also has a dependency on a newly
introduced hipblaslt-ext API isSupportSolution() will be invoked from
the rocBLAS side. Hence the new hipblaslt integration is currently
guarded for compile time dependency on hipblaslt version 1.4.1 or above.
Otherwise, the older hipblaslt integration will be exercised.

## Technical Details

With the newly introduced support for hipblasLtBatchMode_t enum which
enables support for General Batched GEMM workflow using the standard
hipBLASLt APIs, the new hipBLASLt integration code will mimic any other
hipBLASLt customer code. This is also a performant alternative for
General Batched GEMM since the previous approach was routing to Grouped
GEMM which wasn't taking advantage of the properties of the General
Batched GEMM, the solution space wasn't exhaustive to cover all data
types across GPUs, and the solution selection approach was
getAllSolutions() instead of heuristic approach with no CacheLibrary
support. The new approach mitigates these performance bottlenecks.

## Test Plan

No new tests are added. Existing testcases will exercise the new
integration when the hipBLASLt version is 1.4.1 or above.

## Test Result

All the tests were passing when run on MI350 node where the BF16 and
FP16 GEMMs are defaulted to hipBLASLt backend by default.

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
mpanoop added a commit that referenced this pull request Jun 24, 2026
…8746)

## Motivation

The PR #8082 introduces a new hipBLASLt integration from rocBLAS.
Previous hipBLASLt integration for non-batched, strided-batched and
general batched GEMMs was routed via hipblaslt-ext APIs. This was
because hipBLASLt didn't support for General Batched GEMM and rocBLAS
was routing the General Batched GEMM APIs to Grouped GEMM APIs which was
exposed via hipblaslt-ext. Now that General Batched GEMM is supported in
hipBLASLt, the hipBLASLt integration in rocBLAS can be streamlined to
look similar to any other customer code which directly consumes
hipBLASLt APIs.

## Technical Details

There are still some rocBLAS specific scenarios like given a solution
index in hipBLASLt Solution Library, validate if this solution is
supported for a given GPU and problem type were the hipBLASLt APIs
currently exposed won't suffice. Going via the
hipblasLtMatmulAlgoGetHeuristics() API will mean we have to use the
hipblasLtMatmulHeuristicResult_t parameter to pass the solution-index as
input. But as per the API contract, this parameter is strictly output
parameter. That's why we decided to expose an extension API which can
directly route this to equivalent rocblaslt layer API.

## Test Plan

rocBLAS tests in PR 8082 were using this new API for the scenarios where
the solution-index were explicitly passed in.

## Test Result

All tests are passing.

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: randyh62 <42045079+randyh62@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants