New rocblas hipblaslt integration by mpanoop · Pull Request #8082 · ROCm/rocm-libraries

mpanoop · 2026-06-04T21:10:36Z

Motivation

The rocBLAS integration with hipBLASLt currently uses the hipBLASLt extension APIs. This is because, unlike non-batched and strided batched GEMM, batched GEMM wasn't supported through the standard hipBLASLt APIs. Instead, batched GEMM in rocBLAS was routed to the hipBLASLt extension Grouped GEMM APIs. To keep the integration code consistent across all 3 categories of GEMM, the extension-API-based integration was chosen.

With hipBLASLt version 1.3.0, General Batched GEMM support was introduced. PR #7464 adds the missing support for StreamK = 3 + Parallel Reduction, along with the modifications needed on the hipBLASLt side to make the rocBLAS hipBLASLt integration work. The new integration also depends on a newly introduced hipblaslt-ext API, isSupportSolution(), which is invoked from the rocBLAS side. Hence the new hipBLASLt integration is currently guarded at compile time on hipBLASLt version 1.4.1 or above. Otherwise, the older hipBLASLt integration is exercised.
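A minimal sketch of how a compile-time guard of this kind could look. The `DEMO_*` macros and helper functions are hypothetical stand-ins introduced only for illustration; they are not the macros or checks used in this PR, and the version values are hard-coded so the snippet compiles on its own.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for version macros that a library header would define.
#ifndef DEMO_HIPBLASLT_VERSION_MAJOR
#define DEMO_HIPBLASLT_VERSION_MAJOR 1
#define DEMO_HIPBLASLT_VERSION_MINOR 4
#define DEMO_HIPBLASLT_VERSION_PATCH 1
#endif

// Pack a version triple into one integer so it can be compared in #if.
#define DEMO_PACK_VERSION(major, minor, patch) \
    ((major) * 1000000 + (minor) * 1000 + (patch))

// Select the new code path only when the library is at least 1.4.1.
#if DEMO_PACK_VERSION(DEMO_HIPBLASLT_VERSION_MAJOR, \
                      DEMO_HIPBLASLT_VERSION_MINOR, \
                      DEMO_HIPBLASLT_VERSION_PATCH) >= DEMO_PACK_VERSION(1, 4, 1)
#define DEMO_USE_NEW_INTEGRATION 1
#else
#define DEMO_USE_NEW_INTEGRATION 0
#endif

// Same comparison as a constexpr function, convenient for testing.
constexpr bool demo_version_at_least(int major, int minor, int patch,
                                     int req_major, int req_minor, int req_patch)
{
    return DEMO_PACK_VERSION(major, minor, patch)
           >= DEMO_PACK_VERSION(req_major, req_minor, req_patch);
}

// Reports which (hypothetical) integration path was compiled in.
inline std::string demo_selected_path()
{
#if DEMO_USE_NEW_INTEGRATION
    return "standard";
#else
    return "extension";
#endif
}
```

With the stand-in values above (1.4.1), `demo_selected_path()` reports the standard path; building with an older stand-in version would fall back to the extension path.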

Technical Details

With the newly introduced hipblasLtBatchMode_t enum, which enables the General Batched GEMM workflow through the standard hipBLASLt APIs, the new hipBLASLt integration code mirrors any other hipBLASLt customer code. It is also a more performant alternative for General Batched GEMM, because the previous approach had three drawbacks: it routed to Grouped GEMM, which did not take advantage of the properties of General Batched GEMM; its solution space was not exhaustive enough to cover all data types across GPUs; and it selected solutions with getAllSolutions() rather than a heuristic approach, with no CacheLibrary support. The new approach mitigates these performance bottlenecks.

Test Plan

No new tests are added. Existing test cases exercise the new integration when the hipBLASLt version is 1.4.1 or above.

Test Result

All tests passed on an MI350 node, where the BF16 and FP16 GEMMs default to the hipBLASLt backend.

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

codecov-commenter · 2026-06-04T23:27:38Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

❌ Your project status has failed because the head coverage (77.83%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #8082      +/-   ##
===========================================
+ Coverage    61.53%   61.58%   +0.05%     
===========================================
  Files         2095     2095              
  Lines       361435   361501      +66     
  Branches     54717    54717              
===========================================
+ Hits        222391   222625     +234     
+ Misses      120198   119997     -201     
- Partials     18846    18879      +33

Flag	Coverage Δ		*Carryforward flag
TensileLite	`28.63% <ø> (ø)`		Carriedforward from 7a80d00
hipBLAS	`90.65% <ø> (ø)`		Carriedforward from 7a80d00
hipBLASLt	`41.17% <ø> (ø)`		Carriedforward from 7a80d00
hipCUB	`82.68% <ø> (ø)`		Carriedforward from 7a80d00
hipDNN	`86.62% <ø> (ø)`		Carriedforward from 7a80d00
hipFFT	`50.97% <ø> (ø)`		Carriedforward from 7a80d00
hipRAND	`76.12% <ø> (ø)`		Carriedforward from 7a80d00
hipSOLVER	`69.18% <ø> (ø)`		Carriedforward from 7a80d00
hipSPARSE	`86.55% <ø> (ø)`		Carriedforward from 7a80d00
rocBLAS	`48.49% <ø> (+0.39%)`	⬆️
rocFFT	`47.15% <ø> (ø)`		Carriedforward from 7a80d00
rocRAND	`57.02% <ø> (ø)`		Carriedforward from 7a80d00
rocSOLVER	`77.83% <ø> (ø)`		Carriedforward from 7a80d00
rocSPARSE	`72.31% <ø> (ø)`		Carriedforward from 7a80d00

*This pull request uses carry forward flags.
see 16 files with indirect coverage changes

TorreZuk

Some style comments from earlier

Copilot

Pull request overview

This PR updates rocBLAS’s hipBLASLt backend integration to use the standard hipBLASLt matmul APIs for General Batched GEMM when hipBLASLt is new enough, while retaining the existing extension-API-based integration as a fallback for older hipBLASLt versions.

Changes:

Adds a new hipBLASLt matmul code path guarded by a hipBLASLt version check (intended for >= 1.4.1).
Introduces helper macros/utilities for hipBLASLt status checking, solution selection, and workspace size validation.
Adds an alpha/beta type-mapping helper to match hipBLASLt expectations.

TorreZuk

My earlier request to replace EXPECT with the simpler CHECK_ was not addressed. For all of these that only check for success, remove the status and just use the CHECK_ pattern we use elsewhere in the library.

evedovelli

I left a few comments. Some might be too nitpicking so I'll let you decide what to change or keep as is.

TorreZuk

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 5 comments.

TorreZuk

Okay, I think all concerns have now been addressed. We need to discuss the timeline to merge in the meeting.

…8746)

## Motivation

PR #8082 introduces a new hipBLASLt integration in rocBLAS. The previous hipBLASLt integration for non-batched, strided-batched, and general batched GEMMs was routed via hipblaslt-ext APIs. This was because hipBLASLt didn't support General Batched GEMM, so rocBLAS routed the General Batched GEMM APIs to the Grouped GEMM APIs, which were exposed via hipblaslt-ext. Now that General Batched GEMM is supported in hipBLASLt, the hipBLASLt integration in rocBLAS can be streamlined to look like any other customer code that directly consumes hipBLASLt APIs.

## Technical Details

There are still some rocBLAS-specific scenarios for which the currently exposed hipBLASLt APIs won't suffice. One example: given a solution index in the hipBLASLt Solution Library, validate whether that solution is supported for a given GPU and problem type. Going via the hipblasLtMatmulAlgoGetHeuristic() API would mean using the hipblasLtMatmulHeuristicResult_t parameter to pass the solution index as input. But per the API contract, this parameter is strictly an output parameter. That's why we decided to expose an extension API that routes this directly to the equivalent rocblaslt-layer API.

## Test Plan

The rocBLAS tests in PR 8082 used this new API for the scenarios where the solution index was explicitly passed in.

## Test Result

All tests are passing.

## Submission Checklist

- [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: randyh62 <42045079+randyh62@users.noreply.github.com>

github-actions Bot added the project: rocblas label Jun 4, 2026

mpanoop marked this pull request as ready for review June 4, 2026 21:58

mpanoop requested a review from a team as a code owner June 4, 2026 21:58

assistant-librarian Bot added the organization: ROCm label Jun 4, 2026

mpanoop requested a review from TorreZuk June 8, 2026 19:58

TorreZuk reviewed Jun 8, 2026

TorreZuk requested review from a team and Copilot June 8, 2026 22:42

Copilot started reviewing on behalf of TorreZuk June 8, 2026 22:42

Copilot AI reviewed Jun 8, 2026

mpanoop force-pushed the rocblas_hipblaslt_new_integration branch from a233172 to 46c697c June 10, 2026 15:59

TorreZuk requested changes Jun 10, 2026

TorreZuk added the ci:extended label Jun 11, 2026

evedovelli reviewed Jun 11, 2026

TorreZuk reviewed Jun 11, 2026

Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated

Madhusoodhanan Prabha and others added 10 commits June 11, 2026 18:09

New rocblas hipblaslt integration

2085aa5

Fixed the formatting reported by static analysis check

0e0f540

Removed a debug print

2fb74c6

Updated the logic when new hipblaslt integration is enabled

99d2e3e

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Fixed the code review comments from Copilot

b767b34

Keep the original device pointer array from customer as is and creating a new device pointer array with offset update before passing into hipblasLtMatmul

fe6b9b6

Fixed the errors in refactoring

105cf21

Fixed the formatting issues

11a5bc9

Made sure original pointer array is not overwritten with offset addition, changed error handling

32681e9

Added scaletype handling for BF16 inputs

b76e86c

mpanoop force-pushed the rocblas_hipblaslt_new_integration branch from b8170a5 to b76e86c June 12, 2026 01:09

mpanoop added the helpWanted label Jun 12, 2026

Address Potential Memory Leak, added new error handling macro

31a91a2

mpanoop mentioned this pull request Jun 15, 2026

Updated client code to handle A/B=0 for General Batched GEMM and added corresponding tests #8412

Merged

1 task

TorreZuk reviewed Jun 15, 2026

Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated

Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated

Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated

Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated

mpanoop added 2 commits June 15, 2026 19:13

Avoid memcpys for non-zero offsets, replacing blocking allocators, deallocators and memcpys with async equivalents

b4592e3

Fixed build errors

82e4bd3

TorreZuk reviewed Jun 16, 2026

Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated

Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated

Added a new device kernel for offset adjustment in pointer array

63ac965
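The pointer-array offset adjustment described in the commits above can be illustrated with a host-side analogue. This is only a sketch under stated assumptions: the actual change is a device kernel whose code is not shown on this page, and `demo_offset_pointer_array` is a hypothetical name. The point it illustrates is that the caller's pointer array is left untouched and a new, offset-adjusted array is produced in a single pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side analogue of an "add offset" kernel:
// reads each batch pointer from `in`, adds `offset` elements, and writes the
// result to a new array. `in` is taken by const reference and never modified.
template <typename T>
std::vector<const T*> demo_offset_pointer_array(const std::vector<const T*>& in,
                                                std::ptrdiff_t offset)
{
    std::vector<const T*> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        // Copy and offset in one step, so no separate copy pass is needed.
        out[i] = in[i] + offset;
    }
    return out;
}
```

On a GPU, each loop iteration would map naturally to one thread indexed by batch; that mapping is an assumption about the general technique, not a description of the merged kernel.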

evedovelli reviewed Jun 18, 2026

Comment thread projects/rocblas/library/src/hipblaslt_host.cpp

Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated

Comment thread projects/rocblas/library/src/hipblaslt_host.cpp

TorreZuk reviewed Jun 18, 2026

mpanoop added 4 commits June 18, 2026 15:53

Fused the memcpy of device pointer array into the addoffset kernel, revisited the error handling to avoid duplicate code

d5d1120

Fixed the handling of scale type when data type is int8_t

c3a70b8

Additional error handling to avoid potential memory leaks

2cedf80

Handling the Initialization of hipblasLt types

9f3b127

TorreZuk requested a review from Copilot June 19, 2026 19:32

Copilot started reviewing on behalf of TorreZuk June 19, 2026 19:33

Copilot AI reviewed Jun 19, 2026

Changes based on latest copilot review

6558b5a

TorreZuk reviewed Jun 22, 2026

Comment thread projects/rocblas/library/src/hipblaslt_host.cpp

TorreZuk reviewed Jun 22, 2026

Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated

Updated logic of setting the RETURN_STATUS in newly introduced HANDLE_HIPBLASLT_ERROR to not overwrite previous error status

7a80d00

TorreZuk reviewed Jun 22, 2026

Comment thread projects/rocblas/library/src/hipblaslt_host.cpp Outdated

Changes to preserve the first error status

c3a85ab
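The "preserve the first error status" behavior from the last two commits can be sketched as follows. `DemoStatus`, `DEMO_HANDLE_ERROR`, and `demo_run_chain` are hypothetical names made up for this illustration; the macro merged in the PR is not reproduced here and may differ.

```cpp
#include <cassert>

// Hypothetical status type standing in for a library status enum.
enum class DemoStatus
{
    success = 0,
    invalid_value,
    internal_error
};

// Evaluate `expr` once. If it failed and `ret` still holds success, record
// the failure; a failure already stored in `ret` is never overwritten.
#define DEMO_HANDLE_ERROR(ret, expr)                                        \
    do                                                                      \
    {                                                                       \
        const DemoStatus demo_status_ = (expr);                             \
        if(demo_status_ != DemoStatus::success                              \
           && (ret) == DemoStatus::success)                                 \
        {                                                                   \
            (ret) = demo_status_;                                           \
        }                                                                   \
    } while(0)

// Runs three steps (for example, a call followed by cleanup steps) and
// returns the status of the first one that failed, or success.
inline DemoStatus demo_run_chain(DemoStatus first, DemoStatus second, DemoStatus third)
{
    DemoStatus ret = DemoStatus::success;
    DEMO_HANDLE_ERROR(ret, first);
    DEMO_HANDLE_ERROR(ret, second);
    DEMO_HANDLE_ERROR(ret, third);
    return ret;
}
```

Because every step still runs after a failure, cleanup code is not skipped, while the caller sees the root-cause status rather than a later, secondary one.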

TorreZuk approved these changes Jun 22, 2026

NaveenElumalaiAMD approved these changes Jun 22, 2026

Comment thread projects/rocblas/library/src/hipblaslt_host.cpp

mpanoop merged commit caf307b into ROCm:develop Jun 23, 2026
72 of 76 checks passed

mpanoop mentioned this pull request Jun 24, 2026

Update on the newly added API hipblaslt-ext::isSolutionSupported() #8746

Merged

1 task

amd-chiranjeevi mentioned this pull request Jun 24, 2026

Revert "New rocblas hipblaslt integration" #8761

Closed
