
Adding support for bmm_fp8 OP using cutlass #108

Open
bhargaveede wants to merge 3 commits into sgl-project:main from bhargaveede:bmm_fp8_cutlass

Conversation

@bhargaveede

Created a new PR in place of the closed PR #40.

@kareemshaik80 @mingfeima This will still be relevant for HW with native FP8 support, since we will not do dtype conversions there.
Also, right now we replicate the scales to support a stride of 1 (a current limitation of CUTLASS).

On top of the current PR, we can drop that replication once stride-0 support is available, which will improve performance further. A sketch of the workaround follows below.
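For illustration, here is a minimal sketch of the replication workaround described above, assuming a per-tensor scalar scale; the helper name `replicate_scale` and the shapes are hypothetical, not the actual sgl-kernel code.

```python
import torch

def replicate_scale(scale: torch.Tensor, rows: int) -> torch.Tensor:
    # CUTLASS currently reads the scale pointer with a stride of 1, so a
    # single per-tensor scale cannot be broadcast with a stride of 0.
    # `expand` creates a stride-0 view; `contiguous` materializes it into
    # a stride-1 buffer that CUTLASS can consume today. Once stride-0
    # support lands, this copy can be dropped.
    return scale.expand(rows).contiguous()

A_scale = torch.tensor(0.05)                 # hypothetical per-tensor FP8 scale
A_scale_rep = replicate_scale(A_scale, 128)  # one copy per row
assert A_scale_rep.shape == (128,) and A_scale_rep.stride() == (1,)
```

The extra copy costs memory and bandwidth, which is why the comment above notes that removing it once CUTLASS supports a stride of 0 should improve performance further.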

Comment thread on `python/sgl_kernel/gemm.py`:

```diff
     B_scale: torch.Tensor,
 ) -> None:
-    cublas_handle = torch.cuda.current_blas_handle()
+    # cublas_handle = torch.cuda.current_blas_handle()
```

Review comments from the earlier PR (#40) were not resolved here.
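For reference, a hedged sketch of how the flagged line might be resolved; the full signature of `bmm_fp8` below is an assumption reconstructed from the snippet above, not the actual sgl-kernel API.

```python
import torch

# Hypothetical signature for illustration; only B_scale appears in the
# diff snippet above, the other parameters are assumptions.
def bmm_fp8(
    A: torch.Tensor,
    B: torch.Tensor,
    out: torch.Tensor,
    A_scale: torch.Tensor,
    B_scale: torch.Tensor,
) -> None:
    # The CUTLASS path does not go through cuBLAS, so the stale
    # `cublas_handle = torch.cuda.current_blas_handle()` line should be
    # deleted outright rather than left behind commented out.
    ...
```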

