
LoRA: Implementing kernels using CUBE computation unit#384

Merged
RuixuanZhang06 merged 5 commits into sgl-project:main from vlserov:vlserov/lora_kernels_cube
Apr 8, 2026
Conversation

Contributor

@vlserov commented Feb 27, 2026

Implement the LoRA kernels using the CUBE computation unit instead of the VECTOR computation unit.

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request significantly enhances LoRA kernel performance by migrating key operations from the VECTOR computation unit to the more specialized CUBE computation unit. This involves introducing new sgemmc_expand and sgemmc_shrink kernels, complete with their host-side interfaces and dedicated tiling mechanisms. Additionally, existing LoRA kernels were refactored to share common utilities, streamlining the codebase and preparing for future optimizations.

Highlights

  • New LoRA Kernels with CUBE Unit: Introduced new sgemmc_expand and sgemmc_shrink operations, specifically designed to utilize the CUBE computation unit for LoRA (Low-Rank Adaptation) kernels, moving away from or complementing existing VECTOR unit implementations.
  • Tiling Infrastructure for CUBE Operations: Added dedicated tiling logic and data structures (sgemmc_tiling.cpp, sgemmc_tiling.h, sgemmc_tiling_data.h) to support the configuration and execution of the new CUBE-based sgemmc operations.
  • Kernel Code Refactoring and Reusability: Refactored existing sgmv_expand_kernel.cpp, sgmv_shrink_kernel.cpp, sgemmv_expand_kernel.cpp, and sgemmv_shrink_kernel.cpp to leverage a new common BlockIterator utility defined in lora_common_kernel.h, improving code modularity and reducing redundancy.
  • Build System and API Integration: Updated CMakeLists.txt to include the newly added host and kernel source files for sgemmc operations, linked necessary libraries, and registered the new sgemmc_expand and sgemmc_shrink functions within the PyTorch extension API (pytorch_extensions.cpp) and public headers (sgl_kenel_npu_ops.h).
  • Utility Enhancements: Enhanced common_tiling.h with a DataType enum and torch_helper.h with a ConvertDataType utility, alongside a new common_tiling_kernel.h for kernel-side tiling data copying, providing foundational support for the new kernel implementations.


Changelog
  • csrc/CMakeLists.txt
    • Added sgemmc_expand.cpp, sgemmc_shrink.cpp, and sgemmc_tiling.cpp to host source files.
    • Included sgemmc_expand_kernel.cpp and sgemmc_shrink_kernel.cpp in workspace kernel sources.
    • Added utils/kernel to include directories for both no_workspace_kernel and workspace_kernel.
    • Linked host_intf_pub to the OP_PLUGIN_NAME target.
    • Added ${ASCEND_INCLUDE_DIR} to target include directories.
  • csrc/lora/op_host/sgemmc_expand.cpp
    • Added new host-side implementation for sgemmc_expand using CUBE computation unit.
  • csrc/lora/op_host/sgemmc_shrink.cpp
    • Added new host-side implementation for sgemmc_shrink using CUBE computation unit.
  • csrc/lora/op_host/tiling/sgemmc_tiling.cpp
    • Added implementation for GenerateTiling function to configure CUBE tiling parameters for sgemmc operations.
  • csrc/lora/op_host/tiling/sgemmc_tiling.h
    • Added header file declaring the GenerateTiling function for sgemmc operations.
  • csrc/lora/op_host/tiling/sgemmc_tiling_data.h
    • Added data structure SGEMMCTilingData to hold tiling information for sgemmc kernels.
  • csrc/lora/op_kernel/lora_common_kernel.h
    • Added new header file defining BlockIterator for common LoRA kernel utilities.
  • csrc/lora/op_kernel/sgemmc_expand_kernel.cpp
    • Added new kernel-side implementation for sgemmc_expand utilizing the CUBE computation unit.
  • csrc/lora/op_kernel/sgemmc_shrink_kernel.cpp
    • Added new kernel-side implementation for sgemmc_shrink utilizing the CUBE computation unit.
  • csrc/lora/op_kernel/sgemmv_expand_kernel.cpp
    • Included lora_common_kernel.h.
    • Updated Process method to use lora_common::BlockIterator for LoRA index lookup.
    • Removed redundant CopyInIndex private method.
  • csrc/lora/op_kernel/sgemmv_shrink_kernel.cpp
    • Included lora_common_kernel.h.
    • Updated Process method to use lora_common::BlockIterator for LoRA index lookup.
    • Removed redundant CopyInIndex private method.
  • csrc/lora/op_kernel/sgmv_expand_kernel.cpp
    • Included lora_common_kernel.h.
    • Updated Process method to use lora_common::BlockIterator for LoRA index lookup.
    • Removed redundant CopyInIndex private method.
  • csrc/lora/op_kernel/sgmv_shrink_kernel.cpp
    • Included lora_common_kernel.h.
    • Updated Process method to use lora_common::BlockIterator for LoRA index lookup.
    • Removed redundant CopyInIndex private method.
  • csrc/pytorch_extensions.cpp
    • Registered sgemmc_expand and sgemmc_shrink operations in the PyTorch NPU library fragment.
    • Implemented sgemmc_expand and sgemmc_shrink functions for the PrivateUse1 backend.
  • csrc/utils/common_tiling.h
    • Added DataType enum to host_utils namespace.
  • csrc/utils/kernel/common_tiling_kernel.h
    • Added new header file defining CopyTiling utility for kernel-side tiling data handling.
  • csrc/utils/torch_helper.h
    • Included common_tiling.h.
    • Added ConvertDataType static method to TorchNpuHelper to convert at::ScalarType to host_utils::DataType.
  • include/sgl_kenel_npu_ops.h
    • Declared sgemmc_expand and sgemmc_shrink functions.
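For context on the tiling additions in the changelog, here is a minimal host-side sketch in the spirit of the `GenerateTiling` / `SGEMMCTilingData` pair: Ascend's CUBE unit multiplies fixed-size fractal tiles (16x16 for fp16), so a tiling generator typically rounds each GEMM dimension up to the fractal and records per-dimension tile counts for the kernel to consume. The struct fields, function name, and the fractal constant below are illustrative assumptions, not the actual contents of `sgemmc_tiling_data.h`.

```cpp
#include <cassert>
#include <cstdint>

// Assumed CUBE fractal edge length (16x16 tiles for fp16 on Ascend).
constexpr uint32_t kFractal = 16;

// Hypothetical tiling record, loosely modeled on SGEMMCTilingData.
struct SgemmcTilingSketch {
    uint32_t mTiles, nTiles, kTiles;        // tile counts per GEMM dimension
    uint32_t mAligned, nAligned, kAligned;  // dims padded up to the fractal
};

inline uint32_t CeilDiv(uint32_t a, uint32_t b) { return (a + b - 1) / b; }

// Host-side sketch: round each dimension of an (M,K) x (K,N) GEMM up to
// the fractal size and record how many CUBE tiles cover each dimension.
SgemmcTilingSketch GenerateTilingSketch(uint32_t m, uint32_t n, uint32_t k) {
    SgemmcTilingSketch t{};
    t.mTiles = CeilDiv(m, kFractal);
    t.nTiles = CeilDiv(n, kFractal);
    t.kTiles = CeilDiv(k, kFractal);
    t.mAligned = t.mTiles * kFractal;
    t.nAligned = t.nTiles * kFractal;
    t.kAligned = t.kTiles * kFractal;
    return t;
}
```

A structure like this also explains the new `common_tiling_kernel.h` `CopyTiling` utility mentioned above: the host serializes the tiling record, and the kernel side copies it out of the tiling buffer before launching the blocked matmul.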

Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces new LoRA kernels (sgemmc) that leverage the CUBE computation unit for better performance, which is a solid architectural improvement. The refactoring of existing kernels to use a common BlockIterator is also a good move for maintainability. However, my review uncovered several critical issues in the new host-side and kernel-side implementations: incorrect variable initializations, swapped function arguments, use of uninitialized variables, and missing template parameters, which will likely cause compilation errors or incorrect runtime behavior. I've also noted some medium-severity issues related to code quality, such as improper error handling and dead code, and the new BlockIterator is used incorrectly in places. I've provided specific suggestions to address the critical problems.

Comment thread csrc/lora/op_kernel/sgemmc_expand_kernel.cpp Outdated
Comment thread csrc/lora/op_kernel/sgemmv_expand_kernel.cpp
Comment thread csrc/lora/op_kernel/sgmv_shrink_kernel.cpp
Comment thread csrc/lora/op_kernel/sgemmc_expand_kernel.cpp Outdated
Comment thread csrc/lora/op_kernel/sgemmc_expand_kernel.cpp Outdated
Comment thread csrc/lora/op_kernel/sgemmv_shrink_kernel.cpp
Comment thread csrc/lora/op_kernel/sgemmc_shrink_kernel.cpp Outdated
Comment thread csrc/lora/op_kernel/sgemmc_shrink_kernel.cpp Outdated
Comment thread csrc/lora/op_kernel/sgemmc_expand_kernel.cpp Outdated
Comment thread csrc/lora/op_host/tiling/sgemmc_tiling.cpp Outdated
@vlserov vlserov force-pushed the vlserov/lora_kernels_cube branch 2 times, most recently from ba1b713 to 573bb5d Compare March 2, 2026 11:11
@vlserov vlserov marked this pull request as ready for review March 2, 2026 11:12
RuixuanZhang06
RuixuanZhang06 previously approved these changes Mar 10, 2026
@vlserov vlserov force-pushed the vlserov/lora_kernels_cube branch 2 times, most recently from b82a7ca to 9cc6799 Compare March 19, 2026 05:01
@vlserov vlserov force-pushed the vlserov/lora_kernels_cube branch from 9cc6799 to 27a056c Compare March 20, 2026 11:58
RuixuanZhang06
RuixuanZhang06 previously approved these changes Mar 30, 2026
@vlserov vlserov force-pushed the vlserov/lora_kernels_cube branch from 8d51c08 to a7455c6 Compare April 2, 2026 04:34
@RuixuanZhang06 RuixuanZhang06 merged commit 7b93364 into sgl-project:main Apr 8, 2026
5 of 7 checks passed
iforgetmyname added a commit that referenced this pull request Apr 8, 2026
iforgetmyname added a commit that referenced this pull request Apr 8, 2026