Description
Hygon DCU is a GPGPU widely deployed in data centers in China. Its software stack, DTK, is compatible with the ROCm ecosystem. To broaden rtp-llm's hardware coverage and support more domestic compute scenarios, we would like the community to consider adding support for Hygon DCU.
Reasons for Support
Market Demand: Many enterprise users in China are deploying LLMs on Hygon DCU (e.g., BW200/BW1000).
Ecosystem Compatibility: Because DCU uses the ROCm-compatible DTK stack, most CUDA-based kernels can be migrated with relatively low effort.
Complementary Strengths: rtp-llm's high-performance inference capabilities would be a valuable addition to the DCU software ecosystem.
Proposed Changes
Add DCU-specific hardware detection logic.
Port the key operators (Attention, PagedAttention, etc.) to DCU using HIP/DTK.
Add build scripts and a CI environment for DTK/ROCm.
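To make the first proposed change more concrete, here is a minimal sketch of what DCU detection could look like. It is not taken from rtp-llm. The function names, the name markers, the `DTK_ROOT` variable, the `/opt/dtk` prefix, and the `hy-smi` tool name are all assumptions for illustration and would need to be checked against a real DCU machine.

```python
import os
import shutil
from typing import Mapping, Optional

# Assumption: a DCU reports a device name containing one of these markers.
# The real strings must be confirmed on actual hardware.
DCU_NAME_MARKERS = ("hygon", "dcu")


def classify_device(device_name: str, hip_version: Optional[str]) -> str:
    """Classify a device as 'cuda', 'rocm', or 'dcu'.

    hip_version is None when the runtime was not built against HIP,
    which this sketch treats as a CUDA build.
    """
    if hip_version is None:
        return "cuda"
    name = device_name.lower()
    if any(marker in name for marker in DCU_NAME_MARKERS):
        return "dcu"
    return "rocm"


def dtk_installed(env: Mapping[str, str] = os.environ) -> bool:
    """Heuristic host-side check that needs no GPU runtime.

    Assumptions: DTK lives under DTK_ROOT (hypothetical variable) or
    /opt/dtk, and ships a management tool named `hy-smi`.
    """
    dtk_root = env.get("DTK_ROOT", "/opt/dtk")
    return os.path.isdir(dtk_root) or shutil.which("hy-smi") is not None


if __name__ == "__main__":
    print(classify_device("Hygon DCU BW1000", "5.7"))
    print(classify_device("NVIDIA A100", None))
    print("DTK found:", dtk_installed())
```

With a PyTorch build, the two inputs could come from `torch.cuda.get_device_name(0)` and `torch.version.hip`. Keeping the classification separate from the probing makes the logic testable on machines without a DCU.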
Additional Context
I can help with testing on DCU environments if needed. Does the community have an existing plan or roadmap for ROCm or DCU support?