Add AMD GPU support via ROCm/HIP#94
Conversation
This adds an optional AMD GPU build to cuPDLPx through ROCm/HIP, alongside the existing CUDA path. The CUDA build is unchanged when USE_HIP is off. To review: start with internal/cuda_to_hip.h, which routes the CUDA runtime, cuBLAS, cuSPARSE, and CUB symbols used by the solver to their hipRT, hipBLAS, hipSPARSE, and hipCUB equivalents on a HIP build, and includes the standard CUDA headers otherwise. The device sources keep their CUDA spelling and are compiled as HIP. internal/cusparse_compat.h selects the standard hipsparseSpMV path on ROCm, since hipSPARSE does not provide the cusparseSpMVOp variant. CMakeLists.txt gains a USE_HIP option (off by default). When enabled the project is configured with the HIP language, the .cu sources are compiled as HIP, and the targets link hipBLAS, hipSPARSE, and hipCUB instead of the CUDA libraries. GPU architectures are chosen with CMAKE_HIP_ARCHITECTURES, defaulting to gfx90a. On Windows the CLI-only mps_parser.c is excluded from the core library because it relies on strtok_r. The interface test gains a case that runs the GPU solver path with presolve disabled, exercising the hipBLAS and hipSPARSE execution path end to end. Test Plan: Built and ran on an AMD Instinct MI200 (gfx90a) with ROCm 7.2.1: ``` cmake -B build -DUSE_HIP=ON -DCMAKE_HIP_ARCHITECTURES=gfx90a -DCMAKE_PREFIX_PATH=/opt/rocm \ -DCUPDLPX_BUILD_CLI=ON -DCUPDLPX_BUILD_TESTS=ON -DCMAKE_BUILD_TYPE=Release cmake --build build -j$(nproc) ./build/tests/test_interface ``` The interface suite passes, including the GPU solver case (Status: OPTIMAL). The same configuration builds cleanly for gfx1100 (RDNA3) and gfx1201 (RDNA4); the device code objects are identical across the documentation and formatting commits that followed validation. The CUDA build path is unaffected by these changes. This work was authored with the assistance of Claude, an AI assistant by Anthropic.
The ROCm support commit routed all CUDA/HIP includes through internal/cuda_to_hip.h and pulled it into utils.h and internal_types.h, which are included by the C translation units (cli.c, cupdlpx.c, mps_parser.c, presolve.c). On the CUDA path that header included <cub/device/device_reduce.cuh> unconditionally; cub is C++ only, so the C compiler failed with "unknown type name 'namespace'", breaking every CUDA build job (all Linux and Windows toolchains, CUDA 12.4 through 13.1). The HIP path was unaffected because its hipcub include was already guarded with #ifdef __cplusplus. The fix mirrors that guard on the CUDA branch: the cub header is only included for C++ translation units (the .cu device sources that actually use cub::DeviceReduce). The change is entirely within the #else CUDA branch, so the HIP/ROCm device code is unchanged. Authored with assistance from Claude. Test Plan: reproduced and verified the CUDA path locally with the CUDA 12.8 toolkit (gcc 13, ninja), matching the upstream CI configure: ``` cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=Release \ -DCUPDLPX_BUILD_TESTS=OFF -DCMAKE_CUDA_ARCHITECTURES=80 cmake --build build --clean-first ``` Before: cc -std=gnu99 -c src/cupdlpx.c fails on cub/device/device_reduce.cuh. After: clean build, links cupdlpx and libcupdlpx.so with 0 errors.
ZedongPeng
left a comment
There was a problem hiding this comment.
Thanks a lot for adding ROCm/HIP support, @jeffdaily. This is a genuinely useful contribution, and I really appreciate the clean implementation. I left a few comments, mostly minor ones related to project consistency.
| #include "internal_types.h" | ||
|
|
||
| // On CUDA, include the standard headers; on HIP, cuda_to_hip.h already included them. | ||
| #if !defined(USE_HIP) && !defined(__HIP_PLATFORM_AMD__) |
There was a problem hiding this comment.
Since cuda_to_hip.h already includes these headers, the #if !defined(USE_HIP) blocks are now redundant. We can keep just #include "cuda_to_hip.h" and drop them. (preconditioner.cu's whole 20-27 block can go too.)
| printf("\n=== Test 9: CSR Matrix (presolve disabled, GPU solver) ===\n"); | ||
| { | ||
| lp_problem_t *prob9 = create_lp_problem(c, &A_csr, l, u, NULL, NULL, NULL); | ||
| if (!prob9) | ||
| { | ||
| fprintf(stderr, "[test] create_lp_problem failed for Test 9.\n"); | ||
| return 1; | ||
| } | ||
| pdhg_parameters_t params9; | ||
| set_default_parameters(¶ms9); | ||
| params9.presolve = false; | ||
| params9.verbose = true; | ||
| cupdlpx_result_t *res9 = solve_lp_problem(prob9, ¶ms9); | ||
| lp_problem_free(prob9); | ||
| if (!res9) | ||
| { | ||
| fprintf(stderr, "[test] solve_lp_problem failed for Test 9.\n"); | ||
| return 1; | ||
| } | ||
| print_vec("x", res9->primal_solution, res9->num_variables); | ||
| print_vec("y", res9->dual_solution, res9->num_constraints); | ||
| cupdlpx_result_free(res9); | ||
| } |
There was a problem hiding this comment.
It currently only checks for non-NULL, so it won't catch a wrong answer. Could we assert the known optimum (x=[1,2], objective 3.0)?
if (res9->termination_reason != TERMINATION_REASON_OPTIMAL ||
fabs(res9->primal_objective_value - 3.0) > 1e-4)
{
fprintf(stderr, "[test] Test 9 wrong: status=%d obj=%g (expected OPTIMAL, 3.0)\n",
res9->termination_reason, res9->primal_objective_value);
cupdlpx_result_free(res9);
return 1;
}| # Mark .cu files as HIP language | ||
| set_source_files_properties(${CU_SOURCES} PROPERTIES LANGUAGE HIP) | ||
| # Define USE_HIP for the compat header | ||
| add_compile_definitions(USE_HIP) |
There was a problem hiding this comment.
Minor: add_compile_definitions(USE_HIP) is directory-scoped. Attaching it to the interface target would make it travel with consumers more reliably: target_compile_definitions(cupdlpx_compile_flags INTERFACE USE_HIP)
| if(USE_HIP) | ||
| set_target_properties(cupdlpx_core PROPERTIES | ||
| HIP_ARCHITECTURES "${CMAKE_HIP_ARCHITECTURES}" | ||
| ) | ||
| else() | ||
| set_target_properties(cupdlpx_core PROPERTIES | ||
| CUDA_SEPARABLE_COMPILATION ON | ||
| CUDA_RESOLVE_DEVICE_SYMBOLS ON | ||
| ) | ||
| endif() |
There was a problem hiding this comment.
The CUDA static lib sets CUDA_SEPARABLE_COMPILATION/CUDA_RESOLVE_DEVICE_SYMBOLS so that cupdlpx_core's device code is fully device-linked at the archive boundary. This is needed because the pybind module (_cupdlpx_core, linked by the C++ compiler) consumes it. The HIP branch only sets HIP_ARCHITECTURES.
Have you been able to build the ROCm Python extension (CUPDLPX_BUILD_PYTHON=ON) and confirm the pybind module links against the HIP static lib? hipcc's default whole-program compilation may make this a non-issue, just want to make sure the bindings path works on ROCm too.
| /* | ||
| Copyright 2025 Haihao Lu | ||
| Copyright (c) 2026 Advanced Micro Devices, Inc. | ||
|
|
||
| Author: Jeff Daily <jeff.daily@amd.com> | ||
|
|
||
| Licensed under the Apache License, Version 2.0 (the "License"); | ||
| you may not use this file except in compliance with the License. | ||
| You may obtain a copy of the License at | ||
|
|
||
| http://www.apache.org/licenses/LICENSE-2.0 | ||
|
|
||
| Unless required by applicable law or agreed to in writing, software | ||
| distributed under the License is distributed on an "AS IS" BASIS, | ||
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| See the License for the specific language governing permissions and | ||
| limitations under the License. | ||
| */ | ||
|
|
||
| /* | ||
| * CUDA to HIP compatibility header for cuPDLPx. | ||
| * | ||
| * This header maps CUDA API symbols to their HIP equivalents when building | ||
| * with USE_HIP. On CUDA builds, it simply includes the standard CUDA headers. | ||
| * Source files keep their CUDA spelling; this header handles the translation. | ||
| */ |
There was a problem hiding this comment.
Thanks a lot for the contribution. To keep the project headers simple and consistent, I’d prefer not to add per-author or per-company copyright lines in individual files. Could we use the existing cuPDLPx copyright header instead? The contribution will still be fully recorded in the git history.
This adds optional AMD GPU support to cuPDLPx through ROCm/HIP, alongside the existing CUDA path. The CUDA build is unchanged when the new option is off.
What changed
The port keeps every device source in its existing CUDA spelling and routes the CUDA APIs to their HIP equivalents through a single compatibility header,
internal/cuda_to_hip.h. On a HIP build that header maps the CUDA runtime, cuBLAS, cuSPARSE, and CUB symbols to hipRT, hipBLAS, hipSPARSE, and hipCUB; on a CUDA build it includes the standard CUDA headers, so nothing about the NVIDIA path changes.internal/cusparse_compat.hselects the standardhipsparseSpMVpath on ROCm, since hipSPARSE does not provide thecusparseSpMVOpvariant.The build system gains a
USE_HIPoption (off by default). When enabled, the project is configured with the HIP language, the.cusources are compiled as HIP, and the targets link hipBLAS, hipSPARSE, and hipCUB instead of the CUDA libraries. GPU architectures are chosen withCMAKE_HIP_ARCHITECTURES(defaulting togfx90a). On Windows, the CLI-onlymps_parser.cis excluded from the core library because it relies onstrtok_r.The interface test gains a case that runs the GPU solver path with presolve disabled, so the hipBLAS/hipSPARSE execution path is exercised end to end. Enabling the test suite (
-DCUPDLPX_BUILD_TESTS=ON) to run the new case surfaced four stalezero_toleranceassignments intest_interface.creferencing amatrix_desc_tfield that no longer exists; they are removed so the file compiles.Building for AMD GPUs
Set
CMAKE_HIP_ARCHITECTURESto match the target GPU (for examplegfx90afor MI200,gfx1100for RDNA3 desktop, orgfx1201for RDNA4). If the ROCm install is not on CMake's default search path, point-DCMAKE_PREFIX_PATHat it (e.g./opt/rocm) sofind_packagecan locate hip, hipBLAS, hipSPARSE, and hipCUB. The resulting./build/cupdlpxbinary is invoked exactly as in the CUDA build. The README documents this alongside the CUDA build instructions.Validation
Built and run on an AMD Instinct MI200 (gfx90a) with ROCm 7.2.1: the full interface test suite passes, including the GPU solver case that exercises the hipBLAS and hipSPARSE paths (Status: OPTIMAL). The same configuration builds cleanly for gfx1100 (RDNA3) and gfx1201 (RDNA4). The CUDA build path is unaffected by these changes.
This work was prepared with assistance from Claude, an AI assistant by Anthropic.