[ROCm] Add AMD ROCm/HIP build for the CUDA particle/SPH solvers by jeffdaily · Pull Request #145 · utilForever/CubbyFlow

jeffdaily · 2026-06-19T05:04:59Z

This adds an AMD ROCm/HIP build path for CubbyFlow's CUDA particle and SPH solvers (SPH, WCSPH, PCISPH, the particle-system solvers, the point-hash-grid neighbor searcher, and the CUDA array/container layer) so they run on AMD GPUs in addition to NVIDIA GPUs, while keeping the existing CUDA build unchanged when USE_HIP is off.

What changed

The CUDA sources are compiled with hipcc rather than rewritten. A small compatibility shim, Includes/Core/CUDA/cuda_to_hip.h, is force-included on every HIP translation unit and maps the handful of CUDA runtime symbols the project uses (cudaMalloc, cudaMemcpy, error helpers, and the memcpy-kind enumerators) onto their hip* equivalents. The shim also reconciles the preprocessor idioms the codebase relies on: the project gates its kernel and host/device attribute macros on __CUDACC__ and selects device-vs-host accessor return types on __CUDA_ARCH__, so those guards are extended in place to also accept the HIP compile passes. __CUDACC__ is deliberately left undefined under HIP because rocThrust keys its backend selection on it; instead Thrust is pinned to its HIP backend so it does not pull a CUDA-only CUB header.

The remaining edits are confined to the CUDA headers and sources under Includes/Core/CUDA and Sources/Core/CUDA (extending the __CUDACC__/__CUDA_ARCH__ guards), the CMake build (a mutually exclusive USE_HIP option that enables the HIP language, sets CMAKE_HIP_ARCHITECTURES, and force-includes the shim), the CUDA examples and tests so they build under HIP, and the install documentation.

How to build the ROCm/HIP path

NVIDIA CUDA remains the default (USE_CUDA=ON). To build for AMD GPUs instead, configure with USE_HIP=ON:

cmake -B build -S . -DUSE_HIP=ON -DCMAKE_HIP_ARCHITECTURES=gfx90a
cmake --build build -j

CMAKE_HIP_ARCHITECTURES accepts a semicolon-separated list of AMD GPU targets and defaults to gfx90a when omitted. A working ROCm installation with HIP is required. The new section in Documents/Install.md documents this alongside the existing build instructions.

Validation

Built and run on AMD Instinct MI250X (gfx90a) and Radeon (gfx1100) on Linux, and on gfx1201 on Windows (ROCm/TheRock SDK). The Tests/CUDATests suite passes (35/35 cases, 3168/3168 assertions), the CPU UnitTests remain green (722/722), and the CUDASPHSim example runs the full GPU pipeline end to end. The CUDA build and behavior are unchanged when USE_HIP is off.

This work was authored with the assistance of Claude (Anthropic's AI assistant).

Summary by CodeRabbit

Release Notes

New Features
- Added AMD ROCm/HIP GPU acceleration support as an alternative to CUDA.
- Updated GPU examples and tests to build/run for HIP as well as CUDA.
Documentation
- Added “Building with GPU Acceleration” instructions for CUDA vs ROCm/HIP, including USE_HIP=ON and optional HIP architecture configuration (defaults to gfx90a when unset).
Build System
- Introduced a USE_HIP build option that switches to HIP compilation and disables CUDA for HIP builds.
Bug Fixes
- Strengthened CUDA test checks by asserting successful GPU synchronization.

This adds an AMD ROCm/HIP build path for CubbyFlow's CUDA particle and SPH solvers (SPH, WCSPH, PCISPH, the particle-system solvers, the point-hash-grid neighbor searcher, and the CUDA array/container layer) so they run on AMD GPUs in addition to NVIDIA GPUs, while keeping the existing CUDA build unchanged when USE_HIP is off. The CUDA sources are compiled with hipcc rather than rewritten. A small compatibility shim, Includes/Core/CUDA/cuda_to_hip.h, is force-included on every HIP translation unit and maps the handful of CUDA runtime symbols the project uses (cudaMalloc, cudaMemcpy, error helpers, and the memcpy-kind enumerators) onto their hip* equivalents. The shim also reconciles the preprocessor idioms the codebase relies on: the project gates its kernel and host/device attribute macros on __CUDACC__ and selects device-vs-host accessor return types on __CUDA_ARCH__, so those guards are extended in place to also accept the HIP compile passes. __CUDACC__ is deliberately left undefined under HIP because rocThrust keys its backend selection on it; instead Thrust is pinned to its HIP backend so it does not pull a CUDA-only CUB header. To review: start with cuda_to_hip.h and Includes/Core/Utils/Macros.hpp (the guard extensions), then the CMake USE_HIP option (enables the HIP language, sets CMAKE_HIP_ARCHITECTURES, force-includes the shim, marks the CUDA directory's .cu and device-touching .cpp as LANGUAGE HIP), then the mechanical __CUDACC__/__CUDA_ARCH__ guard widening across Includes/Core/CUDA and Sources/Core/CUDA. NVIDIA CUDA remains the default; configure with -DUSE_HIP=ON to build for AMD. Test Plan: ``` git submodule update --init --recursive cmake -B build -S . -DCMAKE_BUILD_TYPE=Release -DUSE_HIP=ON \ -DCMAKE_HIP_ARCHITECTURES=gfx90a \ -DCMAKE_HIP_COMPILER=/opt/rocm/llvm/bin/clang++ \ -DBUILD_TESTS=ON -DBUILD_EXAMPLES=ON cmake --build build -j$(nproc) ./build/bin/CUDAUnitTests ``` Built and run on AMD Instinct MI250X (gfx90a) and Radeon (gfx1100) on Linux, and on gfx1201 on Windows (ROCm/TheRock SDK). CUDATests pass (35/35 cases, 3168/3168 assertions: CUDA array/vector/stdarray, particle system data and solver, point hash grid searcher cross-checked against the CPU searcher), the CPU UnitTests remain green (722/722), and the CUDASPHSim example runs the full GPU pipeline end to end. The CUDA build and behavior are unchanged when USE_HIP is off. This work was authored with the assistance of Claude (Anthropic's AI assistant).

coderabbitai · 2026-06-19T05:05:14Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 02923b31-47e2-46e0-a793-f8f6a159c048

📥 Commits

Reviewing files that changed from the base of the PR and between 9c90155 and 62f4604.

📒 Files selected for processing (1)

CMakeLists.txt

📜 Recent review details

🔇 Additional comments (1)

CMakeLists.txt (1)

47-72: LGTM!

Also applies to: 182-183, 193-195, 208-214

📝 Walkthrough

Walkthrough

The PR adds an AMD ROCm/HIP GPU backend as a mutually exclusive alternative to the existing CUDA backend. Changes include a new USE_HIP CMake option, a new cuda_to_hip.h compatibility shim that maps CUDA runtime API calls to HIP equivalents, HIP-aware preprocessor guards across all CUDA headers, CUBBYFLOW_CUDA_HOST_DEVICE annotations on kernel and array functions, and updated CMakeLists for all affected build targets.

Changes

AMD ROCm/HIP Backend Support

Layer / File(s)	Summary
HIP compatibility shim and base macros `Includes/Core/CUDA/cuda_to_hip.h`, `Includes/Core/Utils/Macros.hpp`	New `cuda_to_hip.h` header maps CUDA runtime API symbols to HIP equivalents and aligns `__CUDA_ARCH__` for HIP device-compile passes. `Macros.hpp` activates `CUBBYFLOW_CUDA_DEVICE`/`HOST`, alignment macros, and the CUDA error-check macro under both `__CUDACC__` and `__HIPCC__`.
CMake build system: USE_HIP option and target wiring `CMakeLists.txt`, `Builds/CMake/CompileOptions.cmake`, `Sources/Core/CMakeLists.txt`, `Examples/CUDASPHSim/CMakeLists.txt`, `Tests/CUDATests/CMakeLists.txt`	Adds `USE_HIP` option with mutual exclusion against `USE_CUDA`, enables HIP language, injects the compat header via `CMAKE_HIP_FLAGS`, pins Thrust to the HIP device system. Broadens `Tests/CUDATests` and `Examples/CUDASPHSim` to `USE_CUDA OR USE_HIP`, excludes Python API bindings when either GPU backend is active. Each sub-target CMakeLists adds a `USE_HIP` branch using `add_executable` with `LANGUAGE HIP`. Adds HIP-specific warning suppressions in CompileOptions.cmake.
Conditional `cuda_runtime.h` → `cuda_to_hip.h` includes `Includes/Core/CUDA/CUDAUtils.hpp`, `Includes/Core/CUDA/ThrustUtils.hpp`, `Includes/Core/CUDA/CUDAParticleSystemData.hpp`, `Includes/Core/CUDA/CUDAPointHashGridSearcher.hpp`, `Sources/Core/CUDA/CUDASPHSystemData*Func.hpp`, `Examples/CUDASPHSim/CUDAPCISPHSolver3Example.cu`, `Includes/Core/CUDA/CUDAAlgorithms.hpp`, `Includes/Core/CUDA/CUDAArray-Impl.hpp`	Replaces unconditional `cuda_runtime.h` includes with `__HIP__`-guarded conditional includes of `cuda_to_hip.h` across all affected headers and sources. Widens `__CUDACC__`-only guards to `__CUDACC__ \|\| __HIPCC__` where needed. Adds `#if !defined(__HIP__)` guard around CUDA float vector operator overloads in `CUDAUtils.hpp`.
`CUBBYFLOW_CUDA_HOST_DEVICE` annotations on SPH kernels, CUDAArrayView, and CUDAStdArray `Includes/Core/CUDA/CUDASPHKernels2-Impl.hpp`, `Includes/Core/CUDA/CUDASPHKernels3-Impl.hpp`, `Includes/Core/CUDA/CUDAArrayView-Impl.hpp`, `Includes/Core/CUDA/CUDAStdArray-Impl.hpp`	Adds `CUBBYFLOW_CUDA_HOST_DEVICE` to all `CUDASPHStdKernel2/SpikyKernel2` and `CUDASPHStdKernel3/SpikyKernel3` member function definitions, all `CUDAArrayView` constructor/assignment/`Set`/`Fill` definitions. Reformats `CUDAStdArray-Impl.hpp` signatures without logic changes.
CUDAStdVector and CUDAArrayBase HIP-aware accessor branching `Includes/Core/CUDA/CUDAStdVector.hpp`, `Includes/Core/CUDA/CUDAStdVector-Impl.hpp`, `Includes/Core/CUDA/CUDAArrayBase.hpp`, `Includes/Core/CUDA/CUDAArrayBase-Impl.hpp`	Restructures `ReferenceType` and `At`/`operator[]` overload selection from `__CUDA_ARCH__` vs `else` into `__HIP__ \|\| __CUDA_ARCH__` vs `__HIP__ \|\| !__CUDA_ARCH__` splits with explicit `CUBBYFLOW_CUDA_HOST` on host overloads. Adds a full `__HIP__` branch in `CUDAArrayBase.hpp` declaring device+host accessor overloads. Updates helper methods with `CUBBYFLOW_CUDA_HOST_DEVICE`.
Documentation and test fixes `Documents/Install.md`, `Tests/CUDATests/CUDAArray2Tests.cu`	Adds a "Building with GPU Acceleration" section in `Install.md` describing the CUDA vs HIP backend selection with example CMake commands. Asserts `cudaDeviceSynchronize()` return value in resize verification paths.

Sequence Diagram(s)

sequenceDiagram
    participant CMake as CMake Configure
    participant hipcc as hipcc Compiler
    participant cuda_to_hip as cuda_to_hip.h
    participant hip_runtime as hip/hip_runtime.h
    participant CUDAHeader as CUDAArray/SPH Headers

    CMake->>CMake: USE_HIP=ON → set USE_CUDA=OFF
    CMake->>CMake: enable_language(HIP), set CMAKE_HIP_FLAGS -include cuda_to_hip.h
    CMake->>hipcc: compile *.cu with LANGUAGE HIP
    hipcc->>cuda_to_hip: force-include cuda_to_hip.h
    cuda_to_hip->>hip_runtime: include hip/hip_runtime.h
    cuda_to_hip->>cuda_to_hip: define __CUDA_ARCH__=1 on device pass
    cuda_to_hip->>cuda_to_hip: remap cudaMalloc→hipMalloc, cudaMemcpy→hipMemcpy, etc.
    hipcc->>CUDAHeader: compile with __HIP__ defined
    CUDAHeader->>CUDAHeader: select __HIP__-aware accessor branches
    CUDAHeader->>CUDAHeader: compile CUBBYFLOW_CUDA_HOST_DEVICE functions for host+device

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 Hop, hop, across the silicon divide,
From NVIDIA green to AMD's red tide!
cuda_to_hip maps each call with care,
__HIP__ guards sprout everywhere.
Two GPU backends, one project to share—
The rabbit typed USE_HIP=ON with flair! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.85% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely summarizes the main change: adding AMD ROCm/HIP build support to CUDA particle/SPH solvers. It is specific, directly related to the primary objective, and avoids generic terms.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

CMakeLists.txt (1)

70-71: ⚡ Quick win

Avoid hard-coding Thrust’s HIP backend numeric ID.

Line 71 uses THRUST_DEVICE_SYSTEM=5; that value is an internal constant and can drift with Thrust revisions. Prefer the symbolic macro to keep this resilient.

Suggested fix

-    # THRUST_DEVICE_SYSTEM_HIP == 5 in thrust/detail/config/device_system.h.
-    add_compile_definitions(THRUST_DEVICE_SYSTEM=5)
+    add_compile_definitions(THRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_HIP)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@CMakeLists.txt` around lines 70 - 71, In the CMakeLists.txt file, the
add_compile_definitions call on line 71 hard-codes the numeric value 5 for
THRUST_DEVICE_SYSTEM, which is an internal constant that may change across
Thrust versions. Replace the hard-coded numeric value 5 with the symbolic macro
THRUST_DEVICE_SYSTEM_HIP to ensure the configuration remains resilient to Thrust
revisions and is more maintainable.

Tests/CUDATests/CUDAArray2Tests.cu (1)

302-302: ⚡ Quick win

Assert cudaDeviceSynchronize() success instead of discarding it.

Line 302 currently suppresses the return value; this can hide backend/runtime failures in the resize test.

Suggested fix

-        static_cast<void>(cudaDeviceSynchronize());
+        CHECK_EQ(cudaSuccess, cudaDeviceSynchronize());

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Tests/CUDATests/CUDAArray2Tests.cu` at line 302, The call to
cudaDeviceSynchronize() is being cast to void, which suppresses its return value
and can hide backend/runtime failures. Replace the static_cast that discards the
return value with an assertion that checks whether cudaDeviceSynchronize()
returns successfully, ensuring that any CUDA runtime errors during device
synchronization in the resize test are properly caught and reported.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CMakeLists.txt`:
- Around line 64-65: The CUBBYFLOW_CUDA_TO_HIP_HEADER variable reference in the
CMAKE_HIP_FLAGS set command is not quoted, which causes word-splitting issues if
the path contains spaces. Add quotes around the ${CUBBYFLOW_CUDA_TO_HIP_HEADER}
variable substitution to ensure the absolute path is treated as a single
argument to the -include compiler flag, protecting it from being split on
whitespace during argument parsing.

In `@Documents/Install.md`:
- Around line 237-242: The fenced code block containing the mkdir, cd, cmake,
and make commands is missing a language specification after the opening triple
backticks, which triggers the MD040 linting rule. Add "bash" as the language
identifier immediately after the opening triple backticks to specify that this
is a bash shell command block and resolve the linting violation.

---

Nitpick comments:
In `@CMakeLists.txt`:
- Around line 70-71: In the CMakeLists.txt file, the add_compile_definitions
call on line 71 hard-codes the numeric value 5 for THRUST_DEVICE_SYSTEM, which
is an internal constant that may change across Thrust versions. Replace the
hard-coded numeric value 5 with the symbolic macro THRUST_DEVICE_SYSTEM_HIP to
ensure the configuration remains resilient to Thrust revisions and is more
maintainable.

In `@Tests/CUDATests/CUDAArray2Tests.cu`:
- Line 302: The call to cudaDeviceSynchronize() is being cast to void, which
suppresses its return value and can hide backend/runtime failures. Replace the
static_cast that discards the return value with an assertion that checks whether
cudaDeviceSynchronize() returns successfully, ensuring that any CUDA runtime
errors during device synchronization in the resize test are properly caught and
reported.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 587d8c24-6b54-4254-a701-3b4c5a3099b0

📥 Commits

Reviewing files that changed from the base of the PR and between 5b786fe and a15f841.

📒 Files selected for processing (28)

Builds/CMake/CompileOptions.cmake
CMakeLists.txt
Documents/Install.md
Examples/CUDASPHSim/CMakeLists.txt
Examples/CUDASPHSim/CUDAPCISPHSolver3Example.cu
Includes/Core/CUDA/CUDAAlgorithms.hpp
Includes/Core/CUDA/CUDAArray-Impl.hpp
Includes/Core/CUDA/CUDAArrayBase-Impl.hpp
Includes/Core/CUDA/CUDAArrayBase.hpp
Includes/Core/CUDA/CUDAArrayView-Impl.hpp
Includes/Core/CUDA/CUDAParticleSystemData2.hpp
Includes/Core/CUDA/CUDAParticleSystemData3.hpp
Includes/Core/CUDA/CUDAPointHashGridSearcher2.hpp
Includes/Core/CUDA/CUDAPointHashGridSearcher3.hpp
Includes/Core/CUDA/CUDASPHKernels2-Impl.hpp
Includes/Core/CUDA/CUDASPHKernels3-Impl.hpp
Includes/Core/CUDA/CUDAStdArray-Impl.hpp
Includes/Core/CUDA/CUDAStdVector-Impl.hpp
Includes/Core/CUDA/CUDAStdVector.hpp
Includes/Core/CUDA/CUDAUtils.hpp
Includes/Core/CUDA/ThrustUtils.hpp
Includes/Core/CUDA/cuda_to_hip.h
Includes/Core/Utils/Macros.hpp
Sources/Core/CMakeLists.txt
Sources/Core/CUDA/CUDASPHSystemData2Func.hpp
Sources/Core/CUDA/CUDASPHSystemData3Func.hpp
Tests/CUDATests/CMakeLists.txt
Tests/CUDATests/CUDAArray2Tests.cu

📜 Review details

🧰 Additional context used

🪛 markdownlint-cli2 (0.22.1)

Documents/Install.md

[warning] 237-237: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (25)

Includes/Core/CUDA/cuda_to_hip.h (1)

14-50: LGTM!

Includes/Core/CUDA/ThrustUtils.hpp (1)

18-22: LGTM!

Includes/Core/CUDA/CUDAParticleSystemData2.hpp (1)

21-25: LGTM!

Includes/Core/CUDA/CUDAParticleSystemData3.hpp (1)

21-25: LGTM!

Includes/Core/CUDA/CUDAPointHashGridSearcher2.hpp (1)

19-23: LGTM!

Includes/Core/CUDA/CUDAPointHashGridSearcher3.hpp (1)

19-23: LGTM!

Includes/Core/Utils/Macros.hpp (1)

17-23: LGTM!

Also applies to: 27-35, 38-52

Includes/Core/CUDA/CUDAUtils.hpp (1)

18-22: LGTM!

Also applies to: 44-48, 353-353

Sources/Core/CUDA/CUDASPHSystemData2Func.hpp (1)

20-24: LGTM!

Sources/Core/CUDA/CUDASPHSystemData3Func.hpp (1)

20-24: LGTM!

Examples/CUDASPHSim/CUDAPCISPHSolver3Example.cu (1)

18-22: LGTM!

Includes/Core/CUDA/CUDAAlgorithms.hpp (1)

22-22: LGTM!

Includes/Core/CUDA/CUDAArray-Impl.hpp (1)

18-18: LGTM!

Also applies to: 143-143, 357-357

Includes/Core/CUDA/CUDASPHKernels2-Impl.hpp (1)

21-148: LGTM!

Includes/Core/CUDA/CUDASPHKernels3-Impl.hpp (1)

21-144: LGTM!

Includes/Core/CUDA/CUDAArrayView-Impl.hpp (1)

19-184: LGTM!

Also applies to: 186-188

Includes/Core/CUDA/CUDAStdArray-Impl.hpp (1)

19-27: LGTM!

Also applies to: 36-37, 45-46, 54-55, 64-65, 74-75, 86-87, 98-99, 120-123, 127-130, 134-135, 149-153, 157-163, 167-168

Includes/Core/CUDA/CUDAStdVector.hpp (1)

68-80: LGTM!

Also applies to: 125-131, 160-168

Includes/Core/CUDA/CUDAStdVector-Impl.hpp (1)

98-123: LGTM!

Also applies to: 238-263

Includes/Core/CUDA/CUDAArrayBase.hpp (1)

59-115: LGTM!

Includes/Core/CUDA/CUDAArrayBase-Impl.hpp (1)

19-38: LGTM!

Also applies to: 41-80, 92-95, 185-268, 271-351, 359-359

Builds/CMake/CompileOptions.cmake (1)

128-133: LGTM!

Also applies to: 143-163

Sources/Core/CMakeLists.txt (1)

21-31: LGTM!

Examples/CUDASPHSim/CMakeLists.txt (1)

8-9: LGTM!

Also applies to: 22-25

Tests/CUDATests/CMakeLists.txt (1)

10-18: LGTM!

Use the symbolic THRUST_DEVICE_SYSTEM_HIP macro instead of its numeric value when pinning rocThrust's backend, quote the force-included shim path so a source tree containing spaces still passes a single -include argument, assert the cudaDeviceSynchronize() result in the CUDAArray2 resize test instead of discarding it, and tag the HIP build's fenced code block in the install docs with a language. These are behavior-preserving: the Thrust macro expands to the same value and the quoted path resolves identically, so the compiled device code is unchanged on both the CUDA and HIP paths. Test Plan: cmake .. -DUSE_HIP=ON -DCMAKE_HIP_ARCHITECTURES=gfx90a make ./bin/CUDATests Authored with the assistance of Claude (Anthropic's AI assistant).

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Tests/CUDATests/CUDAArray2Tests.cu`:
- Line 302: At line 330, there is a similar arr.Resize(1, 9, 3.f) operation with
a fill value that performs async kernel operations, but it lacks a
synchronization check unlike the one added at line 302. Add a
CHECK_EQ(cudaSuccess, cudaDeviceSynchronize()) call immediately after the
arr.Resize operation at line 330 to ensure the async resize-with-fill kernel
completes before proceeding with value verification, maintaining consistency
with line 302 and preventing potential flaky tests on the HIP backend.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2f7a2d4d-7779-4244-9741-3dfb3fba28ee

📥 Commits

Reviewing files that changed from the base of the PR and between a15f841 and 8b88013.

📒 Files selected for processing (3)

CMakeLists.txt
Documents/Install.md
Tests/CUDATests/CUDAArray2Tests.cu

✅ Files skipped from review due to trivial changes (1)

Documents/Install.md

🚧 Files skipped from review as they are similar to previous changes (1)

CMakeLists.txt

Mirror the cudaDeviceSynchronize() success check onto the second resize-with-fill case in the CUDAArray2 resize test so both async fill kernels are confirmed complete before their results are read, keeping the two cases consistent. Test Plan: cmake .. -DUSE_HIP=ON -DCMAKE_HIP_ARCHITECTURES=gfx90a make ./bin/CUDATests Authored with the assistance of Claude (Anthropic's AI assistant).

The default-gfx90a block preempted and duplicated CMake's own host-GPU detection. Removing it lets enable_language(HIP) honor an explicit -DCMAKE_HIP_ARCHITECTURES, auto-detect the host GPU, or error on a no-GPU build host. The block was also dead code: it ran after enable_language(HIP), by which point CMAKE_HIP_ARCHITECTURES was already set, so its guard was always false. This work was authored with the Claude AI assistant.

pull-request-size Bot added the size/XL label Jun 19, 2026

coderabbitai Bot requested changes Jun 19, 2026

View reviewed changes

Comment thread CMakeLists.txt Outdated

Comment thread Documents/Install.md Outdated

coderabbitai Bot requested changes Jun 19, 2026

View reviewed changes

Comment thread Tests/CUDATests/CUDAArray2Tests.cu

coderabbitai Bot approved these changes Jun 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ROCm] Add AMD ROCm/HIP build for the CUDA particle/SPH solvers#145

[ROCm] Add AMD ROCm/HIP build for the CUDA particle/SPH solvers#145
jeffdaily wants to merge 4 commits into
utilForever:mainfrom
jeffdaily:moat-port

jeffdaily commented Jun 19, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 19, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jeffdaily commented Jun 19, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changed

How to build the ROCm/HIP path

Validation

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

jeffdaily commented Jun 19, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 19, 2026 •

edited

Loading