Skip to content

[ROCm] Add AMD ROCm/HIP build for the CUDA particle/SPH solvers#145

Open
jeffdaily wants to merge 4 commits into
utilForever:mainfrom
jeffdaily:moat-port
Open

[ROCm] Add AMD ROCm/HIP build for the CUDA particle/SPH solvers#145
jeffdaily wants to merge 4 commits into
utilForever:mainfrom
jeffdaily:moat-port

Conversation

@jeffdaily

@jeffdaily jeffdaily commented Jun 19, 2026

Copy link
Copy Markdown

This adds an AMD ROCm/HIP build path for CubbyFlow's CUDA particle and SPH solvers (SPH, WCSPH, PCISPH, the particle-system solvers, the point-hash-grid neighbor searcher, and the CUDA array/container layer) so they run on AMD GPUs in addition to NVIDIA GPUs, while keeping the existing CUDA build unchanged when USE_HIP is off.

What changed

The CUDA sources are compiled with hipcc rather than rewritten. A small compatibility shim, Includes/Core/CUDA/cuda_to_hip.h, is force-included on every HIP translation unit and maps the handful of CUDA runtime symbols the project uses (cudaMalloc, cudaMemcpy, error helpers, and the memcpy-kind enumerators) onto their hip* equivalents. The shim also reconciles the preprocessor idioms the codebase relies on: the project gates its kernel and host/device attribute macros on __CUDACC__ and selects device-vs-host accessor return types on __CUDA_ARCH__, so those guards are extended in place to also accept the HIP compile passes. __CUDACC__ is deliberately left undefined under HIP because rocThrust keys its backend selection on it; instead Thrust is pinned to its HIP backend so it does not pull a CUDA-only CUB header.

The remaining edits are confined to the CUDA headers and sources under Includes/Core/CUDA and Sources/Core/CUDA (extending the __CUDACC__/__CUDA_ARCH__ guards), the CMake build (a mutually exclusive USE_HIP option that enables the HIP language, sets CMAKE_HIP_ARCHITECTURES, and force-includes the shim), the CUDA examples and tests so they build under HIP, and the install documentation.

How to build the ROCm/HIP path

NVIDIA CUDA remains the default (USE_CUDA=ON). To build for AMD GPUs instead, configure with USE_HIP=ON:

cmake -B build -S . -DUSE_HIP=ON -DCMAKE_HIP_ARCHITECTURES=gfx90a
cmake --build build -j

CMAKE_HIP_ARCHITECTURES accepts a semicolon-separated list of AMD GPU targets and defaults to gfx90a when omitted. A working ROCm installation with HIP is required. The new section in Documents/Install.md documents this alongside the existing build instructions.

Validation

Built and run on AMD Instinct MI250X (gfx90a) and Radeon (gfx1100) on Linux, and on gfx1201 on Windows (ROCm/TheRock SDK). The Tests/CUDATests suite passes (35/35 cases, 3168/3168 assertions), the CPU UnitTests remain green (722/722), and the CUDASPHSim example runs the full GPU pipeline end to end. The CUDA build and behavior are unchanged when USE_HIP is off.

This work was authored with the assistance of Claude (Anthropic's AI assistant).

Summary by CodeRabbit

Release Notes

  • New Features

    • Added AMD ROCm/HIP GPU acceleration support as an alternative to CUDA.
    • Updated GPU examples and tests to build/run for HIP as well as CUDA.
  • Documentation

    • Added “Building with GPU Acceleration” instructions for CUDA vs ROCm/HIP, including USE_HIP=ON and optional HIP architecture configuration (defaults to gfx90a when unset).
  • Build System

    • Introduced a USE_HIP build option that switches to HIP compilation and disables CUDA for HIP builds.
  • Bug Fixes

    • Strengthened CUDA test checks by asserting successful GPU synchronization.

This adds an AMD ROCm/HIP build path for CubbyFlow's CUDA particle and SPH solvers (SPH, WCSPH, PCISPH, the particle-system solvers, the point-hash-grid neighbor searcher, and the CUDA array/container layer) so they run on AMD GPUs in addition to NVIDIA GPUs, while keeping the existing CUDA build unchanged when USE_HIP is off.

The CUDA sources are compiled with hipcc rather than rewritten. A small compatibility shim, Includes/Core/CUDA/cuda_to_hip.h, is force-included on every HIP translation unit and maps the handful of CUDA runtime symbols the project uses (cudaMalloc, cudaMemcpy, error helpers, and the memcpy-kind enumerators) onto their hip* equivalents. The shim also reconciles the preprocessor idioms the codebase relies on: the project gates its kernel and host/device attribute macros on __CUDACC__ and selects device-vs-host accessor return types on __CUDA_ARCH__, so those guards are extended in place to also accept the HIP compile passes. __CUDACC__ is deliberately left undefined under HIP because rocThrust keys its backend selection on it; instead Thrust is pinned to its HIP backend so it does not pull a CUDA-only CUB header.

To review: start with cuda_to_hip.h and Includes/Core/Utils/Macros.hpp (the guard extensions), then the CMake USE_HIP option (enables the HIP language, sets CMAKE_HIP_ARCHITECTURES, force-includes the shim, marks the CUDA directory's .cu and device-touching .cpp as LANGUAGE HIP), then the mechanical __CUDACC__/__CUDA_ARCH__ guard widening across Includes/Core/CUDA and Sources/Core/CUDA. NVIDIA CUDA remains the default; configure with -DUSE_HIP=ON to build for AMD.

Test Plan:

```
git submodule update --init --recursive
cmake -B build -S . -DCMAKE_BUILD_TYPE=Release -DUSE_HIP=ON \
  -DCMAKE_HIP_ARCHITECTURES=gfx90a \
  -DCMAKE_HIP_COMPILER=/opt/rocm/llvm/bin/clang++ \
  -DBUILD_TESTS=ON -DBUILD_EXAMPLES=ON
cmake --build build -j$(nproc)
./build/bin/CUDAUnitTests
```

Built and run on AMD Instinct MI250X (gfx90a) and Radeon (gfx1100) on Linux, and on gfx1201 on Windows (ROCm/TheRock SDK). CUDATests pass (35/35 cases, 3168/3168 assertions: CUDA array/vector/stdarray, particle system data and solver, point hash grid searcher cross-checked against the CPU searcher), the CPU UnitTests remain green (722/722), and the CUDASPHSim example runs the full GPU pipeline end to end. The CUDA build and behavior are unchanged when USE_HIP is off.

This work was authored with the assistance of Claude (Anthropic's AI assistant).
@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 02923b31-47e2-46e0-a793-f8f6a159c048

📥 Commits

Reviewing files that changed from the base of the PR and between 9c90155 and 62f4604.

📒 Files selected for processing (1)
  • CMakeLists.txt
📜 Recent review details
🔇 Additional comments (1)
CMakeLists.txt (1)

47-72: LGTM!

Also applies to: 182-183, 193-195, 208-214


📝 Walkthrough

Walkthrough

The PR adds an AMD ROCm/HIP GPU backend as a mutually exclusive alternative to the existing CUDA backend. Changes include a new USE_HIP CMake option, a new cuda_to_hip.h compatibility shim that maps CUDA runtime API calls to HIP equivalents, HIP-aware preprocessor guards across all CUDA headers, CUBBYFLOW_CUDA_HOST_DEVICE annotations on kernel and array functions, and updated CMakeLists for all affected build targets.

Changes

AMD ROCm/HIP Backend Support

Layer / File(s) Summary
HIP compatibility shim and base macros
Includes/Core/CUDA/cuda_to_hip.h, Includes/Core/Utils/Macros.hpp
New cuda_to_hip.h header maps CUDA runtime API symbols to HIP equivalents and aligns __CUDA_ARCH__ for HIP device-compile passes. Macros.hpp activates CUBBYFLOW_CUDA_DEVICE/HOST, alignment macros, and the CUDA error-check macro under both __CUDACC__ and __HIPCC__.
CMake build system: USE_HIP option and target wiring
CMakeLists.txt, Builds/CMake/CompileOptions.cmake, Sources/Core/CMakeLists.txt, Examples/CUDASPHSim/CMakeLists.txt, Tests/CUDATests/CMakeLists.txt
Adds USE_HIP option with mutual exclusion against USE_CUDA, enables HIP language, injects the compat header via CMAKE_HIP_FLAGS, pins Thrust to the HIP device system. Broadens Tests/CUDATests and Examples/CUDASPHSim to USE_CUDA OR USE_HIP, excludes Python API bindings when either GPU backend is active. Each sub-target CMakeLists adds a USE_HIP branch using add_executable with LANGUAGE HIP. Adds HIP-specific warning suppressions in CompileOptions.cmake.
Conditional cuda_runtime.hcuda_to_hip.h includes
Includes/Core/CUDA/CUDAUtils.hpp, Includes/Core/CUDA/ThrustUtils.hpp, Includes/Core/CUDA/CUDAParticleSystemData*.hpp, Includes/Core/CUDA/CUDAPointHashGridSearcher*.hpp, Sources/Core/CUDA/CUDASPHSystemData*Func.hpp, Examples/CUDASPHSim/CUDAPCISPHSolver3Example.cu, Includes/Core/CUDA/CUDAAlgorithms.hpp, Includes/Core/CUDA/CUDAArray-Impl.hpp
Replaces unconditional cuda_runtime.h includes with __HIP__-guarded conditional includes of cuda_to_hip.h across all affected headers and sources. Widens __CUDACC__-only guards to __CUDACC__ || __HIPCC__ where needed. Adds #if !defined(__HIP__) guard around CUDA float vector operator overloads in CUDAUtils.hpp.
CUBBYFLOW_CUDA_HOST_DEVICE annotations on SPH kernels, CUDAArrayView, and CUDAStdArray
Includes/Core/CUDA/CUDASPHKernels2-Impl.hpp, Includes/Core/CUDA/CUDASPHKernels3-Impl.hpp, Includes/Core/CUDA/CUDAArrayView-Impl.hpp, Includes/Core/CUDA/CUDAStdArray-Impl.hpp
Adds CUBBYFLOW_CUDA_HOST_DEVICE to all CUDASPHStdKernel2/SpikyKernel2 and CUDASPHStdKernel3/SpikyKernel3 member function definitions, all CUDAArrayView constructor/assignment/Set/Fill definitions. Reformats CUDAStdArray-Impl.hpp signatures without logic changes.
CUDAStdVector and CUDAArrayBase HIP-aware accessor branching
Includes/Core/CUDA/CUDAStdVector.hpp, Includes/Core/CUDA/CUDAStdVector-Impl.hpp, Includes/Core/CUDA/CUDAArrayBase.hpp, Includes/Core/CUDA/CUDAArrayBase-Impl.hpp
Restructures ReferenceType and At/operator[] overload selection from __CUDA_ARCH__ vs else into __HIP__ || __CUDA_ARCH__ vs __HIP__ || !__CUDA_ARCH__ splits with explicit CUBBYFLOW_CUDA_HOST on host overloads. Adds a full __HIP__ branch in CUDAArrayBase.hpp declaring device+host accessor overloads. Updates helper methods with CUBBYFLOW_CUDA_HOST_DEVICE.
Documentation and test fixes
Documents/Install.md, Tests/CUDATests/CUDAArray2Tests.cu
Adds a "Building with GPU Acceleration" section in Install.md describing the CUDA vs HIP backend selection with example CMake commands. Asserts cudaDeviceSynchronize() return value in resize verification paths.

Sequence Diagram(s)

sequenceDiagram
    participant CMake as CMake Configure
    participant hipcc as hipcc Compiler
    participant cuda_to_hip as cuda_to_hip.h
    participant hip_runtime as hip/hip_runtime.h
    participant CUDAHeader as CUDAArray/SPH Headers

    CMake->>CMake: USE_HIP=ON → set USE_CUDA=OFF
    CMake->>CMake: enable_language(HIP), set CMAKE_HIP_FLAGS -include cuda_to_hip.h
    CMake->>hipcc: compile *.cu with LANGUAGE HIP
    hipcc->>cuda_to_hip: force-include cuda_to_hip.h
    cuda_to_hip->>hip_runtime: include hip/hip_runtime.h
    cuda_to_hip->>cuda_to_hip: define __CUDA_ARCH__=1 on device pass
    cuda_to_hip->>cuda_to_hip: remap cudaMalloc→hipMalloc, cudaMemcpy→hipMemcpy, etc.
    hipcc->>CUDAHeader: compile with __HIP__ defined
    CUDAHeader->>CUDAHeader: select __HIP__-aware accessor branches
    CUDAHeader->>CUDAHeader: compile CUBBYFLOW_CUDA_HOST_DEVICE functions for host+device
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 Hop, hop, across the silicon divide,
From NVIDIA green to AMD's red tide!
cuda_to_hip maps each call with care,
__HIP__ guards sprout everywhere.
Two GPU backends, one project to share—
The rabbit typed USE_HIP=ON with flair! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.85% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main change: adding AMD ROCm/HIP build support to CUDA particle/SPH solvers. It is specific, directly related to the primary objective, and avoids generic terms.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
CMakeLists.txt (1)

70-71: ⚡ Quick win

Avoid hard-coding Thrust’s HIP backend numeric ID.

Line 71 uses THRUST_DEVICE_SYSTEM=5; that value is an internal constant and can drift with Thrust revisions. Prefer the symbolic macro to keep this resilient.

Suggested fix
-    # THRUST_DEVICE_SYSTEM_HIP == 5 in thrust/detail/config/device_system.h.
-    add_compile_definitions(THRUST_DEVICE_SYSTEM=5)
+    add_compile_definitions(THRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_HIP)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@CMakeLists.txt` around lines 70 - 71, In the CMakeLists.txt file, the
add_compile_definitions call on line 71 hard-codes the numeric value 5 for
THRUST_DEVICE_SYSTEM, which is an internal constant that may change across
Thrust versions. Replace the hard-coded numeric value 5 with the symbolic macro
THRUST_DEVICE_SYSTEM_HIP to ensure the configuration remains resilient to Thrust
revisions and is more maintainable.
Tests/CUDATests/CUDAArray2Tests.cu (1)

302-302: ⚡ Quick win

Assert cudaDeviceSynchronize() success instead of discarding it.

Line 302 currently suppresses the return value; this can hide backend/runtime failures in the resize test.

Suggested fix
-        static_cast<void>(cudaDeviceSynchronize());
+        CHECK_EQ(cudaSuccess, cudaDeviceSynchronize());
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Tests/CUDATests/CUDAArray2Tests.cu` at line 302, The call to
cudaDeviceSynchronize() is being cast to void, which suppresses its return value
and can hide backend/runtime failures. Replace the static_cast that discards the
return value with an assertion that checks whether cudaDeviceSynchronize()
returns successfully, ensuring that any CUDA runtime errors during device
synchronization in the resize test are properly caught and reported.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CMakeLists.txt`:
- Around line 64-65: The CUBBYFLOW_CUDA_TO_HIP_HEADER variable reference in the
CMAKE_HIP_FLAGS set command is not quoted, which causes word-splitting issues if
the path contains spaces. Add quotes around the ${CUBBYFLOW_CUDA_TO_HIP_HEADER}
variable substitution to ensure the absolute path is treated as a single
argument to the -include compiler flag, protecting it from being split on
whitespace during argument parsing.

In `@Documents/Install.md`:
- Around line 237-242: The fenced code block containing the mkdir, cd, cmake,
and make commands is missing a language specification after the opening triple
backticks, which triggers the MD040 linting rule. Add "bash" as the language
identifier immediately after the opening triple backticks to specify that this
is a bash shell command block and resolve the linting violation.

---

Nitpick comments:
In `@CMakeLists.txt`:
- Around line 70-71: In the CMakeLists.txt file, the add_compile_definitions
call on line 71 hard-codes the numeric value 5 for THRUST_DEVICE_SYSTEM, which
is an internal constant that may change across Thrust versions. Replace the
hard-coded numeric value 5 with the symbolic macro THRUST_DEVICE_SYSTEM_HIP to
ensure the configuration remains resilient to Thrust revisions and is more
maintainable.

In `@Tests/CUDATests/CUDAArray2Tests.cu`:
- Line 302: The call to cudaDeviceSynchronize() is being cast to void, which
suppresses its return value and can hide backend/runtime failures. Replace the
static_cast that discards the return value with an assertion that checks whether
cudaDeviceSynchronize() returns successfully, ensuring that any CUDA runtime
errors during device synchronization in the resize test are properly caught and
reported.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 587d8c24-6b54-4254-a701-3b4c5a3099b0

📥 Commits

Reviewing files that changed from the base of the PR and between 5b786fe and a15f841.

📒 Files selected for processing (28)
  • Builds/CMake/CompileOptions.cmake
  • CMakeLists.txt
  • Documents/Install.md
  • Examples/CUDASPHSim/CMakeLists.txt
  • Examples/CUDASPHSim/CUDAPCISPHSolver3Example.cu
  • Includes/Core/CUDA/CUDAAlgorithms.hpp
  • Includes/Core/CUDA/CUDAArray-Impl.hpp
  • Includes/Core/CUDA/CUDAArrayBase-Impl.hpp
  • Includes/Core/CUDA/CUDAArrayBase.hpp
  • Includes/Core/CUDA/CUDAArrayView-Impl.hpp
  • Includes/Core/CUDA/CUDAParticleSystemData2.hpp
  • Includes/Core/CUDA/CUDAParticleSystemData3.hpp
  • Includes/Core/CUDA/CUDAPointHashGridSearcher2.hpp
  • Includes/Core/CUDA/CUDAPointHashGridSearcher3.hpp
  • Includes/Core/CUDA/CUDASPHKernels2-Impl.hpp
  • Includes/Core/CUDA/CUDASPHKernels3-Impl.hpp
  • Includes/Core/CUDA/CUDAStdArray-Impl.hpp
  • Includes/Core/CUDA/CUDAStdVector-Impl.hpp
  • Includes/Core/CUDA/CUDAStdVector.hpp
  • Includes/Core/CUDA/CUDAUtils.hpp
  • Includes/Core/CUDA/ThrustUtils.hpp
  • Includes/Core/CUDA/cuda_to_hip.h
  • Includes/Core/Utils/Macros.hpp
  • Sources/Core/CMakeLists.txt
  • Sources/Core/CUDA/CUDASPHSystemData2Func.hpp
  • Sources/Core/CUDA/CUDASPHSystemData3Func.hpp
  • Tests/CUDATests/CMakeLists.txt
  • Tests/CUDATests/CUDAArray2Tests.cu
📜 Review details
🧰 Additional context used
🪛 markdownlint-cli2 (0.22.1)
Documents/Install.md

[warning] 237-237: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (25)
Includes/Core/CUDA/cuda_to_hip.h (1)

14-50: LGTM!

Includes/Core/CUDA/ThrustUtils.hpp (1)

18-22: LGTM!

Includes/Core/CUDA/CUDAParticleSystemData2.hpp (1)

21-25: LGTM!

Includes/Core/CUDA/CUDAParticleSystemData3.hpp (1)

21-25: LGTM!

Includes/Core/CUDA/CUDAPointHashGridSearcher2.hpp (1)

19-23: LGTM!

Includes/Core/CUDA/CUDAPointHashGridSearcher3.hpp (1)

19-23: LGTM!

Includes/Core/Utils/Macros.hpp (1)

17-23: LGTM!

Also applies to: 27-35, 38-52

Includes/Core/CUDA/CUDAUtils.hpp (1)

18-22: LGTM!

Also applies to: 44-48, 353-353

Sources/Core/CUDA/CUDASPHSystemData2Func.hpp (1)

20-24: LGTM!

Sources/Core/CUDA/CUDASPHSystemData3Func.hpp (1)

20-24: LGTM!

Examples/CUDASPHSim/CUDAPCISPHSolver3Example.cu (1)

18-22: LGTM!

Includes/Core/CUDA/CUDAAlgorithms.hpp (1)

22-22: LGTM!

Includes/Core/CUDA/CUDAArray-Impl.hpp (1)

18-18: LGTM!

Also applies to: 143-143, 357-357

Includes/Core/CUDA/CUDASPHKernels2-Impl.hpp (1)

21-148: LGTM!

Includes/Core/CUDA/CUDASPHKernels3-Impl.hpp (1)

21-144: LGTM!

Includes/Core/CUDA/CUDAArrayView-Impl.hpp (1)

19-184: LGTM!

Also applies to: 186-188

Includes/Core/CUDA/CUDAStdArray-Impl.hpp (1)

19-27: LGTM!

Also applies to: 36-37, 45-46, 54-55, 64-65, 74-75, 86-87, 98-99, 120-123, 127-130, 134-135, 149-153, 157-163, 167-168

Includes/Core/CUDA/CUDAStdVector.hpp (1)

68-80: LGTM!

Also applies to: 125-131, 160-168

Includes/Core/CUDA/CUDAStdVector-Impl.hpp (1)

98-123: LGTM!

Also applies to: 238-263

Includes/Core/CUDA/CUDAArrayBase.hpp (1)

59-115: LGTM!

Includes/Core/CUDA/CUDAArrayBase-Impl.hpp (1)

19-38: LGTM!

Also applies to: 41-80, 92-95, 185-268, 271-351, 359-359

Builds/CMake/CompileOptions.cmake (1)

128-133: LGTM!

Also applies to: 143-163

Sources/Core/CMakeLists.txt (1)

21-31: LGTM!

Examples/CUDASPHSim/CMakeLists.txt (1)

8-9: LGTM!

Also applies to: 22-25

Tests/CUDATests/CMakeLists.txt (1)

10-18: LGTM!

Comment thread CMakeLists.txt Outdated
Comment thread Documents/Install.md Outdated
Use the symbolic THRUST_DEVICE_SYSTEM_HIP macro instead of its numeric
value when pinning rocThrust's backend, quote the force-included shim
path so a source tree containing spaces still passes a single -include
argument, assert the cudaDeviceSynchronize() result in the CUDAArray2
resize test instead of discarding it, and tag the HIP build's fenced
code block in the install docs with a language.

These are behavior-preserving: the Thrust macro expands to the same
value and the quoted path resolves identically, so the compiled device
code is unchanged on both the CUDA and HIP paths.

Test Plan:

    cmake .. -DUSE_HIP=ON -DCMAKE_HIP_ARCHITECTURES=gfx90a
    make
    ./bin/CUDATests

Authored with the assistance of Claude (Anthropic's AI assistant).

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Tests/CUDATests/CUDAArray2Tests.cu`:
- Line 302: At line 330, there is a similar arr.Resize(1, 9, 3.f) operation with
a fill value that performs async kernel operations, but it lacks a
synchronization check unlike the one added at line 302. Add a
CHECK_EQ(cudaSuccess, cudaDeviceSynchronize()) call immediately after the
arr.Resize operation at line 330 to ensure the async resize-with-fill kernel
completes before proceeding with value verification, maintaining consistency
with line 302 and preventing potential flaky tests on the HIP backend.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2f7a2d4d-7779-4244-9741-3dfb3fba28ee

📥 Commits

Reviewing files that changed from the base of the PR and between a15f841 and 8b88013.

📒 Files selected for processing (3)
  • CMakeLists.txt
  • Documents/Install.md
  • Tests/CUDATests/CUDAArray2Tests.cu
✅ Files skipped from review due to trivial changes (1)
  • Documents/Install.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • CMakeLists.txt

Comment thread Tests/CUDATests/CUDAArray2Tests.cu
Mirror the cudaDeviceSynchronize() success check onto the second
resize-with-fill case in the CUDAArray2 resize test so both async
fill kernels are confirmed complete before their results are read,
keeping the two cases consistent.

Test Plan:

    cmake .. -DUSE_HIP=ON -DCMAKE_HIP_ARCHITECTURES=gfx90a
    make
    ./bin/CUDATests

Authored with the assistance of Claude (Anthropic's AI assistant).
The default-gfx90a block preempted and duplicated CMake's own host-GPU
detection. Removing it lets enable_language(HIP) honor an explicit
-DCMAKE_HIP_ARCHITECTURES, auto-detect the host GPU, or error on a
no-GPU build host. The block was also dead code: it ran after
enable_language(HIP), by which point CMAKE_HIP_ARCHITECTURES was already
set, so its guard was always false.

This work was authored with the Claude AI assistant.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant