
fix(hip): map Gemma4 BSA runtime calls #258

Open
weicj wants to merge 1 commit into Luce-Org:main from weicj:fix-gemma4-hip-runtime-compat

@weicj (Contributor) commented May 22, 2026

Summary

This PR fixes a HIP build failure in the Gemma4 BSA prefill path.

Since the Gemma4 backend landed, HIP builds reach gemma4_graph.cpp and fail because that path uses CUDA runtime names directly, including cudaMemcpy2D and cudaDeviceSynchronize. Most of the project already routes these CUDA-style names through gpu_runtime_compat.h, but gemma4_graph.cpp did not include that compatibility header, and the header did not yet map the synchronous cudaMemcpy2D call.

Changes

  • Add cudaMemcpy2D -> hipMemcpy2D to dflash/src/common/gpu_runtime_compat.h.
  • Include common/gpu_runtime_compat.h from dflash/src/gemma4/gemma4_graph.cpp so the Gemma4 BSA path uses the same CUDA/HIP runtime compatibility layer as the rest of the C++ backend code.
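
For readers unfamiliar with this kind of compatibility layer, the aliasing idea behind the two changes above can be sketched on the host alone. This is an illustration under stated assumptions, not the contents of gpu_runtime_compat.h: the `hip*` functions and enums below are stand-in stubs so the snippet compiles without ROCm, and `copy_tile` is a hypothetical caller. Only the parameter order of `hipMemcpy2D` (identical to `cudaMemcpy2D`) is taken from the real runtime APIs.

```cpp
// Host-only sketch of CUDA-to-HIP name aliasing; no GPU SDK is needed to build it.
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// --- Stand-ins for the HIP runtime (hypothetical stubs, NOT the real ROCm API) ---
enum hipError_t { hipSuccess = 0, hipErrorInvalidValue = 1 };
enum hipMemcpyKind { hipMemcpyHostToHost = 0 };

// Same parameter order as the real hipMemcpy2D / cudaMemcpy2D:
// (dst, dpitch, src, spitch, width in bytes, height in rows, kind).
inline hipError_t hipMemcpy2D(void* dst, std::size_t dpitch,
                              const void* src, std::size_t spitch,
                              std::size_t width, std::size_t height,
                              hipMemcpyKind /*kind*/) {
    if (width > dpitch || width > spitch) return hipErrorInvalidValue;
    for (std::size_t row = 0; row < height; ++row) {
        std::memcpy(static_cast<char*>(dst) + row * dpitch,
                    static_cast<const char*>(src) + row * spitch,
                    width);
    }
    return hipSuccess;
}

inline hipError_t hipDeviceSynchronize() { return hipSuccess; }

// --- The compat layer: CUDA-style names resolve to their HIP counterparts ---
#define cudaError_t           hipError_t
#define cudaSuccess           hipSuccess
#define cudaMemcpyHostToHost  hipMemcpyHostToHost
#define cudaMemcpy2D          hipMemcpy2D
#define cudaDeviceSynchronize hipDeviceSynchronize

// --- Hypothetical caller written purely against CUDA-style names ---
// Copies a rows x cols byte tile out of a wider, pitched source buffer.
inline std::vector<unsigned char> copy_tile(const std::vector<unsigned char>& src,
                                            std::size_t src_pitch,
                                            std::size_t rows,
                                            std::size_t cols) {
    std::vector<unsigned char> dst(rows * cols, 0);
    cudaError_t err = cudaMemcpy2D(dst.data(), cols, src.data(), src_pitch,
                                   cols, rows, cudaMemcpyHostToHost);
    assert(err == cudaSuccess);
    err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);
    return dst;
}
```

Without the `cudaMemcpy2D` alias, `copy_tile` would fail to compile in this setup, which mirrors the failure mode described in the summary: the caller is correct, but one name is missing from the mapping or the mapping is never included.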

Notes

  • Verified on ROCm 6.3.3 / gfx906 with an initialized submodule checkout: HIP dflash_server builds successfully, and a Gemma4 E4B Q4 smoke run loads on hip:0 and returns a /v1/chat/completions response.
  • This removes the Gemma4 HIP build blocker hit while validating the rebased mixed-backend work.

@cubic-dev-ai (Bot, Contributor) left a comment

No issues found across 2 files


@howard0su (Contributor) commented

Please add a HIP build to the CI workflow to avoid this kind of breakage.
