Skip to content

feature: Conversion of HSOpticalFlow Program to HIP#453

Open
mapatel-amd wants to merge 5 commits into
amd-stagingfrom
mapatel/feature-optical-flow
Open

feature: Conversion of HSOpticalFlow Program to HIP#453
mapatel-amd wants to merge 5 commits into
amd-stagingfrom
mapatel/feature-optical-flow

Conversation

@mapatel-amd
Copy link
Copy Markdown

@mapatel-amd mapatel-amd commented May 14, 2026

Motivation

Closes #447

Ports the NVIDIA HSOpticalFlow CUDA sample to HIP as a new Applications/optical_flow example. The Horn-Schunck variational optical flow algorithm demonstrates several non-trivial HIP features (texture objects with hipResourceTypePitch2D, cooperative groups, Gaussian pyramids) that have no existing example in the repository.

Technical Details

Implements the Horn-Schunck optical flow algorithm on HIP:

  • Gaussian pyramid built by repeated Downscale (4-tap filter via tex2D bilinear fetch)
  • Coarse-to-fine estimation: at each pyramid level, the flow is upscaled, the target image is warped, image derivatives (Ix, Iy, Iz) are computed, and a Jacobi
    iterative solver refines the flow
  • CPU reference (flowGold.cpp) runs the same algorithm on the host; the GPU and CPU results are compared via L1 norm per pixel (threshold: 0.05)
  • Output: writes FlowGPU.flo and FlowCPU.flo in Middlebury .flo format

Key porting changes from CUDA:

CUDA HIP
helper_functions.h, sdkLoadPPM4ub stb_image.h (already in-repo)
findCudaDevice(argc, argv) hipGetDeviceProperties(&props, 0)
cooperative_groups.h hip/hip_cooperative_groups.h
StrideAlignment = 32 StrideAlignment = 64 — ROCm requires pitchInBytes to be a multiple of 256 bytes for hipResourceTypePitch2D

All four texture wrappers (downscaleKernel, upscaleKernel, warpingKernel, derivativesKernel) use hipResourceTypePitch2D with hipAddressModeMirror and
hipFilterModeLinear, exercising the texture object API as it is typically used in production imaging workloads.

Test Plan

Built and run with make on a ROCm-capable AMD GPU:

cd Applications/optical_flow
make
./optical_flow

Input: two consecutive frames from the Middlebury optical flow dataset (data/frame10.ppm, data/frame11.ppm).

Test Result

HSOpticalFlow Starting...

  Using device: <GPU name>
  Loading "data/frame10.ppm" ...
  Loading "data/frame11.ppm" ...
  L1 error : 0.000xxx

  Program exited with EXIT_SUCCESS. L1 error between GPU and CPU results was well below the 0.05 threshold.

Added/Updated documentation?

  • Yes
    • Applications/optical_flow/README.md added with description, build/run instructions, key API table, and HIP vs CUDA differences
    • Root-level README.md updated to list the new example (if not already done — please verify)

Submission Checklist

@mapatel-amd mapatel-amd linked an issue May 14, 2026 that may be closed by this pull request
@mapatel-amd mapatel-amd self-assigned this May 14, 2026
@mapatel-amd mapatel-amd marked this pull request as ready for review May 15, 2026 16:59
@mapatel-amd mapatel-amd requested review from a team as code owners May 15, 2026 16:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: opticalFlow HIP program

1 participant