Skip to content

Moving EL Bubbles with MPI Decomposition#1289

Closed
wilfonba wants to merge 2470 commits into
MFlowCode:MovingBubblesFreshfrom
wilfonba:MovingBubblesFresh
Closed

Moving EL Bubbles with MPI Decomposition#1289
wilfonba wants to merge 2470 commits into
MFlowCode:MovingBubblesFreshfrom
wilfonba:MovingBubblesFresh

Conversation

@wilfonba

@wilfonba wilfonba commented Mar 3, 2026

Copy link
Copy Markdown
Contributor

Description

This PR adds support for moving Euler-Lagrange bubbles and various force models. Tracer particles that simply follow the background flow are implemented, as are Newton's 2nd Law particles that respond to body forces, pressure gradients, and drag. This PR is marked as a draft for now as there are several things that I need to complete before a proper merge can occur, but I wanted these changes to be visible to other contributors working on related features.

Fixes #(issue)

Type of change

  • New feature

Testing

How did you test your changes?

Checklist

  • I added or updated tests for new behavior
  • I updated documentation if user-facing behavior changed

See the developer guide for full coding standards.

GPU changes (expand if you modified src/simulation/)
  • GPU results match CPU results
  • Tested on NVIDIA GPU or AMD GPU

AI code reviews

Reviews are not triggered automatically. To request a review, comment on the PR:

  • @coderabbitai review — incremental review (new changes only)
  • @coderabbitai full review — full review from scratch
  • /review — Qodo review
  • /improve — Qodo code suggestions

Anand Radhakrishnan and others added 30 commits August 14, 2025 22:34
Co-authored-by: Anand <anand@lawn-128-61-19-55.lawn.gatech.edu>
Co-authored-by: Spencer Bryngelson <shb@gatech.edu>
Co-authored-by: Anand <anand@ipsec-10-2-68-64.vpn.gatech.edu>
…de#912)

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Spencer Bryngelson <sbryngelson@gmail.com>
Co-authored-by: Malmahrouqi3 <mohdsaid497566@gmail.com>
Co-authored-by: Malmahrouqi3 <mohdsaid497566@gmail.com>
Co-authored-by: Dimitrios Adam <dimitriosadam@Dimitrioss-MacBook-Air.local>
Co-authored-by: Spencer Bryngelson <shb@gatech.edu>
Co-authored-by: Spencer Bryngelson <sbryngelson@gmail.com>
Co-authored-by: Hyeoksu Lee <hyeoksu@M4Pro.local>
sbryngelson and others added 26 commits February 25, 2026 19:53
Added auto review settings to .coderabbit.yaml and updated path instructions for Fortran source files.
Added review status and prompt settings for AI agents.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Added configuration options to hide review status messages and disable AI agent prompts. Also disabled extra summary sections in auto review.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Added configuration options to reduce walkthrough noise and remove 'Finishing Touches' section content.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Spencer Bryngelson <sbryngelson@gmail.com>
Co-authored-by: Daniel Vickers <danieljvickers@frontier10431.frontier.olcf.ornl.gov>
Co-authored-by: Daniel Vickers <danieljvickers@login09.frontier.olcf.ornl.gov>
Co-authored-by: Daniel Vickers <danieljvickers@login07.frontier.olcf.ornl.gov>
Co-authored-by: Daniel Vickers <danieljvickers@login11.frontier.olcf.ornl.gov>
@github-actions

github-actions Bot commented Mar 3, 2026

Copy link
Copy Markdown

Claude Code Review

Head SHA: 5032526
Files changed: 155 (9,459 additions / 3,185 deletions)

Key files:

  • src/common/m_model.fpp (+475/-487) — GPU-accelerated STL levelset rewrite
  • src/simulation/m_ib_patches.fpp (+540/-304) — Periodic IB markers, bounding-box culling
  • src/simulation/m_ibm.fpp (+268/-278) — GPU ghost-point search and image-point computation
  • src/simulation/m_bubbles_EL.fpp / m_bubbles_EL_kernels.fpp — Moving EL bubble additions
  • src/common/m_derived_types.fpp — GPU-friendly flat triangle arrays, periodicity fields
  • src/simulation/m_compute_levelset.fpp (+40/-86) — STL model levelset now GPU-parallel

Summary

  • Replaces CPU-only interpolation-based STL levelset with an exact closest-point projection algorithm (s_distance_normals_3D/2D) callable as a GPU routine.
  • Adds periodic IB boundary support by encoding periodicity into IB-marker integers and applying per-period centroid offsets throughout the levelset/marker pipeline.
  • Ports ghost-point search, image-point compute, and IB-marker generation to GPU via GPU_PARALLEL_LOOP + atomics.
  • Removes s_populate_ib_buffers (MPI IB halo exchange) from the setup path.
  • Adds tracer/Newton's-2nd-Law EL bubble force models (draft; several checklist items outstanding).

Findings

Critical

1. Stale index after atomic capture in s_find_ghost_points ()

After the atomic-capture block stores the new slot in local_idx, subsequent DB-boundary assignments still use the un-captured count variable:

! atomic capture: count → local_idx
ghost_points_in(local_idx)%loc = [i, j, k]   ! ✓ uses local_idx
...
if ((x_cc(i) - dx(i)) < glb_bounds(1)%beg) then
    ghost_points_in(count)%DB(1) = -1          ! ✗ should be local_idx
else if ((x_cc(i) + dx(i)) > glb_bounds(1)%end) then
    ghost_points_in(count)%DB(1) = 1           ! ✗ should be local_idx

In a GPU parallel region where many threads share count, each thread's private local_idx is the correct slot. Using count here writes to a race-condition-corrupted address. All ghost_points_in(count)%DB references in that block must be changed to ghost_points_in(local_idx)%DB.

2. f_model_random_number GPU_ROUTINE macro commented out (, ~line 507)

function f_model_random_number(seed) result(rval)
    ! $:GPU_ROUTINE(parallelism='[seq]')   ! <-- disabled

This function is called from f_model_is_inside (which is itself called by IB marker routines). If f_model_is_inside is to be device-callable, its callees must also carry GPU_ROUTINE. Without it the function will not be compiled for the device and GPU builds will silently fall back to CPU-only evaluation or fail to link.

3. s_check_boundary adds GPU_PARALLEL_LOOP over local allocatables ()

The newly added parallel loop has copy='[temp_boundary_v,edge_occurrence]' but both arrays are locally allocated on the CPU stack/heap in this routine. They have never been entered onto the device, so the copy clause is a no-op at best and may corrupt device memory at worst. This routine is called only from CPU-side s_instantiate_STL_models; the GPU parallelism here has no benefit and should be removed.


High Severity

4. s_populate_ib_buffers removed without MPI halo replacement ()

The removed subroutine exchanged IB-marker ghost cells across MPI subdomain boundaries:

! deleted:
if (bc_x%beg >= 0) call s_mpi_sendrecv_ib_buffers(ib_markers, 1, -1)
...

s_find_ghost_points iterates i-gp_layers:i+gp_layers and j-gp_layers:j+gp_layers. Without the halo, IB-marker data in the ghost-cell band is stale (all zero), so ghost points adjacent to MPI subdomain faces will be misclassified. This is a correctness regression for any multi-rank run with IBM.

5. Duplicate ghost_points_in(count)%DB vs ghost_points_in(local_idx) writes expose a related correctness hole for ib_markers%sf decode ()

The line ib_markers%sf(i, j, k) = patch_id inside the GPU parallel loop overwrites the encoded value that other threads still need to read. The decode step should happen before any write-back, or the original encoded marker should be preserved in a separate array.

6. f_model_is_inside_flat references global p inside a GPU routine ()

function f_model_is_inside_flat(ntrs, pid, point) result(fraction)
    $:GPU_ROUTINE(parallelism='[seq]')
    ...
    if (p == 0 .and. k == 0) cycle   ! uses module-level global p
    if (p == 0) then
        fraction = real(nInOrOut)/18._wp

p is declared in m_global_parameters and must be explicitly declared device-resident (or passed as an argument) for GPU execution. If p is not on the device in the context where this routine is called, it will read garbage. Pass p (or num_dims) as an explicit integer argument, consistent with the existing ntrs/pid pattern.


Medium Severity

7. Incomplete END_GPU_ATOMIC_CAPTURE macro ()

The file gains END_GPU_ATOMIC_CAPTURE() but no matching GPU_ATOMIC_CAPTURE() opening macro. Usage relies on $:GPU_ATOMIC(atomic='capture') for the opening and $:END_GPU_ATOMIC_CAPTURE() for the close. On OpenACC this pair emits ! atomic capture / ! end atomic; on OpenMP it emits ! atomic capture / ! end atomic. The asymmetry (reusing GPU_ATOMIC as the open) is fragile and undocumented—add an explicit GPU_ATOMIC_CAPTURE() macro for clarity and symmetry.

8. glb_bounds type vs initialization mismatch ()

-   glb_bounds(1)%beg = dflt_int; glb_bounds(1)%end = dflt_int
+   glb_bounds(1)%beg = dflt_real; glb_bounds(1)%end = dflt_real

If glb_bounds is still declared as type(int_bounds_info) (integer fields), assigning dflt_real (a real constant) silently truncates. Please confirm the type of glb_bounds was also changed to a real-valued bounds type, or revert to dflt_int.

9. t_model_array%interpolate changed from logical to integer ()

Any existing caller that compares models(i)%interpolate to a logical value will now silently misinterpret the integer (Fortran allows this only under specific conditions). Ensure all callers have been updated and the intent of the integer encoding is documented.


Minor

  • Typo in comment, src/common/m_model.fpp ~line 615: "inersections""intersections".
  • s_pack_model_for_gpu missing subroutine name in closing statement (end subroutine should be end subroutine s_pack_model_for_gpu).
  • PR checklist is entirely unchecked — testing, docs, and GPU validation items are all open. As a draft this is expected, but the MPI halo and atomic-capture bugs (above) should be resolved before promotion to ready-for-review.

Draft PR — not recommending merge until findings 1–5 are resolved.

@wilfonba wilfonba closed this Mar 3, 2026
@github-actions

github-actions Bot commented Mar 3, 2026

Copy link
Copy Markdown

Claude Code Review

Head SHA: 5032526ca051b6190030c8661488b545297ce6f3
Files changed: 155 (shown: key src/ files)

Key files:

  • src/common/m_model.fpp (+475/-487)
  • src/simulation/m_ib_patches.fpp (+540/-304)
  • src/simulation/m_ibm.fpp (+268/-278)
  • src/simulation/m_compute_levelset.fpp (+40/-86)
  • src/simulation/m_bubbles_EL.fpp / m_bubbles_EL_kernels.fpp
  • src/pre_process/m_start_up.fpp

Summary

  • Adds moving Euler-Lagrange (EL) bubble support with MPI decomposition, tracer particles, Newton's-law particles with drag/pressure/body forces, and 2-way IBM coupling (moving_ibm=2)
  • Refactors STL model levelset computation to use GPU-friendly flattened arrays (gpu_trs_v, gpu_trs_n), replacing the CPU-side interpolation scheme with exact closest-point projection
  • Consolidates duplicated CI scripts across frontier/frontier_amd into symlinks + shared helpers
  • Adds a new ./mfc.sh viz visualization toolchain with binary and Silo-HDF5 readers
  • The PR is marked as a draft and the author acknowledges it is incomplete

Findings

1. Duplicate namelist variable — compile/runtime error src/pre_process/m_start_up.fpp line ~150

-            g0_ic, p0_ic, hyper_cleaning
+            g0_ic, p0_ic, hyper_cleaning, hyper_cleaning

hyper_cleaning appears twice in the Fortran namelist declaration. This is invalid Fortran — most compilers will reject it at compile time; those that accept it produce undefined behavior at runtime.

2. Commented-out GPU_ROUTINE macro — GPU compile failure src/common/m_model.fpp around line 510

function f_model_random_number(seed) result(rval)
    ! $:GPU_ROUTINE(parallelism='[seq]')   ← the leading '!' makes this a Fortran comment

The ! prefix makes this a Fortran comment, not a Fypp directive. f_model_random_number is called inside f_model_is_inside_flat, which is correctly marked $:GPU_ROUTINE. GPU builds will fail because the callee is not compiled for device. Remove the ! to activate the directive.

3. Removed bc_x/bc_y/bc_z device update — potential GPU regression src/simulation/m_start_up.fpp

-        $:GPU_UPDATE(device='[bc_x%beg, bc_x%end]')
-        $:GPU_UPDATE(device='[bc_y%beg, bc_y%end]')
-        $:GPU_UPDATE(device='[bc_z%beg, bc_z%end]')

The new s_shift_cell_symmetric_bc in m_bubbles_EL_kernels.fpp compares against bc_x%beg == BC_REFLECTIVE, etc. inside a GPU_ROUTINE. Removing the device update means the GPU copy of these fields may not be current, silently producing wrong reflective-BC handling on GPU.

4. t_model_array%interpolate type change: logicalinteger src/common/m_derived_types.fpp
Any downstream code that uses this field as a boolean (if (models(i)%interpolate)) will silently evaluate any nonzero integer as .true., which may mask zero-vs-nonzero distinctions or break under strict compilers. All call sites should be audited and updated to integer comparisons.

5. num_patches_max increased 10 → 1000 src/common/m_constants.fpp

-    integer, parameter :: num_patches_max = 10
+    integer, parameter :: num_patches_max = 1000

Any static or stack-allocated arrays dimensioned by this constant become 100× larger. Please verify that no such fixed-size arrays exist in global parameters or derived types — that could impose a large per-rank memory cost even for simple runs.

6. GPU_PARALLEL_LOOP with copy= clause in s_check_boundary src/common/m_model.fpp

$:GPU_PARALLEL_LOOP(private='[i,j]', copy='[temp_boundary_v,edge_occurrence]', collapse=2)

GPU_PARALLEL_LOOP accepts data-region style args but this is semantically a data copy/update clause, not a loop clause. This is called during setup (not a hot-path GPU loop), and the parallelism of an O(N²) edge-checking loop may conflict with the GPU_ATOMIC update inside — verify this compiles and produces correct results for non-trivial models.

7. glb_bounds default init: dflt_intdflt_real src/simulation/m_global_parameters.fpp

-        glb_bounds(1)%beg = dflt_int; glb_bounds(1)%end = dflt_int
+        glb_bounds(1)%beg = dflt_real; glb_bounds(1)%end = dflt_real

This is likely a correctness fix (the bounds are real(wp) coordinates used in level-set arithmetic), but it changes the sentinel value for "not set" from an integer flag to a real sentinel. Confirm that any code that checks for the unset case still works correctly.


Minor / non-blocking

  • Debug print removed (src/simulation/m_start_up.fpp): — good cleanup, was clearly accidental.
  • STL normal normalization added (src/common/m_model.fpp): Both binary and ASCII readers now normalize the stored normal vector. This is a correctness improvement for malformed STL files.
  • Leftover merge conflict marker removed (src/common/m_boundary_common.fpp): was present in the original file; this PR removes it — good.
  • The checklist in the PR description is empty (tests, docs, GPU validation). Per CLAUDE.md, these must be completed before merge.

Draft PR — issues 1–3 above are compile/GPU blockers that should be resolved before this is marked ready for review.

@wilfonba wilfonba deleted the MovingBubblesFresh branch March 4, 2026 15:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.