Skip to content

IB Memory Reduction#1307

Draft
danieljvickers wants to merge 4 commits intoMFlowCode:masterfrom
danieljvickers:ib-memory-reduction
Draft

IB Memory Reduction#1307
danieljvickers wants to merge 4 commits intoMFlowCode:masterfrom
danieljvickers:ib-memory-reduction

Conversation

@danieljvickers
Copy link
Member

Description

Before scaling up, I was looking for any free memeory that we have been loose with and cleaning it up.

This removes the inner point array, as it is just a waste of memory resources. It also means we can reduce the number of ghost points to keep in memory. I also added protection preventing arbitrary memory allocation in non-moving cases.

Type of change

  • Bug fix

Testing

How did you test your changes?

I reran all mibm examples to ensure we did not run out of memory

GPU changes (expand if you modified src/simulation/)
  • GPU results match CPU results
  • Tested on NVIDIA GPU or AMD GPU

AI code reviews

Reviews are not triggered automatically. To request a review, comment on the PR:

  • @coderabbitai review — incremental review (new changes only)
  • @coderabbitai full review — full review from scratch
  • /review — Qodo review
  • /improve — Qodo code suggestions

@github-actions
Copy link

Claude Code Review

Head SHA: 23a49931ad4070320a7013bc4def2a2134ef46d8
Files changed: 1 — src/simulation/m_ibm.fpp
+20 / -62


Summary

  • Removes the inner_points array and num_inner_gps counter, reducing GPU/CPU memory usage for IBM cases.
  • Removes the inner-points momentum-correction loop from s_ibm_correct_state.
  • Changes ghost_points allocation from a generous over-allocated buffer to an exact-count allocation.
  • Adds a moving_immersed_boundary_flag guard around the MPI allreduce + cap, skipping it for static IB cases.
  • Updates s_find_num_ghost_points and s_find_ghost_points signatures to drop the inner-points output/argument.

Findings

1. Correctness — removed inner-points momentum-correction loop (high concern)
src/simulation/m_ibm.fpp, s_ibm_correct_state (around old line 401–418):
The loop that zeroed momentum components (q_cons_vf(q)%sf(j,k,l) = 0._wp for q = momxb, momxe) for all cells classified as inner points (i.e., cells fully inside an immersed boundary) has been removed entirely. The PR description does not explain why this correction is no longer needed. If any fluid solver path writes nonzero momentum into those cells, the immersed boundary no longer resets them, which could affect mass/momentum conservation near solid boundaries.

Please confirm: is this loop genuinely redundant (e.g., because inner points are never touched by the Riemann solver), or does removing it change physics results? The testing note says "reran all mibm examples to ensure we did not run out of memory" — it would be valuable to also confirm that pressure/velocity fields inside IBs are unchanged.


2. Dead code — count_i declared but now unused in s_find_ghost_points
src/simulation/m_ibm.fpp, s_find_ghost_points:

The only code that used count_i was the removed inner-points else branch. count_i is now declared but never referenced. Strict compilers (Intel ifx, Cray ftn with -warn unused) may emit a warning; some configurations treat this as an error. Remove the declaration.


3. Exact allocation without buffer for non-moving case (low-risk but worth noting)
src/simulation/m_ibm.fpp, s_ibm_setup (around new line 129):

For moving_immersed_boundary_flag = .false., num_gps is now the raw local GPU-counted value with no allreduce and no safety factor. The old code allocated (max_num_gps + max_num_inner_gps) * 2, which was admittedly very generous. The tightened allocation is safe if and only if the two GPU parallel loops in s_find_num_ghost_points and s_find_ghost_points see an identical grid state. If ib_markers is updated between the two calls (e.g., by s_find_ghost_points itself writing ib_markers%sf(i,j,k) = patch_id), a second call path through s_update_ib_markers calling s_find_num_ghost_points + s_find_ghost_points would re-count on already-modified markers. Confirm this is safe in the moving-boundary update path.


Minor

  • The PR description checkboxes ("Bug fix", "GPU results match CPU results", "Tested on NVIDIA GPU or AMD GPU") are all unchecked. For a simulation-only GPU module change, please fill in the GPU testing checklist.

@github-actions
Copy link

Claude Code Review

Head SHA: 4150b12

Files changed: 1

  • src/simulation/m_ibm.fpp (+20 / -62)

Summary

  • Removes the inner_points array and num_inner_gps counter, saving memory for IB simulations.
  • Adds a guard: MPI allreduce and 2× buffer allocation only happen when moving_immersed_boundary_flag is true; non-moving IBs allocate exactly num_gps elements.
  • Removes the inner-point momentum-zeroing loop in s_ibm_correct_state.
  • Simplifies s_find_num_ghost_points and s_find_ghost_points to operate only on ghost points.

Findings

1. Medium — ib_markers%sf not decoded for inner IB cells (potential downstream correctness issue)

File: src/simulation/m_ibm.fpp, s_find_ghost_points (removed else branch around line 630)

Previously, for inner IB cells the code decoded the encoded patch ID and wrote back the plain patch_id to ib_markers%sf(i,j,k). This step is now removed. Any code that later reads ib_markers%sf for a deeply-interior IB cell will now see the encoded value (with periodicity bits set) rather than the clean patch ID.

If interior cells are never read after this point (e.g., they are completely masked), this is harmless. But it warrants confirming that no downstream consumer of ib_markers%sf can reach interior cells and misinterpret the encoded value.


2. Medium — Removal of inner-point momentum zeroing (physics correctness)

File: src/simulation/m_ibm.fpp, s_ibm_correct_state (removed ~lines 398–419)

The deleted loop set q_cons_vf(q)%sf(j,k,l) = 0 for all momentum components of every inner IB cell. Removing this means deeply-interior cells will accumulate whatever state the solver writes there. For a stationary IB this is generally benign (they never influence the exterior solution), but for moving IBs where a cell transitions from interior to ghost/fluid, stale momentum in those cells could contaminate the solution momentarily.

Since you tested mibm examples this may be validated, but consider adding a note in a comment or PR body about why removing this is safe for both static and moving IB cases.


3. Low — Non-moving IB allocation uses exact count with no buffer

File: src/simulation/m_ibm.fpp, s_ibm_setup (~line 130)

For non-moving IBs, num_gps is the local rank's exact count with no additional buffer. s_find_ghost_points then fills up to exactly that count, so there is no out-of-bounds risk for the static case. This is intentional and safe, just worth a brief comment explaining the design decision (static = exact, moving = 2× global sum).


4. Low — Verify s_finalize_ibm_module was updated

The diff does not show s_finalize_ibm_module being touched. Confirm that any prior @:DEALLOCATE(inner_points(...)) in that routine was also removed. If it was not present, this means inner_points was a pre-existing memory leak now resolved by elimination.


Improvement Opportunity

The comment at the top of the new moving-IB block could clarify why a 2× factor is used:

This would help future maintainers understand the intent without reading git history.


Overall this is a clean, well-scoped reduction. The main item to double-check is the ib_markers%sf encoding for interior cells and whether any downstream reader can hit those cells.

@github-actions
Copy link

Claude Code Review

Head SHA: 48a588009f9aaadae65ce8f321241e787a290ed4

Files changed: 1

  • src/simulation/m_ibm.fpp (+22 / -62)

Summary

  • Removes inner_points array and num_inner_gps counter, reducing GPU/CPU memory for IB simulations.
  • Adds a moving_immersed_boundary_flag guard: static IBs allocate exactly num_gps locally; moving IBs allreduce-sum and apply a 2× cap.
  • Removes inner-point momentum-zeroing loop from s_ibm_correct_state.
  • Simplifies s_find_num_ghost_points and s_find_ghost_points signatures to drop inner-point arguments.

Findings

1. Medium — ib_markers%sf encoding not cleaned for interior cells

File: src/simulation/m_ibm.fpp, s_find_ghost_points (removed else branch ~line 630)

The removed else block decoded the encoded patch ID and wrote the clean patch_id back to ib_markers%sf(i,j,k) for deeply-interior cells. That writeback is now gone. Any downstream code reading ib_markers%sf for an interior cell will see the encoded value (with periodicity bits) instead of the plain patch ID.

If interior cells are fully masked in all downstream readers this is benign, but please confirm no consumer of ib_markers%sf can reach interior cells and misinterpret the encoded value.


2. Medium — Removal of inner-point momentum zeroing (physics change)

File: src/simulation/m_ibm.fpp, s_ibm_correct_state (removed ~lines 401–418)

The deleted loop set q_cons_vf(q)%sf(j,k,l) = 0 for momentum of every fully-interior IB cell. For a static IB these cells are inert, but for a moving IB a cell that transitions from interior to ghost/fluid could carry stale nonzero momentum until the next correction cycle.

Testing "reran all mibm examples" covers memory correctness; it would also be valuable to confirm pressure/velocity inside IBs are unchanged between the old and new code, particularly for moving-boundary cases.


3. Low — Possible unused count_i variable in s_find_ghost_points

File: src/simulation/m_ibm.fpp, s_find_ghost_points

The diff removes all uses of count_i (it was only used in the removed inner-points else branch), but the declaration line (integer :: count, count_i, local_idx) is not shown being removed. If count_i is still declared, Intel ifx and Cray ftn may emit an unused-variable warning — or a build error under strict flags. Please verify the declaration is removed.


4. Low — Verify s_finalize_ibm_module was updated

The diff does not include any change to the finalization subroutine. Please confirm that @:DEALLOCATE(inner_points(...)) (or equivalent GPU deallocation) is also removed there. If no deallocation existed before, this was a pre-existing memory/GPU-data leak now resolved by eliminating the array.


Minor

  • PR description checkboxes ("Bug fix", "GPU results match CPU results", "Tested on NVIDIA GPU") are all unchecked. For a src/simulation/ change touching GPU memory layout, filling in the GPU testing checklist would strengthen confidence.
  • A short comment explaining why the 2× factor is used for the moving-IB allocation (headroom for boundary motion between setup and first timestep update) would help future readers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants