Fix 1D multi-rank MPI_GATHERV bug in post-process silo output #1138

sbryngelson wants to merge 2 commits into MFlowCode:master
Conversation
PR Reviewer Guide 🔍
Here are some key observations to aid the review process:

PR Code Suggestions ✨
No code suggestions found for the PR.
📝 Walkthrough

Adds 1D-specific MPI gather handling to spatial/data extent collection in `src/post_process/m_mpi_proxy.fpp`.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    rect rgba(200,200,255,0.5)
        participant Rank_i as Non-root Rank
    end
    rect rgba(180,255,200,0.5)
        participant Root as Root Rank (proc_rank==0)
    end
    rect rgba(255,220,180,0.5)
        participant MPI as MPI Library
    end

    Note over Rank_i,Root: 1D path (n==0 && grid_geometry!=3) flow
    Rank_i->>MPI: MPI_GATHER(ext_scalar -> ext_temp[rank_index])
    MPI->>Root: deliver ext_temp array
    Root->>Root: assemble spatial_extents/data_extents from ext_temp per rank

    Note over Rank_i,Root: Multi-D path (n>0 or grid_geometry==3) flow
    Rank_i->>MPI: MPI_GATHERV(local_extent, recvcounts, displs)
    MPI->>Root: deliver gathered arrays
    Root->>Root: assign spatial_extents/data_extents from gathered arrays
```
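For readers unfamiliar with the pattern, below is a minimal, self-contained sketch of the 1D path the diagram describes: each rank contributes one extent value via `MPI_GATHER` into a per-rank temporary buffer, and the root assembles the final extents. The names (`ext_scalar`, `ext_temp`, `spatial_extents`) mirror the diagram's labels and are illustrative only, not the actual identifiers in `m_mpi_proxy.fpp`.

```fortran
! Minimal sketch of the 1D gather path shown in the diagram above.
! Names are illustrative; this is not the project's actual code.
program gather_1d_sketch
    use mpi
    implicit none

    integer :: ierr, proc_rank, num_procs, i
    double precision :: ext_scalar(1)               ! local extent, 1 value per rank
    double precision, allocatable :: ext_temp(:)    ! root-side buffer, 1 slot per rank
    double precision, allocatable :: spatial_extents(:)

    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, proc_rank, ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, num_procs, ierr)

    ext_scalar(1) = dble(proc_rank)                 ! stand-in for a local min/max extent
    allocate (ext_temp(num_procs), spatial_extents(num_procs))

    ! Each rank sends exactly 1 element and the root receives exactly 1 element
    ! per rank, so a plain MPI_GATHER suffices; no recvcounts/displs arrays needed.
    call MPI_GATHER(ext_scalar, 1, MPI_DOUBLE_PRECISION, &
                    ext_temp, 1, MPI_DOUBLE_PRECISION, &
                    0, MPI_COMM_WORLD, ierr)

    if (proc_rank == 0) then
        do i = 1, num_procs
            spatial_extents(i) = ext_temp(i)        ! root assembles per-rank extents
        end do
    end if

    call MPI_FINALIZE(ierr)
end program gather_1d_sketch
```

A quick way to exercise a sketch like this is an MPI Fortran wrapper, e.g. `mpif90 sketch.f90 -o sketch && mpirun -np 2 ./sketch`.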
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed (warning)
Pull request overview
This PR updates a comment in the post-processing data output module (m_data_output) to slightly simplify the description of the m_global_parameters import.
Changes:
- Updated the inline comment on `use m_global_parameters` from "Global parameters for the code" to "Global parameters".
Nitpicks 🔍
CodeAnt AI finished reviewing your PR.
Codecov Report: ❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff            @@
##           master    #1138     +/-  ##
==========================================
- Coverage   44.03%   44.02%   -0.01%
==========================================
  Files          70       70
  Lines       20649    20658       +9
  Branches     2053     2059       +6
==========================================
+ Hits         9093     9095       +2
- Misses      10368    10373       +5
- Partials     1188     1190       +2
```

☔ View full report in Codecov by Sentry.
The 1D paths in s_mpi_gather_spatial_extents and s_mpi_gather_data_extents reused recvcounts/displs arrays sized for grid defragmentation (m+1 per rank), but each rank only sends 1 scalar value. This sendcount/recvcounts mismatch is undefined behavior per the MPI standard and caused nondeterministic crashes with Intel MPI, preventing silo files from being written. Replace MPI_GATHERV with MPI_GATHER + temp buffer for the 1D case. Multi-D paths and 1D defragmentation functions are unchanged. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- Fixes the 1D multi-rank gather paths in `s_mpi_gather_spatial_extents` and `s_mpi_gather_data_extents` in `src/post_process/m_mpi_proxy.fpp`
- The `recvcounts`/`displs` arrays are sized for grid defragmentation (`m+1` per rank) but were being reused for scalar extent gathers where each rank sends exactly 1 element: a sendcount/recvcounts mismatch that is undefined behavior per the MPI standard (illustrated in the sketch after this list)
- The 1D path also used `4*displs` instead of `2*displs`, causing buffer overflows on the root process
- Replaces `MPI_GATHERV` with `MPI_GATHER` plus a temp buffer for the 1D case, avoiding the mismatched shared arrays entirely
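To make the sendcount/recvcounts bullet concrete, here is a hedged sketch of well-formed `MPI_GATHERV` bookkeeping for the same scalar-per-rank case; the comments note how the buggy path deviated from it (each rank sent 1 element while the shared `recvcounts`/`displs` were sized for `m+1`-element defragmentation blocks). All names are illustrative, not MFC's actual variables.

```fortran
! Correct MPI_GATHERV bookkeeping for a 1-element-per-rank gather.
! The bug described above kept recvcounts(i) = m+1 (the grid-defragmentation
! size) and displacements scaled to match, while each rank still sent only
! 1 element; MPI requires the data sent by rank i to match recvcounts(i) at root.
program gatherv_counts_sketch
    use mpi
    implicit none

    integer :: ierr, proc_rank, num_procs, i
    double precision :: ext_scalar(1)
    double precision, allocatable :: gathered(:)
    integer, allocatable :: recvcounts(:), displs(:)

    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, proc_rank, ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, num_procs, ierr)

    allocate (gathered(num_procs), recvcounts(num_procs), displs(num_procs))

    do i = 1, num_procs
        recvcounts(i) = 1        ! must equal what rank i-1 actually sends
        displs(i) = i - 1        ! element offsets, not m+1-sized blocks
    end do

    ext_scalar(1) = dble(proc_rank)

    call MPI_GATHERV(ext_scalar, 1, MPI_DOUBLE_PRECISION, &
                     gathered, recvcounts, displs, MPI_DOUBLE_PRECISION, &
                     0, MPI_COMM_WORLD, ierr)

    call MPI_FINALIZE(ierr)
end program gatherv_counts_sketch
```

The fix in this PR sidesteps this bookkeeping entirely by using plain `MPI_GATHER` with matching unit counts, as described in the commit message above.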
Bug details

The `1D -> 2 MPI Ranks` test (UUID `0FCCE9F1`) was failing nondeterministically on the Intel no-debug CI job (`Github (ubuntu, mpi, no-debug, true)`). This was a latent bug exposed when the Intel oneAPI packages were unpinned (from `intel-oneapi-mpi-2021.7.1` to latest). Newer Intel MPI versions are stricter about the `MPI_GATHERV` sendcount/recvcounts mismatch, causing crashes that prevent silo files from being written. The bug was masked with GCC/OpenMPI (which tolerates the mismatch) and in debug mode (which changes stack/heap layout).
Test plan

- The `1D -> 2 MPI Ranks` test passes on the Intel no-debug CI job

Summary by CodeRabbit
Refactor
Chores