Skip to content

Fix 1D multi-rank MPI_GATHERV bug in post-process silo output#1138

Open
sbryngelson wants to merge 2 commits intoMFlowCode:masterfrom
sbryngelson:testci
Open

Fix 1D multi-rank MPI_GATHERV bug in post-process silo output#1138
sbryngelson wants to merge 2 commits intoMFlowCode:masterfrom
sbryngelson:testci

Conversation

@sbryngelson
Copy link
Member

@sbryngelson sbryngelson commented Feb 13, 2026

Summary

  • Fix undefined behavior in s_mpi_gather_spatial_extents and s_mpi_gather_data_extents in src/post_process/m_mpi_proxy.fpp for 1D multi-rank runs
  • The shared recvcounts/displs arrays are sized for grid defragmentation (m+1 per rank) but were being reused for scalar extent gathers where each rank sends exactly 1 element — a sendcount/recvcounts mismatch that is undefined behavior per the MPI standard
  • Additionally, the 1D spatial extents path used a displacement multiplier of 4*displs instead of 2*displs, causing buffer overflows on the root process
  • Replace MPI_GATHERV with MPI_GATHER + temp buffer for the 1D case, avoiding the mismatched shared arrays entirely

Bug details

The 1D -> 2 MPI Ranks test (UUID 0FCCE9F1) was failing nondeterministically on the Intel no-debug CI job (Github (ubuntu, mpi, no-debug, true)) with:

h5dump error: unable to open file ".../tests/0FCCE9F1/silo_hdf5/p0/50.silo"

This was a latent bug exposed when the Intel oneAPI packages were unpinned (from intel-oneapi-mpi-2021.7.1 to latest). Newer Intel MPI versions are stricter about the MPI_GATHERV sendcount/recvcounts mismatch, causing crashes that prevent silo files from being written. The bug was masked with GCC/OpenMPI (which tolerates the mismatch) and in debug mode (which changes stack/heap layout).

Test plan

  • 1D -> 2 MPI Ranks test passes on Intel no-debug CI
  • All other tests unaffected (2D/3D paths unchanged, 1D defragmentation functions unchanged)

Summary by CodeRabbit

  • Refactor

    • Updated internal MPI communication handling for 1D simulations to optimize data gathering operations.
  • Chores

    • Minor documentation updates in module declarations.

Copilot AI review requested due to automatic review settings February 13, 2026 00:17
@codeant-ai
Copy link
Contributor

codeant-ai bot commented Feb 13, 2026

CodeAnt AI is reviewing your PR.


Thanks for using CodeAnt! 🎉

We're free for open-source projects. if you're enjoying it, help us grow by sharing.

Share on X ·
Reddit ·
LinkedIn

@qodo-code-review
Copy link
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 1 🔵⚪⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

@qodo-code-review
Copy link
Contributor

PR Code Suggestions ✨

No code suggestions found for the PR.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 13, 2026

📝 Walkthrough

Walkthrough

Adds 1D-specific MPI gather handling to spatial/data extent collection in src/post_process/m_mpi_proxy.fpp, and updates a single comment in src/post_process/m_data_output.fpp. Functional behavior for multi-D remains unchanged; 1D gathers now use intermediate buffers and root-only assembly.

Changes

Cohort / File(s) Summary
Module Comment Update
src/post_process/m_data_output.fpp
Minor comment change in a module use statement; no semantic or behavioral impact.
MPI gather logic for extents
src/post_process/m_mpi_proxy.fpp
Introduces 1D-specific gather flow: uses temporary ext_temp buffers and MPI_GATHER on non-root ranks, then root assembles per-rank extents; preserves existing MPI_GATHERV behavior for multi-dimensional cases. Changes affect routines handling spatial and data extents and add provisional buffers and conditional control flow for proc_rank == 0 assembly. Review focus: correctness of buffer sizing/initialization, root-only assignments, and unchanged multi-D semantics.

Sequence Diagram(s)

sequenceDiagram
    rect rgba(200,200,255,0.5)
    participant Rank_i as Non-root Rank
    end
    rect rgba(180,255,200,0.5)
    participant Root as Root Rank (proc_rank==0)
    end
    rect rgba(255,220,180,0.5)
    participant MPI as MPI Library
    end

    Note over Rank_i,Root: 1D path (n==0 && grid_geometry!=3) flow
    Rank_i->>MPI: MPI_GATHER(ext_scalar -> ext_temp[rank_index])
    MPI->>Root: deliver ext_temp array
    Root->>Root: assemble spatial_extents/data_extents from ext_temp per rank

    Note over Rank_i,Root: Multi-D path (n>0 or grid_geometry==3) flow
    Rank_i->>MPI: MPI_GATHERV(local_extent, recvcounts, displs)
    MPI->>Root: deliver gathered arrays
    Root->>Root: assign spatial_extents/data_extents from gathered arrays
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through buffers, soft and neat,
Collected extents on every seat.
For 1D I bundled, then handed to root,
Multi-D stayed steady, unchanged in its route.
A tiny comment flicked, and the meadow's complete.

🚥 Pre-merge checks | ✅ 3 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Merge Conflict Detection ⚠️ Warning ❌ Merge conflicts detected (11 files):

⚔️ .typos.toml (content)
⚔️ CMakeLists.txt (content)
⚔️ docs/Doxyfile.in (content)
⚔️ docs/documentation/case.md (content)
⚔️ docs/documentation/readme.md (content)
⚔️ docs/documentation/references.md (content)
⚔️ docs/documentation/visualization.md (content)
⚔️ src/post_process/m_data_output.fpp (content)
⚔️ src/post_process/m_mpi_proxy.fpp (content)
⚔️ toolchain/mfc/lint_docs.py (content)
⚔️ toolchain/mfc/params/definitions.py (content)

These conflicts must be resolved before merging into master.
Resolve conflicts locally and push changes to this branch.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and specifically describes the main change: fixing a 1D multi-rank MPI_GATHERV bug in the post-process silo output module.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description check ✅ Passed The PR description is comprehensive and addresses all required template sections with clear technical detail about the bug, fix rationale, and test plan.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
⚔️ Resolve merge conflicts (beta)
  • Auto-commit resolved conflicts to branch testci
  • Post resolved changes as copyable diffs in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codeant-ai codeant-ai bot added the size:XS This PR changes 0-9 lines, ignoring generated files label Feb 13, 2026
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 1 file

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates a comment in the post-processing data output module (m_data_output) to slightly simplify the description of the m_global_parameters import.

Changes:

  • Updated the inline comment on use m_global_parameters from “Global parameters for the code” to “Global parameters”.

@codeant-ai
Copy link
Contributor

codeant-ai bot commented Feb 13, 2026

Nitpicks 🔍

🔒 No security issues identified
⚡ Recommended areas for review

  • Namespace pollution
    The added module import brings in all public entities from m_global_parameters into m_data_output's namespace. This can cause accidental symbol collisions, make the code harder to reason about, and increase build time. Prefer restricting imports to only the symbols required.

@codeant-ai
Copy link
Contributor

codeant-ai bot commented Feb 13, 2026

CodeAnt AI finished reviewing your PR.

@codecov
Copy link

codecov bot commented Feb 13, 2026

Codecov Report

❌ Patch coverage is 9.52381% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 44.02%. Comparing base (0ba6c02) to head (0a79011).
⚠️ Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
src/post_process/m_mpi_proxy.fpp 9.52% 18 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1138      +/-   ##
==========================================
- Coverage   44.03%   44.02%   -0.01%     
==========================================
  Files          70       70              
  Lines       20649    20658       +9     
  Branches     2053     2059       +6     
==========================================
+ Hits         9093     9095       +2     
- Misses      10368    10373       +5     
- Partials     1188     1190       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@sbryngelson sbryngelson changed the title Update m_data_output.fpp Sanity check Feb 13, 2026
@sbryngelson sbryngelson changed the title Sanity check Fix MPI Feb 13, 2026
@sbryngelson sbryngelson changed the title Fix MPI Fix 1D multi-rank MPI_GATHERV bug in post-process silo output Feb 13, 2026
The 1D paths in s_mpi_gather_spatial_extents and
s_mpi_gather_data_extents reused recvcounts/displs arrays sized for
grid defragmentation (m+1 per rank), but each rank only sends 1
scalar value. This sendcount/recvcounts mismatch is undefined behavior
per the MPI standard and caused nondeterministic crashes with Intel
MPI, preventing silo files from being written.

Replace MPI_GATHERV with MPI_GATHER + temp buffer for the 1D case.
Multi-D paths and 1D defragmentation functions are unchanged.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Review effort 1/5 size:XS This PR changes 0-9 lines, ignoring generated files

Development

Successfully merging this pull request may close these issues.

1 participant