Add --deps-only flag to build dependencies separately from source #1139

sbryngelson wants to merge 4 commits into MFlowCode:master from
Conversation
This allows CI to fetch and build dependencies (FFTW, HDF5, etc.) on login nodes with internet access, then build MFC source code on compute nodes that may have no network connectivity.

Key changes:
- New `--deps-only` CLI flag for `./mfc.sh build`
- Already-configured dependencies are skipped entirely during regular builds, guaranteeing no network access in the source build step
- All clusters (Phoenix, Frontier, Frontier AMD) now follow the same pattern: deps on login node, source build + test on compute node

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📝 Walkthrough

Fetches and builds dependencies on login nodes via a new `--deps-only` build mode, then builds and tests MFC source on compute nodes.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant GHA as GitHub Actions
    participant Login as Login Node
    participant Compute as Compute Node
    GHA->>Login: run platform build script<br/>(e.g. phoenix/build.sh cpu none)
    Login->>Login: assemble build_opts (gpu/acc/omp)
    Login->>Login: ./mfc.sh build --deps-only -j N
    Login-->>GHA: deps fetched (success)
    GHA->>Compute: submit SLURM job (test.sh / bench.sh)
    Compute->>Compute: attempt 1..3: dry-run build (./mfc.sh test/bench --dry-run)
    alt dry-run succeeds
        Compute->>Compute: run tests/bench
        Compute-->>GHA: results (success)
    else dry-run fails (not final)
        Compute->>Compute: ./mfc.sh clean -t pre_process simulation post_process syscheck
        Compute->>Compute: sleep 30s
        Compute->>Compute: retry
    else all attempts fail
        Compute-->>GHA: exit 1 (failure)
    end
```
Estimated Code Review Effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 2 failed (2 warnings)
PR Reviewer Guide 🔍

Here are some key observations to aid the review process:
High-level Suggestion
The build-with-retry logic is duplicated across four shell scripts. This should be consolidated into a single reusable script or integrated into the Python build tool to improve maintainability. [High-level, importance: 7]
Solution Walkthrough:

Before:

```bash
# In .github/workflows/frontier/test.sh
max_attempts=3
attempt=1
while [ $attempt -le $max_attempts ]; do
    echo "Build attempt $attempt..."
    if ./mfc.sh test --dry-run ...; then
        echo "Build succeeded."
        break
    fi
    if [ $attempt -lt $max_attempts ]; then
        echo "Build failed... retrying..."
        ./mfc.sh clean
        sleep 30
    fi
    attempt=$((attempt + 1))
done
# ... similar logic repeated in 3 other files ...
```
After:

```bash
#!/bin/bash
# In a new script, e.g., .github/workflows/common/retry_build.sh
COMMAND_TO_RUN="$*"
max_attempts=3
attempt=1
while [ $attempt -le $max_attempts ]; do
    echo "Build attempt $attempt..."
    if eval "$COMMAND_TO_RUN"; then
        echo "Build succeeded."
        exit 0
    fi
    # ... retry logic (clean, sleep) ...
    attempt=$((attempt + 1))
done
exit 1
```

```bash
# In .github/workflows/frontier/test.sh
# Run (not source) the helper so its exit statements don't terminate the caller.
bash .github/workflows/common/retry_build.sh "./mfc.sh test --dry-run ..."
```
```python
if ARG("deps_only", False):
    all_deps = set()
    for target in targets:
        target = get_target(target)
        for dep in target.requires.compute():
            all_deps.add(dep)

    if len(history) == 0:
        cons.print(f"[bold]Fetch Dependencies | {format_list_to_string([d.name for d in all_deps], 'magenta', 'None')}[/bold]")
        cons.print(no_indent=True)

    if not all_deps:
        cons.print("[yellow]No dependencies to build for the requested targets.[/yellow]")
        return

    for dep in all_deps:
        __build_target(dep, case, history)

    return
```
✅ Suggestion: Sort the set of dependencies before building them to ensure a deterministic build order and consistent logging. [general, importance: 5]
```diff
 if ARG("deps_only", False):
     all_deps = set()
     for target in targets:
         target = get_target(target)
         for dep in target.requires.compute():
             all_deps.add(dep)
+    sorted_deps = sorted(list(all_deps), key=lambda t: t.name)
     if len(history) == 0:
-        cons.print(f"[bold]Fetch Dependencies | {format_list_to_string([d.name for d in all_deps], 'magenta', 'None')}[/bold]")
+        cons.print(f"[bold]Fetch Dependencies | {format_list_to_string([d.name for d in sorted_deps], 'magenta', 'None')}[/bold]")
         cons.print(no_indent=True)
-    if not all_deps:
+    if not sorted_deps:
         cons.print("[yellow]No dependencies to build for the requested targets.[/yellow]")
         return
-    for dep in all_deps:
+    for dep in sorted_deps:
         __build_target(dep, case, history)
     return
```
```yaml
- name: Fetch Dependencies
  run: |
    (cd pr && ${{ matrix.build_script }}) &
    (cd master && ${{ matrix.build_script }}) &
    wait %1 && wait %2
```
Suggestion: Add a conditional to the Fetch Dependencies step to only run when matrix.build_script is not empty, preventing potential syntax errors. [possible issue, importance: 6]
```diff
 - name: Fetch Dependencies
+  if: ${{ matrix.build_script != '' }}
   run: |
     (cd pr && ${{ matrix.build_script }}) &
     (cd master && ${{ matrix.build_script }}) &
     wait %1 && wait %2
```
```bash
for dir in benchmarks/*/; do
    if ! ./mfc.sh run -v "$dir/case.py" --case-optimization -j 8 --dry-run $build_opts; then
        build_cmd_ok=false
        break
    fi
done
```
Suggestion: Guard the benchmarks directory glob so the loop is skipped cleanly when no benchmark directories exist instead of iterating over a literal pattern. [custom_rule]
Severity Level: Minor
```diff
+shopt -s nullglob
 for dir in benchmarks/*/; do
     if ! ./mfc.sh run -v "$dir/case.py" --case-optimization -j 8 --dry-run $build_opts; then
         build_cmd_ok=false
         break
     fi
 done
+shopt -u nullglob
```
Why it matters? ⭐
The existing code will iterate over the literal pattern "benchmarks/*/" when there are no matches, causing the run command to be invoked with an invalid path and likely fail the build loop. The proposed use of "shopt -s nullglob" directly fixes that by making the glob expand to nothing so the loop is skipped. This addresses a real robustness bug in the shell script; the file has a "#!/bin/bash" shebang so using the bash builtin "shopt" is appropriate. This suggestion targets general shell-glob guarding (the custom_rule), improving correctness without changing project-specific Fortran rules.
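The glob behavior this rule guards against is easy to observe in a throwaway sandbox; the sketch below uses made-up paths purely for illustration:

```bash
# Create an empty sandbox with no benchmarks/ subdirectories.
tmpdir=$(mktemp -d)
cd "$tmpdir"

# Without nullglob, an unmatched glob is passed through literally,
# so the loop runs once with the bogus path "benchmarks/*/".
without_nullglob=$(for d in benchmarks/*/; do printf '%s' "$d"; done)

# With nullglob, the unmatched glob expands to nothing and the loop body never runs.
shopt -s nullglob
with_nullglob=$(for d in benchmarks/*/; do printf '%s' "$d"; done)
shopt -u nullglob

echo "without nullglob: '$without_nullglob'"
echo "with nullglob:    '$with_nullglob'"
```

This is exactly why the unguarded loop would hand `./mfc.sh run` a literal `benchmarks/*/` path when the directory is empty.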
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** .github/workflows/frontier/bench.sh
**Line:** 25:30
**Comment:**
*Custom Rule: Guard the benchmarks directory glob so the loop is skipped cleanly when no benchmark directories exist instead of iterating over a literal pattern.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Nitpicks 🔍
```bash
attempt=1
while [ $attempt -le $max_attempts ]; do
    echo "Build attempt $attempt of $max_attempts..."
    if ./mfc.sh test -v -a --dry-run --rdma-mpi -j 8 $device_opts; then
```
Suggestion: The dry-run build step always uses $device_opts and never adds --no-gpu for CPU jobs, so on CPU-only runs the build phase may incorrectly configure GPU-enabled tests, causing the build to fail or behave inconsistently with the actual test invocation that explicitly uses --no-gpu. [logic error]
Severity Level: Major ⚠️
- ⚠️ Frontier CPU CI builds use different config than tests.
- ⚠️ Dry-run may not catch CPU-only configuration issues.
- ⚠️ Potential divergence between built artifacts and tested setup.

```diff
-if ./mfc.sh test -v -a --dry-run --rdma-mpi -j 8 $device_opts; then
+build_device_opts="$device_opts"
+if [ "$job_device" != "gpu" ]; then
+    build_device_opts+=" --no-gpu"
+fi
+if ./mfc.sh test -v -a --dry-run --rdma-mpi -j 8 $build_device_opts; then
```
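As a minimal sketch of the conditional flag-append pattern suggested above (variable names mirror the script; the values are hypothetical):

```bash
# Hypothetical values mirroring the CPU path of the CI script.
job_device="cpu"
device_opts=""

# Append --no-gpu only for non-GPU jobs so the dry-run build
# matches the flags used by the real test invocation.
build_device_opts="$device_opts"
if [ "$job_device" != "gpu" ]; then
    build_device_opts="$build_device_opts --no-gpu"
fi
echo "build opts:$build_device_opts"
```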
Steps of Reproduction ✅
1. In the Frontier CI workflow, run the `.github/workflows/frontier/test.sh` script with
`job_device` set to a non-GPU value (e.g. `cpu`); see the `device_opts` initialisation and
conditional at lines 7–14 where only `job_device="gpu"` adds `--gpu`.
2. Observe the build loop at lines 17–25: for a CPU job, `device_opts` is empty, so the
dry-run build command executed is `./mfc.sh test -v -a --dry-run --rdma-mpi -j 8` (no
`--gpu`, no `--no-gpu`).
3. Later in the same script, for CPU jobs the real test invocation in the `else` branch at
lines 40–42 is `./mfc.sh test -v -a --max-attempts 3 -j 32 --no-gpu -- -c frontier`,
explicitly forcing a CPU-only configuration.
4. This shows that for CPU jobs the build phase validates a different configuration (no
GPU flag) than the one actually used for tests (`--no-gpu`), so a dry-run build can
succeed while the subsequent CPU-only test run fails or behaves differently because the
GPU/CPU selection is not identical between the two invocations.

Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** .github/workflows/frontier/test.sh
**Line:** 21:21
**Comment:**
*Logic Error: The dry-run build step always uses `$device_opts` and never adds `--no-gpu` for CPU jobs, so on CPU-only runs the build phase may incorrectly configure GPU-enabled tests, causing the build to fail or behave inconsistently with the actual test invocation that explicitly uses `--no-gpu`.
Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.

```python
all_deps = set()
for target in targets:
    target = get_target(target)
    for dep in target.requires.compute():
```
✅ Suggestion: In deps-only mode the build routine only collects dependencies of the requested targets and never includes the targets themselves when they are dependency targets (like "fftw" or "lapack"), so running the build with --deps-only -t lapack (or any dependency target) will print "No dependencies to build" and skip building that dependency entirely; you should also add dependency targets themselves to the all_deps set when they are requested. [logic error]
Severity Level: Major ⚠️

- ⚠️ `./mfc.sh build --deps-only -t lapack` builds nothing.
- ⚠️ Standalone dependency targets unusable with deps-only mode.
- ⚠️ Limits flexibility for debugging or rebuilding single dependencies.

```diff
+if target.isDependency:
+    all_deps.add(target)
 for dep in target.requires.compute():
```
Steps of Reproduction ✅
1. Run the CLI to build only dependencies for a specific dependency target, e.g.:
`./mfc.sh build --deps-only -t lapack` on a machine with internet access.
This invokes the `build()` function in `toolchain/mfc/build.py:608` with
`ARG("deps_only") == True` and `ARG("targets") == ['lapack']` (wired via
the build command in `toolchain/mfc/cli/commands.py` as described in the PR).
2. Inside `build()` (`toolchain/mfc/build.py:616-618`), the call
`targets = get_targets(list(REQUIRED_TARGETS) + targets)` converts the
string `"lapack"` into the `LAPACK` `MFCTarget` instance (defined earlier
in `toolchain/mfc/build.py` as `MFCTarget('lapack', ..., MFCTarget.Dependencies([], [],
[]), -1)`),
so `targets` effectively contains that dependency target.
3. The deps-only branch starting at `toolchain/mfc/build.py:620` executes:
it initializes `all_deps = set()` and then, for the `LAPACK` target,
iterates `for dep in target.requires.compute():` (lines 622–625). Because
`LAPACK.requires` has empty dependency lists, `target.requires.compute()`
returns an empty list and nothing is added to `all_deps`.
4. The check at `toolchain/mfc/build.py:631-633` sees that `all_deps` is
still empty, prints
`"[yellow]No dependencies to build for the requested targets.[/yellow]"`,
and returns early without ever calling `__build_target()` for `LAPACK`.
As a result, the user-requested dependency target (`lapack` in this case)
is not configured, built, or installed in deps-only mode.

Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** toolchain/mfc/build.py
**Line:** 624:624
**Comment:**
*Logic Error: In deps-only mode the build routine only collects dependencies of the requested targets and never includes the targets themselves when they are dependency targets (like "fftw" or "lapack"), so running the build with `--deps-only -t lapack` (or any dependency target) will print "No dependencies to build" and skip building that dependency entirely; you should also add dependency targets themselves to the `all_deps` set when they are requested.
Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.

CodeAnt AI finished reviewing your PR.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/frontier_amd/build.sh (1)

1-23: ⚠️ Potential issue | 🟡 Minor

Missing `trap '' HUP` — inconsistent with `frontier/build.sh`.

`frontier/build.sh` includes `trap '' HUP` (Line 7) to survive login-node session drops, but this script omits it. Since both scripts run on login nodes for dependency fetching, this script should also ignore SIGHUP.

Proposed fix

```diff
 #!/bin/bash
 # Fetch dependencies on login node (internet access).
 # Source code is built on compute nodes via test.sh / bench.sh.
+# Ignore SIGHUP to survive login node session drops
+trap '' HUP
+
 job_device=$1
 job_interface=$2
```
🤖 Fix all issues with AI agents
In @.github/workflows/frontier_amd/test.sh:
- Line 21: The dry-run invocation of mfc.sh test uses $device_opts which can be
empty for CPU jobs; update the command that calls "mfc.sh test -v -a --dry-run
-j 8 $device_opts" to explicitly include the "--no-gpu" flag when running the
CPU dry-run (i.e., change the invocation to include "--no-gpu" alongside
$device_opts or conditionally append "--no-gpu" when job_device != "gpu") so it
matches the real test invocation that passes "--no-gpu".
In @.github/workflows/phoenix/build.sh:
- Around line 1-20: Add a noop HUP trap to make the login-node build resilient
to SSH disconnects: in the script that sets job_device/job_interface and
build_opts (the top of .github/workflows/phoenix/build.sh, before sourcing
./mfc.sh), insert a trap '' HUP command (as done in frontier/build.sh) so the
process will ignore SIGHUP and continue running the subsequent . ./mfc.sh load
and ./mfc.sh build --deps-only -j $(nproc) $build_opts steps even if the SSH
session disconnects.
🧹 Nitpick comments (2)
.github/workflows/frontier_amd/bench.sh (1)
19-46: Retry loop is duplicated across multiple scripts.

This retry pattern (max_attempts, attempt counter, clean-and-sleep) is copy-pasted nearly verbatim in `frontier/bench.sh`, `frontier_amd/bench.sh`, and `frontier/test.sh`. Consider extracting it into a shared helper (e.g., `.github/workflows/common/retry_build.sh`) to reduce maintenance burden and divergence risk.

toolchain/mfc/cli/commands.py (1)

165-177: Consider adding `--deps-only` to examples and key_options.

The new flag isn't documented in the `examples` or `key_options` lists for the build command. Since this is a user-facing workflow for HPC clusters, adding it would improve discoverability.

Suggested addition

```diff
 examples=[
     Example("./mfc.sh build", "Build all default targets (CPU)"),
     Example("./mfc.sh build -j 8", "Build with 8 parallel jobs"),
     Example("./mfc.sh build --gpu", "Build with GPU (OpenACC) support"),
     Example("./mfc.sh build -i case.py --case-optimization -j 8", "Case optimization (10x faster!)"),
+    Example("./mfc.sh build --deps-only -j 8", "Fetch and build only dependencies (HPC login node)"),
 ],
 key_options=[
     ("-j, --jobs N", "Number of parallel build jobs"),
     ("-t, --targets", "Targets: pre_process, simulation, post_process"),
     ("--gpu [acc|mp]", "Enable GPU support (OpenACC or OpenMP)"),
     ("--case-optimization", "Hard-code case params for 10x speedup"),
     ("--debug", "Build in debug mode"),
+    ("--deps-only", "Only fetch/build dependencies (no MFC source)"),
 ],
```
Pull request overview
This PR introduces a --deps-only flag for separating dependency builds from source builds, enabling a two-phase build pattern on HPC clusters where login nodes have internet access but compute nodes don't. The change optimizes resource usage by building lightweight dependencies (FFTW, HDF5, etc.) on login nodes and reserving compute nodes for source compilation.
Changes:

- Added `--deps-only` CLI flag to build only dependencies (FFTW, HDF5, SILO, LAPACK, HIPFORT) without building MFC source targets
- Implemented early-return logic for already-configured dependencies to prevent re-entering the superbuild and accessing the network
- Restructured CI workflows so all clusters (Phoenix, Frontier, Frontier AMD) follow the same pattern: deps-only build on login node, source build + testing on compute node
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| toolchain/mfc/cli/commands.py | Added --deps-only argument to the build command |
| toolchain/mfc/build.py | Implemented --deps-only mode logic: collects dependencies, builds them, and adds early return for configured dependencies |
| .github/workflows/test.yml | Renamed steps to clarify two-phase build; all clusters now call build.sh for dependencies |
| .github/workflows/phoenix/build.sh | New script for Phoenix dependency-only builds (previously didn't exist) |
| .github/workflows/frontier_amd/test.sh | Added source build retry loop before tests (moved from build.sh) |
| .github/workflows/frontier_amd/build.sh | Simplified to only fetch dependencies using --deps-only |
| .github/workflows/frontier_amd/bench.sh | Added case-optimized source build retry loop before benchmarking |
| .github/workflows/frontier/test.sh | Added source build retry loop before tests (moved from build.sh) |
| .github/workflows/frontier/build.sh | Simplified to only fetch dependencies using --deps-only |
| .github/workflows/frontier/bench.sh | Added case-optimized source build retry loop before benchmarking |
| .github/workflows/bench.yml | Updated to use new two-phase build pattern for all clusters |
| run: | | ||
| (cd pr && ${{ matrix.build_script }}) & | ||
| (cd master && ${{ matrix.build_script }}) & | ||
| wait %1 && wait %2 |
The syntax `wait %1 && wait %2` is unreliable here: `%N` jobspecs require shell job control, which bash disables in non-interactive scripts, so `wait %1` fails with "no such job". Use `wait` without arguments to wait for all background jobs, or capture the PIDs with `$!` and wait on them explicitly. Replace this line with just `wait` to wait for both background jobs spawned on lines 158-159.
```diff
-wait %1 && wait %2
+wait
```
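A portable sketch of the PID-capture alternative (the background commands are placeholders for the real build scripts):

```bash
# Launch two placeholder background jobs, capturing each PID via $!.
(sleep 0.1) &
pid1=$!
(sleep 0.1) &
pid2=$!

# wait <pid> works in non-interactive scripts (unlike %N jobspecs,
# which need job control) and returns that job's exit status.
wait "$pid1"; status1=$?
wait "$pid2"; status2=$?
echo "exit statuses: $status1 $status2"
```

Unlike a bare `wait`, this form also surfaces each job's individual exit status, so a failed build in either subdirectory is detectable.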
toolchain/mfc/build.py
Outdated
```python
if ARG("deps_only", False):
    all_deps = set()
    for target in targets:
        target = get_target(target)
```
Redundant get_target call on line 623. The variable target is already an MFCTarget object from line 616 where targets was converted via get_targets(). Remove the target = get_target(target) line as it's unnecessary.
```diff
-        target = get_target(target)
```
1 issue found across 11 files
Confidence score: 3/5
- Potential build/regression risk: `toolchain/mfc/build.py` skips dependency builds based only on `is_configured()`, which can leave deps unbuilt if configure succeeded but build/install failed.
- Score reflects a concrete, user-impacting build correctness risk at medium severity (6/10) with moderate confidence.
- Pay close attention to `toolchain/mfc/build.py`: ensure dependency build/install state is verified, not just configuration.
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="toolchain/mfc/build.py">
<violation number="1" location="toolchain/mfc/build.py:572">
P2: Skipping dependency builds based solely on `is_configured()` can leave dependencies unbuilt if a previous configure succeeded but build/install failed. Consider checking for an install marker (or install directory existence) before skipping.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
…o-gpu consistency

- Sort dependency list for deterministic build order and log output
- Include dependency targets themselves in --deps-only (fixes --deps-only -t fftw doing nothing)
- Remove redundant get_target() call (targets already resolved)
- Add trap '' HUP to phoenix/build.sh for SSH resilience
- Add --no-gpu to CPU dry-run builds in frontier test scripts for consistency

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
```yaml
- name: Fetch Dependencies
  run: |
    (cd pr && ${{ matrix.build_script }}) &
    (cd master && ${{ matrix.build_script }}) &
    wait %1 && wait %2
```
Suggestion: Add a conditional if: matrix.build_script != '' to the 'Fetch Dependencies' step to prevent shell errors if build_script is empty. [possible issue, importance: 6]
```diff
 - name: Fetch Dependencies
+  if: matrix.build_script != ''
   run: |
     (cd pr && ${{ matrix.build_script }}) &
     (cd master && ${{ matrix.build_script }}) &
     wait %1 && wait %2
```
```bash
for dir in benchmarks/*/; do
    if ! ./mfc.sh run -v "$dir/case.py" --case-optimization -j 8 --dry-run $build_opts; then
        build_cmd_ok=false
        break
    fi
done
```
Suggestion: Use shopt -s nullglob before the for loop over benchmarks/*/ to prevent errors when no benchmark directories exist. [possible issue, importance: 5]
```diff
+shopt -s nullglob
 for dir in benchmarks/*/; do
     if ! ./mfc.sh run -v "$dir/case.py" --case-optimization -j 8 --dry-run $build_opts; then
         build_cmd_ok=false
         break
     fi
 done
+shopt -u nullglob
```
```python
# Dependencies are pinned to fixed versions. If already configured
# (built & installed by a prior --deps-only step), skip entirely
# to avoid re-entering the superbuild (which may access the network).
if target.isDependency and target.is_configured(case):
    return
```
✅ Suggestion: Strengthen the dependency skip logic by checking for the existence of the install directory (os.path.isdir(target.get_install_dirpath(case))) in addition to checking if it is configured. [possible issue, importance: 8]
```diff
-# Dependencies are pinned to fixed versions. If already configured
-# (built & installed by a prior --deps-only step), skip entirely
-# to avoid re-entering the superbuild (which may access the network).
-if target.isDependency and target.is_configured(case):
+# Dependencies are pinned to fixed versions. If already configured
+# AND installed by a prior --deps-only step, skip entirely to avoid
+# re-entering the superbuild (which may access the network).
+if target.isDependency and target.is_configured(case) and os.path.isdir(target.get_install_dirpath(case)):
     return
```
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @.github/workflows/frontier/test.sh:
- Around line 18-37: The retry loop currently runs "./mfc.sh clean", which
deletes the entire build/ directory (including build/install/) and removes the
staging CMakeCache.txt so the deps-only workflow cannot detect pre-fetched deps
on retry; change the cleanup step in the loop to remove only the staging
artifacts (i.e., delete build/staging) instead of calling "./mfc.sh clean" so
build/install and cached dependencies remain available for the subsequent
attempt and the CMakeCache.txt check still succeeds.
🧹 Nitpick comments (1)

.github/workflows/phoenix/build.sh (1)

21-23: Quote `$(nproc)` and `$job_device` to satisfy shellcheck and prevent unexpected word-splitting.

`$build_opts` is intentionally unquoted for multi-word expansion, but `$(nproc)` (a single integer) and `$job_device` (a single token) should be quoted for defensive scripting.

Proposed fix

```diff
-. ./mfc.sh load -c p -m $job_device
+. ./mfc.sh load -c p -m "$job_device"
-./mfc.sh build --deps-only -j $(nproc) $build_opts
+./mfc.sh build --deps-only -j "$(nproc)" $build_opts
```
```bash
# Build source code on compute node (deps already fetched on login node)
max_attempts=3
attempt=1
while [ $attempt -le $max_attempts ]; do
    echo "Build attempt $attempt of $max_attempts..."
    if ./mfc.sh test -v -a --dry-run --rdma-mpi -j 8 $device_opts; then
        echo "Build succeeded on attempt $attempt."
        break
    fi

    if [ $attempt -lt $max_attempts ]; then
        echo "Build failed on attempt $attempt. Cleaning and retrying in 30s..."
        ./mfc.sh clean
        sleep 30
    else
        echo "Build failed after $max_attempts attempts."
        exit 1
    fi
    attempt=$((attempt + 1))
done
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Find the clean command implementation
echo "=== Searching for clean in commands.py ==="
fd -t f commands.py | xargs rg -n "clean" 2>/dev/null | head -30
echo ""
echo "=== Searching for clean function/handler ==="
fd -t f "*.py" toolchain/mfc | xargs rg -n "def clean|'clean'" 2>/dev/null | grep -E "(def clean|command|'clean')" | head -30
echo ""
echo "=== Checking mfc.sh script directly ==="
cat -n mfc.sh | grep -A 20 -B 2 "clean"
```

Repository: MFlowCode/MFC
Length of output: 1758
🏁 Script executed:
```bash
#!/bin/bash
# Check where deps are built/installed when using --deps-only
echo "=== Searching for deps-only or dependency build paths ==="
rg -n "deps-only|dependencies" toolchain/mfc/build.py | head -20
echo ""
echo "=== Check CMakeLists or build config for dependency installation ==="
fd -t f "CMakeLists.txt" | xargs grep -n "INSTALL\|PREFIX\|dependencies" 2>/dev/null | head -30
echo ""
echo "=== Look for deps-only implementation in build logic ==="
rg -n "deps.only|deps_only" toolchain/mfc/ | head -20
echo ""
echo "=== Check for dependency paths in build.py ==="
grep -n "build\|deps\|INSTALL" toolchain/mfc/build.py | head -40
```

Repository: MFlowCode/MFC
Length of output: 4908
🏁 Script executed:
```bash
#!/bin/bash
# Check the test.sh context and understand the retry logic
echo "=== Show full test.sh file ==="
cat -n .github/workflows/frontier/test.sh
echo ""
echo "=== Check if there's logic to handle deps on retries ==="
rg -n "deps-only|--deps-only" .github/workflows/frontier/
```

Repository: MFlowCode/MFC
Length of output: 1789
🏁 Script executed:
```bash
#!/bin/bash
# Check for selective clean options or dependency caching logic
echo "=== Check mfc.sh clean options ==="
./mfc.sh clean --help 2>&1 | head -30
echo ""
echo "=== Check if build.py has logic to skip dependency rebuild ==="
rg -n "skip.*dep|already.*built|cache.*dep" toolchain/mfc/build.py
echo ""
echo "=== Check if there's a way to preserve install dir ==="
rg -n "install|preserve|keep" toolchain/mfc/build.py | grep -i "dir\|path" | head -15
```

Repository: MFlowCode/MFC
Length of output: 1842
🏁 Script executed:
```bash
#!/bin/bash
# Look for the logic that skips already-built targets
echo "=== Find the skip logic for already-built targets ==="
rg -B5 -A5 "already built and installed this target" toolchain/mfc/build.py
echo ""
echo "=== Check the actual condition ==="
rg -B3 -A8 "If we have already built" toolchain/mfc/build.py
```

Repository: MFlowCode/MFC
Length of output: 1025
🏁 Script executed:
```bash
#!/bin/bash
# Check what is_configured() checks
echo "=== Find is_configured method ==="
rg -B2 -A8 "def is_configured" toolchain/mfc/build.py
echo ""
echo "=== Understanding the check ==="
rg -B5 -A10 "def is_configured" toolchain/mfc/build.py
```

Repository: MFlowCode/MFC
Length of output: 1281
🏁 Script executed:
```bash
#!/bin/bash
# Check what get_staging_dirpath returns and understand the clean issue
echo "=== Verify what clean removes vs what would be needed ==="
rg -B2 -A4 "def get_staging_dirpath" toolchain/mfc/build.py
echo ""
echo "=== Confirm the full picture: check if clean has any options ==="
cat -n mfc.sh | sed -n '60,66p'
echo ""
echo "=== Check if there's an alternative command for targeted cleanup ==="
rg -n "rm -rf.*build\|clean" mfc.sh
```

Repository: MFlowCode/MFC
Length of output: 945
The retry loop with ./mfc.sh clean will break the --deps-only workflow.
./mfc.sh clean (line 30) removes the entire build/ directory, including both build/staging/ and build/install/. When the build is retried, the system checks for CMakeCache.txt in the staging directory to determine if dependencies are already built. Since clean removes this file, the system will attempt to rebuild all dependencies from source—but compute nodes on HPC systems (like Frontier) lack internet access and cannot fetch external repositories. This guarantees the retry will fail.
Replace ./mfc.sh clean with rm -rf "$(pwd)/build/staging" to preserve the pre-fetched dependencies in build/install/.
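The effect of the targeted cleanup can be sketched with a throwaway directory tree (the layout below is assumed from the review thread, not taken from the repository):

```bash
# Mock the assumed layout: fetched deps under build/install,
# CMake staging (including CMakeCache.txt) under build/staging.
root=$(mktemp -d)
mkdir -p "$root/build/staging" "$root/build/install/fftw"
touch "$root/build/staging/CMakeCache.txt"

# Remove only the staging tree; ./mfc.sh clean would have deleted build/ entirely.
rm -rf "$root/build/staging"

# Pre-fetched dependencies survive for the retry; stale staging state is gone.
[ -d "$root/build/install/fftw" ] && echo "deps preserved"
[ ! -d "$root/build/staging" ] && echo "staging removed"
```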
🤖 Prompt for AI Agents
In @.github/workflows/frontier/test.sh around lines 18 - 37, The retry loop
currently runs "./mfc.sh clean", which deletes the entire build/ directory
(including build/install/) and removes the staging CMakeCache.txt so the
deps-only workflow cannot detect pre-fetched deps on retry; change the cleanup
step in the loop to remove only the staging artifacts (i.e., delete
build/staging) instead of calling "./mfc.sh clean" so build/install and cached
dependencies remain available for the subsequent attempt and the CMakeCache.txt
check still succeeds.
- Clean only source targets (not deps) in compute-node retry loops, so pre-fetched dependencies survive build failures on offline nodes
- Also check install dir exists (not just CMakeCache.txt) before skipping a dependency, guarding against configure-ok-but-build-failed

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
```yaml
- name: Fetch Dependencies
  run: |
    (cd pr && ${{ matrix.build_script }}) &
    (cd master && ${{ matrix.build_script }}) &
    wait %1 && wait %2
```
Suggestion: Add a conditional to the 'Fetch Dependencies' step to skip execution if matrix.build_script is empty, preventing a potential workflow failure. [possible issue, importance: 6]
Suggested change:

```yaml
- name: Fetch Dependencies
  if: matrix.build_script != ''
  run: |
    (cd pr && ${{ matrix.build_script }}) &
    (cd master && ${{ matrix.build_script }}) &
    wait %1 && wait %2
```
```python
if ARG("deps_only", False):
    all_deps = set()
    for target in targets:
        if target.isDependency:
            all_deps.add(target)
        for dep in target.requires.compute():
            all_deps.add(dep)

    sorted_deps = sorted(all_deps, key=lambda t: t.name)

    if len(history) == 0:
        cons.print(f"[bold]Fetch Dependencies | {format_list_to_string([d.name for d in sorted_deps], 'magenta', 'None')}[/bold]")
        cons.print(no_indent=True)

    if not sorted_deps:
        cons.print("[yellow]No dependencies to build for the requested targets.[/yellow]")
        return

    for dep in sorted_deps:
        __build_target(dep, case, history)

    return
```
Suggestion: Normalize all dependency items to MFCTarget objects using get_target() before sorting or accessing attributes to prevent potential AttributeError crashes. [possible issue, importance: 8]
Suggested change:

```python
if ARG("deps_only", False):
    all_deps = set()
    for target in targets:
        if target.isDependency:
            all_deps.add(target)
        for dep in target.requires.compute():
            all_deps.add(dep)

    normalized_deps = [get_target(d) for d in all_deps]
    sorted_deps = sorted(normalized_deps, key=lambda t: t.name)

    if len(history) == 0:
        cons.print(f"[bold]Fetch Dependencies | {format_list_to_string([d.name for d in sorted_deps], 'magenta', 'None')}[/bold]")
        cons.print(no_indent=True)

    if not sorted_deps:
        cons.print("[yellow]No dependencies to build for the requested targets.[/yellow]")
        return

    for dep in sorted_deps:
        __build_target(dep, case, history)

    return
```
```sh
build_cmd_ok=true
for dir in benchmarks/*/; do
    if ! ./mfc.sh run -v "$dir/case.py" --case-optimization -j 8 --dry-run $build_opts; then
        build_cmd_ok=false
        break
    fi
done
```
Suggestion: Use `shopt -s nullglob` before the benchmark directory loop to correctly handle cases where no matching directories are found, preventing the loop from running on the literal glob pattern. [possible issue, importance: 7]
Suggested change:

```sh
build_cmd_ok=true
shopt -s nullglob
bench_dirs=(benchmarks/*/)
shopt -u nullglob
for dir in "${bench_dirs[@]}"; do
    if ! ./mfc.sh run -v "$dir/case.py" --case-optimization -j 8 --dry-run $build_opts; then
        build_cmd_ok=false
        break
    fi
done
```
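The pitfall this suggestion guards against can be reproduced in isolation; the directory name `no_such_dir_xyz` below is made up for the demo:

```shell
#!/bin/bash
# Without nullglob, an unmatched glob expands to its literal text, so the
# loop body runs once with a bogus path; with nullglob it expands to nothing.
without_ng="none"
for d in no_such_dir_xyz/*/; do
    without_ng="$d"     # runs once with the literal pattern
done

shopt -s nullglob
with_ng="none"
for d in no_such_dir_xyz/*/; do
    with_ng="$d"        # never entered when nothing matches
done
shopt -u nullglob

echo "without nullglob: $without_ng"
echo "with nullglob:    $with_ng"
```

The first loop reports the literal pattern `no_such_dir_xyz/*/`, which in the bench script would be passed straight to `./mfc.sh run` as a bogus case path.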
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@           Coverage Diff           @@
##           master    #1139   +/-   ##
=======================================
  Coverage   44.03%   44.03%
=======================================
  Files          70       70
  Lines       20649    20649
  Branches     2053     2054       +1
=======================================
  Hits         9093     9093
  Misses      10368    10368
  Partials     1188     1188
```

☔ View full report in Codecov by Sentry.
…for CPU

Same fixes previously applied to Frontier scripts but missed for Phoenix:

- Use targeted clean (`-t`) to avoid wiping pre-fetched deps on retry
- Add `--no-gpu` for CPU test jobs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CodeAnt AI is running an incremental review.
```yaml
- name: Fetch Dependencies
  run: |
    (cd pr && ${{ matrix.build_script }}) &
    (cd master && ${{ matrix.build_script }}) &
    wait %1 && wait %2
```
Suggestion: Add an if: matrix.build_script != '' condition to the 'Fetch Dependencies' step in the bench.yml workflow to prevent shell syntax errors when matrix.build_script is empty. [possible issue, importance: 7]
Suggested change:

```yaml
- name: Fetch Dependencies
  if: matrix.build_script != ''
  run: |
    (cd pr && ${{ matrix.build_script }}) &
    (cd master && ${{ matrix.build_script }}) &
    wait %1 && wait %2
```
```sh
for dir in benchmarks/*/; do
    if ! ./mfc.sh run -v "$dir/case.py" --case-optimization -j 8 --dry-run $build_opts; then
        build_cmd_ok=false
        break
    fi
done
```
Suggestion: In the bench.sh scripts, use `shopt -s nullglob` before the `for dir in benchmarks/*/` loop to prevent it from executing with a literal string if no benchmark directories are found. [possible issue, importance: 6]
Suggested change:

```sh
shopt -s nullglob
for dir in benchmarks/*/; do
    if ! ./mfc.sh run -v "$dir/case.py" --case-optimization -j 8 --dry-run $build_opts; then
        build_cmd_ok=false
        break
    fi
done
shopt -u nullglob
```
```sh
if [ $attempt -lt $max_attempts ]; then
    echo "Build failed on attempt $attempt. Cleaning source targets and retrying in 30s..."
    ./mfc.sh clean -t pre_process simulation post_process syscheck
```
Suggestion: In the retry path, `-t` is only guaranteed to apply to the first argument, meaning that `simulation`, `post_process`, and `syscheck` may not actually be cleaned, which can cause the test build to keep failing with unchanged artifacts. [logic error]
Severity Level: Major ⚠️

- ⚠️ Frontier CI build retries may not clean all targets.
- ⚠️ Stale artifacts can cause repeated Frontier test failures.

Suggested change:

```diff
-./mfc.sh clean -t pre_process simulation post_process syscheck
+./mfc.sh clean -t pre_process -t simulation -t post_process -t syscheck
```
Steps of Reproduction ✅
1. Push a branch that triggers the Frontier CI workflow which calls
`.github/workflows/frontier/test.sh` on the compute node (see `.github/workflows` CI
configuration; this script is the test entrypoint).
2. Ensure there is a prior partial or failed build so that at least one of the targets
`simulation`, `post_process`, or `syscheck` has stale artifacts while `pre_process` is
either clean or non-problematic.
3. During the CI run, the script enters the build loop at
`.github/workflows/frontier/test.sh:21-37` and the first attempt (`./mfc.sh test -v -a
--dry-run --rdma-mpi -j 8 $device_opts` at line 23) fails due to the stale state in e.g.
the `simulation` target.
4. Because `$attempt -lt $max_attempts`, the cleanup branch at lines 28-31 executes:
`./mfc.sh clean -t pre_process simulation post_process syscheck`. With a conventional
single-argument `-t` implementation in `./mfc.sh` (root script, not shown in this diff,
typically using `getopts`), only `pre_process` is actually bound to `-t`; `simulation`,
`post_process`, and `syscheck` are treated as positional arguments and not as additional
`-t` values. On the next loop iteration, the same stale targets remain uncleaned, causing
the build at line 23 to fail again even though the retry logic was intended to clean all
four targets. This pattern repeats until `max_attempts` is exhausted and the CI job fails.

Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** .github/workflows/frontier/test.sh
**Line:** 30:30
**Comment:**
*Logic Error: In the retry path, `-t` is only guaranteed to apply to the first argument, meaning that `simulation`, `post_process`, and `syscheck` may not actually be cleaned and can cause the test build to keep failing with unchanged artifacts.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.|
CodeAnt AI Incremental review completed.
User description
Summary
- `--deps-only` CLI flag to `./mfc.sh build` that fetches and builds only dependencies (FFTW, HDF5, SILO, LAPACK, etc.) without building MFC source targets

Motivation
On HPC clusters like Frontier, login nodes have internet access but compute nodes do not. Previously, the entire build (deps + source) ran on the login node for Frontier, tying up shared resources. With this change:
- `./mfc.sh build --deps-only -j 8` on the login node
- `./mfc.sh build -j 8` on the compute node: deps are skipped, only source compiles

Changes
Toolchain (2 files)
- `toolchain/mfc/cli/commands.py`: new `--deps-only` argument on the `build` command
- `toolchain/mfc/build.py`: `--deps-only` mode collects and builds only dependency targets; already-configured deps return early to avoid re-entering the superbuild (which could access the network)

CI workflows (9 files)
- `test.yml` / `bench.yml`: all clusters now run a "Fetch Dependencies" step on the login node
- `phoenix/build.sh`: new file, deps-only for Phoenix
- `frontier/build.sh`, `frontier_amd/build.sh`: simplified to `--deps-only`
- `frontier/test.sh`, `frontier_amd/test.sh`: added source build step (with retry) before testing
- `frontier/bench.sh`, `frontier_amd/bench.sh`: added case-optimized source build step before benchmarking
Test plan

- `./mfc.sh build --deps-only -t pre_process`: prints "no dependencies" (pre_process has none)
- `./mfc.sh build --deps-only -t simulation`: builds FFTW only, stops
- `./mfc.sh build -t simulation` after deps-only: skips FFTW entirely (no network), builds simulation only

🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Improvements
CodeAnt-AI Description
Separate dependency fetching from source builds with --deps-only and CI split
What Changed
Impact
✅ No network access during compute-node source builds
✅ Fewer wasted retries that re-download dependencies
✅ Shorter login-node occupation for source compilation