Update base image to 25.12 #8738
Conversation
Signed-off-by: Yun Liu <yunl@nvidia.com>
📝 Walkthrough

This PR updates the NVIDIA PyTorch base Docker image from 24.10 to 25.12 and removes its associated torch patching step. It introduces torch.compiler guards in RetinaNet's NaN/Inf validation checks to skip validation during graph compilation. A new helper method in TransChex extracts hidden states robustly across transformer versions. The TRT compiler now falls back to an alternative CUDA runtime import path. ONNX export logic adapts to PyTorch 2.6+ by using dynamo-based export instead of scripting. Signal filtering specifies an explicit tensor dtype. Subprocess utilities improve error capture and handling. Various test tolerances are relaxed, and onnxruntime's Python version constraint is removed.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ❌ 5 failed checks (3 warnings, 2 inconclusive)
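For context, the RetinaNet guard mentioned above presumably follows the common pattern of skipping data-dependent validation while the graph is being traced; a minimal sketch, assuming the usual `torch.compiler.is_compiling()` idiom (function name hypothetical, not the PR's exact code):

```python
import warnings

import torch


def check_box_regression(box_regression: torch.Tensor) -> None:
    # torch.compiler.is_compiling() is True while the graph is being traced,
    # so this data-dependent NaN/Inf check only runs in eager mode and does
    # not force a graph break during compilation.
    if not torch.compiler.is_compiling():
        if torch.isnan(box_regression).any() or torch.isinf(box_regression).any():
            warnings.warn("box_regression is NaN or Inf.", stacklevel=2)
```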
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@monai/networks/utils.py`:
- Around lines 722-729: The comment and version check around torch.__version__
and selection of mode_to_export are incorrect: dynamo became the default
exporter in PyTorch 2.9, and PyTorch 2.6–2.8 require explicitly enabling dynamo;
also dynamo doesn't fully support ScriptModule. Update the comment to state that
dynamo became default in 2.9 and note that dynamo may not support ScriptModule,
and change the logic around mode_to_export (symbols: torch.__version__,
_pt_major_minor, mode_to_export, model, torch.jit.script) to only bypass
scripting when running on PyTorch >= 2.9 (or when dynamo is explicitly enabled)
and otherwise use torch.jit.script(model, **kwargs); ensure the comment
documents this behavior and the rationale so the exporter picks the correct
input type.
🧹 Nitpick comments (5)
tests/test_utils.py (1)

60-61: Inconsistent naming: `_has_hf_local` vs `has_hf_hub`.

All other availability flags (`has_req`, `has_gdown`, `has_hf_hub`) are unprefixed; `_has_hf_local` breaks the pattern.

Proposed fix:

```diff
-hf_local_entry_error, _has_hf_local = optional_import("huggingface_hub.errors", name="LocalEntryNotFoundError")
+hf_local_entry_error, has_hf_local = optional_import("huggingface_hub.errors", name="LocalEntryNotFoundError")
```

And update the guard on line 77:

```diff
-if _has_hf_local:
+if has_hf_local:
```
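For readers outside the codebase: MONAI's `optional_import` returns a `(object, available)` pair, which is why the second tuple element is used as a skip flag in tests; a minimal sketch of the convention under discussion:

```python
from monai.utils import optional_import

# optional_import returns the requested object plus a boolean availability
# flag; the flag is what tests check to skip when the dependency is missing.
LocalEntryNotFoundError, has_hf_local = optional_import(
    "huggingface_hub.errors", name="LocalEntryNotFoundError"
)

print(has_hf_local)  # False if huggingface_hub is not installed
```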
tests/networks/test_convert_to_onnx.py (1)

36-36: 100x relaxation on `atol`: verify this is necessary.

`atol` went from `1e-4` to `1e-2`. That's a significant jump. If the dynamo-based ONNX export path introduces this much drift, it's worth confirming these tolerances are tight enough to still catch real regressions.
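To make the concern concrete, here is a generic sketch (not the test's actual assertion) of what the relaxed tolerance lets through:

```python
import numpy as np


def outputs_match(torch_out: np.ndarray, ort_out: np.ndarray) -> bool:
    # With atol=1e-2, any absolute difference below ~0.01 passes, so drift
    # near zero that a 1e-4 tolerance would flag now goes unnoticed.
    return np.allclose(torch_out, ort_out, rtol=1e-3, atol=1e-2)


print(outputs_match(np.array([0.000]), np.array([0.009])))  # True under atol=1e-2
```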
monai/utils/misc.py (1)

882-887: Asymmetric `capture_output` default logic is fine, but the docstring is stale.

Line 872 says `capture_output` "will be overriden with the debug status", but when `check=True` it now defaults to `True` regardless of debug. Consider updating the docstring to match.
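A minimal sketch of the asymmetric defaulting being described, assuming a helper shaped roughly like MONAI's `run_cmd` (not the exact code):

```python
import subprocess


def run_cmd(cmd: list, debug: bool = False, check: bool = False, **kwargs):
    if "capture_output" not in kwargs:
        # With check=True, capture output so a CalledProcessError can carry
        # stdout/stderr; otherwise fall back to the debug flag as before.
        kwargs["capture_output"] = True if check else debug
    return subprocess.run(cmd, check=check, **kwargs)
```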
monai/networks/nets/transchex.py (1)

229-238: Missing `Args` and `Returns` in docstring.

As per coding guidelines, "Docstrings should be present for all definition which describe each variable, return value, and raised exception in the appropriate section of the Google-style of docstrings."

Proposed docstring fix:

```diff
 @staticmethod
 def _get_hidden_states(layer_output):
     """Extract hidden states from BertLayer output.

     Compatible with both older transformers (returns a tuple) and
     newer transformers >=5.0 (may return a tensor directly).
+
+    Args:
+        layer_output: output from a BertLayer, either a tensor or a tuple
+            whose first element contains the hidden states.
+
+    Returns:
+        The hidden states tensor.
     """
     if isinstance(layer_output, torch.Tensor):
         return layer_output
     return layer_output[0]
```
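For illustration, the helper above accepts both return shapes a `BertLayer` may produce across transformers versions; a quick self-contained check:

```python
import torch


def _get_hidden_states(layer_output):
    # Tensor: newer transformers may return the hidden states directly.
    if isinstance(layer_output, torch.Tensor):
        return layer_output
    # Tuple: older transformers return (hidden_states, ...).
    return layer_output[0]


hidden = torch.zeros(1, 4, 8)
assert _get_hidden_states(hidden) is hidden          # tensor input
assert _get_hidden_states((hidden, None)) is hidden  # tuple input
```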
monai/apps/detection/networks/retinanet_network.py (1)

201-206: Same `stacklevel` nit here.

Proposed fix:

```diff
-warnings.warn("box_regression is NaN or Inf.")
+warnings.warn("box_regression is NaN or Inf.", stacklevel=2)
```
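For context on why `stacklevel=2` is preferred: it makes the warning point at the caller rather than at the `warnings.warn` line itself, which is usually the line a user actually needs to inspect. A small illustration:

```python
import warnings


def validate(x):
    # stacklevel=2 attributes the warning to validate()'s caller.
    warnings.warn("value is NaN or Inf.", stacklevel=2)


validate(float("nan"))  # the warning is reported at this line, not inside validate()
```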
The flagged snippet in `monai/networks/utils.py` (lines 722-729):

```python
# In PyTorch 2.6+, torch.onnx.export defaults to the dynamo-based exporter
# which uses torch.export.export internally and does not support ScriptModule.
# Pass the raw nn.Module directly; the new exporter captures all code paths.
_pt_major_minor = tuple(int(x) for x in torch.__version__.split("+")[0].split(".")[:2])
if _pt_major_minor >= (2, 6):
    mode_to_export = model
else:
    mode_to_export = torch.jit.script(model, **kwargs)
```
🧩 Analysis chain

🏁 Script executed:

```shell
cd /tmp && git ls-files monai/networks/utils.py
```

🏁 Script executed:

```shell
# Get broader context around lines 722-729
sed -n '700,750p' monai/networks/utils.py
```
🌐 Web query:

```
PyTorch 2.6 torch.onnx.export dynamo default behavior ScriptModule support
```

💡 Result:

- PyTorch 2.6 default behavior: `torch.onnx.export(..., dynamo=...)` exists, but it is not the default in 2.6. The ONNX docs note that `dynamo` only became `True` by default in PyTorch 2.9, implying that in 2.6–2.8 you must opt in with `dynamo=True`. [1]
- What `dynamo=True` does (2.6+): it uses the new exporter path based on `torch.export.ExportedProgram` (rather than the legacy TorchScript-based ONNX exporter). It is described as the recommended approach and returns an `ONNXProgram`. [1]
- `torch.jit.ScriptModule` / TorchScript support with `dynamo=True`:
  - While the API surface lists TorchScript types as acceptable inputs, the dynamo/`torch.export`-based exporter does not fully support TorchScript-specific models and behaviors.
  - The PyTorch ONNX team explicitly called out that `torch.onnx.export(..., dynamo=True)` may need to "auto-fallback" to the TorchScript-based exporter when the input is a `torch.jit.ScriptModule`/`torch.jit.ScriptFunction`, because those are TorchScript-oriented and not reliably handled by the dynamo path. [2]
  - Practical takeaway for PyTorch 2.6: if your model is a `ScriptModule`, expect best support with `dynamo=False` (legacy/TorchScript exporter), and treat `dynamo=True` as potentially unsupported unless you confirm it works for your specific model and operators. [2]

Sources: PyTorch ONNX API docs (notes on the dynamo default change in 2.9 and exporter behavior) [1]; PyTorch issue discussing the lack of TorchScript (ScriptModule) support in the `dynamo=True` path and a proposed fallback [2].
The code comment about PyTorch 2.6 `torch.onnx.export` defaults is incorrect.

PyTorch 2.6–2.8 do not default to the dynamo-based exporter; dynamo became the default only in PyTorch 2.9. In 2.6–2.8, you must explicitly set `dynamo=True`. Additionally, dynamo does not fully support `ScriptModule`. Correct the comment to accurately reflect this versioning, and reconsider the version logic: simply passing the raw model for >= 2.6 may not achieve the intended behavior.
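To make the suggested direction concrete, a sketch of the corrected gate (hypothetical helper wrapping the logic, not a verified patch against the file):

```python
import torch


def _select_export_input(model: torch.nn.Module, **script_kwargs):
    # Per the review: dynamo became the default torch.onnx.export path only
    # in PyTorch 2.9, and the dynamo exporter does not reliably accept
    # ScriptModule inputs, so scripting is kept for older versions.
    major_minor = tuple(int(x) for x in torch.__version__.split("+")[0].split(".")[:2])
    if major_minor >= (2, 9):
        return model  # raw nn.Module for the dynamo-based exporter
    return torch.jit.script(model, **script_kwargs)  # legacy TorchScript path
```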
Fixes #8585.
Description
Update base image to nvcr.io/nvidia/pytorch:25.12-py3
Types of changes
- Full tests: `./runtests.sh -f -u --net --coverage`
- Quick tests: `./runtests.sh --quick --unittests --disttests`
- Docs build: `make html` command in the `docs/` folder