
refactor(optimizer): finalize optimizer schema and backend handling #5157

Open

OutisLi wants to merge 8 commits into deepmodeling:master from OutisLi:pr/opt

Conversation

OutisLi (Collaborator) commented Jan 16, 2026

Summary by CodeRabbit

  • New Features

    • Added a top-level, configurable optimizer section; PyTorch: Adam/AdamW/LKF/AdaMuon/HybridMuon, TensorFlow/Paddle: Adam.
    • Exposed Adam-style hyperparameters (beta1, beta2, weight_decay) for the relevant optimizers.
  • Documentation

    • Training docs updated with optimizer examples and framework-specific guidance.
  • Backward Compatibility

    • Legacy optimizer fields are auto-migrated to the new format; tests and examples updated to use the new config.
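
To make the migration concrete, here is a hypothetical before/after of a user config; the key names opt_type, type, and kf_blocksize come from this PR's discussion, while everything else is illustrative:

```python
# Hypothetical config migration sketch; keys beyond those named in this
# PR (opt_type, type, kf_blocksize) are illustrative assumptions.
legacy = {
    "training": {
        "opt_type": "LKF",     # legacy: optimizer type nested under "training"
        "kf_blocksize": 5120,  # legacy: optimizer hyperparameter under "training"
        "numb_steps": 100000,
    },
}

migrated = {
    "optimizer": {             # new: dedicated top-level section
        "type": "LKF",
        "kf_blocksize": 5120,
    },
    "training": {
        "numb_steps": 100000,  # training keeps only training-scoped settings
    },
}
```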

Copilot AI review requested due to automatic review settings January 16, 2026 15:48
@dosubot dosubot bot added the enhancement label Jan 16, 2026
Copilot AI (Contributor) left a comment

Pull request overview

This pull request refactors the optimizer configuration schema to promote it from being nested within the training section to a dedicated top-level optimizer section in the input JSON configuration. This change improves consistency and clarity across the DeePMD-kit codebase.

Changes:

  • Introduced a new top-level optimizer section in the configuration schema with support for multiple optimizer types (Adam, AdamW, LKF, AdaMuon, HybridMuon)
  • Updated all backend implementations (TensorFlow, PyTorch, Paddle) to extract optimizer parameters from the new location
  • Updated documentation to reference the new optimizer section across multiple training guides

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Summary per file:

  • doc/train/training-advanced.md: added comprehensive documentation for the new optimizer section, with examples and backend-specific support information
  • doc/model/train-fitting-tensor.md: updated to reference the new optimizer section in the configuration structure
  • doc/model/train-fitting-property.md: updated to reference the new optimizer section in the configuration structure
  • doc/model/train-fitting-dos.md: updated to reference the new optimizer section in the configuration structure
  • deepmd/utils/argcheck.py: defined the optimizer schema with the ArgsPlugin pattern, moved optimizer definitions from training_args into a standalone optimizer_args, and added support for all optimizer types with proper defaults
  • deepmd/tf/train/trainer.py: extracts optimizer parameters from the new optimizer section, adds early validation for TensorFlow-unsupported features (weight_decay, non-Adam optimizers), and implements the beta1/beta2 parameters
  • deepmd/pt/train/training.py: extracts optimizer parameters from the new optimizer section, refactors get_opt_param to include optimizer-specific default values, and expands optimizer initialization with explicit beta parameters
  • deepmd/pd/train/training.py: extracts optimizer parameters from the new optimizer section and adds beta1/beta2/weight_decay parameters to the Adam optimizer initialization


coderabbitai bot (Contributor) commented Jan 16, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Consolidates optimizer configuration across TensorFlow, PyTorch, and Paddle by adding a top-level optimizer section, plumbing type-specific optimizer parameters into trainers, adding argcheck entries for optimizer variants, and providing backward-compatibility migration from legacy training-scoped optimizer fields.

Changes

  • Trainer Implementations (deepmd/pd/train/training.py, deepmd/pt/train/training.py, deepmd/tf/train/trainer.py): trainers now read config["optimizer"] (default {}), validate the type against a per-backend whitelist, extract type-specific parameters (adam_beta1/2, weight_decay, momentum, lr_adjust, etc.), and construct framework-specific optimizers; multi-task per-model handling is retained for PyTorch.
  • Argument Validation System (deepmd/utils/argcheck.py): added an optimizer argument subsystem and registrars for Adam, AdamW, LKF, AdaMuon, and HybridMuon; the new optimizer_args() and optimizer_variant_type_args() are integrated into gen_args for single- and multi-task flows.
  • Compatibility & Migration (deepmd/utils/compat.py, deepmd/tf/utils/compat.py): added convert_optimizer_to_new_format() to migrate legacy opt_type/training-scoped keys into the top-level optimizer section (emits a DeprecationWarning); exported via the TF compat module. A sketch of this helper follows the list.
  • Tests & Examples (source/tests/pd/test_training.py, source/tests/pt/test_training.py, source/tests/pt/model/water/lkf.json, examples/.../input_torch*.json): tests updated to call the migration in setup; sample configs and tests moved/added top-level optimizer blocks (e.g., LKF and AdamW examples).
  • Docs (doc/train/training-advanced.md, doc/model/train-fitting-*.md): documentation updated with a new Optimizer subsection, example JSON, supported optimizers per framework, and parameter descriptions (adam_beta1/2, weight_decay).
  • Minor (deepmd/tf/entrypoints/train.py): whitespace housekeeping.
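
A minimal sketch of the migration helper named above, assuming only the behavior described in this walkthrough; the actual key list and precedence rules in deepmd/utils/compat.py may differ:

```python
import warnings

def convert_optimizer_to_new_format(jdata: dict) -> dict:
    # Sketch of the migration described in the walkthrough above.
    training = jdata.get("training", {})
    # Legacy keys taken from this PR's discussion; not exhaustive.
    legacy_keys = ("opt_type", "kf_blocksize", "adam_beta1", "adam_beta2", "weight_decay")
    if "optimizer" in jdata or not any(k in training for k in legacy_keys):
        return jdata  # already in the new format, or nothing to migrate
    warnings.warn(
        "optimizer options inside 'training' are deprecated; "
        "use the top-level 'optimizer' section instead.",
        DeprecationWarning,
    )
    optimizer = {"type": training.pop("opt_type", "Adam")}
    for key in legacy_keys[1:]:
        if key in training:
            optimizer[key] = training.pop(key)
    jdata["optimizer"] = optimizer
    return jdata
```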

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant ConfigLoader
    participant CompatLayer as Compat Layer
    participant Argcheck
    participant Trainer

    User->>ConfigLoader: Load config.json
    ConfigLoader->>CompatLayer: is_deepmd_v1_input()?
    alt Legacy Format (opt_type in training)
        CompatLayer->>CompatLayer: convert_optimizer_to_new_format()
        CompatLayer->>ConfigLoader: Return updated config
    end
    ConfigLoader->>Argcheck: normalize() config
    Argcheck->>Argcheck: Validate optimizer section via optimizer_args()
    Argcheck-->>Trainer: Return normalized config
    Trainer->>Trainer: Extract optimizer params from config["optimizer"]
    Trainer->>Trainer: Initialize optimizer with type-specific params
    Trainer-->>User: Ready for training
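
In code, the first diagram corresponds to migrating before validating. A rough sketch follows; update_deepmd_input and normalize are named elsewhere in this review, but their exact signatures here are assumptions:

```python
import json

from deepmd.utils.argcheck import normalize           # strict schema validation
from deepmd.utils.compat import update_deepmd_input   # legacy-format migration

def load_training_config(path: str) -> dict:
    # Sketch of the load -> migrate -> validate pipeline shown above;
    # call signatures are assumed, not quoted from deepmd-kit.
    with open(path) as f:
        jdata = json.load(f)
    jdata = update_deepmd_input(jdata)  # includes convert_optimizer_to_new_format()
    return normalize(jdata)             # validates the top-level "optimizer" section
```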
sequenceDiagram
    participant PyTorchTrainer as PyTorch Trainer
    participant ConfigParser
    participant ModelLoop

    ConfigParser->>PyTorchTrainer: Load config with optimizer
    alt Per-model optim_dict present
        PyTorchTrainer->>ModelLoop: Use per-model optimizer dicts
    else Global optimizer_params
        PyTorchTrainer->>ModelLoop: get_opt_param(optimizer_params)
    end
    ModelLoop->>ModelLoop: Initialize optimizer with type-specific params (beta1, beta2, weight_decay, ...)
    ModelLoop-->>PyTorchTrainer: Model with optimizer ready
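
The global branch above goes through get_opt_param (deepmd/pt/train/training.py, lines 160-197 per the code-graph section below). A sketch of the per-optimizer defaulting it centralizes, with illustrative values:

```python
def get_opt_param(params: dict) -> tuple[str, dict]:
    # Sketch of the per-optimizer defaulting centralized in
    # deepmd/pt/train/training.py; the default values below are
    # illustrative assumptions (AdaMuon/HybridMuon omitted for brevity).
    opt_type = params.get("type", "Adam")
    defaults = {
        "Adam": {"adam_beta1": 0.9, "adam_beta2": 0.999, "weight_decay": 0.0},
        "AdamW": {"adam_beta1": 0.9, "adam_beta2": 0.999, "weight_decay": 0.01},
        "LKF": {"kf_blocksize": 5120},
    }
    if opt_type not in defaults:
        raise ValueError(f"Unsupported optimizer type: {opt_type}")
    opt_param = dict(defaults[opt_type])
    opt_param.update(
        {k: v for k, v in params.items() if k != "type" and v is not None}
    )
    return opt_type, opt_param
```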

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

Examples

Suggested reviewers

  • njzjz
  • wanghan-iapcm
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 15.22%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the pull request title clearly and specifically summarizes the main change: refactoring the optimizer schema and backend handling across multiple frameworks (TensorFlow, PyTorch, Paddle).


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@doc/model/train-fitting-dos.md`:
- Line 19: The doc uses non-descriptive link text "here"; change it to a
descriptive phrase and update the inline reference so readers know what they’re
clicking. Replace "here" with something like "training parameters reference" or
"training keyword definitions" while keeping the same target file
(train-se-e2-a.md), and ensure the sentence still mentions the relevant keywords
{ref}`model[standard]/fitting_net <model[standard]/fitting_net>` and {ref}`loss
<loss>` so readers can find the training parameter explanations for fitting the
dos.

In `@doc/model/train-fitting-property.md`:
- Line 17: Replace the ambiguous link text "here" with a descriptive phrase so
the sentence reads clearly and improves accessibility; specifically update the
sentence referencing train-se-atten.md to use descriptive link text such as "SE
attention training documentation" (or similar) instead of "here", keeping the
surrounding refs ({ref}`model`, {ref}`learning_rate`, {ref}`optimizer`,
{ref}`loss`, {ref}`training`) and the callouts to
{ref}`model[standard]/fitting_net` and {ref}`loss` intact.

In `@doc/model/train-fitting-tensor.md`:
- Line 33: Replace the non-descriptive link text "here" in the sentence
describing training JSON (the line referencing `input.json`, `ener` mode and the
sections {ref}`model <model>`, {ref}`learning_rate <learning_rate>`,
{ref}`optimizer <optimizer>`, {ref}`loss <loss>` and {ref}`training <training>`)
with a descriptive link label that reflects the target content (e.g., "training
keywords and meanings" or "training keyword reference") and point it to
train-se-e2-a.md so the link text clearly conveys what the reader will find.
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8b0c3b0 and c1d615b.

📒 Files selected for processing (8)
  • deepmd/pd/train/training.py
  • deepmd/pt/train/training.py
  • deepmd/tf/train/trainer.py
  • deepmd/utils/argcheck.py
  • doc/model/train-fitting-dos.md
  • doc/model/train-fitting-property.md
  • doc/model/train-fitting-tensor.md
  • doc/train/training-advanced.md
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-12-12T13:40:14.334Z
Learnt from: CR
Repo: deepmodeling/deepmd-kit PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-12T13:40:14.334Z
Learning: Test PyTorch training with `cd examples/water/se_e2_a && dp --pt train input_torch.json --skip-neighbor-stat` and allow timeout of 60+ seconds for validation

Applied to files:

  • doc/model/train-fitting-tensor.md
📚 Learning: 2025-12-12T13:40:14.334Z
Learnt from: CR
Repo: deepmodeling/deepmd-kit PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-12T13:40:14.334Z
Learning: Test TensorFlow training with `cd examples/water/se_e2_a && dp train input.json --skip-neighbor-stat` and allow timeout of 60+ seconds for validation

Applied to files:

  • doc/model/train-fitting-tensor.md
📚 Learning: 2026-01-10T04:28:18.703Z
Learnt from: OutisLi
Repo: deepmodeling/deepmd-kit PR: 5130
File: deepmd/pt/optimizer/adamuon.py:40-109
Timestamp: 2026-01-10T04:28:18.703Z
Learning: The coefficients (3.4445, -4.7750, 2.0315) in the `zeropower_via_newtonschulz5` function in `deepmd/pt/optimizer/adamuon.py` are standard across AdaMuon implementations and should not be modified for consistency with the reference implementations.

Applied to files:

  • deepmd/pt/train/training.py
🧬 Code graph analysis (2)
deepmd/utils/argcheck.py (1)
deepmd/utils/plugin.py (2)
  • register (41-59)
  • register (122-141)
deepmd/pd/train/training.py (1)
deepmd/pt/train/training.py (1)
  • get_opt_param (160-197)
🪛 markdownlint-cli2 (0.18.1)
doc/model/train-fitting-property.md

17-17: Link text should be descriptive

(MD059, descriptive-link-text)

🪛 Ruff (0.14.11)
deepmd/pt/train/training.py

178-178: Avoid specifying long messages outside the exception class

(TRY003)

deepmd/tf/train/trainer.py

130-132: Avoid specifying long messages outside the exception class

(TRY003)


134-137: Avoid specifying long messages outside the exception class

(TRY003)


325-327: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (41)
  • GitHub Check: Agent
  • GitHub Check: Build C++ (cpu, cpu)
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Test C++ (true, false, false, true)
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Test C++ (true, true, true, false)
  • GitHub Check: Test Python (8, 3.13)
  • GitHub Check: Test C++ (false, false, false, true)
  • GitHub Check: Test C++ (false, true, true, false)
  • GitHub Check: Test Python (11, 3.10)
  • GitHub Check: Test Python (11, 3.13)
  • GitHub Check: Test Python (10, 3.10)
  • GitHub Check: Test Python (9, 3.13)
  • GitHub Check: Test Python (10, 3.13)
  • GitHub Check: Test Python (8, 3.10)
  • GitHub Check: Test Python (3, 3.13)
  • GitHub Check: Test Python (9, 3.10)
  • GitHub Check: Test Python (4, 3.10)
  • GitHub Check: Test Python (7, 3.13)
  • GitHub Check: Test Python (6, 3.13)
  • GitHub Check: Test Python (12, 3.10)
  • GitHub Check: Test Python (12, 3.13)
  • GitHub Check: Test Python (1, 3.13)
  • GitHub Check: Test Python (3, 3.10)
  • GitHub Check: Test Python (6, 3.10)
  • GitHub Check: Test Python (5, 3.13)
  • GitHub Check: Test Python (7, 3.10)
  • GitHub Check: Test Python (4, 3.13)
  • GitHub Check: Test Python (2, 3.13)
  • GitHub Check: Test Python (2, 3.10)
  • GitHub Check: Test Python (5, 3.10)
  • GitHub Check: Test Python (1, 3.10)
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
🔇 Additional comments (12)
deepmd/pt/train/training.py (4)

126-129: Verify legacy optimizer keys are handled.

The trainer now reads only the top-level optimizer section when optim_dict is absent. If existing configs still set training.opt_type / weight_decay / adam_beta2, those values will be ignored. Please confirm this is an intentional breaking change or add a fallback/deprecation error to make it explicit.

Also applies to: 319-320


160-197: LGTM: per-optimizer defaults centralized.

Nice consolidation of default hyperparameters and validation in get_opt_param.


775-793: Nice: AdaMuon now restores optimizer state + scheduler.

Keeps resume behavior consistent with Adam/AdamW.


733-764: No changes needed — fused parameter is guaranteed in all supported PyTorch versions.

The project requires PyTorch ≥2.7 (GPU) or ≥2.8 (CPU) per pyproject.toml. Both torch.optim.Adam (fused since 1.13) and torch.optim.AdamW (fused since 2.0) fully support the fused parameter across all pinned versions. The current implementation is correct and doesn't require guards.

Likely an incorrect or invalid review comment.
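
For reference, constructing a fused Adam/AdamW along the lines discussed here; the fused flag is a real torch.optim parameter, while the guard and parameter values below are illustrative:

```python
import torch

def build_adam(params, lr, opt_type="Adam", beta1=0.9, beta2=0.999,
               weight_decay=0.0, device_type="cuda"):
    # Sketch: enable fused kernels only off-CPU, matching the
    # DEVICE.type != "cpu" pattern quoted later in this review.
    cls = torch.optim.AdamW if opt_type == "AdamW" else torch.optim.Adam
    return cls(
        params,
        lr=lr,
        betas=(beta1, beta2),
        weight_decay=weight_decay,
        fused=(device_type != "cpu"),
    )
```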

deepmd/utils/argcheck.py (3)

2568-2852: LGTM: optimizer schema centralized under optimizer.

Clear variant-based definitions and docstrings.


3683-3684: No review comment needed.


3754-3768: LGTM: optimizer args wired into gen_args for single/multi-task.

deepmd/pd/train/training.py (2)

116-163: LGTM: optimizer params now sourced from the top-level config.

Parsing the optimizer block once keeps the trainer aligned with the new schema.

Also applies to: 250-266


595-606: LGTM: optimizer hyperparams are now plumbed through.

deepmd/tf/train/trainer.py (2)

123-137: Optimizer parsing/validation looks solid.


323-348: Optimizer construction is consistent across branches.

doc/train/training-advanced.md (1)

48-63: Optimizer section is clear and well-scoped.


codecov bot commented Jan 16, 2026

Codecov Report

❌ Patch coverage is 77.55102% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.14%. Comparing base (43b5809) to head (350cac1).
⚠️ Report is 13 commits behind head on master.

Files with missing lines       Patch %   Missing lines
deepmd/utils/compat.py         64.00%    9 ⚠️
deepmd/pt/train/training.py    60.00%    8 ⚠️
deepmd/tf/train/trainer.py     69.23%    4 ⚠️
deepmd/pd/train/training.py    85.71%    1 ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5157      +/-   ##
==========================================
+ Coverage   81.98%   82.14%   +0.15%     
==========================================
  Files         750      753       +3     
  Lines       75189    75883     +694     
  Branches     3615     3648      +33     
==========================================
+ Hits        61646    62336     +690     
  Misses      12380    12380              
- Partials     1163     1167       +4     

☔ View full report in Codecov by Sentry.

@OutisLi OutisLi added the breaking change label (Breaking changes that should notify users) Jan 17, 2026
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@deepmd/pt/train/training.py`:
- Around line 160-189: get_opt_param currently silently defaults to "Adam" when
callers use the legacy key "opt_type" or omit "type"; change get_opt_param to
check for a legacy "opt_type" key first and if present copy it into the
canonical "type" with a deprecation warning (use warnings.warn(...,
DeprecationWarning) or the module logger) before validating opt_type, and if
neither key is present raise a ValueError instead of silently defaulting to
"Adam"; reference the function get_opt_param and the symbols opt_type,
params["type"], and params["opt_type"] when implementing this guard and warning.

In `@deepmd/utils/compat.py`:
- Around line 426-477: The conversion currently only runs when "opt_type" is
present so legacy optimizer keys left in training_cfg (e.g., adam_beta1,
weight_decay) are ignored; update the logic in the block around
training_cfg/optimizer_keys (symbols: training_cfg, optimizer_keys,
extracted_cfg, optimizer_cfg) to trigger extraction when any legacy key is
present (not just opt_type), include the legacy alias "muon_min_2d_dim" by
adding it to optimizer_keys and mapping it to "min_2d_dim" when moving keys,
then proceed to convert opt_type->type if present and merge into optimizer_cfg
(preserving conversion precedence) and keep the existing defaulting to
default_optimizer/type behavior.
🧹 Nitpick comments (2)
deepmd/pd/train/training.py (1)

152-173: Consider removing unused kf_* parameters from extraction.

The Paddle backend only supports the Adam optimizer (validated at line 161-162), but get_opt_param still extracts Kalman Filter parameters (kf_blocksize, kf_start_pref_e, etc.) that are only relevant for the LKF optimizer. While this doesn't cause functional issues, removing these unused extractions would improve code clarity and align better with the backend's actual capabilities.

The early validation for unsupported optimizer types is good (fail-fast approach).

♻️ Optional: Remove unused kf_* parameters
         opt_param = {
-            "kf_blocksize": params.get("kf_blocksize"),
-            "kf_start_pref_e": params.get("kf_start_pref_e"),
-            "kf_limit_pref_e": params.get("kf_limit_pref_e"),
-            "kf_start_pref_f": params.get("kf_start_pref_f"),
-            "kf_limit_pref_f": params.get("kf_limit_pref_f"),
             "adam_beta1": params.get("adam_beta1"),
             "adam_beta2": params.get("adam_beta2"),
             "weight_decay": params.get("weight_decay"),
         }
deepmd/tf/train/trainer.py (1)

324-349: Minor: Duplicate optimizer type validation.

The optimizer type check at lines 325-328 is redundant since the same validation already occurs in _init_param (lines 130-133), which runs before _build_optimizer is called. The second check could be removed for cleaner code, though keeping it as a defensive guard is also acceptable.

♻️ Optional: Remove redundant validation
     def _build_optimizer(self):
-        if self.optimizer_type != "Adam":
-            raise RuntimeError(
-                f"Unsupported optimizer type {self.optimizer_type} for TensorFlow backend."
-            )
         if self.run_opt.is_distrib:

@OutisLi OutisLi marked this pull request as draft January 26, 2026 07:02
@OutisLi OutisLi force-pushed the pr/opt branch 3 times, most recently from 129b456 to 7f73f71 on February 14, 2026 15:26
@OutisLi OutisLi force-pushed the pr/opt branch 2 times, most recently from de34fe0 to 5331422 on February 21, 2026 12:17
@OutisLi OutisLi marked this pull request as ready for review February 26, 2026 02:16
@dosubot dosubot bot added the new feature label Feb 26, 2026
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
deepmd/utils/argcheck.py (1)

4035-4044: ⚠️ Potential issue | 🔴 Critical

Verify migration runs before strict normalization in all entrypoints.

While all main training entrypoints (TensorFlow train/compress/change_bias, PaddlePaddle, PyTorch) correctly call update_deepmd_input() before normalize(), the NVNMD entrypoint at deepmd/tf/nvnmd/entrypoints/mapt.py calls normalize() without prior migration. Since optimizer keys are now schema-driven, any legacy configs will fail strict validation if not migrated first. This applies to all code paths that validate configs with strict=True.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/utils/argcheck.py` around lines 4035 - 4044, The NVNMD entrypoint
(mapt) calls normalize() before migration, causing strict schema validation
failures for legacy optimizer keys; update the entrypoint to call
update_deepmd_input(config) (the same migration used by other training
entrypoints) before calling normalize(config, strict=True), and audit any other
NVNMD-related code paths to ensure update_deepmd_input() runs prior to
normalize() when strict validation is used.
♻️ Duplicate comments (1)
deepmd/pt/train/training.py (1)

195-200: ⚠️ Potential issue | 🟠 Major

Do not silently ignore legacy opt_type.

At Line 195, configs that still provide opt_type but not type silently become Adam, which can change training behavior without warning.

🔧 Suggested guard
-            opt_type = params.get("type", "Adam")
+            opt_type = params.get("type")
+            if opt_type is None and "opt_type" in params:
+                log.warning(
+                    "optimizer.opt_type is deprecated; use optimizer.type or run convert_optimizer_to_new_format."
+                )
+                opt_type = params["opt_type"]
+            if opt_type is None:
+                opt_type = "Adam"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt/train/training.py` around lines 195 - 200, The current code
silently falls back to "Adam" when configs include a legacy "opt_type" key but
no "type", so update the opt selection logic in training.py (the block that
computes opt_type and opt_param) to detect a legacy params.get("opt_type") when
"type" is absent and refuse silent fallback: either map/migrate it explicitly
(e.g., set params["type"]=params.pop("opt_type") for supported legacy names) or
raise a clear ValueError telling the user to rename "opt_type" to "type"
(include the offending value in the message); ensure opt_param still pops "type"
afterwards and that the error/message references the original opt_type value to
aid debugging.
🧹 Nitpick comments (2)
deepmd/tf/train/trainer.py (1)

347-350: Consolidate duplicated optimizer-type validation.

The same optimizer_type != "Adam" guard is already enforced in _init_param; keeping one source of truth reduces drift.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/tf/train/trainer.py` around lines 347 - 350, Remove the duplicated
optimizer-type guard in the shown block and rely on the single validation in
_init_param: delete the conditional that raises for self.optimizer_type !=
"Adam" (the redundant RuntimeError) so the optimizer validation lives only in
_init_param; if you want safety, replace the block with a simple assertion or a
call to the existing _init_param validation helper (e.g., call
self._init_param() or assert self.optimizer_type == "Adam") rather than
duplicating the error handling.
deepmd/pt/train/training.py (1)

799-802: Document fused optimizer device support or add explicit device validation.

Line 801 enables fused=True on all non-CPU devices. While PyTorch's fused optimizer officially supports CUDA, CPU, MPS, XPU, HPU, and MTIA, it will fail at runtime on unsupported backends (e.g., TPU/XLA). Consider either:

  1. Adding a comment documenting the supported device assumption, or
  2. Explicitly validating the device type: if DEVICE.type in ("cuda", "mps", "xpu", "hpu", "mtia"): extra["fused"] = True
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt/train/training.py` around lines 799 - 802, The code sets extra =
{"betas": adam_betas, "fused": DEVICE.type != "cpu"} inside the Adam/AdamW
branch (when self.opt_type in ("Adam", "AdamW")), which will enable fused
optimizers on unsupported backends; update this by validating DEVICE.type
explicitly before setting fused (e.g., only set fused=True when DEVICE.type is
one of the supported backends like "cuda","mps","xpu","hpu","mtia") or add a
clear comment documenting the supported devices; modify the extra construction
in the Adam/AdamW handling (the extra dict created near the Adam/AdamW branch)
to perform this device whitelist check and set fused accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deepmd/pd/train/training.py`:
- Line 123: Validate the optimizer config immediately after reading it instead
of defaulting to {}, i.e., check that config contains an "optimizer" mapping and
required fields for the chosen optimizer (for Adam ensure keys like "lr",
"betas", "eps", "weight_decay" or whatever keys your Adam usage at lines 686-688
expects are present); if missing, raise a clear ValueError with a message
stating which key is missing. Update the code around the optimizer_params
assignment (the variable optimizer_params in deepmd/pd/train/training.py) to
perform this validation (and reuse the same validation for other code paths
noted at lines ~161-172 and ~278), so the later Adam construction code that
reads those keys cannot raise a late KeyError.
- Around line 167-169: The current Paddle trainer only allows "Adam" via a
runtime check on opt_type (params.get("type")) in training.py; update validation
so optimizer types are filtered at config-load time instead of only at trainer
instantiation: add a backend-aware allowed list (e.g., PADDLE_ALLOWED_OPTS =
{"Adam"} and PT_ALLOWED_OPTS = {"Adam","AdamW","LKF","AdaMuon","HybridMuon"})
and validate params["type"] against the Paddle set when the backend is Paddle,
or better yet add schema-level backend-specific allowed values for the optimizer
type so configs for non-Paddle backends are rejected earlier; ensure the error
message references the backend and lists allowed values instead of the generic
raise ValueError currently using opt_type.

In `@deepmd/tf/train/trainer.py`:
- Around line 141-145: The code in trainer.py that sets self.optimizer_beta1,
self.optimizer_beta2, and self.optimizer_weight_decay calls float(...) on values
that may be None (optimizer_param.get("adam_beta1"), etc.), which raises
TypeError; fix by providing safe defaults or guarding None before conversion
(e.g., use optimizer_param.get("adam_beta1", <default>) and
optimizer_param.get("adam_beta2", <default>) and
optimizer_param.get("weight_decay", 0.0) or check for None and only cast when
present), so that self.optimizer_beta1, self.optimizer_beta2, and
self.optimizer_weight_decay always receive valid float values while keeping
self.optimizer_type parsing unchanged.
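
A combined sketch of the backend-aware whitelist and None-safe defaults suggested by the inline prompts above; the allowed sets and default values are assumptions drawn from this review, not deepmd-kit's code:

```python
# Allowed sets and defaults are illustrative assumptions.
PADDLE_ALLOWED_OPTS = {"Adam"}
PT_ALLOWED_OPTS = {"Adam", "AdamW", "LKF", "AdaMuon", "HybridMuon"}

def validate_opt_type(opt_type: str, backend: str) -> None:
    allowed = PADDLE_ALLOWED_OPTS if backend == "paddle" else PT_ALLOWED_OPTS
    if opt_type not in allowed:
        raise ValueError(
            f"Optimizer {opt_type!r} is not supported by the {backend} "
            f"backend; allowed values: {sorted(allowed)}"
        )

def safe_float(value, default: float) -> float:
    # Guard against float(None) raising TypeError (TF trainer prompt above).
    return default if value is None else float(value)

# e.g. beta1 = safe_float(optimizer_param.get("adam_beta1"), 0.9)
```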


ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0dc550f and a1220c8.

📒 Files selected for processing (10)
  • deepmd/pd/train/training.py
  • deepmd/pt/train/training.py
  • deepmd/tf/entrypoints/train.py
  • deepmd/tf/train/trainer.py
  • deepmd/tf/utils/compat.py
  • deepmd/utils/argcheck.py
  • deepmd/utils/compat.py
  • doc/model/train-fitting-dos.md
  • doc/model/train-fitting-property.md
  • doc/model/train-fitting-tensor.md
💤 Files with no reviewable changes (1)
  • deepmd/tf/entrypoints/train.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • doc/model/train-fitting-dos.md
  • deepmd/utils/compat.py
  • doc/model/train-fitting-property.md

iProzd (Collaborator) left a comment

Add default adamW params in dpa3 examples?
Others LGTM.

@OutisLi OutisLi requested a review from iProzd February 27, 2026 09:43
coderabbitai bot (Contributor) left a comment

🧹 Nitpick comments (1)
deepmd/utils/argcheck.py (1)

3030-3044: Consider documenting breaking change implications.

Since this PR is labeled as a "breaking change", consider adding a comment or documentation note explaining what constitutes a breaking change here and how users should migrate their configurations. The PR summary indicates backward-compatibility migration is handled, but explicit documentation would help users understand the transition.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/utils/argcheck.py` around lines 3030 - 3044, Add a short inline
comment and/or module-level docstring near the Loss configurations section
(e.g., above the "start_pref" function and the earlier "Argument(... 'optimizer'
...)" declaration) that explicitly states this PR is a breaking change,
describes what changed (which config keys/values or behavior were
renamed/removed), and provides concise migration guidance (mapping of old keys
to new keys and any automatic migration logic users should expect); reference
the symbols start_pref and the optimizer Argument/doc_optimizer so readers can
find the affected code quickly.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a1220c8 and 42f9eec.

📒 Files selected for processing (3)
  • deepmd/utils/argcheck.py
  • examples/water/dpa3/input_torch.json
  • examples/water/dpa3/input_torch_dynamic.json

@OutisLi OutisLi requested a review from wanghan-iapcm March 2, 2026 02:54