docs: restructure and clarify learning-rate.md documentation #5276
OutisLi wants to merge 2 commits into deepmodeling:master
Conversation
There was a problem hiding this comment.
Pull request overview
This PR restructures doc/train/learning-rate.md to present DeePMD-kit learning-rate schedules in a more user-oriented flow (Quick Start → parameter reference → schedule details → warmup → math → migration), aiming to reduce repetition and align formulas/examples with the implementation.
Changes:
- Reorganized the document into a Quick Start + parameter reference + detailed schedule sections + theory/migration.
- Added parameter tables and expanded/rewrote examples for `exp` and `cosine` schedules (including warmup and smooth mode).
- Consolidated and clarified the mathematical formulas for warmup, exponential decay, and cosine annealing.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the CodeRabbit settings.
📝 Walkthrough

Rewrites the learning-rate documentation into two schedule models (exponential and cosine), adds Quick Start JSON examples, formalizes notation and formulas for warmup and decay, clarifies parameter requirements and mutual exclusivity (`start_lr`, `stop_lr`/`stop_lr_ratio`, warmup params), and adds migration guidance for v3.1.3+.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes. Pre-merge checks: ✅ 3 passed.
🧹 Nitpick comments (1)
doc/train/learning-rate.md (1)
62-62: Update documentation to specify the automatic `decay_steps` default formula.

The documentation uses the vague phrase "sensible default" without explaining what it actually means. Based on the code implementation in `deepmd/dpmodel/utils/learning_rate.py`, the automatic adjustment sets `decay_steps` to:

- `100` if `(decay_total // 10) > 100`
- `decay_total // 100 + 1` otherwise

where `decay_total = num_steps - warmup_steps`.

Consider clarifying the documentation to include this specific formula so users understand the behavior when `decay_steps` exceeds the available decay phase.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@doc/train/learning-rate.md` at line 62, The doc string for the `decay_steps` parameter is vague; update the description to state the exact automatic default used in deepmd/dpmodel/utils/learning_rate.py: compute decay_total = num_steps - warmup_steps, and if decay_steps > decay_total and decay_rate is not provided then set decay_steps to 100 when (decay_total // 10) > 100, otherwise set decay_steps to (decay_total // 100) + 1; mention this formula alongside the existing explanation for the `decay_steps` parameter so users know the precise behavior.
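For illustration, a minimal Python sketch of the rule quoted above; the function name `default_decay_steps` is hypothetical, and the actual code in `deepmd/dpmodel/utils/learning_rate.py` is authoritative:

```python
def default_decay_steps(num_steps: int, warmup_steps: int) -> int:
    # Fallback applied when the configured decay_steps exceeds the
    # available decay phase and decay_rate is not provided
    # (per the rule quoted in the review comment above).
    decay_total = num_steps - warmup_steps
    if (decay_total // 10) > 100:
        return 100
    return decay_total // 100 + 1
```

For example, `default_decay_steps(10000, 0)` returns 100, while `default_decay_steps(500, 0)` returns 6.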
🧹 Nitpick comments (1)
doc/train/learning-rate.md (1)
358-358: Remove or populate the empty References section.

The document ends with an empty "## References" header. Either remove this section entirely or populate it with relevant references (e.g., papers on learning rate schedules, PyTorch documentation, etc.).
✂️ Proposed fix to remove the empty section
```diff
-## References
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@doc/train/learning-rate.md` at line 358, The doc ends with an empty "## References" heading; remove this unused section or populate it with actual citations: edit the "## References" header near the end of learning-rate.md and either delete the header and any trailing blank lines, or add relevant reference entries (e.g., links to key papers on learning rate schedules, PyTorch/torch.optim docs, and any blog posts or textbooks cited elsewhere in the doc) so the section is no longer empty.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@           Coverage Diff           @@
##           master    #5276   +/-   ##
=======================================
  Coverage   82.12%   82.12%
=======================================
  Files         753      753
  Lines       75825    75825
  Branches     3648     3648
=======================================
  Hits        62275    62275
  Misses      12382    12382
  Partials     1168     1168
=======================================
```

☔ View full report in Codecov by Sentry.
🧹 Nitpick comments (1)
doc/train/learning-rate.md (1)
67-67: Clarify the exact fallback for oversized `decay_steps`.

"Automatically adjusted to a sensible default" is too vague for users debugging LR behavior. Please document the exact adjustment rule/value used by the implementation (line 67), not just that an adjustment happens.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@doc/train/learning-rate.md` at line 67, Update the table description for decay_steps to state the precise fallback rule used in the scheduler: when decay_steps > (numb_steps - warmup_steps) and decay_rate is not set, set decay_steps to (numb_steps - warmup_steps) and compute decay_rate such that the final LR equals the intended minimum (or state the exact numeric default decay_rate if the implementation uses a fixed value); reference the parameters decay_steps, decay_rate, numb_steps, and warmup_steps and mirror the exact logic used in the learning rate schedule implementation so users can predict LR behavior precisely.
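To make the suggested wording concrete, here is an illustrative derivation of a `decay_rate` that lands the exponential schedule exactly on `stop_lr` at the end of the decay phase. The helper name `infer_decay_rate` is hypothetical; this is a sketch from the standard decay relation, not necessarily the exact DeePMD-kit code:

```python
def infer_decay_rate(start_lr: float, stop_lr: float,
                     decay_steps: int, numb_steps: int,
                     warmup_steps: int = 0) -> float:
    # With lr(t) = start_lr * decay_rate ** (t / decay_steps),
    # requiring lr(decay_total) == stop_lr solves to the expression below.
    decay_total = numb_steps - warmup_steps
    return (stop_lr / start_lr) ** (decay_steps / decay_total)
```

For instance, `infer_decay_rate(1e-3, 1e-6, 5000, 100000)` gives roughly 0.708, i.e. the LR is multiplied by about 0.708 every 5000 steps.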
- Add note about `stop_lr` serving as minimum LR when `decay_rate` is set
- Document automatic `decay_steps` adjustment behavior
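A hypothetical sketch of the first point, assuming a stepwise exponential schedule in which `stop_lr` clamps the decayed value from below; the real implementation may differ in details:

```python
def exp_lr(step: int, start_lr: float, stop_lr: float,
           decay_steps: int, decay_rate: float) -> float:
    # Stepwise exponential decay; when decay_rate is set explicitly,
    # stop_lr acts as a minimum (floor) for the learning rate.
    lr = start_lr * decay_rate ** (step // decay_steps)
    return max(lr, stop_lr)
```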
🧹 Nitpick comments (1)
doc/train/learning-rate.md (1)
155-156: Soften the convergence claim for cosine annealing.

Line 155 states cosine annealing "often provides better convergence than exponential decay," a claim that is workload-dependent and may mislead users. Suggest neutral wording (e.g., "can provide better convergence in some tasks").
✏️ Suggested wording tweak
```diff
-The cosine annealing schedule smoothly decreases the learning rate following a cosine curve. It often provides better convergence than exponential decay.
+The cosine annealing schedule smoothly decreases the learning rate following a cosine curve. It can provide better convergence than exponential decay for some tasks.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@doc/train/learning-rate.md` around lines 155 - 156, Replace the definitive claim "often provides better convergence than exponential decay" in the sentence about the cosine annealing schedule with a softer, workload-dependent phrasing such as "can provide better convergence in some tasks" or "may provide better convergence for certain workloads" so the doc's statement about cosine annealing is neutral and acknowledges variability across tasks.
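For context, the standard cosine annealing form under discussion, written as a small Python helper. This assumes the usual definition (as in PyTorch's `CosineAnnealingLR`) and is a sketch rather than the documented DeePMD-kit code:

```python
import math

def cosine_lr(step: int, start_lr: float, stop_lr: float,
              decay_total: int) -> float:
    # Smooth cosine decay from start_lr (step 0) to stop_lr (step decay_total).
    progress = min(step, decay_total) / decay_total
    return stop_lr + 0.5 * (start_lr - stop_lr) * (1.0 + math.cos(math.pi * progress))
```

At `step = 0` this returns `start_lr`, and at `step = decay_total` it returns `stop_lr`, with a smooth (zero-slope) approach at both ends.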
Actionable comments posted: 1
🧹 Nitpick comments (1)
doc/train/learning-rate.md (1)
122-123: Soften or qualify the convergence claim.

Line 122 currently reads as a broad performance assertion. Suggest qualifying it as workload-dependent (or linking a benchmark) to avoid overgeneralizing behavior across systems/tasks.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@doc/train/learning-rate.md` around lines 122 - 123, The sentence asserting that "It often provides better convergence than exponential decay" is too categorical; update the phrasing around "cosine annealing schedule" to qualify the claim (e.g., "can often provide better convergence for certain models or workloads" or "may improve convergence in some settings, depending on task and hyperparameters") and optionally add a supporting citation or benchmark reference to back the claim; ensure the revised text makes the outcome workload-/task-dependent rather than universal.
> Diff context (`doc/train/learning-rate.md`):
>
> You can specify warmup duration using either `warmup_steps` (absolute) or `warmup_ratio` (relative):
>
> - `warmup_steps`: Explicit number of warmup steps
> - `warmup_ratio`: Ratio of total training steps. Computed as `int(warmup_ratio * numb_steps)`, where `numb_steps` is derived from `training.numb_steps`
>
> These are mutually exclusive.
>
> **Example 1: Basic exponential decay without warmup**
>
> ```json
> "learning_rate": {
>     "type": "exp",
>     "start_lr": 0.001,
>     "stop_lr": 1e-6,
>     "decay_steps": 5000
> }
> ```
Add explicit warmup bounds to prevent undefined formulas.
Please explicitly state valid constraints for warmup inputs (e.g., `0 <= warmup_steps < numb_steps` and, equivalently, `0 <= warmup_ratio < 1`). Without this, the formulas at lines 46, 129, 237, and 253 can hit division by zero or an invalid decay phase.
Proposed doc patch
```diff
 ### Specifying warmup duration

 You can specify warmup duration using either `warmup_steps` (absolute) or `warmup_ratio` (relative):
@@
 - `warmup_steps`: Explicit number of warmup steps
 - `warmup_ratio`: Ratio of total training steps. Computed as `int(warmup_ratio * numb_steps)`, where `numb_steps` is derived from `training.numb_steps`

 These are mutually exclusive.
+
+Valid ranges:
+
+- `0 <= warmup_steps < numb_steps`
+- `0 <= warmup_ratio < 1`
+
+If warmup equals total steps, the decay phase length becomes zero and decay formulas are undefined.
```

Also applies to: 46-47, 129-130, 237-238, 253-254
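A minimal sketch of the validation the patch above implies; the function `resolve_warmup_steps` is hypothetical and only mirrors the documented parameter names:

```python
def resolve_warmup_steps(numb_steps: int,
                         warmup_steps: int | None = None,
                         warmup_ratio: float | None = None) -> int:
    # warmup_steps and warmup_ratio are mutually exclusive.
    if warmup_steps is not None and warmup_ratio is not None:
        raise ValueError("warmup_steps and warmup_ratio are mutually exclusive")
    if warmup_ratio is not None:
        if not 0.0 <= warmup_ratio < 1.0:
            raise ValueError("require 0 <= warmup_ratio < 1")
        warmup_steps = int(warmup_ratio * numb_steps)
    warmup_steps = 0 if warmup_steps is None else warmup_steps
    if not 0 <= warmup_steps < numb_steps:
        raise ValueError("require 0 <= warmup_steps < numb_steps")
    # Guarantees decay phase length numb_steps - warmup_steps > 0,
    # so the decay formulas never divide by zero.
    return warmup_steps
```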
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@doc/train/learning-rate.md` around lines 196 - 201, State explicit valid
bounds for warmup inputs: require warmup_steps and warmup_ratio to be mutually
exclusive and constrained as 0 <= warmup_steps < numb_steps and equivalently 0
<= warmup_ratio < 1 (where numb_steps is derived from training.numb_steps);
update all places that show formulas deriving warmup_steps from warmup_ratio or
that compute the decay phase (the sections describing warmup_steps,
warmup_ratio, and the decay/step formulas) to include these constraints and a
short note that values equal to the total steps or >=1 are invalid to avoid
division-by-zero/invalid decay-phase behavior.
Summary

This PR completely restructures the `learning-rate.md` documentation to improve clarity, organization, and accuracy. The previous version had content scattered across sections with significant repetition. The new structure follows a user-centric approach: quick start → configuration reference → mathematical theory.

Key Changes
Structural Improvements