Skip to content

docs: restructure and clarify learning-rate.md documentation#5276

Open
OutisLi wants to merge 2 commits intodeepmodeling:masterfrom
OutisLi:pr/lrdoc
Open

docs: restructure and clarify learning-rate.md documentation#5276
OutisLi wants to merge 2 commits intodeepmodeling:masterfrom
OutisLi:pr/lrdoc

Conversation

@OutisLi
Copy link
Collaborator

@OutisLi OutisLi commented Mar 1, 2026

Summary

This PR completely restructures the learning-rate.md documentation to improve clarity, organization, and accuracy. The previous version had content scattered across sections with significant repetition. The new structure follows a user-centric approach: quick start → configuration reference → mathematical theory.

Key Changes

Structural Improvements

  • Reorganized section order: Quick Start → Parameters → Schedule Types → Warmup → Mathematical Theory → Migration Guide
  • Eliminated content duplication: Removed redundant formulas between Theory and Instructions sections

Summary by CodeRabbit

  • Documentation
    • Reworked learning-rate guide into a structured, example-driven reference with Quick Start, explicit exponential and cosine schedules, and JSON configuration examples.
    • Added a Notation/Theory section, clear warmup formulas and mutual‑exclusivity rules, unified parameter descriptions (start_lr/stop_lr/stop_lr_ratio/decay_steps) and smooth vs stepped behavior.
    • Expanded migration guidance for versions prior to 3.1.3 and refreshed references.

Copilot AI review requested due to automatic review settings March 1, 2026 03:00
@github-actions github-actions bot added the Docs label Mar 1, 2026
@OutisLi OutisLi requested review from iProzd and njzjz March 1, 2026 03:01
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR restructures doc/train/learning-rate.md to present DeePMD-kit learning-rate schedules in a more user-oriented flow (Quick Start → parameter reference → schedule details → warmup → math → migration), aiming to reduce repetition and align formulas/examples with the implementation.

Changes:

  • Reorganized the document into a Quick Start + parameter reference + detailed schedule sections + theory/migration.
  • Added parameter tables and expanded/rewrote examples for exp and cosine schedules (including warmup and smooth mode).
  • Consolidated and clarified the mathematical formulas for warmup, exponential decay, and cosine annealing.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 1, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Rewrites learning-rate documentation into two schedule models (exponential and cosine), adds Quick Start JSON examples, formalizes notation and formulas for warmup and decay, clarifies parameter requirements and mutual-exclusivity (start_lr, stop_lr/stop_lr_ratio, warmup params), and adds migration guidance for v3.1.3+.

Changes

Cohort / File(s) Summary
Learning Rate Documentation
doc/train/learning-rate.md
Full restructure and rewrite: replaces prior Theory with two schedules (exp, cosine), adds Quick Start JSON configs and many self-contained examples, formalizes Notation/Theory and warmup formulas (warmup_start_factor, warmup_steps/ratio), clarifies start_lr and stop_lr/stop_lr_ratio requirements and mutual exclusivity, introduces smooth vs stepped modes, and adds migration and references sections.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'docs: restructure and clarify learning-rate.md documentation' directly and clearly summarizes the main change: restructuring and clarifying the learning-rate.md documentation file.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
doc/train/learning-rate.md (1)

62-62: Update documentation to specify the automatic decay_steps default formula.

The documentation uses the vague phrase "sensible default" without explaining what it actually means. Based on the code implementation in deepmd/dpmodel/utils/learning_rate.py, the automatic adjustment sets decay_steps to:

  • 100 if (decay_total // 10) > 100
  • decay_total // 100 + 1 otherwise (where decay_total = num_steps - warmup_steps)

Consider clarifying the documentation to include this specific formula so users understand the behavior when decay_steps exceeds the available decay phase.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@doc/train/learning-rate.md` at line 62, The doc string for the `decay_steps`
parameter is vague; update the description to state the exact automatic default
used in deepmd/dpmodel/utils/learning_rate.py: compute decay_total = num_steps -
warmup_steps, and if decay_steps > decay_total and decay_rate is not provided
then set decay_steps to 100 when (decay_total // 10) > 100, otherwise set
decay_steps to (decay_total // 100) + 1; mention this formula alongside the
existing explanation for the `decay_steps` parameter so users know the precise
behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@doc/train/learning-rate.md`:
- Line 62: The doc string for the `decay_steps` parameter is vague; update the
description to state the exact automatic default used in
deepmd/dpmodel/utils/learning_rate.py: compute decay_total = num_steps -
warmup_steps, and if decay_steps > decay_total and decay_rate is not provided
then set decay_steps to 100 when (decay_total // 10) > 100, otherwise set
decay_steps to (decay_total // 100) + 1; mention this formula alongside the
existing explanation for the `decay_steps` parameter so users know the precise
behavior.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f0a966b and c982361.

📒 Files selected for processing (1)
  • doc/train/learning-rate.md

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
doc/train/learning-rate.md (1)

358-358: Remove or populate the empty References section.

The document ends with an empty "## References" header. Either remove this section entirely or populate it with relevant references (e.g., papers on learning rate schedules, PyTorch documentation, etc.).

✂️ Proposed fix to remove the empty section
-## References
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@doc/train/learning-rate.md` at line 358, The doc ends with an empty "##
References" heading; remove this unused section or populate it with actual
citations: edit the "## References" header near the end of learning-rate.md and
either delete the header and any trailing blank lines, or add relevant reference
entries (e.g., links to key papers on learning rate schedules,
PyTorch/torch.optim docs, and any blog posts or textbooks cited elsewhere in the
doc) so the section is no longer empty.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@doc/train/learning-rate.md`:
- Line 358: The doc ends with an empty "## References" heading; remove this
unused section or populate it with actual citations: edit the "## References"
header near the end of learning-rate.md and either delete the header and any
trailing blank lines, or add relevant reference entries (e.g., links to key
papers on learning rate schedules, PyTorch/torch.optim docs, and any blog posts
or textbooks cited elsewhere in the doc) so the section is no longer empty.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c982361 and b7caf37.

📒 Files selected for processing (1)
  • doc/train/learning-rate.md

@codecov
Copy link

codecov bot commented Mar 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.12%. Comparing base (a3db25a) to head (bc7c707).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #5276   +/-   ##
=======================================
  Coverage   82.12%   82.12%           
=======================================
  Files         753      753           
  Lines       75825    75825           
  Branches     3648     3648           
=======================================
  Hits        62275    62275           
  Misses      12382    12382           
  Partials     1168     1168           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
doc/train/learning-rate.md (1)

67-67: Clarify the exact fallback for oversized decay_steps.

“Automatically adjusted to a sensible default” is too vague for users debugging LR behavior. Please document the exact adjustment rule/value used by implementation (Line 67), not just that an adjustment happens.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@doc/train/learning-rate.md` at line 67, Update the table description for
decay_steps to state the precise fallback rule used in the scheduler: when
decay_steps > (numb_steps - warmup_steps) and decay_rate is not set, set
decay_steps to (numb_steps - warmup_steps) and compute decay_rate such that the
final LR equals the intended minimum (or state the exact numeric default
decay_rate if the implementation uses a fixed value); reference the parameters
decay_steps, decay_rate, numb_steps, and warmup_steps and mirror the exact logic
used in the learning rate schedule implementation so users can predict LR
behavior precisely.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@doc/train/learning-rate.md`:
- Line 67: Update the table description for decay_steps to state the precise
fallback rule used in the scheduler: when decay_steps > (numb_steps -
warmup_steps) and decay_rate is not set, set decay_steps to (numb_steps -
warmup_steps) and compute decay_rate such that the final LR equals the intended
minimum (or state the exact numeric default decay_rate if the implementation
uses a fixed value); reference the parameters decay_steps, decay_rate,
numb_steps, and warmup_steps and mirror the exact logic used in the learning
rate schedule implementation so users can predict LR behavior precisely.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b7caf37 and d494ba8.

📒 Files selected for processing (1)
  • doc/train/learning-rate.md

- Add note about stop_lr serving as minimum LR when decay_rate is set
- Document automatic decay_steps adjustment behavior
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
doc/train/learning-rate.md (1)

155-156: Soften the convergence claim for cosine annealing.

Line 155 states cosine annealing “often provides better convergence than exponential decay,” which can be workload-dependent and may mislead users. Suggest neutral wording (e.g., “can provide better convergence in some tasks”).

✏️ Suggested wording tweak
-The cosine annealing schedule smoothly decreases the learning rate following a cosine curve. It often provides better convergence than exponential decay.
+The cosine annealing schedule smoothly decreases the learning rate following a cosine curve. It can provide better convergence than exponential decay for some tasks.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@doc/train/learning-rate.md` around lines 155 - 156, Replace the definitive
claim "often provides better convergence than exponential decay" in the sentence
about the cosine annealing schedule with a softer, workload-dependent phrasing
such as "can provide better convergence in some tasks" or "may provide better
convergence for certain workloads" so the doc's statement about cosine annealing
is neutral and acknowledges variability across tasks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@doc/train/learning-rate.md`:
- Around line 155-156: Replace the definitive claim "often provides better
convergence than exponential decay" in the sentence about the cosine annealing
schedule with a softer, workload-dependent phrasing such as "can provide better
convergence in some tasks" or "may provide better convergence for certain
workloads" so the doc's statement about cosine annealing is neutral and
acknowledges variability across tasks.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d494ba8 and 30c9bd0.

📒 Files selected for processing (1)
  • doc/train/learning-rate.md

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
doc/train/learning-rate.md (1)

122-123: Soften or qualify the convergence claim.

Line 122 currently reads as a broad performance assertion. Suggest qualifying it as workload-dependent (or linking a benchmark) to avoid overgeneralizing behavior across systems/tasks.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@doc/train/learning-rate.md` around lines 122 - 123, The sentence asserting
that "It often provides better convergence than exponential decay" is too
categorical; update the phrasing around "cosine annealing schedule" to qualify
the claim (e.g., "can often provide better convergence for certain models or
workloads" or "may improve convergence in some settings, depending on task and
hyperparameters") and optionally add a supporting citation or benchmark
reference to back the claim; ensure the revised text makes the outcome
workload-/task-dependent rather than universal.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@doc/train/learning-rate.md`:
- Around line 196-201: State explicit valid bounds for warmup inputs: require
warmup_steps and warmup_ratio to be mutually exclusive and constrained as 0 <=
warmup_steps < numb_steps and equivalently 0 <= warmup_ratio < 1 (where
numb_steps is derived from training.numb_steps); update all places that show
formulas deriving warmup_steps from warmup_ratio or that compute the decay phase
(the sections describing warmup_steps, warmup_ratio, and the decay/step
formulas) to include these constraints and a short note that values equal to the
total steps or >=1 are invalid to avoid division-by-zero/invalid decay-phase
behavior.

---

Nitpick comments:
In `@doc/train/learning-rate.md`:
- Around line 122-123: The sentence asserting that "It often provides better
convergence than exponential decay" is too categorical; update the phrasing
around "cosine annealing schedule" to qualify the claim (e.g., "can often
provide better convergence for certain models or workloads" or "may improve
convergence in some settings, depending on task and hyperparameters") and
optionally add a supporting citation or benchmark reference to back the claim;
ensure the revised text makes the outcome workload-/task-dependent rather than
universal.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 30c9bd0 and bc7c707.

📒 Files selected for processing (1)
  • doc/train/learning-rate.md

Comment on lines +196 to +201
You can specify warmup duration using either `warmup_steps` (absolute) or `warmup_ratio` (relative):

**Example 1: Basic exponential decay without warmup**
- `warmup_steps`: Explicit number of warmup steps
- `warmup_ratio`: Ratio of total training steps. Computed as `int(warmup_ratio * numb_steps)`, where `numb_steps` is derived from `training.numb_steps`

```json
"learning_rate": {
"type": "exp",
"start_lr": 0.001,
"stop_lr": 1e-6,
"decay_steps": 5000
}
```
These are mutually exclusive.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add explicit warmup bounds to prevent undefined formulas.

Please explicitly state valid constraints for warmup inputs (e.g., 0 <= warmup_steps < numb_steps, and equivalently 0 <= warmup_ratio < 1). Without this, formulas at Line 46, Line 129, Line 237, and Line 253 can hit division-by-zero/invalid decay phase.

Proposed doc patch
 ### Specifying warmup duration
 
 You can specify warmup duration using either `warmup_steps` (absolute) or `warmup_ratio` (relative):
@@
 - `warmup_steps`: Explicit number of warmup steps
 - `warmup_ratio`: Ratio of total training steps. Computed as `int(warmup_ratio * numb_steps)`, where `numb_steps` is derived from `training.numb_steps`
 
 These are mutually exclusive.
+Valid ranges:
+- `0 <= warmup_steps < numb_steps`
+- `0 <= warmup_ratio < 1`
+
+If warmup equals total steps, the decay phase length becomes zero and decay formulas are undefined.

Also applies to: 46-47, 129-130, 237-238, 253-254

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@doc/train/learning-rate.md` around lines 196 - 201, State explicit valid
bounds for warmup inputs: require warmup_steps and warmup_ratio to be mutually
exclusive and constrained as 0 <= warmup_steps < numb_steps and equivalently 0
<= warmup_ratio < 1 (where numb_steps is derived from training.numb_steps);
update all places that show formulas deriving warmup_steps from warmup_ratio or
that compute the decay phase (the sections describing warmup_steps,
warmup_ratio, and the decay/step formulas) to include these constraints and a
short note that values equal to the total steps or >=1 are invalid to avoid
division-by-zero/invalid decay-phase behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants