doc/train/learning-rate.md
DeePMD-kit supports two learning rate schedules:

- **`exp`**: Exponential decay with optional stepped or smooth mode
- **`cosine`**: Cosine annealing for a smooth decay curve

Both schedules support an optional warmup phase where the learning rate gradually increases from a small initial value to the target {ref}`start_lr <learning_rate/start_lr>`.

This page focuses on schedule behavior, examples, and formulas. For the canonical argument definitions, see {ref}`learning_rate <learning_rate>`.
## Quick Start
## Common parameters
Use {ref}`learning_rate <learning_rate>` as the canonical parameter reference. This page only highlights the argument combinations that matter when choosing a schedule:
- Shared by both {ref}`exp <learning_rate[exp]>` and {ref}`cosine <learning_rate[cosine]>`: {ref}`start_lr <learning_rate/start_lr>` plus exactly one of {ref}`stop_lr <learning_rate/stop_lr>` or {ref}`stop_lr_ratio <learning_rate/stop_lr_ratio>`.
- Optional warmup for both schedules: {ref}`warmup_steps <learning_rate/warmup_steps>` or {ref}`warmup_ratio <learning_rate/warmup_ratio>`, with optional {ref}`warmup_start_factor <learning_rate/warmup_start_factor>`.
- Optional distributed scaling for both schedules: {ref}`scale_by_worker <learning_rate/scale_by_worker>`.
- Additional options for {ref}`exp <learning_rate[exp]>`: {ref}`decay_steps <learning_rate[exp]/decay_steps>`, {ref}`decay_rate <learning_rate[exp]/decay_rate>`, and {ref}`smooth <learning_rate[exp]/smooth>`.
- {ref}`cosine <learning_rate[cosine]>` has no extra schedule-specific arguments beyond the shared ones.
See [Mathematical Theory](#mathematical-theory) for complete formulas.
## Exponential Decay Schedule
The exponential decay schedule reduces the learning rate exponentially over training steps. It is the default schedule when {ref}`type <learning_rate/type>` is omitted.
### Stepped vs smooth mode
By setting {ref}`smooth <learning_rate[exp]/smooth>` to `true`, the learning rate decays smoothly at every step instead of in a stepped manner. This provides a more gradual decay curve similar to PyTorch's `ExponentialLR`, whereas the default stepped mode (`smooth: false`) is similar to PyTorch's `StepLR`.
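The contrast between the two modes can be sketched in a few lines of Python. This is an illustrative model of stepped vs. smooth exponential decay, not DeePMD-kit's internal code; `decay_rate` and `decay_steps` stand for the schedule arguments of the same names:

```python
def exp_lr(step, start_lr, decay_rate, decay_steps, smooth=False):
    """Illustrative exponential decay: stepped (StepLR-like) vs smooth (ExponentialLR-like)."""
    if smooth:
        # Smooth mode: decay a little at every step; agrees with stepped
        # mode whenever step is an exact multiple of decay_steps.
        return start_lr * decay_rate ** (step / decay_steps)
    # Stepped mode: the learning rate only drops every decay_steps steps.
    return start_lr * decay_rate ** (step // decay_steps)
```

For example, at step 2500 with `start_lr=1e-3`, `decay_rate=0.5`, and `decay_steps=5000`, stepped mode still returns `1e-3` while smooth mode has already decayed to about `7.1e-4`; both modes return `5e-4` at step 5000.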
If {ref}`decay_rate <learning_rate[exp]/decay_rate>` is not explicitly provided, DeePMD-kit computes it from {ref}`start_lr <learning_rate/start_lr>` and the requested final learning rate so that the schedule reaches the target by {ref}`numb_steps <training/numb_steps>`. The exact expression is given in [Mathematical Theory](#mathematical-theory).
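One conventional closed form for this computation is sketched below. This is an illustration of the idea, not necessarily DeePMD-kit's exact internals, and it ignores warmup steps for simplicity:

```python
import math

def implied_decay_rate(start_lr, stop_lr, decay_steps, numb_steps):
    # Choose decay_rate so that start_lr * decay_rate ** (numb_steps / decay_steps)
    # equals stop_lr, i.e. the schedule lands on stop_lr at the final step.
    return math.exp(math.log(stop_lr / start_lr) * decay_steps / numb_steps)

rate = implied_decay_rate(start_lr=1e-3, stop_lr=1e-8, decay_steps=5000,
                          numb_steps=1_000_000)
final_lr = 1e-3 * rate ** (1_000_000 / 5000)  # back-substitute: recovers 1e-8
```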
### Examples
If {ref}`numb_steps <training/numb_steps>` is 1,000,000, warmup lasts 50,000 steps. Learning rate starts from `0.0` (default {ref}`warmup_start_factor <learning_rate/warmup_start_factor>`) and increases to {ref}`start_lr <learning_rate/start_lr>`.
With {ref}`smooth <learning_rate[exp]/smooth>` set to `true`, the learning rate decays continuously at every step, similar to PyTorch's `ExponentialLR`. The default stepped mode (`smooth: false`) is similar to PyTorch's `StepLR`.
## Cosine Annealing Schedule
The cosine annealing schedule smoothly decreases the learning rate following a cosine curve. It often provides better convergence than exponential decay.
After warmup, the learning rate follows a cosine curve from {ref}`start_lr <learning_rate/start_lr>` to {ref}`stop_lr <learning_rate/stop_lr>` or the value implied by {ref}`stop_lr_ratio <learning_rate/stop_lr_ratio>`. The exact expression is given in [Mathematical Theory](#mathematical-theory).
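As a sketch (illustrative Python, not DeePMD-kit's implementation; `numb_steps` here stands for the length of the decay phase after warmup):

```python
import math

def cosine_lr(step, start_lr, stop_lr, numb_steps):
    # Half-cosine from start_lr at step 0 down to stop_lr at numb_steps.
    progress = min(step / numb_steps, 1.0)
    return stop_lr + 0.5 * (start_lr - stop_lr) * (1.0 + math.cos(math.pi * progress))
```

The endpoints are exact, and halfway through the decay phase the learning rate is `(start_lr + stop_lr) / 2`.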
### Examples
Warmup is a technique to stabilize training in early stages by gradually increasing the learning rate from {ref}`warmup_start_factor <learning_rate/warmup_start_factor>` `*` {ref}`start_lr <learning_rate/start_lr>` to {ref}`start_lr <learning_rate/start_lr>`.
You can specify warmup duration using either {ref}`warmup_steps <learning_rate/warmup_steps>` (absolute) or {ref}`warmup_ratio <learning_rate/warmup_ratio>` (relative to {ref}`numb_steps <training/numb_steps>`). These are mutually exclusive.
The exact piecewise warmup formula is given in [Mathematical Theory](#mathematical-theory).
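A minimal sketch of the linear warmup ramp (illustrative Python under the definitions above, not DeePMD-kit's internal code):

```python
def warmup_lr(step, start_lr, warmup_steps, warmup_start_factor=0.0):
    # Piecewise: ramp linearly from warmup_start_factor * start_lr at step 0
    # up to start_lr at warmup_steps; afterwards the chosen schedule takes over.
    if warmup_steps <= 0 or step >= warmup_steps:
        return start_lr
    frac = step / warmup_steps
    return start_lr * (warmup_start_factor + (1.0 - warmup_start_factor) * frac)
```

With `start_lr=1e-3`, `warmup_steps=50000`, and the default `warmup_start_factor=0.0`, the rate is `0.0` at step 0, `5e-4` at step 25000, and `1e-3` from step 50000 on; with `warmup_start_factor=0.1` it starts at `1e-4` instead.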
In version 3.1.2 and earlier, {ref}`start_lr <learning_rate/start_lr>` and {ref}`stop_lr <learning_rate/stop_lr>` / {ref}`stop_lr_ratio <learning_rate/stop_lr_ratio>` had default values and could be omitted. Starting from version 3.1.3, these parameters are **required** and must be explicitly specified.