Commit b7caf37

[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
1 parent c982361 commit b7caf37

1 file changed

Lines changed: 29 additions & 29 deletions

doc/train/learning-rate.md

@@ -36,32 +36,32 @@ The following parameters are shared by both `exp` and `cosine` schedules.
 
 ### Required parameters
 
-| Parameter | Type | Description |
-|-----------|------|-------------|
-| `start_lr` | float | The learning rate at the start of training (after warmup). |
-| `stop_lr` | float | The learning rate at the end of training. **Mutually exclusive** with `stop_lr_ratio`. When `decay_rate` is explicitly set, this serves as the minimum learning rate. |
-| `stop_lr_ratio` | float | The ratio of `stop_lr` to `start_lr`. `stop_lr = start_lr * stop_lr_ratio`. **Mutually exclusive** with `stop_lr`. |
+| Parameter | Type | Description |
+| --------------- | ----- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `start_lr` | float | The learning rate at the start of training (after warmup). |
+| `stop_lr` | float | The learning rate at the end of training. **Mutually exclusive** with `stop_lr_ratio`. When `decay_rate` is explicitly set, this serves as the minimum learning rate. |
+| `stop_lr_ratio` | float | The ratio of `stop_lr` to `start_lr`. `stop_lr = start_lr * stop_lr_ratio`. **Mutually exclusive** with `stop_lr`. |
 
 You must provide exactly one of `stop_lr` or `stop_lr_ratio`.
 
 ### Optional parameters
 
-| Parameter | Type | Default | Description |
-|-----------|------|---------|-------------|
-| `warmup_steps` | int | 0 | Number of steps for warmup. Learning rate increases linearly from `warmup_start_factor * start_lr` to `start_lr`. Mutually exclusive with `warmup_ratio`. |
-| `warmup_ratio` | float | None | Ratio of warmup steps to total training steps. `warmup_steps = int(warmup_ratio * num_steps)`. Mutually exclusive with `warmup_steps`. |
-| `warmup_start_factor` | float | 0.0 | Factor for initial warmup learning rate. Warmup starts from `warmup_start_factor * start_lr`. |
-| `scale_by_worker` | str | "linear" | How to alter learning rate in parallel training. Options: `"linear"`, `"sqrt"`, `"none"`. |
+| Parameter | Type | Default | Description |
+| --------------------- | ----- | -------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `warmup_steps` | int | 0 | Number of steps for warmup. Learning rate increases linearly from `warmup_start_factor * start_lr` to `start_lr`. Mutually exclusive with `warmup_ratio`. |
+| `warmup_ratio` | float | None | Ratio of warmup steps to total training steps. `warmup_steps = int(warmup_ratio * num_steps)`. Mutually exclusive with `warmup_steps`. |
+| `warmup_start_factor` | float | 0.0 | Factor for initial warmup learning rate. Warmup starts from `warmup_start_factor * start_lr`. |
+| `scale_by_worker` | str | "linear" | How to alter learning rate in parallel training. Options: `"linear"`, `"sqrt"`, `"none"`. |
 
 ### Type-specific parameters
 
 #### Exponential decay (`type: "exp"`)
 
-| Parameter | Type | Default | Description |
-|-----------|------|---------|-------------|
-| `decay_steps` | int | 5000 | Interval (in steps) at which learning rate decays. If `decay_steps` exceeds the total decay steps (`num_steps - warmup_steps`) and `decay_rate` is not provided, it will be automatically adjusted to a sensible default. |
-| `decay_rate` | float | None | Explicit decay rate. If not provided, computed from `start_lr` and `stop_lr`. |
-| `smooth` | bool | false | If `true`, use smooth exponential decay. If `false`, stepped decay. |
+| Parameter | Type | Default | Description |
+| ------------- | ----- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `decay_steps` | int | 5000 | Interval (in steps) at which learning rate decays. If `decay_steps` exceeds the total decay steps (`num_steps - warmup_steps`) and `decay_rate` is not provided, it will be automatically adjusted to a sensible default. |
+| `decay_rate` | float | None | Explicit decay rate. If not provided, computed from `start_lr` and `stop_lr`. |
+| `smooth` | bool | false | If `true`, use smooth exponential decay. If `false`, stepped decay. |
 
 #### Cosine annealing (`type: "cosine"`)
 
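Reviewer note on the exponential-decay rows in the hunk above: the table says `decay_rate` is "computed from `start_lr` and `stop_lr`" when omitted, but the formula is not part of this diff. The sketch below is one plausible reading, not the project's implementation. It assumes the rate is chosen so that the schedule reaches `stop_lr` after `decay_total = num_steps - warmup_steps` steps, and that `stop_lr` acts as a floor. The function name `exp_decay_lr` and the argument `decay_total` are hypothetical.

```python
def exp_decay_lr(decay_step, start_lr, stop_lr, decay_total,
                 decay_steps=5000, decay_rate=None, smooth=False):
    """Illustrative exponential decay; names and formula are assumptions."""
    if decay_rate is None:
        # Assumed derivation: start_lr * rate**(decay_total / decay_steps) == stop_lr
        decay_rate = (stop_lr / start_lr) ** (decay_steps / decay_total)
    if smooth:
        exponent = decay_step / decay_steps    # continuous decay
    else:
        exponent = decay_step // decay_steps   # stepped decay, changes every decay_steps
    # stop_lr is treated as the minimum learning rate
    return max(start_lr * decay_rate ** exponent, stop_lr)
```

With `smooth=False` the value stays constant within each `decay_steps` interval; with `smooth=True` it changes at every step.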

@@ -227,6 +227,7 @@ During warmup phase ($0 \leq \tau < \tau^{\text{warmup}}$):
 ```
 
 where:
+
 - $\tau$ is the current step index
 - $\tau^{\text{warmup}}$ is the number of warmup steps
 - $\gamma^0$ is `start_lr`
@@ -252,16 +253,16 @@ These are mutually exclusive.
 
 ### Notation
 
-| Symbol | Description |
-|--------|-------------|
-| $\tau$ | Global step index (0-indexed) |
-| $\tau^{\text{warmup}}$ | Number of warmup steps |
-| $\tau^{\text{decay}}$ | Number of decay steps = `num_steps - warmup_steps` |
-| $\gamma^0$ | `start_lr`: Learning rate at start of decay phase |
-| $\gamma^{\text{stop}}$ | `stop_lr`: Learning rate at end of training |
-| $f^{\text{warmup}}$ | `warmup_start_factor`: Initial warmup LR factor |
-| $s$ | `decay_steps`: Decay period for exponential schedule |
-| $r$ | `decay_rate`: Decay rate for exponential schedule |
+| Symbol | Description |
+| ---------------------- | ---------------------------------------------------- |
+| $\tau$ | Global step index (0-indexed) |
+| $\tau^{\text{warmup}}$ | Number of warmup steps |
+| $\tau^{\text{decay}}$ | Number of decay steps = `num_steps - warmup_steps` |
+| $\gamma^0$ | `start_lr`: Learning rate at start of decay phase |
+| $\gamma^{\text{stop}}$ | `stop_lr`: Learning rate at end of training |
+| $f^{\text{warmup}}$ | `warmup_start_factor`: Initial warmup LR factor |
+| $s$ | `decay_steps`: Decay period for exponential schedule |
+| $r$ | `decay_rate`: Decay rate for exponential schedule |
 
 ### Complete warmup formula
 
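Reviewer note: the warmup formula itself falls outside this hunk. As a rough illustration of the notation table above, the sketch below assumes a plain linear interpolation from $f^{\text{warmup}} \gamma^0$ at $\tau = 0$ toward $\gamma^0$ at $\tau = \tau^{\text{warmup}}$, which matches the parameter description ("increases linearly from `warmup_start_factor * start_lr` to `start_lr`"). The function name `warmup_lr` is hypothetical and the exact endpoint handling in the real code may differ.

```python
def warmup_lr(tau, start_lr, warmup_steps, warmup_start_factor=0.0):
    """Linear warmup for 0 <= tau < warmup_steps (assumed form, for illustration)."""
    if not 0 <= tau < warmup_steps:
        raise ValueError("tau must satisfy 0 <= tau < warmup_steps")
    f = warmup_start_factor
    # Interpolate the multiplier from f (at tau = 0) toward 1 (at tau = warmup_steps)
    return start_lr * (f + (1.0 - f) * tau / warmup_steps)
```

For example, with `start_lr=1e-3`, `warmup_steps=100` and `warmup_start_factor=0.1`, the schedule starts at roughly `1e-4` and rises by an equal increment each step.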

@@ -314,9 +315,10 @@ Equivalently, using $\alpha = \gamma^{\text{stop}} / \gamma^0$:
 Starting from version 3.1.3, the following parameters are **required**:
 
 1. `start_lr` (required): Must be explicitly specified.
-2. Either `stop_lr` or `stop_lr_ratio` (required): One must be provided.
+1. Either `stop_lr` or `stop_lr_ratio` (required): One must be provided.
 
 These parameters are **mutually exclusive**:
+
 - `stop_lr` and `stop_lr_ratio`
 - `warmup_steps` and `warmup_ratio`
 
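Reviewer note: the required and mutually exclusive rules in the hunk above can be expressed as a small check. This is a minimal sketch over a plain dictionary, not the project's actual argument checker. The function name `check_lr_config` is hypothetical, and it assumes "mutually exclusive" means both keys may not appear together.

```python
def check_lr_config(cfg: dict) -> None:
    """Raise ValueError if the learning-rate section breaks the documented rules."""
    if "start_lr" not in cfg:
        raise ValueError("start_lr is required")
    has_stop = "stop_lr" in cfg
    has_ratio = "stop_lr_ratio" in cfg
    if has_stop and has_ratio:
        raise ValueError("stop_lr and stop_lr_ratio are mutually exclusive")
    if not (has_stop or has_ratio):
        raise ValueError("one of stop_lr or stop_lr_ratio is required")
    if "warmup_steps" in cfg and "warmup_ratio" in cfg:
        raise ValueError("warmup_steps and warmup_ratio are mutually exclusive")
```

A configuration such as `{"type": "exp", "start_lr": 1e-3, "stop_lr_ratio": 0.01}` passes, while one that sets both `stop_lr` and `stop_lr_ratio` is rejected.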

@@ -354,5 +356,3 @@ Or using `stop_lr_ratio`:
 ```
 
 ## References
-
-[^1]: This section is built upon Jinzhe Zeng, Duo Zhang, Denghui Lu, Pinghui Mo, Zeyu Li, Yixiao Chen, Marián Rynik, Li'ang Huang, Ziyao Li, Shaochen Shi, Yingze Wang, Haotian Ye, Ping Tuo, Jiabin Yang, Ye Ding, Yifan Li, Davide Tisi, Qiyu Zeng, Han Bao, Yu Xia, Jiameng Huang, Koki Muraoka, Yibo Wang, Junhan Chang, Fengbo Yuan, Sigbjørn Løland Bore, Chun Cai, Yinnian Lin, Bo Wang, Jiayan Xu, Jia-Xin Zhu, Chenxing Luo, Yuzhi Zhang, Rhys E. A. Goodall, Wenshuo Liang, Anurag Kumar Singh, Sikai Yao, Jingchao Zhang, Renata Wentzcovitch, Jiequn Han, Jie Liu, Weile Jia, Darrin M. York, Weinan E, Roberto Car, Linfeng Zhang, Han Wang, [J. Chem. Phys. 159, 054801 (2023)](https://doi.org/10.1063/5.0155600) licensed under a [Creative Commons Attribution (CC BY) license](http://creativecommons.org/licenses/by/4.0/).
