[pre-commit.ci] auto fixes from pre-commit.com hooks

pre-commit-ci[bot] · pre-commit-ci[bot] · commit b7caf376c253 · 2026-03-01T03:02:22.000Z
for more information, see https://pre-commit.ci
diff --git a/doc/train/learning-rate.md b/doc/train/learning-rate.md
@@ -36,32 +36,32 @@ The following parameters are shared by both `exp` and `cosine` schedules.
 
 ### Required parameters
 
-| Parameter | Type | Description |
-|-----------|------|-------------|
-| `start_lr` | float | The learning rate at the start of training (after warmup). |
-| `stop_lr` | float | The learning rate at the end of training. **Mutually exclusive** with `stop_lr_ratio`. When `decay_rate` is explicitly set, this serves as the minimum learning rate. |
-| `stop_lr_ratio` | float | The ratio of `stop_lr` to `start_lr`. `stop_lr = start_lr * stop_lr_ratio`. **Mutually exclusive** with `stop_lr`. |
+| Parameter       | Type  | Description                                                                                                                                                           |
+| --------------- | ----- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `start_lr`      | float | The learning rate at the start of training (after warmup).                                                                                                            |
+| `stop_lr`       | float | The learning rate at the end of training. **Mutually exclusive** with `stop_lr_ratio`. When `decay_rate` is explicitly set, this serves as the minimum learning rate. |
+| `stop_lr_ratio` | float | The ratio of `stop_lr` to `start_lr`. `stop_lr = start_lr * stop_lr_ratio`. **Mutually exclusive** with `stop_lr`.                                                    |
 
 You must provide exactly one of `stop_lr` or `stop_lr_ratio`.
 
 ### Optional parameters
 
-| Parameter | Type | Default | Description |
-|-----------|------|---------|-------------|
-| `warmup_steps` | int | 0 | Number of steps for warmup. Learning rate increases linearly from `warmup_start_factor * start_lr` to `start_lr`. Mutually exclusive with `warmup_ratio`. |
-| `warmup_ratio` | float | None | Ratio of warmup steps to total training steps. `warmup_steps = int(warmup_ratio * num_steps)`. Mutually exclusive with `warmup_steps`. |
-| `warmup_start_factor` | float | 0.0 | Factor for initial warmup learning rate. Warmup starts from `warmup_start_factor * start_lr`. |
-| `scale_by_worker` | str | "linear" | How to alter learning rate in parallel training. Options: `"linear"`, `"sqrt"`, `"none"`. |
+| Parameter             | Type  | Default  | Description                                                                                                                                               |
+| --------------------- | ----- | -------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `warmup_steps`        | int   | 0        | Number of steps for warmup. Learning rate increases linearly from `warmup_start_factor * start_lr` to `start_lr`. Mutually exclusive with `warmup_ratio`. |
+| `warmup_ratio`        | float | None     | Ratio of warmup steps to total training steps. `warmup_steps = int(warmup_ratio * num_steps)`. Mutually exclusive with `warmup_steps`.                    |
+| `warmup_start_factor` | float | 0.0      | Factor for initial warmup learning rate. Warmup starts from `warmup_start_factor * start_lr`.                                                             |
+| `scale_by_worker`     | str   | "linear" | How to alter learning rate in parallel training. Options: `"linear"`, `"sqrt"`, `"none"`.                                                                 |
 
 ### Type-specific parameters
 
 #### Exponential decay (`type: "exp"`)
 
-| Parameter | Type | Default | Description |
-|-----------|------|---------|-------------|
-| `decay_steps` | int | 5000 | Interval (in steps) at which learning rate decays. If `decay_steps` exceeds the total decay steps (`num_steps - warmup_steps`) and `decay_rate` is not provided, it will be automatically adjusted to a sensible default. |
-| `decay_rate` | float | None | Explicit decay rate. If not provided, computed from `start_lr` and `stop_lr`. |
-| `smooth` | bool | false | If `true`, use smooth exponential decay. If `false`, stepped decay. |
+| Parameter     | Type  | Default | Description                                                                                                                                                                                                               |
+| ------------- | ----- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `decay_steps` | int   | 5000    | Interval (in steps) at which learning rate decays. If `decay_steps` exceeds the total decay steps (`num_steps - warmup_steps`) and `decay_rate` is not provided, it will be automatically adjusted to a sensible default. |
+| `decay_rate`  | float | None    | Explicit decay rate. If not provided, computed from `start_lr` and `stop_lr`.                                                                                                                                             |
+| `smooth`      | bool  | false   | If `true`, use smooth exponential decay. If `false`, stepped decay.                                                                                                                                                       |
 
 #### Cosine annealing (`type: "cosine"`)
 
@@ -227,6 +227,7 @@ During warmup phase ($0 \leq \tau < \tau^{\text{warmup}}$):
 ```
 
 where:
+
 - $\tau$ is the current step index
 - $\tau^{\text{warmup}}$ is the number of warmup steps
 - $\gamma^0$ is `start_lr`
@@ -252,16 +253,16 @@ These are mutually exclusive.
 
 ### Notation
 
-| Symbol | Description |
-|--------|-------------|
-| $\tau$ | Global step index (0-indexed) |
-| $\tau^{\text{warmup}}$ | Number of warmup steps |
-| $\tau^{\text{decay}}$ | Number of decay steps = `num_steps - warmup_steps` |
-| $\gamma^0$ | `start_lr`: Learning rate at start of decay phase |
-| $\gamma^{\text{stop}}$ | `stop_lr`: Learning rate at end of training |
-| $f^{\text{warmup}}$ | `warmup_start_factor`: Initial warmup LR factor |
-| $s$ | `decay_steps`: Decay period for exponential schedule |
-| $r$ | `decay_rate`: Decay rate for exponential schedule |
+| Symbol                 | Description                                          |
+| ---------------------- | ---------------------------------------------------- |
+| $\tau$                 | Global step index (0-indexed)                        |
+| $\tau^{\text{warmup}}$ | Number of warmup steps                               |
+| $\tau^{\text{decay}}$  | Number of decay steps = `num_steps - warmup_steps`   |
+| $\gamma^0$             | `start_lr`: Learning rate at start of decay phase    |
+| $\gamma^{\text{stop}}$ | `stop_lr`: Learning rate at end of training          |
+| $f^{\text{warmup}}$    | `warmup_start_factor`: Initial warmup LR factor      |
+| $s$                    | `decay_steps`: Decay period for exponential schedule |
+| $r$                    | `decay_rate`: Decay rate for exponential schedule    |
 
 ### Complete warmup formula
 
@@ -314,9 +315,10 @@ Equivalently, using $\alpha = \gamma^{\text{stop}} / \gamma^0$:
 Starting from version 3.1.3, the following parameters are **required**:
 
 1. `start_lr` (required): Must be explicitly specified.
-2. Either `stop_lr` or `stop_lr_ratio` (required): One must be provided.
+1. Either `stop_lr` or `stop_lr_ratio` (required): One must be provided.
 
 These parameters are **mutually exclusive**:
+
 - `stop_lr` and `stop_lr_ratio`
 - `warmup_steps` and `warmup_ratio`
 
@@ -354,5 +356,3 @@ Or using `stop_lr_ratio`:
 ```
 
 ## References
-
-[^1]: This section is built upon Jinzhe Zeng, Duo Zhang, Denghui Lu, Pinghui Mo, Zeyu Li, Yixiao Chen, Marián Rynik, Li'ang Huang, Ziyao Li, Shaochen Shi, Yingze Wang, Haotian Ye, Ping Tuo, Jiabin Yang, Ye Ding, Yifan Li, Davide Tisi, Qiyu Zeng, Han Bao, Yu Xia, Jiameng Huang, Koki Muraoka, Yibo Wang, Junhan Chang, Fengbo Yuan, Sigbjørn Løland Bore, Chun Cai, Yinnian Lin, Bo Wang, Jiayan Xu, Jia-Xin Zhu, Chenxing Luo, Yuzhi Zhang, Rhys E. A. Goodall, Wenshuo Liang, Anurag Kumar Singh, Sikai Yao, Jingchao Zhang, Renata Wentzcovitch, Jiequn Han, Jie Liu, Weile Jia, Darrin M. York, Weinan E, Roberto Car, Linfeng Zhang, Han Wang, [J. Chem. Phys. 159, 054801 (2023)](https://doi.org/10.1063/5.0155600) licensed under a [Creative Commons Attribution (CC BY) license](http://creativecommons.org/licenses/by/4.0/).
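
Note on the parameter rules that appear in the reformatted tables: this commit changes whitespace only, so the rules themselves are unchanged. The following is a minimal sketch of them, not the project's implementation. The function names `resolve_schedule` and `warmup_lr` are hypothetical, and using `None` as the "not set" sentinel for `warmup_steps` is an assumption made here so that mutual exclusivity can be checked. The sketch covers only what the tables state: exactly one of `stop_lr` or `stop_lr_ratio`, `warmup_steps = int(warmup_ratio * num_steps)`, and a linear warmup from `warmup_start_factor * start_lr` to `start_lr`.

```python
from typing import Optional, Tuple


def resolve_schedule(
    start_lr: float,
    num_steps: int,
    stop_lr: Optional[float] = None,
    stop_lr_ratio: Optional[float] = None,
    warmup_steps: Optional[int] = None,
    warmup_ratio: Optional[float] = None,
) -> Tuple[float, int]:
    """Return (stop_lr, warmup_steps) after applying the documented rules.

    Hypothetical helper: the name and the None sentinels are assumptions.
    """
    # Exactly one of stop_lr / stop_lr_ratio must be given.
    if (stop_lr is None) == (stop_lr_ratio is None):
        raise ValueError("provide exactly one of stop_lr or stop_lr_ratio")
    # warmup_steps and warmup_ratio are mutually exclusive.
    if warmup_steps is not None and warmup_ratio is not None:
        raise ValueError("warmup_steps and warmup_ratio are mutually exclusive")

    if stop_lr is None:
        stop_lr = start_lr * stop_lr_ratio
    if warmup_ratio is not None:
        warmup_steps = int(warmup_ratio * num_steps)
    elif warmup_steps is None:
        warmup_steps = 0  # documented default
    return stop_lr, warmup_steps


def warmup_lr(
    step: int,
    start_lr: float,
    warmup_steps: int,
    warmup_start_factor: float = 0.0,
) -> float:
    """Linear warmup from warmup_start_factor * start_lr towards start_lr.

    Valid only during the warmup phase, 0 <= step < warmup_steps.
    """
    if not 0 <= step < warmup_steps:
        raise ValueError("step is outside the warmup phase")
    frac = step / warmup_steps
    return start_lr * (warmup_start_factor + (1.0 - warmup_start_factor) * frac)


if __name__ == "__main__":
    stop, n_warm = resolve_schedule(
        start_lr=1e-3, num_steps=1000, stop_lr_ratio=0.01, warmup_ratio=0.1
    )
    print(stop, n_warm)
    print(warmup_lr(0, 1e-3, n_warm, warmup_start_factor=0.1))
```

The decay phase (`exp` or `cosine`) is left out on purpose, because the decay formulas fall outside the hunks shown in this diff.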