`doc/train/learning-rate.md` (29 additions, 29 deletions)
The following parameters are shared by both `exp` and `cosine` schedules.
### Required parameters
| Parameter | Type | Description |
|-----------|------|-------------|
| `start_lr` | float | The learning rate at the start of training (after warmup). |
| `stop_lr` | float | The learning rate at the end of training. **Mutually exclusive** with `stop_lr_ratio`. When `decay_rate` is explicitly set, this serves as the minimum learning rate. |
| `stop_lr_ratio` | float | The ratio of `stop_lr` to `start_lr`, so that `stop_lr = start_lr * stop_lr_ratio`. **Mutually exclusive** with `stop_lr`. |
You must provide exactly one of `stop_lr` or `stop_lr_ratio`.
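The following is a minimal sketch of how the two mutually exclusive options could be resolved into a single final learning rate. `resolve_stop_lr` is a hypothetical helper written for illustration. It is not a DeePMD-kit function.

```python
def resolve_stop_lr(start_lr, stop_lr=None, stop_lr_ratio=None):
    """Hypothetical helper: return the final learning rate.

    Exactly one of stop_lr / stop_lr_ratio must be given.
    """
    # Both missing or both present is an error.
    if (stop_lr is None) == (stop_lr_ratio is None):
        raise ValueError("provide exactly one of stop_lr or stop_lr_ratio")
    if stop_lr is not None:
        return stop_lr
    # stop_lr = start_lr * stop_lr_ratio
    return start_lr * stop_lr_ratio
```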
### Optional parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `warmup_steps` | int | 0 | Number of warmup steps. During warmup, the learning rate increases linearly from `warmup_start_factor * start_lr` to `start_lr`. Mutually exclusive with `warmup_ratio`. |
| `warmup_ratio` | float | None | Ratio of warmup steps to total training steps: `warmup_steps = int(warmup_ratio * num_steps)`. Mutually exclusive with `warmup_steps`. |
| `warmup_start_factor` | float | 0.0 | Factor for the initial warmup learning rate. Warmup starts from `warmup_start_factor * start_lr`. |
| `scale_by_worker` | str | "linear" | How to scale the learning rate with the number of workers in parallel training. Options: `"linear"`, `"sqrt"`, `"none"`. |
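The warmup and worker-scaling rules in the table above can be sketched in Python as follows. All function names are hypothetical. The semantics of `scale_by_worker` are also an assumption: `"linear"` is taken to multiply the learning rate by the number of workers and `"sqrt"` by its square root. Check the actual implementation before relying on this.

```python
import math


def resolve_warmup_steps(num_steps, warmup_steps=None, warmup_ratio=None):
    """Hypothetical helper: turn warmup_steps / warmup_ratio into a step count."""
    if warmup_steps is not None and warmup_ratio is not None:
        raise ValueError("warmup_steps and warmup_ratio are mutually exclusive")
    if warmup_ratio is not None:
        return int(warmup_ratio * num_steps)
    return warmup_steps or 0  # default: no warmup


def warmup_lr(step, start_lr, warmup_steps, warmup_start_factor=0.0):
    """Linear ramp from warmup_start_factor * start_lr up to start_lr."""
    if step >= warmup_steps:
        return start_lr  # warmup finished (or disabled when warmup_steps == 0)
    init_lr = warmup_start_factor * start_lr
    return init_lr + (start_lr - init_lr) * step / warmup_steps


def scale_lr_by_worker(lr, num_workers, mode="linear"):
    """Assumed semantics of scale_by_worker for data-parallel training."""
    if mode == "linear":
        return lr * num_workers
    if mode == "sqrt":
        return lr * math.sqrt(num_workers)
    if mode == "none":
        return lr
    raise ValueError(f"unknown scale_by_worker mode: {mode!r}")
```

For example, with `num_steps = 1000` and `warmup_ratio = 0.1`, the sketch gives 100 warmup steps. With the default `warmup_start_factor = 0.0`, the learning rate is `0.5 * start_lr` halfway through warmup.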
### Type-specific parameters
#### Exponential decay (`type: "exp"`)
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `decay_steps` | int | 5000 | Interval (in steps) at which the learning rate decays. If `decay_steps` exceeds the total number of decay steps (`num_steps - warmup_steps`) and `decay_rate` is not provided, it is automatically adjusted to a sensible default. |
| `decay_rate` | float | None | Explicit decay rate. If not provided, it is computed from `start_lr` and `stop_lr`. |
| `smooth` | bool | false | If `true`, the learning rate decays smoothly (continuous exponential decay). If `false`, it decays in discrete steps, once every `decay_steps`. |
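Here is a sketch of stepped versus smooth exponential decay. It rests on two assumptions. First, `step` counts steps after warmup ends. Second, when `decay_rate` is omitted, it is chosen so the learning rate reaches `stop_lr` after `num_steps - warmup_steps` decay steps. `exp_decay_lr` and its `decay_total_steps` argument are hypothetical names. The automatic adjustment of an oversized `decay_steps` is not modeled.

```python
import math


def exp_decay_lr(step, start_lr, stop_lr, decay_total_steps,
                 decay_steps=5000, decay_rate=None, smooth=False):
    """Sketch of exponential decay; `step` counts steps after warmup ends."""
    if decay_rate is None:
        # Assumed rule: pick the rate so that lr == stop_lr after decay_total_steps.
        decay_rate = math.exp(
            math.log(stop_lr / start_lr) * decay_steps / decay_total_steps
        )
        floor = 0.0
    else:
        # Explicit decay_rate: stop_lr acts as the minimum learning rate.
        floor = stop_lr
    # smooth=True: continuous exponent; smooth=False: decay once per interval.
    exponent = step / decay_steps if smooth else step // decay_steps
    return max(start_lr * decay_rate ** exponent, floor)
```

With `smooth=False`, the learning rate stays constant within each `decay_steps` interval and then drops by a factor of `decay_rate`. With `smooth=True`, the same overall decay is spread continuously across each interval.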