Commit ed71925: simplify

1 parent: 292c3b1

1 file changed: doc/train/learning-rate.md (0 additions, 33 deletions)
````diff
@@ -30,39 +30,6 @@ Both schedules support an optional warmup phase where the learning rate graduall
 }
 ```
 
-## Common parameters
-
-The following parameters are shared by both `exp` and `cosine` schedules.
-
-### Required parameters
-
-- `start_lr`: The learning rate at the start of training (after warmup).
-- `stop_lr` or `stop_lr_ratio` (must provide exactly one):
-  - `stop_lr`: The learning rate at the end of training.
-  - `stop_lr_ratio`: The ratio of `stop_lr` to `start_lr`. Computed as `stop_lr = start_lr * stop_lr_ratio`.
-
-### Optional parameters
-
-- `warmup_steps` or `warmup_ratio` (mutually exclusive):
-  - `warmup_steps`: Number of steps for warmup. The learning rate increases linearly from `warmup_start_factor * start_lr` to `start_lr`.
-  - `warmup_ratio`: Ratio of warmup steps to total training steps. `warmup_steps = int(warmup_ratio * numb_steps)`.
-- `warmup_start_factor`: Factor for the initial warmup learning rate (default: 0.0). Warmup starts from `warmup_start_factor * start_lr`.
-- `scale_by_worker`: How to alter the learning rate in parallel training. Options: `"linear"`, `"sqrt"`, `"none"` (default: `"linear"`).
-
-### Type-specific parameters
-
-**Exponential decay (`type: "exp"`):**
-
-- `decay_steps`: Interval (in steps) at which the learning rate decays (default: 5000).
-- `decay_rate`: Explicit decay rate. If not provided, computed from `start_lr` and `stop_lr`.
-- `smooth`: If `true`, use smooth exponential decay at every step. If `false`, use stepped decay (default: `false`).
-
-**Cosine annealing (`type: "cosine"`):**
-
-No type-specific parameters. The decay follows a cosine curve from `start_lr` to `stop_lr`.
-
-See the [Mathematical Theory](#mathematical-theory) section for complete formulas.
-
 ## Exponential Decay Schedule
 
 The exponential decay schedule reduces the learning rate exponentially over training steps. It is the default schedule when `type` is omitted.
````
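The schedules in the deleted section can be sketched in plain Python. This is a hypothetical illustration, not the project's actual implementation: the function name `learning_rate` and the way the decay span is taken as the post-warmup steps are assumptions; the warmup, stepped exponential decay, and cosine curve follow the parameter descriptions above.

```python
import math

def learning_rate(step, start_lr, stop_lr, numb_steps,
                  schedule="exp", decay_steps=5000,
                  warmup_steps=0, warmup_start_factor=0.0):
    """Sketch of the documented schedules (hypothetical helper)."""
    # Linear warmup from warmup_start_factor * start_lr to start_lr.
    if step < warmup_steps:
        frac = step / warmup_steps
        return (warmup_start_factor + (1.0 - warmup_start_factor) * frac) * start_lr
    decay_span = numb_steps - warmup_steps  # assumption: decay runs post-warmup
    t = step - warmup_steps
    if schedule == "cosine":
        # Cosine curve from start_lr down to stop_lr.
        cos_factor = 0.5 * (1.0 + math.cos(math.pi * t / decay_span))
        return stop_lr + (start_lr - stop_lr) * cos_factor
    # Stepped exponential decay; decay_rate computed from start_lr and stop_lr
    # so that the rate reaches stop_lr after decay_span steps.
    decay_rate = (stop_lr / start_lr) ** (decay_steps / decay_span)
    return start_lr * decay_rate ** (t // decay_steps)
```

For example, with `start_lr=1e-3`, `stop_lr=1e-5`, and `numb_steps=100000`, both schedules start at `1e-3` and end at `1e-5`; the `exp` variant moves in steps of `decay_steps`, while `cosine` changes every step.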
