Both schedules support an optional warmup phase where the learning rate gradually increases from `warmup_start_factor * start_lr` to `start_lr` before the decay begins.

## Common parameters

The following parameters are shared by both `exp` and `cosine` schedules.

### Required parameters

- `start_lr`: The learning rate at the start of training (after warmup).
- `stop_lr` or `stop_lr_ratio` (provide exactly one):
  - `stop_lr`: The learning rate at the end of training.
  - `stop_lr_ratio`: The ratio of `stop_lr` to `start_lr`, i.e. `stop_lr = start_lr * stop_lr_ratio`.

### Optional parameters

- `warmup_steps` or `warmup_ratio` (mutually exclusive):
  - `warmup_steps`: Number of warmup steps. The learning rate increases linearly from `warmup_start_factor * start_lr` to `start_lr`.
  - `warmup_ratio`: Ratio of warmup steps to total training steps; `warmup_steps = int(warmup_ratio * numb_steps)`.
- `warmup_start_factor`: Factor for the initial warmup learning rate (default: 0.0). Warmup starts from `warmup_start_factor * start_lr`.
- `scale_by_worker`: How the learning rate is scaled with the number of workers in parallel training. Options: `"linear"`, `"sqrt"`, `"none"` (default: `"linear"`).

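The linear warmup rule described above can be sketched in a few lines of Python. This is an illustrative sketch only; the function name `warmup_lr` is made up for this example and is not part of the library's API:

```python
def warmup_lr(step, start_lr, warmup_steps, warmup_start_factor=0.0):
    """Linear warmup: rises from warmup_start_factor * start_lr at step 0
    to start_lr at step == warmup_steps, then stays at start_lr.

    Illustrative sketch of the warmup rule described above, not the
    library's actual implementation.
    """
    if warmup_steps <= 0 or step >= warmup_steps:
        return start_lr
    frac = step / warmup_steps  # fraction of warmup completed, in [0, 1)
    return start_lr * (warmup_start_factor + (1.0 - warmup_start_factor) * frac)
```

With the default `warmup_start_factor = 0.0`, warmup starts from a learning rate of exactly zero, which is why the default is usually safe: the first optimizer steps are vanishingly small.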
### Type-specific parameters

**Exponential decay (`type: "exp"`):**

- `decay_steps`: Interval in steps at which the learning rate is decayed (default: 5000).
- `decay_rate`: Explicit decay rate. If not provided, it is computed from `start_lr` and `stop_lr`.
- `smooth`: If `true`, apply smooth exponential decay at every step; if `false`, decay in discrete jumps every `decay_steps` (default: `false`).

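The interaction of `decay_steps`, `decay_rate`, and `smooth` can be sketched as follows. The formula for the implied decay rate (chosen so the schedule reaches `stop_lr` at the end of training) is an assumption for illustration; see the library's own formulas for the exact behavior:

```python
import math

def exp_decay_lr(step, start_lr, stop_lr, total_steps,
                 decay_steps=5000, decay_rate=None, smooth=False):
    """Exponential decay sketch (assumed formulation, not the library's code).

    If decay_rate is omitted, it is chosen so that after total_steps the
    learning rate equals stop_lr:
        decay_rate = (stop_lr / start_lr) ** (decay_steps / total_steps)
    With smooth=False the decay is applied in discrete jumps every
    decay_steps; with smooth=True it is applied continuously.
    """
    if decay_rate is None:
        decay_rate = (stop_lr / start_lr) ** (decay_steps / total_steps)
    # Stepped decay uses integer division; smooth decay uses a real exponent.
    exponent = step / decay_steps if smooth else step // decay_steps
    return start_lr * decay_rate ** exponent
```

For example, with `start_lr = 1e-3`, `stop_lr = 1e-5`, and 10 000 total steps, the implied decay rate is 0.1 per 5000-step interval: the stepped schedule holds 1e-3 until step 5000, drops to 1e-4, then to 1e-5 at step 10 000.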
**Cosine annealing (`type: "cosine"`):**

No type-specific parameters; the decay follows a cosine curve from `start_lr` to `stop_lr`.

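The cosine curve mentioned above is the standard cosine-annealing formula; this sketch assumes that form (the function name is illustrative):

```python
import math

def cosine_lr(step, start_lr, stop_lr, total_steps):
    """Cosine annealing from start_lr (step 0) to stop_lr (step total_steps).

    Standard cosine-annealing formula, assumed to match the schedule
    described above:
        lr = stop_lr + 0.5 * (start_lr - stop_lr) * (1 + cos(pi * t))
    where t = step / total_steps, clamped to [0, 1].
    """
    progress = min(step, total_steps) / total_steps
    return stop_lr + 0.5 * (start_lr - stop_lr) * (1.0 + math.cos(math.pi * progress))
```

Unlike the stepped exponential schedule, the cosine curve is smooth everywhere: it decays slowly near the start and end of training and fastest at the midpoint, where the learning rate passes through the average of `start_lr` and `stop_lr`.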
See the [Mathematical Theory](#mathematical-theory) section for the complete formulas.

## Exponential Decay Schedule

The exponential decay schedule reduces the learning rate exponentially over training steps. It is the default schedule when `type` is omitted.