Internal change

learned_optimization authors · learned_optimization authors · commit a49615fd9694 · 2023-05-23T20:06:33.000-07:00
PiperOrigin-RevId: 534651910
diff --git a/docs/notebooks/Part3_Truncation_TruncatedStep.md b/docs/notebooks/Part3_Truncation_TruncatedStep.md
@@ -96,7 +96,7 @@ iterative system and is general enough to work with learned optimizers, as well
 When applying a learned optimizer to train some target task, one usually wants the optimizer to be performant for a very large number of steps as training a model can take hundreds to hundreds of thousands of iterations.
 Ideally we would like our meta-training procedure to mirror the testing setup but given how long these unrolls (iterative application of the learned optimizer) can be this can become challenging. Truncated training is one solution to this. The core idea is to never run an entire inner-problem to completion, but instead unroll a shorter segment, and leverage information from that shorter segment to update the weights of the learned optimizer.
 
-This is most commonly seen in the form of truncated backpropogation through time and is used to train training recurrent neural networks. More recently, truncated training has been used to train RL algorithms (e.g. A3C).
+This is most commonly seen in the form of truncated backpropagation through time and is used to train training recurrent neural networks. More recently, truncated training has been used to train RL algorithms (e.g. A3C).
 
 Truncated training has a number of benifits. First, it greatly reduces the amount of computation needed before updating the learned optimizer. If one has length 100 truncations for a total length of 10k iterations, one 100x more updates to the weights of the learned optimizer. For some methods, like PES, we can even do these gradient estimates in an unbiased way (technically less biased, see PES paper for a discussion on hysteresis). For others, such as gradient based meta-training, and other ES variants, this comes at the cost of bias.