diff --git a/chapters/en/chapter1/5.mdx b/chapters/en/chapter1/5.mdx
index f00f6643b..2d190a98f 100644
--- a/chapters/en/chapter1/5.mdx
+++ b/chapters/en/chapter1/5.mdx
@@ -118,7 +118,7 @@ Ready to try your hand at question answering? Check out our complete [question a
Summarization involves condensing a longer text into a shorter version while preserving its key information and meaning.
-Encoder-decoder models like [BART](https://huggingface.co/docs/transformers/model_doc/bart) and [T5](model_doc/t5) are designed for the sequence-to-sequence pattern of a summarization task. We'll explain how BART works in this section, and then you can try finetuning T5 at the end.
+Encoder-decoder models like [BART](https://huggingface.co/docs/transformers/model_doc/bart) and [T5](https://huggingface.co/docs/transformers/model_doc/t5) are designed for the sequence-to-sequence pattern of a summarization task. We'll explain how BART works in this section, and then you can try finetuning T5 at the end.
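If you want to see the sequence-to-sequence pattern in action before finetuning anything, a minimal sketch with the `pipeline` API and a BART checkpoint fine-tuned for summarization (`facebook/bart-large-cnn`) could look like this:

```python
from transformers import pipeline

# Load a BART checkpoint fine-tuned for summarization
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

text = (
    "The tower is 324 metres tall, about the same height as an 81-storey "
    "building, and was the tallest man-made structure in the world for 41 years."
)

# The model condenses the input while keeping its key information
print(summarizer(text, max_length=40, min_length=10)[0]["summary_text"])
```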
@@ -135,7 +135,7 @@ Ready to try your hand at summarization? Check out our complete [summarization g
### Translation
-Translation involves converting text from one language to another while preserving its meaning. Translation is another example of a sequence-to-sequence task, which means you can use an encoder-decoder model like [BART](https://huggingface.co/docs/transformers/model_doc/bart) or [T5](model_doc/t5) to do it. We'll explain how BART works in this section, and then you can try finetuning T5 at the end.
+Translation involves converting text from one language to another while preserving its meaning. Translation is another example of a sequence-to-sequence task, which means you can use an encoder-decoder model like [BART](https://huggingface.co/docs/transformers/model_doc/bart) or [T5](https://huggingface.co/docs/transformers/model_doc/t5) to do it. We'll explain how BART works in this section, and then you can try finetuning T5 at the end.
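As a quick illustration of the same sequence-to-sequence setup for translation, here is a minimal sketch using the `pipeline` API with the `t5-small` checkpoint, which was pretrained with English-to-French translation among its tasks:

```python
from transformers import pipeline

# T5 handles translation as a text-to-text task; the pipeline adds the
# "translate English to French:" prefix for us
translator = pipeline("translation_en_to_fr", model="t5-small")

result = translator("Translation preserves the meaning of a text across languages.")
print(result[0]["translation_text"])
```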
BART adapts to translation by adding a separate, randomly initialized encoder that maps the source language into an input the model can decode into the target language. This new encoder's embeddings are passed to the pretrained encoder in place of the original word embeddings. Training happens in two steps: in the first step, most of BART's parameters are frozen, and only the new source encoder, the positional embeddings, and the input embeddings are updated with the cross-entropy loss from the model output; in the second step, all of the model parameters are trained together.
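A schematic way to picture this two-step schedule is sketched below; the `new_source_encoder` module and the parameter bookkeeping are hypothetical illustrations of the freeze/unfreeze pattern, not the actual BART translation fine-tuning code:

```python
import torch.nn as nn
from transformers import BartForConditionalGeneration

model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# Hypothetical randomly initialized encoder mapping the source language
# into representations the pretrained BART encoder can consume
new_source_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=model.config.d_model, nhead=8),
    num_layers=2,
)

# Step 1: freeze BART and unfreeze only the positional and input embeddings
for param in model.parameters():
    param.requires_grad = False
for module in (model.model.encoder.embed_positions, model.model.shared):
    for param in module.parameters():
        param.requires_grad = True

# Train the new source encoder together with those unfrozen embeddings,
# using the cross-entropy loss from the model output
step_one_params = list(new_source_encoder.parameters()) + [
    p for p in model.parameters() if p.requires_grad
]

# Step 2: unfreeze everything and train all parameters together
for param in model.parameters():
    param.requires_grad = True
```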
BART has since been followed up by a multilingual version, mBART, intended for translation and pretrained on many different languages.