# Summary[[summary]]

<CourseFloatingBanner
    chapter={1}
    classNames="absolute z-10 right-0 top-0"
/>

In this chapter, you've been introduced to the fundamentals of Transformer models, Large Language Models (LLMs), and how they're revolutionizing AI and beyond.

## Key concepts covered

### Natural Language Processing and LLMs

We explored what NLP is and how Large Language Models have transformed the field. You learned that:
- NLP encompasses a wide range of tasks, from classification to generation
- LLMs are powerful models trained on massive amounts of text data
- These models can perform multiple tasks within a single architecture
- Despite their capabilities, LLMs have limitations, including hallucinations and bias

### Transformer capabilities

You saw how the `pipeline()` function from 🤗 Transformers makes it easy to use pre-trained models for various tasks:
- Text classification, token classification, and question answering
- Text generation and summarization
- Translation and other sequence-to-sequence tasks
- Speech recognition and image classification

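As a quick refresher, running any of these tasks takes only a couple of lines. The sketch below uses the sentiment-analysis task with the pipeline's default checkpoint, which is downloaded automatically the first time you run it:

```python
from transformers import pipeline

# Build a sentiment-analysis pipeline; a default pre-trained checkpoint
# is downloaded on first use (an internet connection is needed once).
classifier = pipeline("sentiment-analysis")

# The pipeline handles tokenization, inference, and post-processing for us.
result = classifier("I've been waiting for a HuggingFace course my whole life.")
print(result)  # a list with one dict containing a 'label' and a 'score'
```

The exact score varies between checkpoint versions, so treat the printed numbers as indicative rather than fixed.
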
### Transformer architecture

We discussed how Transformer models work at a high level, including:
- The importance of the attention mechanism
- How transfer learning enables models to adapt to specific tasks
- The three main architectural variants: encoder-only, decoder-only, and encoder-decoder

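To make the attention mechanism concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core operation inside a Transformer layer. Random vectors stand in for real token embeddings; the shapes and constants are illustrative:

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Toy scaled dot-product attention for one unbatched sequence."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)  # (seq_len, seq_len) similarity scores
    # Numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))  # 4 tokens, embedding dimension 8
k = rng.normal(size=(4, 8))
v = rng.normal(size=(4, 8))

output, weights = scaled_dot_product_attention(q, k, v)
print(weights.round(2))  # each row sums to 1: how much each token attends to the others
```

Each row of `weights` is a probability distribution over the other tokens, which is exactly the "which words should I pay attention to?" question the chapter described.
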
### Model architectures and their applications

A key aspect of this chapter was understanding which architecture to use for different tasks:

| Model           | Examples                     | Tasks                                                                            |
|-----------------|------------------------------|----------------------------------------------------------------------------------|
| Encoder-only    | BERT, DistilBERT, ModernBERT | Sentence classification, named entity recognition, extractive question answering |
| Decoder-only    | GPT, LLaMA, Gemma, SmolLM    | Text generation, conversational AI, creative writing                             |
| Encoder-decoder | BART, T5, Marian, mBART      | Summarization, translation, generative question answering                        |

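The table above maps directly onto pipeline choices. As a sketch using two small public checkpoints (`bert-base-cased` and `gpt2`; any model from the right family would work), an encoder-only model suits fill-in-the-blank prediction while a decoder-only model suits open-ended generation:

```python
from transformers import pipeline

# Encoder-only (BERT): masked-word prediction. bert-base-cased uses the
# literal token [MASK] as its mask placeholder.
filler = pipeline("fill-mask", model="bert-base-cased")
print(filler("Paris is the [MASK] of France.", top_k=1))

# Decoder-only (GPT-2): free-form text generation from a prompt.
generator = pipeline("text-generation", model="gpt2")
print(generator("In this course, we will teach you how to", max_new_tokens=15))
```

Note that each model family expects its own mask token: swapping in a checkpoint that uses `<mask>` instead of `[MASK]` would require changing the input accordingly.
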
### Modern LLM developments

You also learned about recent developments in the field:
- How LLMs have grown in size and capability over time
- The concept of scaling laws and how they guide model development
- Specialized attention mechanisms that help models process longer sequences
- The two-phase training approach of pretraining and instruction tuning

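To illustrate the scaling-law idea, the sketch below uses a Chinchilla-style functional form in which loss falls as a power law in both parameter count and training tokens. The constants are placeholders chosen for illustration, not fitted values:

```python
def scaling_loss(n_params: float, n_tokens: float) -> float:
    """Illustrative Chinchilla-style loss: E + A / N**alpha + B / D**beta.

    E is an irreducible-loss floor; the other constants are placeholder
    values for this sketch, not fitted ones.
    """
    E, A, B, alpha, beta = 1.7, 400.0, 410.0, 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

# Bigger models trained on more data reach lower loss, with diminishing returns:
print(scaling_loss(1e9, 2e10))    # smaller model, smaller dataset
print(scaling_loss(7e10, 1.4e12)) # larger model, larger dataset: lower loss
```

The practical takeaway is the same one the chapter made: compute, data, and model size trade off against each other, and scaling laws let teams budget all three before training.
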
### Practical applications

Throughout the chapter, you've seen how these models can be applied to real-world problems:
- Using the Hugging Face Hub to find and use pre-trained models
- Leveraging the Inference API to test models directly in your browser
- Understanding which models are best suited for specific tasks

## Looking ahead

Now that you have a solid understanding of what Transformer models are and how they work at a high level, you're ready to dive deeper into how to use them effectively. In the next chapters, you'll learn how to:

- Use the 🤗 Transformers library to load and fine-tune models
- Process different types of data for model input
- Adapt pre-trained models to your specific tasks
- Deploy models for practical applications

The foundation you've built in this chapter will serve you well as you explore more advanced topics and techniques in the coming sections.