2 parents 01d4b30 + 2abbf54, commit 7f3e519
1 file changed
docs/how-to/end-to-end-training-run.md
@@ -33,9 +33,8 @@ uv sync # creates .venv and installs all deps
## 2. Cache the tokenizer
-The reference config uses the GPT-2 tokenizer. Compute nodes
-typically don't have internet access, so pre-cache it on the login
-node:
+The reference config uses the GPT-2 tokenizer. Compute nodes typically have restricted or much slower
+internet access (~1 Gbps vs. ~100 Gbps on login nodes), so it’s best to pre-cache it on the login node:
```bash
python -c "from transformers import AutoTokenizer; AutoTokenizer.from_pretrained('gpt2')"