|
15 | 15 | "id": "1", |
16 | 16 | "metadata": {}, |
17 | 17 | "source": [ |
18 | | - "[TOC]" |
| 18 | + "## Table of Contents\n", |
| 19 | + "- [References](#References)\n", |
| 20 | + "- [Inspecting the data](#Inspecting-the-data)\n", |
| 21 | + "- [Bigram language model](#Bigram-language-model)\n", |
| 22 | + " - [Evaluating the quality of the model](#Evaluating-the-quality-of-the-model)\n", |
| 23 | + "- [A neural network approach](#A-neural-network-approach)\n", |
| 24 | + " - [The training set](#The-training-set)\n", |
| 25 | + " - [Feeding the network](#Feeding-the-network)\n", |
| 26 | + " - [Regaining a normal distribution](#Regaining-a-normal-distribution)\n", |
| 27 | + " - [Recap: How the Neural Network Processes Input Characters](#Recap:-How-the-Neural-Network-Processes-Input-Characters)\n", |
| 28 | + " - [Optimization](#Optimization)\n", |
| 29 | + " - [Putting it all together](#Putting-it-all-together)\n", |
| 30 | + " - [Preparing data](#Preparing-data)\n", |
| 31 | + " - [Initializing the neural network](#Initializing-the-neural-network)\n", |
| 32 | + " - [Training the neural network](#Training-the-neural-network)\n", |
| 33 | + " - [Comparison with a Bigram frequency model](#Comparison-with-a-Bigram-frequency-model)\n", |
| 34 | + " - [Smoothing applied to a neural network](#Smoothing-applied-to-a-neural-network)\n", |
| 35 | + " - [Sampling from our trained model](#Sampling-from-our-trained-model)\n", |
| 36 | + " - [Conclusion](#Conclusion)\n", |
| 37 | + "- [Exercises](#Exercises)\n", |
| 38 | + " - [1. Build a Trigram model](#1.-Build-a-Trigram-model)\n", |
| 39 | + " - [2. Split the dataset](#2.-Split-the-dataset)\n", |
| 40 | + " - [Bigram model baseline](#Bigram-model-baseline)\n", |
| 41 | + " - [Compare the Bigram and Trigram model](#Compare-the-Bigram-and-Trigram-model)\n", |
| 42 | + " - [3. Change the loss function](#3.-Change-the-loss-function)" |
19 | 43 | ] |
20 | 44 | }, |
21 | 45 | { |
|
49 | 73 | "- Transformer: [Vaswani et al. 2017](https://arxiv.org/abs/1706.03762)\n", |
50 | 74 | "\n", |
51 | 75 | "A few more related resources (hands-on tutorials, articles, videos, etc.):\n", |
| 76 | + "- Book \"[Build a Large Language Model (From Scratch)](http://mng.bz/orYv)\" by Sebastian Raschka (the companion [GitHub repository](https://github.com/rasbt/LLMs-from-scratch))\n", |
| 77 | + "- [Andrej Karpathy's \"Neural Net: From Zero to Hero\"](https://www.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ) (*This was the **main** inspiration for this and the subsequent notebooks*)\n", |
52 | 78 | "- A [tutorial](https://docs.fast.ai/tutorial.text.html) on *transfer learning* by fastai\n", |
53 | 79 | "- [Hugging Face's FineWeb dataset](https://huggingface.co/spaces/HuggingFaceFW/blogpost-fineweb-v1)\n", |
54 | | - "- [Transformer LLM 3D visualizer](https://bbycroft.net/llm)\n", |
55 | | - "- [Andrej Karpathy's \"Neural Net: From Zero to Hero\"](https://www.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ) (*This was the **main** inspiration for this and the subsequent notebooks*)\n", |
56 | | - "- Book \"[Build a Large Language Model (From Scratch)](http://mng.bz/orYv)\" by Sebastian Raschka (the companion [GitHub repository](https://github.com/rasbt/LLMs-from-scratch))" |
| 80 | + "- [Transformer LLM 3D visualizer](https://bbycroft.net/llm)" |
57 | 81 | ] |
58 | 82 | }, |
59 | 83 | { |
|