Commit 1435536

feat: add ssm chapter
1 parent d4eba33 commit 1435536

3 files changed

Lines changed: 6 additions & 3 deletions

15-bonus-state-space-models.ipynb

Lines changed: 2 additions & 1 deletion
@@ -902,7 +902,8 @@
 "output_type": "stream",
 "text": [
 "[Mamba2] step 0 | loss = 3.0486\n",
-"[Mamba2] step 500 | loss = 1.2650\n"
+"[Mamba2] step 500 | loss = 1.2650\n",
+"[Mamba2] step 1000 | loss = 0.7374\n"
 ]
 }
 ],

README.md

Lines changed: 3 additions & 2 deletions
@@ -44,14 +44,15 @@ Each module is a standalone notebook packed with explanations, exercises, and im
 | 12 | Appendix: Quantisation Strategies | [12-appendix-quantisation.ipynb](https://shreshthtuli.github.io/llms-from-scratch/12-appendix-quantisation.html) |
 | 13 | Appendix: Parameter-Efficient Tuning | [13-appendix-peft.ipynb](https://shreshthtuli.github.io/llms-from-scratch/13-appendix-peft.html) |
 | 14 | Bonus: Energy Based and Diffusion LLMs | [14-bonus-diffusion-llms.ipynb](https://shreshthtuli.github.io/llms-from-scratch/14-bonus-diffusion-llms.html) |
+| 15 | Bonus: State Space Models | [15-bonus-state-space-models.ipynb](https://shreshthtuli.github.io/llms-from-scratch/15-bonus-state-space-models.html) |
 
 ## 🧠 What You'll Learn
 - The end-to-end data flow of an LLM—from tokenization and batching to inference-time decoding.
 - How to implement core transformer components, attention variations, and optimization tricks.
 - Strategies for scaling datasets, managing checkpoints, and monitoring training stability.
 - Practical alignment techniques: SFT, preference modeling, RLHF, and reward modeling.
 - Deployment-ready compression: pruning, distillation, quantization, and PEFT recipes.
-- Bonus section on Energy based models (EBMs) and Diffusion LLMs.
+- Bonus sections on Energy based models (EBMs), Diffusion LLMs, and State Space Models (SSMs).
 
 
 ## ⚙️ Quick Start
@@ -88,7 +89,7 @@ Each module is a standalone notebook packed with explanations, exercises, and im
 1. **Foundations (Modules 01–03)** – Understand tokens, build your first transformer, and iterate on architecture improvements.
 2. **Data & Scaling (Modules 04–06)** – Curate corpora, tune training loops, and scale pretraining experiments responsibly.
 3. **Alignment (Modules 07–09)** – Apply SFT, RLHF, and efficient adaptation techniques to align your model with human intent.
-4. **Optimization (Modules 10–13)** – Compress, fine-tune, and deploy models using state-of-the-art efficiency tricks.
+4. **Optimization (Modules 10–15)** – Compress, fine-tune, and deploy models using state-of-the-art efficiency tricks.
 5. **Capstone** – Combine your learnings to train, align, and ship a bespoke LLM tailored to your use case.
 
 Mix and match as needed—every notebook is designed to stand on its own, but following this order unlocks the smoothest learning curve.

_toc.yml

Lines changed: 1 addition & 0 deletions
@@ -15,3 +15,4 @@ chapters:
 - file: 12-appendix-quantisation
 - file: 13-appendix-peft
 - file: 14-bonus-diffusion-llms
+- file: 15-bonus-state-space-models
