Commit 8e077b6

committed
feat: add DSA
1 parent 73ed085 commit 8e077b6

1 file changed

Lines changed: 1 addition & 1 deletion

03-advancing-our-llm.ipynb

@@ -197,7 +197,7 @@
 "- MLA compresses keys and values into a lower-dimensional latent space (`d_latent`) before caching, then reconstructs them on the fly, striking a balance between memory use and accuracy.\n",
 "- Pay attention to how the code handles cache growth and masking; these patterns generalise if you later plug in grouped-query or other attention variants.\n",
 "\n",
-"There have been other advancements in this space in [DeepSeek-v3.2](https://cas-bridge.xethub.hf.co/xet-bridge-us/692cfec93b25b81d09307b94/2d0aa38511b9df084d12a00fe04a96595496af772cb766c516c4e6aee1e21246?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=cas%2F20251204%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20251204T193212Z&X-Amz-Expires=3600&X-Amz-Signature=524e9e482de8170f4a5cac3e5c2cc805306af5e45ffe79a0eecdbbfc9be4622e&X-Amz-SignedHeaders=host&X-Xet-Cas-Uid=public&response-content-disposition=inline%3B+filename*%3DUTF-8%27%27paper.pdf%3B+filename%3D%22paper.pdf%22%3B&response-content-type=application%2Fpdf&x-id=GetObject&Expires=1764880332&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc2NDg4MDMzMn19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2FzLWJyaWRnZS54ZXRodWIuaGYuY28veGV0LWJyaWRnZS11cy82OTJjZmVjOTNiMjViODFkMDkzMDdiOTQvMmQwYWEzODUxMWI5ZGYwODRkMTJhMDBmZTA0YTk2NTk1NDk2YWY3NzJjYjc2NmM1MTZjNGU2YWVlMWUyMTI0NioifV19&Signature=MNcxwOQ47j5SrD7c1Ha03wwf1RRG%7ESWbfqKGzwrzrCGvqEimL%7EzCjbXths0O7pV4DMpgfzSB4CWzMkUsPyxA7ORUjI4j-ENZw0TPvR6J1GeAak%7Ei1R0KgN4FdkEW0rt-mtmAG00braaauFuKuYv0Nboc8ciJJNS0IxpZrWLXJi7vAoQlC3FnZ1999a0ZsTs9ae7fPEMe%7EP-RL3sSR9Ur1Ni8CByYk4sZEJBORn-QOQQhug%7E24bJKgfga6zypuGcGIetTqLJ-spLPEd6B1yvbpaz%7EKlFDGywK6DE3j2HQ8thcVgAZpD49VF48HnTRzGjXc1skmMIjaYRdEnIKATSDiw__&Key-Pair-Id=K2L8F4GPSG1IFC) which is a combination of MLA, Quantization and other tricks and they refer to this as the DeepSeek Sparse Attention (DSA). It primarily aims to make training and inference more efficient by computing attention only on the most important tokens while having similar or better performance than plain MLA (I highly recommend watching [this video](https://youtu.be/Y-o545eYjXM?si=7EQo3vtqvs2-UYdV)). We shall see quantization in appendix 12 and here we implement vanilla MLA."
+"There have been other advancements in this space in [DeepSeek-v3.2](https://cas-bridge.xethub.hf.co/xet-bridge-us/692cfec93b25b81d09307b94/2d0aa38511b9df084d12a00fe04a96595496af772cb766c516c4e6aee1e21246?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=cas%2F20251204%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20251204T193212Z&X-Amz-Expires=3600&X-Amz-Signature=524e9e482de8170f4a5cac3e5c2cc805306af5e45ffe79a0eecdbbfc9be4622e&X-Amz-SignedHeaders=host&X-Xet-Cas-Uid=public&response-content-disposition=inline%3B+filename*%3DUTF-8%27%27paper.pdf%3B+filename%3D%22paper.pdf%22%3B&response-content-type=application%2Fpdf&x-id=GetObject&Expires=1764880332&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc2NDg4MDMzMn19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2FzLWJyaWRnZS54ZXRodWIuaGYuY28veGV0LWJyaWRnZS11cy82OTJjZmVjOTNiMjViODFkMDkzMDdiOTQvMmQwYWEzODUxMWI5ZGYwODRkMTJhMDBmZTA0YTk2NTk1NDk2YWY3NzJjYjc2NmM1MTZjNGU2YWVlMWUyMTI0NioifV19&Signature=MNcxwOQ47j5SrD7c1Ha03wwf1RRG%7ESWbfqKGzwrzrCGvqEimL%7EzCjbXths0O7pV4DMpgfzSB4CWzMkUsPyxA7ORUjI4j-ENZw0TPvR6J1GeAak%7Ei1R0KgN4FdkEW0rt-mtmAG00braaauFuKuYv0Nboc8ciJJNS0IxpZrWLXJi7vAoQlC3FnZ1999a0ZsTs9ae7fPEMe%7EP-RL3sSR9Ur1Ni8CByYk4sZEJBORn-QOQQhug%7E24bJKgfga6zypuGcGIetTqLJ-spLPEd6B1yvbpaz%7EKlFDGywK6DE3j2HQ8thcVgAZpD49VF48HnTRzGjXc1skmMIjaYRdEnIKATSDiw__&Key-Pair-Id=K2L8F4GPSG1IFC) which is a combination of MLA, Quantization and other tricks and they refer to this as the DeepSeek Sparse Attention (DSA). It primarily aims to make training and inference more efficient by computing attention only on the most important tokens while having similar or better performance than plain MLA (I highly recommend watching [this video](https://youtu.be/Y-o545eYjXM?si=7EQo3vtqvs2-UYdV)). We shall see quantization in appendix 12 and here we implement MLA and not DSA."
 ]
 },
 {
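The context bullet in this hunk says MLA caches keys and values in a lower-dimensional latent space (`d_latent`) and reconstructs them on the fly. A toy, framework-free sketch of that idea follows. It is not the notebook's implementation: the names `LatentKVCache`, `W_down`, `W_up_k` and `W_up_v` are hypothetical, the weights are random rather than learned, and multiple heads and positional-encoding details are omitted. It only shows that the cache grows by `d_latent` floats per token instead of `2 * d_model`.

```python
import math
import random

def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng):
    """Random stand-in for a learned projection (hypothetical)."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)] for _ in range(rows)]

class LatentKVCache:
    """Hypothetical MLA-style cache: one d_latent vector per token is stored
    instead of separate d_model-sized key and value vectors."""

    def __init__(self, d_model, d_latent, seed=0):
        rng = random.Random(seed)
        self.W_down = rand_matrix(d_latent, d_model, rng)  # hidden -> latent
        self.W_up_k = rand_matrix(d_model, d_latent, rng)  # latent -> key
        self.W_up_v = rand_matrix(d_model, d_latent, rng)  # latent -> value
        self.latents = []  # the only state that grows with sequence length

    def append(self, h):
        """Compress a token's hidden state and cache only the latent."""
        self.latents.append(matvec(self.W_down, h))

    def keys_values(self):
        """Reconstruct full keys and values on the fly from cached latents."""
        keys = [matvec(self.W_up_k, c) for c in self.latents]
        values = [matvec(self.W_up_v, c) for c in self.latents]
        return keys, values

def attend(q, keys, values):
    """Single-query scaled dot-product softmax attention over all positions."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w / total * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# Usage: cache 10 tokens with d_model=16 compressed to d_latent=4.
d_model, d_latent, seq_len = 16, 4, 10
rng = random.Random(1)
cache = LatentKVCache(d_model, d_latent)
for _ in range(seq_len):
    cache.append([rng.gauss(0, 1) for _ in range(d_model)])
keys, values = cache.keys_values()
out = attend([rng.gauss(0, 1) for _ in range(d_model)], keys, values)
cached = seq_len * d_latent   # floats stored by the latent cache
full = seq_len * 2 * d_model  # floats a plain K/V cache would store
print(f"cached floats: {cached} vs {full} for a plain K/V cache")
```

The trade-off the bullet mentions shows up directly: memory shrinks by roughly `2 * d_model / d_latent`, at the cost of an up-projection per cached token at read time and whatever accuracy the low-rank bottleneck gives up.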
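The changed line describes DSA as computing attention only on the most important tokens. A minimal sketch of that selection step follows, under a loudly simplified assumption: tokens are ranked by the same scaled dot-product score used for attention. DeepSeek's paper describes a separate, lightweight scoring module for this ranking, which is what actually saves compute; ranking by the full score, as here, still touches every key, so the sketch shows the masking behaviour but not the speed-up. Quantization is not modelled. `topk_sparse_attention` and `k_top` are hypothetical names.

```python
import math

def topk_sparse_attention(q, keys, values, k_top):
    """Attend from one query to only the k_top highest-scoring keys.

    Simplification (assumption): tokens are ranked by the attention score
    itself rather than by a separate, cheaper indexer.
    Returns (output_vector, sorted_kept_indices).
    """
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    # Indices of the k_top largest scores; ties go to the earlier position.
    keep = sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k_top]
    m = max(scores[i] for i in keep)  # stabilise the softmax
    weights = {i: math.exp(scores[i] - m) for i in keep}
    total = sum(weights.values())
    dim = len(values[0])
    # Positions outside `keep` contribute nothing, as if masked out.
    out = [sum(weights[i] / total * values[i][d] for i in keep) for d in range(dim)]
    return out, sorted(keep)

# Usage: 4 cached tokens, keep the 2 most relevant to the query.
q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [-1.0, 0.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
out, kept = topk_sparse_attention(q, keys, values, k_top=2)
print(kept)  # [0, 2]
```

With `k_top` at least the sequence length this reduces to ordinary dense softmax attention, which is a convenient sanity check when swapping a sparse variant into an existing MLA block.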

0 commit comments
