Commit 147fb92

reformatted readme
1 parent 98b7e4f commit 147fb92

3 files changed

Lines changed: 14 additions & 14 deletions

README.md

Lines changed: 12 additions & 12 deletions
@@ -1,23 +1,23 @@
 # Language Modelling Exercise

 This exercsie will allow you to explore language modelling. We focus on the key concept of multi-head attention.
-Navigate to the `src/attention_model.py`-file and implement multi-head attention [1]

-``` math
-\text{Attention}(\mathbf{Q},\mathbf{K},\mathbf{V}) = \text{softmax}(\frac{\mathbf{Q}\mathbf{K}^T}{\sqrt{d_k}})\mathbf{V}
-```
+1. Navigate to the `src/attention_model.py`-file and implement multi-head attention [1]

-To make attention useful in a language modelling scenario we cannot use future information. A model without access to upcoming future inputs or words is known as causal.
-Since our attention matrix is multiplied from the left we must mask out the upper triangle
-excluding the main diagonal for causality.
+``` math
+\text{Attention}(\mathbf{Q},\mathbf{K},\mathbf{V}) = \text{softmax}(\frac{\mathbf{Q}\mathbf{K}^T}{\sqrt{d_k}})\mathbf{V}
+```

-Keep in mind that $\mathbf{Q} \in \mathbb{R}^{b,h,o,d_k}$, $\mathbf{K} \in \mathbb{R}^{b,h,o,d_k}$ and $\mathbf{V} \in \mathbb{R}^{b,h,o,d_v}$, with $b$ the batch size, $h$ the number of heads, $o$ the desired output dimension, $d_k$ the key dimension and finally $d_v$ as value dimension. Your code must rely on broadcasting to process the matrix operations correctly. The notation follows [1].
+To make attention useful in a language modelling scenario we cannot use future information. A model without access to upcoming future inputs or words is known as causal.
+Since our attention matrix is multiplied from the left we must mask out the upper triangle
+excluding the main diagonal for causality.

-Furthermore write a function to convert the network output of vector encodings back into a string by completing the `convert` function in `src/util.py`.
+Keep in mind that $\mathbf{Q} \in \mathbb{R}^{b,h,o,d_k}$, $\mathbf{K} \in \mathbb{R}^{b,h,o,d_k}$ and $\mathbf{V} \in \mathbb{R}^{b,h,o,d_v}$, with $b$ the batch size, $h$ the number of heads, $o$ the desired output dimension, $d_k$ the key dimension and finally $d_v$ as value dimension. Your code must rely on broadcasting to process the matrix operations correctly. The notation follows [1].

+2. Furthermore write a function to convert the network output of vector encodings back into a string by completing the `convert` function in `src/util.py`.
+
+2. Once you have implemented and tested your version of attention run `sbatch scripts/train.slurm` to train your model on Bender. Once converged you can generate poetry via `sbatch scripts/generate.slurm`.
+Run `src/model_chat.py` to talk to your model.

 [1] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin:
 Attention is All you Need. NIPS 2017: 5998-6008
-
-Once you have implemented and tested your version of attention run `sbatch scripts/train.slurm` to train your model on Bender. Once converged you can generate poetry via `sbatch scripts/generate.slurm`.
-Run `src/model_chat.py` to talk to your model.
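
The causality rule in the README can be written out explicitly. The block below is one common formulation, not necessarily the one the exercise intends: the symbol $\mathbf{M}$ and the additive form are assumptions introduced here for illustration. Row $i$ of the attention matrix holds the weights that query position $i$ places on key positions $j$. Because that matrix multiplies $\mathbf{V}$ from the left, output row $i$ is a weighted sum over the rows of $\mathbf{V}$, so every entry with $j > i$ (the upper triangle, excluding the main diagonal) would draw on future positions and has to be suppressed before the softmax.

``` math
M_{ij} =
\begin{cases}
0 & \text{if } j \le i \\
-\infty & \text{if } j > i
\end{cases},
\qquad
\text{Attention}_{\text{causal}}(\mathbf{Q},\mathbf{K},\mathbf{V}) = \text{softmax}\left(\frac{\mathbf{Q}\mathbf{K}^T}{\sqrt{d_k}} + \mathbf{M}\right)\mathbf{V}
```

For $o = 3$ the mask is

``` math
\mathbf{M} =
\begin{pmatrix}
0 & -\infty & -\infty \\
0 & 0 & -\infty \\
0 & 0 & 0
\end{pmatrix}
```

Since $e^{-\infty} = 0$, the masked entries receive zero weight after the softmax, and each row still sums to one over the positions it is allowed to see.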

src/attention_model.py

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ def dot_product_attention(
     Returns:
         torch.Tensor: The attention values of shape [batch, heads, out_length, d_v]
     """
-    # TODO implement multi head attention.
+    # 1. TODO: implement multi head attention.
     # Hint: You will likely need torch.transpose, torch.sqrt, torch.tril,
     # torch.inf, and torch.nn.functional.softmax.
     # For applying the causal mask, you can either try using torch.exp or torch.masked_fill.
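
A minimal PyTorch sketch of what this TODO asks for, following the `torch.masked_fill` route from the hint. The function name `causal_attention_sketch` and its argument list are hypothetical, because the hunk header truncates the real `dot_product_attention` signature. The inputs are assumed to have the shapes stated in the README: `q` and `k` of shape `[b, h, o, d_k]` and `v` of shape `[b, h, o, d_v]`.

```python
import math

import torch


def causal_attention_sketch(
    q: torch.Tensor, k: torch.Tensor, v: torch.Tensor
) -> torch.Tensor:
    """Scaled dot-product attention with a causal mask (illustrative only).

    Assumed shapes: q, k are [b, h, o, d_k]; v is [b, h, o, d_v].
    Returns a tensor of shape [b, h, o, d_v].
    """
    d_k = q.shape[-1]
    # Swap the last two axes of k so that q @ k^T yields [b, h, o, o].
    scores = q @ torch.transpose(k, -2, -1) / math.sqrt(d_k)

    # Lower-triangular boolean matrix: True where key position <= query position.
    length_q, length_k = scores.shape[-2:]
    keep = torch.tril(
        torch.ones(length_q, length_k, dtype=torch.bool, device=scores.device)
    )
    # The [o, o] mask broadcasts over the batch and head axes.
    scores = scores.masked_fill(~keep, -torch.inf)

    # Softmax over key positions; masked entries get weight zero.
    weights = torch.nn.functional.softmax(scores, dim=-1)
    return weights @ v
```

A quick sanity check for any implementation is that the first output position can only attend to itself, so `out[..., 0, :]` should equal `v[..., 0, :]`, and that perturbing the last key and value leaves all earlier output positions unchanged.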

src/util.py

Lines changed: 1 addition & 1 deletion
@@ -98,5 +98,5 @@ def convert(sequences: torch.Tensor, inv_vocab: dict) -> list:
         list: A list of characters.
     """
     res = []
-    # TODO: Return a nested list of characters.
+    # 2. TODO: Return a nested list of characters.
     return res
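
One possible way to fill in this TODO is sketched below. The diff does not show the layout of `sequences`, so the sketch rests on assumptions: it treats a 2-D tensor as integer token indices of shape `[batch, length]`, treats a 3-D tensor as per-token scores or one-hot vectors of shape `[batch, length, vocab]`, and takes `inv_vocab` to map an integer index to a character. The name `convert_sketch` is hypothetical and chosen to avoid confusion with the real `convert`.

```python
import torch


def convert_sketch(sequences: torch.Tensor, inv_vocab: dict) -> list:
    """Map network output back to characters (illustrative only).

    Assumption: a 3-D input holds scores or one-hot vectors over the
    vocabulary in its last axis; a 2-D input already holds indices.
    """
    if sequences.dim() == 3:
        # Pick the most likely vocabulary index for every position.
        sequences = sequences.argmax(dim=-1)

    res = []
    for row in sequences.tolist():
        # One inner list of characters per sequence in the batch.
        res.append([inv_vocab[index] for index in row])
    return res
```

Each inner list can then be turned into a string with `"".join(chars)`.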
