Commit ef3fe64

deploy: 6affc32
1 parent 40ce31f commit ef3fe64

3 files changed

Lines changed: 20 additions & 20 deletions

_sources/intro.md

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ __Note__: *This schedule will evolve throughout the semseter*
 | Mar 17 | __SPRING BREAK - NO CLASSES__ | | |
 | Mar 24 | {doc}`_sources/Week_09` | NO HOMEWORK | |
 | Mar 31 | {doc}`_sources/Week_10` | [HW 07](_sources/homework/Homework_07) | |
-| Apr 07 | {doc}`_sources/Week_11` | HW 08 | |
+| Apr 07 | {doc}`_sources/Week_11` | [HW 08](_sources/homework/Homework_08) | |
 | Apr 14 | {doc}`_sources/Week_12` | HW 09 | |
 | Apr 21 | {doc}`_sources/Week_13` | HW 10 | {doc}`_sources/Project_02` |
 | Apr 28 | {doc}`_sources/Week_14` | HW 11 | |

_sources/lectures/Attention.html

Lines changed: 18 additions & 18 deletions
@@ -1100,8 +1100,8 @@ <h3><span style="color:LightGreen">Sequence Padding and Attention Masking</span>
 <section id="span-style-color-orange-computing-the-reweighted-padded-attention-mask-span">
 <h2><span style="color:Orange">Computing the Reweighted Padded Attention Mask</span><a class="headerlink" href="#span-style-color-orange-computing-the-reweighted-padded-attention-mask-span" title="Permalink to this heading">#</a></h2>
 <p>Lets create some numbers so we can get a better idea of how this works. Let the tokens be <span class="math notranslate nohighlight">\(X = [10, 2, \text{&lt;pad&gt;}]\)</span>, so the third token is a padding token. Lets then also pretend, we pass this to our model, and when we go to compute our attention <span class="math notranslate nohighlight">\(QK^T\)</span>. The raw output before the Softmax is below:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-c5dabc6e-8740-41ec-b9bf-e976882993ce">
-<span class="eqno">(1)<a class="headerlink" href="#equation-c5dabc6e-8740-41ec-b9bf-e976882993ce" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-7df825ad-f729-4e01-a429-0de2dfa7af6f">
+<span class="eqno">(1)<a class="headerlink" href="#equation-7df825ad-f729-4e01-a429-0de2dfa7af6f" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \begin{bmatrix}
 7 &amp; -8 &amp; 6 \\
 -3 &amp; 2 &amp; 4 \\
@@ -1114,8 +1114,8 @@ <h2><span style="color:Orange">Computing the Reweighted Padded Attention Mask</s
 \text{Softmax}(\vec{x}) = \frac{e^{x_i}}{\sum_{j=1}^N{e^{x_j}}}
 \]</div>
 <p>If we ignore padding and everything right now, we can compute softmax for row of the matrix above:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-4338f025-2949-49d0-8afe-11e8c0d9b209">
-<span class="eqno">(2)<a class="headerlink" href="#equation-4338f025-2949-49d0-8afe-11e8c0d9b209" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-3d7fc9d9-5840-49d7-a0bd-c8b204413d60">
+<span class="eqno">(2)<a class="headerlink" href="#equation-3d7fc9d9-5840-49d7-a0bd-c8b204413d60" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \text{Softmax}
 \begin{bmatrix}
 7 &amp; -8 &amp; 6 \\
@@ -1134,17 +1134,17 @@ <h2><span style="color:Orange">Computing the Reweighted Padded Attention Mask</s
 \end{bmatrix}
 \end{equation}\]</div>
 <p>But what we need is to mask out all the tokens in this matrix related to padding. Just like we did in <a class="reference external" href="https://github.com/priyammaz/HAL-DL-From-Scratch/tree/main/PyTorch%20for%20NLP/GPT">GPT</a>, we will fill in the indexes of the that we want to mask with <span class="math notranslate nohighlight">\(-\infty\)</span>. If only the last token was a padding token in our sequence, then the attention before the softmax should be written as:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-2e87ec8a-e434-4c38-80bc-ed2550ad368a">
-<span class="eqno">(3)<a class="headerlink" href="#equation-2e87ec8a-e434-4c38-80bc-ed2550ad368a" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-9a32e649-a89b-4b74-b800-01afda0f53a4">
+<span class="eqno">(3)<a class="headerlink" href="#equation-9a32e649-a89b-4b74-b800-01afda0f53a4" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \begin{bmatrix}
 7 &amp; -8 &amp; -\infty \\
 -3 &amp; 2 &amp; -\infty \\
 1 &amp; 6 &amp; -\infty \\
 \end{bmatrix}
 \end{equation}\]</div>
 <p>Taking the softmax of the rows of this matrix then gives:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-30afbd58-956b-4088-9f37-d260d1e2e2bb">
-<span class="eqno">(4)<a class="headerlink" href="#equation-30afbd58-956b-4088-9f37-d260d1e2e2bb" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-2f873261-4b94-4b67-a967-3fcacb525ea8">
+<span class="eqno">(4)<a class="headerlink" href="#equation-2f873261-4b94-4b67-a967-3fcacb525ea8" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \text{Softmax}
 \begin{bmatrix}
 7 &amp; -8 &amp; -\infty \\
@@ -1186,8 +1186,8 @@ <h3><span style="color:LightGreen">Repeating to Match Attention Matrix Shape</sp
 <p><code class="docutils literal notranslate"><span class="pre">attn.shape</span></code> - (Batch x seq_len x seq_len)</p>
 <p><code class="docutils literal notranslate"><span class="pre">mask.shape</span></code> - (Batch x seq_len)</p>
 <p>It is clear that our mask is missing a dimension, and we need to repeat it. Lets take sequence_1 for instance that has a mask of [True, True, True, False]. Because the sequence length here is 4, lets repeat this row 4 times:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-0f4663e1-d517-424c-929c-d58bec2a96be">
-<span class="eqno">(5)<a class="headerlink" href="#equation-0f4663e1-d517-424c-929c-d58bec2a96be" title="Permalink to this equation">#</a></span>\[\begin{bmatrix}
+<div class="amsmath math notranslate nohighlight" id="equation-bc1060e2-24cd-4814-9a48-491ec577df1e">
+<span class="eqno">(5)<a class="headerlink" href="#equation-bc1060e2-24cd-4814-9a48-491ec577df1e" title="Permalink to this equation">#</a></span>\[\begin{bmatrix}
 \textrm{True} &amp; \textrm{True} &amp; \textrm{True} &amp; \textrm{False} \\
 \textrm{True} &amp; \textrm{True} &amp; \textrm{True} &amp; \textrm{False} \\
 \textrm{True} &amp; \textrm{True} &amp; \textrm{True} &amp; \textrm{False} \\
@@ -1447,8 +1447,8 @@ <h3><span style="color:LightGreen">Enforcing Causality</span><a class="headerlin
 <section id="span-style-color-lightgreen-computing-the-reweighted-causal-attention-mask-span">
 <h3><span style="color:LightGreen">Computing the Reweighted Causal Attention Mask</span><a class="headerlink" href="#span-style-color-lightgreen-computing-the-reweighted-causal-attention-mask-span" title="Permalink to this heading">#</a></h3>
 <p>Lets pretend the raw outputs of <span class="math notranslate nohighlight">\(QK^T\)</span>, before the softmax, is below:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-55ac036d-1dff-4bc2-9f4e-9b210b99ab2e">
-<span class="eqno">(6)<a class="headerlink" href="#equation-55ac036d-1dff-4bc2-9f4e-9b210b99ab2e" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-0d4eeafd-544e-4690-89fa-344895301282">
+<span class="eqno">(6)<a class="headerlink" href="#equation-0d4eeafd-544e-4690-89fa-344895301282" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \begin{bmatrix}
 7 &amp; -8 &amp; 6 \\
 -3 &amp; 2 &amp; 4 \\
@@ -1459,8 +1459,8 @@ <h3><span style="color:LightGreen">Computing the Reweighted Causal Attention Mas
 <div class="math notranslate nohighlight">
 \[\text{Softmax}(\vec{x}) = \frac{e^{x_i}}{\sum_{j=1}^N{e^{x_j}}}\]</div>
 <p>Then, we can compute softmax for row of the matrix above:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-f7c69116-9ea2-45a6-9347-b9735c86bf72">
-<span class="eqno">(7)<a class="headerlink" href="#equation-f7c69116-9ea2-45a6-9347-b9735c86bf72" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-4be3c332-cf21-433b-974e-898d9180a309">
+<span class="eqno">(7)<a class="headerlink" href="#equation-4be3c332-cf21-433b-974e-898d9180a309" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \text{Softmax}
 \begin{bmatrix}
 7 &amp; -8 &amp; 6 \\
@@ -1499,17 +1499,17 @@ <h3><span style="color:LightGreen">Computing the Reweighted Causal Attention Mas
 \text{Softmax}(x_2) = [\frac{e^{-3}}{e^{-3}+e^{2}+0}, \frac{e^{2}}{e^{-3}+e^{2}+0}, \frac{0}{e^{-3}+e^{2}+0}] = [\frac{e^{-3}}{e^{-3}+e^{2}+0}, \frac{e^{2}}{e^{-3}+e^{2}+0}, \frac{0}{e^{-3}+e^{2}+0}] = [0.0067, 0.9933, 0.0000]
 \]</div>
 <p>So we have exactly what we want! The attention weight of the last value is set to 0, so when we are on the second vector <span class="math notranslate nohighlight">\(x_2\)</span>, we cannot look forward to the future value vectors <span class="math notranslate nohighlight">\(v_3\)</span>, and the remaining parts add up to 1 so its still a probability vector! To do this correctly for the entire matrix, we can just substitute in the top triangle of <span class="math notranslate nohighlight">\(QK^T\)</span> with <span class="math notranslate nohighlight">\(-\infty\)</span>. This would look like:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-52a2ce1c-3b05-4134-9667-6aab57bea206">
-<span class="eqno">(8)<a class="headerlink" href="#equation-52a2ce1c-3b05-4134-9667-6aab57bea206" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-63c27fb0-75f0-4576-beac-bea1f4f13dbe">
+<span class="eqno">(8)<a class="headerlink" href="#equation-63c27fb0-75f0-4576-beac-bea1f4f13dbe" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \begin{bmatrix}
 7 &amp; -\infty &amp; -\infty \\
 -3 &amp; 2 &amp; -\infty \\
 1 &amp; 6 &amp; -2 \\
 \end{bmatrix}
 \end{equation}\]</div>
 <p>Taking the softmax of the rows of this matrix then gives:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-25a0cf38-9596-43ea-af2a-e18ccf1b5c68">
-<span class="eqno">(9)<a class="headerlink" href="#equation-25a0cf38-9596-43ea-af2a-e18ccf1b5c68" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-eb0c75f0-cd5c-41eb-a097-a607d994f596">
+<span class="eqno">(9)<a class="headerlink" href="#equation-eb0c75f0-cd5c-41eb-a097-a607d994f596" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \text{Softmax}
 \begin{bmatrix}
 7 &amp; -\infty &amp; -\infty \\

intro.html

Lines changed: 1 addition & 1 deletion
@@ -618,7 +618,7 @@ <h2><span style="color:Red">Calendar</span><a class="headerlink" href="#span-sty
 </tr>
 <tr class="row-odd"><td><p>Apr 07</p></td>
 <td><p><a class="reference internal" href="_sources/Week_11.html"><span class="doc">AI Explainablility and Uncertainty Quantification</span></a></p></td>
-<td><p>HW 08</p></td>
+<td><p><a class="reference internal" href="_sources/homework/Homework_08.html"><span class="doc std std-doc">HW 08</span></a></p></td>
 <td><p></p></td>
 </tr>
 <tr class="row-even"><td><p>Apr 14</p></td>
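
Note on the _sources/lectures/Attention.html hunks: the masking arithmetic in the lecture text can be checked with a few lines of plain Python. This is only a minimal sketch, not the lecture's own code. The helper name `masked_softmax` and the boolean "keep" convention (True means the position may be attended to) are assumptions made here for illustration; the score matrix is the 3x3 example from the hunks.

```python
import math

def masked_softmax(scores, keep):
    """Row-wise softmax where positions with keep == False get weight 0.

    Hypothetical helper for illustration. Assumes every row keeps at
    least one position (an all-masked row would produce NaN).
    """
    out = []
    for row, keep_row in zip(scores, keep):
        # Masked positions are replaced by -inf, so exp() maps them to 0.
        masked = [s if k else -math.inf for s, k in zip(row, keep_row)]
        peak = max(masked)  # subtract the row max for numerical stability
        exps = [math.exp(s - peak) for s in masked]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

# Raw QK^T scores from the lecture example.
scores = [[7, -8, 6], [-3, 2, 4], [1, 6, -2]]

# Padding mask: the third token is <pad>, so column 3 is hidden in every row.
padding_keep = [[True, True, False] for _ in range(3)]

# Causal mask: row i may attend only to columns 0..i (upper triangle hidden).
causal_keep = [[j <= i for j in range(3)] for i in range(3)]

padded = masked_softmax(scores, padding_keep)
causal = masked_softmax(scores, causal_keep)

print([round(w, 4) for w in causal[1]])
```

The second causal row should match the Softmax(x_2) values quoted in the hunk, and every row still sums to 1, which is the "reweighting" the section headings refer to.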
