Commit fef4566

deploy: af223d0
1 parent f78cd54 commit fef4566

4 files changed

Lines changed: 21 additions & 19 deletions

_sources/_sources/lectures/UnsupervisedLearningAnomalyDetection.ipynb

Lines changed: 1 addition & 0 deletions
@@ -1659,6 +1659,7 @@
 "## <span style=\"color:Orange\">Acknowledgments</span>\n",
 "\n",
 "* Initial version: Mark Neubauer\n",
+"* Modified from this [notebook](https://colab.research.google.com/github/curiousily/Getting-Things-Done-with-Pytorch/blob/master/06.time-series-anomaly-detection-ecg.ipynb) \n",
 "\n",
 "© Copyright 2025"
 ]

_sources/lectures/Attention.html

Lines changed: 18 additions & 18 deletions
@@ -1099,8 +1099,8 @@ <h3><span style="color:LightGreen">Sequence Padding and Attention Masking</span>
 <section id="span-style-color-orange-computing-the-reweighted-padded-attention-mask-span">
 <h2><span style="color:Orange">Computing the Reweighted Padded Attention Mask</span><a class="headerlink" href="#span-style-color-orange-computing-the-reweighted-padded-attention-mask-span" title="Permalink to this heading">#</a></h2>
 <p>Let's create some numbers so we can get a better idea of how this works. Let the tokens be <span class="math notranslate nohighlight">\(X = [10, 2, \text{&lt;pad&gt;}]\)</span>, so the third token is a padding token. Let's also suppose we pass this sequence to our model and compute the attention scores <span class="math notranslate nohighlight">\(QK^T\)</span>. The raw output before the softmax is below:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-ad706dd6-a52e-4d11-ab77-c830602bba14">
-<span class="eqno">(1)<a class="headerlink" href="#equation-ad706dd6-a52e-4d11-ab77-c830602bba14" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-4879249e-730c-4c72-bd88-f861bae2041e">
+<span class="eqno">(1)<a class="headerlink" href="#equation-4879249e-730c-4c72-bd88-f861bae2041e" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \begin{bmatrix}
 7 &amp; -8 &amp; 6 \\
 -3 &amp; 2 &amp; 4 \\
@@ -1113,8 +1113,8 @@ <h2><span style="color:Orange">Computing the Reweighted Padded Attention Mask</s
 \text{Softmax}(\vec{x})_i = \frac{e^{x_i}}{\sum_{j=1}^N{e^{x_j}}}
 \]</div>
 <p>If we ignore padding for now, we can compute the softmax of each row of the matrix above:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-82e91dbe-9847-4d4c-ae3e-001922071aa0">
-<span class="eqno">(2)<a class="headerlink" href="#equation-82e91dbe-9847-4d4c-ae3e-001922071aa0" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-469b756d-c77d-47bf-8a82-87e2bc96499a">
+<span class="eqno">(2)<a class="headerlink" href="#equation-469b756d-c77d-47bf-8a82-87e2bc96499a" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \text{Softmax}
 \begin{bmatrix}
 7 &amp; -8 &amp; 6 \\
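The row-wise softmax in equation (2) can be checked numerically. Below is a minimal, framework-free sketch in plain Python so it runs without PyTorch. The first two rows of `scores` come from equation (1); the third row (1, 6, -2) is an assumption read off the unmasked third row of equation (8) in the causal-mask section, and the helper name `softmax` is illustrative only.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of attention scores.

    Subtracting the row maximum before exponentiating avoids overflow
    and leaves the result unchanged.
    """
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


# Raw QK^T scores: rows 1 and 2 come from equation (1); row 3 (1, 6, -2)
# is an assumption read off the unmasked third row of equation (8).
scores = [
    [7.0, -8.0, 6.0],
    [-3.0, 2.0, 4.0],
    [1.0, 6.0, -2.0],
]

# Apply the softmax independently to every row (one row per query token).
probs = [softmax(row) for row in scores]
for row in probs:
    print([round(p, 4) for p in row])
```

Each row of `probs` sums to 1. On a PyTorch tensor of scores, the same row-wise operation is typically written `torch.softmax(scores, dim=-1)`.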
@@ -1133,17 +1133,17 @@ <h2><span style="color:Orange">Computing the Reweighted Padded Attention Mask</s
 \end{bmatrix}
 \end{equation}\]</div>
 <p>What we actually need is to mask out every score in this matrix that corresponds to attending to a padding token. Just like we did in <a class="reference external" href="https://github.com/priyammaz/HAL-DL-From-Scratch/tree/main/PyTorch%20for%20NLP/GPT">GPT</a>, we fill the positions we want to mask with <span class="math notranslate nohighlight">\(-\infty\)</span>. If only the last token in our sequence is a padding token, then the attention scores before the softmax become:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-4d04027e-7c32-4047-ba97-2231fcf2eafa">
-<span class="eqno">(3)<a class="headerlink" href="#equation-4d04027e-7c32-4047-ba97-2231fcf2eafa" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-aa311aff-3c34-4d6d-b620-620f697cb836">
+<span class="eqno">(3)<a class="headerlink" href="#equation-aa311aff-3c34-4d6d-b620-620f697cb836" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \begin{bmatrix}
 7 &amp; -8 &amp; -\infty \\
 -3 &amp; 2 &amp; -\infty \\
 1 &amp; 6 &amp; -\infty \\
 \end{bmatrix}
 \end{equation}\]</div>
 <p>Taking the softmax of the rows of this matrix then gives:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-43d25f1d-801a-4ae0-b49d-4658d866b1fc">
-<span class="eqno">(4)<a class="headerlink" href="#equation-43d25f1d-801a-4ae0-b49d-4658d866b1fc" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-606dee84-60c1-4f1f-827c-4913a7a02b0e">
+<span class="eqno">(4)<a class="headerlink" href="#equation-606dee84-60c1-4f1f-827c-4913a7a02b0e" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \text{Softmax}
 \begin{bmatrix}
 7 &amp; -8 &amp; -\infty \\
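Equations (3) and (4) can be reproduced with the same kind of sketch: every score in the column of the `<pad>` key is replaced by negative infinity before the softmax. This assumes a boolean key mask in which `True` marks a real token (the same convention as the `[True, True, True, False]` mask used for `sequence_1`), and `mask_padding` is a hypothetical helper name. The raw last entry of row 3 is not shown in the diff, so -2 is an assumed placeholder; it is masked anyway.

```python
import math


def softmax(row):
    """Numerically stable softmax; entries equal to -inf get weight 0."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def mask_padding(scores, key_mask):
    """Hypothetical helper: set the score column of every padded key to -inf.

    key_mask[j] is True for a real token and False for <pad>, so no query
    row can attend to a padded key.
    """
    return [
        [s if keep else -math.inf for s, keep in zip(row, key_mask)]
        for row in scores
    ]


# Rows 1 and 2 come from equation (1). Row 3 starts (1, 6) as in
# equation (3); its last raw entry (-2 here) is assumed and gets masked.
scores = [
    [7.0, -8.0, 6.0],
    [-3.0, 2.0, 4.0],
    [1.0, 6.0, -2.0],
]
key_mask = [True, True, False]  # X = [10, 2, <pad>]

masked = mask_padding(scores, key_mask)
probs = [softmax(row) for row in masked]
for row in probs:
    print([round(p, 4) for p in row])
```

After masking, the padded column receives exactly zero weight in every row, and each row still sums to 1. A row whose entries are all `-inf` would produce NaN in this sketch, so it assumes every query keeps at least one unmasked key.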
@@ -1185,8 +1185,8 @@ <h3><span style="color:LightGreen">Repeating to Match Attention Matrix Shape</sp
 <p><code class="docutils literal notranslate"><span class="pre">attn.shape</span></code> - (Batch x seq_len x seq_len)</p>
 <p><code class="docutils literal notranslate"><span class="pre">mask.shape</span></code> - (Batch x seq_len)</p>
 <p>Our mask is missing a dimension, so we need to repeat it. Take sequence_1, for instance, which has a mask of [True, True, True, False]. Because the sequence length here is 4, let's repeat this row 4 times:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-75947fc6-ced9-4ca6-bb5d-602c58a7420d">
-<span class="eqno">(5)<a class="headerlink" href="#equation-75947fc6-ced9-4ca6-bb5d-602c58a7420d" title="Permalink to this equation">#</a></span>\[\begin{bmatrix}
+<div class="amsmath math notranslate nohighlight" id="equation-befe9eff-9127-4ab8-a1bf-a8fbb58c4325">
+<span class="eqno">(5)<a class="headerlink" href="#equation-befe9eff-9127-4ab8-a1bf-a8fbb58c4325" title="Permalink to this equation">#</a></span>\[\begin{bmatrix}
 \textrm{True} &amp; \textrm{True} &amp; \textrm{True} &amp; \textrm{False} \\
 \textrm{True} &amp; \textrm{True} &amp; \textrm{True} &amp; \textrm{False} \\
 \textrm{True} &amp; \textrm{True} &amp; \textrm{True} &amp; \textrm{False} \\
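The repetition in equation (5) turns a (Batch x seq_len) mask into the (Batch x seq_len x seq_len) shape of `attn`. A minimal sketch with nested lists, assuming `True` marks a real token; `expand_key_mask` and `expand_batch` are hypothetical helper names, and the second sequence in `batch_masks` is a made-up example with two padding tokens.

```python
def expand_key_mask(mask, seq_len):
    """Repeat a length-seq_len padding mask into a seq_len x seq_len grid.

    Row i is the mask seen by query token i. Every row is the same because
    a key-padding mask depends only on which key is a <pad> token.
    """
    return [list(mask) for _ in range(seq_len)]


def expand_batch(masks):
    """(Batch x seq_len) -> (Batch x seq_len x seq_len), one grid per sequence."""
    return [expand_key_mask(m, len(m)) for m in masks]


batch_masks = [
    [True, True, True, False],   # sequence_1 from the text
    [True, True, False, False],  # hypothetical second sequence with two pads
]
expanded = expand_batch(batch_masks)
for row in expanded[0]:
    print(row)
```

Every row of `expanded[0]` equals `[True, True, True, False]`, so each query in sequence_1 may attend only to the first three keys. In PyTorch a comparable shape is usually obtained with `mask.unsqueeze(1).repeat(1, seq_len, 1)`, or by broadcasting `mask.unsqueeze(1)` directly; with this "True means keep" convention the entries to fill are selected by `~mask`, e.g. `attn.masked_fill(~mask.unsqueeze(1), float("-inf"))`.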
@@ -1446,8 +1446,8 @@ <h3><span style="color:LightGreen">Enforcing Causality</span><a class="headerlin
 <section id="span-style-color-lightgreen-computing-the-reweighted-causal-attention-mask-span">
 <h3><span style="color:LightGreen">Computing the Reweighted Causal Attention Mask</span><a class="headerlink" href="#span-style-color-lightgreen-computing-the-reweighted-causal-attention-mask-span" title="Permalink to this heading">#</a></h3>
 <p>Let's pretend the raw output of <span class="math notranslate nohighlight">\(QK^T\)</span>, before the softmax, is below:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-f2df8944-33d4-419c-8837-3107f1fa9e44">
-<span class="eqno">(6)<a class="headerlink" href="#equation-f2df8944-33d4-419c-8837-3107f1fa9e44" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-4b7ce628-0c08-401f-8c37-779deeaffc59">
+<span class="eqno">(6)<a class="headerlink" href="#equation-4b7ce628-0c08-401f-8c37-779deeaffc59" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \begin{bmatrix}
 7 &amp; -8 &amp; 6 \\
 -3 &amp; 2 &amp; 4 \\
@@ -1458,8 +1458,8 @@ <h3><span style="color:LightGreen">Computing the Reweighted Causal Attention Mas
 <div class="math notranslate nohighlight">
 \[\text{Softmax}(\vec{x})_i = \frac{e^{x_i}}{\sum_{j=1}^N{e^{x_j}}}\]</div>
 <p>Then we can compute the softmax of each row of the matrix above:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-2c719a4b-e8a9-4bee-8946-4ac54ff3388a">
-<span class="eqno">(7)<a class="headerlink" href="#equation-2c719a4b-e8a9-4bee-8946-4ac54ff3388a" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-f53bf636-67a1-4047-bd08-90546434449a">
+<span class="eqno">(7)<a class="headerlink" href="#equation-f53bf636-67a1-4047-bd08-90546434449a" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \text{Softmax}
 \begin{bmatrix}
 7 &amp; -8 &amp; 6 \\
@@ -1498,17 +1498,17 @@ <h3><span style="color:LightGreen">Computing the Reweighted Causal Attention Mas
 \text{Softmax}(x_2) = [\frac{e^{-3}}{e^{-3}+e^{2}+e^{-\infty}}, \frac{e^{2}}{e^{-3}+e^{2}+e^{-\infty}}, \frac{e^{-\infty}}{e^{-3}+e^{2}+e^{-\infty}}] = [\frac{e^{-3}}{e^{-3}+e^{2}+0}, \frac{e^{2}}{e^{-3}+e^{2}+0}, \frac{0}{e^{-3}+e^{2}+0}] = [0.0067, 0.9933, 0.0000]
 \]</div>
 <p>So we have exactly what we want! The attention weight on the last value is 0, so when we are on the second vector <span class="math notranslate nohighlight">\(x_2\)</span>, we cannot look ahead to the future value vector <span class="math notranslate nohighlight">\(v_3\)</span>, and the remaining weights still add up to 1, so it's still a probability vector! To do this for the entire matrix, we replace every entry above the main diagonal of <span class="math notranslate nohighlight">\(QK^T\)</span> (the strict upper triangle) with <span class="math notranslate nohighlight">\(-\infty\)</span>. This would look like:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-efa6a921-9021-4f0e-a7d2-518e8e243449">
-<span class="eqno">(8)<a class="headerlink" href="#equation-efa6a921-9021-4f0e-a7d2-518e8e243449" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-ae838e54-9d6a-443f-bfe3-ab3c5193121e">
+<span class="eqno">(8)<a class="headerlink" href="#equation-ae838e54-9d6a-443f-bfe3-ab3c5193121e" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \begin{bmatrix}
 7 &amp; -\infty &amp; -\infty \\
 -3 &amp; 2 &amp; -\infty \\
 1 &amp; 6 &amp; -2 \\
 \end{bmatrix}
 \end{equation}\]</div>
 <p>Taking the softmax of the rows of this matrix then gives:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-1e84e5b6-1ac3-4bef-b520-dde67d6cca4a">
-<span class="eqno">(9)<a class="headerlink" href="#equation-1e84e5b6-1ac3-4bef-b520-dde67d6cca4a" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-a4e382a4-b1a3-46c1-a4d3-623292fdd7db">
+<span class="eqno">(9)<a class="headerlink" href="#equation-a4e382a4-b1a3-46c1-a4d3-623292fdd7db" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \text{Softmax}
 \begin{bmatrix}
 7 &amp; -\infty &amp; -\infty \\
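The causal mask in equation (8) replaces every score above the main diagonal with negative infinity. A minimal plain-Python sketch, using the raw scores from equation (6), with the third row taken from equation (8) where it is left unmasked; `causal_mask` is a hypothetical helper name.

```python
import math


def softmax(row):
    """Numerically stable softmax; entries equal to -inf get weight 0."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def causal_mask(scores):
    """Hypothetical helper: replace every score above the main diagonal with -inf.

    Entry (i, j) with j > i would let query i look at a future key j,
    so only keys 0..i survive.
    """
    return [
        [s if j <= i else -math.inf for j, s in enumerate(row)]
        for i, row in enumerate(scores)
    ]


# Raw scores: rows 1 and 2 from equation (6), row 3 from equation (8).
scores = [
    [7.0, -8.0, 6.0],
    [-3.0, 2.0, 4.0],
    [1.0, 6.0, -2.0],
]
probs = [softmax(row) for row in causal_mask(scores)]
print([round(p, 4) for p in probs[1]])  # [0.0067, 0.9933, 0.0], as in the text
```

The first row becomes `[1.0, 0.0, 0.0]` because the first token can only attend to itself. In PyTorch the same mask is commonly built with `torch.triu(torch.ones(L, L, dtype=torch.bool), diagonal=1)` and applied with `scores.masked_fill(mask, float("-inf"))`.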

_sources/lectures/UnsupervisedLearningAnomalyDetection.html

Lines changed: 1 addition & 0 deletions
@@ -1630,6 +1630,7 @@ <h4><span style="color:LightPink">Looking at Examples</span><a class="headerlink
 <h2><span style="color:Orange">Acknowledgments</span><a class="headerlink" href="#span-style-color-orange-acknowledgments-span" title="Permalink to this heading">#</a></h2>
 <ul class="simple">
 <li><p>Initial version: Mark Neubauer</p></li>
+<li><p>Modified from this <a class="reference external" href="https://colab.research.google.com/github/curiousily/Getting-Things-Done-with-Pytorch/blob/master/06.time-series-anomaly-detection-ecg.ipynb">notebook</a></p></li>
 </ul>
 <p>© Copyright 2025</p>
 </section>

searchindex.js

Lines changed: 1 addition & 1 deletion