deploy: af223d0

msneubauer · msneubauer · commit fef456606ed0 · 2025-04-13T20:49:26.000Z
diff --git a/_sources/_sources/lectures/UnsupervisedLearningAnomalyDetection.ipynb b/_sources/_sources/lectures/UnsupervisedLearningAnomalyDetection.ipynb
@@ -1659,6 +1659,7 @@
         "## <span style=\"color:Orange\">Acknowledgments</span>\n",
         "\n",
         "* Initial version: Mark Neubauer\n",
+        "* Modified from this [notebook](https://colab.research.google.com/github/curiousily/Getting-Things-Done-with-Pytorch/blob/master/06.time-series-anomaly-detection-ecg.ipynb) \n",
         "\n",
         "© Copyright 2025"
       ]
diff --git a/_sources/lectures/Attention.html b/_sources/lectures/Attention.html
@@ -1099,8 +1099,8 @@ <h3><span style="color:LightGreen">Sequence Padding and Attention Masking</span>
 <section id="span-style-color-orange-computing-the-reweighted-padded-attention-mask-span">
 <h2><span style="color:Orange">Computing the Reweighted Padded Attention Mask</span><a class="headerlink" href="#span-style-color-orange-computing-the-reweighted-padded-attention-mask-span" title="Permalink to this heading">#</a></h2>
 <p>Let's work through some numbers to get a better idea of how this works. Let the tokens be <span class="math notranslate nohighlight">\(X = [10, 2, \text{&lt;pad&gt;}]\)</span>, so the third token is a padding token. Suppose we pass this sequence to our model and compute the attention scores <span class="math notranslate nohighlight">\(QK^T\)</span>. The raw output before the softmax is below:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-ad706dd6-a52e-4d11-ab77-c830602bba14">
-<span class="eqno">(1)<a class="headerlink" href="#equation-ad706dd6-a52e-4d11-ab77-c830602bba14" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-4879249e-730c-4c72-bd88-f861bae2041e">
+<span class="eqno">(1)<a class="headerlink" href="#equation-4879249e-730c-4c72-bd88-f861bae2041e" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \begin{bmatrix}
   7       &amp; -8   &amp; 6  \\
   -3       &amp; 2   &amp; 4   \\
@@ -1113,8 +1113,8 @@ <h2><span style="color:Orange">Computing the Reweighted Padded Attention Mask</s
 \text{Softmax}(\vec{x})_i = \frac{e^{x_i}}{\sum_{j=1}^N{e^{x_j}}}
 \]</div>
 <p>If we ignore padding for now, we can compute the softmax of each row of the matrix above:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-82e91dbe-9847-4d4c-ae3e-001922071aa0">
-<span class="eqno">(2)<a class="headerlink" href="#equation-82e91dbe-9847-4d4c-ae3e-001922071aa0" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-469b756d-c77d-47bf-8a82-87e2bc96499a">
+<span class="eqno">(2)<a class="headerlink" href="#equation-469b756d-c77d-47bf-8a82-87e2bc96499a" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \text{Softmax}
 \begin{bmatrix}
   7       &amp; -8   &amp; 6  \\
@@ -1133,17 +1133,17 @@ <h2><span style="color:Orange">Computing the Reweighted Padded Attention Mask</s
 \end{bmatrix}
 \end{equation}\]</div>
 <p>But what we need is to mask out all the entries in this matrix related to padding. Just like we did in <a class="reference external" href="https://github.com/priyammaz/HAL-DL-From-Scratch/tree/main/PyTorch%20for%20NLP/GPT">GPT</a>, we will fill the entries that we want to mask with <span class="math notranslate nohighlight">\(-\infty\)</span>. If only the last token in our sequence is a padding token, then the attention scores before the softmax should be written as:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-4d04027e-7c32-4047-ba97-2231fcf2eafa">
-<span class="eqno">(3)<a class="headerlink" href="#equation-4d04027e-7c32-4047-ba97-2231fcf2eafa" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-aa311aff-3c34-4d6d-b620-620f697cb836">
+<span class="eqno">(3)<a class="headerlink" href="#equation-aa311aff-3c34-4d6d-b620-620f697cb836" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \begin{bmatrix}
   7       &amp; -8   &amp; -\infty  \\
   -3       &amp; 2   &amp; -\infty   \\
   1       &amp; 6  &amp; -\infty  \\
 \end{bmatrix}
 \end{equation}\]</div>
 <p>Taking the softmax of the rows of this matrix then gives:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-43d25f1d-801a-4ae0-b49d-4658d866b1fc">
-<span class="eqno">(4)<a class="headerlink" href="#equation-43d25f1d-801a-4ae0-b49d-4658d866b1fc" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-606dee84-60c1-4f1f-827c-4913a7a02b0e">
+<span class="eqno">(4)<a class="headerlink" href="#equation-606dee84-60c1-4f1f-827c-4913a7a02b0e" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \text{Softmax}
 \begin{bmatrix}
  7       &amp; -8   &amp; -\infty  \\
@@ -1185,8 +1185,8 @@ <h3><span style="color:LightGreen">Repeating to Match Attention Matrix Shape</sp
 <p><code class="docutils literal notranslate"><span class="pre">attn.shape</span></code> - (Batch x seq_len x seq_len)</p>
 <p><code class="docutils literal notranslate"><span class="pre">mask.shape</span></code> - (Batch x seq_len)</p>
 <p>Our mask is missing a dimension, so we need to repeat it. Take sequence_1, for instance, which has a mask of [True, True, True, False]. Because the sequence length here is 4, let's repeat this row 4 times:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-75947fc6-ced9-4ca6-bb5d-602c58a7420d">
-<span class="eqno">(5)<a class="headerlink" href="#equation-75947fc6-ced9-4ca6-bb5d-602c58a7420d" title="Permalink to this equation">#</a></span>\[\begin{bmatrix}
+<div class="amsmath math notranslate nohighlight" id="equation-befe9eff-9127-4ab8-a1bf-a8fbb58c4325">
+<span class="eqno">(5)<a class="headerlink" href="#equation-befe9eff-9127-4ab8-a1bf-a8fbb58c4325" title="Permalink to this equation">#</a></span>\[\begin{bmatrix}
 \textrm{True} &amp; \textrm{True} &amp; \textrm{True} &amp; \textrm{False} \\
 \textrm{True} &amp; \textrm{True} &amp; \textrm{True} &amp; \textrm{False} \\
 \textrm{True} &amp; \textrm{True} &amp; \textrm{True} &amp; \textrm{False} \\
@@ -1446,8 +1446,8 @@ <h3><span style="color:LightGreen">Enforcing Causality</span><a class="headerlin
 <section id="span-style-color-lightgreen-computing-the-reweighted-causal-attention-mask-span">
 <h3><span style="color:LightGreen">Computing the Reweighted Causal Attention Mask</span><a class="headerlink" href="#span-style-color-lightgreen-computing-the-reweighted-causal-attention-mask-span" title="Permalink to this heading">#</a></h3>
 <p>Let's pretend the raw output of <span class="math notranslate nohighlight">\(QK^T\)</span>, before the softmax, is below:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-f2df8944-33d4-419c-8837-3107f1fa9e44">
-<span class="eqno">(6)<a class="headerlink" href="#equation-f2df8944-33d4-419c-8837-3107f1fa9e44" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-4b7ce628-0c08-401f-8c37-779deeaffc59">
+<span class="eqno">(6)<a class="headerlink" href="#equation-4b7ce628-0c08-401f-8c37-779deeaffc59" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \begin{bmatrix}
   7       &amp; -8   &amp; 6  \\
   -3       &amp; 2   &amp; 4   \\
@@ -1458,8 +1458,8 @@ <h3><span style="color:LightGreen">Computing the Reweighted Causal Attention Mas
 <div class="math notranslate nohighlight">
 \[\text{Softmax}(\vec{x})_i = \frac{e^{x_i}}{\sum_{j=1}^N{e^{x_j}}}\]</div>
 <p>Then we can compute the softmax of each row of the matrix above:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-2c719a4b-e8a9-4bee-8946-4ac54ff3388a">
-<span class="eqno">(7)<a class="headerlink" href="#equation-2c719a4b-e8a9-4bee-8946-4ac54ff3388a" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-f53bf636-67a1-4047-bd08-90546434449a">
+<span class="eqno">(7)<a class="headerlink" href="#equation-f53bf636-67a1-4047-bd08-90546434449a" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \text{Softmax}
 \begin{bmatrix}
   7       &amp; -8   &amp; 6  \\
@@ -1498,17 +1498,17 @@ <h3><span style="color:LightGreen">Computing the Reweighted Causal Attention Mas
 \text{Softmax}(x_2) = [\frac{e^{-3}}{e^{-3}+e^{2}+0}, \frac{e^{2}}{e^{-3}+e^{2}+0}, \frac{0}{e^{-3}+e^{2}+0}] = [0.0067, 0.9933, 0.0000]
 \]</div>
 <p>So we have exactly what we want! The attention weight of the last value is set to 0, so when we are on the second vector <span class="math notranslate nohighlight">\(x_2\)</span>, we cannot look forward to the future value vector <span class="math notranslate nohighlight">\(v_3\)</span>, and the remaining weights add up to 1, so it's still a probability vector! To do this correctly for the entire matrix, we can replace every entry above the diagonal of <span class="math notranslate nohighlight">\(QK^T\)</span> with <span class="math notranslate nohighlight">\(-\infty\)</span>. This would look like:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-efa6a921-9021-4f0e-a7d2-518e8e243449">
-<span class="eqno">(8)<a class="headerlink" href="#equation-efa6a921-9021-4f0e-a7d2-518e8e243449" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-ae838e54-9d6a-443f-bfe3-ab3c5193121e">
+<span class="eqno">(8)<a class="headerlink" href="#equation-ae838e54-9d6a-443f-bfe3-ab3c5193121e" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \begin{bmatrix}
   7       &amp; -\infty   &amp; -\infty  \\
   -3       &amp; 2   &amp; -\infty   \\
   1       &amp; 6  &amp; -2   \\
 \end{bmatrix}
 \end{equation}\]</div>
 <p>Taking the softmax of the rows of this matrix then gives:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-1e84e5b6-1ac3-4bef-b520-dde67d6cca4a">
-<span class="eqno">(9)<a class="headerlink" href="#equation-1e84e5b6-1ac3-4bef-b520-dde67d6cca4a" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-a4e382a4-b1a3-46c1-a4d3-623292fdd7db">
+<span class="eqno">(9)<a class="headerlink" href="#equation-a4e382a4-b1a3-46c1-a4d3-623292fdd7db" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \text{Softmax}
 \begin{bmatrix}
   7       &amp; -\infty   &amp; -\infty  \\
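Review note: the masked-softmax values quoted in the Attention.html hunks above can be checked numerically. The following is a minimal sketch in pure Python, not the lecture's own implementation. The helper names `softmax` and `mask_scores`, and the boolean `keep` layout (True means "may attend"), are assumptions introduced here for illustration. The 3x3 score matrix is the example used on the page.

```python
import math

NEG_INF = float("-inf")

# Example QK^T scores from the lecture page (rows = queries, columns = keys).
SCORES = [
    [7.0, -8.0, 6.0],
    [-3.0, 2.0, 4.0],
    [1.0, 6.0, -2.0],
]


def softmax(row):
    """Softmax of one row. Entries equal to -inf get weight exactly 0.

    Assumes at least one finite entry; a fully masked row is not handled.
    """
    m = max(row)  # subtract the row maximum for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def mask_scores(scores, keep):
    """Replace scores[i][j] with -inf wherever keep[i][j] is False."""
    return [
        [s if k else NEG_INF for s, k in zip(score_row, keep_row)]
        for score_row, keep_row in zip(scores, keep)
    ]


n = len(SCORES)

# Padding mask: the third token is <pad>, so no query may attend to key 3.
token_is_real = [True, True, False]
padding_keep = [token_is_real[:] for _ in range(n)]  # repeat the row n times

# Causal mask: query i may attend only to keys j <= i (lower triangle).
causal_keep = [[j <= i for j in range(n)] for i in range(n)]

padded_attn = [softmax(r) for r in mask_scores(SCORES, padding_keep)]
causal_attn = [softmax(r) for r in mask_scores(SCORES, causal_keep)]

for name, attn in (("padding", padded_attn), ("causal", causal_attn)):
    print(name)
    for row in attn:
        print("  ", [round(w, 4) for w in row])
```

In both cases every row still sums to 1 after masking, and the second row rounds to [0.0067, 0.9933, 0.0], which agrees with the value shown on the page.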
diff --git a/_sources/lectures/UnsupervisedLearningAnomalyDetection.html b/_sources/lectures/UnsupervisedLearningAnomalyDetection.html
@@ -1630,6 +1630,7 @@ <h4><span style="color:LightPink">Looking at Examples</span><a class="headerlink
 <h2><span style="color:Orange">Acknowledgments</span><a class="headerlink" href="#span-style-color-orange-acknowledgments-span" title="Permalink to this heading">#</a></h2>
 <ul class="simple">
 <li><p>Initial version: Mark Neubauer</p></li>
+<li><p>Modified from this <a class="reference external" href="https://colab.research.google.com/github/curiousily/Getting-Things-Done-with-Pytorch/blob/master/06.time-series-anomaly-detection-ecg.ipynb">notebook</a></p></li>
 </ul>
 <p>© Copyright 2025</p>
 </section>
diff --git a/searchindex.js b/searchindex.js
