
Commit f78cd54

deploy: f26cc2d
1 parent 66f813e commit f78cd54

6 files changed

Lines changed: 65 additions & 56 deletions
Binary file not shown (12.1 KB).

_sources/_sources/lectures/UnsupervisedLearningAnomalyDetection.ipynb

Lines changed: 30 additions & 26 deletions
Large diffs are not rendered by default.

_sources/lectures/Attention.html

Lines changed: 18 additions & 18 deletions
@@ -1099,8 +1099,8 @@ <h3><span style="color:LightGreen">Sequence Padding and Attention Masking</span>
10991099
<section id="span-style-color-orange-computing-the-reweighted-padded-attention-mask-span">
11001100
<h2><span style="color:Orange">Computing the Reweighted Padded Attention Mask</span><a class="headerlink" href="#span-style-color-orange-computing-the-reweighted-padded-attention-mask-span" title="Permalink to this heading">#</a></h2>
11011101
<p>Let’s create some numbers to get a better idea of how this works. Let the tokens be <span class="math notranslate nohighlight">\(X = [10, 2, \text{&lt;pad&gt;}]\)</span>, so the third token is a padding token. Suppose we pass this sequence to our model and compute the attention scores <span class="math notranslate nohighlight">\(QK^T\)</span>. The raw output before the softmax is below:</p>
1102-
<div class="amsmath math notranslate nohighlight" id="equation-22d0d05a-babe-4218-84f2-1eaae730f3c9">
1103-
<span class="eqno">(1)<a class="headerlink" href="#equation-22d0d05a-babe-4218-84f2-1eaae730f3c9" title="Permalink to this equation">#</a></span>\[\begin{equation}
1102+
<div class="amsmath math notranslate nohighlight" id="equation-ad706dd6-a52e-4d11-ab77-c830602bba14">
1103+
<span class="eqno">(1)<a class="headerlink" href="#equation-ad706dd6-a52e-4d11-ab77-c830602bba14" title="Permalink to this equation">#</a></span>\[\begin{equation}
11041104
\begin{bmatrix}
11051105
7 &amp; -8 &amp; 6 \\
11061106
-3 &amp; 2 &amp; 4 \\
@@ -1113,8 +1113,8 @@ <h2><span style="color:Orange">Computing the Reweighted Padded Attention Mask</s
11131113
\text{Softmax}(\vec{x}) = \frac{e^{x_i}}{\sum_{j=1}^N{e^{x_j}}}
11141114
\]</div>
11151115
<p>If we ignore padding for now, we can compute the softmax of each row of the matrix above:</p>
1116-
<div class="amsmath math notranslate nohighlight" id="equation-6b7555a5-6b71-4539-865e-23bc4614a3d8">
1117-
<span class="eqno">(2)<a class="headerlink" href="#equation-6b7555a5-6b71-4539-865e-23bc4614a3d8" title="Permalink to this equation">#</a></span>\[\begin{equation}
1116+
<div class="amsmath math notranslate nohighlight" id="equation-82e91dbe-9847-4d4c-ae3e-001922071aa0">
1117+
<span class="eqno">(2)<a class="headerlink" href="#equation-82e91dbe-9847-4d4c-ae3e-001922071aa0" title="Permalink to this equation">#</a></span>\[\begin{equation}
11181118
\text{Softmax}
11191119
\begin{bmatrix}
11201120
7 &amp; -8 &amp; 6 \\
@@ -1133,17 +1133,17 @@ <h2><span style="color:Orange">Computing the Reweighted Padded Attention Mask</s
11331133
\end{bmatrix}
11341134
\end{equation}\]</div>
11351135
<p>But what we need is to mask out every attention score that points to a padding token. Just like we did in <a class="reference external" href="https://github.com/priyammaz/HAL-DL-From-Scratch/tree/main/PyTorch%20for%20NLP/GPT">GPT</a>, we fill the positions we want to mask with <span class="math notranslate nohighlight">\(-\infty\)</span>. Since only the last token in our sequence is padding, the attention scores before the softmax become:</p>
1136-
<div class="amsmath math notranslate nohighlight" id="equation-947d5f8e-5b2d-4424-9aa8-b5d960360ee5">
1137-
<span class="eqno">(3)<a class="headerlink" href="#equation-947d5f8e-5b2d-4424-9aa8-b5d960360ee5" title="Permalink to this equation">#</a></span>\[\begin{equation}
1136+
<div class="amsmath math notranslate nohighlight" id="equation-4d04027e-7c32-4047-ba97-2231fcf2eafa">
1137+
<span class="eqno">(3)<a class="headerlink" href="#equation-4d04027e-7c32-4047-ba97-2231fcf2eafa" title="Permalink to this equation">#</a></span>\[\begin{equation}
11381138
\begin{bmatrix}
11391139
7 &amp; -8 &amp; -\infty \\
11401140
-3 &amp; 2 &amp; -\infty \\
11411141
1 &amp; 6 &amp; -\infty \\
11421142
\end{bmatrix}
11431143
\end{equation}\]</div>
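A minimal NumPy sketch of this masking step, for readers who want to check the numbers. It assumes the padding mask is a boolean vector with `True` for real tokens (the same convention as the `[True, True, True, False]` mask used later in this lecture). The last entry of the third row is not visible in this hunk, so `-2` is only a placeholder; because that column is masked, its value has no effect on the result. This is an illustration in NumPy, not the lecture's own PyTorch code.

```python
import numpy as np

def masked_softmax(scores, key_mask):
    """Row-wise softmax that ignores masked-out key positions.

    scores:   (seq_len, seq_len) raw attention scores QK^T
    key_mask: (seq_len,) boolean, True for real tokens, False for padding
    """
    # Broadcast the key mask across every query row; padded columns become -inf
    filled = np.where(key_mask[None, :], scores, -np.inf)
    # Subtract the row max for numerical stability; exp(-inf) is exactly 0
    shifted = filled - filled.max(axis=-1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=-1, keepdims=True)

# Raw scores; the last entry of row 3 is a placeholder (its column is masked)
scores = np.array([[7.0, -8.0, 6.0],
                   [-3.0, 2.0, 4.0],
                   [1.0, 6.0, -2.0]])
key_mask = np.array([True, True, False])  # tokens [10, 2, <pad>]

attn = masked_softmax(scores, key_mask)
print(np.round(attn, 4))
```

Every row still sums to 1, and the padded column receives zero weight from every query.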
11441144
<p>Taking the softmax of the rows of this matrix then gives:</p>
1145-
<div class="amsmath math notranslate nohighlight" id="equation-ce560d17-0d7e-4acb-a9f3-be4f520bc5dd">
1146-
<span class="eqno">(4)<a class="headerlink" href="#equation-ce560d17-0d7e-4acb-a9f3-be4f520bc5dd" title="Permalink to this equation">#</a></span>\[\begin{equation}
1145+
<div class="amsmath math notranslate nohighlight" id="equation-43d25f1d-801a-4ae0-b49d-4658d866b1fc">
1146+
<span class="eqno">(4)<a class="headerlink" href="#equation-43d25f1d-801a-4ae0-b49d-4658d866b1fc" title="Permalink to this equation">#</a></span>\[\begin{equation}
11471147
\text{Softmax}
11481148
\begin{bmatrix}
11491149
7 &amp; -8 &amp; -\infty \\
@@ -1185,8 +1185,8 @@ <h3><span style="color:LightGreen">Repeating to Match Attention Matrix Shape</sp
11851185
<p><code class="docutils literal notranslate"><span class="pre">attn.shape</span></code> - (Batch x seq_len x seq_len)</p>
11861186
<p><code class="docutils literal notranslate"><span class="pre">mask.shape</span></code> - (Batch x seq_len)</p>
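The shape mismatch can be reproduced with a small NumPy sketch. The batch is hypothetical: two sequences of length 4, where the first uses the `[True, True, True, False]` mask from the next paragraph and the second is made up for illustration. `np.repeat` copies each mask row `seq_len` times so the mask lines up with `attn`; broadcasting with `mask[:, None, :]` alone would give the same masked result without materializing the copies.

```python
import numpy as np

batch, seq_len = 2, 4
rng = np.random.default_rng(0)
attn = rng.standard_normal((batch, seq_len, seq_len))  # (Batch x seq_len x seq_len)
mask = np.array([[True, True, True, False],    # sequence_1: last token is padding
                 [True, True, False, False]])  # hypothetical sequence_2: two pad tokens
print(attn.shape, mask.shape)                  # (2, 4, 4) (2, 4)

# Insert a query axis, then repeat each key mask seq_len times along it
mask_3d = np.repeat(mask[:, None, :], seq_len, axis=1)
print(mask_3d.shape)                           # (2, 4, 4)

# Every query row of sequence_1 now carries the same key mask
print(mask_3d[0].astype(int))

# Padded key positions become -inf before the softmax
masked_attn = np.where(mask_3d, attn, -np.inf)
```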
11871187
<p>Our mask is clearly missing a dimension, so we need to repeat it. Take sequence_1, for instance, which has a mask of [True, True, True, False]. Because the sequence length here is 4, we repeat this row 4 times:</p>
1188-
<div class="amsmath math notranslate nohighlight" id="equation-b56a0fc0-20d0-4ce0-b47b-12191ff88df9">
1189-
<span class="eqno">(5)<a class="headerlink" href="#equation-b56a0fc0-20d0-4ce0-b47b-12191ff88df9" title="Permalink to this equation">#</a></span>\[\begin{bmatrix}
1188+
<div class="amsmath math notranslate nohighlight" id="equation-75947fc6-ced9-4ca6-bb5d-602c58a7420d">
1189+
<span class="eqno">(5)<a class="headerlink" href="#equation-75947fc6-ced9-4ca6-bb5d-602c58a7420d" title="Permalink to this equation">#</a></span>\[\begin{bmatrix}
11901190
\textrm{True} &amp; \textrm{True} &amp; \textrm{True} &amp; \textrm{False} \\
11911191
\textrm{True} &amp; \textrm{True} &amp; \textrm{True} &amp; \textrm{False} \\
11921192
\textrm{True} &amp; \textrm{True} &amp; \textrm{True} &amp; \textrm{False} \\
@@ -1446,8 +1446,8 @@ <h3><span style="color:LightGreen">Enforcing Causality</span><a class="headerlin
14461446
<section id="span-style-color-lightgreen-computing-the-reweighted-causal-attention-mask-span">
14471447
<h3><span style="color:LightGreen">Computing the Reweighted Causal Attention Mask</span><a class="headerlink" href="#span-style-color-lightgreen-computing-the-reweighted-causal-attention-mask-span" title="Permalink to this heading">#</a></h3>
14481448
<p>Suppose the raw outputs of <span class="math notranslate nohighlight">\(QK^T\)</span>, before the softmax, are the following:</p>
1449-
<div class="amsmath math notranslate nohighlight" id="equation-92a7365d-03fc-4811-ba7e-240a794e9c1c">
1450-
<span class="eqno">(6)<a class="headerlink" href="#equation-92a7365d-03fc-4811-ba7e-240a794e9c1c" title="Permalink to this equation">#</a></span>\[\begin{equation}
1449+
<div class="amsmath math notranslate nohighlight" id="equation-f2df8944-33d4-419c-8837-3107f1fa9e44">
1450+
<span class="eqno">(6)<a class="headerlink" href="#equation-f2df8944-33d4-419c-8837-3107f1fa9e44" title="Permalink to this equation">#</a></span>\[\begin{equation}
14511451
\begin{bmatrix}
14521452
7 &amp; -8 &amp; 6 \\
14531453
-3 &amp; 2 &amp; 4 \\
@@ -1458,8 +1458,8 @@ <h3><span style="color:LightGreen">Computing the Reweighted Causal Attention Mas
14581458
<div class="math notranslate nohighlight">
14591459
\[\text{Softmax}(\vec{x}) = \frac{e^{x_i}}{\sum_{j=1}^N{e^{x_j}}}\]</div>
14601460
<p>Then we can compute the softmax of each row of the matrix above:</p>
1461-
<div class="amsmath math notranslate nohighlight" id="equation-f3baddac-d202-4168-8e1f-b5bae70e5171">
1462-
<span class="eqno">(7)<a class="headerlink" href="#equation-f3baddac-d202-4168-8e1f-b5bae70e5171" title="Permalink to this equation">#</a></span>\[\begin{equation}
1461+
<div class="amsmath math notranslate nohighlight" id="equation-2c719a4b-e8a9-4bee-8946-4ac54ff3388a">
1462+
<span class="eqno">(7)<a class="headerlink" href="#equation-2c719a4b-e8a9-4bee-8946-4ac54ff3388a" title="Permalink to this equation">#</a></span>\[\begin{equation}
14631463
\text{Softmax}
14641464
\begin{bmatrix}
14651465
7 &amp; -8 &amp; 6 \\
@@ -1498,17 +1498,17 @@ <h3><span style="color:LightGreen">Computing the Reweighted Causal Attention Mas
14981498
\text{Softmax}(x_2) = [\frac{e^{-3}}{e^{-3}+e^{2}+0}, \frac{e^{2}}{e^{-3}+e^{2}+0}, \frac{0}{e^{-3}+e^{2}+0}] = [0.0067, 0.9933, 0.0000]
14991499
\]</div>
15001500
<p>So we have exactly what we want! The attention weight on the last value is set to 0, so when we are on the second vector <span class="math notranslate nohighlight">\(x_2\)</span>, we cannot look forward to the future value vector <span class="math notranslate nohighlight">\(v_3\)</span>, and the remaining weights still add up to 1, so it is still a probability vector! To do this for the entire matrix, we replace the upper triangle of <span class="math notranslate nohighlight">\(QK^T\)</span> (every entry above the diagonal) with <span class="math notranslate nohighlight">\(-\infty\)</span>. This would look like:</p>
1501-
<div class="amsmath math notranslate nohighlight" id="equation-349bad15-9a14-4cfa-9921-4f9b9bf0b2f9">
1502-
<span class="eqno">(8)<a class="headerlink" href="#equation-349bad15-9a14-4cfa-9921-4f9b9bf0b2f9" title="Permalink to this equation">#</a></span>\[\begin{equation}
1501+
<div class="amsmath math notranslate nohighlight" id="equation-efa6a921-9021-4f0e-a7d2-518e8e243449">
1502+
<span class="eqno">(8)<a class="headerlink" href="#equation-efa6a921-9021-4f0e-a7d2-518e8e243449" title="Permalink to this equation">#</a></span>\[\begin{equation}
15031503
\begin{bmatrix}
15041504
7 &amp; -\infty &amp; -\infty \\
15051505
-3 &amp; 2 &amp; -\infty \\
15061506
1 &amp; 6 &amp; -2 \\
15071507
\end{bmatrix}
15081508
\end{equation}\]</div>
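The same construction as a NumPy sketch, using the 3×3 scores from (6) with the third row `[1, 6, -2]` as it appears in (8). `np.triu(..., k=1)` marks the strictly upper triangle, i.e. the "future" positions. This is an illustration, not the lecture's code; in PyTorch the equivalent mask is commonly built with `torch.triu` and applied with `masked_fill`.

```python
import numpy as np

def causal_softmax(scores):
    """Row-wise softmax where position i may only attend to positions j <= i."""
    seq_len = scores.shape[-1]
    # True strictly above the diagonal: these are the future positions
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    filled = np.where(future, -np.inf, scores)
    # Stabilize with the row max (the diagonal is never masked, so it is finite)
    shifted = filled - filled.max(axis=-1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=-1, keepdims=True)

scores = np.array([[7.0, -8.0, 6.0],
                   [-3.0, 2.0, 4.0],
                   [1.0, 6.0, -2.0]])
attn = causal_softmax(scores)
print(np.round(attn, 4))
```

The second row reproduces the hand computation for `x_2`, and the first row puts all of its weight on the first token, since that is the only position it may see.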
15091509
<p>Taking the softmax of the rows of this matrix then gives:</p>
1510-
<div class="amsmath math notranslate nohighlight" id="equation-c7c3c1b0-3306-42f6-9fbb-3b6d89d557ad">
1511-
<span class="eqno">(9)<a class="headerlink" href="#equation-c7c3c1b0-3306-42f6-9fbb-3b6d89d557ad" title="Permalink to this equation">#</a></span>\[\begin{equation}
1510+
<div class="amsmath math notranslate nohighlight" id="equation-1e84e5b6-1ac3-4bef-b520-dde67d6cca4a">
1511+
<span class="eqno">(9)<a class="headerlink" href="#equation-1e84e5b6-1ac3-4bef-b520-dde67d6cca4a" title="Permalink to this equation">#</a></span>\[\begin{equation}
15121512
\text{Softmax}
15131513
\begin{bmatrix}
15141514
7 &amp; -\infty &amp; -\infty \\

_sources/lectures/UnsupervisedLearningAnomalyDetection.html

Lines changed: 16 additions & 11 deletions
@@ -551,17 +551,13 @@ <h1>Unsupervised Learning and Anomaly Detection<a class="headerlink" href="#unsu
551551
<span class="kn">import</span><span class="w"> </span><span class="nn">warnings</span>
552552
<span class="n">warnings</span><span class="o">.</span><span class="n">filterwarnings</span><span class="p">(</span><span class="s1">&#39;ignore&#39;</span><span class="p">)</span>
553553

554-
555554
<span class="kn">import</span><span class="w"> </span><span class="nn">torch</span>
556555
<span class="kn">import</span><span class="w"> </span><span class="nn">copy</span>
557556
<span class="kn">from</span><span class="w"> </span><span class="nn">pylab</span><span class="w"> </span><span class="kn">import</span> <span class="n">rcParams</span>
558557
<span class="kn">from</span><span class="w"> </span><span class="nn">matplotlib</span><span class="w"> </span><span class="kn">import</span> <span class="n">rc</span>
559558
<span class="kn">from</span><span class="w"> </span><span class="nn">sklearn.model_selection</span><span class="w"> </span><span class="kn">import</span> <span class="n">train_test_split</span>
560-
561559
<span class="kn">from</span><span class="w"> </span><span class="nn">torch</span><span class="w"> </span><span class="kn">import</span> <span class="n">nn</span><span class="p">,</span> <span class="n">optim</span>
562-
563560
<span class="kn">import</span><span class="w"> </span><span class="nn">torch.nn.functional</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="nn">F</span>
564-
565561
<span class="kn">from</span><span class="w"> </span><span class="nn">scipy.io</span><span class="w"> </span><span class="kn">import</span> <span class="n">arff</span>
566562

567563
<span class="n">RANDOM_SEED</span> <span class="o">=</span> <span class="mi">42</span>
@@ -571,7 +567,7 @@ <h1>Unsupervised Learning and Anomaly Detection<a class="headerlink" href="#unsu
571567
</div>
572568
</div>
573569
<div class="cell_output docutils container">
574-
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;torch._C.Generator at 0x74268cb5e3d0&gt;
570+
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;torch._C.Generator at 0x134200410&gt;
575571
</pre></div>
576572
</div>
577573
</div>
@@ -630,7 +626,15 @@ <h2><span style="color:Orange">Networks for Unsupervised Learning</span><a class
630626
</section>
631627
<section id="span-style-color-orange-example-time-series-anomaly-detection-using-lstm-autoencoders-span">
632628
<h2><span style="color:Orange">Example: Time Series Anomaly Detection using LSTM Autoencoders</span><a class="headerlink" href="#span-style-color-orange-example-time-series-anomaly-detection-using-lstm-autoencoders-span" title="Permalink to this heading">#</a></h2>
633-
<p>In this example,</p>
629+
<p>In this example, we will learn to:</p>
630+
<ul class="simple">
631+
<li><p>Prepare a dataset for Anomaly Detection from Time Series Data</p></li>
632+
<li><p>Build an LSTM Autoencoder with PyTorch</p></li>
633+
<li><p>Train and evaluate your model</p></li>
634+
<li><p>Choose a threshold for anomaly detection</p></li>
635+
<li><p>Classify unseen examples as normal or anomaly</p></li>
636+
</ul>
637+
<p>While our Time Series data is univariate (we have only 1 feature), the code should work for multivariate datasets (multiple features) with little or no modification. Feel free to try it!</p>
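As a rough illustration of the last two bullets, here is a minimal sketch that assumes each example already has a scalar reconstruction loss from the autoencoder. Taking the threshold as a quantile of the losses on normal training examples is an assumption made for illustration only; the helpers `choose_threshold` and `classify` are hypothetical names, and the lecture may pick its threshold differently (for example, by inspecting the loss distribution).

```python
def choose_threshold(normal_losses, quantile=0.95):
    """Hypothetical helper: take a quantile of reconstruction losses on normal data."""
    ordered = sorted(normal_losses)
    index = int(quantile * (len(ordered) - 1))
    return ordered[index]


def classify(losses, threshold):
    """Hypothetical helper: label each example by comparing its loss to the threshold."""
    return ["normal" if loss <= threshold else "anomaly" for loss in losses]


# Toy losses standing in for the autoencoder's reconstruction errors
train_losses = [0.1 * i for i in range(1, 21)]   # 0.1, 0.2, ..., 2.0
threshold = choose_threshold(train_losses)
print(round(threshold, 2))                        # 1.9
print(classify([0.5, 1.5, 5.0], threshold))       # ['normal', 'normal', 'anomaly']
```

The design idea is that a model trained only on normal heartbeats reconstructs them well, so an unusually large loss is taken as a sign of an anomaly.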
634638
<section id="span-style-color-lightgreen-data-span">
635639
<h3><span style="color:LightGreen">Data</span><a class="headerlink" href="#span-style-color-lightgreen-data-span" title="Permalink to this heading">#</a></h3>
636640
<p>The <a class="reference external" href="http://timeseriesclassification.com/description.php?Dataset=ECG5000">dataset</a> contains 5,000 Time Series examples (obtained with ECG) with 140 timesteps. Each sequence corresponds to a single heartbeat from a single patient with congestive heart failure.</p>
@@ -658,7 +662,7 @@ <h3><span style="color:LightGreen">Data</span><a class="headerlink" href="#span-
658662
</div>
659663
</div>
660664
<div class="cell_output docutils container">
661-
<div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>cuda
665+
<div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>cpu
662666
</pre></div>
663667
</div>
664668
</div>
@@ -878,7 +882,8 @@ <h3><span style="color:LightGreen">Data</span><a class="headerlink" href="#span-
878882
<div class="cell docutils container">
879883
<div class="cell_input docutils container">
880884
<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">CLASS_NORMAL</span> <span class="o">=</span> <span class="mi">1</span>
881-
<span class="n">class_names</span> <span class="o">=</span> <span class="p">[</span><span class="s1">&#39;Normal&#39;</span><span class="p">,</span><span class="s1">&#39;R on T&#39;</span><span class="p">,</span><span class="s1">&#39;PVC&#39;</span><span class="p">,</span><span class="s1">&#39;SP&#39;</span><span class="p">,</span><span class="s1">&#39;UB&#39;</span><span class="p">]</span>
885+
<span class="c1">#class_names = [&#39;Normal&#39;,&#39;R on T&#39;,&#39;PVC&#39;,&#39;SP&#39;,&#39;UB&#39;] # This ordering sometimes produces wrong counts histogram. Need to check if it affects plots that use class_names</span>
886+
<span class="n">class_names</span> <span class="o">=</span> <span class="p">[</span><span class="s1">&#39;Normal&#39;</span><span class="p">,</span><span class="s1">&#39;PVC&#39;</span><span class="p">,</span><span class="s1">&#39;R on T&#39;</span><span class="p">,</span><span class="s1">&#39;SP&#39;</span><span class="p">,</span><span class="s1">&#39;UB&#39;</span><span class="p">]</span>
882887
</pre></div>
883888
</div>
884889
</div>
@@ -918,14 +923,14 @@ <h3><span style="color:LightGreen">Exploratory Data Analysis</span><a class="hea
918923
<p>Let’s plot the results:</p>
919924
<div class="cell docutils container">
920925
<div class="cell_input docutils container">
921-
<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">ax</span> <span class="o">=</span> <span class="n">sns</span><span class="o">.</span><span class="n">countplot</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">df</span><span class="o">.</span><span class="n">target</span><span class="p">)</span>
922-
<span class="n">ax</span><span class="o">.</span><span class="n">set_xticks</span><span class="p">(</span><span class="n">ax</span><span class="o">.</span><span class="n">get_xticks</span><span class="p">(),</span><span class="n">class_names</span><span class="p">)</span>
926+
<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">ax</span> <span class="o">=</span> <span class="n">sns</span><span class="o">.</span><span class="n">countplot</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">df</span><span class="o">.</span><span class="n">target</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">df</span><span class="p">)</span>
927+
<span class="n">ax</span><span class="o">.</span><span class="n">set_xticklabels</span><span class="p">(</span><span class="n">class_names</span><span class="p">)</span>
923928
<span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
924929
</pre></div>
925930
</div>
926931
</div>
927932
<div class="cell_output docutils container">
928-
<img alt="../../_images/b3f3838f52011b28c49b264a79dc674a3d57779cdfd6c5acbd0ecc78ef4f9407.png" src="../../_images/b3f3838f52011b28c49b264a79dc674a3d57779cdfd6c5acbd0ecc78ef4f9407.png" />
933+
<img alt="../../_images/a3fe6250458a3ee6f63e2df3c665109490bf6f07cbd31aa336f8db11378fe2c6.png" src="../../_images/a3fe6250458a3ee6f63e2df3c665109490bf6f07cbd31aa336f8db11378fe2c6.png" />
929934
</div>
930935
</div>
931936
<p>The normal class has, by far, the most examples. This is great because we’ll use it to train our model.</p>
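The comment added in this diff notes that one `class_names` ordering sometimes produces a wrong histogram. One plausible cause, not verified against this notebook, is that seaborn places bars for non-numeric labels in order of first appearance, so `set_xticklabels(class_names)` can attach names to the wrong bars. The plain-Python sketch below uses toy labels (assumed to be the strings `'1'` to `'5'`) and assumes the i-th name belongs to label i+1; it only shows how appearance order and sorted label order differ. In seaborn, passing `order=label_order` to `sns.countplot` pins the bar order explicitly.

```python
from collections import Counter

# Toy stand-in for df.target; assumed to hold string labels '1'..'5'
targets = ["2", "1", "1", "4", "1", "2", "3", "5", "1"]
class_names = ["Normal", "R on T", "PVC", "SP", "UB"]  # assumed: name i <-> label str(i + 1)

# Without an explicit order, categories come out in order of first appearance
appearance_order = list(dict.fromkeys(targets))   # ['2', '1', '4', '3', '5']
# Sorting by the numeric label lines the bars up with class_names
label_order = sorted(set(targets), key=int)       # ['1', '2', '3', '4', '5']

counts = Counter(targets)
named_counts = {name: counts[label] for name, label in zip(class_names, label_order)}
print(named_counts)  # {'Normal': 4, 'R on T': 2, 'PVC': 1, 'SP': 1, 'UB': 1}
```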

searchindex.js

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.
