illinois-mlp
diff --git a/‎_images/a3fe6250458a3ee6f63e2df3c665109490bf6f07cbd31aa336f8db11378fe2c6.png‎
12.1 KB b/‎_images/a3fe6250458a3ee6f63e2df3c665109490bf6f07cbd31aa336f8db11378fe2c6.png‎
diff --git a/‎_images/b3f3838f52011b28c49b264a79dc674a3d57779cdfd6c5acbd0ecc78ef4f9407.png‎
-14.9 KB b/‎_images/b3f3838f52011b28c49b264a79dc674a3d57779cdfd6c5acbd0ecc78ef4f9407.png‎
diff --git a/‎_sources/_sources/lectures/UnsupervisedLearningAnomalyDetection.ipynb‎
Lines changed: 30 additions & 26 deletions b/‎_sources/_sources/lectures/UnsupervisedLearningAnomalyDetection.ipynb‎
diff --git a/‎_sources/lectures/Attention.html‎
Lines changed: 18 additions & 18 deletions b/‎_sources/lectures/Attention.html‎
diff --git a/‎_sources/lectures/UnsupervisedLearningAnomalyDetection.html‎
Lines changed: 16 additions & 11 deletions b/‎_sources/lectures/UnsupervisedLearningAnomalyDetection.html‎
diff --git a/‎searchindex.js‎
Lines changed: 1 addition & 1 deletion b/‎searchindex.js‎
@@ -1099,8 +1099,8 @@ <h3><span style="color:LightGreen">Sequence Padding and Attention Masking</span>
 <section id="span-style-color-orange-computing-the-reweighted-padded-attention-mask-span">
 <h2><span style="color:Orange">Computing the Reweighted Padded Attention Mask</span><a class="headerlink" href="#span-style-color-orange-computing-the-reweighted-padded-attention-mask-span" title="Permalink to this heading">#</a></h2>
 <p>Let's work through some numbers to get a better idea of how this works. Let the tokens be <span class="math notranslate nohighlight">\(X = [10, 2, \text{&lt;pad&gt;}]\)</span>, so the third token is a padding token. Suppose we pass this sequence to our model and compute the attention scores <span class="math notranslate nohighlight">\(QK^T\)</span>. The raw output before the softmax is below:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-22d0d05a-babe-4218-84f2-1eaae730f3c9">
-<span class="eqno">(1)<a class="headerlink" href="#equation-22d0d05a-babe-4218-84f2-1eaae730f3c9" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-ad706dd6-a52e-4d11-ab77-c830602bba14">
+<span class="eqno">(1)<a class="headerlink" href="#equation-ad706dd6-a52e-4d11-ab77-c830602bba14" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \begin{bmatrix}
   7       &amp; -8   &amp; 6  \\
   -3       &amp; 2   &amp; 4   \\
@@ -1113,8 +1113,8 @@ <h2><span style="color:Orange">Computing the Reweighted Padded Attention Mask</s
 \text{Softmax}(\vec{x})_i = \frac{e^{x_i}}{\sum_{j=1}^N{e^{x_j}}}
 \]</div>
 <p>If we ignore padding for now, we can compute the softmax of each row of the matrix above:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-6b7555a5-6b71-4539-865e-23bc4614a3d8">
-<span class="eqno">(2)<a class="headerlink" href="#equation-6b7555a5-6b71-4539-865e-23bc4614a3d8" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-82e91dbe-9847-4d4c-ae3e-001922071aa0">
+<span class="eqno">(2)<a class="headerlink" href="#equation-82e91dbe-9847-4d4c-ae3e-001922071aa0" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \text{Softmax}
 \begin{bmatrix}
   7       &amp; -8   &amp; 6  \\
@@ -1133,17 +1133,17 @@ <h2><span style="color:Orange">Computing the Reweighted Padded Attention Mask</s
 \end{bmatrix}
 \end{equation}\]</div>
 <p>But we need to mask out every entry in this matrix that corresponds to a padding token. Just like we did in <a class="reference external" href="https://github.com/priyammaz/HAL-DL-From-Scratch/tree/main/PyTorch%20for%20NLP/GPT">GPT</a>, we will fill the indexes that we want to mask with <span class="math notranslate nohighlight">\(-\infty\)</span>. If only the last token in our sequence is a padding token, then the attention scores before the softmax should be written as:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-947d5f8e-5b2d-4424-9aa8-b5d960360ee5">
-<span class="eqno">(3)<a class="headerlink" href="#equation-947d5f8e-5b2d-4424-9aa8-b5d960360ee5" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-4d04027e-7c32-4047-ba97-2231fcf2eafa">
+<span class="eqno">(3)<a class="headerlink" href="#equation-4d04027e-7c32-4047-ba97-2231fcf2eafa" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \begin{bmatrix}
   7       &amp; -8   &amp; -\infty  \\
   -3       &amp; 2   &amp; -\infty   \\
   1       &amp; 6  &amp; -\infty  \\
 \end{bmatrix}
 \end{equation}\]</div>
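 <p>The masking step above can be sketched in plain Python. This is only an illustration: the helper <code>softmax</code> and the names <code>scores</code> and <code>key_is_real</code> are made up for this sketch, and the third row of <code>scores</code> is assumed to be <code>[1, 6, -2]</code> (its last entry is masked anyway, so it does not affect the result). In PyTorch the same fill is typically done with <code>masked_fill</code> on the score tensor.</p>

```python
import math

def softmax(row):
    """Softmax of one row. exp(-inf) is 0.0, so masked entries get zero weight.

    Assumes at least one entry in the row is finite.
    """
    m = max(row)                      # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

# Raw attention scores QK^T (third row assumed for illustration).
scores = [[7.0, -8.0, 6.0],
          [-3.0, 2.0, 4.0],
          [1.0, 6.0, -2.0]]

# True marks a real token, False marks a padding token (the third one here).
key_is_real = [True, True, False]

# Fill every column that belongs to a padding token with -infinity.
masked = [[s if keep else -math.inf for s, keep in zip(row, key_is_real)]
          for row in scores]

weights = [softmax(row) for row in masked]
for row in weights:
    print([round(w, 4) for w in row])
```

 <p>Every row assigns zero weight to the padded column, and the remaining weights in each row still sum to 1.</p>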
 <p>Taking the softmax of the rows of this matrix then gives:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-ce560d17-0d7e-4acb-a9f3-be4f520bc5dd">
-<span class="eqno">(4)<a class="headerlink" href="#equation-ce560d17-0d7e-4acb-a9f3-be4f520bc5dd" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-43d25f1d-801a-4ae0-b49d-4658d866b1fc">
+<span class="eqno">(4)<a class="headerlink" href="#equation-43d25f1d-801a-4ae0-b49d-4658d866b1fc" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \text{Softmax}
 \begin{bmatrix}
  7       &amp; -8   &amp; -\infty  \\
@@ -1185,8 +1185,8 @@ <h3><span style="color:LightGreen">Repeating to Match Attention Matrix Shape</sp
 <p><code class="docutils literal notranslate"><span class="pre">attn.shape</span></code> - (Batch x seq_len x seq_len)</p>
 <p><code class="docutils literal notranslate"><span class="pre">mask.shape</span></code> - (Batch x seq_len)</p>
 <p>It is clear that our mask is missing a dimension, so we need to repeat it. Let's take sequence_1, for instance, which has a mask of [True, True, True, False]. Because the sequence length here is 4, let's repeat this row 4 times:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-b56a0fc0-20d0-4ce0-b47b-12191ff88df9">
-<span class="eqno">(5)<a class="headerlink" href="#equation-b56a0fc0-20d0-4ce0-b47b-12191ff88df9" title="Permalink to this equation">#</a></span>\[\begin{bmatrix}
+<div class="amsmath math notranslate nohighlight" id="equation-75947fc6-ced9-4ca6-bb5d-602c58a7420d">
+<span class="eqno">(5)<a class="headerlink" href="#equation-75947fc6-ced9-4ca6-bb5d-602c58a7420d" title="Permalink to this equation">#</a></span>\[\begin{bmatrix}
 \textrm{True} &amp; \textrm{True} &amp; \textrm{True} &amp; \textrm{False} \\
 \textrm{True} &amp; \textrm{True} &amp; \textrm{True} &amp; \textrm{False} \\
 \textrm{True} &amp; \textrm{True} &amp; \textrm{True} &amp; \textrm{False} \\
@@ -1446,8 +1446,8 @@ <h3><span style="color:LightGreen">Enforcing Causality</span><a class="headerlin
 <section id="span-style-color-lightgreen-computing-the-reweighted-causal-attention-mask-span">
 <h3><span style="color:LightGreen">Computing the Reweighted Causal Attention Mask</span><a class="headerlink" href="#span-style-color-lightgreen-computing-the-reweighted-causal-attention-mask-span" title="Permalink to this heading">#</a></h3>
 <p>Let's pretend the raw output of <span class="math notranslate nohighlight">\(QK^T\)</span>, before the softmax, is below:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-92a7365d-03fc-4811-ba7e-240a794e9c1c">
-<span class="eqno">(6)<a class="headerlink" href="#equation-92a7365d-03fc-4811-ba7e-240a794e9c1c" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-f2df8944-33d4-419c-8837-3107f1fa9e44">
+<span class="eqno">(6)<a class="headerlink" href="#equation-f2df8944-33d4-419c-8837-3107f1fa9e44" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \begin{bmatrix}
   7       &amp; -8   &amp; 6  \\
   -3       &amp; 2   &amp; 4   \\
@@ -1458,8 +1458,8 @@ <h3><span style="color:LightGreen">Computing the Reweighted Causal Attention Mas
 <div class="math notranslate nohighlight">
 \[\text{Softmax}(\vec{x})_i = \frac{e^{x_i}}{\sum_{j=1}^N{e^{x_j}}}\]</div>
 <p>Then we can compute the softmax of each row of the matrix above:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-f3baddac-d202-4168-8e1f-b5bae70e5171">
-<span class="eqno">(7)<a class="headerlink" href="#equation-f3baddac-d202-4168-8e1f-b5bae70e5171" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-2c719a4b-e8a9-4bee-8946-4ac54ff3388a">
+<span class="eqno">(7)<a class="headerlink" href="#equation-2c719a4b-e8a9-4bee-8946-4ac54ff3388a" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \text{Softmax}
 \begin{bmatrix}
   7       &amp; -8   &amp; 6  \\
@@ -1498,17 +1498,17 @@ <h3><span style="color:LightGreen">Computing the Reweighted Causal Attention Mas
 \text{Softmax}(x_2) = [\frac{e^{-3}}{e^{-3}+e^{2}+0}, \frac{e^{2}}{e^{-3}+e^{2}+0}, \frac{0}{e^{-3}+e^{2}+0}] = [0.0067, 0.9933, 0.0000]
 \]</div>
 <p>So we have exactly what we want! The attention weight of the last value is set to 0, so when we are on the second vector <span class="math notranslate nohighlight">\(x_2\)</span>, we cannot look forward to the future value vector <span class="math notranslate nohighlight">\(v_3\)</span>, and the remaining weights add up to 1, so it's still a probability vector! To do this correctly for the entire matrix, we can replace the upper triangle of <span class="math notranslate nohighlight">\(QK^T\)</span> (the entries above the diagonal) with <span class="math notranslate nohighlight">\(-\infty\)</span>. This would look like:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-349bad15-9a14-4cfa-9921-4f9b9bf0b2f9">
-<span class="eqno">(8)<a class="headerlink" href="#equation-349bad15-9a14-4cfa-9921-4f9b9bf0b2f9" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-efa6a921-9021-4f0e-a7d2-518e8e243449">
+<span class="eqno">(8)<a class="headerlink" href="#equation-efa6a921-9021-4f0e-a7d2-518e8e243449" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \begin{bmatrix}
   7       &amp; -\infty   &amp; -\infty  \\
   -3       &amp; 2   &amp; -\infty   \\
   1       &amp; 6  &amp; -2   \\
 \end{bmatrix}
 \end{equation}\]</div>
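 <p>The same causal masking can be sketched in plain Python, using the score matrix from this section. The helper <code>softmax</code> and the variable names are made up for this illustration. In PyTorch one would typically build the mask with <code>torch.triu</code> and apply it with <code>masked_fill</code>.</p>

```python
import math

def softmax(row):
    """Softmax of one row. exp(-inf) is 0.0, so masked entries get zero weight."""
    m = max(row)                      # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

# Raw attention scores QK^T from the example above.
scores = [[7.0, -8.0, 6.0],
          [-3.0, 2.0, 4.0],
          [1.0, 6.0, -2.0]]
n = len(scores)

# Position i may attend to position j only when j <= i,
# so every entry above the diagonal becomes -infinity.
masked = [[scores[i][j] if j <= i else -math.inf for j in range(n)]
          for i in range(n)]

weights = [softmax(row) for row in masked]
for row in weights:
    print([round(w, 4) for w in row])
```

 <p>The first row puts all of its weight on the first token, the second row matches the hand computation for <span class="math notranslate nohighlight">\(x_2\)</span>, and only the last row attends to all three positions.</p>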
 <p>Taking the softmax of the rows of this matrix then gives:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-c7c3c1b0-3306-42f6-9fbb-3b6d89d557ad">
-<span class="eqno">(9)<a class="headerlink" href="#equation-c7c3c1b0-3306-42f6-9fbb-3b6d89d557ad" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-1e84e5b6-1ac3-4bef-b520-dde67d6cca4a">
+<span class="eqno">(9)<a class="headerlink" href="#equation-1e84e5b6-1ac3-4bef-b520-dde67d6cca4a" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \text{Softmax}
 \begin{bmatrix}
   7       &amp; -\infty   &amp; -\infty  \\
 
@@ -551,17 +551,13 @@ <h1>Unsupervised Learning and Anomaly Detection<a class="headerlink" href="#unsu
 <span class="kn">import</span><span class="w"> </span><span class="nn">warnings</span>
 <span class="n">warnings</span><span class="o">.</span><span class="n">filterwarnings</span><span class="p">(</span><span class="s1">&#39;ignore&#39;</span><span class="p">)</span>
 
-
 <span class="kn">import</span><span class="w"> </span><span class="nn">torch</span>
 <span class="kn">import</span><span class="w"> </span><span class="nn">copy</span>
 <span class="kn">from</span><span class="w"> </span><span class="nn">pylab</span><span class="w"> </span><span class="kn">import</span> <span class="n">rcParams</span>
 <span class="kn">from</span><span class="w"> </span><span class="nn">matplotlib</span><span class="w"> </span><span class="kn">import</span> <span class="n">rc</span>
 <span class="kn">from</span><span class="w"> </span><span class="nn">sklearn.model_selection</span><span class="w"> </span><span class="kn">import</span> <span class="n">train_test_split</span>
-
 <span class="kn">from</span><span class="w"> </span><span class="nn">torch</span><span class="w"> </span><span class="kn">import</span> <span class="n">nn</span><span class="p">,</span> <span class="n">optim</span>
-
 <span class="kn">import</span><span class="w"> </span><span class="nn">torch.nn.functional</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="nn">F</span>
-
 <span class="kn">from</span><span class="w"> </span><span class="nn">scipy.io</span><span class="w"> </span><span class="kn">import</span> <span class="n">arff</span>
 
 <span class="n">RANDOM_SEED</span> <span class="o">=</span> <span class="mi">42</span>
@@ -571,7 +567,7 @@ <h1>Unsupervised Learning and Anomaly Detection<a class="headerlink" href="#unsu
 </div>
 </div>
 <div class="cell_output docutils container">
-<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;torch._C.Generator at 0x74268cb5e3d0&gt;
+<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;torch._C.Generator at 0x134200410&gt;
 </pre></div>
 </div>
 </div>
@@ -630,7 +626,15 @@ <h2><span style="color:Orange">Networks for Unsupervised Learning</span><a class
 </section>
 <section id="span-style-color-orange-example-time-series-anomaly-detection-using-lstm-autoencoders-span">
 <h2><span style="color:Orange">Example: Time Series Anomaly Detection using LSTM Autoencoders</span><a class="headerlink" href="#span-style-color-orange-example-time-series-anomaly-detection-using-lstm-autoencoders-span" title="Permalink to this heading">#</a></h2>
-<p>In this example,</p>
+<p>In this example, we will learn to:</p>
+<ul class="simple">
+<li><p>Prepare a dataset for Anomaly Detection from Time Series Data</p></li>
+<li><p>Build an LSTM Autoencoder with PyTorch</p></li>
+<li><p>Train and evaluate your model</p></li>
+<li><p>Choose a threshold for anomaly detection</p></li>
+<li><p>Classify unseen examples as normal or anomaly</p></li>
+</ul>
+<p>While our Time Series data is univariate (we have only 1 feature), the code should work for multivariate datasets (multiple features) with little or no modification. Feel free to try it!</p>
 <section id="span-style-color-lightgreen-data-span">
 <h3><span style="color:LightGreen">Data</span><a class="headerlink" href="#span-style-color-lightgreen-data-span" title="Permalink to this heading">#</a></h3>
 <p>The <a class="reference external" href="http://timeseriesclassification.com/description.php?Dataset=ECG5000">dataset</a> contains 5,000 Time Series examples (obtained with ECG) with 140 timesteps. Each sequence corresponds to a single heartbeat from a single patient with congestive heart failure.</p>
@@ -658,7 +662,7 @@ <h3><span style="color:LightGreen">Data</span><a class="headerlink" href="#span-
 </div>
 </div>
 <div class="cell_output docutils container">
-<div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>cuda
+<div class="output stream highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>cpu
 </pre></div>
 </div>
 </div>
@@ -878,7 +882,8 @@ <h3><span style="color:LightGreen">Data</span><a class="headerlink" href="#span-
 <div class="cell docutils container">
 <div class="cell_input docutils container">
 <div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">CLASS_NORMAL</span> <span class="o">=</span> <span class="mi">1</span>
-<span class="n">class_names</span> <span class="o">=</span> <span class="p">[</span><span class="s1">&#39;Normal&#39;</span><span class="p">,</span><span class="s1">&#39;R on T&#39;</span><span class="p">,</span><span class="s1">&#39;PVC&#39;</span><span class="p">,</span><span class="s1">&#39;SP&#39;</span><span class="p">,</span><span class="s1">&#39;UB&#39;</span><span class="p">]</span>
+<span class="c1">#class_names = [&#39;Normal&#39;,&#39;R on T&#39;,&#39;PVC&#39;,&#39;SP&#39;,&#39;UB&#39;] # This ordering sometimes produces wrong counts histogram. Need to check if it affects plots that use class_names</span>
+<span class="n">class_names</span> <span class="o">=</span> <span class="p">[</span><span class="s1">&#39;Normal&#39;</span><span class="p">,</span><span class="s1">&#39;PVC&#39;</span><span class="p">,</span><span class="s1">&#39;R on T&#39;</span><span class="p">,</span><span class="s1">&#39;SP&#39;</span><span class="p">,</span><span class="s1">&#39;UB&#39;</span><span class="p">]</span>
 </pre></div>
 </div>
 </div>
@@ -918,14 +923,14 @@ <h3><span style="color:LightGreen">Exploratory Data Analysis</span><a class="hea
 <p>Let’s plot the results:</p>
 <div class="cell docutils container">
 <div class="cell_input docutils container">
-<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">ax</span> <span class="o">=</span> <span class="n">sns</span><span class="o">.</span><span class="n">countplot</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">df</span><span class="o">.</span><span class="n">target</span><span class="p">)</span>
-<span class="n">ax</span><span class="o">.</span><span class="n">set_xticks</span><span class="p">(</span><span class="n">ax</span><span class="o">.</span><span class="n">get_xticks</span><span class="p">(),</span><span class="n">class_names</span><span class="p">)</span>
+<div class="highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">ax</span> <span class="o">=</span> <span class="n">sns</span><span class="o">.</span><span class="n">countplot</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">df</span><span class="o">.</span><span class="n">target</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">df</span><span class="p">)</span>
+<span class="n">ax</span><span class="o">.</span><span class="n">set_xticklabels</span><span class="p">(</span><span class="n">class_names</span><span class="p">)</span>
 <span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
 </pre></div>
 </div>
 </div>
 <div class="cell_output docutils container">
-<img alt="../../_images/b3f3838f52011b28c49b264a79dc674a3d57779cdfd6c5acbd0ecc78ef4f9407.png" src="../../_images/b3f3838f52011b28c49b264a79dc674a3d57779cdfd6c5acbd0ecc78ef4f9407.png" />
+<img alt="../../_images/a3fe6250458a3ee6f63e2df3c665109490bf6f07cbd31aa336f8db11378fe2c6.png" src="../../_images/a3fe6250458a3ee6f63e2df3c665109490bf6f07cbd31aa336f8db11378fe2c6.png" />
 </div>
 </div>
 <p>The normal class has, by far, the most examples. This is great because we’ll use it to train our model.</p>
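 <p>Because tick labels passed as a positional list depend on the order in which the plot draws its bars, one way to guard against mislabeled counts is to key the names by label value. The sketch below is only an illustration: the toy <code>targets</code> list is made up, and the <code>label_to_name</code> mapping is hypothetical apart from label 1 being the normal class, so it should be checked against the dataset description before use.</p>

```python
from collections import Counter

# Hypothetical mapping from label value to class name. Only "1" -> "Normal"
# follows from CLASS_NORMAL = 1; the rest is assumed for illustration.
label_to_name = {
    "1": "Normal",
    "2": "R on T",
    "3": "PVC",
    "4": "SP",
    "5": "UB",
}

# Toy stand-in for df.target (made-up values, not the real dataset).
targets = ["1", "1", "1", "2", "1", "3", "2", "1", "4", "5"]

counts = Counter(targets)

# Fix the bar order explicitly, then derive the tick names from that order,
# so names and counts cannot drift apart.
ordered_labels = sorted(counts)
tick_names = [label_to_name[label] for label in ordered_labels]

for label, name in zip(ordered_labels, tick_names):
    print(f"{name}: {counts[label]}")
```

 <p>With seaborn, the same <code>ordered_labels</code> could be passed to the <code>order</code> argument of <code>countplot</code> so that the bars and the tick names share one explicit ordering.</p>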