Commit f3ec24e

committed
deploy: d020e6a
1 parent 4bc6a75 commit f3ec24e

4 files changed

Lines changed: 21 additions & 21 deletions

_sources/_sources/lectures/PhysicsInformedNeuralNetworks.ipynb

Lines changed: 1 addition & 1 deletion
@@ -427,7 +427,7 @@
 "* Initial version: Mark Neubauer\n",
 "* 1D harmonic oscillator example is based on the blog post [\"So, what is a physics-informed neural network?\"](https://benmoseley.blog/my-research/so-what-is-a-physics-informed-neural-network/). This problem was inspired by the following blog post: https://beltoforion.de/en/harmonic_oscillator/.\n",
 "\n",
-"© Copyright 2025"
+"© Copyright 2026"
 ]
 }
 ],

_sources/lectures/Attention.html

Lines changed: 18 additions & 18 deletions
@@ -1118,8 +1118,8 @@ <h3><span style="color:LightGreen">Sequence Padding and Attention Masking</span>
 <section id="span-style-color-orange-computing-the-reweighted-padded-attention-mask-span">
 <h2><span style="color:Orange">Computing the Reweighted Padded Attention Mask</span><a class="headerlink" href="#span-style-color-orange-computing-the-reweighted-padded-attention-mask-span" title="Permalink to this heading">#</a></h2>
 <p>Lets create some numbers so we can get a better idea of how this works. Let the tokens be <span class="math notranslate nohighlight">\(X = [10, 2, \text{&lt;pad&gt;}]\)</span>, so the third token is a padding token. Lets then also pretend, we pass this to our model, and when we go to compute our attention <span class="math notranslate nohighlight">\(QK^T\)</span>. The raw output before the Softmax is below:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-8db11ec8-5537-4341-8562-daa883bd6f56">
-<span class="eqno">(1)<a class="headerlink" href="#equation-8db11ec8-5537-4341-8562-daa883bd6f56" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-66693103-b228-4c52-9405-6c950e1dfcfc">
+<span class="eqno">(1)<a class="headerlink" href="#equation-66693103-b228-4c52-9405-6c950e1dfcfc" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \begin{bmatrix}
 7 &amp; -8 &amp; 6 \\
 -3 &amp; 2 &amp; 4 \\
@@ -1132,8 +1132,8 @@ <h2><span style="color:Orange">Computing the Reweighted Padded Attention Mask</s
 \text{Softmax}(\vec{x}) = \frac{e^{x_i}}{\sum_{j=1}^N{e^{x_j}}}
 \]</div>
 <p>If we ignore padding and everything right now, we can compute softmax for row of the matrix above:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-d7da5bab-f1a7-4356-8a51-4f406a6d697d">
-<span class="eqno">(2)<a class="headerlink" href="#equation-d7da5bab-f1a7-4356-8a51-4f406a6d697d" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-5903e3a7-442b-44c6-8e54-1f72c57651d0">
+<span class="eqno">(2)<a class="headerlink" href="#equation-5903e3a7-442b-44c6-8e54-1f72c57651d0" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \text{Softmax}
 \begin{bmatrix}
 7 &amp; -8 &amp; 6 \\
@@ -1152,17 +1152,17 @@ <h2><span style="color:Orange">Computing the Reweighted Padded Attention Mask</s
 \end{bmatrix}
 \end{equation}\]</div>
 <p>But what we need is to mask out all the tokens in this matrix related to padding. Just like we did in <a class="reference external" href="https://github.com/priyammaz/HAL-DL-From-Scratch/tree/main/PyTorch%20for%20NLP/GPT">GPT</a>, we will fill in the indexes of the that we want to mask with <span class="math notranslate nohighlight">\(-\infty\)</span>. If only the last token was a padding token in our sequence, then the attention before the softmax should be written as:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-6a24ca1f-3752-41e7-a409-eb6a3f5e4ae8">
-<span class="eqno">(3)<a class="headerlink" href="#equation-6a24ca1f-3752-41e7-a409-eb6a3f5e4ae8" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-a9fd1360-8895-4f50-9d1e-3f84b6a88b6a">
+<span class="eqno">(3)<a class="headerlink" href="#equation-a9fd1360-8895-4f50-9d1e-3f84b6a88b6a" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \begin{bmatrix}
 7 &amp; -8 &amp; -\infty \\
 -3 &amp; 2 &amp; -\infty \\
 1 &amp; 6 &amp; -\infty \\
 \end{bmatrix}
 \end{equation}\]</div>
 <p>Taking the softmax of the rows of this matrix then gives:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-b9ec2f62-1be7-4bab-9687-cbd431b5b8db">
-<span class="eqno">(4)<a class="headerlink" href="#equation-b9ec2f62-1be7-4bab-9687-cbd431b5b8db" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-d126aa5f-414d-40bc-b02f-57de984a5a8f">
+<span class="eqno">(4)<a class="headerlink" href="#equation-d126aa5f-414d-40bc-b02f-57de984a5a8f" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \text{Softmax}
 \begin{bmatrix}
 7 &amp; -8 &amp; -\infty \\
@@ -1204,8 +1204,8 @@ <h3><span style="color:LightGreen">Repeating to Match Attention Matrix Shape</sp
 <p><code class="docutils literal notranslate"><span class="pre">attn.shape</span></code> - (Batch x seq_len x seq_len)</p>
 <p><code class="docutils literal notranslate"><span class="pre">mask.shape</span></code> - (Batch x seq_len)</p>
 <p>It is clear that our mask is missing a dimension, and we need to repeat it. Lets take sequence_1 for instance that has a mask of [True, True, True, False]. Because the sequence length here is 4, lets repeat this row 4 times:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-89c53b7f-df9f-4054-b2d2-4c98a0fab720">
-<span class="eqno">(5)<a class="headerlink" href="#equation-89c53b7f-df9f-4054-b2d2-4c98a0fab720" title="Permalink to this equation">#</a></span>\[\begin{bmatrix}
+<div class="amsmath math notranslate nohighlight" id="equation-54b2dc89-f68c-4179-8b44-2919a01b8f14">
+<span class="eqno">(5)<a class="headerlink" href="#equation-54b2dc89-f68c-4179-8b44-2919a01b8f14" title="Permalink to this equation">#</a></span>\[\begin{bmatrix}
 \textrm{True} &amp; \textrm{True} &amp; \textrm{True} &amp; \textrm{False} \\
 \textrm{True} &amp; \textrm{True} &amp; \textrm{True} &amp; \textrm{False} \\
 \textrm{True} &amp; \textrm{True} &amp; \textrm{True} &amp; \textrm{False} \\
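
The shape fix described in the hunk above, repeating a (Batch x seq_len) mask so it lines up with a (Batch x seq_len x seq_len) attention matrix, can be sketched with nested lists. This is only an illustration in plain Python under the hunk's convention that `True` marks a real token and `False` marks padding; `expand_mask` and `expand_batch` are hypothetical names, not functions from the lecture, and a tensor library would normally do this with a broadcast or repeat instead.

```python
def expand_mask(mask):
    """Repeat a length-seq_len mask seq_len times: (seq_len,) -> (seq_len, seq_len)."""
    seq_len = len(mask)
    return [list(mask) for _ in range(seq_len)]

def expand_batch(masks):
    """Apply expand_mask per sequence: (batch, seq_len) -> (batch, seq_len, seq_len)."""
    return [expand_mask(m) for m in masks]

# sequence_1 from the text: three real tokens followed by one padding token.
mask = [True, True, True, False]
mask_2d = expand_mask(mask)

for row in mask_2d:
    print(row)
```

Each of the four rows is a copy of the original mask, so the `False` entries line up with the padded key column in every query row.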
@@ -1465,8 +1465,8 @@ <h3><span style="color:LightGreen">Enforcing Causality</span><a class="headerlin
 <section id="span-style-color-lightgreen-computing-the-reweighted-causal-attention-mask-span">
 <h3><span style="color:LightGreen">Computing the Reweighted Causal Attention Mask</span><a class="headerlink" href="#span-style-color-lightgreen-computing-the-reweighted-causal-attention-mask-span" title="Permalink to this heading">#</a></h3>
 <p>Lets pretend the raw outputs of <span class="math notranslate nohighlight">\(QK^T\)</span>, before the softmax, is below:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-f1bd3b36-d087-4430-b415-3f77ded224cc">
-<span class="eqno">(6)<a class="headerlink" href="#equation-f1bd3b36-d087-4430-b415-3f77ded224cc" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-b5400ed0-b403-46c3-9114-f1a39648e888">
+<span class="eqno">(6)<a class="headerlink" href="#equation-b5400ed0-b403-46c3-9114-f1a39648e888" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \begin{bmatrix}
 7 &amp; -8 &amp; 6 \\
 -3 &amp; 2 &amp; 4 \\
@@ -1477,8 +1477,8 @@ <h3><span style="color:LightGreen">Computing the Reweighted Causal Attention Mas
 <div class="math notranslate nohighlight">
 \[\text{Softmax}(\vec{x}) = \frac{e^{x_i}}{\sum_{j=1}^N{e^{x_j}}}\]</div>
 <p>Then, we can compute softmax for row of the matrix above:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-dc2631a9-ae11-4fde-a276-e4464548a039">
-<span class="eqno">(7)<a class="headerlink" href="#equation-dc2631a9-ae11-4fde-a276-e4464548a039" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-3117d899-b5d1-4fc0-a13a-3431bb40865e">
+<span class="eqno">(7)<a class="headerlink" href="#equation-3117d899-b5d1-4fc0-a13a-3431bb40865e" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \text{Softmax}
 \begin{bmatrix}
 7 &amp; -8 &amp; 6 \\
@@ -1517,17 +1517,17 @@ <h3><span style="color:LightGreen">Computing the Reweighted Causal Attention Mas
 \text{Softmax}(x_2) = [\frac{e^{-3}}{e^{-3}+e^{2}+0}, \frac{e^{2}}{e^{-3}+e^{2}+0}, \frac{0}{e^{-3}+e^{2}+0}] = [\frac{e^{-3}}{e^{-3}+e^{2}+0}, \frac{e^{2}}{e^{-3}+e^{2}+0}, \frac{0}{e^{-3}+e^{2}+0}] = [0.0067, 0.9933, 0.0000]
 \]</div>
 <p>So we have exactly what we want! The attention weight of the last value is set to 0, so when we are on the second vector <span class="math notranslate nohighlight">\(x_2\)</span>, we cannot look forward to the future value vectors <span class="math notranslate nohighlight">\(v_3\)</span>, and the remaining parts add up to 1 so its still a probability vector! To do this correctly for the entire matrix, we can just substitute in the top triangle of <span class="math notranslate nohighlight">\(QK^T\)</span> with <span class="math notranslate nohighlight">\(-\infty\)</span>. This would look like:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-5d8bf146-85f6-4458-94f9-7093ed348a84">
-<span class="eqno">(8)<a class="headerlink" href="#equation-5d8bf146-85f6-4458-94f9-7093ed348a84" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-14e79be7-3542-4c18-8ff9-4168ea636de1">
+<span class="eqno">(8)<a class="headerlink" href="#equation-14e79be7-3542-4c18-8ff9-4168ea636de1" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \begin{bmatrix}
 7 &amp; -\infty &amp; -\infty \\
 -3 &amp; 2 &amp; -\infty \\
 1 &amp; 6 &amp; -2 \\
 \end{bmatrix}
 \end{equation}\]</div>
 <p>Taking the softmax of the rows of this matrix then gives:</p>
-<div class="amsmath math notranslate nohighlight" id="equation-1fa44a22-79a9-4d02-bd59-a1e079520661">
-<span class="eqno">(9)<a class="headerlink" href="#equation-1fa44a22-79a9-4d02-bd59-a1e079520661" title="Permalink to this equation">#</a></span>\[\begin{equation}
+<div class="amsmath math notranslate nohighlight" id="equation-4a3cd56a-86ea-4c3d-b9ea-34cc4f401018">
+<span class="eqno">(9)<a class="headerlink" href="#equation-4a3cd56a-86ea-4c3d-b9ea-34cc4f401018" title="Permalink to this equation">#</a></span>\[\begin{equation}
 \text{Softmax}
 \begin{bmatrix}
 7 &amp; -\infty &amp; -\infty \\
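
The causal mask described in the hunk above replaces the upper triangle of the raw scores with -inf, so that query i only attends to keys j <= i. The following is a rough sketch in plain Python that reproduces the row quoted in the context lines, [0.0067, 0.9933, 0.0000]. The names `softmax` and `causal_mask` are hypothetical and this is not claimed to be the lecture's implementation.

```python
import math

NEG_INF = float("-inf")

def softmax(row):
    """Row-wise softmax; entries equal to -inf contribute exp(-inf) = 0.0."""
    m = max(row)  # finite here, because the diagonal is never masked
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def causal_mask(scores):
    """Keep entries with j <= i and set the strict upper triangle to -inf."""
    return [[s if j <= i else NEG_INF for j, s in enumerate(row)]
            for i, row in enumerate(scores)]

# Raw QK^T scores from the example in the diff.
scores = [[7, -8, 6],
          [-3, 2, 4],
          [1, 6, -2]]

weights = [softmax(row) for row in causal_mask(scores)]
for row in weights:
    print([round(w, 4) for w in row])
```

The first row collapses to all of its weight on the first token, the second row matches the hand computation in the context lines, and the last row is an ordinary softmax because nothing in it is masked.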

_sources/lectures/PhysicsInformedNeuralNetworks.html

Lines changed: 1 addition & 1 deletion
@@ -819,7 +819,7 @@ <h2><span style="color:Orange">Acknowledgments</span><a class="headerlink" href=
 <li><p>Initial version: Mark Neubauer</p></li>
 <li><p>1D harmonic oscillator example is based on the blog post <a class="reference external" href="https://benmoseley.blog/my-research/so-what-is-a-physics-informed-neural-network/">“So, what is a physics-informed neural network?”</a>. This problem was inspired by the following blog post: <a class="reference external" href="https://beltoforion.de/en/harmonic_oscillator/">https://beltoforion.de/en/harmonic_oscillator/</a>.</p></li>
 </ul>
-<p>© Copyright 2025</p>
+<p>© Copyright 2026</p>
 </section>
 </section>

searchindex.js

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.
