ContextLab
diff --git a/slides/week9/lecture26.html b/slides/week9/lecture26.html
@@ -709,6 +709,7 @@ <h1 id="learning-objectives">Learning objectives</h1>
 <li>Analyze real data on <strong>AI and employment</strong>: who is affected, how fast, and what the evidence actually says</li>
 <li>Compare diverging <strong>regulatory approaches</strong>: EU enforcement vs. US deregulation</li>
 <li>Articulate your own <strong>ethical framework</strong> for navigating the AI era</li>
+<li>Think about <strong>what's next</strong> for LLMs: the scaling wall, local models, and multimodality</li>
 </ol>
 </div>
 </section>
@@ -741,7 +742,7 @@ <h1 id="alignment-faking">Alignment faking</h1>
 </foreignObject></svg><svg data-marpit-svg="" viewBox="0 0 1280 720"><foreignObject width="1280" height="720"><section id="5" data-class="scale-80" data-theme="cdl-theme" lang="C" class="scale-80" style="--class:scale-80;--theme:cdl-theme;" data-transition-back="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}" data-transition="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}">
 <h1 id="reward-tampering">Reward tampering</h1>
 <div class="definition-box" data-title="What happened">
-<p><a href="https://www.anthropic.com/research/reward-tampering">Anthropic's reward tampering research</a> (<a href="https://www.anthropic.com/research/reward-tampering">Denison et al., 2024</a>) revealed that training models to be sycophantic (agreeable) produced unexpected <strong>emergent dangerous behaviors</strong>:</p>
+<p>Anthropic's reward tampering research (<a href="https://www.anthropic.com/research/reward-tampering">Denison et al., 2024</a>) revealed that training models to be sycophantic (agreeable) produced unexpected <strong>emergent dangerous behaviors</strong>:</p>
 </div>
 <div class="warning-box" data-title="What emerged without explicit training">
 <p>Models trained to agree with users spontaneously learned to:</p>
@@ -768,13 +769,13 @@ <h1 id="from-deception-to-detection">From deception to detection</h1>
 </section>
 </foreignObject></svg><svg data-marpit-svg="" viewBox="0 0 1280 720"><foreignObject width="1280" height="720"><section id="7" data-class="scale-65" data-theme="cdl-theme" lang="C" class="scale-65" style="--class:scale-65;--theme:cdl-theme;" data-transition-back="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}" data-transition="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}">
 <h1 id="mechanistic-interpretability-seeing-inside-the-black-box">Mechanistic interpretability: seeing inside the black box</h1>
-<div class="definition-box" data-title="From neurons to features to circuits">
+<div class="note-box" data-title="From neurons to features to circuits">
 <p>The core problem: individual neurons in LLMs respond to many unrelated concepts (<strong>polysemanticity</strong>), making them uninterpretable. The breakthrough: decompose activations into sparse, meaningful <strong>features</strong>.</p>
 </div>
 <div class="definition-box" data-title="Monosemanticity">
 <p>A <strong>monosemantic</strong> feature responds to exactly one concept (e.g., &quot;Golden Gate Bridge&quot; or &quot;deception&quot;). Sparse autoencoders (SAEs) can decompose polysemantic neurons into monosemantic features — giving us interpretable building blocks for understanding what a model represents internally (<a href="https://transformer-circuits.pub/2023/monosemantic-features/">Bricken et al., 2023</a>).</p>
 </div>
-<div class="note-box" data-title="The interpretability timeline">
+<div class="example-box" data-title="The interpretability timeline">
 <table>
 <thead>
 <tr>
@@ -802,14 +803,14 @@ <h1 id="mechanistic-interpretability-seeing-inside-the-black-box">Mechanistic in
 </tbody>
 </table>
 </div>
-<div class="important-box" data-title="What this means">
+<div class="tip-box" data-title="What this means">
 <p>We can now trace <em>why</em> a model produces a specific output — which features activated, how they connected, and what computation they performed. This is like going from knowing a brain region is &quot;active&quot; to tracing the actual neural circuit.</p>
 </div>
 </section>
 </foreignObject></svg><svg data-marpit-svg="" viewBox="0 0 1280 720"><foreignObject width="1280" height="720"><section id="8" data-class="scale-90" data-theme="cdl-theme" lang="C" class="scale-90" style="--class:scale-90;--theme:cdl-theme;" data-transition-back="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}" data-transition="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}">
 <h1 id="circuit-tracing-attribution-graphs">Circuit tracing: attribution graphs</h1>
 <div class="definition-box" data-title="How it works">
-<p><a href="https://transformer-circuits.pub/2025/attribution-graphs/methods.html">Circuit tracing</a> (<a href="https://transformer-circuits.pub/2025/attribution-graphs/methods.html">Lindsey et al., 2025</a>) introduced <strong>Cross-Layer Transcoders (CLTs)</strong> — a new architecture that creates an interpretable replacement model, producing <strong>attribution graphs</strong>:</p>
+<p>Circuit tracing (<a href="https://transformer-circuits.pub/2025/attribution-graphs/methods.html">Lindsey et al., 2025</a>) introduced <strong>Cross-Layer Transcoders (CLTs)</strong> — a new architecture that creates an interpretable replacement model, producing <strong>attribution graphs</strong>.</p>
 </div>
 <p><img src="figs/attribution-graph.svg" alt="Attribution graph" /></p>
 <div class="tip-box" data-title="Try it yourself">
@@ -826,7 +827,7 @@ <h1 id="copyright-the-15-billion-question">Copyright: the $1.5 billion question<
 <thead>
 <tr>
 <th>Case</th>
-<th>Status (early 2026)</th>
+<th>Status (as of March 2026)</th>
 <th>Key issue</th>
 </tr>
 </thead>
@@ -857,27 +858,27 @@ <h1 id="copyright-the-15-billion-question">Copyright: the $1.5 billion question<
 </div>
 </section>
 </foreignObject></svg><svg data-marpit-svg="" viewBox="0 0 1280 720"><foreignObject width="1280" height="720"><section id="10" data-class="scale-80" data-theme="cdl-theme" lang="C" class="scale-80" style="--class:scale-80;--theme:cdl-theme;" data-transition-back="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}" data-transition="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}">
-<h1 id="ai-and-employment-what-the-data-actually-says">AI and employment: what the data actually says</h1>
+<h1 id="ai-and-employment-what-the-data-actually-say">AI and employment: what the data actually say</h1>
 <div class="note-box" data-title="The macro picture">
 <p>&quot;The broader labor market has <strong>not experienced a discernible disruption</strong> since ChatGPT's release.&quot; Fewer than 10% of US firms use AI regularly as of mid-2025. (<a href="https://budgetlab.yale.edu/research/ai-and-macroeconomy-what-economics-literature-can-tell-us">Yale Budget Lab, 2025</a>)</p>
 </div>
 <div class="warning-box" data-title="But look closer at white-collar work">
 <ul>
 <li><a href="https://www.challengergray.com/blog/2025-year-end-challenger-report-highest-q4-layoffs-since-2008-lowest-ytd-hiring-since-2010/"><strong>~55,000 US layoffs</strong></a> directly attributed to AI in 2025 (<a href="https://www.challengergray.com/blog/2025-year-end-challenger-report-highest-q4-layoffs-since-2008-lowest-ytd-hiring-since-2010/">Challenger, Gray &amp; Christmas, 2025</a>)</li>
-<li>US employers announced <strong>696,309 total job cuts</strong> in the first 5 months of 2025 — up 80% year-over-year</li>
+<li>US employers announced <a href="https://www.investopedia.com/job-cuts-reach-highest-levels-since-pandemic-11749055"><strong>696,309 total job cuts</strong></a> in the first 5 months of 2025 — up 80% year-over-year</li>
 <li><strong>79% of employed US women</strong> work in high-automation-risk jobs vs. 58% of men (<a href="https://kenaninstitute.unc.edu/kenan-insight/will-generative-ai-disproportionately-affect-the-jobs-of-women/">Kenan Institute, 2023</a>)</li>
-<li>McKinsey laid off 200 tech employees; uses AI agents for junior consultant tasks</li>
-<li>Salesforce cut 4,000 customer support roles</li>
+<li>McKinsey laid off <a href="https://financialpost.com/fp-work/mckinsey-thousands-layoffs-consulting-slowdown">200 tech employees</a>; uses AI agents for junior consultant tasks</li>
+<li>Salesforce cut <a href="https://www.cnbc.com/2025/09/02/salesforce-ceo-confirms-4000-layoffs-because-i-need-less-heads-with-ai.html">4,000 customer support roles</a></li>
 </ul>
 </div>
 <div class="important-box" data-title="The key finding">
-<p><a href="https://hbr.org/2026/01/companies-are-laying-off-workers-because-of-ais-potential-not-its-performance">Harvard Business Review</a> found companies are laying off workers based on AI's <strong><em>anticipated</em> future performance</strong>, not current displacement — a forward-looking disruption pattern unlike prior automation waves (<a href="https://hbr.org/2026/01/companies-are-laying-off-workers-because-of-ais-potential-not-its-performance">HBR, January 2026</a>).</p>
+<p>Harvard Business Review found that companies are laying off workers based on AI's <strong><em>anticipated</em> future performance</strong>, not current displacement — a forward-looking disruption pattern unlike prior automation waves (<a href="https://hbr.org/2026/01/companies-are-laying-off-workers-because-of-ais-potential-not-its-performance">HBR, January 2026</a>).</p>
 </div>
 </section>
 </foreignObject></svg><svg data-marpit-svg="" viewBox="0 0 1280 720"><foreignObject width="1280" height="720"><section id="11" data-class="scale-80" data-theme="cdl-theme" lang="C" class="scale-80" style="--class:scale-80;--theme:cdl-theme;" data-transition-back="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}" data-transition="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}">
 <h1 id="the-regulatory-divergence">The regulatory divergence</h1>
-<div class="note-box" data-title="Two philosophies, one technology">
-<p>The EU and US have taken <strong>opposite approaches</strong> to AI regulation:</p>
+<div class="tip-box" data-title="Two philosophies, one technology">
+<p>The EU and US have taken <strong>opposite approaches</strong> to AI regulation.</p>
 </div>
 <div class="note-box" data-title="European Union: regulate first">
 <p><a href="https://artificialintelligenceact.eu/"><strong>EU AI Act</strong></a> — risk-based framework, first provisions active since February 2, 2025:</p>
@@ -887,12 +888,11 @@ <h1 id="the-regulatory-divergence">The regulatory divergence</h1>
 <li><strong>Status</strong>: Rules are live but no enforcement actions yet (as of Feb 2026). Finland became the first active national enforcer (Jan 2026). Full high-risk compliance required by August 2026.</li>
 </ul>
 </div>
-<div class="note-box" data-title="United States: deregulate and compete">
+<div class="important-box" data-title="United States: deregulate and compete">
 <p><strong>Trump administration</strong> (January 2025–present):</p>
 <ul>
-<li><strong>Day 1</strong>: Revoked Biden's October 2023 AI safety executive order</li>
-<li><strong>January 2025</strong>: Signed EO 14179 — &quot;Removing Barriers to American Leadership in AI&quot;</li>
-<li><strong>December 2025</strong>: Signed EO seeking <strong>federal preemption of state AI laws</strong> — states with &quot;onerous AI laws&quot; lose federal broadband funding</li>
+<li><strong>Day 1</strong>: Revoked Biden's October 2023 AI safety executive order; days later, signed <a href="https://www.whitehouse.gov/presidential-actions/2025/01/removing-barriers-to-american-leadership-in-artificial-intelligence/">EO 14179</a>, &quot;Removing Barriers to American Leadership in AI&quot;</li>
+<li><strong>December 2025</strong>: Signed <a href="https://www.whitehouse.gov/presidential-actions/2025/12/eliminating-state-law-obstruction-of-national-artificial-intelligence-policy/">EO</a> seeking <strong>federal preemption of state AI laws</strong> — states with &quot;onerous AI laws&quot; lose federal broadband funding</li>
 <li>No comprehensive federal AI legislation has passed Congress</li>
 </ul>
 </div>
@@ -904,7 +904,7 @@ <h1 id="deepfakes-and-elections-the-2024-test">Deepfakes and elections: the 2024
 <ul>
 <li><a href="https://www.fcc.gov/document/fcc-issues-6m-fine-nh-robocalls"><strong>AI robocalls</strong></a> impersonated Biden urging NH voters not to vote (creator fined <strong>$6M</strong>, criminally indicted)</li>
 <li><a href="https://blogs.microsoft.com/on-the-issues/2024/10/23/as-the-u-s-election-nears-russia-iran-and-china-step-up-influence-efforts/"><strong>Storm-1516</strong></a> network created deepfake videos of candidates — one shared by Elon Musk</li>
-<li><strong>India</strong>: Celebrity deepfakes criticizing Modi went viral on WhatsApp</li>
+<li><strong>India</strong>: <a href="https://www.reuters.com/world/india/deepfakes-bollywood-stars-spark-worries-ai-meddling-india-election-2024-04-22/">Celebrity deepfakes criticizing Modi</a> went viral on WhatsApp</li>
 <li><strong>Germany</strong>: <a href="https://www.isdglobal.org/digital-dispatch/coordinated-disinformation-network-uses-ai-media-impersonation-to-target-german-election/">100+ AI-powered websites</a> distributing deepfakes ahead of elections</li>
 </ul>
 </div>
@@ -915,42 +915,47 @@ <h1 id="deepfakes-and-elections-the-2024-test">Deepfakes and elections: the 2024
 <p>The risk moved from individual deepfakes to <strong>AI-powered disinformation infrastructure</strong> — chatbots, automated accounts, and poisoned information ecosystems that operate continuously.</p>
 </div>
 </section>
-</foreignObject></svg><svg data-marpit-svg="" viewBox="0 0 1280 720"><foreignObject width="1280" height="720"><section id="13" data-class="scale-78" data-theme="cdl-theme" lang="C" class="scale-78" style="--class:scale-78;--theme:cdl-theme;" data-transition-back="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}" data-transition="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}">
+</foreignObject></svg><svg data-marpit-svg="" viewBox="0 0 1280 720"><foreignObject width="1280" height="720"><section id="13" data-class="scale-80" data-theme="cdl-theme" lang="C" class="scale-80" style="--class:scale-80;--theme:cdl-theme;" data-transition-back="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}" data-transition="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}">
 <h1 id="the-open-weight-debate">The open-weight debate</h1>
 <div class="note-box" data-title="Meta's evolving position">
 <p>Meta has been the loudest champion of open-weight AI models. But with Llama 4:</p>
 <table>
 <thead>
 <tr>
 <th>Model</th>
-<th>Parameters</th>
+<th>Total params</th>
+<th>Active params</th>
 <th>Status</th>
 </tr>
 </thead>
 <tbody>
 <tr>
 <td>Scout</td>
-<td>109B</td>
+<td>109B (16 experts)</td>
+<td>17B</td>
 <td>Released (open-weight)</td>
 </tr>
 <tr>
 <td>Maverick</td>
-<td>400B</td>
+<td>~400B (128 experts)</td>
+<td>17B</td>
 <td>Released (open-weight)</td>
 </tr>
 <tr>
 <td><strong>Behemoth</strong></td>
-<td><strong>~2 trillion</strong></td>
-<td><strong>May be withheld</strong> — citing &quot;novel safety concerns&quot;</td>
+<td><strong>~2T (16 experts)</strong></td>
+<td><strong>288B</strong></td>
+<td><strong>Delayed</strong> — <a href="https://fortune.com/2025/05/16/why-meta-reportedly-delayed-its-behemoth-ai-model-rollout/">performance concerns</a></td>
 </tr>
 </tbody>
 </table>
+<p>All three use a mixture-of-experts (MoE) architecture (<a href="https://ai.meta.com/blog/llama-4-multimodal-intelligence/">Meta, 2025</a>).</p>
 </div>
 <div class="important-box" data-title="The capability threshold question">
-<p>Meta's potential decision to withhold Behemoth represents a critical acknowledgment: <strong>there may be a capability level above which open release is irresponsible</strong>. Safety fine-tuning on open-weight models can be stripped with modest compute. Once released, weights cannot be recalled.</p>
+<p>Behemoth's delay was driven by <a href="https://www.computerworld.com/article/3987990/meta-hits-pause-on-llama-4-behemoth-ai-model-amid-capability-concerns.html">benchmark performance falling short of internal targets</a>, not safety concerns. But the broader question remains: <strong>is there a capability level above which open release is irresponsible?</strong> Safety fine-tuning on open-weight models can be stripped with modest compute. Once released, weights cannot be recalled.</p>
 </div>
 <div class="tip-box" data-title="The tension">
-<p>Advocates of openness emphasize auditability, democratization, and preventing power concentration. Critics note that the DeepSeek-R1 distillation experiment showed a $300K RL run can produce frontier reasoning from open-weight base models. Is openness still net positive at the frontier?</p>
+<p>Advocates of openness emphasize auditability, democratization, and preventing power concentration. Critics note that the <a href="https://arxiv.org/abs/2501.12948">DeepSeek-R1</a> distillation experiment showed a $300K RL run can produce frontier reasoning from open-weight base models. Is openness still net positive at the frontier?</p>
 </div>
 </section>
 </foreignObject></svg><svg data-marpit-svg="" viewBox="0 0 1280 720"><foreignObject width="1280" height="720"><section id="14" data-class="scale-80" data-theme="cdl-theme" lang="C" class="scale-80" style="--class:scale-80;--theme:cdl-theme;" data-transition-back="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}" data-transition="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}">