Commit 0f343b4

jeremymanning and claude committed
Fix slide 13 (open-weight debate): correct Behemoth delay reason, add MoE details and citations
Behemoth delay was due to performance concerns, not safety concerns. Added active vs total parameter counts, MoE expert counts, and source links.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 5655e71 commit 0f343b4

3 files changed

Lines changed: 58 additions & 51 deletions

slides/week9/lecture26.html

Lines changed: 31 additions & 26 deletions
Original file line number | Diff line number | Diff line change
@@ -709,6 +709,7 @@ <h1 id="learning-objectives">Learning objectives</h1>
709709
<li>Analyze real data on <strong>AI and employment</strong>: who is affected, how fast, and what the evidence actually says</li>
710710
<li>Compare diverging <strong>regulatory approaches</strong>: EU enforcement vs. US deregulation</li>
711711
<li>Articulate your own <strong>ethical framework</strong> for navigating the AI era</li>
712+
<li>Think about <strong>what's next</strong> for LLMs: the scaling wall, local models, and multimodality</li>
712713
</ol>
713714
</div>
714715
</section>
@@ -741,7 +742,7 @@ <h1 id="alignment-faking">Alignment faking</h1>
741742
</foreignObject></svg><svg data-marpit-svg="" viewBox="0 0 1280 720"><foreignObject width="1280" height="720"><section id="5" data-class="scale-80" data-theme="cdl-theme" lang="C" class="scale-80" style="--class:scale-80;--theme:cdl-theme;" data-transition-back="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}" data-transition="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}">
742743
<h1 id="reward-tampering">Reward tampering</h1>
743744
<div class="definition-box" data-title="What happened">
744-
<p><a href="https://www.anthropic.com/research/reward-tampering">Anthropic's reward tampering research</a> (<a href="https://www.anthropic.com/research/reward-tampering">Denison et al., 2024</a>) revealed that training models to be sycophantic (agreeable) produced unexpected <strong>emergent dangerous behaviors</strong>:</p>
745+
<p>Anthropic's reward tampering research (<a href="https://www.anthropic.com/research/reward-tampering">Denison et al., 2024</a>) revealed that training models to be sycophantic (agreeable) produced unexpected <strong>emergent dangerous behaviors</strong>:</p>
745746
</div>
746747
<div class="warning-box" data-title="What emerged without explicit training">
747748
<p>Models trained to agree with users spontaneously learned to:</p>
@@ -768,13 +769,13 @@ <h1 id="from-deception-to-detection">From deception to detection</h1>
768769
</section>
769770
</foreignObject></svg><svg data-marpit-svg="" viewBox="0 0 1280 720"><foreignObject width="1280" height="720"><section id="7" data-class="scale-65" data-theme="cdl-theme" lang="C" class="scale-65" style="--class:scale-65;--theme:cdl-theme;" data-transition-back="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}" data-transition="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}">
770771
<h1 id="mechanistic-interpretability-seeing-inside-the-black-box">Mechanistic interpretability: seeing inside the black box</h1>
771-
<div class="definition-box" data-title="From neurons to features to circuits">
772+
<div class="note-box" data-title="From neurons to features to circuits">
772773
<p>The core problem: individual neurons in LLMs respond to many unrelated concepts (<strong>polysemanticity</strong>), making them uninterpretable. The breakthrough: decompose activations into sparse, meaningful <strong>features</strong>.</p>
773774
</div>
774775
<div class="definition-box" data-title="Monosemanticity">
775776
<p>A <strong>monosemantic</strong> feature responds to exactly one concept (e.g., &quot;Golden Gate Bridge&quot; or &quot;deception&quot;). Sparse autoencoders (SAEs) can decompose polysemantic neurons into monosemantic features — giving us interpretable building blocks for understanding what a model represents internally (<a href="https://transformer-circuits.pub/2023/monosemantic-features/">Bricken et al., 2023</a>).</p>
776777
</div>
777-
<div class="note-box" data-title="The interpretability timeline">
778+
<div class="example-box" data-title="The interpretability timeline">
778779
<table>
779780
<thead>
780781
<tr>
@@ -802,14 +803,14 @@ <h1 id="mechanistic-interpretability-seeing-inside-the-black-box">Mechanistic in
802803
</tbody>
803804
</table>
804805
</div>
805-
<div class="important-box" data-title="What this means">
806+
<div class="tip-box" data-title="What this means">
806807
<p>We can now trace <em>why</em> a model produces a specific output — which features activated, how they connected, and what computation they performed. This is like going from knowing a brain region is &quot;active&quot; to tracing the actual neural circuit.</p>
807808
</div>
808809
</section>
809810
</foreignObject></svg><svg data-marpit-svg="" viewBox="0 0 1280 720"><foreignObject width="1280" height="720"><section id="8" data-class="scale-90" data-theme="cdl-theme" lang="C" class="scale-90" style="--class:scale-90;--theme:cdl-theme;" data-transition-back="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}" data-transition="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}">
810811
<h1 id="circuit-tracing-attribution-graphs">Circuit tracing: attribution graphs</h1>
811812
<div class="definition-box" data-title="How it works">
812-
<p><a href="https://transformer-circuits.pub/2025/attribution-graphs/methods.html">Circuit tracing</a> (<a href="https://transformer-circuits.pub/2025/attribution-graphs/methods.html">Lindsey et al., 2025</a>) introduced <strong>Cross-Layer Transcoders (CLTs)</strong> — a new architecture that creates an interpretable replacement model, producing <strong>attribution graphs</strong>:</p>
813+
<p>Circuit tracing (<a href="https://transformer-circuits.pub/2025/attribution-graphs/methods.html">Lindsey et al., 2025</a>) introduced <strong>Cross-Layer Transcoders (CLTs)</strong> — a new architecture that creates an interpretable replacement model, producing <strong>attribution graphs</strong>.</p>
813814
</div>
814815
<p><img src="figs/attribution-graph.svg" alt="Attribution graph" /></p>
815816
<div class="tip-box" data-title="Try it yourself">
@@ -826,7 +827,7 @@ <h1 id="copyright-the-15-billion-question">Copyright: the $1.5 billion question<
826827
<thead>
827828
<tr>
828829
<th>Case</th>
829-
<th>Status (early 2026)</th>
830+
<th>Status (as of March 2026)</th>
830831
<th>Key issue</th>
831832
</tr>
832833
</thead>
@@ -857,27 +858,27 @@ <h1 id="copyright-the-15-billion-question">Copyright: the $1.5 billion question<
857858
</div>
858859
</section>
859860
</foreignObject></svg><svg data-marpit-svg="" viewBox="0 0 1280 720"><foreignObject width="1280" height="720"><section id="10" data-class="scale-80" data-theme="cdl-theme" lang="C" class="scale-80" style="--class:scale-80;--theme:cdl-theme;" data-transition-back="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}" data-transition="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}">
860-
<h1 id="ai-and-employment-what-the-data-actually-says">AI and employment: what the data actually says</h1>
861+
<h1 id="ai-and-employment-what-the-data-actually-say">AI and employment: what the data actually say</h1>
861862
<div class="note-box" data-title="The macro picture">
862863
<p>&quot;The broader labor market has <strong>not experienced a discernible disruption</strong> since ChatGPT's release.&quot; Fewer than 10% of US firms use AI regularly as of mid-2025. (<a href="https://budgetlab.yale.edu/research/ai-and-macroeconomy-what-economics-literature-can-tell-us">Yale Budget Lab, 2025</a>)</p>
863864
</div>
864865
<div class="warning-box" data-title="But look closer at white-collar work">
865866
<ul>
866867
<li><a href="https://www.challengergray.com/blog/2025-year-end-challenger-report-highest-q4-layoffs-since-2008-lowest-ytd-hiring-since-2010/"><strong>~55,000 US layoffs</strong></a> directly attributed to AI in 2025 (<a href="https://www.challengergray.com/blog/2025-year-end-challenger-report-highest-q4-layoffs-since-2008-lowest-ytd-hiring-since-2010/">Challenger, Gray &amp; Christmas, 2025</a>)</li>
867-
<li>US employers announced <strong>696,309 total job cuts</strong> in the first 5 months of 2025 — up 80% year-over-year</li>
868+
<li>US employers announced <a href="https://www.investopedia.com/job-cuts-reach-highest-levels-since-pandemic-11749055"><strong>696,309 total job cuts</strong></a> in the first 5 months of 2025 — up 80% year-over-year</li>
868869
<li><strong>79% of employed US women</strong> work in high-automation-risk jobs vs. 58% of men (<a href="https://kenaninstitute.unc.edu/kenan-insight/will-generative-ai-disproportionately-affect-the-jobs-of-women/">Kenan Institute, 2023</a>)</li>
869-
<li>McKinsey laid off 200 tech employees; uses AI agents for junior consultant tasks</li>
870-
<li>Salesforce cut 4,000 customer support roles</li>
870+
<li>McKinsey laid off <a href="https://financialpost.com/fp-work/mckinsey-thousands-layoffs-consulting-slowdown">200 tech employees</a>; uses AI agents for junior consultant tasks</li>
871+
<li>Salesforce cut <a href="https://www.cnbc.com/2025/09/02/salesforce-ceo-confirms-4000-layoffs-because-i-need-less-heads-with-ai.html">4,000 customer support roles</a></li>
871872
</ul>
872873
</div>
873874
<div class="important-box" data-title="The key finding">
874-
<p><a href="https://hbr.org/2026/01/companies-are-laying-off-workers-because-of-ais-potential-not-its-performance">Harvard Business Review</a> found companies are laying off workers based on AI's <strong><em>anticipated</em> future performance</strong>, not current displacement — a forward-looking disruption pattern unlike prior automation waves (<a href="https://hbr.org/2026/01/companies-are-laying-off-workers-because-of-ais-potential-not-its-performance">HBR, January 2026</a>).</p>
875+
<p>Harvard Business Review found that companies are laying off workers based on AI's <strong><em>anticipated</em> future performance</strong>, not current displacement — a forward-looking disruption pattern unlike prior automation waves (<a href="https://hbr.org/2026/01/companies-are-laying-off-workers-because-of-ais-potential-not-its-performance">HBR, January 2026</a>).</p>
875876
</div>
876877
</section>
877878
</foreignObject></svg><svg data-marpit-svg="" viewBox="0 0 1280 720"><foreignObject width="1280" height="720"><section id="11" data-class="scale-80" data-theme="cdl-theme" lang="C" class="scale-80" style="--class:scale-80;--theme:cdl-theme;" data-transition-back="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}" data-transition="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}">
878879
<h1 id="the-regulatory-divergence">The regulatory divergence</h1>
879-
<div class="note-box" data-title="Two philosophies, one technology">
880-
<p>The EU and US have taken <strong>opposite approaches</strong> to AI regulation:</p>
880+
<div class="tip-box" data-title="Two philosophies, one technology">
881+
<p>The EU and US have taken <strong>opposite approaches</strong> to AI regulation.</p>
881882
</div>
882883
<div class="note-box" data-title="European Union: regulate first">
883884
<p><a href="https://artificialintelligenceact.eu/"><strong>EU AI Act</strong></a> — risk-based framework, first provisions active since February 2, 2025:</p>
@@ -887,12 +888,11 @@ <h1 id="the-regulatory-divergence">The regulatory divergence</h1>
887888
<li><strong>Status</strong>: Rules are live but no enforcement actions yet (as of Feb 2026). Finland became the first active national enforcer (Jan 2026). Full high-risk compliance required by August 2026.</li>
888889
</ul>
889890
</div>
890-
<div class="note-box" data-title="United States: deregulate and compete">
891+
<div class="important-box" data-title="United States: deregulate and compete">
891892
<p><strong>Trump administration</strong> (January 2025–present):</p>
892893
<ul>
893-
<li><strong>Day 1</strong>: Revoked Biden's October 2023 AI safety executive order</li>
894-
<li><strong>January 2025</strong>: Signed EO 14179 — &quot;Removing Barriers to American Leadership in AI&quot;</li>
895-
<li><strong>December 2025</strong>: Signed EO seeking <strong>federal preemption of state AI laws</strong> — states with &quot;onerous AI laws&quot; lose federal broadband funding</li>
894+
<li><strong>Day 1</strong>: Revoked Biden's October 2023 AI safety executive order by signing an <a href="https://www.whitehouse.gov/presidential-actions/2025/01/removing-barriers-to-american-leadership-in-artificial-intelligence/">EO</a> &quot;Removing Barriers to American Leadership in AI&quot;</li>
895+
<li><strong>December 2025</strong>: Signed <a href="https://www.whitehouse.gov/presidential-actions/2025/12/eliminating-state-law-obstruction-of-national-artificial-intelligence-policy/">EO</a> seeking <strong>federal preemption of state AI laws</strong> — states with &quot;onerous AI laws&quot; lose federal broadband funding</li>
896896
<li>No comprehensive federal AI legislation has passed Congress</li>
897897
</ul>
898898
</div>
@@ -904,7 +904,7 @@ <h1 id="deepfakes-and-elections-the-2024-test">Deepfakes and elections: the 2024
904904
<ul>
905905
<li><a href="https://www.fcc.gov/document/fcc-issues-6m-fine-nh-robocalls"><strong>AI robocalls</strong></a> impersonated Biden urging NH voters not to vote (creator fined <strong>$6M</strong>, criminally indicted)</li>
906906
<li><a href="https://blogs.microsoft.com/on-the-issues/2024/10/23/as-the-u-s-election-nears-russia-iran-and-china-step-up-influence-efforts/"><strong>Storm-1516</strong></a> network created deepfake videos of candidates — one shared by Elon Musk</li>
907-
<li><strong>India</strong>: Celebrity deepfakes criticizing Modi went viral on WhatsApp</li>
907+
<li><strong>India</strong>: <a href="https://www.reuters.com/world/india/deepfakes-bollywood-stars-spark-worries-ai-meddling-india-election-2024-04-22/">Celebrity deepfakes criticizing Modi</a> went viral on WhatsApp</li>
908908
<li><strong>Germany</strong>: <a href="https://www.isdglobal.org/digital-dispatch/coordinated-disinformation-network-uses-ai-media-impersonation-to-target-german-election/">100+ AI-powered websites</a> distributing deepfakes ahead of elections</li>
909909
</ul>
910910
</div>
@@ -915,42 +915,47 @@ <h1 id="deepfakes-and-elections-the-2024-test">Deepfakes and elections: the 2024
915915
<p>The risk moved from individual deepfakes to <strong>AI-powered disinformation infrastructure</strong> — chatbots, automated accounts, and poisoned information ecosystems that operate continuously.</p>
916916
</div>
917917
</section>
918-
</foreignObject></svg><svg data-marpit-svg="" viewBox="0 0 1280 720"><foreignObject width="1280" height="720"><section id="13" data-class="scale-78" data-theme="cdl-theme" lang="C" class="scale-78" style="--class:scale-78;--theme:cdl-theme;" data-transition-back="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}" data-transition="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}">
918+
</foreignObject></svg><svg data-marpit-svg="" viewBox="0 0 1280 720"><foreignObject width="1280" height="720"><section id="13" data-class="scale-80" data-theme="cdl-theme" lang="C" class="scale-80" style="--class:scale-80;--theme:cdl-theme;" data-transition-back="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}" data-transition="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}">
919919
<h1 id="the-open-weight-debate">The open-weight debate</h1>
920920
<div class="note-box" data-title="Meta's evolving position">
921921
<p>Meta has been the loudest champion of open-weight AI models. But with Llama 4:</p>
922922
<table>
923923
<thead>
924924
<tr>
925925
<th>Model</th>
926-
<th>Parameters</th>
926+
<th>Total params</th>
927+
<th>Active params</th>
927928
<th>Status</th>
928929
</tr>
929930
</thead>
930931
<tbody>
931932
<tr>
932933
<td>Scout</td>
933-
<td>109B</td>
934+
<td>109B (16 experts)</td>
935+
<td>17B</td>
934936
<td>Released (open-weight)</td>
935937
</tr>
936938
<tr>
937939
<td>Maverick</td>
938-
<td>400B</td>
940+
<td>~400B (128 experts)</td>
941+
<td>17B</td>
939942
<td>Released (open-weight)</td>
940943
</tr>
941944
<tr>
942945
<td><strong>Behemoth</strong></td>
943-
<td><strong>~2 trillion</strong></td>
944-
<td><strong>May be withheld</strong> — citing &quot;novel safety concerns&quot;</td>
946+
<td><strong>~2T (16 experts)</strong></td>
947+
<td><strong>288B</strong></td>
948+
<td><strong>Delayed</strong> (<a href="https://fortune.com/2025/05/16/why-meta-reportedly-delayed-its-behemoth-ai-model-rollout/">performance concerns</a>)</td>
945949
</tr>
946950
</tbody>
947951
</table>
952+
<p>All three use mixture-of-experts (MoE) architecture (<a href="https://ai.meta.com/blog/llama-4-multimodal-intelligence/">Meta, 2025</a>).</p>
948953
</div>
949954
<div class="important-box" data-title="The capability threshold question">
950-
<p>Meta's potential decision to withhold Behemoth represents a critical acknowledgment: <strong>there may be a capability level above which open release is irresponsible</strong>. Safety fine-tuning on open-weight models can be stripped with modest compute. Once released, weights cannot be recalled.</p>
955+
<p>Behemoth's delay was driven by <a href="https://www.computerworld.com/article/3987990/meta-hits-pause-on-llama-4-behemoth-ai-model-amid-capability-concerns.html">benchmark performance falling short of internal targets</a>, not safety concerns. But the broader question remains: <strong>is there a capability level above which open release is irresponsible?</strong> Safety fine-tuning on open-weight models can be stripped with modest compute. Once released, weights cannot be recalled.</p>
951956
</div>
952957
<div class="tip-box" data-title="The tension">
953-
<p>Advocates of openness emphasize auditability, democratization, and preventing power concentration. Critics note that the DeepSeek-R1 distillation experiment showed a $300K RL run can produce frontier reasoning from open-weight base models. Is openness still net positive at the frontier?</p>
958+
<p>Advocates of openness emphasize auditability, democratization, and preventing power concentration. Critics note that the <a href="https://arxiv.org/abs/2501.12948">DeepSeek-R1</a> distillation experiment showed a $300K RL run can produce frontier reasoning from open-weight base models. Is openness still net positive at the frontier?</p>
954959
</div>
955960
</section>
956961
</foreignObject></svg><svg data-marpit-svg="" viewBox="0 0 1280 720"><foreignObject width="1280" height="720"><section id="14" data-class="scale-80" data-theme="cdl-theme" lang="C" class="scale-80" style="--class:scale-80;--theme:cdl-theme;" data-transition-back="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}" data-transition="{&quot;name&quot;:&quot;fade&quot;,&quot;duration&quot;:&quot;0.25s&quot;,&quot;builtinFallback&quot;:true}">