Commit f45273b

committed: "RSI"
1 parent dcb2916 commit f45273b

File tree

9 files changed

+188
-768
lines changed


_freeze/posts/2025-09-13-recursive-self-improvement-explosion-optimization/execute-results/html.json

Lines changed: 2 additions & 2 deletions
Large diffs are not rendered by default.

_freeze/posts/2026-03-13-apple-picking-ai/execute-results/html.json

Lines changed: 2 additions & 2 deletions
Large diffs are not rendered by default.

docs/index.xml

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -9,7 +9,7 @@
99
<link>tecunningham.github.io/</link>
1010
<atom:link href="tecunningham.github.io/index.xml" rel="self" type="application/rss+xml"/>
1111
<description>{{&lt; meta description-meta &gt;}}</description>
12-
<generator>quarto-1.5.57</generator>
12+
<generator>quarto-1.8.25</generator>
1313
<lastBuildDate>Sat, 07 Mar 2026 08:00:00 GMT</lastBuildDate>
1414
<item>
1515
<title>When You Overtake More Runners than You’re Overtaken by</title>

docs/posts/2025-09-13-recursive-self-improvement-explosion-optimization.html

Lines changed: 95 additions & 53 deletions
@@ -257,51 +257,47 @@ <h1 class="title">Recursive Self-Improvement Overview</h1>
257257

258258

259259

260-
<p>See also: apple-picking model.</p>
261-
<section id="toy-model-of-ai-stack" class="level1">
262-
<h1>Toy model of AI stack</h1>
263-
<p>See also <code>posts/2026-02-10-model-of-labs.qmd</code></p>
264-
<dl>
265-
<dt>Model A: Chinchilla. You just choose model size.</dt>
266-
<dd>
260+
<section id="summary" class="level1">
261+
<h1>Summary</h1>
262+
<ol type="1">
263+
<li><p><strong>Baseline model:</strong></p>
267264
<ul>
268-
<li>One choice variable: model size (holding compute fixed, this trades-off against training runs).</li>
269-
<li>You can do some low-cost derisks, but extrapolating to a large compute scale is imperfect.</li>
270-
<li><em>Convex but expensive to evaluate.</em></li>
271-
</ul>
272-
</dd>
273-
<dt>Model B: algorithm search.</dt>
274-
<dd>
265+
<li>Frontier training compute is growing at 3X/year.</li>
266+
<li>Algorithmic efficiency is growing at 3X/year.</li>
267+
<li>Frontier model capability is growing at 9X/year (measured in effective compute).</li>
268+
<li>R&amp;D staff is growing at 2X/year.</li>
269+
</ul></li>
270+
<li><p><strong>Two ways we can get RSI:</strong> (1) augmentation of AI R&amp;D; (2) automation of AI R&amp;D.</p></li>
271+
<li><p><strong>AI augmentation of R&amp;D:</strong></p>
275272
<ul>
276-
<li>Each algorithm is a mapping from compute-&gt;loss, and you can summarize it with a scalar efficiency (just like nanoGPT).</li>
277-
<li><em>medium-cost evaluation; non-convex.</em></li>
278-
</ul>
279-
</dd>
280-
<dt>Model C: training data.</dt>
281-
<dd>
273+
<li>Suppose higher levels of model capability <span class="math inline">\(A\)</span> increases the effective R&amp;D labor, <span class="math inline">\(R\)</span>, by some amount, <span class="math inline">\(R=\bar{R}A^\theta\)</span>.</li>
274+
<li>We want to measure how <span class="math inline">\(A\)</span> affects <span class="math inline">\(R\)</span>, e.g.&nbsp;by running an uplift study with different levels of <span class="math inline">\(A\)</span>.</li>
275+
<li>We can then plug this into a standard R&amp;D model to estimate the net degree of acceleration, with <span class="math inline">\(\dot{A}=R^\gamma A^{1-\beta}\)</span>.</li>
276+
<li>However, it’s hard to estimate the relationship between model capability and R&amp;D worker speedup: people are poor at introspecting on uplift, and it’s difficult to run experiments.</li>
277+
<li>Notes:
282278
<ul>
283-
<li>There’s a distribution of tasks, with probability f(t).</li>
284-
<li>An LLM is just a lookup table, you slowly expand coverage, collecting a larger fraction of tasks t.</li>
285-
</ul>
286-
</dd>
287-
<dt>Model D: layer cake.</dt>
288-
<dd>
279+
<li>The augmentation could take place through automation of subtasks, where the subtasks are complements. E.g. if they’re perfect complements then the acceleration will be <span class="math inline">\(\frac{1}{1-f}\)</span>, where <span class="math inline">\(f\)</span> is the share of tasks you’ve automated.</li>
280+
<li>Any <span class="math inline">\(\theta&gt;0\)</span> will make growth super-exponential, AKA hyperbolic, but the magnitude matters.</li>
281+
<li>If <span class="math inline">\(\theta\)</span> is sufficiently high (<span class="math inline">\(\gamma\theta&gt;\beta\)</span>) then there will be self-sustaining growth, i.e.&nbsp;growth even if R&amp;D labor were constant.</li>
282+
<li>We ignore expenditure on inference, assuming that it asymptotes relatively quickly (which seems reasonable).</li>
283+
</ul></li>
284+
</ul></li>
285+
<li><p><strong>AI automation of R&amp;D:</strong></p>
289286
<ul>
290-
<li><p>You have a function that turns money-&gt;intelligence, &amp; that function gets more efficient over time, it’s falling at about 8X/year.</p></li>
291-
<li><p>You can separate the optimization problem into layers, they are partially separable:</p>
287+
<li>Suppose instead that model capability <span class="math inline">\(A\)</span> can <em>autonomously</em> do AI R&amp;D work, i.e.&nbsp;it’s a perfect substitute for all human labor.</li>
288+
<li>Then we should expect a rapid adjustment in wages and model prices until they’re in equilibrium.</li>
289+
</ul>
292290
<ol type="1">
293-
<li>GPU design</li>
294-
<li>Model architecture</li>
295-
<li>Pretraining optimizer</li>
296-
<li>Pretraining data</li>
297-
<li>Posttraining algorithm (RL)</li>
298-
<li>Posttraining data</li>
299-
<li>Elicitation/harness</li>
291+
<li>Augmentation: it makes each R&amp;D worker more productive.</li>
292+
<li>Automation: it does R&amp;D work autonomously, substituting for workers.</li>
300293
</ol></li>
301-
<li><p>The cost of an experiment is different for different parts. Some parts scale additively (GPU design), other parts its hard to predict effect of scale (architecture). If they scale additively then the cost of experimentation is small (but expenditure could still be large).</p></li>
302-
</ul>
303-
</dd>
304-
</dl>
294+
<li><p><strong>Complications:</strong></p>
295+
<ul>
296+
<li><strong>Measuring capability.</strong></li>
297+
<li><strong>Scale-dependent algorithmic progress.</strong> <span class="citation" data-cites="gundlach2025algorithmicprogressai">Gundlach et al. (<a href="#ref-gundlach2025algorithmicprogressai" role="doc-biblioref">2025</a>)</span> argue that algorithmic progress has contributed much less than standard estimates suggest.</li>
298+
<li><strong>Data contribution.</strong> Beren Millidge argues <a href="https://www.beren.io/2025-08-02-Most-Algorithmic-Progress-is-Data-Progress/">“Most Algorithmic Progress is Data Progress”</a>.</li>
299+
</ul></li>
300+
</ol>
305301
<div class="columns" style="height: 100vh; gap: 2rem;">
306302
<div class="column">
307303
<div class="cell">
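The summary above notes that augmentation via automating perfectly complementary subtasks yields an acceleration of \(\frac{1}{1-f}\). A minimal numerical sketch of that formula (illustrative only; the function name is mine, not from the post):

```python
def acceleration(f: float) -> float:
    """Overall speedup when a fraction f of perfectly complementary
    subtasks is automated: the remaining human share 1 - f bounds
    throughput, so acceleration is 1 / (1 - f), an Amdahl's-law-style bound."""
    if not 0.0 <= f < 1.0:
        raise ValueError("f must be in [0, 1)")
    return 1.0 / (1.0 - f)

for f in (0.0, 0.5, 0.9, 0.99):
    print(f"automated share {f:.2f} -> acceleration {acceleration(f):.0f}x")
```

Note the sharp nonlinearity: automating the first half of tasks only doubles throughput, while going from 90% to 99% multiplies it by ten.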
@@ -317,7 +313,7 @@ <h1>Toy model of AI stack</h1>
317313
<p>I think this figure gives a reasonable summary of the orthodox picture of LLM progress:</p>
318314
<ul>
319315
<li>Technology is getting 3X more efficient each year (blue curves shifting left)</li>
320-
<li>Frontier training compute is increasing around 4-5X/year (black points shifting right), though spend/cost growth is slower</li>
316+
<li>Frontier training compute is increasing at 3X/year (black points shifting right)</li>
321317
</ul>
322318
<p>Some notes:</p>
323319
<ol type="1">
@@ -871,18 +867,58 @@ <h2 class="anchored" data-anchor-id="estimates-of-llm-algorithmic-progress">Esti
871867
</section>
872868
<section id="literature-review-on-rsi-models" class="level1">
873869
<h1>Literature Review on RSI Models</h1>
874-
<p>Basic setup:</p>
870+
<p>We want to answer these questions:</p>
871+
<ol type="1">
872+
<li>Based on the history of AI R&amp;D, what’s a reasonable estimate of how AI contributions to R&amp;D will change the future trajectory?</li>
873+
<li>What is the best capability metric for estimating AI R&amp;D contribution?</li>
874+
</ol>
875+
<p>Basic argument:</p>
875876
<ol type="1">
876-
<li>Algorithmic progress has been proceeding at around 5X/year.</li>
877+
<li>Say algorithmic efficiency has been growing at 5X/year, so we can get the same intelligence for 5X lower cost.</li>
878+
<li>We are starting to see glimmers of AI contributing to R&amp;D, and we want to predict how this will change the system.</li>
879+
<li>Some alternatives in how to model this:
880+
<ul>
881+
<li><p><strong>Accelerant.</strong> Suppose a certain level of AI can speed up a human researcher by a certain proportion, <span class="math inline">\(\lambda\)</span>. This feedback loop should accelerate AI development, but it’s sensitive to how strong this connection is.</p></li>
882+
<li><p><strong>Perfect substitutes.</strong> Suppose AI can do entirely autonomous research, i.e.&nbsp;it’s a perfect substitute for a human. Then we are only constrained on the cost of AI compute. Unless the cost is very high, this implies a massive acceleration in AI progress, constrained only by (a) experiment bottlenecks; (b) a ceiling to AI progress.</p></li>
883+
<li><p><strong>Imperfect substitutes.</strong></p></li>
884+
</ul></li>
877885
<li>Endogenous growth setup: (1) diminishing returns to R&amp;D, with an elasticity of about 1/2; (2) increasing returns to knowledge, also about 1/2; these balance, giving balanced growth.</li>
878886
<li>Now we add a link back from algorithms to R&amp;D, but there is very little guidance on how to calibrate it.</li>
879887
</ol>
888+
<p>Capability metrics:</p>
889+
<ol type="1">
890+
<li>Human researcher uplift.</li>
891+
<li>Ability to push forward the frontier.</li>
892+
</ol>
893+
<p>Wrinkles:</p>
894+
<ol type="1">
895+
<li>Estimating algorithmic progress.</li>
896+
<li>Scale-dependent algorithmic progress.</li>
897+
<li>Bottlenecks on experiment compute.</li>
898+
<li>Shape of the landscape.</li>
899+
<li>Adding capital.</li>
900+
</ol>
880901
<p>Ajeya’s questions:</p>
881902
<ol type="1">
882-
<li>When will we have automated AI R&amp;D?</li>
903+
<li>When will we have automated AI R&amp;D?
904+
<ul>
905+
<li>Answer: we already have automated R&amp;D.</li>
906+
</ul></li>
883907
<li>When we have automated AI R&amp;D, will this trigger super-exponential growth?</li>
884908
</ol>
885-
<hr>
909+
<p>Core model:</p>
910+
<p><span class="math display">\[\begin{aligned}
911+
\utt{\dot{A}}{algorithmic}{progress}
912+
&amp;=\utt{R^{\gamma}}{new}{research}
913+
{\utt{A^{1-\beta}}{algorithms}{}}
914+
&amp;&amp; \text{(progress)}\\
915+
R&amp;=A^{\kappa}
916+
&amp;&amp; \text{(feedback)}\\
917+
\dot{A}&amp;=A^{1-\beta+\gamma\kappa}\\
918+
\end{aligned}
919+
\]</span></p>
920+
<section id="section" class="level2">
921+
<h2 class="anchored" data-anchor-id="section"></h2>
886922
<table class="caption-top table">
887923
<colgroup>
888924
<col style="width: 38%">
@@ -896,7 +932,7 @@ <h1>Literature Review on RSI Models</h1>
896932
</thead>
897933
<tbody>
898934
<tr class="odd">
899-
<td><span class="citation" data-cites="jones1995rd">C. I. Jones (<a href="#ref-jones1995rd" role="doc-biblioref">1995</a>)</span> “R&amp;D-Based Models of Economic Growth.”</td>
935+
<td><span class="citation" data-cites="jones1995rd">Charles I. Jones (<a href="#ref-jones1995rd" role="doc-biblioref">1995</a>)</span> “R&amp;D-Based Models of Economic Growth.”</td>
900936
<td>Standard endogenous growth model: (1) diminishing returns to R&amp;D; (2) positive spillovers from knowledge.</td>
901937
</tr>
902938
<tr class="even">
@@ -983,23 +1019,26 @@ <h1>Literature Review on RSI Models</h1>
9831019
</tr>
9841020
</tbody>
9851021
</table>
1022+
</section>
9861023
<section id="jones1995rd-rd-based-models-of-economic-growth" class="level2">
987-
<h2 class="anchored" data-anchor-id="jones1995rd-rd-based-models-of-economic-growth"><span class="citation" data-cites="jones1995rd">C. I. Jones (<a href="#ref-jones1995rd" role="doc-biblioref">1995</a>)</span> “R&amp;D-Based Models of Economic Growth”</h2>
988-
<p>Basic model with accumulating ideas: <span class="math display">\[\begin{gathered}\dot{A}=R^{\gamma}\\
989-
\xymatrix{*++[F]{R\&amp;D} \ar[r] &amp; *++[F]{\Delta knowledge}\ar[r] &amp; *++[F]{knowledge}}
1024+
<h2 class="anchored" data-anchor-id="jones1995rd-rd-based-models-of-economic-growth"><span class="citation" data-cites="jones1995rd">Charles I. Jones (<a href="#ref-jones1995rd" role="doc-biblioref">1995</a>)</span> “R&amp;D-Based Models of Economic Growth”</h2>
1025+
<p>The best reference is the review article in <span class="citation" data-cites="jones2022semiendogenous">Charles I. Jones (<a href="#ref-jones2022semiendogenous" role="doc-biblioref">2022</a>)</span>.</p>
1026+
<p>Basic model with diminishing returns to R&amp;D: <span class="math display">\[\begin{gathered}\dot{A}=R^{\gamma}\\
1027+
\xymatrix{*++[F]{R\&amp;D} \ar[r]|(0.4)\gamma &amp; *++[F]{\Delta knowledge}\ar[r] &amp; *++[F]{knowledge}}
9901028
\end{gathered}
9911029
\]</span></p>
992-
<p>Model with shoulders/armpits of giants (Jones model):<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a> <span class="math display">\[\begin{gathered}
1030+
<p>Now we add positive spillovers (“shoulders of giants”). If <span class="math inline">\(R\)</span> is constant then you’ll get a declining growth rate. If <span class="math inline">\(R\)</span> is growing at <span class="math inline">\(g_R\)</span> then you’ll get <span class="math inline">\(g_A=\frac{\gamma}{\beta}g_R\)</span>.<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a> <span class="math display">\[\begin{gathered}
9931031
\dot{A}=R^{\gamma}A^{1-\beta}\\
994-
\xymatrix{*++[F]{R\&amp;D} \ar[r] &amp; *++[F]{\Delta knowledge}\ar[r] &amp; *++[F]{knowledge}\ar@/^2em/[l]}
1032+
\xymatrix{*++[F]{R\&amp;D} \ar[r]|(0.4)\gamma &amp; *++[F]{\Delta knowledge}\ar[r] &amp; *++[F]{knowledge}\ar@/^2em/[l]|{1-\beta}}
9951033
\end{gathered}
9961034
\]</span></p>
997-
<p>Recursive self-improvement, where knowledge actually helps R&amp;D: <span class="math display">\[\begin{gathered}
1035+
<p>Recursive self-improvement, where knowledge directly helps R&amp;D: <span class="math display">\[\begin{gathered}
9981036
R=A^{\kappa}\\
9991037
\dot{A}=A^{1-\beta+\kappa}\\
1000-
\xymatrix{*++[F]{R\&amp;D} \ar[r] &amp; *++[F]{\Delta knowledge}\ar[r] &amp; *++[F]{knowledge}\ar@/^2em/[l]\ar@/^3em/[ll]}
1038+
\xymatrix{*++[F]{R\&amp;D} \ar[r]|(0.4)\gamma &amp; *++[F]{\Delta knowledge}\ar[r] &amp; *++[F]{knowledge}\ar@/^2em/[l]|{1-\beta}\ar@/^4em/[ll]|\kappa}
10011039
\end{gathered}
10021040
\]</span></p>
1041+
<p>Here you will get constant exponential growth if and only if <span class="math inline">\(\gamma\kappa=\beta\)</span>.</p>
10031042
<dl>
10041043
<dt>Extrapolating from these models.</dt>
10051044
<dd>
@@ -1329,6 +1368,9 @@ <h2 class="anchored" data-anchor-id="david-rein-notes">David Rein notes</h2>
13291368
<div id="ref-jones1995rd" class="csl-entry" role="listitem">
13301369
Jones, Charles I. 1995. <span>“R&amp;D-Based Models of Economic Growth.”</span> <em>Journal of Political Economy</em> 103 (4): 759–84. <a href="https://doi.org/10.1086/262002">https://doi.org/10.1086/262002</a>.
13311370
</div>
1371+
<div id="ref-jones2022semiendogenous" class="csl-entry" role="listitem">
1372+
Jones, Charles I. 2022. <span>“The Past and Future of Economic Growth: A Semi-Endogenous Perspective.”</span> <em>Annual Review of Economics</em> 14 (1): 125–52. <a href="https://doi.org/10.1146/annurev-economics-080321-033317">https://doi.org/10.1146/annurev-economics-080321-033317</a>.
1373+
</div>
13321374
<div id="ref-kellerjordan2026moddednanogpt" class="csl-entry" role="listitem">
13331375
Jordan, Keller, and contributors. 2026. <span>“Modded-Nanogpt.”</span> <a href="https://github.com/KellerJordan/modded-nanogpt">https://github.com/KellerJordan/modded-nanogpt</a>.
13341376
</div>
@@ -1356,7 +1398,7 @@ <h2 class="anchored" data-anchor-id="david-rein-notes">David Rein notes</h2>
13561398
</div></section><section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>
13571399

13581400
<ol>
1359-
<li id="fn1"><p>This was introduced by <span class="citation" data-cites="jones1995rd">C. I. Jones (<a href="#ref-jones1995rd" role="doc-biblioref">1995</a>)</span>, where there are diminishing returns to knowledge, whereas Romer (1990) had assumed no diminishing returns to knowledge, <span class="math inline">\(\beta=0\)</span>.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
1401+
<li id="fn1"><p>This was introduced by <span class="citation" data-cites="jones1995rd">Charles I. Jones (<a href="#ref-jones1995rd" role="doc-biblioref">1995</a>)</span>, where there are diminishing returns to knowledge, whereas Romer (1990) had assumed no diminishing returns to knowledge, <span class="math inline">\(\beta=0\)</span>.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
13601402
</ol>
13611403
</section></div></main> <!-- /main -->
13621404
<script id="quarto-html-after-body" type="application/javascript">
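The core model added in the diff above, \(\dot{A}=R^\gamma A^{1-\beta}\) with \(R=\bar{R}A^\theta\), can be integrated numerically to see the three regimes (decelerating, constant-exponential, accelerating growth). A sketch under assumed parameter values (\(\gamma=\beta=1/2\), not calibrated to anything in the post):

```python
def simulate(theta, gamma=0.5, beta=0.5, r_bar=1.0, a0=1.0, dt=1e-3, steps=2000):
    """Euler-integrate dA/dt = (r_bar * A**theta)**gamma * A**(1 - beta)
    and return the proportional growth rate g = (dA/dt)/A at the start
    and at the end of the run."""
    def growth_rate(a):
        return (r_bar * a**theta) ** gamma * a ** (1 - beta) / a

    a = a0
    g_start = growth_rate(a)
    for _ in range(steps):
        a += dt * (r_bar * a**theta) ** gamma * a ** (1 - beta)
    return g_start, growth_rate(a)

# gamma*theta < beta: growth decelerates; = beta: constant exponential;
# > beta: growth accelerates (super-exponential / hyperbolic).
for theta in (0.5, 1.0, 1.5):
    g0, gT = simulate(theta)
    print(f"theta={theta}: g(0)={g0:.3f}, g(T)={gT:.3f}")
```

With these parameters the knife-edge case is \(\gamma\theta=\beta\), i.e. \(\theta=1\); the horizon is kept short because the accelerating case blows up in finite time.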

docs/posts/2026-03-13-apple-picking-ai.html

Lines changed: 10 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -223,12 +223,20 @@ <h1 class="title">An Apple-Picking Model of AI R&amp;D</h1>
223223
</div>
224224

225225
<div class="quarto-title-meta-author">
226-
<div class="quarto-title-meta-heading">Author</div>
226+
<div class="quarto-title-meta-heading">Authors</div>
227227
<div class="quarto-title-meta-heading">Affiliation</div>
228228

229229
<div class="quarto-title-meta-contents">
230230
<p class="author">Tom Cunningham </p>
231231
</div>
232+
<div class="quarto-title-meta-contents">
233+
<p class="affiliation">
234+
METR
235+
</p>
236+
</div>
237+
<div class="quarto-title-meta-contents">
238+
<p class="author">Manish Shetty </p>
239+
</div>
232240
<div class="quarto-title-meta-contents">
233241
<p class="affiliation">
234242
METR
@@ -270,7 +278,7 @@ <h1 class="title">An Apple-Picking Model of AI R&amp;D</h1>
270278
<dt>A simple model for AI R&amp;D.</dt>
271279
<dd>
272280
<p>There are signs that AI agents are able to make genuine contributions to AI R&amp;D; how should we interpret this? Is recursive self-improvement imminent?</p>
273-
<p>Here I describe a simple “apple-picking” model of optimization problems in which AI agents can make genuine autonomous contributions, without being able to replace human researchers. I’m not confident that the model is correct, maybe we are on the verge of human takeover, but I find this model helpful for thinking about what evidence we would need.</p>
281+
<p>Here I describe a simple “apple-picking” model of optimization problems in which AI agents can make genuine autonomous contributions, without being able to replace human researchers. I’m not confident that the model is correct, maybe we are indeed on the verge of AI takeover, but I find the model helpful for thinking about what evidence we would need.</p>
274282
<p>An agent helping you to optimize an algorithm is like a robot helping you pick apples. It will take care of all the apples up to a certain height, and it may find apples you haven’t found yet, but there will still be apples out of its reach.</p>
275283
<p>Below I give a formal model but the basic ideas can all be seen in the drawing below. Here both the human and robot have picked four apples, but they’ve left the tree in a very different state, so the robot isn’t ready to replace the human yet:</p>
276284
<div class="cell" data-layout-align="center">
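The apple-picking analogy in the diff above is easy to make concrete: the robot clears everything below its reach, possibly including apples the human missed, but the high apples remain. A toy sketch (heights, reach, and counts are made-up numbers, not from the post):

```python
import random

random.seed(0)

# Apples at random heights on a 10m tree; the robot reaches up to 4m.
apples = [random.uniform(0, 10) for _ in range(20)]
reach = 4.0

robot_picks = [h for h in apples if h <= reach]
left_for_human = [h for h in apples if h > reach]

# The robot makes genuine autonomous progress, but the hard (high)
# apples still require the human.
print(f"robot picked {len(robot_picks)} apples; "
      f"{len(left_for_human)} remain out of its reach")
```

The point of the analogy survives the toy version: counting apples picked (benchmark wins) doesn't tell you whether the remaining apples are within anyone's reach.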

posts/2025-09-13-recursive-self-improvement-explosion-optimization-offcuts.qmd

Lines changed: 33 additions & 0 deletions
@@ -18,6 +18,7 @@ execute:
1818

1919
==OFFCUTS FILE==
2020

21+
2122
# Summary
2223

2324

@@ -70,6 +71,38 @@ AK model of recursive growth.
7071
The critical question is whether the returns to R&D effort are sufficiently steep. Some classic papers: Bloom et al. "are we running out of ideas"; and Erdil.
7172

7273

74+
# Models
75+
76+
See also `posts/2026-02-10-model-of-labs.qmd`
77+
78+
Model A: Chinchilla. You just choose model size.
79+
:
80+
- One choice variable: model size (holding compute fixed, this trades-off against training runs).
81+
- You can do some low-cost derisks, but extrapolating to a large compute scale is imperfect.
82+
- *Convex but expensive to evaluate.*
83+
84+
Model B: algorithm search.
85+
: - Each algorithm is a mapping from compute->loss, and you can summarize it with a scalar efficiency (just like nanoGPT).
86+
- *medium-cost evaluation; non-convex.*
87+
88+
Model C: training data.
89+
: - There's a distribution of tasks, with probability f(t).
90+
- An LLM is just a lookup table, you slowly expand coverage, collecting a larger fraction of tasks t.
91+
92+
Model D: layer cake.
93+
: - You have a function that turns money->intelligence, and that function gets more efficient over time: the cost of a given level of intelligence falls about 8X/year.
94+
95+
- You can separate the optimization problem into layers, they are partially separable:
96+
1. GPU design
97+
2. Model architecture
98+
3. Pretraining optimizer
99+
4. Pretraining data
100+
5. Posttraining algorithm (RL)
101+
6. Posttraining data
102+
7. Elicitation/harness
103+
104+
- The cost of an experiment differs across parts. Some parts scale additively (GPU design); for others it's hard to predict the effect of scale (architecture). If they scale additively then the cost of experimentation is small (though expenditure could still be large).
105+
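Model C above treats an LLM as a lookup table that slowly expands coverage over a task distribution f(t). A toy sketch of that framing (the distribution and all numbers are invented for illustration):

```python
import random

random.seed(1)

# Model C: capability = probability mass of the task distribution f(t)
# covered so far. Training samples tasks from f and adds them to coverage.
weights = [0.4, 0.2, 0.15, 0.1, 0.08, 0.05, 0.02]  # f(t) over 7 task types
covered = set()

for step in range(1, 11):
    t = random.choices(range(len(weights)), weights=weights)[0]
    covered.add(t)
    capability = sum(weights[t] for t in covered)
    print(f"step {step:2d}: capability = {capability:.2f}")
```

Because common tasks get covered first, capability rises quickly and then flattens, giving diminishing returns from data collection alone.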
73106
# Trainee:Trainer Curves
74107

75108
Q: what evals would be useful in learning about RSI? Suppose we can plot the following:

0 commit comments
