
Commit b3bd2b8

apple picking

1 parent 8b1f9c4 commit b3bd2b8

6 files changed (+16, -10 lines)

6 files changed

+16
-10
lines changed

_freeze/posts/2026-03-13-apple-picking-ai/execute-results/html.json

Lines changed: 2 additions & 2 deletions
Large diffs are not rendered by default.
Binary file not shown.
-35 Bytes

docs/posts/2026-03-13-apple-picking-ai.html

Lines changed: 6 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -290,7 +290,7 @@ <h1 class="title">An Apple-Picking Model of AI R&amp;D</h1>
290290
</div>
291291
</div>
292292
</div>
293-
<p>The motivation for writing this model was to help think through the implications of recent evidence that AI can push forward the frontier on various optimization and AI R&amp;D problems. If you can spend $100 in tokens to increase the efficiency of a frontier AI training algorithm by 0.1% then this looks like the path to self-improvement, and we’ll be able to replace humans with AI. But realistically the agents have been discovering <em>shallow</em> improvements to algorithms. This apple-picking model is my attempt to help think through the distinction, and figure out how to measure agents’ optimization ability.</p>
293+
<p>The motivation for writing this model was to help think through the implications of recent evidence that AI can push forward the frontier on various optimization and AI R&amp;D problems. If you can spend $100 in tokens to increase the efficiency of a frontier AI training algorithm by 0.1%, then this looks like the path to self-improvement, and we’ll be able to replace humans with AI. But realistically the agents have been discovering <em>shallow</em> improvements to algorithms. The idea is not original; it already exists in cliché form (“low-hanging fruit”), but I still found it useful to formalize it.</p>
294294
</dd>
295295
</dl>
296296
<table class="caption-top table">
@@ -347,9 +347,11 @@ <h1 class="title">An Apple-Picking Model of AI R&amp;D</h1>
347347
</div>
348348
</div>
349349
<p>Humans have been painstakingly pushing those blue curves to the left. We are now seeing clear signs of agents joining the effort and contributing to that leftward movement, and we want to know what to expect.</p>
350-
<p>The apple-picking model gives us some broad implications in this model:</p>
350+
<p>The apple-picking model gives us some broad implications:</p>
351351
<ol type="1">
352-
<li>Observing an agent improve the frontier does not imply .</li>
352+
<li>We should benchmark agent AI R&amp;D ability against the <em>frontier</em>, which represents cumulative human effort, rather than against toy problems and the effort of a single human.</li>
353+
<li>Observing an agent improve the frontier (push the curve left) does not imply that agents can replace humans, because they may be only picking low-hanging fruit.</li>
354+
<li>The effects will show up in parts of the AI stack that have low-hanging fruit, e.g.&nbsp;where there are a lot of fairly natural ideas to try but we are bottlenecked on execution.</li>
353355
</ol>
354356
</dd>
355357
</dl>
@@ -379,6 +381,7 @@ <h1>Discussion</h1>
379381
<li><p><em>Shape of the tree.</em> You can extend the model so that apples are non-uniformly distributed; then we can replace <span class="math inline">\(\lambda\)</span> with <span class="math inline">\(F(\lambda)\)</span> below. We can then talk about types of domains that are bottom-heavy (most optimizations are pretty easy to find) vs.&nbsp;top-heavy (most optimizations are hard to find). It then becomes important to know whether AI R&amp;D is relatively more bottom-heavy or top-heavy; if the former, we might already be on the brink of an intelligence explosion.</p></li>
380382
<li><p><em>Directed search.</em> We assumed that the probability of finding an apple is independent of other apples already found. Realistically, people have an ability to direct their attention to finding new innovations. This implies lower diminishing returns to expenditure and higher complementarity between agents and humans; I’m not sure whether it would change the qualitative conclusions of the model.</p></li>
381383
<li><p><em>Bottlenecks.</em> Some R&amp;D is bottlenecked not just by thinking (which agents can do), but also by running experiments.</p></li>
384+
<li><p><em>Acceleration.</em> This model is missing the effect of acceleration through explicit cooperation – e.g.&nbsp;a human comes up with an idea, and an agent executes it. This seems very likely to be an important contribution to AI progress.</p></li>
382385
<li><p><em>Sketch of a quantitative model of LLM training.</em> LLM training is a big stack of algorithms, which we’ve been optimizing at perhaps 10X/year. Would be useful to add some speculation about which parts of the stack have low-hanging fruit.</p></li>
383386
</ul>
384387
</dd>
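The diminishing-returns behavior that the post's concave curves illustrate can be sketched as a small simulation. This is a hypothetical illustration, not part of the commit: the exponential distribution of apple heights and the `exp(-height)` per-try find probability are assumptions, not the post's exact functional form.

```python
import math
import random

# Assumed specifics for illustration: apple heights drawn from an
# exponential distribution (lots of low-hanging fruit), and each unit of
# search effort finds a given apple independently with prob. exp(-height).
random.seed(0)
heights = [random.expovariate(1.0) for _ in range(1000)]

def expected_picked(effort):
    """Expected number of apples found after `effort` independent tries."""
    return sum(1 - (1 - math.exp(-h)) ** effort for h in heights)

# Diminishing returns: each doubling of effort yields fewer new apples,
# because the remaining apples are the hard-to-find ones.
gains = [expected_picked(2 ** (k + 1)) - expected_picked(2 ** k)
         for k in range(5)]
print([round(g, 1) for g in gains])
```

Under these assumptions the gain from each doubling of expenditure shrinks, which is the "picking low-hanging fruit" dynamic the post describes.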
-35 Bytes

posts/2026-03-13-apple-picking-ai.qmd

Lines changed: 8 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -171,7 +171,7 @@ A simple model for AI R&D.
171171
```
172172

173173

174-
The motivation for writing this model was to help think through the implications of recent evidence that AI can push forward the frontier on various optimization and AI R&D problems. If you can spend $100 in tokens to increase the efficiency of a frontier AI training algorithm by 0.1% then this looks like the path to self-improvement, and we'll be able to replace humans with AI. But realistically the agents have been discovering *shallow* improvements to algorithms. This apple-picking model is my attempt to help think through the distinction, and figure out how to measure agents' optimization ability.
174+
The motivation for writing this model was to help think through the implications of recent evidence that AI can push forward the frontier on various optimization and AI R&D problems. If you can spend $100 in tokens to increase the efficiency of a frontier AI training algorithm by 0.1%, then this looks like the path to self-improvement, and we'll be able to replace humans with AI. But realistically the agents have been discovering *shallow* improvements to algorithms. The idea is not original; it already exists in cliché form ("low-hanging fruit"), but I still found it useful to formalize it.
175175

176176

177177
|Robots picking apples|Agents finding optimizations|
@@ -201,8 +201,8 @@ Some implications for AI R&D.
201201
\draw (0,0) -- (1,0) node[midway,below] {ln(training expenditure)}
202202
-- (1,1) -- (0,1) -- (0,0) node[midway,above,rotate=90] {model intelligence};
203203
204-
\draw[blue,->] (0.17,0.1)--(0.13,0.1);
205-
\draw[blue,->] (0.27,0.1)--(0.23,0.1);
204+
\draw[blue,->] (0.15,0.1)--(0.11,0.1);
205+
\draw[blue,->] (0.25,0.1)--(0.2,0.1);
206206

207207
\draw[thick, blue, dotted] (0,0) plot[domain=0.2:1, samples=100] (\x, {0.75*(1-exp(-2*(\x-0.2)))}) node[anchor=north west]{2023 technology};
208208
\draw[thick, blue, dotted] (0,0) plot[domain=0.1:1, samples=100] (\x, {0.75*(1-exp(-2*(\x-0.1)))}) node[right]{2024 technology};
@@ -218,8 +218,9 @@ Some implications for AI R&D.
218218

219219
The apple-picking model gives us some broad implications:
220220

221-
1. Observing an agent meaningfully improve the frontier (push the curve left) does not imply that agents can replace humans.
222-
2.
221+
1. We should benchmark agent AI R&D ability against the *frontier*, which represents cumulative human effort, rather than against toy problems and the effort of a single human.
222+
2. Observing an agent improve the frontier (push the curve left) does not imply that agents can replace humans, because they may be only picking low-hanging fruit.
223+
3. The effects will show up in parts of the AI stack that have low-hanging fruit, e.g. where there are a lot of fairly natural ideas to try but we are bottlenecked on execution.
223224

224225

225226
# Discussion
@@ -253,6 +254,8 @@ Things to add.
253254

254255
- *Bottlenecks.* Some R&D is bottlenecked not just by thinking (which agents can do), but also by running experiments.
255256

257+
- *Acceleration.* This model is missing the effect of acceleration through explicit cooperation -- e.g. a human comes up with an idea, and an agent executes it. This seems very likely to be an important contribution to AI progress.
258+
256259
- *Sketch of a quantitative model of LLM training.* LLM training is a big stack of algorithms, which we've been optimizing at perhaps 10X/year. Would be useful to add some speculation about which parts of the stack have low-hanging fruit.
257260

258261
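The "shape of the tree" discussion point above (bottom-heavy vs. top-heavy domains) can also be illustrated with a toy comparison. This is a hypothetical sketch, not part of the commit; the two-height orchards and the `exp(-height)` find probability are invented for illustration.

```python
import math

# Compare a bottom-heavy orchard (most apples hang low) with a top-heavy
# one (most apples hang high). Per-try find probability exp(-height) is
# an assumption, not the post's exact functional form.

def expected_picked(heights, effort):
    """Expected number of apples found after `effort` independent tries."""
    return sum(1 - (1 - math.exp(-h)) ** effort for h in heights)

LOW, HIGH = 0.2, 3.0
bottom_heavy = [LOW] * 900 + [HIGH] * 100  # 90% of apples are easy
top_heavy = [LOW] * 100 + [HIGH] * 900     # 90% of apples are hard

# With only a little effort, the bottom-heavy orchard yields far more
# progress, even though both orchards hold the same 1000 apples.
early_bottom = expected_picked(bottom_heavy, 2)
early_top = expected_picked(top_heavy, 2)
print(round(early_bottom), round(early_top))
```

With enough effort both orchards are eventually exhausted; the distinction only governs how fast early progress comes, which is why it matters whether AI R&D is bottom-heavy or top-heavy.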
