docs/posts/2026-03-13-apple-picking-ai.html (6 additions, 3 deletions)
@@ -290,7 +290,7 @@ <h1 class="title">An Apple-Picking Model of AI R&D</h1>
 </div>
 </div>
 </div>
-<p>The motivation for writing this model was to help think through the implications of recent evidence that AI can push forward the frontier on various optimization and AI R&D problems. If you can spend $100 in tokens to increase the efficiency of a frontier AI training algorithm by 0.1% then this looks like the path to self-improvement, and we’ll be able to replace humans with AI. But realistically the agents have been discovering <em>shallow</em> improvements to algorithms. This apple-picking model is my attempt to help think through the distinction, and figure out how to measure agents’ optimization ability.</p>
+<p>The motivation for writing this model was to help think through the implications of recent evidence that AI can push forward the frontier on various optimization and AI R&D problems. If you can spend $100 in tokens to increase the efficiency of a frontier AI training algorithm by 0.1%, then this looks like the path to self-improvement, and we’ll be able to replace humans with AI. But realistically the agents have been discovering <em>shallow</em> improvements to algorithms. The idea is not original – it already exists in cliché form (“low-hanging fruit”) – but I still found it useful to formalize.</p>
 </dd>
 </dl>
 <table class="caption-top table">
@@ -347,9 +347,11 @@ <h1 class="title">An Apple-Picking Model of AI R&D</h1>
 </div>
 </div>
 <p>Humans have been painstakingly pushing those blue curves to the left. We are now seeing clear signs of agents joining the effort and contributing to that leftward movement, and we want to know what to expect.</p>
-<p>The apple-picking model gives us some broad implications in this model:</p>
+<p>The apple-picking model gives us some broad implications:</p>
 <ol type="1">
-<li>Observing an agent improve the frontier does not imply .</li>
+<li>We should benchmark agents’ AI R&D ability against the <em>frontier</em>, which represents cumulative human effort, rather than against toy problems and the effort of a single human.</li>
+<li>Observing an agent improve the frontier (push the curve left) does not imply that agents can replace humans, because they may only be picking low-hanging fruit.</li>
+<li>The effects will show up in parts of the AI stack that have low-hanging fruit, e.g. where there are a lot of fairly natural ideas to try but we are bottlenecked on execution.</li>
 </ol>
 </dd>
 </dl>
@@ -379,6 +381,7 @@ <h1>Discussion</h1>
 <li><p><em>Shape of the tree.</em> You can extend the model such that apples are non-uniformly distributed; then we can replace <span class="math inline">\(\lambda\)</span> with <span class="math inline">\(F(\lambda)\)</span> below. We can then talk about types of domain which are bottom-heavy (most optimizations are pretty easy to find) vs top-heavy (most optimizations are hard to find). It then becomes important to know whether AI R&D is relatively more bottom-heavy or top-heavy; if the former, we might already be on the brink of an intelligence explosion.</p></li>
 <li><p><em>Directed search.</em> We assumed that the probability of finding an apple is independent of other apples already found. Realistically people have an ability to direct their attention to finding new innovations. This implies lower diminishing returns to expenditure, and higher complementarity between agents and humans; I’m not sure whether it would change the qualitative conclusions of the model.</p></li>
 <li><p><em>Bottlenecks.</em> Some R&D is bottlenecked not just by thinking (which agents can do), but also by running experiments.</p></li>
+<li><p><em>Acceleration.</em> This model is missing the effect of acceleration through explicit cooperation – e.g. a human comes up with an idea, and an agent executes it. This seems very likely to be an important contribution to AI progress.</p></li>
 <li><p><em>Sketch of a quantitative model of LLM training.</em> LLM training is a big stack of algorithms, which we’ve been optimizing at perhaps 10X/year. Would be useful to add some speculation about which parts of the stack have low-hanging fruit.</p></li>
posts/2026-03-13-apple-picking-ai.qmd (8 additions, 5 deletions)
@@ -171,7 +171,7 @@ A simple model for AI R&D.
 ```
 
 
-The motivation for writing this model was to help think through the implications of recent evidence that AI can push forward the frontier on various optimization and AI R&D problems. If you can spend $100 in tokens to increase the efficiency of a frontier AI training algorithm by 0.1% then this looks like the path to self-improvement, and we'll be able to replace humans with AI. But realistically the agents have been discovering *shallow* improvements to algorithms. This apple-picking model is my attempt to help think through the distinction, and figure out how to measure agents' optimization ability.
+The motivation for writing this model was to help think through the implications of recent evidence that AI can push forward the frontier on various optimization and AI R&D problems. If you can spend $100 in tokens to increase the efficiency of a frontier AI training algorithm by 0.1%, then this looks like the path to self-improvement, and we'll be able to replace humans with AI. But realistically the agents have been discovering *shallow* improvements to algorithms. The idea is not original -- it already exists in cliché form ("low-hanging fruit") -- but I still found it useful to formalize.
@@ -220,6 +220,7 @@ The apple-picking model gives us some broad implications:
 
-1. Observing an agent meaningfully improve the frontier (push the curve left) does not imply that agents can replace humans.
-2.
+1. We should benchmark agents' AI R&D ability against the *frontier*, which represents cumulative human effort, rather than against toy problems and the effort of a single human.
+2. Observing an agent improve the frontier (push the curve left) does not imply that agents can replace humans, because they may only be picking low-hanging fruit.
+3. The effects will show up in parts of the AI stack that have low-hanging fruit, e.g. where there are a lot of fairly natural ideas to try but we are bottlenecked on execution.
 
 
 # Discussion
@@ -253,6 +254,8 @@ Things to add.
 
 - *Bottlenecks.* Some R&D is bottlenecked not just by thinking (which agents can do), but also by running experiments.
 
+- *Acceleration.* This model is missing the effect of acceleration through explicit cooperation -- e.g. a human comes up with an idea, and an agent executes it. This seems very likely to be an important contribution to AI progress.
+
 - *Sketch of a quantitative model of LLM training.* LLM training is a big stack of algorithms, which we've been optimizing at perhaps 10X/year. Would be useful to add some speculation about which parts of the stack have low-hanging fruit.
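
The diff above leans on the intuition that agents picking low-hanging fruit show diminishing returns to spend. The post's actual formalization (the \(\lambda\) parameter and its distribution) is not visible in this diff, so the following is only a hypothetical toy Monte Carlo in the spirit of the model; the function name `pick_apples`, the uniform apple distribution, and the hard reachability cutoff are all my own illustrative assumptions:

```python
import random

def pick_apples(n_apples=1000, reachable=300, budget=2000, seed=0):
    """Toy sketch of an apple-picking search (assumed form, not the
    post's exact model). Of `n_apples` potential optimizations, only the
    first `reachable` are low-hanging fruit the searcher can find. Each
    unit of budget is an independent random attempt; re-finding an
    already-picked apple yields nothing. Returns the cumulative number
    of distinct apples picked after each attempt."""
    rng = random.Random(seed)
    picked = set()
    history = []
    for _ in range(budget):
        apple = rng.randrange(n_apples)
        if apple < reachable:   # within reach (low-hanging)
            picked.add(apple)   # set membership ignores repeat finds
        history.append(len(picked))
    return history

h = pick_apples()
mid = len(h) // 2
first_half_gain = h[mid - 1]
second_half_gain = h[-1] - h[mid - 1]
# Identical spend in each half, but the second half finds fewer new
# apples: returns diminish as the reachable fruit runs out, even though
# the searcher's capability never changes.
```

This is the qualitative point of implication 2 above: early frontier improvements by an agent are consistent with a fixed reachability ceiling, so observing them does not by itself show the agent can keep pace once the low-hanging fruit is gone.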