<p>Thanks to Nate Rush, Manish Shetty, Basil Halperin, & Parker Whitfill for helpful comments.</p>
</div></div><dl>
<dt>An apple-picking model of AI work.</dt>
<dd>
<p>Here’s a simple model useful for thinking about AI’s contribution to solving problems.</p>
<p>In short: an agent helping you with programming is like a robot helping you pick apples. It will take care of all the apples up to a certain height, and find apples you haven’t found, but there will still be apples out of its reach.</p>
<p>The specific motivation was to help think through the implications of recent evidence that AI can push forward the frontier on various optimization and AI R&D problems. If you can pay $10 to increase the efficiency of an algorithm by 1% then, on its surface, this looks like the path to self-improvement and to replacing humans with AI agents. But realistically the 1% improvement is a <em>shallow</em> improvement, and the apple-picking model tries to distinguish between shallow and deep improvements.</p>
</dd>
<dt>Implications of the apple-picking model.</dt>
<dd>
</div>
</div>
</dd>
<dt>Implication: agent asymptote depends on the starting point.</dt>
<dd>
<p>The plot below shows a variety of agent trajectories, each starting after a different amount of human work.</p>
<p>You could interpret this as starting an agent at different points in the history of optimizing some algorithm, e.g. nanoGPT.</p>
<p>The model implies that if you start an agent from the original, unoptimized version of an algorithm, it will quickly make large gains but asymptote to a value well below the human state of the art.</p>
360
+
<p>If you start an agent after a small amount of human optimization, it will have smaller initial value (some of the apples have already been picked), but it will be able to achieve a higher asymptote.</p>
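<p>To make the starting-point effect concrete, here is a minimal Monte Carlo sketch of these trajectories. The search process, rates, and reach value are illustrative assumptions, not the post’s exact specification: apples have uniform heights, humans can reach every apple, and the agent searches faster but only below a fixed reach.</p>

```python
import random

random.seed(0)

# Illustrative parameters, not the post's calibration.
N = 10_000          # apples with heights uniform on [0, 1]; human reach = 1
AGENT_REACH = 0.6   # agent reach (lambda), below human reach
HUMAN_RATE = 0.05   # per-period chance a human finds any given unpicked apple
AGENT_RATE = 0.30   # agents search faster, but only below their reach

def run(human_periods, agent_periods):
    heights = [random.random() for _ in range(N)]
    picked = [False] * N
    # Humans search first; every apple is within their reach.
    for _ in range(human_periods):
        for i in range(N):
            if not picked[i] and random.random() < HUMAN_RATE:
                picked[i] = True
    # The agent then picks quickly, but only below AGENT_REACH.
    for _ in range(agent_periods):
        for i in range(N):
            if not picked[i] and heights[i] <= AGENT_REACH and random.random() < AGENT_RATE:
                picked[i] = True
    return sum(picked) / N  # fraction of apples picked = solution quality

v_scratch = run(human_periods=0, agent_periods=50)     # agent alone
v_headstart = run(human_periods=20, agent_periods=50)  # human head start first
```

<p>In this sketch the from-scratch agent’s asymptote is capped near its reach, while a human head start leaves some apples above the agent’s reach already picked, so the combined asymptote is higher.</p>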
<dt>Now let the robot’s height depend on apples harvested.</dt>
<dd>
<p>The previous model applied to agents working on an arbitrary problem. Now we focus on agents working on AI R&D. We make two changes:</p>
<ol type="1">
<li>We assume that the agent’s ability (<span class="math inline">\(\lambda\)</span>) is itself a function of AI R&D progress (the robot is eating the apples and getting taller). It turns out that we can get a simple closed-form solution when this function is linear. To add a touch of realism, we assume that agents have no meaningful ability until algorithmic progress passes some minimum threshold (<span class="math inline">\(\bar{a}\)</span>).</li>
<li>We assume that agents pick <em>all</em> the apples available to them each period. This makes things easier to model (the state of the tree can be summarized with just two variables, <span class="math inline">\(\lambda\)</span> and <span class="math inline">\(a\)</span>), but it also seems a reasonable assumption: AI research labs will keep spending money on agent-optimizing their algorithms until they hit low returns.</li>
</ol>
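<p>Under one concrete reading of these assumptions (the linear rule, unit apple density, and the starting point <span class="math inline">\(a_0 = 1\)</span> are illustrative choices, not the post’s exact calibration), reach follows a simple recursion: <span class="math inline">\(\lambda_t = \alpha + \beta(a_t - \bar{a})\)</span> once cumulative harvest passes <span class="math inline">\(\bar{a}\)</span>, and picking everything below <span class="math inline">\(\lambda_t\)</span> lifts cumulative harvest up to <span class="math inline">\(\lambda_t\)</span>:</p>

```python
def trajectory(alpha, beta, a_bar, a0=1.0, steps=60):
    """Agent reach over time, starting from the human frontier a0 = 1.

    Assumed reading: reach is alpha + beta * (a - a_bar) once cumulative
    harvest a passes the threshold a_bar, and apples have unit density,
    so picking everything below reach lifts a to that reach.
    """
    a = a0
    reaches = []
    for _ in range(steps):
        reach = alpha + beta * (a - a_bar) if a >= a_bar else 0.0
        a = max(a, reach)  # the agent strips every apple below its reach
        reaches.append(reach)
    return reaches

# Stalled: alpha + beta * (1 - a_bar) = 0.98 < 1, so the agent never
# reaches past the human frontier and its height stays flat.
stall = trajectory(alpha=0.5, beta=0.8, a_bar=0.4)

# Convergent growth: the threshold condition holds but beta < 1, so reach
# rises to the finite fixed point (alpha - beta * a_bar) / (1 - beta) = 1.4.
conv = trajectory(alpha=0.6, beta=0.8, a_bar=0.4)

# Explosive: beta > 1, so each harvested slice raises reach by more than
# its own height, and reach grows without bound.
boom = trajectory(alpha=0.5, beta=1.2, a_bar=0.4)
```

<p>The two closed-form conditions drop out directly from the recursion: the agent passes human reach exactly when <span class="math inline">\(\alpha + \beta(1-\bar{a}) > 1\)</span>, and growth is explosive exactly when <span class="math inline">\(\beta > 1\)</span>.</p>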
</dd>
<dt>Implications:</dt>
<dd>
<ol type="1">
<li>Agents will get taller than humans iff <span class="math inline">\(\alpha + \beta(1-\bar{a}) > 1\)</span>.</li>
<li>Agent height will be explosive iff <span class="math inline">\(\beta > 1\)</span>, i.e. if eating all the apples in a 1 cm slice of the tree makes you grow by more than 1 cm. If not, then you converge to a finite height <span class="math inline">\(\lambda^*\)</span>.</li>