docs/posts/2026-03-13-apple-picking-ai.html (6 additions, 3 deletions)
@@ -290,7 +290,7 @@ <h1 class="title">An Apple-Picking Model of AI R&D</h1>
 </div>
 </div>
 </div>
-<p>The motivation for writing this model was to help think through the implications of recent evidence that AI can push forward the frontier on various optimization and AI R&D problems. If you can spend $100 in tokens to increase the efficiency of a frontier AI training algorithm by 0.1% then this looks like the path to self-improvement, and we’ll be able to replace humans with AI. But realistically the agents have been discovering <em>shallow</em> improvements to algorithms. This apple-picking model is my attempt to help think through the distinction, and figure out how to measure agents’ optimization ability.</p>
+<p>The motivation for writing this model was to help think through the implications of recent evidence that AI can push forward the frontier on various optimization and AI R&D problems. If you can spend $100 in tokens to increase the efficiency of a frontier AI training algorithm by 0.1%, then this looks like the path to self-improvement, and we’ll be able to replace humans with AI. But realistically the agents have been discovering <em>shallow</em> improvements to algorithms. The idea is not original – it already exists in cliché form (“low-hanging fruit”) – but I still found it useful to formalize.</p>
 </dd>
 </dl>
 <table class="caption-top table">
@@ -347,9 +347,11 @@ <h1 class="title">An Apple-Picking Model of AI R&D</h1>
 </div>
 </div>
 <p>Humans have been painstakingly pushing those blue curves to the left. We are now seeing clear signs of agents joining the effort and contributing to that leftward movement, and we want to know what to expect.</p>
-<p>The apple-picking model gives us some broad implications in this model:</p>
+<p>The apple-picking model gives us some broad implications:</p>
 <ol type="1">
-<li>Observing an agent improve the frontier does not imply .</li>
+<li>We should benchmark agents’ AI R&D ability against the <em>frontier</em>, which represents cumulative human effort, rather than against toy problems and the effort of a single human.</li>
+<li>Observing an agent improve the frontier (push the curve left) does not imply that agents can replace humans, because they may only be picking low-hanging fruit.</li>
+<li>The effects will show up in parts of the AI stack that have low-hanging fruit, e.g. where there are a lot of fairly natural ideas to try but we are bottlenecked on execution.</li>
 </ol>
 </dd>
 </dl>
@@ -379,6 +381,7 @@ <h1>Discussion</h1>
 <li><p><em>Shape of the tree.</em> You can extend the model such that apples are non-uniformly distributed; then we can replace <span class="math inline">\(\lambda\)</span> with <span class="math inline">\(F(\lambda)\)</span> below. We can then talk about types of domain which are bottom-heavy (most optimizations are pretty easy to find) vs top-heavy (most optimizations are hard to find). It then becomes important to know whether AI R&D is relatively more bottom-heavy or top-heavy; if the former, we might already be on the brink of an intelligence explosion.</p></li>
 <li><p><em>Directed search.</em> We assumed that the probability of finding an apple is independent of other apples already found. Realistically people have an ability to direct their attention to finding new innovations. This implies lower diminishing returns to expenditure, and higher complementarity between agents and humans; I’m not sure whether it would change the qualitative conclusions of the model.</p></li>
 <li><p><em>Bottlenecks.</em> Some R&D is bottlenecked not just by thinking (which agents can do), but also by running experiments.</p></li>
+<li><p><em>Acceleration.</em> This model is missing the effect of acceleration through explicit cooperation – e.g. a human comes up with an idea, and an agent executes it. This seems very likely to be an important contribution to AI progress.</p></li>
 <li><p><em>Sketch of a quantitative model of LLM training.</em> LLM training is a big stack of algorithms, which we’ve been optimizing at perhaps 10X/year. Would be useful to add some speculation about which parts of the stack have low-hanging fruit.</p></li>
posts/2026-03-13-apple-picking-ai.qmd (8 additions, 5 deletions)
@@ -171,7 +171,7 @@ A simple model for AI R&D.
 ```
 
 
-The motivation for writing this model was to help think through the implications of recent evidence that AI can push forward the frontier on various optimization and AI R&D problems. If you can spend $100 in tokens to increase the efficiency of a frontier AI training algorithm by 0.1% then this looks like the path to self-improvement, and we'll be able to replace humans with AI. But realistically the agents have been discovering *shallow* improvements to algorithms. This apple-picking model is my attempt to help think through the distinction, and figure out how to measure agents' optimization ability.
+The motivation for writing this model was to help think through the implications of recent evidence that AI can push forward the frontier on various optimization and AI R&D problems. If you can spend $100 in tokens to increase the efficiency of a frontier AI training algorithm by 0.1%, then this looks like the path to self-improvement, and we'll be able to replace humans with AI. But realistically the agents have been discovering *shallow* improvements to algorithms. The idea is not original -- it already exists in cliché form ("low-hanging fruit") -- but I still found it useful to formalize.
@@ -220,6 +220,7 @@ The apple-picking model gives us some broad implications:
 
-1. Observing an agent meaningfully improve the frontier (push the curve left) does not imply that agents can replace humans.
-2.
+1. We should benchmark agents' AI R&D ability against the *frontier*, which represents cumulative human effort, rather than against toy problems and the effort of a single human.
+2. Observing an agent improve the frontier (push the curve left) does not imply that agents can replace humans, because they may only be picking low-hanging fruit.
+3. The effects will show up in parts of the AI stack that have low-hanging fruit, e.g. where there are a lot of fairly natural ideas to try but we are bottlenecked on execution.
 
 
 # Discussion
@@ -253,6 +254,8 @@ Things to add.
 
 - *Bottlenecks.* Some R&D is bottlenecked not just by thinking (which agents can do), but also by running experiments.
 
+- *Acceleration.* This model is missing the effect of acceleration through explicit cooperation -- e.g. a human comes up with an idea, and an agent executes it. This seems very likely to be an important contribution to AI progress.
+
 - *Sketch of a quantitative model of LLM training.* LLM training is a big stack of algorithms, which we've been optimizing at perhaps 10X/year. Would be useful to add some speculation about which parts of the stack have low-hanging fruit.
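
The diff above leans on the intuition that agents picking low-hanging fruit show diminishing returns to spend. The post's actual formalization (the \(\lambda\) parameter and its distribution) is not visible in this diff, so the following is only a hypothetical toy Monte Carlo in the spirit of the model; the function name `pick_apples`, the uniform apple distribution, and the hard reachability cutoff are all my own illustrative assumptions:

```python
import random

def pick_apples(n_apples=1000, reachable=300, budget=2000, seed=0):
    """Toy sketch of an apple-picking search (assumed form, not the
    post's exact model). Of `n_apples` potential optimizations, only the
    first `reachable` are low-hanging fruit the searcher can find. Each
    unit of budget is an independent random attempt; re-finding an
    already-picked apple yields nothing. Returns the cumulative number
    of distinct apples picked after each attempt."""
    rng = random.Random(seed)
    picked = set()
    history = []
    for _ in range(budget):
        apple = rng.randrange(n_apples)
        if apple < reachable:   # within reach (low-hanging)
            picked.add(apple)   # set membership ignores repeat finds
        history.append(len(picked))
    return history

h = pick_apples()
mid = len(h) // 2
first_half_gain = h[mid - 1]
second_half_gain = h[-1] - h[mid - 1]
# Identical spend in each half, but the second half finds fewer new
# apples: returns diminish as the reachable fruit runs out, even though
# the searcher's capability never changes.
```

This is the qualitative point of implication 2 above: early frontier improvements by an agent are consistent with a fixed reachability ceiling, so observing them does not by itself show the agent can keep pace once the low-hanging fruit is gone.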