
Commit 4acae7c

apple trees
1 parent 7079195 commit 4acae7c

18 files changed

Lines changed: 326 additions & 9 deletions

docs/posts/2025-09-13-recursive-self-improvement-explosion-optimization.html

Lines changed: 104 additions & 4 deletions
@@ -336,6 +336,14 @@ <h1>Apple-Picking Model / Low Hanging</h1>
336336
<li>You can list out all the insights: some are low-hanging, some are high.</li>
337337
</ul>
338338
</dd>
339+
</dl>
340+
<div class="quarto-figure quarto-figure-center">
341+
<figure class="figure">
342+
<p><img src="images/2026-03-12-11-32-34.png" class="img-fluid figure-img"></p>
343+
<figcaption>apple tree</figcaption>
344+
</figure>
345+
</div>
346+
<dl>
339347
<dt>Model.</dt>
340348
<dd>
341349
<ul>
@@ -462,10 +470,7 @@ <h2 class="anchored" data-anchor-id="bernoulli-apple-picking-model">Bernoulli Ap
462470
</div>
463471
</div>
464472
</div>
465-
<hr>
466-
<ul>
467-
<li>For <span class="math inline">\(p=0.5\)</span> and <span class="math inline">\(\lambda=0.5\)</span>, we can see the share implied by this expression in a small heat map:</li>
468-
</ul>
473+
<p>For <span class="math inline">\(p=0.5\)</span> and <span class="math inline">\(\lambda=0.5\)</span>, the implied share found is tabulated below:</p>
469474
<table style="border-collapse:collapse; margin:1rem 0; font-size:0.95rem;">
470475
<caption style="caption-side:top; text-align:center; font-weight:600; padding-bottom:0.45rem;">
471476
Heat map of share found (p = 0.5, λ = 0.5)
@@ -722,6 +727,84 @@ <h2 class="anchored" data-anchor-id="bernoulli-apple-picking-model">Bernoulli Ap
722727
</tbody>
723728
</table>
724729
</section>
730+
<section id="closed-apple-picking-model" class="level2">
731+
<h2 class="anchored" data-anchor-id="closed-apple-picking-model">Closed Apple-Picking Model</h2>
732+
<p><strong>Setup:</strong></p>
733+
<ol type="1">
734+
<li>Apples sit at heights in <span class="math inline">\([0,\infty)\)</span>; human reach is normalized to 1. The agent has reach <span class="math inline">\(\lambda_t \ge 0\)</span> and picks everything below it.</li>
735+
<li>Humans pick in the band <span class="math inline">\((\lambda_t, 1]\)</span> at a rate governed by <span class="math inline">\(p\)</span>.</li>
736+
<li>Agent reach depends on cumulative apples harvested: the agent gains no reach until harvest crosses a minimum threshold (<span class="math inline">\(\bar{a}\)</span>), and reach is thereafter affine in <span class="math inline">\(a\)</span>.</li>
737+
</ol>
738+
<p><strong>Implications:</strong></p>
739+
<ol type="1">
740+
<li>Agents will get taller than humans iff <span class="math inline">\(\alpha + \beta(1-\bar{a}) &gt; 1\)</span>.</li>
741+
<li>Agent height will be explosive iff <span class="math inline">\(\beta &gt;1\)</span>, i.e.&nbsp;if eating all the apples in a 1-cm slice of tree makes you grow more than 1&nbsp;cm taller. If not, height converges to a finite <span class="math inline">\(\lambda^*\)</span>.</li>
742+
</ol>
743+
<section id="state-variables-and-dynamics" class="level3">
744+
<h3 class="anchored" data-anchor-id="state-variables-and-dynamics">1. State variables and dynamics</h3>
745+
<p>Normalize human reach to 1.</p>
746+
<ul>
747+
<li><strong><span class="math inline">\(\lambda_t \ge 0\)</span></strong>: agent reach (how high the AI can pick).</li>
748+
<li><strong><span class="math inline">\(h_t \in [0,1]\)</span></strong>: human coverage of the human-only band <span class="math inline">\((\lambda_t, 1]\)</span> (fraction of that band already picked by humans).</li>
749+
</ul>
750+
<p><strong>Human dynamics</strong> (one parameter <span class="math inline">\(p \in (0,1)\)</span>): per period, a fraction <span class="math inline">\(1-p\)</span> of the remaining human-level band gets picked, so <span class="math display">\[h_{t+1} = 1 - p(1-h_t), \qquad h_0 = 0.\]</span> (Equivalently <span class="math inline">\(h_t = 1 - p^t\)</span>; the recursion keeps the model closed and autonomous.)</p>
751+
<p><strong>Apples harvested</strong> (agent + humans, with clipping at 1): <span class="math display">\[a_t = \lambda_t + (1-\lambda_t)_+ \, h_t, \qquad (x)_+ \equiv \max\{x,0\}.\]</span> Agent gets everything up to <span class="math inline">\(\lambda_t\)</span>; humans only contribute on the band of length <span class="math inline">\((1-\lambda_t)_+\)</span>, of which fraction <span class="math inline">\(h_t\)</span> is covered by time <span class="math inline">\(t\)</span>.</p>
752+
<p><strong>Self-improvement</strong> (activation threshold <span class="math inline">\(\bar{a}\)</span>, then affine in <span class="math inline">\(a_t\)</span>): <span class="math display">\[\lambda_{t+1} = \begin{cases} 0, &amp; a_t &lt; \bar{a} \\ \alpha + \beta(a_t - \bar{a}), &amp; a_t \ge \bar{a}. \end{cases}\]</span></p>
753+
<p>Parameters: <strong><span class="math inline">\(p\)</span></strong> (human speed), <strong><span class="math inline">\(\bar{a}\)</span></strong> (minimum progress to “turn on” the agent), <strong><span class="math inline">\(\alpha\)</span></strong> (baseline capability once on), <strong><span class="math inline">\(\beta\)</span></strong> (strength of recursive improvement). Initial condition <span class="math inline">\(\lambda_0\)</span> (typically 0). Four parameters plus <span class="math inline">\(\lambda_0\)</span>.</p>
754+
<hr>
755+
</section>
756+
<section id="crisp-conditions" class="level3">
757+
<h3 class="anchored" data-anchor-id="crisp-conditions">2. Crisp conditions</h3>
758+
<p><strong>A) Activation.</strong> With <span class="math inline">\(\lambda_t = 0\)</span>, <span class="math inline">\(a_t = h_t \to 1\)</span>. So the agent can ever turn on <strong>iff <span class="math inline">\(\bar{a} &lt; 1\)</span></strong>. If <span class="math inline">\(\bar{a} \ge 1\)</span>, <span class="math inline">\(\lambda_t \equiv 0\)</span> forever. Activation-time approximation: <span class="math inline">\(h_t = 1 - p^t \ge \bar{a}\)</span> <span class="math inline">\(\Leftrightarrow\)</span> <span class="math inline">\(t \ge \ln(1-\bar{a})/\ln p\)</span>; <span class="math inline">\(p\)</span> mainly shifts <em>when</em> activation happens.</p>
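The activation-time approximation can be checked directly. A small sketch, assuming we want the smallest integer period at which \(h_t \ge \bar{a}\) (the helper name is illustrative):

```python
import math

def activation_time(p, a_bar):
    """Smallest integer t with h_t = 1 - p**t >= a_bar.

    Requires a_bar < 1 (otherwise the agent never turns on).
    """
    return max(0, math.ceil(math.log(1 - a_bar) / math.log(p)))
```

With \(p=0.5\) and \(\bar{a}=0.3\) the bound gives \(t \ge \ln 0.7/\ln 0.5 \approx 0.51\), so activation at \(t=1\); slower humans (larger \(p\)) push this out without changing whether activation occurs.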
759+
<p><strong>B) Crossing human level.</strong> As <span class="math inline">\(t \to \infty\)</span>, <span class="math inline">\(h_t \to 1\)</span>. If <span class="math inline">\(\lambda_t &lt; 1\)</span>, <span class="math inline">\(a_t \to 1\)</span>; if <span class="math inline">\(\lambda_t \ge 1\)</span>, <span class="math inline">\(a_t = \lambda_t\)</span>. So asymptotically <span class="math inline">\(\lambda_{t+1} \to f(1)\)</span> with <span class="math inline">\(f(1) = 0\)</span> if <span class="math inline">\(1 &lt; \bar{a}\)</span>, and <span class="math inline">\(f(1) = \alpha + \beta(1-\bar{a})\)</span> if <span class="math inline">\(1 \ge \bar{a}\)</span>. So <strong>takeoff past human level</strong> (eventually <span class="math inline">\(\lambda &gt; 1\)</span>) <strong>iff</strong> <span class="math display">\[\boxed{\alpha + \beta(1-\bar{a}) &gt; 1.}\]</span> Interpretation: “If the orchard were fully human-level (<span class="math inline">\(a=1\)</span>), would the next agent be at least human-level?” If not, the system stays below 1. This condition is essentially independent of <span class="math inline">\(p\)</span> (timing, not whether).</p>
760+
<p><strong>C) Above human level: runaway vs saturation.</strong> For <span class="math inline">\(\lambda_t \ge 1\)</span>, <span class="math inline">\(a_t = \lambda_t\)</span> and <span class="math display">\[\lambda_{t+1} = \alpha + \beta(\lambda_t - \bar{a}).\]</span></p>
<ul>
<li><strong>Runaway / hard takeoff</strong> iff <span class="math inline">\(\boxed{\beta &gt; 1}\)</span> (roughly geometric growth in <span class="math inline">\(\lambda_t\)</span>).</li>
<li><strong>Soft takeoff / saturation</strong> iff <span class="math inline">\(\boxed{\beta &lt; 1}\)</span>: convergence to <span class="math display">\[\lambda^* = \frac{\alpha - \beta\bar{a}}{1-\beta}\]</span> (provided the system crosses 1 first).</li>
<li><strong>Knife-edge</strong> <span class="math inline">\(\beta = 1\)</span>: linear growth.</li>
</ul>
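The three crisp conditions compose into a small classifier. A sketch under the same notation (the function name and return strings are mine; note that `p` is deliberately unused, since these conditions govern *whether*, not *when*):

```python
def regime(p, a_bar, alpha, beta):
    """Classify the long-run regime from the closed-form conditions.

    p is accepted but unused: human speed only shifts timing,
    not which regime the system ends up in.
    """
    if a_bar >= 1:
        return "never activates"            # A) humans alone can't reach the threshold
    if alpha + beta * (1 - a_bar) <= 1:
        return "stuck below human level"    # B) f(1) <= 1
    if beta > 1:
        return "hard takeoff"               # C) runaway: geometric growth in lambda
    if beta < 1:
        lam_star = (alpha - beta * a_bar) / (1 - beta)
        return f"soft takeoff, saturates at {lam_star:.3g}"
    return "knife-edge: linear growth"      # beta == 1
```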
761+
<hr>
762+
</section>
763+
<section id="illustrations" class="level3">
764+
<h3 class="anchored" data-anchor-id="illustrations">3. Illustrations</h3>
765+
<p>The figures below implement this model: four canonical trajectories, phase diagram in <span class="math inline">\((\alpha,\beta)\)</span>, cobweb plots of the asymptotic map, and sensitivity to <span class="math inline">\(p\)</span> and <span class="math inline">\(\bar{a}\)</span>.</p>
766+
<div class="cell" data-layout-align="center">
767+
<div class="cell-output-display">
768+
<div class="quarto-figure quarto-figure-center">
769+
<figure class="figure">
770+
<p><img src="2025-09-13-recursive-self-improvement-explosion-optimization_files/figure-html/apple-rsi-four-regimes-1.png" class="img-fluid figure-img" width="672"></p>
771+
<figcaption>Four regimes: no activation (ā≥1), activated but stuck (f(1)&lt;1), soft takeoff (f(1)&gt;1, β&lt;1), hard takeoff (f(1)&gt;1, β&gt;1)</figcaption>
772+
</figure>
773+
</div>
774+
</div>
775+
</div>
776+
<div class="cell" data-layout-align="center">
777+
<div class="cell-output-display">
778+
<div class="quarto-figure quarto-figure-center">
779+
<figure class="figure">
780+
<p><img src="2025-09-13-recursive-self-improvement-explosion-optimization_files/figure-html/apple-rsi-phase-1.png" class="img-fluid figure-img" width="528"></p>
781+
<figcaption>Phase diagram: f(1)=1 (cross human level) and β=1 (runaway vs saturation) at fixed ā = 0.3</figcaption>
782+
</figure>
783+
</div>
784+
</div>
785+
</div>
786+
<div class="cell" data-layout-align="center">
787+
<div class="cell-output-display">
788+
<div class="quarto-figure quarto-figure-center">
789+
<figure class="figure">
790+
<p><img src="2025-09-13-recursive-self-improvement-explosion-optimization_files/figure-html/apple-rsi-cobweb-1.png" class="img-fluid figure-img" width="672"></p>
791+
<figcaption>Cobweb: asymptotic map λ_{t+1} = f(max(λ_t,1)); soft (β&lt;1) vs hard (β&gt;1)</figcaption>
792+
</figure>
793+
</div>
794+
</div>
795+
</div>
796+
<div class="cell" data-layout-align="center">
797+
<div class="cell-output-display">
798+
<div class="quarto-figure quarto-figure-center">
799+
<figure class="figure">
800+
<p><img src="2025-09-13-recursive-self-improvement-explosion-optimization_files/figure-html/apple-rsi-sensitivity-1.png" class="img-fluid figure-img" width="672"></p>
801+
<figcaption>Sensitivity: effect of p (left) and of ā (right) on λ_t trajectory</figcaption>
802+
</figure>
803+
</div>
804+
</div>
805+
</div>
806+
</section>
807+
</section>
725808
</section>
726809
<section id="toy-model-of-ai-stack" class="level1">
727810
<h1>Toy model of AI stack</h1>
@@ -1338,6 +1421,23 @@ <h1>References</h1>
13381421
</section>
13391422
<section id="overflow-offcuts" class="level1">
13401423
<h1>Overflow / Offcuts</h1>
1424+
<dl>
1425+
<dt>How important is scale?</dt>
1426+
<dd>
1427+
<ul>
1428+
<li><a href="https://arxiv.org/pdf/2509.01440">Benchmarking Optimizers for Large Language Model Pretraining</a></li>
1429+
<li>They compare 10 optimizers proposed as alternatives to AdamW, each previously shown to beat it in some setting, and find that only 3 of the 10 actually do.</li>
1430+
<li><blockquote class="blockquote">
1431+
<p>“Methods that perform well in short speedruns might not be optimal for longer training horizons”</p>
1432+
</blockquote></li>
1433+
<li>Figure 1 shows only 3/10 optimizers outperform AdamW at scale.</li>
1434+
</ul>
1435+
</dd>
1436+
<dt>Terence Tao / optimization problems.</dt>
1437+
<dd>
1438+
<a href="https://github.com/teorth/optimizationproblems?tab=readme-ov-file">teorth/optimizationproblems</a> lists 7 cases of “recent progress”; 3 of them are explicitly LLM-assisted or LLM-generated.
1439+
</dd>
1440+
</dl>
13411441
<section id="metaphors-for-rsi" class="level2">
13421442
<h2 class="anchored" data-anchor-id="metaphors-for-rsi">Metaphors for RSI</h2>
13431443
<dl>
