<h2 class="anchored" data-anchor-id="bernoulli-apple-picking-model">Bernoulli Apple Picking Model</h2>
</div>
</div>
</div>
<li>Apples sit at heights in <span class="math inline">\([0,\infty)\)</span>; human reach is normalized to 1. The agent has reach <span class="math inline">\(\lambda_t \ge 0\)</span> and picks everything below it.</li>
<li>Humans pick in the band <span class="math inline">\((\lambda_t, 1]\)</span> at a rate governed by <span class="math inline">\(p\)</span>.</li>
<li>Agent reach depends on cumulative apples harvested: the agent can only pick apples once some minimum threshold (<span class="math inline">\(\bar{a}\)</span>) has been harvested, and thereafter reach is linear in <span class="math inline">\(a\)</span>.</li>
</ol>
<p><strong>Implications:</strong></p>
<ol type="1">
<li>Agents will get taller than humans iff <span class="math inline">\(\alpha + \beta(1-\bar{a}) > 1\)</span>.</li>
<li>Agent height will be explosive iff <span class="math inline">\(\beta > 1\)</span>, i.e., if eating all the apples in a 1-cm slice of tree causes you to grow more than 1 cm taller. If not, the agent converges to a finite height <span class="math inline">\(\lambda^*\)</span>.</li>
</ol>
<h3 class="anchored" data-anchor-id="state-variables-and-dynamics">1. State variables and dynamics</h3>
<p>Normalize human reach to 1.</p>
<ul>
<li><strong><span class="math inline">\(\lambda_t \ge 0\)</span></strong>: agent reach (how high the AI can pick).</li>
<li><strong><span class="math inline">\(h_t \in [0,1]\)</span></strong>: human coverage of the human-only band <span class="math inline">\((\lambda_t, 1]\)</span> (fraction of that band already picked by humans).</li>
</ul>
<p><strong>Human dynamics</strong> (one parameter <span class="math inline">\(p \in (0,1)\)</span>): per period, a fraction <span class="math inline">\(1-p\)</span> of the remaining human-level band gets picked, so <span class="math display">\[h_{t+1} = 1 - p(1-h_t), \qquad h_0 = 0.\]</span> (Equivalently <span class="math inline">\(h_t = 1 - p^t\)</span>; the recursion keeps the model closed and autonomous.)</p>
<p><strong>Apples harvested</strong> (agent + humans, with clipping at 1): <span class="math display">\[a_t = \lambda_t + (1-\lambda_t)_+ \, h_t, \qquad (x)_+ \equiv \max\{x,0\}.\]</span> The agent gets everything up to <span class="math inline">\(\lambda_t\)</span>; humans only contribute on the band of length <span class="math inline">\((1-\lambda_t)_+\)</span>, of which fraction <span class="math inline">\(h_t\)</span> is covered by time <span class="math inline">\(t\)</span>.</p>
<p><strong>Self-improvement</strong> (activation threshold <span class="math inline">\(\bar{a}\)</span>, then affine in <span class="math inline">\(a_t\)</span>): <span class="math display">\[\lambda_{t+1} = \begin{cases} 0, & a_t < \bar{a} \\ \alpha + \beta(a_t - \bar{a}), & a_t \ge \bar{a}. \end{cases}\]</span></p>
<p>Parameters: <strong><span class="math inline">\(p\)</span></strong> (human speed), <strong><span class="math inline">\(\bar{a}\)</span></strong> (minimum progress to “turn on” the agent), <strong><span class="math inline">\(\alpha\)</span></strong> (baseline capability once on), <strong><span class="math inline">\(\beta\)</span></strong> (strength of recursive improvement). Initial condition <span class="math inline">\(\lambda_0\)</span> (typically 0). Four parameters plus <span class="math inline">\(\lambda_0\)</span>.</p>
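<p>As a concreteness check, the full recursion can be sketched in a few lines of Python. The parameter values here are illustrative assumptions, not the settings used in the figures:</p>

```python
# Sketch of the full model: human coverage h_t, harvest a_t, agent reach lam_t.
# Parameter values below are illustrative, not taken from the document's figures.

def simulate(p, a_bar, alpha, beta, lam0=0.0, T=100):
    """Iterate h_{t+1} = 1 - p*(1 - h_t), a_t = lam_t + max(1 - lam_t, 0)*h_t,
    lam_{t+1} = 0 if a_t < a_bar else alpha + beta*(a_t - a_bar)."""
    h, lam = 0.0, lam0
    trajectory = []
    for t in range(T):
        a = lam + max(1.0 - lam, 0.0) * h   # agent band plus human-covered band
        trajectory.append((t, h, lam, a))
        lam = 0.0 if a < a_bar else alpha + beta * (a - a_bar)
        h = 1.0 - p * (1.0 - h)             # fraction 1-p of the remainder gets picked
    return trajectory

# Soft-takeoff example: alpha + beta*(1 - a_bar) = 1.05 > 1 and beta = 0.5 < 1,
# so lam_t should cross 1 and then settle at lam* = (alpha - beta*a_bar)/(1 - beta) = 1.1.
traj = simulate(p=0.5, a_bar=0.5, alpha=0.8, beta=0.5, T=100)
```

<p>Since <span class="math inline">\(h_t = 1 - p^t\)</span> falls out of the human recursion, the closed form is a handy sanity check on the loop.</p>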
<p><strong>A) Activation.</strong> With <span class="math inline">\(\lambda_t = 0\)</span>, <span class="math inline">\(a_t = h_t \to 1\)</span>. So the agent can ever turn on <strong>iff <span class="math inline">\(\bar{a} < 1\)</span></strong>. If <span class="math inline">\(\bar{a} \ge 1\)</span>, <span class="math inline">\(\lambda_t \equiv 0\)</span> forever. Activation-time approximation: <span class="math inline">\(h_t = 1 - p^t \ge \bar{a}\)</span> <span class="math inline">\(\Leftrightarrow\)</span> <span class="math inline">\(t \ge \ln(1-\bar{a})/\ln p\)</span>; <span class="math inline">\(p\)</span> mainly shifts <em>when</em> activation happens.</p>
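<p>The activation-time bound is easy to verify numerically; a minimal sketch with illustrative parameter values:</p>

```python
import math

# With lam_t = 0 we have h_t = 1 - p**t, so h_t >= a_bar first holds at
# t = ceil(log(1 - a_bar) / log(p)). Values used below are illustrative.

def activation_time(p, a_bar):
    """Closed-form first period with h_t >= a_bar; None if never."""
    if a_bar >= 1:
        return None  # agent never turns on
    return math.ceil(math.log(1.0 - a_bar) / math.log(p))

def activation_time_sim(p, a_bar, T=10_000):
    """Cross-check against the recursion h_{t+1} = 1 - p*(1 - h_t), h_0 = 0."""
    h = 0.0
    for t in range(T):
        if h >= a_bar:
            return t
        h = 1.0 - p * (1.0 - h)
    return None
```

<p>Smaller <span class="math inline">\(p\)</span> (faster humans) shortens the wait but, as noted above, does not change whether takeoff eventually occurs.</p>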
<p><strong>B) Crossing human level.</strong> As <span class="math inline">\(t \to \infty\)</span>, <span class="math inline">\(h_t \to 1\)</span>. If <span class="math inline">\(\lambda_t < 1\)</span>, <span class="math inline">\(a_t \to 1\)</span>; if <span class="math inline">\(\lambda_t \ge 1\)</span>, <span class="math inline">\(a_t = \lambda_t\)</span>. So asymptotically <span class="math inline">\(\lambda_{t+1} \to f(1)\)</span> with <span class="math inline">\(f(1) = 0\)</span> if <span class="math inline">\(1 < \bar{a}\)</span>, and <span class="math inline">\(f(1) = \alpha + \beta(1-\bar{a})\)</span> if <span class="math inline">\(1 \ge \bar{a}\)</span>. So <strong>takeoff past human level</strong> (eventually <span class="math inline">\(\lambda > 1\)</span>) <strong>iff</strong> <span class="math display">\[\boxed{\alpha + \beta(1-\bar{a}) > 1.}\]</span> Interpretation: “If the orchard were fully human-level (<span class="math inline">\(a=1\)</span>), would the next agent be at least human-level?” If not, the system stays below 1. This condition is essentially independent of <span class="math inline">\(p\)</span> (timing, not whether).</p>
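<p>The crossing condition can be checked against brute-force simulation of the full recursion; a small sketch over a few assumed parameter triples:</p>

```python
# Compare "does lam_t ever exceed 1 in simulation" against the closed-form
# crossing condition alpha + beta*(1 - a_bar) > 1 (assuming a_bar < 1).
# Parameter triples below are illustrative assumptions.

def takes_off(p, a_bar, alpha, beta, T=500):
    """Run the full model and report whether lam_t ever exceeds 1."""
    h, lam = 0.0, 0.0
    for _ in range(T):
        a = lam + max(1.0 - lam, 0.0) * h
        lam = 0.0 if a < a_bar else alpha + beta * (a - a_bar)
        h = 1.0 - p * (1.0 - h)
        if lam > 1.0:
            return True
    return False

for alpha, beta, a_bar in [(0.8, 0.5, 0.5), (0.6, 0.5, 0.5), (0.3, 1.2, 0.2)]:
    assert takes_off(p=0.5, a_bar=a_bar, alpha=alpha, beta=beta) == (alpha + beta * (1 - a_bar) > 1)
```

<p>Varying <span class="math inline">\(p\)</span> in this check changes only how many periods the crossing takes, consistent with the "timing, not whether" remark above.</p>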
<p><strong>C) Above human level: runaway vs saturation.</strong> For <span class="math inline">\(\lambda_t \ge 1\)</span>, <span class="math inline">\(a_t = \lambda_t\)</span> and <span class="math display">\[\lambda_{t+1} = \alpha + \beta(\lambda_t - \bar{a}).\]</span></p>
<ul>
<li><strong>Runaway / hard takeoff</strong> iff <span class="math inline">\(\boxed{\beta > 1}\)</span> (roughly geometric growth in <span class="math inline">\(\lambda_t\)</span>).</li>
<li><strong>Soft takeoff / saturation</strong> iff <span class="math inline">\(\boxed{\beta < 1}\)</span>: convergence to <span class="math display">\[\lambda^* = \frac{\alpha - \beta\bar{a}}{1-\beta}\]</span> (provided the system crosses 1 first).</li>
<li><strong>Knife-edge</strong> <span class="math inline">\(\beta = 1\)</span>: linear growth.</li>
</ul>
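<p>A quick numeric check of the saturation case, again with illustrative parameter values:</p>

```python
# Above human level the map is affine: lam -> alpha + beta*(lam - a_bar).
# For beta < 1 it contracts to lam* = (alpha - beta*a_bar)/(1 - beta).
# Values below are illustrative assumptions.

def step(lam, alpha, beta, a_bar):
    return alpha + beta * (lam - a_bar)

def lam_star(alpha, beta, a_bar):
    assert beta < 1.0, "beta >= 1 gives runaway growth, no finite fixed point"
    return (alpha - beta * a_bar) / (1.0 - beta)

alpha, beta, a_bar = 0.8, 0.5, 0.5
lam = 1.0
for _ in range(60):
    lam = step(lam, alpha, beta, a_bar)
# lam is now within floating-point error of lam_star(0.8, 0.5, 0.5) = 1.1
```

<p>Because the map is a contraction with rate <span class="math inline">\(\beta\)</span>, the error shrinks geometrically; 60 iterations is far more than enough here.</p>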
<p>The figures below implement this model: four canonical trajectories, a phase diagram in <span class="math inline">\((\alpha,\beta)\)</span>, cobweb plots of the asymptotic map, and sensitivity to <span class="math inline">\(p\)</span> and <span class="math inline">\(\bar{a}\)</span>.</p>
<li><a href="https://arxiv.org/pdf/2509.01440">Benchmarking Optimizers for Large Language Model Pretraining</a></li>
<li>They compare 10 optimizers proposed as alternatives to AdamW, each previously shown to beat it in some circumstances, and find that only 3 of the 10 do.</li>
<li><blockquote class="blockquote">
<p>“Methods that perform well in short speedruns might not be optimal for longer training horizons”</p>
</blockquote></li>
<li>Figure 1 shows only 3/10 optimizers outperform AdamW at scale.</li>
</ul>
</dd>
<dt>Terence Tao / optimization problems.</dt>
<dd>
They list 7 cases of “recent progress” at <a href="https://github.com/teorth/optimizationproblems?tab=readme-ov-file">https://github.com/teorth/optimizationproblems</a>; 3 of them are explicitly LLM-assisted or LLM-generated.
</dd>
</dl>
<section id="metaphors-for-rsi" class="level2">
<h2 class="anchored" data-anchor-id="metaphors-for-rsi">metaphors for RSI</h2>