tecunningham
diff --git a/‎_freeze/posts/2026-05-06-paper-on-effects/execute-results/html.json‎
Lines changed: 2 additions & 2 deletions b/‎_freeze/posts/2026-05-06-paper-on-effects/execute-results/html.json‎
diff --git a/‎_freeze/posts/2026-05-06-paper-on-effects/figure-html/unnamed-chunk-2-1.pdf‎
-909 Bytes b/‎_freeze/posts/2026-05-06-paper-on-effects/figure-html/unnamed-chunk-2-1.pdf‎
diff --git a/‎_freeze/posts/2026-05-06-paper-on-effects/figure-html/unnamed-chunk-2-1.png‎
-3.81 KB b/‎_freeze/posts/2026-05-06-paper-on-effects/figure-html/unnamed-chunk-2-1.png‎
diff --git a/‎docs/posts/2026-05-06-paper-on-effects.html‎
Lines changed: 38 additions & 30 deletions b/‎docs/posts/2026-05-06-paper-on-effects.html‎
diff --git a/‎docs/posts/2026-05-06-paper-on-effects_files/figure-html/unnamed-chunk-2-1.png‎
-3.81 KB b/‎docs/posts/2026-05-06-paper-on-effects_files/figure-html/unnamed-chunk-2-1.png‎
@@ -1,8 +1,8 @@
 {
-  "hash": "1e0e6d4835475f40e83bdac8024a15e8",
+  "hash": "4609f8c41aec2deebbb14be95a1b2671",
   "result": {
     "engine": "knitr",
-    "markdown": "---\ntitle: Paper on Effects of Efficiency\nauthor: Tom Cunningham\nbibliography: ai.bib\ndate: today\ndraft: true\nengine: knitr\nreference-location: document\ncitation-location: document\n#toc: true\n#toc-location: left-body\nexecute:\n  echo: false\n  warning: false\n  error: false\n  cache: true # caches chunk output\n# format:\n#     pdf:\n#        pdf-engine: pdflatex\n#        include-in-header:\n#           - text: |\n#              \\usepackage[utf8]{inputenc}\n#              \\usepackage{bm}\n#              \\usepackage[all,2cell]{xy}\n#               \\newcommand{\\utt}[3]{\\underbrace{#1}_{\\substack{\\text{#2}\\\\\\text{#3}}}}\n---\n\nQuestion: does automated AI R&D result in a fast takeoff?\n: \n    Suppose we successfully automate AI R&D, so that we have an agent that can substitute for human AI researchers, what will be the effect on capabilities progress?\n    \n    What data from a lab would help answer this question?\n\nThe key data is the correlation between R&D investment and algorithmic efficiency.\n: \n    If algorithmic progress is very sensitive to R&D effort then automating R&D would have a big effect, and vice versa. So the core useful data would be the following:\n\n    - R&D investment (number of FTE researchers, maybe weighted by salary)\n    - algorithmic efficiency\n    \n\nA fuller model would split out different parts of the production.\n: \n    We also want to account for:\n\n    - Experiment compute\n    - Data\n    - (etc.)\n\nBasic data requests.\n: \n    Data for each team and each quarter:\n\n    - Inputs: researchers, compoute, data.\n    - Outputs: algorithmic efficiency.\n    \n    We also want survey data giving best-estimates on substitutability. 
Ask them hypotheticals about how much progress they would get .\n\nOut of scope: estimating R&D automation.\n: \n    There are lots of related questions about R&D automation:\n\n    - How much uplift is AI giving to researchers?\n    - How close substitutes are agentic and human workers?\n    - Is agentic R&D less experiment-efficient than human R&D? (this is potentially in-scope).\n\n\n\n#               Core Model\n\nCore model: R&D and algorithmic efficiency.\n: \n    We start with this basic model:\n    $$\\xymatrix@C=3em@R=1.4em{\n        *++[F]{\\text{R\\&D}}\\ar[r]|(0.4)r\n        & *++[F]{\\text{algorithmic}\\atop\\text{efficiency}}\n    }\n    $$\n\n    A very simple condition:\n        $$r=\\frac{\\Delta\\ln(\\text{algorithmic efficiency})}{\\Delta \\ln(\\text{R\\&D})}$$\n    \n    This would allow us to estimate $r$, and immediately know whether to expect a takeoff.\n\n\nLog data to collect.\n: \n    For each research area (pretraining, midtraining, etc.) and in each quarter:\n    \n    - R&D investment (number of FTE researchers, maybe weighted by salary)\n    - compute efficiency\n\n    | quarter | researchers | researcher salaries | compute efficiency win |\n    | ------- | :---------: | :-----------------: | :--------------------: |\n    | 2025Q1  |      3      |        $30M         |          20%           |\n    | 2025Q2  |      5      |        $50M         |          20%           |\n    | ...     
|             |                     |                        |\n    \n    The absolute value are sensitive, but we only need to know the relative numbers to estimate the relationship.\n\n    Hypothetical scatter plot (each point is a quarter):\n\n\n    ::: {.cell layout-align=\"center\"}\n    ::: {.cell-output-display}\n    ![](2026-05-06-paper-on-effects_files/figure-html/unnamed-chunk-1-1.png){fig-align='center' width=384}\n    :::\n    :::\n\n\nSurvey data to collect.\n: \n    There are many reasons why the log data will be imperfect, we can ask the following question:\n\n    - Q: if you had 2X as many researchers last quarter, how much larger do you think your compute efficiency gains would be? (hold fixed experiment compute, training compute, data).\n\n\nComplication: scale-dependent efficiency.\n: \n    @gundlach2025algorithmicprogressai argue that (1) many algorithms have different efficiency at different scales; (2) most algorithmic efficiency growth over the past 10 years was due to the Transformer. I don't think this is a big worry for us, because we're just looking at the last few years, and in fact model scale hasn't changed so enormously (though I'm not sure if the Gundlach paper's \"scale\" refers to parameters or to training compute).\n\nComplication: limits of compute efficiency.\n: \n    Training compute efficiency can be an imperfect metric: (1) some algorithms shift the asymptote; (2) some algorithms change the inference-time efficiency.\n\n\n#               Adding Experiment Compute\n\nWe can add experiment compute.\n: \n    We want to know the relative importance of R&D labor and experiment compute. 
We can write this as follows, the $\\sigma$ refers to an elasticity of substitution.\n\n\n    ::: {.cell}\n    ::: {.cell-output-display}\n    ![](2026-05-06-paper-on-effects_files/figure-html/unnamed-chunk-2-1.png){width=672}\n    :::\n    :::\n\n\n    If $\\sigma=0.5$, this means R&D and experiment-compute are strong complements, and having infinite R&D labor will only increase algorithmic efficiency by around 2X (assuming constant returns to scale).\n\nIt's hard to identify substitutability from historical data.\n: \n    Suppose we had the following data:\n\n    | quarter | researchers | experiment compute |     |\n    | ------- | ----------- | ----------------- | --- |\n    | 2025Q1  |             |                   |     |\n    | 2025Q2  |             |                   |     |\n    |         |             |                   |     |\n\n\n\n#               Fuller Model of Capabilities\n\nA fuller model of AI R&D.\n: \n\n    ::: {.cell}\n    ::: {.cell-output-display}\n    ![](2026-05-06-paper-on-effects_files/figure-html/unnamed-chunk-3-1.png){width=672}\n    :::\n    :::\n\n\nSurvey data to collect.\n: \n    What would your best-guess be at compute-equivalent gains in the following scenarios, over the last quarter:\n\n    | researchers | experiment compute | training compute |  data   | **guess at gains?** |\n    | :---------: | :----------------: | :--------------: | :-----: | :-----------------: |\n    |     2X      |      (fixed)       |     (fixed)      | (fixed) |         ___         |\n    |   (fixed)   |         2X         |     (fixed)      | (fixed) |         ___         |\n    |   (fixed)   |      (fixed)       |        2X        | (fixed) |         ___         |\n    |   (fixed)   |      (fixed)       |     (fixed)      |   2X    |         ___         |\n\n",
+    "markdown": "---\ntitle: Paper on Effects of Efficiency\nauthor: Tom Cunningham\nbibliography: ai.bib\ndate: today\ndraft: true\nengine: knitr\nreference-location: document\ncitation-location: document\n#toc: true\n#toc-location: left-body\nexecute:\n  echo: false\n  warning: false\n  error: false\n  cache: true # caches chunk output\n---\n\n<!-- https://tecunningham.github.io/posts/2026-05-06-paper-on-effects.html -->\n\nQuestion: does automated AI R&D result in a fast takeoff?\n: \n    Suppose we successfully automate AI R&D, so that we have an agent that can substitute for human AI researchers, what will be the effect on capabilities progress?\n    \n    What data from a lab would help answer this question?\n\nThe key data is the correlation between R&D investment and algorithmic efficiency.\n: \n    If algorithmic progress is very sensitive to R&D effort then automating R&D would have a big effect, and vice versa. So the core useful data would be the following:\n\n    - R&D investment (number of FTE researchers, maybe weighted by salary)\n    - algorithmic efficiency\n    \n\nA fuller model would split out different parts of the production.\n: \n    We also want to account for:\n\n    - Experiment compute\n    - Data\n    - (etc.)\n\nBasic data requests.\n: \n    Data for each team and each quarter:\n\n    - Inputs: researchers, compute, data.\n    - Outputs: algorithmic efficiency.\n    \n    We also want survey data giving best-estimates on substitutability. Ask them hypotheticals about how much progress they would get.\n\nOut of scope: estimating R&D automation.\n: \n    There are lots of related questions about R&D automation:\n\n    - How much uplift is AI giving to researchers?\n    - How close substitutes are agentic and human workers?\n    - Is agentic R&D less experiment-efficient than human R&D? 
(this is potentially in-scope).\n\n\n\n#               Core Model\n\nCore model: R&D and algorithmic efficiency.\n: \n    We start with this basic model:\n    $$\\xymatrix@C=3em@R=1.4em{\n        *++[F]{\\text{R\\&D}}\\ar[r]|(0.4)r\n        & *++[F]{\\text{algorithmic}\\atop\\text{efficiency}}\n    }\n    $$\n\n    A very simple condition:\n        $$r=\\frac{\\Delta\\ln(\\text{algorithmic efficiency})}{\\Delta \\ln(\\text{R\\&D})}$$\n    \n    This would allow us to estimate $r$, and immediately know whether to expect a takeoff.\n\n\nLog data to collect.\n: \n    For each research area (pretraining, midtraining, etc.) and in each quarter:\n\n    | quarter | researchers | researcher salaries | compute efficiency win |\n    | ------- | :---------: | :-----------------: | :--------------------: |\n    | 2025Q1  |      3      |        $30M         |          +20%           |\n    | 2025Q2  |      5      |        $50M         |          +30%           |\n    | ...     |     ...     |         ...         |          ...           |\n\n    \n    The absolute values are sensitive, but we only need to know the relative numbers to estimate the relationship. Hypothetical scatter plot (each point is a quarter):\n\n\n    ::: {.cell layout-align=\"center\"}\n    ::: {.cell-output-display}\n    ![](2026-05-06-paper-on-effects_files/figure-html/unnamed-chunk-1-1.png){fig-align='center' width=384}\n    :::\n    :::\n\n\nSurvey data to collect.\n: \n    There are many reasons why the log data will be imperfect, we can ask the following question:\n\n    > If you had 2X as many researchers last quarter, how much larger do you think your compute efficiency gains would be? (hold fixed experiment compute, training compute, data).\n\n\nComplication: scale-dependent efficiency.\n: \n    @gundlach2025algorithmicprogressai argue that (1) many algorithms have different efficiency at different scales; (2) most algorithmic efficiency growth over the past 10 years was due to the Transformer. 
I don't think this is a big worry for us, because we're just looking at the last few years, and in fact model scale hasn't changed so enormously (though I'm not sure if the Gundlach paper's \"scale\" refers to parameters or to training compute).\n\nComplication: limits of compute efficiency.\n: \n    Training compute efficiency can be an imperfect metric: (1) some algorithms shift the asymptote; (2) some algorithms change the inference-time efficiency.\n\n\n#               Adding Experiment Compute\n\nWe can add experiment compute.\n: \n    We want to know the relative importance of R&D labor and experiment compute. We can write this as follows, the $\\sigma$ refers to an elasticity of substitution.\n\n\n    ::: {.cell layout-align=\"center\"}\n    ::: {.cell-output-display}\n    ![](2026-05-06-paper-on-effects_files/figure-html/unnamed-chunk-2-1.png){fig-align='center' width=288}\n    :::\n    :::\n\n\n    If $\\sigma=0.5$, this means R&D and experiment-compute are strong complements, and having infinite R&D labor will only increase algorithmic efficiency by around 2X (assuming constant returns to scale).\n\nIt's harder to identify substitutability from historical data.\n: \n    Suppose we had the following data:\n\n    | quarter | researcher expenditure | experiment expenditure | compute efficiency |\n    | ------- | :--------------------: | :--------------------: | :----------------: |\n    | 2025Q1  |          $50M          |          $50M          |        +20%        |\n    | 2025Q2  |          $50M          |          $50M          |        +20%        |\n    |         |                        |                        |                    |\n\n    The fact that we're spending on both researchers and experiments tells us that they're complements, but doesn't tell us how strong the complementarity is.\n\n\n#               Fuller Model of Capabilities\n\nA fuller model of AI R&D.\n: \n\n    ::: {.cell}\n    ::: {.cell-output-display}\n    
![](2026-05-06-paper-on-effects_files/figure-html/unnamed-chunk-3-1.png){width=672}\n    :::\n    :::\n\n\nSurvey data to collect.\n: \n    What would your best-guess be at compute-equivalent gains in the following scenarios, over the last quarter:\n\n    | researchers | experiment compute | training compute |  data   | **guess at gains?** |\n    | :---------: | :----------------: | :--------------: | :-----: | :-----------------: |\n    |     2X      |      (fixed)       |     (fixed)      | (fixed) |         ___         |\n    |   (fixed)   |         2X         |     (fixed)      | (fixed) |         ___         |\n    |   (fixed)   |      (fixed)       |        2X        | (fixed) |         ___         |\n    |   (fixed)   |      (fixed)       |     (fixed)      |   2X    |         ___         |\n\n",
     "supporting": [],
     "filters": [
       "rmarkdown/pagebreak.lua"
 
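A minimal sketch of the core-model estimate r = Δln(algorithmic efficiency) / Δln(R&D), using only the two hypothetical rows from the log-data table in the new markdown (3 researchers with a +20% win, 5 researchers with a +30% win). It assumes the quarterly "compute efficiency win" can be read as a growth rate and converted to a log gain. That mapping and the helper name `loglog_slope` are illustrative assumptions, not something the post specifies.

```python
import math


def loglog_slope(inputs, outputs):
    """OLS slope of ln(outputs) on ln(inputs), i.e. an elasticity estimate."""
    xs = [math.log(v) for v in inputs]
    ys = [math.log(v) for v in outputs]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var = sum((x - x_mean) ** 2 for x in xs)
    return cov / var


# Hypothetical rows copied from the post's table (not real lab data).
researchers = [3, 5]
wins = [0.20, 0.30]

# Assumption: treat each quarterly win as a growth rate and use its log gain.
log_gains = [math.log(1 + w) for w in wins]

r_hat = loglog_slope(researchers, log_gains)
print(f"estimated r = {r_hat:.2f}")
```

With only two quarters the slope is simply the ratio of the two log differences, so a usable estimate would need many quarters per research area.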
@@ -311,11 +311,13 @@ <h1>Core Model</h1>
 <dt>Log data to collect.</dt>
 <dd>
 <p>For each research area (pretraining, midtraining, etc.) and in each quarter:</p>
-<ul>
-<li>R&amp;D investment (number of FTE researchers, maybe weighted by salary)</li>
-<li>compute efficiency</li>
-</ul>
 <table class="caption-top table">
+<colgroup>
+<col style="width: 11%">
+<col style="width: 18%">
+<col style="width: 32%">
+<col style="width: 37%">
+</colgroup>
 <thead>
 <tr class="header">
 <th>quarter</th>
@@ -329,24 +331,23 @@ <h1>Core Model</h1>
 <td>2025Q1</td>
 <td style="text-align: center;">3</td>
 <td style="text-align: center;">$30M</td>
-<td style="text-align: center;">20%</td>
+<td style="text-align: center;">+20%</td>
 </tr>
 <tr class="even">
 <td>2025Q2</td>
 <td style="text-align: center;">5</td>
 <td style="text-align: center;">$50M</td>
-<td style="text-align: center;">20%</td>
+<td style="text-align: center;">+30%</td>
 </tr>
 <tr class="odd">
 <td>…</td>
-<td style="text-align: center;"></td>
-<td style="text-align: center;"></td>
-<td style="text-align: center;"></td>
+<td style="text-align: center;">…</td>
+<td style="text-align: center;">…</td>
+<td style="text-align: center;">…</td>
 </tr>
 </tbody>
 </table>
-<p>The absolute value are sensitive, but we only need to know the relative numbers to estimate the relationship.</p>
-<p>Hypothetical scatter plot (each point is a quarter):</p>
+<p>The absolute values are sensitive, but we only need to know the relative numbers to estimate the relationship. Hypothetical scatter plot (each point is a quarter):</p>
 <div class="cell" data-layout-align="center">
 <div class="cell-output-display">
 <div class="quarto-figure quarto-figure-center">
@@ -360,9 +361,9 @@ <h1>Core Model</h1>
 <dt>Survey data to collect.</dt>
 <dd>
 <p>There are many reasons why the log data will be imperfect, we can ask the following question:</p>
-<ul>
-<li>Q: if you had 2X as many researchers last quarter, how much larger do you think your compute efficiency gains would be? (hold fixed experiment compute, training compute, data).</li>
-</ul>
+<blockquote class="blockquote">
+<p>If you had 2X as many researchers last quarter, how much larger do you think your compute efficiency gains would be? (hold fixed experiment compute, training compute, data).</p>
+</blockquote>
 </dd>
 <dt>Complication: scale-dependent efficiency.</dt>
 <dd>
@@ -380,50 +381,57 @@ <h1>Adding Experiment Compute</h1>
 <dt>We can add experiment compute.</dt>
 <dd>
 <p>We want to know the relative importance of R&amp;D labor and experiment compute. We can write this as follows, the <span class="math inline">\(\sigma\)</span> refers to an elasticity of substitution.</p>
-<div class="cell">
+<div class="cell" data-layout-align="center">
 <div class="cell-output-display">
-<div>
+<div class="quarto-figure quarto-figure-center">
 <figure class="figure">
-<p><img src="2026-05-06-paper-on-effects_files/figure-html/unnamed-chunk-2-1.png" class="img-fluid figure-img" width="672"></p>
+<p><img src="2026-05-06-paper-on-effects_files/figure-html/unnamed-chunk-2-1.png" class="img-fluid quarto-figure quarto-figure-center figure-img" width="288"></p>
 </figure>
 </div>
 </div>
 </div>
 <p>If <span class="math inline">\(\sigma=0.5\)</span>, this means R&amp;D and experiment-compute are strong complements, and having infinite R&amp;D labor will only increase algorithmic efficiency by around 2X (assuming constant returns to scale).</p>
 </dd>
-<dt>It’s hard to identify substitutability from historical data.</dt>
+<dt>It’s harder to identify substitutability from historical data.</dt>
 <dd>
 <p>Suppose we had the following data:</p>
 <table class="caption-top table">
+<colgroup>
+<col style="width: 10%">
+<col style="width: 31%">
+<col style="width: 31%">
+<col style="width: 26%">
+</colgroup>
 <thead>
 <tr class="header">
 <th>quarter</th>
-<th>researchers</th>
-<th>experiment compute</th>
-<th></th>
+<th style="text-align: center;">researcher expenditure</th>
+<th style="text-align: center;">experiment expenditure</th>
+<th style="text-align: center;">compute efficiency</th>
 </tr>
 </thead>
 <tbody>
 <tr class="odd">
 <td>2025Q1</td>
-<td></td>
-<td></td>
-<td></td>
+<td style="text-align: center;">$50M</td>
+<td style="text-align: center;">$50M</td>
+<td style="text-align: center;">+20%</td>
 </tr>
 <tr class="even">
 <td>2025Q2</td>
-<td></td>
-<td></td>
-<td></td>
+<td style="text-align: center;">$50M</td>
+<td style="text-align: center;">$50M</td>
+<td style="text-align: center;">+20%</td>
 </tr>
 <tr class="odd">
 <td></td>
-<td></td>
-<td></td>
-<td></td>
+<td style="text-align: center;"></td>
+<td style="text-align: center;"></td>
+<td style="text-align: center;"></td>
 </tr>
 </tbody>
 </table>
+<p>The fact that we’re spending on both researchers and experiments tells us that they’re complements, but doesn’t tell us how strong the complementarity is.</p>
 </dd>
 </dl>
 </section>
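The "around 2X" figure for σ = 0.5 can be checked with a small sketch. It assumes a CES production function with constant returns to scale, equal shares on R&D labor and experiment compute, and a baseline normalised to one unit of each. The function name `ces_output` and the `share` parameter are illustrative and not taken from the post.

```python
def ces_output(labor, compute, sigma=0.5, share=0.5):
    """Constant-returns CES aggregate of R&D labor and experiment compute.

    sigma is the elasticity of substitution and must not equal 1
    (that case is the Cobb-Douglas limit, not handled here).
    share is the assumed weight on labor.
    """
    rho = (sigma - 1.0) / sigma  # sigma = 0.5 gives rho = -1
    return (share * labor**rho + (1.0 - share) * compute**rho) ** (1.0 / rho)


baseline = ces_output(1.0, 1.0)

# Scale up R&D labor while holding experiment compute fixed at 1.
for multiple in (2.0, 10.0, 1e6):
    ratio = ces_output(multiple, 1.0) / baseline
    print(f"labor x{multiple:g}: output x{ratio:.3f}")
```

Under these assumptions the ratio approaches 1 / (1 - share) = 2 as labor grows without bound, so the ceiling depends on the assumed share as well as on σ.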