tecunningham
diff --git a/_freeze/posts/2025-10-06-the-curve/execute-results/html.json b/_freeze/posts/2025-10-06-the-curve/execute-results/html.json
Lines changed: 2 additions & 2 deletions
diff --git a/docs/2023-03-13-resume-tom-cunningham.pdf b/docs/2023-03-13-resume-tom-cunningham.pdf
-639 KB
diff --git a/docs/IMG_0571.jpeg b/docs/IMG_0571.jpeg
-534 KB
diff --git a/docs/index.html b/docs/index.html
Lines changed: 2 additions & 2 deletions
diff --git a/docs/posts/2020-08-06-bayesian-interpretation-of-experiments.html b/docs/posts/2020-08-06-bayesian-interpretation-of-experiments.html
Lines changed: 2 additions & 2 deletions
diff --git a/docs/posts/2020-08-06-bayesian-interpretation-of-experiments_files/figure-html/unnamed-chunk-1-1.png b/docs/posts/2020-08-06-bayesian-interpretation-of-experiments_files/figure-html/unnamed-chunk-1-1.png
-35.9 KB
diff --git a/docs/posts/2020-08-06-bayesian-interpretation-of-experiments_files/figure-html/unnamed-chunk-2-1.png b/docs/posts/2020-08-06-bayesian-interpretation-of-experiments_files/figure-html/unnamed-chunk-2-1.png
-17.1 KB
diff --git a/docs/posts/2020-08-06-bayesian-interpretation-of-experiments_files/figure-html/unnamed-chunk-3-1.png b/docs/posts/2020-08-06-bayesian-interpretation-of-experiments_files/figure-html/unnamed-chunk-3-1.png
-20.1 KB
diff --git a/docs/posts/2020-08-06-bayesian-interpretation-of-experiments_files/figure-html/unnamed-chunk-4-1.png b/docs/posts/2020-08-06-bayesian-interpretation-of-experiments_files/figure-html/unnamed-chunk-4-1.png
-20 KB
diff --git a/docs/posts/2020-08-06-bayesian-interpretation-of-experiments_files/figure-html/unnamed-chunk-5-1.png b/docs/posts/2020-08-06-bayesian-interpretation-of-experiments_files/figure-html/unnamed-chunk-5-1.png
-15.6 KB
@@ -1,8 +1,8 @@
 {
-  "hash": "c1a3ea6e3b802b8d9ac00e2bdf5f85f6",
+  "hash": "c13da325195f2ec44f8bad4a812d50e2",
   "result": {
     "engine": "knitr",
-    "markdown": "---\ntitle: The Curve\ncitation: true\nbibliography: ai.bib\nreference-location: section\ncitation-location: document\ndate: 2025-10-06\ndraft: true\nengine: knitr\nformat:\n  html:\n    toc: true\n    other-formats: false\n    lightbox: auto    # ← enables click-to-zoom for figures/images\n---\n\n\n\n<!-- https://tecunningham.github.io/posts/2025-09-19-transformative-AI-notes.html -->\n\n<style>\n   dl {display: grid;}\n   dt {grid-column-start: 1; width: 5cm;}\n   dd {grid-column-start: 2; margin-left: 2em;}\n</style>\n\n::: {.column-margin}\n   Thanks to XXX\n:::\n\n\n\n\n\n\n\n\n\n\n\n\n\nSome topics that kept coming up at the Curve:\n\n1. We need more forecasts of economic impacts.\n2. We need more theory of capabilities.\n3. We need more metrics of capabilities.\n4. We need more theory of offense-defense balance.\n5. There are two ways of modelling future capabilities.\n\nI feel bad saying \"we need\", so I've also added my best guesses about each.\n\n##       We need more forecasts of economic impacts\n\nOnly a few people have written down explicit forecasts of the economic effect of strong AI, i.e. the effects on GDP, employment, wages, asset prices, welfare. The closest I'm aware of is Epoch's GATE model.\n\nI think it would be incredibly useful to have a set of serious forecasts that we can use to argue about. Many informal conversations and arguments often use vague terms, & it's not clear whether we're really disagreeing. If we are going to consider a policy then we should have a clear prediction. It's like early 2020: if we're going to make a decision about a lock-down it should be based on a clear idea about the counterfactual, which includes a lot of equilibrium effects (FWIW New Zealand managed to navigate COVID exceptionally well because they saw what was happening to other countries first).\n\nThe existing state of economic forecasts of strong AI:\n\n1. Most academic forecasts assume that AI plateaus, i.e. 
it's a one-time labor-saving innovation, and the effects are spread out over time due to diffusion (Acemoglu, Aghion & Bunel, Wharton).\n2. Brynjolfsson and Korinek give a range of forecasts on GDP growth, but they're relatively lightly modelled. Also Korinek and Hu.\n3. Epoch's GATE model gives quantitative timelines.\n\nWe also have forecasting markets and individual forecats but these are hard to interpret: they are a combination of capabilities progress and the effects of those capabilities.\n\nWhat am I missing?\n\nOne idea that Anna Yelikazova and I discussed: sponsor a couple of dozen grad students to make forecasts.\n\n\n##       We need more theory of capabilities.\n\nThere are many projects to collect data on the whole waterfall of AI impacts:\n\n1. Data on AI capabilities -- e.g. benchmarks that have representative tasks across the economy (e.g. GDP-val, APEX).\n2. Data on AI uplift -- how much more efficient does it make you?\n3. Data on AI adoption -- adoption by occupations, by industry, by demographic.\n4. Data on AI usage -- what types of economic tasks are LLMs used for.\n5. Data on AI impacts -- e.g. changes in hiring and wages by occupation.\n\nHowever I think we don't have that many opinionated *theories* on how each of these should move. I feel theories are super important because we're expecting things to change rapidly, both due to capability growth and adoption growth. If we don't have an explicit theory then we're using an implicit theory.\n\nHere are some theories:\n\n1. Informal observations about what types of tasks do well with LLM architectures: verifiable, short-horizon.\n2. Eloundou et al.'s ranking of ONET task \"exposure\"\n3. METR's ranking of benchmark task \"time horizon.\"\n\nA related argument: my feeling is we're more constrained on theory than data. 
\n\n\n##       We need more metrics of capabilities\n\nWe don't have a standard way of defining AI capabilities, we say \"strong AI\", \"transformative AI\", \"AGI\", or \"ASI\".\n\nThe best concrete metric is probably METR's time horizon index: we can say \"what happens when AI can do a one month task?\"\n\nThe Forecasting Research Institute is working on a set of well-defined capability scenarios.\n\nHere's a metric I think would be useful: frontier cost-efficiency growth. There are hundreds of cost-efficiency metrics that have been regularly increasing over decades: transistor density, corn yield, compression efficiency. When AI becomes useful then we expect these metrics to start improving more quickly, & that rate of improvement is a useful metric because it's (1) unambiguous; (2) clearly economically relevant; (3) upstream of other economic impacts like employment.\n\n##       We need more theory of offense-defense balance\n\nMany discussions were about how AI will change the offense-defense balance: ransomware, spearfishing, media manipulation, drone assassinations, drone warfare. There are dozens of others.\n\nIt seems wasteful to treat each of these problems independently, there ought to be some general principles we can apply on how AI will affect offense-defense balance. .\n\n\n\n##       There are two ways of modelling AI's impact\n\nFrom conversations I realized there's a nice distinction between two alternative approaches.\n\n1. _Top down:_ AI replaces each of the human subtasks. Hits human-level ability.\n2. _Bottom-up:_ AI just does the entire procedure from first principles.\n\n\n: - Applied to recursive self-improvement: AI treats it as a pure optimization problem, already it's better at chip design, algorithm design.\n\n##       What would I do\n\n\nThis is all complaining about others' failure to be sufficiently opinionated. 
For what it's worth, here are my best guesses:\n\n\n\n\n\n##       Other things\n\nWhat metrics should the labs report.\n: .\n\nMy highest compliment to your work is that I didn't read it.\n: If I start reading your essay and I realize it's really good then I put it aside until I can organize my own thoughts on this question. My thoughts can take a long time to organize. My favorite book, after 20 years, I still haven't made it farther than half-way through. If I tell you I've read your essay it means I liked it but I didn't love it.\n",
+    "markdown": "---\ntitle: The Curve\ncitation: true\nbibliography: ai.bib\nreference-location: section\ncitation-location: document\ndate: 2025-10-06\ndraft: true\nengine: knitr\nformat:\n  html:\n    toc: true\n    other-formats: false\n    lightbox: auto    # ← enables click-to-zoom for figures/images\n---\n\n\n\n<!-- https://tecunningham.github.io/posts/2025-09-19-transformative-AI-notes.html -->\n\n<style>\n   dl {display: grid;}\n   dt {grid-column-start: 1; width: 10em;}\n   dd {grid-column-start: 2; margin-left: 2em;}\n</style>\n\n::: {.column-margin}\n   Thanks to XXX\n:::\n\n\nSome topics that kept coming up at the Curve:\n\n1. We need more forecasts of economic impacts.\n2. We need more theory of capabilities.\n3. We need more metrics of capabilities.\n4. We need more theory of offense-defense balance.\n5. There are two ways of modelling future capabilities.\n\nI feel bad saying \"we need,\" & scolding others for the work they're not doing, so I've also added my own very tentative best guesses about each.\n\n![](images/2025-10-06-11-35-49.png){.column-margin}\n\n##       We need more forecasts of economic impacts\n\nWe only have a few explicit forecasts of strong AI.\n: Only a few people have written down explicit forecasts of the economic effect of strong AI, i.e. the effects on GDP, employment, wages, asset prices, welfare. The closest I'm aware of is Epoch's GATE model.\n\nI think explicit forecasts would be incredibly useful.\n: Many informal conversations and arguments often use vague terms, & it's not clear whether we're really disagreeing. If we are going to consider a policy then we should have a clear prediction. 
It's like early 2020: if we're going to make a decision about a lock-down it should be based on a clear idea about the counterfactual, which includes a lot of equilibrium effects (FWIW New Zealand managed to navigate COVID exceptionally well because they saw what was happening to other countries first).\n\nExisting forecasts:\n:     1. Most academic forecasts assume that AI capabilities plateau, i.e. it's a one-time labor-saving innovation, and the effects are spread out over time due to diffusion (Acemoglu, Aghion & Bunel, Wharton).\n      1. Brynjolfsson and Korinek give a range of forecasts on GDP growth, but they're relatively lightly modelled. Also Korinek and Hu.\n      2. Epoch's GATE model gives quantitative timelines, I think it's probably the most explicit. They forecast full automation in 2034, by which point GWP has grown 10X. They forecast wages will increase dramatically then collapse somewhat after full automation is achieved.\n\n      We also have forecasting markets and individual forecats but these are hard to interpret: they are a combination of capabilities progress and the effects of those capabilities.\n\n      What am I missing?\n\nWhat are my forecasts?\n: (...)\n\nHow can we get more people to forecast?\n: One idea that Anna Yelikazova and I discussed: sponsor a couple of dozen econ grad students to make forecasts. Get them to write 5-page justifications, and give prizes to the most compelling ones.\n\n\n##       We need more theory of capabilities.\n\nThere are many projects to collect data on the whole waterfall\n\n: 1. Data on AI capabilities -- e.g. benchmarks that have representative tasks across the economy (e.g. GDP-val, APEX).\n      1. Data on AI uplift -- how much more efficient does it make you?\n      2. Data on AI adoption -- adoption by occupations, by industry, by demographic.\n      3. Data on AI usage -- what types of economic tasks are LLMs used for.\n      4. Data on AI impacts -- e.g. 
changes in hiring and wages by occupation.\n\nCollecting data is hard without theory.\n\n: I don't have that many opinionated *theories* on how each of these should move. I feel theories are super important because we're expecting things to change rapidly, both due to capability growth and adoption growth. If we don't have an explicit theory then we're using an implicit theory. Think of spending a lot of time & resources collecting samples of COVID, but not, at the same time, working on a theory of how epidemics evolve and who's more susceptible. \n\nHere are some existing theories of AI impacts:\n\n:     1. Informal observations about what types of tasks LLMs do well on: verifiable, short-horizon, low-context.\n      1. Eloundou et al.'s ranking of ONET task \"exposure\"\n      2. METR's ranking of benchmark task \"time horizon.\"\n\n      A related argument: my feeling is we're more constrained on theory than data. \n\n\n##       We need more metrics of capabilities\n\nWe don't have a standard way of defining AI capabilities.\n: We say \"strong AI\", \"transformative AI\", \"AGI\", or \"ASI\".\n\n      The best concrete metric is probably METR's time horizon index: we can say \"what happens when AI can do a one month task?\"\n\n      The Forecasting Research Institute is working on a set of well-defined capability scenarios.\n\nA metric I think would be useful: frontier cost-efficiency growth.\n: There are hundreds of cost-efficiency metrics that have been regularly increasing over decades: transistor density, corn yield, compression efficiency. 
When AI becomes useful then we expect these metrics to start improving more quickly, & that rate of improvement is a useful metric because it's (1) unambiguous; (2) clearly economically relevant; (3) upstream of other economic impacts like employment.\n\n![](images/2025-10-06-11-33-10.png){.column-margin}\n\n##       We need more theory of offense-defense balance\n\nMany discussions were about how AI will change the offense-defense balance\n\n: Examples that came up in the Curve:\n\n      - ransomware\n      - spearfishing\n      - media manipulation\n      - drone assassinations\n      - drone warfare. \n   \n    There are dozens of others.\n\nWe should have some common theory.\n: It seems wasteful to treat each of these problems independently, there ought to be some general principles we can apply on how AI will affect offense-defense balance. The closest I know is @garfinkel2019offensedefense, they hypothesise that in many cases, when investments are sufficiently large, adding more resources will tend to favor defense.\n\nMy best guess.\n: The \n\n\n![](images/2025-10-06-11-30-58.png){.column-margin}\n\n##       There are two ways of modelling AI's impact\n\nThere's a nice distinction between two approaches.\n\n: From a lot of conversations about recursive self-improvement I realized it's useful to distinguish between two qualitatively different ways of thinking about AI's ability to do work:\n\n      1. _Top down:_ AI replaces each of the human subtasks. Hits human-level ability.\n      2. 
_Bottom-up:_ AI just does the entire procedure from first principles.\n   \n   I think 80% of discussion of economic impacts talks about top-down.\n\nWhat would bottom-up modelling look like?\n: Instead of.\n\nWhat are the implications for recursive self-improvement?\n: AI treats it as a pure optimization problem, already it's better at chip design, algorithm design.\n\n\n\n\n##       Other things\n\nWhat metrics should the labs report.\n: .\n\nMy highest compliment to your work is that I didn't read it.\n: If I start reading your essay and I realize it's really good then I put it aside until I can organize my own thoughts on this question. My thoughts can take a long time to organize. My favorite book, after 20 years, I still haven't made it farther than half-way through. If I tell you I've read your essay it means I liked it but I didn't love it.\n",
     "supporting": [],
     "filters": [
       "rmarkdown/pagebreak.lua"
 
@@ -254,7 +254,7 @@ <h3 class="no-anchor listing-title">
 </a>
 </div>
 </div>
-<div class="quarto-post image-right" data-index="1" data-listing-date-sort="1759734000000" data-listing-file-modified-sort="1735428686821" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="10" data-listing-word-count-sort="1848">
+<div class="quarto-post image-right" data-index="1" data-listing-date-sort="1759734000000" data-listing-file-modified-sort="1759773495979" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="10" data-listing-word-count-sort="1848">
 <div class="thumbnail">
 <p><a href="./posts/2024-12-26-heavy-tailed-noise.html" class="no-external"></a></p><a href="./posts/2024-12-26-heavy-tailed-noise.html" class="no-external">
 </a><p><a href="./posts/2024-12-26-heavy-tailed-noise.html" class="no-external"></a></p>
@@ -281,7 +281,7 @@ <h3 class="no-anchor listing-title">
 </a>
 </div>
 </div>
-<div class="quarto-post image-right" data-index="2" data-listing-date-sort="1759388400000" data-listing-file-modified-sort="1759773426215" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="43" data-listing-word-count-sort="8412">
+<div class="quarto-post image-right" data-index="2" data-listing-date-sort="1759388400000" data-listing-file-modified-sort="1759775041680" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="43" data-listing-word-count-sort="8413">
 <div class="thumbnail">
 <p><a href="./posts/2025-09-19-transformative-AI-notes.html" class="no-external"></a></p><a href="./posts/2025-09-19-transformative-AI-notes.html" class="no-external">
 </a><p><a href="./posts/2025-09-19-transformative-AI-notes.html" class="no-external"></a></p>
 
@@ -7,7 +7,7 @@
 <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
 
 <meta name="author" content="Tom Cunningham">
-<meta name="dcterms.date" content="2025-09-29">
+<meta name="dcterms.date" content="2025-10-06">
 <meta name="description" content="Tom Cunningham blog">
 
 <title>The Bayesian Interpretation of Experiments | Tom Cunningham – Tom Cunningham</title>
@@ -202,7 +202,7 @@ <h1 class="title">The Bayesian Interpretation of Experiments</h1>
       <div>
       <div class="quarto-title-meta-heading">Published</div>
       <div class="quarto-title-meta-contents">
-        <p class="date">September 29, 2025</p>
+        <p class="date">October 6, 2025</p>
       </div>
     </div>