tecunningham
diff --git a/_freeze/posts/2025-10-06-the-curve/execute-results/html.json b/_freeze/posts/2025-10-06-the-curve/execute-results/html.json
diff --git a/docs/posts/2025-09-19-transformative-AI-notes.html b/docs/posts/2025-09-19-transformative-AI-notes.html
@@ -1,8 +1,8 @@
 {
-  "hash": "c13da325195f2ec44f8bad4a812d50e2",
+  "hash": "48f6c42e6ed8a4a421ec75b356c72590",
   "result": {
     "engine": "knitr",
-    "markdown": "---\ntitle: The Curve\ncitation: true\nbibliography: ai.bib\nreference-location: section\ncitation-location: document\ndate: 2025-10-06\ndraft: true\nengine: knitr\nformat:\n  html:\n    toc: true\n    other-formats: false\n    lightbox: auto    # ← enables click-to-zoom for figures/images\n---\n\n\n\n<!-- https://tecunningham.github.io/posts/2025-09-19-transformative-AI-notes.html -->\n\n<style>\n   dl {display: grid;}\n   dt {grid-column-start: 1; width: 10em;}\n   dd {grid-column-start: 2; margin-left: 2em;}\n</style>\n\n::: {.column-margin}\n   Thanks to XXX\n:::\n\n\nSome topics that kept coming up at the Curve:\n\n1. We need more forecasts of economic impacts.\n2. We need more theory of capabilities.\n3. We need more metrics of capabilities.\n4. We need more theory of offense-defense balance.\n5. There are two ways of modelling future capabilities.\n\nI feel bad saying \"we need,\" & scolding others for the work they're not doing, so I've also added my own very tentative best guesses about each.\n\n![](images/2025-10-06-11-35-49.png){.column-margin}\n\n##       We need more forecasts of economic impacts\n\nWe only have a few explicit forecasts of strong AI.\n: Only a few people have written down explicit forecasts of the economic effect of strong AI, i.e. the effects on GDP, employment, wages, asset prices, welfare. The closest I'm aware of is Epoch's GATE model.\n\nI think explicit forecasts would be incredibly useful.\n: Many informal conversations and arguments often use vague terms, & it's not clear whether we're really disagreeing. If we are going to consider a policy then we should have a clear prediction. 
It's like early 2020: if we're going to make a decision about a lock-down it should be based on a clear idea about the counterfactual, which includes a lot of equilibrium effects (FWIW New Zealand managed to navigate COVID exceptionally well because they saw what was happening to other countries first).\n\nExisting forecasts:\n:     1. Most academic forecasts assume that AI capabilities plateau, i.e. it's a one-time labor-saving innovation, and the effects are spread out over time due to diffusion (Acemoglu, Aghion & Bunel, Wharton).\n      1. Brynjolfsson and Korinek give a range of forecasts on GDP growth, but they're relatively lightly modelled. Also Korinek and Hu.\n      2. Epoch's GATE model gives quantitative timelines, I think it's probably the most explicit. They forecast full automation in 2034, by which point GWP has grown 10X. They forecast wages will increase dramatically then collapse somewhat after full automation is achieved.\n\n      We also have forecasting markets and individual forecats but these are hard to interpret: they are a combination of capabilities progress and the effects of those capabilities.\n\n      What am I missing?\n\nWhat are my forecasts?\n: (...)\n\nHow can we get more people to forecast?\n: One idea that Anna Yelikazova and I discussed: sponsor a couple of dozen econ grad students to make forecasts. Get them to write 5-page justifications, and give prizes to the most compelling ones.\n\n\n##       We need more theory of capabilities.\n\nThere are many projects to collect data on the whole waterfall\n\n: 1. Data on AI capabilities -- e.g. benchmarks that have representative tasks across the economy (e.g. GDP-val, APEX).\n      1. Data on AI uplift -- how much more efficient does it make you?\n      2. Data on AI adoption -- adoption by occupations, by industry, by demographic.\n      3. Data on AI usage -- what types of economic tasks are LLMs used for.\n      4. Data on AI impacts -- e.g. 
changes in hiring and wages by occupation.\n\nCollecting data is hard without theory.\n\n: I don't have that many opinionated *theories* on how each of these should move. I feel theories are super important because we're expecting things to change rapidly, both due to capability growth and adoption growth. If we don't have an explicit theory then we're using an implicit theory. Think of spending a lot of time & resources collecting samples of COVID, but not, at the same time, working on a theory of how epidemics evolve and who's more susceptible. \n\nHere are some existing theories of AI impacts:\n\n:     1. Informal observations about what types of tasks LLMs do well on: verifiable, short-horizon, low-context.\n      1. Eloundou et al.'s ranking of ONET task \"exposure\"\n      2. METR's ranking of benchmark task \"time horizon.\"\n\n      A related argument: my feeling is we're more constrained on theory than data. \n\n\n##       We need more metrics of capabilities\n\nWe don't have a standard way of defining AI capabilities.\n: We say \"strong AI\", \"transformative AI\", \"AGI\", or \"ASI\".\n\n      The best concrete metric is probably METR's time horizon index: we can say \"what happens when AI can do a one month task?\"\n\n      The Forecasting Research Institute is working on a set of well-defined capability scenarios.\n\nA metric I think would be useful: frontier cost-efficiency growth.\n: There are hundreds of cost-efficiency metrics that have been regularly increasing over decades: transistor density, corn yield, compression efficiency. 
When AI becomes useful then we expect these metrics to start improving more quickly, & that rate of improvement is a useful metric because it's (1) unambiguous; (2) clearly economically relevant; (3) upstream of other economic impacts like employment.\n\n![](images/2025-10-06-11-33-10.png){.column-margin}\n\n##       We need more theory of offense-defense balance\n\nMany discussions were about how AI will change the offense-defense balance\n\n: Examples that came up in the Curve:\n\n      - ransomware\n      - spearfishing\n      - media manipulation\n      - drone assassinations\n      - drone warfare. \n   \n    There are dozens of others.\n\nWe should have some common theory.\n: It seems wasteful to treat each of these problems independently, there ought to be some general principles we can apply on how AI will affect offense-defense balance. The closest I know is @garfinkel2019offensedefense, they hypothesise that in many cases, when investments are sufficiently large, adding more resources will tend to favor defense.\n\nMy best guess.\n: The \n\n\n![](images/2025-10-06-11-30-58.png){.column-margin}\n\n##       There are two ways of modelling AI's impact\n\nThere's a nice distinction between two approaches.\n\n: From a lot of conversations about recursive self-improvement I realized it's useful to distinguish between two qualitatively different ways of thinking about AI's ability to do work:\n\n      1. _Top down:_ AI replaces each of the human subtasks. Hits human-level ability.\n      2. 
_Bottom-up:_ AI just does the entire procedure from first principles.\n   \n   I think 80% of discussion of economic impacts talks about top-down.\n\nWhat would bottom-up modelling look like?\n: Instead of.\n\nWhat are the implications for recursive self-improvement?\n: AI treats it as a pure optimization problem, already it's better at chip design, algorithm design.\n\n\n\n\n##       Other things\n\nWhat metrics should the labs report.\n: .\n\nMy highest compliment to your work is that I didn't read it.\n: If I start reading your essay and I realize it's really good then I put it aside until I can organize my own thoughts on this question. My thoughts can take a long time to organize. My favorite book, after 20 years, I still haven't made it farther than half-way through. If I tell you I've read your essay it means I liked it but I didn't love it.\n",
+    "markdown": "---\ntitle: The Curve\ncitation: true\nbibliography: ai.bib\nreference-location: document\ncitation-location: document\ndate: 2025-10-06\nauthor: Tom Cunningham\ndraft: true\nengine: knitr\nformat:\n  html:\n    toc: true\n    other-formats: false\n    lightbox: auto    # ← enables click-to-zoom for figures/images\n---\n\n\n\n<!-- https://tecunningham.github.io/posts/2025-10-06-the-curve.html -->\n<style>\n   dl {display: grid;}\n   dt {grid-column-start: 1; width: 10em;}\n   @media (min-width: 768px) {\n     dt { width: 15em; }\n   }\n   dd {grid-column-start: 2; margin-left: 2em;}\n</style>\n\n::: {.column-margin}\n   <!-- Thanks to XXX -->\n:::\n\n![](images/2025-10-06-11-35-49.png){.column-margin}\nSome topics that kept coming up at the Curve:\n\n1. We need more forecasts of economic impacts.\n2. We need more theory of capabilities.\n3. We need more metrics of capabilities.\n4. We need more theory of offense-defense balance.\n5. There are two ways of modelling future capabilities.\n\nI feel bad saying \"we need,\" & scolding others for the work they're not doing, so I've also added my own very tentative best guesses about each.\n\n#       We need more forecasts of economic impacts\n\nWe have few forecasts of the impact of strong AI.\n: Only a few people have written down explicit forecasts of the economic effect of strong AI, i.e. the effects on GDP, employment, wages, asset prices, welfare. The closest I'm aware of is Epoch's GATE model, see below.\n\nI think explicit forecasts would be incredibly useful.\n: Many informal conversations and arguments often use vague terms, & it's not clear whether we're really disagreeing. 
It would be super useful if I could say \"do you think things will be more like the X model, or the Y model?\"\n\n      I feel it's like early 2020 and COVID: if we're trying to make a decision about whether to announce a lock-down it should be based on a clear idea about the counterfactual, which includes a lot of equilibrium effects (FWIW New Zealand managed to get through COVID so well because they saw what was happening to other countries first).\n\nExisting forecasts:\n:     1. Most academic forecasts assume that AI capabilities plateau, i.e. it's a one-time labor-saving innovation, and the effects are spread out over time due to diffusion: @acemoglu2024simple forecasts 0.06% incremental annual growth, @aghion2024ai forecasts 1%.\n      1. @korinek2024scenarios forecast that, over 15 years, GDP triples; wages increase a little at first, then collapse when everything is automated. GDP continues to increase.\n      2. Epoch's GATE model (@erdil2025gate): they forecast full automation in 2034, by which point GWP has grown 10X. They forecast wages will increase dramatically then collapse somewhat after full automation is achieved.[^others]\n\n      We also have forecasting markets and individual forecasts but these are hard to interpret: they are a combination of capabilities progress and the effects of those capabilities. Finally there are financial forecasts, e.g. those used to plan AI and datacenter investments, which implicitly contains models of the economic evolution. My sense is these are often fairly crude extrapolations of existing trends.\n\n      Am I missing others?\n\n[^others]: Tom Davidson has a [2021 report on Explosive Growth](https://www.openphilanthropy.org/research/could-advanced-ai-drive-explosive-economic-growth/), and a model of [takeoff speeds](https://takeoffspeeds.com)), but I don't think either has a central forecast with multiple aggregate economic variables.\n\n\n<!-- What are my own forecasts?\n: (...) 
-->\n\nHow can we get more people to forecast?\n: One idea that Anna Yelikazova and I discussed: sponsor a couple of dozen econ grad students to make forecasts. Get them to write 5-page justifications, and give prizes to the most compelling ones.\n\n\n#       We need more theory of capabilities.\n\nWe have many projects to collect data on AI impacts.\n\n: We can organize AI impacts into a waterfall, top to bottom:\n\n      1. **Data on AI capabilities** -- benchmarks that have representative tasks across the economy  - e.g. GDP-val (@patwardhan2025gdpval), APEX.\n      2. **Data on AI uplift** -- effect on productivity, e.g. @becker2025uplift.\n      3. **Data on AI adoption** -- adoption by occupations, by industry, by demographic, E.g. @bick2024rapid.\n      4. **Data on AI usage** -- what types of economic tasks are LLMs used for, e.g. @handa2025economicindex, @chatterji2025chatgpt.\n      5. **Data on AI economic effects** -- changes in hiring and wages by occupation, e.g. @brynjolfsson2025canaries.\n\nCollecting data is hard without theory.\n\n: I don't think we have that many opinionated *theories* on how each of these should move. I feel theories are important here because we're expecting things to change rapidly, both due to capability growth and adoption growth. If we don't have an explicit theory then we're using an implicit theory. Think of spending a lot of time & resources collecting samples of COVID, but not, at the same time, working on a theory of how epidemics evolve and who's more susceptible. \n\nWe don't have many theories of AI's impacts.\n\n: Here are the prominent theories of the *ways* in which AI is likely to be adopted:\n\n      1. There  are some standard informal observations about what types of tasks LLMs do well on: verifiable, short-horizon, low-context, text-based.\n      2. 
Many people have proposed indices of task or occupation \"exposure\" to AI: @frey2013future, @brynjolfsson2018can, @felten2018method, @webb2019impact, @eloundou2023gpts. METR's time-horizon paper (@kwa2025longtasks) can also be interpreted as an exposure index for tasks.\n\n      A related argument: my feeling is we're more constrained on theory than data. \n\nNathan Lambert seems to feel the same way.\n: Nathan Lambert's post-curve [post](https://www.interconnects.ai/p/thoughts-on-the-curve) says *\"many AI obsessors are more interested in where the technology is going rather than how or what exactly it is going to be.\"*\n\n\n#       We need more metrics of capabilities\n\nWe don't have a standard way of defining AI capabilities.\n: We say \"strong AI\", \"transformative AI\", \"AGI\", or \"ASI\".\n\n      The best concrete metric is probably METR's time horizon index. We can then say \"what happens when AI can do a one month task?\"\n\n      The Forecasting Research Institute is working on a set of well-defined capability scenarios.\n\n![Our World in Data: Technology Costs over Time.](images/2025-09-19-16-49-40.png){.column-margin}\n\nMy favorite metric: frontier cost-efficiency growth.\n: There are hundreds of cost-efficiency metrics that have been regularly increasing over decades: transistor density, corn yield, compression efficiency (see the chart on the right). When AI becomes useful then we expect these metrics to start improving more quickly. 
Cost-efficiency growth is a useful metric because it's (1) unambiguous; (2) economically relevant; (3) upstream of other economic impacts like employment.\n\n      Existing historical cost-efficiency data:\n\n      - @farmer2016predictable documents progress in 53 technologies (visualized at [Our World in Data](https://ourworldindata.org/grapher/costs-of-66-different-technologies-over-time)), but only up to 2013.\n      - @sherry2021fast document historical trends in algorithmic efficiency across a variety of algorithms.\n\n#       We need more theory of the offense-defense balance\n\n![](images/2025-10-06-11-33-10.png){.column-margin}\n\nMany discussions were about how AI will change the offense-defense balance\n\n: Examples that came up in the Curve:\n\n      - ransomware\n      - spearfishing\n      - media manipulation\n      - drone assassinations\n      - drone warfare. \n   \n    There are dozens of others.\n\nWe should have some common theory.\n: It seems wasteful to treat each of these problems independently, there ought to be some general principles we can apply on how AI will affect offense-defense balance. The closest I know is @garfinkel2019offensedefense, they hypothesise that in many cases, when investments are sufficiently large, adding more resources will tend to favor defense.\n\nMy best guess.\n: The \n\n\n![](images/2025-10-06-11-30-58.png){.column-margin}\n\n#       There are two ways of modelling AI's impact\n\nThere's a nice distinction between two approaches.\n\n: From a lot of conversations about recursive self-improvement I realized it's useful to distinguish between two qualitatively different ways of thinking about AI's ability to do work:\n\n      1. _Top down:_ AI replaces each of the human subtasks. Hits human-level ability.\n      2. 
_Bottom-up:_ AI just does the entire procedure from first principles.\n   \n      I think 80% of discussion of economic impacts was of the top-down type.\n\nWhat would bottom-up modelling look like?\n: Instead of.\n\n      Algorithm discovery:\n\n      - MLGO; \n\n\n      There are some papers with \"model organisms\" of recursive self-improvement: Grefenstette (1986) [genetic algorithm to learn parameters for genetic algorithsm](https://ui.adsabs.harvard.edu/abs/1986ITSMC..16..122G/abstract); [AutoML-Zero](https://arxiv.org/pdf/2003.03384) (2020); Schmidhuber (2003) [Godel Machines](https://arxiv.org/abs/cs/0309048?utm_source=chatgpt.com). \n\nWhat are the implications for recursive self-improvement?\n: AI treats it as a pure optimization problem, already it's better at chip design, algorithm design.\n\n\nSome related discussion:\n: Nathan Lambert recapitulates some discussions about recursive self-improvement [here](https://www.interconnects.ai/p/thoughts-on-the-curve).\n\n    This is related to the Bresnahan/systems view of AI. He talks about the first wave of ML models: \"The transition to ICT-based production has largely proceeded at the production system level, not the task level.\"[^bresnahan] \n\n\n[^bresnahan]: \"My empirical conclusion about these applications is that the lazy idea of AI – that is, of computer systems that are able to perform productive tasks previously done by humans– is irrelevant to understanding how these technologies create value. Here“irrelevant” does not mean that substitution of machine for human tasks is less important than other determinants of the value in use of AITs. 
It means irrelevant: task-level substitution of machine for human plays no role in these highly valuable systems.\"\n\n\n   <!-- What metrics should the labs report?\n   : .\n\n   \\ \n\n   My highest compliment to your work is that I didn't read it.\n   : If I start reading your essay and I realize it's really good then I put it aside until I can organize my own thoughts on this question. My thoughts can take a long time to organize. My favorite book, after 20 years, I still haven't made it farther than half-way through. If I tell you I've read your essay it means I liked it but I didn't love it. -->\n\n\n#        References",
     "supporting": [],
     "filters": [
       "rmarkdown/pagebreak.lua"
 
@@ -1179,7 +1179,7 @@ <h1>AI scientists will be unlike human scientists</h1>
 });
 </script>
 </div> <!-- /content -->
-<script>var lightboxQuarto = GLightbox({"selector":".lightbox","closeEffect":"zoom","loop":false,"descPosition":"bottom","openEffect":"zoom"});
+<script>var lightboxQuarto = GLightbox({"openEffect":"zoom","loop":false,"closeEffect":"zoom","descPosition":"bottom","selector":".lightbox"});
 (function() {
   let previousOnload = window.onload;
   window.onload = () => {