| 1 | +--- |
| 2 | +title: "LLM Time-Saving, Demand Theory, and the Cadillac Tasks" |
| 3 | +author: "Tom Cunningham (METR)" |
| 4 | +date: today |
| 5 | +citation: true |
| 6 | +reference-location: document |
| 7 | +bibliography: ai.bib |
| 8 | +format: |
| 9 | +  html: |
| 10 | +    toc: true |
| 11 | +    toc-depth: 3 |
| 12 | +execute: |
| 13 | +  echo: false |
| 14 | +  warning: false |
| 15 | +  error: false |
| 16 | +  cache: true |
| 17 | +--- |
| 18 | + |
| 19 | +## Results-first summary |
| 20 | + |
| 21 | +This note reframes LLM time-saving as a *price index* problem in a time-allocation model, then adds a discrete extensive margin for “Cadillac tasks.” The continuous and discrete cases behave very differently, so I keep them separate. |
| 22 | + |
| 23 | +**Continuous (intensive-margin) case:** treat each task’s time requirement as a price $p_i$, with LLM speedup $\beta_i$ implying $p_i' = p_i/\beta_i$. Under homothetic preferences, the output index is $v(p)=1/P(p)$ where $P(p)=e(p,1)$ is the unit time-expenditure function. Equivalent/compensating variation become time-index ratios, and small changes are share-weighted. Large changes require the *area under the compensated demand curve*. (Citations: @caves1982indexnumbers; @hausman1981exact; @willig1976consumerssurplus.) |
| 24 | + |
| 25 | +**Discrete (extensive-margin) case:** tasks can require setup time or be unit-demand. Then LLM speedups change *which* tasks are done, not just *how much* of each task. This is the home of “Cadillac tasks” (tasks you’d never do absent the LLM) and the *worked example* below. Constant-elasticity formulas can fail here. |
| 26 | + |
| 27 | +## Setup: time prices and speedups |
| 28 | + |
| 29 | +**Objects.** Tasks $i=1,\dots,n$, outputs $x_i\ge 0$, time endowment normalized to $1$, time prices $p_i>0$ (time per unit of task output), and LLM speedups $\beta_i>0$ so $p_i' = p_i/\beta_i$. |
| 30 | + |
| 31 | +We interpret $u(x)$ as an *output index* or “effective work accomplished,” with time prices defining the budget: |
| 32 | +\[ |
| 33 | +\sum_i p_i x_i \le 1. |
| 34 | +\] |
| 35 | + |
| 36 | +The important split is: |
| 37 | + |
| 38 | +1. **Continuous intensive margin:** choose continuous $x_i$ (smooth substitution). |
| 39 | +2. **Discrete extensive margin:** choose which tasks to activate (unit demand or setup costs). |
| 40 | + |
| 41 | +I treat these separately because the logic, formulas, and data requirements diverge. |
| 42 | + |
| 43 | +## Continuous (intensive-margin) model |
| 44 | + |
| 45 | +### Primal, dual, and the time price index |
| 46 | + |
| 47 | +The primal problem is |
| 48 | +\[ |
| 49 | +v(p)\;=\;\max_{x\ge 0} u(x) \quad\text{s.t.}\quad \sum_i p_i x_i\le 1. |
| 50 | +\] |
| 51 | + |
| 52 | +Define the expenditure function |
| 53 | +\[ |
| 54 | +e(p,\bar u)=\min_{x\ge 0} \Big\{\sum_i p_i x_i: u(x)\ge \bar u\Big\}. |
| 55 | +\] |
| 56 | + |
| 57 | +If $u(\cdot)$ is homogeneous of degree 1 (and therefore homothetic), then $e(p,\bar u)=\bar u\,e(p,1)$. Since the whole time budget is spent at the optimum, $e(p,v(p))=1$. Define the **unit time price index** |
| 58 | +\[ |
| 59 | +P(p)\equiv e(p,1)\quad\Rightarrow\quad v(p)=\frac{1}{P(p)}. |
| 60 | +\] |
| 61 | + |
| 62 | +This is the classic index-number framing applied to time prices. @caves1982indexnumbers |
| 63 | + |
| 64 | +### EV/CV in time units |
| 65 | + |
| 66 | +Let $p^0\to p^1$ and $u^k=v(p^k)$. Equivalent and compensating variation (measured in *time*, and signed so that a welfare gain is positive) are |
| 67 | +\[ |
| 68 | +EV=e(p^0,u^1)-1,\qquad CV=1-e(p^1,u^0). |
| 69 | +\] |
| 70 | + |
| 71 | +Under homotheticity, $e(p^0,u^1)=P(p^0)/P(p^1)$ and $e(p^1,u^0)=P(p^1)/P(p^0)$, so |
| 72 | +\[ |
| 73 | +EV=\frac{P(p^0)}{P(p^1)}-1,\qquad CV=1-\frac{P(p^1)}{P(p^0)}. |
| 74 | +\] |
| 75 | + |
| 76 | +This is the cleanest way to translate LLM time savings into a welfare measure. @hausman1981exact |
| 77 | + |
| 78 | +### Small changes (share-weighted) |
| 79 | + |
| 80 | +Let $t_i(p)\equiv p_i x_i^*(p)$ be optimal time shares. For small changes in time prices, |
| 81 | +\[ |
| 82 | +d\ln v\;=\;-d\ln P\;\approx\;\sum_i t_i\,d\ln \beta_i. |
| 83 | +\] |
| 84 | + |
| 85 | +This is the time-allocation analog of share-weighted Hulten-style approximations. @hulten1978growth |
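A quick numerical check of how far the share-weighted approximation can be trusted. The sketch below assumes, purely for illustration, a two-task CES aggregator (its closed form appears in the CES specialization later in this note). The function names, the share `s0 = 0.1`, and the elasticity values are hypothetical choices, not estimates.

```python
import math

def ces_log_gain(s0, beta, sigma):
    """Exact log output gain ln(v(p1)/v(p0)) under an assumed two-task CES."""
    if abs(sigma - 1.0) < 1e-12:
        # Cobb-Douglas limit: the share-weighted formula is exact.
        return s0 * math.log(beta)
    return math.log((1 - s0) + s0 * beta ** (sigma - 1)) / (sigma - 1)

def share_weighted_log_gain(s0, beta):
    """First-order approximation: d ln v ~ t_i * d ln beta_i."""
    return s0 * math.log(beta)

s0 = 0.1  # hypothetical ex-ante time share of the sped-up task
for beta in (1.05, 5.0):
    approx = share_weighted_log_gain(s0, beta)
    for sigma in (0.5, 1.0, 2.0):
        exact = ces_log_gain(s0, beta, sigma)
        print(f"beta={beta:4.2f} sigma={sigma:3.1f} approx={approx:.4f} exact={exact:.4f}")
```

Under these assumptions, the three elasticities roughly agree with the share-weighted figure for a 5% speedup. For a 5x speedup the exact gain lies below the approximation when $\sigma<1$ and above it when $\sigma>1$.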
| 86 | + |
| 87 | +### Large changes (area under compensated demand) |
| 88 | + |
| 89 | +When LLM gains are large, constant-elasticity approximations are dangerous. Using Hicksian (compensated) shares $s_i^H(p)$, |
| 90 | +\[ |
| 91 | +d\ln P(p)=\sum_i s_i^H(p)\,d\ln p_i. |
| 92 | +\] |
| 93 | + |
| 94 | +For a single changing price $p_2$, moving from $p_2^0$ to $p_2^1$ with all other prices held fixed, |
| 95 | +\[ |
| 96 | +\ln\frac{P(p^1)}{P(p^0)}=\int_{\ln p_2^0}^{\ln p_2^1} s_2^H(p_2)\,d\ln p_2, |
| 97 | +\] |
| 98 | + |
| 99 | +i.e., exact welfare is the **area under the compensated demand curve**. @willig1976consumerssurplus |
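To make the integral concrete, here is a minimal sketch that integrates the compensated share over $\ln p_2$ numerically and compares it with the exact log change in the price index. It assumes a two-task CES form purely for illustration (under homotheticity the compensated share does not depend on the utility level). The weights, elasticity, and price path are hypothetical.

```python
import math

ALPHA = (0.7, 0.3)   # hypothetical CES weights
SIGMA = 0.5          # hypothetical elasticity of substitution
P1 = 1.0             # time price of task 1, held fixed

def ces_price_index(p2):
    """Unit time price index P(p) for the assumed two-task CES."""
    total = ALPHA[0] ** SIGMA * P1 ** (1 - SIGMA) + ALPHA[1] ** SIGMA * p2 ** (1 - SIGMA)
    return total ** (1 / (1 - SIGMA))

def share2(p2):
    """Compensated time share of task 2 at price p2."""
    a1 = ALPHA[0] ** SIGMA * P1 ** (1 - SIGMA)
    a2 = ALPHA[1] ** SIGMA * p2 ** (1 - SIGMA)
    return a2 / (a1 + a2)

def integrated_log_change(p2_start, p2_end, steps=10_000):
    """Midpoint-rule integral of share2 over ln p2 from start to end."""
    lo, hi = math.log(p2_start), math.log(p2_end)
    h = (hi - lo) / steps  # negative when the price falls
    return h * sum(share2(math.exp(lo + (k + 0.5) * h)) for k in range(steps))

p2_start, p2_end = 2.0, 0.4   # a 5x speedup on task 2
area = integrated_log_change(p2_start, p2_end)
exact = math.log(ces_price_index(p2_end) / ces_price_index(p2_start))
fixed_share = share2(p2_start) * math.log(p2_end / p2_start)
print(f"area={area:.5f} exact={exact:.5f} fixed-share={fixed_share:.5f}")
```

With these hypothetical parameters the integrated area matches the exact index change, while holding the share at its initial value overstates the fall in the index, because the share of task 2 shrinks as it gets cheaper when $\sigma<1$.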
| 100 | + |
| 101 | +### CES specialization (closed-form) |
| 102 | + |
| 103 | +For a CES aggregator |
| 104 | +\[ |
| 105 | +u(x)=\left(\sum_i \alpha_i x_i^{\frac{\sigma-1}{\sigma}}\right)^{\frac{\sigma}{\sigma-1}},\quad\sigma>0,\ \sigma\neq 1, |
| 106 | +\] |
| 107 | + |
| 108 | +the price index and time shares are |
| 109 | +\[ |
| 110 | +P(p)=\left(\sum_i \alpha_i^{\sigma}p_i^{1-\sigma}\right)^{\frac{1}{1-\sigma}},\qquad |
| 111 | + t_i(p)=\frac{\alpha_i^{\sigma}p_i^{1-\sigma}}{\sum_j \alpha_j^{\sigma}p_j^{1-\sigma}}. |
| 112 | +\] |
| 113 | + |
| 114 | +In the two-task case, write $y=v(p^0)$ and $y'=v(p^1)$ for output before and after the speedup. If task 2 speeds up by $\beta$ and its ex-ante time share is $s_0$, then |
| 115 | +\[ |
| 116 | +\frac{y'}{y}=\left((1-s_0)+s_0\,\beta^{\varepsilon-1}\right)^{\frac{1}{\varepsilon-1}},\qquad \varepsilon\equiv\sigma. |
| 117 | +\] |
| 118 | + |
| 119 | +This is the continuous benchmark; I will **not** use it for Cadillac tasks, which are discrete. @caves1982indexnumbers |
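The closed forms above can be checked numerically. The sketch below is an illustration under hypothetical parameter values (the weights, prices, elasticity, and speedup are made up). It verifies that the demands implied by the time shares attain $u(x^*)=1/P(p)$ and that the two-task formula agrees with the ratio of price indices.

```python
def ces_price_index(p, alpha, sigma):
    """P(p) = (sum_i alpha_i^sigma * p_i^(1-sigma))^(1/(1-sigma))."""
    total = sum(a ** sigma * pi ** (1 - sigma) for a, pi in zip(alpha, p))
    return total ** (1 / (1 - sigma))

def ces_time_shares(p, alpha, sigma):
    """Optimal time shares t_i(p); they sum to the time budget of 1."""
    terms = [a ** sigma * pi ** (1 - sigma) for a, pi in zip(alpha, p)]
    total = sum(terms)
    return [t / total for t in terms]

def ces_utility(x, alpha, sigma):
    """CES aggregator u(x), defined here for strictly positive x."""
    rho = (sigma - 1) / sigma
    return sum(a * xi ** rho for a, xi in zip(alpha, x)) ** (1 / rho)

# Hypothetical parameters for illustration only.
alpha, sigma, beta = [0.7, 0.3], 0.5, 5.0
p0 = [1.0, 2.0]
p1 = [p0[0], p0[1] / beta]          # task 2 speeds up by beta

shares0 = ces_time_shares(p0, alpha, sigma)
x_star = [t / pi for t, pi in zip(shares0, p0)]   # x_i = t_i / p_i
s0 = shares0[1]

ratio_from_index = ces_price_index(p0, alpha, sigma) / ces_price_index(p1, alpha, sigma)
ratio_from_formula = ((1 - s0) + s0 * beta ** (sigma - 1)) ** (1 / (sigma - 1))

print(ces_utility(x_star, alpha, sigma), 1 / ces_price_index(p0, alpha, sigma))
print(ratio_from_index, ratio_from_formula)
```

Each printed pair should agree up to floating-point error.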
| 120 | + |
| 121 | +#### Proposition (Lamport-style): CES output response |
| 122 | + |
| 123 | +**Claim.** For two-task CES with ex-ante share $s_0$ on task 2 and speedup $\beta$, the optimized output ratio is |
| 124 | +\[ |
| 125 | +\frac{y'}{y}=\left((1-s_0)+s_0\,\beta^{\varepsilon-1}\right)^{\frac{1}{\varepsilon-1}}. |
| 126 | +\] |
| 127 | + |
| 128 | +**Proof (Lamport style).** |
| 129 | + |
| 130 | +1. *Given* CES price index $P(p)=\left(\alpha_1^{\varepsilon}p_1^{1-\varepsilon}+\alpha_2^{\varepsilon}p_2^{1-\varepsilon}\right)^{\frac{1}{1-\varepsilon}}$ and output $v(p)=1/P(p)$. |
| 131 | +2. *Let* $p_2' = p_2/\beta$ while $p_1$ is fixed. |
| 132 | +3. *Then* the output ratio is |
| 133 | + \[ |
| 134 | + \frac{y'}{y}=\frac{P(p^0)}{P(p^1)}= |
| 135 | + \left(\frac{\alpha_1^{\varepsilon}p_1^{1-\varepsilon}+\alpha_2^{\varepsilon}(p_2/\beta)^{1-\varepsilon}}{\alpha_1^{\varepsilon}p_1^{1-\varepsilon}+\alpha_2^{\varepsilon}p_2^{1-\varepsilon}}\right)^{\frac{1}{\varepsilon-1}}. |
| 136 | + \] |
| 137 | +4. *Define* the ex-ante share |
| 138 | + \[ |
| 139 | + s_0\equiv\frac{\alpha_2^{\varepsilon}p_2^{1-\varepsilon}}{\alpha_1^{\varepsilon}p_1^{1-\varepsilon}+\alpha_2^{\varepsilon}p_2^{1-\varepsilon}}. |
| 140 | + \] |
| 141 | +5. *Substitute* $(p_2/\beta)^{1-\varepsilon}=p_2^{1-\varepsilon}\beta^{\varepsilon-1}$ and the definition of $s_0$ into Step 3 to obtain |
| 142 | + \[ |
| 143 | + \frac{y'}{y}=\left((1-s_0)+s_0\,\beta^{\varepsilon-1}\right)^{\frac{1}{\varepsilon-1}}. |
| 144 | + \] |
| 145 | +6. **QED.** |
| 146 | + |
| 147 | +## Discrete (extensive-margin) model: Cadillac tasks live here |
| 148 | + |
| 149 | +The continuous model assumes you always do *some* of each task. That is wrong when tasks are lumpy, have setup costs, or are unit-demand. In those cases, LLMs can create **newly affordable tasks**, meaning the major effect is *selection*, not *intensive* time reallocation. |
| 150 | + |
| 151 | +### Unit-demand formulation |
| 152 | + |
| 153 | +Let each task have payoff $u_i$ and required time $w_i(p)$, with decision $q_i\in\{0,1\}$. Then |
| 154 | +\[ |
| 155 | +\max_{q\in\{0,1\}^n}\sum_i u_i q_i\quad\text{s.t.}\quad \sum_i w_i(p) q_i\le 1. |
| 156 | +\] |
| 157 | + |
| 158 | +Speedups divide $w_i$ by $\beta_i$, which can **turn tasks on** once a bundle containing them fits within the time budget. This is exactly the “Cadillac tasks” phenomenon: tasks that were too time-expensive become attractive after the LLM. Because choices jump at these thresholds instead of adjusting smoothly, a single CES elasticity is not a good summary in this regime. |
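A minimal sketch of the unit-demand problem, solved by brute force over all $q\in\{0,1\}^n$. The task values, time requirements, and speedup levels are hypothetical, and the helper name `best_bundle` is introduced here only for illustration.

```python
from itertools import product

def best_bundle(values, times, budget=1.0):
    """Enumerate all on/off choices q and return (best value, best q)."""
    best_value, best_q = 0.0, tuple(0 for _ in values)
    for q in product((0, 1), repeat=len(values)):
        total_time = sum(w * qi for w, qi in zip(times, q))
        total_value = sum(u * qi for u, qi in zip(values, q))
        # Small tolerance so a bundle that exactly fills the budget is feasible.
        if total_time <= budget + 1e-12 and total_value > best_value:
            best_value, best_q = total_value, q
    return best_value, best_q

values = [5.0, 4.0, 10.0]      # hypothetical payoffs u_i
base_times = [0.5, 0.4, 1.5]   # hypothetical w_i; task 3 does not fit in the budget

for beta in (1.0, 1.4, 1.5, 3.0):
    # Only task 3 is sped up: w_3 becomes w_3 / beta.
    times = [base_times[0], base_times[1], base_times[2] / beta]
    print(beta, best_bundle(values, times))
```

In this made-up example the chosen bundle does not move at all until the speedup reaches 1.5, at which point task 3 switches on and displaces the other two. At a speedup of 3 it is done alongside task 1.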
| 159 | + |
| 160 | +### Setup-cost variant (bridging discrete and continuous) |
| 161 | + |
| 162 | +Add a fixed setup time $\phi_i$ and a continuous intensity $x_i$: |
| 163 | +\[ |
| 164 | +\max_{q,x}\;u(x)\quad\text{s.t.}\quad \sum_i \phi_i q_i + \sum_i p_i x_i \le 1,\; x_i=0\;\text{if }q_i=0. |
| 165 | +\] |
| 166 | + |
| 167 | +If $\phi_i=0$ for all $i$, we recover the continuous model. If some $\phi_i>0$, large LLM speedups can operate mainly by expanding the active set $\{i:q_i=1\}$ rather than by shifting intensive shares. |
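The setup-cost problem can be sketched by enumerating active sets and solving the continuous problem within each one. The code below makes several assumptions that are not in the text: $u$ is CES with $\sigma>1$ (so output stays well defined when some $x_i=0$), the speedup lowers $p_i$ but leaves $\phi_i$ unchanged, and all parameter values are hypothetical. Given an active set $S$, degree-1 homogeneity implies the best output is $(1-\sum_{i\in S}\phi_i)/P_S(p)$, where $P_S$ is the CES index over active tasks.

```python
from itertools import combinations

def output_given_active(active, p, alpha, phi, sigma):
    """Best CES output when only the tasks in `active` are switched on."""
    time_left = 1.0 - sum(phi[i] for i in active)
    if time_left <= 0:
        return 0.0
    s = sum(alpha[i] ** sigma * p[i] ** (1 - sigma) for i in active)
    # 1 / P_S(p) equals s ** (1 / (sigma - 1)) for the CES index over S.
    return time_left * s ** (1 / (sigma - 1))

def best_active_set(p, alpha, phi, sigma):
    """Brute-force search over all non-empty active sets."""
    n = len(p)
    candidates = [c for r in range(1, n + 1) for c in combinations(range(n), r)]
    return max(candidates, key=lambda c: output_given_active(c, p, alpha, phi, sigma))

alpha = [1.0, 1.0]   # hypothetical CES weights
phi = [0.0, 0.2]     # task 2 needs 20% of the time budget as setup
sigma = 2.0          # assumed elasticity (> 1)

for beta in (1.0, 4.0):
    p = [1.0, 8.0 / beta]   # speedup applies to task 2's per-unit time only
    active = best_active_set(p, alpha, phi, sigma)
    print(beta, active, round(output_given_active(active, p, alpha, phi, sigma), 4))
```

With these numbers task 2 is left inactive before the speedup, because paying its setup time lowers output, and it joins the active set after a 4x speedup.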
| 168 | + |
| 169 | +### Worked example (discrete, not continuous) |
| 170 | + |
| 171 | +**Example.** Suppose you have a 1-hour time budget and can pick *one* task (unit-demand). Task A yields value $u_A=10$ and takes 1 hour. Task B yields $u_B=12$ and takes 2 hours. Without LLMs, B does not fit in the budget, so you choose A. Now an LLM speeds up task B so it takes 1 hour. You switch to B. |
| 172 | + |
| 173 | +- **Upper bound on time-equivalent gain:** 1 hour (if the extra value $u_B-u_A$ is “worth” a full hour). |
| 174 | +- **Lower bound:** 0 hours (if the extra value is just a small quality bump). |
| 175 | + |
| 176 | +So the *observed* reallocation does not pin down the time saved unless the discrete choice is modeled. This is why elasticity-of-substitution estimates are weak in the Cadillac regime. |
| 177 | + |
| 178 | +### Cadillac tasks (discrete interpretation) |
| 179 | + |
| 180 | +Cadillac tasks are those you would not do *at all* without the LLM, but you do once their time cost drops. Examples: |
| 181 | + |
| 182 | +- literature reviews you previously would not attempt, |
| 183 | +- custom data visualizations, |
| 184 | +- long-form proofreading or refactoring. |
| 185 | + |
| 186 | +In a unit-demand or setup-cost model, these tasks appear as **newly activated $q_i=1$ choices**, not as marginal increases in $x_i$. This is a discrete effect, so apply discrete logic—not the continuous CES approximation. |
| 187 | + |
| 188 | +## Practical examples (grounding) |
| 189 | + |
| 190 | +- **Query-level time savings:** If a chatbot is used for 10% of tasks and yields 5x speedups, naive share-weighted estimates imply large gains. But if those tasks are Cadillac tasks that would not have been done without the chatbot (discrete selection), the aggregate gain can be much smaller. @anthropic2025estimatingproductivitygains |
| 191 | +- **RCTs with task selection:** In uplift experiments, participants may choose different tasks once AI is available. That makes comparisons tricky unless you model the discrete choice margin. @becker2025uplift |
| 192 | +- **Time allocation as a resource constraint:** Classic time-allocation models already treat time as a scarce resource with a shadow price. @deserpa1971time |
| 193 | + |
| 194 | +## Diagrams |
| 195 | + |
| 196 | +### Unified map (assumptions → objects → results) |
| 197 | + |
| 198 | +```{mermaid} |
| 199 | +graph TD |
| 200 | + A["Primitives: tasks i=1..n, time endowment=1"] --> B["Technology: time prices p_i; AI => p_i' = p_i/β_i"] |
| 201 | + B --> C["Choice: allocate time/output s.t. Σ p_i x_i ≤ 1"] |
| 202 | + |
| 203 | + C --> D{"Preference/output aggregator u(x)"} |
| 204 | + D --> D1["Homothetic (continuous)"] |
| 205 | + D --> D2["CES"] |
| 206 | + D --> D3["Discrete or setup-cost tasks"] |
| 207 | + |
| 208 | + D1 --> E["Price index P(p)=e(p,1)"] |
| 209 | + E --> F["Indirect output v(p)=1/P(p)"] |
| 210 | + F --> G["EV/CV = ratios of P(p)"] |
| 211 | + F --> H["Local: d ln v = Σ t_i d ln β_i"] |
| 212 | + |
| 213 | + D2 --> I["Closed forms for P(p), shares"] |
| 214 | + I --> J["Two-task formula"] |
| 215 | + |
| 216 | + D3 --> K["Activation/threshold effects"] |
| 217 | + K --> L["Cadillac tasks, discrete selection"] |
| 218 | +``` |
| 219 | + |
| 220 | +### Threshold diagram (discrete activation) |
| 221 | + |
| 222 | +```{tikz} |
| 223 | +#| fig-cap: "Discrete activation: speedups switch on tasks" |
| 224 | +#| fig-align: center |
| 225 | +\begin{tikzpicture}[scale=1.0] |
| 226 | + \draw[->] (0,0) -- (4.2,0) node[below] {time cost}; |
| 227 | + \draw[->] (0,0) -- (0,3.2) node[left] {task value}; |
| 228 | + |
| 229 | + \draw[dashed] (0,1.5) -- (4,1.5) node[right] {value threshold}; |
| 230 | + |
| 231 | + \fill[black] (1,2.2) circle[radius=2pt] node[above] {task A}; |
| 232 | + \fill[black] (3.2,2.7) circle[radius=2pt] node[above] {task B}; |
| 233 | + |
| 234 | + \draw[blue,->] (3.2,2.7) -- (2.0,2.7) node[midway,above] {LLM speedup}; |
| 235 | + \node[blue] at (2.2,2.1) {activation}; |
| 236 | +\end{tikzpicture} |
| 237 | +``` |
| 238 | + |
| 239 | +## Checklist for the desiderata |
| 240 | + |
| 241 | +- **Bibliography validity:** citations are included and checked against `ai.bib`. See the tests. |
| 242 | +- **Citation faithfulness:** the LLM-based test asks a model to flag any suspicious claim-to-citation mismatches. |
| 243 | +- **Lamport-style proofs:** proofs are structured with numbered steps and a QED marker. |
| 244 | +- **Legible diagrams:** one Mermaid flowchart + one TikZ threshold figure. |
| 245 | +- **Practical examples:** see the query-level and RCT examples above. |
| 246 | + |
| 247 | +## Related literature (short pointers) |
| 248 | + |
| 249 | +- Index-number theory for price changes and substitution. @caves1982indexnumbers |
| 250 | +- Exact welfare measures and integrable demand systems. @hausman1981exact; @deaton1980aids |
| 251 | +- Time allocation and shadow pricing. @deserpa1971time |
| 252 | +- Task-based technological change. @autor2003skill; @acemoglu2011handbook |
| 253 | + |