Commit 7b437ec

Merge pull request #3 from tecunningham/codex/rewrite-llm-time-saving-theory-as-qmd
Add LLM-authored QMD rewrite and automated quality checks
2 parents 17f50ad + 629736e commit 7b437ec

2 files changed

Lines changed: 432 additions & 0 deletions

Lines changed: 253 additions & 0 deletions
@@ -0,0 +1,253 @@
---
title: "LLM Time-Saving, Demand Theory, and the Cadillac Tasks"
author: "Tom Cunningham (METR)"
date: today
citation: true
reference-location: document
bibliography: ai.bib
format:
  html:
    toc: true
    toc-depth: 3
execute:
  echo: false
  warning: false
  error: false
  cache: true
---

## Results-first summary

This note reframes LLM time-saving as a *price index* problem in a time-allocation model, then adds a discrete extensive margin for “Cadillac tasks.” The continuous and discrete cases behave very differently, so I keep them separate.

**Continuous (intensive-margin) case:** treat each task’s time requirement as a price $p_i$, with LLM speedup $\beta_i$ implying $p_i' = p_i/\beta_i$. Under homothetic preferences, the output index is $v(p)=1/P(p)$ where $P(p)=e(p,1)$ is the unit time-expenditure function. Equivalent/compensating variation become time-index ratios, and small changes are share-weighted. Large changes require the *area under the compensated demand curve*. (Citations: @caves1982indexnumbers; @hausman1981exact; @willig1976consumerssurplus.)

**Discrete (extensive-margin) case:** tasks can require setup time or be unit-demand. Then LLM speedups change *which* tasks are done, not just *how much* of each task. This is the home of “Cadillac tasks” (tasks you’d never do absent the LLM) and the *worked example* below. Constant-elasticity formulas can fail here.

## Setup: time prices and speedups

**Objects.** Tasks $i=1,\dots,n$, outputs $x_i\ge 0$, time endowment normalized to $1$, time prices $p_i>0$ (time per unit of task output), and LLM speedups $\beta_i>0$ so $p_i' = p_i/\beta_i$.

We interpret $u(x)$ as an *output index* or “effective work accomplished,” with time prices defining the budget:

$$
\sum_i p_i x_i \le 1.
$$

The important split is:

1. **Continuous intensive margin:** choose continuous $x_i$ (smooth substitution).
2. **Discrete extensive margin:** choose which tasks to activate (unit demand or setup costs).

I treat these separately because the logic, formulas, and data requirements diverge.
42+
43+
## Continuous (intensive-margin) model
44+
45+
### Primal, dual, and the time price index
46+
47+
The primal problem is
48+
\[
49+
v(p)\;=\;\max_{x\ge 0} u(x) \quad\text{s.t.}\quad \sum_i p_i x_i\le 1.
50+
\]
51+
52+
Define the expenditure function
53+
\[
54+
e(p,\bar u)=\min_{x\ge 0} \Big\{\sum_i p_i x_i: u(x)\ge \bar u\Big\}.
55+
\]
56+
57+
If $u(\cdot)$ is homothetic and degree-1 homogeneous, then $e(p,\bar u)=\bar u\,e(p,1)$. Define the **unit time price index**
58+
\[
59+
P(p)\equiv e(p,1)\quad\Rightarrow\quad v(p)=\frac{1}{P(p)}.
60+
\]
61+
62+
This is the classic index-number framing applied to time prices. @caves1982indexnumbers
63+
64+
### EV/CV in time units
65+
66+
Let $p^0\to p^1$ and $u^k=v(p^k)$. Equivalent and compensating variation (measured in *time*) are
67+
\[
68+
EV=e(p^0,u^1)-1,\qquad CV=e(p^1,u^0)-1.
69+
\]
70+
71+
Under homotheticity,
72+
\[
73+
EV=\frac{P(p^0)}{P(p^1)}-1,\qquad CV=\frac{P(p^1)}{P(p^0)}-1.
74+
\]
75+
76+
This is the cleanest way to translate LLM time savings into a welfare measure. @hausman1981exact
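
As a quick numeric illustration of these ratios, the sketch below takes two values of the index as given and converts them into time-denominated welfare numbers. The function name and the example index values are hypothetical. To keep gains positive, it reports the compensating measure as the time that could be given up at the new prices, $1-P(p^1)/P(p^0)$.

```python
def welfare_in_time_units(P0, P1):
    """Convert a change in the unit time price index into (EV, CV).

    Assumes homothetic preferences and a time endowment of 1, so that
    e(p, u) = u * P(p) and v(p) = 1 / P(p).
    """
    ev = P0 / P1 - 1.0  # extra time at the old prices that matches the new output
    cv = 1.0 - P1 / P0  # time that could be given up at the new prices, keeping old output
    return ev, cv


# Hypothetical example: the LLM lowers the time price index by 20%.
ev, cv = welfare_in_time_units(P0=1.0, P1=0.8)
print(f"EV = {ev:.3f}, CV = {cv:.3f}")
```

With a 20% fall in the index, EV is 25% of the time endowment and CV is 20%; the two differ because they value the same change at different price vectors.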
77+
78+
### Small changes (share-weighted)
79+
80+
Let $t_i(p)\equiv p_i x_i^*(p)$ be optimal time shares. For small changes in time prices,
81+
\[
82+
d\ln v\;=\;-d\ln P\;\approx\;\sum_i t_i\,d\ln \beta_i.
83+
\]
84+
85+
This is the time-allocation analog of share-weighted Hulten-style approximations. @hulten1978growth
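
To see how quickly the share-weighted approximation degrades, the sketch below compares it with the exact log output gain under the CES index introduced later in this note. All parameter values are hypothetical and chosen only for illustration.

```python
import math


def ces_price_index(prices, alphas, sigma):
    """CES unit time price index P(p) = (sum_i alpha_i^sigma p_i^(1-sigma))^(1/(1-sigma))."""
    total = sum(a**sigma * p ** (1.0 - sigma) for a, p in zip(alphas, prices))
    return total ** (1.0 / (1.0 - sigma))


def ces_time_shares(prices, alphas, sigma):
    """Optimal time shares t_i(p) under CES."""
    weights = [a**sigma * p ** (1.0 - sigma) for a, p in zip(alphas, prices)]
    total = sum(weights)
    return [w / total for w in weights]


def first_order_log_gain(prices, alphas, sigma, betas):
    """Share-weighted approximation: sum_i t_i * ln(beta_i), using ex-ante shares."""
    shares = ces_time_shares(prices, alphas, sigma)
    return sum(t * math.log(b) for t, b in zip(shares, betas))


def exact_log_gain(prices, alphas, sigma, betas):
    """Exact log output gain: ln P(p^0) - ln P(p^1) with p_i^1 = p_i^0 / beta_i."""
    new_prices = [p / b for p, b in zip(prices, betas)]
    return math.log(
        ces_price_index(prices, alphas, sigma) / ces_price_index(new_prices, alphas, sigma)
    )


# Hypothetical two-task economy with equal weights and sigma = 2 (substitutes).
prices, alphas, sigma = [1.0, 1.0], [0.5, 0.5], 2.0
for beta in (1.01, 5.0):
    betas = [1.0, beta]  # only task 2 speeds up
    approx = first_order_log_gain(prices, alphas, sigma, betas)
    exact = exact_log_gain(prices, alphas, sigma, betas)
    print(f"beta={beta}: first-order={approx:.4f}, exact={exact:.4f}")
```

In this parameterization the two numbers nearly coincide for a 1% speedup, while for a 5x speedup the exact gain ($\ln 3$) exceeds the share-weighted one ($0.5\ln 5$) because time shifts toward the cheaper task.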
86+
87+
### Large changes (area under compensated demand)
88+
89+
When LLM gains are large, constant-elasticity approximations are dangerous. Using Hicksian (compensated) shares $s_i^H(p)$,
90+
\[
91+
d\ln P(p)=\sum_i s_i^H(p)\,d\ln p_i.
92+
\]
93+
94+
For a single changing price $p_2$,
95+
\[
96+
\ln\frac{P(p^1)}{P(p^0)}=\int s_2^H(p_2)\,d\ln p_2,
97+
\]
98+
99+
i.e., exact welfare is the **area under the compensated demand curve**. @willig1976consumerssurplus
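
The integral can be checked numerically. The sketch below assumes a two-task CES index (so the compensated share depends only on prices), integrates the share of task 2 over $\ln p_2$ with a midpoint rule, and compares the result with the closed-form log index change. Parameter values and function names are hypothetical.

```python
import math


def log_price_index(p1, p2, a1, a2, sigma):
    """ln P(p) for a two-task CES index."""
    total = a1**sigma * p1 ** (1.0 - sigma) + a2**sigma * p2 ** (1.0 - sigma)
    return math.log(total) / (1.0 - sigma)


def compensated_share_2(p1, p2, a1, a2, sigma):
    """Hicksian time share of task 2 (independent of utility under homotheticity)."""
    w1 = a1**sigma * p1 ** (1.0 - sigma)
    w2 = a2**sigma * p2 ** (1.0 - sigma)
    return w2 / (w1 + w2)


def area_under_share(p1, p2_start, p2_end, a1, a2, sigma, steps=10_000):
    """Midpoint-rule integral of s_2^H over ln p_2 from ln p2_start to ln p2_end."""
    lo, hi = math.log(p2_start), math.log(p2_end)
    h = (hi - lo) / steps  # negative when the price falls
    total = 0.0
    for k in range(steps):
        log_p2 = lo + (k + 0.5) * h
        total += compensated_share_2(p1, math.exp(log_p2), a1, a2, sigma) * h
    return total


# Hypothetical example: task 2 becomes 5x faster, so p_2 falls from 1 to 0.2.
p1, a1, a2, sigma = 1.0, 0.5, 0.5, 2.0
area = area_under_share(p1, 1.0, 0.2, a1, a2, sigma)
closed_form = log_price_index(p1, 0.2, a1, a2, sigma) - log_price_index(p1, 1.0, a1, a2, sigma)
constant_share = compensated_share_2(p1, 1.0, a1, a2, sigma) * math.log(0.2)
print(f"integral={area:.6f}, closed form={closed_form:.6f}, constant share={constant_share:.6f}")
```

The integral and the closed form agree, while holding the share fixed at its ex-ante value understates the fall in the index in this example.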
100+
101+
### CES specialization (closed-form)
102+
103+
For a CES aggregator
104+
\[
105+
u(x)=\left(\sum_i \alpha_i x_i^{\frac{\sigma-1}{\sigma}}\right)^{\frac{\sigma}{\sigma-1}},\quad\sigma>0,
106+
\]
107+
108+
the price index and time shares are
109+
\[
110+
P(p)=\left(\sum_i \alpha_i^{\sigma}p_i^{1-\sigma}\right)^{\frac{1}{1-\sigma}},\qquad
111+
t_i(p)=\frac{\alpha_i^{\sigma}p_i^{1-\sigma}}{\sum_j \alpha_j^{\sigma}p_j^{1-\sigma}}.
112+
\]
113+
114+
In the two-task case, if task 2 speeds up by $\beta$ and its ex-ante share is $s_0$, then
115+
\[
116+
\frac{y'}{y}=\left((1-s_0)+s_0\,\beta^{\varepsilon-1}\right)^{\frac{1}{\varepsilon-1}},\qquad \varepsilon\equiv\sigma.
117+
\]
118+
119+
This is the continuous benchmark; I will **not** use it for Cadillac tasks, which are discrete. @caves1982indexnumbers
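
A brute-force check of the two-task formula is sketched below: it maximizes CES output directly over a grid of time allocations, before and after the speedup, and compares the ratio with the closed form. The parameter values and function names are hypothetical, and the grid search is only a sanity check, not an efficient solver.

```python
def ces_utility(x1, x2, a1, a2, sigma):
    """Two-task CES output index u(x)."""
    r = (sigma - 1.0) / sigma
    return (a1 * x1**r + a2 * x2**r) ** (1.0 / r)


def max_output_on_grid(p1, p2, a1, a2, sigma, n=20_000):
    """Maximize u(x) subject to p1*x1 + p2*x2 = 1 by scanning task 2's time share."""
    best = 0.0
    for k in range(1, n):
        t2 = k / n  # fraction of the time endowment spent on task 2
        best = max(best, ces_utility((1.0 - t2) / p1, t2 / p2, a1, a2, sigma))
    return best


def time_share_task2(p1, p2, a1, a2, sigma):
    """Closed-form ex-ante time share of task 2."""
    w1 = a1**sigma * p1 ** (1.0 - sigma)
    w2 = a2**sigma * p2 ** (1.0 - sigma)
    return w2 / (w1 + w2)


def two_task_output_ratio(s0, beta, eps):
    """Closed form y'/y = ((1 - s0) + s0 * beta^(eps-1))^(1/(eps-1))."""
    return ((1.0 - s0) + s0 * beta ** (eps - 1.0)) ** (1.0 / (eps - 1.0))


# Hypothetical parameters: complements (sigma = 0.5), task 2 becomes 5x faster.
p1, p2, a1, a2, sigma, beta = 1.0, 1.0, 0.8, 0.2, 0.5, 5.0
s0 = time_share_task2(p1, p2, a1, a2, sigma)
closed_form = two_task_output_ratio(s0, beta, sigma)
brute_force = max_output_on_grid(p1, p2 / beta, a1, a2, sigma) / max_output_on_grid(
    p1, p2, a1, a2, sigma
)
print(f"s0={s0:.4f}, closed form={closed_form:.4f}, brute force={brute_force:.4f}")
```

Here the ex-ante share is one third and a 5x speedup raises output by roughly 50%, well below the 5x ceiling, because the tasks are complements.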
120+
121+
#### Proposition (Lamport-style): CES output response
122+
123+
**Claim.** For two-task CES with ex-ante share $s_0$ on task 2 and speedup $\beta$, the optimized output ratio is
124+
\[
125+
\frac{y'}{y}=\left((1-s_0)+s_0\,\beta^{\varepsilon-1}\right)^{\frac{1}{\varepsilon-1}}.
126+
\]
127+
128+
**Proof (Lamport style).**
129+
130+
1. *Given* CES price index $P(p)=\left(\alpha_1^{\varepsilon}p_1^{1-\varepsilon}+\alpha_2^{\varepsilon}p_2^{1-\varepsilon}\right)^{\frac{1}{1-\varepsilon}}$ and output $v(p)=1/P(p)$.
131+
2. *Let* $p_2' = p_2/\beta$ while $p_1$ is fixed.
132+
3. *Then* the output ratio is
133+
\[
134+
\frac{y'}{y}=\frac{P(p^0)}{P(p^1)}=
135+
\left(\frac{\alpha_1^{\varepsilon}p_1^{1-\varepsilon}+\alpha_2^{\varepsilon}p_2^{1-\varepsilon}}{\alpha_1^{\varepsilon}p_1^{1-\varepsilon}+\alpha_2^{\varepsilon}(p_2/\beta)^{1-\varepsilon}}\right)^{\frac{1}{\varepsilon-1}}.
136+
\]
137+
4. *Define* the ex-ante share
138+
\[
139+
s_0\equiv\frac{\alpha_2^{\varepsilon}p_2^{1-\varepsilon}}{\alpha_1^{\varepsilon}p_1^{1-\varepsilon}+\alpha_2^{\varepsilon}p_2^{1-\varepsilon}}.
140+
\]
141+
5. *Substitute* into Step 3 to obtain
142+
\[
143+
\frac{y'}{y}=\left((1-s_0)+s_0\,\beta^{\varepsilon-1}\right)^{\frac{1}{\varepsilon-1}}.
144+
\]
145+
6. **QED.**
146+
147+
## Discrete (extensive-margin) model: Cadillac tasks live here
148+
149+
The continuous model assumes you always do *some* of each task. That is wrong when tasks are lumpy, have setup costs, or are unit-demand. In those cases, LLMs can create **newly affordable tasks**, meaning the major effect is *selection*, not *intensive* time reallocation.
150+
151+
### Unit-demand formulation
152+
153+
Let each task have payoff $u_i$ and required time $w_i(p)$, with decision $q_i\in\{0,1\}$. Then
154+
\[
155+
\max_{q\in\{0,1\}^n}\sum_i u_i q_i\quad\text{s.t.}\quad \sum_i w_i(p) q_i\le 1.
156+
\]
157+
158+
Speedups change $w_i$ by $\beta_i$, which can **turn tasks on** once a threshold is crossed. This is exactly the “Cadillac tasks” phenomenon: tasks that were too time-expensive become attractive after the LLM. The usual CES elasticity is not a good summary in this regime.
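
The threshold behavior can be illustrated by solving the unit-demand problem by brute force for a handful of tasks. The payoffs, time requirements, and speedup values below are hypothetical; exhaustive enumeration is only practical for small $n$.

```python
from itertools import product


def best_task_set(values, times, budget=1.0):
    """Enumerate q in {0,1}^n and return (best total value, best q) within the time budget."""
    best_value, best_q = 0.0, tuple(0 for _ in values)
    for q in product((0, 1), repeat=len(values)):
        time_used = sum(w * qi for w, qi in zip(times, q))
        if time_used <= budget + 1e-12:  # small tolerance for floating-point sums
            value = sum(u * qi for u, qi in zip(values, q))
            if value > best_value:
                best_value, best_q = value, q
    return best_value, best_q


# Hypothetical tasks: task 3 is valuable but needs 1.2 units of time without the LLM.
values = [5.0, 4.0, 6.0]
base_times = [0.5, 0.5, 1.2]

for beta3 in (1.0, 2.0, 3.0):
    times = [base_times[0], base_times[1], base_times[2] / beta3]
    print(beta3, best_task_set(values, times))
```

In this example a 2x speedup on task 3 changes nothing (the chosen set is still tasks 1 and 2), while a 3x speedup switches task 3 on and displaces task 2: the response to $\beta$ is a step, not a smooth elasticity.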
159+
160+
### Setup-cost variant (bridging discrete and continuous)
161+
162+
Add a fixed setup time $\phi_i$ and a continuous intensity $x_i$:
163+
\[
164+
\max_{q,x}\;u(x)\quad\text{s.t.}\quad \sum_i \phi_i q_i + \sum_i p_i x_i \le 1,\; x_i=0\;\text{if }q_i=0.
165+
\]
166+
167+
If $\phi_i=0$, we recover the continuous model. If $\phi_i>0$, large LLM speedups mostly expand the active set $\{i:q_i=1\}$, not the intensive shares.
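
One way to explore the setup-cost problem is to enumerate active sets and solve the continuous part in closed form. The sketch below assumes CES output with $\sigma>1$ (so inactive tasks simply contribute nothing), in which case output for an active set $S$ is the remaining time $1-\sum_{i\in S}\phi_i$ divided by the CES index over $S$. All numbers and function names are hypothetical.

```python
from itertools import combinations


def ces_price_index(prices, alphas, sigma):
    """CES unit time price index over the given tasks."""
    total = sum(a**sigma * p ** (1.0 - sigma) for a, p in zip(alphas, prices))
    return total ** (1.0 / (1.0 - sigma))


def best_active_set(prices, alphas, setup, sigma):
    """Return (output, active set) maximizing (1 - setup time) / P_S(p). Assumes sigma > 1."""
    n = len(prices)
    best_output, best_set = 0.0, ()
    for size in range(1, n + 1):
        for active in combinations(range(n), size):
            time_left = 1.0 - sum(setup[i] for i in active)
            if time_left <= 0.0:
                continue  # setup costs alone exhaust the time endowment
            index = ces_price_index(
                [prices[i] for i in active], [alphas[i] for i in active], sigma
            )
            output = time_left / index
            if output > best_output:
                best_output, best_set = output, active
    return best_output, best_set


# Hypothetical example: task 1 (index 1) has a setup cost of 0.3 and is slow without the LLM.
alphas, setup, sigma = [1.0, 1.0], [0.0, 0.3], 2.0
before = best_active_set([1.0, 4.0], alphas, setup, sigma)
after = best_active_set([1.0, 4.0 / 5.0], alphas, setup, sigma)  # 5x speedup on task 1
print("before:", before)
print("after: ", after)
```

Before the speedup only task 0 is active; after it, paying the setup cost becomes worthwhile and both tasks are active, so the gain comes from the change in the active set.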
168+
169+
### Worked example (discrete, not continuous)
170+
171+
**Example.** Suppose you can pick *one* task (unit-demand). Task A yields value $u_A=10$ and takes 1 hour. Task B yields $u_B=12$ and takes 2 hours. Without LLMs you choose A. Now an LLM speeds up task B so it takes 1 hour. You switch to B.
172+
173+
- **Upper bound on time-equivalent gain:** 1 hour (if the extra value $u_B-u_A$ is “worth” a full hour).
174+
- **Lower bound:** 0 hours (if the extra value is just a small quality bump).
175+
176+
So the *observed* reallocation does not identify a precise time-savings without modeling discrete choice. This is why elasticity-of-substitution estimates are weak in the Cadillac regime.
177+
178+
### Cadillac tasks (discrete interpretation)
179+
180+
Cadillac tasks are those you would not do *at all* without the LLM, but you do once their time cost drops. Examples:
181+
182+
- literature reviews you previously would not attempt,
183+
- custom data visualizations,
184+
- long-form proofreading or refactoring.
185+
186+
In a unit-demand or setup-cost model, these tasks appear as **newly activated $q_i=1$ choices**, not as marginal increases in $x_i$. This is a discrete effect, so apply discrete logic—not the continuous CES approximation.
187+
188+
## Practical examples (grounding)
189+
190+
- **Query-level time savings:** If a chatbot is used for 10% of tasks and yields 5x speedups, naive share-weighted estimates imply large gains. But if those tasks are Cadillac tasks (discrete selection), the aggregate gain is much smaller. @anthropic2025estimatingproductivitygains
191+
- **RCTs with task selection:** In uplift experiments, participants may choose different tasks once AI is available. That makes comparisons tricky unless you model the discrete choice margin. @becker2025uplift
192+
- **Time allocation as a resource constraint:** Classic time-allocation models already interpret time as a shadow price. @deserpa1971time
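
To make the query-level example concrete, the sketch below computes two standard fixed-bundle bounds on the output gain. It assumes "10% of tasks" means a 10% time share, which is an interpretation rather than something the cited estimates state. The upper bound holds today's bundle fixed and prices it at pre-LLM times (Paasche-style); the lower bound holds the pre-LLM bundle fixed (Laspeyres-style). Function names are hypothetical.

```python
def upper_bound_output_ratio(share_after, beta):
    """Time the post-LLM bundle would have needed without the LLM, relative to 1."""
    return (1.0 - share_after) + share_after * beta


def lower_bound_output_ratio(share_before, beta):
    """Inverse of the time the pre-LLM bundle needs once the LLM is available."""
    return 1.0 / ((1.0 - share_before) + share_before / beta)


beta = 5.0
# Reading 1: the 10% share is observed after adoption.
print("upper bound:", upper_bound_output_ratio(0.10, beta))
# Reading 2: the tasks already took 10% of time before adoption.
print("lower bound, pre-LLM share 10%:", lower_bound_output_ratio(0.10, beta))
# Cadillac case: the tasks were not done at all before the LLM.
print("lower bound, pre-LLM share 0%:", lower_bound_output_ratio(0.0, beta))
```

The upper bound is a 40% gain, while the lower bound is about 9% if the tasks were already being done and 0% if they are pure Cadillac tasks, so the bounds alone leave the gain wide open in the Cadillac case.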
193+
194+
## Diagrams
195+
196+
### Unified map (assumptions → objects → results)
197+
198+
```{mermaid}
199+
graph TD
200+
A[Primitives: tasks i=1..n, time endowment=1] --> B[Technology: time prices p_i; AI => p_i' = p_i/β_i]
201+
B --> C[Choice: allocate time/output s.t. Σ p_i x_i ≤ 1]
202+
203+
C --> D{Preference/output aggregator u(x)}
204+
D --> D1[Homothetic (continuous)]
205+
D --> D2[CES]
206+
D --> D3[Discrete or setup-cost tasks]
207+
208+
D1 --> E[Price index P(p)=e(p,1)]
209+
E --> F[Indirect output v(p)=1/P(p)]
210+
F --> G[EV/CV = ratios of P(p)]
211+
F --> H[Local: d ln v = Σ t_i d ln β_i]
212+
213+
D2 --> I[Closed forms for P(p), shares]
214+
I --> J[Two-task formula]
215+
216+
D3 --> K[Activation/threshold effects]
217+
K --> L[Cadillac tasks, discrete selection]
218+
```
219+
220+
### Threshold diagram (discrete activation)
221+
222+
```{tikz}
223+
#| fig-cap: "Discrete activation: speedups switch on tasks"
224+
#| fig-align: center
225+
\begin{tikzpicture}[scale=1.0]
226+
\draw[->] (0,0) -- (4.2,0) node[below] {time cost};
227+
\draw[->] (0,0) -- (0,3.2) node[left] {task value};
228+
229+
\draw[dashed] (0,1.5) -- (4,1.5) node[right] {value threshold};
230+
231+
\fill[black] (1,2.2) circle[radius=2pt] node[above] {task A};
232+
\fill[black] (3.2,2.7) circle[radius=2pt] node[above] {task B};
233+
234+
\draw[blue,->] (3.2,2.7) -- (2.0,2.7) node[midway,above] {LLM speedup};
235+
\node[blue] at (2.2,2.1) {activation};
236+
\end{tikzpicture}
237+
```
238+
239+
## Checklist for the desiderata
240+
241+
- **Bibliography validity:** citations are included and checked against `ai.bib`. See the tests.
242+
- **Citation faithfulness:** the LLM-based test asks a model to flag any suspicious claim-to-citation mismatches.
243+
- **Lamport-style proofs:** proofs are structured with numbered steps and a QED marker.
244+
- **Legible diagrams:** one Mermaid flowchart + one TikZ threshold figure.
245+
- **Practical examples:** see the query-level and RCT examples above.
246+
247+
## Related literature (short pointers)
248+
249+
- Index-number theory for price changes and substitution. @caves1982indexnumbers
250+
- Exact welfare measures and integrable demand systems. @hausman1981exact; @deaton1980aids
251+
- Time allocation and shadow pricing. @deserpa1971time
252+
- Task-based technological change. @autor2003skill; @acemoglu2011handbook
253+
