<li><p>Different domains have different costs of verification:</p>
<ul>
<li>Cheap to verify: whether an image looks good, whether a joke is funny, whether a sudoku solution is valid, whether a formalized proof is sound, whether code passes a specific test.</li>
<li>Costly to verify: whether a medical paper works, whether an academic paper is high quality, whether a human-written proof is sound, whether code fulfills a specification.</li>
</ul></li>
<li><p>LLM-verification will be a big benefit in domains where it’s costly to verify.</p></li>
<li><p>There is a complementarity between LLM-generation and LLM-verification: the value of using both is more than the sum of the value of each alone.</p></li>
<li><p>When doing LLM-generation it’s useful to ask the LLM to self-verify, e.g. by (1) generating a Lean proof and validating it; (2) generating unit tests and running them; (3) generating a checklist and asking an independent LLM to check each box.</p></li>
<li><p>LLM-generation can <em>hurt</em> communication equilibria where verification is costly, when LLM generation lowers the cost of <em>accidental</em> attributes (not essential attributes). E.g. if LLMs make it cheap to fix spelling errors, or to adopt the idioms of a discipline, then there will be less separation in equilibrium.</p></li>
</ol>
</section>
<section id="formal-models" class="level1">
<h1>Formal Models</h1>
<p>A couple of very hasty models to sketch how to formalize this. It would be nice to have a single model which incorporates all the mechanisms above.</p>
<dl>
<dt>Model 1: quality vs polish.</dt>
<dd>
posts/2025-12-26-water-into-wine-model-of-ai.qmd (3 additions, 0 deletions)

@@ -138,6 +138,9 @@ Comparison with single-product models?
: Much discussion of the economic impacts of AI talks about aggregate output, and how AI changes the relative productivity of capital and labor, or the different types of labor.
posts/2025-12-30-llm-verification.qmd (15 additions, 1 deletion)

@@ -50,10 +50,24 @@ Implication: credentials become less important.

A related point: in an old [post on AI and communication](https://tecunningham.github.io/posts/2023-06-06-effect-of-ai-on-communication.html) I argued that with LLMs reputation will become less important for internal properties (where the ground truth is human judgment, i.e. verification is cheap), and more important for external properties (where the ground truth is in the world, i.e. verification is expensive).
# [EDIT] A More Precise Story

1. Different domains have different costs of verification:
   - Cheap to verify: whether an image looks good, whether a joke is funny, whether a sudoku solution is valid, whether a formalized proof is sound, whether code passes a specific test.
   - Costly to verify: whether a medical paper works, whether an academic paper is high quality, whether a human-written proof is sound, whether code fulfills a specification.
2. LLM-verification will be a big benefit in domains where it's costly to verify.
3. There is a complementarity between LLM-generation and LLM-verification: the value of using both is more than the sum of the value of each alone.
4. When doing LLM-generation it's useful to ask the LLM to self-verify, e.g. by (1) generating a Lean proof and validating it; (2) generating unit tests and running them; (3) generating a checklist and asking an independent LLM to check each box.
5. LLM-generation can *hurt* communication equilibria where verification is costly, when LLM generation lowers the cost of *accidental* attributes (not essential attributes). E.g. if LLMs make it cheap to fix spelling errors, or to adopt the idioms of a discipline, then there will be less separation in equilibrium.
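The self-verification pattern in point 4 can be sketched in code. A minimal sketch under stated assumptions: `call_llm` is a hypothetical stand-in for a real LLM API (it returns canned responses here so the loop actually runs), and the generated tests are plain `assert` statements executed against the candidate.

```python
# Sketch of point 4: pair LLM-generation with LLM self-verification.
# `call_llm` is a hypothetical stand-in for a real LLM API; it returns
# canned responses here so the sketch is runnable end-to-end.
def call_llm(prompt: str) -> str:
    if prompt.startswith("Write code:"):
        return "def add(a, b):\n    return a + b"
    return "assert add(2, 3) == 5\nassert add(-1, 1) == 0"

def generate_and_verify(task: str, max_attempts: int = 3):
    """Generate code for `task`, then generate unit tests and run them.

    Only return a candidate that passes its own generated tests;
    otherwise regenerate, up to `max_attempts` times.
    """
    for _ in range(max_attempts):
        code = call_llm(f"Write code: {task}")
        tests = call_llm(f"Write asserts for: {task}")
        namespace = {}
        try:
            exec(code, namespace)   # define the candidate function
            exec(tests, namespace)  # run the generated checks
        except Exception:
            continue                # verification failed: regenerate
        return code                 # candidate passed its own tests
    return None
```

The same loop shape covers the other two strategies in point 4: swap the assert-runner for a Lean proof checker, or for an independent LLM working through a generated checklist.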
# Formal Models

A couple of very hasty models to sketch how to formalize this. It would be nice to have a single model which incorporates all the mechanisms above.

Model 1: quality vs polish.
: Suppose you care just about intrinsic quality $q$, but your signal is