
Commit 62c82e8

verification
1 parent c663d98 commit 62c82e8

8 files changed

Lines changed: 59 additions & 14 deletions

docs/index.html

Lines changed: 2 additions & 2 deletions
@@ -258,7 +258,7 @@ <h3 class="no-anchor listing-title">
 </a>
 </div>
 </div>
-<div class="quarto-post image-right" data-index="1" data-listing-date-sort="1767081600000" data-listing-file-modified-sort="1767120469953" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="6" data-listing-word-count-sort="1082">
+<div class="quarto-post image-right" data-index="1" data-listing-date-sort="1767168000000" data-listing-file-modified-sort="1767193737256" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="7" data-listing-word-count-sort="1301">
 <div class="thumbnail"><a href="./posts/2025-12-30-llm-verification.html" class="no-external">
 
 <div class="listing-item-img-placeholder card-img-top" >&nbsp;</div>
@@ -277,7 +277,7 @@ <h3 class="no-anchor listing-title">
 <div class="metadata">
 <a href="./posts/2025-12-30-llm-verification.html" class="no-external">
 <div class="listing-date">
-Dec 30, 2025
+Dec 31, 2025
 </div>
 <div class="listing-author">
 Tom Cunningham

docs/index.xml

Lines changed: 17 additions & 3 deletions
@@ -10,7 +10,7 @@
 <atom:link href="tecunningham.github.io/index.xml" rel="self" type="application/rss+xml"/>
 <description>{{&lt; meta description-meta &gt;}}</description>
 <generator>quarto-1.8.25</generator>
-<lastBuildDate>Tue, 30 Dec 2025 08:00:00 GMT</lastBuildDate>
+<lastBuildDate>Wed, 31 Dec 2025 08:00:00 GMT</lastBuildDate>
 <item>
 <title>LLM verification</title>
 <dc:creator>Tom Cunningham</dc:creator>
@@ -96,9 +96,23 @@ You have one friend who is full of new ideas, you have another friend who can te
 </dd>
 </dl>
 </section>
+<section id="edit-a-more-precise-story" class="level1">
+<h1>[EDIT] A More Precise Story</h1>
+<ol type="1">
+<li><p>Different domains have different costs of verification:</p>
+<ul>
+<li>Cheap to verify: whether an image looks good, whether a joke is funny, whether a sudoku solution is valid, whether a formalized proof is sound, whether code passes a specific test.</li>
+<li>Costly to verify: whether a medical paper works, whether an academic paper is high quality, whether a human-written proof is sound, whether code fulfills a specification.</li>
+</ul></li>
+<li><p>LLM-verification will be a big benefit in domains where it’s costly to verify.</p></li>
+<li><p>There is a complementarity between LLM-generation and LLM-verification, the value of both is more than the sum of the value of each.</p></li>
+<li><p>When doing LLM-generation it’s useful to ask the LLM to self-verify. E.g. by (1) generating a Lean proof and validating it; (2) generating unit tests and running them; (3) generating a checklist and asking an independent LLM to check each box.</p></li>
+<li><p>LLM-generation can <em>hurt</em> communication equilibria where verification is costly, when LLM generation lowers the cost of <em>accidental</em> attributes (not essential attributes). E.g. if LLMs make it cheap to fix spelling errors, or to adopt idioms of the discipline, then there will be less separation in equilibrium.</p></li>
+</ol>
+</section>
 <section id="formal-models" class="level1">
 <h1>Formal Models</h1>
-<p>A couple of very hasty models to sketch how to formalize this.</p>
+<p>A couple of very hasty models to sketch how to formalize this. It would be nice to have a single model which incorporates all the mechanisms above.</p>
 <dl>
 <dt>Model 1: quality vs polish.</dt>
 <dd>
@@ -125,7 +139,7 @@ You have one friend who is full of new ideas, you have another friend who can te
 
 ]]></description>
 <guid>tecunningham.github.io/posts/2025-12-30-llm-verification.html</guid>
-<pubDate>Tue, 30 Dec 2025 08:00:00 GMT</pubDate>
+<pubDate>Wed, 31 Dec 2025 08:00:00 GMT</pubDate>
 </item>
 <item>
 <title>Forecasts of AI &amp; Economic Growth</title>

docs/posts/2025-12-26-water-into-wine-model-of-ai.html

Lines changed: 2 additions & 2 deletions
@@ -7,7 +7,7 @@
 <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
 
 <meta name="author" content="Tom Cunningham">
-<meta name="dcterms.date" content="2025-12-30">
+<meta name="dcterms.date" content="2025-12-31">
 <meta name="description" content="Tom Cunningham blog">
 
 <title>Water Into Wine, a Model of AI | Tom Cunningham – Tom Cunningham</title>
@@ -4138,7 +4138,7 @@ <h1 class="title">Water Into Wine, a Model of AI</h1>
 <div>
 <div class="quarto-title-meta-heading">Published</div>
 <div class="quarto-title-meta-contents">
-<p class="date">December 30, 2025</p>
+<p class="date">December 31, 2025</p>
 </div>
 </div>
 

docs/posts/2025-12-30-llm-verification.html

Lines changed: 17 additions & 3 deletions
@@ -7,7 +7,7 @@
 <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
 
 <meta name="author" content="Tom Cunningham">
-<meta name="dcterms.date" content="2025-12-30">
+<meta name="dcterms.date" content="2025-12-31">
 <meta name="description" content="Tom Cunningham blog">
 
 <title>LLM verification | Tom Cunningham – Tom Cunningham</title>
@@ -211,7 +211,7 @@ <h1 class="title">LLM verification</h1>
 <div>
 <div class="quarto-title-meta-heading">Published</div>
 <div class="quarto-title-meta-contents">
-<p class="date">December 30, 2025</p>
+<p class="date">December 31, 2025</p>
 </div>
 </div>
 
@@ -305,9 +305,23 @@ <h1>Notes</h1>
 </dd>
 </dl>
 </section>
+<section id="edit-a-more-precise-story" class="level1">
+<h1>[EDIT] A More Precise Story</h1>
+<ol type="1">
+<li><p>Different domains have different costs of verification:</p>
+<ul>
+<li>Cheap to verify: whether an image looks good, whether a joke is funny, whether a sudoku solution is valid, whether a formalized proof is sound, whether code passes a specific test.</li>
+<li>Costly to verify: whether a medical paper works, whether an academic paper is high quality, whether a human-written proof is sound, whether code fulfills a specification.</li>
+</ul></li>
+<li><p>LLM-verification will be a big benefit in domains where it’s costly to verify.</p></li>
+<li><p>There is a complementarity between LLM-generation and LLM-verification, the value of both is more than the sum of the value of each.</p></li>
+<li><p>When doing LLM-generation it’s useful to ask the LLM to self-verify. E.g. by (1) generating a Lean proof and validating it; (2) generating unit tests and running them; (3) generating a checklist and asking an independent LLM to check each box.</p></li>
+<li><p>LLM-generation can <em>hurt</em> communication equilibria where verification is costly, when LLM generation lowers the cost of <em>accidental</em> attributes (not essential attributes). E.g. if LLMs make it cheap to fix spelling errors, or to adopt idioms of the discipline, then there will be less separation in equilibrium.</p></li>
+</ol>
+</section>
 <section id="formal-models" class="level1">
 <h1>Formal Models</h1>
-<p>A couple of very hasty models to sketch how to formalize this.</p>
+<p>A couple of very hasty models to sketch how to formalize this. It would be nice to have a single model which incorporates all the mechanisms above.</p>
 <dl>
 <dt>Model 1: quality vs polish.</dt>
 <dd>
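
Item 1 of the added "[EDIT] A More Precise Story" section lists "whether a sudoku solution is valid" as cheap to verify. As an illustrative sketch (separate from the committed files), here is why, in Python: checking a completed 9x9 grid is a single pass over its 9 rows, 9 columns, and 9 boxes, however hard the puzzle was to solve. The helper name `is_valid_sudoku` and the list-of-lists grid representation are assumptions chosen for illustration.

```python
def is_valid_sudoku(grid: list[list[int]]) -> bool:
    """Return True if `grid` is a completed, valid 9x9 sudoku solution.

    Hypothetical helper for illustration: the grid is assumed to be a
    list of 9 rows, each a list of 9 ints.
    """
    target = set(range(1, 10))
    if len(grid) != 9 or any(len(row) != 9 for row in grid):
        return False
    # Each row must contain the digits 1-9 exactly once.
    rows_ok = all(set(row) == target for row in grid)
    # Each column must contain the digits 1-9 exactly once.
    cols_ok = all({grid[r][c] for r in range(9)} == target for c in range(9))
    # Each 3x3 box must contain the digits 1-9 exactly once.
    boxes_ok = all(
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)} == target
        for br in range(0, 9, 3)
        for bc in range(0, 9, 3)
    )
    return rows_ok and cols_ok and boxes_ok


# A valid grid built from a standard shifting pattern, and a corrupted copy.
solved = [[(r * 3 + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
broken = [row[:] for row in solved]
broken[0][0], broken[0][1] = broken[0][1], broken[0][0]  # swap two cells in row 0

print(is_valid_sudoku(solved))  # True
print(is_valid_sudoku(broken))  # False
```

The swap leaves row 0 and the top-left box intact but duplicates a digit in columns 0 and 1, so the column check alone catches it. The verification cost stays fixed at 27 set comparisons regardless of how the grid was produced.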

docs/search.json

Lines changed: 2 additions & 2 deletions
Large diffs are not rendered by default.

docs/sitemap.xml

Lines changed: 1 addition & 1 deletion
@@ -118,6 +118,6 @@
 </url>
 <url>
 <loc>tecunningham.github.io/posts/2025-12-30-llm-verification.html</loc>
-<lastmod>2025-12-30T18:47:49.953Z</lastmod>
+<lastmod>2025-12-31T15:08:57.256Z</lastmod>
 </url>
 </urlset>

posts/2025-12-26-water-into-wine-model-of-ai.qmd

Lines changed: 3 additions & 0 deletions
@@ -138,6 +138,9 @@ Comparison with single-product models?
 : Much discussion of the economic impacts of AI talks about aggregate output, and how AI changes the relative productivity of capital and labor, or the different types of labor.
 
 
+# Two Specific Scenarios
+
+-
 
 # Quantitative Model
 

posts/2025-12-30-llm-verification.qmd

Lines changed: 15 additions & 1 deletion
@@ -50,10 +50,24 @@ Implication: credentials become less important.
 
 A related point: in an old [post on AI and communication](https://tecunningham.github.io/posts/2023-06-06-effect-of-ai-on-communication.html) I argued that with LLMs reputation will become less important for internal properties (where the ground truth is human judgment, i.e. verification is cheap), more important for external properties (where the ground truth is in the world, i.e. verification is expensive).
 
+# [EDIT] A More Precise Story
+
+1. Different domains have different costs of verification:
+- Cheap to verify: whether an image looks good, whether a joke is funny, whether a sudoku solution is valid, whether a formalized proof is sound, whether code passes a specific test.
+- Costly to verify: whether a medical paper works, whether an academic paper is high quality, whether a human-written proof is sound, whether code fulfills a specification.
+
+2. LLM-verification will be a big benefit in domains where it's costly to verify.
+
+3. There is a complementarity between LLM-generation and LLM-verification, the value of both is more than the sum of the value of each.
+
+4. When doing LLM-generation it's useful to ask the LLM to self-verify. E.g. by (1) generating a Lean proof and validating it; (2) generating unit tests and running them; (3) generating a checklist and asking an independent LLM to check each box.
+
+5. LLM-generation can *hurt* communication equilibria where verification is costly, when LLM generation lowers the cost of *accidental* attributes (not essential attributes). E.g. if LLMs make it cheap to fix spelling errors, or to adopt idioms of the discipline, then there will be less separation in equilibrium.
+
 
 # Formal Models
 
-A couple of very hasty models to sketch how to formalize this.
+A couple of very hasty models to sketch how to formalize this. It would be nice to have a single model which incorporates all the mechanisms above.
 
 Model 1: quality vs polish.
 : Suppose you care just about intrinsic quality $q$, but your signal is
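
Item 4 of the added section suggests asking the LLM to self-verify, e.g. by generating unit tests and running them. A minimal illustrative sketch of that generate-then-verify loop in Python, under loud assumptions: there is no real model call here; `median_v1` and `median_v2` are hypothetical stand-ins for LLM-generated implementations, and `tests` is a hypothetical stand-in for LLM-generated unit tests. The helper `first_verified` returns the first candidate that passes every test.

```python
from typing import Callable, Optional, Sequence


# Hypothetical stand-ins for LLM-generated implementations of `median`.
def median_v1(xs: Sequence[float]) -> float:
    return sorted(xs)[len(xs) // 2]  # bug: wrong for even-length input


def median_v2(xs: Sequence[float]) -> float:
    s = sorted(xs)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2


# Hypothetical stand-ins for LLM-generated unit tests: (input, expected output).
tests = [([3, 1, 2], 2), ([4, 1, 3, 2], 2.5), ([5], 5)]


def first_verified(
    candidates: Sequence[Callable[[Sequence[float]], float]],
    tests: Sequence[tuple],
) -> Optional[Callable[[Sequence[float]], float]]:
    """Return the first candidate that passes every test, or None."""
    for fn in candidates:
        try:
            if all(fn(xs) == expected for xs, expected in tests):
                return fn
        except Exception:
            # A candidate that crashes counts as failing verification.
            continue
    return None


chosen = first_verified([median_v1, median_v2], tests)
print(chosen.__name__ if chosen else "no candidate passed")  # median_v2
```

In the post's terms, passing these tests only establishes "whether code passes a specific test", which is cheap to verify; it does not establish "whether code fulfills a specification", which stays costly.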
