verification

tecunningham · tecunningham · commit 62c82e87f378 · 2025-12-31T07:13:19.000-08:00
diff --git a/docs/index.html b/docs/index.html
@@ -258,7 +258,7 @@ <h3 class="no-anchor listing-title">
 </a>
 </div>
 </div>
-<div class="quarto-post image-right" data-index="1" data-listing-date-sort="1767081600000" data-listing-file-modified-sort="1767120469953" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="6" data-listing-word-count-sort="1082">
+<div class="quarto-post image-right" data-index="1" data-listing-date-sort="1767168000000" data-listing-file-modified-sort="1767193737256" data-listing-date-modified-sort="NaN" data-listing-reading-time-sort="7" data-listing-word-count-sort="1301">
 <div class="thumbnail"><a href="./posts/2025-12-30-llm-verification.html" class="no-external">
 
 <div class="listing-item-img-placeholder card-img-top" >&nbsp;</div>
@@ -277,7 +277,7 @@ <h3 class="no-anchor listing-title">
 <div class="metadata">
 <a href="./posts/2025-12-30-llm-verification.html" class="no-external">
 <div class="listing-date">
-Dec 30, 2025
+Dec 31, 2025
 </div>
 <div class="listing-author">
 Tom Cunningham
diff --git a/docs/index.xml b/docs/index.xml
@@ -10,7 +10,7 @@
 <atom:link href="tecunningham.github.io/index.xml" rel="self" type="application/rss+xml"/>
 <description>{{&lt; meta description-meta &gt;}}</description>
 <generator>quarto-1.8.25</generator>
-<lastBuildDate>Tue, 30 Dec 2025 08:00:00 GMT</lastBuildDate>
+<lastBuildDate>Wed, 31 Dec 2025 08:00:00 GMT</lastBuildDate>
 <item>
   <title>LLM verification</title>
   <dc:creator>Tom Cunningham</dc:creator>
@@ -96,9 +96,23 @@ You have one friend who is full of new ideas, you have another friend who can te
 </dd>
 </dl>
 </section>
+<section id="edit-a-more-precise-story" class="level1">
+<h1>[EDIT] A More Precise Story</h1>
+<ol type="1">
+<li><p>Different domains have different costs of verification:</p>
+<ul>
+<li>Cheap to verify: whether an image looks good, whether a joke is funny, whether a sudoku solution is valid, whether a formalized proof is sound, whether code passes a specific test.</li>
+<li>Costly to verify: whether a medical treatment works, whether an academic paper is high quality, whether a human-written proof is sound, whether code fulfills a specification.</li>
+</ul></li>
+<li><p>LLM-verification will be a big benefit in domains where it’s costly to verify.</p></li>
+<li><p>There is a complementarity between LLM-generation and LLM-verification: the value of having both is more than the sum of the values of each alone.</p></li>
+<li><p>When doing LLM-generation it’s useful to ask the LLM to self-verify, e.g. by (1) generating a Lean proof and validating it; (2) generating unit tests and running them; (3) generating a checklist and asking an independent LLM to check each box.</p></li>
+<li><p>LLM-generation can <em>hurt</em> communication equilibria where verification is costly, when it lowers the cost of <em>accidental</em> attributes (not essential attributes). E.g. if LLMs make it cheap to fix spelling errors, or to adopt the idioms of a discipline, then those attributes no longer distinguish strong work from weak work, and there will be less separation in equilibrium.</p></li>
+</ol>
+</section>
 <section id="formal-models" class="level1">
 <h1>Formal Models</h1>
-<p>A couple of very hasty models to sketch how to formalize this.</p>
+<p>A couple of very hasty models to sketch how to formalize this. It would be nice to have a single model which incorporates all the mechanisms above.</p>
 <dl>
 <dt>Model 1: quality vs polish.</dt>
 <dd>
@@ -125,7 +139,7 @@ You have one friend who is full of new ideas, you have another friend who can te
 
  ]]></description>
   <guid>tecunningham.github.io/posts/2025-12-30-llm-verification.html</guid>
-  <pubDate>Tue, 30 Dec 2025 08:00:00 GMT</pubDate>
+  <pubDate>Wed, 31 Dec 2025 08:00:00 GMT</pubDate>
 </item>
 <item>
   <title>Forecasts of AI &amp; Economic Growth</title>
diff --git a/docs/posts/2025-12-26-water-into-wine-model-of-ai.html b/docs/posts/2025-12-26-water-into-wine-model-of-ai.html
@@ -7,7 +7,7 @@
 <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
 
 <meta name="author" content="Tom Cunningham">
-<meta name="dcterms.date" content="2025-12-30">
+<meta name="dcterms.date" content="2025-12-31">
 <meta name="description" content="Tom Cunningham blog">
 
 <title>Water Into Wine, a Model of AI | Tom Cunningham – Tom Cunningham</title>
@@ -4138,7 +4138,7 @@ <h1 class="title">Water Into Wine, a Model of AI</h1>
       <div>
       <div class="quarto-title-meta-heading">Published</div>
       <div class="quarto-title-meta-contents">
-        <p class="date">December 30, 2025</p>
+        <p class="date">December 31, 2025</p>
       </div>
     </div>
     
diff --git a/docs/posts/2025-12-30-llm-verification.html b/docs/posts/2025-12-30-llm-verification.html
@@ -7,7 +7,7 @@
 <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
 
 <meta name="author" content="Tom Cunningham">
-<meta name="dcterms.date" content="2025-12-30">
+<meta name="dcterms.date" content="2025-12-31">
 <meta name="description" content="Tom Cunningham blog">
 
 <title>LLM verification | Tom Cunningham – Tom Cunningham</title>
@@ -211,7 +211,7 @@ <h1 class="title">LLM verification</h1>
       <div>
       <div class="quarto-title-meta-heading">Published</div>
       <div class="quarto-title-meta-contents">
-        <p class="date">December 30, 2025</p>
+        <p class="date">December 31, 2025</p>
       </div>
     </div>
     
@@ -305,9 +305,23 @@ <h1>Notes</h1>
 </dd>
 </dl>
 </section>
+<section id="edit-a-more-precise-story" class="level1">
+<h1>[EDIT] A More Precise Story</h1>
+<ol type="1">
+<li><p>Different domains have different costs of verification:</p>
+<ul>
+<li>Cheap to verify: whether an image looks good, whether a joke is funny, whether a sudoku solution is valid, whether a formalized proof is sound, whether code passes a specific test.</li>
+<li>Costly to verify: whether a medical treatment works, whether an academic paper is high quality, whether a human-written proof is sound, whether code fulfills a specification.</li>
+</ul></li>
+<li><p>LLM-verification will be a big benefit in domains where it’s costly to verify.</p></li>
+<li><p>There is a complementarity between LLM-generation and LLM-verification: the value of having both is more than the sum of the values of each alone.</p></li>
+<li><p>When doing LLM-generation it’s useful to ask the LLM to self-verify, e.g. by (1) generating a Lean proof and validating it; (2) generating unit tests and running them; (3) generating a checklist and asking an independent LLM to check each box.</p></li>
+<li><p>LLM-generation can <em>hurt</em> communication equilibria where verification is costly, when it lowers the cost of <em>accidental</em> attributes (not essential attributes). E.g. if LLMs make it cheap to fix spelling errors, or to adopt the idioms of a discipline, then those attributes no longer distinguish strong work from weak work, and there will be less separation in equilibrium.</p></li>
+</ol>
+</section>
 <section id="formal-models" class="level1">
 <h1>Formal Models</h1>
-<p>A couple of very hasty models to sketch how to formalize this.</p>
+<p>A couple of very hasty models to sketch how to formalize this. It would be nice to have a single model which incorporates all the mechanisms above.</p>
 <dl>
 <dt>Model 1: quality vs polish.</dt>
 <dd>
diff --git a/docs/search.json b/docs/search.json
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
@@ -118,6 +118,6 @@
   </url>
   <url>
     <loc>tecunningham.github.io/posts/2025-12-30-llm-verification.html</loc>
-    <lastmod>2025-12-30T18:47:49.953Z</lastmod>
+    <lastmod>2025-12-31T15:08:57.256Z</lastmod>
   </url>
 </urlset>
diff --git a/posts/2025-12-26-water-into-wine-model-of-ai.qmd b/posts/2025-12-26-water-into-wine-model-of-ai.qmd
@@ -138,6 +138,9 @@ Comparison with single-product models?
 : Much discussion of the economic impacts of AI talks about aggregate output, and how AI changes the relative productivity of capital and labor, or the different types of labor.
 
 
+#           Two Specific Scenarios
+
+- 
 
 #        Quantitative Model
 
diff --git a/posts/2025-12-30-llm-verification.qmd b/posts/2025-12-30-llm-verification.qmd
@@ -50,10 +50,24 @@ Implication: credentials become less important.
 
     A related point: in an old [post on AI and communication](https://tecunningham.github.io/posts/2023-06-06-effect-of-ai-on-communication.html) I argued that with LLMs reputation will become less important for internal properties (where the ground truth is human judgment, i.e. verification is cheap), more important for external properties (where the ground truth is in the world, i.e. verification is expensive).
 
+#        [EDIT] A More Precise Story
+
+1. Different domains have different costs of verification:
+   - Cheap to verify: whether an image looks good, whether a joke is funny, whether a sudoku solution is valid, whether a formalized proof is sound, whether code passes a specific test.
+   - Costly to verify: whether a medical treatment works, whether an academic paper is high quality, whether a human-written proof is sound, whether code fulfills a specification.
+
+2. LLM-verification will be a big benefit in domains where it's costly to verify.
+
+3. There is a complementarity between LLM-generation and LLM-verification: the value of having both is more than the sum of the values of each alone.
+
+4. When doing LLM-generation it's useful to ask the LLM to self-verify, e.g. by (1) generating a Lean proof and validating it; (2) generating unit tests and running them; (3) generating a checklist and asking an independent LLM to check each box.
+
+5. LLM-generation can *hurt* communication equilibria where verification is costly, when it lowers the cost of *accidental* attributes (not essential attributes). E.g. if LLMs make it cheap to fix spelling errors, or to adopt the idioms of a discipline, then those attributes no longer distinguish strong work from weak work, and there will be less separation in equilibrium.
+
 
 #        Formal Models
 
-A couple of very hasty models to sketch how to formalize this.
+A couple of very hasty models to sketch how to formalize this. It would be nice to have a single model which incorporates all the mechanisms above.
 
 Model 1: quality vs polish.
 : Suppose you care just about intrinsic quality $q$, but your signal is