name: "Pairwise"
shortDescription: "A family of preference aggregation methods that build rankings or weights from many simple A-vs-B comparisons."

**Pairwise** methods are a family of preference aggregation techniques in which participants compare two options at a time, and a model converts the resulting binary judgments into a ranking or set of weights. The simplicity of each individual judgment makes the mechanism accessible to large, distributed audiences, while the aggregation step lets it produce richer outputs — orderings, scores, or funding allocations — than single-choice voting.

## Two Modes: Estimation and Aggregation

Pairwise methods are used for two related but distinct purposes:

- **Estimation** — recovering a hidden ground-truth ranking from noisy observations. Voters are treated as imperfect measurements of an underlying reality. Examples: judging competition entries, scoring chess players, evaluating language models.
- **Aggregation** — synthesizing intentional preferences where no ground truth exists. Voters express what they want, and the algorithm combines their judgments into a collective outcome. Examples: prioritizing community proposals, allocating grant funding.

The two modes share the same core mechanic but call for different design choices around aggregation, vote weighting, and pair selection.

## How It Works

1. **Items are submitted** — projects, proposals, or options to be evaluated.
2. **Pairs are surfaced** — the system presents two items at a time. Pair selection can be random, bucketed by category or tier, or actively chosen to maximize information.
3. **Binary judgments are made** — the participant picks one (or skips, or marks them equal).
4. **An aggregation algorithm produces a ranking** — common choices include the Bradley-Terry model, Elo, and spectral methods (eigenvector-based, related to PageRank). Each handles noise, sparsity, and intransitive preferences differently; see the sketch after this list.
5. **The ranking informs allocation** — used as-is for prioritization, or mapped to funding weights, slates, or governance decisions.
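
To make step 4 concrete, the snippet below is a minimal sketch of Bradley-Terry aggregation using the classic minorization-maximization (Zermelo) iteration. The item names and votes are invented for illustration; real deployments add regularization, tie handling, and confidence estimates.

```python
from collections import defaultdict

def bradley_terry(votes, n_iters=100):
    """Fit Bradley-Terry strengths from (winner, loser) pairs via MM updates."""
    items = {i for pair in votes for i in pair}
    wins = defaultdict(int)         # total wins per item
    matchups = defaultdict(int)     # comparisons per unordered pair
    for w, l in votes:
        wins[w] += 1
        matchups[frozenset((w, l))] += 1

    strength = {i: 1.0 for i in items}
    for _ in range(n_iters):
        updated = {}
        for i in items:
            # Sum over every opponent j that i has been compared against.
            denom = sum(
                n / (strength[i] + strength[j])
                for pair, n in matchups.items()
                if i in pair
                for j in pair - {i}
            )
            updated[i] = wins[i] / denom if denom else strength[i]
        total = sum(updated.values())
        strength = {i: s / total for i, s in updated.items()}  # rescale each pass
    return strength

# Six illustrative judgments over three items.
votes = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "B"), ("A", "B"), ("B", "A")]
ranking = sorted(bradley_terry(votes).items(), key=lambda kv: -kv[1])
print(ranking)  # A ranks first, then B, then C
```

Rescaling the strengths each pass keeps the output stable: Bradley-Terry scores are only identified up to a constant factor, so some normalization convention is always needed.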

## Origins

Pairwise comparison is one of the oldest measurement techniques in psychometrics. L. L. Thurstone's *Law of Comparative Judgment* (1927) established its statistical foundations. Variants have since been used for chess ratings (Elo, 1960s), web search (PageRank, 1998), peer-to-peer trust (EigenTrust), and recommendation systems. The application to public goods funding and online community decision-making is more recent.

## Advantages

- Each judgment is cognitively cheap — choosing between two things is easier than ranking many or assigning numeric scores.
- Accessible to participants without deep context on every option, since each comparison only requires familiarity with two items.
- Produces richer preference data than plurality voting and supports several aggregation algorithms with well-studied properties.
- Random pair surfacing distributes voter attention across the option set, reducing the marketing-driven dynamics that can dominate other allocation mechanisms.
- Resistant to certain forms of strategic voting — manipulating a global ranking through individual A-vs-B choices is harder than burying rivals on a ranked ballot.

## Limitations

- Reliable rankings require many comparisons; with k items, naive approaches need on the order of k² votes, though smart pair selection can reduce this substantially (see the sketch after this list).
- Binary judgments don't capture intensity of preference, though Likert-style inputs (e.g. a 1–5 scale stored as a real-valued score) can recover it at the cost of a slightly heavier interface.
- Voters may fatigue across long sessions, especially without an engaging interface.
- The aggregation algorithm shapes outcomes in ways that may not be transparent to participants, so legitimacy depends on clear communication of how votes become weights.
- Less suited to small expert groups who would prefer to assign weights directly.
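
On the first point: with k items there are k(k-1)/2 distinct pairs, so exhaustive coverage grows quadratically. Adaptive pair selection mitigates this. The heuristic below (prefer under-observed pairs, then close-scoring ones) is a simplified illustration of the idea, not the scheme used by any particular tool.

```python
import itertools

def next_pair(scores, counts):
    """Pick the next pair to show a voter.

    scores: item -> current estimated strength
    counts: frozenset({i, j}) -> number of times that pair has been voted on
    Prefers pairs with few past votes, breaking ties toward pairs whose
    current scores are close, where one more vote is most informative.
    """
    def priority(pair):
        i, j = tuple(pair)
        return (-counts.get(pair, 0), -abs(scores[i] - scores[j]))
    candidates = [frozenset(p) for p in itertools.combinations(scores, 2)]
    return max(candidates, key=priority)

scores = {"A": 0.63, "B": 0.22, "C": 0.15}
counts = {frozenset(("A", "B")): 3, frozenset(("A", "C")): 1, frozenset(("B", "C")): 2}
print(sorted(next_pair(scores, counts)))  # ['A', 'C']: the least-voted pair
```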

## Best Used When

- A large set of options must be ranked or prioritized.
- The audience is broad and distributed, with limited time per voter but at least some baseline familiarity with the domain.
- Discovery is a goal — surfacing items voters wouldn't have sought out on their own.

## Examples and Use Cases

**Optimism Retro Funding** has used pairwise comparison in some rounds to help badgeholders evaluate large slates of projects, with results aggregated using the Bradley-Terry model.

**Deep Funding** (2024), an Ethereum ecosystem initiative supported by Vitalik Buterin, uses pairwise judgments by a human jury to score competing AI-generated weight proposals for allocating funds across a project dependency graph.

**All Our Ideas** (Salganik & Levy, 2015) introduced "pairwise wiki surveys" as a research instrument, used by the New York City Mayor's Office to gather public input for the PlaNYC 2030 sustainability plan.

**LLM evaluation** — pairwise comparison is now the dominant method for ranking large language models, with platforms like Chatbot Arena aggregating millions of human A-vs-B judgments via Bradley-Terry or Elo.
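
As a contrast with the batch Bradley-Terry fit sketched under How It Works, the online Elo update processes one judgment at a time. The K-factor of 32 and the 400-point logistic scale are conventional chess defaults, shown here purely as an illustration; arena-style leaderboards tune both and often refit offline.

```python
def elo_update(r_winner, r_loser, k=32):
    """Standard Elo: shift both ratings by K times the surprise of the result."""
    expected = 1 / (1 + 10 ** ((r_loser - r_winner) / 400))  # P(winner wins)
    delta = k * (1 - expected)
    return r_winner + delta, r_loser - delta

print(elo_update(1000.0, 1000.0))  # evenly matched: -> (1016.0, 984.0)
```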

**Sports, recommendation systems, and academic research** have used pairwise methods (Elo ratings, Bradley-Terry models, spectral ranking) for decades to produce rankings from competitive or preferential data.

## Further Reading

- [Pairwise.vote](https://pairwise.vote)
- [Wiki Surveys: Open and Quantifiable Social Data Collection](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0123483) — Salganik & Levy, 2015
- [Decentralized Capital Allocation via Budgeting Boxes](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3317445) — Kronovet, Fischer & du Rose, 2018; the first description of pairwise methods for onchain public goods funding
- [The Pairwise Paradigm](http://kronosapiens.github.io/blog/2025/12/14/pairwise-paradigm) — overview of pairwise methods for capital allocation