
Commit 9508f00

kronosapiens and claude committed
docs: generalize pairwise mechanism page
Reframe Pairwise as a family of preference aggregation methods rather than a single tool. Add estimation vs. aggregation distinction, origins section (Thurstone, Elo, PageRank), and broaden examples beyond RetroPGF (All Our Ideas / PlaNYC, Chatbot Arena, Deep Funding). Drop "(formerly Budget Box)" from the title — that lineage is specific to one tool, not the mechanism. Add foundational and overview references to Further Reading.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent: bb743e7

2 files changed: 52 additions & 26 deletions

src/content/mechanisms/pairwise.md (48 additions & 26 deletions)
@@ -1,13 +1,16 @@
 ---
 id: '1741171220'
 slug: pairwise
-name: "Pairwise (formerly Budget Box)"
-shortDescription: "Pairwise comparison voting where participants choose between two options at a time — building robust preference rankings from simple binary choices."
+name: "Pairwise"
+shortDescription: "A family of preference aggregation methods that build rankings or weights from many simple A-vs-B comparisons."
 tags:
 - governance
 - democratic
 - voting
-lastUpdated: '2026-03-05'
+lastUpdated: '2026-05-03'
+authors:
+- "Kevin Owocki"
+- "Daniel Kronovet"
 relatedMechanisms:
 - quadratic-voting
 - ranked-choice-voting
@@ -22,47 +25,66 @@ relatedCampaigns: []
 banner: /content-images/mechanisms/pairwise/banner.png
 ---
 
-**Pairwise** (formerly known as Budget Box) is a preference aggregation mechanism where participants make a series of simple binary comparisons — choosing between two options at a time. These individual pairwise comparisons are then aggregated to produce a robust ranking of all options. The mechanism reduces cognitive load on voters while producing more nuanced preference data than single-choice voting.
+**Pairwise** methods are a family of preference aggregation techniques in which participants compare two options at a time, and a model converts the resulting binary judgments into a ranking or set of weights. The simplicity of each individual judgment makes the mechanism accessible to large, distributed audiences, while the aggregation step lets it produce richer outputs — orderings, scores, or funding allocations — than single-choice voting.
+
+## Two Modes: Estimation and Aggregation
+
+Pairwise methods are used for two related but distinct purposes:
+
+- **Estimation** — recovering a hidden ground-truth ranking from noisy observations. Voters are treated as imperfect measurements of an underlying reality. Examples: judging competition entries, scoring chess players, evaluating language models.
+- **Aggregation** — synthesizing intentional preferences where no ground truth exists. Voters express what they want, and the algorithm combines their judgments into a collective outcome. Examples: prioritizing community proposals, allocating grant funding.
+
+The two modes share the same core mechanic but call for different design choices around aggregation, vote weighting, and pair selection.
 
 ## How It Works
 
-1. **Options are submitted** — projects, proposals, or items to be evaluated
-2. **Participants receive pairs** — the system presents two options at a time
-3. **Binary choices are made** — the participant selects which of the two they prefer (or indicates indifference)
-4. **Many comparisons aggregate** — across all participants and pairs, a mathematical model builds a global ranking
-5. **Rankings inform allocation** — the resulting preference ordering can drive funding distribution, prioritization, or governance decisions
+1. **Items are submitted** — projects, proposals, or options to be evaluated.
+2. **Pairs are surfaced** — the system presents two items at a time. Pair selection can be random, bucketed by category or tier, or actively chosen to maximize information.
+3. **Binary judgments are made** — the participant picks one (or skips, or marks them equal).
+4. **An aggregation algorithm produces a ranking** — common choices include the Bradley-Terry model, Elo, and spectral methods (eigenvector-based, related to PageRank). Each handles noise, sparsity, and intransitive preferences differently.
+5. **The ranking informs allocation** — used as-is for prioritization, or mapped to funding weights, slates, or governance decisions.
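The aggregation in step 4 can be made concrete with a minimal Bradley-Terry fit, using the classic MM (Zermelo) iteration. This is an illustrative sketch in plain Python: the ballots and item names are hypothetical, and production tools typically use more robust, regularized estimators.

```python
from collections import defaultdict

def bradley_terry(votes, iterations=100):
    """Fit Bradley-Terry strengths from (winner, loser) ballots using the
    MM (Zermelo) iteration, where p(i beats j) = s_i / (s_i + s_j)."""
    wins = defaultdict(int)         # total wins per item
    pair_counts = defaultdict(int)  # comparisons per unordered pair
    items = set()
    for winner, loser in votes:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        items.update((winner, loser))

    strengths = dict.fromkeys(items, 1.0)
    for _ in range(iterations):
        updated = {}
        for i in items:
            denom = 0.0
            for j in items:
                n = pair_counts[frozenset((i, j))]
                if j != i and n:
                    denom += n / (strengths[i] + strengths[j])
            updated[i] = wins[i] / denom if denom else strengths[i]
        total = sum(updated.values())
        strengths = {i: s / total for i, s in updated.items()}  # normalize
    return strengths

# Hypothetical ballots: each tuple is (preferred item, other item).
ballots = [("a", "b"), ("a", "b"), ("b", "a"), ("a", "c"),
           ("a", "c"), ("b", "c"), ("c", "b")]
weights = bradley_terry(ballots)
ranking = sorted(weights, key=weights.get, reverse=True)  # "a" ranks first
```

Because the strengths are normalized, they can be read directly as funding weights; an Elo or spectral estimator would slot into the same position in the pipeline.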
+
+## Origins
+
+Pairwise comparison is one of the oldest measurement techniques in psychometrics. L. L. Thurstone's *Law of Comparative Judgment* (1927) established its statistical foundations. Variants have since been used for chess ratings (Elo, 1960s), web search (PageRank, 1998), peer-to-peer trust (EigenTrust), and recommendation systems. The application to public goods funding and online community decision-making is more recent.
 
 ## Advantages
 
-- Dramatically reduces cognitive load — choosing between two things is easier than ranking many
-- Produces richer preference data than simple plurality voting
-- Resistant to strategic voting — harder to game when you don't see the full picture
-- Handles large option sets gracefully — participants don't need to evaluate everything
-- Accessible to participants without deep context on all options
+- Each judgment is cognitively cheap — choosing between two things is easier than ranking many or assigning numeric scores.
+- Accessible to participants without deep context on every option, since each comparison only requires familiarity with two items.
+- Produces richer preference data than plurality voting and supports several aggregation algorithms with well-studied properties.
+- Random pair surfacing distributes voter attention across the option set, reducing the marketing-driven dynamics that can dominate other allocation mechanisms.
+- Resistant to certain forms of strategic voting — manipulating a global ranking through individual A-vs-B choices is harder than burying rivals on a ranked ballot.
 
 ## Limitations
 
-- Requires many comparisons across participants to build reliable rankings
-- May produce inconsistent results with small sample sizes
-- Participants may fatigue if asked to make too many comparisons
-- Doesn't capture intensity of preference — a slight and strong preference look the same
-- Aggregation algorithms affect outcomes and may not be transparent to participants
+- Reliable rankings require many comparisons; with k items, naive approaches need on the order of k² votes, though smart pair selection can reduce this substantially.
+- Binary judgments don't capture intensity of preference, though Likert-style inputs (e.g. a 1–5 scale stored as a real-valued score) can recover it at the cost of a slightly heavier interface.
+- Voters may fatigue across long sessions, especially without an engaging interface.
+- The aggregation algorithm shapes outcomes in ways that may not be transparent to participants, so legitimacy depends on clear communication of how votes become weights.
+- Less suited to small expert groups who would prefer to assign weights directly.
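The comparison budget behind the first limitation is easy to work out. A quick sketch, with round sizes that are made up purely for illustration:

```python
from math import comb

def comparison_budget(k_items, voters, votes_per_voter):
    """Compare the total votes a round collects against the number of
    distinct pairs that full coverage would require."""
    total_pairs = comb(k_items, 2)  # k-choose-2 unordered pairs
    total_votes = voters * votes_per_voter
    return total_pairs, total_votes, total_votes / total_pairs

# Hypothetical round: 200 items, 120 voters, 100 comparisons each.
pairs, votes, votes_per_pair = comparison_budget(200, 120, 100)
# 19,900 distinct pairs but only 12,000 votes: random pairing leaves many
# pairs unobserved, which is why active pair selection, or a model that
# interpolates across unseen pairs (Bradley-Terry, spectral), matters.
```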
 
 ## Best Used When
 
-- A large number of options must be ranked or prioritized
-- Voters have limited time or context to evaluate all options
-- Simple, accessible participation is a priority
-- The goal is relative prioritization rather than absolute scoring
+- A large set of options must be ranked or prioritized.
+- The audience is broad and distributed, with limited time per voter but at least some baseline familiarity with the domain.
+- Discovery is a goal — surfacing items voters wouldn't have sought out on their own.
 
 ## Examples and Use Cases
 
-**Optimism RetroPGF** used pairwise comparison in certain rounds to help badgeholders evaluate and rank hundreds of projects for retroactive funding allocation.
+**Optimism Retro Funding** has used pairwise comparison in some rounds to help badgeholders evaluate large slates of projects, with results aggregated using the Bradley-Terry model.
+
+**Deep Funding** (2024), an Ethereum ecosystem initiative supported by Vitalik Buterin, uses pairwise judgments by a human jury to score competing AI-generated weight proposals for allocating funds across a project dependency graph.
+
+**All Our Ideas** (Salganik & Levy, 2015) introduced "pairwise wiki surveys" as a research instrument, used by the New York City Mayor's Office to gather public input for the PlaNYC 2030 sustainability plan.
 
-**General Fund** and other community funding tools have experimented with pairwise interfaces for participatory budgeting.
+**LLM evaluation** — pairwise comparison is now the dominant method for ranking large language models, with platforms like Chatbot Arena aggregating millions of human A-vs-B judgments via Bradley-Terry or Elo.
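For the streaming setting that Chatbot Arena-style platforms face, an Elo-style incremental update is a common alternative to batch fitting. A minimal sketch, where the model names, seed ratings, and K-factor are illustrative assumptions:

```python
def elo_update(r_winner, r_loser, k=32.0):
    """One Elo step: the winner's expected score comes from the rating gap,
    and both ratings shift by at most k points toward the observed result."""
    expected = 1.0 / (1.0 + 10.0 ** ((r_loser - r_winner) / 400.0))
    delta = k * (1.0 - expected)
    return r_winner + delta, r_loser - delta

# Hypothetical stream of A-vs-B judgments over three models.
ratings = {"model_a": 1000.0, "model_b": 1000.0, "model_c": 1000.0}
stream = [("model_a", "model_b"), ("model_a", "model_c"),
          ("model_b", "model_c"), ("model_a", "model_b")]
for winner, loser in stream:
    ratings[winner], ratings[loser] = elo_update(ratings[winner], ratings[loser])
# Elo is zero-sum: the rating pool stays constant while "model_a" pulls ahead.
```

The tradeoff is order dependence: unlike a batch Bradley-Terry fit, Elo's result depends on the sequence of judgments, which suits continuous streams better than fixed ballots.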
 
-**Academic and research applications** pairwise comparison methods (Elo ratings, Bradley-Terry models) are widely used in recommendation systems, sports rankings, and preference learning.
+**Sports, recommendation systems, and academic research** have used pairwise methods (Elo ratings, Bradley-Terry models, spectral ranking) for decades to produce rankings from competitive or preferential data.
 
 ## Further Reading
 
 - [Pairwise.vote](https://pairwise.vote)
+- [Wiki Surveys: Open and Quantifiable Social Data Collection](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0123483) — Salganik & Levy, 2015
+- [Decentralized Capital Allocation via Budgeting Boxes](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3317445) — Kronovet, Fischer & du Rose, 2018; the first description of pairwise methods for onchain public goods funding
+- [The Pairwise Paradigm](http://kronosapiens.github.io/blog/2025/12/14/pairwise-paradigm) — overview of pairwise methods for capital allocation

src/data/authors.json (4 additions & 0 deletions)
@@ -162,5 +162,9 @@
   {
     "name": "Kenny",
     "social": "https://x.com/kennyistyping"
+  },
+  {
+    "name": "Daniel Kronovet",
+    "social": "https://kronosapiens.github.io"
   }
 ]
