name: "Pairwise"
shortDescription: "A family of preference aggregation methods that build rankings or weights from many simple A-vs-B comparisons."

**Pairwise** methods are a family of preference aggregation techniques in which participants compare two options at a time, and a model converts the resulting binary judgments into a ranking or set of weights. The simplicity of each individual judgment makes the mechanism accessible to large, distributed audiences, while the aggregation step lets it produce richer outputs — orderings, scores, or funding allocations — than single-choice voting.

## Two Modes: Estimation and Aggregation

Pairwise methods are used for two related but distinct purposes:

- **Estimation** — recovering a hidden ground-truth ranking from noisy observations. Voters are treated as imperfect measurements of an underlying reality. Examples: judging competition entries, scoring chess players, evaluating language models.
- **Aggregation** — synthesizing intentional preferences where no ground truth exists. Voters express what they want, and the algorithm combines their judgments into a collective outcome. Examples: prioritizing community proposals, allocating grant funding.

The two modes share the same core mechanic but call for different design choices around aggregation, vote weighting, and pair selection.

## How It Works

1. **Items are submitted** — projects, proposals, or options to be evaluated.
2. **Pairs are surfaced** — the system presents two items at a time. Pair selection can be random, bucketed by category or tier, or actively chosen to maximize information.
3. **Binary judgments are made** — the participant picks one (or skips, or marks them equal).
4. **An aggregation algorithm produces a ranking** — common choices include the Bradley-Terry model, Elo, and spectral methods (eigenvector-based, related to PageRank). Each handles noise, sparsity, and intransitive preferences differently; see the sketch after this list.
5. **The ranking informs allocation** — used as-is for prioritization, or mapped to funding weights, slates, or governance decisions.
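
To make step 4 concrete, the snippet below is a minimal sketch of Bradley-Terry aggregation using the classic minorization-maximization (Zermelo) iteration. The item names and votes are invented for illustration; real deployments add regularization, tie handling, and confidence estimates.

```python
from collections import defaultdict

def bradley_terry(votes, n_iters=100):
    """Fit Bradley-Terry strengths from (winner, loser) pairs via MM updates."""
    items = {i for pair in votes for i in pair}
    wins = defaultdict(int)         # total wins per item
    matchups = defaultdict(int)     # comparisons per unordered pair
    for w, l in votes:
        wins[w] += 1
        matchups[frozenset((w, l))] += 1

    strength = {i: 1.0 for i in items}
    for _ in range(n_iters):
        updated = {}
        for i in items:
            # Sum over every opponent j that i has been compared against.
            denom = sum(
                n / (strength[i] + strength[j])
                for pair, n in matchups.items()
                if i in pair
                for j in pair - {i}
            )
            updated[i] = wins[i] / denom if denom else strength[i]
        total = sum(updated.values())
        strength = {i: s / total for i, s in updated.items()}  # rescale each pass
    return strength

# Six illustrative judgments over three items.
votes = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "B"), ("A", "B"), ("B", "A")]
ranking = sorted(bradley_terry(votes).items(), key=lambda kv: -kv[1])
print(ranking)  # A ranks first, then B, then C
```

Rescaling the strengths each pass keeps the output stable: Bradley-Terry scores are only identified up to a constant factor, so some normalization convention is always needed.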

## Origins

Pairwise comparison is one of the oldest measurement techniques in psychometrics. L. L. Thurstone's *Law of Comparative Judgment* (1927) established its statistical foundations. Variants have since been used for chess ratings (Elo, 1960s), web search (PageRank, 1998), peer-to-peer trust (EigenTrust), and recommendation systems. The application to public goods funding and online community decision-making is more recent.

## Advantages

- Each judgment is cognitively cheap — choosing between two things is easier than ranking many or assigning numeric scores.
- Accessible to participants without deep context on every option, since each comparison only requires familiarity with two items.
- Produces richer preference data than plurality voting and supports several aggregation algorithms with well-studied properties.
- Random pair surfacing distributes voter attention across the option set, reducing the marketing-driven dynamics that can dominate other allocation mechanisms.
- Resistant to certain forms of strategic voting — manipulating a global ranking through individual A-vs-B choices is harder than burying rivals on a ranked ballot.

## Limitations

- Reliable rankings require many comparisons; with k items, naive approaches need on the order of k² votes, though smart pair selection can reduce this substantially (see the sketch after this list).
- Binary judgments don't capture intensity of preference, though Likert-style inputs (e.g. a 1–5 scale stored as a real-valued score) can recover it at the cost of a slightly heavier interface.
- Voters may fatigue across long sessions, especially without an engaging interface.
- The aggregation algorithm shapes outcomes in ways that may not be transparent to participants, so legitimacy depends on clear communication of how votes become weights.
- Less suited to small expert groups who would prefer to assign weights directly.
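
On the first point: with k items there are k(k-1)/2 distinct pairs, so exhaustive coverage grows quadratically. Adaptive pair selection mitigates this. The heuristic below (prefer under-observed pairs, then close-scoring ones) is a simplified illustration of the idea, not the scheme used by any particular tool.

```python
import itertools

def next_pair(scores, counts):
    """Pick the next pair to show a voter.

    scores: item -> current estimated strength
    counts: frozenset({i, j}) -> number of times that pair has been voted on
    Prefers pairs with few past votes, breaking ties toward pairs whose
    current scores are close, where one more vote is most informative.
    """
    def priority(pair):
        i, j = tuple(pair)
        return (-counts.get(pair, 0), -abs(scores[i] - scores[j]))
    candidates = [frozenset(p) for p in itertools.combinations(scores, 2)]
    return max(candidates, key=priority)

scores = {"A": 0.63, "B": 0.22, "C": 0.15}
counts = {frozenset(("A", "B")): 3, frozenset(("A", "C")): 1, frozenset(("B", "C")): 2}
print(sorted(next_pair(scores, counts)))  # ['A', 'C']: the least-voted pair
```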

## Best Used When

- A large set of options must be ranked or prioritized.
- The audience is broad and distributed, with limited time per voter but at least some baseline familiarity with the domain.
- Discovery is a goal — surfacing items voters wouldn't have sought out on their own.

## Examples and Use Cases

**Optimism Retro Funding** has used pairwise comparison in some rounds to help badgeholders evaluate large slates of projects, with results aggregated using the Bradley-Terry model.

**Deep Funding** (2024), an Ethereum ecosystem initiative supported by Vitalik Buterin, uses pairwise judgments by a human jury to score competing AI-generated weight proposals for allocating funds across a project dependency graph.

**All Our Ideas** (Salganik & Levy, 2015) introduced "pairwise wiki surveys" as a research instrument, used by the New York City Mayor's Office to gather public input for the PlaNYC 2030 sustainability plan.

**LLM evaluation** — pairwise comparison is now the dominant method for ranking large language models, with platforms like Chatbot Arena aggregating millions of human A-vs-B judgments via Bradley-Terry or Elo.
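
As a contrast with the batch Bradley-Terry fit sketched under How It Works, the online Elo update processes one judgment at a time. The K-factor of 32 and the 400-point logistic scale are conventional chess defaults, shown here purely as an illustration; arena-style leaderboards tune both and often refit offline.

```python
def elo_update(r_winner, r_loser, k=32):
    """Standard Elo: shift both ratings by K times the surprise of the result."""
    expected = 1 / (1 + 10 ** ((r_loser - r_winner) / 400))  # P(winner wins)
    delta = k * (1 - expected)
    return r_winner + delta, r_loser - delta

print(elo_update(1000.0, 1000.0))  # evenly matched: -> (1016.0, 984.0)
```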

**Sports, recommendation systems, and academic research** have used pairwise methods (Elo ratings, Bradley-Terry models, spectral ranking) for decades to produce rankings from competitive or preferential data.

## Further Reading

- [Pairwise.vote](https://pairwise.vote)
- [Wiki Surveys: Open and Quantifiable Social Data Collection](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0123483) — Salganik & Levy, 2015
- [Decentralized Capital Allocation via Budgeting Boxes](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3317445) — Kronovet, Fischer & du Rose, 2018; the first description of pairwise methods for onchain public goods funding
- [The Pairwise Paradigm](http://kronosapiens.github.io/blog/2025/12/14/pairwise-paradigm) — overview of pairwise methods for capital allocation