Summary
The scoring evaluation in docs/scoring-evaluation.md has compelling findings (86% concordance between Copeland and Borda, with Weighted Sum as the outlier) that deserve a proper academic write-up. An arXiv preprint would:
- Establish thinktank as a research-backed tool, not just another CLI
- Make the work discoverable by researchers working on ensemble code generation
- Create a citable reference for the social-choice-theory approach to agent selection
- Attract academic contributors and reviewers
Proposed paper structure
Title
"Ensemble AI Coding: Applying Social Choice Theory to Multi-Agent Code Selection"
Sections
- Introduction — the ensemble coding hypothesis, pass@k evidence
- Related Work — AlphaCode, CodeT, MBR-Exec, SWE-bench, Kambhampati LLM-Modulo
- System Design — thinktank architecture, worktree isolation, convergence analysis
- Scoring Methods — Weighted Sum, Copeland, Borda (formal definitions)
- Experimental Setup — controlled experiments with fixed N=5 across diverse tasks
- Results — agreement rates, Friedman test, Kendall's W, effect sizes
- Discussion — why pairwise methods outperform, limitations, when weighted is appropriate
- Conclusion — Copeland as default, future work (LLM-as-judge, cross-project)
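For the formal definitions in the Scoring Methods section, a small reference implementation may help readers check the math. The sketch below is not thinktank's actual scoring code. It assumes each evaluation criterion yields a strict best-to-worst ranking of the same agents, and the function names (`borda_scores`, `copeland_scores`, `winner`) and the sample rankings are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def borda_scores(rankings):
    """Borda count: in a ranking of n candidates, position p (0 = best)
    earns n - 1 - p points; points are summed across all rankings."""
    n = len(rankings[0])
    scores = {candidate: 0 for candidate in rankings[0]}
    for ranking in rankings:
        for position, candidate in enumerate(ranking):
            scores[candidate] += n - 1 - position
    return scores


def copeland_scores(rankings):
    """Copeland score: for every pair of candidates, the one preferred by
    a majority of rankings gets +1 and the other gets -1; ties give 0."""
    candidates = rankings[0]
    scores = {candidate: 0 for candidate in candidates}
    for a, b in combinations(candidates, 2):
        a_wins = sum(r.index(a) < r.index(b) for r in rankings)
        b_wins = len(rankings) - a_wins
        if a_wins > b_wins:
            scores[a] += 1
            scores[b] -= 1
        elif b_wins > a_wins:
            scores[b] += 1
            scores[a] -= 1
    return scores


def winner(scores):
    """Return the highest-scoring candidate (first one wins on ties)."""
    return max(scores, key=scores.get)


# Hypothetical input: three criteria, each ranking agents A, B, C.
rankings = [
    ["A", "B", "C"],
    ["B", "A", "C"],
    ["A", "C", "B"],
]

borda = borda_scores(rankings)        # {'A': 5, 'B': 3, 'C': 1}
copeland = copeland_scores(rankings)  # {'A': 2, 'B': 0, 'C': -2}
print(winner(borda), winner(copeland))  # A A
```

Here both methods select the same agent. The concordance reported in the evaluation would correspond to the fraction of runs in which the two winners agree, under whatever ranking inputs thinktank actually uses.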
What needs to happen
References to cite
- Arrow (1951) Social Choice and Individual Values
- Merlin & Valognes (2004) Condorcet-Borda coincidence
- Tetlock & Gardner (2015) Superforecasting
- Kambhampati (2024) LLM-Modulo framework
- Li et al. (2022) AlphaCode
- Chen et al. (2022) CodeT
- Wang et al. (2022) Self-Consistency