
Commit 6608bcf

Merge pull request #28 from janbatzner/jan/acl-fixes-and-facct-tutorial
fix typos, reorder, add FAQ and new FAccT tutorial page
2 parents cdb2d97 + b246ad7 commit 6608bcf

3 files changed

Lines changed: 34 additions & 31 deletions


_events/2026-acl-workshop.md

Lines changed: 12 additions & 6 deletions
@@ -1,13 +1,13 @@
 ---
 layout: event
-title: 2026 ACL Workshop on Evaluating Evaluations (EvalEval)
+title: ACL 2026 Workshop on Evaluating Evaluations (EvalEval)
 subtitle: Examining Best Practices for Utilizing and Developing Generative Model Evaluations
 team: Mubashara Akhtar, Jan Batzner, Leshem Choshen, Avijit Ghosh, Usman Gohar, Jennifer Mickel, Ichhya Pant, Zeerak Talat
 status: active
 order: 1
 category: Organization
 event_date: 2026-7-04
-location: Room Harbor A
+location: San Diego (USA), Room Harbor A
 host: EvalEval
 description: |
   This workshop focuses on AI evaluation in practice, centering the tensions and collaborations between model developers and evaluation researchers and aims to surface practical insights from across the evaluation ecosystem.
@@ -73,7 +73,7 @@ This panel brings together model developers and evaluation researchers to examin
 ### 🔬 4:25 PM – 5:15 PM | Shared Task *(50 mins)*
 **Moderator:** Jan Batzner, Weizenbaum Institute, Technical University Munich
 
-AI evaluation results are scattered across leaderboards, papers, blog posts, and harness logs in incompatible formats, with different frameworks producing divergent scores and inconsistent metadata that hinder comparison, reuse, and cost reduction. **Every Eval Ever** is the first shared schema and community-crowdsourced repository for AI evaluation results — source-agnostic by design and at unprecedented scale: **22,235 models, 2,273 unique benchmarks, and 31 evaluation formats — and growing**. At ACL, we present the [Every Eval Ever Shared Task](https://github.com/evaleval/every_eval_ever) and the community case studies it has enabled on this data. 🖊️
+AI evaluation results are scattered across leaderboards, papers, blog posts, and harness logs in incompatible formats, with different frameworks producing divergent scores and inconsistent metadata that hinder comparison, reuse, and [cost](https://evalevalai.com/research/2026/04/29/eval-costs-bottleneck/) reduction. [**Every Eval Ever**](https://evalevalai.com/infrastructure/2026/02/17/everyevalever-launch/) is the first unifying schema and community-crowdsourced repository for all AI evaluation results — source-agnostic by design and at unprecedented scale: over 22,235 models, 2,273 unique benchmarks, and 31 evaluation formats — and growing! At ACL, we present the [ACL Every Eval Ever Shared Task](https://evalevalai.com/events/shared-task-every-eval-ever/) and the fantastic community case studies it has already enabled on this [database](https://huggingface.co/datasets/evaleval/EEE_datastore).
 
 ---
 
@@ -173,6 +173,15 @@ To support a fair, high-quality, and sustainable review process, we adopt a reci
 
 
 ## ❓ FAQ
+**Can I attend this workshop online?**
+The workshop is in-person at ACL 2026 in San Diego. At least one author of each accepted paper must present and register on-site.
+
+**How do I change from non-archival to archival?**
+Please indicate in the form sent to all accepted authors whether you want your submission to be considered archival or non-archival.
+
+**Do I need to upload a Camera Ready if I selected non-archival?**
+We encourage all authors to deanonymize their papers and incorporate the reviewers' feedback in their revision. Nevertheless, Camera Ready instructions are tailored to archival submissions.
+
 **I'm waiting for my ARR decision — can I still submit to EvalEval?**
 Yes! If your paper is later accepted at ACL, you would simply choose our non-archival option.
 
@@ -185,9 +194,6 @@ Please refer to at least one our main topic areas outlined in the Call for Paper
 **Can I also submit in the ICML format?**
 No, please use the [ARR formatting](https://github.com/acl-org/acl-style-files).
 
-**Can I attend this workshop online?**
-The workshop is in-person at ACL 2026 in San Diego. At least one author of each accepted paper must present on-site.
-
 **My position paper is 6 pages. Does that work?**
 Yes, all submission types (research and positions/provocations) are welcome at any of the three length tiers.
 
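The Every Eval Ever paragraph in the hunk above describes a single unifying schema over results scattered across incompatible sources. As a rough illustration of what normalizing to one record per evaluation result can look like, here is a minimal Python sketch; the field names (`model_id`, `benchmark`, `metric`, `score`, `source_format`, `source_url`) are assumptions for exposition, not the project's actual schema.

```python
# Illustrative only: field names are assumptions, not the real
# Every Eval Ever schema (see the shared task repo for the spec).
from dataclasses import dataclass, asdict
import json


@dataclass
class EvalResult:
    """One normalized evaluation result, whatever its original source."""
    model_id: str       # canonical model identifier
    benchmark: str      # canonical benchmark name
    metric: str         # e.g. "accuracy", "pass@1"
    score: float        # normalized to [0, 1]
    source_format: str  # which of the harvested formats it came from
    source_url: str     # provenance pointer to the original report


# Converting one (made-up) leaderboard row into the shared record:
row = {"Model": "example-lab/model-7b", "MMLU": 64.2}
record = EvalResult(
    model_id=row["Model"],
    benchmark="MMLU",
    metric="accuracy",
    score=row["MMLU"] / 100,  # percentages -> fractions
    source_format="leaderboard",
    source_url="https://example.org/leaderboard",
)
print(json.dumps(asdict(record), indent=2))
```

The point of a record like this is that a leaderboard cell, a paper table entry, and a harness log line all collapse into the same comparable shape.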
_events/2026-facct-tutorial.md

Lines changed: 20 additions & 24 deletions
@@ -1,52 +1,48 @@
 ---
 layout: event
-title: "🎓 Every Eval Ever: Building Community-Governed AI Evaluation Infrastructure"
-subtitle: ACM FAccT 2026 Tutorial
-team: Jan Batzner, Sree Harsha Nelaturu, Anastassia Kornilova, Avijit Ghosh, Angelie Kraft, Wm. Matthew Kennedy, Leon Staufer, David Hartmann, Usman Gohar, Michelle Lin, Yanan Long, Jennifer Mickel, Leshem Choshen, Irene Solaiman
+title: "FAccT 2026 Tutorial on Every Eval Ever"
+subtitle: Building Community-Governed AI Evaluation Infrastructure
+team: Jan Batzner, Sree Harsha Nelaturu, Anastassia Kornilova, Avijit Ghosh, Angelie Kraft, Usman Gohar, Michelle Lin, Yanan Long, Jennifer Mickel, Wm. Matthew Kennedy, Leon Staufer, David Hartmann, Leshem Choshen*, Irene Solaiman*
 status: active
 order: 1
 category: Organization
 event_date: 2026-06-26
-location: ACM FAccT, Montreal
+location: Montreal (Canada)
 host: EvalEval
 description: |
-  A FAccT 2026 tutorial walking through Every Eval Ever — a community-governed open source infrastructure that unifies evaluation results under a shared metadata schema — and Evaluation Cards, an interpretive layer for evaluation reporting.
+  A FAccT 2026 tutorial walking through Every Eval Ever — a community-governed open source infrastructure unifying evaluation results under a shared metadata schema — and Evaluation Cards, an interpretive layer for evaluation reporting.
 ---
 
-## 📖 About
-
+## 🪧 About
 Existing model evaluation results are scattered across leaderboards, papers, and technical reports in incompatible formats. This fragmentation obscures transparency, hinders progress, and disadvantages researchers, civil society, policymakers, and industry alike, especially those who can't afford to run evaluations from scratch. Built once, shared eval infrastructure serves us all.
 
-In this tutorial, we walk through [Every Eval Ever](https://github.com/evaleval/every_eval_ever), a community-governed open source infrastructure that unifies all evaluation results under a shared metadata schema. We then present **Evaluation Cards**, an interface and interpretive layer for evaluation reporting designed around practitioner needs from stakeholder interviews, and show how participants can find, compare, and contribute evaluations themselves.
-
-All technical experience levels are welcome. **If you can, please bring a laptop or tablet!** 💻
+In this tutorial, we walk through [**Every Eval Ever**](https://evalevalai.com/infrastructure/2026/02/17/everyevalever-launch/), a community-governed open source infrastructure that unifies all evaluation results under a shared metadata schema. We then present [**Evaluation Cards**](https://evalcards.evalevalai.com), an interface and interpretive layer for evaluation reporting designed around practitioner needs from stakeholder interviews, and show how participants can find, compare, and contribute evaluations themselves.
 
-## 📅 Date & Location
+All technical experience levels are welcome. If you can, please bring a laptop or tablet! 💻
 
-- **When:** Friday, June 26, 2026 · 3:00 – 4:00 PM
+## 📅 In-Person FAccT Tutorial
+- **When:** Friday, June 26, 2026 · 3:00 – 4:00 PM (Canada)
 - **Where:** ACM FAccT 2026, Montreal (in person)
 
-## 🎤 Presenters
+## 🌐 Online FAccT Tutorial
+- **When:** Friday, June 26, 2026 · TBD
+- **Where:** Zoom Video Conference
 
+## 🏛️ Tutorial Program Committee
 - Jan Batzner, Weizenbaum Institute, Technical University Munich
 - Sree Harsha Nelaturu, Zuse Institute
 - Anastassia Kornilova, Trustible
 - Avijit Ghosh, Hugging Face
 - Angelie Kraft, Weizenbaum Institute
-- Wm. Matthew Kennedy, Oxford
-- Leon Staufer, Cambridge
-- David Hartmann, Weizenbaum Institute
 - Usman Gohar, Iowa State University
 - Michelle Lin, Mila, Quebec AI Institute
 - Yanan Long, StickFluxLabs
 - Jennifer Mickel, EleutherAI
-- Leshem Choshen, MIT, IBM Research
-- Irene Solaiman, Hugging Face
-
-## 🏛️ Organizers
-
-[EvalEval Coalition](/about/)
+- Wm. Matthew Kennedy, University of Oxford
+- Leon Staufer, University of Cambridge
+- David Hartmann, Weizenbaum Institute
+- Leshem Choshen*, MIT, IBM Research, MIT-IBM Watson AI Lab
+- Irene Solaiman*, Hugging Face
 
 ## 📬 Contact
-
-- [evalevalpc@googlegroups.com](mailto:evalevalpc@googlegroups.com)
+We are looking forward to meeting you! For any questions, please reach the [EvalEval Organizing Team here](mailto:evalevalpc@googlegroups.com).

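The tutorial above promises that participants learn to find and compare evaluations themselves. A hedged sketch of what that step could look like against the EEE datastore linked in the workshop diff, assuming a `train` split and columns named `model`, `benchmark`, and `score` (assumptions, not the documented layout):

```python
# Hypothetical sketch: split and column names are assumed, not documented here.
from datasets import load_dataset

ds = load_dataset("evaleval/EEE_datastore", split="train")

# Compare two (made-up) models on one benchmark by filtering unified records.
subset = ds.filter(
    lambda r: r["benchmark"] == "MMLU" and r["model"] in {"model-a", "model-b"}
)
for r in subset:
    print(r["model"], r["score"])
```
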
_events/shared-task-every-eval-ever.md

Lines changed: 2 additions & 1 deletion
@@ -1,13 +1,14 @@
 ---
 layout: event
-title: "Shared Task: Every Eval Ever"
+title: "ACL Shared Task on Every Eval Ever"
 subtitle: Building a Unifying, Standardized Database of LLM Evaluations
 status: active
 order: 2
 category: Infrastructure
 event_date: 2026-05-01
 location: 🌐 Online
 host: EvalEval
+team: Jan Batzner, Sree Harsha Nelaturu, Anastassia Kornilova, Damian Stachura, Stella Biderman, Irene Solaiman, Avijit Ghosh, Leshem Choshen
 description: |
   Help us build the first unifying, open database of LLM evaluation results! Convert evaluation data from leaderboards, papers, or your own runs into a shared format — and join as co-author on the resulting paper.
 ---

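For contributors wondering what the conversion work in this shared task amounts to: the core step is usually flattening a wide results table (one column per benchmark) into long-format records, one per result. A hypothetical sketch follows; the filename, column names, and output fields are made up for illustration, and the shared task repository defines the real target schema.

```python
# Hypothetical converter: input file, columns, and output fields are
# illustrative; the shared task repo defines the actual target schema.
import csv
import json

BENCHMARK_COLUMNS = ["MMLU", "GSM8K", "HellaSwag"]  # assumed columns

with open("leaderboard.csv", newline="") as f:
    rows = list(csv.DictReader(f))

records = [
    {
        "model_id": row["Model"],
        "benchmark": bench,
        "score": float(row[bench]),
        "source_format": "leaderboard_csv",
    }
    for row in rows
    for bench in BENCHMARK_COLUMNS
    if row.get(bench)  # skip missing cells
]

# One JSON line per result, ready to contribute upstream.
with open("records.jsonl", "w") as out:
    for rec in records:
        out.write(json.dumps(rec) + "\n")
```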