
Commit 6608bcf

Merge pull request #28 from janbatzner/jan/acl-fixes-and-facct-tutorial
fix typos, reorder, add FAQ and new FAccT tutorial page
2 parents cdb2d97 + b246ad7 commit 6608bcf

3 files changed

Lines changed: 34 additions & 31 deletions


_events/2026-acl-workshop.md

Lines changed: 12 additions & 6 deletions
@@ -1,13 +1,13 @@
 ---
 layout: event
-title: 2026 ACL Workshop on Evaluating Evaluations (EvalEval)
+title: ACL 2026 Workshop on Evaluating Evaluations (EvalEval)
 subtitle: Examining Best Practices for Utilizing and Developing Generative Model Evaluations
 team: Mubashara Akhtar, Jan Batzner, Leshem Choshen, Avijit Ghosh, Usman Gohar, Jennifer Mickel, Ichhya Pant, Zeerak Talat
 status: active
 order: 1
 category: Organization
 event_date: 2026-7-04
-location: Room Harbor A
+location: San Diego (USA), Room Harbor A
 host: EvalEval
 description: |
   This workshop focuses on AI evaluation in practice, centering the tensions and collaborations between model developers and evaluation researchers and aims to surface practical insights from across the evaluation ecosystem.
@@ -73,7 +73,7 @@ This panel brings together model developers and evaluation researchers to examin
 ### 🔬 4:25 PM – 5:15 PM | Shared Task *(50 mins)*
 **Moderator:** Jan Batzner, Weizenbaum Institute, Technical University Munich
 
-AI evaluation results are scattered across leaderboards, papers, blog posts, and harness logs in incompatible formats, with different frameworks producing divergent scores and inconsistent metadata that hinder comparison, reuse, and cost reduction. **Every Eval Ever** is the first shared schema and community-crowdsourced repository for AI evaluation results — source-agnostic by design and at unprecedented scale: **22,235 models, 2,273 unique benchmarks, and 31 evaluation formats — and growing**. At ACL, we present the [Every Eval Ever Shared Task](https://github.com/evaleval/every_eval_ever) and the community case studies it has enabled on this data. 🖊️
+AI evaluation results are scattered across leaderboards, papers, blog posts, and harness logs in incompatible formats, with different frameworks producing divergent scores and inconsistent metadata that hinder comparison, reuse, and [cost](https://evalevalai.com/research/2026/04/29/eval-costs-bottleneck/) reduction. [**Every Eval Ever**](https://evalevalai.com/infrastructure/2026/02/17/everyevalever-launch/) is the first unifying schema and community-crowdsourced repository for all AI evaluation results — source-agnostic by design and at unprecedented scale: over 22,235 models, 2,273 unique benchmarks, and 31 evaluation formats — and growing! At ACL, we present the [ACL Every Eval Ever Shared Task](https://evalevalai.com/events/shared-task-every-eval-ever/) and the fantastic community case studies it has already enabled on this [database](https://huggingface.co/datasets/evaleval/EEE_datastore).
 
 ---
 
@@ -173,6 +173,15 @@ To support a fair, high-quality, and sustainable review process, we adopt a reci
 
 
 ## ❓ FAQ
+**Can I attend this workshop online?**
+The workshop is in-person at ACL 2026 in San Diego. At least one author of each accepted paper must present and register on-site.
+
+**How do I change from non-archival to archival?**
+Please indicate in the form sent to all accepted authors whether you want your submission to be considered archival or non-archival.
+
+**Do I need to upload a Camera Ready if I selected non-archival?**
+We encourage all authors to deanonymize their papers and incorporate the reviewers' feedback in their revision. Nevertheless, Camera Ready instructions are tailored to archival submissions.
+
 **I'm waiting for my ARR decision — can I still submit to EvalEval?**
 Yes! If your paper is later accepted at ACL, you would simply choose our non-archival option.
 
@@ -185,9 +194,6 @@ Please refer to at least one our main topic areas outlined in the Call for Paper
 **Can I also submit in the ICML format?**
 No, please use the [ARR formatting](https://github.com/acl-org/acl-style-files).
 
-**Can I attend this workshop online?**
-The workshop is in-person at ACL 2026 in San Diego. At least one author of each accepted paper must present on-site.
-
 **My position paper is 6 pages. Does that work?**
 Yes, all submission types (research and positions/provocations) are welcome at any of the three length tiers.
 
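The Every Eval Ever paragraph in the hunk above describes a single unifying schema over results scattered across incompatible sources. As a rough illustration of what normalizing to one record per evaluation result can look like, here is a minimal Python sketch; the field names (`model_id`, `benchmark`, `metric`, `score`, `source_format`, `source_url`) are assumptions for exposition, not the project's actual schema.

```python
# Illustrative only: field names are assumptions, not the real
# Every Eval Ever schema (see the shared task repo for the spec).
from dataclasses import dataclass, asdict
import json


@dataclass
class EvalResult:
    """One normalized evaluation result, whatever its original source."""
    model_id: str       # canonical model identifier
    benchmark: str      # canonical benchmark name
    metric: str         # e.g. "accuracy", "pass@1"
    score: float        # normalized to [0, 1]
    source_format: str  # which of the harvested formats it came from
    source_url: str     # provenance pointer to the original report


# Converting one (made-up) leaderboard row into the shared record:
row = {"Model": "example-lab/model-7b", "MMLU": 64.2}
record = EvalResult(
    model_id=row["Model"],
    benchmark="MMLU",
    metric="accuracy",
    score=row["MMLU"] / 100,  # percentages -> fractions
    source_format="leaderboard",
    source_url="https://example.org/leaderboard",
)
print(json.dumps(asdict(record), indent=2))
```

The point of a record like this is that a leaderboard cell, a paper table entry, and a harness log line all collapse into the same comparable shape.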
_events/2026-facct-tutorial.md

Lines changed: 20 additions & 24 deletions
@@ -1,52 +1,48 @@
 ---
 layout: event
-title: "🎓 Every Eval Ever: Building Community-Governed AI Evaluation Infrastructure"
-subtitle: ACM FAccT 2026 Tutorial
-team: Jan Batzner, Sree Harsha Nelaturu, Anastassia Kornilova, Avijit Ghosh, Angelie Kraft, Wm. Matthew Kennedy, Leon Staufer, David Hartmann, Usman Gohar, Michelle Lin, Yanan Long, Jennifer Mickel, Leshem Choshen, Irene Solaiman
+title: "FAccT 2026 Tutorial on Every Eval Ever"
+subtitle: Building Community-Governed AI Evaluation Infrastructure
+team: Jan Batzner, Sree Harsha Nelaturu, Anastassia Kornilova, Avijit Ghosh, Angelie Kraft, Usman Gohar, Michelle Lin, Yanan Long, Jennifer Mickel, Wm. Matthew Kennedy, Leon Staufer, David Hartmann, Leshem Choshen*, Irene Solaiman*
 status: active
 order: 1
 category: Organization
 event_date: 2026-06-26
-location: ACM FAccT, Montreal
+location: Montreal (Canada)
 host: EvalEval
 description: |
-  A FAccT 2026 tutorial walking through Every Eval Ever — a community-governed open source infrastructure that unifies evaluation results under a shared metadata schema — and Evaluation Cards, an interpretive layer for evaluation reporting.
+  A FAccT 2026 tutorial walking through Every Eval Ever — a community-governed open source infrastructure unifying evaluation results under a shared metadata schema — and Evaluation Cards, an interpretive layer for evaluation reporting.
 ---
 
-## 📖 About
-
+## 🪧 About
 Existing model evaluation results are scattered across leaderboards, papers, and technical reports in incompatible formats. This fragmentation obscures transparency, hinders progress, and disadvantages researchers, civil society, policymakers, and industry alike, especially those who can't afford to run evaluations from scratch. Built once, shared eval infrastructure serves us all.
 
-In this tutorial, we walk through [Every Eval Ever](https://github.com/evaleval/every_eval_ever), a community-governed open source infrastructure that unifies all evaluation results under a shared metadata schema. We then present **Evaluation Cards**, an interface and interpretive layer for evaluation reporting designed around practitioner needs from stakeholder interviews, and show how participants can find, compare, and contribute evaluations themselves.
-
-All technical experience levels are welcome. **If you can, please bring a laptop or tablet!** 💻
+In this tutorial, we walk through [**Every Eval Ever**](https://evalevalai.com/infrastructure/2026/02/17/everyevalever-launch/), a community-governed open source infrastructure that unifies all evaluation results under a shared metadata schema. We then present [**Evaluation Cards**](https://evalcards.evalevalai.com), an interface and interpretive layer for evaluation reporting designed around practitioner needs from stakeholder interviews, and show how participants can find, compare, and contribute evaluations themselves.
 
-## 📅 Date & Location
+All technical experience levels are welcome. If you can, please bring a laptop or tablet! 💻
 
-- **When:** Friday, June 26, 2026 · 3:00 – 4:00 PM
+## 📅 In-Person FAccT Tutorial
+- **When:** Friday, June 26, 2026 · 3:00 – 4:00 PM (Canada)
 - **Where:** ACM FAccT 2026, Montreal (in person)
 
-## 🎤 Presenters
+## 🌐 Online FAccT Tutorial
+- **When:** Friday, June 26, 2026 · TBD
+- **Where:** Zoom Video Conference
 
+## 🏛️ Tutorial Program Committee
 - Jan Batzner, Weizenbaum Institute, Technical University Munich
 - Sree Harsha Nelaturu, Zuse Institute
 - Anastassia Kornilova, Trustible
 - Avijit Ghosh, Hugging Face
 - Angelie Kraft, Weizenbaum Institute
-- Wm. Matthew Kennedy, Oxford
-- Leon Staufer, Cambridge
-- David Hartmann, Weizenbaum Institute
 - Usman Gohar, Iowa State University
 - Michelle Lin, Mila, Quebec AI Institute
 - Yanan Long, StickFluxLabs
 - Jennifer Mickel, EleutherAI
-- Leshem Choshen, MIT, IBM Research
-- Irene Solaiman, Hugging Face
-
-## 🏛️ Organizers
-
-[EvalEval Coalition](/about/)
+- Wm. Matthew Kennedy, University of Oxford
+- Leon Staufer, University of Cambridge
+- David Hartmann, Weizenbaum Institute
+- Leshem Choshen*, MIT, IBM Research, MIT-IBM Watson AI Lab
+- Irene Solaiman*, Hugging Face
 
 ## 📬 Contact
-
-- [evalevalpc@googlegroups.com](mailto:evalevalpc@googlegroups.com)
+We are looking forward to meeting you! For any questions, please reach the [EvalEval Organizing Team here](mailto:evalevalpc@googlegroups.com).

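The tutorial above promises that participants learn to find and compare evaluations themselves. A hedged sketch of what that step could look like against the EEE datastore linked in the workshop diff, assuming a `train` split and columns named `model`, `benchmark`, and `score` (assumptions, not the documented layout):

```python
# Hypothetical sketch: split and column names are assumed, not documented here.
from datasets import load_dataset

ds = load_dataset("evaleval/EEE_datastore", split="train")

# Compare two (made-up) models on one benchmark by filtering unified records.
subset = ds.filter(
    lambda r: r["benchmark"] == "MMLU" and r["model"] in {"model-a", "model-b"}
)
for r in subset:
    print(r["model"], r["score"])
```
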
_events/shared-task-every-eval-ever.md

Lines changed: 2 additions & 1 deletion
@@ -1,13 +1,14 @@
 ---
 layout: event
-title: "Shared Task: Every Eval Ever"
+title: "ACL Shared Task on Every Eval Ever"
 subtitle: Building a Unifying, Standardized Database of LLM Evaluations
 status: active
 order: 2
 category: Infrastructure
 event_date: 2026-05-01
 location: 🌐 Online
 host: EvalEval
+team: Jan Batzner, Sree Harsha Nelaturu, Anastassia Kornilova, Damian Stachura, Stella Biderman, Irene Solaiman, Avijit Ghosh, Leshem Choshen
 description: |
   Help us build the first unifying, open database of LLM evaluation results! Convert evaluation data from leaderboards, papers, or your own runs into a shared format — and join as co-author on the resulting paper.
 ---

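For contributors wondering what the conversion work in this shared task amounts to: the core step is usually flattening a wide results table (one column per benchmark) into long-format records, one per result. A hypothetical sketch follows; the filename, column names, and output fields are made up for illustration, and the shared task repository defines the real target schema.

```python
# Hypothetical converter: input file, columns, and output fields are
# illustrative; the shared task repo defines the actual target schema.
import csv
import json

BENCHMARK_COLUMNS = ["MMLU", "GSM8K", "HellaSwag"]  # assumed columns

with open("leaderboard.csv", newline="") as f:
    rows = list(csv.DictReader(f))

records = [
    {
        "model_id": row["Model"],
        "benchmark": bench,
        "score": float(row[bench]),
        "source_format": "leaderboard_csv",
    }
    for row in rows
    for bench in BENCHMARK_COLUMNS
    if row.get(bench)  # skip missing cells
]

# One JSON line per result, ready to contribute upstream.
with open("records.jsonl", "w") as out:
    for rec in records:
        out.write(json.dumps(rec) + "\n")
```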