Commit f2ba26a

chore: update auto-revision script and documentation
1 parent e958894 commit f2ba26a

61 files changed: +5846 -4882 lines changed

README.md

Lines changed: 277 additions & 121 deletions
Large diffs are not rendered by default.

data/papers_data_analysis.yaml

Lines changed: 58 additions & 73 deletions
@@ -1,82 +1,67 @@
 # Data Analysis
-# Auto-generated from papers_raw/taxonomy.tex and papers_raw/main.bib
+# Auto-generated from taxonomy.tex and BibTeX file
 
-- short_name: "SWE-bench Verified"
-  title: "Introducing SWE-bench Verified | OpenAI"
-  authors: "OpenAI"
-  venue: "arXiv 2024"
-  year: "2024"
+- short_name: SWE-bench Verified
+  title: Introducing SWE-bench Verified | OpenAI
+  authors: OpenAI
+  year: '2024'
+  venue: '2024'
+- short_name: Patch Correctness
+  title: Are "Solved Issues" in SWE-bench Really Solved Correctly? An Empirical Study
+  authors: You Wang, Michael Pradel, Zhongxin Liu
+  year: '2025'
+  venue: arXiv preprint arXiv:2503.15223 2025
   links:
-    arxiv: "https://openai.com/index/introducing-swe-bench-verified/"
-
-- short_name: "SWE-Bench+"
-  title: "SWE-Bench+: Enhanced Coding Benchmark for LLMs"
-  authors: "Reem Aleithan, Haoran Xue, Mohammad Mahdi Mohajer, Elijah Nnorom, Gias Uddin, Song Wang"
-  venue: "arXiv 2024"
-  year: "2024"
+    arxiv: https://arxiv.org/abs/2503.15223
+- short_name: UTBoost
+  title: 'UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench'
+  authors: Boxi Yu, Yuxuan Zhu, Pinjia He, Daniel Kang
+  year: '2025'
+  venue: arXiv preprint arXiv:2506.09289 2025
   links:
-    arxiv: "https://arxiv.org/abs/2410.06992"
-
-- short_name: "Patch Correctness"
-  title: "Are \"Solved Issues\" in SWE-bench Really Solved Correctly? An Empirical Study"
-  authors: "You Wang, Michael Pradel, Zhongxin Liu"
-  venue: "arXiv 2025"
-  year: "2025"
+    arxiv: https://arxiv.org/abs/2506.09289
+- short_name: Trustworthiness
+  title: Is Your Automated Software Engineer Trustworthy?
+  authors: Noble Saji Mathews, Meiyappan Nagappan
+  year: '2025'
+  venue: arXiv preprint arXiv:2506.17812 2025
   links:
-    arxiv: "http://arxiv.org/abs/2503.15223"
-
-- short_name: "UTBoost"
-  title: "UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench"
-  authors: "Boxi Yu, Yuxuan Zhu, Pinjia He, Daniel Kang"
-  venue: "arXiv 2025"
-  year: "2025"
+    arxiv: https://arxiv.org/abs/2506.17812
+- short_name: Rigorous agentic benchmarks
+  title: Establishing Best Practices for Building Rigorous Agentic Benchmarks
+  authors: Yuxuan Zhu, Tengjun Jin, Yada Pruksachatkun, Andy Zhang, Shu Liu, Sasha
+    Cui, Sayash Kapoor et al.
+  year: '2025'
+  venue: arXiv preprint arXiv:2507.02825 2025
   links:
-    arxiv: "https://arxiv.org/abs/2506.09289"
-
-- short_name: "Trustworthiness"
-  title: "Is Your Automated Software Engineer Trustworthy?"
-  authors: "Noble Saji Mathews, Meiyappan Nagappan"
-  venue: "arXiv 2025"
-  year: "2025"
+    arxiv: https://arxiv.org/abs/2507.02825
+- short_name: The SWE-Bench Illusion
+  title: 'The SWE-Bench Illusion: When State-of-the-Art LLMs Remember Instead of Reason'
+  authors: Shanchao Liang, Spandan Garg, Roshanak Zilouchian Moghaddam
+  year: '2025'
+  venue: arXiv preprint arXiv:2506.12286 2025
   links:
-    arxiv: "https://arxiv.org/abs/2506.17812"
-
-- short_name: "Rigorous agentic benchmarks"
-  title: "Establishing Best Practices for Building Rigorous Agentic Benchmarks"
-  authors: "Yuxuan Zhu, Tengjun Jin, Yada Pruksachatkun, Andy Zhang, Shu Liu, Sasha Cui, Sayash Kapoor, Shayne Longpre, Kevin Meng, Rebecca Weiss, Fazl Barez, Rahul Gupta, Jwala Dhamala, Jacob Merizian, Mario Giulianelli, Harry Coppock, Cozmin Ududec, Jasjeet Sekhon, Jacob Steinhardt, Antony Kellermann, Sarah Schwettmann, Matei Zaharia, Ion Stoica, Percy Liang, Daniel Kang"
-  venue: "arXiv 2025"
-  year: "2025"
+    arxiv: https://arxiv.org/abs/2506.12286
+- short_name: Revisiting SWE-Bench
+  title: 'Revisiting SWE-Bench: On the Importance of Data Quality for LLM-Based Code
+    Models'
+  authors: Aleithan, Reem
+  year: '2025'
+  venue: '2025 IEEE/ACM 47th International Conference on Software Engineering: Companion
+    Proceedings (ICSE-Companion) 2025'
   links:
-    arxiv: "https://arxiv.org/abs/2507.02825"
-
-- short_name: "The SWE-Bench Illusion"
-  title: "The SWE-Bench Illusion: When State-of-the-Art LLMs Remember Instead of Reason"
-  authors: "Shanchao Liang, Spandan Garg, Roshanak Zilouchian Moghaddam"
-  venue: "arXiv 2025"
-  year: "2025"
+    doi: http://dx.doi.org/10.1109/ICSE-Companion66252.2025.00075
+- short_name: SPICE
+  title: "SPICE: An Automated SWE-Bench Labeling Pipeline for Issue Clarity,\n \
+    \ Test Coverage, and Effort Estimation"
+  authors: Gustavo A. Oliva, Gopi Krishnan Rajbahadur, Aaditya Bhatia, Haoxiang Zhang,
+    Yihao Chen, Zhilong Chen, Arthur Leung et al.
+  year: '2025'
+  venue: ASE 2025
+- short_name: Data contamination
+  title: Does SWE-Bench-Verified Test Agent Ability or Model Memory?
+  authors: Thanosan Prathifkumar, Noble Saji Mathews, Meiyappan Nagappan
+  year: '2025'
+  venue: arXiv preprint arXiv:2512.10218 2025
   links:
-    arxiv: "https://arxiv.org/abs/2506.12286"
-
-- short_name: "Revisiting SWE-Bench"
-  title: "Revisiting SWE-Bench: On the Importance of Data Quality for LLM-Based Code Models"
-  authors: "Reem Aleithan"
-  venue: "2025 IEEE/ACM 47th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion) 2025"
-  year: "2025"
-  links:
-
-- short_name: "SPICE"
-  title: "SPICE: An Automated SWE-Bench Labeling Pipeline for Issue Clarity, Test Coverage, and Effort Estimation"
-  authors: "Gustavo A. Oliva, Gopi Krishnan Rajbahadur, Aaditya Bhatia, Haoxiang Zhang, Yihao Chen, Zhilong Chen, Arthur Leung, Dayi Lin, Boyuan Chen, Ahmed E. Hassan"
-  venue: "arXiv 2025"
-  year: "2025"
-  links:
-    arxiv: "https://arxiv.org/abs/2507.09108"
-
-- short_name: "Data contamination"
-  title: "Does SWE-Bench-Verified Test Agent Ability or Model Memory?"
-  authors: "Thanosan Prathifkumar, Noble Saji Mathews, Meiyappan Nagappan"
-  venue: "arXiv 2025"
-  year: "2025"
-  links:
-    arxiv: "https://arxiv.org/abs/2512.10218"
-
+    arxiv: https://arxiv.org/abs/2512.10218

data/papers_data_collection.yaml

Lines changed: 50 additions & 43 deletions
@@ -1,51 +1,58 @@
 # Data Collection
-# Auto-generated from papers_raw/taxonomy.tex and papers_raw/main.bib
+# Auto-generated from taxonomy.tex and BibTeX file
 
-- short_name: "SWE-rebench"
-  title: "SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents"
-  authors: "Ibragim Badertdinov, Alexander Golubev, Maksim Nekrashevich, Anton Shevtsov, Simon Karasik, Andrei Andriushchenko, Maria Trofimova, Daria Litvintseva, Boris Yangel"
-  venue: "arXiv 2025"
-  year: "2025"
+- short_name: SWE-rebench
+  title: 'SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated
+    Evaluation of Software Engineering Agents'
+  authors: Ibragim Badertdinov, Alexander Golubev, Maksim Nekrashevich, Anton Shevtsov,
+    Simon Karasik, Andrei Andriushchenko, Maria Trofimova et al.
+  year: '2025'
+  venue: The Thirty-ninth Annual Conference on Neural Information Processing Systems
+    Datasets and Benchmarks Track 2025
   links:
-    arxiv: "https://arxiv.org/abs/2505.20411"
-
-- short_name: "RepoLaunch"
-  title: "SWE-bench Goes Live!"
-  authors: "Linghao Zhang, Shilin He, Chaoyun Zhang, Yu Kang, Bowen Li, Chengxing Xie, Junhao Wang, Maoquan Wang, Yufan Huang, Shengyu Fu, Elsie Nallipogu, Qingwei Lin, Yingnong Dang, Saravan Rajmohan, Dongmei Zhang"
-  venue: "arXiv 2025"
-  year: "2025"
+    openreview: https://openreview.net/forum?id=nMpJoVmRy1
+- short_name: RepoLaunch
+  title: SWE-bench Goes Live!
+  authors: Linghao Zhang, Shilin He, Chaoyun Zhang, Yu Kang, Bowen Li, Chengxing Xie,
+    Junhao Wang et al.
+  year: '2025'
+  venue: The Thirty-ninth Annual Conference on Neural Information Processing Systems
+    Datasets and Benchmarks Track 2025
   links:
-    arxiv: "https://arxiv.org/abs/2505.23419"
-
-- short_name: "SWE-Factory"
-  title: "SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks"
-  authors: "Lianghong Guo, Yanlin Wang, Caihua Li, Pengyu Yang, Jiachi Chen, Wei Tao, Yingtian Zou, Duyu Tang, Zibin Zheng"
-  venue: "arXiv 2025"
-  year: "2025"
+    openreview: https://openreview.net/forum?id=OGWkr7gXka
+- short_name: SWE-Factory
+  title: 'SWE-Factory: Your Automated Factory for Issue Resolution Training Data and
+    Evaluation Benchmarks'
+  authors: Lianghong Guo, Yanlin Wang, Caihua Li, Wei Tao, Pengyu Yang, Jiachi Chen,
+    Haoyu Song et al.
+  year: '2025'
+  venue: arXiv preprint arXiv:2506.10954 2025
   links:
-    arxiv: "https://arxiv.org/abs/2506.10954"
-
-- short_name: "SWE-MERA"
-  title: "SWE-MERA: A Dynamic Benchmark for Agenticly Evaluating Large Language Models on Software Engineering Tasks"
-  authors: "Pavel Adamenko, Mikhail Ivanov, Aidar Valeev, Rodion Levichev, Pavel Zadorozhny, Ivan Lopatin, Dmitry Babayev, Alena Fenogenova, Valentin Malykh"
-  venue: "arXiv 2025"
-  year: "2025"
+    arxiv: https://arxiv.org/abs/2506.10954
+- short_name: SWE-MERA
+  title: 'SWE-MERA: A Dynamic Benchmark for Agenticly Evaluating Large Language Models
+    on Software Engineering Tasks'
+  authors: Pavel Adamenko, Mikhail Ivanov, Aidar Valeev, Rodion Levichev, Pavel Zadorozhny,
+    Ivan Lopatin, Dmitry Babayev et al.
+  year: '2025'
+  venue: arXiv preprint arXiv:2507.11059 2025
   links:
-    arxiv: "https://arxiv.org/abs/2507.11059"
-
-- short_name: "RepoForge"
-  title: "RepoForge: Training a SOTA Fast-thinking SWE Agent with an End-to-End Data Curation Pipeline Synergizing SFT and RL at Scale"
-  authors: "Zhilong Chen, Chengzong Zhao, Boyuan Chen, Dayi Lin, Yihao Chen, Arthur Leung, Gopi Krishnan Rajbahadur, Gustavo A. Oliva, Haoxiang Zhang, Aaditya Bhatia, Chong Chun Yong, Ahmed E. Hassan"
-  venue: "arXiv 2025"
-  year: "2025"
+    arxiv: https://arxiv.org/abs/2507.11059
+- short_name: RepoForge
+  title: 'RepoForge: Training a SOTA Fast-thinking SWE Agent with an End-to-End Data
+    Curation Pipeline Synergizing SFT and RL at Scale'
+  authors: Zhilong Chen, Chengzong Zhao, Boyuan Chen, Dayi Lin, Yihao Chen, Arthur
+    Leung, Gopi Krishnan Rajbahadur et al.
+  year: '2025'
+  venue: arXiv preprint arXiv:2508.01550 2025
   links:
-    arxiv: "https://arxiv.org/abs/2508.01550"
-
-- short_name: "Multi-Docker-Eval"
-  title: "Multi-Docker-Eval: A `Shovel of the Gold Rush' Benchmark on Automatic Environment Building for Software Engineering"
-  authors: "Kelin Fu, Tianyu Liu, Zeyu Shang, Yingwei Ma, Jian Yang, Jiaheng Liu, Kaigui Bian"
-  venue: "arXiv 2025"
-  year: "2025"
+    arxiv: https://arxiv.org/abs/2508.01550
+- short_name: Multi-Docker-Eval
+  title: 'Multi-Docker-Eval: A `Shovel of the Gold Rush'' Benchmark on Automatic Environment
+    Building for Software Engineering'
+  authors: Kelin Fu, Tianyu Liu, Zeyu Shang, Yingwei Ma, Jian Yang, Jiaheng Liu, Kaigui
+    Bian
+  year: '2025'
+  venue: arXiv preprint arXiv:2512.06915 2025
   links:
-    arxiv: "https://arxiv.org/abs/2512.06915"
-
+    arxiv: https://arxiv.org/abs/2512.06915

data/papers_data_synthesis.yaml

Lines changed: 46 additions & 43 deletions
Original file line numberDiff line numberDiff line change
@@ -1,51 +1,54 @@
11
# Data Synthesis
2-
# Auto-generated from papers_raw/taxonomy.tex and papers_raw/main.bib
2+
# Auto-generated from taxonomy.tex and BibTeX file
33

4-
- short_name: "Learn-by-interact"
5-
title: "Learn-by-interact: A Data-Centric Framework For Self-Adaptive Agents in Realistic Environments"
6-
authors: "Hongjin SU, Ruoxi Sun, Jinsung Yoon, Pengcheng Yin, Tao Yu, Sercan O Arik"
7-
venue: "The Thirteenth International Conference on Learning Representations 2025"
8-
year: "2025"
4+
- short_name: Learn-by-interact
5+
title: 'Learn-by-interact: A Data-Centric Framework For Self-Adaptive Agents in
6+
Realistic Environments'
7+
authors: Hongjin SU, Ruoxi Sun, Jinsung Yoon, Pengcheng Yin, Tao Yu, Sercan O Arik
8+
year: '2025'
9+
venue: The Thirteenth International Conference on Learning Representations 2025
910
links:
10-
arxiv: "https://openreview.net/forum?id=3UKOzGWCVY"
11-
12-
- short_name: "R2E-Gym"
13-
title: "R2E-Gym: Procedural Environments and Hybrid Verifiers for Scaling Open-Weights SWE Agents"
14-
authors: "Naman Jain, Jaskirat Singh, Manish Shetty, Liang Zheng, Koushik Sen, Ion Stoica"
15-
venue: "arXiv 2025"
16-
year: "2025"
11+
openreview: https://openreview.net/forum?id=3UKOzGWCVY
12+
- short_name: R2E-Gym
13+
title: 'R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling
14+
Open-Weights SWE Agents'
15+
authors: Naman Jain, Jaskirat Singh, Manish Shetty, Tianjun Zhang, Liang Zheng,
16+
Koushik Sen, Ion Stoica
17+
year: '2025'
18+
venue: Second Conference on Language Modeling 2025
1719
links:
18-
arxiv: "https://arxiv.org/abs/2504.07164"
19-
20-
- short_name: "SWE-Synth"
21-
title: "SWE-Synth: Synthesizing Verifiable Bug-Fix Data to Enable Large Language Models in Resolving Real-World Bugs"
22-
authors: "Minh V. T. Pham, Huy N. Phan, Hoang N. Phan, Cuong Le Chi, Tien N. Nguyen, Nghi D. Q. Bui"
23-
venue: "arXiv 2025"
24-
year: "2025"
20+
openreview: https://openreview.net/forum?id=7evvwwdo3z
21+
- short_name: SWE-Synth
22+
title: 'SWE-Synth: Synthesizing Verifiable Bug-Fix Data to Enable Large Language
23+
Models in Resolving Real-World Bugs'
24+
authors: Minh V. T. Pham, Huy N. Phan, Hoang N. Phan, Cuong Le Chi, Tien N. Nguyen,
25+
Nghi D. Q. Bui
26+
year: '2025'
27+
venue: arXiv preprint arXiv:2504.14757 2025
2528
links:
26-
arxiv: "https://arxiv.org/abs/2504.14757"
27-
28-
- short_name: "SWE-smith"
29-
title: "SWE-smith: Scaling Data for Software Engineering Agents"
30-
authors: "John Yang, Kilian Lieret, Carlos E. Jimenez, Alexander Wettig, Kabir Khandpur, Yanzhe Zhang, Binyuan Hui, Ofir Press, Ludwig Schmidt, Diyi Yang"
31-
venue: "arXiv 2025"
32-
year: "2025"
29+
arxiv: https://arxiv.org/abs/2504.14757
30+
- short_name: SWE-smith
31+
title: 'SWE-smith: Scaling Data for Software Engineering Agents'
32+
authors: John Yang, Kilian Lieret, Carlos E Jimenez, Alexander Wettig, Kabir Khandpur,
33+
Yanzhe Zhang, Binyuan Hui et al.
34+
year: '2025'
35+
venue: The Thirty-ninth Annual Conference on Neural Information Processing Systems
36+
Datasets and Benchmarks Track 2025
3337
links:
34-
arxiv: "https://arxiv.org/abs/2504.21798"
35-
36-
- short_name: "SWE-Flow"
37-
title: "SWE-Flow: Synthesizing Software Engineering Data in a Test-Driven Manner"
38-
authors: "Lei Zhang, Jiaxi Yang, Min Yang, Jian Yang, Mouxiang Chen, Jiajun Zhang, Zeyu Cui, Binyuan Hui, Junyang Lin"
39-
venue: "arXiv 2025"
40-
year: "2025"
38+
openreview: https://openreview.net/forum?id=63iVrXc8cC
39+
- short_name: SWE-Flow
40+
title: Synthesizing Software Engineering Data in a Test-Driven Manner
41+
authors: Lei Zhang, Jiaxi Yang, Min Yang, Jian Yang, Mouxiang Chen, Jiajun Zhang,
42+
Zeyu Cui et al.
43+
year: '2025'
44+
venue: Forty-second International Conference on Machine Learning 2025
4145
links:
42-
arxiv: "https://arxiv.org/abs/2506.09003"
43-
44-
- short_name: "SWE-Mirror"
45-
title: "SWE-Mirror: Scaling Issue-Resolving Datasets by Mirroring Issues Across Repositories"
46-
authors: "Junhao Wang, Daoguang Zan, Shulin Xin, Siyao Liu, Yurong Wu, Kai Shen"
47-
venue: "arXiv 2025"
48-
year: "2025"
46+
openreview: https://openreview.net/forum?id=P9DQ2IExgS
47+
- short_name: SWE-Mirror
48+
title: 'SWE-Mirror: Scaling Issue-Resolving Datasets by Mirroring Issues Across
49+
Repositories'
50+
authors: Junhao Wang, Daoguang Zan, Shulin Xin, Siyao Liu, Yurong Wu, Kai Shen
51+
year: '2025'
52+
venue: arXiv preprint arXiv:2509.08724 2025
4953
links:
50-
arxiv: "https://arxiv.org/abs/2509.08724"
51-
54+
arxiv: https://arxiv.org/abs/2509.08724
