Commit 5fe0bed

fix: add back 12 papers incorrectly filtered during migration
Classifier was too strict: surveys, empirical studies, and position papers ABOUT code agents are also in scope.

Updated classifier prompt to include:
- Papers that survey/review/empirically evaluate code/CLI agents
- Papers studying impact or behaviour of AI coding tools

Manually classified and added 12 papers:
- 5x issue_resolution (surveys + empirical)
- 3x code_generation (empirical)
- 1x foundation_models (general LLM agent survey)
- 1x terminal (OS agents survey)
- 1x qa (empirical)
- 1x issue_resolution (position)
1 parent 93aab5f commit 5fe0bed

7 files changed

Lines changed: 201 additions & 66 deletions

automation/classifier/llm.py

Lines changed: 5 additions & 3 deletions
@@ -84,9 +84,11 @@ def _build_user_prompt(
 - The primary contribution is a general NLP/ML method that happens to be evaluated
 on a code dataset, but the method itself is not about code-executing agents.
 
-Mark relevant=true if the agent uses code execution or CLI as a primary action,
-regardless of the end task (software engineering, data analysis, science, games,
-embodied control via code, web tasks via code, etc.).
+Mark relevant=true if ANY of the following:
+- The agent uses code execution or CLI as a primary action (regardless of end task).
+- The paper surveys, systematically reviews, or empirically evaluates code/CLI agents.
+- The paper proposes a benchmark or dataset for evaluating code/CLI agents.
+- The paper studies the impact, behaviour, or limitations of AI coding tools/agents.
 
 Other rules:
 - Choose the SINGLE most specific functional category (e.g. code_generation,
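
The new prompt text replaces a single condition with an "ANY of the following" list. As a minimal sketch of how such a section could be assembled from a list of criteria: the helper name `build_relevance_rules` and the constant `RELEVANCE_CRITERIA` are hypothetical, and the real body of `_build_user_prompt` is not shown in this diff, so this is not the repository's implementation.

```python
# Hypothetical helper, not taken from automation/classifier/llm.py.
# The criteria strings are copied from the added lines of the diff.
RELEVANCE_CRITERIA = [
    "The agent uses code execution or CLI as a primary action (regardless of end task).",
    "The paper surveys, systematically reviews, or empirically evaluates code/CLI agents.",
    "The paper proposes a benchmark or dataset for evaluating code/CLI agents.",
    "The paper studies the impact, behaviour, or limitations of AI coding tools/agents.",
]


def build_relevance_rules(criteria=RELEVANCE_CRITERIA):
    """Render the criteria as the 'relevant=true' section of a prompt."""
    lines = ["Mark relevant=true if ANY of the following:"]
    lines.extend(f"- {criterion}" for criterion in criteria)
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_relevance_rules())
```

Keeping the criteria in a list makes each in-scope case a one-line change, which is the kind of edit this commit makes by hand in the prompt string.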

data/papers_code_generation.yaml

Lines changed: 37 additions & 0 deletions
@@ -1,3 +1,40 @@
+- title: Is Multi-Agent Debate (MAD) the Silver Bullet? An Empirical Analysis of MAD in Code Summarization and Translation
+  authors: Jina Chun, Qihong Chen, Jiawei Li, Iftekhar Ahmed
+  venue: arXiv 2025
+  tags:
+  - empirical
+  links:
+    paper: https://arxiv.org/abs/2503.12029
+    github: ''
+    website: ''
+- title: Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
+  authors: Joel Becker, Nate Rush, Elizabeth Barnes, David Rein
+  venue: arXiv 2025
+  tags:
+  - empirical
+  links:
+    paper: https://arxiv.org/abs/2507.09089
+    github: ''
+    website: ''
+- title: Code with Me or for Me? How Increasing AI Automation Transforms Developer Workflows
+  authors: Valerie Chen, Ameet Talwalkar, Robert Brennan, Graham Neubig
+  venue: arXiv 2025
+  tags:
+  - empirical
+  links:
+    paper: https://arxiv.org/abs/2507.08149
+    github: ''
+    website: ''
+- title: Assessing and Advancing Benchmarks for Evaluating Large Language Models in Software Engineering Tasks
+  authors: Xing Hu, Feifei Niu, Junkai Chen, Xin Zhou, Junwei Zhang, Junda He, Xin Xia, David Lo
+  venue: arXiv 2025
+  tags:
+  - survey
+  - empirical
+  links:
+    paper: https://arxiv.org/abs/2505.08903
+    github: ''
+    website: ''
 - title: 'Vibe Checker: Aligning Code Evaluation with Human Preference'
   authors: Ming Zhong, Xiang Zhou, Ting-Yun Chang, Qingze Wang, Nan Xu, Xiance Si, Dan Garrette, Shyam Upadhyay, Jeremiah
     Liu, Jiawei Han, Benoit Schillings, Jiao Sun
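
Every added entry in this diff uses the same keys: `title`, `authors`, `venue`, an optional `tags` list, and `links` with `paper`, `github`, and `website`. The following is a sketch of a check one might run on manually added entries, assuming each entry has already been loaded into a Python dict. The function `validate_entry` and the tag set `KNOWN_TAGS` are hypothetical; the tag set only lists tags that appear in this commit and may not match the repository's full vocabulary.

```python
# Hypothetical validator; key names mirror the YAML entries in this diff.
REQUIRED_KEYS = ("title", "authors", "venue", "links")
LINK_KEYS = ("paper", "github", "website")
# Assumption: only the tags visible in this commit are listed here.
KNOWN_TAGS = {"survey", "empirical", "benchmark", "position"}


def validate_entry(entry):
    """Return a list of problems for one paper entry (empty list means OK)."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in entry:
            problems.append(f"missing key: {key}")
    links = entry.get("links", {})
    for key in LINK_KEYS:
        if key not in links:
            problems.append(f"missing link: {key}")
    if not links.get("paper"):
        problems.append("empty paper link")
    for tag in entry.get("tags", []):
        if tag not in KNOWN_TAGS:
            problems.append(f"unknown tag: {tag}")
    return problems


# One of the entries added by this commit, written out as a dict.
EXAMPLE = {
    "title": "Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity",
    "authors": "Joel Becker, Nate Rush, Elizabeth Barnes, David Rein",
    "venue": "arXiv 2025",
    "tags": ["empirical"],
    "links": {"paper": "https://arxiv.org/abs/2507.09089", "github": "", "website": ""},
}

if __name__ == "__main__":
    print(validate_entry(EXAMPLE))
```

Empty `github` and `website` values are accepted on purpose, since most entries in this commit leave them as `''`.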

data/papers_foundation_models.yaml

Lines changed: 51 additions & 35 deletions
@@ -1,39 +1,55 @@
-- title: "CWM: An Open-Weights LLM for Research on Code Generation with World Models"
-  authors: "FAIR CodeGen team, Jade Copet, Quentin Carbonneaux, Gal Cohen, Jonas Gehring, Jacob Kahn, Jannik Kossen, Felix Kreuk, Emily McMilin, Michel Meyer, Yuxiang Wei, David Zhang, Kunhao Zheng, Jordi Armengol-Estapé, Pedram Bashiri, Maximilian Beck, Pierre Chambon, Abhishek Charnalia, Chris Cummins, Juliette Decugis, Zacharias V. Fisches, François Fleuret, Fabian Gloeckle, Alex Gu, Michael Hassid, Daniel Haziza, Badr Youbi Idrissi, Christian Keller, Rahul Kindi, Hugh Leather, Gallil Maimon, Aram Markosyan, Francisco Massa, Pierre-Emmanuel Mazaré, Vegard Mella, Naila Murray, Keyur Muzumdar, Peter O'Hearn, Matteo Pagliardini, Dmitrii Pedchenko, Tal Remez, Volker Seeker, Marco Selvi, Oren Sultan, Sida Wang, Luca Wehrstedt, Ori Yoran, Lingming Zhang, Taco Cohen, Yossi Adi, Gabriel Synnaeve"
-  venue: "arXiv 2025/09"
+- title: 'Large Language Model Agent: A Survey on Methodology, Applications and Challenges'
+  authors: Junyu Luo, Weizhi Zhang, Ye Yuan, Yusheng Zhao, Junwei Yang, Yiyang Gu, Bohan Wu, Binqi Chen, Ziyue Qiao, Qingqing
+    Long, Rongcheng Tu, Xiao Luo, Wei Ju, Zhiping Xiao, Yifan Wang, Meng Xiao, Chenwu Liu, Jingyang Yuan, Shichang Zhang,
+    Yiqiao Jin, Fan Zhang, Xian Wu, Hanqing Zhao, Dacheng Tao, Philip S. Yu, Ming Zhang
+  venue: arXiv 2025
+  tags:
+  - survey
   links:
-    paper: "https://arxiv.org/abs/2510.02387"
-    github: "https://github.com/facebookresearch/cwm"
-    website: ""
-
-- title: "Introducing: Devstral 2 and Mistral Vibe CLI"
-  authors: "Mistral"
-  venue: "2025/12"
+    paper: https://arxiv.org/abs/2503.21460
+    github: https://github.com/luo-junyu/Awesome-Agent-Papers
+    website: ''
+- title: 'CWM: An Open-Weights LLM for Research on Code Generation with World Models'
+  authors: FAIR CodeGen team, Jade Copet, Quentin Carbonneaux, Gal Cohen, Jonas Gehring, Jacob Kahn, Jannik Kossen, Felix
+    Kreuk, Emily McMilin, Michel Meyer, Yuxiang Wei, David Zhang, Kunhao Zheng, Jordi Armengol-Estapé, Pedram Bashiri, Maximilian
+    Beck, Pierre Chambon, Abhishek Charnalia, Chris Cummins, Juliette Decugis, Zacharias V. Fisches, François Fleuret, Fabian
+    Gloeckle, Alex Gu, Michael Hassid, Daniel Haziza, Badr Youbi Idrissi, Christian Keller, Rahul Kindi, Hugh Leather, Gallil
+    Maimon, Aram Markosyan, Francisco Massa, Pierre-Emmanuel Mazaré, Vegard Mella, Naila Murray, Keyur Muzumdar, Peter O'Hearn,
+    Matteo Pagliardini, Dmitrii Pedchenko, Tal Remez, Volker Seeker, Marco Selvi, Oren Sultan, Sida Wang, Luca Wehrstedt,
+    Ori Yoran, Lingming Zhang, Taco Cohen, Yossi Adi, Gabriel Synnaeve
+  venue: arXiv 2025/09
   links:
-    paper: "https://mistral.ai/news/devstral-2-vibe-cli"
-    github: ""
-    website: ""
-
-# - title: "Devstral: Fine-tuning Language Models for Coding Agent Applications"
-#   authors: "Abhinav Rastogi, Adam Yang, Albert Q. Jiang, Alexander H. Liu, Alexandre Sablayrolles, Amélie Héliou, Amélie Martin, Anmol Agarwal, Andy Ehrenberg, Andy Lo, Antoine Roux, Arthur Darcet, Arthur Mensch, Baptiste Bout, Baptiste Rozière, Baudouin De Monicault, Chris Bamford, Christian Wallenwein, Christophe Renaudin, Clémence Lanfranchi, Clément Denoix, Corentin Barreau, Darius Dabert, Devon Mizelle, Diego de las Casas, Elliot Chane-Sane, Emilien Fugier, Emma Bou Hanna, Gabrielle Berrada, Gauthier Delerce, Gauthier Guinet, Georgii Novikov, Graham Neubig, Guillaume Lample, Guillaume Martin, Himanshu Jaju, Jan Ludziejewski, Jason Rute, Jean-Malo Delignon, Jean-Hadrien Chabran, Joachim Studnia, Joep Barmentlo, Jonas Amar, Josselin Somerville Roberts, Julien Denize, Karan Saxena, Karmesh Yadav, Kartik Khandelwal, Khyathi Raghavi Chandu, Kush Jain, Lélio Renard Lavaud, Léonard Blier, Lingxiao Zhao, Louis Martin, Lucile Saulnier, Luyu Gao, Marie Pellat, Mathilde Guillaumin, Mathis Felardos, Matthieu Dinot, Maxime Darrin, Maximilian Augustin, Mickaël Seznec, Neha Gupta, Nikhil Raghuraman, Olivier Duchenne, Patricia Wang, Patrick von Platen, Patryk Saffer, Paul Jacob, Paul Wambergue, Paula Kurylowicz, Philomène Chagniot, Pierre Stock, Pravesh Agrawal, Rémi Delacourt, Roman Soletskyi, Romain Sauvestre, Sagar Vaze, Sanchit Gandhi, Sandeep Subramanian, Shashwat Dalal, Soham Ghosh, Srijan Mishra, Sumukh Aithal, Szymon Antoniak, Teven Le Scao, Thibaut Lavril, Thibault Schueller, Thomas Foubert, Thomas Robert, Thomas Wang, Timothée Lacroix, Tom Bewley, Valeriia Nemychnikova, Victor Paltz, Virgile Richard, Wen-Ding Li, William Marshall, Xuanyu Zhang, Yihan Wan, Yunhao Tang"
-#   venue: "arXiv 2025/09"
-#   links:
-#     paper: "https://arxiv.org/abs/2509.25193"
-#     github: ""
-#     website: ""
-
-- title: "Qwen3-Coder: Agentic Coding in the World"
-  authors: "QwenTeam"
-  venue: "2025/07"
+    paper: https://arxiv.org/abs/2510.02387
+    github: https://github.com/facebookresearch/cwm
+    website: ''
+- title: 'Introducing: Devstral 2 and Mistral Vibe CLI'
+  authors: Mistral
+  venue: 2025/12
   links:
-    paper: "https://qwen.ai/blog?id=qwen3-coder"
-    github: "https://github.com/QwenLM/Qwen3-Coder"
-    website: ""
-
-- title: "Kimi K2: Open Agentic Intelligence"
-  authors: "Kimi Team: Yifan Bai, Yiping Bao, Guanduo Chen, Jiahao Chen, Ningxin Chen, Ruijue Chen, Yanru Chen, Yuankun Chen, Yutian Chen, Zhuofu Chen, Jialei Cui, Hao Ding, Mengnan Dong, Angang Du, Chenzhuang Du, Dikang Du, Yulun Du, Yu Fan, Yichen Feng, Kelin Fu, Bofei Gao, Hongcheng Gao, Peizhong Gao, Tong Gao, Xinran Gu, Longyu Guan, Haiqing Guo, Jianhang Guo, Hao Hu, Xiaoru Hao, Tianhong He, Weiran He, Wenyang He, Chao Hong, Yangyang Hu, Zhenxing Hu, Weixiao Huang, Zhiqi Huang, Zihao Huang, Tao Jiang, Zhejun Jiang, Xinyi Jin, Yongsheng Kang, Guokun Lai, Cheng Li, Fang Li, Haoyang Li, Ming Li, Wentao Li, Yanhao Li, Yiwei Li, Zhaowei Li, Zheming Li, Hongzhan Lin, Xiaohan Lin, Zongyu Lin, Chengyin Liu, Chenyu Liu, Hongzhang Liu, Jingyuan Liu, Junqi Liu, Liang Liu, Shaowei Liu, T.Y. Liu, Tianwei Liu, Weizhou Liu, Yangyang Liu, Yibo Liu, Yiping Liu, Yue Liu, Zhengying Liu, Enzhe Lu, Lijun Lu, Shengling Ma, Xinyu Ma, Yingwei Ma, Shaoguang Mao, Jie Mei, Xin Men, Yibo Miao, Siyuan Pan, Yebo Peng, Ruoyu Qin, Bowen Qu, Zeyu Shang, Lidong Shi, Shengyuan Shi, Feifan Song, Jianlin Su, Zhengyuan Su, Xinjie Sun, Flood Sung, Heyi Tang, Jiawen Tao, Qifeng Teng, Chensi Wang, Dinglu Wang, Feng Wang, Haiming Wang et al."
-  venue: "arXiv 2025/07"
+    paper: https://mistral.ai/news/devstral-2-vibe-cli
+    github: ''
+    website: ''
+- title: 'Qwen3-Coder: Agentic Coding in the World'
+  authors: QwenTeam
+  venue: 2025/07
   links:
-    paper: "https://arxiv.org/abs/2507.20534"
-    github: ""
-    website: ""
+    paper: https://qwen.ai/blog?id=qwen3-coder
+    github: https://github.com/QwenLM/Qwen3-Coder
+    website: ''
+- title: 'Kimi K2: Open Agentic Intelligence'
+  authors: 'Kimi Team: Yifan Bai, Yiping Bao, Guanduo Chen, Jiahao Chen, Ningxin Chen, Ruijue Chen, Yanru Chen, Yuankun Chen,
+    Yutian Chen, Zhuofu Chen, Jialei Cui, Hao Ding, Mengnan Dong, Angang Du, Chenzhuang Du, Dikang Du, Yulun Du, Yu Fan, Yichen
+    Feng, Kelin Fu, Bofei Gao, Hongcheng Gao, Peizhong Gao, Tong Gao, Xinran Gu, Longyu Guan, Haiqing Guo, Jianhang Guo, Hao
+    Hu, Xiaoru Hao, Tianhong He, Weiran He, Wenyang He, Chao Hong, Yangyang Hu, Zhenxing Hu, Weixiao Huang, Zhiqi Huang, Zihao
+    Huang, Tao Jiang, Zhejun Jiang, Xinyi Jin, Yongsheng Kang, Guokun Lai, Cheng Li, Fang Li, Haoyang Li, Ming Li, Wentao
+    Li, Yanhao Li, Yiwei Li, Zhaowei Li, Zheming Li, Hongzhan Lin, Xiaohan Lin, Zongyu Lin, Chengyin Liu, Chenyu Liu, Hongzhang
+    Liu, Jingyuan Liu, Junqi Liu, Liang Liu, Shaowei Liu, T.Y. Liu, Tianwei Liu, Weizhou Liu, Yangyang Liu, Yibo Liu, Yiping
+    Liu, Yue Liu, Zhengying Liu, Enzhe Lu, Lijun Lu, Shengling Ma, Xinyu Ma, Yingwei Ma, Shaoguang Mao, Jie Mei, Xin Men,
+    Yibo Miao, Siyuan Pan, Yebo Peng, Ruoyu Qin, Bowen Qu, Zeyu Shang, Lidong Shi, Shengyuan Shi, Feifan Song, Jianlin Su,
+    Zhengyuan Su, Xinjie Sun, Flood Sung, Heyi Tang, Jiawen Tao, Qifeng Teng, Chensi Wang, Dinglu Wang, Feng Wang, Haiming
+    Wang et al.'
+  venue: arXiv 2025/07
+  links:
+    paper: https://arxiv.org/abs/2507.20534
+    github: ''
+    website: ''

data/papers_issue_resolution.yaml

Lines changed: 57 additions & 0 deletions
@@ -1,3 +1,60 @@
+- title: 'Position: Future Research and Challenges Remain Towards AI for Software Engineering'
+  authors: Alex Gu, Naman Jain, Wen-Ding Li, Manish Shetty, Kevin Ellis, Koushik Sen, Armando Solar-Lezama
+  venue: ICML 2025 Position Paper Track
+  tags:
+  - position
+  links:
+    paper: https://openreview.net/forum?id=RuLsq4LSZK
+    github: ''
+    website: ''
+- title: How can we assess human-agent interactions? Case studies in software agent design
+  authors: Valerie Chen, Rohit Malhotra, Xingyao Wang, Juan Michelini, Xuhui Zhou, Aditya Bharat Soni, Hoang H. Tran, Calvin
+    Smith, Ameet Talwalkar, Graham Neubig
+  venue: arXiv 2025
+  tags:
+  - empirical
+  links:
+    paper: https://arxiv.org/abs/2510.09801
+    github: ''
+    website: ''
+- title: Assessing and Advancing Benchmarks for Evaluating Large Language Models in Software Engineering Tasks
+  authors: Xing Hu, Feifei Niu, Junkai Chen, Xin Zhou, Junwei Zhang, Junda He, Xin Xia, David Lo
+  venue: arXiv 2025
+  tags:
+  - benchmark
+  - empirical
+  links:
+    paper: https://arxiv.org/abs/2505.08903
+    github: ''
+    website: ''
+- title: A Comprehensive Empirical Evaluation of Agent Frameworks on Code-centric Software Engineering Tasks
+  authors: Zhuowen Yin, Cuifeng Gao, Chunsong Fan, Wenzhang Yang, Yinxing Xue, Lijun Zhang
+  venue: arXiv 2025
+  tags:
+  - empirical
+  links:
+    paper: https://arxiv.org/abs/2511.00872
+    github: ''
+    website: ''
+- title: 'Large Language Model-Based Agents for Software Engineering: A Survey'
+  authors: Junwei Liu, Kaixin Wang, Yixuan Chen, Xin Peng, Zhenpeng Chen, Lingming Zhang, Yiling Lou
+  venue: arXiv 2024
+  tags:
+  - survey
+  links:
+    paper: https://arxiv.org/abs/2409.02977
+    github: ''
+    website: ''
+- title: A Comprehensive Survey on Benchmarks and Solutions in Software Engineering of LLM-Empowered Agentic System
+  authors: Jiale Guo, Suizhi Huang, Mei Li, Dong Huang, Xingsheng Chen, Regina Zhang, Zhijiang Guo, Han Yu, Siu-Ming Yiu,
+    Christian Jensen, Pietro Lio, Kwok-Yan Lam
+  venue: arXiv 2025
+  tags:
+  - survey
+  links:
+    paper: https://arxiv.org/abs/2510.09721
+    github: https://github.com/lisaGuojl/LLM-Agent-SE-Survey
+    website: ''
 - title: 'Agents in software engineering: survey, landscape, and vision'
   authors: Yanlin Wang, Wanjun Zhong, Yanxian Huang, Ensheng Shi, Min Yang, Jiachi Chen, Hui Li, Yuchi Ma, Qianxiang Wang,
     Zibin Zheng
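
The entry "Assessing and Advancing Benchmarks for Evaluating Large Language Models in Software Engineering Tasks" is added to both data/papers_code_generation.yaml and data/papers_issue_resolution.yaml in this commit, so the number of added entries can exceed the number of distinct papers. A minimal sketch of how one might list titles that appear in more than one category file, assuming the files have already been parsed into lists of dicts. The function name `shared_titles` and the in-memory sample are illustrative only.

```python
from collections import defaultdict


def shared_titles(entries_by_file):
    """Map each title found in two or more files to the sorted list of those files."""
    files_for_title = defaultdict(set)
    for filename, entries in entries_by_file.items():
        for entry in entries:
            files_for_title[entry["title"]].add(filename)
    return {
        title: sorted(files)
        for title, files in files_for_title.items()
        if len(files) > 1
    }


# Illustrative sample built from titles that appear in this diff.
ASSESSING = (
    "Assessing and Advancing Benchmarks for Evaluating Large Language Models "
    "in Software Engineering Tasks"
)
SAMPLE = {
    "data/papers_code_generation.yaml": [
        {"title": ASSESSING},
        {"title": "Code with Me or for Me? How Increasing AI Automation Transforms Developer Workflows"},
    ],
    "data/papers_issue_resolution.yaml": [
        {"title": ASSESSING},
    ],
}

if __name__ == "__main__":
    for title, files in shared_titles(SAMPLE).items():
        print(title, "->", ", ".join(files))
```

Matching on the exact title string is a deliberate simplification; matching on the `links.paper` URL would be an equally reasonable key.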

data/papers_qa.yaml

Lines changed: 27 additions & 28 deletions
@@ -1,31 +1,30 @@
-# - title: "Uncovering Code Insights: Leveraging GitHub Artifacts for Deeper Code Understanding"
-#   authors: "Ziv Nevo, Orna Raz, Karen Yorav"
-#   venue: "AISM 2025"
-#   links:
-#     paper: "https://arxiv.org/abs/2511.03549"
-#     github: ""
-#     website: ""
-
-- title: "SWE-QA: Can Language Models Answer Repository-level Code Questions?"
-  authors: "Weihan Peng, Yuling Shi, Yuhang Wang, Xinyun Zhang, Beijun Shen, Xiaodong Gu"
-  venue: "arXiv 2025"
+- title: Can LLMs Replace Manual Annotation of Software Engineering Artifacts?
+  authors: Toufique Ahmed, Premkumar Devanbu, Christoph Treude, Michael Pradel
+  venue: MSR 2025
+  tags:
+  - empirical
   links:
-    paper: "https://arxiv.org/abs/2509.14635"
-    github: "https://github.com/peng-weihan/SWE-QA-Bench"
-    website: ""
-
-- title: "Benchmarking Long-Context Language Models on Long Code Understanding"
-  authors: "Jia Li, Xuyuan Guo, Lei Li, Kechi Zhang, Ge Li, Jia Li, Zhengwei Tao, Fang Liu, Chongyang Tao, Yuqi Zhu, Zhi Jin"
-  venue: "ACL 2025"
+    paper: https://ieeexplore.ieee.org/document/11025652
+    github: ''
+    website: ''
+- title: 'SWE-QA: Can Language Models Answer Repository-level Code Questions?'
+  authors: Weihan Peng, Yuling Shi, Yuhang Wang, Xinyun Zhang, Beijun Shen, Xiaodong Gu
+  venue: arXiv 2025
   links:
-    paper: "https://aclanthology.org/2025.acl-long.1324/"
-    github: ""
-    website: ""
-
-- title: "On Improving Repository-Level Code QA for Large Language Models"
-  authors: "Jan Strich, Florian Schneider, Irina Nikishina, Chris Biemann"
-  venue: "ACL 2024 Workshop"
+    paper: https://arxiv.org/abs/2509.14635
+    github: https://github.com/peng-weihan/SWE-QA-Bench
+    website: ''
+- title: Benchmarking Long-Context Language Models on Long Code Understanding
+  authors: Jia Li, Xuyuan Guo, Lei Li, Kechi Zhang, Ge Li, Jia Li, Zhengwei Tao, Fang Liu, Chongyang Tao, Yuqi Zhu, Zhi Jin
+  venue: ACL 2025
   links:
-    paper: "https://aclanthology.org/2024.acl-srw.28/"
-    github: ""
-    website: ""
+    paper: https://aclanthology.org/2025.acl-long.1324/
+    github: ''
+    website: ''
+- title: On Improving Repository-Level Code QA for Large Language Models
+  authors: Jan Strich, Florian Schneider, Irina Nikishina, Chris Biemann
+  venue: ACL 2024 Workshop
+  links:
+    paper: https://aclanthology.org/2024.acl-srw.28/
+    github: ''
+    website: ''
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+- title: 'OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use'
+  authors: Xueyu Hu, Tao Xiong, Biao Yi, Zishu Wei, Ruixuan Xiao, Yurun Chen, Jiasheng Ye, Meiling Tao, Xiangxin Zhou, Ziyu
+    Zhao, Yuhuai Li, Shengze Xu, Shenzhi Wang, Xinchen Xu, Shuofei Qiao, Zhaokai Wang, Kun Kuang, Tieyong Zeng, Liang Wang,
+    Jiwei Li, Yuchen Eleanor Jiang, Wangchunshu Zhou, Guoyin Wang, Keting Yin, Zhou Zhao, Hongxia Yang, Fan Wu, Shengyu Zhang,
+    Fei Wu
+  venue: ACL 2025
+  tags:
+  - survey
+  links:
+    paper: https://arxiv.org/abs/2508.04482
+    github: https://github.com/OS-Agent-Survey/OS-Agent-Survey
+    website: ''

data/papers_terminal.yaml

Lines changed: 12 additions & 0 deletions
@@ -1,3 +1,15 @@
+- title: 'OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use'
+  authors: Xueyu Hu, Tao Xiong, Biao Yi, Zishu Wei, Ruixuan Xiao, Yurun Chen, Jiasheng Ye, Meiling Tao, Xiangxin Zhou, Ziyu
+    Zhao, Yuhuai Li, Shengze Xu, Shenzhi Wang, Xinchen Xu, Shuofei Qiao, Zhaokai Wang, Kun Kuang, Tieyong Zeng, Liang Wang,
+    Jiwei Li, Yuchen Eleanor Jiang, Wangchunshu Zhou, Guoyin Wang, Keting Yin, Zhou Zhao, Hongxia Yang, Fan Wu, Shengyu Zhang,
+    Fei Wu
+  venue: ACL 2025
+  tags:
+  - survey
+  links:
+    paper: https://arxiv.org/abs/2508.04482
+    github: https://github.com/OS-Agent-Survey/OS-Agent-Survey
+    website: ''
 - title: 'Terminal-Bench: A Benchmark for AI Agents in Terminal Environments'
   authors: The Terminal-Bench Team
   venue: '2025'
