Commit 892ab3b

committed
feat(data): add 16 terminal-related papers manually curated
1 parent e0d7bdd commit 892ab3b

4 files changed

Lines changed: 189 additions & 5 deletions

README.md

Lines changed: 65 additions & 1 deletion
@@ -91,7 +91,7 @@
 <!-- START PAPERS SUMMARY -->
 🔥 **We are actively tracking the frontier research of code agents.**<br>
 🧹 *We periodically curate our collection, retaining only published papers and interesting arXiv preprints from the last six months.*<br>
-📚 *Currently collected:* **`499` papers***(Last update: 2026-06-07)*
+📚 *Currently collected:* **`515` papers***(Last update: 2026-06-20)*
 <!-- END PAPERS SUMMARY -->
 
 <!-- - [🚀 Products & Tools](#-products--tools)
@@ -731,6 +731,70 @@
 > AI agents that operate within terminal environments, executing shell commands, managing system operations, and automating command-line workflows through natural language interfaces and autonomous task execution.
 
 <!-- START PAPERS:terminal -->
+- **TerminalWorld: Benchmarking Agents on Real-World Terminal Tasks.**
+_Zhaoyang Chu, Jiarui Hu, Xingyu Jiang, Pengyu Zou, Han Li, Chao Peng, Peter O'Hearn, Earl T. Barr, Mark Harman, Federica Sarro, et al._ arXiv 2026/05.
+[![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2605.22535) [![GitHub Stars](https://img.shields.io/github/stars/EuniAI/TerminalWorld?style=for-the-badge&logo=github&label=GitHub&color=black)](https://github.com/EuniAI/TerminalWorld)
+
+- **Terminus-4B: Can a Smaller Model Replace Frontier LLMs at Agentic Execution Tasks?**
+_Spandan Garg, Vikram Nitin, Yufan Huang._ arXiv 2026/05.
+[![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2605.03195)
+
+- **ECHO: Terminal Agents Learn World Models for Free.**
+_Vaishnavi Shrivastava, Piero Kauffmann, Ahmed Awadallah, Dimitris Papailiopoulos._ arXiv 2026/05.
+[![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2605.24517)
+
+- **Terminal Agents Suffice for Enterprise Automation.**
+_Patrice Bechard, Orlando Marquez Ayala, Emily Chen, Jordan Skelton, Sagar Davasam, Srinivas Sunkara, Vikas Yadav, Sai Rajeswar._ arXiv 2026/03.
+[![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2604.00073)
+
+- **Toward Scalable Terminal Task Synthesis via Skill Graphs.**
+_Zhiyuan Fan, Tinghao Yu, Yuanjun Cai, Jiangtao Guan, Yun Yang, Dingxin Hu, Jiang Zhou, Xing Wu, Zhuo Han, Feng Zhang, et al._ arXiv 2026/04.
+[![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2604.25727)
+
+- **What Makes a Good Terminal-Agent Benchmark Task: A Guideline for Adversarial, Difficult, and Legible Evaluation Design.**
+_Ivan Bercovich._ arXiv 2026/04.
+[![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2604.28093)
+
+- **TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents.**
+_Kaijie Zhu, Yuzhou Nie, Yijiang Li, Yiming Huang, Jialian Wu, Jiang Liu, Ximeng Sun, Zhenfei Yin, Lun Wang, Zicheng Liu, et al._ arXiv 2026/02.
+[![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2602.07274) [![GitHub Stars](https://img.shields.io/github/stars/ucsb-mlsec/terminal-bench-env?style=for-the-badge&logo=github&label=GitHub&color=black)](https://github.com/ucsb-mlsec/terminal-bench-env)
+
+- **On Data Engineering for Scaling LLM Terminal Capabilities.**
+_Renjie Pi, Grace Lam, Mohammad Shoeybi, Pooya Jannaty, Bryan Catanzaro, Wei Ping._ arXiv 2026/02.
+[![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2602.21193)
+
+- **Endless Terminals: Scaling RL Environments for Terminal Agents.**
+_Kanishk Gandhi, Shivam Garg, Noah D. Goodman, Dimitris Papailiopoulos._ arXiv 2026/01.
+[![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2601.16443)
+
+- **Large-Scale Terminal Agentic Trajectory Generation from Dockerized Environments.**
+_Siwei Wu, Yizhi Li, Yuyang Song, Wei Zhang, Yang Wang, Riza Batista-Navarro, Xian Yang, Mingjie Tang, Bryan Dai, Jian Yang, et al._ arXiv 2026/02.
+[![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2602.01244) [![GitHub Stars](https://img.shields.io/github/stars/Wusiwei0410/TerminalTraj?style=for-the-badge&logo=github&label=GitHub&color=black)](https://github.com/Wusiwei0410/TerminalTraj)
+
+- **MMTB: Evaluating Terminal Agents on Multimedia-File Tasks.**
+_Chiyeong Heo, Jaechang Kim, Junhyuk Kwon, Hoyoung Kim, Dongmin Park, Jonghyun Lee, Jungseul Ok._ arXiv 2026/05.
+[![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2605.10966)
+
+- **Terminal Wrench: A Dataset of 331 Reward-Hackable Environments and 3,632 Exploit Trajectories.**
+_Ivan Bercovich, Ivgeni Segal, Kexun Zhang, Shashwat Saxena, Aditi Raghunathan, Ziqian Zhong._ arXiv 2026/04.
+[![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2604.17596) [![GitHub Stars](https://img.shields.io/github/stars/few-sh/terminal-wrench?style=for-the-badge&logo=github&label=GitHub&color=black)](https://github.com/few-sh/terminal-wrench)
+
+- **Learning CLI Agents with Structured Action Credit under Selective Observation.**
+_Haoyang Su, Ying Wen._ arXiv 2026/05.
+[![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2605.08013)
+
+- **LiteCoder-Terminal: Scaling Long-Horizon Terminal Environments for Learning Language Agents.**
+_Xiaoxuan Peng, Kaiqi Zhang, Xinyu Lu, Boxi Cao, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun._ arXiv 2026/05.
+[![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2605.29559)
+
+- **What Makes Interaction Trajectories Effective for Training Terminal Agents?**
+_Sidi Yang, Chaofan Tao, Jierun Chen, Tiezheng Yu, Ruoyu Wang, Yuxin Jiang, Yiming Du, Wendong Xu, Jing Xiong, Taiqiang Wu, et al._ arXiv 2026/06.
+[![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2606.03461)
+
+- **A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression.**
+_Jincheng Ren, Siwei Wu, Yizhi Li, Kang Zhu, Shu Xu, Boyu Feng, Ruibin Yuan, Wei Zhang, Riza Batista-Navarro, Jian Yang, et al._ arXiv 2026/04.
+[![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2604.19572) [![GitHub Stars](https://img.shields.io/github/stars/multimodal-art-projection/TACO?style=for-the-badge&logo=github&label=GitHub&color=black)](https://github.com/multimodal-art-projection/TACO)
+
 - **Terminal-Bench: A Benchmark for AI Agents in Terminal Environments.**
 _The Terminal-Bench Team._ 2025.
 [![GitHub Stars](https://img.shields.io/github/stars/laude-institute/terminal-bench?style=for-the-badge&logo=github&label=GitHub&color=black)](https://github.com/laude-institute/terminal-bench) [![Website](https://img.shields.io/website?url=https://www.tbench.ai/&up_message=TBENCH.AI&up_color=blue&down_message=TBENCH.AI&down_color=blue&style=for-the-badge)](https://www.tbench.ai/) ![Benchmark & Dataset](https://img.shields.io/badge/Benchmark_%26_Dataset-F4A261?style=for-the-badge) ![Empirical Study](https://img.shields.io/badge/Empirical_Study-4A90D9?style=for-the-badge)
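The README changes in this commit touch only text between HTML comment markers such as `<!-- START PAPERS SUMMARY -->` / `<!-- END PAPERS SUMMARY -->` and `<!-- START PAPERS:terminal -->`, which suggests these sections are regenerated by a script rather than edited by hand. That script is not part of this diff. The following is a minimal Python sketch of marker-based replacement under that assumption; the helper name `replace_between_markers` is hypothetical, and the `END` marker for the terminal section is assumed to follow the same `<!-- END … -->` pattern.

```python
import re


def replace_between_markers(text: str, name: str, new_body: str) -> str:
    """Swap the content between <!-- START {name} --> and <!-- END {name} -->.

    Hypothetical helper: the repository's actual generator script is not shown here.
    """
    start = f"<!-- START {name} -->"
    end = f"<!-- END {name} -->"
    # Non-greedy match with DOTALL so the body may span many lines
    # but stops at the first END marker after the START marker.
    pattern = re.compile(re.escape(start) + r".*?" + re.escape(end), re.DOTALL)
    if pattern.search(text) is None:
        raise ValueError(f"markers for {name!r} not found")
    # A callable replacement keeps re from interpreting backslashes in new_body.
    return pattern.sub(lambda _m: f"{start}\n{new_body}\n{end}", text, count=1)


readme = (
    "intro\n"
    "<!-- START PAPERS SUMMARY -->\n"
    "old summary\n"
    "<!-- END PAPERS SUMMARY -->\n"
    "outro\n"
)
print(replace_between_markers(readme, "PAPERS SUMMARY", "new summary"))
```

Keeping the markers themselves in the output makes the operation idempotent, so the same section can be regenerated on every data update without disturbing hand-written text around it.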

data/papers_terminal.yaml

Lines changed: 120 additions & 0 deletions
@@ -1,3 +1,123 @@
+- title: 'TerminalWorld: Benchmarking Agents on Real-World Terminal Tasks'
+  authors: Zhaoyang Chu, Jiarui Hu, Xingyu Jiang, Pengyu Zou, Han Li, Chao Peng, Peter O'Hearn, Earl T. Barr, Mark Harman,
+    Federica Sarro, He Ye
+  venue: arXiv 2026/05
+  links:
+    paper: https://arxiv.org/abs/2605.22535
+    github: https://github.com/EuniAI/TerminalWorld
+    website: ''
+- title: 'Terminus-4B: Can a Smaller Model Replace Frontier LLMs at Agentic Execution Tasks?'
+  authors: Spandan Garg, Vikram Nitin, Yufan Huang
+  venue: arXiv 2026/05
+  links:
+    paper: https://arxiv.org/abs/2605.03195
+    github: ''
+    website: ''
+- title: 'ECHO: Terminal Agents Learn World Models for Free'
+  authors: Vaishnavi Shrivastava, Piero Kauffmann, Ahmed Awadallah, Dimitris Papailiopoulos
+  venue: arXiv 2026/05
+  links:
+    paper: https://arxiv.org/abs/2605.24517
+    github: ''
+    website: ''
+- title: Terminal Agents Suffice for Enterprise Automation
+  authors: Patrice Bechard, Orlando Marquez Ayala, Emily Chen, Jordan Skelton, Sagar Davasam, Srinivas Sunkara, Vikas Yadav,
+    Sai Rajeswar
+  venue: arXiv 2026/03
+  links:
+    paper: https://arxiv.org/abs/2604.00073
+    github: ''
+    website: ''
+- title: Toward Scalable Terminal Task Synthesis via Skill Graphs
+  authors: Zhiyuan Fan, Tinghao Yu, Yuanjun Cai, Jiangtao Guan, Yun Yang, Dingxin Hu, Jiang Zhou, Xing Wu, Zhuo Han, Feng
+    Zhang, Lilin Wang
+  venue: arXiv 2026/04
+  links:
+    paper: https://arxiv.org/abs/2604.25727
+    github: ''
+    website: ''
+- title: 'What Makes a Good Terminal-Agent Benchmark Task: A Guideline for Adversarial, Difficult, and Legible Evaluation
+    Design'
+  authors: Ivan Bercovich
+  venue: arXiv 2026/04
+  links:
+    paper: https://arxiv.org/abs/2604.28093
+    github: ''
+    website: ''
+- title: 'TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents'
+  authors: Kaijie Zhu, Yuzhou Nie, Yijiang Li, Yiming Huang, Jialian Wu, Jiang Liu, Ximeng Sun, Zhenfei Yin, Lun Wang, Zicheng
+    Liu, Emad Barsoum, William Yang Wang, Wenbo Guo
+  venue: arXiv 2026/02
+  links:
+    paper: https://arxiv.org/abs/2602.07274
+    github: https://github.com/ucsb-mlsec/terminal-bench-env
+    website: ''
+- title: On Data Engineering for Scaling LLM Terminal Capabilities
+  authors: Renjie Pi, Grace Lam, Mohammad Shoeybi, Pooya Jannaty, Bryan Catanzaro, Wei Ping
+  venue: arXiv 2026/02
+  links:
+    paper: https://arxiv.org/abs/2602.21193
+    github: ''
+    website: ''
+- title: 'Endless Terminals: Scaling RL Environments for Terminal Agents'
+  authors: Kanishk Gandhi, Shivam Garg, Noah D. Goodman, Dimitris Papailiopoulos
+  venue: arXiv 2026/01
+  links:
+    paper: https://arxiv.org/abs/2601.16443
+    github: ''
+    website: ''
+- title: Large-Scale Terminal Agentic Trajectory Generation from Dockerized Environments
+  authors: Siwei Wu, Yizhi Li, Yuyang Song, Wei Zhang, Yang Wang, Riza Batista-Navarro, Xian Yang, Mingjie Tang, Bryan Dai,
+    Jian Yang, Chenghua Lin
+  venue: arXiv 2026/02
+  links:
+    paper: https://arxiv.org/abs/2602.01244
+    github: https://github.com/Wusiwei0410/TerminalTraj
+    website: ''
+- title: 'MMTB: Evaluating Terminal Agents on Multimedia-File Tasks'
+  authors: Chiyeong Heo, Jaechang Kim, Junhyuk Kwon, Hoyoung Kim, Dongmin Park, Jonghyun Lee, Jungseul Ok
+  venue: arXiv 2026/05
+  links:
+    paper: https://arxiv.org/abs/2605.10966
+    github: ''
+    website: ''
+- title: 'Terminal Wrench: A Dataset of 331 Reward-Hackable Environments and 3,632 Exploit Trajectories'
+  authors: Ivan Bercovich, Ivgeni Segal, Kexun Zhang, Shashwat Saxena, Aditi Raghunathan, Ziqian Zhong
+  venue: arXiv 2026/04
+  links:
+    paper: https://arxiv.org/abs/2604.17596
+    github: https://github.com/few-sh/terminal-wrench
+    website: ''
+- title: Learning CLI Agents with Structured Action Credit under Selective Observation
+  authors: Haoyang Su, Ying Wen
+  venue: arXiv 2026/05
+  links:
+    paper: https://arxiv.org/abs/2605.08013
+    github: ''
+    website: ''
+- title: 'LiteCoder-Terminal: Scaling Long-Horizon Terminal Environments for Learning Language Agents'
+  authors: Xiaoxuan Peng, Kaiqi Zhang, Xinyu Lu, Boxi Cao, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun
+  venue: arXiv 2026/05
+  links:
+    paper: https://arxiv.org/abs/2605.29559
+    github: ''
+    website: ''
+- title: What Makes Interaction Trajectories Effective for Training Terminal Agents?
+  authors: Sidi Yang, Chaofan Tao, Jierun Chen, Tiezheng Yu, Ruoyu Wang, Yuxin Jiang, Yiming Du, Wendong Xu, Jing Xiong, Taiqiang
+    Wu, Lifeng Shang, Xiaohui Li, Ngai Wong, Haoli Bai
+  venue: arXiv 2026/06
+  links:
+    paper: https://arxiv.org/abs/2606.03461
+    github: ''
+    website: ''
+- title: A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression
+  authors: Jincheng Ren, Siwei Wu, Yizhi Li, Kang Zhu, Shu Xu, Boyu Feng, Ruibin Yuan, Wei Zhang, Riza Batista-Navarro, Jian
+    Yang, Chenghua Lin
+  venue: arXiv 2026/04
+  links:
+    paper: https://arxiv.org/abs/2604.19572
+    github: https://github.com/multimodal-art-projection/TACO
+    website: ''
 - title: 'Terminal-Bench: A Benchmark for AI Agents in Terminal Environments'
   authors: The Terminal-Bench Team
   venue: '2025'
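Each YAML entry corresponds to one four-line README bullet. Comparing the two files in this commit, the mapping appears to be: the title in bold with a trailing period unless it already ends in `?`; the author list cut to the first ten names plus "et al." when longer (e.g. TerminalWorld lists eleven authors in YAML but ten in the README); the venue; then a paper badge and, when `links.github` is non-empty, a GitHub-stars badge. The Python sketch below reproduces that inferred mapping. The function names and the `MAX_AUTHORS` constant are hypothetical, the input is an already-parsed entry (for example from PyYAML's `yaml.safe_load`), and website or category badges such as those on the Terminal-Bench entry are not modeled.

```python
MAX_AUTHORS = 10  # inferred from this diff, not a documented setting
PAPER_BADGE = (
    "https://img.shields.io/badge/paper-A42C25"
    "?style=for-the-badge&logo=arxiv&logoColor=white"
)
STARS_QUERY = "?style=for-the-badge&logo=github&label=GitHub&color=black"


def format_authors(authors: str) -> str:
    """Keep the first MAX_AUTHORS names; append 'et al' when the list is longer."""
    names = [name.strip() for name in authors.split(",")]
    if len(names) > MAX_AUTHORS:
        return ", ".join(names[:MAX_AUTHORS]) + ", et al"
    return ", ".join(names)


def render_entry(entry: dict) -> str:
    """Render one parsed YAML entry as README lines: title, authors, badges, blank."""
    title = entry["title"]
    if not title.endswith(("?", "!", ".")):
        title += "."
    links = entry.get("links") or {}
    badges = []
    if links.get("paper"):
        badges.append(f"[![Paper]({PAPER_BADGE})]({links['paper']})")
    if links.get("github"):  # empty strings ('') are skipped
        repo = links["github"].removeprefix("https://github.com/")
        stars = f"https://img.shields.io/github/stars/{repo}{STARS_QUERY}"
        badges.append(f"[![GitHub Stars]({stars})]({links['github']})")
    return "\n".join([
        f"- **{title}**",
        f"_{format_authors(entry['authors'])}._ {entry['venue']}.",
        " ".join(badges),
        "",
    ])


entry = {
    "title": "Terminus-4B: Can a Smaller Model Replace Frontier LLMs at Agentic Execution Tasks?",
    "authors": "Spandan Garg, Vikram Nitin, Yufan Huang",
    "venue": "arXiv 2026/05",
    "links": {"paper": "https://arxiv.org/abs/2605.03195", "github": "", "website": ""},
}
print(render_entry(entry))
```

Because the authors line ends with `._`, the truncation suffix is written as `et al` without its own period, which yields `et al._` as in the README rather than a doubled period. `str.removeprefix` requires Python 3.9 or later.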

docs/static/badges/papers.json

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-{"schemaVersion": 1, "label": "Papers", "message": "499", "color": "brightgreen"}
+{"schemaVersion": 1, "label": "Papers", "message": "515", "color": "brightgreen"}
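`papers.json` follows the shields.io endpoint-badge schema (`schemaVersion`, `label`, `message`, `color`), and its `message` has to agree with the README summary count (499 + 16 = 515 in this commit). A minimal Python sketch of regenerating it is shown here. It assumes the total is the sum of entries across all per-category YAML lists, which is an assumption since only `data/papers_terminal.yaml` appears in this diff, and the helper names `total_papers` and `papers_badge` are hypothetical.

```python
import json


def total_papers(entries_by_category: dict[str, list]) -> int:
    """Sum entries over categories (assumes one YAML list per category)."""
    return sum(len(entries) for entries in entries_by_category.values())


def papers_badge(count: int) -> str:
    """Serialize a shields.io endpoint payload in the same shape as papers.json."""
    payload = {
        "schemaVersion": 1,
        "label": "Papers",
        "message": str(count),
        "color": "brightgreen",
    }
    # json.dumps keeps insertion order and, without indent, uses ", " and ": "
    # separators, which matches the one-line layout of the file in this diff.
    return json.dumps(payload)


print(papers_badge(499 + 16))
```

Deriving both the badge and the README summary from the same count avoids the two drifting apart when papers are added or pruned.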

docs/static/badges/papers.svg

Lines changed: 3 additions & 3 deletions

0 commit comments
