|
91 | 91 | <!-- START PAPERS SUMMARY --> |
92 | 92 | 🔥 **We are actively tracking the frontier research of code agents.**<br> |
93 | 93 | 🧹 *We periodically curate our collection, retaining only published papers and interesting arXiv preprints from the last six months.*<br> |
94 | | -📚 *Currently collected:* **`499` papers** — *(Last update: 2026-06-07)* |
| 94 | +📚 *Currently collected:* **`515` papers** — *(Last update: 2026-06-20)* |
95 | 95 | <!-- END PAPERS SUMMARY --> |
96 | 96 |
|
97 | 97 | <!-- - [🚀 Products & Tools](#-products--tools) |
|
731 | 731 | > AI agents that operate within terminal environments, executing shell commands, managing system operations, and automating command-line workflows through natural language interfaces and autonomous task execution. |
732 | 732 |
|
733 | 733 | <!-- START PAPERS:terminal --> |
| 734 | +- **TerminalWorld: Benchmarking Agents on Real-World Terminal Tasks.** |
| 735 | + _Zhaoyang Chu, Jiarui Hu, Xingyu Jiang, Pengyu Zou, Han Li, Chao Peng, Peter O'Hearn, Earl T. Barr, Mark Harman, Federica Sarro, et al._ arXiv 2026/05. |
| 736 | + [](https://arxiv.org/abs/2605.22535) [](https://github.com/EuniAI/TerminalWorld) |
| 737 | + |
| 738 | +- **Terminus-4B: Can a Smaller Model Replace Frontier LLMs at Agentic Execution Tasks?** |
| 739 | + _Spandan Garg, Vikram Nitin, Yufan Huang._ arXiv 2026/05. |
| 740 | + [](https://arxiv.org/abs/2605.03195) |
| 741 | + |
| 742 | +- **ECHO: Terminal Agents Learn World Models for Free.** |
| 743 | + _Vaishnavi Shrivastava, Piero Kauffmann, Ahmed Awadallah, Dimitris Papailiopoulos._ arXiv 2026/05. |
| 744 | + [](https://arxiv.org/abs/2605.24517) |
| 745 | + |
| 746 | +- **Terminal Agents Suffice for Enterprise Automation.** |
| 747 | + _Patrice Bechard, Orlando Marquez Ayala, Emily Chen, Jordan Skelton, Sagar Davasam, Srinivas Sunkara, Vikas Yadav, Sai Rajeswar._ arXiv 2026/03. |
| 748 | + [](https://arxiv.org/abs/2604.00073) |
| 749 | + |
| 750 | +- **Toward Scalable Terminal Task Synthesis via Skill Graphs.** |
| 751 | + _Zhiyuan Fan, Tinghao Yu, Yuanjun Cai, Jiangtao Guan, Yun Yang, Dingxin Hu, Jiang Zhou, Xing Wu, Zhuo Han, Feng Zhang, et al._ arXiv 2026/04. |
| 752 | + [](https://arxiv.org/abs/2604.25727) |
| 753 | + |
| 754 | +- **What Makes a Good Terminal-Agent Benchmark Task: A Guideline for Adversarial, Difficult, and Legible Evaluation Design.** |
| 755 | + _Ivan Bercovich._ arXiv 2026/04. |
| 756 | + [](https://arxiv.org/abs/2604.28093) |
| 757 | + |
| 758 | +- **TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents.** |
| 759 | + _Kaijie Zhu, Yuzhou Nie, Yijiang Li, Yiming Huang, Jialian Wu, Jiang Liu, Ximeng Sun, Zhenfei Yin, Lun Wang, Zicheng Liu, et al._ arXiv 2026/02. |
| 760 | + [](https://arxiv.org/abs/2602.07274) [](https://github.com/ucsb-mlsec/terminal-bench-env) |
| 761 | + |
| 762 | +- **On Data Engineering for Scaling LLM Terminal Capabilities.** |
| 763 | + _Renjie Pi, Grace Lam, Mohammad Shoeybi, Pooya Jannaty, Bryan Catanzaro, Wei Ping._ arXiv 2026/02. |
| 764 | + [](https://arxiv.org/abs/2602.21193) |
| 765 | + |
| 766 | +- **Endless Terminals: Scaling RL Environments for Terminal Agents.** |
| 767 | + _Kanishk Gandhi, Shivam Garg, Noah D. Goodman, Dimitris Papailiopoulos._ arXiv 2026/01. |
| 768 | + [](https://arxiv.org/abs/2601.16443) |
| 769 | + |
| 770 | +- **Large-Scale Terminal Agentic Trajectory Generation from Dockerized Environments.** |
| 771 | + _Siwei Wu, Yizhi Li, Yuyang Song, Wei Zhang, Yang Wang, Riza Batista-Navarro, Xian Yang, Mingjie Tang, Bryan Dai, Jian Yang, et al._ arXiv 2026/02. |
| 772 | + [](https://arxiv.org/abs/2602.01244) [](https://github.com/Wusiwei0410/TerminalTraj) |
| 773 | + |
| 774 | +- **MMTB: Evaluating Terminal Agents on Multimedia-File Tasks.** |
| 775 | + _Chiyeong Heo, Jaechang Kim, Junhyuk Kwon, Hoyoung Kim, Dongmin Park, Jonghyun Lee, Jungseul Ok._ arXiv 2026/05. |
| 776 | + [](https://arxiv.org/abs/2605.10966) |
| 777 | + |
| 778 | +- **Terminal Wrench: A Dataset of 331 Reward-Hackable Environments and 3,632 Exploit Trajectories.** |
| 779 | + _Ivan Bercovich, Ivgeni Segal, Kexun Zhang, Shashwat Saxena, Aditi Raghunathan, Ziqian Zhong._ arXiv 2026/04. |
| 780 | + [](https://arxiv.org/abs/2604.17596) [](https://github.com/few-sh/terminal-wrench) |
| 781 | + |
| 782 | +- **Learning CLI Agents with Structured Action Credit under Selective Observation.** |
| 783 | + _Haoyang Su, Ying Wen._ arXiv 2026/05. |
| 784 | + [](https://arxiv.org/abs/2605.08013) |
| 785 | + |
| 786 | +- **LiteCoder-Terminal: Scaling Long-Horizon Terminal Environments for Learning Language Agents.** |
| 787 | + _Xiaoxuan Peng, Kaiqi Zhang, Xinyu Lu, Boxi Cao, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun._ arXiv 2026/05. |
| 788 | + [](https://arxiv.org/abs/2605.29559) |
| 789 | + |
| 790 | +- **What Makes Interaction Trajectories Effective for Training Terminal Agents?** |
| 791 | + _Sidi Yang, Chaofan Tao, Jierun Chen, Tiezheng Yu, Ruoyu Wang, Yuxin Jiang, Yiming Du, Wendong Xu, Jing Xiong, Taiqiang Wu, et al._ arXiv 2026/06. |
| 792 | + [](https://arxiv.org/abs/2606.03461) |
| 793 | + |
| 794 | +- **A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression.** |
| 795 | + _Jincheng Ren, Siwei Wu, Yizhi Li, Kang Zhu, Shu Xu, Boyu Feng, Ruibin Yuan, Wei Zhang, Riza Batista-Navarro, Jian Yang, et al._ arXiv 2026/04. |
| 796 | + [](https://arxiv.org/abs/2604.19572) [](https://github.com/multimodal-art-projection/TACO) |
| 797 | + |
734 | 798 | - **Terminal-Bench: A Benchmark for AI Agents in Terminal Environments.** |
735 | 799 | _The Terminal-Bench Team._ 2025. |
736 | 800 | [](https://github.com/laude-institute/terminal-bench) [](https://www.tbench.ai/)   |
|