feat(data): add 16 manually curated terminal-related papers

Zhaoyang-Chu · Zhaoyang-Chu · commit 892ab3b2a373 · 2026-06-20T21:47:28.000Z
diff --git a/README.md b/README.md
@@ -91,7 +91,7 @@
 <!-- START PAPERS SUMMARY -->
 🔥 **We are actively tracking the frontier research of code agents.**<br>
 🧹 *We periodically curate our collection, retaining only published papers and interesting arXiv preprints from the last six months.*<br>
-📚 *Currently collected:* **`499` papers** — *(Last update: 2026-06-07)*
+📚 *Currently collected:* **`515` papers** — *(Last update: 2026-06-20)*
 <!-- END PAPERS SUMMARY -->
 
 <!-- - [🚀 Products & Tools](#-products--tools)
@@ -731,6 +731,70 @@
 > AI agents that operate within terminal environments, executing shell commands, managing system operations, and automating command-line workflows through natural language interfaces and autonomous task execution.
 
 <!-- START PAPERS:terminal -->
+- **TerminalWorld: Benchmarking Agents on Real-World Terminal Tasks.**  
+  _Zhaoyang Chu, Jiarui Hu, Xingyu Jiang, Pengyu Zou, Han Li, Chao Peng, Peter O'Hearn, Earl T. Barr, Mark Harman, Federica Sarro, et al._ arXiv 2026/05.  
+  [![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2605.22535) [![GitHub Stars](https://img.shields.io/github/stars/EuniAI/TerminalWorld?style=for-the-badge&logo=github&label=GitHub&color=black)](https://github.com/EuniAI/TerminalWorld)
+
+- **Terminus-4B: Can a Smaller Model Replace Frontier LLMs at Agentic Execution Tasks?**  
+  _Spandan Garg, Vikram Nitin, Yufan Huang._ arXiv 2026/05.  
+  [![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2605.03195)
+
+- **ECHO: Terminal Agents Learn World Models for Free.**  
+  _Vaishnavi Shrivastava, Piero Kauffmann, Ahmed Awadallah, Dimitris Papailiopoulos._ arXiv 2026/05.  
+  [![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2605.24517)
+
+- **Terminal Agents Suffice for Enterprise Automation.**  
+  _Patrice Bechard, Orlando Marquez Ayala, Emily Chen, Jordan Skelton, Sagar Davasam, Srinivas Sunkara, Vikas Yadav, Sai Rajeswar._ arXiv 2026/03.  
+  [![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2604.00073)
+
+- **Toward Scalable Terminal Task Synthesis via Skill Graphs.**  
+  _Zhiyuan Fan, Tinghao Yu, Yuanjun Cai, Jiangtao Guan, Yun Yang, Dingxin Hu, Jiang Zhou, Xing Wu, Zhuo Han, Feng Zhang, et al._ arXiv 2026/04.  
+  [![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2604.25727)
+
+- **What Makes a Good Terminal-Agent Benchmark Task: A Guideline for Adversarial, Difficult, and Legible Evaluation Design.**  
+  _Ivan Bercovich._ arXiv 2026/04.  
+  [![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2604.28093)
+
+- **TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents.**  
+  _Kaijie Zhu, Yuzhou Nie, Yijiang Li, Yiming Huang, Jialian Wu, Jiang Liu, Ximeng Sun, Zhenfei Yin, Lun Wang, Zicheng Liu, et al._ arXiv 2026/02.  
+  [![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2602.07274) [![GitHub Stars](https://img.shields.io/github/stars/ucsb-mlsec/terminal-bench-env?style=for-the-badge&logo=github&label=GitHub&color=black)](https://github.com/ucsb-mlsec/terminal-bench-env)
+
+- **On Data Engineering for Scaling LLM Terminal Capabilities.**  
+  _Renjie Pi, Grace Lam, Mohammad Shoeybi, Pooya Jannaty, Bryan Catanzaro, Wei Ping._ arXiv 2026/02.  
+  [![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2602.21193)
+
+- **Endless Terminals: Scaling RL Environments for Terminal Agents.**  
+  _Kanishk Gandhi, Shivam Garg, Noah D. Goodman, Dimitris Papailiopoulos._ arXiv 2026/01.  
+  [![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2601.16443)
+
+- **Large-Scale Terminal Agentic Trajectory Generation from Dockerized Environments.**  
+  _Siwei Wu, Yizhi Li, Yuyang Song, Wei Zhang, Yang Wang, Riza Batista-Navarro, Xian Yang, Mingjie Tang, Bryan Dai, Jian Yang, et al._ arXiv 2026/02.  
+  [![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2602.01244) [![GitHub Stars](https://img.shields.io/github/stars/Wusiwei0410/TerminalTraj?style=for-the-badge&logo=github&label=GitHub&color=black)](https://github.com/Wusiwei0410/TerminalTraj)
+
+- **MMTB: Evaluating Terminal Agents on Multimedia-File Tasks.**  
+  _Chiyeong Heo, Jaechang Kim, Junhyuk Kwon, Hoyoung Kim, Dongmin Park, Jonghyun Lee, Jungseul Ok._ arXiv 2026/05.  
+  [![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2605.10966)
+
+- **Terminal Wrench: A Dataset of 331 Reward-Hackable Environments and 3,632 Exploit Trajectories.**  
+  _Ivan Bercovich, Ivgeni Segal, Kexun Zhang, Shashwat Saxena, Aditi Raghunathan, Ziqian Zhong._ arXiv 2026/04.  
+  [![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2604.17596) [![GitHub Stars](https://img.shields.io/github/stars/few-sh/terminal-wrench?style=for-the-badge&logo=github&label=GitHub&color=black)](https://github.com/few-sh/terminal-wrench)
+
+- **Learning CLI Agents with Structured Action Credit under Selective Observation.**  
+  _Haoyang Su, Ying Wen._ arXiv 2026/05.  
+  [![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2605.08013)
+
+- **LiteCoder-Terminal: Scaling Long-Horizon Terminal Environments for Learning Language Agents.**  
+  _Xiaoxuan Peng, Kaiqi Zhang, Xinyu Lu, Boxi Cao, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun._ arXiv 2026/05.  
+  [![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2605.29559)
+
+- **What Makes Interaction Trajectories Effective for Training Terminal Agents?**  
+  _Sidi Yang, Chaofan Tao, Jierun Chen, Tiezheng Yu, Ruoyu Wang, Yuxin Jiang, Yiming Du, Wendong Xu, Jing Xiong, Taiqiang Wu, et al._ arXiv 2026/06.  
+  [![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2606.03461)
+
+- **A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression.**  
+  _Jincheng Ren, Siwei Wu, Yizhi Li, Kang Zhu, Shu Xu, Boyu Feng, Ruibin Yuan, Wei Zhang, Riza Batista-Navarro, Jian Yang, et al._ arXiv 2026/04.  
+  [![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2604.19572) [![GitHub Stars](https://img.shields.io/github/stars/multimodal-art-projection/TACO?style=for-the-badge&logo=github&label=GitHub&color=black)](https://github.com/multimodal-art-projection/TACO)
+
 - **Terminal-Bench: A Benchmark for AI Agents in Terminal Environments.**  
   _The Terminal-Bench Team._ 2025.  
   [![GitHub Stars](https://img.shields.io/github/stars/laude-institute/terminal-bench?style=for-the-badge&logo=github&label=GitHub&color=black)](https://github.com/laude-institute/terminal-bench) [![Website](https://img.shields.io/website?url=https://www.tbench.ai/&up_message=TBENCH.AI&up_color=blue&down_message=TBENCH.AI&down_color=blue&style=for-the-badge)](https://www.tbench.ai/) ![Benchmark & Dataset](https://img.shields.io/badge/Benchmark_%26_Dataset-F4A261?style=for-the-badge) ![Empirical Study](https://img.shields.io/badge/Empirical_Study-4A90D9?style=for-the-badge)
diff --git a/data/papers_terminal.yaml b/data/papers_terminal.yaml
@@ -1,3 +1,123 @@
+- title: 'TerminalWorld: Benchmarking Agents on Real-World Terminal Tasks'
+  authors: Zhaoyang Chu, Jiarui Hu, Xingyu Jiang, Pengyu Zou, Han Li, Chao Peng, Peter O'Hearn, Earl T. Barr, Mark Harman,
+    Federica Sarro, He Ye
+  venue: arXiv 2026/05
+  links:
+    paper: https://arxiv.org/abs/2605.22535
+    github: https://github.com/EuniAI/TerminalWorld
+    website: ''
+- title: 'Terminus-4B: Can a Smaller Model Replace Frontier LLMs at Agentic Execution Tasks?'
+  authors: Spandan Garg, Vikram Nitin, Yufan Huang
+  venue: arXiv 2026/05
+  links:
+    paper: https://arxiv.org/abs/2605.03195
+    github: ''
+    website: ''
+- title: 'ECHO: Terminal Agents Learn World Models for Free'
+  authors: Vaishnavi Shrivastava, Piero Kauffmann, Ahmed Awadallah, Dimitris Papailiopoulos
+  venue: arXiv 2026/05
+  links:
+    paper: https://arxiv.org/abs/2605.24517
+    github: ''
+    website: ''
+- title: Terminal Agents Suffice for Enterprise Automation
+  authors: Patrice Bechard, Orlando Marquez Ayala, Emily Chen, Jordan Skelton, Sagar Davasam, Srinivas Sunkara, Vikas Yadav,
+    Sai Rajeswar
+  venue: arXiv 2026/03
+  links:
+    paper: https://arxiv.org/abs/2604.00073
+    github: ''
+    website: ''
+- title: Toward Scalable Terminal Task Synthesis via Skill Graphs
+  authors: Zhiyuan Fan, Tinghao Yu, Yuanjun Cai, Jiangtao Guan, Yun Yang, Dingxin Hu, Jiang Zhou, Xing Wu, Zhuo Han, Feng
+    Zhang, Lilin Wang
+  venue: arXiv 2026/04
+  links:
+    paper: https://arxiv.org/abs/2604.25727
+    github: ''
+    website: ''
+- title: 'What Makes a Good Terminal-Agent Benchmark Task: A Guideline for Adversarial, Difficult, and Legible Evaluation
+    Design'
+  authors: Ivan Bercovich
+  venue: arXiv 2026/04
+  links:
+    paper: https://arxiv.org/abs/2604.28093
+    github: ''
+    website: ''
+- title: 'TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents'
+  authors: Kaijie Zhu, Yuzhou Nie, Yijiang Li, Yiming Huang, Jialian Wu, Jiang Liu, Ximeng Sun, Zhenfei Yin, Lun Wang, Zicheng
+    Liu, Emad Barsoum, William Yang Wang, Wenbo Guo
+  venue: arXiv 2026/02
+  links:
+    paper: https://arxiv.org/abs/2602.07274
+    github: https://github.com/ucsb-mlsec/terminal-bench-env
+    website: ''
+- title: On Data Engineering for Scaling LLM Terminal Capabilities
+  authors: Renjie Pi, Grace Lam, Mohammad Shoeybi, Pooya Jannaty, Bryan Catanzaro, Wei Ping
+  venue: arXiv 2026/02
+  links:
+    paper: https://arxiv.org/abs/2602.21193
+    github: ''
+    website: ''
+- title: 'Endless Terminals: Scaling RL Environments for Terminal Agents'
+  authors: Kanishk Gandhi, Shivam Garg, Noah D. Goodman, Dimitris Papailiopoulos
+  venue: arXiv 2026/01
+  links:
+    paper: https://arxiv.org/abs/2601.16443
+    github: ''
+    website: ''
+- title: Large-Scale Terminal Agentic Trajectory Generation from Dockerized Environments
+  authors: Siwei Wu, Yizhi Li, Yuyang Song, Wei Zhang, Yang Wang, Riza Batista-Navarro, Xian Yang, Mingjie Tang, Bryan Dai,
+    Jian Yang, Chenghua Lin
+  venue: arXiv 2026/02
+  links:
+    paper: https://arxiv.org/abs/2602.01244
+    github: https://github.com/Wusiwei0410/TerminalTraj
+    website: ''
+- title: 'MMTB: Evaluating Terminal Agents on Multimedia-File Tasks'
+  authors: Chiyeong Heo, Jaechang Kim, Junhyuk Kwon, Hoyoung Kim, Dongmin Park, Jonghyun Lee, Jungseul Ok
+  venue: arXiv 2026/05
+  links:
+    paper: https://arxiv.org/abs/2605.10966
+    github: ''
+    website: ''
+- title: 'Terminal Wrench: A Dataset of 331 Reward-Hackable Environments and 3,632 Exploit Trajectories'
+  authors: Ivan Bercovich, Ivgeni Segal, Kexun Zhang, Shashwat Saxena, Aditi Raghunathan, Ziqian Zhong
+  venue: arXiv 2026/04
+  links:
+    paper: https://arxiv.org/abs/2604.17596
+    github: https://github.com/few-sh/terminal-wrench
+    website: ''
+- title: Learning CLI Agents with Structured Action Credit under Selective Observation
+  authors: Haoyang Su, Ying Wen
+  venue: arXiv 2026/05
+  links:
+    paper: https://arxiv.org/abs/2605.08013
+    github: ''
+    website: ''
+- title: 'LiteCoder-Terminal: Scaling Long-Horizon Terminal Environments for Learning Language Agents'
+  authors: Xiaoxuan Peng, Kaiqi Zhang, Xinyu Lu, Boxi Cao, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun
+  venue: arXiv 2026/05
+  links:
+    paper: https://arxiv.org/abs/2605.29559
+    github: ''
+    website: ''
+- title: What Makes Interaction Trajectories Effective for Training Terminal Agents?
+  authors: Sidi Yang, Chaofan Tao, Jierun Chen, Tiezheng Yu, Ruoyu Wang, Yuxin Jiang, Yiming Du, Wendong Xu, Jing Xiong, Taiqiang
+    Wu, Lifeng Shang, Xiaohui Li, Ngai Wong, Haoli Bai
+  venue: arXiv 2026/06
+  links:
+    paper: https://arxiv.org/abs/2606.03461
+    github: ''
+    website: ''
+- title: A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression
+  authors: Jincheng Ren, Siwei Wu, Yizhi Li, Kang Zhu, Shu Xu, Boyu Feng, Ruibin Yuan, Wei Zhang, Riza Batista-Navarro, Jian
+    Yang, Chenghua Lin
+  venue: arXiv 2026/04
+  links:
+    paper: https://arxiv.org/abs/2604.19572
+    github: https://github.com/multimodal-art-projection/TACO
+    website: ''
 - title: 'Terminal-Bench: A Benchmark for AI Agents in Terminal Environments'
   authors: The Terminal-Bench Team
   venue: '2025'
diff --git a/docs/static/badges/papers.json b/docs/static/badges/papers.json
@@ -1 +1 @@
-{"schemaVersion": 1, "label": "Papers", "message": "499", "color": "brightgreen"}
+{"schemaVersion": 1, "label": "Papers", "message": "515", "color": "brightgreen"}
diff --git a/docs/static/badges/papers.svg b/docs/static/badges/papers.svg
@@ -1,5 +1,5 @@
-<svg xmlns="http://www.w3.org/2000/svg" width="77" height="20" role="img" aria-label="papers: 499">
-  <title>papers: 499</title>
+<svg xmlns="http://www.w3.org/2000/svg" width="77" height="20" role="img" aria-label="papers: 515">
+  <title>papers: 515</title>
   <linearGradient id="s" x2="0" y2="100%">
     <stop offset="0" stop-color="#bbb" stop-opacity=".1"/>
     <stop offset="1" stop-opacity=".1"/>
@@ -12,6 +12,6 @@
   </g>
   <g fill="#fff" text-anchor="middle" font-family="DejaVu Sans,Verdana,Geneva,Arial,sans-serif" font-size="11">
     <text x="23.0" y="14">papers</text>
-    <text x="61.5" y="14">499</text>
+    <text x="61.5" y="14">515</text>
   </g>
 </svg>
