Commit 0bc8240

chore(bot): add 4 paper(s) from review pipeline
- MPMWorlds: Material-Point-Method Simulations for Inferring and Extrapolating Phy
- Learning to Retrieve: Dual-Level Long-Term Memory for Text-to-SQL Agents
- BADGER: Bridging Agentic and Deterministic Evaluation for Generative Enterprise
- I-WebGenBench : Evaluating Interactivity in LLM-Generated Scientific Web Applica
1 parent 5aa2529 commit 0bc8240

6 files changed

Lines changed: 77 additions & 12 deletions

README.md

Lines changed: 17 additions & 1 deletion
@@ -91,7 +91,7 @@
 <!-- START PAPERS SUMMARY -->
 🔥 **We are actively tracking the frontier research of code agents.**<br>
 🧹 *We periodically curate our collection, retaining only published papers and interesting arXiv preprints from the last six months.*<br>
-📚 *Currently collected:* **`494` papers***(Last update: 2026-06-01)*
+📚 *Currently collected:* **`498` papers***(Last update: 2026-06-02)*
 <!-- END PAPERS SUMMARY -->
 
 <!-- - [🚀 Products & Tools](#-products--tools)
@@ -1464,6 +1464,14 @@ This includes OS kernel code, runtime systems, device drivers, and system-level
 > Autonomous agents for solving SQL challenges in real-world database systems (_e.g_., query generation and optimization, issue resolution).
 
 <!-- START PAPERS:sql_engineering -->
+- **BADGER: Bridging Agentic and Deterministic Evaluation for Generative Enterprise Reasoning.**
+_Shannon Serrao, Soumitra Chatterjee, Dorina Strori, Abhishek Sharma, Nathan Miller._ arXiv 2026/06.
+[![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2606.02109) ![Benchmark & Dataset](https://img.shields.io/badge/Benchmark_%26_Dataset-F4A261?style=for-the-badge)
+
+- **Learning to Retrieve: Dual-Level Long-Term Memory for Text-to-SQL Agents.**
+_Yibo Wang, Nikki Lijing Kuang, Philip S. Yu, Zhewei Yao, Yuxiong He._ arXiv 2026/05.
+[![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2606.00547)
+
 - **Rethinking Agentic Workflows: Evaluating Inference-Based Test-Time Scaling Strategies in Text2SQL Tasks.**
 _Jiajing Guo, Kenil Patel, Jorge Piazentin Ono, Wenbin He, Liu Ren._ arXiv 2025/10.
 [![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2510.10885) ![Empirical Study](https://img.shields.io/badge/Empirical_Study-4A90D9?style=for-the-badge)
@@ -1517,6 +1525,10 @@ This category covers Verilog/VHDL/RTL, FPGA kernels, and hardware–software co-
 > Code agents for the automated creation and maintenance of web interfaces and front-end components.
 
 <!-- START PAPERS:website_generation -->
+- **I-WebGenBench : Evaluating Interactivity in LLM-Generated Scientific Web Applications.**
+_Dasen Dai, Biao Wu, Meng Fang, Shuoqi Li, Wenhao Wang._ arXiv 2026/05.
+[![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2606.00750) ![Benchmark & Dataset](https://img.shields.io/badge/Benchmark_%26_Dataset-F4A261?style=for-the-badge)
+
 - **ReLook: Vision-Grounded RL with a Multimodal LLM Critic for Agentic Web Coding.**
 _Yuhang Li, Chenchen Zhang, Ruilin Lv, Ao Liu, Ken Deng, Yuanxing Zhang, Jiaheng Liu, Wiggin Zhou, Bo Zhou._ arXiv 2025/10.
 [![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2510.11498) ![Empirical Study](https://img.shields.io/badge/Empirical_Study-4A90D9?style=for-the-badge)
@@ -1803,6 +1815,10 @@ This category covers Verilog/VHDL/RTL, FPGA kernels, and hardware–software co-
 > Agents designed to autonomously utilize specialized scientific software—such as simulation engines, data analysis suites, and visualization platforms—to automate and enhance domain-specific scientific workflows.
 
 <!-- START PAPERS:scientific_workflows -->
+- **MPMWorlds: Material-Point-Method Simulations for Inferring and Extrapolating Physical Dynamics.**
+_Žiga Kovačič, Kevin Ellis._ arXiv 2026/06.
+[![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2606.01538) ![Benchmark & Dataset](https://img.shields.io/badge/Benchmark_%26_Dataset-F4A261?style=for-the-badge) ![Empirical Study](https://img.shields.io/badge/Empirical_Study-4A90D9?style=for-the-badge)
+
 - **ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows.**
 _Qiushi Sun, Zhoumianze Liu, Chang Ma, Zichen Ding, Fangzhi Xu, Zhangyue Yin, Haiteng Zhao, Zhenyu Wu, Kanzhi Cheng, Zhaoyang Liu, et al._ arXiv 2025/06.
 [![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2505.19897) [![GitHub Stars](https://img.shields.io/github/stars/OS-Copilot/ScienceBoard?style=for-the-badge&logo=github&label=GitHub&color=black)](https://github.com/OS-Copilot/ScienceBoard) [![Website](https://img.shields.io/website?url=https://qiushisun.github.io/ScienceBoard-Home/&up_message=SCIENCEBOARD-HOME&up_color=blue&down_message=SCIENCEBOARD-HOME&down_color=blue&style=for-the-badge)](https://qiushisun.github.io/ScienceBoard-Home/) ![Benchmark Dataset](https://img.shields.io/badge/Benchmark___Dataset-808080?style=for-the-badge)
Lines changed: 22 additions & 7 deletions
@@ -1,8 +1,23 @@
-- title: "ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows"
-  authors: "Qiushi Sun, Zhoumianze Liu, Chang Ma, Zichen Ding, Fangzhi Xu, Zhangyue Yin, Haiteng Zhao, Zhenyu Wu, Kanzhi Cheng, Zhaoyang Liu, Jianing Wang, Qintong Li, Xiangru Tang, Tianbao Xie, Xiachong Feng, Xiang Li, Ben Kao, Wenhai Wang, Biqing Qi, Lingpeng Kong, Zhiyong Wu"
-  venue: "arXiv 2025/06"
+- title: 'MPMWorlds: Material-Point-Method Simulations for Inferring and Extrapolating Physical Dynamics'
+  authors: Žiga Kovačič, Kevin Ellis
+  venue: arXiv 2026/06
+  summary: This paper introduces MPMWorlds, a dataset of physical simulations used to evaluate the ability of models to infer
+    and extrapolate dynamics from video. It compares code generation and video diffusion approaches, finding that code synthesis
+    provides superior physical stability and temporal consistency for long-term extrapolation.
+  tags:
+  - benchmark
+  - empirical
   links:
-    paper: "https://arxiv.org/abs/2505.19897"
-    github: "https://github.com/OS-Copilot/ScienceBoard"
-    website: "https://qiushisun.github.io/ScienceBoard-Home/"
-  tags: "Benchmark & Dataset"
+    paper: https://arxiv.org/abs/2606.01538
+    github: ''
+    website: ''
+- title: 'ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows'
+  authors: Qiushi Sun, Zhoumianze Liu, Chang Ma, Zichen Ding, Fangzhi Xu, Zhangyue Yin, Haiteng Zhao, Zhenyu Wu, Kanzhi Cheng,
+    Zhaoyang Liu, Jianing Wang, Qintong Li, Xiangru Tang, Tianbao Xie, Xiachong Feng, Xiang Li, Ben Kao, Wenhai Wang, Biqing
+    Qi, Lingpeng Kong, Zhiyong Wu
+  venue: arXiv 2025/06
+  links:
+    paper: https://arxiv.org/abs/2505.19897
+    github: https://github.com/OS-Copilot/ScienceBoard
+    website: https://qiushisun.github.io/ScienceBoard-Home/
+  tags: Benchmark & Dataset

data/papers_sql_engineering.yaml

Lines changed: 22 additions & 0 deletions
@@ -1,3 +1,25 @@
+- title: 'BADGER: Bridging Agentic and Deterministic Evaluation for Generative Enterprise Reasoning'
+  authors: Shannon Serrao, Soumitra Chatterjee, Dorina Strori, Abhishek Sharma, Nathan Miller
+  venue: arXiv 2026/06
+  summary: BADGER is a unified evaluation framework designed to assess enterprise-grade text-to-SQL systems and multi-step
+    agentic reasoning pipelines. It introduces a hybrid execution accuracy metric that combines LLM-based structural alignment
+    with deterministic scoring to handle complex SQL dialects and numeric tolerances.
+  tags:
+  - benchmark
+  links:
+    paper: https://arxiv.org/abs/2606.02109
+    github: ''
+    website: ''
+- title: 'Learning to Retrieve: Dual-Level Long-Term Memory for Text-to-SQL Agents'
+  authors: Yibo Wang, Nikki Lijing Kuang, Philip S. Yu, Zhewei Yao, Yuxiong He
+  venue: arXiv 2026/05
+  summary: MERIT is a dynamic multi-horizon memory retrieval framework for text-to-SQL agents that utilizes both episode-level
+    and turn-level memory. It employs reinforcement learning and a Process Reward Model to optimize retrieval policies for
+    global strategic guidance and local decision support during multi-turn database interactions.
+  links:
+    paper: https://arxiv.org/abs/2606.00547
+    github: ''
+    website: ''
 - title: 'Rethinking Agentic Workflows: Evaluating Inference-Based Test-Time Scaling Strategies in Text2SQL Tasks'
   authors: Jiajing Guo, Kenil Patel, Jorge Piazentin Ono, Wenbin He, Liu Ren
   venue: arXiv 2025/10

data/papers_website_generation.yaml

Lines changed: 12 additions & 0 deletions
@@ -1,3 +1,15 @@
+- title: 'I-WebGenBench : Evaluating Interactivity in LLM-Generated Scientific Web Applications'
+  authors: Dasen Dai, Biao Wu, Meng Fang, Shuoqi Li, Wenhao Wang
+  venue: arXiv 2026/05
+  summary: The paper introduces I-WebGenBench, a benchmark for evaluating the generation of interactive web systems from scientific
+    papers. It also proposes PaperVoyager, a structured generation framework that models interaction logic to synthesize executable
+    web applications that allow users to manipulate inputs and observe dynamic behaviors.
+  tags:
+  - benchmark
+  links:
+    paper: https://arxiv.org/abs/2606.00750
+    github: ''
+    website: ''
 - title: 'ReLook: Vision-Grounded RL with a Multimodal LLM Critic for Agentic Web Coding'
   authors: Yuhang Li, Chenchen Zhang, Ruilin Lv, Ao Liu, Ken Deng, Yuanxing Zhang, Jiaheng Liu, Wiggin Zhou, Bo Zhou
   venue: arXiv 2025/10
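
The YAML records and the README entries in this commit appear to carry the same fields, which suggests the README lines are rendered from the data files. The following is a minimal sketch of such a rendering step, not the repository's actual generator: the function name `render_entry`, the `TAG_BADGES` mapping, and the lack of line indentation are assumptions inferred from the entries shown in this diff. Only the badge URLs are copied from the README lines above.

```python
# Badge URLs are copied from the README lines in this commit. The mapping from
# YAML tags to badges is an assumption inferred from the entries in the diff.
PAPER_BADGE = (
    "https://img.shields.io/badge/paper-A42C25"
    "?style=for-the-badge&logo=arxiv&logoColor=white"
)
TAG_BADGES = {
    "benchmark": (
        "Benchmark & Dataset",
        "https://img.shields.io/badge/Benchmark_%26_Dataset-F4A261?style=for-the-badge",
    ),
    "empirical": (
        "Empirical Study",
        "https://img.shields.io/badge/Empirical_Study-4A90D9?style=for-the-badge",
    ),
}


def render_entry(entry):
    """Render one paper record (a dict shaped like the YAML entries) as README lines."""
    # The paper badge is always present and links to the arXiv page.
    badges = ["[![Paper]({})]({})".format(PAPER_BADGE, entry["links"]["paper"])]
    # Only list-style tags with a known badge are rendered; unknown tags are skipped.
    for tag in entry.get("tags", []):
        if tag in TAG_BADGES:
            alt, url = TAG_BADGES[tag]
            badges.append("![{}]({})".format(alt, url))
    return [
        "- **{}.**".format(entry["title"]),
        "_{}._ {}.".format(entry["authors"], entry["venue"]),
        " ".join(badges),
    ]


entry = {
    "title": "I-WebGenBench : Evaluating Interactivity in LLM-Generated Scientific Web Applications",
    "authors": "Dasen Dai, Biao Wu, Meng Fang, Shuoqi Li, Wenhao Wang",
    "venue": "arXiv 2026/05",
    "tags": ["benchmark"],
    "links": {"paper": "https://arxiv.org/abs/2606.00750", "github": "", "website": ""},
}
print("\n".join(render_entry(entry)))
```

The sketch covers only the new list-style `tags` field. Older records such as ScienceBoard use a single string tag plus GitHub and website badges, and their long author lists are shortened with "et al." in the README; none of that is handled here.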

docs/static/badges/papers.json

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-{"schemaVersion": 1, "label": "Papers", "message": "494", "color": "brightgreen"}
+{"schemaVersion": 1, "label": "Papers", "message": "498", "color": "brightgreen"}
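
The `papers.json` payload has the shape of a Shields.io endpoint badge (`schemaVersion`, `label`, `message`, `color`). As a minimal sketch of how the bot might regenerate it after adding papers, assuming a hypothetical helper named `build_papers_badge` that is not taken from the repository:

```python
import json


def build_papers_badge(count):
    """Serialize an endpoint-badge payload for the paper counter.

    The field names and values mirror the JSON line in this diff; the function
    itself is a hypothetical helper, not code from the repository.
    """
    payload = {
        "schemaVersion": 1,
        "label": "Papers",
        "message": str(count),  # the diff stores the count as a string
        "color": "brightgreen",
    }
    return json.dumps(payload)


previous_count = 494
added_papers = 4
print(build_papers_badge(previous_count + added_papers))
```

With the counts from this commit (494 papers plus 4 additions), the sketch reproduces the updated line with `"message": "498"`.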

docs/static/badges/papers.svg

Lines changed: 3 additions & 3 deletions