chore(bot): add 4 paper(s) from review pipeline

Zhaoyang-Chu · Zhaoyang-Chu · commit 0bc82406354e · 2026-06-02T19:00:30.000Z
- MPMWorlds: Material-Point-Method Simulations for Inferring and Extrapolating Phy
- Learning to Retrieve: Dual-Level Long-Term Memory for Text-to-SQL Agents
- BADGER: Bridging Agentic and Deterministic Evaluation for Generative Enterprise
- I-WebGenBench : Evaluating Interactivity in LLM-Generated Scientific Web Applica
diff --git a/README.md b/README.md
@@ -91,7 +91,7 @@
 <!-- START PAPERS SUMMARY -->
 🔥 **We are actively tracking the frontier research of code agents.**<br>
 🧹 *We periodically curate our collection, retaining only published papers and interesting arXiv preprints from the last six months.*<br>
-📚 *Currently collected:* **`494` papers** — *(Last update: 2026-06-01)*
+📚 *Currently collected:* **`498` papers** — *(Last update: 2026-06-02)*
 <!-- END PAPERS SUMMARY -->
 
 <!-- - [🚀 Products & Tools](#-products--tools)
@@ -1464,6 +1464,14 @@ This includes OS kernel code, runtime systems, device drivers, and system-level
 >  Autonomous agents for solving SQL challenges in real-world database systems (_e.g_., query generation and optimization, issue resolution).
 
 <!-- START PAPERS:sql_engineering -->
+- **BADGER: Bridging Agentic and Deterministic Evaluation for Generative Enterprise Reasoning.**  
+  _Shannon Serrao, Soumitra Chatterjee, Dorina Strori, Abhishek Sharma, Nathan Miller._ arXiv 2026/06.  
+  [![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2606.02109) ![Benchmark & Dataset](https://img.shields.io/badge/Benchmark_%26_Dataset-F4A261?style=for-the-badge)
+
+- **Learning to Retrieve: Dual-Level Long-Term Memory for Text-to-SQL Agents.**  
+  _Yibo Wang, Nikki Lijing Kuang, Philip S. Yu, Zhewei Yao, Yuxiong He._ arXiv 2026/05.  
+  [![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2606.00547)
+
 - **Rethinking Agentic Workflows: Evaluating Inference-Based Test-Time Scaling Strategies in Text2SQL Tasks.**  
   _Jiajing Guo, Kenil Patel, Jorge Piazentin Ono, Wenbin He, Liu Ren._ arXiv 2025/10.  
   [![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2510.10885) ![Empirical Study](https://img.shields.io/badge/Empirical_Study-4A90D9?style=for-the-badge)
@@ -1517,6 +1525,10 @@ This category covers Verilog/VHDL/RTL, FPGA kernels, and hardware–software co-
 > Code agents for the automated creation and maintenance of web interfaces and front-end components.
 
 <!-- START PAPERS:website_generation -->
+- **I-WebGenBench : Evaluating Interactivity in LLM-Generated Scientific Web Applications.**  
+  _Dasen Dai, Biao Wu, Meng Fang, Shuoqi Li, Wenhao Wang._ arXiv 2026/05.  
+  [![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2606.00750) ![Benchmark & Dataset](https://img.shields.io/badge/Benchmark_%26_Dataset-F4A261?style=for-the-badge)
+
 - **ReLook: Vision-Grounded RL with a Multimodal LLM Critic for Agentic Web Coding.**  
   _Yuhang Li, Chenchen Zhang, Ruilin Lv, Ao Liu, Ken Deng, Yuanxing Zhang, Jiaheng Liu, Wiggin Zhou, Bo Zhou._ arXiv 2025/10.  
   [![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2510.11498) ![Empirical Study](https://img.shields.io/badge/Empirical_Study-4A90D9?style=for-the-badge)
@@ -1803,6 +1815,10 @@ This category covers Verilog/VHDL/RTL, FPGA kernels, and hardware–software co-
 > Agents designed to autonomously utilize specialized scientific software—such as simulation engines, data analysis suites, and visualization platforms—to automate and enhance domain-specific scientific workflows.
 
 <!-- START PAPERS:scientific_workflows -->
+- **MPMWorlds: Material-Point-Method Simulations for Inferring and Extrapolating Physical Dynamics.**  
+  _Žiga Kovačič, Kevin Ellis._ arXiv 2026/06.  
+  [![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2606.01538) ![Benchmark & Dataset](https://img.shields.io/badge/Benchmark_%26_Dataset-F4A261?style=for-the-badge) ![Empirical Study](https://img.shields.io/badge/Empirical_Study-4A90D9?style=for-the-badge)
+
 - **ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows.**  
   _Qiushi Sun, Zhoumianze Liu, Chang Ma, Zichen Ding, Fangzhi Xu, Zhangyue Yin, Haiteng Zhao, Zhenyu Wu, Kanzhi Cheng, Zhaoyang Liu, et al._ arXiv 2025/06.  
   [![Paper](https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2505.19897) [![GitHub Stars](https://img.shields.io/github/stars/OS-Copilot/ScienceBoard?style=for-the-badge&logo=github&label=GitHub&color=black)](https://github.com/OS-Copilot/ScienceBoard) [![Website](https://img.shields.io/website?url=https://qiushisun.github.io/ScienceBoard-Home/&up_message=SCIENCEBOARD-HOME&up_color=blue&down_message=SCIENCEBOARD-HOME&down_color=blue&style=for-the-badge)](https://qiushisun.github.io/ScienceBoard-Home/) ![Benchmark   Dataset](https://img.shields.io/badge/Benchmark___Dataset-808080?style=for-the-badge)
diff --git a/data/papers_scientific_workflows.yaml b/data/papers_scientific_workflows.yaml
@@ -1,8 +1,23 @@
-- title: "ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows"
-  authors: "Qiushi Sun, Zhoumianze Liu, Chang Ma, Zichen Ding, Fangzhi Xu, Zhangyue Yin, Haiteng Zhao, Zhenyu Wu, Kanzhi Cheng, Zhaoyang Liu, Jianing Wang, Qintong Li, Xiangru Tang, Tianbao Xie, Xiachong Feng, Xiang Li, Ben Kao, Wenhai Wang, Biqing Qi, Lingpeng Kong, Zhiyong Wu"
-  venue: "arXiv 2025/06"
+- title: 'MPMWorlds: Material-Point-Method Simulations for Inferring and Extrapolating Physical Dynamics'
+  authors: Žiga Kovačič, Kevin Ellis
+  venue: arXiv 2026/06
+  summary: This paper introduces MPMWorlds, a dataset of physical simulations used to evaluate the ability of models to infer
+    and extrapolate dynamics from video. It compares code generation and video diffusion approaches, finding that code synthesis
+    provides superior physical stability and temporal consistency for long-term extrapolation.
+  tags:
+  - benchmark
+  - empirical
   links:
-    paper: "https://arxiv.org/abs/2505.19897"
-    github: "https://github.com/OS-Copilot/ScienceBoard"
-    website: "https://qiushisun.github.io/ScienceBoard-Home/"
-  tags: "Benchmark & Dataset"
+    paper: https://arxiv.org/abs/2606.01538
+    github: ''
+    website: ''
+- title: 'ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows'
+  authors: Qiushi Sun, Zhoumianze Liu, Chang Ma, Zichen Ding, Fangzhi Xu, Zhangyue Yin, Haiteng Zhao, Zhenyu Wu, Kanzhi Cheng,
+    Zhaoyang Liu, Jianing Wang, Qintong Li, Xiangru Tang, Tianbao Xie, Xiachong Feng, Xiang Li, Ben Kao, Wenhai Wang, Biqing
+    Qi, Lingpeng Kong, Zhiyong Wu
+  venue: arXiv 2025/06
+  links:
+    paper: https://arxiv.org/abs/2505.19897
+    github: https://github.com/OS-Copilot/ScienceBoard
+    website: https://qiushisun.github.io/ScienceBoard-Home/
+  tags: Benchmark & Dataset
diff --git a/data/papers_sql_engineering.yaml b/data/papers_sql_engineering.yaml
@@ -1,3 +1,25 @@
+- title: 'BADGER: Bridging Agentic and Deterministic Evaluation for Generative Enterprise Reasoning'
+  authors: Shannon Serrao, Soumitra Chatterjee, Dorina Strori, Abhishek Sharma, Nathan Miller
+  venue: arXiv 2026/06
+  summary: BADGER is a unified evaluation framework designed to assess enterprise-grade text-to-SQL systems and multi-step
+    agentic reasoning pipelines. It introduces a hybrid execution accuracy metric that combines LLM-based structural alignment
+    with deterministic scoring to handle complex SQL dialects and numeric tolerances.
+  tags:
+  - benchmark
+  links:
+    paper: https://arxiv.org/abs/2606.02109
+    github: ''
+    website: ''
+- title: 'Learning to Retrieve: Dual-Level Long-Term Memory for Text-to-SQL Agents'
+  authors: Yibo Wang, Nikki Lijing Kuang, Philip S. Yu, Zhewei Yao, Yuxiong He
+  venue: arXiv 2026/05
+  summary: MERIT is a dynamic multi-horizon memory retrieval framework for text-to-SQL agents that utilizes both episode-level
+    and turn-level memory. It employs reinforcement learning and a Process Reward Model to optimize retrieval policies for
+    global strategic guidance and local decision support during multi-turn database interactions.
+  links:
+    paper: https://arxiv.org/abs/2606.00547
+    github: ''
+    website: ''
 - title: 'Rethinking Agentic Workflows: Evaluating Inference-Based Test-Time Scaling Strategies in Text2SQL Tasks'
   authors: Jiajing Guo, Kenil Patel, Jorge Piazentin Ono, Wenbin He, Liu Ren
   venue: arXiv 2025/10
diff --git a/data/papers_website_generation.yaml b/data/papers_website_generation.yaml
@@ -1,3 +1,15 @@
+- title: 'I-WebGenBench : Evaluating Interactivity in LLM-Generated Scientific Web Applications'
+  authors: Dasen Dai, Biao Wu, Meng Fang, Shuoqi Li, Wenhao Wang
+  venue: arXiv 2026/05
+  summary: The paper introduces I-WebGenBench, a benchmark for evaluating the generation of interactive web systems from scientific
+    papers. It also proposes PaperVoyager, a structured generation framework that models interaction logic to synthesize executable
+    web applications that allow users to manipulate inputs and observe dynamic behaviors.
+  tags:
+  - benchmark
+  links:
+    paper: https://arxiv.org/abs/2606.00750
+    github: ''
+    website: ''
 - title: 'ReLook: Vision-Grounded RL with a Multimodal LLM Critic for Agentic Web Coding'
   authors: Yuhang Li, Chenchen Zhang, Ruilin Lv, Ao Liu, Ken Deng, Yuanxing Zhang, Jiaheng Liu, Wiggin Zhou, Bo Zhou
   venue: arXiv 2025/10
diff --git a/docs/static/badges/papers.json b/docs/static/badges/papers.json
@@ -1 +1 @@
-{"schemaVersion": 1, "label": "Papers", "message": "494", "color": "brightgreen"}
+{"schemaVersion": 1, "label": "Papers", "message": "498", "color": "brightgreen"}
diff --git a/docs/static/badges/papers.svg b/docs/static/badges/papers.svg
@@ -1,5 +1,5 @@
-<svg xmlns="http://www.w3.org/2000/svg" width="77" height="20" role="img" aria-label="papers: 494">
-  <title>papers: 494</title>
+<svg xmlns="http://www.w3.org/2000/svg" width="77" height="20" role="img" aria-label="papers: 498">
+  <title>papers: 498</title>
   <linearGradient id="s" x2="0" y2="100%">
     <stop offset="0" stop-color="#bbb" stop-opacity=".1"/>
     <stop offset="1" stop-opacity=".1"/>
@@ -12,6 +12,6 @@
   </g>
   <g fill="#fff" text-anchor="middle" font-family="DejaVu Sans,Verdana,Geneva,Arial,sans-serif" font-size="11">
     <text x="23.0" y="14">papers</text>
-    <text x="61.5" y="14">494</text>
+    <text x="61.5" y="14">498</text>
   </g>
 </svg>
