
Commit 1f8785a (initial commit)

feat: Initialize MkDocs documentation website

Set up comprehensive documentation website for the Awesome Issue Resolution survey:

- MkDocs with Material theme and custom styling
- 5 documentation pages (Home, Paper, Tables, About, Cite)
- Automated deployment via GitHub Actions
- Survey of 135 papers on LLM-based issue resolution
- Interactive taxonomy covering Data, Methods, and Analysis
- Statistical tables for datasets and models

Deployment: https://DeepSoftwareAnalytics.github.io/Awesome-Issue-Resolution/

File tree: 94 files changed, 19,163 additions, 0 deletions


.github/workflows/deploy.yml

Lines changed: 72 additions & 0 deletions

```yaml
name: Deploy MkDocs to GitHub Pages

on:
  push:
    branches:
      - main
      - master
  pull_request:
    branches:
      - main
      - master
  workflow_dispatch:

permissions:
  contents: read
  pages: write
  id-token: write

concurrency:
  group: "pages"
  cancel-in-progress: false

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Cache pip dependencies
        uses: actions/cache@v4
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}
          restore-keys: |
            ${{ runner.os }}-pip-

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt

      - name: Build MkDocs site
        run: mkdocs build --strict

      - name: Setup Pages
        uses: actions/configure-pages@v4

      - name: Upload artifact
        uses: actions/upload-pages-artifact@v3
        with:
          path: ./site

  deploy:
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    needs: build
    if: github.ref == 'refs/heads/main' || github.ref == 'refs/heads/master'
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4
```

README.md

Lines changed: 216 additions & 0 deletions

# ✨ Awesome Issue Resolution

<div align="center">

**Advances, Frontiers, and Future of Issue Resolution in Software Engineering: A Comprehensive Survey**

[![GitHub Stars](https://img.shields.io/github/stars/DeepSoftwareAnalytics/Awesome-Issue-Resolution?style=for-the-badge&logo=github&color=4c1)](https://github.com/DeepSoftwareAnalytics/Awesome-Issue-Resolution)
[![Forks](https://img.shields.io/github/forks/DeepSoftwareAnalytics/Awesome-Issue-Resolution?style=for-the-badge&logo=github&color=blue)](https://github.com/DeepSoftwareAnalytics/Awesome-Issue-Resolution/fork)
[![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome)
[![Paper](https://img.shields.io/badge/PAPER-PDF-4285F4?style=for-the-badge&logo=googledocs&logoColor=white)](https://deepsoftwareanalytics.github.io/Awesome-Issue-Resolution/paper/)
[![arXiv](https://img.shields.io/badge/arXiv-2501.XXXXX-B31B1B?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/XXXX.XXXXX)
[![Tables](https://img.shields.io/badge/TABLES-Statistics-blue?style=for-the-badge&logo=databricks)](https://deepsoftwareanalytics.github.io/Awesome-Issue-Resolution/tables/)
[![Contributors](https://img.shields.io/github/contributors/DeepSoftwareAnalytics/Awesome-Issue-Resolution?style=for-the-badge&color=green&logo=github)](https://github.com/DeepSoftwareAnalytics/Awesome-Issue-Resolution/graphs/contributors)
![Papers Count](https://img.shields.io/badge/papers-135-green?style=for-the-badge&logo=googlescholar&logoColor=white)

[**📖 Documentation Website**](https://deepsoftwareanalytics.github.io/Awesome-Issue-Resolution/) | [**📄 Full Paper**](https://deepsoftwareanalytics.github.io/Awesome-Issue-Resolution/paper/) | [**📋 Tables & Resources**](https://deepsoftwareanalytics.github.io/Awesome-Issue-Resolution/tables/)

<img src="docs/images/awesome-issue-resolution.png" alt="Awesome Issue Resolution" width="60%">

</div>

---

## 📖 Abstract

Based on a systematic review of **135 publications**, this survey establishes a holistic theoretical framework for Issue Resolution in software engineering. We examine how **Large Language Models (LLMs)** are transforming the automation of GitHub issue resolution. Beyond the theoretical analysis, we have curated a comprehensive collection of datasets and model training resources, which are continuously synchronized with our GitHub repository and project documentation website.

**🔍 Explore This Survey:**

- 📊 **[Data](https://deepsoftwareanalytics.github.io/Awesome-Issue-Resolution/#data)**: Evaluation and training datasets, data collection and synthesis methods
- 🛠️ **[Methods](https://deepsoftwareanalytics.github.io/Awesome-Issue-Resolution/#methods)**: Training-free (agent/workflow) and training-based (SFT/RL) approaches
- 🔍 **[Analysis](https://deepsoftwareanalytics.github.io/Awesome-Issue-Resolution/#analysis)**: Insights into both data characteristics and method performance
- 📋 **[Tables & Resources](https://deepsoftwareanalytics.github.io/Awesome-Issue-Resolution/tables/)**: Comprehensive statistical tables and resources
- 📄 **[Full Paper](https://deepsoftwareanalytics.github.io/Awesome-Issue-Resolution/paper/)**: Read the complete survey paper

---

## 📊 Data

### Evaluation Datasets

We comprehensively survey evaluation benchmarks for issue resolution, categorizing them by programming language, multimodal support, and reproducible execution environments.

**Key Datasets:**

- **SWE-bench**: Python-based benchmark with 2,294 real-world issues from 12 repositories
- **SWE-bench Lite**: Curated subset of 300 high-quality instances
- **Multi-SWE-bench**: Multilingual extension covering 7+ programming languages
- **SWE-bench Multimodal**: Incorporates visual elements (JS, TS, HTML, CSS)
- **Visual SWE-bench**: Focus on vision-intensive issue resolution

[**→ Explore all evaluation datasets**](https://deepsoftwareanalytics.github.io/Awesome-Issue-Resolution/#evaluation-datasets)

### Training Datasets

We analyze trajectory datasets used for agent training, including both human-annotated and synthetically generated examples.

**Notable Resources:**

- **R2E-Gym**: 3,321 trajectories for reinforcement learning
- **SWE-Gym**: 491 expert trajectories for supervised fine-tuning
- **SWE-Fixer**: Large-scale dataset with 69,752 editing chains of thought

[**→ Explore training datasets**](https://deepsoftwareanalytics.github.io/Awesome-Issue-Resolution/#training-datasets)

---

## 🛠️ Methods

### Training-Free Approaches

#### Agent-Based Methods

Autonomous agents that leverage tool use, memory, and planning to resolve issues without task-specific training.

**Representative Works:**

- **OpenHands**: Multi-agent collaboration framework
- **Agentless**: Localization + repair pipeline without agent loops
- **AutoCodeRover**: Hierarchical search-based code navigation

#### Workflow-Based Methods

Structured pipelines that optimize specific stages of issue resolution.

**Key Innovations:**

- **Meta-RAG**: Code summarization for enhanced retrieval
- **TestAider**: Test-driven development integration
- **PatchPilot**: Automated patch validation and refinement

[**→ Explore training-free methods**](https://deepsoftwareanalytics.github.io/Awesome-Issue-Resolution/#training-free-approaches)

### Training-Based Approaches

#### Supervised Fine-Tuning (SFT)

Models trained on expert trajectories to internalize issue resolution patterns.

**Notable Models:**

- **Devstral (22B)**: 46.8% on SWE-bench Verified
- **Co-PatcheR (14B)**: Multi-stage training with a code-editing focus
- **SWE-Swiss (32B)**: Synthetic data augmentation for improved generalization

#### Reinforcement Learning (RL)

Models optimized through environmental feedback and reward signals.

**State-of-the-Art:**

- **OpenHands Critic (32B)**: 66.4% on SWE-bench Verified
- **Kimi-Dev (72B)**: 60.4% with outcome-based rewards
- **DeepSWE (32B)**: Trained from scratch using RL on code repositories

[**→ Explore training-based methods**](https://deepsoftwareanalytics.github.io/Awesome-Issue-Resolution/#training-based-approaches)

---

## 🔍 Analysis

### Data Analysis

- **Quality vs. Quantity**: Analysis of dataset characteristics and their impact on model performance
- **Contamination Detection**: Protocols for ensuring benchmark integrity
- **Difficulty Spectrum**: Stratification of issues by complexity

### Methods Analysis

- **Performance Trends**: Comparative evaluation across model families and sizes
- **Scaling Laws**: Analysis of parameter count vs. performance gains
- **Efficiency Metrics**: Cost-benefit analysis of different approaches

[**→ Explore detailed analysis**](https://deepsoftwareanalytics.github.io/Awesome-Issue-Resolution/#analysis)

---

## 🚀 Challenges and Opportunities

### 🔧 High computational overhead

The scalability of SWE agents is bottlenecked by the high costs of sandboxed environments and long-context inference. Optimization strategies are required to streamline these resource-intensive loops without sacrificing performance.

### 📉 Opacity in resource consumption

Benchmarks often overlook efficiency, masking the high costs of techniques like inference-time scaling. Standardized reporting of latency and token usage is crucial for guiding the development of cost-effective agents.

### 🖼️ Limited visually-grounded reasoning

Reliance on text proxies for UI interpretation limits effectiveness. Future research can adopt intrinsic multi-modal solutions, such as code-centric MLLMs, to better bridge the gap between visual rendering and underlying code logic.

### 🛡️ Safety risks in autonomous resolution

High autonomy carries risks of destructive actions, such as accidental code deletion. Future systems should integrate safeguards, such as Git-based version control, to ensure autonomous modifications remain secure and reversible.

### 🎯 Lack of fine-grained reward signals

Reinforcement learning is hindered by sparse, binary feedback. Integrating fine-grained signals from compiler diagnostics and execution traces is necessary to guide models through complex reasoning steps.

### 🔍 Data leakage and contamination

As benchmarks approach saturation, evaluation validity is compromised by data leakage. Future frameworks must strictly enforce decontamination protocols to ensure fairness and reliability.

### 🌐 Lack of universality across SE domains

While current issue resolution tasks mirror development workflows, they represent only a fraction of the full Software Development Life Cycle (SDLC). Future research should broaden the scope of issue resolution tasks to develop more versatile automated software generation methods.

---

## 📋 Tables & Resources

Visit our [**Tables & Resources**](https://deepsoftwareanalytics.github.io/Awesome-Issue-Resolution/tables/) page for comprehensive statistical tables, including:

- 📊 **Evaluation Datasets Overview**: Detailed comparison of 30+ benchmarks
- 🎯 **Training Trajectory Datasets**: Analysis of 5 major trajectory datasets
- 🔧 **Supervised Fine-Tuning Models**: Performance metrics for 10+ SFT models
- 🤖 **Reinforcement Learning Models**: Comprehensive analysis of 30+ RL-trained models
- 🌟 **General Foundation Models**: Evaluation of 15+ general-purpose LLMs

---

## 🤝 Contributing

We welcome contributions to this survey! If you'd like to add new papers or fix errors:

1. Fork this repository
2. Add paper entries to the corresponding YAML file under the `data/` directory (e.g., `papers_evaluation_datasets.yaml`, `papers_single_agent.yaml`, etc.)
3. Follow the existing format with fields: `short_name`, `title`, `authors`, `venue`, `year`, and `links` (arxiv, github, huggingface)
4. Run `python scripts/render_papers.py` to update the documentation
5. Submit a PR with your changes
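
For illustration, a new entry following that format might look like this (the paper name, authors, and URLs below are placeholders, not a real publication):

```yaml
- short_name: "MyBench"
  title: "MyBench: A Hypothetical Benchmark for Issue Resolution"
  authors: "Ada Lovelace, Grace Hopper"
  venue: "arXiv 2025"
  year: "2025"
  links:
    arxiv: "https://arxiv.org/abs/XXXX.XXXXX"
    github: "https://github.com/example/mybench"
```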

---

## 📄 Citation

If you use this project or the related survey in your research or system, please cite the following BibTeX:

```bibtex
@misc{li2025awesome_issue_resolution,
  title        = {Advances, Frontiers, and Future of Issue Resolution in Software Engineering: A Comprehensive Survey},
  author       = {Caihua Li and Lianghong Guo and Yanlin Wang and Wei Tao and Zhenyu Shan and Mingwei Liu and Jiachi Chen and Haoyu Song and Duyu Tang and Hongyu Zhang and Zibin Zheng},
  year         = {2025},
  howpublished = {\url{https://github.com/DeepSoftwareAnalytics/Awesome-Issue-Resolution}}
}
```

Once the survey is published on arXiv or at a conference, please replace this entry with the official citation information (authors, DOI/arXiv ID, conference name, etc.).

---

## 📬 Contact

If you have any questions or suggestions, please contact us through:

- 📧 **Email**: [noranotdor4@gmail.com](mailto:noranotdor4@gmail.com)
- 💬 **GitHub Issues**: [Open an issue](https://github.com/DeepSoftwareAnalytics/Awesome-Issue-Resolution/issues)

---

## 📜 License

This project is licensed under the MIT License; see the [LICENSE](LICENSE) file for details.

---

<div align="center">

**⭐ Star this repository if you find it helpful!**

Made with ❤️ by the [DeepSoftwareAnalytics](https://github.com/DeepSoftwareAnalytics) team

[Documentation](https://deepsoftwareanalytics.github.io/Awesome-Issue-Resolution/) | [Paper](https://deepsoftwareanalytics.github.io/Awesome-Issue-Resolution/paper/) | [Tables](https://deepsoftwareanalytics.github.io/Awesome-Issue-Resolution/tables/) | [About](https://deepsoftwareanalytics.github.io/Awesome-Issue-Resolution/about/) | [Cite](https://deepsoftwareanalytics.github.io/Awesome-Issue-Resolution/cite/)

</div>

data/papers_data_analysis.yaml

Lines changed: 82 additions & 0 deletions

```yaml
# Data Analysis
# Auto-generated from papers_raw/taxonomy.tex and papers_raw/main.bib

- short_name: "SWE-bench Verified"
  title: "Introducing SWE-bench Verified | OpenAI"
  authors: "OpenAI"
  venue: "arXiv 2024"
  year: "2024"
  links:
    arxiv: "https://openai.com/index/introducing-swe-bench-verified/"

- short_name: "SWE-Bench+"
  title: "SWE-Bench+: Enhanced Coding Benchmark for LLMs"
  authors: "Reem Aleithan, Haoran Xue, Mohammad Mahdi Mohajer, Elijah Nnorom, Gias Uddin, Song Wang"
  venue: "arXiv 2024"
  year: "2024"
  links:
    arxiv: "https://arxiv.org/abs/2410.06992"

- short_name: "Patch Correctness"
  title: "Are \"Solved Issues\" in SWE-bench Really Solved Correctly? An Empirical Study"
  authors: "You Wang, Michael Pradel, Zhongxin Liu"
  venue: "arXiv 2025"
  year: "2025"
  links:
    arxiv: "http://arxiv.org/abs/2503.15223"

- short_name: "UTBoost"
  title: "UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench"
  authors: "Boxi Yu, Yuxuan Zhu, Pinjia He, Daniel Kang"
  venue: "arXiv 2025"
  year: "2025"
  links:
    arxiv: "https://arxiv.org/abs/2506.09289"

- short_name: "Trustworthiness"
  title: "Is Your Automated Software Engineer Trustworthy?"
  authors: "Noble Saji Mathews, Meiyappan Nagappan"
  venue: "arXiv 2025"
  year: "2025"
  links:
    arxiv: "https://arxiv.org/abs/2506.17812"

- short_name: "Rigorous agentic benchmarks"
  title: "Establishing Best Practices for Building Rigorous Agentic Benchmarks"
  authors: "Yuxuan Zhu, Tengjun Jin, Yada Pruksachatkun, Andy Zhang, Shu Liu, Sasha Cui, Sayash Kapoor, Shayne Longpre, Kevin Meng, Rebecca Weiss, Fazl Barez, Rahul Gupta, Jwala Dhamala, Jacob Merizian, Mario Giulianelli, Harry Coppock, Cozmin Ududec, Jasjeet Sekhon, Jacob Steinhardt, Antony Kellermann, Sarah Schwettmann, Matei Zaharia, Ion Stoica, Percy Liang, Daniel Kang"
  venue: "arXiv 2025"
  year: "2025"
  links:
    arxiv: "https://arxiv.org/abs/2507.02825"

- short_name: "The SWE-Bench Illusion"
  title: "The SWE-Bench Illusion: When State-of-the-Art LLMs Remember Instead of Reason"
  authors: "Shanchao Liang, Spandan Garg, Roshanak Zilouchian Moghaddam"
  venue: "arXiv 2025"
  year: "2025"
  links:
    arxiv: "https://arxiv.org/abs/2506.12286"

- short_name: "Revisiting SWE-Bench"
  title: "Revisiting SWE-Bench: On the Importance of Data Quality for LLM-Based Code Models"
  authors: "Reem Aleithan"
  venue: "2025 IEEE/ACM 47th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion) 2025"
  year: "2025"
  links:

- short_name: "SPICE"
  title: "SPICE: An Automated SWE-Bench Labeling Pipeline for Issue Clarity, Test Coverage, and Effort Estimation"
  authors: "Gustavo A. Oliva, Gopi Krishnan Rajbahadur, Aaditya Bhatia, Haoxiang Zhang, Yihao Chen, Zhilong Chen, Arthur Leung, Dayi Lin, Boyuan Chen, Ahmed E. Hassan"
  venue: "arXiv 2025"
  year: "2025"
  links:
    arxiv: "https://arxiv.org/abs/2507.09108"

- short_name: "Data contamination"
  title: "Does SWE-Bench-Verified Test Agent Ability or Model Memory?"
  authors: "Thanosan Prathifkumar, Noble Saji Mathews, Meiyappan Nagappan"
  venue: "arXiv 2025"
  year: "2025"
  links:
    arxiv: "https://arxiv.org/abs/2512.10218"
```
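
Entries like those above are consumed by `scripts/render_papers.py` (referenced in the README's contributing guide but not shown in this commit). As a rough sketch of what that consumption involves, the following validates the required fields and renders one Markdown bullet per paper; the function name, output format, and error handling here are assumptions, not the script's actual behavior:

```python
# Sketch: validate a paper entry and render it as a Markdown bullet.
# Field names follow the contributing guide (short_name, title, authors,
# venue, year, links); everything else is assumed for illustration.

REQUIRED_FIELDS = ("short_name", "title", "authors", "venue", "year", "links")

def render_entry(entry: dict) -> str:
    """Return one Markdown bullet for a paper entry, linking to arXiv if present."""
    missing = [f for f in REQUIRED_FIELDS if f not in entry]
    if missing:
        raise ValueError(f"entry {entry.get('short_name', '?')!r} missing {missing}")
    url = (entry["links"] or {}).get("arxiv")
    name = f"[{entry['short_name']}]({url})" if url else entry["short_name"]
    return f"- **{name}**: {entry['title']} ({entry['venue']})"

if __name__ == "__main__":
    sample = {
        "short_name": "SWE-Bench+",
        "title": "SWE-Bench+: Enhanced Coding Benchmark for LLMs",
        "authors": "Reem Aleithan, Haoran Xue, Mohammad Mahdi Mohajer, et al.",
        "venue": "arXiv 2024",
        "year": "2024",
        "links": {"arxiv": "https://arxiv.org/abs/2410.06992"},
    }
    print(render_entry(sample))
```

Note the `entry["links"] or {}` guard: it tolerates the empty `links:` field seen in the "Revisiting SWE-Bench" entry above, which YAML parses as `None`.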
