Commit 60382f4

bdqnghi and claude committed
Refine README with modern design and improved structure
- Add centered header with styled logo
- Improve badge layout with flat-square design
- Streamline Quick Start section into 3 clear steps
- Better organize CLI commands with subheadings
- Add How It Works section with architecture diagram
- Enhance language support display with emojis
- Improve overall visual hierarchy and readability
- Maintain all technical content and experimental results

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent d28cf9b commit 60382f4

1 file changed

+182
-120
lines changed

README.md

Lines changed: 182 additions & 120 deletions
@@ -1,174 +1,157 @@
1-
# CodeWiki: Evaluating AI’s Ability to Generate Holistic Documentation for Large-Scale Codebases
1+
<p align="center">
2+
<img src="./img/framework-overview.png" alt="CodeWiki Framework" width="500" style="border: 2px solid #e1e4e8; border-radius: 12px; padding: 20px;"/>
3+
</p>
4+
5+
<h1 align="center">CodeWiki</h1>
6+
7+
<p align="center">
8+
<strong>AI-Powered Repository Documentation Generation</strong> • <strong>Multi-Language Support</strong> • <strong>Architecture-Aware Analysis</strong>
9+
</p>
10+
11+
<p align="center">
12+
Generate holistic, structured documentation for large-scale codebases • Cross-module interactions • Visual artifacts and diagrams
13+
</p>
14+
15+
<p align="center">
16+
<a href="https://github.com/FSoft-AI4Code/CodeWiki/actions"><img alt="CI" src="https://github.com/FSoft-AI4Code/CodeWiki/actions/workflows/ci.yml/badge.svg" /></a>
17+
<a href="https://pypi.org/project/codewiki/"><img alt="PyPI version" src="https://img.shields.io/pypi/v/codewiki?style=flat-square" /></a>
18+
<a href="https://python.org/"><img alt="Python version" src="https://img.shields.io/badge/python-3.12+-blue?style=flat-square" /></a>
19+
<a href="./LICENSE"><img alt="License: MIT" src="https://img.shields.io/badge/License-MIT-green.svg?style=flat-square" /></a>
20+
<a href="https://github.com/FSoft-AI4Code/CodeWiki/stargazers"><img alt="GitHub stars" src="https://img.shields.io/github/stars/FSoft-AI4Code/CodeWiki?style=flat-square" /></a>
21+
</p>
22+
23+
<p align="center">
24+
<a href="#quick-start"><strong>Quick Start</strong></a> •
25+
<a href="#cli-commands"><strong>CLI Commands</strong></a> •
26+
<a href="#documentation-output"><strong>Output Structure</strong></a> •
27+
<a href="https://arxiv.org/abs/2510.24428"><strong>Paper</strong></a>
28+
</p>
229

3-
<div align="center">
30+
---
431

5-
![CodeWiki Architecture](img/framework-overview.png)
32+
## Quick Start
633

7-
[![Python](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)
8-
[![GitHub stars](https://img.shields.io/github/stars/FSoft-AI4Code/CodeWiki?style=social)](https://github.com/FSoft-AI4Code/CodeWiki/stargazers)
34+
### 1. Install CodeWiki
935

10-
**Open-source framework for holistic, structured repository-level documentation across multilingual codebases**
36+
```bash
37+
# Install from PyPI
38+
pip install codewiki
1139

12-
[Demo](https://fsoft-ai4code.github.io/codewiki-demo/) • [Paper](https://arxiv.org/abs/2510.24428) • [CodeWikiBench](https://github.com/FSoft-AI4Code/CodeWikiBench) • [Docker](docker/DOCKER_README.md) • [Development](DEVELOPMENT.md) • [Citation](#citation)
40+
# Or install from source
41+
pip install git+https://github.com/FSoft-AI4Code/CodeWiki.git
1342

14-
</div>
43+
# Verify installation
44+
codewiki --version
45+
```
1546

16-
---
47+
### 2. Configure Your Environment
1748

18-
## Abstract
49+
```bash
50+
codewiki config set \
51+
--api-key YOUR_API_KEY \
52+
--base-url https://api.anthropic.com \
53+
--main-model claude-sonnet-4 \
54+
--cluster-model claude-sonnet-4
55+
```
1956

20-
Given a large and evolving codebase, the ability to automatically generate holistic, architecture-aware documentation that captures not only individual functions but also their cross-file, cross-module, and system-level interactions remains an open challenge. We present **CodeWiki**, a unified framework for automated repository-level documentation across seven programming languages. CodeWiki introduces three key innovations: (i) hierarchical decomposition that preserves architectural context across multiple levels of granularity, (ii) recursive multi-agent processing with dynamic task delegation for scalable generation, and (iii) multi-modal synthesis that integrates textual descriptions with visual artifacts such as architecture diagrams and data-flow representations.
57+
### 3. Generate Documentation
2158

22-
---
59+
```bash
60+
# Navigate to your project
61+
cd /path/to/your/project
2362

24-
## Usage Example
63+
# Generate documentation
64+
codewiki generate
65+
66+
# Generate with HTML viewer for GitHub Pages
67+
codewiki generate --github-pages --create-branch
68+
```
2569

26-
![CLI Usage Example](https://github.com/FSoft-AI4Code/CodeWiki/releases/download/assets/cli-usage-example.gif)
70+
**That's it!** Your documentation will be generated in `./docs/` with comprehensive repository-level analysis.
2771

2872
---
2973

30-
## Overview
74+
## What is CodeWiki?
3175

32-
CodeWiki addresses the challenge of comprehensive documentation for large-scale repositories through three core innovations:
76+
CodeWiki is an open-source framework for **automated repository-level documentation** across seven programming languages. It generates holistic, architecture-aware documentation that captures not only individual functions but also their cross-file, cross-module, and system-level interactions.
3377

3478
### Key Innovations
3579

3680
| Innovation | Description | Impact |
3781
|------------|-------------|--------|
38-
| **Hierarchical Decomposition** | Dynamic programming-inspired strategy that partitions repositories into coherent modules while preserving architectural context | Handles codebases of arbitrary size (86K-1.4M LOC tested) |
39-
| **Recursive Agentic System** | Adaptive multi-agent processing with dynamic delegation capabilities for complex modules | Maintains quality while scaling to repository-level scope |
82+
| **Hierarchical Decomposition** | Dynamic programming-inspired strategy that preserves architectural context | Handles codebases of arbitrary size (86K-1.4M LOC tested) |
83+
| **Recursive Agentic System** | Adaptive multi-agent processing with dynamic delegation capabilities | Maintains quality while scaling to repository-level scope |
4084
| **Multi-Modal Synthesis** | Generates textual documentation, architecture diagrams, data flows, and sequence diagrams | Comprehensive understanding from multiple perspectives |
4185

42-
### Multilingual Support
86+
### Supported Languages
4387

44-
Supports **7 programming languages**: Python, Java, JavaScript, TypeScript, C, C++, C#
88+
**🐍 Python** • **Java** • **🟨 JavaScript** • **🔷 TypeScript** • **⚙️ C** • **🔧 C++** • **🪟 C#**
4589

4690
---
4791

48-
## Experimental Results
92+
## CLI Commands
4993

50-
CodeWiki has been evaluated on **CodeWikiBench**, the first benchmark specifically designed for repository-level documentation quality assessment.
51-
52-
### Performance by Language Category
53-
54-
| Language Category | CodeWiki (Sonnet-4) | DeepWiki | Improvement |
55-
|-------------------|---------------------|----------|-------------|
56-
| High-Level (Python, JS, TS) | **79.14%** | 68.67% | **+10.47%** |
57-
| Managed (C#, Java) | **68.84%** | 64.80% | **+4.04%** |
58-
| Systems (C, C++) | 53.24% | 56.39% | -3.15% |
59-
| **Overall Average** | **68.79%** | **64.06%** | **+4.73%** |
60-
61-
### Results on Representative Repositories
62-
63-
| Repository | Language | LOC | CodeWiki-Sonnet-4 | DeepWiki | Improvement |
64-
|------------|----------|-----|-------------------|----------|-------------|
65-
| All-Hands-AI--OpenHands | Python | 229K | **82.45%** | 73.04% | **+9.41%** |
66-
| puppeteer--puppeteer | TypeScript | 136K | **83.00%** | 64.46% | **+18.54%** |
67-
| sveltejs--svelte | JavaScript | 125K | **71.96%** | 68.51% | **+3.45%** |
68-
| Unity-Technologies--ml-agents | C# | 86K | **79.78%** | 74.80% | **+4.98%** |
69-
| elastic--logstash | Java | 117K | **57.90%** | 54.80% | **+3.10%** |
70-
71-
**View comprehensive results:** See [paper](https://arxiv.org/abs/2510.24428) for complete evaluation on 21 repositories spanning all supported languages.
72-
73-
---
74-
75-
## CLI Installation & Usage
76-
77-
### Prerequisites
78-
79-
- Python 3.12+
80-
- Node.js (for mermaid diagram validation)
81-
- LLM API access (Anthropic Claude, OpenAI, etc.)
82-
83-
### Installation
84-
85-
```bash
86-
# Install from source
87-
pip install git+https://github.com/FSoft-AI4Code/CodeWiki.git
88-
89-
# Verify installation
90-
codewiki --version
91-
```
92-
93-
### Quick Start
94-
95-
#### 1. Configure CodeWiki
94+
### Configuration Management
9695

9796
```bash
97+
# Set up your API configuration
9898
codewiki config set \
99-
--api-key YOUR_API_KEY \
100-
--base-url https://api.anthropic.com \
101-
--main-model claude-sonnet-4 \
102-
--cluster-model claude-sonnet-4
103-
```
104-
105-
Verify configuration:
99+
--api-key <your-api-key> \
100+
--base-url <provider-url> \
101+
--main-model <model-name> \
102+
--cluster-model <model-name>
106103

107-
```bash
104+
# Show current configuration
108105
codewiki config show
106+
107+
# Validate your configuration
109108
codewiki config validate
110109
```
111110

112-
#### 2. Generate Documentation
111+
### Documentation Generation
113112

114113
```bash
115-
# Navigate to your project
116-
cd /path/to/your/project
117-
118-
# Generate documentation (saved to ./docs/)
114+
# Basic generation
119115
codewiki generate
120116

121-
# Generate with GitHub Pages HTML viewer
122-
codewiki generate --github-pages
117+
# Custom output directory
118+
codewiki generate --output ./documentation
123119

124-
# Full-featured generation
125-
codewiki generate --create-branch --github-pages --verbose
126-
```
120+
# Create git branch for documentation
121+
codewiki generate --create-branch
127122

128-
### CLI Commands
123+
# Generate HTML viewer for GitHub Pages
124+
codewiki generate --github-pages
129125

130-
```bash
131-
# Configuration Management
132-
codewiki config set --api-key <key> --base-url <url> \
133-
--main-model <model> --cluster-model <model>
134-
codewiki config show
135-
codewiki config validate
126+
# Enable verbose logging
127+
codewiki generate --verbose
136128

137-
# Documentation Generation
138-
codewiki generate # Basic generation
139-
codewiki generate --output ./documentation # Custom output directory
140-
codewiki generate --create-branch # Create git branch
141-
codewiki generate --github-pages # Generate HTML viewer
142-
codewiki generate --verbose # Detailed logging
129+
# Full-featured generation
130+
codewiki generate --create-branch --github-pages --verbose
143131
```
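
If you want to drive these commands from a Python script, a minimal sketch might look like the following. It only assembles the argument list from the flags listed above (`--output`, `--create-branch`, `--github-pages`, `--verbose`); the function name `build_generate_command` and its parameters are hypothetical and not part of CodeWiki.

```python
import shlex


def build_generate_command(output=None, create_branch=False,
                           github_pages=False, verbose=False):
    """Assemble a `codewiki generate` argument list (hypothetical helper)."""
    cmd = ["codewiki", "generate"]
    if output is not None:
        # --output takes the target directory as its value
        cmd += ["--output", str(output)]
    if create_branch:
        cmd.append("--create-branch")
    if github_pages:
        cmd.append("--github-pages")
    if verbose:
        cmd.append("--verbose")
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(build_generate_command(output="./documentation", verbose=True)))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` once `codewiki` is installed and configured.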
144132

145133
### Configuration Storage
146134

147-
- **API keys**: System keychain (macOS Keychain, Windows Credential Manager, Linux Secret Service)
135+
- **API keys**: Securely stored in system keychain (macOS Keychain, Windows Credential Manager, Linux Secret Service)
148136
- **Settings**: `~/.codewiki/config.json`
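
To inspect the settings file from a script, something like the sketch below could work. It assumes only that `~/.codewiki/config.json` is a JSON object; the README does not document its keys, so none are assumed here, and `load_settings` is a hypothetical helper name.

```python
import json
from pathlib import Path

# Path taken from the README. The schema of the file is not documented here,
# so this sketch makes no assumption about which keys it contains.
DEFAULT_SETTINGS_PATH = Path.home() / ".codewiki" / "config.json"


def load_settings(path: Path = DEFAULT_SETTINGS_PATH) -> dict:
    """Return the parsed settings, or an empty dict if the file does not exist."""
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    # Print key names only; values are left out on purpose.
    for key in sorted(load_settings()):
        print(key)
```

For everyday use, `codewiki config show` remains the documented way to view the configuration.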
149137

150138
---
151139

152-
## Additional Documentation
153-
154-
- **[Docker Deployment](docker/DOCKER_README.md)** - Containerized deployment instructions
155-
- **[Development Guide](DEVELOPMENT.md)** - Project structure, architecture, and contributing guidelines
156-
157-
---
158-
159140
## Documentation Output
160141

161-
Generated documentation includes:
142+
Generated documentation includes both **textual descriptions** and **visual artifacts** for comprehensive understanding.
162143

163144
### Textual Documentation
164145
- Repository overview with architecture guide
165146
- Module-level documentation with API references
166147
- Usage examples and implementation patterns
148+
- Cross-module interaction analysis
167149

168150
### Visual Artifacts
169151
- System architecture diagrams (Mermaid)
170152
- Data flow visualizations
171-
- Dependency graphs
153+
- Dependency graphs and module relationships
154+
- Sequence diagrams for complex interactions
172155

173156
### Output Structure
174157

@@ -185,38 +168,117 @@ Generated documentation includes:
185168

186169
---
187170

171+
## Experimental Results
172+
173+
CodeWiki has been evaluated on **CodeWikiBench**, the first benchmark specifically designed for repository-level documentation quality assessment.
174+
175+
### Performance by Language Category
176+
177+
| Language Category | CodeWiki (Sonnet-4) | DeepWiki | Improvement |
178+
|-------------------|---------------------|----------|-------------|
179+
| High-Level (Python, JS, TS) | **79.14%** | 68.67% | **+10.47%** |
180+
| Managed (C#, Java) | **68.84%** | 64.80% | **+4.04%** |
181+
| Systems (C, C++) | 53.24% | 56.39% | -3.15% |
182+
| **Overall Average** | **68.79%** | **64.06%** | **+4.73%** |
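
The Improvement column reads as the CodeWiki score minus the DeepWiki score, that is, a difference in percentage points rather than a relative change. A small check of that reading, using only the numbers from the table above (the helper names are illustrative):

```python
from decimal import Decimal

# (category, CodeWiki, DeepWiki, reported improvement), copied from the table.
ROWS = [
    ("High-Level (Python, JS, TS)", "79.14", "68.67", "10.47"),
    ("Managed (C#, Java)", "68.84", "64.80", "4.04"),
    ("Systems (C, C++)", "53.24", "56.39", "-3.15"),
    ("Overall Average", "68.79", "64.06", "4.73"),
]


def point_difference(codewiki: str, deepwiki: str) -> Decimal:
    """Difference in percentage points, computed exactly with Decimal."""
    return Decimal(codewiki) - Decimal(deepwiki)


def check_rows(rows):
    """Return (category, computed difference, matches reported value) per row."""
    return [
        (name, point_difference(cw, dw), point_difference(cw, dw) == Decimal(reported))
        for name, cw, dw, reported in rows
    ]


if __name__ == "__main__":
    for name, diff, ok in check_rows(ROWS):
        status = "matches" if ok else "differs from"
        print(f"{name}: {diff:+.2f} points ({status} the reported value)")
```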
183+
184+
### Results on Representative Repositories
185+
186+
| Repository | Language | LOC | CodeWiki-Sonnet-4 | DeepWiki | Improvement |
187+
|------------|----------|-----|-------------------|----------|-------------|
188+
| All-Hands-AI--OpenHands | Python | 229K | **82.45%** | 73.04% | **+9.41%** |
189+
| puppeteer--puppeteer | TypeScript | 136K | **83.00%** | 64.46% | **+18.54%** |
190+
| sveltejs--svelte | JavaScript | 125K | **71.96%** | 68.51% | **+3.45%** |
191+
| Unity-Technologies--ml-agents | C# | 86K | **79.78%** | 74.80% | **+4.98%** |
192+
| elastic--logstash | Java | 117K | **57.90%** | 54.80% | **+3.10%** |
193+
194+
**View comprehensive results:** See [paper](https://arxiv.org/abs/2510.24428) for complete evaluation on 21 repositories spanning all supported languages.
195+
196+
---
197+
198+
## How It Works
199+
200+
### Architecture Overview
201+
202+
CodeWiki employs a three-stage process for comprehensive documentation generation:
203+
204+
1. **Hierarchical Decomposition**: Uses dynamic programming-inspired algorithms to partition repositories into coherent modules while preserving architectural context across multiple granularity levels.
205+
206+
2. **Recursive Multi-Agent Processing**: Implements adaptive multi-agent processing with dynamic task delegation, allowing the system to handle complex modules at scale while maintaining quality.
207+
208+
3. **Multi-Modal Synthesis**: Integrates textual descriptions with visual artifacts including architecture diagrams, data-flow representations, and sequence diagrams for comprehensive understanding.
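
The first two stages can be pictured with a toy sketch. This is not CodeWiki's implementation: the `Module` class, the `budget` threshold, and the `plan_documentation` function are all hypothetical, chosen only to illustrate the idea of documenting small modules directly and delegating large ones to sub-agents before synthesizing an overview.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """A node in a hypothetical module tree."""
    name: str
    loc: int = 0  # lines of code held directly by this module
    children: list["Module"] = field(default_factory=list)

    def total_loc(self) -> int:
        return self.loc + sum(child.total_loc() for child in self.children)


def plan_documentation(module: Module, budget: int, depth: int = 0) -> list[str]:
    """Return an indented work plan, one line per unit of work."""
    indent = "  " * depth
    # Small enough (or cannot be split further): handle it in one pass.
    if module.total_loc() <= budget or not module.children:
        return [f"{indent}document {module.name} ({module.total_loc()} LOC)"]
    # Too large: delegate each child, then summarize the parent.
    plan = [f"{indent}delegate {module.name} to sub-agents"]
    for child in module.children:
        plan.extend(plan_documentation(child, budget, depth + 1))
    plan.append(f"{indent}synthesize overview of {module.name}")
    return plan


if __name__ == "__main__":
    repo = Module("repo", children=[
        Module("cli", loc=400),
        Module("core", children=[Module("parser", loc=900), Module("agents", loc=1200)]),
    ])
    print("\n".join(plan_documentation(repo, budget=1000)))
```

The recursion bottoms out either when a module fits the budget or when it has no children left to split, which keeps the sketch terminating on any finite tree.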
209+
210+
### Data Flow
211+
212+
```
213+
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
214+
│ Codebase │───▶│ Hierarchical │───▶│ Multi-Agent │
215+
│ Analysis │ │ Decomposition │ │ Processing │
216+
└─────────────────┘ └──────────────────┘ └─────────────────┘
217+
│ │
218+
▼ ▼
219+
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
220+
│ Visual │◀───│ Multi-Modal │◀───│ Structured │
221+
│ Artifacts │ │ Synthesis │ │ Content │
222+
└─────────────────┘ └──────────────────┘ └─────────────────┘
223+
```
224+
225+
---
226+
227+
## Requirements
228+
229+
- **Python 3.12+**
230+
- **Node.js** (for Mermaid diagram validation)
231+
- **LLM API access** (Anthropic Claude, OpenAI, etc.)
232+
- **Git** (for branch creation features)
233+
234+
---
235+
236+
## Additional Resources
237+
238+
### Documentation & Guides
239+
- **[Docker Deployment](docker/DOCKER_README.md)** - Containerized deployment instructions
240+
- **[Development Guide](DEVELOPMENT.md)** - Project structure, architecture, and contributing guidelines
241+
- **[CodeWikiBench](https://github.com/FSoft-AI4Code/CodeWikiBench)** - Repository-level documentation benchmark
242+
- **[Live Demo](https://fsoft-ai4code.github.io/codewiki-demo/)** - Interactive demo and examples
243+
244+
### Academic Resources
245+
- **[Paper](https://arxiv.org/abs/2510.24428)** - Full research paper with detailed methodology and results
246+
- **[Citation](#citation)** - How to cite CodeWiki in your research
247+
248+
---
249+
188250
## Citation
189251

190252
If you use CodeWiki in your research, please cite:
191253

192254
```bibtex
193255
@misc{hoang2025codewikievaluatingaisability,
194-
title={CodeWiki: Evaluating AI's Ability to Generate Holistic Documentation for Large-Scale Codebases},
256+
title={CodeWiki: Evaluating AI's Ability to Generate Holistic Documentation for Large-Scale Codebases},
195257
author={Anh Nguyen Hoang and Minh Le-Anh and Bach Le and Nghi D. Q. Bui},
196258
year={2025},
197259
eprint={2510.24428},
198260
archivePrefix={arXiv},
199261
primaryClass={cs.SE},
200-
url={https://arxiv.org/abs/2510.24428},
262+
url={https://arxiv.org/abs/2510.24428},
201263
}
202264
```
203265

204266
---
205267

206268
## Star History
207269

208-
<a href="https://star-history.com/#FSoft-AI4Code/CodeWiki&Date">
209-
<picture>
210-
<source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=FSoft-AI4Code/CodeWiki&type=Date&theme=dark" />
211-
<source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=FSoft-AI4Code/CodeWiki&type=Date" />
212-
<img alt="Star History Chart" src="https://api.star-history.com/svg?repos=FSoft-AI4Code/CodeWiki&type=Date" />
213-
</picture>
214-
</a>
270+
<p align="center">
271+
<a href="https://star-history.com/#FSoft-AI4Code/CodeWiki&Date">
272+
<picture>
273+
<source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=FSoft-AI4Code/CodeWiki&type=Date&theme=dark" />
274+
<source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=FSoft-AI4Code/CodeWiki&type=Date" />
275+
<img alt="Star History Chart" src="https://api.star-history.com/svg?repos=FSoft-AI4Code/CodeWiki&type=Date" />
276+
</picture>
277+
</a>
278+
</p>
215279

216280
---
217281

218282
## License
219283

220-
MIT License
221-
222-
</div>
284+
This project is licensed under the MIT License - see the [LICENSE](./LICENSE) file for details.
