Commit 18df13d

docs: modernize README with comprehensive documentation
- Convert from RST to Markdown format with modern styling
- Add badges for Python version, license, and build status
- Include architecture diagram and visual project overview
- Add detailed quick start guide and usage examples
- Provide comprehensive troubleshooting section
- Include contribution guidelines and development setup
1 parent 80e0162 commit 18df13d

1 file changed

Lines changed: 201 additions & 0 deletions

README.md

@@ -0,0 +1,201 @@
# 🤖 CodeRAG: AI-Powered Code Retrieval & Assistance

[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Code Quality](https://github.com/your-username/CodeRAG/workflows/Code%20Quality/badge.svg)](https://github.com/your-username/CodeRAG/actions)

> **Note**: This POC was innovative for its time, but modern tools like Cursor and Windsurf now apply this principle directly in IDEs. This remains an excellent educational project for understanding RAG implementation.

## ✨ What is CodeRAG?

CodeRAG applies **Retrieval-Augmented Generation (RAG)** to coding assistance. Instead of relying only on what fits in a model's context window, it indexes your codebase and retrieves the most relevant code to ground each answer.

### 🎯 Core Idea

Most coding assistants work with a limited view of your project. CodeRAG draws on the whole project through:
- **Real-time indexing** of the Python files in your watched directory, stored in a FAISS vector index
- **Semantic code search** powered by OpenAI embeddings
- **Contextual AI responses** that understand your project structure

## 🚀 Quick Start

### Prerequisites
- Python 3.8+
- OpenAI API Key ([Get one here](https://platform.openai.com/api-keys))

### Installation

```bash
# Clone the repository
git clone https://github.com/your-username/CodeRAG.git
cd CodeRAG

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Configure environment
cp example.env .env
# Edit .env with your OpenAI API key and settings
```

### Configuration

Set the following values in your `.env` file. `EMBEDDING_DIM` must match the output size of the embedding model (1536 for `text-embedding-ada-002`):

```env
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_EMBEDDING_MODEL=text-embedding-ada-002
OPENAI_CHAT_MODEL=gpt-4
WATCHED_DIR=/path/to/your/code/directory
FAISS_INDEX_FILE=./coderag_index.faiss
EMBEDDING_DIM=1536
```
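
As a rough illustration of how these settings might be consumed, the sketch below reads them from a mapping of environment variables using only the standard library. The `Settings` dataclass and `load_settings` function are hypothetical names, not taken from `coderag/config.py`, and the defaults simply mirror the example values above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the values defined in .env."""
    openai_api_key: str
    embedding_model: str
    chat_model: str
    watched_dir: str
    faiss_index_file: str
    embedding_dim: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping of variables (defaults to os.environ)."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        raise ValueError("OPENAI_API_KEY is not set")
    return Settings(
        openai_api_key=api_key,
        embedding_model=env.get("OPENAI_EMBEDDING_MODEL", "text-embedding-ada-002"),
        chat_model=env.get("OPENAI_CHAT_MODEL", "gpt-4"),
        watched_dir=env.get("WATCHED_DIR", "."),
        faiss_index_file=env.get("FAISS_INDEX_FILE", "./coderag_index.faiss"),
        # The vector size must be an integer; int() raises ValueError otherwise.
        embedding_dim=int(env.get("EMBEDDING_DIM", "1536")),
    )
```

This assumes the values from `.env` are already present in the process environment, for example because they were exported in the shell or loaded by a package such as `python-dotenv`. Failing early on a missing key tends to give a clearer error than a rejected API call later on.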

### Running CodeRAG

```bash
# Start the backend (indexing and monitoring)
python main.py

# In a separate terminal, start the web interface
streamlit run app.py
```

## 📖 How It Works

```mermaid
graph LR
    A[Code Files] --> B[File Monitor]
    B --> C[OpenAI Embeddings]
    C --> D[FAISS Vector DB]
    E[User Query] --> F[Semantic Search]
    D --> F
    F --> G[Retrieved Context]
    G --> H[OpenAI GPT]
    H --> I[AI Response]
```

1. **Indexing**: CodeRAG monitors your code directory and generates embeddings for Python files
2. **Storage**: Embeddings are stored in a FAISS vector database with metadata
3. **Search**: User queries are embedded with the same embedding model and matched against the stored code vectors
4. **Generation**: Retrieved code context is sent to the chat model, which generates the answer
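
The four steps above can be sketched end to end in a few lines. This is a toy, standard-library stand-in and not the project's code: `toy_embed` replaces the OpenAI embedding call with a sparse bag-of-words vector, a brute-force cosine scan replaces FAISS, and `build_prompt` stops short of calling a chat model. All function names here are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

Hit = Tuple[float, str, str]  # (similarity score, file path, file content)


def toy_embed(text: str) -> Counter:
    """Stand-in for an embedding model: a sparse bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(files: Dict[str, str]) -> List[Tuple[str, str, Counter]]:
    """Steps 1-2: embed each file and keep the vector next to its metadata."""
    return [(path, code, toy_embed(code)) for path, code in files.items()]


def search(index: List[Tuple[str, str, Counter]], query: str, k: int = 2) -> List[Hit]:
    """Step 3: embed the query and rank files by similarity, best first."""
    q = toy_embed(query)
    scored = [(cosine(q, vec), path, code) for path, code, vec in index]
    return sorted(scored, key=lambda hit: hit[0], reverse=True)[:k]


def build_prompt(query: str, hits: List[Hit]) -> str:
    """Step 4: pack the retrieved code into the prompt for the chat model."""
    context = "\n\n".join(f"# File: {path}\n{code}" for _, path, code in hits)
    return f"Context:\n{context}\n\nQuestion: {query}"


files = {
    "index.py": "def add_to_index(vector): faiss_index.add(vector)",
    "monitor.py": "def on_modified(event): reindex(event.src_path)",
}
index = build_index(files)
hits = search(index, "add vector to faiss_index", k=1)
print(build_prompt("How are vectors added to the index?", hits))
```

A real embedding model matches on meaning where this toy version only matches shared tokens, but the data flow is the same: the query and the code must be embedded the same way, and only the top-ranked files reach the prompt.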

## 🛠️ Architecture

```
CodeRAG/
├── 🧠 coderag/           # Core RAG functionality
│   ├── config.py         # Environment configuration
│   ├── embeddings.py     # OpenAI embedding generation
│   ├── index.py          # FAISS vector operations
│   ├── search.py         # Semantic code search
│   └── monitor.py        # File system monitoring
├── 🌐 app.py             # Streamlit web interface
├── 🔧 main.py            # Backend indexing service
├── 🔗 prompt_flow.py     # RAG pipeline orchestration
└── 📋 requirements.txt   # Dependencies
```

### Key Components

- **🔍 Vector Search**: FAISS-powered similarity search for code retrieval
- **🎯 Smart Embeddings**: OpenAI embeddings capture semantic code meaning
- **📡 Real-time Updates**: Watchdog monitors file changes for live indexing
- **💬 Conversational UI**: Streamlit interface with chat-like experience
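
To make the real-time update idea concrete, here is a minimal polling sketch that detects new or modified `.py` files by comparing modification times. It is a standard-library stand-in for illustration only: the project uses Watchdog, which reacts to file system events instead of polling, and `snapshot` and `changed_files` are hypothetical helpers.

```python
from pathlib import Path
from typing import Dict, List


def snapshot(root: str) -> Dict[str, int]:
    """Map every .py file under root to its last-modified time in nanoseconds."""
    return {str(p): p.stat().st_mtime_ns for p in Path(root).rglob("*.py")}


def changed_files(before: Dict[str, int], after: Dict[str, int]) -> List[str]:
    """Return files that are new or whose modification time differs."""
    return sorted(path for path, mtime in after.items() if before.get(path) != mtime)
```

A caller would take a snapshot, wait, take another, and re-embed only the paths returned by `changed_files`. Deleted files are not reported by this sketch; handling them would mean also comparing the keys of `before` against `after`.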

## 🎪 Usage Examples

### Ask About Your Code
```
"How does the FAISS indexing work in this codebase?"
"Where is error handling implemented?"
"Show me examples of the embedding generation process"
```

### Get Improvements
```
"How can I optimize the search performance?"
"What are potential security issues in this code?"
"Suggest better error handling for the monitor module"
```

### Debug Issues
```
"Why might the search return no results?"
"How do I troubleshoot OpenAI connection issues?"
"What could cause indexing to fail?"
```

## ⚙️ Development

### Code Quality Tools

```bash
# Install pre-commit hooks
pip install pre-commit
pre-commit install

# Run formatting and linting
black .
flake8 .
mypy .
```

### Testing

```bash
# Test FAISS index functionality
python tests/test_faiss.py

# Test individual components
python scripts/initialize_index.py
python scripts/run_monitor.py
```

## 🐛 Troubleshooting

### Common Issues

**Search returns no results**
- Check that indexing completed: look for the `coderag_index.faiss` file (the path set by `FAISS_INDEX_FILE`)
- Verify that your OpenAI API key is working
- Ensure your query relates to indexed Python files

**OpenAI API errors**
- Verify the API key in your `.env` file
- Check API usage limits and billing
- Ensure model names are correct (`gpt-4`, `text-embedding-ada-002`)

**File monitoring not working**
- Check the `WATCHED_DIR` path in `.env`
- Ensure the directory contains `.py` files
- Look for error logs in console output
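
Some of these checks can be automated. The following is a hypothetical diagnostic helper, not part of the repository, that inspects the same settings: whether the API key is set, whether the index file exists, and whether `WATCHED_DIR` contains any `.py` files.

```python
import os
from pathlib import Path
from typing import List, Mapping


def diagnose(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means nothing obvious is wrong."""
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")

    # Default mirrors the example configuration above.
    index_file = Path(env.get("FAISS_INDEX_FILE", "./coderag_index.faiss"))
    if not index_file.is_file():
        problems.append(f"index file not found: {index_file}")

    watched = env.get("WATCHED_DIR", "")
    if not watched or not Path(watched).is_dir():
        problems.append(f"WATCHED_DIR is not a directory: {watched!r}")
    elif next(Path(watched).rglob("*.py"), None) is None:
        problems.append(f"no .py files under {watched}")
    return problems


if __name__ == "__main__":
    for line in diagnose(os.environ) or ["no obvious configuration problems"]:
        print(line)
```

The helper only confirms that a key is present, not that it is valid or has quota, so API errors still need to be checked against the console output.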

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes with proper error handling and type hints
4. Run code quality checks (`pre-commit run --all-files`)
5. Commit your changes (`git commit -m 'Add amazing feature'`)
6. Push to the branch (`git push origin feature/amazing-feature`)
7. Open a Pull Request

## 📄 License

This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE-2.0.txt) file for details.

## 🙏 Acknowledgments

- [OpenAI](https://openai.com/) for embedding and chat models
- [Facebook AI Similarity Search (FAISS)](https://github.com/facebookresearch/faiss) for vector search
- [Streamlit](https://streamlit.io/) for the web interface
- [Watchdog](https://github.com/gorakhargosh/watchdog) for file monitoring

---

**⭐ If this project helps you, please give it a star!**
