| 1 | +# 🤖 CodeRAG: AI-Powered Code Retrieval & Assistance |
| 2 | + |
| 3 | +[Python](https://www.python.org/downloads/) |
| 4 | +[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0) |
| 5 | +[GitHub Actions](https://github.com/your-username/CodeRAG/actions) |
| 6 | + |
| 7 | +> **Note**: This POC was innovative for its time, but modern tools like Cursor and Windsurf now apply this principle directly in IDEs. This remains an excellent educational project for understanding RAG implementation. |
| 8 | + |
| 9 | +## ✨ What is CodeRAG? |
| 10 | + |
| 11 | +CodeRAG applies **Retrieval-Augmented Generation (RAG)** to coding assistance. Rather than relying only on what fits in a model's context window, it indexes the Python files in your project and retrieves the most relevant code to ground each answer. |
| 12 | + |
| 13 | +### 🎯 Core Idea |
| 14 | + |
| 15 | +Most coding assistants work with limited scope, but CodeRAG provides the full context of your project by: |
| 16 | +- **Real-time indexing** of your entire codebase using FAISS vector search |
| 17 | +- **Semantic code search** powered by OpenAI embeddings |
| 18 | +- **Contextual AI responses** that understand your project structure |
| 19 | + |
| 20 | +## 🚀 Quick Start |
| 21 | + |
| 22 | +### Prerequisites |
| 23 | +- Python 3.8+ |
| 24 | +- OpenAI API Key ([Get one here](https://platform.openai.com/api-keys)) |
| 25 | + |
| 26 | +### Installation |
| 27 | + |
| 28 | +```bash |
| 29 | +# Clone the repository |
| 30 | +git clone https://github.com/your-username/CodeRAG.git |
| 31 | +cd CodeRAG |
| 32 | + |
| 33 | +# Create virtual environment |
| 34 | +python -m venv venv |
| 35 | +source venv/bin/activate # On Windows: venv\Scripts\activate |
| 36 | + |
| 37 | +# Install dependencies |
| 38 | +pip install -r requirements.txt |
| 39 | + |
| 40 | +# Configure environment |
| 41 | +cp example.env .env |
| 42 | +# Edit .env with your OpenAI API key and settings |
| 43 | +``` |
| 44 | + |
| 45 | +### Configuration |
| 46 | + |
| 47 | +The `.env` file (copied from `example.env` during installation) should contain the following settings: |
| 48 | + |
| 49 | +```env |
| 50 | +OPENAI_API_KEY=your_openai_api_key_here |
| 51 | +OPENAI_EMBEDDING_MODEL=text-embedding-ada-002 |
| 52 | +OPENAI_CHAT_MODEL=gpt-4 |
| 53 | +WATCHED_DIR=/path/to/your/code/directory |
| 54 | +FAISS_INDEX_FILE=./coderag_index.faiss |
| 55 | +EMBEDDING_DIM=1536 |
| 56 | +``` |
| 57 | + |
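To make the settings above concrete, here is a minimal sketch of how they might be read into a typed object. It is not the project's actual `config.py`: the `Settings` class, the `load_settings` helper, and the fallback defaults are assumptions for illustration. The sketch reads variables that are already in the process environment; a package such as `python-dotenv` can load the `.env` file into the environment first.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Typed view of the variables listed in the .env example (hypothetical)."""
    openai_api_key: str
    embedding_model: str
    chat_model: str
    watched_dir: str
    faiss_index_file: str
    embedding_dim: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping of environment variables.

    The defaults mirror the example values above; they are assumptions,
    not values confirmed by the project.
    """
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        raise ValueError("OPENAI_API_KEY is not set")
    return Settings(
        openai_api_key=api_key,
        embedding_model=env.get("OPENAI_EMBEDDING_MODEL", "text-embedding-ada-002"),
        chat_model=env.get("OPENAI_CHAT_MODEL", "gpt-4"),
        watched_dir=env.get("WATCHED_DIR", "."),
        faiss_index_file=env.get("FAISS_INDEX_FILE", "./coderag_index.faiss"),
        # The dimension must match the embedding model; 1536 is the value
        # the example pairs with text-embedding-ada-002.
        embedding_dim=int(env.get("EMBEDDING_DIM", "1536")),
    )
```

Failing fast on a missing API key surfaces a configuration mistake at startup instead of at the first embedding request.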
| 58 | +### Running CodeRAG |
| 59 | + |
| 60 | +```bash |
| 61 | +# Start the backend (indexing and monitoring) |
| 62 | +python main.py |
| 63 | + |
| 64 | +# In a separate terminal, start the web interface |
| 65 | +streamlit run app.py |
| 66 | +``` |
| 67 | + |
| 68 | +## 📖 How It Works |
| 69 | + |
| 70 | +```mermaid |
| 71 | +graph LR |
| 72 | + A[Code Files] --> B[File Monitor] |
| 73 | + B --> C[OpenAI Embeddings] |
| 74 | + C --> D[FAISS Vector DB] |
| 75 | + E[User Query] --> F[Semantic Search] |
| 76 | + D --> F |
| 77 | + F --> G[Retrieved Context] |
| 78 | + G --> H[OpenAI GPT] |
| 79 | + H --> I[AI Response] |
| 80 | +``` |
| 81 | + |
| 82 | +1. **Indexing**: CodeRAG monitors your code directory and generates embeddings for Python files |
| 83 | +2. **Storage**: Embeddings are stored in a FAISS vector database with metadata |
| 84 | +3. **Search**: User queries are embedded and matched against the code database |
| 85 | +4. **Generation**: Retrieved code context is sent to GPT models for intelligent responses |
| 86 | + |
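The four steps above can be sketched end to end without any external services. This is a toy illustration only: `toy_embed` is a stand-in for the OpenAI embedding call, `ToyIndex` is a stand-in for FAISS, and `build_prompt` is a hypothetical helper. The search is an exhaustive squared-L2 comparison, which is what an exact flat FAISS index such as `IndexFlatL2` computes; the README does not state which index type the project actually uses.

```python
import math
import re
import zlib
from typing import Dict, List, Tuple

DIM = 256  # toy dimension; the example configuration uses EMBEDDING_DIM=1536


def toy_embed(text: str) -> List[float]:
    """Stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is deterministic across runs, unlike the built-in hash() for str.
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class ToyIndex:
    """Stand-in for a vector index: stores vectors alongside their metadata."""

    def __init__(self) -> None:
        self.vectors: List[List[float]] = []
        self.metadata: List[Dict[str, str]] = []

    def add(self, path: str, content: str) -> None:
        # Steps 1 and 2: embed the file and store the vector with its metadata.
        self.vectors.append(toy_embed(content))
        self.metadata.append({"path": path, "content": content})

    def search(self, query: str, k: int = 2) -> List[Tuple[float, Dict[str, str]]]:
        # Step 3: embed the query and rank stored vectors by squared L2 distance.
        q = toy_embed(query)
        scored = [
            (sum((a - b) ** 2 for a, b in zip(q, v)), meta)
            for v, meta in zip(self.vectors, self.metadata)
        ]
        scored.sort(key=lambda pair: pair[0])
        return scored[:k]


def build_prompt(query: str, hits: List[Tuple[float, Dict[str, str]]]) -> str:
    # Step 4: place the retrieved code in front of the question for the chat model.
    context = "\n\n".join(f"# File: {m['path']}\n{m['content']}" for _, m in hits)
    return f"Use the code below to answer.\n\n{context}\n\nQuestion: {query}"


index = ToyIndex()
index.add("index.py", "def save_index(path): write the faiss index to disk")
index.add("monitor.py", "def on_modified(event): re-embed the changed python file")
hits = index.search("where is the faiss index saved to disk", k=1)
prompt = build_prompt("Where is the index saved?", hits)
```

In a real deployment the embedding and the index would be replaced by the OpenAI API and FAISS, but the data flow (embed, store with metadata, search, assemble the prompt) stays the same.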
| 87 | +## 🛠️ Architecture |
| 88 | + |
| 89 | +``` |
| 90 | +CodeRAG/ |
| 91 | +├── 🧠 coderag/ # Core RAG functionality |
| 92 | +│ ├── config.py # Environment configuration |
| 93 | +│ ├── embeddings.py # OpenAI embedding generation |
| 94 | +│ ├── index.py # FAISS vector operations |
| 95 | +│ ├── search.py # Semantic code search |
| 96 | +│ └── monitor.py # File system monitoring |
| 97 | +├── 🌐 app.py # Streamlit web interface |
| 98 | +├── 🔧 main.py # Backend indexing service |
| 99 | +├── 🔗 prompt_flow.py # RAG pipeline orchestration |
| 100 | +└── 📋 requirements.txt # Dependencies |
| 101 | +``` |
| 102 | + |
| 103 | +### Key Components |
| 104 | + |
| 105 | +- **🔍 Vector Search**: FAISS-powered similarity search for code retrieval |
| 106 | +- **🎯 Smart Embeddings**: OpenAI embeddings capture semantic code meaning |
| 107 | +- **📡 Real-time Updates**: Watchdog monitors file changes for live indexing |
| 108 | +- **💬 Conversational UI**: Streamlit interface with chat-like experience |
| 109 | + |
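To illustrate what "live indexing" has to decide, the sketch below finds which `.py` files need re-embedding by comparing two directory snapshots. It deliberately swaps Watchdog's event-driven observer for simple polling with the standard library, so it shows the bookkeeping rather than the project's real monitor; the function names are hypothetical.

```python
import os
from typing import Dict, Set


def snapshot(root: str) -> Dict[str, float]:
    """Map every .py file under root to its last-modified time."""
    found: Dict[str, float] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".py"):
                path = os.path.join(dirpath, name)
                found[path] = os.path.getmtime(path)
    return found


def changed_files(before: Dict[str, float], after: Dict[str, float]) -> Set[str]:
    """Files that are new, or whose modification time differs: re-embed these."""
    return {path for path, mtime in after.items() if before.get(path) != mtime}


def removed_files(before: Dict[str, float], after: Dict[str, float]) -> Set[str]:
    """Files that disappeared: their vectors should be dropped from the index."""
    return set(before) - set(after)
```

Polling rescans the whole tree on every pass, which is why an event-based library such as Watchdog is the better fit for large projects.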
| 110 | +## 🎪 Usage Examples |
| 111 | + |
| 112 | +### Ask About Your Code |
| 113 | +``` |
| 114 | +"How does the FAISS indexing work in this codebase?" |
| 115 | +"Where is error handling implemented?" |
| 116 | +"Show me examples of the embedding generation process" |
| 117 | +``` |
| 118 | + |
| 119 | +### Get Improvements |
| 120 | +``` |
| 121 | +"How can I optimize the search performance?" |
| 122 | +"What are potential security issues in this code?" |
| 123 | +"Suggest better error handling for the monitor module" |
| 124 | +``` |
| 125 | + |
| 126 | +### Debug Issues |
| 127 | +``` |
| 128 | +"Why might the search return no results?" |
| 129 | +"How do I troubleshoot OpenAI connection issues?" |
| 130 | +"What could cause indexing to fail?" |
| 131 | +``` |
| 132 | + |
| 133 | +## ⚙️ Development |
| 134 | + |
| 135 | +### Code Quality Tools |
| 136 | + |
| 137 | +```bash |
| 138 | +# Install pre-commit hooks |
| 139 | +pip install pre-commit |
| 140 | +pre-commit install |
| 141 | + |
| 142 | +# Run formatting and linting |
| 143 | +black . |
| 144 | +flake8 . |
| 145 | +mypy . |
| 146 | +``` |
| 147 | + |
| 148 | +### Testing |
| 149 | + |
| 150 | +```bash |
| 151 | +# Test FAISS index functionality |
| 152 | +python tests/test_faiss.py |
| 153 | + |
| 154 | +# Test individual components |
| 155 | +python scripts/initialize_index.py |
| 156 | +python scripts/run_monitor.py |
| 157 | +``` |
| 158 | + |
| 159 | +## 🐛 Troubleshooting |
| 160 | + |
| 161 | +### Common Issues |
| 162 | + |
| 163 | +**Search returns no results** |
| 164 | +- Check if indexing completed: look for `coderag_index.faiss` file |
| 165 | +- Verify OpenAI API key is working |
| 166 | +- Ensure your query relates to indexed Python files |
| 167 | + |
| 168 | +**OpenAI API errors** |
| 169 | +- Verify API key in `.env` file |
| 170 | +- Check API usage limits and billing |
| 171 | +- Ensure model names are correct (gpt-4, text-embedding-ada-002) |
| 172 | + |
| 173 | +**File monitoring not working** |
| 174 | +- Check `WATCHED_DIR` path in `.env` |
| 175 | +- Ensure directory contains `.py` files |
| 176 | +- Look for error logs in console output |
| 177 | + |
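Several of the checks above can be automated. The following is a hedged sketch of a preflight script, not something shipped with the project: `preflight` is a hypothetical helper that inspects only the variables named in the Configuration section and does not contact the OpenAI API.

```python
import os
from typing import List, Mapping


def preflight(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the basics look fine."""
    problems: List[str] = []

    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is missing or empty")

    watched = env.get("WATCHED_DIR", "")
    if not os.path.isdir(watched):
        problems.append(f"WATCHED_DIR is not a directory: {watched!r}")
    elif not any(
        name.endswith(".py")
        for _root, _dirs, files in os.walk(watched)
        for name in files
    ):
        problems.append(f"WATCHED_DIR contains no .py files: {watched!r}")

    index_file = env.get("FAISS_INDEX_FILE", "")
    if not os.path.isfile(index_file):
        problems.append(f"index file not found (has indexing run?): {index_file!r}")

    return problems


if __name__ == "__main__":
    for problem in preflight(os.environ):
        print("WARNING:", problem)
```

Running it before `python main.py` narrows a "no results" report down to a missing key, a wrong directory, or an index that was never written.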
| 178 | +## 🤝 Contributing |
| 179 | + |
| 180 | +1. Fork the repository |
| 181 | +2. Create a feature branch (`git checkout -b feature/amazing-feature`) |
| 182 | +3. Make your changes with proper error handling and type hints |
| 183 | +4. Run code quality checks (`pre-commit run --all-files`) |
| 184 | +5. Commit your changes (`git commit -m 'Add amazing feature'`) |
| 185 | +6. Push to the branch (`git push origin feature/amazing-feature`) |
| 186 | +7. Open a Pull Request |
| 187 | + |
| 188 | +## 📄 License |
| 189 | + |
| 190 | +This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE-2.0.txt) file for details. |
| 191 | + |
| 192 | +## 🙏 Acknowledgments |
| 193 | + |
| 194 | +- [OpenAI](https://openai.com/) for embedding and chat models |
| 195 | +- [Facebook AI Similarity Search (FAISS)](https://github.com/facebookresearch/faiss) for vector search |
| 196 | +- [Streamlit](https://streamlit.io/) for the web interface |
| 197 | +- [Watchdog](https://github.com/gorakhargosh/watchdog) for file monitoring |
| 198 | + |
| 199 | +--- |
| 200 | + |
| 201 | +**⭐ If this project helps you, please give it a star!** |