A CLI tool that lets you chat with any GitHub repository using Retrieval-Augmented Generation (RAG). Load a codebase, ask questions in natural language, and get context-aware answers.
```text
GitHub Repo → Load Files → Chunk Code → Generate Embeddings → Store in ChromaDB
```
- Load: Fetches all files from a GitHub repo using LangChain's `GithubFileLoader`, filtering out binaries, images, and folders like `node_modules/`, `.git/`, and `venv/`.
- Chunk: Splits code using LangChain's `RecursiveCharacterTextSplitter` with language-aware splitting (supports 25+ languages).
- Embed: Generates vector embeddings using Sentence Transformers' `all-MiniLM-L6-v2` model.
- Store: Persists embeddings in a local ChromaDB vector database with cosine similarity indexing.
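The core idea of the Chunk step, recursive character splitting, can be sketched with the standard library alone. This is a simplified illustration, not LangChain's `RecursiveCharacterTextSplitter`: it has no chunk overlap, and `recursive_split` and the `PYTHON_SEPARATORS` list are hypothetical names and values chosen for this example. It tries the coarsest code-aware separator first (class and function boundaries) and falls back to finer ones only for pieces that are still longer than `chunk_size`.

```python
# Simplified recursive character splitting (stdlib only). Not LangChain's
# RecursiveCharacterTextSplitter: no chunk overlap and no regex separators.

# Hypothetical separator order for Python source: class/function boundaries
# first, then blank lines, single newlines, spaces, and finally characters ("").
PYTHON_SEPARATORS = ["\nclass ", "\ndef ", "\n\n", "\n", " ", ""]


def recursive_split(text: str, chunk_size: int, separators: list[str]) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Joining the returned chunks reproduces the input exactly.
    """
    if len(text) <= chunk_size:
        return [text] if text else []

    # Use the first (coarsest) separator that occurs in the text;
    # the finer ones are kept for pieces that are still too long.
    sep, finer = "", []
    for i, candidate in enumerate(separators):
        if candidate == "" or candidate in text:
            sep, finer = candidate, separators[i + 1:]
            break

    if sep == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    # Re-attach the separator to the piece it introduces, so "\ndef "
    # stays with the function that follows it.
    parts = text.split(sep)
    pieces = [parts[0]] + [sep + part for part in parts[1:]]

    chunks: list[str] = []
    current = ""
    for piece in pieces:
        if len(piece) > chunk_size:
            # Flush the pending chunk, then split the oversized piece further.
            if current:
                chunks.append(current)
            current = ""
            chunks.extend(recursive_split(piece, chunk_size, finer))
        elif len(current) + len(piece) <= chunk_size:
            current += piece  # greedily pack small pieces together
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

Because separators such as `\ndef ` stay attached to the piece they introduce, chunk boundaries fall on function boundaries whenever the functions fit within `chunk_size`, so each embedding covers a coherent unit of code rather than an arbitrary character window.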
```text
User Query → Generate Query Embedding → Cosine Similarity Search → Retrieve Top-K Chunks → LLM Generates Answer
```
- Embed Query: Converts your question into a vector using the same Sentence Transformers model.
- Retrieve: ChromaDB performs a cosine similarity search and returns the top-K most relevant code chunks along with their file path metadata.
- Answer: Sends the retrieved code context (with file paths) and your question to the `llama-3.3-70b-versatile` LLM via Groq, which generates the answer.
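The query path can be sketched end to end with the standard library: score every stored chunk against the query vector by cosine similarity, keep the top K, and label each chunk with its file path in the prompt. This is an illustration under stated assumptions: the 3-dimensional vectors and the toy index are made up (real `all-MiniLM-L6-v2` embeddings have 384 dimensions), the brute-force scan stands in for ChromaDB's approximate nearest-neighbour index, and `top_k`, `build_prompt`, and the prompt wording are hypothetical rather than the code in `rag_retriever.py` or `groq_llm.py`.

```python
import math

# A stored chunk: (file_path, chunk_text, embedding).
Chunk = tuple[str, str, list[float]]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(a, b) = a.b / (|a| |b|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], index: list[Chunk], k: int = 3) -> list[Chunk]:
    """Exact brute-force search: rank every chunk by cosine similarity, keep the k best."""
    ranked = sorted(index, key=lambda c: cosine_similarity(query_vec, c[2]), reverse=True)
    return ranked[:k]


def build_prompt(question: str, hits: list[Chunk]) -> str:
    """Hypothetical prompt layout: each retrieved chunk is labelled with its file path."""
    context = "\n\n".join(f"File: {path}\n{text}" for path, text, _ in hits)
    return (
        "Answer the question using only the code below.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


# Toy index with made-up paths, snippets, and 3-d vectors.
index: list[Chunk] = [
    ("server/controllers/authController.js", "function login(req, res) { ... }", [0.9, 0.1, 0.0]),
    ("client/src/App.js", "export default function App() { ... }", [0.1, 0.9, 0.1]),
    ("README.md", "# Project overview", [0.0, 0.2, 0.9]),
]
hits = top_k([1.0, 0.2, 0.0], index, k=1)
print(build_prompt("Where is authentication implemented?", hits))
```

Keeping the file path next to each chunk in the prompt is what lets the model cite locations such as `server/controllers/authController.js` in its answer.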
Requirements:

- Python: 3.12
- Environment manager: Conda (recommended)
- Virtual environment: venv
You'll need two API keys:
| Key | Where to get it |
|---|---|
| GitHub Personal Access Token | github.com/settings/tokens (create a fine-grained token with read-only access to repository Contents) |
| Groq API Key | console.groq.com/keys (sign up and generate an API key) |
```bash
git clone https://github.com/AnmolTutejaGitHub/RepoRAGX.git
cd RepoRAGX
```

Using Conda (recommended):

```bash
conda create -p venv python=3.12
conda activate venv/
pip install -r requirements.txt
```

Then start the CLI:

```bash
python -m src.main
```

You'll be prompted for:
```text
GitHub Personal Access Token: ********
Groq API Key: ********
Repo (owner/repo): AnmolTutejaGitHub/RepoRAGX
Branch (default: main): main
```
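Input handling for the repo and branch prompts could look like the minimal sketch below. `parse_repo_prompt` is a hypothetical helper, not taken from `src/main.py`; it assumes the repo answer must be exactly `owner/repo` and that an empty branch answer falls back to `main`, as the prompt text suggests.

```python
def parse_repo_prompt(repo: str, branch: str = "") -> tuple[str, str, str]:
    """Turn the 'Repo (owner/repo)' and 'Branch (default: main)' answers into
    (owner, repo_name, branch). Hypothetical helper, not the project's code."""
    owner, sep, name = repo.strip().partition("/")
    # Reject inputs like "RepoRAGX", "/RepoRAGX", or "a/b/c".
    if not sep or not owner or not name or "/" in name:
        raise ValueError(f"expected 'owner/repo', got {repo!r}")
    # An empty answer to the branch prompt falls back to the default.
    return owner, name, branch.strip() or "main"


print(parse_repo_prompt("AnmolTutejaGitHub/RepoRAGX"))
```

Validating the repo string before any network call gives the user an immediate, readable error instead of a failed GitHub API request.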
Once the repo is loaded and indexed, you can start chatting:
```text
Ask anything ('exit' to quit): Where is authentication implemented?
> Authentication is implemented in server/controllers/authController.js ...
Ask anything ('exit' to quit): exit
```
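The question-and-answer loop shown in the session above can be sketched as follows. `chat_loop` and `prompt_user` are hypothetical names; passing the questions and the answering function in as parameters is a design choice for this sketch (it lets the loop run without a terminal, ChromaDB, or a Groq key), not necessarily how `src/main.py` is structured.

```python
from typing import Callable, Iterable, Iterator


def chat_loop(questions: Iterable[str], answer: Callable[[str], str]) -> list[str]:
    """Answer each question until the user types 'exit'; returns the replies."""
    replies: list[str] = []
    for raw in questions:
        question = raw.strip()
        if question == "exit":
            break
        if not question:
            continue  # skip blank lines rather than sending an empty query
        reply = answer(question)  # placeholder for retrieval + LLM generation
        print(f"> {reply}")
        replies.append(reply)
    return replies


def prompt_user() -> Iterator[str]:
    """Yield lines typed at the prompt; stops cleanly on end of input."""
    while True:
        try:
            yield input("Ask anything ('exit' to quit): ")
        except EOFError:
            return
```

In an interactive session you would call something like `chat_loop(prompt_user(), answer=ask_repo)`, where `ask_repo` is a placeholder for the retrieve-then-generate step.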
Project structure:

```text
RepoRAGX/
├── src/
│   ├── main.py                        # CLI entry point
│   └── rag/
│       ├── github_codebase_loader.py  # Fetches repo files from GitHub
│       ├── text_splitter.py           # Language-aware document chunking
│       ├── embedding_manager.py       # Sentence Transformer embeddings
│       ├── vector_store.py            # ChromaDB vector storage
│       ├── rag_retriever.py           # Similarity search & retrieval
│       └── groq_llm.py                # LLM integration
├── requirements.txt
├── .env.example
└── README.md
```
