A highly optimized Retrieval-Augmented Generation (RAG) system designed to ingest technical documentation (PDFs, Markdown) and provide exact, cited answers to user queries.
This project demonstrates a production-ready enterprise AI pattern: Hybrid Search combined with Cross-Encoder Re-ranking, which raises the precision of the context retrieved for each query.
- Hybrid Search: Combines Dense Vector Search (semantic similarity via ChromaDB + sentence-transformers) with Sparse Keyword Search (exact keyword matching via `rank_bm25`) using Reciprocal Rank Fusion (RRF).
- Cross-Encoder Re-ranking: Passes the fused results through a Cross-Encoder (`ms-marco-MiniLM-L-6-v2`), which scores each query/passage pair jointly, so only the most relevant chunks reach the LLM.
- Intelligent Chunking: Uses LangChain's token-aware chunking so that each chunk fits a token budget while keeping related text together.
- FastAPI Backend: Asynchronous, high-performance API for ingestion and querying.
- Premium Vite + React Frontend: A modern, glassmorphic UI for uploading files and viewing retrieved contexts.
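Reciprocal Rank Fusion only needs the position of each chunk in each result list, not the raw scores, which is why it can merge ChromaDB's vector distances with BM25 scores that sit on an unrelated scale. A minimal sketch, assuming each retriever returns a best-first list of chunk IDs; the function name, the chunk IDs, and the constant `k = 60` (the value commonly used in the RRF literature) are illustrative assumptions, not necessarily what this project uses:

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several best-first lists of chunk IDs with Reciprocal Rank Fusion.

    Each chunk earns 1 / (k + rank) from every list it appears in (rank is 1-based);
    contributions are summed and chunks are returned best-first.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical chunk IDs returned by the dense (ChromaDB) and sparse (BM25) retrievers.
dense_hits = ["c3", "c1", "c7"]
sparse_hits = ["c1", "c9", "c3"]
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print([doc_id for doc_id, _ in fused])  # → ['c1', 'c3', 'c9', 'c7']
```

Because a chunk found by both retrievers collects two contributions, `c1` (ranked 2nd and 1st) edges out `c3` (1st and 3rd), and both beat chunks that only one retriever found.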
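A cross-encoder reads the query and a candidate passage together and outputs a single relevance score. This is usually more accurate than comparing independently computed embeddings, but too slow to run over the whole corpus, which is why it is applied only to the fused shortlist. The sketch below keeps the scoring model pluggable: `rerank`, `PairScorer`, and `toy_overlap_scorer` are hypothetical names, and the word-overlap scorer only stands in for the real model so the example runs offline.

```python
from collections.abc import Callable, Sequence

# A pair scorer maps (query, passage) pairs to one relevance score per pair.
PairScorer = Callable[[Sequence[tuple[str, str]]], Sequence[float]]


def rerank(query: str, passages: list[str], scorer: PairScorer, top_k: int = 3) -> list[tuple[str, float]]:
    """Score every (query, passage) pair jointly and keep the top_k passages, best first."""
    pairs = [(query, passage) for passage in passages]
    scores = scorer(pairs)
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return [(passage, float(score)) for passage, score in ranked[:top_k]]


def toy_overlap_scorer(pairs: Sequence[tuple[str, str]]) -> list[float]:
    """Hypothetical offline stand-in for a cross-encoder: counts shared lowercase words."""
    return [float(len(set(q.lower().split()) & set(p.lower().split()))) for q, p in pairs]


candidates = [
    "BM25 ranks documents by term frequency",
    "Cross-encoders score a query and passage together",
    "Vite serves the React frontend",
]
result = rerank("how do cross-encoders score a passage", candidates, toy_overlap_scorer, top_k=1)
print(result)  # → [('Cross-encoders score a query and passage together', 4.0)]
```

With `sentence-transformers` installed, the real model could be dropped in as `rerank(query, passages, CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2").predict)`; the Hugging Face model ID is an assumption based on the model name given in this README.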
- Ingestion: Documents are parsed, chunked, and simultaneously embedded into ChromaDB and indexed into a BM25 store.
- Retrieval: A query is executed against both ChromaDB and BM25.
- Fusion & Re-ranking: The two ranked result lists are merged with RRF, and the fused candidates are re-scored by the cross-encoder.
- Generation: The top contexts are passed to an LLM interface (mocked in this implementation, ready to be connected to OpenAI, Gemini, etc.).
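The ingestion step's chunking can be pictured as a sliding window over tokens. This is a simplified stand-in rather than LangChain's splitter: it splits on whitespace instead of using a real tokenizer, and the window size and the overlap between neighboring chunks are assumed parameters that the README does not specify.

```python
def chunk_by_tokens(text: str, max_tokens: int = 200, overlap: int = 20) -> list[str]:
    """Split text into windows of at most max_tokens tokens.

    Consecutive windows share `overlap` tokens, so text near a boundary
    appears in both neighboring chunks. Whitespace splitting stands in
    for a real tokenizer here.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")
    tokens = text.split()
    step = max_tokens - overlap
    chunks: list[str] = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


doc = " ".join(f"w{i}" for i in range(10))  # "w0 w1 ... w9"
pieces = chunk_by_tokens(doc, max_tokens=4, overlap=1)
print(pieces)  # → ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```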
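On the sparse side, BM25 scores a chunk higher when it contains the query terms often (with diminishing returns), when those terms are rare across the corpus, and when the chunk is not unusually long. The project uses `rank_bm25`, whose `BM25Okapi(tokenized_corpus).get_scores(tokenized_query)` returns one score per document; the class below is a hypothetical, dependency-free re-implementation for illustration, assuming the usual defaults `k1 = 1.5` and `b = 0.75`. It uses the non-negative Lucene-style IDF, so its absolute scores will not match `rank_bm25` exactly.

```python
import math
from collections import Counter


class MiniBM25:
    """Okapi BM25 over pre-tokenized documents (hypothetical stand-in for rank_bm25)."""

    def __init__(self, corpus: list[list[str]], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.term_freqs = [Counter(doc) for doc in corpus]
        self.doc_lens = [len(doc) for doc in corpus]
        self.avgdl = sum(self.doc_lens) / len(corpus) if corpus else 0.0
        n_docs = len(corpus)
        # Document frequency: how many documents contain each term at least once.
        doc_freq = Counter(term for doc in corpus for term in set(doc))
        # Non-negative (Lucene-style) IDF: rarer terms get larger weights.
        self.idf = {
            term: math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            for term, df in doc_freq.items()
        }

    def get_scores(self, query: list[str]) -> list[float]:
        """Return one BM25 score per document, in corpus order."""
        scores = []
        for tf_map, doc_len in zip(self.term_freqs, self.doc_lens):
            score = 0.0
            for term in query:
                tf = tf_map.get(term, 0)
                if tf == 0:
                    continue  # a term absent from this document contributes nothing
                # Length normalization: longer documents need more matches to score well.
                norm = self.k1 * (1 - self.b + self.b * doc_len / self.avgdl)
                score += self.idf[term] * tf * (self.k1 + 1) / (tf + norm)
            scores.append(score)
        return scores


corpus = [text.lower().split() for text in [
    "set the timeout flag in config.yaml",
    "the retry policy uses exponential backoff",
    "timeout errors are retried once",
]]
bm25 = MiniBM25(corpus)
scores = bm25.get_scores("timeout flag".split())
best = max(range(len(scores)), key=scores.__getitem__)
print(best)  # → 0
```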
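Because generation is mocked, the main design point is the seam between retrieval and the model. One way to express it is sketched here with hypothetical names (`LLMClient`, `MockLLM`, `Chunk`, `build_prompt`, `answer`) that are not taken from the project's code: number the re-ranked chunks in the prompt so the answer can cite them, and hide the provider behind a single `generate()` method so that an OpenAI or Gemini adapter could later replace the mock without touching retrieval.

```python
from dataclasses import dataclass
from typing import Protocol


class LLMClient(Protocol):
    """Hypothetical interface: any provider adapter (OpenAI, Gemini, ...) implements generate()."""

    def generate(self, prompt: str) -> str: ...


@dataclass
class Chunk:
    source: str  # e.g. the file name of the uploaded document
    text: str


def build_prompt(question: str, chunks: list[Chunk]) -> str:
    """Number each context chunk so the answer can cite it as [1], [2], ..."""
    context = "\n".join(f"[{i}] ({c.source}) {c.text}" for i, c in enumerate(chunks, start=1))
    return (
        "Answer using only the numbered context below and cite sources like [1].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


class MockLLM:
    """Offline stand-in: echoes the first context line instead of calling a real model."""

    def generate(self, prompt: str) -> str:
        for line in prompt.splitlines():
            if line.startswith("[1]"):
                return f"(mock) See {line}"
        return "(mock) No context was provided."


def answer(question: str, chunks: list[Chunk], llm: LLMClient) -> str:
    return llm.generate(build_prompt(question, chunks))


chunks = [Chunk("guide.md", "Set timeout in config.yaml."), Chunk("faq.pdf", "Retries use backoff.")]
result = answer("Where is the timeout set?", chunks, MockLLM())
print(result)  # → (mock) See [1] (guide.md) Set timeout in config.yaml.
```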
Open a terminal and create the conda environment:
```bash
# Create and activate the conda environment
conda create -y -n rag-engine python=3.11
conda activate rag-engine
# Install Python dependencies
pip install -r requirements.txt
```
Open a new terminal window:
```bash
# Navigate to the frontend directory
cd frontend
# Install Node dependencies
npm install
```
To start the backend, make sure your conda environment is activated:
```bash
# Run from the directory that contains app/
conda activate rag-engine
uvicorn app.main:app --reload
```
The FastAPI server starts on http://localhost:8000, and the interactive API documentation is available at http://localhost:8000/docs.
To start the frontend, switch to your frontend terminal:
```bash
# Skip this cd if the terminal is already inside frontend/
cd frontend
npm run dev
```
The Vite React app starts on http://localhost:5173. Open this URL in your browser to interact with the RAG Engine!
- Open the web interface.
- Upload a PDF or Markdown file containing technical documentation.
- Once processed, use the search bar to ask a question.
- View the synthesized answer and the exact document chunks (with their cross-encoder relevance scores) that were used to generate it.