RAG Engine

A Retrieval-Augmented Generation (RAG) system that ingests technical documentation (PDFs, Markdown) and answers user queries with citations to the source chunks.

This project demonstrates a retrieval pattern common in production systems: Hybrid Search combined with Cross-Encoder Re-ranking, aimed at high-precision context retrieval.

Key Features

  • Hybrid Search: Combines Dense Vector Search (semantic similarity via ChromaDB + sentence-transformers) with Sparse Keyword Search (exact match via rank_bm25) using Reciprocal Rank Fusion (RRF).
  • Cross-Encoder Re-ranking: Passes the fused results through a Cross-Encoder (ms-marco-MiniLM-L-6-v2), which re-scores each query-chunk pair so that the most relevant contexts reach the LLM.
  • Intelligent Chunking: Uses LangChain's token-aware chunking for optimal context preservation.
  • FastAPI Backend: Asynchronous, high-performance API for ingestion and querying.
  • Premium Vite + React Frontend: A modern, glassmorphic UI for uploading files and viewing retrieved contexts.
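The fusion step in Hybrid Search can be illustrated with a minimal sketch of Reciprocal Rank Fusion. This is not the project's code: the function name, the chunk IDs, and the smoothing constant `k=60` (a commonly used default) are assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first lists of chunk IDs into one scored list.

    Each chunk earns 1 / (k + rank) from every list it appears in.
    k=60 is an assumed, conventional default, not a value from this repo.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from the dense and sparse retrievers.
dense_hits = ["chunk-3", "chunk-1", "chunk-7"]
sparse_hits = ["chunk-1", "chunk-9", "chunk-3"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
for chunk_id, score in fused:
    print(f"{chunk_id}: {score:.5f}")
```

RRF uses only rank positions, so the dense similarity scores and BM25 scores never need to be normalized onto a common scale. Chunks that both retrievers return (here `chunk-1` and `chunk-3`) rise above chunks found by only one.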

Architecture

  1. Ingestion: Documents are parsed, chunked, and simultaneously embedded into ChromaDB and indexed into a BM25 store.
  2. Retrieval: A query is executed against both ChromaDB and BM25.
  3. Fusion & Re-ranking: The two result lists are fused with RRF, and the fused candidates are re-scored by the Cross-Encoder.
  4. Generation: The top contexts are passed to an LLM interface (mocked in this implementation, ready to be connected to OpenAI, Gemini, etc.).
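Steps 3 and 4 can be sketched as two small functions: one that re-scores candidates with a pluggable pair scorer, and one that stands in for the mocked LLM. All names here (`rerank`, `overlap_scorer`, `mock_generate`) are hypothetical, and the token-overlap scorer is only a dependency-free placeholder. With sentence-transformers installed, `CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2").predict` could presumably be passed as `score_pairs` instead, since it also takes a list of (query, passage) pairs and returns one score per pair.

```python
from typing import Callable, Sequence

# A scorer maps (query, passage) pairs to one relevance score per pair.
ScoreFn = Callable[[Sequence[tuple[str, str]]], Sequence[float]]


def rerank(query: str, candidates: Sequence[str],
           score_pairs: ScoreFn, top_k: int = 3) -> list[tuple[str, float]]:
    """Re-score fused candidates against the query and keep the top_k."""
    pairs = [(query, text) for text in candidates]
    scores = score_pairs(pairs)
    ranked = sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)
    return [(text, float(score)) for text, score in ranked[:top_k]]


def overlap_scorer(pairs: Sequence[tuple[str, str]]) -> list[float]:
    """Placeholder scorer: fraction of query tokens present in the passage.

    A real cross-encoder reads query and passage jointly; this stand-in
    exists only so the sketch runs without downloading a model.
    """
    scores = []
    for query, passage in pairs:
        q_tokens = set(query.lower().split())
        p_tokens = set(passage.lower().split())
        scores.append(len(q_tokens & p_tokens) / len(q_tokens) if q_tokens else 0.0)
    return scores


def mock_generate(query: str, contexts: Sequence[tuple[str, float]]) -> str:
    """Stand-in for the LLM call: reports which contexts would be cited."""
    cited = " ".join(f"[{i}]" for i in range(1, len(contexts) + 1))
    return f"(mock answer to {query!r}, grounded in {cited})"


candidates = ["bm25 keyword search", "reciprocal rank fusion", "dense vector rank"]
top = rerank("rank fusion", candidates, overlap_scorer, top_k=2)
answer = mock_generate("rank fusion", top)
print(top)
print(answer)
```

Keeping the scorer and the generator as injected callables mirrors the mocked LLM interface described above: either can be swapped for a real model without touching the surrounding pipeline.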

Setup & Installation

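Prerequisites

  • conda (Miniconda or Anaconda), used to create the Python 3.11 environment
  • Node.js with npm, used to install and run the frontend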

1. Backend Setup

Open a terminal and create the conda environment:

# Create and activate the conda environment
conda create -y -n rag-engine python=3.11
conda activate rag-engine

# Install Python dependencies
pip install -r requirements.txt

2. Frontend Setup

Open a new terminal window:

# Navigate to the frontend directory
cd frontend

# Install Node dependencies
npm install

Running the Application

Start the Backend

Ensure your conda environment is activated:

conda activate rag-engine
uvicorn app.main:app --reload

The FastAPI server will start on http://localhost:8000. You can view the API documentation at http://localhost:8000/docs.
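This README does not list the endpoint names, but FastAPI serves a machine-readable schema at `/openapi.json` by default, the same data that backs the `/docs` page. The sketch below, using only the standard library, shows one way to list the available routes from that schema. The helper names and the sample paths are hypothetical, and it assumes the app has not disabled or moved its OpenAPI URL.

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def fetch_openapi(base_url="http://localhost:8000"):
    """Download the OpenAPI schema from a running backend."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return json.load(response)


def list_routes(schema):
    """Return 'METHOD /path' strings for every operation in the schema."""
    routes = []
    for path, operations in sorted(schema.get("paths", {}).items()):
        for method in sorted(operations):
            if method in HTTP_METHODS:  # skip non-operation keys such as "parameters"
                routes.append(f"{method.upper()} {path}")
    return routes


# Hypothetical schema fragment for illustration only; the real paths
# come from the running server via fetch_openapi().
sample_schema = {
    "paths": {
        "/hypothetical-search": {"post": {}},
        "/hypothetical-upload": {"post": {}, "parameters": []},
    }
}
print(list_routes(sample_schema))

# With the backend running:
#   print("\n".join(list_routes(fetch_openapi())))
```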

Start the Frontend

In your frontend terminal:

cd frontend
npm run dev

The Vite React app will start on http://localhost:5173. Open this URL in your browser to interact with the RAG Engine!

Usage

  1. Open the web interface.
  2. Upload a PDF or Markdown file containing technical documentation.
  3. Once processed, use the search bar to ask a question.
  4. View the synthesized answer and the exact document chunks (with their cross-encoder relevance scores) that were used to generate it.