A production-grade Retrieval-Augmented Generation (RAG) system built with LangGraph, featuring:
- DeepSeek OCR via vLLM for document text extraction
- LangGraph for agentic workflow orchestration with validation loops
- ChromaDB for persistent vector storage
- Qwen3-Embedding-0.6B for embeddings and generation
- Contextual Chunking based on Anthropic's research (49% retrieval improvement) (support included not tried so might have errors)
- Semantic Chunking 768 tokens and 50 as sliding window
- Streamlit web interface for document upload and chat
┌─────────────────────────────────────────────────────────────────┐
│ Streamlit UI │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Upload Page │ │ Chat Page │ │ Manage Page │ │
│ └────────┬────────┘ └────────┬────────┘ └────────┬────────┘ │
└───────────┼─────────────────────┼─────────────────────┼──────────┘
│ │ │
▼ ▼ ▼
┌───────────────────────┐ ┌──────────────────────────────────────────────┐
│ Ingestion Pipeline │ │ LangGraph Workflow │
│ ┌─────────────────┐ │ │ ┌────────┐ ┌─────────┐ ┌─────────┐ ┌──────┐ │
│ │ PDF Processor │ │ │ │Query │→│Retriever│→│Generator│→│Valid.│ │
│ └────────┬────────┘ │ │ │Rewriter│ └─────────┘ └────┬────┘ └──┬───┘ │
│ ▼ │ │ └────────┘ │ │ │
│ ┌─────────────────┐ │ │ ◄─────────┘ │
│ │ DeepSeek OCR │ │ │ (retry if invalid) │
│ └────────┬────────┘ │ │ │ │
│ ▼ │ │ ▼ │
│ ┌─────────────────┐ │ │ ┌─────────┐ │
│ │ Text Cleaner │ │ │ │Response │ │
│ └────────┬────────┘ │ │ └─────────┘ │
│ ▼ │ └──────────────────────────────────────────────┘
│ ┌─────────────────┐ │ │
│ │Contextual Chunk │ │ │
│ └────────┬────────┘ │ │
└───────────┼───────────┘ │
│ │
▼ ▼
┌─────────────────────────────────────────────┐
│ ChromaDB Vector Store │
└─────────────────────────────────────────────┘
assignment-langraph/
├── pyproject.toml # Single config with all dependencies
├── .env.example # Environment template
├── data/ # Sample PDFs
│
├── common/ # Shared utilities
│ ├── config.py # Pydantic settings
│ └── logging.py # structlog configuration
│
├── ingestion/ # Document ingestion pipeline
│ ├── ocr/ # DeepSeek OCR client
│ ├── processor/ # PDF, cleaner, chunker
│ ├── vectorstore/ # ChromaDB
│ ├── pipeline.py # Main orchestrator
│ └── cli.py # CLI interface
│
├── agents/ # LangGraph RAG workflow
│ ├── nodes/ # Query Rewriter, Retriever, Generator, Validator, Response
│ ├── state.py # Shared state schema (includes chat_history)
│ ├── graph.py # LangGraph workflow
│ └── chat.py # Chat session with conversation history
│
├── app/ # Streamlit UI
│ ├── main.py # Entry point
│ └── pages/ # Upload, Chat pages
│
└── tests/
-
Python 3.11+
-
UV package manager: https://docs.astral.sh/uv/
-
DeepSeek OCR vLLM server (for OCR):
The project includes a Docker-based infrastructure setup for running DeepSeek OCR locally.
Quick start:
cd infrastructure # Download model weights (~35GB) mkdir -p models huggingface-cli download deepseek-ai/DeepSeek-OCR --local-dir models/deepseek-ai/DeepSeek-OCR # Build and run docker-compose up -d
See infrastructure/README.md for full setup instructions, prerequisites, and troubleshooting.
-
Clone and navigate:
git clone <repository-url> cd assignment-langraph
-
Create environment file:
cp .env.example .env # Edit .env with your API keys -
Install dependencies:
uv sync
Ingest a document:
uv run ingestion ingest data/document.pdf
# With page range:
uv run ingestion ingest data/document.pdf --start-page 1 --end-page 10
List ingested documents:
uv run ingestion listSearch documents:
uv run ingestion search "What is the main topic?"Delete a document:
uv run ingestion delete <document_id>uv run streamlit run app/main.pyThen open http://localhost:8501 in your browser.
from ingestion import IngestionPipeline
from agents import ChatSession
# Ingest a document
pipeline = IngestionPipeline()
result = pipeline.ingest("document.pdf", start_page=1, end_page=10)
print(f"Ingested {result['chunks_created']} chunks")
# Chat with documents
session = ChatSession()
response = session.chat("What is the main topic of the document?")
print(response.content)The RAG workflow follows this pattern with query contextualization and a retry loop for validation:
START → Query Rewriter → Retriever → Generator → Validator
│
[is_valid?]
/ \
Yes / \ No (retry < 3)
↓ ↓
Response ←── Generator (with feedback)
↓
END
Agents:
- Query Rewriter: Contextualizes queries using conversation history (e.g., "tell me more about it" → "tell me more about machine learning")
- Retriever: Fetches top-k relevant chunks from ChromaDB using the contextualized query
- Generator: Generates answer using GPT-4o-mini with retrieved context
- Validator: Checks for hallucinations using structured output
- Response: Formats final response with sources
Conversation Context:
- The system maintains chat history across messages in a session
- The Query Rewriter resolves references like "it", "that", "the previous topic" using conversation context
- Original query is preserved separately from the contextualized query for logging/debugging
Self-host or use cloud for tracing:
# .env
LANGFUSE_PUBLIC_KEY=pk-...
LANGFUSE_SECRET_KEY=sk-...
LANGFUSE_HOST=https://cloud.langfuse.comJSON logging for production:
# .env
LOG_FORMAT=json
LOG_LEVEL=INFOAll settings via environment variables or .env:
| Variable | Default | Description |
|---|---|---|
OPENAI_API_KEY |
(required) | OpenAI API key |
DEEPSEEK_OCR_URL |
http://localhost:8000/ |
vLLM server URL |
CHROMA_PERSIST_DIR |
./chroma_db |
Vector store path |
CHUNK_SIZE |
512 |
Tokens per chunk |
CHUNK_OVERLAP |
50 |
Overlap tokens |
CHUNKING_STRATEGY |
contextual |
semantic or contextual |
RETRIEVAL_TOP_K |
5 |
Chunks to retrieve |
MAX_RETRY_COUNT |
3 |
Max validation retries |
The data/ directory contains sample PDFs for testing:
- handwritten.pdf - Handwritten notes demonstrating OCR capabilities
- normal.pdf - Historical text about ancient India