Chat with your PDF documents using AI — answers grounded in your documents, with exact source citations.
RAG (Retrieval-Augmented Generation) is a technique that gives a language model access to your own documents at query time. When you ask a question, the system first searches your document library for the most relevant passages, then passes those passages to the LLM as context. This means the AI answers from your content rather than from its training data alone, which reduces hallucinations and makes each answer traceable to a citable source.
- Multi-PDF upload: upload one or several PDFs at once
- Local embeddings: all-MiniLM-L6-v2 runs entirely on CPU, no API cost
- FAISS vector search: blazing-fast similarity search, fully in-memory
- Groq LLM: free-tier llama3-8b-8192 inference, sub-second latency
- Source citations: every answer shows the exact file and page it came from
- Conversation memory: multi-turn chat; follow-up questions work naturally
- Strict grounding: the model is instructed to say "I cannot find this" rather than guess
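
The strict-grounding and source-citation features could be implemented along the following lines. This is a minimal sketch rather than the project's actual code: the Chunk type, build_prompt, format_sources, and the exact prompt wording are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """A retrieved passage plus the metadata needed to cite it (hypothetical type)."""
    text: str
    source: str  # file name, e.g. "ai_overview.pdf"
    page: int    # 1-based page number


# Assumed wording; the project's real system prompt may differ.
SYSTEM_RULES = (
    "Answer using only the context below. "
    'If the answer is not in the context, reply "I cannot find this in the documents."'
)


def build_prompt(question: str, chunks: list[Chunk]) -> str:
    """Assemble a grounded prompt: rules, labeled context passages, then the question."""
    context = "\n\n".join(
        f"[{i}] ({c.source}, page {c.page})\n{c.text}"
        for i, c in enumerate(chunks, start=1)
    )
    return f"{SYSTEM_RULES}\n\nContext:\n{context}\n\nQuestion: {question}"


def format_sources(chunks: list[Chunk]) -> list[str]:
    """Return de-duplicated 'file, page N' citations in retrieval order."""
    seen: set[tuple[str, int]] = set()
    citations = []
    for c in chunks:
        key = (c.source, c.page)
        if key not in seen:
            seen.add(key)
            citations.append(f"{c.source}, page {c.page}")
    return citations
```

Labeling each passage with its file and page inside the prompt lets the answer refer back to a numbered source, while format_sources produces the list shown under the answer.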
- Go to console.groq.com
- Sign up (no credit card required)
- Navigate to API Keys → Create API Key
- Copy the key; it starts with gsk_
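
Once the key is copied, the app needs to read it at startup. The sketch below shows one way to do that with an early sanity check on the gsk_ prefix. The variable name GROQ_API_KEY and the helper load_groq_key are assumptions; check .env.example for the name the project actually expects.

```python
import os
from typing import Mapping, Optional


def load_groq_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Read the API key from the environment and sanity-check its prefix.

    GROQ_API_KEY is an assumed variable name, not confirmed by the project.
    """
    env = os.environ if env is None else env
    key = env.get("GROQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GROQ_API_KEY is not set; create a key at console.groq.com")
    if not key.startswith("gsk_"):
        raise RuntimeError("GROQ_API_KEY does not start with 'gsk_'; check that it was copied in full")
    return key
```

Failing fast here gives a clearer error than a rejected request at the first chat message.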
    pip install -r requirements.txt

First run downloads the all-MiniLM-L6-v2 model (~90 MB). Subsequent runs use the cache.

    python generate_samples.py
    streamlit run app.py

Open http://localhost:8501 in your browser.
┌─────────────┐    ┌──────────────────┐    ┌─────────────────┐    ┌──────────────┐
│ Upload PDF  │───▶│ Chunk & Embed    │───▶│ FAISS Vector    │───▶│ Groq LLM     │
│ Documents   │    │ (MiniLM-L6-v2)   │    │ Store (local)   │    │ (llama3-8b)  │
└─────────────┘    └──────────────────┘    └────────┬────────┘    └──────┬───────┘
                                                    │                    │
                                             ┌──────▼──────┐             │
                                             │ Top-4 most  │             │
                                             │ relevant    │────────────▶│
                                             │ chunks      │  context    │
                                             └─────────────┘             │
                                                                  ┌──────▼───────┐
                                                                  │ Answer +     │
                                                                  │ Citations    │
                                                                  └──────────────┘
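
The chunk, embed, and top-4 retrieval stages in the diagram can be illustrated with a dependency-free toy. This is a sketch under loud assumptions, not the project's pipeline: word-count vectors stand in for all-MiniLM-L6-v2 embeddings, a brute-force cosine ranking stands in for the FAISS index, and the function names and chunk sizes are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows (sizes are illustrative)."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. Stands in for all-MiniLM-L6-v2."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 4) -> list[str]:
    """Brute-force nearest-neighbour search. Stands in for the FAISS index."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

In the real flow, the chunks returned by the retrieval step are passed to the LLM as context, and the model's answer is shown together with the file and page of each chunk.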
Upload the sample PDFs in sample_docs/ and try asking:
- "What are the three types of AI?"
- "How is a decision tree different from a neural network?"
- "When was the term Artificial Intelligence coined?"
rag-doc-chatbot/
├── app.py # Streamlit UI — chat interface, sidebar, session state
├── rag_chain.py # LangChain RAG pipeline — chain + source formatting
├── vectorstore.py # PDF loading, chunking, FAISS embedding & retrieval
├── generate_samples.py # One-time script to create demo PDFs
├── requirements.txt # Pinned dependencies
├── .env.example # Environment variable template
└── sample_docs/
    ├── ai_overview.pdf
    └── ml_concepts.pdf
Groq's LPU (Language Processing Unit) hardware is reported to deliver roughly 10× faster inference than GPU-based providers. The free tier gives you ~14,400 requests/day with llama3-8b-8192 (about 10 requests per minute averaged over a day), which is more than enough for development and demos, and no credit card is required.
MIT © 2024