- Docker installed (Download Docker)
- Groq API Key (Free – Get yours here)
- Clone the repository
git clone <repository-url>
cd multimodalrag
- Configure API Key
cp .env.example .env
# Edit the .env file and replace "your_groq_api_key_here" with your actual key
- Launch the stack
docker-compose up -d
- Open the application
  - Navigate to: http://localhost:8501
  - The app will launch automatically
- Use the left sidebar to upload a PDF
- The system will automatically process text, images, and tables
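If the page at http://localhost:8501 does not load right after `docker-compose up -d`, the containers may still be starting. The following is a minimal sketch of a readiness check, assuming `curl` is installed; `wait_for_app` and its default attempt count and delay are hypothetical and not part of the project.

```shell
# wait_for_app: hypothetical helper, not shipped with the repository.
# Polls a URL with curl until it answers or the attempts run out.
wait_for_app() {
  url="${1:-http://localhost:8501}"   # app address from the steps above
  attempts="${2:-30}"                 # assumed default: 30 tries
  delay="${3:-2}"                     # assumed default: 2 seconds between tries
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors as failure, -s: silent, -o /dev/null: discard the body
    if curl -fs -o /dev/null "$url"; then
      echo "App is reachable at $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "App did not respond at $url after $attempts attempts" >&2
  return 1
}

# Usage (after docker-compose up -d):
#   wait_for_app http://localhost:8501 30 2
```

Once the check succeeds, open the URL in a browser and upload a PDF from the sidebar.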
Try the following prompts:
General Queries:
- "What is this document about?"
- "Summarize the key points"
Image-Based Queries:
- "Show the images present in the document"
- "What do the figures show?"
Table Queries:
- "What data is in the tables?"
- "Show me numerical results"
Multimodal Queries:
- "Combine insights from text and images"
- PDF Upload – Automatic processing pipeline
- Multimodal Extraction – Text, images, tables
- Semantic Retrieval – Context-aware answers
- Source References – Transparent outputs
- Modern UI – Clean and intuitive UX
- AI Vision – Image captioning with BLIP
- OCR Support – Text extraction from images
- Object Detection – YOLO-based visual tagging
- Vector Database – Semantic similarity via Qdrant
- LLM Integration – Answer generation with Groq API
# Check if Qdrant is running
docker ps | grep qdrant
# If not running, restart services
docker-compose down
docker-compose up -d

# Check if the API key is correct in your .env file
grep GROQ_API_KEY .env
# The key must start with "gsk_"

# Change port in docker-compose.yml
# Find the line "8501:8501" and update to "8502:8501"
# Then access the app via http://localhost:8502

If you experience issues:
- Check logs:
docker-compose logs -f multimodal-rag
- Full reset (the -v flag also removes volumes):
docker-compose down -v
docker-compose up -d
- Alternative local setup:
pip install -r requirements.txt
docker run -d -p 6333:6333 qdrant/qdrant
streamlit run streamlit_app/Home.py
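
The API key and port checks described above can also be scripted. This is a minimal sketch, not part of the project: `check_groq_key` and `remap_host_port` are hypothetical helper names, and the sketch assumes the key is stored as a `GROQ_API_KEY=...` line in `.env` and that the port mapping appears literally as `8501:8501` in `docker-compose.yml`.

```shell
# check_groq_key: hypothetical helper. Succeeds only if the given .env file
# defines GROQ_API_KEY with a value that starts with "gsk_".
check_groq_key() {
  env_file="${1:-.env}"
  if [ ! -f "$env_file" ]; then
    echo "$env_file not found" >&2
    return 1
  fi
  # Take the last GROQ_API_KEY= line, drop the name, and strip optional quotes.
  key=$(sed -n 's/^GROQ_API_KEY=//p' "$env_file" | tail -n 1 | tr -d "\"'")
  case "$key" in
    gsk_?*) echo "GROQ_API_KEY looks valid"; return 0 ;;
    "")     echo "GROQ_API_KEY is missing in $env_file" >&2; return 1 ;;
    *)      echo "GROQ_API_KEY does not start with gsk_" >&2; return 1 ;;
  esac
}

# remap_host_port: hypothetical helper. Rewrites the host side of the
# "8501:8501" mapping in a compose file, for example to "8502:8501".
remap_host_port() {
  compose_file="${1:-docker-compose.yml}"
  new_port="${2:-8502}"
  tmp_file="${compose_file}.tmp"
  # Write to a temp file and move it back, because sed -i differs
  # between GNU and BSD implementations.
  sed "s/8501:8501/${new_port}:8501/" "$compose_file" > "$tmp_file" \
    && mv "$tmp_file" "$compose_file"
}

# Usage:
#   check_groq_key .env && docker-compose up -d
#   remap_host_port docker-compose.yml 8502   # then open http://localhost:8502
```

The unedited placeholder `your_groq_api_key_here` fails the prefix check, so a forgotten edit of `.env` is caught before the stack starts.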