Say that this is a full-stack multi-modal Graph RAG system that supports:
- text
- PDFs
- images
and combines:
- vector retrieval using Pinecone
- graph reasoning using NetworkX
- local Ollama generation for grounded answers
Show the Mermaid diagram from the README and explain each pipeline stage in one line:
- ingestion API receives files
- modality-specific extraction converts them to retrieval-ready text
- chunks are stored in Pinecone and images are embedded directly with CLIP
- entities and relationships are stored in the knowledge graph
- a user query triggers retrieval, graph expansion, and answer generation
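The ingestion half of this pipeline (extraction output in, chunks and graph out) can be sketched in plain Python. This is a minimal, hypothetical illustration, not the project's code: the chunk size and overlap, the capitalized-word entity heuristic (a crude stand-in for whatever extractor the project actually uses), and the "co-occurs in the same chunk" relation are all assumptions, and a plain `dict` of sets stands in for the NetworkX graph.

```python
import re
from itertools import combinations

# Hypothetical stopwords so common sentence-initial words are not treated as entities.
STOPWORDS = {"The", "A", "An", "It", "This", "They"}


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (sizes are assumptions)."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if not text:
        return []
    step = size - overlap
    # Window starts stop before len(text) - overlap, so the last window still reaches the end.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def extract_entities(chunk: str) -> set[str]:
    """Crude stand-in for an NER/LLM extractor: capitalized tokens minus stopwords."""
    return {tok for tok in re.findall(r"\b[A-Z][A-Za-z0-9]+\b", chunk) if tok not in STOPWORDS}


def build_graph(chunks: list[str]) -> dict[str, set[str]]:
    """Undirected co-occurrence graph: entities found in the same chunk get an edge."""
    graph: dict[str, set[str]] = {}
    for chunk in chunks:
        entities = sorted(extract_entities(chunk))
        for entity in entities:
            graph.setdefault(entity, set())  # keep isolated entities as nodes too
        for a, b in combinations(entities, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


if __name__ == "__main__":
    sample = "Nike Pegasus reviews praise the cushioning. Runners compare Pegasus with Brooks Ghost."
    print(build_graph(chunk_text(sample)))
```

In the real system each chunk would also be embedded and upserted to Pinecone, and the relation extractor would likely produce typed edges rather than bare co-occurrence.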
Upload:
- one .txt file
- one .pdf
- one .jpeg or .png
While doing that, say:
- text files and PDFs are parsed directly
- images are embedded directly with the CLIP visual encoder
- text descriptions are embedded with the same CLIP model, so images and text share one search space
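The routing and the shared search space can be illustrated with a small, self-contained sketch. The extension table and the 4-dimensional vectors are made-up stand-ins (real CLIP vectors are much larger); the point is only that once both modalities are embedded by the same model, an image vector can be compared to text vectors with plain cosine similarity.

```python
import math
from pathlib import Path

# Hypothetical extension-to-modality routing table.
MODALITY_BY_EXT = {".txt": "text", ".pdf": "pdf", ".jpg": "image", ".jpeg": "image", ".png": "image"}


def route(filename: str) -> str:
    """Pick the extraction path for an upload based on its extension."""
    ext = Path(filename).suffix.lower()
    if ext not in MODALITY_BY_EXT:
        raise ValueError(f"unsupported file type: {ext!r}")
    return MODALITY_BY_EXT[ext]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, the usual metric for comparing CLIP embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 4-d vectors standing in for real CLIP outputs (made-up values).
image_vec = [0.9, 0.1, 0.0, 0.2]
text_vecs = {
    "red running shoe with a mesh upper": [0.8, 0.2, 0.1, 0.3],
    "leather office chair": [0.0, 0.1, 0.9, 0.1],
}

# Because both modalities live in one space, the image is compared to text directly.
best_match = max(text_vecs, key=lambda text: cosine(image_vec, text_vecs[text]))
```

With the actual model, `SentenceTransformer("sentence-transformers/clip-ViT-B-32").encode(...)` accepts both PIL images and strings and returns vectors of the same dimensionality, which is what makes this direct comparison valid.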
Then highlight in the UI:
- inventory cards
- file counts
- chunk counts
- entity counts
- graph node and edge totals
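The numbers on the inventory cards are simple aggregates. The sketch below assumes a hypothetical `FileRecord` per uploaded file and an undirected graph stored as a symmetric adjacency dict; with a NetworkX graph the last two totals would come from `G.number_of_nodes()` and `G.number_of_edges()` instead.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Hypothetical per-file inventory entry backing one UI card."""
    file_id: str
    modality: str
    chunk_ids: list[str] = field(default_factory=list)
    entities: set[str] = field(default_factory=set)


def inventory_stats(files: list[FileRecord], graph: dict[str, set[str]]) -> dict[str, int]:
    """Aggregate UI totals; `graph` must be symmetric adjacency with no self-loops."""
    return {
        "files": len(files),
        "chunks": sum(len(f.chunk_ids) for f in files),
        # Unique entities across all files, so a shared entity is counted once.
        "entities": len(set().union(*(f.entities for f in files))),
        "graph_nodes": len(graph),
        # Each undirected edge appears in two adjacency sets, hence the halving.
        "graph_edges": sum(len(neighbors) for neighbors in graph.values()) // 2,
    }
```

Counting unique entities rather than summing per-file counts is a design choice: it keeps the total consistent with the graph, where a shared entity is a single node.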
Ask a cross-modal question such as:
Find me running shoes that look like this uploaded image, and summarize their reviews.
Then point out:
- the answer panel
- citations
- graph insights
- retrieved evidence cards
Mention that the answer is grounded in retrieved context rather than produced by free-form generation.
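The query path (retrieve, expand through the graph, then generate from a grounded prompt) can be sketched as below. This is a hypothetical illustration under stated assumptions: brute-force cosine search over an in-memory list stands in for a Pinecone query, a dict of sets stands in for the NetworkX graph, and the index layout, `k`, `hops`, and prompt wording are all invented for the example.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(query_vec: list[float], index: list[dict], k: int = 2) -> list[dict]:
    """Return the k index entries most similar to the query (brute force)."""
    return sorted(index, key=lambda item: cosine(query_vec, item["vector"]), reverse=True)[:k]


def expand(graph: dict[str, set[str]], seeds: set[str], hops: int = 1) -> set[str]:
    """Collect entities within `hops` steps of the seeds, excluding the seeds themselves."""
    seen = set(seeds)
    frontier = set(seeds)
    for _ in range(hops):
        frontier = {n for node in frontier for n in graph.get(node, set())} - seen
        seen |= frontier
    return seen - set(seeds)


def build_prompt(question: str, hits: list[dict], related: set[str]) -> str:
    """Number each retrieved chunk so the model can cite it as [1], [2], ..."""
    context = "\n".join(f"[{i}] ({hit['source']}) {hit['text']}" for i, hit in enumerate(hits, 1))
    return (
        "Answer using only the numbered context below and cite it like [1]. "
        "If the context is not enough, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Related entities from the graph: {', '.join(sorted(related)) or 'none'}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The numbered context is what lets the UI map citations back to evidence cards, and the instruction to admit missing context is one common way to discourage ungrounded answers.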
Remove one uploaded file from the inventory using the Remove File button.
Then show:
- the card disappears
- the graph metrics update
- the removed file no longer affects later answers
This helps demonstrate lifecycle management, not just ingestion.
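For removal to update the graph metrics correctly, the system needs to know which file contributed each chunk and each edge. The sketch below shows one way to do that with provenance sets; the `ProvenanceStore` class and its methods are hypothetical. On the vector side, the returned chunk IDs could be passed to the Pinecone client's `Index.delete(ids=...)`.

```python
from collections import defaultdict


class ProvenanceStore:
    """Hypothetical store tracking which file produced each chunk and graph edge."""

    def __init__(self) -> None:
        self.chunks: dict[str, dict] = {}  # chunk_id -> {"file_id": ..., "text": ...}
        self.edge_sources: dict[frozenset, set[str]] = defaultdict(set)  # edge -> file_ids

    def add(self, file_id: str, chunk_id: str, text: str, edges: list[tuple[str, str]]) -> None:
        self.chunks[chunk_id] = {"file_id": file_id, "text": text}
        for a, b in edges:
            self.edge_sources[frozenset((a, b))].add(file_id)

    def remove_file(self, file_id: str) -> list[str]:
        """Drop the file's chunks and any edge no other file still supports."""
        doomed = [cid for cid, chunk in self.chunks.items() if chunk["file_id"] == file_id]
        for cid in doomed:
            del self.chunks[cid]
        for edge in list(self.edge_sources):
            self.edge_sources[edge].discard(file_id)
            if not self.edge_sources[edge]:
                del self.edge_sources[edge]
        return doomed  # chunk IDs to delete from the vector index as well

    def graph_totals(self) -> tuple[int, int]:
        """(nodes, edges); only entities that still have at least one edge are counted."""
        nodes = set().union(*self.edge_sources)
        return len(nodes), len(self.edge_sources)
```

Keeping a set of sources per edge means an edge mentioned by two files survives when only one of them is removed, which matches the expectation that the remaining files still shape later answers.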
Briefly explain:
- Pinecone was chosen for serverless cloud vector storage, to save local RAM
- NetworkX was chosen for lightweight graph reasoning without extra operational overhead
- qwen2:0.5b was chosen for lightweight local answer generation
- sentence-transformers/clip-ViT-B-32 was chosen to demonstrate true cross-modal vector embedding
- local Ollama was used to avoid cloud quota issues during the live demo
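Local generation goes through Ollama's REST API, which listens on port 11434 by default and exposes `/api/generate`. The sketch below builds a non-streaming request for `qwen2:0.5b` using only the standard library; the temperature of 0 is an assumption chosen to keep grounded answers repeatable, and `generate` only works against a running Ollama server that has already pulled the model.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if the server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "qwen2:0.5b", temperature: float = 0.0) -> urllib.request.Request:
    """Build a non-streaming /api/generate call (temperature 0 is an assumption)."""
    body = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a token stream
        "options": {"temperature": temperature},
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, **kwargs) -> str:
    """Send the request to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, **kwargs), timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

Because nothing leaves the machine, there is no API key or cloud quota to exhaust mid-demo; the trade-off is that a 0.5B model relies heavily on the retrieved context for answer quality.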
Finish by connecting the design to the literature survey:
- modern agentic systems rely on memory, planning, and tool orchestration
- this project applies that idea through vector retrieval, graph memory, and orchestrated grounded generation