Merge pull request #16 from MIT-Emerging-Talent/rag+slm

AseelOmer · web-flow · commit 0da326762379 · 2025-12-06T21:11:13.000+02:00
Milestone 2: adding rag+slm study
diff --git a/0_domain_study/rag_slm_study.md b/0_domain_study/rag_slm_study.md
@@ -0,0 +1,54 @@
+# RAG + SLM: Efficient Knowledge-Augmented Reasoning
+
+RAG + SLM = A lightweight model that “thinks” efficiently while “reading”
+external, up-to-date knowledge bases.
+
+---
+
+## Overview
+
+A simple chart was created to visualize and clarify how
+**Retrieval-Augmented Generation (RAG)** operates when integrated with
+**Small Language Models (SLMs)**.
+
+![RAG + SLM Concept Chart](https://i.postimg.cc/136czVFw/img77.jpg)
+
+Through this combination, the model is enabled to remain lightweight while
+efficiently accessing external knowledge this is an ideal configuration for setups
+requiring local execution, low latency, and minimal computational cost 'our case'.
+
+---
+
+## Reference
+
+A comprehensive guide provided by Hugging Face was used to understand the
+concept and its implementation:  
+[Make Your Own RAG](https://huggingface.co/blog/ngxson/make-your-own-rag)
+
+---
+
+## Implementation
+
+The Hugging Face example was explored, and the code was adapted for testing on
+Google Colab.  
+The notebook can be accessed here:  
+[Colab Notebook](https://colab.research.google.com/drive/1b3U2QI1NiYe67dCcxuur9vN2Q_HiHAn0#scrollTo=f9wbzle2ENt1)
+
+---
+
+## Key Takeaways
+
+- **RAG (Retrieval-Augmented Generation)** integrates:
+  - a retriever → used for fetching relevant context or knowledge chunks  
+  - a language model → employed for generating grounded and
+    context-aware answers
+
+- **SLMs (Small Language Models)** were shown to perform RAG effectively when:
+  - coupled with high-quality embeddings
+  - guided by well-engineered prompts
+
+- The **embedding model** was found to be crucial for retrieval quality.  
+- **Prompt engineering** was identified as a key factor for improved grounding
+  and coherence.  
+- The use of **GPU acceleration** in Colab was recommended for faster
+  performance.