Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
54 changes: 54 additions & 0 deletions 0_domain_study/rag_slm_study.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
# RAG + SLM: Efficient Knowledge-Augmented Reasoning

RAG + SLM = A lightweight model that “thinks” efficiently while “reading”
external, up-to-date knowledge bases.

---

## Overview

A simple chart was created to visualize and clarify how
**Retrieval-Augmented Generation (RAG)** operates when integrated with
**Small Language Models (SLMs)**.

![RAG + SLM Concept Chart](https://i.postimg.cc/136czVFw/img77.jpg)

Through this combination, the model is enabled to remain lightweight while
efficiently accessing external knowledge this is an ideal configuration for setups
requiring local execution, low latency, and minimal computational cost 'our case'.

---

## Reference

A comprehensive guide provided by Hugging Face was used to understand the
concept and its implementation:
[Make Your Own RAG](https://huggingface.co/blog/ngxson/make-your-own-rag)

---

## Implementation

The Hugging Face example was explored, and the code was adapted for testing on
Google Colab.
The notebook can be accessed here:
[Colab Notebook](https://colab.research.google.com/drive/1b3U2QI1NiYe67dCcxuur9vN2Q_HiHAn0#scrollTo=f9wbzle2ENt1)

---

## Key Takeaways

- **RAG (Retrieval-Augmented Generation)** integrates:
- a retriever → used for fetching relevant context or knowledge chunks
- a language model → employed for generating grounded and
context-aware answers

- **SLMs (Small Language Models)** were shown to perform RAG effectively when:
- coupled with high-quality embeddings
- guided by well-engineered prompts

- The **embedding model** was found to be crucial for retrieval quality.
- **Prompt engineering** was identified as a key factor for improved grounding
and coherence.
- The use of **GPU acceleration** in Colab was recommended for faster
performance.
Loading