# Local RAG Implementation and Evaluation with Mistral 7B

## 1. 🎯 Project Goal

This project implements and evaluates a **Retrieval-Augmented Generation (RAG)** pipeline using **Mistral 7B**, a powerful, open-source Large Language Model (LLM).

The primary goal was to test the feasibility, accuracy, and environmental cost of running an open-source model locally on consumer hardware (a laptop) as a viable alternative to paid, commercial API-based models.

---
## 2. 🤖 About the Model: Mistral 7B

### Why Mistral 7B Was Chosen

Mistral 7B was selected as the core model for this project for several key reasons:

* **Performance vs. Size:** It is known for outperforming much larger models (like Llama 2 13B) on a wide range of benchmarks, offering state-of-the-art performance in a small package.
* **Local Feasibility:** Its 7-billion-parameter size, especially when **quantized** (shrunk) into a format like `.gguf`, is small enough to run effectively on consumer-grade hardware. This was essential for the project's goal of local, laptop-based execution.
* **Open-Source & Privacy:** As a fully open-source model, it can be run 100% offline. This ensures complete data privacy and eliminates API fees.
* **Excellent Community Support:** Mistral 7B is well supported by key RAG frameworks, including LlamaIndex and LangChain, and can be run efficiently using `llama-cpp-python`.
### Development & Key Facts

Mistral 7B was developed by **Mistral AI**, a Paris-based AI startup, and released in September 2023.

* **Core Goal:** Mistral AI's goal was to prove that **efficiency** is a more important frontier than sheer model size. They demonstrated that a smaller, exceptionally well-trained model could outperform larger competitors.
* **Key Innovation (GQA):** It uses **Grouped-Query Attention (GQA)** for faster inference. This allows the model to process information and generate responses much more quickly, and with a smaller memory footprint (the KV cache), than models using standard Multi-Head Attention.
* **Key Innovation (SWA):** It also employs **Sliding Window Attention (SWA)**. This mechanism allows the model to handle very long sequences (a 32k-token context window) without the typical massive computation and memory costs.
* **License:** It was released under the **Apache 2.0 license**, a very permissive open-source license that allows almost unrestricted use and modification by the community.
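The KV-cache saving from GQA can be made concrete with some back-of-envelope arithmetic. The configuration values below (32 layers, 32 query heads, 8 shared K/V heads, head dimension 128) match Mistral 7B's published architecture; the fp16 assumption and the comparison against a hypothetical Multi-Head Attention variant of the same model are illustrative.

```python
# KV-cache size per token: standard MHA keeps one K/V head per query head,
# while Mistral 7B's GQA shares 8 K/V heads across 32 query heads.
N_LAYERS = 32    # transformer layers in Mistral 7B
HEAD_DIM = 128   # dimension of each attention head
BYTES_FP16 = 2   # bytes per value when the cache is stored in fp16

def kv_bytes_per_token(n_kv_heads: int) -> int:
    """Bytes of KV cache per generated token (factor 2 = one K and one V)."""
    return 2 * N_LAYERS * n_kv_heads * HEAD_DIM * BYTES_FP16

mha_bytes = kv_bytes_per_token(32)  # hypothetical MHA: 512 KiB per token
gqa_bytes = kv_bytes_per_token(8)   # Mistral 7B's GQA: 128 KiB per token
```

With 8 instead of 32 K/V heads, the cache is 4x smaller, which is exactly the "smaller memory footprint" advantage described above.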
---
## 3. 🛠️ Methodology: Retrieval-Augmented Generation (RAG)

To answer questions about specific documents (like PDFs or text files) that the model wasn't trained on, we must augment its knowledge. We implemented a RAG pipeline, which works in two main stages:

1. **Retrieval:** When a user asks a question, the pipeline first converts the question into a numerical representation (an embedding). It uses this embedding to search a pre-built index of the source documents and "retrieves" the most relevant chunks of text.
2. **Generation:** The retrieved text chunks (the "context") are then combined with the original question into a single, comprehensive prompt. This prompt is fed to Mistral 7B, which generates a final answer based *only* on the provided context.
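The two stages can be sketched in plain Python. This is a toy illustration, not the project's actual pipeline: the `embed` function here is a crude bag-of-words counter standing in for a real embedding model such as `BAAI/bge-small-en-v1.5`, and the generation stage is represented only by the assembled prompt that would be handed to the LLM.

```python
# Toy RAG: stage 1 retrieves the most similar chunk, stage 2 builds the
# context-grounded prompt that the LLM would answer from.
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Crude stand-in 'embedding': a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Stage 1: rank document chunks by similarity to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]

def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Stage 2 (input side): combine retrieved context and question."""
    context = "\n".join(context_chunks)
    return (
        "Answer the question using ONLY the context below.\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

chunks = [
    "Mistral 7B was released in September 2023 by Mistral AI.",
    "RAG combines retrieval with generation.",
]
question = "When was Mistral 7B released?"
best = retrieve(question, chunks)
prompt = build_prompt(question, best)
```

In the real pipeline, `prompt` would be passed to the quantized Mistral 7B model, which generates the answer from the retrieved context alone.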
### Implementation Details

* **Framework:** `llama-index`
* **Model Loader:** `llama-cpp-python` (to run the GGUF model)
* **Embedding Model:** `BAAI/bge-small-en-v1.5` (to create the vector index of documents)
* **LLM:** `mistral-7b-instruct-v0.2.Q4_K_M.gguf` (a 4-bit quantized version of the model, ideal for balancing performance and resource use)
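Rough memory arithmetic shows why the 4-bit GGUF makes laptop execution feasible. The numbers are approximations: Mistral 7B has about 7.24 billion parameters, and the `Q4_K_M` scheme mixes quantization types that average roughly 4.85 bits per weight rather than exactly 4.

```python
# Approximate weight-storage cost of Mistral 7B at different precisions.
PARAMS = 7.24e9  # approx. parameter count of Mistral 7B

def weight_gib(bits_per_param: float) -> float:
    """Model weight size in GiB at a given average bits-per-parameter."""
    return PARAMS * bits_per_param / 8 / 2**30

fp16_gib = weight_gib(16)    # ~13.5 GiB: beyond many laptops' spare RAM
q4km_gib = weight_gib(4.85)  # ~4.1 GiB: fits comfortably on consumer hardware
```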
---
## 4. 📊 Project Results: Prompts & Responses

Below is a sample of the prompts given to the RAG pipeline and the corresponding answers generated by the local Mistral 7B model.

### Example 1: [Your Question Here]

* **Prompt:**
  > [**PASTE YOUR QUESTION HERE**]

* **Generated Response:**
  > [**PASTE THE MODEL'S RESPONSE HERE**]

* **My Analysis:**
  > [Add your notes here...]

---

### Example 2: [Your Question Here]

* **Prompt:**
  > [**PASTE YOUR QUESTION HERE**]

* **Generated Response:**
  > [**PASTE THE MODEL'S RESPONSE HERE**]

* **My Analysis:**
  > [Add your notes here...]

---

### Example 3: [Your Question Here]

* **Prompt:**
  > [**PASTE YOUR QUESTION HERE**]

* **Generated Response:**
  > [**PASTE THE MODEL'S RESPONSE HERE**]

* **My Analysis:**
  > [Add your notes here...]

---
