# Local RAG Implementation and Evaluation with Mistral 7B

## 1. 🎯 Project Goal

This project implements and evaluates a **Retrieval-Augmented Generation (RAG)** pipeline using **Mistral 7B**, a powerful open-source Large Language Model (LLM).

The primary goal was to test the feasibility, accuracy, and environmental cost of running an open-source model locally on consumer hardware (a laptop) as a viable alternative to paid, commercial API-based models.
| 11 | + |
---

## 2. 🤖 About the Model: Mistral 7B

### Why Mistral 7B Was Chosen

Mistral 7B was selected as the core model for this project for several key reasons:

* **Performance vs. Size:** It is known for outperforming much larger models (like Llama 2 13B) on a wide range of benchmarks, offering state-of-the-art performance in a small package.
* **Local Feasibility:** Its 7-billion-parameter size, especially when **quantized** (shrunk) into a format like `.gguf`, is small enough to run effectively on consumer-grade hardware. This was essential for the project's goal of local, laptop-based execution.
* **Open-Source & Privacy:** As a fully open-source model, it can be run 100% offline. This ensures complete data privacy and eliminates API fees.
* **Excellent Community Support:** Mistral 7B is well supported by key RAG frameworks, including LlamaIndex and LangChain, and can be run efficiently using `llama-cpp-python`.
| 32 | + |
### Development & Key Facts

Mistral 7B was developed by **Mistral AI**, a Paris-based AI startup, and released in September 2023.

* **Core Goal:** Mistral AI set out to prove that **efficiency** was a more important frontier than raw model size. They demonstrated that a smaller, exceptionally well-trained model could outperform larger competitors.
* **Key Innovation (GQA):** It uses **Grouped-Query Attention (GQA)** for faster inference. This allows the model to process information and generate responses much more quickly, and with a smaller memory footprint (KV cache), than models using standard Multi-Head Attention.
* **Key Innovation (SWA):** It also employs **Sliding Window Attention (SWA)**. This mechanism allows the model to handle very long sequences (a 32k token context window) without the typical massive computation and memory costs.
* **License:** It was released under the **Apache 2.0 license**, a very permissive open-source license that allows almost unrestricted use and modification by the community.
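As a rough illustration of why GQA matters for local inference, the KV cache scales with the number of key/value heads. A minimal back-of-the-envelope sketch, assuming Mistral 7B's published dimensions (32 layers, 8 KV heads vs. 32 query heads, head dimension 128) and fp16 cache values; the figures are illustrative, not measured:

```python
# KV-cache size = 2 (K and V) * layers * kv_heads * head_dim * seq_len * bytes.
# Architecture numbers below are Mistral 7B's published dimensions (assumed here).

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_val=2):
    """Approximate KV-cache footprint in bytes for one sequence."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_val

SEQ = 32_000  # roughly the 32k-token context window

# Standard Multi-Head Attention would keep K/V for all 32 heads...
mha = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=SEQ)
# ...while GQA shares K/V across groups, keeping only 8 KV heads.
gqa = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=SEQ)

print(f"MHA cache: {mha / 2**30:.1f} GiB")
print(f"GQA cache: {gqa / 2**30:.1f} GiB")
print(f"Reduction: {mha / gqa:.0f}x")  # 32/8 heads -> 4x smaller cache
```

On a laptop, a 4x smaller cache at long context is often the difference between fitting in RAM and not.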
| 52 | + |
---

## 3. 🛠️ Methodology: Retrieval-Augmented Generation (RAG)

To answer questions about specific documents (like PDFs or text files) that the model wasn't trained on, we must augment its knowledge. We implemented a RAG pipeline, which works in two main stages:

1. **Retrieval:** When a user asks a question, the pipeline first converts the question into a numerical representation (an embedding). It uses this to search a pre-built index of the source documents and "retrieves" the most relevant chunks of text.
2. **Generation:** The retrieved text chunks (the "context") are then combined with the original question into a single, comprehensive prompt. This prompt is fed to Mistral 7B, which generates a final answer based *only* on the provided context.
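The two stages above can be sketched with a toy example. This is *not* the project's actual embedding model (`BAAI/bge-small-en-v1.5`); it substitutes a trivial bag-of-words vector and cosine similarity purely to show the mechanics of "embed the question, rank the chunks, stuff the winners into a prompt":

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / ((na * nb) or 1.0)

def retrieve(question, chunks, top_k=2):
    """Stage 1 (Retrieval): rank document chunks by similarity to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]

chunks = [
    "Mistral 7B was released in September 2023 under Apache 2.0.",
    "The pipeline uses llama-cpp-python to load a GGUF model.",
    "Grouped-Query Attention reduces the KV cache size.",
]
question = "When was Mistral 7B released?"
context = retrieve(question, chunks)

# Stage 2 (Generation): combine the retrieved context and the question into
# a single prompt that would then be fed to the LLM.
prompt = ("Answer using only the context below.\n\n"
          + "\n".join(context)
          + f"\n\nQuestion: {question}\nAnswer:")
```

In the real pipeline the embedding and the LLM call are handled by `llama-index`; the ranking-then-prompting flow is the same.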
| 69 | + |
### Implementation Details

* **Framework:** `llama-index`
* **Model Loader:** `llama-cpp-python` (to run the GGUF model)
* **Embedding Model:** `BAAI/bge-small-en-v1.5` (to create the vector index of documents)
* **LLM:** `mistral-7b-instruct-v0.2.Q4_K_M.gguf` (a 4-bit quantized version of the model, ideal for balancing performance and resource use)
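Wiring these components together in `llama-index` looks roughly like the configuration sketch below. The file paths are placeholders, and the import paths assume the post-0.10 `llama-index` package layout; treat this as configuration wiring under those assumptions, not a verified script:

```python
# Sketch of the pipeline wiring (llama-index >= 0.10 package layout assumed).
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.llama_cpp import LlamaCPP

# Local GGUF model, loaded through llama-cpp-python.
Settings.llm = LlamaCPP(
    model_path="./models/mistral-7b-instruct-v0.2.Q4_K_M.gguf",  # placeholder path
    temperature=0.1,
    context_window=32000,
)

# Embedding model used to build the vector index of the documents.
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Build the index from local source documents and query it.
documents = SimpleDirectoryReader("./data").load_data()  # placeholder folder
index = VectorStoreIndex.from_documents(documents)
response = index.as_query_engine().query("What does the document say about X?")
print(response)
```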
| 77 | + |
---

## 4. 📊 Project Results: Prompts & Responses

Below is a sample of the prompts given to the RAG pipeline and the corresponding answers generated by the local Mistral 7B model.
| 84 | + |
### Example 1: [Your Question Here]

* **Prompt:**
  > [**PASTE YOUR QUESTION HERE**]

* **Generated Response:**
  > [**PASTE THE MODEL'S RESPONSE HERE**]

* **My Analysis:**
  > [Add your notes]
---

### Example 2: [Your Question Here]

* **Prompt:**
  > [**PASTE YOUR QUESTION HERE**]

* **Generated Response:**
  > [**PASTE THE MODEL'S RESPONSE HERE**]

* **My Analysis:**
  > [Add your notes here...]
---

### Example 3: [Your Question Here]

* **Prompt:**
  > [**PASTE YOUR QUESTION HERE**]

* **Generated Response:**
  > [**PASTE THE MODEL'S RESPONSE HERE**]

* **My Analysis:**
  > [Add your notes here...]
---