# Hybrid RAG with Critic–Refiner Workflow (Qwen2.5 + LAmini)

## 1. 🎯 Goal

This project implements a **Retrieval-Augmented Generation (RAG)** pipeline enhanced with a **dual-stage Critic–Refiner architecture**.

The main objective is to build a **highly accurate, context-grounded, and reliable question-answering system**, combining:

- **Qwen2.5-7B-Instruct** (cloud-based Critic)
- **LAmini** (local GGUF model, Refiner)
- **LlamaIndex** (retrieval engine)

The system evaluates each draft answer with the Critic model, detects factual errors or missing context, and then rewrites the draft with the local Refiner model.
This produces answers that are **trustworthy**, **grounded**, and **fully derived from source documents**.

---

## 2. 🤖 About the Models Used

### 2.1 Qwen2.5-7B-Instruct (Critic Model)

Qwen2.5-7B-Instruct is a powerful instruction-tuned LLM developed by Alibaba Cloud.
It was chosen as the **Critic** for these reasons:

- **High factual reliability:** Qwen models consistently score well on truthfulness and instruction-following benchmarks.
- **Well suited to evaluation:** Served through the Hugging Face Inference API, it is fast, stable, and accurate.
- **Strong reasoning capabilities:** Well matched to judging the alignment between retrieved context and generated draft answers.

### 2.2 LAmini (Local Refiner Model)

LAmini is a compact, efficient, open-source model designed for rewriting and stylistic refinement.
It was selected as the **Refiner** because:

- **Small and fast:** Runs comfortably on consumer hardware in `.gguf` format.
- **Strong at rewriting:** Well suited to polishing or correcting drafts based on reviewer feedback.
- **Local privacy:** No online requests; all refinement happens locally.
- **Lightweight:** Fits the project's goal of low-cost, local execution.

### 2.3 Why a Critic–Refiner System?

This architecture ensures that:

- The **Critic** checks for correctness, consistency, and missing facts.
- The **Refiner** applies only the corrections the Critic requests.
- The workflow minimizes hallucinations and keeps answers grounded in the source documents.

This structure is inspired by **self-correcting LLM systems** and **human-in-the-loop editorial workflows**, but runs fully automatically.

---

## 3. 🛠️ Methodology: Retrieval-Augmented Generation (RAG)

To answer questions about documents not included in the LLM’s training data, RAG augments the model’s knowledge with retrieval.

The pipeline works as follows:

1. **Retrieval:**
   User question → convert to an embedding → search the vector index → retrieve the most relevant text chunks.

2. **Draft Generation:**
   The retrieved context and the question are used to generate a **draft answer**.

3. **Critic Evaluation (Qwen2.5):**
   The Critic compares the draft answer against the retrieved context and returns:
   - `[OK]` — Draft is accurate
   - `[REVISE]` — Draft contains errors or missing information
   - plus a bulleted list of required corrections.

4. **Refinement (LAmini):**
   LAmini rewrites the draft based **only on the Critic’s feedback**, producing the final polished answer.

This keeps the final answer accurate and consistent with the source documents.

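The four steps above can be sketched end to end with stubbed components. This is a minimal illustrative sketch, not the project's actual implementation: the helper names and the toy keyword-overlap retrieval are assumptions standing in for the real LlamaIndex, Qwen, and LAmini calls.

```python
# Minimal sketch of the Critic–Refiner loop with stand-in components.

def retrieve(question, corpus):
    """Toy retrieval: return chunks sharing at least one word with the question."""
    words = set(question.lower().split())
    return [c for c in corpus if words & set(c.lower().split())]

def generate_draft(question, context):
    """Stand-in for the draft-generation LLM call."""
    return f"Draft answer to '{question}' based on {len(context)} chunk(s)."

def critique(draft, context):
    """Stand-in for the Qwen2.5 Critic: returns a verdict and feedback bullets."""
    if context:
        return "[OK]", []
    return "[REVISE]", ["No supporting context retrieved."]

def refine(draft, feedback):
    """Stand-in for the LAmini Refiner: applies only the Critic's feedback."""
    return draft + " [revised: " + "; ".join(feedback) + "]"

def answer(question, corpus):
    """Run the full pipeline: retrieve → draft → critique → (maybe) refine."""
    context = retrieve(question, corpus)
    draft = generate_draft(question, context)
    verdict, feedback = critique(draft, context)
    return draft if verdict == "[OK]" else refine(draft, feedback)
```

In the real system each stand-in is replaced by the corresponding component: `retrieve` by a LlamaIndex query engine, `critique` by the Qwen2.5 API call, and `refine` by the local LAmini model.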
### Implementation Details

- **Framework:** `LlamaIndex`
- **Local Model Loader:** `llama-cpp-python`
- **Embedding Model:** `HuggingFaceEmbedding` (e.g., BAAI/bge-small)
- **Critic Model:** `Qwen/Qwen2.5-7B-Instruct` via the Hugging Face Inference API
- **Refiner Model:** `LAmini-Chat` in `.gguf` format
- **Energy Tracking:** CodeCarbon (`OfflineEmissionsTracker`)

---

## 4. 📑 Prompt Engineering: The Editorial Workflow

### 4.1 Critic Prompt

The Critic acts like a strict editor.

It must:

- Judge the draft answer
- Compare it with the source context
- Output `[OK]` or `[REVISE]`
- Provide bullet-point feedback only when necessary

Example behavior:

> [REVISE]
> - The draft added information not found in the source context.
> - Missing key fact about X.

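An illustrative sketch of how such a Critic prompt and its reply could be handled in code. The exact prompt wording and function names here are assumptions; only the `[OK]`/`[REVISE]` contract comes from the workflow described above.

```python
# Hypothetical Critic prompt template and a parser for the Critic's verdict.

CRITIC_PROMPT = """You are a strict editor. Compare the DRAFT against the CONTEXT.
Reply with [OK] if the draft is fully supported by the context.
Reply with [REVISE] followed by bullet points listing every required correction.

CONTEXT:
{context}

DRAFT:
{draft}
"""

def parse_critic_reply(reply: str):
    """Return (needs_revision, feedback_bullets) from a Critic reply."""
    lines = [l.strip() for l in reply.strip().splitlines() if l.strip()]
    needs_revision = lines[0].startswith("[REVISE]")
    feedback = [l.lstrip("- ") for l in lines[1:]] if needs_revision else []
    return needs_revision, feedback
```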
### 4.2 Refiner Prompt (LAmini)

The Refiner receives:

- The draft answer
- The editor's (Critic's) feedback

It rewrites the answer accordingly, following strict rules:

- Fix only the issues the Critic highlighted
- Add no new information
- Produce a complete final answer

This prevents new hallucinations and preserves correctness.

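The rules above can be encoded directly in the Refiner's prompt. This is a hedged sketch: the wording and the helper name are illustrative assumptions, not the project's exact prompt.

```python
# Hypothetical Refiner prompt template encoding the three rules above.

REFINER_PROMPT = """Rewrite the DRAFT so that it addresses every point in FEEDBACK.
Rules:
- Fix only the issues listed in FEEDBACK.
- Do not add any new information.
- Output a complete, final answer.

DRAFT:
{draft}

FEEDBACK:
{feedback}
"""

def build_refiner_prompt(draft: str, feedback: list[str]) -> str:
    """Fill the template with the draft and the Critic's bullet-point feedback."""
    bullets = "\n".join(f"- {f}" for f in feedback)
    return REFINER_PROMPT.format(draft=draft, feedback=bullets)
```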
---

## 5. 📊 Sample Workflow (Prompts & Responses)

You can include your own examples below.

### Example 1: [Your Question Here]

- **Prompt:**
  > **Paste your question here**

- **Draft Answer:**
  > **Paste model output here**

- **Critic Response:**
  > **Paste critic evaluation here**

- **Refined Answer (Final):**
  > **Paste LAmini rewrite here**

- **My Analysis:**
  > [Your notes]

---

### Example 2: [Your Question Here]

(Same structure)

---

### Example 3: [Your Question Here]

(Same structure)

---

## 6. 🌱 Environmental Tracking

We used **CodeCarbon** to measure local compute emissions and energy usage.

This enables:

- Transparency about energy cost
- Comparison with API-based approaches
- Understanding the environmental impact of running on local hardware

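A minimal sketch of wrapping one pipeline run in CodeCarbon's `OfflineEmissionsTracker`. The `country_iso_code` value is a placeholder assumption, and the sketch falls back to a no-op stub when `codecarbon` is not installed so it stays runnable:

```python
# Measure the emissions of a single function call with CodeCarbon.
try:
    from codecarbon import OfflineEmissionsTracker
except ImportError:
    class OfflineEmissionsTracker:  # no-op stand-in when codecarbon is absent
        def __init__(self, **kwargs):
            pass
        def start(self):
            pass
        def stop(self):
            return 0.0  # kg CO2-eq

def run_tracked(fn, *args, country_iso_code="DEU"):
    """Run fn under an emissions tracker; return (result, kg_co2)."""
    tracker = OfflineEmissionsTracker(country_iso_code=country_iso_code)
    tracker.start()
    try:
        result = fn(*args)
    finally:
        emissions = tracker.stop()
    return result, emissions
```

In the project, the tracked function would be the full answer pipeline, so retrieval, drafting, critique, and refinement are all measured together.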
---

## 7. 📚 References

Documentation used:

- Hugging Face Inference API
  <https://huggingface.co/docs/api-inference>

- LlamaIndex Documentation
  <https://docs.llamaindex.ai>

- LAmini Models
  <https://huggingface.co/LinkSoul/LAmini-Chat>

- Qwen2.5 Models
  <https://huggingface.co/Qwen>

- llama.cpp / GGUF Models
  <https://github.com/ggerganov/llama.cpp>

- CodeCarbon
  <https://mlco2.github.io/codecarbon/>

---

| 206 | + |
| 207 | +## 8. ✅ Summary |
| 208 | + |
| 209 | +This project demonstrates a powerful hybrid RAG architecture that blends cloud |
| 210 | + reasoning and local refinement. |
| 211 | +Using a Critic–Refiner pipeline dramatically increases accuracy, reduces |
| 212 | + hallucinations, and ensures answers remain faithful to the source documents. |
| 213 | + |
| 214 | +LAmini provides fast, private, offline rewriting, while Qwen2.5 guarantees |
| 215 | + high-quality factual evaluation. |
| 216 | + |
| 217 | +Together, they form a reliable, cost-efficient, and production-ready RAG system. |