|
| 1 | +# Phase 1 RAG Upgrade Plan: Skeleton → Production-Ready |
| 2 | + |
| 3 | +## Current State (Skeleton) |
| 4 | +- ✅ Document chunking: Converts experiences.json to RAG documents |
| 5 | +- ✅ JSON vector store: Stores documents with metadata |
| 6 | +- ✅ Retrieval pipeline: Returns top-K results |
| 7 | +- ❌ **Synthetic embeddings**: Hash-based, not semantic |
| 8 | +- ❌ **No LLM integration**: No AI-powered rewriting |
| 9 | +- ❌ **No vector DB**: JSON-based, not scalable |
| 10 | +- ❌ **No reranking**: No quality filtering |
| 11 | + |
| 12 | +## Feedback from Code Review |
| 13 | + |
| 14 | +> "It's a good RAG skeleton (index → retrieve → use context), but not a full, production RAG." |
| 15 | + |
| 16 | +### What's Missing for "Real RAG" |
| 17 | + |
| 18 | +1. **Real embeddings** - Swap fake hash/sine vectors for actual ML embeddings |
| 19 | +2. **A vector store** - Use FAISS/Chroma/pgvector instead of JSON |
| 20 | +3. **LLM generation & planning** - Use an LLM to rewrite bullets with evidence constraints |
| 21 | +4. **Reranking** (optional) - Cross-encoder to improve top-K quality |
| 22 | + |
| 23 | +--- |
| 24 | + |
| 25 | +## Upgrade Tasks |
| 26 | + |
| 27 | +### A) Real Embeddings (sentence-transformers) |
| 28 | + |
| 29 | +**Current**: Hash-based embeddings (deterministic but not semantic) |
| 30 | +```python |
| 31 | +def _generate_embedding(self, text: str) -> List[float]: |
| 32 | +    hash_val = sum(ord(c) for c in text) |
| 33 | +    embedding = [] |
| 34 | +    for i in range(384): |
| 35 | +        embedding.append(math.sin((hash_val + i) * 0.1) * 0.5 + 0.5) |
| 36 | +    return embedding |
| 37 | +``` |
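To make the "not semantic" point concrete, here is a minimal standalone sketch of the same hash scheme. The names `hash_embedding` and `cosine` are illustrative and not part of the codebase. Because the hash is only the sum of character codes, any two anagrams map to the same vector.

```python
import math
from typing import List


def hash_embedding(text: str, dim: int = 384) -> List[float]:
    """Standalone copy of the skeleton's hash-based embedding (illustrative name)."""
    hash_val = sum(ord(c) for c in text)
    return [math.sin((hash_val + i) * 0.1) * 0.5 + 0.5 for i in range(dim)]


def cosine(a: List[float], b: List[float]) -> float:
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Anagrams share the same character-code sum, so their vectors are identical
# even though the words are unrelated in meaning.
same_vector = hash_embedding("listen") == hash_embedding("silent")
anagram_score = cosine(hash_embedding("listen"), hash_embedding("silent"))
```

A real embedding model would not score these two words as identical, which is the gap task A is meant to close.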
| 38 | + |
| 39 | +**Target**: Real semantic embeddings |
| 40 | +```python |
| 41 | +from sentence_transformers import SentenceTransformer |
| 42 | + |
| 43 | +class RAGIndexer: |
| 44 | +    def __init__(self): |
| 45 | +        self.embedder = SentenceTransformer("all-MiniLM-L6-v2") |
| 46 | + |
| 47 | +    def _embed(self, texts: List[str]) -> List[List[float]]: |
| 48 | +        return self.embedder.encode(texts, normalize_embeddings=True).tolist() |
| 49 | +``` |
| 50 | + |
| 51 | +**Changes**: |
| 52 | +- [ ] Add `sentence-transformers` dependency |
| 53 | +- [ ] Update `RAGIndexer._embed()` to use SentenceTransformer |
| 54 | +- [ ] Update `Retriever._query_embedding()` to use SentenceTransformer |
| 55 | +- [ ] Re-index all documents with real embeddings |
| 56 | +- [ ] Update tests to validate semantic similarity |
| 57 | + |
| 58 | +**Impact**: Semantic search instead of hash-based similarity that ignores meaning |
| 59 | + |
| 60 | +--- |
| 61 | + |
| 62 | +### B) Vector Database (FAISS) |
| 63 | + |
| 64 | +**Current**: JSON file with linear search |
| 65 | +```python |
| 66 | +# O(n) search through all documents |
| 67 | +for doc in self.documents: |
| 68 | +    score = self._cosine_similarity(query_embedding, doc.embedding) |
| 69 | +``` |
| 70 | + |
| 71 | +**Target**: FAISS for efficient similarity search |
| 72 | +```python |
| 73 | +import faiss |
| 74 | +import numpy as np |
| 75 | +from sentence_transformers import SentenceTransformer |
| 76 | +class Retriever: |
| 77 | +    def __init__(self, vector_store_path): |
| 78 | +        self.embedder = SentenceTransformer("all-MiniLM-L6-v2") |
| 79 | +        self.index = faiss.IndexFlatIP(384)  # Inner product equals cosine only for unit-length vectors |
| 80 | +        self.documents = docs  # docs: records loaded from vector_store_path (loading omitted here) |
| 81 | +        # Build the index from the stored embeddings (normalized at indexing time) |
| 82 | +        embeddings = np.array([d["embedding"] for d in self.documents], dtype="float32") |
| 83 | +        self.index.add(embeddings) |
| 84 | + |
| 85 | +    def retrieve(self, query: str, top_k: int = 10): |
| 86 | +        q_emb = self.embedder.encode([query], normalize_embeddings=True).astype("float32")  # normalize like the index |
| 87 | +        scores, indices = self.index.search(q_emb, top_k) |
| 88 | +        return [(self.documents[i], float(scores[0][j])) |
| 89 | +                for j, i in enumerate(indices[0]) if i != -1]  # FAISS pads missing hits with -1 |
| 90 | +``` |
| 91 | + |
| 92 | +**Changes**: |
| 93 | +- [ ] Add `faiss-cpu` dependency (or `faiss-gpu` for production) |
| 94 | +- [ ] Update `RAGIndexer` to build FAISS index |
| 95 | +- [ ] Update `Retriever` to use FAISS for search |
| 96 | +- [ ] Save/load FAISS index alongside JSON metadata |
| 97 | +- [ ] Update tests for FAISS integration |
| 98 | + |
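| 99 | +**Impact**: Fast exact search in optimized native code. Note that `IndexFlatIP` is still O(n) per query; sub-linear search over millions of documents needs an approximate index such as IVF or HNSW |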
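As a reference for what a flat inner-product index computes, the sketch below does the same thing in plain Python: normalize every vector, score by dot product, and keep the best `top_k`. It is not FAISS code, and `l2_normalize` and `flat_ip_search` are hypothetical names. It can serve as a ground-truth check when testing the FAISS path.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def flat_ip_search(
    query: Sequence[float], vectors: Sequence[Sequence[float]], top_k: int
) -> List[Tuple[int, float]]:
    """Brute-force O(n) search: inner product on unit vectors equals cosine."""
    q = l2_normalize(query)
    scored = [
        (i, sum(a * b for a, b in zip(q, l2_normalize(v))))
        for i, v in enumerate(vectors)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


vectors = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
hits = flat_ip_search([2.0, 0.0], vectors, top_k=2)
```

If either side is left unnormalized, longer vectors win regardless of direction, so the scores stop being cosine similarities.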
| 100 | + |
| 101 | +--- |
| 102 | + |
| 103 | +### C) LLM-Powered Rewriting |
| 104 | + |
| 105 | +**Current**: No evidence-constrained rewriting; retrieved context is only injected into the prompt |
| 106 | +```python |
| 107 | +# tailor.py just injects retrieved context into prompt |
| 108 | +rag_context = retriever.retrieve(requirement) |
| 109 | +# Context passed to LLM but no special handling |
| 110 | +``` |
| 111 | + |
| 112 | +**Target**: Evidence-constrained LLM rewriting |
| 113 | +```python |
| 114 | +from openai import OpenAI |
| 115 | + |
| 116 | +def rewrite_with_evidence(bullet: str, evidence: str, requirement: str) -> str: |
| 117 | +    """Rewrite bullet using retrieved evidence as constraint.""" |
| 118 | +    client = OpenAI() |
| 119 | + |
| 120 | +    prompt = f"""Rewrite this resume bullet to match the job requirement. |
| 121 | +Use ONLY facts from the EVIDENCE. Do not invent metrics or skills. |
| 122 | + |
| 123 | +REQUIREMENT: {requirement} |
| 124 | +ORIGINAL BULLET: {bullet} |
| 125 | +EVIDENCE: {evidence} |
| 126 | + |
| 127 | +Rewrite the bullet to: |
| 128 | +1. Use active voice and strong verbs |
| 129 | +2. Include quantified impact (%, $, time saved) only if the EVIDENCE states it |
| 130 | +3. Highlight relevant skills from the requirement |
| 131 | +4. Keep it under 150 characters |
| 132 | + |
| 133 | +Return ONLY the rewritten bullet, no explanation.""" |
| 134 | + |
| 135 | +    response = client.chat.completions.create( |
| 136 | +        model="gpt-4o-mini", |
| 137 | +        messages=[{"role": "user", "content": prompt}], |
| 138 | +        temperature=0.2,  # Low temperature for consistency |
| 139 | +        max_tokens=100 |
| 140 | +    ) |
| 141 | + |
| 142 | +    return (response.choices[0].message.content or "").strip()  # content may be None |
| 143 | +``` |
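The prompt asks the model not to invent metrics, but a prompt alone cannot guarantee that. One possible post-check, sketched here under the assumption that "metrics" means numeric tokens, flags any number in the rewritten bullet that does not appear in the evidence. `numbers_in` and `unsupported_numbers` are hypothetical helpers, not existing project code.

```python
import re
from typing import Set

# Matches integers and simple decimals or thousands groups, e.g. 30, 2.5, 2,000
_NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def numbers_in(text: str) -> Set[str]:
    """All numeric tokens found in the text."""
    return set(_NUMBER.findall(text))


def unsupported_numbers(rewritten: str, evidence: str) -> Set[str]:
    """Numbers used in the rewritten bullet that the evidence never mentions."""
    return numbers_in(rewritten) - numbers_in(evidence)


evidence = "Reduced API latency by 30% across 12 services"
ok_bullet = "Cut API latency 30% across 12 services"
bad_bullet = "Cut API latency 45% across 12 services, saving $2,000"

ok_issues = unsupported_numbers(ok_bullet, evidence)
bad_issues = unsupported_numbers(bad_bullet, evidence)
```

A non-empty result could trigger a retry or a fallback to the original bullet. The check is string-based, so it would not catch invented skills or reworded figures such as "thirty percent".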
| 144 | + |
| 145 | +**Integration in tailor.py**: |
| 146 | +```python |
| 147 | +def select_and_rewrite_with_rag(experience, keywords, rag_context=None): |
| 148 | +    tailored = [] |
| 149 | +    for job in experience: |
| 150 | +        top_bullets = score_bullets(job["bullets"], keywords)[:3] |
| 151 | + |
| 152 | +        rewritten = [] |
| 153 | +        for bullet in top_bullets: |
| 154 | +            # Get evidence for this bullet |
| 155 | +            evidence = retrieve_evidence_for_bullet(bullet, rag_context) |
| 156 | + |
| 157 | +            # Rewrite with LLM using evidence |
| 158 | +            improved = rewrite_with_evidence( |
| 159 | +                bullet["text"], |
| 160 | +                evidence, |
| 161 | +                keywords[0]  # Primary requirement |
| 162 | +            ) |
| 163 | +            rewritten.append(improved) |
| 164 | + |
| 165 | +        job_data = {**job, "selected_bullets": rewritten} |
| 166 | +        tailored.append(job_data) |
| 167 | + |
| 168 | +    return tailored |
| 169 | +``` |
| 170 | + |
| 171 | +**Changes**: |
| 172 | +- [ ] Create `src/rag/llm_rewriter.py` with `rewrite_with_evidence()` |
| 173 | +- [ ] Update `tailor.py` to use LLM rewriter |
| 174 | +- [ ] Add OpenAI API key configuration |
| 175 | +- [ ] Implement evidence extraction from retrieved context |
| 176 | +- [ ] Add error handling for LLM failures |
| 177 | +- [ ] Update tests with mock LLM responses |
| 178 | + |
| 179 | +**Impact**: AI-powered bullet rewriting with evidence constraints |
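For the "error handling for LLM failures" item, one option is a thin wrapper that retries the rewriter and keeps the original bullet if every attempt fails or returns unusable output. This is a sketch only: `rewrite_or_keep` is a hypothetical name, the rewriter is passed in as a plain callable so it can be mocked in tests, and the 150-character limit mirrors the rule in the prompt.

```python
from typing import Callable

MAX_BULLET_CHARS = 150  # mirrors the "under 150 characters" rule in the prompt


def rewrite_or_keep(
    bullet: str, rewrite_fn: Callable[[str], str], retries: int = 2
) -> str:
    """Try the rewriter up to `retries` times; fall back to the original bullet."""
    for _ in range(retries):
        try:
            candidate = rewrite_fn(bullet).strip()
        except Exception:
            continue  # treat any rewriter error as transient and try again
        if candidate and len(candidate) <= MAX_BULLET_CHARS:
            return candidate
    return bullet  # every attempt failed or produced unusable output
```

In `tailor.py` the callable could be a small closure around `rewrite_with_evidence` that binds the evidence and requirement for the current bullet.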
| 180 | + |
| 181 | +--- |
| 182 | + |
| 183 | +### D) Reranking (Optional but Recommended) |
| 184 | + |
| 185 | +**Current**: Top-K by similarity score only |
| 186 | +```python |
| 187 | +# Just return top-K by cosine similarity |
| 188 | +top_results = sorted_docs[:top_k] |
| 189 | +``` |
| 190 | + |
| 191 | +**Target**: Rerank with cross-encoder for better quality |
| 192 | +```python |
| 193 | +from sentence_transformers import CrossEncoder, SentenceTransformer |
| 194 | + |
| 195 | +class Retriever: |
| 196 | +    def __init__(self, vector_store_path): |
| 197 | +        self.embedder = SentenceTransformer("all-MiniLM-L6-v2") |
| 198 | +        self.reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2") |
| 199 | + |
| 200 | +    def retrieve(self, query: str, top_k: int = 10): |
| 201 | +        # Step 1: Over-fetch 20 candidates with FAISS (see retrieval_top_k_candidates) |
| 202 | +        candidates = self._faiss_search(query, top_k=20) |
| 203 | + |
| 204 | +        # Step 2: Rerank with cross-encoder |
| 205 | +        pairs = [[query, doc.content] for doc, _ in candidates] |
| 206 | +        scores = self.reranker.predict(pairs) |
| 207 | + |
| 208 | +        # Step 3: Return top-K reranked results |
| 209 | +        reranked = sorted( |
| 210 | +            zip(candidates, scores), |
| 211 | +            key=lambda x: x[1], |
| 212 | +            reverse=True |
| 213 | +        )[:top_k] |
| 214 | + |
| 215 | +        return [(doc, float(score)) for (doc, _), score in reranked] |
| 216 | +``` |
| 217 | + |
| 218 | +**Changes**: |
| 219 | +- [ ] Add `sentence-transformers` cross-encoder model |
| 220 | +- [ ] Update `Retriever.retrieve()` to use reranking |
| 221 | +- [ ] Add reranking configuration (top_k_candidates) |
| 222 | +- [ ] Update tests for reranking |
| 223 | + |
| 224 | +**Impact**: Better quality top-K results, more relevant to query |
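The rerank step can be unit-tested without downloading a model if the scorer is injected. In the sketch below, `rerank` is a hypothetical helper and `word_overlap` is a toy stand-in for the role `self.reranker.predict` plays in the plan above. It is not a cross-encoder.

```python
from typing import Callable, List, Sequence, Tuple

PairScorer = Callable[[List[Tuple[str, str]]], Sequence[float]]


def rerank(
    query: str, candidates: Sequence[str], score_pairs: PairScorer, top_k: int
) -> List[Tuple[str, float]]:
    """Score each (query, candidate) pair and return the top_k by score."""
    pairs = [(query, text) for text in candidates]
    scores = score_pairs(pairs)
    ranked = sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)
    return [(text, float(score)) for text, score in ranked[:top_k]]


def word_overlap(pairs: List[Tuple[str, str]]) -> List[float]:
    """Toy scorer for tests: count of shared lowercase words."""
    return [
        float(len(set(q.lower().split()) & set(t.lower().split())))
        for q, t in pairs
    ]


candidates = [
    "Managed a retail team",
    "Built Python data pipelines",
    "Wrote Python scripts",
]
top = rerank("python data engineer", candidates, word_overlap, top_k=2)
```

Keeping the scorer as a parameter also makes it straightforward to switch reranking off behind the `use_reranking` flag.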
| 225 | + |
| 226 | +--- |
| 227 | + |
| 228 | +## Implementation Order |
| 229 | + |
| 230 | +### Phase 1A: Real Embeddings (2-3 hours) |
| 231 | +1. Add `sentence-transformers` dependency |
| 232 | +2. Update `RAGIndexer` to use SentenceTransformer |
| 233 | +3. Update `Retriever` to use SentenceTransformer |
| 234 | +4. Re-index documents |
| 235 | +5. Update tests |
| 236 | + |
| 237 | +### Phase 1B: FAISS Integration (2-3 hours) |
| 238 | +1. Add `faiss-cpu` dependency |
| 239 | +2. Update `RAGIndexer` to build FAISS index |
| 240 | +3. Update `Retriever` to use FAISS search |
| 241 | +4. Update tests |
| 242 | + |
| 243 | +### Phase 1C: LLM Rewriting (3-4 hours) |
| 244 | +1. Create `src/rag/llm_rewriter.py` |
| 245 | +2. Update `tailor.py` to use LLM rewriter |
| 246 | +3. Add OpenAI configuration |
| 247 | +4. Update tests with mock responses |
| 248 | + |
| 249 | +### Phase 1D: Reranking (1-2 hours) |
| 250 | +1. Add cross-encoder model |
| 251 | +2. Update `Retriever.retrieve()` for reranking |
| 252 | +3. Update tests |
| 253 | + |
| 254 | +### Phase 1E: Testing & Documentation (2-3 hours) |
| 255 | +1. Update all tests |
| 256 | +2. Update documentation |
| 257 | +3. Run full test suite |
| 258 | +4. Create upgrade guide |
| 259 | + |
| 260 | +--- |
| 261 | + |
| 262 | +## Dependencies to Add |
| 263 | + |
| 264 | +```bash |
| 265 | +pip install sentence-transformers faiss-cpu openai |
| 266 | +``` |
| 267 | + |
| 268 | +Or in `requirements.txt`: |
| 269 | +``` |
| 270 | +sentence-transformers>=2.2.0 |
| 271 | +faiss-cpu>=1.7.4 # or faiss-gpu for production |
|  | +openai>=1.0.0 # provides the `from openai import OpenAI` client used in section C |
| 272 | +``` |
| 273 | + |
| 274 | +--- |
| 275 | + |
| 276 | +## Configuration Changes |
| 277 | + |
| 278 | +### RAG Config (src/rag/config.py) |
| 279 | +```python |
| 280 | +RAG_CONFIG = { |
| 281 | +    # Embedding |
| 282 | +    'embedding_model': 'all-MiniLM-L6-v2', |
| 283 | +    'embedding_dim': 384, |
| 284 | + |
| 285 | +    # Vector Store |
| 286 | +    'vector_store_type': 'faiss',  # Changed from 'local' |
| 287 | +    'vector_store_path': 'data/rag/vector_store.faiss', |
| 288 | +    'metadata_path': 'data/rag/metadata.json', |
| 289 | + |
| 290 | +    # Retrieval |
| 291 | +    'retrieval_top_k': 10, |
| 292 | +    'retrieval_top_k_candidates': 20,  # For reranking |
| 293 | +    'similarity_threshold': 0.35, |
| 294 | + |
| 295 | +    # Reranking |
| 296 | +    'use_reranking': True, |
| 297 | +    'reranker_model': 'cross-encoder/ms-marco-MiniLM-L-6-v2', |
| 298 | + |
| 299 | +    # LLM Rewriting |
| 300 | +    'use_llm_rewriting': True, |
| 301 | +    'llm_model': 'gpt-4o-mini', |
| 302 | +    'llm_temperature': 0.2, |
| 303 | +} |
| 304 | +``` |
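The plan does not say where `similarity_threshold` is applied. The sketch below assumes it filters cosine scores before the top-K cut; `select_results` is a hypothetical helper, and only the two config keys it reads come from the config above.

```python
from typing import Dict, List, Tuple


def select_results(
    scored: List[Tuple[str, float]], config: Dict[str, object]
) -> List[Tuple[str, float]]:
    """Drop results below the similarity threshold, then keep the best top_k."""
    threshold = float(config["similarity_threshold"])
    top_k = int(config["retrieval_top_k"])
    kept = [(doc_id, score) for doc_id, score in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Small config for illustration; the real RAG_CONFIG uses retrieval_top_k = 10
demo_config = {"similarity_threshold": 0.35, "retrieval_top_k": 2}
demo_scored = [("a", 0.9), ("b", 0.2), ("c", 0.5), ("d", 0.4)]
selected = select_results(demo_scored, demo_config)
```

With this ordering a query can return fewer than `retrieval_top_k` results, or none, so callers should handle an empty list.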
| 305 | + |
| 306 | +--- |
| 307 | + |
| 308 | +## Testing Strategy |
| 309 | + |
| 310 | +### Unit Tests |
| 311 | +- [ ] Test real embeddings produce semantic similarity |
| 312 | +- [ ] Test FAISS index creation and search |
| 313 | +- [ ] Test LLM rewriter with mock responses |
| 314 | +- [ ] Test reranking improves quality |
| 315 | + |
| 316 | +### Integration Tests |
| 317 | +- [ ] End-to-end: Parse JD → Retrieve → Rerank → Rewrite |
| 318 | +- [ ] Compare skeleton vs production RAG quality |
| 319 | +- [ ] Benchmark performance (speed, accuracy) |
| 320 | + |
| 321 | +### Validation |
| 322 | +- [ ] Semantic similarity > 0.7 for relevant documents |
| 323 | +- [ ] Reranking improves top-1 accuracy by > 10% |
| 324 | +- [ ] LLM rewriting produces valid bullets |
| 325 | +- [ ] All 421 existing tests still pass |
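The reranking target can be measured with a small labeled set of queries. This sketch assumes each query has exactly one known relevant document and reads "> 10%" as an absolute gain in percentage points; the plan does not say whether the target is absolute or relative. `top1_accuracy` is a hypothetical helper.

```python
from typing import Dict, List


def top1_accuracy(rankings: Dict[str, List[str]], relevant: Dict[str, str]) -> float:
    """Fraction of queries whose first-ranked document is the labeled one."""
    hits = sum(
        1 for query, ranked in rankings.items()
        if ranked and ranked[0] == relevant[query]
    )
    return hits / len(rankings)


relevant = {"q1": "a", "q2": "b", "q3": "c", "q4": "d"}
baseline = {"q1": ["a", "x"], "q2": ["x", "b"], "q3": ["c"], "q4": ["x", "d"]}
reranked = {"q1": ["a", "x"], "q2": ["b", "x"], "q3": ["c"], "q4": ["x", "d"]}

# Absolute gain in top-1 accuracy from reranking on this toy set
gain = top1_accuracy(reranked, relevant) - top1_accuracy(baseline, relevant)
```

The document IDs and rankings here are made up for illustration; a real evaluation needs a labeled query set drawn from actual job descriptions.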
| 326 | + |
| 327 | +--- |
| 328 | + |
| 329 | +## Success Criteria |
| 330 | + |
| 331 | +- [ ] Real embeddings: Semantic similarity works correctly |
| 332 | +- [ ] FAISS: Search matches brute-force cosine ranking (`IndexFlatIP` is exact, O(n) per query) |
| 333 | +- [ ] LLM Rewriting: Produces evidence-constrained bullets |
| 334 | +- [ ] Reranking: Improves top-1 accuracy by > 10% |
| 335 | +- [ ] All tests pass (421 + new tests) |
| 336 | +- [ ] Performance: Retrieval < 100ms for 1000 documents |
| 337 | +- [ ] Documentation: Updated with production RAG details |
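The latency criterion could be checked with a simple timing harness like the sketch below. `median_latency_ms` is a hypothetical helper, and `search` stands for whatever callable wraps `Retriever.retrieve`. The median is used so that one slow outlier does not dominate the result.

```python
import statistics
import time
from typing import Callable, List


def median_latency_ms(
    search: Callable[[str], object], queries: List[str], warmup: int = 3
) -> float:
    """Median wall-clock time per query in milliseconds, after a short warmup."""
    for query in queries[:warmup]:
        search(query)  # warm caches and lazy model loading; not timed
    timings = []
    for query in queries:
        start = time.perf_counter()
        search(query)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)
```

Note that the measured time includes query embedding as well as the index lookup. The plan leaves open whether the 100 ms budget covers both.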
| 338 | + |
| 339 | +--- |
| 340 | + |
| 341 | +## Rollback Plan |
| 342 | + |
| 343 | +If any component fails: |
| 344 | +1. Keep JSON vector store as fallback |
| 345 | +2. Keep hash-based embeddings as fallback |
| 346 | +3. Keep regex rewriter as fallback |
| 347 | +4. Feature flags to enable/disable each component |
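A minimal sketch of how the flags and fallbacks might fit together, assuming each component has a primary and a fallback implementation with the same call signature. `pick_component` is a hypothetical helper; the flag names come from `RAG_CONFIG`.

```python
from typing import Callable, Dict


def pick_component(
    flags: Dict[str, bool], flag: str, primary: Callable, fallback: Callable
) -> Callable:
    """Use `primary` when the flag is on, with `fallback` as a safety net."""
    if not flags.get(flag, False):
        return fallback  # component disabled: always use the old path

    def guarded(*args, **kwargs):
        try:
            return primary(*args, **kwargs)
        except Exception:
            # Any failure in the new component falls back to the old path
            return fallback(*args, **kwargs)

    return guarded
```

For example, `pick_component(RAG_CONFIG, "use_llm_rewriting", llm_rewrite, regex_rewrite)` would return a rewriter that degrades to the regex path when the LLM call raises, where `llm_rewrite` and `regex_rewrite` are placeholder names for the two implementations.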
| 348 | + |
| 349 | +--- |
| 350 | + |
| 351 | +## Next Steps |
| 352 | + |
| 353 | +1. **Decide**: Do you want to upgrade Phase 1 to production RAG? |
| 354 | +2. **Prioritize**: Which components are most important? |
| 355 | + - A) Real embeddings (critical for semantic search) |
| 356 | + - B) FAISS (critical for scalability) |
| 357 | + - C) LLM rewriting (critical for quality) |
| 358 | + - D) Reranking (nice-to-have for quality) |
| 359 | +3. **Timeline**: How much time do you want to spend? |
| 360 | +4. **Resources**: Do you have OpenAI API access for LLM rewriting? |
| 361 | + |
| 362 | +--- |
| 363 | + |
| 364 | +## Recommendation |
| 365 | + |
| 366 | +**Implement in this order**: |
| 367 | +1. **A + B** (Real embeddings + FAISS) - 4-6 hours |
| 368 | + - Enables semantic search and scalability |
| 369 | + - Foundation for everything else |
| 370 | +2. **C** (LLM rewriting) - 3-4 hours |
| 371 | + - Enables AI-powered bullet improvement |
| 372 | + - Requires OpenAI API |
| 373 | +3. **D** (Reranking) - 1-2 hours |
| 374 | + - Optional but recommended for quality |
| 375 | + - Low effort, high impact |
| 376 | + |
| 377 | +**Total**: 8-12 hours for A-D, or 10-15 hours including Phase 1E (testing and documentation) |
| 378 | + |
| 379 | +This would make Phase 1 a **complete, production-ready RAG system** before moving to Phase 2 (LoRA fine-tuning). |
| 380 | + |