Phase 1 RAG Upgrade Plan: Skeleton → Production-Ready
Current State (Skeleton)
- ✅ Document chunking: Converts experiences.json to RAG documents
- ✅ JSON vector store: Stores documents with metadata
- ✅ Retrieval pipeline: Returns top-K results
- ❌ Synthetic embeddings: Hash-based, not semantic
- ❌ No LLM integration: No AI-powered rewriting
- ❌ No vector DB: JSON-based, not scalable
- ❌ No reranking: No quality filtering
Feedback from Code Review
"It's a good RAG skeleton (index → retrieve → use context), but not a full, production RAG."
What's Missing for "Real RAG"
- Real embeddings - Swap fake hash/sine vectors for actual ML embeddings
- A vector store - Use FAISS/Chroma/pgvector instead of JSON
- LLM generation & planning - Use LLM to rewrite bullets with evidence constraints
- Reranking (optional) - Cross-encoder to improve top-K quality
Upgrade Tasks
A) Real Embeddings (sentence-transformers)
Current: Hash-based embeddings (deterministic but not semantic)
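```python
import math
from typing import List

def _generate_embedding(self, text: str) -> List[float]:
    # Deterministic but not semantic: depends only on the sum of character codes
    hash_val = sum(ord(c) for c in text)
    embedding = []
    for i in range(384):
        embedding.append(math.sin((hash_val + i) * 0.1) * 0.5 + 0.5)
    return embedding
```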
Target: Real semantic embeddings
```python
from typing import List
from sentence_transformers import SentenceTransformer

class RAGIndexer:
    def __init__(self):
        self.embedder = SentenceTransformer("all-MiniLM-L6-v2")

    def _embed(self, texts: List[str]) -> List[List[float]]:
        # normalize_embeddings=True returns unit vectors, so dot product == cosine
        return self.embedder.encode(texts, normalize_embeddings=True).tolist()
```
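Changes:
- Add sentence-transformers dependency
- Update RAGIndexer._embed() to use SentenceTransformer
- Update Retriever._query_embedding() to use SentenceTransformer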
Impact: Semantic search instead of matching on character-code hashes, which capture neither meaning nor keywords
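Why the current embeddings are not semantic can be checked directly. The sketch below re-implements the hash scheme from the Current snippet as a standalone function (the names hash_embedding and cosine are illustrative, not from the codebase). Because the vector depends only on the sum of character codes, any two strings built from the same characters get identical vectors, regardless of meaning.

```python
import math
from typing import List

def hash_embedding(text: str, dim: int = 384) -> List[float]:
    # Same arithmetic as the current _generate_embedding():
    # only the sum of character codes influences the vector.
    hash_val = sum(ord(c) for c in text)
    return [math.sin((hash_val + i) * 0.1) * 0.5 + 0.5 for i in range(dim)]

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Anagrams have the same character sum, so they collide exactly.
print(hash_embedding("listen") == hash_embedding("silent"))  # True
```

A real embedding model has no such collision, since its output depends on the token sequence rather than a single integer summary.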
B) Vector Database (FAISS)
Current: JSON file with linear search
```python
# O(n) search through all documents
for doc in self.documents:
    score = self._cosine_similarity(query_embedding, doc.embedding)
```
Target: FAISS for efficient similarity search
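```python
import json
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

class Retriever:
    def __init__(self, vector_store_path: str):
        self.embedder = SentenceTransformer("all-MiniLM-L6-v2")
        # Assumes the JSON store is a list of document dicts with an "embedding" key
        with open(vector_store_path) as f:
            self.documents = json.load(f)
        embeddings = np.array([d["embedding"] for d in self.documents], dtype="float32")
        # Inner product equals cosine similarity only for L2-normalized vectors
        faiss.normalize_L2(embeddings)
        self.index = faiss.IndexFlatIP(384)  # 384 = all-MiniLM-L6-v2 output size
        self.index.add(embeddings)

    def retrieve(self, query: str, top_k: int = 10):
        q_emb = self.embedder.encode([query], normalize_embeddings=True).astype("float32")
        scores, indices = self.index.search(q_emb, top_k)
        # FAISS pads results with index -1 when fewer than top_k vectors are stored
        return [(self.documents[i], float(scores[0][j]))
                for j, i in enumerate(indices[0]) if i != -1]
```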
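Changes:
- Add faiss-cpu dependency (or faiss-gpu for production)
- Update RAGIndexer to build FAISS index
- Update Retriever to use FAISS for search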
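Impact: Much faster search than the Python loop, although IndexFlatIP is still exact brute force (every vector is scored, O(n) per query) running in optimized native code. For millions of documents, an approximate index such as IndexHNSWFlat or IndexIVFFlat gives sublinear search at some cost in recall.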
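A pure-Python sketch of the exact search that IndexFlatIP performs (this is not FAISS itself, only the same arithmetic without its optimized kernels) makes two points concrete: every stored vector is scored, and inner product behaves as cosine similarity only once both sides are L2-normalized. The function names are illustrative.

```python
import heapq
import math
from typing import List, Sequence, Tuple

def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def flat_ip_search(index: List[List[float]], query: Sequence[float], k: int) -> List[Tuple[int, float]]:
    # Exact search: compute the inner product with every stored vector
    # (O(n * d) per query), then keep the k highest-scoring ids.
    scores = ((i, sum(a * b for a, b in zip(vec, query))) for i, vec in enumerate(index))
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])

docs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
index = [l2_normalize(d) for d in docs]
query = l2_normalize([3.0, 4.0])  # same direction as docs[1]
results = flat_ip_search(index, query, k=2)
print([i for i, _ in results])  # [1, 2]
```

Without normalization, a longer vector pointing in a slightly worse direction could outrank a shorter, better-aligned one, which is why both the indexed embeddings and the query are normalized in the Retriever above.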
C) LLM-Powered Rewriting
Current: No LLM integration, just retrieval
```python
# tailor.py just injects retrieved context into the prompt
rag_context = retriever.retrieve(requirement)
# Context is passed to the LLM but with no special handling
```
Target: Evidence-constrained LLM rewriting
```python
from openai import OpenAI

def rewrite_with_evidence(bullet: str, evidence: str, requirement: str) -> str:
    """Rewrite bullet using retrieved evidence as constraint."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    prompt = f"""Rewrite this resume bullet to match the job requirement.
Use ONLY facts from the EVIDENCE. Do not invent metrics or skills.
REQUIREMENT: {requirement}
ORIGINAL BULLET: {bullet}
EVIDENCE: {evidence}
Rewrite the bullet to:
1. Use active voice and strong verbs
2. Include quantified impact (%, $, counts) ONLY if it appears in the EVIDENCE
3. Highlight relevant skills from the requirement
4. Keep it under 150 characters
Return ONLY the rewritten bullet, no explanation."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2,  # Low temperature for consistency
        max_tokens=100,
    )
    # message.content is Optional in the SDK, so guard before strip()
    return (response.choices[0].message.content or "").strip()
```
Integration in tailor.py:
```python
def select_and_rewrite_with_rag(experience, keywords, rag_context=None):
    # Primary requirement = first keyword; guard against an empty keyword list
    requirement = keywords[0] if keywords else ""
    tailored = []
    for job in experience:
        # score_bullets() is assumed to return bullet dicts, best first
        top_bullets = score_bullets(job["bullets"], keywords)[:3]
        rewritten = []
        for bullet in top_bullets:
            # Get evidence for this bullet
            evidence = retrieve_evidence_for_bullet(bullet, rag_context)
            # Rewrite with LLM using evidence
            improved = rewrite_with_evidence(bullet["text"], evidence, requirement)
            rewritten.append(improved)
        tailored.append({**job, "selected_bullets": rewritten})
    return tailored
```
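Changes:
- Create src/rag/llm_rewriter.py with rewrite_with_evidence()
- Update tailor.py to use LLM rewriter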
Impact: AI-powered bullet rewriting with evidence constraints
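The prompt asks the model not to invent metrics, but nothing enforces it. One option, sketched here as an assumption rather than part of the plan, is a post-hoc check that rejects a rewrite containing numbers found in neither the evidence nor the original bullet, or one that exceeds the 150-character limit. The helpers invented_numbers and accept_rewrite are hypothetical.

```python
import re
from typing import List

# Matches integers and decimals such as 40, 1.5, 1,200
_NUMBER = re.compile(r"\d+(?:[.,]\d+)*")

def invented_numbers(rewritten: str, evidence: str, original: str = "") -> List[str]:
    """Numbers in the rewrite that appear in neither the evidence nor the original."""
    allowed = set(_NUMBER.findall(evidence)) | set(_NUMBER.findall(original))
    return [n for n in _NUMBER.findall(rewritten) if n not in allowed]

def accept_rewrite(rewritten: str, original: str, evidence: str, max_chars: int = 150) -> str:
    # Hypothetical guardrail: keep the original bullet if the LLM
    # introduced an unsupported number or ignored the length limit.
    if invented_numbers(rewritten, evidence, original) or len(rewritten) > max_chars:
        return original
    return rewritten

evidence = "Cut report generation time from 40 to 10 minutes using Airflow."
print(accept_rewrite("Cut report time 75% with Airflow", "Sped up reports", evidence))
# Sped up reports
```

The check is deliberately strict: a derived figure such as 75% (from 40 to 10 minutes) is rejected even though it is arithmetically correct, trading some expressiveness for a guarantee that every number is traceable to the evidence.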
D) Reranking (Optional but Recommended)
Current: Top-K by similarity score only
```python
# Just return top-K by cosine similarity
top_results = sorted_docs[:top_k]
```
Target: Rerank with cross-encoder for better quality
```python
from sentence_transformers import CrossEncoder, SentenceTransformer

class Retriever:
    def __init__(self, vector_store_path):
        self.embedder = SentenceTransformer("all-MiniLM-L6-v2")
        self.reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
        # ... load documents and build the FAISS index as in section B ...

    def retrieve(self, query: str, top_k: int = 10):
        # Step 1: Get 2*top_k candidates with FAISS (_faiss_search is the
        # section B lookup, returning (doc, score) pairs)
        candidates = self._faiss_search(query, top_k=2 * top_k)
        if not candidates:
            return []
        # Step 2: Rerank with cross-encoder (docs assumed to be dicts with "content")
        pairs = [[query, doc["content"]] for doc, _ in candidates]
        scores = self.reranker.predict(pairs)
        # Step 3: Return top-K reranked results
        reranked = sorted(
            zip(candidates, scores),
            key=lambda x: x[1],
            reverse=True,
        )[:top_k]
        return [(doc, float(score)) for (doc, _), score in reranked]
```
Changes:
- Add sentence-transformers cross-encoder model
- Update Retriever.retrieve() to use reranking
Impact: Better quality top-K results, more relevant to query
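The two-stage logic in retrieve() can be tested without downloading any model by making the pair scorer pluggable. In this sketch, rerank and overlap_score are hypothetical names, and overlap_score is a toy token-overlap function standing in for the cross-encoder; it is not how a cross-encoder scores pairs.

```python
from typing import Callable, List, Sequence, Tuple

def rerank(
    query: str,
    candidates: Sequence[Tuple[str, float]],
    score_pair: Callable[[str, str], float],
    top_k: int,
) -> List[Tuple[str, float]]:
    # Re-score every (query, candidate) pair and keep the top_k;
    # first-stage similarity scores are discarded, as in retrieve().
    rescored = [(doc, score_pair(query, doc)) for doc, _ in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:top_k]

def overlap_score(query: str, doc: str) -> float:
    # Toy stand-in for a cross-encoder: fraction of query tokens present in doc
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q) if q else 0.0

candidates = [
    ("Led a team of five engineers", 0.82),
    ("Built Kubernetes deployment pipelines", 0.80),
    ("Migrated services to Kubernetes on AWS", 0.79),
]
print(rerank("kubernetes aws", candidates, overlap_score, top_k=2))
# [('Migrated services to Kubernetes on AWS', 1.0), ('Built Kubernetes deployment pipelines', 0.5)]
```

Note that the candidate ranked first by the first stage drops out entirely after reranking. In production, CrossEncoder.predict scores the whole list of pairs in one batched call, which is why the Retriever above builds `pairs` up front instead of calling the model once per candidate.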
Implementation Order
Phase 1A: Real Embeddings (2-3 hours)
- Add sentence-transformers dependency
- Update RAGIndexer to use SentenceTransformer
- Update Retriever to use SentenceTransformer
- Re-index documents
- Update tests
Phase 1B: FAISS Integration (2-3 hours)
- Add faiss-cpu dependency
- Update RAGIndexer to build FAISS index
- Update Retriever to use FAISS search
- Update tests
Phase 1C: LLM Rewriting (3-4 hours)
- Create src/rag/llm_rewriter.py
- Update tailor.py to use LLM rewriter
- Add OpenAI configuration
- Update tests with mock responses
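For "Update tests with mock responses", one approach that avoids network calls is to pass the client in and substitute a fake that mimics the response shape the code reads (`response.choices[0].message.content`). The sketch below uses only the standard library; rewrite_with_client is a hypothetical variant of rewrite_with_evidence() that accepts the client as a parameter, and its prompt is abbreviated.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock

def rewrite_with_client(client, bullet: str, evidence: str, requirement: str,
                        model: str = "gpt-4o-mini") -> str:
    # Hypothetical testable variant: the client is injected instead of created
    prompt = f"REQUIREMENT: {requirement}\nORIGINAL BULLET: {bullet}\nEVIDENCE: {evidence}"
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2,
        max_tokens=100,
    )
    return (response.choices[0].message.content or "").strip()

def make_fake_client(reply):
    # MagicMock auto-creates client.chat.completions.create; we only fix its return value
    client = MagicMock()
    message = SimpleNamespace(content=reply)
    client.chat.completions.create.return_value = SimpleNamespace(
        choices=[SimpleNamespace(message=message)]
    )
    return client

fake = make_fake_client("  Built ETL pipeline in Python  ")
print(rewrite_with_client(fake, "Made pipeline", "Built ETL pipeline in Python", "Python"))
# Built ETL pipeline in Python
```

The recorded `call_args` on the fake also lets a test assert on the model name and on whether the evidence actually reached the prompt.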
Phase 1D: Reranking (1-2 hours)
- Add cross-encoder model
- Update Retriever.retrieve() for reranking
- Update tests
Phase 1E: Testing & Documentation (2-3 hours)
- Update all tests
- Update documentation
- Run full test suite
- Create upgrade guide
Dependencies to Add
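```shell
pip install sentence-transformers faiss-cpu openai
```
Or in requirements.txt:
```
sentence-transformers>=2.2.0
faiss-cpu>=1.7.4  # or faiss-gpu for production
openai>=1.0.0  # the OpenAI() client used for LLM rewriting requires the 1.x SDK
```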
Configuration Changes
RAG Config (src/rag/config.py)
```python
RAG_CONFIG = {
    # Embedding
    'embedding_model': 'all-MiniLM-L6-v2',
    'embedding_dim': 384,
    # Vector Store
    'vector_store_type': 'faiss',  # Changed from 'local'
    'vector_store_path': 'data/rag/vector_store.faiss',
    'metadata_path': 'data/rag/metadata.json',
    # Retrieval
    'retrieval_top_k': 10,
    'retrieval_top_k_candidates': 20,  # For reranking
    'similarity_threshold': 0.35,
    # Reranking
    'use_reranking': True,
    'reranker_model': 'cross-encoder/ms-marco-MiniLM-L-6-v2',
    # LLM Rewriting
    'use_llm_rewriting': True,
    'llm_model': 'gpt-4o-mini',
    'llm_temperature': 0.2,
}
```
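Several of these values must agree with each other, and a mismatch (for example an index dimension that differs from the model's output size) tends to fail late and obscurely. A small startup check could catch that early; validate_rag_config is a hypothetical helper, and the rules below are assumptions drawn from the values above.

```python
from typing import Any, Dict, List

def validate_rag_config(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the config looks consistent."""
    problems = []
    # all-MiniLM-L6-v2 outputs 384-dim vectors; the FAISS index must match
    if cfg.get("embedding_model") == "all-MiniLM-L6-v2" and cfg.get("embedding_dim") != 384:
        problems.append("embedding_dim must be 384 for all-MiniLM-L6-v2")
    # Reranking needs at least as many candidates as final results
    if cfg.get("use_reranking") and cfg.get("retrieval_top_k_candidates", 0) < cfg.get("retrieval_top_k", 0):
        problems.append("retrieval_top_k_candidates must be >= retrieval_top_k")
    # Cosine similarity between normalized vectors lies in [-1, 1]
    if not -1.0 <= cfg.get("similarity_threshold", 0.0) <= 1.0:
        problems.append("similarity_threshold must be in [-1, 1]")
    # OpenAI chat completions accept temperatures from 0 to 2
    if not 0.0 <= cfg.get("llm_temperature", 0.0) <= 2.0:
        problems.append("llm_temperature must be in [0, 2]")
    return problems

example = {
    "embedding_model": "all-MiniLM-L6-v2", "embedding_dim": 384,
    "retrieval_top_k": 10, "retrieval_top_k_candidates": 20,
    "similarity_threshold": 0.35, "use_reranking": True, "llm_temperature": 0.2,
}
print(validate_rag_config(example))  # []
```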
Testing Strategy
Unit Tests
Integration Tests
Validation
Success Criteria
Rollback Plan
If any component fails:
- Keep JSON vector store as fallback
- Keep hash-based embeddings as fallback
- Keep regex rewriter as fallback
- Feature flags to enable/disable each component
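The fallback and feature-flag items above combine naturally: the flag decides whether the LLM is tried at all, and any runtime failure drops back to the regex rewriter. A minimal sketch, assuming llm_rewrite and regex_rewrite are hypothetical callables standing in for rewrite_with_evidence() (with its other arguments bound) and the existing regex rewriter:

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger(__name__)

def rewrite_with_fallback(
    bullet: str,
    config: Dict[str, Any],
    llm_rewrite: Callable[[str], str],
    regex_rewrite: Callable[[str], str],
) -> str:
    # Feature flag off: never call the LLM
    if not config.get("use_llm_rewriting", False):
        return regex_rewrite(bullet)
    try:
        return llm_rewrite(bullet)
    except Exception:  # e.g. network errors, rate limits, missing API key
        logger.warning("LLM rewrite failed; using regex rewriter", exc_info=True)
        return regex_rewrite(bullet)

def flaky_llm(bullet: str) -> str:
    raise RuntimeError("API unavailable")

print(rewrite_with_fallback("did stuff", {"use_llm_rewriting": True}, flaky_llm, str.capitalize))
# Did stuff
```

The same pattern applies to the other components, for example falling back to the JSON store when the FAISS index file is missing.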
Next Steps
- Decide: Do you want to upgrade Phase 1 to production RAG?
- Prioritize: Which components are most important?
  - A) Real embeddings (critical for semantic search)
  - B) FAISS (critical for scalability)
  - C) LLM rewriting (critical for quality)
  - D) Reranking (nice-to-have for quality)
- Timeline: How much time do you want to spend?
- Resources: Do you have OpenAI API access for LLM rewriting?
Recommendation
Implement in this order:
- A + B (Real embeddings + FAISS) - 4-6 hours
  - Enables semantic search and scalability
  - Foundation for everything else
- C (LLM rewriting) - 3-4 hours
  - Enables AI-powered bullet improvement
  - Requires OpenAI API
- D (Reranking) - 1-2 hours
  - Optional but recommended for quality
  - Low effort, high impact
Total: 8-12 hours to production-ready RAG for A through D, plus 2-3 hours for Phase 1E (testing and documentation)
This would make Phase 1 a complete, production-ready RAG system before moving to Phase 2 (LoRA fine-tuning).