Add RAG example using mlx-lm hidden state embeddings #1130

Open
ManjushaMotamarry wants to merge 1 commit into ml-explore:main from ManjushaMotamarry:add-rag-example

@ManjushaMotamarry

Summary

Adds a minimal Retrieval-Augmented Generation (RAG) example to mlx_lm/examples/.

What this example demonstrates

  • How to generate sentence embeddings from mlx-lm hidden states (mean pooling over the token positions of the last transformer layer's output)
  • How to compute cosine similarity between a question and a set of documents
  • How to retrieve the most relevant document and inject it into the prompt as context
  • How to generate a grounded answer using mlx-lm
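The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in rag.py: the names `cosine_similarity` and `retrieve` come from the test description in this PR, but their signatures here are assumptions, and `mean_pool` and `build_prompt` are hypothetical helpers. The hidden states are toy numpy arrays standing in for what the mlx-lm model would produce, and the final generation step is omitted.

```python
import numpy as np


def mean_pool(hidden_states: np.ndarray) -> np.ndarray:
    """Average last-layer hidden states over the token axis.

    hidden_states has shape (num_tokens, hidden_dim); the result has
    shape (hidden_dim,).
    """
    return hidden_states.mean(axis=0)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0


def retrieve(query_emb, doc_embs, docs):
    """Return the document whose embedding is most similar to the query."""
    scores = [cosine_similarity(query_emb, d) for d in doc_embs]
    return docs[int(np.argmax(scores))]


def build_prompt(context: str, question: str) -> str:
    """Inject the retrieved document into the prompt as context."""
    return f"Context: {context}\n\nQuestion: {question}\nAnswer:"


# Toy stand-ins for last-layer hidden states, shape (tokens, hidden_dim).
# In the real example these would come from the mlx-lm model.
docs = [
    "MLX is an array framework for Apple silicon.",
    "Paris is the capital of France.",
]
doc_hidden = [
    np.array([[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]),
    np.array([[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]),
]
question = "What is the capital of France?"
question_hidden = np.array([[0.1, 1.0, 0.0], [0.0, 0.8, 0.2]])

doc_embs = [mean_pool(h) for h in doc_hidden]
best = retrieve(mean_pool(question_hidden), doc_embs, docs)
prompt = build_prompt(best, question)
print(prompt)
```

The resulting `prompt` string is what would then be passed to the model for generation.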

Why no external vector DB

The example intentionally avoids an external vector database: it uses numpy for cosine similarity and mlx-lm itself for embeddings, so everything stays within the MLX ecosystem apart from numpy.
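To illustrate why a vector database is not needed at this scale, a brute-force search over an in-memory matrix of embeddings can be a single matrix-vector product. The sketch below is an assumption about how that could look, not the code in this PR; `top_k` is a hypothetical helper, and it assumes every embedding is non-zero.

```python
import numpy as np


def top_k(query_emb: np.ndarray, doc_matrix: np.ndarray, k: int = 1) -> list[int]:
    """Indices of the k rows of doc_matrix most similar to query_emb.

    Assumes the query and every row have non-zero norm.
    """
    # Normalize rows and the query so a dot product equals cosine similarity.
    docs_n = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    query_n = query_emb / np.linalg.norm(query_emb)
    scores = docs_n @ query_n  # shape: (num_docs,)
    # Sort by descending score and keep the first k indices.
    return np.argsort(-scores)[:k].tolist()


# Three toy document embeddings, one per row.
doc_matrix = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])
ranked = top_k(np.array([0.1, 1.0]), doc_matrix, k=2)
print(ranked)
```

This scans every document on each query, which is fine for a small example corpus but would not scale to large collections.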

Files changed

  • mlx_lm/examples/rag.py — the RAG example
  • tests/test_rag.py — unit tests for retrieve() and cosine_similarity()
  • README.md — added link to the new example

Tests

All 5 tests pass:

  • test_identical_vectors
  • test_orthogonal_vectors
  • test_opposite_vectors
  • test_retrieves_most_similar_document
  • test_single_document_always_returned
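For readers who want a sense of what these tests check, here is a hedged sketch of how they might look. The test names match the list above, but the bodies are assumptions, and the `cosine_similarity` and `retrieve` definitions are hypothetical stand-ins for the real functions in mlx_lm/examples/rag.py, whose signatures may differ.

```python
import numpy as np


# Hypothetical stand-ins for the functions under test; the real ones
# live in mlx_lm/examples/rag.py and may have different signatures.
def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def retrieve(query_emb, doc_embs, docs):
    scores = [cosine_similarity(query_emb, d) for d in doc_embs]
    return docs[int(np.argmax(scores))]


def test_identical_vectors():
    # A vector compared with itself has similarity 1.
    v = np.array([1.0, 2.0, 3.0])
    assert np.isclose(cosine_similarity(v, v), 1.0)


def test_orthogonal_vectors():
    # Perpendicular vectors have similarity 0.
    a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    assert np.isclose(cosine_similarity(a, b), 0.0)


def test_opposite_vectors():
    # A vector compared with its negation has similarity -1.
    v = np.array([1.0, 2.0, 3.0])
    assert np.isclose(cosine_similarity(v, -v), -1.0)


def test_retrieves_most_similar_document():
    doc_embs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
    assert retrieve(np.array([0.1, 0.9]), doc_embs, ["a", "b"]) == "b"


def test_single_document_always_returned():
    # With one candidate, it is returned even when similarity is low.
    assert retrieve(np.array([1.0, 0.0]), [np.array([0.0, 1.0])], ["only"]) == "only"
```

Because the tests operate on plain numpy vectors, they can run without loading a model.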
