Add RAG example using mlx-lm hidden state embeddings by ManjushaMotamarry · Pull Request #1130 · ml-explore/mlx-lm

ManjushaMotamarry · 2026-04-08T17:11:28Z

Summary

Adds a minimal Retrieval-Augmented Generation (RAG) example to mlx_lm/examples/. The example shows:

How to generate sentence embeddings from mlx-lm hidden states (mean pooling over the token vectors of the last transformer layer)
How to compute cosine similarity between a question and a set of documents
How to retrieve the most relevant document and inject it into the prompt as context
How to generate a grounded answer using mlx-lm
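For reviewers who want the retrieval steps above in one place, here is a rough sketch of how they could fit together. It is not the code from this PR: the function names (`mean_pool`, `cosine_similarity`, `build_prompt`), the prompt template, and the toy hidden-state arrays are all assumptions made for illustration. The toy arrays stand in for the last-layer hidden states that the real example takes from an mlx-lm model.

```python
import numpy as np


def mean_pool(hidden_states: np.ndarray) -> np.ndarray:
    """Average per-token vectors of shape (tokens, dim) into one (dim,) embedding."""
    return hidden_states.mean(axis=0)


def cosine_similarity(query: np.ndarray, docs: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and each row of docs."""
    query = query / np.linalg.norm(query)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    return docs @ query


def build_prompt(question: str, context: str) -> str:
    """Hypothetical prompt template that injects the retrieved context."""
    return f"Context: {context}\n\nQuestion: {question}\nAnswer:"


# Toy stand-ins for last-layer hidden states, one (tokens, dim) array per text.
# Assumption: in the real example these come from the mlx-lm model.
documents = ["MLX runs on Apple silicon.", "Paris is the capital of France."]
doc_hidden = [
    np.array([[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]),
    np.array([[0.0, 1.0, 0.0], [0.0, 0.9, 0.1], [0.1, 0.8, 0.0]]),
]
question = "What is the capital of France?"
question_hidden = np.array([[0.1, 1.0, 0.0], [0.0, 0.7, 0.1]])

# Embed, score, retrieve the best match, and build the grounded prompt.
doc_embeddings = np.stack([mean_pool(h) for h in doc_hidden])
scores = cosine_similarity(mean_pool(question_hidden), doc_embeddings)
best = int(np.argmax(scores))
prompt = build_prompt(question, documents[best])
print(prompt)
```

The resulting `prompt` string is what would then be handed to mlx-lm for generation. Normalizing both sides before the dot product keeps the score independent of vector length, which matters because mean-pooled embeddings from texts of different lengths can differ in norm.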

Intentionally dependency-light: it uses numpy for cosine similarity and mlx-lm itself for embeddings, keeping everything within the MLX ecosystem.

All 5 tests pass.


Add RAG example using mlx-lm hidden state embeddings and cosine similarity

9ca7e48