Encrypted vector search for LangChain using Envector, powered by homomorphic encryption (CKKS). This repo ships a LangChain-compatible VectorStore and retriever utilities built on the high-level pyenvector Python SDK.
- LangChain `VectorStore` interface with `similarity_search`, `from_texts`, etc.
- Optional `VectorStoreRetriever` helper for quick RAG integrations.
- Client-side encryption handled transparently by the SDK, including score thresholds and filtering.
- Python 3.9–3.13 (recommend 3.11)
- Create and activate a virtualenv:
python3.11 -m venv .venv && source .venv/bin/activate
- Install runtime dependencies:
pip install -U pip setuptools wheel
pip install pyenvector langchain sentence-transformers
- Configure Envector using `EnvectorConfig`, pointing to your EnVector endpoint and keys.
- Initialize embeddings (or provide pre-computed vectors).
- Instantiate `Envector(config=cfg, embeddings=emb)` and call `add_texts`, `add_documents`, or use `as_retriever`.
- Run `similarity_search` or plug the retriever into your LangChain pipeline.

See `notebooks/` for end-to-end walkthroughs and the `libs/envector` package for implementation details.
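
The `emb` object used in the steps above is not defined in this README. For a quick smoke test without downloading a model, a deterministic stand-in could be sketched as follows. `HashEmbeddings` is a hypothetical helper, not part of pyenvector or LangChain; it only mimics the `embed_documents` / `embed_query` method pair that LangChain embedding classes expose, and its vectors carry no semantic meaning. Whether the store accepts a duck-typed object instead of a real `Embeddings` subclass is an assumption.

```python
import hashlib
import math
from typing import List


class HashEmbeddings:
    """Hypothetical stand-in: deterministic, unit-length pseudo-embeddings."""

    def __init__(self, dim: int = 32) -> None:
        self.dim = dim

    def _embed(self, text: str) -> List[float]:
        values: List[float] = []
        counter = 0
        # Stretch SHA-256 output until there are `dim` values in [-1, 1].
        while len(values) < self.dim:
            digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
            values.extend(b / 127.5 - 1.0 for b in digest)
            counter += 1
        values = values[: self.dim]
        # Normalize to unit length so inner-product scores stay bounded.
        norm = math.sqrt(sum(v * v for v in values)) or 1.0
        return [v / norm for v in values]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> List[float]:
        return self._embed(text)


emb = HashEmbeddings(dim=32)
vector_dim = emb.dim  # the value to pass as the index dimension
```

For real retrieval quality, swap this for an actual embedding model; the stand-in is only useful for checking that the plumbing works.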
Key dataclasses live in libs/envector/config.py:
- `ConnectionConfig`: address or host/port for EnVector.
- `KeyConfig`: key path, key ID, optional preset/eval mode.
- `IndexSettings`: index name, dimension (32–4096), query encryption mode, optional output fields and fetch parameters.
- `EnvectorConfig`: wraps the above and enables auto-creation via `create_if_missing`.
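
To make the documented dimension range concrete, here is a minimal sketch of how such a settings dataclass might validate its inputs. `IndexSettingsSketch` is a hypothetical illustration, not the class in `libs/envector/config.py`; whether the real `IndexSettings` validates eagerly like this is an assumption.

```python
from dataclasses import dataclass


@dataclass
class IndexSettingsSketch:
    """Hypothetical mirror of IndexSettings, limited to fields shown in this README."""

    index_name: str
    dim: int
    query_encryption: str = "cipher"

    def __post_init__(self) -> None:
        # Reject obviously unusable settings before any network call is made.
        if not self.index_name:
            raise ValueError("index_name must be non-empty")
        if not 32 <= self.dim <= 4096:
            raise ValueError(f"dim must be in 32..4096, got {self.dim}")
```

Failing fast at construction time keeps a bad dimension from surfacing later as an opaque server-side error.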
- Each vector stores a single `metadata` string in EnVector.
- To align with LangChain’s `Document`, inserts wrap data as JSON: `{"text": ..., "metadata": ...}`.
- Retrieval unwraps JSON, returning `Document(page_content=text, metadata={...})`.
- Client-side filtering requires the JSON envelope to include an object under `metadata`.
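
The envelope described above can be sketched with the standard library alone. The function names `wrap_envelope` and `unwrap_envelope` are hypothetical, and the fallback for non-envelope strings (return the raw string with empty metadata) is an assumption about reasonable behavior, not a description of the package's internals.

```python
import json
from typing import Any, Dict, Optional, Tuple


def wrap_envelope(text: str, metadata: Optional[Dict[str, Any]] = None) -> str:
    """Serialize text and metadata into the single string stored per vector."""
    return json.dumps({"text": text, "metadata": metadata or {}}, ensure_ascii=False)


def unwrap_envelope(raw: str) -> Tuple[str, Dict[str, Any]]:
    """Recover (text, metadata); fall back to the raw string if it is not an envelope."""
    try:
        payload = json.loads(raw)
    except (TypeError, ValueError):
        return raw, {}
    if not isinstance(payload, dict) or "text" not in payload:
        return raw, {}
    metadata = payload.get("metadata")
    # Only a JSON object under "metadata" is usable for structured filters.
    return str(payload["text"]), metadata if isinstance(metadata, dict) else {}
```

The returned pair maps directly onto `Document(page_content=text, metadata=metadata)`.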
- Item-level delete/update is unsupported (drop the index to reset).
- Manual item IDs are not accepted; returned IDs from `add_texts` are ephemeral.
- Filtering happens client-side; ensure metadata is JSON for structured filters.
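
Since filtering happens client-side, a structured filter amounts to an equality check over the metadata of already-retrieved hits. The sketch below illustrates the idea; `filter_hits` and the `(page_content, metadata)` tuple shape are hypothetical and do not reflect the store's actual filter API.

```python
from typing import Any, Dict, Iterable, List, Tuple

Hit = Tuple[str, Dict[str, Any]]  # (page_content, metadata)


def filter_hits(hits: Iterable[Hit], where: Dict[str, Any]) -> List[Hit]:
    """Keep hits whose metadata equals `where` on every listed key."""
    return [
        (text, meta)
        for text, meta in hits
        if all(key in meta and meta[key] == value for key, value in where.items())
    ]


hits = [
    ("chunk-1", {"source": "paper.pdf", "page": 1, "chunk": 0}),
    ("chunk-2", {"source": "paper.pdf", "page": 2, "chunk": 1}),
    ("raw string hit", {}),  # no structured metadata, so no non-empty filter matches it
]
page_one = filter_hits(hits, {"source": "paper.pdf", "page": 1})
```

Because the filter runs after retrieval, it can only shrink a result set; requesting a larger `k` before filtering is one way to compensate.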
from langchain_envector.config import ConnectionConfig, EnvectorConfig, IndexSettings, KeyConfig

cfg = EnvectorConfig(
    connection=ConnectionConfig(
        address=ENVECTOR_ADDRESS,
        access_token=ENVECTOR_ACCESS_TOKEN
    ),
    key=KeyConfig(
        key_path=ENVECTOR_KEY_PATH,
        key_id=ENVECTOR_KEY_ID,
        preset="ip",
        eval_mode="rmp"
    ),
    index=IndexSettings(
        index_name=INDEX_NAME,
        dim=vector_dim,
        query_encryption="cipher"
    ),
    create_if_missing=True,
)

from langchain_core.documents import Document
from langchain_envector.vectorstore import Envector

docs = [
    Document(
        page_content="chunk-1",
        metadata={"source": "paper.pdf", "page": 1, "chunk": 0}
    ),
    Document(
        page_content="chunk-2",
        metadata={"source": "paper.pdf", "page": 1, "chunk": 1}
    ),
]
store = Envector(config=cfg, embeddings=emb)
store.add_documents(docs)

Alternatively, use `add_texts` to store texts together with their vectors:

store.add_texts(
    texts=["chunk 3"],
    metadatas=[{"source": "paper.pdf", "page": 1, "chunk": 2}]
)

results = store.similarity_search(query, k=1)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

results = store.similarity_search_with_score(query, k=1)
for doc, score in results:
    print(f"* [SIM={score:.3f}] {doc.page_content} [{doc.metadata}]")

# Search with a pre-computed query vector; this call returns documents without scores.
query_embedding = emb.embed_query(query)
print(f"Query: {query_embedding[:3]}")
results = store.similarity_search_by_vector(query_embedding, k=3)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

- Connection issues: verify EnVector address and registered keys.
- Embeddings mismatch: ensure embedding dimension equals `index.dim` when supplying vectors.
- Unexpected raw strings: confirm inserts used the JSON envelope.
- Key issues: check that the local key's metadata matches the registered key.
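
The embeddings-mismatch case can be caught before anything is sent to the server. Here is a minimal pre-flight check, assuming vectors are plain Python sequences; `check_dimensions` is a hypothetical helper, not part of the package.

```python
from typing import Sequence


def check_dimensions(vectors: Sequence[Sequence[float]], expected_dim: int) -> None:
    """Raise ValueError listing every vector whose length differs from expected_dim."""
    bad = [(i, len(v)) for i, v in enumerate(vectors) if len(v) != expected_dim]
    if bad:
        details = ", ".join(f"#{i} has {n}" for i, n in bad)
        raise ValueError(f"expected dimension {expected_dim}, but {details}")
```

Calling it with the index dimension right after computing embeddings points at the offending inputs by position.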
Before running tests, install dependencies for pytest:
pip install -r tests/requirements.txt

Run unit tests offline (no EnVector or SDK required):
python -m pytest -q -m "not integration"
# or
python scripts/run_unit_tests.py

Run integration tests (requires enVector server):
- Prepare a running enVector server.
- Export the environment variables:
  - `ENVECTOR_ADDRESS`
  - `ENVECTOR_KEY_PATH`
  - `ENVECTOR_KEY_ID`
  - `ENVECTOR_INDEX_NAME`
  - (Optional) `ENVECTOR_USE_EMBEDDINGS=1`
  - (Optional) `ENVECTOR_EMB_MODEL`
  - (Optional) `ENVECTOR_USE_HF_DATASET=1`
- Run the following command:
python -m pytest -q -m integration -s

See CONTRIBUTE.md for development, testing, and PR guidelines.