99 changes: 61 additions & 38 deletions README.md
### With Specific Backends

```bash
# Recommended: PgVector + Gemini (free tier)
pip install crossvector[pgvector,gemini]

# Alternative: ChromaDB + Gemini (cloud or local)
pip install crossvector[chromadb,gemini]

# With OpenAI (requires paid API key)
pip install crossvector[pgvector,openai]
pip install crossvector[chromadb,openai]

# Milvus + Gemini
pip install crossvector[milvus,gemini]

# AstraDB + OpenAI
pip install crossvector[astradb,openai]
```

### All Backends

```bash
# Install everything
pip install crossvector[all]

# All embedding providers with one database
pip install crossvector[astradb,all-embeddings]
```

## Quick Start

> 💡 **Recommended**: Use `GeminiEmbeddingAdapter` for most use cases: free tier, roughly 1.5x faster search, and optionally smaller vectors (768 vs OpenAI's 1536 dims). See [benchmarks](benchmark.md) for details.

### Basic Usage

```python
from crossvector import VectorEngine
from crossvector.embeddings.gemini import GeminiEmbeddingAdapter
from crossvector.dbs.pgvector import PgVectorAdapter

# Initialize engine with Gemini (recommended: free tier, fast performance)
engine = VectorEngine(
    embedding=GeminiEmbeddingAdapter(),  # Free tier, 1536-dim vectors
    db=PgVectorAdapter(),
    collection_name="my_documents",
    store_text=True
)

# Alternative: OpenAI (requires paid API key, 1536-dim vectors)
# from crossvector.embeddings.openai import OpenAIEmbeddingAdapter
# embedding = OpenAIEmbeddingAdapter()

# Create documents (flexible input formats)
doc1 = engine.create(text="Python is a programming language")
doc2 = engine.create({"text": "Artificial intelligence", "metadata": {"category": "tech"}})
```

Different backends have varying feature support:
| Feature | AstraDB | ChromaDB | Milvus | PgVector |
|---------|---------|----------|--------|----------|
| Vector Search | ✅ | ✅ | ✅ | ✅ |
| Metadata-Only Search | ✅ | ✅ | ❌ | ✅ |
| Nested Metadata | ✅ | ✅ | ❌ | ✅ |
| Numeric Comparisons | ✅ | ✅ | ✅ | ✅ |
| Text Storage | ✅ | ✅ | ✅ | ✅ |

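The "Metadata-Only Search", "Nested Metadata", and "Numeric Comparisons" rows all describe filter capabilities. As an illustrative sketch of those semantics (hypothetical helpers, not crossvector's actual implementation), a nested-metadata filter with a numeric comparison might look like:

```python
# Illustrative sketch of metadata filtering semantics (hypothetical helpers,
# not part of crossvector): resolve dotted paths, then apply an operator.

def get_path(data: dict, path: str):
    """Resolve a dotted path like 'stats.views' in nested metadata."""
    value = data
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value

def matches(data: dict, path: str, op: str, target) -> bool:
    """Apply an equality or numeric comparison at a nested path."""
    value = get_path(data, path)
    if value is None:
        return False
    ops = {
        "eq": lambda a, b: a == b,
        "gt": lambda a, b: a > b,
        "gte": lambda a, b: a >= b,
        "lt": lambda a, b: a < b,
    }
    return ops[op](value, target)

docs = [
    {"text": "intro", "metadata": {"category": "tech", "stats": {"views": 120}}},
    {"text": "deep dive", "metadata": {"category": "tech", "stats": {"views": 30}}},
]

# Metadata-only search: no vector involved, just the filter.
popular = [d for d in docs if matches(d["metadata"], "stats.views", "gt", 100)]
print([d["text"] for d in popular])  # ['intro']
```

The same dotted-path convention is what "Nested Metadata" support refers to: backends without it can only filter on top-level keys.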

## Embedding Providers

> 💡 **Recommended**: Start with **Gemini** for its free tier and faster performance. See the [benchmark comparison](benchmark.md).

### Gemini (Recommended)

```python
from crossvector.embeddings.gemini import GeminiEmbeddingAdapter

# Default model (gemini-embedding-001, 1536 dims)
embedding = GeminiEmbeddingAdapter()

# Explicit model specification
embedding = GeminiEmbeddingAdapter(model_name="models/text-embedding-004", dim=768)
```
**Why Choose Gemini:**
- ✅ **Free tier**: 1,500 requests/min (OpenAI is paid-only)
- ✅ **Faster search**: 234 ms average (about 1.5x faster than OpenAI)
- ✅ **Efficient**: optional 768-dim vectors cut storage 50% vs OpenAI's 1536
- ✅ **Quality**: accuracy comparable to OpenAI

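The storage claim follows from vector size alone. Assuming 4-byte float32 components (actual on-disk size varies by backend and index type), a quick back-of-envelope check:

```python
# Back-of-envelope storage per vector, assuming 4-byte float32 components.
BYTES_PER_FLOAT32 = 4

gemini_768 = 768 * BYTES_PER_FLOAT32    # bytes per 768-dim vector
openai_1536 = 1536 * BYTES_PER_FLOAT32  # bytes per 1536-dim vector

print(gemini_768, openai_1536, gemini_768 / openai_1536)  # 3072 6144 0.5
```
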
**Configuration:**
```bash
GEMINI_API_KEY=AI... # Get free key at https://makersuite.google.com/app/apikey
```

**Supported Models:**
- `gemini-embedding-001` (1536 dims, default, **recommended**)
- `models/text-embedding-004` (768 dims)
### OpenAI (Alternative)

```python
from crossvector.embeddings.openai import OpenAIEmbeddingAdapter

# Default model (text-embedding-3-small, 1536 dims)
embedding = OpenAIEmbeddingAdapter()

# Explicit model specification
embedding = OpenAIEmbeddingAdapter(model_name="text-embedding-3-large")
```

**When to Use OpenAI:**
- ✅ Need 1536 or 3072 dimensions
- ✅ Already have OpenAI API budget
- ✅ Prefer OpenAI ecosystem integration

**Configuration:**
```bash
OPENAI_API_KEY=sk-... # Paid API key from https://platform.openai.com
```

**Supported Models:**
- `text-embedding-3-small` (1536 dims, default)
- `text-embedding-3-large` (3072 dims)
- `text-embedding-ada-002` (1536 dims, legacy)


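Whichever provider produces the vectors, the database backends rank results with a distance metric, typically cosine similarity. A minimal, dependency-free sketch of the metric itself (not crossvector's internal code):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Vectors pointing the same way score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # 0.0
```

Because the metric only depends on direction, embeddings from different providers are interchangeable as long as you never mix dimensions within one collection.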
6 changes: 3 additions & 3 deletions docs/adapters/databases.md
```bash
ASTRA_DB_APPLICATION_TOKEN="AstraCS:xxx"
ASTRA_DB_API_ENDPOINT="https://xxx.apps.astra.datastax.com"
# Note: Collection name uses VECTOR_COLLECTION_NAME (shared setting)
```
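
A hedged sketch of how such environment configuration is typically validated at startup (illustrative only; the adapter's actual loading logic may differ):

```python
# Illustrative fail-fast validation of required AstraDB settings
# (hypothetical helper, not crossvector's actual loader).
REQUIRED = ["ASTRA_DB_APPLICATION_TOKEN", "ASTRA_DB_API_ENDPOINT"]

def load_astra_config(env: dict) -> dict:
    """Collect required AstraDB settings, failing fast when one is missing."""
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing settings: {', '.join(missing)}")
    return {key: env[key] for key in REQUIRED}

config = load_astra_config({
    "ASTRA_DB_APPLICATION_TOKEN": "AstraCS:xxx",
    "ASTRA_DB_API_ENDPOINT": "https://xxx.apps.astra.datastax.com",
})
print(sorted(config))  # ['ASTRA_DB_API_ENDPOINT', 'ASTRA_DB_APPLICATION_TOKEN']
```

In practice the values would come from `os.environ` rather than a literal dict.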

**Programmatic:**
Same code works across all backends:

```python
from crossvector import VectorEngine
from crossvector.embeddings.gemini import GeminiEmbeddingAdapter
from crossvector.querydsl.q import Q

# Create embedding adapter (same for all)
embedding = GeminiEmbeddingAdapter()

# Choose backend (interchangeable)
if backend == "astradb":
    ...
```
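
The interchangeable-adapter idea behind that `if backend == ...` chain can be sketched with a dispatch table; the stub classes below are hypothetical stand-ins, not the real adapters:

```python
# Hypothetical stand-ins for the real adapters, to show the dispatch shape.
class StubAstraDB:
    name = "astradb"

class StubChromaDB:
    name = "chromadb"

class StubPgVector:
    name = "pgvector"

BACKENDS = {
    "astradb": StubAstraDB,
    "chromadb": StubChromaDB,
    "pgvector": StubPgVector,
}

def make_db(backend: str):
    """Pick a database adapter class by name; raise on unknown backends."""
    try:
        return BACKENDS[backend]()
    except KeyError:
        raise ValueError(f"Unsupported backend: {backend}") from None

print(make_db("pgvector").name)  # pgvector
```

Because every adapter exposes the same interface to `VectorEngine`, swapping backends is a one-line change in the dispatch, not a rewrite.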