| sidebar_label | Hybrid Search |
|---|
This notebook demonstrates the hybrid search capabilities of the OceanBase vector store, including vector search, sparse vector search, and full-text search with intelligent result fusion.
To use hybrid search features, you'll need to deploy a standalone OceanBase server and install the required packages:
docker run --name=oceanbase -e MODE=mini -e OB_SERVER_IP=127.0.0.1 -p 2881:2881 -d oceanbase/oceanbase-ce:latestAnd install the langchain-oceanbase integration package:
pip install -qU "langchain-oceanbase"Check the connection to OceanBase and set the memory usage ratio for vector data:
from pyobvector import ObVecClient
tmp_client = ObVecClient()
tmp_client.perform_raw_text_sql("ALTER SYSTEM ob_vector_memory_limit_percentage = 30")Configure the embedding model and initialize the OceanBase vector store with hybrid search capabilities. Here we use DefaultEmbeddingFunctionAdapter for demonstration purposes:
import os
from langchain_oceanbase.embedding_utils import DefaultEmbeddingFunctionAdapter
from langchain_core.documents import Document
from langchain_oceanbase.vectorstores import OceanbaseVectorStore
# Connection configuration
connection_args = {
"host": "127.0.0.1",
"port": "2881",
"user": "root@test",
"password": "",
"db_name": "test",
}
# Initialize embeddings (using DefaultEmbeddingFunctionAdapter for demo)
embeddings = DefaultEmbeddingFunctionAdapter()
# Create vector store with hybrid search capabilities
vector_store = OceanbaseVectorStore(
embedding_function=embeddings,
table_name="hybrid_search_demo",
connection_args=connection_args,
vidx_metric_type="l2",
include_sparse=True, # Enable sparse vector search
include_fulltext=True, # Enable full-text search
drop_old=True,
embedding_dim=384,
)
print("Hybrid search vector store initialized successfully!")Vector search provides semantic similarity matching using embeddings:
# Add some sample documents
documents = [
Document(
page_content="Machine learning is a subset of artificial intelligence",
metadata={"topic": "AI", "author": "John Doe", "year": 2024}
),
Document(
page_content="Deep learning uses neural networks for pattern recognition",
metadata={"topic": "AI", "author": "Jane Smith", "year": 2023}
),
Document(
page_content="Natural language processing enables computers to understand human language",
metadata={"topic": "NLP", "author": "Bob Wilson", "year": 2024}
),
Document(
page_content="Computer vision allows machines to interpret visual information",
metadata={"topic": "CV", "author": "Alice Brown", "year": 2023}
),
]
# Add documents to vector store
ids = vector_store.add_documents(documents)
print(f"Added {len(ids)} documents to the vector store")
# Perform basic vector search
results = vector_store.similarity_search("artificial intelligence", k=3)
print(f"\nVector search results for 'artificial intelligence':")
for i, doc in enumerate(results, 1):
print(f"{i}. {doc.page_content[:50]}...")
print(f" Metadata: {doc.metadata}")
print()Sparse vector search is useful for keyword-based exact matching:
# Create sparse embeddings for the documents
# Each sparse vector represents keyword weights
sparse_embeddings = [
{1: 0.8, 5: 0.6, 10: 0.4, 15: 0.2}, # machine learning AI
{2: 0.7, 6: 0.5, 11: 0.3, 16: 0.1}, # deep learning neural
{3: 0.9, 7: 0.4, 12: 0.6, 17: 0.3}, # natural language processing
{4: 0.6, 8: 0.8, 13: 0.2, 18: 0.5}, # computer vision
]
# Add documents with sparse embeddings
sparse_ids = vector_store.add_sparse_documents(
documents=documents,
sparse_embeddings=sparse_embeddings,
)
print(f"Added sparse embeddings for {len(sparse_ids)} documents")
# Perform sparse vector search
sparse_query = {1: 0.8, 5: 0.6, 10: 0.4} # Query for "machine learning AI"
sparse_results = vector_store.similarity_search_with_sparse_vector(
sparse_query=sparse_query,
k=3
)
print(f"\nSparse vector search results:")
for i, doc in enumerate(sparse_results, 1):
print(f"{i}. {doc.page_content[:50]}...")
print(f" Metadata: {doc.metadata}")
print()Full-text search enables searching within document content:
# Create full-text content for the documents
fulltext_content = [
"Machine learning algorithms enable computers to learn from data without explicit programming",
"Deep learning neural networks can process complex patterns in images and text",
"Natural language processing combines computational linguistics with machine learning",
"Computer vision systems can identify objects and scenes in digital images",
]
# Add documents with full-text content
fulltext_ids = vector_store.add_documents_with_fulltext(
documents=documents,
fulltext_content=fulltext_content,
)
print(f"Added full-text content for {len(fulltext_ids)} documents")
# Perform full-text search
fulltext_results = vector_store.similarity_search_with_fulltext(
query="artificial intelligence",
fulltext_query="neural networks",
k=3
)
print(f"\nFull-text search results:")
for i, doc in enumerate(fulltext_results, 1):
print(f"{i}. {doc.page_content[:50]}...")
print(f" Metadata: {doc.metadata}")
print()The most powerful feature is combining all three search modalities:
# Advanced hybrid search combining all three modalities
hybrid_results = vector_store.advanced_hybrid_search(
vector_query="AI technology", # Semantic search
sparse_query={1: 0.8, 5: 0.6}, # Keyword search
fulltext_query="machine learning", # Text search
k=4
)
print(f"Advanced hybrid search results (default weights):")
for i, doc in enumerate(hybrid_results, 1):
print(f"{i}. {doc.page_content[:50]}...")
print(f" Metadata: {doc.metadata}")
print()
# Advanced hybrid search with custom weights
custom_weights = {
'vector': 0.7, # Emphasize semantic similarity
'sparse': 0.2, # Reduce keyword matching
'fulltext': 0.1 # Reduce text search
}
weighted_results = vector_store.advanced_hybrid_search(
vector_query="AI technology",
sparse_query={1: 0.8, 5: 0.6},
fulltext_query="machine learning",
k=4,
modality_weights=custom_weights
)
print(f"Advanced hybrid search results (custom weights):")
for i, doc in enumerate(weighted_results, 1):
print(f"{i}. {doc.page_content[:50]}...")
print(f" Metadata: {doc.metadata}")
print()You can also combine hybrid search with advanced filtering:
# Advanced filtering with hybrid search
filters = {
'fulltext': 'machine learning'
}
filtered_results = vector_store.similarity_search_with_advanced_filters(
query="artificial intelligence",
filters=filters,
k=3
)
print(f"Advanced filtering results:")
for i, doc in enumerate(filtered_results, 1):
print(f"{i}. {doc.page_content[:50]}...")
print(f" Metadata: {doc.metadata}")
print()- Vector Search: Semantic similarity matching using embeddings
- Sparse Vector Search: Keyword-based exact matching
- Full-text Search: Content-based text search
- Combines results from multiple search modalities
- Uses weighted scoring system for optimal ranking
- Normalizes scores across different modalities
- Enable/disable specific search modalities
- Configurable weights for different search types (NEW!)
- Support for various distance metrics
- Custom weight validation and error handling
- Leverages OceanBase's vector indexes
- Parallel execution of multiple searches
- Efficient result fusion algorithms
The OceanBase Vector Store hybrid search provides:
- True multi-modal search combining vector, sparse vector, and full-text search
- Intelligent result fusion with configurable weights (NEW!)
- High performance leveraging OceanBase's native capabilities
- Flexible configuration for different use cases
- Backward compatibility with existing vector search functionality
- Weight validation ensuring proper configuration
# Default weights (automatic)
results = vector_store.advanced_hybrid_search(
vector_query="AI technology",
sparse_query={1: 0.8, 5: 0.6},
fulltext_query="machine learning"
)
# Custom weights emphasizing semantic similarity
semantic_weights = {'vector': 0.8, 'sparse': 0.1, 'fulltext': 0.1}
results = vector_store.advanced_hybrid_search(
vector_query="AI technology",
sparse_query={1: 0.8, 5: 0.6},
fulltext_query="machine learning",
modality_weights=semantic_weights
)
# Custom weights emphasizing keyword matching
keyword_weights = {'vector': 0.2, 'sparse': 0.7, 'fulltext': 0.1}
results = vector_store.advanced_hybrid_search(
vector_query="AI technology",
sparse_query={1: 0.8, 5: 0.6},
fulltext_query="machine learning",
modality_weights=keyword_weights
)This makes it ideal for applications requiring both semantic understanding and precise keyword matching, such as academic paper search, product catalogs, and knowledge bases.