This module contains a Superlinked implementation that leverages a mixture of embedders for advanced semantic search capabilities. The pipeline supports query feature extraction, embedding generation, and configurable search approaches with multiple embedding models.
Superlinked is a vector database platform that provides:
- Multi-Embedder Support: Combines multiple embedding models for enhanced search relevance
- Flexible Schema Design: Supports multiple spaces
- Advanced Filtering: Provides sophisticated filtering capabilities (soft and hard)
- Query Parameter Integration: Uses extracted query parameters for enhanced search precision
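The difference between combining several embedders and applying soft or hard filters can be illustrated with a small pure-Python sketch. It does not use the Superlinked API; the `Candidate` class, the `rank` function, the field names, and the `soft_boost` value are all hypothetical and exist only to show the idea of a weighted sum of per-embedder similarities plus two kinds of filter.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    brand: str
    title_sim: float        # similarity from a hypothetical title embedder
    description_sim: float  # similarity from a hypothetical description embedder


def rank(candidates, weights, hard_brand=None, soft_brand=None, soft_boost=0.2):
    """Rank candidates by a weighted sum of per-embedder similarities.

    hard_brand removes non-matching items; soft_brand only boosts matches.
    """
    w_title, w_desc = weights
    ranked = []
    for c in candidates:
        if hard_brand is not None and c.brand != hard_brand:
            continue  # hard filter: the item is excluded outright
        score = w_title * c.title_sim + w_desc * c.description_sim
        if soft_brand is not None and c.brand == soft_brand:
            score += soft_boost  # soft filter: the item is preferred, not required
        ranked.append((round(score, 3), c.name))
    return sorted(ranked, reverse=True)


items = [
    Candidate("trail shoe", "acme", 0.9, 0.6),
    Candidate("road shoe", "zenith", 0.8, 0.8),
]
unfiltered = rank(items, (0.5, 0.5))
hard = rank(items, (0.5, 0.5), hard_brand="acme")
soft = rank(items, (0.5, 0.5), soft_brand="acme")
```

With the hard filter only the matching item remains; with the soft filter both items stay in the result and the matching one moves to the top.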
The module supports two main applications:
- sota_app: State-of-the-art implementation with advanced features
  - Supports NLQ (Natural Language Query) parameter extraction
  - Uses ground truth parameters for baseline comparison
  - Full text search capabilities
- sota_app_baseline: Baseline implementation for comparison (stringify-and-embed)
  - Simplified configuration
  - Focus on full text search
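As a rough illustration of the stringify-and-embed baseline, the sketch below flattens a record into a single string that would then go to one text embedder. The `stringify` helper, the "key: value" format, and the sample product fields are assumptions made for this example, not the baseline app's actual code.

```python
def stringify(record: dict) -> str:
    """Flatten a record into one 'key: value' string, skipping empty fields."""
    parts = []
    for key, value in record.items():
        if value is None or value == "":
            continue  # empty fields add no signal to the text
        if isinstance(value, (list, tuple)):
            value = ", ".join(str(v) for v in value)
        parts.append(f"{key}: {value}")
    return "; ".join(parts)


# Hypothetical product record
product = {
    "title": "Trail shoe",
    "brand": "Acme",
    "colors": ["red", "black"],
    "notes": None,
}
text = stringify(product)
# `text` is what a single text embedder would receive in this approach
```

Because every attribute ends up in one string, this approach needs no per-field schema, which is what makes it a convenient baseline.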
Make sure you have updated the config files: the local one for each app you run, and the global one with your GCP and Redis credentials and paths.
Run the module using Python's module execution:
# Run the SOTA app
python -m superlinked_app.app sota_app
# Run the baseline app (stringify-and-embed)
python -m superlinked_app.app sota_app_baseline
Note: if you encounter problems with Superlinked and Redis, install Superlinked with the Redis extra:
pip install 'superlinked[redis]'
The implementation supports multiple search approaches:
- NLQ (USE_NLQ)
  - Uses parameters extracted by an LLM from natural language queries
  - Applies dynamic filtering based on extracted attributes
  - Combines multi-embedder search with structured filtering
- Ground truth (USE_GROUND_TRUTH_QUERY_INPUTS)
  - Uses predefined ground truth parameters
  - Applies exact filtering based on known query parameters
  - Provides baseline performance comparison
- Full query text (USE_FULL_QUERY_TEXT)
  - Uses only the query text for search
  - Leverages the stringify-and-embed approach
  - No additional filtering applied
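The three approaches above correspond to the query modes named in the configuration (USE_FULL_QUERY_TEXT, USE_GROUND_TRUTH_QUERY_INPUTS, USE_NLQ). A minimal sketch of how a mode could decide which inputs reach the query is shown here. The `QueryMode` enum, the `build_query_inputs` function, and the `fake_extractor` stub are hypothetical; only the three mode names come from this document, and a real extractor would call an LLM.

```python
from enum import Enum


class QueryMode(Enum):
    USE_FULL_QUERY_TEXT = "full_text"
    USE_GROUND_TRUTH_QUERY_INPUTS = "ground_truth"
    USE_NLQ = "nlq"


def build_query_inputs(mode, query_text, ground_truth=None, extract_with_llm=None):
    """Return the inputs a query would receive under the given mode."""
    inputs = {"query_text": query_text}
    if mode is QueryMode.USE_GROUND_TRUTH_QUERY_INPUTS:
        # Known-correct parameters, used for exact filtering
        inputs.update(ground_truth or {})
    elif mode is QueryMode.USE_NLQ:
        if extract_with_llm is None:
            raise ValueError("USE_NLQ needs a parameter extractor")
        # Parameters extracted from the natural language query
        inputs.update(extract_with_llm(query_text))
    # USE_FULL_QUERY_TEXT: only the raw text, no extra filtering
    return inputs


def fake_extractor(text):
    """Stand-in for an LLM call; returns a fixed, made-up parameter."""
    return {"brand": "acme"}


full_text = build_query_inputs(QueryMode.USE_FULL_QUERY_TEXT, "red acme shoes")
nlq = build_query_inputs(
    QueryMode.USE_NLQ, "red acme shoes", extract_with_llm=fake_extractor
)
```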
- app.py: Main application entry point
- registry.py: Module registry for different app configurations
- util/: Utility functions and enums
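One way an entry point and a registry can work together is sketched below: the command-line argument names an app, and the registry maps that name to the package that holds its configuration. The `APP_REGISTRY` dictionary, the dotted module paths, and both functions are assumptions for illustration; the actual contents of app.py and registry.py may differ.

```python
import argparse

# Hypothetical registry: app name -> package holding that app's modules
APP_REGISTRY = {
    "sota_app": "superlinked_app.apps.sota_app",
    "sota_app_baseline": "superlinked_app.apps.sota_app_baseline",
}


def parse_app_name(argv):
    """Read the app name from the command line, rejecting unknown names."""
    parser = argparse.ArgumentParser(prog="python -m superlinked_app.app")
    parser.add_argument("app_name", choices=sorted(APP_REGISTRY))
    return parser.parse_args(argv).app_name


def resolve_app(argv):
    """Return the app name and the package path registered for it."""
    name = parse_app_name(argv)
    return name, APP_REGISTRY[name]


selected = resolve_app(["sota_app_baseline"])
```

A real entry point would go on to import the resolved package (for example with `importlib.import_module`) and start the app; that step is left out because it depends on the package layout.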
Each app (sota_app, sota_app_baseline) contains:
- config.py: Configuration settings and parameters
- data_prep.py: Data preprocessing functions
- index.py: Superlinked index creation and configuration
- query.py: Query configuration and NLQ descriptions
- nlq.py: Natural Language Query configuration
The apps depend on the following external resources:
- GCP bucket
- OpenAI API key for NLQ
- Redis instance for vector storage
Each app has its own configuration in apps/{app_name}/config.py:
- Superlinked: Embedder settings, query mode (USE_FULL_QUERY_TEXT, USE_GROUND_TRUTH_QUERY_INPUTS, or USE_NLQ), re-ingest option, redo-NLQ option
- Data Sources: Google Cloud Storage bucket paths for products and queries
- Output Files: Paths for results and query parameters
- Redis: Vector database connection settings
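To make the four configuration groups concrete, here is a sketch of what a per-app config object could look like. Every field name, default value, and placeholder string is hypothetical; only the three query mode names are taken from this document, and the real config.py may be organized differently.

```python
from dataclasses import dataclass

QUERY_MODES = (
    "USE_FULL_QUERY_TEXT",
    "USE_GROUND_TRUTH_QUERY_INPUTS",
    "USE_NLQ",
)


@dataclass(frozen=True)
class AppConfig:
    # Superlinked settings
    embedders: tuple
    query_mode: str = "USE_FULL_QUERY_TEXT"
    reingest: bool = False
    redo_nlq: bool = False
    # Data sources and output files (placeholders, not real paths)
    products_path: str = "<bucket path to products>"
    queries_path: str = "<bucket path to queries>"
    results_path: str = "<path for results>"
    query_params_path: str = "<path for query parameters>"
    # Redis connection settings
    redis_host: str = "localhost"
    redis_port: int = 6379

    def __post_init__(self):
        # Fail early on values the rest of the pipeline could not use
        if self.query_mode not in QUERY_MODES:
            raise ValueError(f"unknown query mode: {self.query_mode}")
        if not self.embedders:
            raise ValueError("at least one embedder is required")


config = AppConfig(embedders=("<embedder-name>",), query_mode="USE_NLQ", redo_nlq=True)
```

Validating in `__post_init__` means a typo in the query mode surfaces when the config is loaded rather than partway through a run.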