This is a simple demo that shows how to use Jina AI to generate embeddings for text data, store the embeddings in TiDB Vector Storage, and search for similar embeddings.

Prerequisites:
- A running TiDB Serverless cluster with vector search enabled
- Python 3.8 or later
- Jina AI API key
Clone the repository and change into the demo directory:

git clone https://github.com/pingcap/tidb-vector-python.git
cd tidb-vector-python/examples/jina-ai-embeddings-demo

Create a virtual environment and install the dependencies:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Get the Jina AI API key from the Jina AI Embedding API page.
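The demo uses the Jina AI Embedding API to turn each text into a vector. As a rough sketch of what such a call involves, the standard-library snippet below builds the HTTP request and parses the response. It assumes the `https://api.jina.ai/v1/embeddings` endpoint, a Bearer-token `Authorization` header, the `jina-embeddings-v2-base-en` model, and a response with a `data` list of `{"index", "embedding"}` items; check the Jina AI API documentation for the current endpoint, models, and response format. The helper names are hypothetical and not taken from the demo script.

```python
import json
import os
import urllib.request

# Assumed endpoint and model name; verify against the Jina AI API docs.
JINA_API_URL = "https://api.jina.ai/v1/embeddings"
JINA_MODEL = "jina-embeddings-v2-base-en"


def build_embedding_request(texts, api_key, model=JINA_MODEL):
    """Build (but do not send) a POST request asking Jina AI to embed `texts`."""
    body = json.dumps({"input": list(texts), "model": model}).encode("utf-8")
    return urllib.request.Request(
        JINA_API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def parse_embeddings(response_json):
    """Return the vectors ordered by their 'index' field (assumed response shape)."""
    items = sorted(response_json["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def generate_embeddings(texts):
    """Send the request using JINA_API_KEY from the environment (needs network access)."""
    request = build_embedding_request(texts, os.environ["JINA_API_KEY"])
    with urllib.request.urlopen(request) as response:
        return parse_embeddings(json.load(response))
```

Splitting request building from parsing keeps the network call in one small function, so the other two can be checked offline.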
Get the HOST, PORT, USERNAME, PASSWORD, DATABASE, and CA parameters from the TiDB Cloud console (see Prerequisites), then replace the placeholders below to build TIDB_DATABASE_URL and set both environment variables:
export JINA_API_KEY="****"
export TIDB_DATABASE_URL="mysql+pymysql://<USERNAME>:<PASSWORD>@<HOST>:4000/<DATABASE>?ssl_ca=<CA>&ssl_verify_cert=true&ssl_verify_identity=true"

Alternatively, create a .env file containing the environment variables above.
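If the username or password contains characters such as `@`, `:` or `/`, they need to be percent-encoded before being placed in the URL, otherwise the URL is parsed incorrectly. A minimal sketch that assembles `TIDB_DATABASE_URL` from its parts is shown below; `build_tidb_url` is a hypothetical helper, not part of the demo, and its default port of 4000 simply mirrors the template above.

```python
from urllib.parse import quote, quote_plus


def build_tidb_url(username, password, host, database, ca_path, port=4000):
    """Hypothetical helper that builds a SQLAlchemy + PyMySQL URL for TiDB."""
    # Percent-encode credentials so characters like '@', ':' or '/' don't break parsing.
    user = quote_plus(username)
    pwd = quote_plus(password)
    # Encode the CA path for use as a query value, keeping '/' readable.
    ca = quote(ca_path, safe="/")
    return (
        f"mysql+pymysql://{user}:{pwd}@{host}:{port}/{database}"
        f"?ssl_ca={ca}&ssl_verify_cert=true&ssl_verify_identity=true"
    )


print(build_tidb_url("abc.root", "p@ss:w/rd", "gateway.example.com", "test", "/etc/ssl/cert.pem"))
```

The printed value can then be exported as `TIDB_DATABASE_URL` or written to the .env file.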
Run the demo:

$ python jina-ai-embeddings-demo.py

The output looks similar to the following:
- Inserting Data to TiDB...
  - Inserting: Jina AI offers best-in-class embeddings, reranker and prompt optimizer, enabling advanced multimodal AI.
  - Inserting: TiDB is an open-source MySQL-compatible database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads.
- List All Documents and Their Distances to the Query:
  - distance: 0.3585317326132522
    content: Jina AI offers best-in-class embeddings, reranker and prompt optimizer, enabling advanced multimodal AI.
  - distance: 0.10858102967720984
    content: TiDB is an open-source MySQL-compatible database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads.
- The Most Relevant Document and Its Distance to the Query:
  - distance: 0.10858102967720984
    content: TiDB is an open-source MySQL-compatible database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads.
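In this output a smaller distance means a closer match, which is why the TiDB document (0.1086) is reported as more relevant than the Jina AI document (0.3585). Assuming the demo ranks by cosine distance (one minus cosine similarity), the ranking step can be reproduced in plain Python as sketched below. The 3-dimensional vectors are made up for illustration; real Jina AI embeddings have hundreds of dimensions, and TiDB Vector Search can compute this distance in SQL (for example with `VEC_COSINE_DISTANCE`) so that only the ranked rows are returned.

```python
import math


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_documents(query_vec, docs):
    """Return (distance, content) pairs sorted from most to least relevant."""
    scored = [(cosine_distance(query_vec, vec), content) for content, vec in docs]
    return sorted(scored)


# Toy vectors (made up for illustration, not real Jina AI embeddings).
docs = [
    ("Jina AI offers best-in-class embeddings ...", [0.9, 0.1, 0.0]),
    ("TiDB is an open-source MySQL-compatible database ...", [0.1, 0.9, 0.2]),
]
query = [0.0, 1.0, 0.1]  # hypothetical embedding of a TiDB-related query

ranked = rank_documents(query, docs)
for distance, content in ranked:
    print(f"distance: {distance:.4f}  content: {content}")
print("most relevant:", ranked[0][1])
```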