title	Integrate TiDB Vector Search with SQLAlchemy
summary	Learn how to integrate TiDB Vector Search with SQLAlchemy to store embeddings and perform semantic searches.

Integrate TiDB Vector Search with SQLAlchemy

This tutorial walks you through how to use SQLAlchemy to interact with TiDB Vector Search, store embeddings, and perform vector search queries.

Warning:

The vector search feature is experimental. It is not recommended that you use it in the production environment. This feature might be changed without prior notice. If you find a bug, you can report an issue on GitHub.

Note:

The vector search feature is in beta. It might be changed without prior notice. If you find a bug, you can report an issue on GitHub.

Note:

The vector search feature is available on TiDB Self-Managed, {{{ .starter }}}, {{{ .essential }}}, and TiDB Cloud Dedicated. For TiDB Self-Managed and TiDB Cloud Dedicated, the TiDB version must be v8.4.0 or later (v8.5.0 or later is recommended).

Prerequisites

To complete this tutorial, you need:

Python 3.8 or higher installed.
Git installed.
A TiDB cluster.

If you don't have a TiDB cluster, you can create one as follows:

Follow Deploy a local test TiDB cluster or Deploy a production TiDB cluster to create a local cluster.
Follow Creating a {{{ .starter }}} cluster to create your own TiDB Cloud cluster.

If you don't have a TiDB cluster, you can create one as follows:

(Recommended) Follow Creating a {{{ .starter }}} cluster to create your own TiDB Cloud cluster.
Follow Deploy a local test TiDB cluster or Deploy a production TiDB cluster to create a local cluster of v8.4.0 or a later version.

Run the sample app

You can quickly learn about how to integrate TiDB Vector Search with SQLAlchemy by following the steps below.

Step 1. Clone the repository

Clone the tidb-vector-python repository to your local machine:

git clone https://github.com/pingcap/tidb-vector-python.git

Step 2. Create a virtual environment

Create a virtual environment for your project:

cd tidb-vector-python/examples/orm-sqlalchemy-quickstart
python3 -m venv .venv
source .venv/bin/activate

Step 3. Install the required dependencies

Install the required dependencies for the demo project:

pip install -r requirements.txt

Alternatively, you can install the following packages for your project:

pip install pymysql python-dotenv sqlalchemy tidb-vector

Step 4. Configure the environment variables

Configure the environment variables depending on the TiDB deployment option you've selected.

For a {{{ .starter }}} cluster, take the following steps to obtain the cluster connection string and configure environment variables:

Navigate to the Clusters page, and then click the name of your target cluster to go to its overview page.
Click Connect in the upper-right corner. A connection dialog is displayed.
Ensure the configurations in the connection dialog match your environment.
- Connection Type is set to Public.
- Branch is set to main.
- Connect With is set to SQLAlchemy.
- Operating System matches your environment.
Tip:

If your program is running in Windows Subsystem for Linux (WSL), switch to the corresponding Linux distribution.
Click the PyMySQL tab and copy the connection string.

Tip:

If you have not set a password yet, click Generate Password to generate a random password.

In the root directory of your Python project, create a .env file and paste the connection string into it.

The following is an example for macOS:

TIDB_DATABASE_URL="mysql+pymysql://<prefix>.root:<password>@gateway01.<region>.prod.aws.tidbcloud.com:4000/test?ssl_ca=/etc/ssl/cert.pem&ssl_verify_cert=true&ssl_verify_identity=true"

For a TiDB Self-Managed cluster, create a .env file in the root directory of your Python project. Copy the following content into the .env file, and modify the environment variable values according to the connection parameters of your TiDB cluster:

TIDB_DATABASE_URL="mysql+pymysql://<USER>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>"
# For example: TIDB_DATABASE_URL="mysql+pymysql://root@127.0.0.1:4000/test"

If you are running TiDB on your local machine, <HOST> is 127.0.0.1 by default. The initial <PASSWORD> is empty, so if you are starting the cluster for the first time, you can omit this field.

The following are descriptions for each parameter:

<USER>: The username to connect to the TiDB cluster.
<PASSWORD>: The password to connect to the TiDB cluster.
<HOST>: The host of the TiDB cluster.
<PORT>: The port of the TiDB cluster.
<DATABASE>: The name of the database you want to connect to.

Step 5. Run the demo

python sqlalchemy-quickstart.py

Example output:

Get 3-nearest neighbor documents:
  - distance: 0.00853986601633272
    document: fish
  - distance: 0.12712843905603044
    document: dog
  - distance: 0.7327387580875756
    document: tree
Get documents within a certain distance:
  - distance: 0.00853986601633272
    document: fish
  - distance: 0.12712843905603044
    document: dog

Sample code snippets

You can refer to the following sample code snippets to develop your application.

Create vector tables

Connect to TiDB cluster

import os
import dotenv

from sqlalchemy import Column, Integer, create_engine, Text
from sqlalchemy.orm import declarative_base, Session
from tidb_vector.sqlalchemy import VectorType

dotenv.load_dotenv()

tidb_connection_string = os.environ['TIDB_DATABASE_URL']
engine = create_engine(tidb_connection_string)

Define a vector column

Create a table with a column named embedding that stores a 3-dimensional vector.

Base = declarative_base()

class Document(Base):
    __tablename__ = 'sqlalchemy_demo_documents'
    id = Column(Integer, primary_key=True)
    content = Column(Text)
    embedding = Column(VectorType(3))

Store documents with embeddings

with Session(engine) as session:
   session.add(Document(content="dog", embedding=[1, 2, 1]))
   session.add(Document(content="fish", embedding=[1, 2, 4]))
   session.add(Document(content="tree", embedding=[1, 0, 0]))
   session.commit()

Search the nearest neighbor documents

Search for the top-3 documents that are semantically closest to the query vector [1, 2, 3] based on the cosine distance function.

with Session(engine) as session:
   distance = Document.embedding.cosine_distance([1, 2, 3]).label('distance')
   results = session.query(
      Document, distance
   ).order_by(distance).limit(3).all()

Search documents within a certain distance

Search for documents whose cosine distance from the query vector [1, 2, 3] is less than 0.2.

with Session(engine) as session:
    distance = Document.embedding.cosine_distance([1, 2, 3]).label('distance')
    results = session.query(
        Document, distance
    ).filter(distance < 0.2).order_by(distance).limit(3).all()

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Integrate TiDB Vector Search with SQLAlchemy

Prerequisites

Run the sample app

Step 1. Clone the repository

Step 2. Create a virtual environment

Step 3. Install the required dependencies

Step 4. Configure the environment variables

Step 5. Run the demo

Sample code snippets

Create vector tables

Connect to TiDB cluster

Define a vector column

Store documents with embeddings

Search the nearest neighbor documents

Search documents within a certain distance

See also

FilesExpand file tree

vector-search-integrate-with-sqlalchemy.md

Latest commit

History

vector-search-integrate-with-sqlalchemy.md

File metadata and controls

Integrate TiDB Vector Search with SQLAlchemy

Prerequisites

Run the sample app

Step 1. Clone the repository

Step 2. Create a virtual environment

Step 3. Install the required dependencies

Step 4. Configure the environment variables

Step 5. Run the demo

Sample code snippets

Create vector tables

Connect to TiDB cluster

Define a vector column

Store documents with embeddings

Search the nearest neighbor documents

Search documents within a certain distance

See also