| 1 | +--- |
| 2 | +layout: integration |
| 3 | +name: SQLAlchemy |
| 4 | +description: Query any SQL database from a Haystack pipeline using SQLAlchemy |
| 5 | +authors: |
| 6 | +  - name: deepset |
| 7 | +    socials: |
| 8 | +      github: deepset-ai |
| 9 | +      twitter: deepset_ai |
| 10 | +      linkedin: https://www.linkedin.com/company/deepset-ai/ |
| 11 | +pypi: https://pypi.org/project/sqlalchemy-haystack |
| 12 | +repo: https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/sqlalchemy |
| 13 | +type: Data Ingestion |
| 14 | +report_issue: https://github.com/deepset-ai/haystack-core-integrations/issues |
| 15 | +logo: /logos/sqlalchemy.png |
| 16 | +version: Haystack 2.0 |
| 17 | +toc: true |
| 18 | +--- |
| 19 | + |
| 20 | +**Table of Contents** |
| 21 | + |
| 22 | +- [Overview](#overview) |
| 23 | +- [Installation](#installation) |
| 24 | +- [Usage](#usage) |
| 25 | +- [Security](#security) |
| 26 | +- [License](#license) |
| 27 | + |
| 28 | +## Overview |
| 29 | + |
| 30 | +The SQLAlchemy integration provides a `SQLAlchemyTableRetriever` component that connects to any |
| 31 | +[SQLAlchemy](https://www.sqlalchemy.org/)-supported database, executes a SQL query, and returns |
| 32 | +the results as a pandas DataFrame and, optionally, a Markdown-formatted table string. |
| 33 | + |
| 34 | +Supported backends include PostgreSQL, MySQL, MariaDB, SQLite, MSSQL, and Oracle. Any backend with a |
| 35 | +SQLAlchemy dialect should work once the matching database driver is installed. |
| 36 | + |
| 37 | +This component is designed for Text-to-SQL pipelines where an LLM generates a SQL query and the |
| 38 | +retriever fetches the corresponding rows for downstream processing. |
| 39 | + |
| 40 | +## Installation |
| 41 | + |
| 42 | +```bash |
| 43 | +pip install sqlalchemy-haystack |
| 44 | +``` |
| 45 | + |
| 46 | +You also need to install the appropriate database driver for your backend: |
| 47 | + |
| 48 | +| Backend | Driver package | |
| 49 | +|---------|----------------| |
| 50 | +| PostgreSQL | `psycopg2-binary` or `psycopg[binary]` | |
| 51 | +| MySQL / MariaDB | `pymysql` or `mysqlclient` | |
| 52 | +| SQLite | built-in (no extra package needed) | |
| 53 | +| MSSQL | `pyodbc` | |
| 54 | +| Oracle | `cx_Oracle` or `oracledb` | |
| 55 | + |
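Before constructing the retriever, it can help to confirm that a driver for your backend is importable. The following is a minimal sketch using only the standard library. `DRIVER_MODULES` and `available_drivers` are hypothetical names introduced for illustration and are not part of `sqlalchemy-haystack`. The import names are those the driver packages in the table are known to install (for example, `mysqlclient` installs the `MySQLdb` module).

```python
from importlib.util import find_spec

# Import names for the driver packages listed in the table above.
# This mapping is illustrative, not part of the integration.
DRIVER_MODULES = {
    "PostgreSQL": ["psycopg2", "psycopg"],
    "MySQL / MariaDB": ["pymysql", "MySQLdb"],  # mysqlclient installs MySQLdb
    "SQLite": ["sqlite3"],  # ships with the Python standard library
    "MSSQL": ["pyodbc"],
    "Oracle": ["cx_Oracle", "oracledb"],
}


def available_drivers(backend: str) -> list[str]:
    """Return the driver modules for `backend` that can be imported here."""
    return [name for name in DRIVER_MODULES[backend] if find_spec(name) is not None]


for backend in DRIVER_MODULES:
    found = available_drivers(backend)
    print(f"{backend}: {', '.join(found) if found else 'no driver found'}")
```

`find_spec` only checks that the module can be located. It does not open a connection or verify that system libraries such as an ODBC driver manager are present.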
| 56 | +## Usage |
| 57 | + |
| 58 | +### SQLite (in-memory) |
| 59 | + |
| 60 | +```python |
| 61 | +from haystack_integrations.components.retrievers.sqlalchemy import SQLAlchemyTableRetriever |
| 62 | + |
| 63 | +retriever = SQLAlchemyTableRetriever( |
| 64 | +    drivername="sqlite", |
| 65 | +    database=":memory:", |
| 66 | +    init_script=[ |
| 67 | +        "CREATE TABLE products (id INTEGER, name TEXT, price REAL)", |
| 68 | +        "INSERT INTO products VALUES (1, 'Widget', 9.99)", |
| 69 | +        "INSERT INTO products VALUES (2, 'Gadget', 19.99)", |
| 70 | +    ], |
| 71 | +) |
| 72 | +retriever.warm_up() |
| 73 | + |
| 74 | +result = retriever.run(query="SELECT * FROM products WHERE price < 15") |
| 75 | +print(result["dataframe"]) |
| 76 | +print(result["table"]) |
| 77 | +``` |
| 78 | + |
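To make the `table` output more concrete, here is a rough sketch of how query rows can be rendered as a Markdown table. It uses the standard-library `sqlite3` module in place of the component, and `rows_to_markdown` is a hypothetical helper. The exact formatting of `result["table"]` is an assumption and may differ from what the retriever produces.

```python
import sqlite3


def rows_to_markdown(columns, rows):
    """Render column names and row tuples as a Markdown table string."""
    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    body = ["| " + " | ".join(str(value) for value in row) + " |" for row in rows]
    return "\n".join([header, divider, *body])


# Same schema and data as the SQLite example above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, "Widget", 9.99), (2, "Gadget", 19.99)],
)

cursor = conn.execute("SELECT * FROM products WHERE price < 15")
columns = [description[0] for description in cursor.description]
table = rows_to_markdown(columns, cursor.fetchall())
conn.close()

print(table)
```

A Markdown string like this is convenient to place into an LLM prompt, while the DataFrame is better suited to further processing in code.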
| 79 | +### PostgreSQL |
| 80 | + |
| 81 | +```python |
| 82 | +from haystack.utils import Secret |
| 83 | +from haystack_integrations.components.retrievers.sqlalchemy import SQLAlchemyTableRetriever |
| 84 | + |
| 85 | +retriever = SQLAlchemyTableRetriever( |
| 86 | +    drivername="postgresql+psycopg2", |
| 87 | +    host="localhost", |
| 88 | +    port=5432, |
| 89 | +    database="mydb", |
| 90 | +    username="myuser", |
| 91 | +    password=Secret.from_env_var("DB_PASSWORD"), |
| 92 | +) |
| 93 | +retriever.warm_up() |
| 94 | + |
| 95 | +result = retriever.run(query="SELECT * FROM orders LIMIT 10") |
| 96 | +print(result["dataframe"]) |
| 97 | +``` |
| 98 | + |
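The keyword arguments in the example mirror the parts of a SQLAlchemy database URL, which has the form `dialect+driver://username:password@host:port/database`. The sketch below shows that mapping with the standard library only. `build_url` is a hypothetical helper, and whether the component assembles its URL this way internally is an assumption. The sample password is made up to show why special characters need URL-encoding when a URL is written as a string.

```python
from urllib.parse import quote_plus


def build_url(drivername, username, password, host, port, database):
    """Compose a SQLAlchemy-style database URL string (illustrative only)."""
    # Characters such as "@" or "/" in credentials must be percent-encoded,
    # otherwise they would be read as URL delimiters.
    credentials = f"{quote_plus(username)}:{quote_plus(password)}"
    return f"{drivername}://{credentials}@{host}:{port}/{database}"


url = build_url("postgresql+psycopg2", "myuser", "p@ss/word", "localhost", 5432, "mydb")
print(url)  # postgresql+psycopg2://myuser:p%40ss%2Fword@localhost:5432/mydb
```

Passing the parts as separate arguments, as the retriever does, avoids manual escaping and keeps the password out of a plain-text connection string.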
| 99 | + |
| 100 | +## Security |
| 101 | + |
| 102 | +This component executes raw SQL queries at runtime. Keep the following in mind: |
| 103 | + |
| 104 | +- **Never pass unsanitised user input** directly as a query — this exposes you to SQL injection. |
| 105 | +- **Use a read-only database user.** Even if a malicious query is executed, a read-only user cannot modify or delete data. |
| 106 | +- **Restrict database permissions** to the minimum required — specific tables and schemas only, no DDL privileges (`CREATE`, `DROP`, `ALTER`). |
| 107 | + |
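As an extra layer on top of database permissions, a pipeline can reject anything that does not look like a single read-only statement before it reaches the retriever. The following is a deliberately conservative sketch. `is_single_select` is a hypothetical helper, not part of the integration, and the keyword list is an assumption that you would adapt to your backend. It is not a SQL parser and does not replace a read-only database user.

```python
import re

# Keywords that indicate writes, DDL, or privilege changes (illustrative list).
_FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant|revoke|attach|pragma)\b",
    re.IGNORECASE,
)


def is_single_select(query: str) -> bool:
    """Return True only for a single statement that starts with SELECT or WITH."""
    stripped = query.strip().rstrip(";").strip()
    if not stripped:
        return False
    # Reject stacked statements and SQL comments, which can hide extra commands.
    if ";" in stripped or "--" in stripped or "/*" in stripped:
        return False
    if not re.match(r"(select|with)\b", stripped, re.IGNORECASE):
        return False
    return _FORBIDDEN.search(stripped) is None


print(is_single_select("SELECT * FROM products WHERE price < 15"))  # True
print(is_single_select("SELECT 1; DROP TABLE products"))  # False
print(is_single_select("DROP TABLE products"))  # False
```

The check errs on the side of rejecting: a harmless query that contains a word such as `update` inside a string literal is refused as well. That trade-off is usually acceptable when the query comes from an LLM or an end user.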
| 108 | +## License |
| 109 | + |
| 110 | +`sqlalchemy-haystack` is distributed under the terms of the [Apache-2.0](https://spdx.org/licenses/Apache-2.0.html) license. |