Summary and motivation

- Add a new `sqlalchemy-haystack` integration to `haystack-core-integrations` that wraps the `SQLAlchemyTableRetriever` component.
- Mirror the structure and conventions of the existing `snowflake` integration, enabling users to retrieve tabular data from any SQLAlchemy-supported database (PostgreSQL, MySQL, SQLite, MSSQL, etc.) within Haystack pipelines.
- A PR to `haystack-integrations` is also required so the integration appears on the Haystack integrations webpage.
Detailed design
To create a new integration, run the scaffold script from the root of the repository:
https://github.com/deepset-ai/haystack-core-integrations/blob/main/CONTRIBUTING.md#create-a-new-integration
The component implementation already exists (see the attached code) and needs to be packaged according to the repo's conventions.
Structure
Create the directory integrations/sqlalchemy/ with the following structure:
integrations/sqlalchemy/
├── pyproject.toml
├── README.md
├── LICENSE
├── src/
│ └── haystack_integrations/
│ └── components/
│ └── retrievers/
│ └── sqlalchemy/
│ ├── __init__.py
│ └── sqlalchemy_table_retriever.py ← existing code
└── tests/
├── __init__.py
└── test_sqlalchemy_table_retriever.py
- `.github/workflows/sqlalchemy.yml` - GitHub Actions workflow file
- `pyproject.toml` - follow the pattern from the Snowflake integration
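A rough sketch of what the `pyproject.toml` might contain, assuming the Hatch-based layout used elsewhere in the repo; every name, version bound, and table here is an assumption to be checked against the Snowflake integration's actual file:

```toml
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "sqlalchemy-haystack"
dynamic = ["version"]
description = "Haystack integration for retrieving tabular data via SQLAlchemy"
requires-python = ">=3.9"
# Assumed runtime dependencies; the component itself uses SQLAlchemy + pandas.
dependencies = [
  "haystack-ai",
  "sqlalchemy>=2.0",
  "pandas",
]

[tool.hatch.build.targets.wheel]
packages = ["src/haystack_integrations"]
```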
Tests
- Unit tests only (no live database required; use SQLite in-memory, or Docker for PostgreSQL/MySQL).
- Tests should cover:

| Test | Description |
| --- | --- |
| `test_init_defaults` | Component initialises correctly with only `drivername` provided |
| `test_init_all_params` | Component stores all constructor params as attributes |
| `test_to_dict` | Serialises correctly; password uses `Secret` serialisation format |
| `test_from_dict` | Round-trips through `to_dict` / `from_dict` |
| `test_run_empty_query` | Returns `error="empty query"` and an empty DataFrame when query is `""` |
| `test_run_returns_dataframe` | Using SQLite in-memory: `warm_up()` + `run("SELECT 1")` returns a non-empty DataFrame and a Markdown table string |
| `test_run_returns_markdown` | Validates that `table` output is a valid Markdown string (contains `\|`) |
| `test_run_sql_error` | Passes a malformed query; asserts `error` field is non-empty |
| `test_max_row_limit` | Monkeypatches `MAX_SYS_ROWS` to a small value and asserts truncation |
| `test_warm_up_with_init_script` | Verifies that `init_script` is executed on `warm_up()` (SQLite only) |
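The SQLite in-memory cases can be prototyped without the packaged component: the core behaviour they exercise (query in, DataFrame plus Markdown-style table out) reduces to plain SQLAlchemy + pandas. The snippet below is a stand-alone illustration of that behaviour, not the component's actual implementation; table schema and rendering are made up for the example:

```python
import pandas as pd
from sqlalchemy import create_engine, text

# In-memory SQLite engine, as the unit tests would use.
engine = create_engine("sqlite:///:memory:")

with engine.begin() as conn:
    conn.execute(text("CREATE TABLE users (id INTEGER, name TEXT)"))
    conn.execute(text("INSERT INTO users VALUES (1, 'Ada'), (2, 'Grace')"))
    # Query result as a DataFrame -- the shape test_run_returns_dataframe asserts on.
    df = pd.read_sql_query(text("SELECT * FROM users ORDER BY id"), conn)

# Minimal Markdown rendering (the real component may format differently).
header = "| " + " | ".join(df.columns) + " |"
separator = "| " + " | ".join("---" for _ in df.columns) + " |"
rows = ["| " + " | ".join(str(v) for v in r) + " |" for r in df.itertuples(index=False)]
table = "\n".join([header, separator] + rows)

print(df.shape)      # (2, 2)
print("|" in table)  # True
```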
README.md
Follow the Snowflake README structure:

- Title: `sqlalchemy-haystack`
- Sections: Installation, Usage (basic query + pipeline example), Supported Databases, Configuration Reference, License
- Installation: `pip install sqlalchemy-haystack`
- Import path: `from haystack_integrations.components.retrievers.sqlalchemy import SQLAlchemyTableRetriever`
- Include a usage example showing SQLite and a generic RDBMS (e.g. PostgreSQL via `postgresql+psycopg2`)
- Include a pipeline example with `PromptBuilder` + `OpenAIGenerator` (mirror the Snowflake example)
PR on deepset-ai/haystack-integrations
Create `integrations/sqlalchemy.md` with the following front matter and content (see other pages for the specific structure):
---
layout: integration
name: SQLAlchemy
description: >
Retrieve tabular data from any SQLAlchemy-supported database (PostgreSQL,
MySQL, SQLite, MSSQL, and more) directly in Haystack pipelines.
toc: true
---
The body should contain:
- Installation section (`pip install sqlalchemy-haystack`)
- Usage section with an initialisation example and a `run()` call
- Pipeline example with `PromptBuilder` + `OpenAIGenerator`
- License section
Note: a SQLAlchemy logo (`sqlalchemy.png`) needs to be added to the `/logos/` directory in the same PR.
Checklist
If the request is accepted, ensure the following checklist is complete before closing this issue.
Tasks
- [ ] The PR has been merged into the `main` branch
- [ ] `integration:<your integration name>` has been added to the list of labels for this repository