Skip to content

New Integration: SQLAlchemy Table Retriever #3077

@davidsbatista

Description

@davidsbatista

Summary and motivation

  • A new sqlalchemy-haystack integration to haystack-core-integrations that wraps the SQLAlchemyTableRetriever component.

  • Mirrors the structure and conventions of the existing snowflake integration and enables users to retrieve tabular data from any SQLAlchemy-supported database (PostgreSQL, MySQL, SQLite, MSSQL, etc.) within Haystack pipelines.

  • A PR to haystack-integrations is also required so the integration appears on the Haystack integrations webpage.

Detailed design

To create a new integration, run the scaffold script from the root of the repository:

https://github.com/deepset-ai/haystack-core-integrations/blob/main/CONTRIBUTING.md#create-a-new-integration

The component implementation already exists (see the attached code) and needs to be packaged according to the repo's conventions.

Structure

Create the directory integrations/sqlalchemy/ with the following structure:

integrations/sqlalchemy/
├── pyproject.toml
├── README.md
├── LICENSE
├── src/
│   └── haystack_integrations/
│       └── components/
│           └── retrievers/
│               └── sqlalchemy/
│                   ├── __init__.py
│                   └── sqlalchemy_table_retriever.py   ← existing code
└── tests/
    ├── __init__.py
    └── test_sqlalchemy_table_retriever.py
  • .github/workflows/sqlalchemy.yml - GitHub actions workflow file at
  • pyproject.toml - follow the pattern from the Snowflake integration

Tests

  • unit tests (no live database required — use SQLite in-memory or Docker for PostgreSQL/MySQL).
  • Tests should cover:
Test Description
test_init_defaults Component initialises correctly with only drivername provided
test_init_all_params Component stores all constructor params as attributes
test_to_dict Serialises correctly; password uses Secret serialisation format
test_from_dict Round-trips through to_dict / from_dict
test_run_empty_query Returns error="empty query" and an empty DataFrame when query is ""
test_run_returns_dataframe Using SQLite in-memory: warm_up() + run("SELECT 1") returns a non-empty DataFrame and a Markdown table string
test_run_returns_markdown Validates that table output is a valid Markdown string (contains |)
test_run_sql_error Passes a malformed query; asserts error field is non-empty
test_max_row_limit Monkeypatches MAX_SYS_ROWS to a small value and asserts truncation
test_warm_up_with_init_script Verifies that init_script is executed on warm_up() (SQLite only)

README.md

Follow the Snowflake README structure:

  • Title: sqlalchemy-haystack
  • Sections: Installation, Usage (basic query + pipeline example), Supported Databases, Configuration Reference, License
  • Installation: pip install sqlalchemy-haystack
  • Import path: from haystack_integrations.components.retrievers.sqlalchemy import SQLAlchemyTableRetriever
  • Include a usage example showing SQLite and a generic RDBMS (e.g. PostgreSQL via postgresql+psycopg2)
  • Include a pipeline example with PromptBuilder + OpenAIGenerator (mirror the Snowflake example)

PR ondeepset-ai/haystack-integrations

Create integrations/sqlalchemy.md with the following front matter and content - see other pages for specific structure

---
layout: integration
name: SQLAlchemy
description: >
  Retrieve tabular data from any SQLAlchemy-supported database (PostgreSQL,
  MySQL, SQLite, MSSQL, and more) directly in Haystack pipelines.
toc: true
---

The body should contain:

  • Installation section (pip install sqlalchemy-haystack)
  • Usage section with an initialisation example and a run() call
  • Pipeline example with PromptBuilder + OpenAIGenerator
  • License section

Note: A SQLAlchemy logo (sqlalchemy.png) needs to be added to the /logos/ directory in the same PR

Checklist

If the request is accepted, ensure the following checklist is complete before closing this issue.

Tasks

  • The code is documented with docstrings and was merged in the main branch
  • Docs are published at https://docs.haystack.deepset.ai/
  • There is a Github workflow running the tests for the integration nightly and at every PR
  • A new label named like integration:<your integration name> has been added to the list of labels for this repository
  • The labeler.yml file has been updated
  • The package has been released on PyPI
  • An integration tile with a usage example has been added to https://github.com/deepset-ai/haystack-integrations
  • The integration has been listed in the Inventory section of this repo README
  • The feature was announced through social media

Metadata

Metadata

Assignees

Labels

P3contributions wanted!Looking for external contributionsnew integrationDiscuss the creation of a new integration in Core

Projects

Status

In Progress

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions