feat: add Amazon Textract integration (#2391) by zafatar · Pull Request #3148 · deepset-ai/haystack-core-integrations

zafatar · 2026-04-13T12:37:24Z

Add AmazonTextractConverter component that extracts text from images and single-page PDFs using the AWS Textract synchronous API. Supports both DetectDocumentText (plain OCR) and AnalyzeDocument (tables, forms, signatures, layout) as well as natural-language queries. Includes CI workflow, unit/integration tests, pydoc config, and repo-level wiring (labeler, coverage comment, README).

Related Issues

Fixes #2391 (Add support for AWS textract)

Proposed Changes:

Like other converters such as Azure Document Intelligence, and other Amazon integrations such as Amazon Bedrock, this component accesses Amazon Textract through boto3 and reads the AWS credentials from environment variables.
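For reviewers unfamiliar with Textract, here is a minimal sketch of the access pattern described above. It is not the code in this PR: the helper names (`build_request`, `lines_to_text`, `extract_text`) are hypothetical, and it assumes boto3 finds credentials and a region in the standard environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_DEFAULT_REGION`). It only shows how the choice between DetectDocumentText and AnalyzeDocument maps onto the boto3 client, and how `LINE` blocks can be joined into plain text.

```python
from typing import Any, Optional


def build_request(
    document_bytes: bytes, feature_types: Optional[list] = None
) -> tuple:
    """Pick the Textract operation and build its keyword arguments.

    Hypothetical helper: without feature types we use plain OCR
    (DetectDocumentText); with them we use AnalyzeDocument.
    """
    kwargs: dict = {"Document": {"Bytes": document_bytes}}
    if feature_types:
        # e.g. ["TABLES", "FORMS", "SIGNATURES", "LAYOUT"]
        kwargs["FeatureTypes"] = list(feature_types)
        return "analyze_document", kwargs
    return "detect_document_text", kwargs


def lines_to_text(response: dict) -> str:
    """Join the text of all LINE blocks in a Textract response."""
    blocks = response.get("Blocks", [])
    return "\n".join(
        block["Text"]
        for block in blocks
        if block.get("BlockType") == "LINE" and "Text" in block
    )


def extract_text(
    document_bytes: bytes,
    feature_types: Optional[list] = None,
    client: Any = None,
) -> str:
    """Run a synchronous Textract call and return the recognized lines."""
    if client is None:
        # Imported lazily so the helpers above work without boto3 installed.
        import boto3

        # Credentials and region come from the environment (assumption).
        client = boto3.client("textract")
    operation, kwargs = build_request(document_bytes, feature_types)
    response = getattr(client, operation)(**kwargs)
    return lines_to_text(response)
```

Passing a `client` explicitly keeps the sketch testable without AWS access; in real use, `extract_text(open("scan.png", "rb").read())` would create the boto3 client itself (the file name here is only a placeholder).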

How did you test it?

The tests are run as two separate groups:

cd ./integrations/amazon_textract
hatch run test:unit
hatch run test:integration

Notes for the reviewer

Checklist

I have read the contributors guidelines and the code of conduct
I have updated the related issue with new insights and changes
I added unit tests and updated the docstrings
I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.

CLAassistant · 2026-04-13T12:44:38Z

All committers have signed the CLA.

zafatar requested a review from a team as a code owner April 13, 2026 12:37

zafatar requested review from bogdankostic and removed request for a team April 13, 2026 12:37

github-actions bot added the topic:CI and type:documentation labels Apr 13, 2026

feat: add Amazon Textract examples

543e1a6

zafatar added 4 commits April 13, 2026 14:53

fix: linting issue with the github workflow file

9563b7c

fix: typo causing failure of api-reference-build in CI

ef75627

fix: update naming inconsistency

abe2ca4

fix: remove redundant imports and errors

1c4865d

zafatar wants to merge 6 commits into deepset-ai:main from zafatar:main

2 participants