feat: add Amazon Textract integration (#2391) #3148

Open
zafatar wants to merge 6 commits into deepset-ai:main from zafatar:main

zafatar commented Apr 13, 2026

Add AmazonTextractConverter component that extracts text from images and single-page PDFs using the AWS Textract synchronous API. Supports both DetectDocumentText (plain OCR) and AnalyzeDocument (tables, forms, signatures, layout) as well as natural-language queries. Includes CI workflow, unit/integration tests, pydoc config, and repo-level wiring (labeler, coverage comment, README).

Related Issues

Proposed Changes:

Like other converter integrations such as Azure Document Intelligence, and other Amazon integrations such as Amazon Bedrock, this component accesses Amazon Textract through boto3, using AWS credentials read from environment variables.
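
For reviewers unfamiliar with the Textract synchronous API, here is a minimal sketch of how the two operations named in the description could be selected and how plain text could be pulled from the response. It is not the code in this PR: the helper names `build_textract_request`, `extract_lines`, and `convert` are hypothetical, and the client is passed in so the sketch runs without AWS access. The boto3 method names, the `Document`, `FeatureTypes`, and `QueriesConfig` parameters, and the `Blocks` response shape are those of the boto3 Textract client.

```python
from typing import Any, Optional


def build_textract_request(
    data: bytes,
    feature_types: Optional[list[str]] = None,
    queries: Optional[list[str]] = None,
) -> tuple[str, dict[str, Any]]:
    """Return the boto3 method name and keyword arguments for one synchronous call.

    Hypothetical helper: with no feature types and no queries it selects
    DetectDocumentText (plain OCR), otherwise AnalyzeDocument.
    """
    kwargs: dict[str, Any] = {"Document": {"Bytes": data}}
    features = list(feature_types or [])
    if queries:
        # Textract accepts QueriesConfig only when QUERIES is a requested feature.
        if "QUERIES" not in features:
            features.append("QUERIES")
        kwargs["QueriesConfig"] = {"Queries": [{"Text": q} for q in queries]}
    if not features:
        return "detect_document_text", kwargs
    kwargs["FeatureTypes"] = features
    return "analyze_document", kwargs


def extract_lines(response: dict[str, Any]) -> str:
    """Join the text of all LINE blocks, skipping PAGE, WORD, and other block types."""
    lines = [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]
    return "\n".join(lines)


def convert(
    client: Any,
    data: bytes,
    feature_types: Optional[list[str]] = None,
    queries: Optional[list[str]] = None,
) -> str:
    """Call the selected Textract operation on `client` and return the extracted text."""
    method, kwargs = build_textract_request(data, feature_types, queries)
    response = getattr(client, method)(**kwargs)
    return extract_lines(response)
```

With a real client, `boto3.client("textract")` resolves credentials and region from the standard environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_DEFAULT_REGION`), so `convert(boto3.client("textract"), image_bytes, feature_types=["TABLES", "FORMS"])` would issue a single AnalyzeDocument call.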

How did you test it?

The tests are run as two separate groups:

cd ./integrations/amazon_textract
hatch run test:unit
hatch run test:integration

Notes for the reviewer

Checklist

zafatar requested a review from a team as a code owner April 13, 2026 12:37
zafatar requested review from bogdankostic and removed request for a team April 13, 2026 12:37
github-actions bot added the topic:CI and type:documentation labels Apr 13, 2026

CLAassistant commented Apr 13, 2026

CLA assistant check
All committers have signed the CLA.

Labels

topic:CI, type:documentation

Development

Successfully merging this pull request may close these issues.

Add support for AWS textract

2 participants