Read What You See

Read What You See chains three AWS services together. Load an image and Amazon Textract detects the text in it; then click on any piece of that text and the app translates it (Amazon Translate) and reads it aloud (Amazon Polly).

What's demonstrated

Three services, one interaction:

Amazon Textract — TTextractClient.DetectDocumentText to extract text and bounding boxes from the image
Amazon Translate — TTranslateClient.TranslateText with automatic source language detection
Amazon Polly — TPollyClient.DescribeVoices and SynthesizeSpeech to speak the translated text
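Amazon Translate identifies languages with codes such as `fr` or `de`, while each Polly voice reports a locale code such as `fr-FR` or `de-DE`, so picking a voice for the target language means bridging the two. The sketch below shows one hypothetical way to do it. It assumes the DescribeVoices results have already been copied into a plain record array; `TVoiceInfo`, `LocaleMatches`, `FindVoiceForLanguage` and the three-voice list are illustrative and are not taken from the sample or from the AWS SDK for Delphi.

```pascal
program VoiceLookupSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

type
  // Hypothetical record holding the two fields this sketch needs from each
  // voice returned by Polly's DescribeVoices.
  TVoiceInfo = record
    VoiceId: string;
    LanguageCode: string; // Polly locale code, e.g. 'fr-FR'
  end;

function MakeVoice(const AVoiceId, ALanguageCode: string): TVoiceInfo;
begin
  Result.VoiceId := AVoiceId;
  Result.LanguageCode := ALanguageCode;
end;

// True when the Polly locale ALocale belongs to the Translate language
// code ALang: 'fr' matches 'fr-FR' and 'fr-CA'; 'fr-CA' matches only itself.
function LocaleMatches(const ALang, ALocale: string): Boolean;
begin
  Result := (ALocale = ALang) or
    (Copy(ALocale, 1, Length(ALang) + 1) = ALang + '-');
end;

// Returns the VoiceId of the first voice whose locale matches ALang,
// or an empty string when no voice matches.
function FindVoiceForLanguage(const Voices: array of TVoiceInfo;
  const ALang: string): string;
var
  I: Integer;
begin
  Result := '';
  for I := Low(Voices) to High(Voices) do
    if LocaleMatches(ALang, Voices[I].LanguageCode) then
      Exit(Voices[I].VoiceId);
end;

var
  Voices: array of TVoiceInfo;
begin
  // Illustrative data; the app would build this list from DescribeVoices.
  SetLength(Voices, 3);
  Voices[0] := MakeVoice('Joanna', 'en-US');
  Voices[1] := MakeVoice('Celine', 'fr-FR');
  Voices[2] := MakeVoice('Hans', 'de-DE');
  WriteLn('fr -> ', FindVoiceForLanguage(Voices, 'fr'));      // fr -> Celine
  WriteLn('ja -> "', FindVoiceForLanguage(Voices, 'ja'), '"'); // ja -> ""
end.
```

Prefix matching is a simplification: some languages use different codes in the two services, so a real mapping may need explicit exceptions, and an empty result should be handled before SynthesizeSpeech is called.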

Architecture

Each AWS service lives in its own DataModule, keeping the main form focused on UI and coordination:

| Unit | Responsibility |
|---|---|
| `DataModules.TextExtraction` | Calls Textract, stores detected blocks, performs hit-testing by coordinate |
| `DataModules.Translation` | Translates text with automatic source language detection |
| `DataModules.TextToSpeech` | Manages Polly voices and synthesizes speech for the target language |
| `Forms.Main` | Coordinates the above: image display, click handling, language selection |
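Textract reports each block's BoundingBox as Left, Top, Width and Height ratios of the page size, so a click has to be scaled into the same 0..1 range before it can be compared against the stored blocks. The following is a minimal sketch of that hit-test. It assumes the click has already been mapped to pixel coordinates in the original image (not in the on-screen control) and that the blocks have been copied into hypothetical `TTextBlock` records; `BlockIndexAt` is an illustrative name, not the sample's SelectTextAt implementation.

```pascal
program HitTestSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

type
  // Hypothetical copy of a Textract block: its text plus the BoundingBox
  // ratios, each expressed as a fraction (0..1) of the image size.
  TTextBlock = record
    Text: string;
    Left, Top, Width, Height: Double;
  end;

function MakeBlock(const AText: string;
  ALeft, ATop, AWidth, AHeight: Double): TTextBlock;
begin
  Result.Text := AText;
  Result.Left := ALeft;
  Result.Top := ATop;
  Result.Width := AWidth;
  Result.Height := AHeight;
end;

// Returns the index of the first block whose bounding box contains pixel
// (X, Y) of an ImageWidth x ImageHeight image, or -1 when none does.
function BlockIndexAt(const Blocks: array of TTextBlock; X, Y: Double;
  ImageWidth, ImageHeight: Integer): Integer;
var
  I: Integer;
  NX, NY: Double;
begin
  Result := -1;
  if (ImageWidth <= 0) or (ImageHeight <= 0) then
    Exit;
  // Convert pixels into the same 0..1 ratios Textract uses.
  NX := X / ImageWidth;
  NY := Y / ImageHeight;
  for I := Low(Blocks) to High(Blocks) do
    if (NX >= Blocks[I].Left) and (NX <= Blocks[I].Left + Blocks[I].Width) and
       (NY >= Blocks[I].Top) and (NY <= Blocks[I].Top + Blocks[I].Height) then
      Exit(I);
end;

var
  Blocks: array of TTextBlock;
begin
  // Two illustrative blocks on a 1000 x 500 pixel image.
  SetLength(Blocks, 2);
  Blocks[0] := MakeBlock('Hello', 0.1, 0.1, 0.3, 0.1);
  Blocks[1] := MakeBlock('World', 0.5, 0.6, 0.2, 0.1);
  WriteLn(BlockIndexAt(Blocks, 250, 75, 1000, 500));  // 0
  WriteLn(BlockIndexAt(Blocks, 600, 320, 1000, 500)); // 1
  WriteLn(BlockIndexAt(Blocks, 900, 450, 1000, 500)); // -1
end.
```

DetectDocumentText also returns a PAGE block that covers the whole image, so a hit-test like this would normally run over LINE (or WORD) blocks only; otherwise every click would land on the page.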

The flow

User clicks text in image
        |
        v
TextExtractionDM.SelectTextAt(point)     -- hit-test against Textract bounding boxes
        |
        v
TranslationDM.TranslateText(selected)    -- translate to target language
        |
        v
TextToSpeechDM.ConvertTextToSpeech(text) -- synthesize and play audio
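The chain above can be sketched as a single click handler that stops as soon as a stage produces nothing, for example when a click misses every bounding box. In this sketch the DataModule calls are replaced by hypothetical procedural types and fake implementations (`TSelectFunc`, `FakeTranslate` and so on) so it runs without AWS; the real SelectTextAt, TranslateText and ConvertTextToSpeech signatures in the sample may differ.

```pascal
program FlowSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
{$ASSERTIONS ON}

type
  // Hypothetical stand-ins for the three DataModule calls.
  TSelectFunc = function(X, Y: Double): string;
  TTranslateFunc = function(const AText, ATargetLang: string): string;
  TSpeakProc = procedure(const AText, ATargetLang: string);

var
  SpokenText: string; // records what the fake speaker was last asked to say

// Fake hit-test: only clicks in the top-left 100 x 100 area hit text.
function FakeSelect(X, Y: Double): string;
begin
  if (X < 100) and (Y < 100) then
    Result := 'Bonjour'
  else
    Result := ''; // the click missed every bounding box
end;

// Fake translation: tags the text with the target language.
function FakeTranslate(const AText, ATargetLang: string): string;
begin
  Result := '[' + ATargetLang + '] ' + AText;
end;

// Fake speech: remembers and prints the text instead of playing audio.
procedure FakeSpeak(const AText, ATargetLang: string);
begin
  SpokenText := AText;
  WriteLn('Speaking: ', AText);
end;

// Select -> translate -> speak, stopping early when a stage returns
// nothing. Returns True when speech was requested.
function HandleClick(X, Y: Double; const ATargetLang: string;
  Select: TSelectFunc; Translate: TTranslateFunc; Speak: TSpeakProc): Boolean;
var
  Selected, Translated: string;
begin
  Result := False;
  Selected := Select(X, Y);
  if Selected = '' then
    Exit;
  Translated := Translate(Selected, ATargetLang);
  if Translated = '' then
    Exit;
  Speak(Translated, ATargetLang);
  Result := True;
end;

begin
  HandleClick(50, 50, 'en', FakeSelect, FakeTranslate, FakeSpeak);   // prints "Speaking: [en] Bonjour"
  HandleClick(500, 500, 'en', FakeSelect, FakeTranslate, FakeSpeak); // prints nothing
end.
```

Passing the stages in as parameters keeps the coordination logic testable on its own; in the sample, Forms.Main plays this coordinating role.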

Running the sample

1. Open "ReadWhatYouSee.dproj" in Delphi or RAD Studio.
2. Select "Run > Run" from the menu or press F9.
3. Click "Open file..." and select an image containing text.
4. Once the image has loaded, click any text in the image to select it.
5. Choose the output language from the drop-down list at the top left.
6. The selected text is translated and spoken aloud automatically.

Required IAM permissions

textract:DetectDocumentText
translate:TranslateText
polly:DescribeVoices
polly:SynthesizeSpeech
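As a starting point, these four actions could be granted with an identity-based policy along the lines of the sketch below, which uses a wildcard resource for simplicity; narrow it to suit your account. Because the sample relies on automatic source language detection, the caller may also need `comprehend:DetectDominantLanguage`; check the Amazon Translate documentation before relying on this policy as-is.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "textract:DetectDocumentText",
        "translate:TranslateText",
        "polly:DescribeVoices",
        "polly:SynthesizeSpeech"
      ],
      "Resource": "*"
    }
  ]
}
```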
