Read What You See chains three AWS services together: load an image, click on any text in it, and the app will extract it (Textract), translate it (Translate), and read it aloud (Polly).
Three services, one interaction:
- Amazon Textract: `TTextractClient.DetectDocumentText` to extract text and bounding boxes from the image
- Amazon Translate: `TTranslateClient.TranslateText` with automatic source language detection
- Amazon Polly: `TPollyClient.DescribeVoices` and `SynthesizeSpeech` to speak the translated text

Each AWS service lives in its own DataModule, keeping the main form focused on UI and coordination:

| Unit | Responsibility |
|---|---|
| `DataModules.TextExtraction` | Calls Textract, stores detected blocks, performs hit-testing by coordinate |
| `DataModules.Translation` | Translates text with automatic source language detection |
| `DataModules.TextToSpeech` | Manages Polly voices and synthesizes speech for the target language |
| `Forms.Main` | Coordinates the above: image display, click handling, language selection |

    User clicks text in image
            |
            v
    TextExtractionDM.SelectTextAt(point)        -- hit-test against Textract bounding boxes
            |
            v
    TranslationDM.TranslateText(selected)       -- translate to target language
            |
            v
    TextToSpeechDM.ConvertTextToSpeech(text)    -- synthesize and play audio
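
The hit-test in the first step can be sketched as follows. This is a minimal, standalone illustration and not the demo's actual `SelectTextAt` code: the `TTextBlock` record and the `FindBlockAt` function are hypothetical names, and the sketch assumes the click has already been mapped to image pixel coordinates. It relies on the fact that Textract reports each bounding box as ratios (0 to 1) of the page width and height, so the click is normalized before comparison.

```pascal
program HitTestSketch;

{$APPTYPE CONSOLE}

type
  // Hypothetical record, not the demo's actual type. Textract reports each
  // bounding box as ratios (0..1) of the page width and height.
  TTextBlock = record
    Text: string;
    Left, Top, Width, Height: Double;
  end;

// Returns True and fills Found when the click (given in image pixels)
// falls inside one of the blocks; the first matching block wins.
function FindBlockAt(const Blocks: TArray<TTextBlock>;
  ClickX, ClickY, ImageWidth, ImageHeight: Double;
  out Found: TTextBlock): Boolean;
var
  Block: TTextBlock;
  X, Y: Double;
begin
  Result := False;
  if (ImageWidth <= 0) or (ImageHeight <= 0) then
    Exit;
  // Convert pixel coordinates to the same 0..1 ratios Textract uses.
  X := ClickX / ImageWidth;
  Y := ClickY / ImageHeight;
  for Block in Blocks do
    if (X >= Block.Left) and (X <= Block.Left + Block.Width) and
       (Y >= Block.Top) and (Y <= Block.Top + Block.Height) then
    begin
      Found := Block;
      Exit(True);
    end;
end;

var
  Blocks: TArray<TTextBlock>;
  Hit: TTextBlock;
begin
  // One sample block covering 10%..40% horizontally and 20%..25% vertically.
  SetLength(Blocks, 1);
  Blocks[0].Text := 'EXIT';
  Blocks[0].Left := 0.10;
  Blocks[0].Top := 0.20;
  Blocks[0].Width := 0.30;
  Blocks[0].Height := 0.05;

  // A click at (200, 110) on an 800x500 image maps to ratios (0.25, 0.22).
  if FindBlockAt(Blocks, 200, 110, 800, 500, Hit) then
    Writeln('Selected: ', Hit.Text)
  else
    Writeln('No text at that point');
end.
```

With the sample values, the click lands inside the block and the program prints `Selected: EXIT`. If the image is displayed scaled or letterboxed, the click would first need to be converted from control coordinates to image coordinates.
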

To run the demo:

1. Open "ReadWhatYouSee.dproj" in Delphi or RAD Studio.
2. Select "Run > Run" from the menu or press F9.
3. Click "Open file..." and select an image containing text.
4. Once the image has loaded, click on any text in the image to select it.
5. Choose the output language from the drop-down at the top left.
6. The selected text is translated and spoken aloud automatically.

The AWS credentials used by the app need the following IAM permissions:

- `textract:DetectDocumentText`
- `translate:TranslateText`
- `polly:DescribeVoices`
- `polly:SynthesizeSpeech`
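
As an illustration, the four actions listed above could be granted with an IAM policy along these lines. This is a sketch, not a policy shipped with the project: the single statement and the wildcard `Resource` are assumptions, and your account may call for tighter scoping.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "textract:DetectDocumentText",
        "translate:TranslateText",
        "polly:DescribeVoices",
        "polly:SynthesizeSpeech"
      ],
      "Resource": "*"
    }
  ]
}
```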
