Transcription features

The following table shows which Amazon Transcribe features are supported. Please refer to the service documentation links for further details of how each feature works.

| Feature | Description | Supported |
| --- | --- | --- |
| Language identification | Allow Amazon Transcribe to determine the dominant language used in the audio file, which it will then use for the whole transcription process | ☑️ |
| Custom vocabularies | Provide more information on how to transcribe specific words or phrases, typically used for domain-specific words for specific use cases | ☑️ |
| Vocabulary filtering | Allows you to mark or remove unwanted words from the transcripts | ☑️ |
| Custom language models | Use your own text data to improve transcription accuracy for your specific use case | ✘ |
| Streaming transcriptions | Send an audio stream to Amazon Transcribe and receive transcription output in real-time | ✘ |
| Channel identification | Process each channel in an audio file independently, combining the transcriptions from each channel into a single output | ☑️ |
| Speaker diarization | Label each speaker utterance with a distinct speaker tag | ☑️ |
| Call analytics | Inject analytical insights from your calls into your transcripts | ☑️ |
| Redaction | Mask or remove sensitive personally identifiable information (PII) content from the text transcripts | ☑️ |
| KMS-based encryption | Encrypt transcription output files in your Amazon S3 bucket using KMS keys rather than the default Amazon S3 key (SSE-S3) | ✘ |

Feature restrictions

Language support

Some Amazon Transcribe features are not available in all supported languages. These are highlighted on the service documentation languages page and, at the time of writing, can be summarised as follows:

Redaction is only available in the US English en-US language model
Call analytics is supported only by a subset of the Amazon Transcribe languages
Digit transcription, whereby number phrases such as "fifty five" or "a hundredth" are transcribed as 55 and 1/100 respectively, is supported only by a subset of the Amazon Transcribe languages
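
As a minimal sketch of how the redaction restriction above might be respected, the helper below only adds a `ContentRedaction` block to a transcription request when the language is `en-US`. The function name `add_redaction` and the `REDACTION_LANGUAGES` set are hypothetical and not taken from this solution's code; the set reflects only the restriction stated above, which may change over time.

```python
from typing import Any, Dict

# Assumption: per the restriction listed above, redaction is limited to en-US
# at the time of writing. Check the service documentation before relying on this.
REDACTION_LANGUAGES = {"en-US"}


def add_redaction(request: Dict[str, Any], language_code: str) -> Dict[str, Any]:
    """Return a copy of the request with PII redaction enabled where supported."""
    updated = dict(request)
    if language_code in REDACTION_LANGUAGES:
        # ContentRedaction is a parameter of the StartTranscriptionJob API.
        updated["ContentRedaction"] = {
            "RedactionType": "PII",
            "RedactionOutput": "redacted",
        }
    return updated
```

For any other language the request is returned unchanged, so the job would still run, just without redaction.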

Custom vocabulary

Acronyms are only supported by a subset of the Amazon Transcribe languages
Language ID is not currently supported natively with custom vocabularies. However, this solution works around the limitation by performing language ID on a clip of the original audio and then choosing the correct custom vocabulary file before submitting the whole audio file for processing. This workaround will be removed once custom vocabularies are supported when using language ID.
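
The second half of this workaround can be sketched roughly as follows. This is an illustration only, not the solution's actual code: it assumes the language code has already been obtained from a language ID job run on a short clip, and the function name `build_full_job_request` and the vocabulary names in `VOCABULARIES` are hypothetical.

```python
from typing import Any, Dict, Mapping

# Hypothetical mapping from language code to a custom vocabulary name.
VOCABULARIES = {"en-US": "example-vocab-en-us", "es-US": "example-vocab-es-us"}


def build_full_job_request(
    job_name: str,
    media_uri: str,
    detected_language: str,
    vocabularies: Mapping[str, str],
) -> Dict[str, Any]:
    """Build a request for the full audio file using the detected language.

    detected_language is assumed to come from an earlier language ID job
    that was run on a clip of the same audio.
    """
    request: Dict[str, Any] = {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        # A fixed language code is used here instead of IdentifyLanguage,
        # which is what allows a custom vocabulary to be attached.
        "LanguageCode": detected_language,
    }
    vocabulary = vocabularies.get(detected_language)
    if vocabulary is not None:
        request["Settings"] = {"VocabularyName": vocabulary}
    return request
```

The resulting dictionary uses the parameter names of the `StartTranscriptionJob` API, so with boto3 it could be passed as `transcribe.start_transcription_job(**request)`. If no vocabulary exists for the detected language, the job is simply submitted without one.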

Other feature limitations

The Channel identification and Speaker diarization features are mutually exclusive
Call analytics requires stereo channel-separated audio files
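
One way to guard against the first limitation is to validate the options before a job is submitted. The sketch below is a hypothetical helper, not part of this solution; `build_settings` and its arguments are illustrative names, while the keys it emits are `Settings` fields of the `StartTranscriptionJob` API.

```python
from typing import Any, Dict


def build_settings(
    channel_identification: bool,
    speaker_labels: bool,
    max_speakers: int = 2,
) -> Dict[str, Any]:
    """Build a Settings block, rejecting the mutually exclusive combination."""
    if channel_identification and speaker_labels:
        raise ValueError(
            "Channel identification and speaker diarization cannot both be enabled"
        )
    settings: Dict[str, Any] = {}
    if channel_identification:
        settings["ChannelIdentification"] = True
    if speaker_labels:
        # MaxSpeakerLabels must accompany ShowSpeakerLabels.
        settings["ShowSpeakerLabels"] = True
        settings["MaxSpeakerLabels"] = max_speakers
    return settings
```

Failing early with a local error is a design choice here; the alternative is to let the service reject the request.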
