Using Deepgram with Label Studio for Text to Speech annotation
This backend uses the Deepgram API to convert text entered by the user to speech and returns the generated audio for annotation in Label Studio.
Demo video: DeepgramDemo.mp4
IMPORTANT NOTE: YOU MUST REFRESH THE PAGE AFTER SUBMITTING THE TEXT TO SEE THE AUDIO APPEAR.
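The text to speech step can be pictured with a minimal sketch like the one below. It is not the backend's actual code: the endpoint URL, the model name `aura-asteria-en`, the use of `urllib` from the standard library, and the helper names `build_tts_request` and `synthesize` are all assumptions made for illustration, and the upload of the audio to S3 is left out.

```python
import json
import urllib.request

# Assumed endpoint and model name; check the Deepgram docs for current values.
DEEPGRAM_SPEAK_URL = "https://api.deepgram.com/v1/speak"
DEFAULT_MODEL = "aura-asteria-en"


def build_tts_request(text, api_key, model=DEFAULT_MODEL):
    """Build (but do not send) a POST request asking Deepgram to synthesize `text`."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{DEEPGRAM_SPEAK_URL}?model={model}",
        data=body,
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text, api_key):
    """Send the request and return the raw audio bytes from the response body."""
    with urllib.request.urlopen(build_tts_request(text, api_key)) as response:
        return response.read()
```

Calling `synthesize("Hello from Label Studio", api_key)` with a valid key should return audio bytes that can be written to a file or uploaded to storage. Building the request separately from sending it keeps the request shape easy to inspect without network access.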
- Deepgram API Key -- create an account and follow the instructions to get an API key with default permissions. Store this key as DEEPGRAM_API_KEY in docker_compose.yml
- AWS Storage -- make sure you configure the following parameters in docker_compose.yml:
  - AWS_ACCESS_KEY_ID -- your AWS access key id
  - AWS_SECRET_ACCESS_KEY -- your AWS secret access key
  - AWS_SESSION_TOKEN -- your AWS session token
  - AWS_DEFAULT_REGION -- the region you want to use for S3
  - S3_BUCKET -- the name of the bucket where you'd like to store the created audio files
  - S3_FOLDER -- the name of the folder within the specified bucket where you'd like to store the audio files.
- Label Studio -- make sure you set your LABEL_STUDIO_URL and your LABEL_STUDIO_API_KEY in docker_compose.yml. As of 11/12/25, you must use the LEGACY TOKEN.
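Since a single missing setting can make the backend fail at request time, a quick check before starting it may help. The sketch below assumes the values from docker_compose.yml reach the container as environment variables; the helper `missing_settings` is hypothetical and not part of this backend.

```python
import os

# Variable names come from the setup list above; the helper itself is hypothetical.
REQUIRED_VARS = [
    "DEEPGRAM_API_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
    "AWS_DEFAULT_REGION",
    "S3_BUCKET",
    "S3_FOLDER",
    "LABEL_STUDIO_URL",
    "LABEL_STUDIO_API_KEY",
]


def missing_settings(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Running `missing_settings()` inside the container returns an empty list when everything is configured, and otherwise the names that still need a value.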
This is the base labeling config to be used with this backend. Note that you may add additional annotation controls to the config after the Audio tag without breaking anything!
<View>
<Header value="What would you like to TTS?"/>
<TextArea name="text" toName="audio" placeholder="What do you want to tts?" value="$text" rows="4" maxSubmissions="1"/>
<Audio name="audio" value="$audio" zoom="true" hotkey="ctrl+enter"/>
</View>
Note that for this to work, you need to import dummy data (i.e. tasks with empty text and audio fields) so that the project has tasks to open. You can use dummy_data.json for this.
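If you would rather generate placeholder tasks yourself, a sketch along these lines could work. The field names `text` and `audio` match the `$text` and `$audio` variables in the labeling config, and the `data` wrapper follows Label Studio's JSON import format. The helper names are hypothetical, and the contents of the shipped dummy_data.json may differ from what this produces.

```python
import json


def make_dummy_tasks(count):
    """Create `count` placeholder tasks with empty text and audio fields."""
    return [{"data": {"text": "", "audio": ""}} for _ in range(count)]


def write_dummy_tasks(path, count=5):
    """Write the placeholder tasks to `path` as a JSON array ready for import."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(make_dummy_tasks(count), f, indent=2)
```

For example, `write_dummy_tasks("my_dummy_data.json", count=10)` writes ten empty tasks that can be imported into the project.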
When you attach the model to Label Studio in your model settings, make sure to toggle ON interactive preannotations!