One-to-one audio/video call workflow

The initial idea of Open Chat was to create a work chat with an embedded AI assistant in every part of the app. One of the first places to try it was audio and video calls, where long conversations are stored as recordings that can be time-consuming to review. Here is the flow as it was implemented using AWS and Kurento.

[Diagram: one-to-one-call-final]

A breakdown of what is going on in the diagram above:

A user initiates a WebRTC call with another user through the server
The callee receives a notification about the incoming call
The call takes place while a Kurento Media Server (KMS) records the audio/video stream of each user
Once the call finishes, KMS multiplexes the two streams into one, producing a single WebM file
The server uploads the file to AWS S3
The server makes an API call to AWS Transcribe with the link to the uploaded file so that the recording is processed and the transcript extracted
The result of the processing is a JSON file, which is stored in AWS S3
The transcript is then passed to Google Gemini and OpenAI ChatGPT to summarize the whole conversation. Both versions are stored in the database
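The Transcribe steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. The function names, the bucket and key layout, the `transcripts/` output prefix, and the `en-US` language are all assumptions. The request keys follow the parameters of AWS Transcribe's `StartTranscriptionJob` operation, and the parser assumes the standard Transcribe output layout, where the full text sits under `results.transcripts`.

```python
import json


def build_transcribe_request(job_name: str, bucket: str, key: str,
                             language_code: str = "en-US") -> dict:
    """Build keyword arguments for AWS Transcribe's StartTranscriptionJob.

    With boto3 installed, the result could be passed as
    boto3.client("transcribe").start_transcription_job(**request).
    The bucket/key layout here is a hypothetical example.
    """
    return {
        "TranscriptionJobName": job_name,
        # Link to the recording the server uploaded to S3.
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        # KMS produces a WebM file, which Transcribe accepts.
        "MediaFormat": "webm",
        "LanguageCode": language_code,
        # Ask Transcribe to write its JSON result to our own bucket.
        "OutputBucketName": bucket,
        "OutputKey": f"transcripts/{job_name}.json",
    }


def extract_transcript(result_json: str) -> str:
    """Pull the plain transcript text out of a Transcribe result file."""
    data = json.loads(result_json)
    transcripts = data["results"]["transcripts"]
    return " ".join(part["transcript"] for part in transcripts)


if __name__ == "__main__":
    request = build_transcribe_request(
        "call-42", "open-chat-recordings", "recordings/call-42.webm"
    )
    print(request["Media"]["MediaFileUri"])

    # A trimmed-down example of the JSON that Transcribe writes to S3.
    sample_result = json.dumps({
        "jobName": "call-42",
        "results": {"transcripts": [{"transcript": "Hello, can you hear me?"}]},
        "status": "COMPLETED",
    })
    print(extract_transcript(sample_result))
```

Keeping the request builder and the parser as plain functions makes them easy to test without AWS credentials. The actual network call and the S3 download would wrap around them.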

The meeting summary will be accessible to both users later on.
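One way to picture how both summaries might be stored and exposed to the two participants is the sketch below. The record shape, the field names, and the provider labels (`"gemini"`, `"chatgpt"`) are hypothetical and only illustrate the idea. The real database schema is not described here.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CallSummaryRecord:
    """One record per finished call; all field names are illustrative."""
    call_id: str
    caller_id: str
    callee_id: str
    # Maps a provider label (e.g. "gemini", "chatgpt") to its summary text.
    summaries: Dict[str, str] = field(default_factory=dict)

    def add_summary(self, provider: str, text: str) -> None:
        """Store (or overwrite) the summary produced by one provider."""
        self.summaries[provider] = text

    def summaries_for(self, user_id: str) -> Dict[str, str]:
        """Return all stored summaries if the user took part in the call."""
        if user_id not in (self.caller_id, self.callee_id):
            raise PermissionError(
                f"user {user_id} was not part of call {self.call_id}"
            )
        # Return a copy so callers cannot mutate the stored record.
        return dict(self.summaries)


if __name__ == "__main__":
    record = CallSummaryRecord("call-42", caller_id="alice", callee_id="bob")
    record.add_summary("gemini", "Summary text from Gemini")
    record.add_summary("chatgpt", "Summary text from ChatGPT")
    print(record.summaries_for("bob"))
```

Storing the summaries keyed by provider keeps both versions side by side, so either participant can compare them later.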
