
One to one audio video call workflow

Borbuev Beksultan edited this page Jan 2, 2025 · 3 revisions

The initial idea of Open Chat was to create a work chat with an embedded AI assistant in every part of the app. One of the first places to try it out was audio and video calls, where long conversations are stored in recordings that can be time-consuming to review. Here is the flow of how it was implemented using AWS and Kurento.

[Diagram: one-to-one-call-final]

A breakdown of what is going on in the diagram above:

  1. A user initiates a WebRTC call to another user through the server
  2. The callee receives a notification about the incoming call
  3. During the call, a Kurento Media Server (KMS) records the audio/video streams of both users
  4. Once the call finishes, KMS multiplexes the two streams into one, resulting in a single WEBM file
  5. The server uploads the file to AWS S3
  6. The server makes an API call to AWS Transcribe with the link to the uploaded file so that the recording is processed and a transcript is extracted
  7. The result of the processing is a JSON file, which is stored in AWS S3
  8. The transcript is then passed to Google Gemini and OpenAI ChatGPT to summarize the whole conversation. Both summaries are stored in the database

The meeting summary will be accessible to both users later on.
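
Steps 5 through 7 above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the bucket layout (`transcripts/<job>.json`), and the default language code are all assumptions made for the example. The request keys match the parameters of boto3's `start_transcription_job`, and the parsing follows the standard AWS Transcribe output layout (`results.transcripts[].transcript`).

```python
import json


def build_transcription_request(job_name, bucket, recording_key, language_code="en-US"):
    """Build keyword arguments for boto3's transcribe.start_transcription_job.

    The bucket name, key layout, and language code are hypothetical;
    adjust them to the real deployment.
    """
    return {
        "TranscriptionJobName": job_name,
        # Link to the WEBM recording uploaded to S3 in step 5
        "Media": {"MediaFileUri": f"s3://{bucket}/{recording_key}"},
        "MediaFormat": "webm",
        "LanguageCode": language_code,
        # Ask Transcribe to write its JSON result back to S3 (step 7)
        "OutputBucketName": bucket,
        "OutputKey": f"transcripts/{job_name}.json",
    }


def extract_transcript(result_json):
    """Pull the plain transcript text out of a Transcribe result document."""
    doc = json.loads(result_json)
    transcripts = doc["results"]["transcripts"]
    return " ".join(t["transcript"] for t in transcripts)


def build_summary_prompt(transcript):
    """Assemble a prompt for the summarization models in step 8 (wording is an assumption)."""
    return "Summarize the following call transcript:\n\n" + transcript
```

With boto3 installed and credentials configured, the request could be submitted with `boto3.client("transcribe").start_transcription_job(**request)`. Once the job completes, the JSON file read from S3 can be passed to `extract_transcript`, and the resulting text sent to each summarization model.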
