AudioQnA Application

AudioQnA is an example that demonstrates how to integrate Generative AI (GenAI) models to perform question answering (QnA) on audio input and return spoken responses. It converts audio input to text with Automatic Speech Recognition (ASR), generates an answer to the user's query with a large language model (LLM), and converts that answer back to speech with Text-to-Speech (TTS).

Table of Contents

  1. Architecture
  2. Deployment Options
  3. Validated Configurations

Architecture

The AudioQnA example is implemented using the component-level microservices defined in GenAIComps. The flow chart below shows the information flow between different microservices for this example.

---
config:
  flowchart:
    nodeSpacing: 400
    rankSpacing: 100
    curve: linear
  themeVariables:
    fontSize: 50px
---
flowchart LR
    %% Colors %%
    classDef blue fill:#ADD8E6,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
    classDef orange fill:#FBAA60,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
    classDef orchid fill:#C26DBC,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
    classDef invisible fill:transparent,stroke:transparent;
    style AudioQnA-MegaService stroke:#000000

    %% Subgraphs %%
    subgraph AudioQnA-MegaService["AudioQnA MegaService "]
        direction LR
        ASR([ASR MicroService]):::blue
        LLM([LLM MicroService]):::blue
        TTS([TTS MicroService]):::blue
    end
    subgraph UserInterface[" User Interface "]
        direction LR
        a([User Input Query]):::orchid
        UI([UI server<br>]):::orchid
    end



    WSP_SRV{{whisper service<br>}}
    SPC_SRV{{speecht5 service <br>}}
    LLM_gen{{LLM Service <br>}}
    GW([AudioQnA GateWay<br>]):::orange


    %% Questions interaction
    direction LR
    a[User Audio Query] --> UI
    UI --> GW
    GW <==> AudioQnA-MegaService
    ASR ==> LLM
    LLM ==> TTS

    %% Backend service flow
    direction LR
    ASR <-.-> WSP_SRV
    LLM <-.-> LLM_gen
    TTS <-.-> SPC_SRV
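
The ASR, LLM, and TTS chain in the flow chart can be illustrated with a minimal Python sketch. The three stage functions here are hypothetical stand-ins, not the GenAIComps API: in a real deployment each stage would call its backing service (whisper, the LLM service, speecht5) over the network. The sketch only shows the order in which data moves through the MegaService.

```python
# Hypothetical stand-ins for the three microservices in the flow chart.
# None of these names come from GenAIComps; they only mimic the data flow.

def asr(audio: bytes) -> str:
    """Stand-in for the ASR stage: treat the audio bytes as UTF-8 text."""
    return audio.decode("utf-8")


def llm(question: str) -> str:
    """Stand-in for the LLM stage: return a canned answer to the question."""
    return f"Answer to: {question}"


def tts(text: str) -> bytes:
    """Stand-in for the TTS stage: encode the answer text as bytes."""
    return text.encode("utf-8")


def audio_qna(audio: bytes) -> bytes:
    """Chain the stages in the order shown in the chart: ASR, LLM, TTS."""
    transcript = asr(audio)
    answer = llm(transcript)
    return tts(answer)


if __name__ == "__main__":
    reply = audio_qna(b"What is AudioQnA?")
    print(reply.decode("utf-8"))
```

Keeping each stage behind its own function mirrors the microservice split: any one stage can be swapped for a different backend without touching the other two.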


Deployment Options

The table below lists the currently available deployment options. Each entry describes in detail how to deploy this example on the selected hardware.

| Category | Deployment Option | Description |
| --- | --- | --- |
| On-premise Deployments | Docker compose | AudioQnA deployment on Xeon |
| | | AudioQnA deployment on Gaudi |
| | | AudioQnA deployment on AMD EPYC |
| | | AudioQnA deployment on AMD ROCm |
| | Kubernetes | Helm Charts |

Validated Configurations

| Deploy Method | LLM Engine | LLM Model | Hardware |
| --- | --- | --- | --- |
| Docker Compose | vLLM, TGI | meta-llama/Meta-Llama-3-8B-Instruct | Intel Gaudi |
| Docker Compose | vLLM, TGI, GPT-SoVITS | meta-llama/Meta-Llama-3-8B-Instruct | Intel Xeon |
| Docker Compose | vLLM, TGI | meta-llama/Meta-Llama-3-8B-Instruct | AMD EPYC |
| Docker Compose | vLLM, TGI | Intel/neural-chat-7b-v3-3 | AMD ROCm |
| Helm Charts | vLLM, TGI | meta-llama/Meta-Llama-3-8B-Instruct | Intel Gaudi |
| Helm Charts | vLLM, TGI | meta-llama/Meta-Llama-3-8B-Instruct | Intel Xeon |
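
The validated configurations listed above can also be encoded as data, which may be handy for a pre-deployment check in a script. The sketch below is only an illustration: the `VALIDATED` mapping and the helper names `is_validated` and `validated_model` are hypothetical and not part of this repository. The rows themselves are copied from the table as listed.

```python
from typing import Optional

# (deploy method, hardware) -> (engines as listed in the table, LLM model).
# Hypothetical structure; the values are copied from the table above.
LLAMA3 = "meta-llama/Meta-Llama-3-8B-Instruct"
VALIDATED = {
    ("Docker Compose", "Intel Gaudi"): ({"vLLM", "TGI"}, LLAMA3),
    ("Docker Compose", "Intel Xeon"): ({"vLLM", "TGI", "GPT-SoVITS"}, LLAMA3),
    ("Docker Compose", "AMD EPYC"): ({"vLLM", "TGI"}, LLAMA3),
    ("Docker Compose", "AMD ROCm"): ({"vLLM", "TGI"}, "Intel/neural-chat-7b-v3-3"),
    ("Helm Charts", "Intel Gaudi"): ({"vLLM", "TGI"}, LLAMA3),
    ("Helm Charts", "Intel Xeon"): ({"vLLM", "TGI"}, LLAMA3),
}


def is_validated(method: str, engine: str, hardware: str) -> bool:
    """Return True if the combination appears as a row in the table."""
    entry = VALIDATED.get((method, hardware))
    return entry is not None and engine in entry[0]


def validated_model(method: str, hardware: str) -> Optional[str]:
    """Return the model validated for this method and hardware, if any."""
    entry = VALIDATED.get((method, hardware))
    return entry[1] if entry else None


if __name__ == "__main__":
    print(is_validated("Docker Compose", "vLLM", "Intel Gaudi"))
    print(validated_model("Docker Compose", "AMD ROCm"))
```

A combination that is missing from the mapping is simply not listed as validated here; that does not by itself mean it cannot work.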