Context Switch

Context Switch is a Rust-based framework for building real-time conversational applications with support for multiple modalities (audio and text). It provides a unified interface for interacting with speech and language services such as Azure Speech Services, OpenAI, and ElevenLabs.

Features

  • Multi-modal conversation support (audio and text)
  • Pluggable service architecture
  • Integration with:
    • Azure Speech Services (transcription, translation, synthesis)
    • ElevenLabs realtime speech-to-text (Scribe v2 Realtime)
    • OpenAI dialog services
  • Asynchronous processing using Tokio

Project Structure

  • core/: Core functionality and interfaces
  • services/: Implementation of various service integrations
    • azure/: Azure Speech Services integration
    • elevenlabs/: ElevenLabs speech-to-text integration
    • google-transcribe/: Google Speech-to-Text integration (WIP)
    • openai-dialog/: OpenAI conversational services integration
  • audio-knife/: WebSocket server that implements the mod_audio_fork protocol for real-time audio streaming from telephony systems via FreeSWITCH. Provides a bridge between audio sources and the Context Switch framework.
  • examples/: Example applications showcasing different features

Getting Started

Prerequisites

  • Rust
  • API keys for the services you intend to use:
    • OpenAI API key
    • Azure Speech Services subscription key
    • Google Cloud API key (for Google transcription)
  • For Aristech services:
    • Install protoc
      • macOS: brew install protobuf
      • Linux: apt-get install protobuf-compiler

Installation

  1. Clone the repository:

    git clone https://github.com/pragmatrix/context-switch.git
    cd context-switch
  2. Initialize submodules:

    git submodule update --init --recursive
  3. Create a .env file with your API keys (see .env.example for reference)

Running Examples

The project includes several examples demonstrating different features:

# Run OpenAI dialog example
cargo run --example openai-dialog

# Run generic transcribe example with Azure provider
cargo run --example transcribe -- azure

# Run generic transcribe example with ElevenLabs provider
cargo run --example transcribe -- elevenlabs

# Run generic transcribe example with Aristech provider
cargo run --example transcribe -- aristech

# Run Azure synthesize example
cargo run --example azure-synthesize

Using Audio Knife

Audio Knife is a WebSocket server that implements the mod_audio_fork protocol, allowing real-time audio streaming from and to FreeSWITCH. It acts as a bridge between audio sources and the Context Switch framework.

To run the Audio Knife server:

cargo run -p audio-knife

By default, it listens on 127.0.0.1:8123. You can customize the address by setting the AUDIO_KNIFE_ADDRESS environment variable.
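The environment-variable fallback described above could be resolved along these lines (a minimal sketch; listen_addr is a hypothetical helper for illustration, not part of audio-knife's actual code):

```rust
use std::net::SocketAddr;

/// Resolve the listen address from AUDIO_KNIFE_ADDRESS, falling back to the
/// documented default of 127.0.0.1:8123 when the variable is unset or does
/// not parse. (Illustrative sketch only, not the crate's public API.)
fn listen_addr() -> SocketAddr {
    std::env::var("AUDIO_KNIFE_ADDRESS")
        .ok()
        .and_then(|s| s.parse().ok())
        .unwrap_or_else(|| "127.0.0.1:8123".parse().unwrap())
}
```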

Configuration

Configure the services by setting the appropriate environment variables in your .env file:

# OpenAI Configuration
OPENAI_API_KEY=your_openai_key
OPENAI_REALTIME_API_MODEL=gpt-4o-mini-realtime-preview
OPENAI_REALTIME_ENDPOINT=

# Azure Configuration
AZURE_SUBSCRIPTION_KEY=your_azure_key
AZURE_REGION=your_azure_region

# ElevenLabs Configuration
ELEVENLABS_API_KEY=your_elevenlabs_key

# Audio Knife Configuration
AUDIO_KNIFE_ADDRESS=127.0.0.1:8123

For Azure OpenAI realtime endpoints (*.openai.azure.com), the realtime client automatically appends the api-key query parameter to the WebSocket URL. For other hosts, it uses the standard Authorization: Bearer ... header.

The WebSocket client does not follow redirects. If the endpoint responds with a 3xx status (for example, 302 Found), update the configured endpoint URL to point at the final WebSocket target.
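The host-based credential placement can be sketched as follows (an illustrative approximation of the behavior described above; apply_auth and the exact URL handling are assumptions, not the crate's actual implementation):

```rust
/// Decide where the API key goes for a realtime WebSocket endpoint:
/// Azure OpenAI hosts get an api-key query parameter appended to the URL,
/// everything else gets an Authorization: Bearer header.
/// (Illustrative sketch only.)
fn apply_auth(endpoint: &str, api_key: &str) -> (String, Option<String>) {
    // Crude host extraction; a real client would use a proper URL parser.
    let host = endpoint
        .trim_start_matches("wss://")
        .trim_start_matches("https://")
        .split('/')
        .next()
        .unwrap_or("");
    if host.ends_with(".openai.azure.com") {
        // Azure OpenAI: append api-key as a query parameter.
        let sep = if endpoint.contains('?') { '&' } else { '?' };
        (format!("{endpoint}{sep}api-key={api_key}"), None)
    } else {
        // Other hosts: keep the URL as-is and send a Bearer header.
        (endpoint.to_string(), Some(format!("Bearer {api_key}")))
    }
}
```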

License

MIT License
