A real-time speech translation system that converts English speech to multiple Indian languages, with noise reduction and high-quality text-to-speech synthesis. It is built on AI4Bharat models (IndicTrans2, Indic-TTS) and OpenAI Whisper, and is optimized for speed on CPU.
Perfect for voice-based interfaces, accessibility tools, multilingual communication systems, and language learning applications.
- 🎤 Real-Time Audio Capture - Record directly from your microphone with live feedback
- 🔇 Advanced Noise Reduction - Spectral subtraction with Voice Activity Detection (VAD)
- 🗣️ Speech Recognition - Powered by OpenAI Whisper for accurate English transcription
- 🌐 Multi-Language Translation - Support for 6 Indian languages using IndicTrans2
- 🔊 Dual TTS Engines - Fast optimized Kannada TTS + Standard multi-language TTS
- 🖥️ User-Friendly GUI - Intuitive Tkinter interface with real-time status updates
- ✅ Adjustable noise reduction levels (0.5x to 3.0x)
- ✅ Multiple speaker voices per language
- ✅ Auto-play option after translation
- ✅ CPU-only operation (no GPU required)
- ✅ Audio quality indicators and enhancement metrics
- ✅ Cross-platform support (macOS, Linux, Windows)
| Language | Script | Code | Speakers | Fast TTS |
|---|---|---|---|---|
| Kannada | ಕನ್ನಡ | kan_Knda | Suresh, Anu, Chetan, Vidya | ✅ Yes |
| Telugu | తెలుగు | tel_Telu | Prakash, Lalitha, Kiran | ❌ No |
| Hindi | हिन्दी | hin_Deva | Ravi, Priya, Amit | ❌ No |
| Tamil | தமிழ் | tam_Taml | Arun, Meena | ❌ No |
| Gujarati | ગુજરાતી | guj_Gujr | Jignesh, Kavita | ❌ No |
| Bengali | বাংলা | ben_Beng | Rahul, Mou | ❌ No |
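The table above can be expressed in the shape of the language_mapping dictionary that this README uses for adding languages. This is a minimal sketch: the field names follow the "NewLanguage" template, but picking the first listed speaker as default_speaker and the helper fast_tts_languages are assumptions made for illustration, not taken from main2.py.

```python
# Illustrative only: mirrors the supported-languages table in this README.
language_mapping = {
    "Kannada": {"code": "kan_Knda", "speakers": ["Suresh", "Anu", "Chetan", "Vidya"], "fast_tts": True},
    "Telugu": {"code": "tel_Telu", "speakers": ["Prakash", "Lalitha", "Kiran"], "fast_tts": False},
    "Hindi": {"code": "hin_Deva", "speakers": ["Ravi", "Priya", "Amit"], "fast_tts": False},
    "Tamil": {"code": "tam_Taml", "speakers": ["Arun", "Meena"], "fast_tts": False},
    "Gujarati": {"code": "guj_Gujr", "speakers": ["Jignesh", "Kavita"], "fast_tts": False},
    "Bengali": {"code": "ben_Beng", "speakers": ["Rahul", "Mou"], "fast_tts": False},
}

# Assumption: the first listed speaker is used as the default voice.
for entry in language_mapping.values():
    entry["default_speaker"] = entry["speakers"][0]


def fast_tts_languages(mapping):
    """Return the names of languages flagged for the fast TTS engine (hypothetical helper)."""
    return [name for name, entry in mapping.items() if entry["fast_tts"]]


print(fast_tts_languages(language_mapping))
```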
┌─────────────┐    ┌──────────────┐    ┌─────────────┐    ┌──────────┐
│ Microphone  │ -> │ Noise Filter │ -> │   Whisper   │ -> │ English  │
│   Input     │    │  (VAD+STFT)  │    │     ASR     │    │   Text   │
└─────────────┘    └──────────────┘    └─────────────┘    └──────────┘
                                                               │
                                                               ▼
┌─────────────┐    ┌──────────────┐    ┌─────────────┐    ┌──────────┐
│   Audio     │ <- │  TTS Engine  │ <- │ IndicTrans2 │ <- │  Target  │
│   Output    │    │  (Fast/Std)  │    │ Translation │    │ Language │
└─────────────┘    └──────────────┘    └─────────────┘    └──────────┘
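The data flow in the diagram above can be sketched as a chain of four stages. This is not the code in main2.py: run_pipeline and its parameter names are hypothetical, and each stage is passed in as a plain callable so that stubs can stand in for the real noise filter, Whisper, IndicTrans2, and TTS models.

```python
def run_pipeline(audio, target_language, enhance, transcribe, translate, synthesize):
    """Chain the four stages from the diagram; each stage is supplied as a callable."""
    clean_audio = enhance(audio)                                 # noise filter
    english_text = transcribe(clean_audio)                       # speech recognition
    translated_text = translate(english_text, target_language)   # translation
    speech = synthesize(translated_text, target_language)        # text-to-speech
    return {"english": english_text, "translated": translated_text, "speech": speech}


# Stub stages stand in for the real models so the sketch runs on its own.
result = run_pipeline(
    audio=b"\x00\x01",
    target_language="kan_Knda",
    enhance=lambda a: a,
    transcribe=lambda a: "hello",
    translate=lambda text, lang: f"[{lang}] {text}",
    synthesize=lambda text, lang: text.encode("utf-8"),
)
print(result["translated"])
```

Passing the stages in as arguments keeps each one replaceable, for example when switching between the fast and standard TTS engines.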
System Requirements:
- Python 3.8 or higher
- 4GB+ RAM (8GB recommended)
- 5GB+ free disk space
- Microphone input device
- Internet connection (for initial model downloads)
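Some of the requirements above can be checked before installing. The sketch below uses only the Python standard library and covers the Python version and free disk space; RAM, microphone, and network checks are left out because they need third-party packages. The function name check_prerequisites and its parameters are hypothetical.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)   # from the requirements list
MIN_FREE_GB = 5       # from the requirements list


def check_prerequisites(path=".", min_python=MIN_PYTHON, min_free_gb=MIN_FREE_GB):
    """Return a list of human-readable problems; an empty list means both checks passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        found = ".".join(str(part) for part in sys.version_info[:2])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ required, found {found}")
    free_gb = shutil.disk_usage(path).free / 1024**3
    if free_gb < min_free_gb:
        problems.append(f"{min_free_gb} GB free disk space required, found {free_gb:.1f} GB")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```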
System Dependencies:
macOS
brew install portaudio ffmpeg
Ubuntu/Debian
sudo apt-get update
sudo apt-get install portaudio19-dev python3-tk ffmpeg
Windows
- Install FFmpeg from ffmpeg.org
- Download PyAudio wheel from here
- Install:
pip install <downloaded-wheel-file>
- Clone the repository:
  git clone https://github.com/arushi-vaidya/Indic-Speech-Translation.git
  cd Indic-Speech-Translation
- Create and activate virtual environment:
  python3 -m venv venv
  # macOS/Linux
  source venv/bin/activate
  # Windows
  venv\Scripts\activate
- Install Python dependencies:
  pip install --upgrade pip
  pip install -r requirements.txt
- Download Kannada TTS models (for Fast TTS):
  cd kannada_tts_fast
  wget https://github.com/AI4Bharat/Indic-TTS/releases/download/v1-checkpoints-release/kn.zip
  unzip kn.zip
  # Create model directory structure
  mkdir -p models/v1
  cp -r kn models/v1/
  python3 mac_optimize.py
  python3 kannada_tts.py
  cd ..
- Run the application:
  cp main2.py IndicTrans/
  cd IndicTrans
  python3 main2.py
- Launch the application:
  python3 main2.py
- Wait for models to load (30-60 seconds on first run)
- Select your target language from the dropdown menu
- Click "Start Recording" and speak in English
- Click "Stop Recording" when finished
- View results:
  - English transcription appears in the first text box
  - Translated text appears in the second text box
  - Audio automatically plays (if auto-play is enabled)
Audio Enhancement Settings:
- Toggle noise reduction on/off
- Adjust noise reduction level (1.2 is default)
- Enable/disable auto-play
TTS Settings:
- Choose between Fast TTS (Kannada only) or Standard TTS
- Select different speaker voices
- Output format: WAV or MP3
Indic-Speech-Translation/
├── main2.py # Main application entry point
├── requirements.txt # Python dependencies
├── README.md # This file
├── requirements.md # Detailed requirements specification
├── design.md # System design documentation
│
├── Noise_Supression/ # Audio enhancement module
│ ├── spectral3.py # Spectral subtraction implementation
│ ├── requirements.txt # Module-specific dependencies
│ └── README.md # Noise suppression documentation
│
├── kannada_tts_fast/ # Fast Kannada TTS module
│ ├── kannada_tts.py # Main TTS implementation
│ ├── mac_optimize.py # macOS optimization script
│ ├── universal_optimize.py # Cross-platform optimization
│ ├── kn/ # Original Kannada models
│ │ ├── fastpitch/ # FastPitch TTS model
│ │ └── hifigan/ # HiFi-GAN vocoder
│ └── models/v1/kn/ # Model copies for compatibility
│
├── IndicTrans2/ # Translation toolkit (git submodule)
├── IndicTransToolkit/ # Indian language processing
│
└── audio_files/ # Runtime audio storage (created on run)
    ├── temp_recording.wav # Raw microphone input
    ├── enhanced_recording.wav # Noise-reduced audio
    └── output.mp3 # Final TTS output
Edit these in main2.py or adjust via UI:
AudioEnhancer(
    samplerate=16000,       # Audio sample rate (Hz)
    frame_length=2048,      # STFT window size
    hop_length=512,         # STFT hop size
    alpha=0.98,             # Noise profile smoothing (0-1)
    noise_threshold=1.2,    # Noise reduction strength (0.5-3.0)
    vad_aggressiveness=2    # VAD sensitivity (0-3)
)
Add new languages in the language_mapping dictionary:
"NewLanguage": {
    "code": "lang_Script",
    "speakers": ["Speaker1", "Speaker2"],
    "default_speaker": "Speaker1",
    "fast_tts": False
}
- Captures audio at 16 kHz mono from the microphone
- Applies Short-Time Fourier Transform (STFT)
- Uses WebRTC VAD to detect speech vs. noise
- Updates noise profile during non-speech frames
- Applies spectral subtraction mask
- Converts back to time domain with ISTFT
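The noise-profile update and subtraction steps above can be sketched on precomputed magnitude spectra. This is a simplified illustration, not the project's spectral3.py: the STFT, the ISTFT, and the WebRTC VAD are left out, so the function takes per-frame magnitudes and speech flags as inputs. The parameter names alpha and noise_threshold are borrowed from the AudioEnhancer configuration in this README, while the exact update rule (exponential smoothing) and the flooring at zero are assumptions.

```python
def spectral_subtraction(frames, is_speech, alpha=0.98, noise_threshold=1.2):
    """Subtract a running noise estimate from per-frame magnitude spectra.

    frames: list of magnitude spectra (lists of floats), one per STFT frame.
    is_speech: list of booleans, one per frame, as a VAD would produce.
    """
    noise = None
    cleaned = []
    for magnitudes, speech in zip(frames, is_speech):
        if not speech:
            # Update the noise profile only on non-speech frames.
            if noise is None:
                noise = list(magnitudes)
            else:
                noise = [alpha * n + (1 - alpha) * m for n, m in zip(noise, magnitudes)]
        if noise is None:
            # No noise estimate yet: pass the frame through unchanged.
            cleaned.append(list(magnitudes))
            continue
        # Subtract the scaled noise estimate and floor at zero.
        cleaned.append([max(m - noise_threshold * n, 0.0) for m, n in zip(magnitudes, noise)])
    return cleaned


# Two noise-only frames followed by one frame with speech energy in the first bin.
frames = [[1.0, 1.0], [1.0, 1.0], [5.0, 1.0]]
is_speech = [False, False, True]
cleaned = spectral_subtraction(frames, is_speech)
print(cleaned[2])
```

A larger noise_threshold removes more of the estimated noise at the cost of attenuating quiet speech, which is the trade-off behind the adjustable noise reduction level.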
- Enhanced audio fed to OpenAI Whisper model
- Transcribes English speech to text
- Handles various accents and speaking styles
- English text preprocessed with IndicProcessor
- Tokenized and fed to IndicTrans2 model
- Beam search decoding (5 beams) for quality
- Postprocessed to target language script
- Fast Kannada TTS: CPU-optimized FastPitch + HiFi-GAN
- Standard TTS: Indic Parler-TTS for all languages
- Automatic fallback if primary TTS fails
- Output saved as WAV/MP3
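The engine selection and automatic fallback described above can be sketched as follows. The function synthesize_with_fallback, its parameters, and the stub engines are hypothetical; the real engines (FastPitch + HiFi-GAN, Indic Parler-TTS) are replaced by simple callables so the control flow can be run in isolation.

```python
def synthesize_with_fallback(text, language, fast_engine, standard_engine,
                             fast_languages=("Kannada",)):
    """Use the fast engine where available; otherwise, or on failure, use the standard one.

    Returns a tuple of (audio, engine_name).
    """
    if language in fast_languages:
        try:
            return fast_engine(text), "fast"
        except Exception as error:
            print(f"Fast TTS failed ({error}); falling back to standard TTS")
    return standard_engine(text, language), "standard"


def broken_fast_engine(text):
    """Stub that simulates a fast engine whose model files are missing."""
    raise RuntimeError("model files missing")


def stub_standard_engine(text, language):
    """Stub standing in for the standard multi-language engine."""
    return f"{language}:{text}".encode("utf-8")


audio, engine = synthesize_with_fallback(
    "namaskara", "Kannada", broken_fast_engine, stub_standard_engine
)
print(engine)
```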
| Operation | Time (Typical) | Notes |
|---|---|---|
| Model Loading | 30-45s | First run only |
| 10s Recording | 10s | Real-time |
| Audio Enhancement | 0.5-1s | CPU |
| Transcription | 2-3s | CPU |
| Translation | 1-2s | CPU |
| Fast Kannada TTS | 0.5-1.5s | CPU |
| Standard TTS | 3-4s | CPU |
| Total Pipeline | 10-15s | CPU-only |
Tested on: MacBook Pro M1, 16GB RAM
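To reproduce per-stage figures like these on other hardware, each stage call can be wrapped in a timer. A minimal sketch using only the standard library; the timed helper and the timings dictionary are hypothetical and not part of main2.py, and time.sleep stands in for a real stage call.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(stage):
    """Record the wall-clock duration of the enclosed block under the given stage name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start


with timed("transcription"):
    time.sleep(0.01)  # placeholder for the real stage call

for stage, seconds in timings.items():
    print(f"{stage}: {seconds:.2f}s")
```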
No audio recorded / Microphone not working
- Check microphone permissions in system settings
- Verify microphone is set as default input device
- Test microphone with other applications
- Try running with sudo (Linux only)
Model loading fails
- Ensure stable internet connection
- Check available disk space (need 5GB+)
- Clear Hugging Face cache: rm -rf ~/.cache/huggingface
- Retry download
Fast Kannada TTS not working
- Verify models are in the kannada_tts_fast/kn/ directory
- Run optimization: cd kannada_tts_fast && python3 mac_optimize.py
- Check for error messages in console
- Fallback to Standard TTS will be used automatically
Poor transcription quality
- Enable audio enhancement
- Increase noise reduction level
- Speak clearly and closer to microphone
- Reduce background noise
- Check microphone quality
Out of memory errors
- Close other applications
- Use a smaller Whisper model (change base to tiny in the code)
- Reduce recording length
- Increase system swap space
Contributions are welcome! Here's how you can help:
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Commit your changes: git commit -m 'Add amazing feature'
- Push to branch: git push origin feature/amazing-feature
- Open a Pull Request
- Additional language support
- UI/UX improvements
- Performance optimizations
- Bug fixes and testing
- Documentation improvements
- requirements.md - Detailed functional and non-functional requirements
- design.md - System architecture and design decisions
- Noise_Supression/README.md - Audio enhancement details
- kannada_tts_fast/README.md - Fast TTS implementation
This project is built on excellent work from:
- AI4Bharat - IndicTrans2 and Indic-TTS models
- OpenAI Whisper - Speech recognition
- Hugging Face - Model hosting and Parler-TTS
- Coqui TTS - TTS framework
This project is licensed under the MIT License - see the LICENSE file for details.
- Whisper: MIT License
- IndicTrans2: MIT License
- Parler-TTS: Apache 2.0 License
- Indic-TTS: MIT License
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Add support for Malayalam, Marathi, Punjabi
- Real-time streaming translation (no recording needed)
- Web-based interface
- Mobile app (iOS/Android)
- Bidirectional translation (Indian languages to English)
- Conversation mode with turn-taking
- GPU acceleration option
- Docker containerization
- REST API for integration
If you find this project useful, please consider giving it a star! ⭐
Made with ❤️ for the Indian language community