Convert spoken words to text in real time using an ESP32 microcontroller, an I2S MEMS microphone, and the Wit.ai speech recognition API, with results displayed on an OLED screen and the Serial Monitor.
This project demonstrates how to build a low-cost, Wi-Fi-enabled speech-to-text system on the ESP32. Audio is captured by an I2S digital microphone and sent to the Wit.ai cloud API for transcription, and the resulting text is shown on a connected OLED display.
- 🎤 Real-time audio capture using an I2S MEMS microphone
- ☁️ Cloud-based speech recognition via Wit.ai
- 🖥️ Transcribed text displayed on an SSD1306 OLED screen
- 📟 Output also available via Serial Monitor (for debugging)
- 📶 Wi-Fi connectivity using the ESP32's built-in radio
| Component | Description |
|---|---|
| ESP32 Dev Board | Main microcontroller (e.g., ESP32-WROOM-32) |
| I2S MEMS Microphone | e.g., INMP441 |
| OLED Display | 0.96" SSD1306 (128×64, I2C) |
| Jumper Wires | For connections |
| Breadboard | Optional, for prototyping |
| Mic Pin | ESP32 Pin |
|---|---|
| VDD | 3.3V |
| GND | GND |
| WS (LRCK) | GPIO 15 |
| SCK (BCLK) | GPIO 14 |
| SD (Data) | GPIO 32 |
| L/R | GND (Left channel) |
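With L/R tied to GND, the microphone drives the left slot of each I2S frame. Microphones such as the INMP441 deliver 24-bit samples left-aligned in a 32-bit slot, so a sketch typically reduces each slot to 16-bit PCM before upload. The following is a minimal sketch of that step in plain C++, so it can be tried on a desktop compiler before it goes on the board. The names `slotToPcm16` and `convertBlock` are hypothetical, and the repository's `i2s_mic.h` may handle this differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reduce one 32-bit I2S slot to a 16-bit PCM sample.
// Assumption: the 24-bit sample is left-aligned in the slot (as on the INMP441),
// so the top 16 bits hold the most significant part of the sample.
inline int16_t slotToPcm16(int32_t slot) {
    // Right shift of a signed value is arithmetic on GCC/Clang, so the sign is kept.
    return static_cast<int16_t>(slot >> 16);
}

// Convert a block of raw slots (as read from the I2S driver) into 16-bit PCM.
inline void convertBlock(const int32_t* slots, int16_t* pcm, std::size_t count) {
    assert(count == 0 || (slots != nullptr && pcm != nullptr));
    for (std::size_t i = 0; i < count; ++i) {
        pcm[i] = slotToPcm16(slots[i]);
    }
}
```

Shifting by 16 keeps the full dynamic range. If recordings turn out too quiet, a smaller shift with clamping is one possible way to add gain.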
| OLED Pin | ESP32 Pin |
|---|---|
| VCC | 3.3V |
| GND | GND |
| SDA | GPIO 21 |
| SCL | GPIO 22 |
⚠️ Pin numbers may vary depending on your specific ESP32 board. Adjust in the code as needed.
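A 128×64 panel using the default Adafruit GFX font at text size 1 has 6×8 pixel character cells, which gives 21 columns and 8 rows. A transcription longer than one row reads better when it breaks between words. The helper below is a minimal sketch of such word wrapping in plain C++. The name `wrapForOled` and the constant `kOledColumns` are hypothetical and are not taken from the project's `oled_display.h`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assumption: default font at text size 1, so 128 / 6 = 21 characters per row.
constexpr std::size_t kOledColumns = 21;

// Split text into rows of at most `width` characters, breaking at spaces.
// Words longer than a full row are hard-split.
std::vector<std::string> wrapForOled(const std::string& text,
                                     std::size_t width = kOledColumns) {
    assert(width > 0);
    std::vector<std::string> lines;
    std::string line;
    std::size_t pos = 0;
    while (pos < text.size()) {
        // Skip any run of spaces before the next word.
        while (pos < text.size() && text[pos] == ' ') ++pos;
        if (pos >= text.size()) break;
        std::size_t end = text.find(' ', pos);
        if (end == std::string::npos) end = text.size();
        std::string word = text.substr(pos, end - pos);
        pos = end;
        // Hard-split a word that cannot fit on a single row.
        while (word.size() > width) {
            if (!line.empty()) { lines.push_back(line); line.clear(); }
            lines.push_back(word.substr(0, width));
            word.erase(0, width);
        }
        if (line.empty()) {
            line = word;
        } else if (line.size() + 1 + word.size() <= width) {
            line += ' ';
            line += word;
        } else {
            lines.push_back(line);
            line = word;
        }
    }
    if (!line.empty()) lines.push_back(line);
    return lines;
}
```

On the board, each returned row could be passed to the display object's `println` (for example `display.println(row.c_str())`), keeping at most 8 rows per screen.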
- Arduino IDE (v1.8+ or v2.x)
- ESP32 Board Support Package
- Required Libraries:
  - Adafruit SSD1306
  - Adafruit GFX Library
  - WiFiClientSecure (built-in with ESP32 core)
  - ArduinoJson (optional, for parsing the Wit.ai response)
git clone https://github.com/your-username/esp32-speech-to-text.git
cd esp32-speech-to-text
- Go to File → Preferences
- Add this URL to Additional Boards Manager URLs:
https://raw.githubusercontent.com/espressif/arduino-esp32/gh-pages/package_esp32_index.json
- Go to Tools → Board → Boards Manager, search for esp32, and install the package.
In Arduino IDE, go to Sketch → Include Library → Manage Libraries and install:
- Adafruit SSD1306
- Adafruit GFX Library
- ArduinoJson (if used)
Open the main .ino file and update the following:
const char* ssid = "YOUR_WIFI_SSID";
const char* password = "YOUR_WIFI_PASSWORD";
const char* witai_token = "YOUR_WIT_AI_ACCESS_TOKEN";
- Go to https://wit.ai and sign in with your Facebook/Meta account.
- Create a new app and select your language.
- Copy the Server Access Token from Settings.
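The Server Access Token is sent to Wit.ai as a bearer token in the `Authorization` header of each request. As a rough illustration, the helper below builds the header block for a POST to the `/speech` endpoint on `api.wit.ai`, describing the body as raw 16-bit little-endian PCM. The function name `buildSpeechRequestHead`, the 16 kHz default, and the use of `Content-Length` instead of chunked transfer are assumptions made for this sketch. The project's `wit_ai.h` may build its request differently, and the current Wit.ai HTTP API documentation is the authority on accepted content types.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Build the header block of an HTTP/1.1 POST carrying raw PCM audio.
// Assumption: signed 16-bit little-endian mono samples at `sampleRate` Hz.
std::string buildSpeechRequestHead(const std::string& token,
                                   std::size_t bodyBytes,
                                   unsigned sampleRate = 16000) {
    assert(!token.empty());
    std::string head;
    head += "POST /speech HTTP/1.1\r\n";
    head += "Host: api.wit.ai\r\n";
    head += "Authorization: Bearer " + token + "\r\n";
    head += "Content-Type: audio/raw;encoding=signed-integer;bits=16;rate=" +
            std::to_string(sampleRate) + ";endian=little\r\n";
    head += "Content-Length: " + std::to_string(bodyBytes) + "\r\n";
    head += "Connection: close\r\n";
    head += "\r\n";  // blank line ends the headers; the audio bytes follow
    return head;
}
```

On the ESP32, this string would be written to a TLS client such as `WiFiClientSecure` connected to `api.wit.ai` on port 443, followed by the audio bytes. Wit.ai also accepts a `v` version query parameter, which is omitted here.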
- Select your board under Tools → Board → ESP32 Dev Module
- Select the correct Port
- Click Upload
[User Speaks]
↓
[I2S Mic captures audio]
↓
[ESP32 buffers audio samples]
↓
[Audio sent to Wit.ai via HTTPS POST]
↓
[Wit.ai returns transcribed text (JSON)]
↓
[Text displayed on OLED + Serial Monitor]
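The buffering step in the flow above determines how much RAM a recording needs. The sketch below works through that arithmetic and shows a simple fixed-capacity buffer that stops accepting samples once the clip is full. The 16 kHz, 16-bit, mono, 3-second figures are assumptions chosen for illustration, and the names `pcmBytes`, `kClipBytes`, and `ClipBuffer` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bytes needed for an uncompressed PCM clip.
constexpr std::size_t pcmBytes(unsigned sampleRateHz, unsigned bitsPerSample,
                               unsigned channels, unsigned seconds) {
    return static_cast<std::size_t>(sampleRateHz) * (bitsPerSample / 8) *
           channels * seconds;
}

// Assumed recording format: 16 kHz, 16-bit, mono, 3 seconds.
constexpr std::size_t kClipBytes = pcmBytes(16000, 16, 1, 3);
static_assert(kClipBytes == 96000, "16000 * 2 * 1 * 3 bytes");

// Fixed-capacity sample buffer: accepts samples until the clip is full.
struct ClipBuffer {
    std::vector<int16_t> samples;
    std::size_t capacity;

    explicit ClipBuffer(std::size_t maxSamples) : capacity(maxSamples) {
        samples.reserve(maxSamples);
    }

    // Append up to n samples; returns how many were actually stored.
    std::size_t append(const int16_t* data, std::size_t n) {
        assert(n == 0 || data != nullptr);
        const std::size_t room = capacity - samples.size();
        const std::size_t take = n < room ? n : room;
        samples.insert(samples.end(), data, data + take);
        return take;
    }

    bool full() const { return samples.size() == capacity; }
};
```

At these assumed settings a clip takes 96,000 bytes, which is why recordings on an ESP32 without external PSRAM are usually limited to a few seconds.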
esp32-speech-to-text/
├── esp32_speech_to_text.ino # Main Arduino sketch
├── wit_ai.h # Wit.ai API communication
├── i2s_mic.h # I2S microphone configuration
├── oled_display.h # OLED display helpers
└── README.md
Open the Serial Monitor at 115200 baud to see debug logs:
Connecting to WiFi...
Connected! IP: 192.168.1.42
Recording audio...
Sending to Wit.ai...
Response: "turn on the light"
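The quoted string in the last log line corresponds to the `text` field of the JSON that Wit.ai returns. ArduinoJson is the more robust way to read it. For cases where that library is not used, the following is a deliberately naive sketch that pulls out the last `"text"` value in the response body, on the assumption that when several JSON chunks arrive the last one carries the final transcription. The function name `extractLastText` is hypothetical. It handles simple backslash escapes but does not decode `\uXXXX` sequences, and it is not a general JSON parser.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Return the string value of the last "text" key found in body, or "" if none.
std::string extractLastText(const std::string& body) {
    const std::string key = "\"text\"";
    const std::size_t k = body.rfind(key);
    if (k == std::string::npos) return "";
    std::size_t i = k + key.size();
    assert(i <= body.size());  // rfind guarantees the key lies inside body

    auto skipSpace = [&body](std::size_t p) {
        while (p < body.size() && (body[p] == ' ' || body[p] == '\t' ||
                                   body[p] == '\r' || body[p] == '\n')) {
            ++p;
        }
        return p;
    };

    i = skipSpace(i);
    if (i >= body.size() || body[i] != ':') return "";
    i = skipSpace(i + 1);
    if (i >= body.size() || body[i] != '"') return "";
    ++i;  // step past the opening quote

    std::string out;
    while (i < body.size() && body[i] != '"') {
        if (body[i] == '\\' && i + 1 < body.size()) {
            ++i;  // take the escaped character
            switch (body[i]) {
                case 'n': out += '\n'; break;
                case 't': out += '\t'; break;
                default:  out += body[i]; break;  // covers \" and \\ ; \u is not decoded
            }
        } else {
            out += body[i];
        }
        ++i;
    }
    return out;
}
```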
| Issue | Possible Fix |
|---|---|
| No audio captured | Check I2S wiring; verify pin definitions in code |
| OLED not displaying | Confirm I2C address (usually 0x3C); check SDA/SCL pins |
| Wi-Fi not connecting | Double-check SSID and password in config |
| Wit.ai returns empty | Speak clearly; check your access token; verify audio format |
| Upload fails | Hold the BOOT button on the ESP32 while the upload starts |
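For the "No audio captured" case, one quick check is to look at the peak amplitude of a recorded buffer before it is sent. A peak of exactly zero usually points to wiring or pin definitions, while a small nonzero peak suggests the microphone works but the input is quiet. The helper below is a minimal sketch of that check for 16-bit PCM samples. The name `peakAmplitude` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Largest absolute sample value in the buffer (0 for an empty or silent buffer).
// Returns int so that -32768 can be reported as 32768 without overflow.
int peakAmplitude(const int16_t* samples, std::size_t count) {
    assert(count == 0 || samples != nullptr);
    int peak = 0;
    for (std::size_t i = 0; i < count; ++i) {
        const int value = samples[i];
        const int magnitude = value < 0 ? -value : value;
        if (magnitude > peak) peak = magnitude;
    }
    return peak;
}
```

Printing this value to the Serial Monitor after each recording gives a simple signal-level readout while debugging.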
This project is licensed under the MIT License.
- Wit.ai — Free speech recognition API by Meta
- Espressif — ESP32 platform
- Adafruit — OLED display libraries
Pull requests are welcome! For major changes, please open an issue first to discuss what you'd like to change.
- Fork the repo
- Create your feature branch:
git checkout -b feature/my-feature
- Commit your changes:
git commit -m 'Add some feature'
- Push to the branch:
git push origin feature/my-feature
- Open a Pull Request