Real-time microphone-to-text transcription using the Foundry Local Python SDK with Nemotron ASR.
- Foundry Local installed
- Python 3.9+
- A microphone (optional; the example falls back to synthetic audio with `--synth` or if PyAudio is unavailable)
pip install -r requirements.txt

Note: `pyaudio` is optional. It provides cross-platform microphone capture; without it, the example falls back to synthetic audio for testing. Install it manually if needed:

pip install pyaudio
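
The fallback described in the note could be decided at startup roughly as follows. This is a minimal sketch, not the code in `src/app.py`: the function name `choose_audio_source` is hypothetical, and only the `--synth` flag and the `pyaudio` module name come from this README.

```python
import argparse
import importlib.util


def choose_audio_source(argv=None):
    """Return "microphone" or "synthetic" (hypothetical helper, not part of the SDK)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--synth", action="store_true", help="force synthetic audio")
    args = parser.parse_args(argv)

    # find_spec returns None when the pyaudio module is not installed.
    has_pyaudio = importlib.util.find_spec("pyaudio") is not None
    if args.synth or not has_pyaudio:
        return "synthetic"
    return "microphone"
```

Checking with `find_spec` avoids importing PyAudio (and initializing the audio backend) just to learn whether it is installed.
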
python src/app.py

Speak into your microphone. Transcription appears in real time. Press Ctrl+C to stop.
To force synthetic audio (e.g., for CI or when no microphone is available):
python src/app.py --synth

- Initializes the Foundry Local SDK and loads the Nemotron ASR model
- Creates a `LiveAudioTranscriptionSession` with 16 kHz/16-bit/mono PCM settings
- Captures microphone audio via `pyaudio` (or generates synthetic audio as a fallback)
- Pushes PCM chunks to the SDK via `session.append()`
- Reads transcription results in a background thread via `for result in session.get_stream()`
- Exposes the text via `result.content[0].text` (OpenAI Realtime ConversationItem pattern)
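
To make the PCM format in the steps above concrete, here is one way synthetic audio in that format might be generated. It is a sketch under assumptions: the sine tone, the 100 ms chunk size, and the name `synth_pcm_chunks` are illustrative choices, and the example's actual synthetic source may differ.

```python
import math
import struct

SAMPLE_RATE = 16000  # Hz, matching the 16 kHz setting described above
CHUNK_MS = 100       # assumed chunk duration; not taken from the example app


def synth_pcm_chunks(duration_s=1.0, freq_hz=440.0, amplitude=0.3):
    """Yield 16-bit little-endian mono PCM chunks containing a sine tone."""
    samples_per_chunk = SAMPLE_RATE * CHUNK_MS // 1000
    total_samples = int(SAMPLE_RATE * duration_s)
    for start in range(0, total_samples, samples_per_chunk):
        end = min(start + samples_per_chunk, total_samples)
        # Scale the sine wave into the signed 16-bit range.
        samples = [
            int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE))
            for n in range(start, end)
        ]
        # "<h" packs each sample as a little-endian signed 16-bit integer.
        yield struct.pack(f"<{len(samples)}h", *samples)
```

Each chunk is 1600 samples, or 3200 bytes, so one second of audio arrives as ten chunks that can be passed to `session.append()` one at a time.
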
audio_client = model.get_audio_client()
session = audio_client.create_live_transcription_session()
session.settings.sample_rate = 16000
session.settings.channels = 1
session.settings.language = "en"
session.start()
# Push audio
session.append(pcm_bytes)
# Read results (typically on a background thread)
for result in session.get_stream():
    print(result.content[0].text)        # transcribed text
    print(result.content[0].transcript)  # alias (OpenAI compat)
    print(result.is_final)               # True for final results
session.stop()
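
Because `session.get_stream()` is consumed in a loop, pushing audio and reading results usually run on separate threads. The sketch below shows that pattern with a stand-in session so it runs without the SDK. `FakeSession` and `run` are hypothetical and only mimic the method names used above; in particular, the assumption that `stop()` ends the result stream is not confirmed for the real SDK.

```python
import queue
import threading
from types import SimpleNamespace


class FakeSession:
    """Hypothetical stand-in for the SDK session; NOT part of Foundry Local."""

    def __init__(self):
        self._results = queue.Queue()

    def append(self, pcm_bytes):
        # Pretend every chunk yields one final result describing its size.
        text = f"{len(pcm_bytes)} bytes"
        item = SimpleNamespace(text=text, transcript=text)
        self._results.put(SimpleNamespace(content=[item], is_final=True))

    def stop(self):
        self._results.put(None)  # sentinel that ends the stream (assumed behavior)

    def get_stream(self):
        while (result := self._results.get()) is not None:
            yield result


def run(session, chunks):
    """Push chunks on the calling thread while a background thread collects text."""
    texts = []

    def reader():
        for result in session.get_stream():
            if result.is_final:
                texts.append(result.content[0].text)

    worker = threading.Thread(target=reader, daemon=True)
    worker.start()
    for chunk in chunks:
        session.append(chunk)
    session.stop()
    worker.join(timeout=5)  # wait for the reader to drain remaining results
    return texts
```

With a real session, the `chunks` argument would be the microphone or synthetic PCM source, and the reader would print or display each result instead of collecting it.
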