Add Nemotron-ASR streaming inference to C++ SDK by rui-ren · Pull Request #655 · microsoft/Foundry-Local

rui-ren · 2026-04-20T19:31:15Z

Add Nemotron-ASR streaming inference to C++ SDK

Description

Adds real-time audio streaming support to the Foundry Local C++ SDK, enabling live microphone-to-text transcription via ONNX Runtime GenAI's StreamingProcessor API (Nemotron ASR).

This is the C++ port of C# PR #485 with full feature parity. The existing AudioClient only supports file-based transcription. This PR introduces LiveAudioTranscriptionSession that accepts continuous PCM audio chunks (e.g., from a microphone) and returns partial/final transcription results as a synchronous generator.
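The push/pull flow described above can be sketched with a stand-in class. This is a minimal illustration only: `FakeLiveSession` is hypothetical and does no transcription, and the method names `Start`, `Append`, `TryGetNext`, and `Stop` are borrowed from the commit message in this PR, while their signatures here are assumptions rather than the SDK's actual declarations.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for illustration; NOT the SDK's LiveAudioTranscriptionSession.
// It only models the call order: Start -> Append (repeatedly) -> TryGetNext -> Stop.
class FakeLiveSession {
public:
    void Start() {
        if (state_ != State::NotStarted) throw std::logic_error("session already started");
        state_ = State::Running;
    }

    // Accepts one chunk of 16-bit PCM samples and queues a fake partial result.
    void Append(const std::vector<std::int16_t>& pcm) {
        if (state_ == State::NotStarted) throw std::logic_error("session not started");
        if (state_ == State::Stopped) throw std::logic_error("session already stopped");
        results_.push_back("partial: " + std::to_string(pcm.size()) + " samples");
    }

    // Non-blocking pull: returns the next queued result, or nullopt if none is ready.
    std::optional<std::string> TryGetNext() {
        if (results_.empty()) return std::nullopt;
        std::string next = std::move(results_.front());
        results_.pop_front();
        return next;
    }

    // Stops the session and queues a fake final result; a no-op unless running.
    void Stop() {
        if (state_ != State::Running) return;
        state_ = State::Stopped;
        results_.push_back("final");
    }

private:
    enum class State { NotStarted, Running, Stopped };
    State state_ = State::NotStarted;
    std::deque<std::string> results_;
};
```

A caller would typically invoke `Append` from the audio capture callback and poll `TryGetNext` from a separate consumer loop; the real session presumably decouples the two with queues and a worker thread, which this single-threaded stand-in omits.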

Agent-Logs-Url: https://github.com/microsoft/Foundry-Local/sessions/25aafe73-46df-4a26-a235-d2d9bfbd05b5 Co-authored-by: rui-ren <15321482+rui-ren@users.noreply.github.com>

…n-example/app.js Co-authored-by: rui-ren <15321482+rui-ren@users.noreply.github.com>

Add LiveAudioTranscriptionSession for real-time PCM audio streaming with thread-safe push/pull queues, binary FFI interop, and async worker thread.

New files:
- openai_live_audio_types.h/.cpp: Response/options/error types with JSON parsing
- openai_live_audio_client.h/.cpp: Session class with Start/Append/TryGetNext/Stop
- thread_safe_queue.h: Bounded thread-safe queue with close/error semantics
- live_audio_test.cpp: Unit tests using MockCore pattern

Modified files:
- flcore_native.h: Add StreamingRequestBuffer and execute_command_with_binary_fn
- foundry_local_internal_core.h: Add callWithBinary() to IFoundryLocalCore
- core.h: Implement callWithBinary() in Core, load new FFI export
- openai_audio_client.h/.cpp: Add CreateLiveTranscriptionSession() factory
- foundry_local.h: Include new public headers
- mock_core.h: Add callWithBinary() override to MockCore and FileBackedCore
- CMakeLists.txt: Add new source and test files

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
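For readers unfamiliar with the pattern, a bounded thread-safe queue with close semantics might look roughly like the sketch below. It is an assumption-laden illustration, not the contents of thread_safe_queue.h: the class name `BoundedQueue` and its method signatures are hypothetical, and the error-propagation half of "close/error semantics" is left out for brevity.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <utility>

// Hypothetical sketch of a bounded, closable queue; not the SDK's implementation.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

    // Blocks while the queue is full. Returns false if the queue has been closed.
    bool Push(T value) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return closed_ || items_.size() < capacity_; });
        if (closed_) return false;
        items_.push_back(std::move(value));
        not_empty_.notify_one();
        return true;
    }

    // Never blocks. Returns false if the queue is full or closed.
    bool TryPush(T value) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (closed_ || items_.size() >= capacity_) return false;
        items_.push_back(std::move(value));
        not_empty_.notify_one();
        return true;
    }

    // Blocks until an item is available. Returns nullopt once closed and drained.
    std::optional<T> Pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return std::nullopt;
        T value = std::move(items_.front());
        items_.pop_front();
        not_full_.notify_one();
        return value;
    }

    // Rejects further pushes and wakes every waiter; queued items stay poppable.
    void Close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        not_full_.notify_all();
        not_empty_.notify_all();
    }

private:
    const std::size_t capacity_;
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::deque<T> items_;
    bool closed_ = false;
};
```

In this sketch `Close()` lets consumers drain what is already queued, so a blocked producer or consumer thread always wakes up and can be joined safely afterwards.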


Copilot

Pull request overview

This PR introduces a new C++ SDK surface for Nemotron-ASR live/streaming transcription on top of the Foundry Local native core, plus updates Python SDK pinned core package versions.

Changes:

Add C++ CoreInterop dynamic loader/FFI wrappers including audio stream start/push/stop commands.
Add C++ LiveAudioTranscriptionSession + supporting types and a bounded thread-safe queue to support push/pull streaming transcription.
Add C++ unit/E2E tests and update Python requirements to a newer foundry-local-core* build.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 8 comments.

Show a summary per file

File	Description
sdk/python/requirements.txt	Bumps pinned `foundry-local-core` version.
sdk/python/requirements-winml.txt	Bumps pinned `foundry-local-core-winml` version.
sdk/cpp/CMakeLists.txt	Adds new C++ SDK library + tests + optional E2E target.
sdk/cpp/README.md	Documents C++ live audio transcription API and build steps.
sdk/cpp/include/foundry_local/thread_safe_queue.h	Adds bounded thread-safe queue primitive used by streaming session.
sdk/cpp/include/foundry_local/live_audio_transcription_types.h	Adds streaming transcription response/options/error types.
sdk/cpp/include/foundry_local/live_audio_transcription_session.h	Declares streaming session API (start/append/try_get_next/stop).
sdk/cpp/include/foundry_local/foundry_local_exception.h	Adds SDK exception type.
sdk/cpp/include/foundry_local/core_interop_types.h	Defines FFI structs and managed request/response types.
sdk/cpp/include/foundry_local/core_interop.h	Declares dynamic loader + command execution + audio streaming helpers.
sdk/cpp/include/foundry_local/audio_client.h	Adds `AudioClient` factory for live transcription sessions.
sdk/cpp/src/core_interop_types.cpp	Implements JSON serialization for request params.
sdk/cpp/src/core_interop.cpp	Implements dynamic loading and command invocation wrappers.
sdk/cpp/src/live_audio_transcription_types.cpp	Implements JSON parsing for transcription and error responses.
sdk/cpp/src/live_audio_transcription_session.cpp	Implements streaming session lifecycle, queues, and push loop thread.
sdk/cpp/src/audio_client.cpp	Implements `AudioClient` constructor and session factory.
sdk/cpp/tests/test_thread_safe_queue.cpp	Unit tests for queue semantics (bounded/unbounded, close/error, concurrency).
sdk/cpp/tests/test_live_audio_transcription_types.cpp	Unit tests for JSON parsing and default option values.
sdk/cpp/tests/test_live_audio_transcription_session.cpp	Unit tests for session state guards, lifecycle, concurrency, and error handling.
sdk/cpp/tests/test_e2e_live_audio.cpp	Optional E2E test against real core library + model assets.


…eam-cpp

The JS live-audio-transcription-example/app.js file was a leftover from the initial implementation and is unrelated to the C++ SDK changes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

1. Revert accidental encoding change in core.h line 4 (kunal-vaishnavi)
2. Remove TryAppend/TryAppendFor; keep only Append() for parity with C# (kunal-vaishnavi)
3. Parse final transcription response from audio_stream_stop and enqueue it (bmehta001)
4. Change TryPush to Push in PushWorkerLoop to avoid dropping results (bmehta001)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 8 comments.


- Fix potential deadlock: close resultQueue before joining pushThread in StopInternal, store final response in member variable instead of pushing to closed queue. TryGetNext returns it after queue drains.
- Use TryPush in PushWorkerLoop to prevent worker blocking on full result queue (log warning on drop instead of deadlocking).
- Validate push_queue_capacity > 0 before Start() to prevent hang/DoS.
- Add bounds check for size_t to int32_t cast in callWithBinary.
- Improve error messages: distinguish not-started vs already-stopped.
- Fall back to raw response.error when parsed CoreErrorResponse.message is empty.
- Mark CreateLiveTranscriptionSession() as const.
- Add tests: AppendAfterStopThrows, Start_InvalidCapacityThrows.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
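The size_t to int32_t bounds check mentioned in this commit could be written along the following lines. This is a sketch under assumptions: the helper name `CheckedSizeToInt32` and the choice of `std::length_error` are hypothetical, and the PR's actual check inside callWithBinary may be structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper (name not taken from the PR): converts a buffer length to
// the 32-bit signed length an FFI struct might carry, rejecting values that
// would otherwise overflow or wrap to a negative number.
inline std::int32_t CheckedSizeToInt32(std::size_t n) {
    constexpr auto kMax =
        static_cast<std::size_t>(std::numeric_limits<std::int32_t>::max());
    if (n > kMax) {
        throw std::length_error("buffer too large for a 32-bit length field");
    }
    return static_cast<std::int32_t>(n);
}
```

Failing before the native call keeps a truncated or negative length from ever reaching the core library.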

## Summary

Add Nemotron live-audio transcription samples across JS, C#, Python, Rust, and C++ in their language-specific sample folders.

## What’s included

### JavaScript

- Updated `samples/js/live-audio-transcription-example/app.js`
- Synced to the final PR #588 behavior:
  - single-copy buffer handling in audio callback
  - improved queue/backpressure stability behavior retained

### C#

- Updated `samples/cs/live-audio-transcription-example/Program.cs`
- Uses spinner-based EP registration flow for consistency with other C# samples

### Python

- Added new sample:
  - `samples/python/live-audio-transcription/src/app.py`
  - `samples/python/live-audio-transcription/requirements.txt`
- Implements live microphone transcription with Nemotron (`create_live_transcription_session` pattern)

### Rust

- Added new sample:
  - `samples/rust/live-audio-transcription-example/src/main.rs`
  - `samples/rust/live-audio-transcription-example/Cargo.toml`
  - `samples/rust/live-audio-transcription-example/README.md`
- Added listing entry in `samples/rust/README.md`

### C++

- Added new sample:
  - `samples/cpp/live-audio-transcription-example/main.cpp`
  - `samples/cpp/live-audio-transcription-example/README.md`
- Sample is based on the live-audio C++ API surface introduced in PR #655

## Notes

- Only sample-related files are included.
- Unrelated local artifacts (e.g. `.tgz`, local temp folders) were intentionally excluded.

---------

Co-authored-by: ruiren_microsoft <ruiren@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: samkemp <samkemp@microsoft.com>

ruiren_microsoft and others added 5 commits April 3, 2026 21:23

update app.js

d2ef88d

fix: eliminate redundant PCM buffer copy in audio data handler

e31d9a1

Agent-Logs-Url: https://github.com/microsoft/Foundry-Local/sessions/25aafe73-46df-4a26-a235-d2d9bfbd05b5 Co-authored-by: rui-ren <15321482+rui-ren@users.noreply.github.com>

Merge origin/main into ruiren/app-js, keeping live-audio-transcriptio…

4eb318d

…n-example/app.js Co-authored-by: rui-ren <15321482+rui-ren@users.noreply.github.com>

Merge branch 'main' into ruiren/app-js

9269c46

Copilot AI review requested due to automatic review settings April 20, 2026 19:31

Copilot started reviewing on behalf of rui-ren April 20, 2026 19:31 View session

Copilot AI reviewed Apr 20, 2026


kunal-vaishnavi requested review from baijumeswani and nenad1002 April 21, 2026 15:38