Add Nemotron-ASR streaming inference to C++ SDK #655
Merged
Agent-Logs-Url: https://github.com/microsoft/Foundry-Local/sessions/25aafe73-46df-4a26-a235-d2d9bfbd05b5 Co-authored-by: rui-ren <15321482+rui-ren@users.noreply.github.com>
…n-example/app.js Co-authored-by: rui-ren <15321482+rui-ren@users.noreply.github.com>
Add LiveAudioTranscriptionSession for real-time PCM audio streaming with thread-safe push/pull queues, binary FFI interop, and async worker thread.

New files:
- openai_live_audio_types.h/.cpp: Response/options/error types with JSON parsing
- openai_live_audio_client.h/.cpp: Session class with Start/Append/TryGetNext/Stop
- thread_safe_queue.h: Bounded thread-safe queue with close/error semantics
- live_audio_test.cpp: Unit tests using MockCore pattern

Modified files:
- flcore_native.h: Add StreamingRequestBuffer and execute_command_with_binary_fn
- foundry_local_internal_core.h: Add callWithBinary() to IFoundryLocalCore
- core.h: Implement callWithBinary() in Core, load new FFI export
- openai_audio_client.h/.cpp: Add CreateLiveTranscriptionSession() factory
- foundry_local.h: Include new public headers
- mock_core.h: Add callWithBinary() override to MockCore and FileBackedCore
- CMakeLists.txt: Add new source and test files

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
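For readers unfamiliar with the "bounded thread-safe queue with close semantics" mentioned above, the following is a minimal sketch of the general pattern. It is not the contents of `thread_safe_queue.h`: the class name `BoundedQueue` and its method signatures are assumptions made for illustration, and error propagation is omitted.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <utility>

// Hypothetical sketch of a bounded queue with close semantics.
// Names and signatures are illustrative, not the SDK's actual API.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0 && "a zero-capacity queue would block every Push");
    }

    // Blocks while the queue is full. Returns false if the queue is closed.
    bool Push(T value) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return closed_ || items_.size() < capacity_; });
        if (closed_) return false;
        items_.push_back(std::move(value));
        not_empty_.notify_one();
        return true;
    }

    // Never blocks. Returns false if the queue is full or closed.
    bool TryPush(T value) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (closed_ || items_.size() >= capacity_) return false;
        items_.push_back(std::move(value));
        not_empty_.notify_one();
        return true;
    }

    // Blocks until an item is available. Returns std::nullopt only once
    // the queue is closed and fully drained.
    std::optional<T> Pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return std::nullopt;
        T value = std::move(items_.front());
        items_.pop_front();
        not_full_.notify_one();
        return value;
    }

    // Rejects further pushes and wakes every waiting producer and consumer.
    void Close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        not_full_.notify_all();
        not_empty_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::deque<T> items_;
    const std::size_t capacity_;
    bool closed_ = false;
};
```

In this sketch a closed queue still hands out items that were queued before `Close()`, so a consumer can drain pending results during shutdown while producers are turned away immediately.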
Pull request overview
This PR introduces a new C++ SDK surface for Nemotron-ASR live/streaming transcription on top of the Foundry Local native core, plus updates Python SDK pinned core package versions.
Changes:
- Add C++ `CoreInterop` dynamic loader/FFI wrappers including audio stream start/push/stop commands.
- Add C++ `LiveAudioTranscriptionSession` + supporting types and a bounded thread-safe queue to support push/pull streaming transcription.
- Add C++ unit/E2E tests and update Python requirements to a newer `foundry-local-core*` build.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| sdk/python/requirements.txt | Bumps pinned foundry-local-core version. |
| sdk/python/requirements-winml.txt | Bumps pinned foundry-local-core-winml version. |
| sdk/cpp/CMakeLists.txt | Adds new C++ SDK library + tests + optional E2E target. |
| sdk/cpp/README.md | Documents C++ live audio transcription API and build steps. |
| sdk/cpp/include/foundry_local/thread_safe_queue.h | Adds bounded thread-safe queue primitive used by streaming session. |
| sdk/cpp/include/foundry_local/live_audio_transcription_types.h | Adds streaming transcription response/options/error types. |
| sdk/cpp/include/foundry_local/live_audio_transcription_session.h | Declares streaming session API (start/append/try_get_next/stop). |
| sdk/cpp/include/foundry_local/foundry_local_exception.h | Adds SDK exception type. |
| sdk/cpp/include/foundry_local/core_interop_types.h | Defines FFI structs and managed request/response types. |
| sdk/cpp/include/foundry_local/core_interop.h | Declares dynamic loader + command execution + audio streaming helpers. |
| sdk/cpp/include/foundry_local/audio_client.h | Adds AudioClient factory for live transcription sessions. |
| sdk/cpp/src/core_interop_types.cpp | Implements JSON serialization for request params. |
| sdk/cpp/src/core_interop.cpp | Implements dynamic loading and command invocation wrappers. |
| sdk/cpp/src/live_audio_transcription_types.cpp | Implements JSON parsing for transcription and error responses. |
| sdk/cpp/src/live_audio_transcription_session.cpp | Implements streaming session lifecycle, queues, and push loop thread. |
| sdk/cpp/src/audio_client.cpp | Implements AudioClient constructor and session factory. |
| sdk/cpp/tests/test_thread_safe_queue.cpp | Unit tests for queue semantics (bounded/unbounded, close/error, concurrency). |
| sdk/cpp/tests/test_live_audio_transcription_types.cpp | Unit tests for JSON parsing and default option values. |
| sdk/cpp/tests/test_live_audio_transcription_session.cpp | Unit tests for session state guards, lifecycle, concurrency, and error handling. |
| sdk/cpp/tests/test_e2e_live_audio.cpp | Optional E2E test against real core library + model assets. |
nenad1002
reviewed
Apr 23, 2026
d505aef to
631eba8
Compare
The JS live-audio-transcription-example/app.js file was a leftover from the initial implementation and is unrelated to the C++ SDK changes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
bmehta001
reviewed
Apr 24, 2026
bmehta001
reviewed
Apr 24, 2026
1. Revert accidental encoding change in core.h line 4 (kunal-vaishnavi)
2. Remove TryAppend/TryAppendFor, keep only Append() to match C# parity (kunal-vaishnavi)
3. Parse final transcription response from audio_stream_stop and enqueue it (bmehta001)
4. Change TryPush to Push in PushWorkerLoop to avoid dropping results (bmehta001)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
kunal-vaishnavi
previously approved these changes
Apr 24, 2026
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 8 comments.
- Fix potential deadlock: close resultQueue before joining pushThread in StopInternal, store final response in member variable instead of pushing to closed queue. TryGetNext returns it after queue drains.
- Use TryPush in PushWorkerLoop to prevent worker blocking on full result queue (log warning on drop instead of deadlocking).
- Validate push_queue_capacity > 0 before Start() to prevent hang/DoS.
- Add bounds check for size_t to int32_t cast in callWithBinary.
- Improve error messages: distinguish not-started vs already-stopped.
- Fall back to raw response.error when parsed CoreErrorResponse.message is empty.
- Mark CreateLiveTranscriptionSession() as const.
- Add tests: AppendAfterStopThrows, Start_InvalidCapacityThrows.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
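The bounds check on the size_t to int32_t cast listed above can be sketched as follows. This is an illustration of the technique only: the helper name `CheckedLengthToInt32` and the choice of `std::length_error` are assumptions, and the actual check inside `callWithBinary` may look different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper: converts a buffer length to the 32-bit signed
// length field an FFI struct might expect, refusing values that would
// otherwise wrap or turn negative in a plain static_cast.
inline std::int32_t CheckedLengthToInt32(std::size_t length) {
    constexpr auto kMax =
        static_cast<std::size_t>(std::numeric_limits<std::int32_t>::max());
    if (length > kMax) {
        throw std::length_error("binary payload length does not fit in int32_t");
    }
    return static_cast<std::int32_t>(length);
}
```

Without such a check, a payload of 2 GiB or more would be passed across the FFI boundary with a negative or truncated length.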
kunal-vaishnavi
approved these changes
Apr 25, 2026
samuel100
added a commit
that referenced
this pull request
Apr 27, 2026
## Summary

Add Nemotron live-audio transcription samples across JS, C#, Python, Rust, and C++ in their language-specific sample folders.

## What’s included

### JavaScript

- Updated `samples/js/live-audio-transcription-example/app.js`
- Synced to the final PR #588 behavior:
  - single-copy buffer handling in audio callback
  - improved queue/backpressure stability behavior retained

### C#

- Updated `samples/cs/live-audio-transcription-example/Program.cs`
- Uses spinner-based EP registration flow for consistency with other C# samples

### Python

- Added new sample:
  - `samples/python/live-audio-transcription/src/app.py`
  - `samples/python/live-audio-transcription/requirements.txt`
- Implements live microphone transcription with Nemotron (`create_live_transcription_session` pattern)

### Rust

- Added new sample:
  - `samples/rust/live-audio-transcription-example/src/main.rs`
  - `samples/rust/live-audio-transcription-example/Cargo.toml`
  - `samples/rust/live-audio-transcription-example/README.md`
- Added listing entry in `samples/rust/README.md`

### C++

- Added new sample:
  - `samples/cpp/live-audio-transcription-example/main.cpp`
  - `samples/cpp/live-audio-transcription-example/README.md`
- Sample is based on the live-audio C++ API surface introduced in PR #655

## Notes

- Only sample-related files are included.
- Unrelated local artifacts (e.g. `.tgz`, local temp folders) were intentionally excluded.

---------

Co-authored-by: ruiren_microsoft <ruiren@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: samkemp <samkemp@microsoft.com>
Add Nemotron-ASR streaming inference to C++ SDK
Description
Adds real-time audio streaming support to the Foundry Local C++ SDK, enabling live microphone-to-text transcription via ONNX Runtime GenAI's StreamingProcessor API (Nemotron ASR).
This is the C++ port of C# PR #485 with full feature parity. The existing `AudioClient` only supports file-based transcription. This PR introduces `LiveAudioTranscriptionSession`, which accepts continuous PCM audio chunks (e.g., from a microphone) and returns partial/final transcription results as a synchronous generator.
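The push/pull lifecycle described here (start, append PCM chunks, poll for results, stop) can be sketched with a stand-in class. `FakeLiveSession` below is hypothetical and single-threaded: it only mimics the Start/Append/TryGetNext/Stop method names mentioned in this PR, its signatures and result strings are assumptions, and it performs no transcription.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Stand-in for illustration only; NOT the SDK's LiveAudioTranscriptionSession.
// It reports how many PCM bytes it has received instead of transcribing them.
class FakeLiveSession {
public:
    void Start() {
        if (state_ != State::NotStarted) {
            throw std::logic_error("session already started");
        }
        state_ = State::Running;
    }

    // Accepts one chunk of raw PCM bytes and queues a partial result.
    void Append(const std::vector<std::uint8_t>& pcm_chunk) {
        if (state_ == State::NotStarted) {
            throw std::logic_error("session not started");
        }
        if (state_ == State::Stopped) {
            throw std::logic_error("session already stopped");
        }
        bytes_seen_ += pcm_chunk.size();
        results_.push_back("partial: " + std::to_string(bytes_seen_) + " bytes");
    }

    // Non-blocking pull: returns the next queued result, if any.
    std::optional<std::string> TryGetNext() {
        if (results_.empty()) return std::nullopt;
        std::string next = std::move(results_.front());
        results_.pop_front();
        return next;
    }

    // Ends the session and queues one final result; repeated calls are no-ops.
    void Stop() {
        if (state_ != State::Running) return;
        state_ = State::Stopped;
        results_.push_back("final: " + std::to_string(bytes_seen_) + " bytes");
    }

private:
    enum class State { NotStarted, Running, Stopped };
    State state_ = State::NotStarted;
    std::size_t bytes_seen_ = 0;
    std::deque<std::string> results_;
};
```

A caller would typically call `Start()`, feed chunks from an audio callback through `Append()`, poll `TryGetNext()` until it returns an empty optional, and call `Stop()` to receive the final result. The separate not-started and already-stopped errors mirror the distinction called out in the review fixes.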