
Add Nemotron-ASR streaming inference to C++ SDK #655

Merged
rui-ren merged 9 commits into main from ruiren/live-audio-stream-cpp
Apr 25, 2026


@rui-ren rui-ren commented Apr 20, 2026

Add Nemotron-ASR streaming inference to C++ SDK

Description

Adds real-time audio streaming support to the Foundry Local C++ SDK, enabling live microphone-to-text transcription via ONNX Runtime GenAI's StreamingProcessor API (Nemotron ASR).

This is the C++ port of C# PR #485 with full feature parity. The existing AudioClient only supports file-based transcription. This PR introduces LiveAudioTranscriptionSession that accepts continuous PCM audio chunks (e.g., from a microphone) and returns partial/final transcription results as a synchronous generator.
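
A rough sketch of the push/pull call pattern this description implies. `FakeLiveSession` is a hypothetical stand-in so the example compiles without the SDK. Only the method names Start, Append, TryGetNext, and Stop come from this PR; every signature, the `TranscriptionResult` struct, and the sample-counting behavior are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Assumed result shape: text plus a partial/final flag.
struct TranscriptionResult {
    std::string text;
    bool is_final = false;
};

// Hypothetical stand-in for LiveAudioTranscriptionSession. It performs no
// recognition; it only reports how many samples it has received so far.
class FakeLiveSession {
public:
    void Start() {
        if (state_ != State::Created) throw std::logic_error("session already started");
        state_ = State::Started;
    }

    // Accepts one chunk of 16-bit PCM samples and queues a partial result.
    void Append(const std::vector<std::int16_t>& pcm) {
        if (state_ == State::Created) throw std::logic_error("session not started");
        if (state_ == State::Stopped) throw std::logic_error("session already stopped");
        total_samples_ += pcm.size();
        results_.push_back({"samples so far: " + std::to_string(total_samples_), false});
    }

    // Non-blocking pull: returns std::nullopt when nothing is pending.
    std::optional<TranscriptionResult> TryGetNext() {
        if (results_.empty()) return std::nullopt;
        TranscriptionResult next = std::move(results_.front());
        results_.pop_front();
        return next;
    }

    // Ends the stream and queues one final result.
    void Stop() {
        if (state_ != State::Started) return;
        state_ = State::Stopped;
        results_.push_back({"total samples: " + std::to_string(total_samples_), true});
    }

private:
    enum class State { Created, Started, Stopped };
    State state_ = State::Created;
    std::size_t total_samples_ = 0;
    std::deque<TranscriptionResult> results_;
};

// Drives one session: push every chunk, stop, then drain all results.
std::vector<TranscriptionResult> TranscribeChunks(
    FakeLiveSession& session,
    const std::vector<std::vector<std::int16_t>>& chunks) {
    session.Start();
    for (const auto& chunk : chunks) session.Append(chunk);
    session.Stop();
    std::vector<TranscriptionResult> out;
    while (auto result = session.TryGetNext()) out.push_back(*result);
    return out;
}
```

In a live setting the appends would come from a microphone callback while another thread polls for results; the sketch runs both on one thread only to keep the call order easy to follow.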

ruiren_microsoft and others added 5 commits April 3, 2026 21:23
…n-example/app.js

Co-authored-by: rui-ren <15321482+rui-ren@users.noreply.github.com>
Add LiveAudioTranscriptionSession for real-time PCM audio streaming with
thread-safe push/pull queues, binary FFI interop, and async worker thread.

New files:
- openai_live_audio_types.h/.cpp: Response/options/error types with JSON parsing
- openai_live_audio_client.h/.cpp: Session class with Start/Append/TryGetNext/Stop
- thread_safe_queue.h: Bounded thread-safe queue with close/error semantics
- live_audio_test.cpp: Unit tests using MockCore pattern
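
To make "bounded thread-safe queue with close semantics" concrete, here is a minimal sketch of one way such a queue can be built. `BoundedQueue` and all of its member names are hypothetical; the actual interface in thread_safe_queue.h is not shown on this page and may differ (for example, it also carries error state, which this sketch omits).

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <utility>

// Hypothetical illustration only; not the SDK's thread_safe_queue.h.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

    // Blocks while the queue is full. Returns false if the queue is closed.
    bool Push(T value) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return closed_ || items_.size() < capacity_; });
        if (closed_) return false;
        items_.push_back(std::move(value));
        not_empty_.notify_one();
        return true;
    }

    // Never blocks. Returns false if the queue is full or closed.
    bool TryPush(T value) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (closed_ || items_.size() >= capacity_) return false;
        items_.push_back(std::move(value));
        not_empty_.notify_one();
        return true;
    }

    // Blocks until an item is available. Returns std::nullopt once the
    // queue is closed and every remaining item has been drained.
    std::optional<T> Pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return std::nullopt;
        T value = std::move(items_.front());
        items_.pop_front();
        not_full_.notify_one();
        return value;
    }

    // Wakes every waiter; later pushes fail, pops drain what is left.
    void Close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        not_full_.notify_all();
        not_empty_.notify_all();
    }

private:
    const std::size_t capacity_;
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::deque<T> items_;
    bool closed_ = false;
};
```

The difference between `Push` and `TryPush` is the trade-off the later commits on this PR go back and forth on: a blocking push never drops an item but can stall its caller, while a non-blocking push never stalls but can drop.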

Modified files:
- flcore_native.h: Add StreamingRequestBuffer and execute_command_with_binary_fn
- foundry_local_internal_core.h: Add callWithBinary() to IFoundryLocalCore
- core.h: Implement callWithBinary() in Core, load new FFI export
- openai_audio_client.h/.cpp: Add CreateLiveTranscriptionSession() factory
- foundry_local.h: Include new public headers
- mock_core.h: Add callWithBinary() override to MockCore and FileBackedCore
- CMakeLists.txt: Add new source and test files
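
As an illustration of what passing binary audio across a C boundary involves, the sketch below packs 16-bit PCM samples into a pointer-plus-length view. `BinaryPayloadView`, `CheckedLength`, and `MakePayloadView` are hypothetical names; this is not the layout of StreamingRequestBuffer or the signature of callWithBinary(), neither of which appears on this page. The checked narrowing mirrors the size_t to int32_t bounds check that a later commit in this PR adds to callWithBinary.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical C-style view of a binary payload (assumed layout).
struct BinaryPayloadView {
    const std::uint8_t* data;
    std::int32_t length;  // byte count, as a 32-bit signed value
};

// Narrows size_t to int32_t, throwing instead of silently truncating.
std::int32_t CheckedLength(std::size_t size) {
    if (size > static_cast<std::size_t>(std::numeric_limits<std::int32_t>::max())) {
        throw std::length_error("binary payload exceeds int32_t range");
    }
    return static_cast<std::int32_t>(size);
}

// Views 16-bit PCM samples as raw bytes. The view borrows the vector's
// storage, so the vector must outlive the view.
BinaryPayloadView MakePayloadView(const std::vector<std::int16_t>& pcm) {
    return BinaryPayloadView{
        reinterpret_cast<const std::uint8_t*>(pcm.data()),
        CheckedLength(pcm.size() * sizeof(std::int16_t))};
}
```

Without the check, a buffer larger than 2 GiB would wrap to a negative or truncated length on the receiving side.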

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 20, 2026 19:31


Copilot AI left a comment


Pull request overview

This PR introduces a new C++ SDK surface for Nemotron-ASR live/streaming transcription on top of the Foundry Local native core, plus updates Python SDK pinned core package versions.

Changes:

  • Add C++ CoreInterop dynamic loader/FFI wrappers including audio stream start/push/stop commands.
  • Add C++ LiveAudioTranscriptionSession + supporting types and a bounded thread-safe queue to support push/pull streaming transcription.
  • Add C++ unit/E2E tests and update Python requirements to a newer foundry-local-core* build.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| sdk/python/requirements.txt | Bumps pinned foundry-local-core version. |
| sdk/python/requirements-winml.txt | Bumps pinned foundry-local-core-winml version. |
| sdk/cpp/CMakeLists.txt | Adds new C++ SDK library + tests + optional E2E target. |
| sdk/cpp/README.md | Documents C++ live audio transcription API and build steps. |
| sdk/cpp/include/foundry_local/thread_safe_queue.h | Adds bounded thread-safe queue primitive used by streaming session. |
| sdk/cpp/include/foundry_local/live_audio_transcription_types.h | Adds streaming transcription response/options/error types. |
| sdk/cpp/include/foundry_local/live_audio_transcription_session.h | Declares streaming session API (start/append/try_get_next/stop). |
| sdk/cpp/include/foundry_local/foundry_local_exception.h | Adds SDK exception type. |
| sdk/cpp/include/foundry_local/core_interop_types.h | Defines FFI structs and managed request/response types. |
| sdk/cpp/include/foundry_local/core_interop.h | Declares dynamic loader + command execution + audio streaming helpers. |
| sdk/cpp/include/foundry_local/audio_client.h | Adds AudioClient factory for live transcription sessions. |
| sdk/cpp/src/core_interop_types.cpp | Implements JSON serialization for request params. |
| sdk/cpp/src/core_interop.cpp | Implements dynamic loading and command invocation wrappers. |
| sdk/cpp/src/live_audio_transcription_types.cpp | Implements JSON parsing for transcription and error responses. |
| sdk/cpp/src/live_audio_transcription_session.cpp | Implements streaming session lifecycle, queues, and push loop thread. |
| sdk/cpp/src/audio_client.cpp | Implements AudioClient constructor and session factory. |
| sdk/cpp/tests/test_thread_safe_queue.cpp | Unit tests for queue semantics (bounded/unbounded, close/error, concurrency). |
| sdk/cpp/tests/test_live_audio_transcription_types.cpp | Unit tests for JSON parsing and default option values. |
| sdk/cpp/tests/test_live_audio_transcription_session.cpp | Unit tests for session state guards, lifecycle, concurrency, and error handling. |
| sdk/cpp/tests/test_e2e_live_audio.cpp | Optional E2E test against real core library + model assets. |


Comment thread sdk/cpp/tests/test_thread_safe_queue.cpp Outdated
Comment thread sdk/cpp/README.md Outdated
Comment thread sdk/python/requirements.txt Outdated
Comment thread sdk/cpp/src/core_interop.cpp Outdated
Comment thread sdk/cpp/src/live_audio_transcription_session.cpp Outdated
Comment thread sdk/cpp/src/core_interop.cpp Outdated
Comment thread sdk/cpp/tests/test_e2e_live_audio.cpp Outdated
Comment thread sdk/cpp/tests/test_e2e_live_audio.cpp Outdated
Comment thread sdk/cpp/include/foundry_local/live_audio_transcription_session.h Outdated
Comment thread sdk/cpp/tests/test_e2e_live_audio.cpp Outdated
Comment thread sdk/cpp/tests/test_e2e_live_audio.cpp Outdated
Comment thread sdk/cpp/tests/test_e2e_live_audio.cpp Outdated
Comment thread sdk/cpp/tests/test_e2e_live_audio.cpp Outdated
Comment thread sdk/cpp/CMakeLists.txt Outdated
Comment thread sdk/cpp/README.md Outdated
Comment thread sdk/cpp/include/foundry_local/core_interop.h Outdated
Comment thread sdk/cpp/include/foundry_local/foundry_local_exception.h Outdated
The JS live-audio-transcription-example/app.js file was a leftover
from the initial implementation and is unrelated to the C++ SDK changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread sdk/cpp/src/core.h
Comment thread sdk/cpp/include/openai/openai_live_audio_client.h Outdated
Comment thread sdk/cpp/src/openai_live_audio_client.cpp Outdated
Comment thread sdk/cpp/src/openai_live_audio_client.cpp Outdated
1. Revert accidental encoding change in core.h line 4 (kunal-vaishnavi)
2. Remove TryAppend/TryAppendFor; keep only Append() to match C# parity (kunal-vaishnavi)
3. Parse final transcription response from audio_stream_stop and enqueue it (bmehta001)
4. Change TryPush to Push in PushWorkerLoop to avoid dropping results (bmehta001)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment


Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 8 comments.



Comment thread sdk/cpp/src/openai_live_audio_client.cpp Outdated
Comment thread sdk/cpp/src/openai_live_audio_client.cpp
Comment thread sdk/cpp/src/core.h
Comment thread sdk/cpp/include/openai/openai_audio_client.h Outdated
Comment thread sdk/cpp/CMakeLists.txt
Comment thread sdk/cpp/src/openai_live_audio_client.cpp Outdated
Comment thread sdk/cpp/src/openai_live_audio_client.cpp Outdated
Comment thread sdk/cpp/src/openai_live_audio_client.cpp Outdated
- Fix potential deadlock: close resultQueue before joining pushThread in
  StopInternal, store final response in member variable instead of pushing
  to closed queue. TryGetNext returns it after queue drains.
- Use TryPush in PushWorkerLoop to prevent worker blocking on full result
  queue (log warning on drop instead of deadlocking).
- Validate push_queue_capacity > 0 before Start() to prevent hang/DoS.
- Add bounds check for size_t to int32_t cast in callWithBinary.
- Improve error messages: distinguish not-started vs already-stopped.
- Fall back to raw response.error when parsed CoreErrorResponse.message
  is empty.
- Mark CreateLiveTranscriptionSession() as const.
- Add tests: AppendAfterStopThrows, Start_InvalidCapacityThrows.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@rui-ren rui-ren enabled auto-merge (squash) April 25, 2026 00:02
@rui-ren rui-ren merged commit 984892e into main Apr 25, 2026
47 checks passed
@rui-ren rui-ren deleted the ruiren/live-audio-stream-cpp branch April 25, 2026 01:48
samuel100 added a commit that referenced this pull request Apr 27, 2026
## Summary

Add Nemotron live-audio transcription samples across JS, C#, Python, Rust, and C++ in their language-specific sample folders.

## What’s included

### JavaScript
- Updated `samples/js/live-audio-transcription-example/app.js`
- Synced to the final PR #588 behavior:
  - single-copy buffer handling in audio callback
  - improved queue/backpressure stability behavior retained

### C#
- Updated `samples/cs/live-audio-transcription-example/Program.cs`
- Uses spinner-based EP registration flow for consistency with other C# samples

### Python
- Added new sample:
  - `samples/python/live-audio-transcription/src/app.py`
  - `samples/python/live-audio-transcription/requirements.txt`
- Implements live microphone transcription with Nemotron (`create_live_transcription_session` pattern)

### Rust
- Added new sample:
  - `samples/rust/live-audio-transcription-example/src/main.rs`
  - `samples/rust/live-audio-transcription-example/Cargo.toml`
  - `samples/rust/live-audio-transcription-example/README.md`
- Added listing entry in `samples/rust/README.md`

### C++
- Added new sample:
  - `samples/cpp/live-audio-transcription-example/main.cpp`
  - `samples/cpp/live-audio-transcription-example/README.md`
- Sample is based on the live-audio C++ API surface introduced in PR #655

## Notes

- Only sample-related files are included.
- Unrelated local artifacts (e.g. `.tgz`, local temp folders) were intentionally excluded.

---------

Co-authored-by: ruiren_microsoft <ruiren@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: samkemp <samkemp@microsoft.com>

6 participants