
Commit 573dbde

rui-ren, ruiren_microsoft, Copilot and samuel100 authored
Add Nemotron samples multi-lang test samples (#672)
## Summary

Add Nemotron live-audio transcription samples across JS, C#, Python, Rust, and C++ in their language-specific sample folders.

## What’s included

### JavaScript

- Updated `samples/js/live-audio-transcription-example/app.js`
- Synced to the final PR #588 behavior:
  - single-copy buffer handling in audio callback
  - improved queue/backpressure stability behavior retained

### C#

- Updated `samples/cs/live-audio-transcription-example/Program.cs`
- Uses spinner-based EP registration flow for consistency with other C# samples

### Python

- Added new sample:
  - `samples/python/live-audio-transcription/src/app.py`
  - `samples/python/live-audio-transcription/requirements.txt`
- Implements live microphone transcription with Nemotron (`create_live_transcription_session` pattern)

### Rust

- Added new sample:
  - `samples/rust/live-audio-transcription-example/src/main.rs`
  - `samples/rust/live-audio-transcription-example/Cargo.toml`
  - `samples/rust/live-audio-transcription-example/README.md`
- Added listing entry in `samples/rust/README.md`

### C++

- Added new sample:
  - `samples/cpp/live-audio-transcription-example/main.cpp`
  - `samples/cpp/live-audio-transcription-example/README.md`
- Sample is based on the live-audio C++ API surface introduced in PR #655

## Notes

- Only sample-related files are included.
- Unrelated local artifacts (e.g. `.tgz`, local temp folders) were intentionally excluded.

---------

Co-authored-by: ruiren_microsoft <ruiren@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: samkemp <samkemp@microsoft.com>
1 parent c94bb03 commit 573dbde

18 files changed

Lines changed: 1494 additions & 3 deletions
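The JavaScript bullets in the commit message describe changes to `app.js`, which this page does not show. As a language-neutral illustration only (a sketch under that caveat, not the JS code), the two ideas they name can be expressed in C++: copy the audio callback's borrowed buffer exactly once, then hand that copy to a bounded queue that drops its oldest chunk when full. `EnqueueSingleCopy` and `Chunk` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

using Chunk = std::vector<std::uint8_t>;

// Hypothetical helper (not from the SDK or the JS sample): copy a borrowed
// callback buffer exactly once, then move that copy into a bounded queue.
// When the queue is full, the oldest chunk is dropped so latency stays bounded.
// Returns true if a chunk was dropped to make room.
bool EnqueueSingleCopy(std::deque<Chunk>& queue, std::size_t cap,
                       const std::uint8_t* input, std::size_t byteCount) {
    assert(cap > 0);
    if (input == nullptr || byteCount == 0) {
        return false;  // nothing captured, nothing to enqueue
    }
    Chunk chunk(input, input + byteCount);  // the only copy of the audio bytes
    bool dropped = false;
    if (queue.size() >= cap) {
        queue.pop_front();  // drop-oldest backpressure policy
        dropped = true;
    }
    queue.push_back(std::move(chunk));  // transfers the buffer; no second copy
    return dropped;
}
```

`main.cpp` applies the same drop-oldest policy in `AudioQueue::Push`, with a cap of 100 chunks and a one-time warning on the first drop. The trade-off is that a stalled consumer loses the oldest audio instead of blocking the capture callback.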


samples/cpp/live-audio-transcription-example/README.md

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
# Live Audio Transcription Example (C++)

Demonstrates real-time microphone-to-text transcription using the Foundry Local C++ SDK.

Uses [PortAudio](http://www.portaudio.com/) for cross-platform microphone capture
(the C/C++ equivalent of `naudiodon2` used by the JS sample). If the sample is built
without PortAudio, or the microphone cannot be opened, it falls back to a synthetic
440 Hz test tone.

## Build

```bash
# With PortAudio (live microphone)
g++ -std=c++20 -pthread -DHAS_PORTAUDIO main.cpp -lfoundry_local -lportaudio -o live-audio-transcription-example

# Without PortAudio (synthetic audio only)
g++ -std=c++20 -pthread main.cpp -lfoundry_local -o live-audio-transcription-example
```

## Run

```bash
# Live microphone (requires a build with -DHAS_PORTAUDIO)
./live-audio-transcription-example

# Synthetic 440 Hz sine wave (no microphone needed)
./live-audio-transcription-example --synth
```
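The README does not spell out the audio format, but `main.cpp` configures the session for 16 kHz, mono, 16-bit PCM. The constexpr helpers below (hypothetical names, not part of the SDK) are a small sketch for sizing buffers in that format. They show that the synthetic path's 3200-byte appends are 100 ms each, while the PortAudio stream's `framesPerBuffer` of 3200 frames is 6400 bytes, or 200 ms per callback.

```cpp
#include <cstddef>

// Format configured in main.cpp: 16 kHz sample rate, mono, 16-bit samples.
constexpr std::size_t kSampleRate = 16000;
constexpr std::size_t kChannels = 1;
constexpr std::size_t kBytesPerSample = 2;
constexpr std::size_t kBytesPerFrame = kChannels * kBytesPerSample;

// Hypothetical helpers for sizing PCM buffers in this format.
constexpr std::size_t FramesToBytes(std::size_t frames) {
    return frames * kBytesPerFrame;
}

// Exact at 16 kHz, since every millisecond holds a whole number of frames.
constexpr std::size_t MsToBytes(std::size_t ms) {
    return kSampleRate * ms / 1000 * kBytesPerFrame;
}

// Integer division: truncates to whole milliseconds.
constexpr std::size_t BytesToMs(std::size_t bytes) {
    return bytes / kBytesPerFrame * 1000 / kSampleRate;
}

// Synthetic path: 16000 / 10 * 2 = 3200 bytes per Append, i.e. 100 ms.
static_assert(MsToBytes(100) == 3200);
// PortAudio path: framesPerBuffer = 3200 frames = 6400 bytes = 200 ms.
static_assert(FramesToBytes(3200) == 6400);
static_assert(BytesToMs(FramesToBytes(3200)) == 200);
```

With these numbers, the 2-second synthetic tone is 64000 bytes, pushed as 20 appends of 100 ms each.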
samples/cpp/live-audio-transcription-example/main.cpp

Lines changed: 269 additions & 0 deletions
@@ -0,0 +1,269 @@
// Live Audio Transcription — Foundry Local C++ SDK Example
//
// Demonstrates real-time microphone-to-text using the C++ SDK.
// Uses PortAudio for cross-platform mic capture (like naudiodon2 in the JS sample).
// Falls back to synthetic PCM if PortAudio is unavailable, the microphone cannot
// be opened, or --synth is passed.
//
// Requires: Foundry Local C++ SDK. PortAudio (libportaudio) is optional.
//
// Usage: ./live-audio-transcription-example [--synth]

#include <algorithm>
#include <atomic>
#include <chrono>
#include <climits>
#include <cmath>
#include <csignal>
#include <cstdint>
#include <deque>
#include <iostream>
#include <mutex>
#include <stdexcept>
#include <string>
#include <thread>
#include <utility>
#include <vector>

#include "foundry_local.h"

// PortAudio is optional — compile with -DHAS_PORTAUDIO and link -lportaudio
// to enable live microphone capture.
#ifdef HAS_PORTAUDIO
#include <portaudio.h>
#endif

namespace {

// Global flag for Ctrl+C graceful shutdown (mirrors JS process.on('SIGINT'))
std::atomic<bool> g_running{true};

// Writing the flag from a signal handler is only safe if the atomic is lock-free.
static_assert(std::atomic<bool>::is_always_lock_free,
              "g_running must be lock-free to be written from a signal handler");

void SignalHandler(int /*signum*/) {
    g_running = false;
}

// Bounded audio queue (mirrors JS appendQueue with cap of 100)
class AudioQueue {
public:
    void Push(std::vector<uint8_t> chunk) {
        std::lock_guard<std::mutex> lock(mu_);
        if (queue_.size() >= kMaxSize) {
            queue_.pop_front();
            if (!warnedDrop_) {
                warnedDrop_ = true;
                std::cerr << "Audio append queue overflow; dropping oldest chunk to keep stream alive." << std::endl;
            }
        }
        queue_.push_back(std::move(chunk));
    }

    bool TryPop(std::vector<uint8_t>& out) {
        std::lock_guard<std::mutex> lock(mu_);
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

private:
    static constexpr size_t kMaxSize = 100;
    std::deque<std::vector<uint8_t>> queue_;
    std::mutex mu_;
    bool warnedDrop_ = false;
};

std::vector<uint8_t> GenerateSineWavePcm(int sampleRate, int durationSeconds, double frequencyHz) {
    const auto totalSamples = static_cast<size_t>(sampleRate * durationSeconds);
    std::vector<uint8_t> pcm(totalSamples * 2, 0);  // 16-bit mono, little-endian

    for (size_t i = 0; i < totalSamples; ++i) {
        const double t = static_cast<double>(i) / static_cast<double>(sampleRate);
        const auto sample = static_cast<int16_t>(
            static_cast<double>(INT16_MAX) * 0.5 * std::sin(2.0 * 3.14159265358979323846 * frequencyHz * t));
        const auto encodedSample = static_cast<uint16_t>(sample);
        pcm[i * 2] = static_cast<uint8_t>(encodedSample & 0xFF);
        pcm[i * 2 + 1] = static_cast<uint8_t>((encodedSample >> 8) & 0xFF);
    }
    return pcm;
}

#ifdef HAS_PORTAUDIO
// PortAudio callback — captures 16-bit mono PCM and pushes to the queue.
// Note: allocating and locking here keeps the sample simple but is not
// real-time safe; PortAudio's documentation advises against both in callbacks.
int PaCallback(const void* input, void* /*output*/,
               unsigned long frameCount,
               const PaStreamCallbackTimeInfo* /*timeInfo*/,
               PaStreamCallbackFlags /*statusFlags*/,
               void* userData) {
    auto* queue = static_cast<AudioQueue*>(userData);
    if (input != nullptr) {  // defensive: skip callbacks without an input buffer
        const auto* pcm = static_cast<const uint8_t*>(input);
        const size_t byteCount = frameCount * 2;  // 16-bit mono = 2 bytes per frame
        std::vector<uint8_t> chunk(pcm, pcm + byteCount);
        queue->Push(std::move(chunk));
    }
    return g_running ? paContinue : paComplete;
}
#endif

}  // namespace

int main(int argc, char* argv[]) {
    bool useSynth = false;
    for (int i = 1; i < argc; ++i) {
        if (std::string(argv[i]) == "--synth") useSynth = true;
    }

    // Install Ctrl+C handler (mirrors JS process.on('SIGINT'))
    std::signal(SIGINT, SignalHandler);

    try {
        std::cout << "===========================================================" << std::endl;
        std::cout << " Foundry Local -- Live Audio Transcription Demo (C++)" << std::endl;
        std::cout << "===========================================================" << std::endl;
        std::cout << std::endl;

        foundry_local::Configuration config;
        config.appName = "foundry_local_samples";

        foundry_local::Manager::Create(config);
        auto& manager = foundry_local::Manager::Instance();
        manager.EnsureEpsDownloaded();

        auto& catalog = manager.GetCatalog();
        auto* model = catalog.GetModel("nemotron-speech-streaming-en-0.6b");
        if (!model) {
            throw std::runtime_error("Model \"nemotron-speech-streaming-en-0.6b\" not found in catalog");
        }

        std::cout << "Downloading model (if needed)..." << std::endl;
        model->Download([](float pct) {
            std::cout << "\rDownloading: " << pct << "% " << std::flush;
        });
        std::cout << std::endl;
        std::cout << "Loading model..." << std::endl;
        model->Load();
        std::cout << "Model loaded" << std::endl;

        // NOTE: CreateLiveTranscriptionSession() is not yet available in the C++ SDK.
        // The audio client and session code below is forward-looking.
        foundry_local::OpenAIAudioClient audioClient(*model);
        auto session = audioClient.CreateLiveTranscriptionSession();

        session->Settings().sample_rate = 16000;
        session->Settings().channels = 1;
        session->Settings().bits_per_sample = 16;
        session->Settings().language = "en";
        session->Start();
        std::cout << "Session started" << std::endl;

        // Read transcription results in a background thread (mirrors JS readPromise).
        // Note: if anything below throws before readThread.join() runs, destroying
        // the still-joinable std::thread calls std::terminate.
        std::thread readThread([&session]() {
            foundry_local::LiveAudioTranscriptionResponse result;
            while (g_running) {
                const auto status = session->TryGetNext(result, std::chrono::milliseconds(500));
                if (status == foundry_local::TranscriptionStatus::Result) {
                    if (result.is_final) {
                        std::cout << "\n [FINAL] " << result.text << std::endl;
                    } else if (!result.text.empty()) {
                        std::cout << result.text << std::flush;
                    }
                } else if (status == foundry_local::TranscriptionStatus::Closed) {
                    break;
                } else if (status == foundry_local::TranscriptionStatus::Timeout) {
                    continue;
                } else {
                    std::cerr << "Transcription stream error: " << session->GetErrorMessage() << std::endl;
                    break;
                }
            }
        });

        // --- Microphone capture (mirrors JS naudiodon2 section) ---
        // Uses PortAudio for cross-platform audio capture. If PortAudio is not
        // available, the microphone cannot be opened, or --synth is passed,
        // falls back to synthetic PCM.

        bool micActive = false;

#ifdef HAS_PORTAUDIO
        PaStream* paStream = nullptr;
        AudioQueue audioQueue;

        if (!useSynth) {
            PaError err = Pa_Initialize();
            const bool paInitialized = (err == paNoError);
            if (paInitialized) {
                PaStreamParameters inputParams{};
                inputParams.device = Pa_GetDefaultInputDevice();
                if (inputParams.device == paNoDevice) {
                    err = paDeviceUnavailable;  // no default input device
                } else {
                    inputParams.channelCount = 1;
                    inputParams.sampleFormat = paInt16;
                    inputParams.suggestedLatency =
                        Pa_GetDeviceInfo(inputParams.device)->defaultLowInputLatency;
                    inputParams.hostApiSpecificStreamInfo = nullptr;

                    // framesPerBuffer=3200 matches JS framesPerBuffer setting
                    // (3200 frames at 16 kHz is 200 ms of audio per callback)
                    err = Pa_OpenStream(&paStream, &inputParams, nullptr,
                                        16000, 3200, paClipOff,
                                        PaCallback, &audioQueue);
                    if (err == paNoError) {
                        err = Pa_StartStream(paStream);
                        if (err != paNoError) {
                            Pa_CloseStream(paStream);  // opened but could not start
                            paStream = nullptr;
                        }
                    }
                }
            }

            if (err == paNoError && paStream) {
                micActive = true;
                std::cout << std::endl;
                std::cout << "===========================================================" << std::endl;
                std::cout << " LIVE TRANSCRIPTION ACTIVE" << std::endl;
                std::cout << " Speak into your microphone." << std::endl;
                std::cout << " Press Ctrl+C to stop." << std::endl;
                std::cout << "===========================================================" << std::endl;
                std::cout << std::endl;

                // Pump audio from the queue to the session (mirrors JS pumpAudio)
                while (g_running) {
                    std::vector<uint8_t> chunk;
                    if (audioQueue.TryPop(chunk)) {
                        session->Append(chunk.data(), chunk.size());
                    } else {
                        std::this_thread::sleep_for(std::chrono::milliseconds(10));
                    }
                }

                Pa_StopStream(paStream);
                Pa_CloseStream(paStream);
            } else {
                std::cerr << "Could not initialize microphone: "
                          << Pa_GetErrorText(err) << std::endl;
                std::cerr << "Falling back to synthetic audio test..." << std::endl;
                std::cerr << std::endl;
            }

            if (paInitialized) {
                Pa_Terminate();  // pairs only with a successful Pa_Initialize()
            }
        }
#endif

        // Fallback: push synthetic PCM (440Hz sine wave) — mirrors JS catch block
        if (!micActive) {
            std::cout << "Pushing synthetic audio (440Hz sine, 2s)..." << std::endl;
            const auto pcm = GenerateSineWavePcm(16000, 2, 440.0);
            const size_t chunkSize = static_cast<size_t>(16000 / 10 * 2);  // 100ms
            for (size_t offset = 0; offset < pcm.size() && g_running; offset += chunkSize) {
                const size_t len = std::min(chunkSize, pcm.size() - offset);
                session->Append(pcm.data() + offset, len);
                std::this_thread::sleep_for(std::chrono::milliseconds(100));
            }
            std::cout << "Synthetic audio pushed" << std::endl;

            // Wait briefly for remaining transcription results
            std::this_thread::sleep_for(std::chrono::seconds(3));
        }

        // Graceful shutdown (mirrors JS SIGINT handler)
        std::cout << "\n\nStopping..." << std::endl;
        session->Stop();
        readThread.join();
        model->Unload();
        foundry_local::Manager::Destroy();
        std::cout << "Done" << std::endl;
        return 0;
    } catch (const std::exception& ex) {
        std::cerr << "Error: " << ex.what() << std::endl;
        foundry_local::Manager::Destroy();
        return 1;
    }
}

samples/cs/Directory.Packages.props

Lines changed: 2 additions & 2 deletions
@@ -7,8 +7,8 @@
     <PackageVersion Include="Microsoft.AI.Foundry.Local" Version="*-*" />
     <PackageVersion Include="Microsoft.AI.Foundry.Local.WinML" Version="*-*" />
     <PackageVersion Include="Betalgo.Ranul.OpenAI" Version="9.2.0" />
-    <PackageVersion Include="Microsoft.Extensions.Logging" Version="9.0.10" />
-    <PackageVersion Include="Microsoft.Extensions.Logging.Console" Version="9.0.10" />
+    <PackageVersion Include="Microsoft.Extensions.Logging" Version="9.0.15" />
+    <PackageVersion Include="Microsoft.Extensions.Logging.Console" Version="9.0.15" />
     <PackageVersion Include="NAudio" Version="2.2.1" />
     <PackageVersion Include="OpenAI" Version="2.5.0" />
   </ItemGroup>
Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>

  <!-- Windows: target Windows SDK for WinML hardware acceleration -->
  <PropertyGroup Condition="$([MSBuild]::IsOSPlatform('Windows'))">
    <TargetFramework>net9.0-windows10.0.26100</TargetFramework>
    <WindowsAppSDKSelfContained>false</WindowsAppSDKSelfContained>
    <Platforms>ARM64;x64</Platforms>
    <WindowsPackageType>None</WindowsPackageType>
    <EnableCoreMrtTooling>false</EnableCoreMrtTooling>
  </PropertyGroup>

  <!-- Non-Windows: standard .NET -->
  <PropertyGroup Condition="!$([MSBuild]::IsOSPlatform('Windows'))">
    <TargetFramework>net9.0</TargetFramework>
  </PropertyGroup>

  <PropertyGroup Condition="'$(RuntimeIdentifier)'==''">
    <RuntimeIdentifier>$(NETCoreSdkRuntimeIdentifier)</RuntimeIdentifier>
  </PropertyGroup>

  <!-- Windows: WinML for hardware acceleration -->
  <ItemGroup Condition="$([MSBuild]::IsOSPlatform('Windows'))">
    <PackageReference Include="Microsoft.AI.Foundry.Local.WinML" />
  </ItemGroup>

  <!-- Non-Windows: standard SDK -->
  <ItemGroup Condition="!$([MSBuild]::IsOSPlatform('Windows'))">
    <PackageReference Include="Microsoft.AI.Foundry.Local" />
  </ItemGroup>

  <!-- Linux GPU support -->
  <ItemGroup Condition="'$(RuntimeIdentifier)' == 'linux-x64'">
    <PackageReference Include="Microsoft.ML.OnnxRuntime.Gpu" />
    <PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI.Cuda" />
  </ItemGroup>

  <!-- Shared utilities -->
  <ItemGroup>
    <Compile Include="../Shared/*.cs" />
  </ItemGroup>

  <!-- Packages -->
  <ItemGroup>
    <PackageReference Include="Microsoft.Extensions.Logging" />
    <PackageReference Include="Microsoft.Extensions.Logging.Console" />
    <PackageReference Include="NAudio" />
  </ItemGroup>

</Project>
