Fixes ios#514
📝 Walkthrough
The PR adds LFM2-VL support in the llama.cpp VLM path, refines VLM token streaming, rewires voice-agent capture and playback around a new mic driver and turn-event bridge, removes the iOS Solutions demo wiring, and updates React Native build and dependency pins.
Changes
- Voice Agent Runtime and Bridge
- LLM and VLM Runtime
- iOS Solutions Demo Removal
- Build and Dependency Alignment
Sequence Diagram(s)
sequenceDiagram
participant StreamConsumer
participant GenerateStream as RunAnywhere+TextGeneration.generateStream
participant NitroModulesGlobalInit
participant NativeLLM as native.llmGenerateStreamProto
StreamConsumer->>GenerateStream: next()
GenerateStream->>NitroModulesGlobalInit: getNitroModulesProxySync()
GenerateStream->>NativeLLM: llmGenerateStreamProto(callback)
NativeLLM-->>GenerateStream: LLMStreamEvent
GenerateStream-->>StreamConsumer: queued event
StreamConsumer->>GenerateStream: return()
GenerateStream->>NativeLLM: llmCancelProto()
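The flow in the diagram, a callback-driven native stream surfaced as an async iterator whose `return()` cancels the native side, could look roughly like the sketch below. This is a minimal illustration under stated assumptions, not the SDK's implementation: `FakeNative`, `startStream`, `cancel`, and `makeStream` are hypothetical stand-ins for `native.llmGenerateStreamProto` and `native.llmCancelProto`.

```typescript
// Hypothetical stand-in for the native module; these names are not the SDK's API.
type StreamEvent = { token: string; isFinal: boolean };

interface FakeNative {
  startStream(onEvent: (e: StreamEvent) => void): void;
  cancel(): Promise<void>;
}

// Bridges a callback-based producer to an async iterator through a queue.
function makeStream(native: FakeNative): AsyncIterableIterator<StreamEvent> {
  const queue: StreamEvent[] = [];
  let waiter: ((r: IteratorResult<StreamEvent>) => void) | null = null;
  let started = false;
  let ended = false;

  const onEvent = (e: StreamEvent): void => {
    if (ended) return; // ignore events that arrive after completion or cancel
    if (e.isFinal) ended = true;
    if (waiter) {
      const resolve = waiter;
      waiter = null;
      resolve({ value: e, done: false });
    } else {
      queue.push(e); // no consumer waiting yet, so buffer the event
    }
  };

  const stream: AsyncIterableIterator<StreamEvent> = {
    [Symbol.asyncIterator]() {
      return stream;
    },
    async next(): Promise<IteratorResult<StreamEvent>> {
      if (!started) {
        started = true; // start the native stream lazily on the first pull
        native.startStream(onEvent);
      }
      const head = queue.shift();
      if (head !== undefined) return { value: head, done: false };
      if (ended) return { value: undefined, done: true };
      return new Promise((resolve) => {
        waiter = resolve;
      });
    },
    async return(): Promise<IteratorResult<StreamEvent>> {
      const wasActive = started && !ended;
      ended = true;
      queue.length = 0;
      if (waiter) {
        const resolve = waiter;
        waiter = null;
        resolve({ value: undefined, done: true });
      }
      if (wasActive) await native.cancel(); // cancel only a live generation
      return { value: undefined, done: true };
    },
  };
  return stream;
}
```

Buffered events are drained before the iterator reports completion, so a final event that arrives before the consumer pulls it is not lost.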
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~90+ minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 3
❌ Failed checks (2 warnings, 1 inconclusive)
✅ Passed checks (2 passed)
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: Null LLM text crash
- Restored the null-safe fallback when copying LLM text for turn lifecycle emission.
- ✅ Fixed: Turn error ends active UI
- Kept per-turn voice errors as messages without transitioning the active session UI into an error state.
Or push these changes by commenting:
@cursor push d245a33033
Preview (d245a33033)
diff --git a/examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Voice/VoiceAgentViewModel.swift b/examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Voice/VoiceAgentViewModel.swift
--- a/examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Voice/VoiceAgentViewModel.swift
+++ b/examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Voice/VoiceAgentViewModel.swift
@@ -561,8 +561,6 @@
case let .error(err):
logger.error("Voice agent error: \(err.message)")
errorMessage = err.message
- sessionState = .error(err.message)
- currentStatus = "Error"
case let .sessionError(err):
logger.error("Voice session error: \(err.message)")
diff --git a/sdk/runanywhere-commons/src/features/voice_agent/voice_agent_proto_abi.cpp b/sdk/runanywhere-commons/src/features/voice_agent/voice_agent_proto_abi.cpp
--- a/sdk/runanywhere-commons/src/features/voice_agent/voice_agent_proto_abi.cpp
+++ b/sdk/runanywhere-commons/src/features/voice_agent/voice_agent_proto_abi.cpp
@@ -397,7 +397,7 @@
}
{
const std::string stt_text(stt.text);
- const std::string llm_text(llm.text);
+ const std::string llm_text(llm.text ? llm.text : "");
pending_emits.emplace_back([handle, stt_text, llm_text]() {
emit_turn_lifecycle(
handle, runanywhere::v1::TURN_LIFECYCLE_EVENT_KIND_AGENT_RESPONSE_COMPLETED,
Reviewed by Cursor Bugbot for commit 6bf113c.
     {
         const std::string stt_text(stt.text);
-        const std::string llm_text(llm.text ? llm.text : "");
+        const std::string llm_text(llm.text);
Null LLM text crash
High Severity
Constructing std::string directly from llm.text removes the prior null guard. If rac_llm_generate succeeds but leaves llm.text null, this undefined behavior can crash during turn lifecycle emission.
    logger.error("Voice agent error: \(err.message)")
    errorMessage = err.message
    sessionState = .error(err.message)
    currentStatus = "Error"
Turn error ends active UI
Medium Severity
Every .error voice event now sets sessionState to .error, so isActive becomes false while streamVoiceAgent() and the SDK mic driver keep running. A single failed turn can show a dead session in the UI even though capture continues.
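The distinction this finding draws, a failed turn versus a failed session, can be sketched as a small state reducer. This is a hedged TypeScript illustration rather than the Swift view model: the `VoiceEvent`, `ViewState`, `reduce`, and `isActive` names are hypothetical, and treating `sessionError` as the only event that ends the session is an assumption.

```typescript
// Hypothetical event and state shapes, loosely modeled on the view model.
type VoiceEvent =
  | { kind: "error"; message: string } // a single turn failed
  | { kind: "sessionError"; message: string }; // the whole session failed (assumed)

type SessionState =
  | { status: "listening" }
  | { status: "error"; message: string };

interface ViewState {
  session: SessionState;
  errorMessage: string | null;
}

function reduce(state: ViewState, event: VoiceEvent): ViewState {
  switch (event.kind) {
    case "error":
      // Surface the message but keep the session active: capture is still running.
      return { ...state, errorMessage: event.message };
    case "sessionError":
      // Only a session-level failure moves the UI into the error state.
      return {
        session: { status: "error", message: event.message },
        errorMessage: event.message,
      };
  }
}

// The UI counts as active unless the session itself has failed.
const isActive = (s: ViewState): boolean => s.session.status !== "error";
```

With this split, one failed turn leaves `isActive` true while the mic driver keeps running, which matches the autofix that drops the `sessionState = .error(...)` assignment from the per-turn case.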
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
sdk/runanywhere-commons/src/features/vlm/vlm_module.cpp (1)
1216-1240: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Avoid truncating cleaned tokens to 511 bytes.
The new scratch buffer caps display_token at 511 bytes. Any longer token is silently truncated before it is appended to ctx->text, counted, and emitted, which corrupts the streamed/output text on that path.
Suggested fix
-    char cleaned[512];
-    const char* display_token = vlm_strip_special_tokens(safe_token, cleaned, sizeof(cleaned));
+    std::string cleaned(std::strlen(safe_token) + 1, '\0');
+    const char* display_token =
+        vlm_strip_special_tokens(safe_token, cleaned.data(), cleaned.size());
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@sdk/runanywhere-commons/src/features/vlm/vlm_module.cpp` around lines 1216 - 1240, Avoid truncating cleaned tokens in the VLM streaming path: the local scratch buffer used by vlm_strip_special_tokens in the token handling block can silently cut display_token to 511 bytes before it is appended to ctx->text, counted, and published. Update the token-cleaning flow in vlm_module.cpp around the display_token/publish_event logic to preserve the full token content, using a dynamically sized or sufficiently sized buffer so dispatch_vlm_stream_event and the generation event receive the complete token text.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/ios/RunAnywhereAI/AGENTS.md`:
- Line 67: Update the navigation table entry for MoreHubView so Voice Keyboard
is clearly marked as iOS-only. In AGENTS.md, adjust the MoreHubView row to
reflect the same platform gating used in the runtime view’s `#if os(iOS)` logic,
so the docs do not imply Voice Keyboard appears on macOS.
In
`@examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Voice/VoiceAgentViewModel.swift`:
- Around line 585-588: The .agentResponseStarted handling in VoiceAgentViewModel
is clearing both assistantResponse and currentTranscript, which removes the
finalized user transcript too early. Update the .agentResponseStarted case to
clear only assistantResponse and leave currentTranscript intact so the .userSaid
transcript remains visible. Use the VoiceAgentViewModel state handling around
.userSaid and .agentResponseStarted to locate the change.
In `@sdk/runanywhere-commons/src/features/voice_agent/voice_agent_proto_abi.cpp`:
- Line 400: `llm.text` is used in `voice_agent_proto_abi.cpp` without
validation, which can crash before the existing error handling and also
propagate a null pointer into TTS. In the `llm_text` construction and the later
synthesis path, add a guard in the `voice_agent` flow to verify `llm.text` is
non-null and non-empty before creating the `std::string` or calling the
TTS/synthesis helpers, and route invalid responses through the existing failure
path in the same block.
In `@sdk/runanywhere-react-native/packages/core/ios/HybridAudioCapture.swift`:
- Around line 143-151: The audio session setup in
HybridAudioCapture.startRecording is incorrectly using audioSession.category as
the only signal for whether to keep full-duplex mode. Add an explicit state flag
in this flow, such as isFullDuplexActive, to track ownership from
activateAudioSession and stopRecording(deactivateSession:), then use that flag
to decide when to apply .record/.measurement. Update the startRecording logic so
plain STT starts reconfigure to measurement mode unless a real voice-agent
session is active, instead of relying on the stale .playAndRecord category.
In `@sdk/runanywhere-react-native/packages/core/ios/HybridAudioPlayback.swift`:
- Around line 182-188: The shared-session reuse check in HybridAudioPlayback’s
audio-session setup is too loose because it keys off AVAudioSession.category
alone, which can remain .playAndRecord after the voice agent has deactivated.
Update the logic around the session configuration path to gate reuse on explicit
active ownership/state from the voice-session controller instead of the stale
category, so the code only skips setCategory(.playback, mode: .default, options:
[.duckOthers]) and setActive(true) when a voice session is truly active. Keep
the ownsSession flag in sync with that explicit state so cleanup still runs
correctly when the agent is no longer holding the session.
In
`@sdk/runanywhere-react-native/packages/core/src/Features/VoiceAgent/VoiceAgentMicDriver.ts`:
- Around line 87-104: The VoiceAgentMicDriver stop/start flow can leave an
in-flight processTurn() from a previous session alive, allowing a stale native
turn to complete after restart and affect the new session. Update
VoiceAgentMicDriver so stop() either awaits or invalidates any outstanding turn
work before returning, and make processTurn()/the turn-completion path check a
session/token or cancellation state that survives start() resetting stopped;
ensure the stale result cannot pass the completion guard and trigger playback
after a new start.
- Around line 74-82: `VoiceAgentMicDriver.start` activates the audio session
before calling `capture.startRecording`, but if recording throws, the session
can stay active because `stop()` may do nothing when recording never started.
Wrap the activation/startRecording sequence in a failure path inside
`VoiceAgentMicDriver` and, on any error from `startRecording`, explicitly clean
up by stopping/deactivating the capture session before rethrowing so the session
is not left active.
In
`@sdk/runanywhere-react-native/packages/core/src/Public/Extensions/LLM/RunAnywhere+TextGeneration.ts`:
- Around line 268-273: Guard the native cancel in the iterator cleanup path so
`return()` only calls `native.llmCancelProto()` when this `LLMStream` still owns
an active generation. Use the stream’s active-state tracking in
`RunAnywhere+TextGeneration` (and its `finish()`/cleanup flow) to skip cancel on
unopened, already-finished, or stale iterators, preventing `return()` from
aborting a newer generation started after the iterator was disposed.
In
`@sdk/runanywhere-react-native/packages/core/src/Public/Extensions/VoiceAgent/RunAnywhere+VoiceAgent.ts`:
- Around line 336-347: The mic startup in VoiceAgentMicDriver is fire-and-forget
and the stream still uses adapter.stream() delegation, which can leave the voice
session hanging on startup failure and is not Hermes-compatible. Update
RunAnywhere+VoiceAgent to await micDriver.start() before entering the stream
consumption path, and replace yield* adapter.stream() in the streaming logic
with an explicit manual iterator.next() loop for async-iterable compatibility.
Make sure the finally block still stops the mic driver after the loop exits.
In
`@sdk/runanywhere-swift/Sources/RunAnywhere/Features/TTS/Services/AudioPlaybackManager.swift`:
- Around line 141-144: Capture the session-ownership decision at playback start
in AudioPlaybackManager by storing the current managesAudioSession value in the
locked State snapshot before calling configureAudioSession(), then use that
stored value during cleanup instead of re-reading the mutable property; update
the start/stop flow in AudioPlaybackManager and its State handling so cleanup
uses the same ownership choice for the whole playback instance.
In
`@sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/VoiceAgent/RunAnywhere+VoiceAgent.swift`:
- Around line 225-236: The mic-driver task in RunAnywhere+VoiceAgent currently
only logs failures from VoiceAgentMicDriver.run() and lets adapter.stream()
continue waiting, which can hang the session. Update the flow around micTask and
the stream loop so the mic task is raced against streaming, and if it exits
unexpectedly with a non-cancellation error, propagate that failure or finish the
continuation immediately. Use the existing symbols VoiceAgentMicDriver.run(),
micTask, and adapter.stream() to locate the logic and make sure a dead mic
session cannot leave the stream pending.
---
Outside diff comments:
In `@sdk/runanywhere-commons/src/features/vlm/vlm_module.cpp`:
- Around line 1216-1240: Avoid truncating cleaned tokens in the VLM streaming
path: the local scratch buffer used by vlm_strip_special_tokens in the token
handling block can silently cut display_token to 511 bytes before it is appended
to ctx->text, counted, and published. Update the token-cleaning flow in
vlm_module.cpp around the display_token/publish_event logic to preserve the full
token content, using a dynamically sized or sufficiently sized buffer so
dispatch_vlm_stream_event and the generation event receive the complete token
text.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: fb7d83eb-8113-45c9-8651-22956b724f3d
⛔ Files ignored due to path filters (2)
- examples/ios/RunAnywhereAI/RunAnywhereAI/Generated/SolutionsYaml.swift is excluded by !**/generated/**
- yarn.lock is excluded by !**/yarn.lock, !**/*.lock
📒 Files selected for processing (30)
- engines/llamacpp/rac_vlm_llamacpp.cpp
- examples/ios/RunAnywhereAI/AGENTS.md
- examples/ios/RunAnywhereAI/RunAnywhereAI/App/ContentView.swift
- examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Solutions/SolutionsView.swift
- examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Voice/VoiceAgentViewModel.swift
- examples/ios/RunAnywhereAI/scripts/sync-solutions-yamls.sh
- examples/ios/RunAnywhereAI/scripts/verify.sh
- examples/react-native/RunAnywhereAI/ios/Podfile
- examples/react-native/RunAnywhereAI/ios/RunAnywhereAI.xcworkspace/xcshareddata/swiftpm/Package.resolved
- examples/react-native/RunAnywhereAI/package.json
- examples/react-native/RunAnywhereAI/src/screens/ChatScreen.tsx
- examples/react-native/RunAnywhereAI/src/screens/STTScreen.tsx
- examples/react-native/RunAnywhereAI/src/screens/TTSScreen.tsx
- examples/react-native/RunAnywhereAI/src/services/ModelCatalogBootstrap.ts
- package.json
- sdk/runanywhere-commons/src/features/vlm/vlm_module.cpp
- sdk/runanywhere-commons/src/features/voice_agent/voice_agent_proto_abi.cpp
- sdk/runanywhere-react-native/package.json
- sdk/runanywhere-react-native/packages/core/ios/HybridAudioCapture.swift
- sdk/runanywhere-react-native/packages/core/ios/HybridAudioPlayback.swift
- sdk/runanywhere-react-native/packages/core/src/Features/VoiceAgent/VoiceAgentMicDriver.ts
- sdk/runanywhere-react-native/packages/core/src/Features/VoiceSession/AudioPlaybackManager.ts
- sdk/runanywhere-react-native/packages/core/src/Public/Extensions/LLM/RunAnywhere+TextGeneration.ts
- sdk/runanywhere-react-native/packages/core/src/Public/Extensions/VoiceAgent/RunAnywhere+VoiceAgent.ts
- sdk/runanywhere-react-native/packages/core/src/native/NitroModulesGlobalInit.ts
- sdk/runanywhere-swift/Sources/RunAnywhere/Features/STT/Services/AudioCaptureManager.swift
- sdk/runanywhere-swift/Sources/RunAnywhere/Features/TTS/Services/AudioPlaybackManager.swift
- sdk/runanywhere-swift/Sources/RunAnywhere/Features/VoiceAgent/Services/VoiceAgentMicDriver.swift
- sdk/runanywhere-swift/Sources/RunAnywhere/Foundation/Bridge/Extensions/CppBridge+ModalityProtoABI.swift
- sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/VoiceAgent/RunAnywhere+VoiceAgent.swift
💤 Files with no reviewable changes (5)
- examples/ios/RunAnywhereAI/scripts/verify.sh
- examples/ios/RunAnywhereAI/RunAnywhereAI/App/ContentView.swift
- examples/react-native/RunAnywhereAI/ios/RunAnywhereAI.xcworkspace/xcshareddata/swiftpm/Package.resolved
- examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Solutions/SolutionsView.swift
- examples/ios/RunAnywhereAI/scripts/sync-solutions-yamls.sh
 | 1 | `VisionHubView` | VLM camera |
 | 2 | `VoiceAssistantView` | Full voice agent (STT + LLM + TTS pipeline) |
-| 3 | `MoreHubView` | RAG, STT, TTS, VAD, Storage, Solutions, Voice Keyboard |
+| 3 | `MoreHubView` | RAG, STT, TTS, VAD, Storage, Voice Keyboard |
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Mark Voice Keyboard as iOS-only in the navigation table.
Line 67 currently reads like MoreHubView exposes Voice Keyboard on every platform, but the runtime view gates that entry behind #if os(iOS). The doc is inaccurate for macOS.
✏️ Suggested doc fix
-| 3 | `MoreHubView` | RAG, STT, TTS, VAD, Storage, Voice Keyboard |
+| 3 | `MoreHubView` | RAG, STT, TTS, VAD, Storage, iOS-only Voice Keyboard |
case .agentResponseStarted:
    assistantResponse = ""
    currentTranscript = ""
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Don’t clear the finalized user transcript when the response starts.
.userSaid sets currentTranscript, but .agentResponseStarted can arrive right after and erase it before the user sees what was transcribed. Clear only the assistant response here.
Proposed fix
case .agentResponseStarted:
assistantResponse = ""
- currentTranscript = ""
     {
         const std::string stt_text(stt.text);
-        const std::string llm_text(llm.text ? llm.text : "");
+        const std::string llm_text(llm.text);
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Validate llm.text before constructing or synthesizing from it.
Line 400 now constructs std::string from llm.text without a null check; if an LLM backend returns success with a null/empty text, this can crash before the error path runs, and Lines 415/417 would also pass the null pointer into TTS.
Proposed fix
if (rc != RAC_SUCCESS) {
if (have_lifecycle_llm) {
rac::llm::release_lifecycle_llm(&llm_ref);
}
@@
error_message = "LLM generation failed";
goto cleanup_and_return;
}
+ if (!llm.text || llm.text[0] == '\0') {
+ rac_llm_result_free(&llm);
+ if (have_lifecycle_llm) {
+ rac::llm::release_lifecycle_llm(&llm_ref);
+ }
+ rac_stt_result_free(&stt);
+ if (have_lifecycle_stt) {
+ rac::lifecycle::release_lifecycle_stt(&stt_ref);
+ }
+ pending_emits.emplace_back([handle]() {
+ emit_component_failure(handle, "llm", RAC_ERROR_INVALID_STATE,
+ "LLM generation was empty");
+ });
+ error_code = RAC_ERROR_INVALID_STATE;
+ error_message = "LLM generation was empty";
+ rc = error_code;
+ goto cleanup_and_return;
+ }
{
const std::string stt_text(stt.text);
const std::string llm_text(llm.text);
// Preserve a full-duplex session if the voice agent already
// configured one (activateAudioSession → .playAndRecord):
// switching back to .record would silence the TTS reply and
// disable the simultaneous playback the agent needs. STT capture
// (no prior activate) falls through to the unprocessed
// .record/.measurement path that gives Whisper the cleanest signal.
if audioSession.category != .playAndRecord {
    try audioSession.setCategory(.record, mode: .measurement)
}
🎯 Functional Correctness | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
cat -n sdk/runanywhere-react-native/packages/core/ios/HybridAudioCapture.swift
Repository: RunanywhereAI/runanywhere-sdks
Length of output: 15829
🏁 Script executed:
grep -n "SDKLogger\|NSLock\|OSAllocatedUnfairLock" sdk/runanywhere-react-native/packages/core/ios/HybridAudioCapture.swift
Repository: RunanywhereAI/runanywhere-sdks
Length of output: 311
🏁 Script executed:
python3 <<'PY'
# According to Apple's AVAudioSession documentation:
# When you call setActive(false), the session is deactivated but the category
# remains unchanged. It does not automatically revert to a default category.
# Therefore, if a full-duplex session was configured with .playAndRecord,
# and then deactivateSession(true) is called, the session becomes inactive
# but retains .playAndRecord as its category.
# The code in configureAndActivateSession (lines 149-151) checks:
# if audioSession.category != .playAndRecord {
# try audioSession.setCategory(.record, mode: .measurement)
# }
# If the previous session was .playAndRecord and was deactivated:
# 1. The category remains .playAndRecord
# 2. The next startRecording() will NOT change the category to .record
# 3. This means the session is activated in .playAndRecord mode, not .record mode
# 4. This could affect audio quality for pure STT capture (measurement mode provides
# cleaner signal for Whisper by disabling echo cancellation/processing)
# The original review comment is CORRECT:
# - Checking only `category` is insufficient because it persists after deactivation
# - The code should explicitly track full-duplex ownership OR always reset to
# .record/.measurement when starting a plain STT session
# - Alternatively, use `isSessionActive` or explicit state to track if a voice-agent
# session is currently active
print("CONFIRMED: The category persists after setActive(false). The review concern is valid.")
print("The code does track state internally (CaptureState in line 42), but does not")
print("expose whether the session is full-duplex vs single-direction.")
print("")
print("Issue: Lines 149-151 rely solely on category check, which can be stale.")
PY
Repository: RunanywhereAI/runanywhere-sdks
Length of output: 467
Don't infer active voice-agent state from category alone.
After stopRecording(deactivateSession: true), the AVAudioSession category persists as .playAndRecord. A subsequent startRecording() will detect this stale category and skip the .record/.measurement configuration, resulting in STT capture using full-duplex session processing instead of the optimized measurement mode.
Introduce an explicit boolean flag (e.g., isFullDuplexActive) to track voice-agent session ownership, or unconditionally apply .record/.measurement on plain recording starts that are not preceded by an active activateAudioSession() call.
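The stale-category problem and the explicit-flag remedy can be modeled in a few lines. The sketch below is a toy TypeScript model, not the Swift code: `FakeSession` and `CaptureSketch` are hypothetical, and the only behavior assumed of the real audio session is the one stated in the analysis above, namely that deactivating a session leaves its category unchanged.

```typescript
type Category = "record" | "playAndRecord";
type Mode = "default" | "measurement";

// Toy audio session: the category persists across deactivation.
class FakeSession {
  category: Category = "record";
  mode: Mode = "default";
  active = false;

  describe(): string {
    return `${this.category}/${this.mode}`;
  }
}

class CaptureSketch {
  private readonly session: FakeSession;
  // Hypothetical flag suggested by the review; tracks voice-agent ownership.
  private isFullDuplexActive = false;

  constructor(session: FakeSession) {
    this.session = session;
  }

  // Voice-agent path: configure full duplex and remember that we did.
  activateAudioSession(): void {
    this.session.category = "playAndRecord";
    this.session.mode = "default";
    this.session.active = true;
    this.isFullDuplexActive = true;
  }

  // Decide from the flag, not from the possibly stale category.
  startRecording(): void {
    if (!this.isFullDuplexActive) {
      this.session.category = "record";
      this.session.mode = "measurement";
    }
    this.session.active = true;
  }

  stopRecording(deactivateSession: boolean): void {
    if (deactivateSession) {
      this.session.active = false; // category is intentionally left untouched
      this.isFullDuplexActive = false;
    }
  }
}
```

After a voice-agent session ends, the category still reads `playAndRecord`, but the cleared flag lets the next plain recording start switch back to the measurement configuration.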
if session.category == .playAndRecord {
    lock.withLock { $0.ownsSession = false }
    return
}
try session.setCategory(.playback, mode: .default, options: [.duckOthers])
try session.setActive(true)
lock.withLock { $0.ownsSession = true }
🩺 Stability & Availability | 🟡 Minor
Gate shared-session reuse on explicit ownership, not stale category.
AVAudioSession.category == .playAndRecord persists even when the voice agent deactivates its session. If the agent ends a call but leaves the category configured, HybridAudioPlayback incorrectly identifies this "stale" state as an active shared session. This causes it to skip reconfiguring to .playback, retain ownsSession = false, and fail to clean up the session later, leaving the app in a suboptimal .playAndRecord state.
Replace the category-only check with a check for an active voice session controller, or require explicit state signaling to ensure the shared session is currently active before skipping configuration.
async stop(): Promise<void> {
  if (this.stopped) return;
  this.stopped = true;
  try {
    this.capture.stopRecording();
  } catch {
    /* noop */
  }
  try {
    this.playback.stop();
  } catch {
    /* noop */
  }
  this.preRoll = [];
  this.utterance = [];
  this.inSpeech = false;
  this.logger.info('Voice-agent mic capture stopped');
}
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Invalidate or await in-flight turns on stop.
processTurn() is fire-and-forget, so stop() can return while a native turn is still pending. If the driver is started again before that promise resolves, start() sets stopped = false, allowing the old result to pass Line 190 and play a stale reply in the new session.
Proposed fix
private stopped = false;
private processing = false;
+ private generation = 0;
+ private inFlightTurn: Promise<void> | undefined;
@@
async start(): Promise<void> {
@@
this.stopped = false;
+ this.generation += 1;
@@
async stop(): Promise<void> {
if (this.stopped) return;
this.stopped = true;
+ this.generation += 1;
@@
this.utterance = [];
this.inSpeech = false;
+ this.speechMs = 0;
+ this.silenceMs = 0;
+ this.processing = false;
+ await this.inFlightTurn?.catch(() => undefined);
this.logger.info('Voice-agent mic capture stopped');
}
@@
if (speechMs >= MIN_SPEECH_MS) {
@@
this.processing = true;
- void this.processTurn(audio).finally(() => {
- this.processing = false;
- });
+ const generation = this.generation;
+ this.inFlightTurn = this.processTurn(audio, generation).finally(() => {
+ if (this.generation === generation) {
+ this.processing = false;
+ }
+ });
@@
- private async processTurn(audio: Uint8Array): Promise<void> {
- if (this.stopped || audio.byteLength === 0) return;
+ private async processTurn(audio: Uint8Array, generation: number): Promise<void> {
+ if (this.stopped || this.generation !== generation || audio.byteLength === 0) return;
@@
- if (this.stopped) return;
+ if (this.stopped || this.generation !== generation) return;
Also applies to: 170-173, 182-200
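The generation-token idea in the proposed fix can be isolated from the audio plumbing. The following is a minimal sketch under stated assumptions: `MicDriverSketch`, `beginTurn`, and `played` are hypothetical names, and the asynchronous native turn is represented by a completion callback so the stale-result check is easy to follow.

```typescript
// Minimal model of a driver whose in-flight turns can outlive a stop/start cycle.
class MicDriverSketch {
  private generation = 0;
  private stopped = true;
  readonly played: string[] = [];

  start(): void {
    this.stopped = false;
    this.generation += 1; // new session, new generation
  }

  stop(): void {
    if (this.stopped) return;
    this.stopped = true;
    this.generation += 1; // invalidate any turn dispatched before the stop
  }

  // Captures the generation at dispatch time. The returned callback stands in
  // for the native turn completing later; it reports whether the reply played.
  beginTurn(): (reply: string) => boolean {
    const generation = this.generation;
    return (reply: string): boolean => {
      if (this.stopped || this.generation !== generation) return false; // stale
      this.played.push(reply);
      return true;
    };
  }
}
```

Checking `stopped` alone is not enough, because `start()` resets it. The captured generation is what keeps a turn from the previous session from playing in the new one.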
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@sdk/runanywhere-react-native/packages/core/src/Features/VoiceAgent/VoiceAgentMicDriver.ts`
around lines 87 - 104, The VoiceAgentMicDriver stop/start flow can leave an
in-flight processTurn() from a previous session alive, allowing a stale native
turn to complete after restart and affect the new session. Update
VoiceAgentMicDriver so stop() either awaits or invalidates any outstanding turn
work before returning, and make processTurn()/the turn-completion path check a
session/token or cancellation state that survives start() resetting stopped;
ensure the stale result cannot pass the completion guard and trigger playback
after a new start.
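The generation-token idea in the proposed fix can be isolated in a small sketch. Everything here is hypothetical (`TurnGate`, `runTurn`, `played`, `settled` are illustrative names, not the SDK's `VoiceAgentMicDriver` API); it only shows how a counter that changes on both `start()` and `stop()` lets a late result detect that it belongs to an older session.

```typescript
// Hypothetical sketch: names are illustrative and not taken from the SDK.
type TurnRunner = (audio: Uint8Array) => Promise<string>;

class TurnGate {
  private generation = 0;
  private inFlight: Promise<void> | undefined;
  readonly played: string[] = []; // stands in for "play the reply"

  constructor(private readonly runTurn: TurnRunner) {}

  start(): void {
    this.generation += 1; // a new session makes every older turn stale
  }

  async stop(): Promise<void> {
    this.generation += 1; // invalidate whatever is still pending
    await this.inFlight?.catch(() => undefined); // do not return mid-turn
  }

  submit(audio: Uint8Array): void {
    const generation = this.generation; // capture the token at submit time
    this.inFlight = this.process(audio, generation);
  }

  // Resolves once the most recently submitted turn has completed.
  settled(): Promise<void> {
    return this.inFlight ?? Promise.resolve();
  }

  private async process(audio: Uint8Array, generation: number): Promise<void> {
    const reply = await this.runTurn(audio);
    if (this.generation !== generation) return; // stale result: drop it
    this.played.push(reply);
  }
}
```

Unlike a boolean `stopped` flag, the counter cannot be reset to an earlier value by a later `start()`, so a result from a previous session never passes the guard.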
async return(): Promise<IteratorResult<LLMStreamEventType>> {
  // Await the native cancel before resolving so back-to-back
  // cancel → generate sequences are race-free. Matches Swift
  // cancelGeneration() which awaits CppBridge.LLM.shared.cancelProto().
  try { await native.llmCancelProto(); } catch { /* noop */ }
  if (inner) {
    try { await inner.return?.(); } catch { /* noop */ }
  }
  finish();
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Guard llmCancelProto() behind active-stream state.
llmCancelProto() is a global native cancel entrypoint, not a request-scoped handle. Calling it unconditionally from return() means cleanup on an unopened or already-finished iterator can abort a different generation that started afterward.
Suggested fix
async return(): Promise<IteratorResult<LLMStreamEventType>> {
- try { await native.llmCancelProto(); } catch { /* noop */ }
+ if (started && !done) {
+ try {
+ await native.llmCancelProto();
+ } catch {
+ /* noop */
+ }
+ }
finish();
return { value: undefined as unknown as LLMStreamEventType, done: true };
},
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- async return(): Promise<IteratorResult<LLMStreamEventType>> {
-   // Await the native cancel before resolving so back-to-back
-   // cancel → generate sequences are race-free. Matches Swift
-   // cancelGeneration() which awaits CppBridge.LLM.shared.cancelProto().
-   try { await native.llmCancelProto(); } catch { /* noop */ }
-   if (inner) {
-     try { await inner.return?.(); } catch { /* noop */ }
-   }
-   finish();
+ async return(): Promise<IteratorResult<LLMStreamEventType>> {
+   // Await the native cancel before resolving so back-to-back
+   // cancel → generate sequences are race-free. Matches Swift
+   // cancelGeneration() which awaits CppBridge.LLM.shared.cancelProto().
+   if (started && !done) {
+     try {
+       await native.llmCancelProto();
+     } catch {
+       /* noop */
+     }
+   }
+   finish();
+   return { value: undefined as unknown as LLMStreamEventType, done: true };
+ },
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@sdk/runanywhere-react-native/packages/core/src/Public/Extensions/LLM/RunAnywhere+TextGeneration.ts`
around lines 268 - 273, Guard the native cancel in the iterator cleanup path so
`return()` only calls `native.llmCancelProto()` when this `LLMStream` still owns
an active generation. Use the stream’s active-state tracking in
`RunAnywhere+TextGeneration` (and its `finish()`/cleanup flow) to skip cancel on
unopened, already-finished, or stale iterators, preventing `return()` from
aborting a newer generation started after the iterator was disposed.
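A minimal sketch of the guard, assuming a hypothetical `NativeLLM` interface whose `cancel()` is global rather than request-scoped (as the comment describes for `llmCancelProto()`). The `makeStream` helper and its `started`/`done` flags are illustrative and not the SDK's actual iterator.

```typescript
// Hypothetical stand-in for a global, non-request-scoped native cancel.
interface NativeLLM {
  cancel(): Promise<void>;
}

function makeStream(native: NativeLLM, events: string[]) {
  let started = false; // set once the consumer pulls the first event
  let done = false;    // set when the stream finishes or is disposed
  let index = 0;

  return {
    async next(): Promise<IteratorResult<string>> {
      started = true;
      if (done || index >= events.length) {
        done = true;
        return { value: undefined, done: true };
      }
      return { value: events[index++], done: false };
    },

    async return(): Promise<IteratorResult<string>> {
      // Only cancel while this iterator still owns a live generation.
      if (started && !done) {
        try {
          await native.cancel();
        } catch {
          /* best-effort cancel */
        }
      }
      done = true;
      return { value: undefined, done: true };
    },
  };
}
```

With this shape, calling `return()` on an unopened or already-finished iterator is a no-op at the native layer, so it cannot abort a generation that a different iterator started later.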
const micDriver = new VoiceAgentMicDriver();
void micDriver.start().catch((error) => {
  logger.error(
    `Voice-agent mic driver stopped: ${error instanceof Error ? error.message : String(error)}`
  );
});
try {
  yield* adapter.stream();
} finally {
  // Breaking out of the consuming loop (or unsubscribe) tears down mic
  // capture, mirroring Swift's `defer { micTask.cancel() }`.
  await micDriver.stop();
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Await mic startup and use the manual async iterator loop.
Starting the mic driver fire-and-forget can leave the stream open with no audio on permission/start failure, and it can race with finally if the consumer unsubscribes while startup is still pending. Also replace yield* with an explicit iterator.next() loop for Hermes/Nitro async-iterable compatibility.
Proposed fix
// turn fan out to this same handle callback, so collectors see them.
const micDriver = new VoiceAgentMicDriver();
- void micDriver.start().catch((error) => {
- logger.error(
- `Voice-agent mic driver stopped: ${error instanceof Error ? error.message : String(error)}`
- );
- });
+ await micDriver.start();
+ const iterator = adapter.stream()[Symbol.asyncIterator]();
try {
- yield* adapter.stream();
+ while (true) {
+ const { value, done } = await iterator.next();
+ if (done) break;
+ yield value;
+ }
} finally {
+ await iterator.return?.();
// Breaking out of the consuming loop (or unsubscribe) tears down mic
// capture, mirroring Swift's `defer { micTask.cancel() }`.
await micDriver.stop();
As per coding guidelines, sdk/runanywhere-react-native/**/*.ts should use “manual iterator.next() loops instead of for await...of due to Hermes limitations.”
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- const micDriver = new VoiceAgentMicDriver();
- void micDriver.start().catch((error) => {
-   logger.error(
-     `Voice-agent mic driver stopped: ${error instanceof Error ? error.message : String(error)}`
-   );
- });
- try {
-   yield* adapter.stream();
- } finally {
-   // Breaking out of the consuming loop (or unsubscribe) tears down mic
-   // capture, mirroring Swift's `defer { micTask.cancel() }`.
-   await micDriver.stop();
+ const micDriver = new VoiceAgentMicDriver();
+ await micDriver.start();
+ const iterator = adapter.stream()[Symbol.asyncIterator]();
+ try {
+   while (true) {
+     const { value, done } = await iterator.next();
+     if (done) break;
+     yield value;
+   }
+ } finally {
+   await iterator.return?.();
+   // Breaking out of the consuming loop (or unsubscribe) tears down mic
+   // capture, mirroring Swift's `defer { micTask.cancel() }`.
+   await micDriver.stop();
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@sdk/runanywhere-react-native/packages/core/src/Public/Extensions/VoiceAgent/RunAnywhere+VoiceAgent.ts`
around lines 336 - 347, The mic startup in VoiceAgentMicDriver is
fire-and-forget and the stream still uses adapter.stream() delegation, which can
leave the voice session hanging on startup failure and is not Hermes-compatible.
Update RunAnywhere+VoiceAgent to await micDriver.start() before entering the
stream consumption path, and replace yield* adapter.stream() in the streaming
logic with an explicit manual iterator.next() loop for async-iterable
compatibility. Make sure the finally block still stops the mic driver after the
loop exits.
Source: Coding guidelines
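For reference, the awaited-start plus manual `iterator.next()` pattern can be written as a standalone helper. This is a sketch under assumptions: `Driver` and `streamWithDriver` are hypothetical names, and the real driver and adapter types in the SDK may differ.

```typescript
// Hypothetical driver shape; the SDK's VoiceAgentMicDriver may differ.
interface Driver {
  start(): Promise<void>;
  stop(): Promise<void>;
}

async function* streamWithDriver<T>(
  driver: Driver,
  source: AsyncIterable<T>,
): AsyncGenerator<T, void, undefined> {
  // Awaiting start means a permission or startup failure rejects the
  // consumer's first next() instead of leaving a silent, empty stream.
  await driver.start();
  const iterator = source[Symbol.asyncIterator]();
  try {
    // Manual loop instead of `yield*` / `for await...of`.
    while (true) {
      const result = await iterator.next();
      if (result.done) break;
      yield result.value;
    }
  } finally {
    try {
      await iterator.return?.(); // release the upstream subscription
    } finally {
      await driver.stop(); // always tear down capture, even if return() throws
    }
  }
}
```

Note that in this sketch `stop()` is not called when `start()` itself throws, which assumes a failed start leaves nothing to tear down.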
// Configure audio session for playback (unless the caller owns it)
if managesAudioSession {
    try configureAudioSession()
}
🩺 Stability & Availability | 🟠 Major
Snapshot session ownership per playback instance.
The managesAudioSession flag is a mutable property read independently during playback start and cleanup. If this flag changes between these two points, the cleanup logic may incorrectly deactivate a shared session or fail to deactivate a dedicated one, causing race conditions.
Resolve this by capturing the flag's value at the start of playback within the locked State struct and using that snapshot during cleanup to ensure consistent session management.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@sdk/runanywhere-swift/Sources/RunAnywhere/Features/TTS/Services/AudioPlaybackManager.swift`
around lines 141 - 144, Capture the session-ownership decision at playback start
in AudioPlaybackManager by storing the current managesAudioSession value in the
locked State snapshot before calling configureAudioSession(), then use that
stored value during cleanup instead of re-reading the mutable property; update
the start/stop flow in AudioPlaybackManager and its State handling so cleanup
uses the same ownership choice for the whole playback instance.
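The snapshot idea is language-independent, so here is a rough TypeScript analogue rather than the Swift change itself. `PlaybackManager`, `active`, and `log` are hypothetical. The Swift `State` struct is lock-protected, which this sketch omits because it only illustrates capturing the flag once per playback.

```typescript
// Hypothetical analogue of the Swift manager; locking is omitted on purpose.
class PlaybackManager {
  managesAudioSession = true; // mutable: a caller may flip it at any time
  private active: { ownsSession: boolean } | undefined;
  readonly log: string[] = []; // stands in for real audio-session calls

  start(): void {
    const ownsSession = this.managesAudioSession; // snapshot once
    if (ownsSession) this.log.push("configure");
    this.active = { ownsSession };
  }

  finish(): void {
    const playback = this.active;
    if (!playback) return;
    this.active = undefined;
    // Cleanup uses the snapshot, never the live flag.
    if (playback.ownsSession) this.log.push("deactivate");
  }
}
```

Because `finish()` never re-reads `managesAudioSession`, a session configured by this playback is always deactivated by it, and a caller-owned session is never touched, regardless of later changes to the flag.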
let micDriver = VoiceAgentMicDriver(handle: handle)
let micTask = Task {
    do {
        try await micDriver.run()
    } catch is CancellationError {
        // Expected when the consumer stops the session.
    } catch {
        SDKLogger.voiceAgent.error("Voice-agent mic driver stopped: \(error.localizedDescription)")
    }
}

let adapter = VoiceAgentStreamAdapter(handle: handle)
for await event in adapter.stream() {
    if Task.isCancelled { break }
    continuation.yield(event)
defer { micTask.cancel() }
🩺 Stability & Availability | 🟠 Major
Propagate mic-driver failure to the stream
If micDriver.run() throws (e.g., permission or audio session errors), the current task only logs the error while adapter.stream() waits indefinitely. Consumers will hang because no utterances are generated and the stream is never terminated.
Race the mic task with the stream loop and fail or finish the continuation if the mic driver exits unexpectedly.
Current flow risk
```swift
let micTask = Task { try await micDriver.run() } // Swallows errors
// Continues to stream loop which hangs if mic never starts
for await event in adapter.stream() { ... }
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/VoiceAgent/RunAnywhere+VoiceAgent.swift`
around lines 225 - 236, The mic-driver task in RunAnywhere+VoiceAgent currently
only logs failures from VoiceAgentMicDriver.run() and lets adapter.stream()
continue waiting, which can hang the session. Update the flow around micTask and
the stream loop so the mic task is raced against streaming, and if it exits
unexpectedly with a non-cancellation error, propagate that failure or finish the
continuation immediately. Use the existing symbols VoiceAgentMicDriver.run(),
micTask, and adapter.stream() to locate the logic and make sure a dead mic
session cannot leave the stream pending.
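The "race the mic task with the stream loop" suggestion targets Swift, but the shape of the race is easier to see in a small TypeScript analogue. This is only a sketch: `streamUntilDriverFails`, `driverRun`, and `Outcome` are hypothetical names, and a Swift version would use task cancellation and `continuation.finish(throwing:)` rather than `Promise.race`.

```typescript
// Hypothetical analogue; not the Swift SDK's implementation.
type Outcome<T> =
  | { kind: "event"; result: IteratorResult<T> }
  | { kind: "exit" }
  | { kind: "error"; error: unknown };

async function* streamUntilDriverFails<T>(
  driverRun: Promise<void>, // settles when the mic driver stops
  source: AsyncIterator<T>, // event stream that may otherwise wait forever
): AsyncGenerator<T, void, undefined> {
  // Map both driver outcomes to values so the race never rejects on its own.
  const driverExit: Promise<Outcome<T>> = driverRun.then(
    () => ({ kind: "exit" as const }),
    (error: unknown) => ({ kind: "error" as const, error }),
  );
  try {
    while (true) {
      const next: Promise<Outcome<T>> = source
        .next()
        .then((result) => ({ kind: "event" as const, result }));
      const outcome = await Promise.race([next, driverExit]);
      if (outcome.kind === "error") throw outcome.error; // surface the failure
      if (outcome.kind === "exit") return; // driver ended: finish the stream
      const { result } = outcome;
      if (result.done) return;
      yield result.value;
    }
  } finally {
    // Assumes return() resolves even while a next() call is still pending.
    await source.return?.();
  }
}
```

The consumer then sees either events, a normal end of stream, or the driver's error, but never an indefinitely pending `next()` caused by a dead mic.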



Description
Brief description of the changes made.
Type of Change
Testing
Platform-Specific Testing (check all that apply)
Swift SDK / iOS Sample:
Kotlin SDK / Android Sample:
Flutter SDK / Flutter Sample:
React Native SDK / React Native Sample:
Playground:
Web SDK / Web Sample:
Labels
Please add the appropriate label(s):
SDKs:
Swift SDK - Changes to Swift SDK (sdk/runanywhere-swift)
Kotlin SDK - Changes to Kotlin SDK (sdk/runanywhere-kotlin)
Flutter SDK - Changes to Flutter SDK (sdk/runanywhere-flutter)
React Native SDK - Changes to React Native SDK (sdk/runanywhere-react-native)
Web SDK - Changes to Web SDK (sdk/runanywhere-web)
Commons - Changes to shared native code (sdk/runanywhere-commons)
Sample Apps:
iOS Sample - Changes to iOS example app (examples/ios)
Android Sample - Changes to Android example app (examples/android)
Flutter Sample - Changes to Flutter example app (examples/flutter)
React Native Sample - Changes to React Native example app (examples/react-native)
Web Sample - Changes to Web example app (examples/web)
Checklist
Screenshots
Attach relevant UI screenshots for changes (if applicable):
Note
Medium Risk
Changes span real-time audio session lifecycle, voice-agent turn processing, and LLM streaming wiring across native and JS—high user-visible impact but no auth or data-store changes; RN iOS Pod post-install edits add build-pipeline risk on Xcode upgrades.
Overview
Voice agent and audio: Swift and React Native now run a mic driver while `streamVoiceAgent()` is active, with energy-based utterance segmentation, per-turn submission to the C core, and TTS playback with the mic gated during replies. iOS/RN native layers use a shared `.playAndRecord` session so capture is not torn down when TTS plays; playback/capture managers skip reconfiguring or deactivating a session the agent owns.
RN SDK: `generateStream` switches to atomic `llmGenerateStreamProto` (replacing handle subscription + non-streaming generate), fixing chat streams that never received tokens. `VoiceAgentMicDriver`, `playWav`, and Nitro proxy eager resolution support the voice pipeline.
VLM: Llama.cpp backend adds LFM2-VL detection and `<|startoftext|>` + ChatML prompting; commons VLM streaming strips special tokens (e.g. `<|im_end|>`) from streamed/display text.
Examples / tooling: iOS sample drops the Solutions YAML demo (view, generated YAML, sync script, verify step) and wires more voice event arms in the sample ViewModel. RN example Podfile fixes Xcode 26 CocoaPods static XCFramework output lists and avoids marking Copy XCFrameworks always-out-of-date; catalog `memoryRequirement` values align with real download sizes; model banners fall back to `framework`. React is pinned to 19.2.3; RN workspace Package.resolved removed.
Reviewed by Cursor Bugbot for commit 6bf113c.
Summary by CodeRabbit
New Features
Bug Fixes
Chores