Fixes ios#514

Merged
shubhammalhotra28 merged 2 commits into main from fixes-ios
Jun 25, 2026
@shubhammalhotra28 shubhammalhotra28 commented Jun 25, 2026

Contributor

Description

Brief description of the changes made.

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Refactoring

Testing

  • Lint passes locally
  • Added/updated tests for changes

Platform-Specific Testing (check all that apply)

Swift SDK / iOS Sample:

  • Tested on iPhone (Simulator or Device)
  • Tested on iPad / Tablet
  • Tested on Mac (macOS target)

Kotlin SDK / Android Sample:

  • Tested on Android Phone (Emulator or Device)
  • Tested on Android Tablet

Flutter SDK / Flutter Sample:

  • Tested on iOS
  • Tested on Android

React Native SDK / React Native Sample:

  • Tested on iOS
  • Tested on Android

Playground:

  • Tested on target platform
  • Verified no regressions in existing Playground projects

Web SDK / Web Sample:

  • Tested in Chrome (Desktop)
  • Tested in Firefox
  • Tested in Safari
  • WASM backends load (LlamaCpp + ONNX)
  • OPFS storage persistence verified (survives page refresh)
  • Settings persistence verified (localStorage)

Labels

Please add the appropriate label(s):

SDKs:

  • Swift SDK - Changes to Swift SDK (sdk/runanywhere-swift)
  • Kotlin SDK - Changes to Kotlin SDK (sdk/runanywhere-kotlin)
  • Flutter SDK - Changes to Flutter SDK (sdk/runanywhere-flutter)
  • React Native SDK - Changes to React Native SDK (sdk/runanywhere-react-native)
  • Web SDK - Changes to Web SDK (sdk/runanywhere-web)
  • Commons - Changes to shared native code (sdk/runanywhere-commons)

Sample Apps:

  • iOS Sample - Changes to iOS example app (examples/ios)
  • Android Sample - Changes to Android example app (examples/android)
  • Flutter Sample - Changes to Flutter example app (examples/flutter)
  • React Native Sample - Changes to React Native example app (examples/react-native)
  • Web Sample - Changes to Web example app (examples/web)

Checklist

  • Code follows project style guidelines
  • Self-review completed
  • Documentation updated (if needed)

Screenshots

Attach relevant UI screenshots for changes (if applicable):

  • Mobile (Phone)
  • Tablet / iPad
  • Desktop / Mac

Note

Medium Risk
Changes span the real-time audio session lifecycle, voice-agent turn processing, and LLM streaming wiring across native and JS. User-visible impact is high, but there are no auth or data-store changes. The RN iOS Pod post-install edits add build-pipeline risk on Xcode upgrades.

Overview
Voice agent and audio — Swift and React Native now run a mic driver while streamVoiceAgent() is active: energy-based utterance segmentation, per-turn submission to the C core, and TTS playback with the mic gated during replies. iOS/RN native layers use a shared .playAndRecord session so capture is not torn down when TTS plays; playback/capture managers skip reconfiguring or deactivating a session the agent owns.
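
The energy-based utterance segmentation mentioned above could look roughly like the following. This is a minimal C++ sketch under stated assumptions, not the SDK's mic driver: `EnergySegmenter`, `frame_rms`, the RMS threshold, and the hangover count are all hypothetical names and parameters chosen for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: root-mean-square energy of one frame of float PCM.
inline float frame_rms(const std::vector<float>& frame) {
    if (frame.empty()) return 0.0f;
    double sum = 0.0;
    for (float s : frame) sum += static_cast<double>(s) * s;
    return static_cast<float>(std::sqrt(sum / frame.size()));
}

// Hypothetical segmenter: buffers frames once speech is detected and reports
// a finished utterance after `hangover_frames` consecutive quiet frames.
class EnergySegmenter {
public:
    EnergySegmenter(float threshold, std::size_t hangover_frames)
        : threshold_(threshold), hangover_(hangover_frames) {
        assert(hangover_frames > 0);  // zero would end every utterance at once
    }

    // Feeds one frame. Returns true when an utterance has just ended; the
    // samples are then available through take_utterance().
    bool push(const std::vector<float>& frame) {
        const bool loud = frame_rms(frame) >= threshold_;
        if (!in_speech_) {
            if (!loud) return false;  // still idle, drop the frame
            in_speech_ = true;
            quiet_run_ = 0;
        }
        buffer_.insert(buffer_.end(), frame.begin(), frame.end());
        quiet_run_ = loud ? 0 : quiet_run_ + 1;
        if (quiet_run_ >= hangover_) {
            in_speech_ = false;
            return true;
        }
        return false;
    }

    // Hands the buffered utterance to the caller and clears the buffer.
    std::vector<float> take_utterance() {
        std::vector<float> out;
        out.swap(buffer_);
        return out;
    }

private:
    float threshold_;
    std::size_t hangover_;
    std::size_t quiet_run_ = 0;
    bool in_speech_ = false;
    std::vector<float> buffer_;
};
```

In a driver loop of this shape, each `true` from `push` would presumably trigger one turn submission, and frames would not be fed at all while TTS playback gates the mic.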

RN SDK: generateStream switches to the atomic llmGenerateStreamProto (replacing handle subscription + non-streaming generate), fixing chat streams that never received tokens. VoiceAgentMicDriver, playWav, and eager resolution of the Nitro proxy support the voice pipeline.

VLM — Llama.cpp backend adds LFM2-VL detection and <|startoftext|> + ChatML prompting; commons VLM streaming strips special tokens (e.g. <|im_end|>) from streamed/display text.
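
For orientation, generic ChatML framing with a `<|startoftext|>` prefix can be sketched as below. The actual LFM2-VL template in `rac_vlm_llamacpp.cpp` is not shown in this PR page, so `build_chatml_prompt` is a hypothetical helper, and the placement of any image placeholder is model-specific and deliberately left out.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not the backend's code: wraps a single user message in
// ChatML turns and prefixes the <|startoftext|> token named in the overview.
inline std::string build_chatml_prompt(const std::string& user_text) {
    // Assumed precondition: user text must not contain the turn terminator,
    // otherwise it could close the user turn early.
    assert(user_text.find("<|im_end|>") == std::string::npos);

    std::string prompt = "<|startoftext|>";
    prompt += "<|im_start|>user\n";
    prompt += user_text;
    prompt += "<|im_end|>\n";
    prompt += "<|im_start|>assistant\n";  // leaves the assistant turn open
    return prompt;
}
```

Because the assistant turn is left open, the model's reply normally ends with `<|im_end|>`, which is the kind of marker the streaming path then has to strip from display text.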

Examples / tooling — iOS sample drops the Solutions YAML demo (view, generated YAML, sync script, verify step) and wires more voice event arms in the sample ViewModel. RN example Podfile fixes Xcode 26 CocoaPods static XCFramework output lists and avoids marking Copy XCFrameworks always-out-of-date; catalog memoryRequirement values align with real download sizes; model banners fall back to framework. React is pinned to 19.2.3; RN workspace Package.resolved removed.

Reviewed by Cursor Bugbot for commit 6bf113c. Configure here.

Summary by CodeRabbit

  • New Features

    • Added support for a new vision-language model type, including improved prompt formatting and model detection.
    • Introduced microphone-driven voice agent streaming with direct playback of generated audio.
    • Added direct WAV playback support and better shared audio-session handling.
  • Bug Fixes

    • Improved voice-session event handling, including clearer error states and more responsive UI updates.
    • Cleaned up streamed tokens so special markers no longer appear in output.
    • Fixed model status banners to show the correct framework more consistently.
  • Chores

    • Updated example and build setup files, including dependency version adjustments.

coderabbitai Bot commented Jun 25, 2026

📝 Walkthrough

The PR adds LFM2-VL support in the llama.cpp VLM path, refines VLM token streaming, rewires voice-agent capture and playback around a new mic driver and turn-event bridge, removes the iOS Solutions demo wiring, and updates React Native build and dependency pins.

Changes

Voice Agent Runtime and Bridge

| Layer | File(s) | Summary |
| --- | --- | --- |
| Bridge and audio sessions | `sdk/runanywhere-swift/Sources/RunAnywhere/Foundation/Bridge/Extensions/CppBridge+ModalityProtoABI.swift`, `sdk/runanywhere-commons/src/features/voice_agent/voice_agent_proto_abi.cpp`, `sdk/runanywhere-swift/Sources/RunAnywhere/Features/STT/Services/AudioCaptureManager.swift`, `sdk/runanywhere-swift/Sources/RunAnywhere/Features/TTS/Services/AudioPlaybackManager.swift`, `sdk/runanywhere-react-native/packages/core/ios/HybridAudioCapture.swift`, `sdk/runanywhere-react-native/packages/core/ios/HybridAudioPlayback.swift` | The voice-turn ABI adds event callbacks, and the shared audio-session helpers switch to configurable ownership and full-duplex reuse. |
| Mic driver loop | `sdk/runanywhere-swift/Sources/RunAnywhere/Features/VoiceAgent/Services/VoiceAgentMicDriver.swift`, `sdk/runanywhere-react-native/packages/core/src/Features/VoiceSession/AudioPlaybackManager.ts` | The new Swift mic driver captures audio, segments utterances, submits turns to the native core, and plays synthesized replies. |
| Stream lifecycle wiring | `sdk/runanywhere-react-native/packages/core/src/Public/Extensions/VoiceAgent/RunAnywhere+VoiceAgent.ts`, `sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/VoiceAgent/RunAnywhere+VoiceAgent.swift` | The React Native and Swift voice-agent stream extensions start and stop the mic driver alongside the existing stream flow. |
| Example voice event handling | `examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Voice/VoiceAgentViewModel.swift` | The iOS example view model now updates session state, transcript state, audio level, and error state from new event arms. |

LLM and VLM Runtime

| Layer | File(s) | Summary |
| --- | --- | --- |
| LLM streaming iterator | `sdk/runanywhere-react-native/packages/core/src/native/NitroModulesGlobalInit.ts`, `sdk/runanywhere-react-native/packages/core/src/Public/Extensions/LLM/RunAnywhere+TextGeneration.ts` | The text-generation extension now pulls LLMStreamEvent data from one native callback and finishes or cancels the iterator through llmCancelProto. |
| VLM token cleanup | `sdk/runanywhere-commons/src/features/vlm/vlm_module.cpp` | generated_stream_token_trampoline now strips special tokens before updating streamed text and token telemetry. |
| LFM2VL prompt and detection | `engines/llamacpp/rac_vlm_llamacpp.cpp` | LFM2VL gets a dedicated manual prompt template branch and metadata detection from architecture and name fields. |
| Model catalog memory requirements | `examples/react-native/RunAnywhereAI/src/services/ModelCatalogBootstrap.ts` | The seeded model catalog updates memoryRequirement values for several LlamaCPP, VLM, and embedding entries. |
| Model status banner fallback | `examples/react-native/RunAnywhereAI/src/screens/ChatScreen.tsx`, `examples/react-native/RunAnywhereAI/src/screens/STTScreen.tsx`, `examples/react-native/RunAnywhereAI/src/screens/TTSScreen.tsx` | The model status banner now receives a framework fallback from currentModel.framework when preferredFramework is unset. |

iOS Solutions Demo Removal

| Layer | File(s) | Summary |
| --- | --- | --- |
| Navigation and docs cleanup | `examples/ios/RunAnywhereAI/AGENTS.md`, `examples/ios/RunAnywhereAI/RunAnywhereAI/App/ContentView.swift` | The iOS example docs and utility navigation remove Solutions references and add the iOS Voice Keyboard link. |
| Solution script removal | `examples/ios/RunAnywhereAI/scripts/sync-solutions-yamls.sh`, `examples/ios/RunAnywhereAI/scripts/verify.sh` | The solution YAML sync script is deleted and verification no longer invokes its check step. |

Build and Dependency Alignment

| Layer | File(s) | Summary |
| --- | --- | --- |
| CocoaPods post-install fixes | `examples/react-native/RunAnywhereAI/ios/Podfile` | The Xcode 16 script-phase workaround skips Copy XCFrameworks phases, and the Xcode 26 xcframework rewrite maps static bundles to lib*.a paths. |
| Dependency pins and lockfiles | `package.json`, `examples/react-native/RunAnywhereAI/package.json`, `sdk/runanywhere-react-native/package.json`, `examples/react-native/RunAnywhereAI/ios/RunAnywhereAI.xcworkspace/xcshareddata/swiftpm/Package.resolved` | React is pinned to 19.2.3 in the root and example manifests, and the SwiftPM resolved package contents are cleared. |

Sequence Diagram(s)

sequenceDiagram
  participant StreamConsumer
  participant GenerateStream as RunAnywhere+TextGeneration.generateStream
  participant NitroModulesGlobalInit
  participant NativeLLM as native.llmGenerateStreamProto

  StreamConsumer->>GenerateStream: next()
  GenerateStream->>NitroModulesGlobalInit: getNitroModulesProxySync()
  GenerateStream->>NativeLLM: llmGenerateStreamProto(callback)
  NativeLLM-->>GenerateStream: LLMStreamEvent
  GenerateStream-->>StreamConsumer: queued event
  StreamConsumer->>GenerateStream: return()
  GenerateStream->>NativeLLM: llmCancelProto()

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~90+ minutes

Possibly related PRs

  • RunanywhereAI/runanywhere-sdks#284 — Both PRs touch sdk/runanywhere-react-native/packages/core/src/Features/VoiceSession/AudioPlaybackManager.ts and related WAV playback flow.

Suggested reviewers

  • sanchitmonga22

Poem

I twitched my nose at tokens bright,
and hopped through streams by moonlit light.
With carrots, code, and a cheerful beep,
the burrow sang in voices deep. 🐇

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 55.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ⚠️ Warning | The template is mostly followed, but the required Description section is still placeholder text and does not explain the actual changes. | Replace the placeholder with a concrete summary of the fixes and scope, and add screenshots only if any UI changes are involved. |
| Title check | ❓ Inconclusive | The title is too vague to identify the actual change; it doesn't describe the specific iOS fixes in this PR. | Use a specific title naming the main iOS/RN/Swift fix, such as the voice-agent and audio-session changes. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Null LLM text crash
    • Restored the null-safe fallback when copying LLM text for turn lifecycle emission.
  • ✅ Fixed: Turn error ends active UI
    • Kept per-turn voice errors as messages without transitioning the active session UI into an error state.

Or push these changes by commenting:

@cursor push d245a33033
Preview (d245a33033)
diff --git a/examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Voice/VoiceAgentViewModel.swift b/examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Voice/VoiceAgentViewModel.swift
--- a/examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Voice/VoiceAgentViewModel.swift
+++ b/examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Voice/VoiceAgentViewModel.swift
@@ -561,8 +561,6 @@
         case let .error(err):
             logger.error("Voice agent error: \(err.message)")
             errorMessage = err.message
-            sessionState = .error(err.message)
-            currentStatus = "Error"
 
         case let .sessionError(err):
             logger.error("Voice session error: \(err.message)")

diff --git a/sdk/runanywhere-commons/src/features/voice_agent/voice_agent_proto_abi.cpp b/sdk/runanywhere-commons/src/features/voice_agent/voice_agent_proto_abi.cpp
--- a/sdk/runanywhere-commons/src/features/voice_agent/voice_agent_proto_abi.cpp
+++ b/sdk/runanywhere-commons/src/features/voice_agent/voice_agent_proto_abi.cpp
@@ -397,7 +397,7 @@
         }
         {
             const std::string stt_text(stt.text);
-            const std::string llm_text(llm.text);
+            const std::string llm_text(llm.text ? llm.text : "");
             pending_emits.emplace_back([handle, stt_text, llm_text]() {
                 emit_turn_lifecycle(
                     handle, runanywhere::v1::TURN_LIFECYCLE_EVENT_KIND_AGENT_RESPONSE_COMPLETED,

 {
 const std::string stt_text(stt.text);
-const std::string llm_text(llm.text ? llm.text : "");
+const std::string llm_text(llm.text);

Null LLM text crash

High Severity

Constructing std::string directly from llm.text removes the prior null guard. If rac_llm_generate succeeds but leaves llm.text null, this undefined behavior can crash during turn lifecycle emission.
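
The guard this finding asks for can be isolated in a small helper. The sketch below is illustrative only: `to_string_or_empty` is a hypothetical name, not a function in the commons library. It relies on one fact about the standard library, namely that passing a null pointer to the `std::string(const char*)` constructor is undefined behavior.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds a std::string from a C string that may be null.
// The null case is mapped to an empty string instead of reaching the
// std::string(const char*) constructor, which must not receive nullptr.
inline std::string to_string_or_empty(const char* text) {
    return text ? std::string(text) : std::string();
}
```

With a helper like this, a call site such as the `llm_text` construction would read `to_string_or_empty(llm.text)`, which has the same effect as the `llm.text ? llm.text : ""` expression the PR dropped.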

logger.error("Voice agent error: \(err.message)")
errorMessage = err.message
sessionState = .error(err.message)
currentStatus = "Error"

Turn error ends active UI

Medium Severity

Every .error voice event now sets sessionState to .error, so isActive becomes false while streamVoiceAgent() and the SDK mic driver keep running. A single failed turn can show a dead session in the UI even though capture continues.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
sdk/runanywhere-commons/src/features/vlm/vlm_module.cpp (1)

1216-1240: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Avoid truncating cleaned tokens to 511 bytes.

The new scratch buffer caps display_token at 511 bytes. Any longer token is silently truncated before it is appended to ctx->text, counted, and emitted, which corrupts the streamed/output text on that path.

Suggested fix
-    char cleaned[512];
-    const char* display_token = vlm_strip_special_tokens(safe_token, cleaned, sizeof(cleaned));
+    std::string cleaned(std::strlen(safe_token) + 1, '\0');
+    const char* display_token =
+        vlm_strip_special_tokens(safe_token, cleaned.data(), cleaned.size());
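
An alternative shape for the same idea is to do the stripping on a `std::string` so that no scratch-buffer size is involved at all. The sketch below is an assumption-laden illustration: `strip_special_tokens` and its marker list are hypothetical stand-ins, while the real `vlm_strip_special_tokens` takes an output buffer and size, as the diff above shows.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the buffer-based vlm_strip_special_tokens():
// removes every occurrence of each marker and returns a new string, so the
// result is never capped by a fixed scratch-buffer size.
inline std::string strip_special_tokens(std::string text,
                                        const std::vector<std::string>& markers) {
    for (const std::string& marker : markers) {
        if (marker.empty()) continue;  // an empty marker would loop forever
        std::string::size_type pos = 0;
        while ((pos = text.find(marker, pos)) != std::string::npos) {
            text.erase(pos, marker.size());  // keep pos: re-scan from here
        }
    }
    return text;
}
```

This only handles a marker that arrives whole inside one token. A marker split across two streamed tokens would need buffering across calls, which this sketch does not attempt.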
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@sdk/runanywhere-commons/src/features/vlm/vlm_module.cpp` around lines 1216 -
1240, Avoid truncating cleaned tokens in the VLM streaming path: the local
scratch buffer used by vlm_strip_special_tokens in the token handling block can
silently cut display_token to 511 bytes before it is appended to ctx->text,
counted, and published. Update the token-cleaning flow in vlm_module.cpp around
the display_token/publish_event logic to preserve the full token content, using
a dynamically sized or sufficiently sized buffer so dispatch_vlm_stream_event
and the generation event receive the complete token text.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/ios/RunAnywhereAI/AGENTS.md`:
- Line 67: Update the navigation table entry for MoreHubView so Voice Keyboard
is clearly marked as iOS-only. In AGENTS.md, adjust the MoreHubView row to
reflect the same platform gating used in the runtime view’s `#if os(iOS)` logic,
so the docs do not imply Voice Keyboard appears on macOS.

In
`@examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Voice/VoiceAgentViewModel.swift`:
- Around line 585-588: The .agentResponseStarted handling in VoiceAgentViewModel
is clearing both assistantResponse and currentTranscript, which removes the
finalized user transcript too early. Update the .agentResponseStarted case to
clear only assistantResponse and leave currentTranscript intact so the .userSaid
transcript remains visible. Use the VoiceAgentViewModel state handling around
.userSaid and .agentResponseStarted to locate the change.

In `@sdk/runanywhere-commons/src/features/voice_agent/voice_agent_proto_abi.cpp`:
- Line 400: `llm.text` is used in `voice_agent_proto_abi.cpp` without
validation, which can crash before the existing error handling and also
propagate a null pointer into TTS. In the `llm_text` construction and the later
synthesis path, add a guard in the `voice_agent` flow to verify `llm.text` is
non-null and non-empty before creating the `std::string` or calling the
TTS/synthesis helpers, and route invalid responses through the existing failure
path in the same block.

In `@sdk/runanywhere-react-native/packages/core/ios/HybridAudioCapture.swift`:
- Around line 143-151: The audio session setup in
HybridAudioCapture.startRecording is incorrectly using audioSession.category as
the only signal for whether to keep full-duplex mode. Add an explicit state flag
in this flow, such as isFullDuplexActive, to track ownership from
activateAudioSession and stopRecording(deactivateSession:), then use that flag
to decide when to apply .record/.measurement. Update the startRecording logic so
plain STT starts reconfigure to measurement mode unless a real voice-agent
session is active, instead of relying on the stale .playAndRecord category.

In `@sdk/runanywhere-react-native/packages/core/ios/HybridAudioPlayback.swift`:
- Around line 182-188: The shared-session reuse check in HybridAudioPlayback’s
audio-session setup is too loose because it keys off AVAudioSession.category
alone, which can remain .playAndRecord after the voice agent has deactivated.
Update the logic around the session configuration path to gate reuse on explicit
active ownership/state from the voice-session controller instead of the stale
category, so the code only skips setCategory(.playback, mode: .default, options:
[.duckOthers]) and setActive(true) when a voice session is truly active. Keep
the ownsSession flag in sync with that explicit state so cleanup still runs
correctly when the agent is no longer holding the session.

In
`@sdk/runanywhere-react-native/packages/core/src/Features/VoiceAgent/VoiceAgentMicDriver.ts`:
- Around line 87-104: The VoiceAgentMicDriver stop/start flow can leave an
in-flight processTurn() from a previous session alive, allowing a stale native
turn to complete after restart and affect the new session. Update
VoiceAgentMicDriver so stop() either awaits or invalidates any outstanding turn
work before returning, and make processTurn()/the turn-completion path check a
session/token or cancellation state that survives start() resetting stopped;
ensure the stale result cannot pass the completion guard and trigger playback
after a new start.
- Around line 74-82: `VoiceAgentMicDriver.start` activates the audio session
before calling `capture.startRecording`, but if recording throws, the session
can stay active because `stop()` may do nothing when recording never started.
Wrap the activation/startRecording sequence in a failure path inside
`VoiceAgentMicDriver` and, on any error from `startRecording`, explicitly clean
up by stopping/deactivating the capture session before rethrowing so the session
is not left active.

In
`@sdk/runanywhere-react-native/packages/core/src/Public/Extensions/LLM/RunAnywhere+TextGeneration.ts`:
- Around line 268-273: Guard the native cancel in the iterator cleanup path so
`return()` only calls `native.llmCancelProto()` when this `LLMStream` still owns
an active generation. Use the stream’s active-state tracking in
`RunAnywhere+TextGeneration` (and its `finish()`/cleanup flow) to skip cancel on
unopened, already-finished, or stale iterators, preventing `return()` from
aborting a newer generation started after the iterator was disposed.

In
`@sdk/runanywhere-react-native/packages/core/src/Public/Extensions/VoiceAgent/RunAnywhere+VoiceAgent.ts`:
- Around line 336-347: The mic startup in VoiceAgentMicDriver is fire-and-forget
and the stream still uses adapter.stream() delegation, which can leave the voice
session hanging on startup failure and is not Hermes-compatible. Update
RunAnywhere+VoiceAgent to await micDriver.start() before entering the stream
consumption path, and replace yield* adapter.stream() in the streaming logic
with an explicit manual iterator.next() loop for async-iterable compatibility.
Make sure the finally block still stops the mic driver after the loop exits.

In
`@sdk/runanywhere-swift/Sources/RunAnywhere/Features/TTS/Services/AudioPlaybackManager.swift`:
- Around line 141-144: Capture the session-ownership decision at playback start
in AudioPlaybackManager by storing the current managesAudioSession value in the
locked State snapshot before calling configureAudioSession(), then use that
stored value during cleanup instead of re-reading the mutable property; update
the start/stop flow in AudioPlaybackManager and its State handling so cleanup
uses the same ownership choice for the whole playback instance.

In
`@sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/VoiceAgent/RunAnywhere+VoiceAgent.swift`:
- Around line 225-236: The mic-driver task in RunAnywhere+VoiceAgent currently
only logs failures from VoiceAgentMicDriver.run() and lets adapter.stream()
continue waiting, which can hang the session. Update the flow around micTask and
the stream loop so the mic task is raced against streaming, and if it exits
unexpectedly with a non-cancellation error, propagate that failure or finish the
continuation immediately. Use the existing symbols VoiceAgentMicDriver.run(),
micTask, and adapter.stream() to locate the logic and make sure a dead mic
session cannot leave the stream pending.
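
Two of the comments above (the stale `processTurn()` result in the mic driver and the unguarded `llmCancelProto()` in `return()`) come down to the same pattern: work started under one session must not act on a later one. A language-neutral way to model that is a generation token, sketched here in C++. `GenerationOwner` and its methods are hypothetical names and do not appear in the SDK.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of "only the owning stream may cancel": each new
// generation gets a fresh id, and requests carrying an older id are ignored
// instead of aborting the generation that replaced them.
class GenerationOwner {
public:
    // Starts a new generation and returns the id its stream must keep.
    std::uint64_t begin() {
        active_ = true;
        return ++current_id_;
    }

    // Marks the generation finished, but only if `id` still identifies it.
    void finish(std::uint64_t id) {
        if (id == current_id_) active_ = false;
    }

    // True only when `id` owns a generation that is still running; that is
    // the one case in which a native cancel should be issued.
    bool should_cancel(std::uint64_t id) const {
        return active_ && id == current_id_;
    }

private:
    std::uint64_t current_id_ = 0;
    bool active_ = false;
};
```

An iterator's cleanup path would then check `should_cancel(my_id)` before cancelling, and a turn-completion callback could apply the same comparison before triggering playback.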

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fb7d83eb-8113-45c9-8651-22956b724f3d

📥 Commits

Reviewing files that changed from the base of the PR and between c272a1f and 6bf113c.

⛔ Files ignored due to path filters (2)
  • examples/ios/RunAnywhereAI/RunAnywhereAI/Generated/SolutionsYaml.swift is excluded by !**/generated/**
  • yarn.lock is excluded by !**/yarn.lock, !**/*.lock
📒 Files selected for processing (30)
  • engines/llamacpp/rac_vlm_llamacpp.cpp
  • examples/ios/RunAnywhereAI/AGENTS.md
  • examples/ios/RunAnywhereAI/RunAnywhereAI/App/ContentView.swift
  • examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Solutions/SolutionsView.swift
  • examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Voice/VoiceAgentViewModel.swift
  • examples/ios/RunAnywhereAI/scripts/sync-solutions-yamls.sh
  • examples/ios/RunAnywhereAI/scripts/verify.sh
  • examples/react-native/RunAnywhereAI/ios/Podfile
  • examples/react-native/RunAnywhereAI/ios/RunAnywhereAI.xcworkspace/xcshareddata/swiftpm/Package.resolved
  • examples/react-native/RunAnywhereAI/package.json
  • examples/react-native/RunAnywhereAI/src/screens/ChatScreen.tsx
  • examples/react-native/RunAnywhereAI/src/screens/STTScreen.tsx
  • examples/react-native/RunAnywhereAI/src/screens/TTSScreen.tsx
  • examples/react-native/RunAnywhereAI/src/services/ModelCatalogBootstrap.ts
  • package.json
  • sdk/runanywhere-commons/src/features/vlm/vlm_module.cpp
  • sdk/runanywhere-commons/src/features/voice_agent/voice_agent_proto_abi.cpp
  • sdk/runanywhere-react-native/package.json
  • sdk/runanywhere-react-native/packages/core/ios/HybridAudioCapture.swift
  • sdk/runanywhere-react-native/packages/core/ios/HybridAudioPlayback.swift
  • sdk/runanywhere-react-native/packages/core/src/Features/VoiceAgent/VoiceAgentMicDriver.ts
  • sdk/runanywhere-react-native/packages/core/src/Features/VoiceSession/AudioPlaybackManager.ts
  • sdk/runanywhere-react-native/packages/core/src/Public/Extensions/LLM/RunAnywhere+TextGeneration.ts
  • sdk/runanywhere-react-native/packages/core/src/Public/Extensions/VoiceAgent/RunAnywhere+VoiceAgent.ts
  • sdk/runanywhere-react-native/packages/core/src/native/NitroModulesGlobalInit.ts
  • sdk/runanywhere-swift/Sources/RunAnywhere/Features/STT/Services/AudioCaptureManager.swift
  • sdk/runanywhere-swift/Sources/RunAnywhere/Features/TTS/Services/AudioPlaybackManager.swift
  • sdk/runanywhere-swift/Sources/RunAnywhere/Features/VoiceAgent/Services/VoiceAgentMicDriver.swift
  • sdk/runanywhere-swift/Sources/RunAnywhere/Foundation/Bridge/Extensions/CppBridge+ModalityProtoABI.swift
  • sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/VoiceAgent/RunAnywhere+VoiceAgent.swift
💤 Files with no reviewable changes (5)
  • examples/ios/RunAnywhereAI/scripts/verify.sh
  • examples/ios/RunAnywhereAI/RunAnywhereAI/App/ContentView.swift
  • examples/react-native/RunAnywhereAI/ios/RunAnywhereAI.xcworkspace/xcshareddata/swiftpm/Package.resolved
  • examples/ios/RunAnywhereAI/RunAnywhereAI/Features/Solutions/SolutionsView.swift
  • examples/ios/RunAnywhereAI/scripts/sync-solutions-yamls.sh

| 1 | `VisionHubView` | VLM camera |
| 2 | `VoiceAssistantView` | Full voice agent (STT + LLM + TTS pipeline) |
-| 3 | `MoreHubView` | RAG, STT, TTS, VAD, Storage, Solutions, Voice Keyboard |
+| 3 | `MoreHubView` | RAG, STT, TTS, VAD, Storage, Voice Keyboard |

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Mark Voice Keyboard as iOS-only in the navigation table.

Line 67 currently reads like MoreHubView exposes Voice Keyboard on every platform, but the runtime view gates that entry behind #if os(iOS). The doc is inaccurate for macOS.

✏️ Suggested doc fix
-| 3 | `MoreHubView` | RAG, STT, TTS, VAD, Storage, Voice Keyboard |
+| 3 | `MoreHubView` | RAG, STT, TTS, VAD, Storage, iOS-only Voice Keyboard |

Comment on lines +585 to +588
case .agentResponseStarted:
assistantResponse = ""
currentTranscript = ""

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Don’t clear the finalized user transcript when the response starts.

.userSaid sets currentTranscript, but .agentResponseStarted can arrive right after and erase it before the user sees what was transcribed. Clear only the assistant response here.

Proposed fix
         case .agentResponseStarted:
             assistantResponse = ""
-            currentTranscript = ""

 {
 const std::string stt_text(stt.text);
-const std::string llm_text(llm.text ? llm.text : "");
+const std::string llm_text(llm.text);

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Validate llm.text before constructing or synthesizing from it.

Line 400 now constructs std::string from llm.text without a null check. Constructing a std::string from a null pointer is undefined behavior, so if an LLM backend returns success with null text, this can crash before the error path runs, and Lines 415/417 would also pass the null pointer into TTS. Empty text should be rejected at the same point.

Proposed fix
         if (rc != RAC_SUCCESS) {
             if (have_lifecycle_llm) {
                 rac::llm::release_lifecycle_llm(&llm_ref);
             }
@@
             error_message = "LLM generation failed";
             goto cleanup_and_return;
         }
+        if (!llm.text || llm.text[0] == '\0') {
+            rac_llm_result_free(&llm);
+            if (have_lifecycle_llm) {
+                rac::llm::release_lifecycle_llm(&llm_ref);
+            }
+            rac_stt_result_free(&stt);
+            if (have_lifecycle_stt) {
+                rac::lifecycle::release_lifecycle_stt(&stt_ref);
+            }
+            pending_emits.emplace_back([handle]() {
+                emit_component_failure(handle, "llm", RAC_ERROR_INVALID_STATE,
+                                       "LLM generation was empty");
+            });
+            error_code = RAC_ERROR_INVALID_STATE;
+            error_message = "LLM generation was empty";
+            rc = error_code;
+            goto cleanup_and_return;
+        }
         {
             const std::string stt_text(stt.text);
             const std::string llm_text(llm.text);
📝 Committable suggestion

Suggested change
-            const std::string llm_text(llm.text);
+        if (!llm.text || llm.text[0] == '\0') {
+            rac_llm_result_free(&llm);
+            if (have_lifecycle_llm) {
+                rac::llm::release_lifecycle_llm(&llm_ref);
+            }
+            rac_stt_result_free(&stt);
+            if (have_lifecycle_stt) {
+                rac::lifecycle::release_lifecycle_stt(&stt_ref);
+            }
+            pending_emits.emplace_back([handle]() {
+                emit_component_failure(handle, "llm", RAC_ERROR_INVALID_STATE,
+                                       "LLM generation was empty");
+            });
+            error_code = RAC_ERROR_INVALID_STATE;
+            error_message = "LLM generation was empty";
+            rc = error_code;
+            goto cleanup_and_return;
+        }
+        {
+            const std::string stt_text(stt.text);
+            const std::string llm_text(llm.text);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@sdk/runanywhere-commons/src/features/voice_agent/voice_agent_proto_abi.cpp`
at line 400, `llm.text` is used in `voice_agent_proto_abi.cpp` without
validation, which can crash before the existing error handling and also
propagate a null pointer into TTS. In the `llm_text` construction and the later
synthesis path, add a guard in the `voice_agent` flow to verify `llm.text` is
non-null and non-empty before creating the `std::string` or calling the
TTS/synthesis helpers, and route invalid responses through the existing failure
path in the same block.

Comment on lines +143 to +151
// Preserve a full-duplex session if the voice agent already
// configured one (activateAudioSession → .playAndRecord):
// switching back to .record would silence the TTS reply and
// disable the simultaneous playback the agent needs. STT capture
// (no prior activate) falls through to the unprocessed
// .record/.measurement path that gives Whisper the cleanest signal.
if audioSession.category != .playAndRecord {
try audioSession.setCategory(.record, mode: .measurement)
}

🎯 Functional Correctness | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n sdk/runanywhere-react-native/packages/core/ios/HybridAudioCapture.swift

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 15829


🏁 Script executed:

grep -n "SDKLogger\|NSLock\|OSAllocatedUnfairLock" sdk/runanywhere-react-native/packages/core/ios/HybridAudioCapture.swift

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 311


🏁 Script executed:

python3 <<'PY'
# According to Apple's AVAudioSession documentation:
# When you call setActive(false), the session is deactivated but the category 
# remains unchanged. It does not automatically revert to a default category.
# Therefore, if a full-duplex session was configured with .playAndRecord,
# and then deactivateSession(true) is called, the session becomes inactive
# but retains .playAndRecord as its category.

# The code in configureAndActivateSession (lines 149-151) checks:
# if audioSession.category != .playAndRecord {
#     try audioSession.setCategory(.record, mode: .measurement)
# }

# If the previous session was .playAndRecord and was deactivated:
# 1. The category remains .playAndRecord
# 2. The next startRecording() will NOT change the category to .record
# 3. This means the session is activated in .playAndRecord mode, not .record mode
# 4. This could affect audio quality for pure STT capture (measurement mode provides
#    cleaner signal for Whisper by disabling echo cancellation/processing)

# The original review comment is CORRECT:
# - Checking only `category` is insufficient because it persists after deactivation
# - The code should explicitly track full-duplex ownership OR always reset to 
#   .record/.measurement when starting a plain STT session
# - Alternatively, use `isSessionActive` or explicit state to track if a voice-agent
#   session is currently active

print("CONFIRMED: The category persists after setActive(false). The review concern is valid.")
print("The code does track state internally (CaptureState in line 42), but does not")
print("expose whether the session is full-duplex vs single-direction.")
print("")
print("Issue: Lines 149-151 rely solely on category check, which can be stale.")
PY

Repository: RunanywhereAI/runanywhere-sdks

Length of output: 467


Don't infer active voice-agent state from category alone.

After stopRecording(deactivateSession: true), the AVAudioSession category persists as .playAndRecord. A subsequent startRecording() will detect this stale category and skip the .record/.measurement configuration, resulting in STT capture using full-duplex session processing instead of the optimized measurement mode.

Introduce an explicit boolean flag (e.g., isFullDuplexActive) to track voice-agent session ownership, or unconditionally apply .record/.measurement on plain recording starts that are not preceded by an active activateAudioSession() call.

                    if audioSession.category != .playAndRecord {
                        try audioSession.setCategory(.record, mode: .measurement)
                    }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@sdk/runanywhere-react-native/packages/core/ios/HybridAudioCapture.swift`
around lines 143 - 151, The audio session setup in
HybridAudioCapture.startRecording is incorrectly using audioSession.category as
the only signal for whether to keep full-duplex mode. Add an explicit state flag
in this flow, such as isFullDuplexActive, to track ownership from
activateAudioSession and stopRecording(deactivateSession:), then use that flag
to decide when to apply .record/.measurement. Update the startRecording logic so
plain STT starts reconfigure to measurement mode unless a real voice-agent
session is active, instead of relying on the stale .playAndRecord category.

Comment on lines +182 to +188
if session.category == .playAndRecord {
lock.withLock { $0.ownsSession = false }
return
}
try session.setCategory(.playback, mode: .default, options: [.duckOthers])
try session.setActive(true)
lock.withLock { $0.ownsSession = true }

🩺 Stability & Availability | 🟡 Minor

Gate shared-session reuse on explicit ownership, not stale category.

AVAudioSession.category == .playAndRecord persists even when the voice agent deactivates its session. If the agent ends a call but leaves the category configured, HybridAudioPlayback incorrectly identifies this "stale" state as an active shared session. This causes it to skip reconfiguring to .playback, retain ownsSession = false, and fail to clean up the session later, leaving the app in a suboptimal .playAndRecord state.

Replace the category-only check with a check for an active voice session controller, or require explicit state signaling to ensure the shared session is currently active before skipping configuration.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@sdk/runanywhere-react-native/packages/core/ios/HybridAudioPlayback.swift`
around lines 182 - 188, The shared-session reuse check in HybridAudioPlayback’s
audio-session setup is too loose because it keys off AVAudioSession.category
alone, which can remain .playAndRecord after the voice agent has deactivated.
Update the logic around the session configuration path to gate reuse on explicit
active ownership/state from the voice-session controller instead of the stale
category, so the code only skips setCategory(.playback, mode: .default, options:
[.duckOthers]) and setActive(true) when a voice session is truly active. Keep
the ownsSession flag in sync with that explicit state so cleanup still runs
correctly when the agent is no longer holding the session.

Comment on lines +87 to +104
async stop(): Promise<void> {
if (this.stopped) return;
this.stopped = true;
try {
this.capture.stopRecording();
} catch {
/* noop */
}
try {
this.playback.stop();
} catch {
/* noop */
}
this.preRoll = [];
this.utterance = [];
this.inSpeech = false;
this.logger.info('Voice-agent mic capture stopped');
}

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Invalidate or await in-flight turns on stop.

processTurn() is fire-and-forget, so stop() can return while a native turn is still pending. If the driver is started again before that promise resolves, start() sets stopped = false, allowing the old result to pass Line 190 and play a stale reply in the new session.

Proposed fix
   private stopped = false;
   private processing = false;
+  private generation = 0;
+  private inFlightTurn: Promise<void> | undefined;
@@
   async start(): Promise<void> {
@@
     this.stopped = false;
+    this.generation += 1;
@@
   async stop(): Promise<void> {
     if (this.stopped) return;
     this.stopped = true;
+    this.generation += 1;
@@
     this.utterance = [];
     this.inSpeech = false;
+    this.speechMs = 0;
+    this.silenceMs = 0;
+    this.processing = false;
+    await this.inFlightTurn?.catch(() => undefined);
     this.logger.info('Voice-agent mic capture stopped');
   }
@@
       if (speechMs >= MIN_SPEECH_MS) {
@@
         this.processing = true;
-        void this.processTurn(audio).finally(() => {
-          this.processing = false;
-        });
+        const generation = this.generation;
+        this.inFlightTurn = this.processTurn(audio, generation).finally(() => {
+          if (this.generation === generation) {
+            this.processing = false;
+          }
+        });
@@
-  private async processTurn(audio: Uint8Array): Promise<void> {
-    if (this.stopped || audio.byteLength === 0) return;
+  private async processTurn(audio: Uint8Array, generation: number): Promise<void> {
+    if (this.stopped || this.generation !== generation || audio.byteLength === 0) return;
@@
-      if (this.stopped) return;
+      if (this.stopped || this.generation !== generation) return;

Also applies to: 170-173, 182-200

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@sdk/runanywhere-react-native/packages/core/src/Features/VoiceAgent/VoiceAgentMicDriver.ts`
around lines 87 - 104, The VoiceAgentMicDriver stop/start flow can leave an
in-flight processTurn() from a previous session alive, allowing a stale native
turn to complete after restart and affect the new session. Update
VoiceAgentMicDriver so stop() either awaits or invalidates any outstanding turn
work before returning, and make processTurn()/the turn-completion path check a
session/token or cancellation state that survives start() resetting stopped;
ensure the stale result cannot pass the completion guard and trigger playback
after a new start.
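
For illustration, the generation-token idea from the proposed fix can be reduced to a self-contained sketch. `TurnGate`, `runTurn`, `submit`, and `played` are hypothetical names standing in for the driver, the native turn call, turn dispatch, and playback; none of them are SDK APIs, and the real driver carries more state than this.

```typescript
// Hypothetical, simplified stand-in for the mic driver; not the SDK class.
type TurnRunner = (audio: Uint8Array) => Promise<string>;

class TurnGate {
  readonly played: string[] = [];
  private readonly runTurn: TurnRunner;
  private stopped = true;
  private generation = 0;
  private inFlight: Promise<void> | undefined;

  constructor(runTurn: TurnRunner) {
    this.runTurn = runTurn;
  }

  start(): void {
    this.stopped = false;
    this.generation += 1; // each session gets a fresh token
  }

  async stop(): Promise<void> {
    if (this.stopped) return;
    this.stopped = true;
    this.generation += 1; // invalidate any turn issued before stop()
    await this.inFlight?.catch(() => undefined);
  }

  submit(audio: Uint8Array): Promise<void> {
    this.inFlight = this.processTurn(audio, this.generation);
    return this.inFlight;
  }

  private async processTurn(audio: Uint8Array, generation: number): Promise<void> {
    if (this.stopped || audio.byteLength === 0) return;
    const reply = await this.runTurn(audio);
    // start() resets `stopped`, so the token is what rejects a stale result.
    if (this.stopped || this.generation !== generation) return;
    this.played.push(reply); // stands in for handing audio to playback
  }
}
```

In this sketch a restart that happens while a turn is pending leaves `played` empty for the old turn, because the captured token no longer matches. Awaiting the in-flight promise inside `stop()` means a hung turn would also hang `stop()`, so a real implementation may prefer invalidation alone or a timeout.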

Comment on lines 268 to +273
async return(): Promise<IteratorResult<LLMStreamEventType>> {
// Await the native cancel before resolving so back-to-back
// cancel → generate sequences are race-free. Matches Swift
// cancelGeneration() which awaits CppBridge.LLM.shared.cancelProto().
try { await native.llmCancelProto(); } catch { /* noop */ }
if (inner) {
try { await inner.return?.(); } catch { /* noop */ }
}
finish();

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Guard llmCancelProto() behind active-stream state.

llmCancelProto() is a global native cancel entrypoint, not a request-scoped handle. Calling it unconditionally from return() means cleanup on an unopened or already-finished iterator can abort a different generation that started afterward.

Suggested fix
         async return(): Promise<IteratorResult<LLMStreamEventType>> {
-          try { await native.llmCancelProto(); } catch { /* noop */ }
+          if (started && !done) {
+            try {
+              await native.llmCancelProto();
+            } catch {
+              /* noop */
+            }
+          }
           finish();
           return { value: undefined as unknown as LLMStreamEventType, done: true };
         },
📝 Committable suggestion

Suggested change
-        async return(): Promise<IteratorResult<LLMStreamEventType>> {
-          // Await the native cancel before resolving so back-to-back
-          // cancel → generate sequences are race-free. Matches Swift
-          // cancelGeneration() which awaits CppBridge.LLM.shared.cancelProto().
-          try { await native.llmCancelProto(); } catch { /* noop */ }
-          if (inner) {
-            try { await inner.return?.(); } catch { /* noop */ }
-          }
-          finish();
+        async return(): Promise<IteratorResult<LLMStreamEventType>> {
+          // Await the native cancel before resolving so back-to-back
+          // cancel → generate sequences are race-free. Matches Swift
+          // cancelGeneration() which awaits CppBridge.LLM.shared.cancelProto().
+          if (started && !done) {
+            try {
+              await native.llmCancelProto();
+            } catch {
+              /* noop */
+            }
+          }
+          finish();
+          return { value: undefined as unknown as LLMStreamEventType, done: true };
+        },
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@sdk/runanywhere-react-native/packages/core/src/Public/Extensions/LLM/RunAnywhere+TextGeneration.ts`
around lines 268 - 273, Guard the native cancel in the iterator cleanup path so
`return()` only calls `native.llmCancelProto()` when this `LLMStream` still owns
an active generation. Use the stream’s active-state tracking in
`RunAnywhere+TextGeneration` (and its `finish()`/cleanup flow) to skip cancel on
unopened, already-finished, or stale iterators, preventing `return()` from
aborting a newer generation started after the iterator was disposed.
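
As a rough sketch of the guard described above, the wrapper below only calls a process-wide cancel when this iterator has opened a stream that has not finished. `NativeLLM`, `cancel`, `guardedStream`, and `open` are hypothetical names used for illustration; they are assumptions, not the SDK's actual surface.

```typescript
// Hypothetical stand-in for a global (not request-scoped) native cancel entrypoint.
interface NativeLLM {
  cancel(): Promise<void>;
}

function guardedStream<T>(
  native: NativeLLM,
  open: () => AsyncIterator<T>,
): AsyncIterator<T> {
  let inner: AsyncIterator<T> | undefined;
  let done = false;

  return {
    async next(): Promise<IteratorResult<T>> {
      if (done) return { value: undefined, done: true };
      if (inner === undefined) inner = open(); // the stream starts lazily on first pull
      const result = await inner.next();
      if (result.done) done = true;
      return result;
    },
    async return(): Promise<IteratorResult<T>> {
      const active = inner;
      // Cancel only when this iterator started a generation that is still running.
      if (active !== undefined && !done) {
        try { await native.cancel(); } catch { /* best effort */ }
        try { await active.return?.(); } catch { /* best effort */ }
      }
      done = true;
      return { value: undefined, done: true };
    },
  };
}
```

With this shape, calling `return()` on an unopened or already-finished iterator never reaches the global cancel, so it cannot abort a generation that another caller started later.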

Comment on lines +336 to +347
const micDriver = new VoiceAgentMicDriver();
void micDriver.start().catch((error) => {
logger.error(
`Voice-agent mic driver stopped: ${error instanceof Error ? error.message : String(error)}`
);
});
try {
yield* adapter.stream();
} finally {
// Breaking out of the consuming loop (or unsubscribe) tears down mic
// capture, mirroring Swift's `defer { micTask.cancel() }`.
await micDriver.stop();

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Await mic startup and use the manual async iterator loop.

Starting the mic driver fire-and-forget can leave the stream open with no audio on permission/start failure, and it can race with finally if the consumer unsubscribes while startup is still pending. Also replace yield* with an explicit iterator.next() loop for Hermes/Nitro async-iterable compatibility.

Proposed fix
       // turn fan out to this same handle callback, so collectors see them.
       const micDriver = new VoiceAgentMicDriver();
-      void micDriver.start().catch((error) => {
-        logger.error(
-          `Voice-agent mic driver stopped: ${error instanceof Error ? error.message : String(error)}`
-        );
-      });
+      await micDriver.start();
+      const iterator = adapter.stream()[Symbol.asyncIterator]();
       try {
-        yield* adapter.stream();
+        while (true) {
+          const { value, done } = await iterator.next();
+          if (done) break;
+          yield value;
+        }
       } finally {
+        await iterator.return?.();
         // Breaking out of the consuming loop (or unsubscribe) tears down mic
         // capture, mirroring Swift's `defer { micTask.cancel() }`.
         await micDriver.stop();

As per coding guidelines, sdk/runanywhere-react-native/**/*.ts should use “manual iterator.next() loops instead of for await...of due to Hermes limitations.”

📝 Committable suggestion

Suggested change
-      const micDriver = new VoiceAgentMicDriver();
-      void micDriver.start().catch((error) => {
-        logger.error(
-          `Voice-agent mic driver stopped: ${error instanceof Error ? error.message : String(error)}`
-        );
-      });
-      try {
-        yield* adapter.stream();
-      } finally {
-        // Breaking out of the consuming loop (or unsubscribe) tears down mic
-        // capture, mirroring Swift's `defer { micTask.cancel() }`.
-        await micDriver.stop();
+      const micDriver = new VoiceAgentMicDriver();
+      await micDriver.start();
+      const iterator = adapter.stream()[Symbol.asyncIterator]();
+      try {
+        while (true) {
+          const { value, done } = await iterator.next();
+          if (done) break;
+          yield value;
+        }
+      } finally {
+        await iterator.return?.();
+        // Breaking out of the consuming loop (or unsubscribe) tears down mic
+        // capture, mirroring Swift's `defer { micTask.cancel() }`.
+        await micDriver.stop();
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@sdk/runanywhere-react-native/packages/core/src/Public/Extensions/VoiceAgent/RunAnywhere+VoiceAgent.ts`
around lines 336 - 347, The mic startup in VoiceAgentMicDriver is
fire-and-forget and the stream still uses adapter.stream() delegation, which can
leave the voice session hanging on startup failure and is not Hermes-compatible.
Update RunAnywhere+VoiceAgent to await micDriver.start() before entering the
stream consumption path, and replace yield* adapter.stream() in the streaming
logic with an explicit manual iterator.next() loop for async-iterable
compatibility. Make sure the finally block still stops the mic driver after the
loop exits.

Source: Coding guidelines
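
A minimal, standalone sketch of the shape suggested above: await mic startup, then drain the event source with a manual `iterator.next()` loop. `MicDriver` and `streamWithMic` are hypothetical names for illustration, and the event source is any `AsyncIterable`, not the SDK's adapter.

```typescript
// Hypothetical minimal driver interface; the SDK's driver has a larger surface.
interface MicDriver {
  start(): Promise<void>;
  stop(): Promise<void>;
}

async function* streamWithMic<T>(
  mic: MicDriver,
  events: AsyncIterable<T>,
): AsyncGenerator<T, void, undefined> {
  // A startup failure rejects the consumer's first next() instead of being
  // logged and dropped while the stream stays open with no audio.
  await mic.start();
  const iterator = events[Symbol.asyncIterator]();
  try {
    while (true) {
      const result = await iterator.next();
      if (result.done) break;
      yield result.value;
    }
  } finally {
    // Runs on normal completion and when the consumer stops early.
    try {
      await iterator.return?.();
    } finally {
      await mic.stop();
    }
  }
}
```

Because `start()` is awaited before the `try`, a failed start in this sketch never reaches `mic.stop()`; whether the real driver needs cleanup after a failed start is a separate question for its implementation.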

Comment on lines +141 to +144
// Configure audio session for playback (unless the caller owns it)
if managesAudioSession {
try configureAudioSession()
}

🩺 Stability & Availability | 🟠 Major

Snapshot session ownership per playback instance.

The managesAudioSession flag is a mutable property read independently during playback start and cleanup. If this flag changes between these two points, the cleanup logic may incorrectly deactivate a shared session or fail to deactivate a dedicated one, causing race conditions.

Resolve this by capturing the flag's value at the start of playback within the locked State struct and using that snapshot during cleanup to ensure consistent session management.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@sdk/runanywhere-swift/Sources/RunAnywhere/Features/TTS/Services/AudioPlaybackManager.swift`
around lines 141 - 144, Capture the session-ownership decision at playback start
in AudioPlaybackManager by storing the current managesAudioSession value in the
locked State snapshot before calling configureAudioSession(), then use that
stored value during cleanup instead of re-reading the mutable property; update
the start/stop flow in AudioPlaybackManager and its State handling so cleanup
uses the same ownership choice for the whole playback instance.

Comment on lines +225 to +236
let micDriver = VoiceAgentMicDriver(handle: handle)
let micTask = Task {
do {
try await micDriver.run()
} catch is CancellationError {
// Expected when the consumer stops the session.
} catch {
SDKLogger.voiceAgent.error("Voice-agent mic driver stopped: \(error.localizedDescription)")
}
}

let adapter = VoiceAgentStreamAdapter(handle: handle)
for await event in adapter.stream() {
if Task.isCancelled { break }
continuation.yield(event)
defer { micTask.cancel() }

🩺 Stability & Availability | 🟠 Major

Propagate mic-driver failure to the stream

If micDriver.run() throws (e.g., permission or audio session errors), the current task only logs the error while adapter.stream() waits indefinitely. Consumers will hang because no utterances are generated and the stream is never terminated.

Race the mic task with the stream loop and fail or finish the continuation if the mic driver exits unexpectedly.

Current flow risk

```swift
let micTask = Task { try await micDriver.run() } // Swallows errors
// Continues to stream loop which hangs if mic never starts
for await event in adapter.stream() { ... }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@sdk/runanywhere-swift/Sources/RunAnywhere/Public/Extensions/VoiceAgent/RunAnywhere+VoiceAgent.swift`
around lines 225 - 236, The mic-driver task in RunAnywhere+VoiceAgent currently
only logs failures from VoiceAgentMicDriver.run() and lets adapter.stream()
continue waiting, which can hang the session. Update the flow around micTask and
the stream loop so the mic task is raced against streaming, and if it exits
unexpectedly with a non-cancellation error, propagate that failure or finish the
continuation immediately. Use the existing symbols VoiceAgentMicDriver.run(),
micTask, and adapter.stream() to locate the logic and make sure a dead mic
session cannot leave the stream pending.

@Siddhesh2377 Siddhesh2377 added enhancement New feature or request ios-sample iOS example app ios-sdk iOS / Swift SDK ready-to-merge Approved and ready to merge core C++ commons core (runanywhere-commons) labels Jun 25, 2026
@shubhammalhotra28 shubhammalhotra28 merged commit 5d0e6df into main Jun 25, 2026
26 of 28 checks passed