feat(audio): add AudioProcessingOptions and adopt webrtc-sdk state v2 contract

hiroshihorie · hiroshihorie · commit 70d0e9d5f9f5 · 2026-06-22T19:30:43.000+09:00
Introduce AudioProcessingOptions and wire it through AudioCaptureOptions,
AudioManager, and LocalAudioTrack.

The audio processing state read-back moved from PeerConnection to the
factory upstream, so the per-publisher source registry is gone:
AudioManager reads the factory-owned engine-wide state directly. State
types follow the v2 vocabulary (requested -> resolved -> active ->
effective) with collapsed booleans, and the device-level BuiltIn* types
are renamed Platform*. ADM voice-processing calls target the renamed
isPlatformVoiceProcessingAllowed accessors. Docs/audio.md updated to match.
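
The policy described above (`automatic` falls back to software processing and `platform` is rejected while platform voice processing is disallowed) can be sketched as a small resolution function. This is an illustrative model only: `ProcessingMode`, `ResolutionError`, and `resolve(_:platformAllowed:)` are hypothetical names, not types from this SDK or from webrtc-sdk, and the sketch assumes `automatic` prefers the platform path whenever it is allowed.

```swift
// Hypothetical model of the requested -> resolved step; not the SDK's API.
enum ProcessingMode: Equatable {
    case automatic // let the engine pick a path
    case platform  // Apple Voice Processing I/O
    case software  // WebRTC software processing
}

enum ResolutionError: Error, Equatable {
    case platformDisallowed
}

/// Resolves a requested mode against the platform voice-processing policy.
/// Assumption: `automatic` prefers the platform path when it is allowed.
func resolve(_ requested: ProcessingMode, platformAllowed: Bool) throws -> ProcessingMode {
    switch requested {
    case .software:
        // Software processing does not depend on the platform policy.
        return .software
    case .platform:
        // An explicit platform request is rejected when the policy forbids it.
        guard platformAllowed else { throw ResolutionError.platformDisallowed }
        return .platform
    case .automatic:
        // Automatic falls back to software when the platform path is disallowed.
        return platformAllowed ? .platform : .software
    }
}
```

In this model, a caller that must never touch Apple Voice Processing I/O would pass `platformAllowed: false` and could still request `.automatic` safely, since it resolves to `.software`.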
diff --git a/.changes/audio-processing-options b/.changes/audio-processing-options
@@ -0,0 +1 @@
+patch type="added" "Add AudioProcessingOptions"
diff --git a/Docs/audio.md b/Docs/audio.md
@@ -46,25 +46,25 @@ AudioManager.shared.audioSession.isAutomaticDeactivationEnabled = false
 
 When set to `false`, the audio session remains active after the LiveKit call ends, preserving your app's audio state.
 
-## Disabling Voice Processing
+## Disallowing Platform Voice Processing
 
-Apple's voice processing is enabled by default, such as echo cancellation and auto-gain control.
+Apple's platform voice processing is allowed by default, such as echo cancellation and auto-gain control.
 
-If your app doesn't require voice processing at all, you can disable it entirely:
+If your app must not use Apple Voice Processing I/O, disable voice processing:
 
 ```swift
 try AudioManager.shared.setVoiceProcessingEnabled(false)
 ```
 
-This restarts the internal `AVAudioEngine` to apply the change. It can cause a short audio glitch, so it is recommended to set it once before connecting to a Room. Disabling voice processing also disables muted speaker detection.
+This restarts the internal `AVAudioEngine` when an Apple VPIO path is active. It is recommended to set it once before connecting to a Room. Runtime `AudioProcessingOptions` with `automatic` mode will fall back to WebRTC software processing while platform voice processing is disallowed.
 
-If your app requires toggling voice processing at run-time, it is recommended to use:
+For per-track or per-capture software processing, use `AudioProcessingOptions` with `.software` modes. The lower-level bypass API remains available when you need to directly control Apple VPIO:
 
 ```swift
 AudioManager.shared.isVoiceProcessingBypassed = true
 ```
 
-Set it back to `false` to re-enable processing. This uses `AVAudioEngine`'s [isVoiceProcessingBypassed](https://developer.apple.com/documentation/avfaudio/avaudioinputnode/isvoiceprocessingbypassed) and works seamlessly at run-time.
+Set it back to `false` to re-enable the Apple path. This uses `AVAudioEngine`'s [isVoiceProcessingBypassed](https://developer.apple.com/documentation/avfaudio/avaudioinputnode/isvoiceprocessingbypassed). Runtime `AudioProcessingOptions` can overwrite this Apple-specific state when capture starts or options are reapplied.
 
 ## Other audio ducking
 
diff --git a/Sources/LiveKit/Audio/Manager/AudioManager.swift b/Sources/LiveKit/Audio/Manager/AudioManager.swift
@@ -320,10 +320,15 @@ public class AudioManager: Loggable {
         set { RTC.audioDeviceModule.duckingLevel = newValue.toRTCType() }
     }
 
-    /// The main flag that determines whether to enable Voice-Processing I/O of the internal AVAudioEngine. Toggling this requires restarting the AudioEngine.
-    /// Setting this to `false` prevents any voice-processing-related initialization, and muted talker detection will not work.
-    /// Typically, it is recommended to keep this set to `true` and toggle ``isVoiceProcessingBypassed`` when possible.
-    /// Defaults to `true`.
+    /// Whether Apple's platform voice processing is allowed.
+    ///
+    /// Defaults to `true`. When set to `false`, runtime ``AudioProcessingOptions``
+    /// treat Apple Voice Processing I/O as unavailable. `automatic` mode falls
+    /// back to WebRTC software processing and `platform` mode is rejected.
+    ///
+    /// Use ``AudioProcessingOptions`` with `.software` modes for per-track or
+    /// per-capture software voice processing. Use this policy when the app must
+    /// guarantee Apple Voice Processing I/O is not used.
     public var isVoiceProcessingEnabled: Bool { RTC.audioDeviceModule.isPlatformVoiceProcessingAllowed }
 
     public func setVoiceProcessingEnabled(_ enabled: Bool) throws {
@@ -333,6 +338,8 @@ public class AudioManager: Loggable {
 
     /// Bypass Voice-Processing I/O of internal AVAudioEngine.
     /// It is valid to toggle this at runtime and AudioEngine doesn't require restart.
+    /// Runtime ``AudioProcessingOptions`` may overwrite this Apple-specific state
+    /// when capture starts or when local audio track options are reapplied.
     /// Defaults to `false`.
     public var isVoiceProcessingBypassed: Bool {
         get {
@@ -359,6 +366,21 @@ public class AudioManager: Loggable {
         set { RTC.audioDeviceModule.isVoiceProcessingAGCEnabled = newValue }
     }
 
+    /// Device-level platform voice-processing capability and requested/active state.
+    public var platformAudioProcessingState: PlatformAudioProcessingState {
+        RTC.audioDeviceModule.platformAudioProcessingState.toLKType()
+    }
+
+    /// Diagnostic snapshot of the resolved audio processing state.
+    ///
+    /// The audio processing module is owned by the peer connection factory and
+    /// shared engine-wide, so this reflects what is actually applied across the
+    /// engine rather than any single track or connection — use it to verify what
+    /// a ``LocalAudioTrack/setAudioProcessingOptions(_:)`` request resolved to.
+    public var audioProcessingState: AudioProcessingState {
+        RTC.audioProcessingState().toLKType()
+    }
+
     /// Enables manual rendering (no-device) mode of AVAudioEngine.
     /// In this mode, you can provide audio buffers by calling `AudioManager.shared.mixer.capture(appAudio:)` continuously.
     /// Remote audio will not play out automatically. Get remote mixed audio buffers with `AudioManager.shared.add(localAudioRenderer:)` or individual tracks with ``RemoteAudioTrack/add(audioRenderer:)``.
@@ -383,22 +405,31 @@ public class AudioManager: Loggable {
     /// which keeps recording initialized and pre-warms voice processing.
     ///
     /// - Parameter enabled: Pass `true` to enable always-prepared recording, or `false` to disable it.
+    /// - Parameter audioProcessingOptions: Optional voice-processing options used when prewarming mic input.
     /// - Note: If `audioSession.isAutomaticConfigurationEnabled` is `true`, the session category is configured to `.playAndRecord`.
     /// - Note: Microphone permission is required. iOS may prompt if not already granted.
     /// - Note: This persists across ``Room`` lifecycles and connections until disabled.
     /// - Throws: An error if the underlying audio device module fails to apply the setting.
-    public func setRecordingAlwaysPreparedMode(_ enabled: Bool) async throws {
-        let result = RTC.audioDeviceModule.setRecordingAlwaysPreparedMode(enabled)
+    public func setRecordingAlwaysPreparedMode(
+        _ enabled: Bool,
+        audioProcessingOptions: AudioProcessingOptions? = nil,
+    ) async throws {
+        let result = RTC.audioDeviceModule.setRecordingAlwaysPreparedMode(
+            enabled,
+            audioProcessingOptions: audioProcessingOptions?.toRTCType(),
+        )
         try checkAdmResult(code: result)
     }
 
     /// Starts mic input to the SDK even without any ``Room`` or a connection.
     /// Audio buffers will flow into ``LocalAudioTrack/add(audioRenderer:)`` and ``capturePostProcessingDelegate``.
-    public func startLocalRecording() throws {
+    public func startLocalRecording(audioProcessingOptions: AudioProcessingOptions? = nil) throws {
         // Always unmute APM if muted by last session.
         RTC.audioProcessingModule.isMuted = false // TODO: Possibly not required anymore with new libs
         // Start recording on the ADM.
-        let result = RTC.audioDeviceModule.initAndStartRecording()
+        let result = RTC.audioDeviceModule.initAndStartRecording(
+            audioProcessingOptions: audioProcessingOptions?.toRTCType(),
+        )
         try checkAdmResult(code: result)
     }
 
diff --git a/Sources/LiveKit/Core/RTC.swift b/Sources/LiveKit/Core/RTC.swift
@@ -83,6 +83,10 @@ actor RTC {
                                                                                 delegate: nil) }
     }
 
+    static func audioProcessingState() -> LKRTCAudioProcessingState {
+        DispatchQueue.liveKitWebRTC.sync { peerConnectionFactory.audioProcessingState }
+    }
+
     static func createVideoSource(forScreenShare: Bool) -> LKRTCVideoSource {
         DispatchQueue.liveKitWebRTC.sync { peerConnectionFactory.videoSource(forScreenCast: forScreenShare) }
     }
diff --git a/Sources/LiveKit/Track/Local/LocalAudioTrack.swift b/Sources/LiveKit/Track/Local/LocalAudioTrack.swift
@@ -66,6 +66,10 @@ public class LocalAudioTrack: Track, LocalTrackProtocol, AudioTrackProtocol, @un
             "googNoiseSuppression": options.noiseSuppression.toString(),
             "googTypingNoiseDetection": options.typingNoiseDetection.toString(),
             "googHighpassFilter": options.highpassFilter.toString(),
+            "echoCancellationMode": options.echoCancellationMode.toConstraintValue(),
+            "autoGainControlMode": options.autoGainControlMode.toConstraintValue(),
+            "noiseSuppressionMode": options.noiseSuppressionMode.toConstraintValue(),
+            "highPassFilterMode": options.highPassFilterMode.toConstraintValue(),
         ]
 
         let audioConstraints = DispatchQueue.liveKitWebRTC.sync { LKRTCMediaConstraints(mandatoryConstraints: nil,
@@ -90,12 +94,32 @@ public class LocalAudioTrack: Track, LocalTrackProtocol, AudioTrackProtocol, @un
         try await super._unmute()
     }
 
+    /// Updates this local track's voice processing options without restarting capture.
+    ///
+    /// If this track is already published, WebRTC reapplies the updated options through
+    /// the active sender. Effective APM configuration is shared by the WebRTC voice engine,
+    /// so conflicting updates from multiple local audio tracks are last-writer-wins.
+    @discardableResult
+    public func setAudioProcessingOptions(_ options: AudioProcessingOptions) throws -> AudioProcessingOptionsResult {
+        guard let audioTrack = mediaTrack as? LKRTCAudioTrack else {
+            throw LiveKitError(.invalidState, message: "Media track is not an audio track")
+        }
+        let result = audioTrack.setAudioProcessingOptions(options.toRTCType()).toLKType()
+        guard result.isSuccess else {
+            let reason = result.message.isEmpty ? "\(result.code)" : "\(result.code): \(result.message)"
+            throw LiveKitError(.webRTC, message: "Failed to set audio processing options: \(reason)")
+        }
+        return result
+    }
+
     // MARK: - Internal
 
     override func startCapture() async throws {
         // AudioDeviceModule's InitRecording() and StartRecording() automatically get called by WebRTC, but
         // explicitly init & start it early to detect audio engine failures (mic not accessible for some reason, etc.).
-        try AudioManager.shared.startLocalRecording()
+        try AudioManager.shared.startLocalRecording(
+            audioProcessingOptions: captureOptions.audioProcessingOptions,
+        )
     }
 
     override func stopCapture() async throws {
diff --git a/Sources/LiveKit/Types/AudioProcessingOptions.swift b/Sources/LiveKit/Types/AudioProcessingOptions.swift
diff --git a/Sources/LiveKit/Types/Options/AudioCaptureOptions.swift b/Sources/LiveKit/Types/Options/AudioCaptureOptions.swift
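
The doc comment on `setAudioProcessingOptions(_:)` notes that the effective APM configuration is shared engine-wide, so conflicting updates from multiple local audio tracks are last-writer-wins. A minimal sketch of that behavior follows. `EngineConfig`, `SharedEngine`, and `Track` are hypothetical stand-ins for illustration and do not correspond to SDK or WebRTC types.

```swift
// Hypothetical stand-in for an engine-wide processing configuration.
struct EngineConfig: Equatable {
    var echoCancellation = true
    var noiseSuppression = true
}

// One shared engine: every track writes to the same configuration.
final class SharedEngine {
    private(set) var effective = EngineConfig()
    func apply(_ config: EngineConfig) { effective = config }
}

// A track holds no processing state of its own in this model.
struct Track {
    let name: String
    let engine: SharedEngine
    func setOptions(_ config: EngineConfig) { engine.apply(config) }
}

let engine = SharedEngine()
let mic = Track(name: "mic", engine: engine)
let aux = Track(name: "aux", engine: engine)

mic.setOptions(EngineConfig(echoCancellation: true, noiseSuppression: false))
aux.setOptions(EngineConfig(echoCancellation: false, noiseSuppression: true))

// The later write from `aux` replaces the earlier one from `mic`.
print(engine.effective)
```

This is why the commit exposes a single engine-wide `audioProcessingState` for read-back rather than a per-track value: in a shared-engine design, only the engine can say what is actually applied.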
