feat(audio): add AudioProcessingOptions#1048

Open
hiroshihorie wants to merge 3 commits into main from hiroshi/runtime-vp

@hiroshihorie hiroshihorie commented Jun 22, 2026

Member

Summary

Adds an explicit, per-effect audio processing API so apps can choose how each effect runs, either Apple's platform Voice Processing I/O on the device or WebRTC's software processing, instead of relying on implicit toggles. This also adopts the webrtc-sdk audio processing state v2 contract for accurate, engine-wide diagnostics.

Motivation

Until now, software voice processing was reached indirectly by disabling platform voice processing, and the exact behavior changed across releases. Some apps deliberately avoid Apple VPIO (for consistent hardware volume, faster call start, screen recording with audio, and no system mute sounds) while still wanting echo cancellation and noise suppression in software. This PR makes that an explicit, stable choice.

New API

AudioProcessingMode

Per-effect selection of the implementation:

  • .automatic (default): prefer platform processing when available, fall back to WebRTC software otherwise.
  • .platform: use platform processing only. Rejected if the platform implementation is unavailable.
  • .software: force WebRTC software processing and disable the matching platform effect when possible.
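The three modes above can be read as a small resolution rule. The following is a minimal standalone sketch of that rule, not the SDK's implementation: `ModeSketch`, `ImplementationSketch`, `PlatformUnavailableSketch`, and `resolve(_:platformAvailable:)` are hypothetical names introduced only for illustration.

```swift
// Standalone model of the three modes; hypothetical stand-in for the SDK enum.
enum ModeSketch {
    case automatic, platform, software
}

// Hypothetical: the implementation that ends up running an effect.
enum ImplementationSketch: Equatable {
    case platform, software
}

// Hypothetical error for a request that cannot be honored.
struct PlatformUnavailableSketch: Error {}

// Resolve a requested mode, given whether the platform effect is available.
func resolve(_ mode: ModeSketch, platformAvailable: Bool) throws -> ImplementationSketch {
    switch mode {
    case .automatic:
        // Prefer platform processing, fall back to software.
        return platformAvailable ? .platform : .software
    case .platform:
        // Platform only: reject instead of silently falling back.
        guard platformAvailable else { throw PlatformUnavailableSketch() }
        return .platform
    case .software:
        // Always software, regardless of platform availability.
        return .software
    }
}
```

The point of the sketch is the asymmetry: `.automatic` degrades quietly, while `.platform` fails loudly when the platform effect is missing.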

AudioProcessingOptions

A value type describing all four effects (echo cancellation, noise suppression, auto gain control, high-pass filter) with an enabled flag and a mode each. Includes AudioProcessingOptions.communication and AudioProcessingOptions.noProcessing presets.
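As a rough picture of the shape described above, the sketch below models four effects that each carry an enabled flag and a mode. It is a standalone illustration under assumptions: the type and property names (`EffectSettingSketch`, `OptionsSketch`) are hypothetical, and the contents of the two presets are guesses from their names (everything on, everything off), not taken from the PR.

```swift
// Hypothetical stand-in for the per-effect mode.
enum EffectModeSketch {
    case automatic, platform, software
}

// One effect: an enabled flag plus a mode, defaulting to automatic.
struct EffectSettingSketch: Equatable {
    var enabled: Bool
    var mode: EffectModeSketch = .automatic
}

// Hypothetical value type covering all four effects.
struct OptionsSketch: Equatable {
    var echoCancellation: EffectSettingSketch
    var noiseSuppression: EffectSettingSketch
    var autoGainControl: EffectSettingSketch
    var highPassFilter: EffectSettingSketch

    // Assumption: the communication preset enables every effect.
    static let communication = OptionsSketch(
        echoCancellation: EffectSettingSketch(enabled: true),
        noiseSuppression: EffectSettingSketch(enabled: true),
        autoGainControl: EffectSettingSketch(enabled: true),
        highPassFilter: EffectSettingSketch(enabled: true)
    )

    // Assumption: the noProcessing preset disables every effect.
    static let noProcessing = OptionsSketch(
        echoCancellation: EffectSettingSketch(enabled: false),
        noiseSuppression: EffectSettingSketch(enabled: false),
        autoGainControl: EffectSettingSketch(enabled: false),
        highPassFilter: EffectSettingSketch(enabled: false)
    )
}

// Because this is a value type, a preset can be copied and adjusted per effect.
var customOptions = OptionsSketch.communication
customOptions.noiseSuppression.mode = .software
```

Modeling the options as a value type means a preset can be copied and tweaked without affecting other holders of the same preset.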

AudioCaptureOptions

Gains matching *Mode fields (echoCancellationMode, autoGainControlMode, noiseSuppressionMode, highPassFilterMode), plus interop with AudioProcessingOptions (a convenience init and an audioProcessingOptions accessor). Existing call sites keep working since modes default to .automatic.

Runtime control

LocalAudioTrack.setAudioProcessingOptions(_:) applies options to an already-published track and returns an AudioProcessingOptionsResult (for example .applied, .stored, or a rejection reason).
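A caller will typically want to branch on the returned result. The sketch below is a standalone model rather than the SDK type: `ResultSketch` is hypothetical, the PR only names `.applied` and `.stored`, the shape of the rejection case is an assumption, and reading `.stored` as "saved but not yet in effect" is an interpretation.

```swift
// Hypothetical stand-in for the result of a runtime options change.
enum ResultSketch: Equatable {
    case applied                      // named in the PR
    case stored                       // named in the PR
    case rejected(reason: String)     // assumption: exact shape is not given
}

// Turn a result into a log-friendly message.
func describe(_ result: ResultSketch) -> String {
    switch result {
    case .applied:
        return "applied"
    case .stored:
        // Interpretation: options were kept but are not active yet.
        return "stored"
    case .rejected(let reason):
        return "rejected: \(reason)"
    }
}
```

Handling the rejection case explicitly matters because a request such as `.platform` for an effect without a platform implementation is expected to be refused rather than downgraded.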

Diagnostics

AudioManager.audioProcessingState and AudioManager.platformAudioProcessingState expose the v2 state, following the requested -> resolved -> active -> effective vocabulary, read from the factory-owned, engine-wide module rather than a single peer connection.

Renamed API

AudioManager.setVoiceProcessingEnabled(_:) and isVoiceProcessingEnabled are renamed to setPlatformVoiceProcessingAllowed(_:) and isPlatformVoiceProcessingAllowed, matching the underlying ADM accessors and the "allowed" policy meaning. The old names remain as deprecated, renamed forwarders, so existing code keeps compiling with a fix-it.
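The "deprecated, renamed forwarder" pattern can be sketched with Swift's `@available` attribute, which is what produces the fix-it. This is a standalone illustration: `AudioManagerSketch` is a hypothetical class, and whether the old setter was throwing is an assumption.

```swift
// Hypothetical stand-in class showing the rename with deprecated forwarders.
final class AudioManagerSketch {
    // New name: expresses a policy ("allowed"), not a guaranteed state.
    private(set) var isPlatformVoiceProcessingAllowed = true

    func setPlatformVoiceProcessingAllowed(_ allowed: Bool) throws {
        isPlatformVoiceProcessingAllowed = allowed
    }

    // Old setter kept as a forwarder; the compiler offers a rename fix-it.
    @available(*, deprecated, renamed: "setPlatformVoiceProcessingAllowed(_:)")
    func setVoiceProcessingEnabled(_ enabled: Bool) throws {
        try setPlatformVoiceProcessingAllowed(enabled)
    }

    // Old getter kept as a forwarder to the new property.
    @available(*, deprecated, renamed: "isPlatformVoiceProcessingAllowed")
    var isVoiceProcessingEnabled: Bool {
        isPlatformVoiceProcessingAllowed
    }
}
```

Existing call sites compile with a deprecation warning, and both names observe the same underlying value.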

Usage

Publish the microphone with all effects forced to software:

let options = AudioCaptureOptions(
    echoCancellation: true,
    autoGainControl: true,
    noiseSuppression: true,
    highpassFilter: true,
    echoCancellationMode: .software,
    autoGainControlMode: .software,
    noiseSuppressionMode: .software,
    highPassFilterMode: .software
)
try await room.localParticipant.setMicrophone(enabled: true, captureOptions: options)

Set it as the room default so it applies whenever the mic track is published:

try await room.connect(url: url, token: token,
                       roomOptions: RoomOptions(defaultAudioCaptureOptions: options))

Guarantee Apple VPIO is never used, then rely on software processing:

try AudioManager.shared.setPlatformVoiceProcessingAllowed(false)

Verify what each effect resolved to:

let state = AudioManager.shared.audioProcessingState
print(state.echoCancellation.effective) // Software
print(state.noiseSuppression.effective) // Software

Documentation

Docs/audio.md gains a new section, "Audio Processing Modes (software, platform, automatic)", covering publish-time, room-default, and runtime configuration, plus how to verify the resolved implementation. The "Disallowing Platform Voice Processing" section is updated to the renamed API.

Example app

A companion example demonstrates the full surface, including live "Voice Processing" and "Runtime Audio Processing" panels with effective-state diagnostics:

That branch pins its SDK dependency to this branch (hiroshi/runtime-vp).

Notes

  • Requires LiveKitWebRTC 144.7559.10 (already on main).
  • There is no platform high-pass filter, so .platform is rejected for highPassFilterMode. Use .software.
  • Room defaults apply at track creation. An already-published mic track is updated through setAudioProcessingOptions(_:), not by toggling the mic.
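The high-pass filter note amounts to a per-effect validation rule. Below is a minimal standalone sketch of it, assuming the other three effects do have a platform implementation on the device in question; `EffectSketch`, `RequestedModeSketch`, and `isAccepted(_:for:)` are hypothetical names, not SDK API.

```swift
// Hypothetical stand-ins for the effect and mode types.
enum EffectSketch: CaseIterable {
    case echoCancellation, noiseSuppression, autoGainControl, highPassFilter
}

enum RequestedModeSketch {
    case automatic, platform, software
}

// Per the note above, there is no platform high-pass filter.
// Assumption: the other effects have a platform implementation available.
func hasPlatformImplementation(_ effect: EffectSketch) -> Bool {
    effect != .highPassFilter
}

// A mode is accepted unless it demands a platform implementation that does not exist.
func isAccepted(_ mode: RequestedModeSketch, for effect: EffectSketch) -> Bool {
    mode != .platform || hasPlatformImplementation(effect)
}

// List the effects for which .platform would be rejected.
let rejectedForPlatform = EffectSketch.allCases.filter { !isAccepted(.platform, for: $0) }
```

Under these assumptions only the high-pass filter rejects `.platform`, while `.automatic` and `.software` remain valid for every effect.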

Test plan

  • swift build passes
  • Example app builds and runs on macOS against this branch
  • swift test

@github-actions

⚠️ This PR does not contain any files in the .changes directory.

… contract

Introduce AudioProcessingOptions and wire it through AudioCaptureOptions,
AudioManager, and LocalAudioTrack.

The audio processing state read-back moved from PeerConnection to the
factory upstream, so the per-publisher source registry is gone:
AudioManager reads the factory-owned engine-wide state directly. State
types follow the v2 vocabulary (requested -> resolved -> active ->
effective) with collapsed booleans, and the device-level BuiltIn* types
are renamed Platform*. ADM voice-processing calls target the renamed
isPlatformVoiceProcessingAllowed accessors. Docs/audio.md updated to match.
@hiroshihorie hiroshihorie changed the title from "feat(audio): add AudioProcessingOptions and adopt webrtc-sdk state v2 contract" to "feat(audio): add AudioProcessingOptions" Jun 22, 2026
Rename AudioManager.setVoiceProcessingEnabled(_:)/isVoiceProcessingEnabled
to setPlatformVoiceProcessingAllowed(_:)/isPlatformVoiceProcessingAllowed
to match the underlying ADM accessors and the v2 'allowed' vocabulary.
Keep the old names as deprecated, renamed forwarders. Update tests and
Docs/audio.md to the new API.

Add a Docs/audio.md section covering the per-effect AudioProcessingMode
(software / platform / automatic) and AudioProcessingOptions: how to set
modes at publish time, as a room default, and at runtime via
setAudioProcessingOptions, plus how to verify the resolved implementation
through AudioManager.audioProcessingState.
@hiroshihorie hiroshihorie mentioned this pull request Jun 24, 2026
@hiroshihorie hiroshihorie marked this pull request as ready for review June 24, 2026 09:54