This document explains how react-native-waveform-player is put together
internally — how a JS prop change ends up moving a bar on the screen, who owns
which piece of state, and why the design looks the way it does.
It's the doc you want to read before fixing a bug or adding a feature.
The library is a Fabric component. JS hands the native side a source URI plus
a bunch of styling / playback props; the native side runs an audio engine and
draws the UI. There is no JS in the playback or rendering hot path — every
~30 Hz progress tick, every drag pixel, every speed change is handled in
Swift / Kotlin.
┌─────────────────────────────────────────────────────────────────┐
│  JS layer (src/)                                                │
│                                                                 │
│  AudioWaveformView (forwardRef)                                 │
│       │                                                         │
│       ▼                                                         │
│  AudioWaveformViewNativeComponent.ts   ← codegen spec           │
│       │                                                         │
└───────┼─────────────────────────────────────────────────────────┘
        │ Fabric props / commands / events
        ▼
┌──────────────────────────┐   ┌──────────────────────────┐
│  iOS (ios/)              │   │  Android (android/)      │
│                          │   │                          │
│  AudioWaveformView.mm    │   │  AudioWaveformViewManager│
│  (Obj-C++ Fabric shim)   │   │  (Fabric ViewManager)    │
│    │                     │   │    │                     │
│    ▼                     │   │    ▼                     │
│  AudioWaveformViewImpl   │   │  AudioWaveformView       │
│  (Swift composite view)  │   │  (Kotlin FrameLayout)    │
│    │                     │   │    │                     │
│    ├─ AudioPlayerEngine  │   │    ├─ AudioPlayerEngine  │
│    ├─ WaveformDecoder    │   │    ├─ WaveformDecoder    │
│    ├─ WaveformBarsView   │   │    ├─ WaveformBarsView   │
│    ├─ PlayPauseButton    │   │    ├─ PlayPauseButton    │
│    └─ SpeedPillView      │   │    └─ SpeedPillView      │
└──────────────────────────┘   └──────────────────────────┘
Both platforms ship the same five pieces in roughly the same shape: an audio engine that wraps the platform player, a decoder that turns audio samples into amplitude buckets, a bars view that draws those buckets, a play / pause button, and a speed pill. The composite view glues them together.

| File (`src/`) | Role |
|---|---|
| `AudioWaveformViewNativeComponent.ts` | Codegen spec: declares props, events, commands. Source of truth for the JS↔native interface. |
| `AudioWaveformView.native.tsx` | Public component (`.native.tsx` resolution, so it is loaded on iOS/Android). `forwardRef`, controlled-prop translation, ref imperative API, JS-side event unwrapping. |
| `AudioWaveformView.tsx` | Web/Node fallback that throws on render. Exists so importing the package on non-native platforms doesn't break type-checking. |
| `index.tsx` | Public exports. |

| File (`ios/`) | Role |
|---|---|
| `AudioWaveformView.h` / `.mm` | Obj-C++ Fabric component view. Bridges C++ codegen types ↔ Swift impl. Owns `prepareForRecycle`. |
| `AudioWaveformViewImpl.swift` | The composite Swift view. Owns subviews, engine, decoder, lifecycle observers, gesture state. ~830 lines, the "main" file. |
| `AudioPlayerEngine.swift` | Wraps `AVPlayer`. State machine (idle / loading / ready / ended / error), pending-start queue, KVO + time observer + end-of-item listener. |
| `WaveformDecoder.swift` | URL → amplitudes. Downloads remote URLs first (`AVAssetReader` can't read them), then decodes with bucket-by-time accumulation. Emits progressive partial arrays. |
| `WaveformBarsView.swift` | `UIView` that draws the bars + the partial-fill playhead. Animated bar growth via `CADisplayLink`. Hosts the scrub gesture recogniser. |
| `PlayPauseButton.swift` | `UIControl` with SF Symbols (`play.fill` / `pause.fill`) and a `UIActivityIndicatorView` for the loading state. |
| `SpeedPillView.swift` | Tap-to-cycle speed pill. |

| File (`android/`) | Role |
|---|---|
| `AudioWaveformPackage.kt` | RN package entry: registers the view manager. |
| `AudioWaveformViewManager.kt` | Fabric `SimpleViewManager`. Implements the codegen interface (prop setters + commands). Wires the view's callback closures up to `EventDispatcher`. |
| `AudioWaveformEvent.kt` | Generic Fabric `Event` subclass for dispatching custom events. |
| `AudioWaveformView.kt` | Composite `FrameLayout`. Same role as `AudioWaveformViewImpl.swift`. ~700 lines. |
| `AudioPlayerEngine.kt` | Wraps `android.media.MediaPlayer`. Same state machine + pending-start logic as iOS. |
| `WaveformDecoder.kt` | URL → amplitudes via `MediaExtractor` + `MediaCodec`. `MediaExtractor` natively handles HTTP, so no separate download step. |
| `WaveformBarsView.kt` | Custom `View` mirroring the iOS bars view. `ValueAnimator` for bar growth. |
| `PlayPauseButton.kt` | `FrameLayout` wrapping an `ImageView` (vector drawable) and a `ProgressBar` for the loading state. |
| `SpeedPillView.kt` | Custom `TextView` with a `GradientDrawable` rounded background. |

The prop-flow diagram below is the most useful one in the doc; keep it open when adding new props.
TS spec (AudioWaveformViewNativeComponent.ts)
│ declared as `barWidth?: WithDefault<Float, 3.0>`
▼
codegen output (auto, in the build)
├─ iOS: AudioWaveformViewProps with `Float barWidth`
└─ Android: ViewManagerInterface with `setBarWidth(view, value: Float)`
│
▼
iOS: AudioWaveformView.mm updateProps → _impl.barWidth = …
Android: ViewManager.setBarWidth(view, value) → view.barWidthDp = …
│
▼
Composite view (Swift / Kotlin):
stores the value, forwards to barsView, calls setNeedsLayout()
Sources of truth: codegen TS spec + updateProps in the iOS shim + the
set* overrides in the Android view manager. Add a prop = touch all three.
JS: ref.current.play()
▼
AudioWaveformView.native.tsx → Commands.play(nativeRef.current)
▼
codegen routes to platform:
iOS: AudioWaveformView.mm → handleCommand:@"play" → [_impl play]
Android: AudioWaveformViewManager.play(view) → view.play()
▼
Composite view → engine.play()
Events flow the other way:
engine.onTimeUpdate?.(currentMs, durationMs)
▼
Composite view forwards to its `onTimeUpdate` closure
▼
iOS: closure stored on _impl, called by .mm → typedEventEmitter()->onTimeUpdate(...)
Android: closure stored on view, called via UIManagerHelper.getEventDispatcherForReactTag(...).dispatchEvent(...)
▼
JS: NativeSyntheticEvent → unwrapped in AudioWaveformView.native.tsx → public callback
Sources of truth: codegen TS spec (declares the payload shape) +
AudioWaveformView.mm (emitOn* helpers) + AudioWaveformViewManager.kt
(dispatchEvent) + AudioWaveformView.native.tsx (unwraps nativeEvent).
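
The last hop of the event path, unwrapping `nativeEvent` before calling the public callback, can be sketched as below. This is a minimal illustration, not the library's code: `NativeEventLike` is a local stand-in for React Native's `NativeSyntheticEvent`, and the payload fields and the `unwrapTimeUpdate` helper are assumptions modelled on the engine callback's `(currentMs, durationMs)` arguments.

```typescript
// Local stand-in for React Native's NativeSyntheticEvent<T>; only the field used here.
type NativeEventLike<T> = { nativeEvent: T };

// Assumed payload shape, mirroring the engine callback's (currentMs, durationMs).
type TimeUpdatePayload = { currentMs: number; durationMs: number };

// Hypothetical public callback signature exposed to app code.
type PublicTimeUpdateHandler = (currentMs: number, durationMs: number) => void;

// Wraps a public callback so it can be handed to the native component as an event handler.
function unwrapTimeUpdate(
  handler: PublicTimeUpdateHandler | undefined
): ((event: NativeEventLike<TimeUpdatePayload>) => void) | undefined {
  if (!handler) return undefined; // no listener: pass nothing to the native component
  return (event) => {
    const { currentMs, durationMs } = event.nativeEvent;
    handler(currentMs, durationMs);
  };
}
```

Keeping the unwrapping in one helper means app code never sees the synthetic event wrapper, only the payload values.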
`AudioPlayerEngine` is a thin wrapper that turns the platform player (AVPlayer / MediaPlayer)
into the API the rest of the component wants: a state machine plus
callbacks. Nothing in the engine knows about React, view hierarchy, or
gestures.
State machine:
         setSource()
  idle ─────────────► loading
                         │
           readyToPlay / │ failure
           prepared      │
                         ▼
                       ready ◄──┐
                         │      │
                   play()│      │seek to 0 + play()
                         ▼      │
                     (playing)  │
                         │      │
              end-of-item│      │loop=true
                         ▼      │
                       ended ───┘
Public surface:
- State: `state`, `durationMs`, `currentMs`, `isPlaying`, `rate`.
- Config: `loop`, `setBackgroundPlaybackEnabled(...)`.
- Mutation: `setSource`, `play`, `pause`, `toggle`, `seek(toMs:)`, `setRate`, `reset`.
- Callbacks: `onLoad`, `onLoadError`, `onStateChange`, `onTimeUpdate`, `onEnded`.
Two design decisions worth knowing:
- `pendingStart` queue. `play()` called while `state == .loading` does not actually start playback: it sets `pendingStart = true` and returns without firing any callbacks. The state setter then resumes playback atomically the moment it transitions to `.ready`, so a single `onStateChange` notification reflects the final state (`ready` + `isPlaying = true`). This is what eliminates the brief play-icon flash you'd otherwise see when a tap-during-loading "comes through".
- `isPlaying` is set after the platform call succeeds. On Android, `MediaPlayer.start()` can throw `SecurityException` (e.g. wake-lock acquisition), and we don't want to leave the play/pause icon stuck on "pause" with no audio. iOS does the equivalent.
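
Both decisions can be modelled without any audio. The sketch below is a simplified TypeScript model, not the Swift/Kotlin engine: `EngineModel`, `markReady` and `platformStart` are hypothetical names, and `platformStart` stands in for the platform call that may throw.

```typescript
type EngineState = "idle" | "loading" | "ready" | "ended" | "error";

// Simplified model of the engine's play/ready handshake (no real audio).
class EngineModel {
  private currentState: EngineState = "idle";
  private pendingStart = false;
  isPlaying = false;

  // Notified once per transition, after any queued start has been applied.
  onStateChange?: (state: EngineState, isPlaying: boolean) => void;

  // Stand-in for the platform call (AVPlayer play / MediaPlayer.start()); may throw.
  platformStart: () => void = () => {};

  setSource(): void {
    this.pendingStart = false; // cleared on source change
    this.isPlaying = false;
    this.setState("loading");
  }

  play(): void {
    if (this.currentState === "loading") {
      this.pendingStart = true; // queue the tap; no callback fires here
      return;
    }
    if (this.currentState === "ready") {
      this.startPlaybackInternal();
      this.onStateChange?.(this.currentState, this.isPlaying);
    }
  }

  // Called from the platform's "prepared" listener in a real engine.
  markReady(): void {
    this.setState("ready");
  }

  private setState(next: EngineState): void {
    this.currentState = next;
    if (next === "ready" && this.pendingStart) {
      this.pendingStart = false;
      this.startPlaybackInternal(); // resume before notifying observers
    }
    this.onStateChange?.(next, this.isPlaying);
  }

  private startPlaybackInternal(): void {
    try {
      this.platformStart();
      this.isPlaying = true; // flipped only after the platform call succeeded
    } catch {
      this.isPlaying = false; // keep the UI on "play" if nothing is audible
    }
  }
}
```

With a tap queued during loading, an observer sees `loading` followed by a single `ready` notification that already carries `isPlaying = true`; there is no intermediate "ready but paused" notification to flash the wrong icon.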
`WaveformDecoder` turns a URL into a normalised array of per-bar RMS amplitudes in [0, 1].
Both platforms emit progressive partial results (see "loading UX" below)
so the bars view can paint as the decode runs.
The two implementations diverge in one important place:

| | iOS | Android |
|---|---|---|
| Local file | `AVAssetReader` reads PCM samples | `MediaExtractor` + `MediaCodec` decodes to PCM |
| Remote URL | Pre-download via `URLSession.downloadTask`, then decode the local file | `MediaExtractor` natively streams HTTP, so no separate download |

The iOS pre-download is forced because AVAssetReader can't read remote
URLs. The library also makes the download wait until engine.onLoad fires
(see Sequencing below).
Bucket-by-time accumulation:
- Each bar represents a fixed time window (`totalDurationUs / barCount`).
- For every PCM sample we compute its presentation time, find the bar it falls into, and accumulate `sumSquares[bar] += sample²` and `sampleCounts[bar] += 1`.
- At the end (and periodically along the way) we compute `rms = sqrt(sumSquares / sampleCount)` per bar and normalise the whole array to `[0, 1]` against the loudest bar.
This is more accurate than the sample-count-bucketing approach (which biases short clips when the bar count doesn't divide evenly into total samples).
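
The accumulation described above can be sketched in a few lines. This is an illustrative TypeScript version, not the native decoder: the `TimedSample` type and the `rmsBars` name are assumptions, and real decoders work on PCM buffers rather than one object per sample.

```typescript
// One decoded PCM sample with its presentation time in microseconds.
type TimedSample = { timeUs: number; value: number };

function rmsBars(samples: TimedSample[], totalDurationUs: number, barCount: number): number[] {
  if (barCount <= 0 || totalDurationUs <= 0) return [];

  const sumSquares = new Array<number>(barCount).fill(0);
  const sampleCounts = new Array<number>(barCount).fill(0);
  const windowUs = totalDurationUs / barCount; // fixed time window per bar

  for (const { timeUs, value } of samples) {
    // Clamp so a sample stamped exactly at the end lands in the last bar.
    const bar = Math.min(barCount - 1, Math.max(0, Math.floor(timeUs / windowUs)));
    sumSquares[bar] += value * value;
    sampleCounts[bar] += 1;
  }

  // Per-bar RMS; bars that received no samples stay at 0.
  const rms = sumSquares.map((sum, i) =>
    sampleCounts[i] > 0 ? Math.sqrt(sum / sampleCounts[i]) : 0
  );

  // Normalise the whole array to [0, 1] against the loudest bar.
  const peak = Math.max(...rms);
  return peak > 0 ? rms.map((v) => v / peak) : rms;
}
```

Because the bar index comes from the sample's timestamp rather than its position in the stream, each bar covers the same slice of time even when the sample count does not divide evenly by `barCount`.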
Cancellation is token-based (a UUID/counter): the next decode() call
just bumps the token, so any in-flight closures from the old decode become
no-ops on completion.
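
A counter-based version of that token scheme might look like the following. It is a sketch under assumptions: `DecodeSession`, `begin` and `cancel` are hypothetical names, and the asynchronous decode work is represented only by the delivery function that `begin` returns.

```typescript
class DecodeSession {
  private token = 0;

  // Starts a decode and returns the function the async work calls to deliver results.
  // Starting a newer decode silently invalidates functions returned earlier.
  begin(onResult: (amps: number[]) => void): (amps: number[]) => void {
    const myToken = ++this.token; // bumping the token invalidates older closures
    return (amps) => {
      if (myToken !== this.token) return; // stale decode: no-op
      onResult(amps);
    };
  }

  // Explicit cancel: bump the token without starting a new decode.
  cancel(): void {
    this.token += 1;
  }
}
```

Nothing has to reach into the in-flight work to stop it; stale closures simply find that their token no longer matches and drop their result.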
`WaveformBarsView` draws the bars and the partial-fill playhead. It owns the scrub gesture too.
Key state:
amplitudes: the latest target amplitude array
displayedAmps: what's currently being drawn (interpolated)
startAmps ┐ animation source/target — fed into a
targetAmps ┘ CADisplayLink (iOS) / ValueAnimator (Android)
progressFraction: 0..1, set by the composite view from engine.currentMs / durationMs
Two-pass render with clipping, so the bar straddling the playhead is filled "played" up to the exact pixel and "unplayed" after that:
- Set fill = `unplayedBarColor`, draw the full bar path.
- Clip to `(0, 0, progressFraction × width, height)`, set fill = `playedBarColor`, draw the same path.
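
The effect of the clip can be checked with plain arithmetic. The sketch below does not draw anything; it computes, per bar, how many pixels the second (clipped) pass would recolour. The even `barWidth + gap` layout and the `splitBarsAtPlayhead` name are assumptions for illustration only.

```typescript
type BarSplit = { x: number; playedWidth: number; unplayedWidth: number };

function splitBarsAtPlayhead(
  barCount: number,
  barWidth: number,
  gap: number,
  viewWidth: number,
  progressFraction: number
): BarSplit[] {
  // Right edge of the clip rect used by the second pass.
  const clipRight = Math.min(1, Math.max(0, progressFraction)) * viewWidth;
  const result: BarSplit[] = [];
  for (let i = 0; i < barCount; i++) {
    const x = i * (barWidth + gap);
    // Portion of this bar that falls inside the clip rect.
    const playedWidth = Math.min(barWidth, Math.max(0, clipRight - x));
    result.push({ x, playedWidth, unplayedWidth: barWidth - playedWidth });
  }
  return result;
}
```

For four 3 px bars with a 1 px gap in a 16 px wide view at `progressFraction = 0.3125`, the clip edge sits at 5 px: the first bar is fully played, the second is played for 1 px and unplayed for 2 px, and the rest are untouched.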
When displayedAmps is empty (no decode result yet) the view paints
uniform placeholder bars at a placeholderAmplitude constant. Once
the first amplitude payload arrives we animate from placeholder → real
amplitudes over ~200ms with an ease-out curve.
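
One frame of that animation can be sketched as a pure function that a display-link or animator callback would call. The cubic curve is an assumption (the text above only says "ease-out"), and `interpolateAmps` is a hypothetical name.

```typescript
// Ease-out: fast at the start, settling towards the end. The cubic form is an assumption.
const easeOutCubic = (t: number): number => 1 - Math.pow(1 - t, 3);

// Returns the amplitudes to display `elapsedMs` after the animation started.
function interpolateAmps(
  startAmps: number[],
  targetAmps: number[],
  elapsedMs: number,
  durationMs: number = 200
): number[] {
  const t = durationMs > 0 ? Math.min(1, Math.max(0, elapsedMs / durationMs)) : 1;
  const eased = easeOutCubic(t);
  return targetAmps.map((target, i) => {
    const start = startAmps[i] ?? 0; // a missing start value grows from zero
    return start + (target - start) * eased;
  });
}
```

Feeding the placeholder array in as `startAmps` and the first decoded payload as `targetAmps` gives the placeholder-to-real transition; later partial payloads can reuse the same function with the currently displayed values as the new start.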
Gesture handling:
- iOS: a `UILongPressGestureRecognizer` with `minimumPressDuration = 0`. Subclassed only because `touchesBegan/Moved/Ended` get cancelled the moment a parent `UIScrollView` decides the user is doing a horizontal scroll. The gesture recogniser claims the touch first.
- Android: a `MotionEvent`-based handler that calls `parent.requestDisallowInterceptTouchEvent(true)` on `ACTION_DOWN`.
Both forward onScrubBegan / onScrubMoved / onScrubEnded(cancelled)
closures back to the composite view, which performs the actual seek and
restores the play state.
`PlayPauseButton` is a UIControl (iOS) / FrameLayout (Android) with three observable
properties:
- `isPlaying: Bool` chooses the icon (`play.fill` vs `pause.fill` on iOS, `R.drawable.play_fill` vs `pause_fill` on Android).
- `isLoading: Bool`, when true, hides the icon and shows a native `UIActivityIndicatorView` / `ProgressBar`.
- `iconColor` tints both the icon and the spinner.
The composite view sets isPlaying before isLoading on every state
change so that during a loading→ready transition with a queued tap, the
imageView is already pointing at the pause icon under the spinner — when
the spinner clears you go straight to the right icon, no crossfade flash.
`SpeedPillView` is cosmetic only. It renders a rounded background + the speed text (e.g.
"1.5x"). Tap → onTap closure → composite view picks the next speed
from the speeds array and applies it.
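
Picking the next speed is a one-liner worth pinning down. The sketch below assumes wrap-around at the end of the array and a fallback to the first entry when the current rate is not in the list; neither behaviour is confirmed by the text above, and `nextSpeed` is a hypothetical name.

```typescript
// Returns the speed that follows `current` in `speeds`, wrapping at the end.
function nextSpeed(speeds: number[], current: number): number {
  if (speeds.length === 0) return current; // nothing to cycle through
  const index = speeds.indexOf(current);
  // indexOf returns -1 for an unknown speed, so (index + 1) falls back to speeds[0].
  return speeds[(index + 1) % speeds.length];
}
```

For `speeds = [1, 1.5, 2]`, tapping at `1` gives `1.5`, and tapping at `2` wraps back to `1`.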
The trickiest part of the library is the order things happen during load. Two key constraints drove the current design:
- AVPlayer should get the network undivided during its initial buffer fetch — bandwidth contention with the waveform decoder was leaving the spinner on screen for many seconds even after the waveform itself had appeared.
- The user should be able to tap "play" before the engine is ready without their tap getting silently dropped.
Resulting flow on applySource(url):
applySource(url):
1. clear stale amplitudes → bars view falls back to placeholder bars
2. cancel any in-flight decode
3. engine.setSource(url) → state = .loading, KVO/listeners attached
4. emitPlayerState()
5. (intentionally do NOT start the waveform decoder here)
… user might tap play here …
handlePlayButtonTap → engine.toggle() → engine.play()
sees state = .loading → pendingStart = true, return without callback
… AVPlayer finishes its initial buffer …
item.status → .readyToPlay (iOS) / onPrepared (Android)
▼
engine.state = .ready
│ (state setter checks pendingStart, runs startPlaybackInternal
│ atomically before firing onStateChange)
▼
handleEngineStateChange:
playButton.isPlaying = engine.isPlaying ← already true if pendingStart
playButton.isLoading = false
▼
engine.onLoad fires:
apply initialPositionMs
autoPlay / controlled-state fallback
kick off decoder.decode(url) ← finally
▼
decoder progressively emits amplitudes → bars view animates them in
Side effects of this design you should be aware of when modifying:
- `applySource` does not call `decodeAmplitudesIfPossible` anymore. The decode kick-off lives in `engine.onLoad`. Don't move it back without reading the comments in `applySource`.
- `applyProvidedSamples` (the `samples` prop path) bypasses the decoder entirely.
- For local files (`file://`) the engine becomes ready in a few ms, so the deferral has zero perceptible cost.
The playing?: boolean and speed?: number props translate into native
sentinel ints/floats:
- `controlledPlaying`: `-1` = uncontrolled (default), `0` = paused, `1` = playing.
- `controlledSpeed`: `-1` = uncontrolled, otherwise the actual rate.
Native code branches on the sentinel:
- Uncontrolled: taps mutate internal state and call into the engine.
- Controlled: taps fire `onPlayerStateChange` with the requested new value but don't mutate state. The parent app updates its prop and the change comes back in via `applyControlledState`.
Sentinels live in the codegen spec
(AudioWaveformViewNativeComponent.ts). The conversion happens once in
AudioWaveformView.native.tsx (useMemo). Keep that one place as the
single point where TS optionality becomes a sentinel — don't translate
in the native layer.
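
The translation itself is small. A minimal sketch, assuming plain helper functions (the real component does this inside a `useMemo`, and the helper names here are hypothetical):

```typescript
// Sentinel meaning "the parent did not pass this prop".
const UNCONTROLLED = -1;

// playing?: boolean  →  -1 (uncontrolled) | 0 (paused) | 1 (playing)
function toControlledPlaying(playing?: boolean): number {
  if (playing === undefined) return UNCONTROLLED;
  return playing ? 1 : 0;
}

// speed?: number  →  -1 (uncontrolled) | the actual rate
function toControlledSpeed(speed?: number): number {
  return speed === undefined ? UNCONTROLLED : speed;
}
```

Because `undefined` never crosses the boundary, the native side only ever compares against `-1` and does not need its own notion of an optional prop.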
- All callbacks (engine + decoder) fire on the main thread. The decoder does its sample-crunching on a background queue and dispatches progress updates back to main.
- The 30 Hz progress tick comes from `AVPlayer.addPeriodicTimeObserver` on iOS and a `Handler.postDelayed` loop on Android. Both run on main.
- The waveform repaint that piggy-backs on the progress tick is gated by `pauseUiUpdatesInBackground`. The JS `onTimeUpdate` event still fires either way (so callers can drive Now Playing / analytics / Lock Screen).
Both platforms register lifecycle observers in commonInit.
- App foregrounded → backgrounded:
  - If `playInBackground == false`, pause the engine.
  - If `playInBackground == true`, leave it alone (and on iOS the engine has already activated the playback `AVAudioSession`).
  - Set `isBackgrounded = true`.
- App backgrounded → foregrounded:
  - Set `isBackgrounded = false`.
  - Snap the bars view + time label + play button icon to `engine.currentMs / durationMs / isPlaying`. This matters because if `pauseUiUpdatesInBackground == true` we skipped tick refreshes while offscreen.
pauseUiUpdatesInBackground only gates the bars/time-label refresh
inside engine.onTimeUpdate. It never gates onTimeUpdate to JS.
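
The gating rule is easy to get backwards, so here is a small model of it. This is a sketch, not the composite view: `ProgressTickRouter` and its two callbacks are hypothetical stand-ins for the bars/time-label refresh and the JS event emitter.

```typescript
type TickHandler = (currentMs: number, durationMs: number) => void;

class ProgressTickRouter {
  isBackgrounded = false;
  pauseUiUpdatesInBackground = true;
  private readonly repaint: TickHandler;
  private readonly emitToJs: TickHandler;

  constructor(repaint: TickHandler, emitToJs: TickHandler) {
    this.repaint = repaint;
    this.emitToJs = emitToJs;
  }

  // Called on every engine progress tick.
  onTimeUpdate(currentMs: number, durationMs: number): void {
    this.emitToJs(currentMs, durationMs); // the JS event is never gated
    if (this.isBackgrounded && this.pauseUiUpdatesInBackground) return;
    this.repaint(currentMs, durationMs);
  }

  onBackground(): void {
    this.isBackgrounded = true;
  }

  // On return to the foreground, snap the UI to the engine's current position.
  onForeground(currentMs: number, durationMs: number): void {
    this.isBackgrounded = false;
    this.repaint(currentMs, durationMs);
  }
}
```

While backgrounded with the flag on, ticks keep reaching JS but stop repainting; the single repaint in `onForeground` is what brings the bars back in sync.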
Fabric pools component views and reuses them. Without help, an unmounted
AudioWaveformView would keep its AVPlayer alive inside the pool and
keep playing audio.
Fix: AudioWaveformView.mm overrides prepareForRecycle and calls
AudioWaveformViewImpl.tearDown(), which:
- Sets `sourceURI = ""` (runs through `applySource` → cancels decoder, resets engine, stops display link).
- Resets `internalPlaying`, `internalSpeed`, `defaultSpeedApplied`, `initialPositionApplied`, gesture state, amplitudes, bars view, time label, play button, speed pill.
tearDown() is idempotent so it's also safe from deinit. Android's
SimpleViewManager doesn't pool, so the equivalent isn't needed —
onDetachedFromWindow already runs the engine reset.

| State | Owner | Notes |
|---|---|---|
| Audio (current/duration ms, isPlaying, rate) | `AudioPlayerEngine` | Single source of truth. Every other piece reads from here. |
| Decoded amplitudes | Composite view (`amplitudes`) | Mirrored into `barsView.amplitudes` for rendering. |
| Internal play / speed (uncontrolled) | Composite view (`internalPlaying`, `internalSpeed`) | Only consulted when `controlledPlaying` / `controlledSpeed` `== -1`. |
| `pendingStart` ("queued tap during loading") | `AudioPlayerEngine` | Cleared on pause / reset / source change. |
| Scrub state (`isScrubbing`, `resumeAfterScrub`, `pendingScrubMs`) | Composite view | Set/cleared in the scrub callbacks. |
| `isBackgrounded` | Composite view | Set by lifecycle observers. |
| Loading icon state (`playButton.isLoading`) | `PlayPauseButton`, driven by composite view | `isLoading = (engine.state == .loading)`. |

To add a new prop:

- Add it to `src/AudioWaveformViewNativeComponent.ts` (use `WithDefault<...>` if it has a default).
- Surface a public type on `AudioWaveformViewProps` in `src/AudioWaveformView.native.tsx` (and the non-native stub for type parity).
- iOS: add a stored property on `AudioWaveformViewImpl` (with `didSet` if it triggers UI work), then wire it in `AudioWaveformView.mm`'s `updateProps`.
- Android: add a property on `AudioWaveformView.kt` (with the appropriate setter behaviour), then add `override fun set<Name>(...)` on `AudioWaveformViewManager.kt`.
- Update the props table in `README.md`.

To add a new event:

- Add the payload type + `DirectEventHandler<...>` in the codegen spec.
- Add the public TS callback type in `AudioWaveformView.native.tsx` and unwrap `nativeEvent` in `AudioWaveformViewInner`.
- iOS: add an `@objc` callback property on `AudioWaveformViewImpl` and an `emitOn<Name>` helper in `AudioWaveformView.mm` that calls `typedEventEmitter()->onWhatever(...)`.
- Android: add a callback closure on `AudioWaveformView.kt` and dispatch via `UIManagerHelper.getEventDispatcherForReactTag(...)` in `AudioWaveformViewManager.wireEvents`.
- Document it in `README.md` (events table).

To add a new imperative command:

- Add it to `NativeCommands` + the `supportedCommands` list in the codegen spec.
- Add a method on `AudioWaveformViewRef` and wire it in the `useImperativeHandle` hook.
- iOS: add a `public func` on `AudioWaveformViewImpl` and dispatch from `handleCommand:` in the `.mm`.
- Android: add a `fun` on `AudioWaveformView.kt` and an `override` on `AudioWaveformViewManager`.
- Document it on the `AudioWaveformViewRef` type in `README.md`.

Out of scope for this library:

- Recording (we play + visualise; we do not record).
- Live / streaming waveforms: we visualise a fixed audio file.
- Hooking into `react-native-gesture-handler` / Reanimated: gestures are handled natively for zero JS overhead and we don't want a runtime dependency on those libraries.