fix: make Rive data binding thread-safe on iOS & Android (#297)#298
Draft
mfazekas wants to merge 18 commits into
Draft
fix: make Rive data binding thread-safe on iOS & Android (#297)#298mfazekas wants to merge 18 commits into
mfazekas wants to merge 18 commits into
Conversation
The Rive iOS runtime (RiveViewModel/RiveView, the Obj-C RiveDataBindingViewModelInstance caches, and the C++ ViewModelInstanceRuntime maps) is not thread-safe and must be driven from the main thread, where the state machine advances and property listeners fire. The Nitro layer accessed it from three threads with no synchronization: - main render thread (state machine advance / afterUpdate reconfigure) - JS thread (synchronous property accessors) - Swift cooperative pool (every `Promise.async` block) Concurrent access corrupts those caches, producing the crashes in #297 (NSMutableDictionary mutation, unordered_map free-of-unallocated, artboard instancing). Thread Sanitizer flagged ~25 data races at the data-binding boundary. Route all data-binding access through the main thread: - add `MainThread.run` (runs inline on main, hops via `DispatchQueue.main.sync` otherwise) and `Promise.onMain` (does the work on main and returns an already-resolved Promise, so Nitro's `.then` never races a background resolve) - apply to HybridViewModelInstance, HybridViewModel, every HybridViewModel*Property, and the property listener helper After: TSan reports 0 races under the same stress; data binding still renders.
- example/src/__tests__/issue297.databinding.test.ts: on-device harness test that drives `viewModelAsync` (resolved off-main) concurrently with synchronous property access on the same ViewModelInstance. It stays green; built with Thread Sanitizer it reports the data race (unfixed) or is clean (fixed). Turn CONCURRENCY/ROUNDS up to reproduce the hard crash. - rn-harness.config.mjs: HARNESS_TSAN-gated TSAN_OPTIONS via appLaunchOptions, and HARNESS_METRO_PORT override (defaults unchanged). - Issue297ThreadRace.tsx: manual reproducer screen (same stress in-app). - gitignore the .harness/ build cache.
…no-deprecated in repro)
Renders a live RiveView (state machine on the render thread) and churns the same property from the JS thread through useRiveNumber (writes + addListener via remounts). Under a TSan build it flags the listener race on the unfixed runtime, clean on the fixed one.
…ad (#297) setNumberInputValue/getNumberInputValue, setBooleanInputValue/getBooleanInputValue, triggerInput, and setTextRunValue/getTextRunValue touch the artboard/state machine from the JS thread. Funnel them through MainThread.run, same as the data-binding API.
…modification (#297) onChanged() iterates the listener map on a Dispatchers.Default flow collector while addListener/removeListener mutate it from the JS/native thread, throwing ConcurrentModificationException and crashing the app (the Android analogue of the data-binding race fixed on iOS). Use ConcurrentHashMap and iterate over a toList() snapshot so dispatch is safe even if a callback adds/removes a listener mid-iteration.
…under TSan in one step The harness only drives a prebuilt app, so Thread Sanitizer (a compile flag) must be baked into the app rather than passed to the harness. This script builds the example with -enableThreadSanitizer YES, installs it on the booted simulator, and runs the harness against it.
TSAN_OPTIONS is only read by the TSan runtime and ignored on a non-TSan build, so the HARNESS_TSAN gate was pointless. Always set it; running under TSan is now just 'yarn test:harness:ios' once the app is built with -enableThreadSanitizer YES.
Auto-detect the booted simulator in rn-harness.config.mjs (override with DEVICE_MODEL / IOS_VERSION; falls back to the previous fixed default). Now 'yarn ios' + 'yarn test:harness:ios' work without naming a device. Drop the now- redundant device detection from the TSan script.
Booted-sim auto-detection is ambiguous with more than one booted device (it and 'simctl install booted' can disagree). Collect all booted sims and warn which one is used, pointing at DEVICE_MODEL / IOS_VERSION to pin it.
The 3-step TSan flow (build:ios --extra-params, simctl install booted, test:harness:ios) is simple enough and the harness now auto-targets the booted simulator, so the wrapper script just duplicated it (via a different build path).
92cbb5d to
69526cd
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes the occasional crashes in #297. The Rive data-binding objects aren't thread-safe and must be driven from a single thread, but the Nitro layer accessed them from several threads with no synchronization:
RiveDataBindingViewModelInstance/ C++ViewModelInstanceRuntimecaches are main-thread-only, yet they were touched from the JS thread (sync accessors), the Swift cooperative pool (everyPromise.async), and the main render loop. Concurrent access corrupts those caches → the three crash stacks in the issue.BaseHybridViewModelPropertyImpliterated its listener map on aDispatchers.Defaultflow collector whileaddListener/removeListenermutated it from the JS thread →ConcurrentModificationException(this is what crashed thetest-harness-androidCI job).The fix
iOS — route all data-binding access through the main thread:
MainThread.run— runs inline on main, else hops viaDispatchQueue.main.sync.Promise.onMain— runs the work on the main thread and resolves the promise right there on the calling thread, instead of resolving it later from a background thread. Nitro'sPromisehas an unlocked listener list; resolving it on the spot (before Nitro attaches.then) means no backgroundresolvecan race that.then, which clears the Nitro Promise-layer races too, not just the Rive ones.HybridViewModelInstance,HybridViewModel, everyHybridViewModel*Property, the property listener helper, and the deprecated imperativeRiveViewinput/text-run methods (set/getNumberInputValue,set/getBooleanInputValue,triggerInput,set/getTextRunValue).Android —
BaseHybridViewModelPropertyImplnow uses aConcurrentHashMapand iterates atoList()snapshot inonChanged, so dispatch is safe even if a callback adds/removes a listener mid-iteration.Verification (Thread Sanitizer)
Built the iOS example with
-enableThreadSanitizer YESand ran a concurrent data-binding stress (viewModelAsync+ synchronous property access on the same instance). TSan reports data races on the unfixed build and none after the fix; under heavy load the unfixed build crashes, the fixed one doesn't.On Android, the harness test hits the
ConcurrentModificationExceptioncrash before the fix and passes after.Tests
react-native-harness tests — they always pass, the sanitizer is the real assertion (unfixed runtime → race reported / crash; fixed → clean):
example/src/__tests__/issue297.databinding.test.ts—viewModelAsync(off-main) concurrent with synchronous property access. RaiseCONCURRENCY/ROUNDSto reproduce the hard crash (what took down the Android CI job).example/src/__tests__/issue297.hooks.test.tsx— the same bug via the publicuse*hooks: a liveRiveViewadvances the state machine on the render thread whileuseRiveNumberchurns the property (writes +addListener) on the JS thread.example/src/reproducers/Issue297ThreadRace.tsxis a manual in-app reproducer.Running the harness tests under Thread Sanitizer (iOS)
TSan is a compile flag, and the harness only drives a prebuilt app (it has no build step), so it can't be a harness option — it's the normal build plus one xcodebuild flag.
yarn iosbuilds and installs the app, then the harness drives it:--no-packagerstopsrun-iosfrom starting Metro (the harness starts its own). The harness auto-targets the booted simulator (seern-harness.config.mjs). TSan reports land in/tmp/tsan_harness.<pid>; grep forWARNING: ThreadSanitizer: data race— present on the unfixed runtime, absent after the fix.Prefer an explicit build + install (what CI does)? Note
build:iosneeds--buildFolderto land inios/build:yarn build:ios --buildFolder build --extra-params "-enableThreadSanitizer YES" xcrun simctl install booted ios/build/Build/Products/Debug-iphonesimulator/RiveExample.app yarn test:harness:ios --testPathPattern issue297Or enable Thread Sanitizer in the Xcode scheme (Run → Diagnostics) and
⌘Ronce to build+install.