Crash on iCloud sync: swift_task_dealloc LIFO violation in iCloudSynchronizer.enqueue #226

@NickAtGit

Summary

Apps using Defaults with one or more keys declared iCloud: true crash on launch or on foreground/background wake-up with EXC_CRASH (SIGABRT). The abort comes from the Swift Concurrency runtime's task-allocator LIFO check, fired from inside the closure that iCloudSynchronizer.enqueue yields to TaskQueue. The crash is reproducible across iOS 16, 17, and 18, multiple devices, and at least two distinct app versions, so it doesn't look OS- or app-specific.

I have 23 crash reports with an identical crashing stack: the same + 392 offset in closure #1 in iCloudSynchronizer.enqueue(_:) and the same Utilities.swift:335 line in the TaskQueue consumer. Only the crashing thread number, device, and OS version differ between them.

Environment

  • Defaults version: 9.0.8 (current). The affected code paths are unchanged since the 9.0 iCloud-sync work landed, so this likely affects all 9.x releases.
  • Reproduces on:
    • iOS 16.7.15 (iPhone X) — 10 reports
    • iOS 17.4.1 (iPhone 11) — 6 reports
    • iOS 17.7.2 (iPhone 11) — 3 reports
    • iOS 17.5.1 (iPhone 13 mini) — 3 reports
    • iOS 18.4 (iPhone 13) — 1 report (older app build, Role: Background)
  • Both Role: Foreground (cold launch) and Role: Background (background wake-up).
  • App declares 31 keys with iCloud: true. iCloud KV-store entitlement is enabled and the affected users have non-default payloads in their remote KV-store (i.e. syncFromRemote takes the inner Task { @MainActor in … } branch, not the empty-guard fast path).

Crash signature

Exception Type:  EXC_CRASH (SIGABRT)
Triggered by Thread: <varies — 2/3/4/6, depending on which cooperative pool thread the consumer is parked on>

0  libsystem_kernel.dylib   __pthread_kill
1  libsystem_pthread.dylib  pthread_kill
2  libsystem_c.dylib        __abort
3  libsystem_c.dylib        abort
4  libswift_Concurrency     swift_Concurrency_fatalErrorv         (Error.cpp:25)
5  libswift_Concurrency     swift_Concurrency_fatalError          (Error.cpp:35)
6  libswift_Concurrency     swift_task_dealloc + 124              (TaskAlloc.cpp:59)
7  <App>                    closure #1 in iCloudSynchronizer.enqueue(_:) + 392
8  <App>                    <deduplicated_symbol>                  (async continuation trampoline)
9  <App>                    closure #1 in TaskQueue.init(priority:) + 1 (Utilities.swift:335)
10 <App>                    <deduplicated_symbol>
11 <App>                    specialized thunk for @escaping @isolated(any) @callee_guaranteed @async () -> (@out A)
13 libswift_Concurrency     completeTaskWithClosure               (Task.cpp:496)

swift_task_dealloc + 124 is the runtime's LIFO bump-allocator check. The fatal-error message printed to the system log immediately before the abort is the standard Concurrency one (along the lines of "freed pointer was not the last allocation"). It doesn't appear in the crash report itself, but it should be visible in Console when the crash is reproduced.
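To make the LIFO check concrete, here is a toy model of a stack-discipline allocator. It is only an illustration of the invariant, not the runtime's implementation: the type `StackAllocator`, its integer IDs, and the helper `outOfOrderFreeIsRejected` are all hypothetical names, and the model throws where the real runtime aborts.

```swift
// Toy model of a LIFO (stack-discipline) allocator. Illustrative only;
// the real check lives in the Swift Concurrency runtime.
struct StackAllocator {
    enum DeallocError: Error, Equatable {
        case notLastAllocation(expected: Int?, got: Int)
    }

    private var live: [Int] = []   // IDs of live allocations, oldest first
    private var nextID = 0

    // Pushes a new allocation on top of the stack and returns its ID.
    mutating func allocate() -> Int {
        let id = nextID
        nextID += 1
        live.append(id)
        return id
    }

    // Only the most recent live allocation may be freed.
    // The model throws here so the violation can be observed.
    mutating func deallocate(_ id: Int) throws {
        guard live.last == id else {
            throw DeallocError.notLastAllocation(expected: live.last, got: id)
        }
        live.removeLast()
    }
}

// Returns true if freeing an allocation that is not on top is rejected.
func outOfOrderFreeIsRejected() -> Bool {
    var allocator = StackAllocator()
    let first = allocator.allocate()   // stands in for an earlier allocation
    _ = allocator.allocate()           // a later allocation sitting on top of it
    do {
        try allocator.deallocate(first) // freed while the later one is still live
        return false
    } catch {
        return true
    }
}
```

In this model, freeing `first` while a later allocation is still live is the same shape of violation that the crash signature points to.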

Steps to reproduce

These steps are inferred from the crash reports. I haven't yet reduced them to a standalone minimal reproduction target.

  1. Declare several iCloud: true keys (the affected app has 31, but it likely reproduces with fewer).
  2. Sign in to iCloud on a device/simulator with the app's iCloud KV-store entitlement active.
  3. Populate NSUbiquitousKeyValueStore.default with non-default values for those keys, so iCloudSynchronizer.syncFromRemote(forKey:) enters the Task { @MainActor in … } arm (Sources/Defaults/Defaults+iCloud.swift, lines 401–406).
  4. Cold-launch the app, or trigger any path that fans out many enqueue calls in quick succession (initial registration, NSUbiquitousKeyValueStore.didChangeExternallyNotification, UIScene.willEnterForeground).
  5. Observe the crash in iCloudSynchronizer.enqueue's closure when the runtime tries to pop the @TaskLocal var timestamp binding.

Suspected cause

The dangerous composition is the way enqueue wraps syncFromRemote:

// Defaults+iCloud.swift:358
private func enqueue(_ task: @escaping TaskQueue.AsyncTask) {
    backgroundQueue.async {
        await Self.$timestamp.withValue(Date()) {  // pushes TaskLocal binding on parent allocator
            await task()                            // syncKey → syncFromRemote
        }                                           // pops binding ← crash here
    }
}

// Defaults+iCloud.swift:388
private func syncFromRemote(forKey key: Defaults.Keys) async {
    _remoteSyncingKeys.modify { $0.insert(key) }

    await withCheckedContinuation { continuation in
        guard /* condition elided in this report */ else { continuation.resume(); return }
        Task { @MainActor in                        // unstructured, cross-actor
            // main-actor write elided in this report
            continuation.resume()                   // parent resumes from a different actor
        }
        }
    }

    _remoteSyncingKeys.modify { $0.remove(key) }
}

The pattern has three parts. withCheckedContinuation suspends the parent task, and an unstructured Task { @MainActor in … } resumes it from a different actor. That happens inside an enclosing @TaskLocal.withValue binding. The whole thing runs on a Task.detached consumer of an AsyncStream. In that arrangement, the task allocator's pairing of pushes/pops can end up non-LIFO on unwind. Specifically, the continuation context allocated on the parent task's bump allocator can outlive the TaskLocal binding pushed above it by one frame, and when withValue then pops its binding, the runtime sees the top-of-stack isn't the binding and aborts.

Each individual API used here is legal Swift Concurrency; the failure is in the composition. Since this is happening on iOS 16, 17, and 18 in the same way, it isn't a runtime regression on a single OS — it's the pattern itself that the runtime fails to pair safely.
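The composition described above can be written down without the library as a starting point for a minimal repro. This is a sketch under assumptions: `Sync`, `SketchQueue`, `simulatedSyncFromRemote`, and `enqueue(on:)` are hypothetical stand-ins for the library's types, the main-actor write is left out, and I have not confirmed that this standalone version crashes on its own.

```swift
import Foundation

enum Sync {
    // Mirrors the report's `@TaskLocal var timestamp`.
    @TaskLocal static var timestamp: Date?
}

// Hypothetical stand-in for the library's TaskQueue: a detached task
// draining an AsyncStream of async closures.
final class SketchQueue: Sendable {
    typealias AsyncTask = @Sendable () async -> Void
    private let continuation: AsyncStream<AsyncTask>.Continuation

    init() {
        var captured: AsyncStream<AsyncTask>.Continuation!
        let stream = AsyncStream<AsyncTask> { c in captured = c }
        continuation = captured
        // Consumer task; its handle is not stored, as flagged in the report.
        Task.detached {
            for await task in stream { await task() }
        }
    }

    func async(_ task: @escaping AsyncTask) {
        continuation.yield(task)
    }
}

// Same shape as syncFromRemote: the suspended caller is resumed
// from an unstructured main-actor task.
func simulatedSyncFromRemote() async {
    await withCheckedContinuation { (continuation: CheckedContinuation<Void, Never>) in
        Task { @MainActor in
            // The real code performs its main-actor write here.
            continuation.resume()
        }
    }
}

// Same shape as enqueue: the work runs inside a task-local binding.
func enqueue(on queue: SketchQueue) {
    queue.async {
        await Sync.$timestamp.withValue(Date()) {
            await simulatedSyncFromRemote()
        }
    }
}
```

Calling `enqueue(on:)` many times in quick succession on one `SketchQueue` would approximate the fan-out in step 4 of the reproduction steps.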

What might fix it on the Defaults side

This is just a suggestion — not tested. Replacing the withCheckedContinuation { … Task { @MainActor in continuation.resume() } } pattern with a direct await MainActor.run { … } keeps the semantics (block until the main-actor write completes) but removes the cross-actor continuation resume that's tripping the allocator:

private func syncFromRemote(forKey key: Defaults.Keys) async {
    _remoteSyncingKeys.modify { $0.insert(key) }
    defer { _remoteSyncingKeys.modify { $0.remove(key) } }

    guard
        let object = remoteStorage.object(forKey: key.name) as? [Any],
        let date = Self.timestamp,
        let value = object[safe: 1]
    else {
        return
    }

    await MainActor.run {
        Self.logKeySyncStatus(key, source: .remote, syncStatus: .syncing, value: value)
        key.suite.set(value, forKey: key.name)
        key.suite.set(date, forKey: "\(key.name)\(defaultsSyncKey)")
    }
}

MainActor.run is available on the package's current minimum platforms (iOS 15+, macOS 12+). I haven't verified this clears the crash in practice — I can do that and open a PR if helpful.

Notes / things I'm not yet sure about

  • I haven't captured the exact pre-abort fatal-error string from Console for any of the reports yet. The stack alone strongly implies the LIFO check, but the string would close that gap.
  • I haven't atos-resolved the + 392 offset against a local Release build to confirm it lands on the withValue unwind specifically. Happy to do that if it helps.
  • TaskQueue.init(priority:) in Utilities.swift also spawns an orphaned Task.detached that isn't stored anywhere — separate concern, but worth flagging while looking at this area.
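On the last point, one possible shape for a queue that keeps a handle to its consumer is sketched below. This is not the library's TaskQueue: `RetainedTaskQueue` is a hypothetical name, and ending the stream and cancelling the consumer in `deinit` is an assumption about the desired lifetime, not something taken from the source.

```swift
// Hypothetical variant of a task queue that stores its consumer task
// so the task can be ended when the queue goes away.
final class RetainedTaskQueue: Sendable {
    typealias AsyncTask = @Sendable () async -> Void

    private let continuation: AsyncStream<AsyncTask>.Continuation
    private let consumer: Task<Void, Never>

    init(priority: TaskPriority? = nil) {
        var captured: AsyncStream<AsyncTask>.Continuation!
        let stream = AsyncStream<AsyncTask> { c in captured = c }
        continuation = captured
        // Keep the handle instead of discarding it.
        consumer = Task.detached(priority: priority) {
            for await task in stream { await task() }
        }
    }

    deinit {
        continuation.finish()  // lets the for-await loop finish
        consumer.cancel()      // also signals cancellation to any running task
    }

    // Enqueues work; tasks run one at a time in submission order.
    func async(_ task: @escaping AsyncTask) {
        continuation.yield(task)
    }
}
```

Whether the consumer should be cancelled or simply allowed to drain is a design choice for the maintainers. The sketch only shows that storing the handle makes either option available.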