Panic on backward system-clock adjustment (sleep/wake NTP correction) wedges IDS auth and downstream CloudKit sync

## Summary

`CachedKeys::get_stale_time()` in [`src/ids/identity_manager.rs#L45-L49`](https://github.com/OpenBubbles/rustpush/blob/main/src/ids/identity_manager.rs#L45-L49) panics with `Time went backwards` whenever the system clock is corrected backward (e.g. NTP resync after sleep/wake). Because the function is called from `is_valid()` (L51-L58) and `is_dirty()` (L61-L65), which are invoked on every cached-identity check (L283, L305), the panic kills the IDS auth path and cascades into every downstream subsystem — most visibly **CloudKit sync wedges into a retry loop until the app is force-killed**.

## Reproduction

Confirmed on Windows 11 (ARM64), OpenBubbles 1.18.800.0 (Microsoft Store) with rustpush as bundled.

1. Run OpenBubbles with CloudKit sync enabled (`cloudSyncingEnabled: true`).
2. Put the machine into Modern Standby for several hours/days.
3. Wake the machine. Windows NTP corrects the system clock backward (typically a few seconds; in my case ~6.5s after a 3.5-day sleep).
4. An OpenBubbles thread immediately panics with `PanicException(Time went backwards: SystemTimeError(<N>s))`.
5. The CloudKit sync supervisor retries on a ~4.6s interval, producing a steady stream of failures (in my log: exactly **13 errors/min for ~9 minutes straight**) and pegging one CPU core at ~96% indefinitely. The Flutter UI freezes; only force-killing the process recovers.

## Evidence

### Panic from the app log

```
2026-06-02T15:38:18.658213Z [ERROR] [BlueBubblesApp] PanicException(Time went backwards: SystemTimeError(4.9310864s))
2026-06-02T15:38:18.731222Z [ERROR] [BlueBubblesApp] PanicException(Time went backwards: SystemTimeError(4.7560443s))
2026-06-02T15:38:19.006534Z [ERROR] [BlueBubblesApp] PanicException(Time went backwards: SystemTimeError(5.7651135s))
2026-06-02T15:38:19.604706Z [ERROR] [BlueBubblesApp] PanicException(Time went backwards: SystemTimeError(3.8677244s))
2026-06-02T15:38:19.633064Z [ERROR] [BlueBubblesApp] PanicException(Time went backwards: SystemTimeError(3.8665786s))
... (panic stack into SimpleDecoder.decode at flutter_rust_bridge/src/codec/base.dart:35)
```

### Correlated Windows Kernel-General clock-adjust events (same machine, same minute)

```
2026-06-02 08:37:32 (Event 1) — system time set to 2026-06-02T15:37:32Z from 2026-05-29T20:28:02Z (wake from 3.5-day sleep)
2026-06-02 08:38:18 (Event 1) — system time set to 2026-06-02T15:38:18.489Z from 2026-06-02T15:38:25.010Z   ← clock jumped 6.5s BACKWARD
2026-06-02 08:37:38 (Power 507) — exiting Modern Standby
```

The 6.5s backward NTP correction at 08:38:18 PDT (= 15:38:18 UTC) is the trigger for the panic burst, which started <200 ms later. The logged panic magnitudes (3.9-5.8s) all fall inside that 6.5s correction window. After the initial burst, the retry loop maintained a **steady 13 errors/min for 9+ minutes** before I killed the process.

### Steady cadence proves a wedged retry loop, not just a one-shot panic

```
Errors per minute on 2026-06-02:
  17:15  13
  17:14  13
  17:13  13
  17:12  13
  17:11  13
  17:10  13
  17:09  13
  17:08  13
  17:07  13
```

## Root cause

[`src/ids/identity_manager.rs#L45-L49`](https://github.com/OpenBubbles/rustpush/blob/main/src/ids/identity_manager.rs#L45-L49):

```rust
fn get_stale_time(&self) -> Duration {
    SystemTime::now()
        .duration_since(UNIX_EPOCH + Duration::from_millis(self.at_ms))
        .expect("Time went backwards")
}
```

When `SystemTime::now()` is earlier than `self.at_ms` (which happens whenever the wall clock is adjusted backward), `duration_since` returns `Err(SystemTimeError)` and `.expect()` panics. `self.at_ms` was captured before the NTP correction; `SystemTime::now()` is read after. The cache thinks its keys are from "the future" relative to the wall clock — which is physically impossible but trivially produced by the OS.
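
The failure mode can be reproduced deterministically without touching the system clock. This is a minimal sketch, not rustpush code: `stale_time_at` and `stale_time_after_backward_step` are hypothetical names, and "now" is passed in as a parameter so a backward step can be simulated. It performs the same `duration_since` computation as `get_stale_time()` but returns the raw `Result` instead of calling `.expect()`.

```rust
use std::time::{Duration, SystemTime, SystemTimeError, UNIX_EPOCH};

/// Same computation as `get_stale_time()`, with "now" injected
/// (an assumption made for testability; the real function reads the clock).
fn stale_time_at(now: SystemTime, at_ms: u64) -> Result<Duration, SystemTimeError> {
    now.duration_since(UNIX_EPOCH + Duration::from_millis(at_ms))
}

/// Simulates a cache entry stamped at `at_ms`, followed by a wall clock
/// that has been set back by `skew_ms` before the next staleness check.
fn stale_time_after_backward_step(
    at_ms: u64,
    skew_ms: u64,
) -> Result<Duration, SystemTimeError> {
    let now = UNIX_EPOCH + Duration::from_millis(at_ms.saturating_sub(skew_ms));
    stale_time_at(now, at_ms)
}
```

With a 6.5s backward step the call returns `Err`, and `SystemTimeError::duration()` reports the size of the skew. That value is the `<N>s` figure that appears in the panic message.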

The panic propagates up through the Tokio task and kills the IDS lookup task, but the supervising CloudKit code keeps retrying, and each retry hits the same panicking code path because the cached `at_ms` values are still in the (now-relative) future. The loop only breaks when the wall clock catches back up (potentially many seconds, or never, if drift cancels out).

## Impact

- **Severity:** High. Single-event trigger → unrecoverable application hang requiring force-kill. Reproducible on every wake from a non-trivial sleep on Windows (where backward NTP corrections after Modern Standby are routine).
- **Blast radius:** Any code path that gates on `CachedKeys::is_valid` or `is_dirty` — i.e. every IDS-authenticated request, including CloudKit (`gateway.icloud.com/ckdatabase/...`), iMessage delivery checks, etc.
- **User-visible behaviour:** App freezes; CPU pegs at one full core; iCloud sync, contact sync, and message sending all silently break until the user notices and force-kills.

## Proposed fix

Replace the panic with a safe fallback. Three options, ordered by minimal-change → robust:

**1. Minimal — saturate to zero on backward skew:**
```rust
fn get_stale_time(&self) -> Duration {
    SystemTime::now()
        .duration_since(UNIX_EPOCH + Duration::from_millis(self.at_ms))
        .unwrap_or(Duration::ZERO)
}
```
This treats a clock-backward situation as "the key was just refreshed," which is conservative (keys appear fresh until the next genuine staleness check). Safe because `is_valid()` returning true on fresh keys is the no-op path.

**2. Better — return `Duration::MAX` (force re-auth):**
```rust
.unwrap_or(Duration::MAX)
```
Forces an immediate refresh, which is the right behaviour if the wall clock genuinely moved by an unexpected amount.

**3. Best — use a monotonic clock for staleness:**
Store `Instant::now()` alongside `at_ms` at cache-insertion time and compute staleness against `Instant`. Monotonic clocks are immune to wall-clock skew. (Drawback: `Instant` is not persistable, so this only works for in-memory caches — but `CachedKeys` already looks in-memory based on the surrounding code.)
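
A rough sketch of what option 3 could look like, assuming the cache really is in-memory only. `MonotonicStamp` and its fields are hypothetical and do not reflect the actual `CachedKeys` layout; the wall-clock `at_ms` is kept only so the entry can still be logged or serialized.

```rust
use std::time::{Duration, Instant};

/// Hypothetical in-memory cache stamp (not the real `CachedKeys` struct).
struct MonotonicStamp {
    /// Wall-clock stamp in ms since the Unix epoch; kept for logging only.
    at_ms: u64,
    /// Monotonic stamp taken at insertion time; used for staleness.
    inserted_at: Instant,
}

impl MonotonicStamp {
    fn new(at_ms: u64) -> Self {
        Self { at_ms, inserted_at: Instant::now() }
    }

    /// Staleness measured on the monotonic clock, so a backward
    /// wall-clock correction cannot produce an error here.
    fn stale_time(&self) -> Duration {
        self.inserted_at.elapsed()
    }
}
```

One caveat worth checking before adopting this: whether `Instant` keeps advancing while the machine is suspended is platform-dependent, so staleness across a long sleep may be under-counted on some systems.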

## Related class of bugs

There are **15+ other `SystemTime::now().duration_since(...).unwrap()` / `.expect()` call sites** in the codebase that have the same fundamental problem. Found via `gh search code 'duration_since repo:OpenBubbles/rustpush'`:

- `src/statuskit.rs` — 2 sites
- `src/passwords.rs` — 1 site
- `src/findmy.rs` — 1 site
- `src/ids/user.rs` — 1 site
- `src/util.rs` — 2 sites (`duration_since(SystemTime::UNIX_EPOCH).unwrap()`)
- `src/auth.rs` — `mme_refreshed` weekly check
- `src/facetime.rs`, `src/imessage/aps_client.rs`, `src/imessage/messages.rs`, `src/icloud/keychain.rs`, `src/ids/identity_manager.rs`, `cloudkit-proto/src/lib.rs` — various
- `src/sharedstreams.rs` — `round_seconds()`

`identity_manager.rs#L45` is the one I caught panicking, but any of these can panic the same way under the right clock-skew conditions. Worth fixing as a class — perhaps a small helper `duration_since_safe()` in `util.rs` that returns `Duration::ZERO` (or whatever the call site needs) on `Err`.
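
A possible shape for that helper, as a sketch only. The name `duration_since_safe` comes from the suggestion above; the signature (both timestamps passed in, plus an explicit fallback) is my assumption, chosen so each call site states what a backward clock should mean for it.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical helper: elapsed time from `earlier` to `later`, or
/// `fallback` when `later` precedes `earlier` (wall clock moved backward).
fn duration_since_safe(
    later: SystemTime,
    earlier: SystemTime,
    fallback: Duration,
) -> Duration {
    later.duration_since(earlier).unwrap_or(fallback)
}
```

A call site that wants option 1 semantics would pass `Duration::ZERO`, and one that wants option 2 would pass `Duration::MAX`. With `Duration::MAX`, any later arithmetic on the result should use `saturating_add` or `checked_add`, since `Duration + Duration` panics on overflow.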

## Workaround for users until fixed

Disable CloudKit sync via `flutter.cloudSyncingEnabled = false` and `flutter.attachmentSyncEnabled = false` in `shared_preferences.json`. iMessages still deliver via APS reflection; only cross-device iCloud history sync is lost.

---

Happy to send a PR for option 1 or 3 if you have a preferred approach.

