|
| 1 | +# Plan: Fix "Database is Locked" After Background Sync on App Open |
| 2 | + |
| 3 | +**Date:** 2026-04-02 |
| 4 | +**Status:** Implemented |
| 5 | + |
| 6 | +--- |
| 7 | + |
| 8 | +## Problem Statement |
| 9 | + |
| 10 | +When a push notification arrives while the app is **terminated**, the background task (`executeBackgroundSync`) opens a write connection to the SQLite database and runs `cloudsync_network_sync()`. If the user opens the app while this task is running — or after it was interrupted — the foreground hits a **"database is locked"** (`SQLITE_BUSY`) error. |
| 11 | + |
| 12 | +--- |
| 13 | + |
| 14 | +## Process Model — Why This Is Hard |
| 15 | + |
| 16 | +**Both iOS and Android use the same process for background tasks:** |
| 17 | +- **iOS**: launches app in background mode, promotes that same process to foreground when user opens the app |
| 18 | +- **Android**: uses Android "Headless JS" — the JS runtime runs in the existing app process without a mounted component tree (same process, no component tree) |
| 19 | + |
| 20 | +This is confirmed by `pushNotificationSyncCallbacks.ts` already using module-level state that background tasks read — a pattern that only works in a shared process. |
| 21 | + |
| 22 | +**The key difference is how tasks get killed:** |
| 23 | + |
| 24 | +| | iOS | Android | |
| 25 | +|---|---|---| |
| 26 | +| Task killed | iOS terminates the *task*, **keeps the process alive** | Android typically kills the *entire process* | |
| 27 | +| Lock released after kill? | ❌ No — orphaned connection stays open in same process | ✅ Yes — POSIX releases all locks on process death | |
| 28 | + |
| 29 | +**The critical failure path (both platforms, task killed in same process):** |
| 30 | + |
| 31 | +``` |
| 32 | +1. Background task opens write DB connection |
| 33 | +2. cloudsync_network_sync() starts (native, off JS thread) |
| 34 | +3. TaskManager hits background task time limit |
| 35 | + → Expo kills the *task*, process stays alive (iOS) |
| 36 | + → OR background task is interrupted before finally runs (Android) |
| 37 | + → The JS finally { db.close() } never executes |
| 38 | +4. User opens app → same process → foreground init runs |
| 39 | +5. Foreground tries to open write connection |
| 40 | + → Same process still holds an open, unclosed write connection |
| 41 | + → SQLITE_BUSY |
| 42 | +6. busy_timeout waits... but the lock holder (orphaned connection in same process) |
| 43 | + never releases → user is stuck |
| 44 | +``` |
| 45 | + |
| 46 | +**Why `busy_timeout` alone is not enough for the same-process case:** |
| 47 | +SQLite's busy_timeout retries until the lock is released. If the lock is held by an orphaned open connection in the same process that nobody will ever close, the lock is held **indefinitely**. The timeout expires and the user cannot open the database. |
| 48 | + |
| 49 | +**When `busy_timeout` alone IS sufficient (Android process-death case):** |
| 50 | +If Android kills the entire app process (e.g. very aggressive task termination), POSIX releases all file locks automatically on process death. The foreground app starts fresh in a new process, finds no orphaned connections in module state, and `busy_timeout` covers the brief window before the lock is fully released. |
| 51 | + |
| 52 | +--- |
| 53 | + |
| 54 | +## Root Cause |
| 55 | + |
| 56 | +The background task opens a DB connection but there is no mechanism for the foreground to know about or clean up that connection if the task is interrupted. The connection becomes an **orphan**: still open, holding a write lock, with no code path that will ever call `db.close()` on it. |
| 57 | + |
| 58 | +--- |
| 59 | + |
| 60 | +## Solution: Module-level DB reference + forceful cleanup |
| 61 | + |
| 62 | +Since the background task and foreground run in the same process, module-level state is shared. The fix is to store the background task's DB connection in a module-level variable **before the first `await`**. The foreground checks this variable on startup and forcibly closes any orphaned connection before opening its own. |
| 63 | + |
| 64 | +``` |
| 65 | +Background: |
| 66 | + db = createDatabase(...) |
| 67 | + setActiveBackgroundDb(db) ← store reference immediately, synchronously |
| 68 | + ... sync (may get interrupted here) ... |
| 69 | + finally: |
| 70 | + clearActiveBackgroundDb() |
| 71 | + db.close() |
| 72 | +
|
| 73 | +Foreground initialization: |
| 74 | + staleDb = consumeActiveBackgroundDb() |
| 75 | + if (staleDb) { |
| 76 | + staleDb.close() ← forcibly release the lock, regardless of whether |
| 77 | + } background is done or still running |
| 78 | + // Now safe to open own connection |
| 79 | + db = createDatabase(...) |
| 80 | +``` |
| 81 | + |
| 82 | +**Why forceful close is safe:** |
| 83 | +- If the background task was already done: this is a no-op (reference was already cleared by the task's `finally` block) |
| 84 | +- If the background task is interrupted/stuck: closing the connection rolls back any uncommitted write transaction and releases the lock. SQLite guarantees the database stays consistent after a connection is closed mid-transaction. The partial sync data is lost, but that is acceptable — the next foreground sync will re-download it |
| 85 | +- If the native `cloudsync_network_sync()` call is still running when we close: OP-SQLite will return an error on the native side, the background task's `catch` block handles it, and the `finally` tries to close an already-closed connection — which we catch and ignore |
| 86 | + |
| 87 | +**For Android / true cross-process scenarios:** |
| 88 | +Module-level state is not shared. However, when a separate background process is killed by the OS, POSIX guarantees all file locks are released (all file descriptors are closed on SIGKILL). `PRAGMA busy_timeout` handles the wait window here. This is a secondary defense for Android, not the primary fix for iOS. |
| 89 | + |
| 90 | +--- |
| 91 | + |
| 92 | +## Implementation Plan |
| 93 | + |
| 94 | +### Phase 1 — New module: `activeBackgroundDb.ts` |
| 95 | + |
| 96 | +**New file:** `src/core/background/activeBackgroundDb.ts` |
| 97 | + |
| 98 | +Manages the module-level reference to the background sync's DB connection. |
| 99 | + |
| 100 | +```typescript |
| 101 | +import type { DB } from '@op-engineering/op-sqlite'; |
| 102 | + |
| 103 | +let _db: DB | null = null; |
| 104 | + |
| 105 | +/** Called by executeBackgroundSync before first await — sets reference synchronously */ |
| 106 | +export function setActiveBackgroundDb(db: DB): void { |
| 107 | + _db = db; |
| 108 | +} |
| 109 | + |
| 110 | +/** |
| 111 | + * Called by useDatabaseInitialization on startup. |
| 112 | + * Returns and clears the reference so the caller owns the connection. |
| 113 | + * Returns null if no background sync connection exists. |
| 114 | + */ |
| 115 | +export function consumeActiveBackgroundDb(): DB | null { |
| 116 | + const db = _db; |
| 117 | + _db = null; |
| 118 | + return db; |
| 119 | +} |
| 120 | + |
| 121 | +/** Called in background task's finally block to clear reference after normal close */ |
| 122 | +export function clearActiveBackgroundDb(): void { |
| 123 | + _db = null; |
| 124 | +} |
| 125 | +``` |
| 126 | + |
| 127 | +--- |
| 128 | + |
| 129 | +### Phase 2 — Update `executeBackgroundSync.ts` |
| 130 | + |
| 131 | +Set the module-level reference **before the first `await`** (synchronously), so it's visible to any foreground code that runs concurrently. Clear it and close in the `finally` block — but only if the foreground hasn't already consumed and closed it. |
| 132 | + |
| 133 | +Key changes: |
| 134 | +- Pass `setActiveBackgroundDb` as `onOpen` callback to `createDatabase` (see Phase 4 amendment) — this fires synchronously after `open()` before any awaited PRAGMAs, closing the kill-window gap |
| 135 | +- In `finally`: call `clearActiveBackgroundDb()` before `db.close()` |
| 136 | +- Handle the case where `db.close()` throws because foreground already closed it (already caught by existing try/catch in `finally`) |
| 137 | + |
| 138 | +> **Amendment (post-review):** Originally called `setActiveBackgroundDb(db)` after `await createDatabase(...)` returned, which still left a window between `open()` and the first PRAGMA await. Fixed by using the `onOpen` callback in `createDatabase` (see Phase 4 amendment). |
| 139 | +
|
| 140 | +--- |
| 141 | + |
| 142 | +### Phase 3 — Update `useDatabaseInitialization.ts` |
| 143 | + |
| 144 | +At the very start of the `initialize()` function, before opening any connections, check for and close any orphaned background connection. |
| 145 | + |
| 146 | +```typescript |
| 147 | +import { consumeActiveBackgroundDb } from '../background/activeBackgroundDb'; |
| 148 | + |
| 149 | +const initialize = async () => { |
| 150 | + /** CLEANUP ORPHANED BACKGROUND CONNECTION */ |
| 151 | + const orphanedDb = consumeActiveBackgroundDb(); |
| 152 | + if (orphanedDb) { |
| 153 | + logger.warn('⚠️ Found orphaned background sync connection — closing to release write lock'); |
| 154 | + try { |
| 155 | + orphanedDb.updateHook(null); |
| 156 | + orphanedDb.close(); |
| 157 | + logger.info('✅ Orphaned connection closed'); |
| 158 | + } catch (closeErr) { |
| 159 | + logger.warn('⚠️ Could not close orphaned connection (may already be closed):', closeErr); |
| 160 | + // Best effort — proceed regardless |
| 161 | + } |
| 162 | + } |
| 163 | + |
| 164 | + // ... rest of existing initialize() code unchanged |
| 165 | +}; |
| 166 | +``` |
| 167 | + |
| 168 | +--- |
| 169 | + |
| 170 | +### Phase 4 — Update `createDatabase.ts` |
| 171 | + |
| 172 | +Two changes: |
| 173 | + |
| 174 | +**1. Add `PRAGMA busy_timeout`** as a cross-process fallback (Android) and general-purpose defense. |
| 175 | + |
| 176 | +```typescript |
| 177 | +const db = open({ name }); |
| 178 | +await db.execute('PRAGMA busy_timeout = 10000'); // 10s fallback for cross-process |
| 179 | +await db.execute('PRAGMA journal_mode = WAL'); |
| 180 | +``` |
| 181 | + |
| 182 | +10 seconds is sufficient for Android's cross-process scenario — when the background process is killed, POSIX releases the lock within milliseconds. 10s covers any OS-level delay. |
| 183 | + |
| 184 | +**2. Add `onOpen?: (db: DB) => void` callback parameter** (amendment, see Phase 2 note). |
| 185 | + |
| 186 | +```typescript |
| 187 | +export async function createDatabase( |
| 188 | + name: string, |
| 189 | + mode: 'write' | 'read', |
| 190 | + onOpen?: (db: DB) => void |
| 191 | +): Promise<DB> { |
| 192 | + const db = open({ name }); |
| 193 | + onOpen?.(db); // ← fires synchronously before any await |
| 194 | + await db.execute('PRAGMA busy_timeout = 10000'); |
| 195 | + ... |
| 196 | +} |
| 197 | +``` |
| 198 | + |
| 199 | +This allows callers to register ownership of the raw connection before any async gap. Background sync passes `setActiveBackgroundDb` here. Normal callers (foreground, read connections) pass nothing. |
| 200 | + |
| 201 | +--- |
| 202 | + |
| 203 | +### Phase 5 — Tests |
| 204 | + |
| 205 | +**New:** `src/core/background/__tests__/activeBackgroundDb.test.ts` |
| 206 | +- `setActiveBackgroundDb` makes connection available |
| 207 | +- `consumeActiveBackgroundDb` returns and clears the reference |
| 208 | +- `clearActiveBackgroundDb` clears without returning |
| 209 | + |
| 210 | +**Update:** `src/core/background/__tests__/executeBackgroundSync.test.ts` |
| 211 | +- Verify `createDatabase` is called with an `onOpen` callback (function arg in position 3) |
| 212 | +- Verify that `onOpen` callback calls `setActiveBackgroundDb` with the DB |
| 213 | +- Verify `clearActiveBackgroundDb` is called in the `finally` block (both success and error paths) |
| 214 | +- Verify `db.close()` error after foreground already closed it is handled gracefully |
| 215 | + |
| 216 | +**Update:** `src/core/database/__tests__/useDatabaseInitialization.test.ts` |
| 217 | +- When `consumeActiveBackgroundDb` returns a mock DB, verify `db.close()` is called on it before initialization proceeds |
| 218 | +- When `consumeActiveBackgroundDb` returns `null`, verify normal initialization path |
| 219 | + |
| 220 | +**Update:** `src/core/database/__tests__/createDatabase.test.ts` |
| 221 | +- Verify `PRAGMA busy_timeout = 10000` is executed |
| 222 | +- Verify `busy_timeout` is set before `journal_mode` (ordering matters) |
| 223 | +- Verify `onOpen` callback is called synchronously before any PRAGMA (pragma call count is 0 when `onOpen` fires) |
| 224 | + |
| 225 | +--- |
| 226 | + |
| 227 | +## File Impact |
| 228 | + |
| 229 | +| File | Change | Risk | |
| 230 | +|------|--------|------| |
| 231 | +| `src/core/background/activeBackgroundDb.ts` | New file, ~20 lines | None | |
| 232 | +| `src/core/background/executeBackgroundSync.ts` | Pass `setActiveBackgroundDb` as `onOpen` to `createDatabase`; `clearActiveBackgroundDb()` in `finally` | Low | |
| 233 | +| `src/core/database/useDatabaseInitialization.ts` | Add cleanup block at start of `initialize()` | Low | |
| 234 | +| `src/core/database/createDatabase.ts` | Add `onOpen?` callback param + `PRAGMA busy_timeout = 10000` | Very Low | |
| 235 | +| Test files | New + updated test cases (51 tests total) | None | |
| 236 | + |
| 237 | +--- |
| 238 | + |
| 239 | +## Failure Scenarios Covered |
| 240 | + |
| 241 | +| Scenario | Covered by | |
| 242 | +|----------|-----------| |
| 243 | +| Background task completes normally, user opens app immediately after | Phase 4 (`busy_timeout`) — lock released naturally, brief wait | |
| 244 | +| Background task interrupted (iOS time limit), same process | Phase 1-3 — foreground forcibly closes orphaned connection | |
| 245 | +| Background task stuck (CloudSync network timeout), same process | Phase 1-3 — foreground forcibly closes orphaned connection | |
| 246 | +| Background process killed by OS (Android / cross-process) | Phase 4 — POSIX releases lock on process death, `busy_timeout` covers the window | |
| 247 | +| No background sync running | `consumeActiveBackgroundDb()` returns null, zero overhead | |
| 248 | + |
| 249 | +--- |
| 250 | + |
| 251 | +## What This Does NOT Cover |
| 252 | + |
| 253 | +- If the CloudSync native library (`cloudsync_network_sync`) holds the lock for longer than `busy_timeout` in a cross-process scenario AND the process is not killed yet — this would require aborting the CloudSync native call, which is outside this library's control. In practice, iOS kills background processes well within 30s. |
| 254 | + |
| 255 | +--- |
| 256 | + |
| 257 | +## Verification Criteria |
| 258 | + |
| 259 | +- [x] `setActiveBackgroundDb(db)` is called synchronously via `onOpen` callback inside `createDatabase`, before any PRAGMA await |
| 260 | +- [x] `clearActiveBackgroundDb()` is called in the `finally` block of `executeBackgroundSync` |
| 261 | +- [x] `useDatabaseInitialization` calls `consumeActiveBackgroundDb()` and closes the returned connection before opening its own |
| 262 | +- [x] If `consumeActiveBackgroundDb()` returns null, zero behavior change vs. current code |
| 263 | +- [x] `PRAGMA busy_timeout = 10000` is first PRAGMA after `open()` in `createDatabase` |
| 264 | +- [x] All existing tests pass |
| 265 | +- [x] New `activeBackgroundDb` tests pass (51 tests total, all passing) |
| 266 | +- [ ] Manual test: terminate app → trigger background sync → open app during sync → no locked error, app loads cleanly |
0 commit comments