Skip to content

Commit 6255fca

Browse files
committed
fix: prevent DB lock on app open after background sync, remove retries in push mode
Two related fixes for push notification sync: 1. Database locked on app open Background sync opens a write connection that could be orphaned if the task is interrupted before the finally block runs. The foreground then hits SQLITE_BUSY because the lock is still held in the same process. Fix: store the background DB connection in module-level state before any await (activeBackgroundDb.ts). On foreground startup, consume and forcibly close any orphaned connection before opening the app's own connections. Added PRAGMA busy_timeout = 10000 as a cross-process fallback for Android. 2. Remove retries in push mode The server sends two notifications per sync event: apply (immediate) then check (once the artifact is ready). Client-side retries are redundant and cause the apply and check syncs to overlap, since the check notification arrives while apply is still retrying. Fix: maxAttempts = 1 in push mode for both background sync and foreground sync (useSyncManager). The server's apply→check protocol handles reliability.
1 parent 55a6520 commit 6255fca

File tree

11 files changed

+617
-8
lines changed

11 files changed

+617
-8
lines changed
Lines changed: 266 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,266 @@
1+
# Plan: Fix "Database is Locked" After Background Sync on App Open
2+
3+
**Date:** 2026-04-02
4+
**Status:** Implemented
5+
6+
---
7+
8+
## Problem Statement
9+
10+
When a push notification arrives while the app is **terminated**, the background task (`executeBackgroundSync`) opens a write connection to the SQLite database and runs `cloudsync_network_sync()`. If the user opens the app while this task is running — or after it was interrupted — the foreground hits a **"database is locked"** (`SQLITE_BUSY`) error.
11+
12+
---
13+
14+
## Process Model — Why This Is Hard
15+
16+
**Both iOS and Android use the same process for background tasks:**
17+
- **iOS**: launches app in background mode, promotes that same process to foreground when user opens the app
18+
- **Android**: uses Android "Headless JS" — the JS runtime runs in the existing app process without a mounted component tree (same process, no component tree)
19+
20+
This is confirmed by `pushNotificationSyncCallbacks.ts` already using module-level state that background tasks read — a pattern that only works in a shared process.
21+
22+
**The key difference is how tasks get killed:**
23+
24+
| | iOS | Android |
25+
|---|---|---|
26+
| Task killed | iOS terminates the *task*, **keeps the process alive** | Android typically kills the *entire process* |
27+
| Lock released after kill? | ❌ No — orphaned connection stays open in same process | ✅ Yes — POSIX releases all locks on process death |
28+
29+
**The critical failure path (both platforms, task killed in same process):**
30+
31+
```
32+
1. Background task opens write DB connection
33+
2. cloudsync_network_sync() starts (native, off JS thread)
34+
3. TaskManager hits background task time limit
35+
→ Expo kills the *task*, process stays alive (iOS)
36+
→ OR background task is interrupted before finally runs (Android)
37+
→ The JS finally { db.close() } never executes
38+
4. User opens app → same process → foreground init runs
39+
5. Foreground tries to open write connection
40+
→ Same process still holds an open, unclosed write connection
41+
→ SQLITE_BUSY
42+
6. busy_timeout waits... but the lock holder (orphaned connection in same process)
43+
never releases → user is stuck
44+
```
45+
46+
**Why `busy_timeout` alone is not enough for the same-process case:**
47+
SQLite's busy_timeout retries until the lock is released. If the lock is held by an orphaned open connection in the same process that nobody will ever close, the lock is held **indefinitely**. The timeout expires and the user cannot open the database.
48+
49+
**When `busy_timeout` alone IS sufficient (Android process-death case):**
50+
If Android kills the entire app process (e.g. very aggressive task termination), POSIX releases all file locks automatically on process death. The foreground app starts fresh in a new process, finds no orphaned connections in module state, and `busy_timeout` covers the brief window before the lock is fully released.
51+
52+
---
53+
54+
## Root Cause
55+
56+
The background task opens a DB connection but there is no mechanism for the foreground to know about or clean up that connection if the task is interrupted. The connection becomes an **orphan**: still open, holding a write lock, with no code path that will ever call `db.close()` on it.
57+
58+
---
59+
60+
## Solution: Module-level DB reference + forceful cleanup
61+
62+
Since the background task and foreground run in the same process, module-level state is shared. The fix is to store the background task's DB connection in a module-level variable **before the first `await`**. The foreground checks this variable on startup and forcibly closes any orphaned connection before opening its own.
63+
64+
```
65+
Background:
66+
db = createDatabase(...)
67+
setActiveBackgroundDb(db) ← store reference immediately, synchronously
68+
... sync (may get interrupted here) ...
69+
finally:
70+
clearActiveBackgroundDb()
71+
db.close()
72+
73+
Foreground initialization:
74+
staleDb = consumeActiveBackgroundDb()
75+
if (staleDb) {
76+
staleDb.close() ← forcibly release the lock, regardless of whether
77+
} background is done or still running
78+
// Now safe to open own connection
79+
db = createDatabase(...)
80+
```
81+
82+
**Why forceful close is safe:**
83+
- If the background task was already done: this is a no-op (reference was already cleared by the task's `finally` block)
84+
- If the background task is interrupted/stuck: closing the connection rolls back any uncommitted write transaction and releases the lock. SQLite guarantees the database stays consistent after a connection is closed mid-transaction. The partial sync data is lost, but that is acceptable — the next foreground sync will re-download it
85+
- If the native `cloudsync_network_sync()` call is still running when we close: OP-SQLite will return an error on the native side, the background task's `catch` block handles it, and the `finally` tries to close an already-closed connection — which we catch and ignore
86+
87+
**For Android / true cross-process scenarios:**
88+
Module-level state is not shared. However, when a separate background process is killed by the OS, POSIX guarantees all file locks are released (all file descriptors are closed on SIGKILL). `PRAGMA busy_timeout` handles the wait window here. This is a secondary defense for Android, not the primary fix for iOS.
89+
90+
---
91+
92+
## Implementation Plan
93+
94+
### Phase 1 — New module: `activeBackgroundDb.ts`
95+
96+
**New file:** `src/core/background/activeBackgroundDb.ts`
97+
98+
Manages the module-level reference to the background sync's DB connection.
99+
100+
```typescript
101+
import type { DB } from '@op-engineering/op-sqlite';
102+
103+
let _db: DB | null = null;
104+
105+
/** Called by executeBackgroundSync before first await — sets reference synchronously */
106+
export function setActiveBackgroundDb(db: DB): void {
107+
_db = db;
108+
}
109+
110+
/**
111+
* Called by useDatabaseInitialization on startup.
112+
* Returns and clears the reference so the caller owns the connection.
113+
* Returns null if no background sync connection exists.
114+
*/
115+
export function consumeActiveBackgroundDb(): DB | null {
116+
const db = _db;
117+
_db = null;
118+
return db;
119+
}
120+
121+
/** Called in background task's finally block to clear reference after normal close */
122+
export function clearActiveBackgroundDb(): void {
123+
_db = null;
124+
}
125+
```
126+
127+
---
128+
129+
### Phase 2 — Update `executeBackgroundSync.ts`
130+
131+
Set the module-level reference **before the first `await`** (synchronously), so it's visible to any foreground code that runs concurrently. Clear it and close in the `finally` block — but only if the foreground hasn't already consumed and closed it.
132+
133+
Key changes:
134+
- Pass `setActiveBackgroundDb` as `onOpen` callback to `createDatabase` (see Phase 4 amendment) — this fires synchronously after `open()` before any awaited PRAGMAs, closing the kill-window gap
135+
- In `finally`: call `clearActiveBackgroundDb()` before `db.close()`
136+
- Handle the case where `db.close()` throws because foreground already closed it (already caught by existing try/catch in `finally`)
137+
138+
> **Amendment (post-review):** Originally called `setActiveBackgroundDb(db)` after `await createDatabase(...)` returned, which still left a window between `open()` and the first PRAGMA await. Fixed by using the `onOpen` callback in `createDatabase` (see Phase 4 amendment).
139+
140+
---
141+
142+
### Phase 3 — Update `useDatabaseInitialization.ts`
143+
144+
At the very start of the `initialize()` function, before opening any connections, check for and close any orphaned background connection.
145+
146+
```typescript
147+
import { consumeActiveBackgroundDb } from '../background/activeBackgroundDb';
148+
149+
const initialize = async () => {
150+
/** CLEANUP ORPHANED BACKGROUND CONNECTION */
151+
const orphanedDb = consumeActiveBackgroundDb();
152+
if (orphanedDb) {
153+
logger.warn('⚠️ Found orphaned background sync connection — closing to release write lock');
154+
try {
155+
orphanedDb.updateHook(null);
156+
orphanedDb.close();
157+
logger.info('✅ Orphaned connection closed');
158+
} catch (closeErr) {
159+
logger.warn('⚠️ Could not close orphaned connection (may already be closed):', closeErr);
160+
// Best effort — proceed regardless
161+
}
162+
}
163+
164+
// ... rest of existing initialize() code unchanged
165+
};
166+
```
167+
168+
---
169+
170+
### Phase 4 — Update `createDatabase.ts`
171+
172+
Two changes:
173+
174+
**1. Add `PRAGMA busy_timeout`** as a cross-process fallback (Android) and general-purpose defense.
175+
176+
```typescript
177+
const db = open({ name });
178+
await db.execute('PRAGMA busy_timeout = 10000'); // 10s fallback for cross-process
179+
await db.execute('PRAGMA journal_mode = WAL');
180+
```
181+
182+
10 seconds is sufficient for Android's cross-process scenario — when the background process is killed, POSIX releases the lock within milliseconds. 10s covers any OS-level delay.
183+
184+
**2. Add `onOpen?: (db: DB) => void` callback parameter** (amendment, see Phase 2 note).
185+
186+
```typescript
187+
export async function createDatabase(
188+
name: string,
189+
mode: 'write' | 'read',
190+
onOpen?: (db: DB) => void
191+
): Promise<DB> {
192+
const db = open({ name });
193+
onOpen?.(db); // ← fires synchronously before any await
194+
await db.execute('PRAGMA busy_timeout = 10000');
195+
...
196+
}
197+
```
198+
199+
This allows callers to register ownership of the raw connection before any async gap. Background sync passes `setActiveBackgroundDb` here. Normal callers (foreground, read connections) pass nothing.
200+
201+
---
202+
203+
### Phase 5 — Tests
204+
205+
**New:** `src/core/background/__tests__/activeBackgroundDb.test.ts`
206+
- `setActiveBackgroundDb` makes connection available
207+
- `consumeActiveBackgroundDb` returns and clears the reference
208+
- `clearActiveBackgroundDb` clears without returning
209+
210+
**Update:** `src/core/background/__tests__/executeBackgroundSync.test.ts`
211+
- Verify `createDatabase` is called with an `onOpen` callback (function arg in position 3)
212+
- Verify that `onOpen` callback calls `setActiveBackgroundDb` with the DB
213+
- Verify `clearActiveBackgroundDb` is called in the `finally` block (both success and error paths)
214+
- Verify `db.close()` error after foreground already closed it is handled gracefully
215+
216+
**Update:** `src/core/database/__tests__/useDatabaseInitialization.test.ts`
217+
- When `consumeActiveBackgroundDb` returns a mock DB, verify `db.close()` is called on it before initialization proceeds
218+
- When `consumeActiveBackgroundDb` returns `null`, verify normal initialization path
219+
220+
**Update:** `src/core/database/__tests__/createDatabase.test.ts`
221+
- Verify `PRAGMA busy_timeout = 10000` is executed
222+
- Verify `busy_timeout` is set before `journal_mode` (ordering matters)
223+
- Verify `onOpen` callback is called synchronously before any PRAGMA (pragma call count is 0 when `onOpen` fires)
224+
225+
---
226+
227+
## File Impact
228+
229+
| File | Change | Risk |
230+
|------|--------|------|
231+
| `src/core/background/activeBackgroundDb.ts` | New file, ~20 lines | None |
232+
| `src/core/background/executeBackgroundSync.ts` | Pass `setActiveBackgroundDb` as `onOpen` to `createDatabase`; `clearActiveBackgroundDb()` in `finally` | Low |
233+
| `src/core/database/useDatabaseInitialization.ts` | Add cleanup block at start of `initialize()` | Low |
234+
| `src/core/database/createDatabase.ts` | Add `onOpen?` callback param + `PRAGMA busy_timeout = 10000` | Very Low |
235+
| Test files | New + updated test cases (51 tests total) | None |
236+
237+
---
238+
239+
## Failure Scenarios Covered
240+
241+
| Scenario | Covered by |
242+
|----------|-----------|
243+
| Background task completes normally, user opens app immediately after | Phase 4 (`busy_timeout`) — lock released naturally, brief wait |
244+
| Background task interrupted (iOS time limit), same process | Phase 1-3 — foreground forcibly closes orphaned connection |
245+
| Background task stuck (CloudSync network timeout), same process | Phase 1-3 — foreground forcibly closes orphaned connection |
246+
| Background process killed by OS (Android / cross-process) | Phase 4 — POSIX releases lock on process death, `busy_timeout` covers the window |
247+
| No background sync running | `consumeActiveBackgroundDb()` returns null, zero overhead |
248+
249+
---
250+
251+
## What This Does NOT Cover
252+
253+
- If the CloudSync native library (`cloudsync_network_sync`) holds the lock for longer than `busy_timeout` in a cross-process scenario AND the process is not killed yet — this would require aborting the CloudSync native call, which is outside this library's control. In practice, iOS kills background processes well within 30s.
254+
255+
---
256+
257+
## Verification Criteria
258+
259+
- [x] `setActiveBackgroundDb(db)` is called synchronously via `onOpen` callback inside `createDatabase`, before any PRAGMA await
260+
- [x] `clearActiveBackgroundDb()` is called in the `finally` block of `executeBackgroundSync`
261+
- [x] `useDatabaseInitialization` calls `consumeActiveBackgroundDb()` and closes the returned connection before opening its own
262+
- [x] If `consumeActiveBackgroundDb()` returns null, zero behavior change vs. current code
263+
- [x] `PRAGMA busy_timeout = 10000` is first PRAGMA after `open()` in `createDatabase`
264+
- [x] All existing tests pass
265+
- [x] New `activeBackgroundDb` tests pass (51 tests total, all passing)
266+
- [ ] Manual test: terminate app → trigger background sync → open app during sync → no locked error, app loads cleanly
Lines changed: 126 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,126 @@
1+
# Plan: Remove Retries in Push Mode
2+
3+
**Date:** 2026-04-02
4+
**Status:** Approved
5+
6+
---
7+
8+
## Problem Statement
9+
10+
After the database-locked fix, opening the app following a terminated-state background sync shows a gray screen for 20-30 seconds on Android.
11+
12+
**Root cause (from logs):**
13+
14+
```
15+
13:32:39 — User taps app icon (android activity not available yet)
16+
13:32:46 — Background sync 1 starts (apply event) ← openDB + 3-attempt native retry
17+
13:32:48 — Background sync 2 starts (check event) ← openDB + 3-attempt native retry, concurrent!
18+
13:32:51 — Both syncs complete
19+
13:32:59 — JS thread torn down ("finished thread" errors)
20+
13:33:02 — App finally renders ← 23 seconds after tap
21+
13:33:04 — Initial foreground sync: 4 attempts × ~1.3s = ~5s of extra blocking
22+
```
23+
24+
Two contributing factors:
25+
1. **Background syncs retry (native, 3 attempts)** — both tasks hold the JS engine busy during the headless→foreground Android transition
26+
2. **Foreground initial sync retries (JS, 4 attempts × 1s delay)** — delays UI data availability after app opens
27+
28+
**Why retries are unnecessary in push mode:**
29+
30+
The server's two-notification protocol is the retry mechanism:
31+
- `apply` event → app syncs (may get 0 changes if artifact not ready yet)
32+
- `check` event (with `artifactURI`) → app syncs with the actual data
33+
34+
If a sync attempt misses changes, the server sends the next notification. Client-side retries are redundant and actively harmful to startup performance.
35+
36+
---
37+
38+
## Why Concurrent Background Syncs Were Happening
39+
40+
The concurrency was a **symptom of retries**, not an independent problem:
41+
42+
```
43+
t=0s apply arrives → sync attempt 1 → 0 changes → wait 500ms → attempt 2 starts
44+
t=2s server sends check (artifact now ready) → second executeBackgroundSync fires
45+
↑ overlaps with apply's attempt 2
46+
```
47+
48+
Removing retries fixes this: the `apply` sync finishes in ~1-2s (single attempt),
49+
so by the time `check` arrives 2 seconds later, nothing is running. No concurrency
50+
guard needed.
51+
52+
---
53+
54+
## Solution
55+
56+
### Change 1 — `executeBackgroundSync.ts`: `maxAttempts: 1`
57+
58+
Background sync always runs in push mode. No retries needed — the server's `check`
59+
notification is the retry mechanism.
60+
61+
```typescript
62+
// Before
63+
await executeSync(db, logger, {
64+
useNativeRetry: true,
65+
maxAttempts: 3,
66+
attemptDelay: 500,
67+
});
68+
69+
// After
70+
await executeSync(db, logger, {
71+
maxAttempts: 1,
72+
});
73+
```
74+
75+
---
76+
77+
### Change 2 — `useSyncManager.ts`: `maxAttempts: 1` in push mode
78+
79+
`syncMode` is already available. Pass `maxAttempts: 1` when in push mode:
80+
81+
```typescript
82+
const changes = await executeSync(writeDbRef.current, logger, {
83+
useTransaction: true,
84+
maxAttempts: syncMode === 'push' ? 1 : 4,
85+
attemptDelay: syncMode === 'push' ? 0 : 1000,
86+
});
87+
```
88+
89+
This covers both foreground notification-triggered syncs and the initial sync on app open.
90+
91+
---
92+
93+
## File Impact
94+
95+
| File | Change | Risk |
96+
|------|--------|------|
97+
| `src/core/background/executeBackgroundSync.ts` | `maxAttempts: 1` | Very Low |
98+
| `src/core/sync/useSyncManager.ts` | `maxAttempts: syncMode === 'push' ? 1 : 4` | Very Low |
99+
100+
---
101+
102+
## Expected Outcome
103+
104+
```
105+
Before: gray screen 20-30s, two concurrent background syncs, foreground retries 4×
106+
After: gray screen ~5-8s (Android JS context switch overhead only, not reducible by client code)
107+
```
108+
109+
The remaining gray screen time is the Android headless→foreground JS context transition itself, which is outside this library's control.
110+
111+
---
112+
113+
## Test Updates
114+
115+
- `executeBackgroundSync.test.ts` — verify `executeSync` is called with `maxAttempts: 1`
116+
- `useSyncManager` tests — verify `maxAttempts: 1` when `syncMode === 'push'`, `maxAttempts: 4` when `syncMode === 'polling'`
117+
118+
---
119+
120+
## Verification Criteria
121+
122+
- [ ] Background sync calls `executeSync` with `maxAttempts: 1`
123+
- [ ] `useSyncManager` uses `maxAttempts: 1` when `syncMode === 'push'`
124+
- [ ] `useSyncManager` uses `maxAttempts: 4` when `syncMode === 'polling'` (no regression)
125+
- [ ] All existing tests pass
126+
- [ ] Manual: terminate app → trigger push → open app → no gray screen delay, data synced correctly

0 commit comments

Comments
 (0)