Skip to content

Commit 3e45199

Browse files
authored
Fix deadlock when canceling queries stuck in VDBE or busy handler (#343)
This PR fixes a deadlock where `DBConnection.disconnect/2` could block forever while closing a connection whose dirty NIF thread was still inside SQLite. The dirty NIF execution model is unchanged; the fix is to make the blocked SQLite operation cancellable so it can release `conn->mutex` and allow close to finish. The deadlock can happen in two important cases: a long-running statement inside SQLite VDBE execution, or a statement sleeping in SQLite's busy handler while waiting for a lock. In both cases the NIF call holds `conn->mutex`. If the pool times out and disconnect tries to close the connection, close also needs `conn->mutex`, so it waits behind the stuck operation. ## What changed This adds `Exqlite.Sqlite3.cancel/1`, which sets a connection cancellation flag and calls `sqlite3_interrupt()`. `Exqlite.Connection.disconnect/2` now calls `cancel/1` before `close/1`, so teardown can interrupt a running statement or cause a busy wait to stop retrying. This replaces SQLite's default busy handler with an Exqlite busy handler. The handler preserves SQLite's default retry delay schedule, stores the timeout in `conn->busy_timeout_ms`, sleeps with `sqlite3_sleep()`, and checks the cancellation flag between sleeps. This avoids the non-cancellable behavior of SQLite's default busy handler while keeping the implementation portable. This adds `Exqlite.Sqlite3.set_busy_timeout/2` and changes connection setup to use it instead of `PRAGMA busy_timeout`. `PRAGMA busy_timeout` calls `sqlite3_busy_timeout()`, which installs SQLite's default busy handler and replaces any custom handler. The new API updates Exqlite's stored timeout without destroying the cancellable busy handler. This adds a configurable progress handler interval through `Exqlite.Sqlite3.set_progress_handler_steps/2` and the `:progress_handler_steps` connection option. The default is `1000` VDBE steps. Values less than `1` disable the progress handler for callers that want to avoid that overhead. The shared cancellation flag is protected by `interrupt_mutex`. The busy handler, progress handler, `cancel/1`, and close/destructor paths read or write that flag under the same lock. The busy-timeout and progress-handler-step fields are also updated under `interrupt_mutex`. ## User-facing behavior High-level users who configure `:busy_timeout` on `Exqlite.start_link/1` or an Ecto SQLite repo should not need to change anything. The connection setup path now applies that timeout through `Exqlite.Sqlite3.set_busy_timeout/2`. Low-level users who execute `PRAGMA busy_timeout = ...` directly should migrate to `Exqlite.Sqlite3.set_busy_timeout/2` if they want to keep cancellation of busy waits. Executing the pragma still works as SQLite behavior, but it replaces Exqlite's custom busy handler with SQLite's default handler, so `cancel/1` can no longer stop the busy-handler retry loop before SQLite's busy timeout expires. `Exqlite.Sqlite3.interrupt/1` remains available for SQLite interruption. `Exqlite.Sqlite3.cancel/1` is the stronger teardown-oriented API because it sets Exqlite's cancellation flag for the busy/progress handlers and also calls `sqlite3_interrupt()`. ## Implementation notes The busy handler uses SQLite's default delay schedule: `1, 2, 5, 10, 15, 20, 25, 25, 25, 50, 50` ms, then `50` ms for later retries. Cancellation is observed between sleep intervals, so a busy wait exits after the current sleep finishes, with later sleeps capped at 50 ms. The NIF stores the calling environment and pid while SQLite operations are running so the busy handler can detect when the caller process has died. If the caller is gone, the busy handler sets the cancellation flag and stops retrying. `cancel/1` and `interrupt/1` avoid taking `conn->mutex` while a query may be running. A running SQLite call holds that mutex, so taking it in the cancellation path would block behind the operation we are trying to cancel. The timeout/progress configuration setters still take `conn->mutex` because they are not used to break a currently running SQLite call. The NIF resource destructor and close path coordinate through `conn->mutex` and `interrupt_mutex` so connection teardown does not race with cancellation state updates or a concurrent interrupt. The Mix project now tracks the C source files as external resources, and the Makefile emits dependency files for C headers, so C/NIF changes are rebuilt more reliably. Partially-Addresses: #192
1 parent dcd9f7c commit 3e45199

10 files changed

Lines changed: 1539 additions & 13 deletions

File tree

Makefile

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -138,7 +138,10 @@ endif
138138

139139
$(BUILD)/%.o: c_src/%.c
140140
@echo " CC $(notdir $@)"
141-
$(CC) -c $(ERL_CFLAGS) $(CFLAGS) -o $@ $<
141+
$(CC) -c $(ERL_CFLAGS) $(CFLAGS) -MMD -MP -o $@ $<
142+
143+
# Include dependency files for automatic header tracking
144+
-include $(OBJ:.o=.d)
142145

143146
$(LIB_NAME): $(OBJ)
144147
@echo " LD $(notdir $@)"
@@ -152,7 +155,7 @@ $(PREFIX) $(BUILD):
152155
mkdir -p $@
153156

154157
clean:
155-
$(RM) $(LIB_NAME) $(ARCHIVE_NAME) $(OBJ)
158+
$(RM) $(LIB_NAME) $(ARCHIVE_NAME) $(OBJ) $(OBJ:.o=.d)
156159

157160
.PHONY: all clean
158161

0 commit comments

Comments
 (0)