Commit 3e45199
authored
Fix deadlock when canceling queries stuck in VDBE or busy handler (#343)
This PR fixes a deadlock where `DBConnection.disconnect/2` could block
forever while closing a connection whose dirty NIF thread was still
inside SQLite. The dirty NIF execution model is unchanged; the fix is to
make the blocked SQLite operation cancellable so it can release
`conn->mutex` and allow close to finish.
The deadlock can happen in two important cases: a long-running statement
inside SQLite VDBE execution, or a statement sleeping in SQLite's busy
handler while waiting for a lock. In both cases the NIF call holds
`conn->mutex`. If the pool times out and disconnect tries to close the
connection, close also needs `conn->mutex`, so it waits behind the stuck
operation.
## What changed
This adds `Exqlite.Sqlite3.cancel/1`, which sets a connection
cancellation flag and calls `sqlite3_interrupt()`.
`Exqlite.Connection.disconnect/2` now calls `cancel/1` before `close/1`,
so teardown can interrupt a running statement or cause a busy wait to
stop retrying.
This replaces SQLite's default busy handler with an Exqlite busy
handler. The handler preserves SQLite's default retry delay schedule,
stores the timeout in `conn->busy_timeout_ms`, sleeps with
`sqlite3_sleep()`, and checks the cancellation flag between sleeps. This
avoids the non-cancellable behavior of SQLite's default busy handler
while keeping the implementation portable.
This adds `Exqlite.Sqlite3.set_busy_timeout/2` and changes connection
setup to use it instead of `PRAGMA busy_timeout`. `PRAGMA busy_timeout`
calls `sqlite3_busy_timeout()`, which installs SQLite's default busy
handler and replaces any custom handler. The new API updates Exqlite's
stored timeout without destroying the cancellable busy handler.
This adds a configurable progress handler interval through
`Exqlite.Sqlite3.set_progress_handler_steps/2` and the
`:progress_handler_steps` connection option. The default is `1000` VDBE
steps. Values less than `1` disable the progress handler for callers
that want to avoid that overhead.
The shared cancellation flag is protected by `interrupt_mutex`. The busy
handler, progress handler, `cancel/1`, and close/destructor paths read
or write that flag under the same lock. The busy-timeout and
progress-handler-step fields are also updated under `interrupt_mutex`.
## User-facing behavior
High-level users who configure `:busy_timeout` on `Exqlite.start_link/1`
or an Ecto SQLite repo should not need to change anything. The
connection setup path now applies that timeout through
`Exqlite.Sqlite3.set_busy_timeout/2`.
Low-level users who execute `PRAGMA busy_timeout = ...` directly should
migrate to `Exqlite.Sqlite3.set_busy_timeout/2` if they want to keep
cancellation of busy waits. Executing the pragma still works as SQLite
behavior, but it replaces Exqlite's custom busy handler with SQLite's
default handler, so `cancel/1` can no longer stop the busy-handler retry
loop before SQLite's busy timeout expires.
`Exqlite.Sqlite3.interrupt/1` remains available for SQLite interruption.
`Exqlite.Sqlite3.cancel/1` is the stronger teardown-oriented API because
it sets Exqlite's cancellation flag for the busy/progress handlers and
also calls `sqlite3_interrupt()`.
## Implementation notes
The busy handler uses SQLite's default delay schedule: `1, 2, 5, 10, 15,
20, 25, 25, 25, 50, 50` ms, then `50` ms for later retries. Cancellation
is observed between sleep intervals, so a busy wait exits after the
current sleep finishes, with later sleeps capped at 50 ms.
The NIF stores the calling environment and pid while SQLite operations
are running so the busy handler can detect when the caller process has
died. If the caller is gone, the busy handler sets the cancellation flag
and stops retrying.
`cancel/1` and `interrupt/1` avoid taking `conn->mutex` while a query
may be running. A running SQLite call holds that mutex, so taking it in
the cancellation path would block behind the operation we are trying to
cancel. The timeout/progress configuration setters still take
`conn->mutex` because they are not used to break a currently running
SQLite call.
The NIF resource destructor and close path coordinate through
`conn->mutex` and `interrupt_mutex` so connection teardown does not race
with cancellation state updates or a concurrent interrupt.
The Mix project now tracks the C source files as external resources, and
the Makefile emits dependency files for C headers, so C/NIF changes are
rebuilt more reliably.
Partially-Addresses: #1921 parent dcd9f7c commit 3e45199
10 files changed
Lines changed: 1539 additions & 13 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
138 | 138 | | |
139 | 139 | | |
140 | 140 | | |
141 | | - | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
142 | 145 | | |
143 | 146 | | |
144 | 147 | | |
| |||
152 | 155 | | |
153 | 156 | | |
154 | 157 | | |
155 | | - | |
| 158 | + | |
156 | 159 | | |
157 | 160 | | |
158 | 161 | | |
| |||
0 commit comments