Fix #189: process worker fork-safety on macOS Tahoe (+ psycopg3)#207
Open
brentrager wants to merge 5 commits into
Open
Fix #189: process worker fork-safety on macOS Tahoe (+ psycopg3)#207brentrager wants to merge 5 commits into
brentrager wants to merge 5 commits into
Conversation
…r stderr Two changes for the macOS Tahoe segfault reported in issue Maxteabag#189: 1. process_worker_client.py: drop the spawn -> fork fallback when running on darwin. macOS Tahoe (26+) widened the set of system libraries that abort if initialized after fork() — CoreFoundation, Security.framework, libsystem_trace, libsystem_info — so any library the parent has loaded (psycopg, getaddrinfo, setproctitle, NSCharacterSet) can crash the forked child. Letting spawn fail and surfacing it to the caller lets the in-process fallback in query_execution take over instead. Cost: query cancellation isn't available when spawn fails. Gain: no hard crashes ("Worker connection closed.") on macOS. 2. process_worker.py: redirect the worker's stdout/stderr to a log file and enable faulthandler at entry. The worker can SIGPIPE on libpq notice writes when its inherited FDs point to a Textual-managed pipe, and any future C-level fault on the worker side is otherwise invisible to the parent (the pipe just dies). The log path honors $SQLIT_WORKER_LOG and defaults to <tmpdir>/sqlit-worker.log. Refs the celery (Tahoe + setproctitle, celery#9894) and gunicorn (Tahoe + NSCharacterSet, gunicorn#3526) reports for the broader fork- safety regression on Tahoe.
Companion to the Maxteabag#189 fork-fallback fix. psycopg3 delegates TLS to libpq instead of bundling libssl, so importing it doesn't pull Security.framework into the parent process on macOS — eliminating one specific child-side crash class (psycopg2 calling SSL_CTX_new inside the libpq init path after fork). The change is local to the two PostgreSQL-family adapters and the matching install metadata: - postgresql/adapter.py, cockroachdb/adapter.py: import psycopg (v3) and pass dbname= instead of database=; everything else (sslmode, sslrootcert, sslcert, sslkey, sslpassword, autocommit, extra_options) is already libpq-conninfo-compatible. - install_strategy.py: add psycopg / psycopg[binary] to the Arch package-name mapping. - pyproject.toml: replace psycopg2-binary>=2.9.0 with psycopg[binary]>=3.2.0 in the postgres, cockroachdb, and all extras. - flake.nix: pyPkgs.psycopg2 -> pyPkgs.psycopg. - README.md: driver-reference table now lists psycopg[binary]. - tests + integration fixtures: import psycopg (3) and use dbname=. The install-strategy tests still use the literal string "psycopg2-binary" because they're exercising detect_strategy with a sample package name, not the real driver. 837 unit tests pass.
The first two commits made the macOS Tahoe segfault non-fatal by disabling the fork fallback, but every spawn after Textual booted still failed with `bad value(s) in fds_to_keep` — leaving the user with a "Process worker unavailable; falling back" popup and the slower in-process executor on every query. Root cause: `multiprocessing.spawn` snapshots the parent's open file descriptors at spawn time. Once Textual's `App.__init__` has registered its signal-handler pipes and event-loop FDs, several of those are not inheritable, and spawn aborts before the child even runs. The previous lazy creation in the idle scheduler always hit this race. Fix: spawn the worker in `cli.py` *before* `SSMSTUI(...)` is constructed. That's the latest moment at which the parent's FD set is still clean. The new `_prewarm_process_worker(runtime)` helper returns a live `ProcessWorkerClient`, or `None` if the worker is disabled / mocked / spawn raised — in which case we fall through to the lazy path inside the UI (with the in-process fallback on darwin). The pre-spawned client is passed into `SSMSTUI` via a new `process_worker_client` kwarg and stashed as `self._process_worker_client`. `ProcessWorkerLifecycleMixin` already returns the cached client first, so no other changes are needed. Verified on macOS Tahoe 26.4.1 / Python 3.14 / PostgreSQL over TLS: the popup no longer appears and queries run in the worker process.
Covers the load-bearing behaviors introduced by the prior commits: - _maybe_fallback_start re-raises on darwin (commit 1) - _maybe_fallback_start still re-raises non-fds_to_keep errors verbatim - _open_worker_log honors $SQLIT_WORKER_LOG and survives an unwritable path - _prewarm_process_worker respects runtime.process_worker, runtime.mock, and swallows spawn exceptions so startup never crashes (commit 3) - SSMSTUI(process_worker_client=client) stashes the client on self._process_worker_client, which is what ProcessWorkerLifecycleMixin consults first
… multi-statement
Running tests/integration/test_transactions.py against a real PostgreSQL
container with the psycopg3 adapter surfaced two driver-behavior
differences that need handling in the shared CursorBasedAdapter:
1. autocommit-aware execute_non_query
psycopg2 silently makes conn.commit() a no-op when conn.autocommit is
True; psycopg v3 actively COMMITs any open transaction. The adapter's
execute_non_query unconditionally called conn.commit(), which under
psycopg3 meant `executor.execute("BEGIN")` immediately ended the
transaction it just started — so the subsequent INSERT was committed
and ROLLBACK had nothing to undo. Skipping the commit when
conn.autocommit is True preserves the original intent on both
drivers.
2. nextset() walk in execute_query
psycopg v3 exposes each result set in a multi-statement string and
leaves the cursor positioned at the first (e.g. on BEGIN, which has
no columns). psycopg2 only ever surfaced the last set. Walking
cursor.nextset() until it returns falsy lands on the last set; the
loop is a no-op on drivers that don't implement nextset() (a number
raise NotSupportedError, hence the try/except).
The integration tests in tests/integration/test_transactions.py needed
three fixes that are independent of the driver swap — verified to fail
or hang on upstream main with psycopg2 as well:
- test_multi_statement_as_single_query_works
The "BEGIN; INSERT; SELECT" payload left an open transaction on the
executor's persistent connection; the test's subsequent DROP TABLE
(on a different connection) then blocked waiting for the lock. Added
reset_tui_executor() before the cleanup so the persistent connection
rolls back and releases the lock.
- test_atomic_execute_rolls_back_on_error
The test expected atomic_execute to raise on a failing statement.
Per its docstring and current implementation, atomic_execute reports
the failure via a MultiStatementResult with completed=False and the
failing index in error_index. Updated the assertions accordingly.
- test_atomic_execute_commits_on_success
Multi-statement atomic_execute is documented to return a
MultiStatementResult (one StatementResult per statement), not a
single QueryResult. Asserting on result.results[-1].result picks up
the trailing SELECT's QueryResult exactly as the test intended.
Result against the docker-compose postgres:16 fixture:
- tests/integration/test_transactions.py: 13 passed
- tests/integration/test_database_browsing_flow.py::TestPostgreSQLDatabaseBrowsing: 1 passed
- tests/integration/test_stale_connection_reconnect.py -k postgres: 5 passed, 1 idle-variant skipped (pre-existing)
- tests/unit: 842 passed
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
First — thanks for sqlit! It's the cleanest of the SQL TUIs I've tried, and getting it working against our Postgres was worth digging into.
Fixes #189. macOS Tahoe (26.4+) widened the set of system libraries that abort when initialized after
fork(), which has been quietly breaking quite a few Python projects (celery, gunicorn, psycopg2 all have open issues from the last few months — links below). For sqlit, the symptom is that the process worker either hangs withWorker connection closed.or segfaults the child when expanding a folder or running a query against PG over TLS. The commits in this PR each address one layer of that and are independently reviewable.Commits
Drop the spawn → fork fallback on darwin; redirect worker stderr. When
multiprocessing.spawnfails withbad value(s) in fds_to_keep, the current code falls back tofork(). On Tahoe that's no longer safe — CoreFoundation, Security.framework, libsystem_trace'sos_log, and libsystem_info'sgetaddrinfoall now abort if they were loaded by the parent before fork. Letting spawn fail and surfacing it to the caller lets the in-process fallback take over instead. The worker also now redirects its stdout/stderr to$SQLIT_WORKER_LOG(defaults to<tmpdir>/sqlit-worker.log) and enablesfaulthandler, so future C-level faults aren't invisible behind a closed pipe — that's what let me find the segfault site in the first place.Switch PostgreSQL / CockroachDB adapter from
psycopg2topsycopgv3. psycopg3 delegates TLS to libpq instead of bundling libssl, so importing it doesn't pullSecurity.frameworkinto the parent at import time — one specific crash class gone. The diff is small:import psycopg,dbname=instead ofdatabase=, andpsycopg[binary]>=3.2.0in pyproject / flake.nix / README / install-strategy / tests + integration fixtures. The install-strategy unit tests still use the literal string"psycopg2-binary"because they're exercisingdetect_strategywith a sample package name, not the real driver.Pre-spawn the worker in
cli.pybeforeApp.__init__. This is the long-term fix from the issue addendum.multiprocessing.spawnsnapshots the parent's open FDs at spawn time, and Textual'sApp.__init__registers signal-handler pipes that aren't inheritable — that's thebad value(s) in fds_to_keepValueError. The previous lazy creation inidle_scheduler.py:_warmwas racing this. Spawning beforeSSMSTUI(...)is constructed is the latest moment at which the FD set is still clean. The pre-spawned client is plumbed in via a newprocess_worker_clientkwarg onSSMSTUI;ProcessWorkerLifecycleMixinalready returns the cached client first, so no other changes are needed. If pre-spawn fails for any reason, the lazy path still runs (and on darwin, that path now correctly falls back to in-process execution per commit 1).Unit tests for the new code paths. New
tests/unit/test_process_worker_fallback.pyandtests/unit/test_cli_prewarm.pycover the darwin re-raise, the worker-log env override, the prewarm gating onruntime.process_worker/runtime.mock, prewarm exception-swallowing, and the SSMSTUI kwarg plumbing.psycopg v3 transaction + multi-statement compatibility in CursorBasedAdapter. Running
tests/integration/test_transactions.pyagainst a real Postgres container with the psycopg3 adapter surfaced two driver-behavior differences:execute_non_querywas callingconn.commit()unconditionally. psycopg2 silently no-ops that whenautocommit=True; psycopg v3 actively COMMITs any open transaction, soexecutor.execute("BEGIN")ended its own transaction immediately and ROLLBACK had nothing to undo. Skipping the commit whengetattr(conn, "autocommit", False)preserves the intent on both drivers.execute_queryonly consulted the first result set. psycopg v3 exposes each result set in a multi-statement string and positions the cursor on the first (e.g. on BEGIN, with no columns). Walkingcursor.nextset()lands on the last set; the loop is a no-op on drivers that don't implement it (some raiseNotSupportedError, hence the try/except).The same commit fixes three test bugs that also reproduce on
Maxteabag/sqlit:mainwith psycopg2 (verified by running the upstream tree against the same docker fixture):test_multi_statement_as_single_query_worksleft an open transaction on the executor's persistent connection, then the test's own DROP TABLE on a fresh connection blocked on the lock. Added areset_tui_executor()before cleanup.test_atomic_execute_rolls_back_on_errorasserted thatatomic_executeraises on error; per its docstring and current implementation it actually returnsMultiStatementResult(completed=False, error_index=…).test_atomic_execute_commits_on_successasserted aQueryResultfor a 3-statement query;atomic_executeis documented to return aMultiStatementResultcarrying each StatementResult. Asserting onresult.results[-1].resultpulls out the trailing SELECT'sQueryResult.Result on macOS Tahoe 26.4.1 / Python 3.14 / PostgreSQL over TLS
Worker connection closed.Tests
Run against postgres:16 from
infra/docker/docker-compose.test.yml:tests/unittests/integration/test_transactions.pytests/integration/test_database_browsing_flow.py::TestPostgreSQLDatabaseBrowsingtests/integration/test_stale_connection_reconnect.py -k postgresRelated upstream regressions on Tahoe (for context — this isn't sqlit-specific)
'fork'is broken: change to `'forkserver' || 'spawn'python/cpython#84559 — CPython 3.14 makesforkserverdefault on POSIX (partial mitigation)Tradeoffs to flag
process_worker=True(the default) — there's no longer an idle-only path on the first launch. Cost: one extra Python subprocess at TUI startup. Benefit: no race, no popup, worker is hot.execute_non_query. Other CursorBasedAdapter drivers that don't setconn.autocommit=Trueare unaffected (getattr(conn, "autocommit", False)returns False and the commit still fires). The mysql/mssql/oracle/sqlite/firebird unit suites are unchanged and all green.Happy to split this if any one commit is too big to land alongside the others — commits 1 + 3 alone are sufficient to fix #189. And happy to iterate on any of the approaches if you'd prefer something different.