[CRASH] OpenSIPS - tls_mgm: use-after-free / double-free crash on shutdown with an open DB-backed TLS connection #3905

@voicelandglobal

Description

OpenSIPS version you are running

This affects every OpenSIPS version I have tested, from 3.4 through master; all of them crash.

root@ip-10-200-0-118:~# /usr/local/opensips/sbin/opensips -V
version: opensips 3.4.18 (aarch64/linux)
flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, HP_MALLOC, DBG_MALLOC, USE_POSIX_SEM
MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll, sigio_rt, select.
git revision: 97cae9c01
main.c compiled on 19:22:11 May 31 2026 with gcc 12

Linux ip-10-200-0-118.eu-central-1.compute.internal 6.1.158-180.294.amzn2023.aarch64 #1 SMP Mon Dec  1 05:36:18 UTC 2025 aarch64 GNU/Linux

Crash Core Dump
https://pastebin.com/pmFFN23W

Describe the traffic that generated the bug
There is no special SIP message — this is a shutdown teardown bug, not a packet-handling bug. The only traffic condition needed is that a TLS connection matched against a database-loaded TLS domain (tls_mgm with db_url) is still established (or lingering in the TCP connection table) at the moment OpenSIPS receives its shutdown signal (SIGTERM / MI shutdown / systemctl restart).

In production this happens on any graceful restart while a TLS peer (SIP trunk/UAC/UAS over TLS) holds a connection open — a normal TLS call or registration is enough to leave a warm connection in the table. The crash fires during the shutdown sweep, not during call processing. Script-defined (non-DB) TLS domains do not trigger it, because tls_release_domain() early-returns unless the domain has DOM_FLAG_DB.
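
To make the ordering problem concrete, here is a minimal toy model of the sequence described above. It is only a sketch: every `toy_*` name is hypothetical and stands in for the real tls_mgm and core code, and a counter replaces the actual touch of freed memory so the example stays well-defined.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these types and names are hypothetical, not OpenSIPS code. */
struct toy_domain {
    bool from_db;   /* stands in for DOM_FLAG_DB */
    int  refcnt;    /* only DB domains are reference-counted */
};

struct toy_module {
    bool destroyed;        /* true once the module teardown has run */
    int  stale_releases;   /* releases that arrived after teardown */
};

/* Module teardown: in the real module this is where DB domains get freed. */
static void toy_mod_destroy(struct toy_module *m)
{
    m->destroyed = true;
}

/* Per-connection cleanup: drops the reference held by a connection. */
static void toy_release_domain(struct toy_module *m, struct toy_domain *d)
{
    if (!d->from_db)
        return;                 /* script-defined domain: early return */
    if (m->destroyed) {
        m->stale_releases++;    /* real code would touch freed memory here */
        return;
    }
    assert(d->refcnt > 0);
    d->refcnt--;
}

/* Replays the reported shutdown order; returns the number of stale releases. */
static int toy_shutdown_with_open_conn(bool from_db)
{
    struct toy_module m = { false, 0 };
    struct toy_domain d = { from_db, 2 };  /* one ref for the module, one for the connection */

    toy_mod_destroy(&m);           /* modules are destroyed first ... */
    toy_release_domain(&m, &d);    /* ... then the connection sweep releases the domain */
    return m.stale_releases;
}
```

In this model a DB-backed domain produces one stale release per open connection, while a script-defined domain produces none, which matches the behaviour reported above.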

To Reproduce

  1. Build OpenSIPS with Q_MALLOC_DBG (opensips -a Q_MALLOC_DBG ...) so the corruption surfaces as a deterministic abort instead of silent corruption.
  2. Configure tls_mgm to load TLS domains from the database (modparam("tls_mgm","db_url",...)) and have ≥1 server (or client) TLS domain row → the domain carries DOM_FLAG_DB and is reference-counted.
  3. Configure a proto_tls listener (socket = tls:...).
  4. Start OpenSIPS and open a TLS connection that matches the DB domain (e.g. a TLS OPTIONS/REGISTER from a TLS UAC), and keep it open.
  5. Shut OpenSIPS down while that connection is still in the table (kill -TERM, opensips-cli -x mi shutdown, or service restart).
  6. OpenSIPS aborts in cleanup() during tcp_destroy().

OS/environment information

  • Operating System: Amazon Linux 2023 - Linux version 6.1.158-180.294.amzn2023.aarch64 (mockbuild@ip-10-0-60-185) (gcc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5), GNU ld version 2.41-50.amzn2023.0.5) #1 SMP Mon Dec 1 05:36:18 UTC 2025
  • OpenSIPS installation: 3.4 git

All of the attached patches fix the issue; I have verified each of them with local testing on all the OpenSIPS versions I tested.

Patch descriptions
01-tls_mgm-guard.patch — tls_mgm self-protecting guard
Makes tls_mgm robust to the late shutdown call. Sets a tls_mgm_destroyed flag in mod_destroy(), makes tls_release_domain() a no-op once that flag is set, and NULLs the matching-map handles after they're freed. Fully contained in tls_mgm (3 files: tls_domain.c/.h, tls_mgm.c), no core change. Adds one branch on the connection-teardown path only — never the data path — so zero TLS performance cost. Protects both proto_tls and proto_wss. Smallest blast radius; the one to submit upstream.
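
The guard idea can be sketched in miniature as follows. This is not the attached patch: all `mini_*` names are hypothetical, `mini_destroyed` merely plays the role of the `tls_mgm_destroyed` flag, and a single heap object stands in for the matching-map handles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical miniature of the guard idea; not the actual patch. */
struct mini_domain {
    int refcnt;
};

struct mini_conn {
    struct mini_domain *dom;   /* domain this connection matched */
};

static struct mini_domain *mini_map;   /* stands in for a matching-map handle */
static bool mini_destroyed;            /* plays the role of tls_mgm_destroyed */

static bool mini_init(void)
{
    mini_destroyed = false;
    mini_map = malloc(sizeof(*mini_map));
    if (mini_map == NULL)
        return false;
    mini_map->refcnt = 1;      /* reference owned by the module itself */
    return true;
}

/* A connection takes a reference on the domain it matched. */
static void mini_acquire(struct mini_conn *c)
{
    mini_map->refcnt++;
    c->dom = mini_map;
}

/* Module teardown: set the flag, free the state, NULL the handle. */
static void mini_mod_destroy(void)
{
    mini_destroyed = true;
    free(mini_map);
    mini_map = NULL;
}

/* Returns true if a reference was dropped, false if the call was ignored. */
static bool mini_release_domain(struct mini_conn *c)
{
    if (mini_destroyed)
        return false;          /* late shutdown call: never touch c->dom */
    assert(c->dom->refcnt > 0);
    c->dom->refcnt--;
    c->dom = NULL;
    return true;
}
```

The only added cost is one branch in the release function, which runs on connection teardown rather than on the data path.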

02-core-shutdown-order.patch — core shutdown reorder
Fixes the systemic root cause in core: runs udp_destroy()/tcp_destroy() before destroy_modules() in cleanup(), so connections (and their conn_clean callbacks) are torn down while module-owned state still exists. One file (shutdown.c). Safe because destroy_modules() doesn't dlclose() modules, so the callback pointers stay valid. More "correct" conceptually and fixes the ordering for all protocol modules, but touches core and so carries a wider review/audit burden.
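
The effect of the reorder can be illustrated with a small model of the cleanup sequence. This is a sketch under stated assumptions, not the contents of shutdown.c: the `demo_*` names are hypothetical, and a counter records how many connection cleanups would have run against already-destroyed module state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the cleanup order; not the contents of shutdown.c. */
struct demo_core {
    bool module_state_alive;   /* is module-owned state still allocated? */
    int  open_conns;           /* connections left in the table */
    int  unsafe_cleanups;      /* conn cleanups that ran after module teardown */
    void (*conn_clean)(struct demo_core *);  /* callback registered by a protocol module */
};

static void demo_conn_clean(struct demo_core *c)
{
    if (!c->module_state_alive)
        c->unsafe_cleanups++;  /* would be a use-after-free in real code */
}

/* Sweeps the connection table, running the cleanup callback per connection. */
static void demo_tcp_destroy(struct demo_core *c)
{
    while (c->open_conns > 0) {
        assert(c->conn_clean != NULL);  /* pointer stays valid: module code is not unloaded */
        c->conn_clean(c);
        c->open_conns--;
    }
}

static void demo_destroy_modules(struct demo_core *c)
{
    c->module_state_alive = false;
}

/* Runs cleanup in either order and returns the number of unsafe cleanups. */
static int demo_cleanup(int open_conns, bool transports_first)
{
    struct demo_core c = { true, open_conns, 0, demo_conn_clean };

    if (transports_first) {
        demo_tcp_destroy(&c);       /* proposed order: connections first */
        demo_destroy_modules(&c);
    } else {
        demo_destroy_modules(&c);   /* reported order: modules first */
        demo_tcp_destroy(&c);
    }
    return c.unsafe_cleanups;
}
```

With connections torn down first, every cleanup callback sees live module state; with modules destroyed first, each open connection triggers one unsafe cleanup.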

03-both-defense-in-depth.patch — both, defense in depth
The two commits above combined (core reorder + tls_mgm guard). The reorder fixes shutdown ordering for everyone; the guard keeps tls_mgm self-protecting regardless of core order. Belt-and-suspenders — best for an internal/production build. For upstream, prefer submitting 01 alone (optionally proposing 02 as a separate, clearly-scoped core commit).

I'd like to add that I see these patches only as a starting point for the discussion: the OpenSIPS team knows the codebase far better than I do (or Claude does), and the right fix is of course for the OpenSIPS team to decide. If you'd rather approach this differently, I'm very glad to help. I'm happy to test any alternative solution you may have in mind for this kind of crash, at any time and on whichever branch is most useful to you.

Thank you very much for taking a look at this issue.

Best Regards
Vasilios Tzanoudakis
