Summary
During the daily restart window (~03:15 PDT), the :pkix and :ejabberd_pkix
GenServers consistently crash with "Couldn't get RSA public key" while
processing what appears to be an ECDSA (Let's Encrypt) certificate. On at least
one occasion, this startup sequence resulted in a process crash with status=139
(segfault in beam.smp).
The service typically recovers successfully after the restart, but the PKIX
errors occur on every restart cycle and the segfault represents a more serious
edge case that warrants upstream investigation.
Environment
| Field | Value |
|---|---|
| ejabberd version | 24.12.0 |
| ERTS version | 14.2.5.4 |
| OS | Ubuntu (Linux) |
| Certificate type | ECDSA (Let's Encrypt, P-256) |
| pkix dep version | 1.0.10 |
| Install method | Binary package |
Exact Errors
Error 1 — :pkix GenServer terminating
2026-05-04 03:15:08.270 [error] GenServer :pkix terminating
** (stop) {:badarg, {"pkey.c", 435}, "Couldn't get RSA public key"}
(pkix 1.0.10) pkix.erl:1004: :pkix.validate_path/3
(pkix 1.0.10) pkix.erl:991: anonymous fn/3 in :pkix.validate/4
(pkix 1.0.10) pkix.erl:988: :pkix.validate/4
Last message (from :ejabberd_pkix):
{:commit, "/opt/ejabberd/database/ejabberd@/certs",
"/etc/ssl/certs/ca-certificates.crt", :soft,
&:ejabberd_pkix.notify_expired/1, [0, 3600, 86400, 604800]}
Error 2 — :ejabberd_pkix GenServer terminating
2026-05-04 03:15:08.344 [error] GenServer :ejabberd_pkix terminating
** (stop) exited in: :gen_server.call(:pkix, {:commit, ...}, 600000)
** (ErlangError) Erlang error: {:badarg, {"pkey.c", 435},
"Couldn't get RSA public key"}
- 5th argument: Couldn't get RSA public key
(pkix 1.0.10) pkix.erl:1004: :pkix.validate_path/3
(pkix 1.0.10) pkix.erl:991: anonymous fn/3 in :pkix.validate/4
Error 3 — Hook crash
[error] Hook :ejabberd_started crashed when running
:ejabberd_pkix::ejabberd_started/0:
** exception exit: {{{{badarg,{"pkey.c",435},
"Couldn't get RSA public key"}, ...}}}
Error 4 — Segfault (observed once during same startup sequence)
systemd: ejabberd.service: Main process exited, code=killed, status=139
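For readers unfamiliar with the number: if status=139 is read using the shell convention of 128 plus the signal number (an assumption about how this value was reported), it corresponds to signal 11, SIGSEGV. A minimal Python sketch of that arithmetic, with `decode_exit_status` being a hypothetical helper name:

```python
import signal


def decode_exit_status(status: int) -> str:
    """Interpret a shell-style exit status (128 + signal number)."""
    if status <= 128:
        # Ordinary exit code, not a signal.
        return f"exit code {status}"
    try:
        return signal.Signals(status - 128).name
    except ValueError:
        # The remainder does not map to a signal known on this platform.
        return f"unknown signal {status - 128}"


print(decode_exit_status(139))
```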
Reproduction Pattern
- Occurs on every restart during the scheduled ~03:15 restart window
- The :pkix GenServer calls validate_path/3 and crashes attempting RSA
  key extraction on what is an ECDSA key
- :ejabberd_pkix then also crashes when its gen_server.call to :pkix
  exits with the same error (Error 2 is logged about 74 ms after Error 1,
  well inside the 600000 ms call timeout, so this looks like a propagated
  error rather than a timeout)
- Service normally recovers after the crash sequence
- The status=139 segfault was observed once during this same sequence,
  which may point to a race condition or memory safety issue in the NIF
  layer (pkey.c:435) when handling the type mismatch
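To check the "every restart" claim against a longer log history, the crash entries can be counted per day and per GenServer. The following is a rough Python sketch, assuming the log lines look exactly like the excerpts quoted under Exact Errors. The function name `count_pkix_crashes` is hypothetical, and the real log location is not stated here, so the sample feeds in the quoted lines directly.

```python
import re
from collections import Counter

SIGNATURE = "Couldn't get RSA public key"
# Matches e.g. "2026-05-04 03:15:08.270 [error] GenServer :pkix terminating"
TERMINATING = re.compile(
    r"^(\d{4}-\d{2}-\d{2}) \S+ \[error\] GenServer (\S+) terminating"
)


def count_pkix_crashes(lines):
    """Count GenServer terminations followed by the RSA error signature."""
    counts = Counter()
    pending = None  # (date, server) of the most recent termination header
    for line in lines:
        match = TERMINATING.match(line)
        if match:
            pending = (match.group(1), match.group(2))
        elif pending and SIGNATURE in line:
            counts[pending] += 1
            pending = None
    return counts


sample = """\
2026-05-04 03:15:08.270 [error] GenServer :pkix terminating
** (stop) {:badarg, {"pkey.c", 435}, "Couldn't get RSA public key"}
2026-05-04 03:15:08.344 [error] GenServer :ejabberd_pkix terminating
** (stop) exited in: :gen_server.call(:pkix, {:commit, ...}, 600000)
** (ErlangError) Erlang error: {:badarg, {"pkey.c", 435},
"Couldn't get RSA public key"}
""".splitlines()

for (day, server), n in sorted(count_pkix_crashes(sample).items()):
    print(day, server, n)
```

Running the same function over the full log file would show whether each restart date has one entry for :pkix and one for :ejabberd_pkix.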
Root Cause Hypothesis
The error string "Couldn't get RSA public key" originates from pkey.c in
the Erlang/OTP crypto NIF layer, and is raised when code attempts to extract
an RSA key from an EVP_PKEY structure that actually holds an ECDSA key.
This suggests pkix.validate_path/3 (or a function it calls) is dispatching
to an RSA-specific code path when processing an ECDSA certificate/key, rather
than branching correctly on key type.
The segfault (status=139) may be a consequence of this type mismatch propagating
into unsafe C memory access in the NIF, rather than being caught cleanly at the
Erlang level.
What Was Verified
Suggested Fix Direction
pkix.validate_path/3 should detect the public key algorithm (RSA vs ECDSA)
before attempting key extraction and dispatch accordingly, rather than assuming
RSA. A missing key-type guard before the pkey.c NIF call could explain both
the badarg crash and the occasional segfault.
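The kind of guard meant here can be illustrated independently of the pkix code base. The sketch below is Python, not Erlang, and is not taken from pkix or OTP: the handler functions and `extract_public_key` are hypothetical names, and the only real values are the two standard public key algorithm OIDs (rsaEncryption and id-ecPublicKey). The point is only that the algorithm is inspected first and unknown types fail cleanly before any algorithm-specific extraction runs.

```python
RSA_ENCRYPTION = "1.2.840.113549.1.1.1"  # rsaEncryption
EC_PUBLIC_KEY = "1.2.840.10045.2.1"      # id-ecPublicKey


class UnsupportedKeyType(ValueError):
    """Raised instead of passing an unexpected key type to lower layers."""


def extract_rsa_key(key_bytes: bytes):
    # Hypothetical stand-in for RSA-specific extraction.
    return ("rsa", key_bytes)


def extract_ec_key(key_bytes: bytes):
    # Hypothetical stand-in for EC-specific extraction.
    return ("ec", key_bytes)


HANDLERS = {
    RSA_ENCRYPTION: extract_rsa_key,
    EC_PUBLIC_KEY: extract_ec_key,
}


def extract_public_key(algorithm_oid: str, key_bytes: bytes):
    """Dispatch on the declared algorithm before touching the key."""
    try:
        handler = HANDLERS[algorithm_oid]
    except KeyError:
        raise UnsupportedKeyType(f"unsupported algorithm {algorithm_oid}") from None
    return handler(key_bytes)


print(extract_public_key(EC_PUBLIC_KEY, b"\x04")[0])
```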
Thank you.