
ejabberd 24.12: PKIX GenServer crashes with "Couldn't get RSA public key" on ECDSA cert during startup — occasional status=139 segfault #4576

@matilopi

Summary

During the daily restart window (~03:15 PDT), the :pkix and :ejabberd_pkix
GenServers consistently crash with "Couldn't get RSA public key" while
processing what appears to be an ECDSA (Let's Encrypt) certificate. On one
occasion, this startup sequence ended with the main process being killed with
status=139 (segfault in beam.smp).

The service normally recovers after the restart, but the PKIX errors recur on
every restart cycle, and the segfault is a more serious edge case that warrants
upstream investigation.


Environment

| Field | Value |
|---|---|
| ejabberd version | 24.12.0 |
| ERTS version | 14.2.5.4 |
| OS | Ubuntu (Linux) |
| Certificate type | ECDSA (Let's Encrypt, P-256) |
| pkix dep version | 1.0.10 |
| Install method | Binary package |

Exact Errors

Error 1 — :pkix GenServer terminating

2026-05-04 03:15:08.270 [error] GenServer :pkix terminating
** (stop) {:badarg, {"pkey.c", 435}, "Couldn't get RSA public key"}
(pkix 1.0.10) pkix.erl:1004: :pkix.validate_path/3
(pkix 1.0.10) pkix.erl:991: anonymous fn/3 in :pkix.validate/4
(pkix 1.0.10) pkix.erl:988: :pkix.validate/4
Last message (from :ejabberd_pkix):
{:commit, "/opt/ejabberd/database/ejabberd@/certs",
"/etc/ssl/certs/ca-certificates.crt", :soft,
&:ejabberd_pkix.notify_expired/1, [0, 3600, 86400, 604800]}

Error 2 — :ejabberd_pkix GenServer terminating

2026-05-04 03:15:08.344 [error] GenServer :ejabberd_pkix terminating
** (stop) exited in: :gen_server.call(:pkix, {:commit, ...}, 600000)
** (ErlangError) Erlang error: {:badarg, {"pkey.c", 435},
"Couldn't get RSA public key"}

  • 5th argument: Couldn't get RSA public key
    (pkix 1.0.10) pkix.erl:1004: :pkix.validate_path/3
    (pkix 1.0.10) pkix.erl:991: anonymous fn/3 in :pkix.validate/4

Error 3 — Hook crash

[error] Hook :ejabberd_started crashed when running
:ejabberd_pkix::ejabberd_started/0:
** exception exit: {{{{badarg,{"pkey.c",435},
"Couldn't get RSA public key"}, ...}}}

Error 4 — Segfault (observed once during same startup sequence)

systemd: ejabberd.service: Main process exited, code=killed, status=139
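
For reference, status=139 is read here as a segfault under the common 128 + N convention, where N is the signal number (11 for SIGSEGV on Linux). The following is a minimal Python sketch of that decoding. `describe_exit_status` is a hypothetical helper name, and the sketch assumes the reported value follows the shell-style convention.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Decode a shell-style exit status (hypothetical helper for this report)."""
    # Assumption: values above 128 encode "terminated by signal (status - 128)".
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # The number does not map to a signal known on this platform.
            return f"killed by unknown signal {signum}"
        return f"killed by {name} (signal {signum})"
    return f"exited normally with code {status}"


print(describe_exit_status(139))
```

On Linux this prints `killed by SIGSEGV (signal 11)`, which matches the interpretation used in this report.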


Reproduction Pattern

  • Occurs on every restart during the scheduled ~03:15 restart window
  • The :pkix GenServer calls validate_path/3 and crashes while attempting RSA
    key extraction on what is an ECDSA key
  • :ejabberd_pkix then crashes as well: its gen_server.call to :pkix exits
    with the same badarg error (about 74 ms later per the log timestamps, well
    inside the 600000 ms call timeout, so this is not a timeout)
  • Service normally recovers after the crash sequence
  • The status=139 segfault was observed once during this same sequence,
    suggesting a race condition or memory safety issue in the NIF layer
    (pkey.c:435) when handling the type mismatch

Root Cause Hypothesis

The error string "Couldn't get RSA public key" originates from pkey.c in
the Erlang/OTP crypto NIF layer, and is raised when code attempts to extract
an RSA key from an EVP_PKEY structure that actually holds an ECDSA key.

This suggests pkix.validate_path/3 (or a function it calls) is dispatching
to an RSA-specific code path when processing an ECDSA certificate/key, rather
than branching correctly on key type.

The segfault (status=139) may be a consequence of this type mismatch propagating
into unsafe C memory access in the NIF, rather than being caught cleanly at the
Erlang level.
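
The RSA vs ECDSA distinction in this hypothesis can be checked independently of ejabberd by looking at the algorithm OID in a certificate's SubjectPublicKeyInfo: rsaEncryption is 1.2.840.113549.1.1.1 and id-ecPublicKey is 1.2.840.10045.2.1. The sketch below decodes the DER content bytes of an OID and maps the result to a key family. The function names are illustrative and are not part of pkix or OTP, and the sketch does not parse a full certificate.

```python
RSA_ENCRYPTION = "1.2.840.113549.1.1.1"  # rsaEncryption
EC_PUBLIC_KEY = "1.2.840.10045.2.1"      # id-ecPublicKey


def decode_oid(content: bytes) -> str:
    """Decode the content bytes of a DER OBJECT IDENTIFIER (tag and length removed)."""
    if not content:
        raise ValueError("empty OID")
    if len(content) > 1 and content[-1] & 0x80:
        raise ValueError("truncated OID")
    # Assumption: the first two arcs fit in one byte, packed as 40 * arc1 + arc2.
    first = content[0]
    arc1 = min(first // 40, 2)
    arcs = [arc1, first - 40 * arc1]
    value = 0
    for byte in content[1:]:
        # Remaining arcs are base-128; the high bit is set on all but the last byte.
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            arcs.append(value)
            value = 0
    return ".".join(str(arc) for arc in arcs)


def key_family(oid: str) -> str:
    """Map a dotted OID string to a coarse key family label."""
    if oid == RSA_ENCRYPTION:
        return "rsa"
    if oid == EC_PUBLIC_KEY:
        return "ec"
    return "unknown"


# DER content bytes for id-ecPublicKey and rsaEncryption.
print(key_family(decode_oid(bytes.fromhex("2a8648ce3d0201"))))
print(key_family(decode_oid(bytes.fromhex("2a864886f70d010101"))))
```

A leaf certificate with an EC key can still sit in a chain whose other certificates carry RSA keys, so the key family would need to be checked per certificate rather than once per chain.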


What Was Verified

  • Certificate and private key are valid and correctly matched
  • ejabberd operates normally outside the restart window
  • PKIX errors occur consistently on every restart (not intermittent)
  • No matching open issue found in processone/ejabberd or processone/pkix

Suggested Fix Direction

pkix.validate_path/3 should detect the public key algorithm (RSA vs ECDSA)
before attempting key extraction and dispatch accordingly, rather than assuming
RSA. A missing key-type guard before the pkey.c NIF call could explain both
the badarg crash and the occasional segfault.
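
The guard described above can be sketched generically. This is Python rather than Erlang, and every name in it (`KeyTypeError`, `extract_rsa`, `extract_ec`, `extract_public_key`) is hypothetical; it does not reflect how pkix or the crypto NIF is actually structured. It only shows the shape of the idea: decide on the key family first, and fail with a descriptive error for anything unexpected instead of handing an EC key to an RSA-only routine.

```python
from typing import Callable, Dict


class KeyTypeError(ValueError):
    """Raised when no extractor exists for a key family (hypothetical)."""


def extract_rsa(spki: bytes) -> str:
    # Placeholder standing in for an RSA-specific extraction routine.
    return f"rsa key ({len(spki)} bytes)"


def extract_ec(spki: bytes) -> str:
    # Placeholder standing in for an EC-specific extraction routine.
    return f"ec key ({len(spki)} bytes)"


# One extractor per supported key family; unknown families are rejected.
EXTRACTORS: Dict[str, Callable[[bytes], str]] = {
    "rsa": extract_rsa,
    "ec": extract_ec,
}


def extract_public_key(family: str, spki: bytes) -> str:
    """Dispatch on the key family before touching the key material."""
    try:
        extractor = EXTRACTORS[family]
    except KeyError:
        raise KeyTypeError(f"unsupported key family: {family!r}") from None
    return extractor(spki)


print(extract_public_key("ec", b"\x00" * 4))
```

With this shape, an unsupported or misidentified key type surfaces as a descriptive error at the dispatch point rather than as a badarg from deeper in the call stack.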

Thank you.
