Runtime platform environment
Linux (observed on CentOS 7 / Ubuntu 22.04)
RocketMQ version
develop (also affects 5.x releases with TLS hot-reload enabled)
JDK Version
JDK 8 / JDK 11, using netty-tcnative (OpenSSL provider)
Describe the Bug
When TLS certificates are dynamically reloaded via TlsCertificateManager (triggered by the file watcher), a new SslContext is created but the old one is never explicitly released. netty-tcnative's OpenSslContext is reference-counted and allocates native (off-heap) memory for the certificate chain, private key, and SSL session cache, so simply dropping the Java reference to the old context does not free that memory. Reclamation is left to GC finalization, which may never run while heap pressure is low.
This causes native memory (RSS) to grow monotonically with each certificate rotation cycle. In long-running Proxy/Broker deployments with frequent cert rotations (e.g., short-lived certificates rotated every few hours), this eventually leads to OOM kills.
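The leaky reload path can be modelled without Netty or RocketMQ. In this sketch, FakeNativeContext is a hypothetical stand-in for a reference-counted OpenSslContext (it allocates nothing and only counts how many contexts still hold their modelled native memory), and LeakyReloader is an illustrative reload method, not the actual TlsCertificateManager code. Finalization is deliberately not modelled, which corresponds to the low-heap-pressure case where finalizers may never run.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a reference-counted OpenSSL-backed context.
// It mimics the refCnt/release part of Netty's ReferenceCounted contract.
class FakeNativeContext {
    // Number of contexts whose (modelled) native memory has not been freed yet.
    static final AtomicInteger LIVE = new AtomicInteger();

    // The creator owns the initial reference.
    private final AtomicInteger refCnt = new AtomicInteger(1);

    FakeNativeContext() {
        LIVE.incrementAndGet(); // where the real context would allocate SSL_CTX, chain, key, cache
    }

    int refCnt() {
        return refCnt.get();
    }

    // Drops one reference; "frees" the native memory when the count reaches zero.
    boolean release() {
        int remaining = refCnt.decrementAndGet();
        if (remaining < 0) {
            throw new IllegalStateException("released more often than retained");
        }
        if (remaining == 0) {
            LIVE.decrementAndGet();
            return true;
        }
        return false;
    }
}

// Illustrative reload path showing the bug pattern described in the report.
class LeakyReloader {
    private volatile FakeNativeContext current = new FakeNativeContext();

    void reload() {
        // Bug: the previous context is only dereferenced, never released,
        // so its reference count stays at 1 and LIVE keeps growing.
        current = new FakeNativeContext();
    }

    FakeNativeContext current() {
        return current;
    }
}
```

After five reloads the model reports six live contexts even though only the latest one is reachable; calling release() on each displaced context is what brings the count back down.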
Steps to Reproduce
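- Enable TLS with the OpenSSL provider (tls.provider=OPENSSL) on the Broker or Proxy.
- Configure certificate hot-reload (tlsCertWatchIntervalMs).
- Repeatedly replace the certificate files to trigger reload cycles.
- Monitor the process RSS; it grows on each reload and is never reclaimed. Native Memory Tracking (jcmd <pid> VM.native_memory summary) only covers JVM-internal allocations, not memory allocated by third-party native code such as OpenSSL, so RSS is the more reliable signal.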
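For the monitoring step, RSS can be sampled on Linux from /proc/<pid>/status between reload cycles. The helper below is a minimal sketch (RssProbe and its method names are hypothetical); the kernel reports VmRSS in kB. Samples taken after each rotation should show the monotonic growth described in the bug description. The rotation itself (swapping in a new certificate and key pair) depends on the deployment and is not shown.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Reads the resident set size of a process from Linux /proc/<pid>/status.
class RssProbe {
    private static final String PREFIX = "VmRSS:";

    // Parses the "VmRSS:" line (value in kB); returns -1 if the line is absent,
    // e.g. for kernel threads or on non-Linux systems.
    static long parseVmRssKb(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith(PREFIX)) {
                String[] parts = line.substring(PREFIX.length()).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    // Samples the current RSS of the given Broker/Proxy process.
    static long readVmRssKb(long pid) throws IOException {
        Path status = Paths.get("/proc", Long.toString(pid), "status");
        return parseVmRssKb(Files.readAllLines(status, StandardCharsets.UTF_8));
    }
}
```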
What Did You Expect to See?
Native memory should remain stable after certificate rotation. The old SslContext should be released promptly when replaced.
What Did You See Instead?
Native memory grows ~200KB–1MB per rotation cycle (depending on cert chain length and session cache size) and is never reclaimed until process restart.
Additional Context
The fix should call ReferenceCountUtil.release(oldSslContext) after the new context has been installed. The release must not free the native context while in-flight channels are still using it: either defer it until those channels have closed, or use ReferenceCountUtil.safeRelease() together with draining logic (safeRelease() logs a failed release instead of rethrowing it).
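One way to satisfy both requirements is to let every channel hold its own reference to the context it was created with. The holder can then release its reference immediately on reload, while the native memory survives until the last in-flight channel closes. The sketch below models this with plain JDK classes: RcContext is a hypothetical stand-in that mimics Netty's ReferenceCounted retain/release contract, and SslContextHolder is an illustrative name, not RocketMQ code. With real Netty types, install() would call ReferenceCountUtil.release(old) on the displaced SslContext; whether the SSLEngine instances Netty creates already retain their parent context should be checked against the Netty version in use before relying on it.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in mimicking Netty's ReferenceCounted contract. "freed" marks the point
// where a real OpenSSL-backed context would free its SSL_CTX, chain, key and session cache.
class RcContext {
    private final AtomicInteger refCnt = new AtomicInteger(1); // creator's reference
    private volatile boolean freed;

    // Takes an extra reference unless the context has already been freed.
    boolean tryRetain() {
        while (true) {
            int cnt = refCnt.get();
            if (cnt <= 0) {
                return false;
            }
            if (refCnt.compareAndSet(cnt, cnt + 1)) {
                return true;
            }
        }
    }

    // Drops one reference; frees the (modelled) native memory when the last one goes.
    boolean release() {
        int remaining = refCnt.decrementAndGet();
        if (remaining < 0) {
            throw new IllegalStateException("released more often than retained");
        }
        if (remaining == 0) {
            freed = true;
            return true;
        }
        return false;
    }

    int refCnt() { return refCnt.get(); }

    boolean isFreed() { return freed; }
}

// Illustrative reload-safe holder: it owns one reference to the current context,
// and each channel takes its own reference for as long as it is open.
class SslContextHolder {
    private final AtomicReference<RcContext> current;

    SslContextHolder(RcContext initial) {
        current = new AtomicReference<>(initial); // takes over the creator's reference
    }

    // Called when a channel is initialised; the channel must call release() when it closes.
    RcContext acquireForChannel() {
        while (true) {
            RcContext ctx = current.get();
            if (ctx.tryRetain()) {
                return ctx;
            }
            // ctx was replaced and freed between get() and tryRetain(); read the new one.
        }
    }

    // Called on certificate reload with a freshly built context (its creator reference moves here).
    void install(RcContext fresh) {
        RcContext old = current.getAndSet(fresh); // publish the new context first
        old.release(); // then drop the holder's reference; open channels keep theirs
    }
}
```

acquireForChannel() retries only while a reload races with it: install() publishes the new context before releasing the old one, so the next read sees a context that still holds the holder's reference.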
Related: #10302 (SNI multi-domain support) introduces more SslContext instances per domain, making this leak more severe if not addressed.
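With per-domain contexts, the same rule applies to every map entry: whenever a domain's context is replaced or removed, the registry must release its reference to the displaced one. The registry below is a hypothetical sketch built on ConcurrentHashMap; Releasable is an assumed minimal interface standing in for Netty's ReferenceCounted, and per-channel retain/release is omitted for brevity. Domain keys are lower-cased because DNS host names, and therefore SNI names, compare case-insensitively.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Assumed minimal view of a reference-counted SSL context: only release() matters here.
interface Releasable {
    boolean release();
}

// Hypothetical per-domain (SNI) registry. It owns one reference to each stored context
// and gives that reference up whenever the context is displaced or removed.
class SniContextRegistry<C extends Releasable> {
    private final Map<String, C> byDomain = new ConcurrentHashMap<>();

    private static String key(String domain) {
        return domain.toLowerCase(Locale.ROOT); // host names compare case-insensitively
    }

    // Installs a freshly created context; ownership of its initial reference moves here.
    void put(String domain, C fresh) {
        C old = byDomain.put(key(domain), fresh);
        if (old != null && old != fresh) {
            old.release(); // release the displaced context instead of leaking it
        }
    }

    C get(String domain) {
        return byDomain.get(key(domain));
    }

    void remove(String domain) {
        C old = byDomain.remove(key(domain));
        if (old != null) {
            old.release();
        }
    }

    // Releases every stored context, e.g. on shutdown.
    void clear() {
        for (String domain : byDomain.keySet()) {
            remove(domain);
        }
    }
}
```

Because each domain rotates independently, a registry that forgets the release in put() leaks one context per domain per rotation, which is why the SNI change amplifies the problem.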
Before Creating the Bug Report
- I found a bug, not just asking a question, which should be created in GitHub Discussions.
- I have searched the GitHub Issues and GitHub Discussions of this repository and believe that this is not a duplicate.
- I have confirmed that this bug belongs to the current repository, not other repositories of RocketMQ.