OpenSIPS version you are running
version: opensips 4.0.0-rc1 (x86_64/linux)
flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, HP_MALLOC, F_PARALLEL_MALLOC, DBG_MALLOC, CC_O0, FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll, sigio_rt, select.
git revision: b36210316
main.c compiled on with gcc 13
Describe the bug
After upgrading from OpenSIPS 3.6.6 to 4.0, we observe excessive growth of outgoing HEP/TCP connections towards a single HEP collector. These connections never appear to be closed. The exact same environment and call scenario do not produce this behavior on OpenSIPS 3.6.6.
Environment:
- OpenSIPS 4.0
- tracer + proto_hep
- Single HEP destination (10.0.19.5:9061;transport=tcp;version=3)
- Heplify/Homer collector
- A simple SIPp-based load test scenario with at most 100 concurrent calls and roughly 5000 calls overall.
Relevant config:
tcp_workers = 8
tcp_max_connections = 16384
tcp_connection_lifetime = 30
tcp_keepalive = 1
tcp_keepidle = 30
tcp_keepinterval = 10
tcp_keepcount = 5
tcp_max_msg_time = 4
tcp_socket_backlog = 1024
tcp_threshold = 60000
modparam("proto_tcp", "tcp_async", 1)
modparam("proto_tcp", "tcp_async_local_write_timeout", 200)
modparam("proto_tcp", "tcp_async_max_postponed_chunks", 64)
modparam("proto_hep", "hep_id", CFG_HEP_ID)
modparam("proto_hep", "hep_capture_id", CFG_HEP_CAPTURE_ID)
modparam("proto_hep", "hep_async", 1)
modparam("proto_hep", "hep_async_local_write_timeout", 10)
modparam("proto_hep", "hep_async_max_postponed_chunks", 8)
modparam("proto_hep", "hep_max_retries", 5)
modparam("proto_hep", "hep_retry_cooldown", 60)
modparam("tracer", "trace_id", CFG_TRACER_HEP)
modparam("tracer", "trace_on", 1)
modparam("proto_hep", "hep_id", CFG_HEP_PRE_ID)
Example observations:
- Within roughly the first 100 concurrent calls, there are already ~400 ESTABLISHED TCP connections to the HEP collector
- The count keeps growing until it reaches the tcp_max_connections limit (16384), with effectively all connections being HEP connections
- The connection count then plateaus and stays at that level long after all test calls have finished
Analysing the TCP connections at this stage shows:
$ opensips-cli -x mi tcp:list > /tmp/tcp.list
$ grep -c '10.0.19.5:9061' /tmp/tcp.list
16381
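For repeated checks during a test run, the count above can be wrapped in a small helper. This is only a sketch: `count_hep_conns` is a hypothetical name, and it assumes the dump has already been saved to a file as in the command above. It uses `grep -F` so the dots in the address are matched literally rather than as regex wildcards.

```shell
#!/bin/sh
# Hypothetical helper: count the lines in a saved connection dump that
# mention the given destination. Like the grep above, this counts matching
# lines, not parsed connection entries.
count_hep_conns() {
    dump_file=$1
    dest=$2
    # -c prints the number of matching lines, -F disables regex interpretation
    grep -cF -- "$dest" "$dump_file"
}
```

Usage, with the same file and destination as above: `count_hep_conns /tmp/tcp.list 10.0.19.5:9061`.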
Once the connection table is exhausted, OpenSIPS starts logging:
ERROR:core:tcpconn_new: maximum number of connections exceeded
ERROR:proto_hep:hep_tcp_or_tls_send: async TCP connect failed
ERROR:proto_hep:send_hep_message: Cannot send HEP message!
and eventually:
ERROR:proto_hep:send_hep_message: HEP send suppressed: too many failures
Expected behavior
Significantly fewer TCP (HEP) connections per call, with connections following the call lifecycle and being closed rather than accumulating.
OS/environment information
- Operating System: Ubuntu Server 24.04 LTS in Azure VM
- OpenSIPS installation: package from apt.opensips.org
- We run our own VoIP software (a CCaaS, using Asterisk as a SIP UA) to which OpenSIPS proxies all calls.