[BUG] Increased HEP/TCP connections in 4.0 #3909

@pekkaar


OpenSIPS version you are running

version: opensips 4.0.0-rc1 (x86_64/linux)
flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, HP_MALLOC, F_PARALLEL_MALLOC, DBG_MALLOC, CC_O0, FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll, sigio_rt, select.
git revision: b36210316
main.c compiled on  with gcc 13

Describe the bug
After upgrading from OpenSIPS 3.6.6 to 4.0, we observe excessive growth of outgoing HEP/TCP connections towards a single HEP collector. These connections do not seem to get closed. The exact same environment and call scenario do not produce this behavior on OpenSIPS 3.6.6.

Environment:

  • OpenSIPS 4.0
  • tracer + proto_hep
  • Single HEP destination (10.0.19.5:9061;transport=tcp;version=3)
  • Heplify/Homer collector
  • A simple SIPp-based load test scenario with at most 100 concurrent calls and roughly 5000 calls overall

Relevant config:

tcp_workers = 8
tcp_max_connections  = 16384
tcp_connection_lifetime = 30
tcp_keepalive = 1
tcp_keepidle = 30
tcp_keepinterval = 10
tcp_keepcount = 5
tcp_max_msg_time = 4
tcp_socket_backlog = 1024
tcp_threshold = 60000

modparam("proto_tcp", "tcp_async", 1)
modparam("proto_tcp", "tcp_async_local_write_timeout", 200)
modparam("proto_tcp", "tcp_async_max_postponed_chunks", 64)

modparam("proto_hep", "hep_id", CFG_HEP_ID)
modparam("proto_hep", "hep_capture_id", CFG_HEP_CAPTURE_ID)
modparam("proto_hep", "hep_async", 1)
modparam("proto_hep", "hep_async_local_write_timeout", 10)
modparam("proto_hep", "hep_async_max_postponed_chunks", 8)
modparam("proto_hep", "hep_max_retries", 5)
modparam("proto_hep", "hep_retry_cooldown", 60)

modparam("tracer", "trace_id", CFG_TRACER_HEP)
modparam("tracer", "trace_on", 1)
modparam("proto_hep", "hep_id", CFG_HEP_PRE_ID)

Example observations:

  • During roughly the first 100 concurrent calls, there are already ~400 ESTABLISHED TCP connections to the HEP collector
  • The count keeps growing until it reaches the tcp_max_connections limit, with effectively all connections going to the HEP collector
  • The connection count then plateaus and stays at that level long after all test calls have finished
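
For anyone reproducing this, the growth can also be tracked from the OS side, independently of OpenSIPS. Below is a minimal Python sketch, assuming the usual `ss -tn` column layout (state first, peer address in the fifth column). The helper name `count_established` and the sample rows are illustrative only and were not part of the original test setup.

```python
def count_established(ss_output: str, peer: str) -> int:
    """Count ESTAB rows in `ss -tn` output whose peer address equals `peer`."""
    count = 0
    for line in ss_output.splitlines():
        fields = line.split()
        # Assumed columns: State Recv-Q Send-Q Local:Port Peer:Port [Process]
        if len(fields) >= 5 and fields[0] == "ESTAB" and fields[4] == peer:
            count += 1
    return count


# Hypothetical sample rows; only 10.0.19.5:9061 comes from the report.
SAMPLE = """\
State  Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
ESTAB  0       0       10.0.19.4:40001     10.0.19.5:9061
ESTAB  0       0       10.0.19.4:40002     10.0.19.5:9061
ESTAB  0       0       10.0.19.4:5060      10.0.19.7:5060
"""

print(count_established(SAMPLE, "10.0.19.5:9061"))
```

Feeding it the real output of `ss -tn` every few seconds during the SIPp run should show whether the count follows the number of active calls or only ever increases.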

Analysing the TCP connections at this stage shows:

$ opensips-cli -x mi tcp:list > /tmp/tcp.list
$ grep -c '10.0.19.5:9061' /tmp/tcp.list
16381
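
To see how the table is split across endpoints rather than counting a single one, the saved dump can be tallied. This is a rough sketch that deliberately does not assume any particular structure of the `tcp:list` output: it only looks for IPv4 `address:port` tokens in the text, so local and remote endpoints are both counted if the dump contains both. The function name `tally_endpoints` is made up for this example.

```python
import re
from collections import Counter

# Matches IPv4 "address:port" tokens; IPv6 endpoints are not handled here.
ENDPOINT_RE = re.compile(r"(?<![\d.])(\d{1,3}(?:\.\d{1,3}){3}:\d{1,5})(?!\d)")


def tally_endpoints(dump_text: str) -> Counter:
    """Count how often each IPv4 address:port token appears in the text."""
    return Counter(ENDPOINT_RE.findall(dump_text))
```

For example, `tally_endpoints(open("/tmp/tcp.list").read()).most_common(5)` lists the five most frequent endpoints in the dump saved above.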

Once the connection table is exhausted, OpenSIPS starts logging:

ERROR:core:tcpconn_new: maximum number of connections exceeded
ERROR:proto_hep:hep_tcp_or_tls_send: async TCP connect failed
ERROR:proto_hep:send_hep_message: Cannot send HEP message!

and eventually:

ERROR:proto_hep:send_hep_message: HEP send suppressed: too many failures
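
To get a feel for how often each of these errors occurs, the log can be tallied by its `LEVEL:module:function:` prefix. This is a small sketch, assuming every relevant line carries that prefix somewhere (possibly after a syslog timestamp); `tally_errors` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed prefix shape, taken from the lines above: LEVEL:module:function:
PREFIX_RE = re.compile(r"\b([A-Z]+):(\w+):(\w+):")


def tally_errors(log_text: str) -> Counter:
    """Count log lines per 'module:function' for lines with a matching prefix."""
    counts = Counter()
    for line in log_text.splitlines():
        m = PREFIX_RE.search(line)
        if m:
            counts[f"{m.group(2)}:{m.group(3)}"] += 1
    return counts
```

Running it over the log of a full test run shows at a glance whether `tcpconn_new` failures dominate or whether the HEP send errors start earlier.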

Expected behavior
Significantly fewer TCP (HEP) connections per call, with the connection count following the call lifecycle instead of accumulating.

OS/environment information

  • Operating System: Ubuntu Server 24.04 LTS in Azure VM
  • OpenSIPS installation: package from apt.opensips.org
  • We run our own VoIP software (a CCaaS, using Asterisk as a SIP UA) that OpenSIPS proxies all calls to.
