Skip to content

Resolve the issue of incomplete transmission of large SIP message over TLS protocol#3654

Merged
bogdan-iancu merged 2 commits into
OpenSIPS:masterfrom
36952362:master
Jun 19, 2025
Merged

Resolve the issue of incomplete transmission of large SIP message over TLS protocol#3654
bogdan-iancu merged 2 commits into
OpenSIPS:masterfrom
36952362:master

Conversation

@36952362
Copy link
Copy Markdown
Contributor

@36952362 36952362 commented May 26, 2025

Summary
This PR fixes the issue of incomplete transmission of large SIP message over TLS

Details
When OpenSIPS communicates with a remote UA over the TLS protocol, if the size of the SIP message large than 16,384 bytes, OpenSIPS fails to send the entire SIP message.

if the DEBUG log level (log_level=4) is enabled, the logs show that OpenSIPS sends only 16,384 bytes in the first write attempt. The remaining data is stored in the TCP connection’s tcp_async_chunk. However, OpenSIPS does not release the tcp connection back to the TCP main process, which would normally add it back to the reactor to listen again for the write event.

---log snippet---

[26] DBG:proto_tls:proto_tls_send: sending via fd 4... 
[26] DBG:tls_openssl:openssl_tls_update_fd: New fd is 4 
[26] DBG:tls_openssl:openssl_tls_write: write was successful (16384 bytes)
[26] ERROR:proto_tls:tls_write_on_socket: buff len:20430, write len:16384, 0x7fa82fc9c450
[26] DBG:proto_tls:proto_tls_send: after write: c=0x7fa82fc9c450 n=0 fd=4

Solution
If the SIP message cannot be fully sent in the first attempt, the tcp_connection must be released back to the TCP main process so that it can be added to the reactor to listen for the write event again.

---log snippet---

//First write attempt. which [27] is one process of TCP workers
[27] DBG:proto_tls:proto_tls_send: sending via fd 139..
[27] DBG:tls_openssl:openssl_tls_write: write was successful (16384 bytes)
[27] ERROR:proto_tls:tls_write_on_socket: buff len:20429, write len:16384, 0x7f4041a3c450
[27] DBG:proto_tls:proto_tls_send: after write: c=0x7f4041a3c450 n=0 fd=139


//Release tcp connection to tcp main process with state: ASYNC_WRITE_GENW(5).
[27] DBG:core:tcpconn_release:  releasing con 0x7f4041a3c450, state 5, fd=4, id=122825929
[27] DBG:core:tcpconn_release:  extra_data 0x7f4041a43248


//TCP main process added the fd to reactor with write. which [31]: the TCP main process
[31] DBG:core:handle_worker: read response= 7f4041a3c450, 5, fd -1 from 14 (27)
[31] DBG:core:io_watch_add: [TCP_main] io_watch_add op (122 on 4) (0xa37720, 122, 19, 0x7f4041a3c450,2), fd_no=25/1048576

//TCP main process received write event and dispatcher to [27] worker process
[31] DBG:core:handle_tcpconn_ev: connection 0x7f4041a3c450 fd 122 is now writable
[31] DBG:core:send2worker: to tcp worker 1 (0/14) load 0, 0x7f4041a3c450/122 rw 2

//Second write remain data
[27] DBG:core:handle_io: We have received conn 0x7f4041a3c450 with rw 2 on fd 139
[27] DBG:core:handle_io: Received con for async write 0x7f4041a3c450 ref = 2
[27] DBG:tls_openssl:openssl_tls_update_fd: New fd is 139
[27] ERROR:proto_tls:tls_async_write: Trying to send 4045 bytes from chunk 0x7f4041a618a8 in conn 0x7f4041a3c450 - 34 34
[27] DBG:tls_openssl:openssl_tls_write: write was successful (4045 bytes)

Compatibility
This is a change-specific and isolated in proto_tls module - it's 100% retro-compatible with the other versions.

Closing issues
N/A

@36952362 36952362 changed the title Resolve the issue of incomplete transmission of large SIP messages over TLS protocol Resolve the issue of incomplete transmission of large SIP message over TLS protocol May 26, 2025
@bogdan-iancu
Copy link
Copy Markdown
Member

Hi @36952362 , you have a fair point here. Taking this a step further, maybe the test of ASYNC (to have the fd added for WRITE in reactor) should be mask and automatically done in the tcp_conn_release() (instead of having the second param)

@bogdan-iancu bogdan-iancu self-assigned this May 30, 2025
@36952362
Copy link
Copy Markdown
Contributor Author

36952362 commented Jun 3, 2025

Hi @bogdan-iancu
Thank you for your comments and suggestions. I’ve considered your proposal as well, but there are a few reasons why I didn’t make that change:

  1. tcp_conn_release() is used by many modules, it's not feasible to perform thorough testing for all of them.

  2. I also noticed that several other modules use similar code blocks with different values. For example, in proto_bin(s) and proto_tcp, the second parameter is set to 0 when the TCP connection hasn’t been fully established yet, while in proto_tls, the same logic sets the second parameter to 1. It’s possible that the logic in proto_tls is the correct approach.

----proto-tcp----

LM_DBG("Successfully started async connection \n");
sh_log(c->hist, TCP_SEND2MAIN, "send 1, (%d)", c->refcnt);
tcp_conn_release(c, 0);
return len;

----proto-tls----

rlen = len;
LM_DBG("Successfully started async connection \n");
goto con_release;
...
con_release:
	sh_log(c->hist, TCP_SEND2MAIN, "send 1, (%d)", c->refcnt);
	close(fd);
	tcp_conn_release(c, (rlen < 0)?0:1);
	return rlen;

Comment thread modules/proto_tls/proto_tls.c Outdated
send_sock->last_real_ports->remote = c->rcv.src_port;

tcp_conn_release(c, 0);
tcp_conn_release(c, c->async && c->async->pending?1:0);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let's keep it local, here. Nevertheless, to avoid any clarity or synchronization issue (btw, the access to ->async->pending should be under lock), I would go for the more simpler approach from the proto_tcp, with the test on the amount of written data. Could you :

Copy link
Copy Markdown
Contributor Author

@36952362 36952362 Jun 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your advising.

#rework the test here
sure, I retested after re-submit(add lock protection)
#remove the tcp_connections_no fix
Sorry, It was my mistake, I used the same branch for the two PRs. I removed it

@bogdan-iancu
Copy link
Copy Markdown
Member

@36952362, maybe I failed with my explanations :). Yes, using the ->async fields requires locking, but this is not the best way to go. I was rather suggesting to use the same kind of testing as in the proto_tcp module (which requires no locking):

tcp_conn_release(c, (n<len)?1:0/*pending data in async mode?*/ );

by comparing the overall len with the actually written data n .
As the fix is simple here, let me know if you want to update the PR or I should simply push a fix for that line.

@36952362
Copy link
Copy Markdown
Contributor Author

@bogdan-iancu , your suggestion was very helpful. I was misled by the line rlen = len;. After reviewing the code logic again, your approach is indeed simpler and more straightforward. PR updated

@bogdan-iancu bogdan-iancu merged commit 0c54ee7 into OpenSIPS:master Jun 19, 2025
84 of 86 checks passed
bogdan-iancu added a commit that referenced this pull request Jun 19, 2025
Resolve the issue of incomplete transmission of large SIP message over TLS protocol

(cherry picked from commit 0c54ee7)
bogdan-iancu added a commit that referenced this pull request Jun 19, 2025
Resolve the issue of incomplete transmission of large SIP message over TLS protocol

(cherry picked from commit 0c54ee7)
bogdan-iancu added a commit that referenced this pull request Jun 19, 2025
Resolve the issue of incomplete transmission of large SIP message over TLS protocol

(cherry picked from commit 0c54ee7)
@bogdan-iancu
Copy link
Copy Markdown
Member

Thank you @36952362 , PR merged and backported to all 3.4+ versions

NormB pushed a commit to NormB/opensips that referenced this pull request May 9, 2026
Resolve the issue of incomplete transmission of large SIP message over TLS protocol
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants