Skip to content

intra/tcp.go: don't create TCP endpoint before Dial() returns#149

Merged
ignoramous merged 2 commits intocelzero:n2from
Lanius-collaris:tcp-test1
Aug 17, 2025
Merged

intra/tcp.go: don't create TCP endpoint before Dial() returns#149
ignoramous merged 2 commits intocelzero:n2from
Lanius-collaris:tcp-test1

Conversation

@Lanius-collaris
Copy link
Copy Markdown
Contributor

@ignoramous
I wonder if this change breaks anything.

@ignoramous
Copy link
Copy Markdown
Contributor

dnsOverride in L190 assumes a connected gconn.

On the diff, before return err, the egressing conn, pc, must be clos()d.

Comment thread intra/tcp.go Outdated
@Lanius-collaris Lanius-collaris marked this pull request as ready for review May 15, 2025 14:56
@Lanius-collaris
Copy link
Copy Markdown
Contributor Author

Why does firestack c646f0c close the TCP connection if the ANSWER SECTION of a DNS response is empty? ( e.g. dig +tcp github.com AAAA )

@ignoramous
Copy link
Copy Markdown
Contributor

ignoramous commented May 15, 2025

Why does firestack c646f0c close the TCP connection if the ANSWER SECTION of a DNS response is empty? ( e.g. dig +tcp github.com AAAA )

Strange. I'll take a look. Does it happen for all transport types in addition to DNS53 (DoH/ODoH, DoT, DNSCrypt)?

@Lanius-collaris
Copy link
Copy Markdown
Contributor Author

Lanius-collaris commented May 16, 2025

Why does firestack c646f0c close the TCP connection if the ANSWER SECTION of a DNS response is empty? ( e.g. dig +tcp github.com AAAA )

Strange. I'll take a look. Does it happen for all transport types in addition to DNS53 (DoH/ODoH, DoT, DNSCrypt)?

Log:

W transport.go:712: dns: fwd: Preferred is missing; using default
W goos.go:159: dns53: goosr: err(lookup github.com. on 192.168.122.1:53: no such host) / size(0)
E transport.go:847: async.go:58>>asm_amd64.s:1700>>???dns: tcp: for -1 err! tot: 1, t: 240ms, lookup github.com. on 192.168.122.1:53: no such host

Firestack called net.Resolver.LookupNetIP() instead of sending the original DNS query?

@ignoramous
Copy link
Copy Markdown
Contributor

Firestack called net.Resolver.LookupNetIP() instead of sending the original DNS query?

Default DNS isn't set (cf bootstrap.go) isn't set, and so it uses goos.go which simply delegates it to Go's built-in resolver (if in Loopback mode) or OS-preferred resolver (via libc, I imagine).

@Lanius-collaris
Copy link
Copy Markdown
Contributor Author

Firestack called net.Resolver.LookupNetIP() instead of sending the original DNS query?

Default DNS isn't set (cf bootstrap.go) isn't set, and so it uses goos.go which simply delegates it to Go's built-in resolver (if in Loopback mode) or OS-preferred resolver (via libc, I imagine).

Oh, I hardcoded TIDCSV "Preferred" in my small CLI but forgot to add a DNS transport with ID "Preferred".

@ignoramous
Copy link
Copy Markdown
Contributor

Oh, I hardcoded TIDCSV "Preferred" in my small CLI but forgot to add a DNS transport with ID "Preferred".

The firestack API isn't the greatest :(

@ignoramous
Copy link
Copy Markdown
Contributor

We have been bug bashing connectivity issues (mostly the DNS, retrier.go, and connpool.go, Proxy bits). We are almost done with the critical/big ones. I intend to merge this PR once we are done with it all.

@ignoramous ignoramous force-pushed the n2 branch 2 times, most recently from afe10b0 to 900829a Compare August 14, 2025 18:39
@ignoramous ignoramous merged commit 4d2ed74 into celzero:n2 Aug 17, 2025
7 of 9 checks passed
@ignoramous
Copy link
Copy Markdown
Contributor

I think calling CreateEndpoint late appears to cause "connection was refused" errors? I see a tonne of those too: celzero/rethink-app#2187

@Lanius-collaris
Copy link
Copy Markdown
Contributor Author

Lanius-collaris commented Aug 31, 2025

I think calling CreateEndpoint late appears to cause "connection was refused" errors? I see a tonne of those too: celzero/rethink-app#2187

The reason is firestack calls tcp.ForwarderRequest.Complete(true) ( send TCP RST ) even if Dial() return a error other than "connection refused", before merging this pull request, "established connections would get reset immediately" in a similar condition.
Oh, you said ForwarderRequest.CreateEndpoint returns "connection was refused" error, not other apps. I need some time to reproduce.

tcp.ForwarderRequest.Complete() :
https://github.com/google/gvisor/blob/376319f48048/pkg/tcpip/transport/tcp/forwarder.go#L127

@Lanius-collaris
Copy link
Copy Markdown
Contributor Author

I can reproduce "connection was refused" errors with this program.

package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

func main() {
	dialer := net.Dialer{}
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Millisecond)
	defer cancel()
	conn, err := dialer.DialContext(ctx, "tcp", "65.21.79.229:443")
	if err != nil {
		fmt.Println("close connection after sending TCP SYN")
		return
	}
	defer conn.Close()
}

@ignoramous
Copy link
Copy Markdown
Contributor

Oh, you said ForwarderRequest.CreateEndpoint returns "connection was refused" error

There's also "endpoint is in invalid state" errors. Both those errors happen frequently with any app using HTTP3 (like YouTube and Instagram).

Think it happens more with UDP. It could be because, until CreateEndpoint is called, the internal routing logic isn't made aware. And so, each packet may end up being its own "ForwarderRequest"? Unsure how this works for TCP, given there's a state machine there, and so, I imagine there's little chance of it being confused by CreateEndpoint being called after a significant delay (10ms+).

@Lanius-collaris
Copy link
Copy Markdown
Contributor Author

We're working on fixing these network engine related issues that seem to happen due to a core dependency (gvisor Rethink uses, but it isn't obvious us to us why these would happen now (but didn't before)...

Hi, are you still "fighting" against "connection was refused" errors although #149 has been overridden? "connection was refused" errors are often caused by TCP RST packets or ICMP messages ( e.g. port unreachable ), APPs using Happy Eyeball algorithm cancel connection attempts frequently, you don't need to "prevent" "connection was refused" errors.

It's easy to explain why "connection was refused" errors rarely happened before, APPs need to close TCP sockets before the kernel sends ACK of SYN, but firestack calls ForwarderRequest.CreateEndpoint() ( send SYN-ACK ) immediately and the latency of a TUN device is small.

schwabe/ics-openvpn#1058
https://developer.android.com/reference/android/net/VpnService.Builder#allowBypass()

Allows all apps to bypass this VPN connection. By default, all traffic from apps is forwarded through the VPN interface and it is not possible for apps to side-step the VPN. If this method is called, apps may use methods such as ConnectivityManager.bindProcessToNetwork to instead send/receive directly over the underlying network or any other network they have permissions for.

Could you call VpnService.Builder.addDisallowedApplication( rethink app ) before calling Network.bindSocket(), and see if the EPERM errors disappear?

@ignoramous
Copy link
Copy Markdown
Contributor

RST packets or ICMP messages ( e.g. port unreachable ), APPs using Happy Eyeball algorithm cancel connection attempts frequently, you don't need to "prevent" "connection was refused" errors

Apps that hit these usually just get hung trying to connect to their servers. If HappyEyeballs, I'd expect apps to not act as if there's no Internet.

easy to explain why "connection was refused" errors rarely happened before, APPs need to close TCP sockets

Gotcha.

Could you call VpnService.Builder.addDisallowedApplication( rethink app ) before calling Network.bindSocket(), and see if the EPERM errors disappear

Network.bindSocket isn't otherwise needed as the sockets from the VPN app are automatically bound to the current active network (as designated by Android).

EPERM errors happen when Rethink, if setup by the end-user to use wifi or mobile depending on network conditions (instead of the default Android behaviour of always using wifi, which takes precedence, when active), attempts to deviate away from the default Android behaviour mentioned above.

@Lanius-collaris
Copy link
Copy Markdown
Contributor Author

RST packets or ICMP messages ( e.g. port unreachable ), APPs using Happy Eyeball algorithm cancel connection attempts frequently, you don't need to "prevent" "connection was refused" errors

Apps that hit these usually just get hung trying to connect to their servers. If HappyEyeballs, I'd expect apps to not act as if there's no Internet.

Send PCAP files and related logs to me if you don't mind.

Could you call VpnService.Builder.addDisallowedApplication( rethink app ) before calling Network.bindSocket(), and see if the EPERM errors disappear

Network.bindSocket isn't otherwise needed as the sockets from the VPN app are automatically bound to the current active network (as designated by Android).

EPERM errors happen when Rethink, if setup by the end-user to use wifi or mobile depending on network conditions (instead of the default Android behaviour of always using wifi, which takes precedence, when active), attempts to deviate away from the default Android behaviour mentioned above.

schwabe/ics-openvpn#1058 (comment)
sis-tangyuan said:

can't bindsocket with any transportType(wifi/cellular/vpn) while vpn service no allow bypass. I encounter this problem too. Have you fixed it.

@hussainmohd-a
If you have time, could you test if excluding rethink app from VPN make Network.bindSocket() usable?
https://github.com/hussainmohd-a/rethink-app/blob/v055n/app/src/main/java/com/celzero/bravedns/service/BraveVPNService.kt#L1287-L1289

@ignoramous
Copy link
Copy Markdown
Contributor

ignoramous commented Sep 29, 2025

Send PCAP files and related logs to me if you don't mind.

I wouldn't have the time until Oct 6.

The errors should readily occur with the YouTube app on v055t. Turning ON Block connections without VPN also triggers this at least once a day in Firefox and other apps (I've observed this to happen after network changes).

excluding rethink app from VPN make Network.bindSocket() usable

Network.bindSocket workable for 3p apps? If so, with VPN enabled, this wouldn't / shouldn't work without VpnBuilder.allowBypass, which Rethink (F-Droid and GitHub/Website flavours) can be instructed to execute if Configure -> Network -> Enable network visibility is turned ON.

excluding rethink app from VPN make

Rethink, in its default setting, already excludes itself? Only if Configure -> Network -> Loopback is turned ON, it doesn't.

@Lanius-collaris
Copy link
Copy Markdown
Contributor Author

excluding rethink app from VPN make

Rethink, in its default setting, already excludes itself? Only if Configure -> Network -> Loopback is turned ON, it doesn't.

celzero/rethink-app#2187
I don't know if saber716rus turned on Configure -> Network -> Loopback , but saber716rus wrote:

1756547790027,E RethinkDnsVpn: err bindToNw(nw: 1163046866957, netid: 270, fid: 262, Binding socket to network 270 failed: EPERM (Operation not permitted)

@ignoramous
Copy link
Copy Markdown
Contributor

ignoramous commented Sep 30, 2025

don't know if saber716rus turned on Configure -> Network -> Loopback , but saber716rus wrote:

1756547790027,E RethinkDnsVpn: err bindToNw(nw: 1163046866957,

Gotcha. Either they are are using Configure -> Network -> Use all available networks (which make Rethink deviate away from the default Android behaviour) or using Configure -> DNS -> System DNS (whose IP is on a specific Network and so the socket must be bound to that specific Network). These errors are not expected in the steady case but may happen when Rethink is trying to bind to once "available" Network that has now been "lost".

Not a critical issue (assuming things "auto recover"), as the DNS IP updates (notified to Rethink as network change events by Android) happen asynchronously elsewhere and all of the code will eventually know what those are (~3s) and where to bind the sockets for them.

@ignoramous
Copy link
Copy Markdown
Contributor

RST packets or ICMP messages ( e.g. port unreachable ), APPs using Happy Eyeball algorithm cancel connection attempts frequently, you don't need to "prevent" "connection was refused" errors

Apps that hit these usually just get hung trying to connect to their servers. If HappyEyeballs, I'd expect apps to not act as if there's no Internet.

Send PCAP files and related logs to me if you don't mind.

We are bringing back this patch behind a setting in the upcoming Rethink version v055u: Configure -> Network -> Choose IP version -> IPv4 & IPv6 (which will make Rethink's tun device always appear as dual-stack). 4b3dc4c

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants