Skip to content

Fix DNS loop from seeding defaultNetwork with own VPN#61

Open
Leadaxe wants to merge 1 commit into
SagerNet:devfrom
Leadaxe:fix/default-network-vpn-seed
Open

Fix DNS loop from seeding defaultNetwork with own VPN#61
Leadaxe wants to merge 1 commit into
SagerNet:devfrom
Leadaxe:fix/default-network-vpn-seed

Conversation

@Leadaxe

@Leadaxe Leadaxe commented Jun 13, 2026

Copy link
Copy Markdown

DefaultNetworkMonitor.start() seeds defaultNetwork from getActiveNetwork() without filtering the VPN transport (L19-L20).

The seed can be the app's own VPN

getActiveNetwork() returns the per-app default network. Per the registerDefaultNetworkCallback() docs it "may be a physical network or a virtual network, such as a VPN that applies to the application", and getActiveNetwork() returns the same value. So if the tun is already up when start() runs, the seed is sing-box's own VPN Network.

How that loops DNS

LocalResolver resolves via DnsResolver.rawQuery(network = defaultNetwork, ...). When defaultNetwork is the VPN:

  1. The query is bound to the VPN network, so it enters the tun.
  2. auto_route hijacks all traffic through the tun, DNS included, and hands the query to sing-box.
  3. sing-box routes it back to LocalResolver.
  4. LocalResolver queries on network = VPN again → back to step 1.

The query never reaches a real upstream resolver — apps under the VPN fail to resolve names.

Why the seed, and not the request

defaultNetwork has two writers:

  • The async callback from DefaultNetworkListener. Its NetworkRequest carries NET_CAPABILITY_NOT_VPN (part of NetworkCapabilities.DEFAULT_CAPABILITIES), so this value is already VPN-free.
  • The synchronous seed in start() — a direct getActiveNetwork() getter. The request's NOT_VPN capability does not apply to it.

The seed is read synchronously and is what the first resolutions use. The callback overwrites it with the NOT_VPN-filtered network, but only after the listener fires — so it's a race. On ROMs where the tun is already up at start(), the seed is the VPN for the window before the callback corrects it. Filtering the request capability therefore can't fix this; the seed itself has to be filtered.

ROM dependence

Whether the seed catches the VPN depends on the ordering of "tun came up" vs "seed was read" at start, which varies by device timing:

  • Reproduces: MIUI / Android 13 — split-tunnel allow-list; allowed apps get no DNS until the VPN app itself is added to the allow-list.
  • Does not reproduce: ColorOS / Android 15, AOSP / Android 16 — the seed lands on the physical network.

The fix

takeUnless(::isVpn) on the seed, so it is never the app's own tun. The callback path is unchanged; on devices where the seed was already an underlying network, behavior is unchanged.

The same unfiltered read exists on the API < 23 DefaultNetworkListener.get() fallback (only reachable in the otherLegacy flavor). Left untouched to keep the diff minimal; can extend if preferred.

Found in LxBox, a downstream client built on SFA.

Related: sing-box#3637, sing-box#2643.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b6ff502a12

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

// back through the tun (auto_route) and loop. The NetworkRequest carries
// NET_CAPABILITY_NOT_VPN, but that does not apply to this direct getter,
// so filter the VPN transport out explicitly.
Application.connectivity.activeNetwork?.takeUnless(::isVpn)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Wait for a non-VPN network instead of publishing null

When activeNetwork is the VPN and the NOT_VPN listener has not delivered the underlying network yet (the startup race this change is targeting), this expression stores null and start() returns. LocalResolver.exchange()/lookup() read DefaultNetworkMonitor.defaultNetwork directly and throw "missing default interface" rather than calling require(), so the first DNS requests during this window fail instead of waiting for the listener-provided physical network.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks — this is a real edge, but it's a pre-existing one that this PR deliberately doesn't widen, and I'd argue it's out of scope here. I traced it end-to-end in the downstream client as a separate task — LxBox §122 — missing default interface: LocalResolver on null defaultNetwork (won't-fix, with the full reasoning). Summary:

1. This is not a regression introduced by the PR. defaultNetwork already goes null on every onLost (DefaultNetworkListener sends Msg.Lost → network = null), and that path predates this change. takeUnless(::isVpn) adds exactly one new way to reach null: a VPN seed at startup, in the narrow window before the NOT_VPN callback lands. The net effect of the PR is guaranteed DNS looprare fast fail in a startup race — strictly better. The recurring null source is onLost, which the seed filter neither caused nor touches.

2. The window is genuinely narrow. Between the seed and the first DNS query there's setMemoryLimit + startOrReloadService (config parse + tun bring-up, tens-to-hundreds of ms), while the OS typically delivers onAvailable in single-to-low-double-digit ms — so the callback usually wins and null never reaches the resolver. No guarantee, but it's not the common path.

3. error("missing default interface") is not a crash. LocalDNSTransport.exchange/lookup are declared throws Exception in the .aar; the Go wrapper (experimental/libbox/dns.go) propagates it as a failed exchange to the DNS engine, which then fails over / SERVFAILs per the config's DNS rules. It's a handled failure path, not a hang and not a panic. (No silent retry in the wrapper either — so I'm not claiming "it just retries and it's fine"; I'm claiming the engine sees a normal failed exchange.)

4. Waiting (require()) costs more than it buys here. Upstream sing-box-for-android has require(), but doing the wait properly means moving the network read inside runBlocking in both exchange()/lookup() (suspend reshuffle), plus a mandatory timeout with an onCancel — otherwise a never-arriving network hangs the Go DNS thread indefinitely. That's real surface for a window nobody has observed firing in the field. For a fix whose whole point is to be minimal and root-cause-scoped, I'd rather not bundle that in.

So I'd keep this PR limited to the seed filter and leave the nullrequire() question as its own change. Happy to send the require() follow-up separately if you'd prefer it covered — the §122 doc has a ready concept (with the timeout fork already decided) for exactly that. Same applies to the API < 23 DefaultNetworkListener.get() fallback I mentioned in the description.

Leadaxe added a commit to Leadaxe/LxBox that referenced this pull request Jun 13, 2026
§120 — затрекали баг §119 (DefaultNetworkMonitor сидит defaultNetwork из
getActiveNetwork() без VPN-фильтра → DNS-loop) публично:
- upstream PR SagerNet/sing-box-for-android#61 (base dev, +14/−1, takeUnless(::isVpn))
- локальный issue #10 (closed/resolved §119)

Issue в upstream невозможен (issues/discussions у SFA отключены; форма sing-box
требует TUN-free CLI-репро). Полный текст репорта — в таске.

§119 — поправлена атрибуция цитаты: «…VPN that applies to the application»
дословно из javadoc registerDefaultNetworkCallback(), не из гайда Read network state.
start() seeds defaultNetwork from getActiveNetwork(), which returns the
per-app default and may be the app's own VPN when the tun is already up.
LocalResolver then queries DNS through the tun (auto_route) and loops, so
apps under the VPN fail to resolve names.

The async DefaultNetworkListener callback overwrites the seed with the
NOT_VPN-filtered network shortly after, but the synchronous seed is used
for the first resolutions; on ROMs where the tun is up at start() it is the
VPN until the callback corrects it. The request's NOT_VPN capability does
not apply to the direct getActiveNetwork() getter, so filter the VPN
transport from the seed explicitly.
@Leadaxe Leadaxe force-pushed the fix/default-network-vpn-seed branch from b6ff502 to 53eca0c Compare June 13, 2026 22:06
@Leadaxe Leadaxe changed the title Filter VPN transport from the getActiveNetwork() seed in DefaultNetworkMonitor Fix DNS loop from seeding defaultNetwork with own VPN Jun 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant