
New article: nf_conntrack overflow causes intermittent UDP tracker downtime with Docker #192

@josecelano

Summary

Write a blog post documenting a subtle but impactful kernel networking issue that causes intermittent UDP tracker downtime when running a BitTorrent tracker behind Docker bridge networking. We hit this twice — first on the original Torrust demo (DigitalOcean, 2024–2025) and again on the new Torrust Tracker Demo (Hetzner, 2026). Others have independently documented the same problem.

The post would be useful to anyone running a UDP service behind Docker NAT at non-trivial request rates.

Background

When Docker publishes a UDP port via bridge networking, the kernel creates a conntrack (connection tracking) entry for every distinct UDP flow, i.e. one per client source address and port tuple. Each entry persists for the UDP timeout duration (default: 120 s for bidirectional streams, 30 s for one-way traffic). Under sustained UDP tracker load, the conntrack table fills up and the kernel starts silently dropping packets — no error in the application log, no socket buffer counter, just nf_conntrack: table full, dropping packet buried in dmesg.
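
A quick way to watch pressure on the table (these /proc paths exist only once the nf_conntrack module is loaded):

# Live number of tracked flows
cat /proc/sys/net/netfilter/nf_conntrack_count

# Table ceiling; the kernel starts dropping packets when the count hits this
cat /proc/sys/net/netfilter/nf_conntrack_max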

The symptom is intermittent UDP timeouts on uptime monitors with a characteristic self-recovery pattern: the table fills → the probe gets dropped → entries expire → the table drains → the probe succeeds → the cycle repeats. This makes it look like an application or firewall bug rather than kernel resource exhaustion.
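
When the cycle is underway, the evidence is in the kernel ring buffer and the conntrack statistics, not in the application log. A minimal triage, assuming the conntrack-tools package is installed:

# Overflow messages logged by the kernel
sudo dmesg | grep conntrack

# Per-CPU counters; growing drop / early_drop columns confirm exhaustion
sudo conntrack -S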

We Hit This Twice

First demo (torrust/torrust-demo, DigitalOcean)

  • Issue: torrust/torrust-demo#26
  • UDP uptime on newTrackon dropped to ~60% at peak; after the fix it reached 99.2%
  • Kernel journal confirmed: nf_conntrack: table full, dropping packet with 20M+ early_drops on CPU 3
  • Fix: increased nf_conntrack_max

New tracker demo (torrust/torrust-tracker-demo, Hetzner)

  • Issue: torrust/torrust-tracker-demo#21
  • Resized the server from CCX23 → CCX33 (4 → 8 vCPU, 16 → 32 GB) expecting improvement; uptime the day after the resize (D+1) was 83.9%, worse than the pre-resize 92.2%
  • Root cause identical: nf_conntrack_count = nf_conntrack_max = 262144 at the moment of investigation, with 2478 "table full" drop messages in dmesg
  • Fix applied 2026-04-20; monitoring for recovery

Independent documentation

The ftorrent/open README (a guide to running the Aquatic tracker in Docker) covers this problem and its solution in detail under "Kernel tuning for bridge networking".

The Fix

Three kernel parameters in /etc/sysctl.d/99-conntrack.conf:

# Raise table ceiling — default 65536–262144 is too small under tracker load
net.netfilter.nf_conntrack_max = 1048576

# Reduce UDP stream timeout — default 120 s; tracker exchanges complete in ms
net.netfilter.nf_conntrack_udp_timeout_stream = 15

# Reduce one-way UDP timeout — default 30 s
net.netfilter.nf_conntrack_udp_timeout = 10
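
A sketch of applying the new values without a reboot (assuming the file above is in place and nf_conntrack is already loaded, e.g. because Docker is running):

# Re-apply every /etc/sysctl.d drop-in, including 99-conntrack.conf
sudo sysctl --system

# Confirm the new ceiling is live
sysctl net.netfilter.nf_conntrack_max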

There is a reboot persistence trap: net.netfilter.* sysctl keys only exist after the nf_conntrack kernel module is loaded. Docker loads it when it starts, but systemd applies sysctl configs at boot before Docker runs — so the settings are silently skipped. Fix: pre-load the module via /etc/modules-load.d/conntrack.conf.
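
A minimal sketch of that pre-load file, using the path named in the paragraph above:

# /etc/modules-load.d/conntrack.conf
# systemd-modules-load.service reads this early in boot, before
# systemd-sysctl.service applies /etc/sysctl.d/*.conf, so the
# net.netfilter.* keys exist by the time the settings are applied.
nf_conntrack

After a reboot, lsmod | grep nf_conntrack and sysctl net.netfilter.nf_conntrack_max should both confirm that the settings survived.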

Suggested Blog Post Outline

  1. The symptom — intermittent UDP timeouts, self-recovering, not reproducible on demand
  2. Why it's hard to diagnose — application log is silent; the evidence is in dmesg and sysctl
  3. The mechanism — Docker DNAT + conntrack; how entries accumulate under UDP load
  4. The calculation — requests/s × timeout_seconds = minimum table size needed; how the default timeouts make it worse than it needs to be (see the worked example after this list)
  5. The fix — the three sysctl parameters and why each matters
  6. The reboot trap — why sysctl.d alone is not enough; the module pre-load requirement
  7. Monitoring — how to verify the fix is holding: nf_conntrack_count, dmesg | grep conntrack, conntrack -S
  8. Lessons — this affects any UDP service behind Docker bridge networking at sufficient request rates, not just BitTorrent trackers
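
For item 4 above, a back-of-the-envelope sketch; the request rate is purely illustrative:

# Each distinct client (source address, source port) occupies one conntrack
# entry for the duration of the UDP timeout, so at steady state:
#
#   entries ≈ requests/s × timeout_seconds
#
# Illustrative rate: 2500 req/s × 120 s default stream timeout ≈ 300000
# entries, already past the 262144 ceiling observed on the Hetzner box.
# With the 15 s timeout from the fix: 2500 × 15 = 37500 entries.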

Status

  • Fix applied to the new tracker demo on 2026-04-20; waiting on newTrackon data to confirm recovery
  • Should update the blog post once we have confirmed "before and after" uptime numbers (ideally with a Grafana screenshot showing the recovery)
