diff --git a/.github/skills/check-udp-conntrack/skill.md b/.github/skills/check-udp-conntrack/skill.md new file mode 100644 index 0000000..ac262e3 --- /dev/null +++ b/.github/skills/check-udp-conntrack/skill.md @@ -0,0 +1,60 @@ +--- +name: check-udp-conntrack +description: Workflow for checking whether UDP packet loss or uptime degradation may be caused by conntrack saturation on the torrust-tracker-demo server. Use when diagnosing UDP timeouts, low newTrackon uptime, packet drops, conntrack pressure, UDP receive-buffer errors, or when validating whether conntrack tuning is still healthy. +metadata: + author: torrust + version: "1.0" +--- + + + +# Check UDP Conntrack + +## Overview + +Use this skill to investigate whether UDP instability is caused by kernel-side +conntrack saturation or related packet-path pressure. + +The canonical human-facing reference is: + +- `docs/udp-conntrack-runbook.md` + +Keep durable explanations and operational guidance in that document. This skill +should stay focused on workflow and safe execution. + +## When To Use + +Use this skill when the user asks to: + +- check whether conntrack is too small +- diagnose UDP timeouts or packet loss +- validate that current conntrack tuning is still active +- verify whether the server is dropping UDP packets +- assess whether current symptoms point to conntrack saturation or something else + +## Workflow + +1. Run the host checks from `docs/udp-conntrack-runbook.md`. +2. Summarize the results in terms of: + - conntrack occupancy + - presence or absence of `table full` events + - IPv4 and IPv6 UDP receive-buffer errors + - whether `NoPorts` counters are relevant or benign +3. Distinguish conntrack saturation from softirq/RX steering imbalance. +4. If the user asks to document the result, update the relevant issue evidence + or incident file and reference the runbook when appropriate. + +## Interpretation Rules + +- `nf_conntrack_count` near or equal to `nf_conntrack_max` means real pressure. +- Any fresh `nf_conntrack: table full, dropping packet` message is a confirmed problem. +- `UdpRcvbufErrors` or `Udp6RcvbufErrors` increasing during the incident means packet loss below the application layer. +- `NoPorts` counters alone do not prove tracker loss. +- High load average with one CPU dominated by `%soft` points to softirq concentration, not necessarily conntrack exhaustion. + +## Safety Constraints + +- Do not change sysctl values unless the user explicitly asks for a fix. +- If applying a fix, update both runtime state and persistent files when appropriate. +- Preserve issue-specific evidence in `docs/issues/evidence/ISSUE-/`. +- Do not present the skill as the primary source of truth; the runbook in `docs/` is the canonical explanation. diff --git a/.github/skills/scale-up-server/skill.md b/.github/skills/scale-up-server/skill.md new file mode 100644 index 0000000..0a3c6fc --- /dev/null +++ b/.github/skills/scale-up-server/skill.md @@ -0,0 +1,206 @@ +--- +name: scale-up-server +description: Step-by-step workflow for resizing (scaling up) the Hetzner server in the torrust-tracker-demo stack. Use when asked to resize, scale up, or upgrade the server plan. Covers pre-resize preparation, graceful shutdown, provider panel action, post-resize recovery, and evidence capture. Triggers on "resize server", "scale up", "upgrade server plan", "Hetzner resize", "change server type". 
+metadata: + author: torrust + version: "1.0" +--- + + + +# Scaling Up the Server + +## Overview + +This skill covers a **planned, live resize** of the Hetzner Cloud server: +shut down services gracefully, resize the instance in the provider panel, +restart services, and validate everything before re-opening to traffic. + +> **Important**: Resizing a Hetzner Cloud server **does not change IP addresses**. +> Neither the public IPv4/IPv6 addresses nor any attached Floating IPs are +> affected. DNS records and Floating IP assignments do not need updating. +> This is standard cloud-provider behavior for in-place resizes. + +## Responsibilities + +| Step | Who | +| ----------------------------------- | ---------------------- | +| Capture pre-resize baseline | AI assistant | +| Graceful service shutdown | AI assistant (via SSH) | +| Resize in Hetzner Cloud panel | **Human operator** | +| Post-resize recovery and validation | AI assistant (via SSH) | +| Document evidence and commit | AI assistant | + +--- + +## Workflow + +### Step 1 — Capture pre-resize baseline + +Before touching the server, record the current state so there is a before/after +reference. Save results to the issue-scoped evidence folder +(`docs/issues/evidence/ISSUE-/00-pre-resize-baseline.md`). + +```bash +# Host snapshot +ssh demotracker 'date -u; nproc; free -h; uptime; df -h' + +# Docker services +ssh demotracker 'cd /opt/torrust && docker compose ps' + +# Prometheus request rates (5m window) +ssh demotracker 'curl -sG "http://127.0.0.1:9090/api/v1/query" \ + --data-urlencode "query=sum(rate(http_tracker_core_requests_received_total{server_binding_protocol=\"http\",server_binding_port=\"7070\"}[5m]))"' + +ssh demotracker 'curl -sG "http://127.0.0.1:9090/api/v1/query" \ + --data-urlencode "query=sum(rate(udp_tracker_server_requests_received_total{server_binding_protocol=\"udp\",server_binding_port=\"6969\"}[5m]))"' + +# UDP buffer error counters +ssh demotracker 'grep "^Udp:" /proc/net/snmp; nstat -az 2>/dev/null | grep -Ei "UdpRcvbufErrors|Udp6RcvbufErrors" || true' +``` + +Commit the baseline file before proceeding to shutdown. + +### Step 2 — Confirm readiness + +Before shutting down: + +- Baseline file is complete and committed. +- Branch is clean and pushed. +- Nightly backup window awareness (~03:00 UTC). Prefer resizing outside that window. +- Operator is available to complete the Hetzner panel action promptly. + +### Step 3 — Graceful service shutdown (AI assistant) + +Run from a local terminal. Capture the full output and record it in +`docs/issues/evidence/ISSUE-/01-resize-execution.md`. + +```bash +ssh demotracker 'set -e + echo "=== shutdown-start-utc ===" + date -u +%Y-%m-%dT%H:%M:%SZ + cd /opt/torrust + echo "=== docker-compose-ps-before ===" + docker compose ps + echo "=== docker-compose-down ===" + docker compose down + echo "=== docker-compose-ps-after ===" + docker compose ps + echo "=== shutdown-end-utc ===" + date -u +%Y-%m-%dT%H:%M:%SZ' +``` + +Confirm all containers are stopped and networks are removed before handing over. + +### Step 4 — Resize in Hetzner Cloud panel (human operator) + +1. Log in to [Hetzner Cloud Console](https://console.hetzner.cloud/). +2. Navigate to the project and select the server (`torrust-tracker-demo` or similar). +3. Go to **Rescale** (or **Server type**) tab. +4. Select the target server type (e.g. CCX33) and confirm. +5. Wait for the resize to complete — typically under 2 minutes. +6. Power on the server if it does not start automatically. +7. 
Notify the AI assistant when the server is reachable again. + +> No IP address changes are required. Floating IPs, public IPs, and private +> network IPs all remain the same after a Hetzner in-place resize. + +### Step 5 — Post-resize recovery (AI assistant) + +Start all services and capture the new host profile: + +```bash +ssh demotracker 'set -e + echo "=== startup-utc ===" + date -u +%Y-%m-%dT%H:%M:%SZ + echo "=== host ===" + nproc; free -h; uptime + cd /opt/torrust + echo "=== docker-compose-up ===" + docker compose up -d + echo "=== docker-compose-ps ===" + docker compose ps' +``` + +### Step 6 — Post-resize validation (AI assistant) + +Run all checks and record outputs in the execution log. + +```bash +# Container health +ssh demotracker 'cd /opt/torrust && docker compose ps' + +# UDP buffer counters (should be zero after fresh boot) +ssh demotracker 'grep "^Udp:" /proc/net/snmp; nstat -az 2>/dev/null | grep -Ei "UdpRcvbufErrors|Udp6RcvbufErrors" || true' + +# Prometheus targets +ssh demotracker 'curl -sG "http://127.0.0.1:9090/api/v1/query" \ + --data-urlencode "query=up{job=\"tracker_metrics\"}" + curl -sG "http://127.0.0.1:9090/api/v1/query" \ + --data-urlencode "query=up{job=\"tracker_stats\"}"' +``` + +External checks (from local machine): + +```bash +# HTTP tracker health +curl -fsS "https://http1.torrust-tracker-demo.com/health_check" + +# Grafana (302 to /login is expected) +curl -I "https://grafana.torrust-tracker-demo.com" + +# UDP port probe +nc -zvu udp1.torrust-tracker-demo.com 6969 2>&1 | head -5 +``` + +All services must reach `healthy` status, HTTP health must return `200`, +and Prometheus targets must show `up=1` before the resize is considered +complete. + +> The tracker API health endpoint (`/health_check` on `api.torrust-tracker-demo.com`) +> requires authentication and returns `500 unauthorized` without a token. +> This is expected and not a failure indicator. + +### Step 7 — Document and commit + +Fill in the execution log (`01-resize-execution.md`) with all checklist items, +the full timeline (start UTC / end UTC / total impact window), the command +outputs, and the validation results. + +Run linters before committing: + +```bash +./scripts/lint.sh +``` + +Commit with: + +```bash +git commit -S -m "docs(issue-): document resize execution and post-resize validation" \ + -m "Refs: #" +``` + +### Step 8 — Update infrastructure docs + +After the resize is confirmed stable: + +- Update the hardware table in `docs/infrastructure.md` to reflect the new + server type, vCPU count, RAM, storage, traffic allowance, and price. +- Add a row to `docs/infrastructure-resize-history.md` with the resize date, + old and new plan, throughput at resize time, normalized req/s per vCPU, + and a link to the related issue. + +--- + +## Post-Resize Observation Period + +After the resize, monitor for at least **7 days** before concluding success: + +- Fill one row per day in `docs/issues/evidence/ISSUE-/02-post-resize-daily-checks.md` + using the same Prometheus queries from Step 1. +- Check external uptime from [newTrackon](https://newtrackon.com/) or similar. +- Watch UDP buffer error counters for any resurgence. + +Once the observation window is complete, fill the final comparison table in +`docs/issues/evidence/ISSUE-/03-pre-post-comparison.md` and decide whether +the resize meets the acceptance criteria. 
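
As an optional convenience, the daily figures can be collected in one pass with a helper like the sketch below. It reuses the Step 1 Prometheus queries and assumes the `demotracker` SSH alias, Prometheus listening on `127.0.0.1:9090`, and `jq` being installed on the server; treat it as a starting point, not a required part of the workflow.

```bash
#!/usr/bin/env bash
# Sketch: collect one row of figures for 02-post-resize-daily-checks.md.
# Assumptions (not part of the required workflow): the `demotracker` SSH alias,
# Prometheus reachable on 127.0.0.1:9090, and `jq` available on the server.
set -euo pipefail

ssh demotracker '
  q() {
    curl -sG "http://127.0.0.1:9090/api/v1/query" --data-urlencode "query=$1" |
      jq -r ".data.result[0].value[1] // \"n/a\""
  }
  echo "date_utc:     $(date -u +%Y-%m-%d)"
  echo "vcpus:        $(nproc)"
  echo "load_avg:     $(cut -d" " -f1-3 /proc/loadavg)"
  echo "http1_rps_5m: $(q "sum(rate(http_tracker_core_requests_received_total{server_binding_protocol=\"http\",server_binding_port=\"7070\"}[5m]))")"
  echo "udp1_rps_5m:  $(q "sum(rate(udp_tracker_server_requests_received_total{server_binding_protocol=\"udp\",server_binding_port=\"6969\"}[5m]))")"
  echo "udp_rcvbuf_errors:"
  nstat -az 2>/dev/null | grep -Ei "UdpRcvbufErrors|Udp6RcvbufErrors" || true
'
```

Paste the printed values into the matching row of the daily checks table.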
diff --git a/docs/infrastructure-resize-history.md b/docs/infrastructure-resize-history.md index 11b3fa6..73894a7 100644 --- a/docs/infrastructure-resize-history.md +++ b/docs/infrastructure-resize-history.md @@ -23,10 +23,10 @@ investigations (especially for UDP uptime on newTrackon). ## Timeline -| Date (UTC) | Change type | Server plan | vCPU | RAM | HTTP1 req/s | UDP1 req/s | Total req/s | Req/s per vCPU | UDP newTrackon uptime | Notes | Related | -| ---------- | --------------------- | ----------- | ---- | ----- | ----------- | ---------- | ----------- | -------------- | --------------------- | ------------------------------------------------------------------------------------ | ---------------------------------------------------------------- | -| 2026-04-13 | Baseline (pre-resize) | CCX23 | 4 | 16 GB | ~1300 | ~1500 | ~2800 | ~700 | 92.20% | High combined load. Capacity pressure suspected at current normalized request rate. | [#19](https://github.com/torrust/torrust-tracker-demo/issues/19) | -| 2026-04-13 | Planned target resize | CCX33 | 8 | 32 GB | ~1300 | ~1500 | ~2800 | ~350 | 92.20% | Selected next plan: 30 TB traffic, €0.100/h - €62.49/mo. Value assumes similar load. | [#21](https://github.com/torrust/torrust-tracker-demo/issues/21) | +| Date (UTC) | Change type | Server plan | vCPU | RAM | HTTP1 req/s | UDP1 req/s | Total req/s | Req/s per vCPU | UDP newTrackon uptime | Notes | Related | +| ---------- | --------------------- | ----------- | ---- | ----- | ----------- | ---------- | ----------- | -------------- | --------------------- | ----------------------------------------------------------------------------------------------- | ---------------------------------------------------------------- | +| 2026-04-13 | Baseline (pre-resize) | CCX23 | 4 | 16 GB | ~1350 | ~1507 | ~2857 | ~714 | 92.20% | Baseline from Prometheus 5m rate snapshot at 2026-04-13T15:27:46Z. Capacity pressure suspected. | [#19](https://github.com/torrust/torrust-tracker-demo/issues/19) | +| 2026-04-13 | Planned target resize | CCX33 | 8 | 32 GB | ~1350 | ~1507 | ~2857 | ~357 | 92.20% | Selected next plan: 30 TB traffic, €0.100/h - €62.49/mo. Assumes similar load after resize. | [#21](https://github.com/torrust/torrust-tracker-demo/issues/21) | ## Decision Criteria (Suggested) @@ -39,5 +39,7 @@ investigations (especially for UDP uptime on newTrackon). 1. Track UDP uptime daily for at least 7 days. 2. Re-check host load and UDP receive buffer errors. + For conntrack-specific diagnosis and remediation, use + [udp-conntrack-runbook.md](udp-conntrack-runbook.md). 3. Compare tracker error/aborted counters before vs after resize. 4. Record final conclusion in this file and in the related issue. diff --git a/docs/infrastructure.md b/docs/infrastructure.md index 733bd8e..76423f8 100644 --- a/docs/infrastructure.md +++ b/docs/infrastructure.md @@ -6,6 +6,8 @@ For raw command outputs (`ip addr`, `df -h`, etc.) see [infrastructure-raw-outputs.md](infrastructure-raw-outputs.md). For server resize and observed request-rate history see [infrastructure-resize-history.md](infrastructure-resize-history.md). +For UDP packet-loss diagnosis and conntrack tuning guidance see +[udp-conntrack-runbook.md](udp-conntrack-runbook.md). 
## Server diff --git a/docs/issues/ISSUE-21-scale-up-server-for-udp-uptime.md b/docs/issues/ISSUE-21-scale-up-server-for-udp-uptime.md index 4abd632..b31fa7b 100644 --- a/docs/issues/ISSUE-21-scale-up-server-for-udp-uptime.md +++ b/docs/issues/ISSUE-21-scale-up-server-for-udp-uptime.md @@ -11,8 +11,8 @@ ## Overview Observed traffic and evidence suggest the current server size (CCX23, 4 vCPU, -16 GB RAM) is likely under pressure for current request volume (roughly -1300 HTTP req/s + 1500 UDP req/s). +16 GB RAM) is likely under pressure for current request volume (about +1350 HTTP req/s + 1507 UDP req/s at the latest baseline snapshot). Current public uptime observed in newTrackon for UDP is below target: @@ -21,6 +21,21 @@ Current public uptime observed in newTrackon for UDP is below target: This issue tracks a controlled resize experiment to determine whether capacity is the main bottleneck and to restore/maintain UDP uptime at or above 99%. +## Current State (2026-04-27) — RESOLVED + +- Resize (CCX23 -> CCX33) complete and stable. +- Conntrack overflow root cause identified and fixed on 2026-04-20 + (`nf_conntrack_max` 262144 → 1048576, UDP timeouts reduced, module pre-load + added). +- 7-day post-fix observation window complete. +- newTrackon rolling UDP uptime reached **99.9%** — above the 99.0% target. + +Outcome: **Success**. See +[03-pre-post-comparison.md](evidence/ISSUE-21/03-pre-post-comparison.md) for +the final decision record. Permanent follow-up documentation now lives in +[udp-conntrack-runbook.md](../../udp-conntrack-runbook.md), with a reusable +workspace skill at `.github/skills/check-udp-conntrack/skill.md`. + ## Goal Increase UDP tracker uptime to at least 99.0% over a rolling 7-day window while @@ -28,15 +43,17 @@ keeping service behavior stable. ## Current Throughput Baseline (Pre-Resize) -Observed request rates (Grafana, recent 3h window): +Observed request rates at baseline snapshot (`2026-04-13T15:27:46Z`): + +- Source: Prometheus instant query using 5-minute rate windows -- HTTP1: ~1300 req/s -- UDP1: ~1500 req/s -- Combined: ~2800 req/s +- HTTP1: ~1350 req/s +- UDP1: ~1507 req/s +- Combined: ~2857 req/s On the current CCX23 (4 vCPU), this is approximately: -- ~700 req/s per vCPU (combined) +- ~714 req/s per vCPU (combined) This baseline must be preserved in the resize history so future sizing decisions can be based on both absolute load and normalized load per vCPU. @@ -98,12 +115,12 @@ The next available option selected for this experiment is: ## Acceptance Criteria -- [ ] Resize executed and documented in resize history. -- [ ] No critical service regression immediately after resize. -- [ ] At least 7 days of post-resize observations recorded. -- [ ] UDP newTrackon uptime reaches and stays >= 99.0% during evaluation window. -- [ ] Pre/post comparison documented with clear conclusion. -- [ ] Resize workflow skill added and referenced. +- [x] Resize executed and documented in resize history. +- [x] No critical service regression immediately after resize. +- [x] At least 7 days of post-resize observations recorded. +- [x] UDP newTrackon uptime reaches and stays >= 99.0% during evaluation window. +- [x] Pre/post comparison documented with clear conclusion. +- [x] Resize workflow skill added and referenced. 
## Possible Outcomes diff --git a/docs/issues/evidence/ISSUE-21/00-pre-resize-baseline.md b/docs/issues/evidence/ISSUE-21/00-pre-resize-baseline.md index 665730a..690cc5b 100644 --- a/docs/issues/evidence/ISSUE-21/00-pre-resize-baseline.md +++ b/docs/issues/evidence/ISSUE-21/00-pre-resize-baseline.md @@ -8,27 +8,37 @@ Capture baseline immediately before resizing from CCX23 to CCX33. ## Snapshot -- Date (UTC): +- Date (UTC): 2026-04-13T15:27:46Z - Server plan: CCX23 - vCPU / RAM: 4 / 16 GB - Traffic allowance: 20 TB ## Load and Uptime Baseline -- HTTP1 req/s (Grafana, 3h window): -- UDP1 req/s (Grafana, 3h window): -- Total req/s: -- Req/s per vCPU: -- UDP newTrackon uptime (%): +- HTTP1 req/s (Prometheus `rate(...[5m])`): ~1350.05 +- UDP1 req/s (Prometheus `rate(...[5m])`): ~1507.10 +- Total req/s: ~2857.15 +- Req/s per vCPU: ~714.29 +- UDP newTrackon uptime (%): 92.20% ## Reliability and Capacity Signals -- `udp_tracker_server_errors_total` (window/increase): -- `udp_tracker_server_requests_aborted_total` (window/increase): -- `udp_tracker_server_responses_sent_total{result="error"}` (window/increase): -- Host load average (1m/5m/15m): -- UDP receive buffer errors (`UdpRcvbufErrors`, `Udp6RcvbufErrors`): +- `udp_tracker_server_errors_total` (1h/increase): ~52983.82 +- `udp_tracker_server_requests_aborted_total` (1h/increase): ~283.18 +- `udp_tracker_server_responses_sent_total{result="error"}` (1h/increase): ~52983.82 +- Host load average (1m/5m/15m): 6.57 / 6.54 / 6.66 +- UDP receive buffer errors (`UdpRcvbufErrors`, `Udp6RcvbufErrors`): 18444 / 494 ## Notes - Keep command list and links to raw exported artifacts in `data/`. +- Prometheus query method used (`http_rps_5m`): + `sum(rate(http_tracker_core_requests_received_total{server_binding_protocol="http",server_binding_port="7070"}[5m]))` +- Prometheus query method used (`udp_rps_5m`): + `sum(rate(udp_tracker_server_requests_received_total{server_binding_protocol="udp",server_binding_port="6969"}[5m]))` +- Prometheus query method used (`udp_errors_1h`): + `sum(increase(udp_tracker_server_errors_total{server_binding_protocol="udp",server_binding_port="6969"}[1h]))` +- Prometheus query method used (`udp_aborted_1h`): + `sum(increase(udp_tracker_server_requests_aborted_total{server_binding_protocol="udp",server_binding_port="6969"}[1h]))` +- Prometheus query method used (`udp_error_responses_1h`): + `sum(increase(udp_tracker_server_responses_sent_total{server_binding_protocol="udp",server_binding_port="6969",result="error"}[1h]))` diff --git a/docs/issues/evidence/ISSUE-21/01-resize-execution.md b/docs/issues/evidence/ISSUE-21/01-resize-execution.md index 0e166e4..445c154 100644 --- a/docs/issues/evidence/ISSUE-21/01-resize-execution.md +++ b/docs/issues/evidence/ISSUE-21/01-resize-execution.md @@ -1,5 +1,7 @@ # Resize Execution Log + + ## Planned Change - From: CCX23 (4 vCPU, 16 GB RAM, 20 TB) @@ -8,27 +10,196 @@ ## Execution Checklist -- [ ] Resize action executed in provider panel -- [ ] Server reachable by SSH after resize -- [ ] `docker compose ps` healthy -- [ ] HTTP endpoint reachable -- [ ] UDP endpoint reachable -- [ ] Prometheus targets up -- [ ] Grafana accessible +- [x] Graceful service shutdown completed via `docker compose down` +- [x] Resize action executed in provider panel +- [x] Server reachable by SSH after resize +- [x] `docker compose ps` healthy +- [x] HTTP endpoint reachable +- [x] UDP endpoint reachable +- [x] Prometheus targets up +- [x] Grafana accessible + +## Pre-Resize Safety Checks + +- [ ] Confirm latest 
baseline file is complete: + `docs/issues/evidence/ISSUE-21/00-pre-resize-baseline.md` +- [ ] Confirm branch is clean and pushed. +- [ ] Confirm backup window awareness (nightly restart at ~03:00 UTC). +- [ ] Confirm maintenance window and operator availability. + +## Provider Action (Hetzner) + +1. Open server in Hetzner Cloud panel. +2. Resize from **CCX23** to **CCX33**. +3. Wait for resize operation to report complete. +4. Reconnect via SSH and run post-resize checks below. + +## Post-Resize Command Checklist + +Run from local machine: + +```bash +ssh demotracker 'set -e; echo "=== now ==="; date -u; echo "=== cpu_mem ==="; nproc; free -h; echo "=== uptime ==="; uptime; echo "=== docker ==="; cd /opt/torrust && docker compose ps' +``` + +```bash +ssh demotracker 'set -e; cd /opt/torrust; echo "=== docker_stats ==="; docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}"' +``` + +```bash +ssh demotracker 'set -e; echo "=== udp_buffers ==="; grep "^Udp:" /proc/net/snmp; nstat -az 2>/dev/null | grep -Ei "UdpRcvbufErrors|Udp6RcvbufErrors" || true' +``` + +```bash +ssh demotracker 'set -e; q(){ expr="$1"; echo "--- $expr"; curl -sG "http://127.0.0.1:9090/api/v1/query" --data-urlencode "query=$expr"; echo; }; q "up{job=\"tracker_metrics\"}"; q "up{job=\"tracker_stats\"}"' +``` + +## Endpoint Sanity Checks + +- HTTP tracker health: `curl -fsS "https://http1.torrust-tracker-demo.com/health_check"` +- Tracker API health: `curl -fsS "https://api.torrust-tracker-demo.com/health_check"` +- UDP quick sanity (optional): use existing tracker client tooling and store output under `data/`. ## Timeline -- Start (UTC): -- End (UTC): -- Total impact window: +- Start (UTC): 2026-04-13T15:36:51Z +- End (UTC): 2026-04-13T15:44:07Z +- Total impact window: ~7m16s (shutdown + provider resize + startup + validation) + +## Pre-Poweroff Graceful Shutdown Log + +Command executed from local machine: + +```bash +ssh demotracker 'set -e; echo "=== resize-prep-start-utc ==="; date -u +%Y-%m-%dT%H:%M:%SZ; cd /opt/torrust; echo "=== docker-compose-ps-before ==="; docker compose ps; echo "=== docker-compose-down ==="; docker compose down; echo "=== docker-compose-ps-after ==="; docker compose ps; echo "=== resize-prep-end-utc ==="; date -u +%Y-%m-%dT%H:%M:%SZ' +``` + +Captured output: + +```text +=== resize-prep-start-utc === +2026-04-13T15:36:51Z +=== docker-compose-ps-before === +NAME IMAGE COMMAND SERVICE CREATED STATUS PORTS +caddy caddy:2.10.2 "caddy run --config …" caddy 4 hours ago Up 4 hours (healthy) 0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp, 0.0.0.0:443->443/udp, :::443->443/udp, 2019/tcp +grafana grafana/grafana:12.4.2 "/run.sh" grafana 4 hours ago Up 4 hours (healthy) 3000/tcp +mysql mysql:8.4 "docker-entrypoint.s…" mysql 4 hours ago Up 4 hours (healthy) 3306/tcp, 33060/tcp +prometheus prom/prometheus:v3.5.1 "/bin/prometheus --c…" prometheus 4 hours ago Up 4 hours (healthy) 127.0.0.1:9090->9090/tcp +tracker torrust/tracker:develop "/usr/local/bin/entr…" tracker 4 hours ago Up 4 hours (healthy) 1212/tcp, 0.0.0.0:6868->6868/udp, :::6868->6868/udp, 1313/tcp, 7070/tcp, 0.0.0.0:6969->6969/udp, :::6969->6969/udp +=== docker-compose-down === +Container grafana Stopping +Container caddy Stopping +Container grafana Stopped +Container grafana Removing +Container grafana Removed +Container prometheus Stopping +Container prometheus Stopped +Container prometheus Removing +Container prometheus Removed +Container tracker Stopping +Container caddy Stopped 
+Container caddy Removing +Container caddy Removed +Container tracker Stopped +Container tracker Removing +Container tracker Removed +Container mysql Stopping +Container mysql Stopped +Container mysql Removing +Container mysql Removed +Network torrust_proxy_network Removing +Network torrust_database_network Removing +Network torrust_visualization_network Removing +Network torrust_metrics_network Removing +Network torrust_visualization_network Removed +Network torrust_database_network Removed +Network torrust_metrics_network Removed +Network torrust_proxy_network Removed +=== docker-compose-ps-after === +NAME IMAGE COMMAND SERVICE CREATED STATUS PORTS +=== resize-prep-end-utc === +2026-04-13T15:37:11Z +``` ## Immediate Post-Resize Snapshot -- `uptime`: -- `free -h`: -- `docker stats --no-stream` summary: +- `nproc`: 8 +- `uptime`: `15:40:20 up 0 min, 1 user, load average: 0.20, 0.06, 0.02` +- `free -h`: `Mem total 30Gi, used 673Mi, available 29Gi` +- `docker compose ps`: all services healthy after startup (caddy, grafana, mysql, + prometheus, tracker) +- `docker stats --no-stream` summary (initial warm-up snapshot): + - `caddy`: high transient CPU during startup (`603.22%`), memory `3.092GiB` + - `tracker`: `153.34%` CPU, memory `364.2MiB` + - `mysql`: `46.54%` CPU, memory `553.2MiB` + - `grafana`: `40.58%` CPU, memory `257.4MiB` + - `prometheus`: `0.06%` CPU, memory `85.14MiB` - Any regressions observed: + - HTTP1 health endpoint returned `200` with `{"status":"Ok"}`. + - Grafana root returned `302` redirect to `/login` (expected behavior). + - UDP public port probe succeeded on `udp1:6969`. + - API health endpoint returned `500 unauthorized` (same check path appears to + require authorization token; not treated as resize failure). + - Prometheus targets `up{job="tracker_metrics"}` and `up{job="tracker_stats"}` both `1`. + - UDP receive buffer error counters immediately after restart were `0` for both + `UdpRcvbufErrors` and `Udp6RcvbufErrors`. + +## Rollback Criteria (Operational) + +- Server becomes unstable after resize. +- Core services fail to become healthy. +- External endpoints unavailable for prolonged window. + +If rollback is required, document reason and exact time window here. 
+ +## Post-Resize Validation Commands and Key Outputs + +Command (host recovery and internal checks): + +```bash +ssh demotracker 'set -e; echo "=== post-resize-start-utc ==="; date -u +%Y-%m-%dT%H:%M:%SZ; echo "=== host-size-check ==="; nproc; free -h; uptime; echo "=== start-stack ==="; cd /opt/torrust; docker compose up -d; echo "=== docker-compose-ps ==="; docker compose ps; echo "=== docker-stats-no-stream ==="; docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}"; echo "=== health-http1 ==="; curl -fsS "https://http1.torrust-tracker-demo.com/health_check"; echo; echo "=== health-api ==="; curl -fsS "https://api.torrust-tracker-demo.com/health_check"; echo; echo "=== prometheus-targets-up ==="; curl -sG "http://127.0.0.1:9090/api/v1/query" --data-urlencode "query=up{job=\"tracker_metrics\"}"; echo; curl -sG "http://127.0.0.1:9090/api/v1/query" --data-urlencode "query=up{job=\"tracker_stats\"}"; echo; echo "=== udp-buffer-counters ==="; grep "^Udp:" /proc/net/snmp; nstat -az 2>/dev/null | grep -Ei "UdpRcvbufErrors|Udp6RcvbufErrors" || true; echo "=== post-resize-end-utc ==="; date -u +%Y-%m-%dT%H:%M:%SZ' +``` + +Key outputs: + +- `post-resize-start-utc`: `2026-04-13T15:40:20Z` +- `nproc`: `8` +- `free -h` total memory: `30Gi` +- `docker compose ps`: all services `healthy` +- `health-http1`: `200` with `{"status":"Ok"}` +- `health-api`: initial check failed (`502`, then `500 unauthorized`) + +Follow-up command (service stabilization and counters): + +```bash +ssh demotracker 'echo "=== followup-check-utc ==="; date -u +%Y-%m-%dT%H:%M:%SZ; cd /opt/torrust; echo "=== docker-compose-ps ==="; docker compose ps; echo "=== api-health-retries ==="; for i in 1 2 3 4 5; do code=$(curl -s -o /tmp/api_health.out -w "%{http_code}" "https://api.torrust-tracker-demo.com/health_check" || true); echo "try_$i status=$code body=$(cat /tmp/api_health.out 2>/dev/null || true)"; [[ "$code" == "200" ]] && break; sleep 2; done; echo "=== prometheus-target-up ==="; curl -sG "http://127.0.0.1:9090/api/v1/query" --data-urlencode "query=up{job=\"tracker_metrics\"}"; echo; curl -sG "http://127.0.0.1:9090/api/v1/query" --data-urlencode "query=up{job=\"tracker_stats\"}"; echo; echo "=== udp-buffer-counters ==="; grep "^Udp:" /proc/net/snmp; nstat -az 2>/dev/null | grep -Ei "UdpRcvbufErrors|Udp6RcvbufErrors" || true' +``` + +Key outputs: + +- `followup-check-utc`: `2026-04-13T15:42:10Z` +- `up{job="tracker_metrics"}`: `1` +- `up{job="tracker_stats"}`: `1` +- `UdpRcvbufErrors`: `0` +- `Udp6RcvbufErrors`: `0` + +External sanity checks: + +```bash +curl -s -o /tmp/http1.out -w "%{http_code}" "https://http1.torrust-tracker-demo.com/health_check" +curl -s -o /tmp/grafana.out -w "%{http_code}" "https://grafana.torrust-tracker-demo.com/" +nc -zvu -w2 udp1.torrust-tracker-demo.com 6969 +``` + +Key outputs: + +- HTTP1 health: `200` +- Grafana root: `302` (`/login` redirect) +- UDP probe: `succeeded` ## Notes - Include exact commands and short outputs (or link to files under `data/`). +- Keep this file chronological and append-only during execution. +- Shutdown duration before poweroff: ~20 seconds. +- User-reported provider resize duration: ~1.5 minutes. 
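
For future executions of this checklist, the external sanity checks above can be wrapped into a small pass/fail script like the sketch below. It is run from the local machine and assumes the same public hostnames and expected status codes recorded earlier in this log (200 for the HTTP tracker health check, 302 for the Grafana login redirect).

```bash
#!/usr/bin/env bash
# Sketch: one-shot external validation after a resize.
# Assumes the public hostnames used elsewhere in this log.
set -u

check_http() {
  local name="$1" url="$2" expected="$3"
  local code
  code=$(curl -s -o /dev/null -w "%{http_code}" "$url")
  if [ "$code" = "$expected" ]; then
    echo "PASS $name ($code)"
  else
    echo "FAIL $name (got $code, expected $expected)"
  fi
}

check_http "http1 health" "https://http1.torrust-tracker-demo.com/health_check" "200"
check_http "grafana root" "https://grafana.torrust-tracker-demo.com/" "302"

# UDP probe is best-effort: nc only confirms the port looks open, not tracker correctness.
if nc -zvu -w2 udp1.torrust-tracker-demo.com 6969 >/dev/null 2>&1; then
  echo "PASS udp1 6969 probe"
else
  echo "FAIL udp1 6969 probe"
fi
```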
diff --git a/docs/issues/evidence/ISSUE-21/02-post-resize-daily-checks.md b/docs/issues/evidence/ISSUE-21/02-post-resize-daily-checks.md index 1858dbe..1ef48a6 100644 --- a/docs/issues/evidence/ISSUE-21/02-post-resize-daily-checks.md +++ b/docs/issues/evidence/ISSUE-21/02-post-resize-daily-checks.md @@ -1,13 +1,86 @@ # Post-Resize Daily Checks (7 Days) + + ## Daily Log Template -| Day | Date (UTC) | HTTP1 req/s | UDP1 req/s | Total req/s | Req/s per vCPU | UDP uptime (%) | UDP errors trend | UDP aborted trend | Host load trend | Notes | -| --- | ---------- | ----------- | ---------- | ----------- | -------------- | -------------- | ---------------- | ----------------- | --------------- | ----- | -| D+1 | | | | | | | | | | | -| D+2 | | | | | | | | | | | -| D+3 | | | | | | | | | | | -| D+4 | | | | | | | | | | | -| D+5 | | | | | | | | | | | -| D+6 | | | | | | | | | | | -| D+7 | | | | | | | | | | | +| Day | Date (UTC) | HTTP1 req/s | UDP1 req/s | Total req/s | Req/s per vCPU | UDP uptime (%) | UDP errors trend | UDP aborted trend | Host load trend | Notes | +| --- | ---------- | ------------ | ----------- | ----------- | -------------- | -------------- | ---------------- | ----------------- | --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------- | +| D+1 | 2026-04-20 | ~1564 | ~1015 | ~2579 | ~322 | 83.9% | ~37k/h (pre-fix) | 0 | 6.05/5.49/4.80 | conntrack table full (262144/262144); fixed: nf_conntrack_max→1048576, UDP timeouts reduced; also includes planned resize downtime on 2026-04-14 | +| D+2 | 2026-04-21 | | | | | 85.70% | | | | Rolling uptime still low, but recent [newTrackon raw](https://newtrackon.com/raw) probes are currently successful; likely lag from prior failures | +| D+3 | 2026-04-22 | | | | | | | | | Uptime recovering post-fix; rolling window still catching up | +| D+4 | 2026-04-23 | | | | | | | | | Uptime recovering post-fix; rolling window still catching up | +| D+5 | 2026-04-24 | | | | | | | | | Uptime recovering post-fix; rolling window still catching up | +| D+6 | 2026-04-25 | | | | | | | | | Uptime recovering post-fix; rolling window still catching up | +| D+7 | 2026-04-27 | ~2000 (peak) | ~750 (peak) | | | 99.9% | | | | Target met: 99.9% >= 99.0%; 7-day window complete; issue resolved; peak req/s across 7-day window: HTTP1 ~2000, UDP1 ~750 | + +## D+7 newTrackon Snapshot (2026-04-27) + +Source: newTrackon live tracker table captured 2026-04-27. + +| Tracker URL | Uptime | Status | Checked | +| ----------------------------------------------------- | ------ | ------------------- | -------------- | +| `https://http1.torrust-tracker-demo.com:443/announce` | 99.90% | Working for 2 days | 7 minutes ago | +| `udp://udp1.torrust-tracker-demo.com:6969/announce` | 99.90% | Working for 6 hours | 10 minutes ago | + +Both trackers above the 99.0% target. 7-day observation window complete. +Issue resolved as **Success**. + +## D+7 Live Verification Snapshot (2026-04-27) + +Checked immediately before merging PR #22 to confirm conntrack is healthy at +peak traffic (~750 UDP req/s, ~2000 HTTP req/s). 
+ +Command run: + +```bash +ssh demotracker ' + echo "=== conntrack counts ===" && + sudo sysctl net.netfilter.nf_conntrack_max net.netfilter.nf_conntrack_count && + echo "=== UDP timeouts ===" && + sudo sysctl net.netfilter.nf_conntrack_udp_timeout net.netfilter.nf_conntrack_udp_timeout_stream && + echo "=== dmesg table full ===" && + sudo dmesg -T | grep -i "nf_conntrack: table full" | tail -10 && + echo "(no output = no table-full events)" && + echo "=== UDP receive errors ===" && + cat /proc/net/snmp | grep -E "^Udp:" | + awk "NR==1{for(i=1;i<=NF;i++) h[i]=\$i} NR==2{for(i=1;i<=NF;i++) print h[i]\": \"\$i}" | + grep -E "RcvbufErrors|InErrors|NoPorts" && + echo "=== UDP6 receive errors ===" && + cat /proc/net/snmp6 | grep -E "Udp6RcvbufErrors|Udp6InErrors|Udp6NoPorts" +' +``` + +Results: + +- `nf_conntrack_max`: `1048576` +- `nf_conntrack_count`: `341652` (`32.59%` of max) +- `nf_conntrack_udp_timeout`: `10` +- `nf_conntrack_udp_timeout_stream`: `15` +- `dmesg` table-full events: none +- `UdpRcvbufErrors` (IPv4): `0` +- `UdpInErrors` (IPv4): `0` +- `UdpNoPorts` (IPv4): `57519` — benign; probes to closed ports, not tracker drops +- `Udp6RcvbufErrors` (IPv6): `56` — negligible cumulative counter since boot +- `Udp6InErrors` (IPv6): `56` +- `Udp6NoPorts` (IPv6): `26183` — benign; same as above + +Interpretation: conntrack table is at 32.6% utilization. No table-full events +in dmesg. No IPv4 UDP receive-buffer drops. The 56 IPv6 errors are a cumulative +boot-time counter at ~750 req/s peak and are statistically insignificant. +Conntrack is not overflowing; safe to merge. + +## D+2 Live Verification Snapshot (2026-04-21T07:23:08Z) + +- Host check command source: `ssh demotracker` runtime validation +- `nf_conntrack_max`: `1048576` +- `nf_conntrack_count`: `331258` (`31.59%` of max) +- `nf_conntrack_udp_timeout_stream`: `15` +- `nf_conntrack_udp_timeout`: `10` +- `UdpRcvbufErrors`: `0` +- `Udp6RcvbufErrors`: `0` +- `dmesg` check (`sudo -n dmesg -T | grep -i "nf_conntrack: table full" | tail -10`): no recent matches + +Interpretation: the configured conntrack sizing and UDP timeouts remain active +on the live host, and there is no current evidence of UDP packet drops caused +by conntrack table saturation. diff --git a/docs/issues/evidence/ISSUE-21/03-pre-post-comparison.md b/docs/issues/evidence/ISSUE-21/03-pre-post-comparison.md index ed44db7..b8d267e 100644 --- a/docs/issues/evidence/ISSUE-21/03-pre-post-comparison.md +++ b/docs/issues/evidence/ISSUE-21/03-pre-post-comparison.md @@ -7,25 +7,32 @@ and reduced sustained reliability pressure. 
## Summary Table -| Metric | Pre-resize | Post-resize | Change | Interpretation | -| --------------------- | ---------- | ----------- | ------ | -------------- | -| HTTP1 req/s | | | | | -| UDP1 req/s | | | | | -| Total req/s | | | | | -| Req/s per vCPU | | | | | -| UDP newTrackon uptime | | | | | -| UDP errors | | | | | -| UDP aborted | | | | | -| Host load | | | | | +| Metric | Pre-resize (CCX23) | Post-resize D+1 (CCX33) | Change | Interpretation | +| --------------------- | ------------------ | ----------------------- | ------- | ----------------------------------------------------------------------------------- | +| HTTP1 req/s | ~1350 | ~1564 | +16% | Traffic grew during observation gap | +| UDP1 req/s | ~1507 | ~1015 | -33% | Traffic lower on D+1; conntrack overflow may have been suppressing visible count | +| Total req/s | ~2857 | ~2579 | -10% | Overall lower on D+1 | +| Req/s per vCPU | ~714 (4 vCPU) | ~322 (8 vCPU) | -55% | Significant headroom gained from resize | +| UDP newTrackon uptime | 92.20% | 83.9% (D+1, pre-fix) | -8.3 pp | Degraded — resize alone was insufficient; conntrack overflow was actual bottleneck | +| UDP errors | ~52984/h | ~37474/h (pre-fix) | -29% | Lower but still high; dropped after conntrack fix applied | +| UDP aborted | ~283/h | 0 | -100% | Gone after resize | +| Host load | 6.57/6.54/6.66 | 6.05/5.49/4.80 | Lower | Load spread over 8 vCPUs vs 4; normalized load dropped from ~1.65 to ~0.76 per vCPU | ## Decision -- [ ] Success: target met and sustained -- [ ] Partial: improved but below target +- [x] Success: target met and sustained +- [ ] Partial: improved but below target — resize alone was insufficient; conntrack overflow was the actual bottleneck - [ ] No improvement: continue with next bottleneck path +**Status (2026-04-27):** 7-day observation window complete. UDP uptime on newTrackon +reached **99.9%** — above the 99.0% target. The conntrack fix applied on D+1 +(2026-04-20) was the decisive change. The resize from CCX23 → CCX33 was a +necessary supporting step (halved normalized CPU load), but insufficient alone. +Issue resolved. + ## Follow-up Actions -1. -2. -3. +1. ~~Monitor D+2 through D+7 UDP uptime on newTrackon to confirm fix holds.~~ Done: 99.9% confirmed on 2026-04-27. +2. ~~Verify conntrack fix survives a server reboot (module pre-load + sysctl applied).~~ Done: settings verified live on 2026-04-21. +3. ~~If uptime >= 99.0% by D+7 close issue as resolved.~~ Done: issue resolved. +4. ~~Document in post-mortem if UDP uptime does not recover after fix.~~ N/A: uptime recovered. diff --git a/docs/udp-conntrack-runbook.md b/docs/udp-conntrack-runbook.md new file mode 100644 index 0000000..5b27dec --- /dev/null +++ b/docs/udp-conntrack-runbook.md @@ -0,0 +1,338 @@ + + +# UDP Conntrack Runbook + +Operational guide for detecting, fixing, and explaining UDP packet loss caused +by conntrack saturation or related kernel-side packet-path pressure. + +This runbook exists for reuse beyond issue-specific evidence. For the incident +that led to the current tuning, see +[ISSUE-21](issues/ISSUE-21-scale-up-server-for-udp-uptime.md) and the evidence +under `docs/issues/evidence/ISSUE-21/`. 
+ +## When To Use This Runbook + +Use this runbook when one or more of these symptoms appear: + +- newTrackon or other external probes show intermittent UDP timeouts +- UDP uptime drops while HTTP stays healthy +- UDP request volume is high and Docker DNAT is in the packet path +- `nf_conntrack` may be full or close to full +- Host load looks odd relative to per-CPU usage and packet drops are suspected + +## How To Detect The Problem + +### External Symptoms + +Common user-visible symptoms: + +- External UDP probes alternate between working and timing out +- Failures self-recover without a deploy or restart +- HTTP tracker remains mostly healthy while UDP uptime degrades +- Rolling uptime remains low for hours even after recent successful probes + +### Host Checks + +Run this on the live host: + +```bash +ssh demotracker ' + echo "=== conntrack counts ===" && + sudo sysctl net.netfilter.nf_conntrack_max net.netfilter.nf_conntrack_count && + echo "=== UDP timeouts ===" && + sudo sysctl net.netfilter.nf_conntrack_udp_timeout \ + net.netfilter.nf_conntrack_udp_timeout_stream && + echo "=== dmesg table full ===" && + sudo dmesg -T | grep -i "nf_conntrack: table full" | tail -10 && + echo "(no output = no table-full events)" && + echo "=== UDP receive errors ===" && + cat /proc/net/snmp | grep -E "^Udp:" | + awk "NR==1{for(i=1;i<=NF;i++) h[i]=\$i} NR==2{for(i=1;i<=NF;i++) print h[i]\": \"\$i}" | + grep -E "RcvbufErrors|InErrors|NoPorts" && + echo "=== UDP6 receive errors ===" && + cat /proc/net/snmp6 | grep -E "Udp6RcvbufErrors|Udp6InErrors|Udp6NoPorts" +' +``` + +Interpret the output like this: + +- `nf_conntrack_count == nf_conntrack_max`: immediate problem; table is full +- `dmesg` contains `nf_conntrack: table full, dropping packet`: confirmed drops +- `UdpRcvbufErrors > 0` or `Udp6RcvbufErrors > 0`: receive-buffer drops exist +- `UdpNoPorts` or `Udp6NoPorts`: usually benign; probes to closed ports, not the tracker itself + +### Optional Load Distribution Check + +Use this when load average looks high but per-process CPU usage does not explain +it clearly: + +```bash +ssh demotracker ' + uptime && + nproc && + mpstat -P ALL 1 1 2>/dev/null || echo "mpstat not available" && + ps -eo pid,comm,%cpu,%mem,stat --sort=-%cpu | head -15 && + vmstat 1 3 +' +``` + +Interpretation: + +- high `%soft` on one CPU means kernel packet handling is concentrated there +- this points to softirq/RX steering imbalance, not necessarily tracker code problems +- this is a separate bottleneck from conntrack table saturation + +## How To Fix It + +### Immediate Live Fix + +Apply the kernel tuning live: + +```bash +ssh demotracker ' + sudo sysctl -w net.netfilter.nf_conntrack_max=1048576 && + sudo sysctl -w net.netfilter.nf_conntrack_udp_timeout=10 && + sudo sysctl -w net.netfilter.nf_conntrack_udp_timeout_stream=15 +' +``` + +### Persist The Fix In This Repository + +The persistent configuration lives in: + +- `server/etc/sysctl.d/99-conntrack.conf` +- `server/etc/modules-load.d/conntrack.conf` + +Why both files matter: + +- `99-conntrack.conf` stores the kernel parameter values +- `conntrack.conf` preloads the `nf_conntrack` module at boot +- without preloading, the `net.netfilter.*` keys may not exist yet when systemd applies sysctl files, so the values can be skipped after reboot + +Current tuned values used by this repository: + +| Key | Value | +| ----------------------------------------------- | --------- | +| `net.netfilter.nf_conntrack_max` | `1048576` | +| `net.netfilter.nf_conntrack_udp_timeout` | `10` | +| 
`net.netfilter.nf_conntrack_udp_timeout_stream` | `15` | + +### Validate After The Change + +Re-run the detection command above and confirm all of these: + +- `nf_conntrack_count` is well below `nf_conntrack_max` +- no fresh `table full` messages appear in `dmesg` +- `UdpRcvbufErrors` and `Udp6RcvbufErrors` are stable or zero +- external UDP probes recover and remain healthy for multiple hours or days + +## Why This Works + +### Packet Path + +For the deployed tracker, the UDP receive path is approximately: + +```text +NIC -> kernel RX interrupt -> softirq/ksoftirqd -> conntrack + Docker DNAT -> socket buffer -> tracker recv loop -> spawned async task +``` + +The important point is that conntrack lookup and DNAT happen in the kernel +before the tracker reads the packet from the socket. + +### Failure Mechanism + +With Docker in the packet path, each UDP packet can create or refresh a +conntrack entry. + +If all of these are true at the same time: + +- request rate is high +- `nf_conntrack_max` is too small +- UDP entry timeouts are too long + +then the steady-state number of tracked UDP flows grows until the table is full. +Once full, the kernel drops new packets before the tracker can read them. + +### Why Increasing `nf_conntrack_max` Helps + +Increasing `nf_conntrack_max` raises the ceiling for concurrent tracked flows, +reducing the chance that bursts or sustained load fill the table. + +### Why Reducing UDP Timeouts Helps + +Reducing `nf_conntrack_udp_timeout` and +`nf_conntrack_udp_timeout_stream` shortens how long old UDP entries stay in the +table. + +That reduces steady-state occupancy, which is often more important than raw CPU +capacity for this failure mode. + +### Why The Tracker Code Is Not The Root Cause + +The tracker's UDP loop reads packets after the kernel has already: + +- handled the RX interrupt/softirq work +- performed conntrack lookup +- applied Docker NAT rules +- copied the packet into the socket receive buffer + +If packets are being dropped because the conntrack table is full, the tracker +never sees them. + +## Separate Future Tuning: RPS/RFS + +RPS and RFS are not part of the current deployed fix. They address a different +bottleneck: one CPU being saturated by kernel softirq work while other CPUs sit +idle. They solve a different problem from conntrack table saturation. + +### How To Detect The Need For RPS/RFS + +Run the load distribution check: + +```bash +ssh demotracker ' + uptime && + nproc && + mpstat -P ALL 1 1 && + vmstat 1 3 +' +``` + +Signals that RPS/RFS may help: + +- one CPU shows `%soft` near or above 80–90% while other CPUs have significant + `%idle` +- `vmstat` shows very high interrupt counts (`in` column) and many context + switches (`cs` column) +- `ps` shows `ksoftirqd/` for a single CPU near the top of CPU consumers + +This pattern was first observed on this server on 2026-04-27: + +```text +CPU2: %usr=4.76 %sys=4.76 %soft=80.95 %idle=9.52 +``` + +All other CPUs were at approximately 44% idle at the same time. + +### Why It Happens + +The Hetzner VM's virtio-net NIC has a single RX queue. Linux assigns that +queue's hardware interrupt to one CPU. All softirq processing for every incoming +packet — UDP and TCP — flows through that one core. + +The softirq work includes: + +- NIC DMA and descriptor processing +- IP and UDP checksums +- conntrack lookup and Docker DNAT +- socket demux and receive-buffer copy + +Until one of the RX-steering features is enabled, the kernel has no way to +distribute this work. 
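
Before attributing the concentration to a single RX queue, it can be confirmed directly on the host. The sketch below assumes the interface is named `eth0` (as in the RPS commands later in this runbook) and a virtio NIC; adjust the grep pattern if the interrupt names differ on the actual host.

```bash
ssh demotracker '
  echo "=== rx queues (one rx-N directory per hardware RX queue) ==="
  ls /sys/class/net/eth0/queues/
  echo "=== virtio input interrupts (columns show per-CPU counts) ==="
  grep -i "virtio.*input" /proc/interrupts || true
  echo "=== NET_RX softirq counts per CPU ==="
  grep "NET_RX" /proc/softirqs
'
```

If `queues/` shows only `rx-0` and a single CPU column accumulates nearly all of the virtio input interrupts and `NET_RX` counts, the single-core concentration described above is confirmed.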
+ +### How To Estimate Whether This Is A Real Bottleneck + +At current peak UDP traffic of ~750 req/s and HTTP of ~2000 req/s, the softirq +CPU (CPU2) was at about 81%. Saturation would occur if that figure approaches +100% consistently. + +A rough rule of thumb: if total req/s grows by roughly 2.5× from the 2026-04-27 +baseline (~2750 req/s combined) without any RPS/RFS tuning, CPU2 may saturate +and become the next source of packet loss. + +### How To Fix It — RPS + +RPS tells the kernel to re-queue softirq processing for each packet onto a +different CPU, chosen by hashing the packet's 4-tuple (src IP, src port, dst +IP, dst port). + +Check the NIC and queue name first: + +```bash +ssh demotracker 'ls /sys/class/net/' +ssh demotracker 'ls /sys/class/net/eth0/queues/' +``` + +Enable RPS across all 8 CPUs: + +```bash +ssh demotracker 'echo ff | sudo tee /sys/class/net/eth0/queues/rx-0/rps_cpus' +``` + +The value `ff` is a bitmask: `0xff` = all 8 CPUs. Adjust for the actual CPU +count if the server is resized. + +### How To Fix It — RFS + +RFS extends RPS by tracking which CPU most recently ran the application socket +thread and steers softirq toward that CPU. This reduces cache misses when the +kernel hands the packet to userspace. + +Enable RFS: + +```bash +ssh demotracker ' + sudo sysctl -w net.core.rps_sock_flow_entries=32768 && + echo 4096 | sudo tee /sys/class/net/eth0/queues/rx-0/rps_flow_cnt +' +``` + +### Making RPS/RFS Persistent + +The `/sys/class/net/...` paths do not survive reboot. To persist them, add a +`systemd` service or a `@reboot` cron entry, and record the kernel parameter in +`/etc/sysctl.d/`. + +Example cron entry (`/etc/cron.d/rps`): + +```text +@reboot root echo ff > /sys/class/net/eth0/queues/rx-0/rps_cpus && echo 4096 > /sys/class/net/eth0/queues/rx-0/rps_flow_cnt +``` + +If this is deployed permanently, add the config to: + +- `server/etc/sysctl.d/` for the `net.core.rps_sock_flow_entries` setting +- `server/etc/cron.d/` for the sysfs writes + +### Validate After Enabling + +Re-run `mpstat -P ALL 1 5` and confirm that: + +- `%soft` is spread across multiple CPUs instead of concentrated on one +- the formerly saturated CPU drops below 70–80% +- external UDP uptime remains stable or improves + +### Why RPS/RFS Does Not Break Conntrack + +RPS reorders which CPU handles softirq, but does not bypass conntrack or DNAT. +Each packet still goes through the full kernel stack; it just does so on a +different CPU. The conntrack settings from `99-conntrack.conf` remain in effect +independently. + +### Why This Is Separate From The Conntrack Fix + +Conntrack overflow causes packet **drops** — the kernel silently discards the +packet before it ever enters a socket buffer. RPS/RFS addresses CPU **hotspot** +— one core being too busy to process incoming packets fast enough. Both can +cause UDP timeouts, but the diagnostic signals are different and the fixes do +not overlap. 
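
As a persistence alternative to the `@reboot` cron entry shown in "Making RPS/RFS Persistent" above, a oneshot systemd unit can apply the same sysfs writes at boot once the network device is up. This is a hypothetical sketch, not currently deployed; the interface name (`eth0`), CPU mask, and flow count carry the same assumptions as the cron example.

```ini
# /etc/systemd/system/rps.service (sketch only, not currently deployed)
[Unit]
Description=Enable RPS/RFS on eth0
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/sh -c 'echo ff > /sys/class/net/eth0/queues/rx-0/rps_cpus'
ExecStart=/bin/sh -c 'echo 4096 > /sys/class/net/eth0/queues/rx-0/rps_flow_cnt'

[Install]
WantedBy=multi-user.target
```

Enable it with `sudo systemctl enable --now rps.service`; the `net.core.rps_sock_flow_entries` setting still belongs in `/etc/sysctl.d/` as noted above.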
+ +## Reference Values From The 2026-04-27 Verification + +Recorded before merging PR #22: + +- peak UDP tracker traffic observed over the prior 7 days: about `750 req/s` +- peak HTTP tracker traffic observed over the prior 7 days: about `2000 req/s` +- `nf_conntrack_count`: `341652` +- `nf_conntrack_max`: `1048576` +- utilization: `32.59%` +- `UdpRcvbufErrors`: `0` +- `Udp6RcvbufErrors`: `56` cumulative since boot, not material at observed load + +## Related Files + +- [docs/infrastructure.md](infrastructure.md) +- [docs/infrastructure-resize-history.md](infrastructure-resize-history.md) +- [server/etc/sysctl.d/99-conntrack.conf](../server/etc/sysctl.d/99-conntrack.conf) +- [server/etc/modules-load.d/conntrack.conf](../server/etc/modules-load.d/conntrack.conf) +- [docs/issues/ISSUE-21-scale-up-server-for-udp-uptime.md](issues/ISSUE-21-scale-up-server-for-udp-uptime.md) diff --git a/project-words.txt b/project-words.txt index ce82253..b2c0d8b 100644 --- a/project-words.txt +++ b/project-words.txt @@ -23,9 +23,11 @@ augmentedcode behaviour bencoded bindv6only +bitmask clippy codel conntrack +cpus crontab ctstate demotracker @@ -34,6 +36,7 @@ dockerized drilldown dport dtolnay +hotspot efivarfs efivars ethernets @@ -48,6 +51,7 @@ mkpath Mailgun mysqladmin netnsid +netfilter netplan networkd newtrackon @@ -61,6 +65,7 @@ post-mortems prometheus pyroscope datagrams +demux HSTS nosniff parseable @@ -68,10 +73,13 @@ qdisc qlen repomix rgba +runbook rustfmt shellcheck +snmp sourceable signup +sysfs tcpdump tera timepicker @@ -80,4 +88,6 @@ torrust tulpn ulnp userland +userspace veth +virtio diff --git a/server/etc/modules-load.d/conntrack.conf b/server/etc/modules-load.d/conntrack.conf new file mode 100644 index 0000000..ca3f9f2 --- /dev/null +++ b/server/etc/modules-load.d/conntrack.conf @@ -0,0 +1,7 @@ +# Pre-load nf_conntrack so that net.netfilter.* sysctl settings in +# /etc/sysctl.d/99-conntrack.conf are applied at boot. +# +# Without this, systemd applies sysctl configs before Docker loads nf_conntrack, +# so the net.netfilter.* keys do not exist yet and are silently skipped. +# See: https://github.com/torrust/torrust-tracker-demo/issues/21 +nf_conntrack diff --git a/server/etc/sysctl.d/99-conntrack.conf b/server/etc/sysctl.d/99-conntrack.conf new file mode 100644 index 0000000..b50758b --- /dev/null +++ b/server/etc/sysctl.d/99-conntrack.conf @@ -0,0 +1,22 @@ +# Kernel tuning for UDP tracker running behind Docker bridge networking. +# Docker DNAT creates a conntrack entry for every packet. Under high UDP tracker +# load the defaults cause silent packet drops and intermittent timeouts. +# See: https://github.com/torrust/torrust-tracker-demo/issues/21 +# +# NOTE: net.netfilter.* settings are silently skipped at boot if the +# nf_conntrack module is not yet loaded. Pre-load it via: +# /etc/modules-load.d/conntrack.conf + +# Maximum conntrack table entries. +# Default: 65536-262144. At 400 UDP req/s with a 120 s stream timeout the +# table fills (400 * 120 = 48000 entries minimum). At 1500 req/s it overflows. +# Each entry uses ~300 bytes; 1 M entries ≈ 300 MB. +net.netfilter.nf_conntrack_max = 1048576 + +# UDP stream timeout (bidirectional). Default: 120 s. +# A tracker request-reply completes in milliseconds; 15 s is generous. +# Reducing from 120 s cuts steady-state table size by ~8x. +net.netfilter.nf_conntrack_udp_timeout_stream = 15 + +# UDP single-direction timeout. Default: 30 s. +net.netfilter.nf_conntrack_udp_timeout = 10