
feat(issue-21): scale up server CCX23 → CCX33 for better UDP uptime#22

Draft
josecelano wants to merge 7 commits into main from issue-21-scale-up-server

Conversation

@josecelano
Member

Summary

Scales the Hetzner server from CCX23 (4 vCPU, 16 GB RAM) to CCX33 (8 vCPU, 32 GB RAM) to address the UDP uptime issues tracked in #19. This PR contains the full evidence trail for the resize experiment, including pre-resize baseline, execution log, and daily observation templates.

Changes

  • docs/issues/ISSUE-21-scale-up-server-for-udp-uptime.md — issue spec with acceptance criteria
  • docs/infrastructure-resize-history.md — new file tracking server resize events with req/s load
  • docs/infrastructure.md — updated with traffic, price, and resize history link
  • docs/issues/evidence/ISSUE-21/00-pre-resize-baseline.md — measured Prometheus values before resize (HTTP ~1350 req/s, UDP ~1507 req/s)
  • docs/issues/evidence/ISSUE-21/01-resize-execution.md — full resize execution log with commands, outputs, and external health checks
  • docs/issues/evidence/ISSUE-21/02-post-resize-daily-checks.md — 7-day daily observation template (to be filled over the next 7 days)
  • docs/issues/evidence/ISSUE-21/03-pre-post-comparison.md — final comparison template (to be filled after observation window)

Resize Summary

| Item       | Before    | After     |
|------------|-----------|-----------|
| Plan       | CCX23     | CCX33     |
| vCPU       | 4         | 8         |
| RAM        | 16 GB     | 32 GB     |
| Traffic    | 20 TB     | 30 TB     |
| Price      | €31.49/mo | €62.49/mo |
| HTTP req/s | ~1350     | TBD       |
| UDP req/s  | ~1507     | TBD       |
| UDP uptime | ~92.20%   | TBD       |

Observation Window

The resize was executed on 2026-04-13. This PR will be merged after a 7-day observation window ending around 2026-04-20, once daily checks have been collected and the pre/post comparison is complete.

Acceptance Criteria

  • UDP newTrackon uptime ≥ 99.0% over rolling 7 days post-resize
  • UDP buffer error counters remain near zero after the server has been under load
  • Host load average stays below 70% of available capacity
  • No new service degradation observed in HTTP tracker
  • Pre/post comparison documented in 03-pre-post-comparison.md
  • Resize workflow skill added and referenced
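The conntrack and UDP-buffer criteria above can be spot-checked directly on the host. A minimal sketch (standard Linux procfs paths; not a script from this PR):

```shell
# Hedged sketch of spot checks for the acceptance criteria above.
# The conntrack files exist only when the nf_conntrack module is
# loaded, so guard for that case.
if [ -r /proc/sys/net/netfilter/nf_conntrack_count ]; then
  count=$(cat /proc/sys/net/netfilter/nf_conntrack_count)
  max=$(cat /proc/sys/net/netfilter/nf_conntrack_max)
  echo "conntrack usage: ${count}/${max}"
fi

# UDP buffer error counters: pair the header and value rows of the
# "Udp:" lines in /proc/net/snmp and keep only the *bufErrors columns.
udp_buf=$(awk '/^Udp:/ && !h { for (i=1; i<=NF; i++) hdr[i]=$i; h=1; next }
               /^Udp:/ { for (i=2; i<=NF; i++) if (hdr[i] ~ /bufErrors/) print hdr[i], $i }' /proc/net/snmp)
echo "$udp_buf"
```

Both counters should stay near zero for the "buffer errors remain near zero" criterion to hold.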

Refs: #21

@josecelano josecelano self-assigned this Apr 13, 2026
@josecelano josecelano requested review from cgbosse and da2ce7 April 13, 2026 15:54
Documents the full resize workflow: pre-resize baseline capture, graceful shutdown, Hetzner panel action by human operator, post-resize recovery and validation, evidence capture, and 7-day observation period.

Notes a key behaviour of Hetzner in-place resizes: all IP addresses (public, private, and Floating IPs) are preserved, so no DNS or IP reassignment is needed.

Refs: #21
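The workflow above can be sketched as a runbook. Service names, the API port, and the `DRY_RUN` guard are illustrative assumptions, not the PR's actual commands:

```shell
# Hedged runbook sketch of the resize workflow described above.
# DRY_RUN=1 only echoes each step instead of executing it.
DRY_RUN=1
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "would run: $*"; else "$@"; fi
}

# 1. Capture the pre-resize baseline (recorded separately from Prometheus).
# 2. Graceful shutdown of the tracker stack, then power off.
run docker compose down
run sudo shutdown -h now
# 3. Human operator resizes CCX23 -> CCX33 in the Hetzner panel and boots.
#    All IPs (public, private, Floating) are preserved: no DNS changes.
# 4. Post-resize recovery and validation.
run docker compose up -d
run curl -fsS http://localhost:1212/api/health_check   # hypothetical endpoint
```

Flipping `DRY_RUN=0` would execute the steps for real; keeping the guard makes the runbook safe to rehearse.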
The default conntrack table (262144 entries) fills up under sustained
UDP tracker load, causing "nf_conntrack: table full, dropping packet"
kernel errors and intermittent UDP timeouts on uptime monitors.

Applied kernel tunables:
- nf_conntrack_max: 262144 → 1048576 (4x increase)
- nf_conntrack_udp_timeout_stream: 120 s → 15 s (8x reduction)
- nf_conntrack_udp_timeout: 30 s → 10 s

Added /etc/modules-load.d/conntrack.conf to pre-load the nf_conntrack
module at boot so sysctl settings are applied before Docker starts.
Without this, net.netfilter.* keys don't exist when sysctl runs and
the settings are silently skipped after a reboot.

Refs: #21
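The two pieces of boot-time configuration this commit describes could look roughly like this. The values come from the commit message; the sysctl drop-in file name is an assumption:

```shell
# Pre-load nf_conntrack at boot so the net.netfilter.* sysctl keys
# exist before sysctl --system runs (otherwise the settings are
# silently skipped, as the commit message notes).
echo nf_conntrack | sudo tee /etc/modules-load.d/conntrack.conf

# Persist the applied tunables (drop-in file name is hypothetical):
sudo tee /etc/sysctl.d/99-conntrack.conf <<'EOF'
net.netfilter.nf_conntrack_max = 1048576
net.netfilter.nf_conntrack_udp_timeout = 10
net.netfilter.nf_conntrack_udp_timeout_stream = 15
EOF

# Apply now; also applied automatically on every boot:
sudo sysctl --system
```

The shorter UDP timeouts matter as much as the larger table: with a 120 s stream timeout, short-lived tracker announces pin entries long after the client is gone, so the table fills even at a fixed request rate.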
Fill in the D+1 row (2026-04-20) in the daily checks log:
- HTTP: ~1564 req/s, UDP: ~1015 req/s, total ~2579 req/s (~322/vCPU)
- Host load: 6.05/5.49/4.80
- UDP newTrackon uptime: 83.9% (includes resize downtime + conntrack
  overflow period; fix applied same day)

Update the pre/post comparison table with available metrics and mark
the decision as "partial" — resize alone was insufficient, conntrack
overflow was the actual bottleneck. Follow-up plan added.

Refs: #21