
feat(issue-21): scale up server CCX23 → CCX33 for better UDP uptime#22

Draft
josecelano wants to merge 7 commits into main from issue-21-scale-up-server

Conversation

@josecelano
Member

Summary

Scales the Hetzner server from CCX23 (4 vCPU, 16 GB RAM) to CCX33 (8 vCPU, 32 GB RAM) to address the UDP uptime issues tracked in #19. This PR contains the full evidence trail for the resize experiment, including pre-resize baseline, execution log, and daily observation templates.

Changes

  • docs/issues/ISSUE-21-scale-up-server-for-udp-uptime.md — issue spec with acceptance criteria
  • docs/infrastructure-resize-history.md — new file tracking server resize events with req/s load
  • docs/infrastructure.md — updated with traffic, price, and resize history link
  • docs/issues/evidence/ISSUE-21/00-pre-resize-baseline.md — measured Prometheus values before resize (HTTP ~1350 req/s, UDP ~1507 req/s)
  • docs/issues/evidence/ISSUE-21/01-resize-execution.md — full resize execution log with commands, outputs, and external health checks
  • docs/issues/evidence/ISSUE-21/02-post-resize-daily-checks.md — 7-day daily observation template (to be filled over the next 7 days)
  • docs/issues/evidence/ISSUE-21/03-pre-post-comparison.md — final comparison template (to be filled after observation window)

Resize Summary

| Item       | Before    | After     |
|------------|-----------|-----------|
| Plan       | CCX23     | CCX33     |
| vCPU       | 4         | 8         |
| RAM        | 16 GB     | 32 GB     |
| Traffic    | 20 TB     | 30 TB     |
| Price      | €31.49/mo | €62.49/mo |
| HTTP req/s | ~1350     | TBD       |
| UDP req/s  | ~1507     | TBD       |
| UDP uptime | ~92.20%   | TBD       |

Observation Window

The resize was executed on 2026-04-13. This PR will be merged after a 7-day observation window ending around 2026-04-20, once daily checks have been collected and the pre/post comparison is complete.

Acceptance Criteria

  • UDP newTrackon uptime ≥ 99.0% over rolling 7 days post-resize
  • UDP buffer error counters remain near zero after the server has been under load
  • Host load average stays below 70% of available capacity
  • No new service degradation observed in HTTP tracker
  • Pre/post comparison documented in 03-pre-post-comparison.md
  • Resize workflow skill added and referenced
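The conntrack and UDP-buffer criteria above can be spot-checked directly on the host. A minimal sketch (standard Linux procfs paths; not a script from this PR):

```shell
# Hedged sketch of spot checks for the acceptance criteria above.
# The conntrack files exist only when the nf_conntrack module is
# loaded, so guard for that case.
if [ -r /proc/sys/net/netfilter/nf_conntrack_count ]; then
  count=$(cat /proc/sys/net/netfilter/nf_conntrack_count)
  max=$(cat /proc/sys/net/netfilter/nf_conntrack_max)
  echo "conntrack usage: ${count}/${max}"
fi

# UDP buffer error counters: pair the header and value rows of the
# "Udp:" lines in /proc/net/snmp and keep only the *bufErrors columns.
udp_buf=$(awk '/^Udp:/ && !h { for (i=1; i<=NF; i++) hdr[i]=$i; h=1; next }
               /^Udp:/ { for (i=2; i<=NF; i++) if (hdr[i] ~ /bufErrors/) print hdr[i], $i }' /proc/net/snmp)
echo "$udp_buf"
```

Both counters should stay near zero for the "buffer errors remain near zero" criterion to hold.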

Refs: #21

@josecelano josecelano self-assigned this Apr 13, 2026
@josecelano josecelano requested review from cgbosse and da2ce7 April 13, 2026 15:54
Documents the full resize workflow: pre-resize baseline capture, graceful shutdown, Hetzner panel action by human operator, post-resize recovery and validation, evidence capture, and 7-day observation period.

Notes a key behaviour of Hetzner in-place resizes: all IP addresses (public, private, and Floating IPs) are preserved, so no DNS or IP reassignment is needed.

Refs: #21
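The workflow above can be sketched as a runbook. Service names, the API port, and the `DRY_RUN` guard are illustrative assumptions, not the PR's actual commands:

```shell
# Hedged runbook sketch of the resize workflow described above.
# DRY_RUN=1 only echoes each step instead of executing it.
DRY_RUN=1
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "would run: $*"; else "$@"; fi
}

# 1. Capture the pre-resize baseline (recorded separately from Prometheus).
# 2. Graceful shutdown of the tracker stack, then power off.
run docker compose down
run sudo shutdown -h now
# 3. Human operator resizes CCX23 -> CCX33 in the Hetzner panel and boots.
#    All IPs (public, private, Floating) are preserved: no DNS changes.
# 4. Post-resize recovery and validation.
run docker compose up -d
run curl -fsS http://localhost:1212/api/health_check   # hypothetical endpoint
```

Flipping `DRY_RUN=0` would execute the steps for real; keeping the guard makes the runbook safe to rehearse.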
The default conntrack table (262144 entries) fills up under sustained
UDP tracker load, causing "nf_conntrack: table full, dropping packet"
kernel errors and intermittent UDP timeouts on uptime monitors.

Applied kernel tunables:
- nf_conntrack_max: 262144 → 1048576 (4x increase)
- nf_conntrack_udp_timeout_stream: 120 s → 15 s (8x reduction)
- nf_conntrack_udp_timeout: 30 s → 10 s

Added /etc/modules-load.d/conntrack.conf to pre-load the nf_conntrack
module at boot so sysctl settings are applied before Docker starts.
Without this, net.netfilter.* keys don't exist when sysctl runs and
the settings are silently skipped after a reboot.

Refs: #21
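The two pieces of boot-time configuration this commit describes could look roughly like this. The values come from the commit message; the sysctl drop-in file name is an assumption:

```shell
# Pre-load nf_conntrack at boot so the net.netfilter.* sysctl keys
# exist before sysctl --system runs (otherwise the settings are
# silently skipped, as the commit message notes).
echo nf_conntrack | sudo tee /etc/modules-load.d/conntrack.conf

# Persist the applied tunables (drop-in file name is hypothetical):
sudo tee /etc/sysctl.d/99-conntrack.conf <<'EOF'
net.netfilter.nf_conntrack_max = 1048576
net.netfilter.nf_conntrack_udp_timeout = 10
net.netfilter.nf_conntrack_udp_timeout_stream = 15
EOF

# Apply now; also applied automatically on every boot:
sudo sysctl --system
```

The shorter UDP timeouts matter as much as the larger table: with a 120 s stream timeout, short-lived tracker announces pin entries long after the client is gone, so the table fills even at a fixed request rate.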
Fill in the D+1 row (2026-04-20) in the daily checks log:
- HTTP: ~1564 req/s, UDP: ~1015 req/s, total ~2579 req/s (~322/vCPU)
- Host load: 6.05/5.49/4.80
- UDP newTrackon uptime: 83.9% (includes resize downtime + conntrack
  overflow period; fix applied same day)

Update the pre/post comparison table with available metrics and mark
the decision as "partial" — resize alone was insufficient, conntrack
overflow was the actual bottleneck. Follow-up plan added.

Refs: #21