Commit 675b826

docs: add improvements index for hetzner demo tracker deployment
Collect all deployer improvement recommendations found during this deployment
in one file with links to full descriptions.

13 items across create/provision/run/cross-cutting/post-provision:

- I-01: create template should document instance_name auto-generation
- I-02: create template should default to [::] for public sockets
- I-03: create template should prompt for database choice
- I-04: provision SSH probe should distinguish failure reasons
- I-05: provision error_kind should be specific for SSH auth failures
- I-06: provision trace file should include per-attempt SSH details
- I-07: provision SSH connectivity timeout should be configurable
- I-08: provision should detect passphrase-protected SSH keys early
- I-09: provision should have wait-for-ssh or --resume flag
- I-10: provision output should include IPv6 address
- I-11: run should add lightweight post-start health check
- I-12: env config should support floating IP for DNS checks
- I-13: post-provision netplan config should use correct permissions

1 parent 11c020e commit 675b826

2 files changed

Lines changed: 181 additions & 0 deletions

docs/deployments/hetzner-demo-tracker/README.md

Lines changed: 1 addition & 0 deletions
@@ -44,6 +44,7 @@ Deploy a public Torrust Tracker demo instance to Hetzner Cloud and document ever
 - [Secrets rotation](maintenance/secrets-rotation.md) — rotate all secrets after AI-assisted deployment
 10. [Tracker Registry](tracker-registry.md) — submit the tracker to public registries (newTrackon)
 11. [Bugs](bugs.md) — all deployer bugs discovered during this deployment (11 bugs, 1 fixed)
+12. [Improvements](improvements.md) — all improvement recommendations collected in one place (13 items)

 ## Deployment

Lines changed: 180 additions & 0 deletions
@@ -0,0 +1,180 @@
# Improvements & Recommendations from Hetzner Demo Tracker Deployment

All deployer improvements and recommendations identified during this deployment,
collected in one place. Each entry links to the full description in the relevant
document.

---

## `create` command

### I-01 — Document `instance_name: null` auto-generation in template

The `create template` output contains `"instance_name": null` with no explanation
of the auto-generated value (`torrust-tracker-vm-{env_name}`). The template should
include an inline comment or a `_comment` field describing this behavior.

Full description: [commands/create/problems.md — instance_name: null unexplained](commands/create/problems.md#problem-template-generates-instance_name-null-with-no-explanation)
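
To make the fallback rule concrete, here is a minimal Python sketch of the behavior the template comment would need to describe. The function name `resolve_instance_name` is hypothetical and not taken from the deployer; only the `torrust-tracker-vm-{env_name}` pattern comes from this document.

```python
from typing import Optional


def resolve_instance_name(env_name: str, instance_name: Optional[str] = None) -> str:
    """Return the explicit instance name, or the auto-generated default.

    Hypothetical helper: only the naming pattern is taken from the document.
    """
    if instance_name is not None:
        # An explicit value in the config always wins.
        return instance_name
    # `"instance_name": null` falls back to the generated name.
    return f"torrust-tracker-vm-{env_name}"
```

For an environment called `demo`, a `null` instance name would resolve to `torrust-tracker-vm-demo` under this rule.
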

---

### I-02 — Default bind addresses to `[::]` (dual-stack) for public trackers

The `create template` command defaults to `0.0.0.0` (IPv4 only). Public trackers
should bind to `[::]`, which accepts both IPv4 and IPv6 on Linux under the default
`net.ipv6.bindv6only=0` setting. The template generator should either default to
`[::]` or include a note about the trade-off.

Full description: [commands/create/problems.md — Template defaults to `0.0.0.0`](commands/create/problems.md#problem-template-defaults-bind-addresses-to-0000-ipv4-only)
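
As an illustration of what the changed default amounts to, the sketch below rewrites an IPv4 wildcard bind address to its dual-stack form and leaves everything else alone. The helper name `to_dual_stack` and the port numbers are assumptions made for the example; the deployer may represent bind addresses differently.

```python
def to_dual_stack(bind_address: str) -> str:
    """Rewrite `0.0.0.0:<port>` to `[::]:<port>`; return other addresses unchanged."""
    # Split on the last colon so bracketed IPv6 hosts such as `[::]` stay intact.
    host, sep, port = bind_address.rpartition(":")
    if sep and host == "0.0.0.0":
        return f"[::]:{port}"
    return bind_address
```

Only the wildcard is rewritten: an operator who deliberately binds to a specific IPv4 address (for example `127.0.0.1`) keeps that choice.
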

---

### I-03 — Prompt for database choice or note SQLite dev default

The `create template` command silently selects SQLite without informing the user.
It should either prompt for a database choice interactively or include a comment
noting that SQLite is the development default and MySQL is recommended for
production.

Full description: [commands/create/problems.md — Template silently defaults to SQLite](commands/create/problems.md#problem-template-silently-defaults-to-sqlite--no-database-choice-presented)

---

## `provision` command

### I-04 — Distinguish SSH failure reason in the probe loop

The SSH probe logs a generic "still waiting" message for every failed attempt
regardless of whether the port is unreachable (TCP timeout) or sshd is up
but authentication is rejected. Logging a different message per failure type
would significantly reduce investigation time.

Full description: [commands/provision/improvements.md — Distinguish SSH failure reason](commands/provision/improvements.md#1-distinguish-ssh-failure-reason-in-the-probe-loop)
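
One way to tell the two cases apart is to inspect the error text of the failed attempt. The sketch below assumes the probe shells out to the OpenSSH client and has its `stderr` available, which may not match how the deployer actually probes; the enum and function names are hypothetical. The matched phrases are standard OpenSSH client messages.

```python
from enum import Enum


class SshProbeFailure(Enum):
    PORT_UNREACHABLE = "port unreachable (TCP timeout or connection refused)"
    AUTH_REJECTED = "sshd is up but rejected authentication"
    UNKNOWN = "unclassified failure"


def classify_ssh_failure(stderr: str) -> SshProbeFailure:
    """Map OpenSSH client error text to a coarse failure reason."""
    text = stderr.lower()
    if "permission denied" in text:
        # sshd answered, so the network path is fine; credentials are the issue.
        return SshProbeFailure.AUTH_REJECTED
    if "connection timed out" in text or "connection refused" in text:
        # Nothing is accepting connections on the port yet.
        return SshProbeFailure.PORT_UNREACHABLE
    return SshProbeFailure.UNKNOWN
```

The probe loop could then log `failure.value` instead of the generic "still waiting" message. String matching is fragile (localized or future client versions may word errors differently), so an implementation with access to the SSH library's typed errors should prefer those.
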

---

### I-05 — Classify `error_kind` more precisely for SSH auth failures

A `WaitSshConnectivity` failure is always recorded as `NetworkConnectivity` in
the environment JSON, even when the root cause is authentication rejection (not
a network problem). A more specific `SshAuthenticationFailed` variant would
direct investigation to the right layer immediately.

Full description: [commands/provision/improvements.md — Classify error_kind more precisely](commands/provision/improvements.md#2-classify-error_kind-more-precisely-for-auth-failures)

---

### I-06 — Include per-attempt failure details in the provision trace file

The trace file only records a final summary. A condensed per-phase breakdown
of the SSH probe (how many attempts timed out vs. were rejected by sshd) would
be immediately actionable for operators without requiring analysis of
`data/logs/log.txt`.

Full description: [commands/provision/improvements.md — Per-attempt details in trace file](commands/provision/improvements.md#3-include-per-attempt-failure-details-in-the-trace-file)
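
A condensed breakdown of this kind can be produced by counting attempt outcomes. The following is a sketch only: the outcome labels (`timed_out`, `auth_rejected`) and the output wording are invented for illustration and do not reflect the current trace file format.

```python
from collections import Counter
from typing import Iterable


def summarize_probe_attempts(outcomes: Iterable[str]) -> str:
    """Condense per-attempt outcome labels into one trace-file line."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    # Sort by label so the line is stable across runs.
    parts = ", ".join(f"{n} {label}" for label, n in sorted(counts.items()))
    return f"{total} SSH probe attempts: {parts}"
```

A run where the port was closed for 45 attempts and sshd then rejected 15 would read `60 SSH probe attempts: 15 auth_rejected, 45 timed_out`, which points at credentials rather than the network.
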

---

### I-07 — Make SSH connectivity timeout configurable

The probe budget is hardcoded at 60 attempts × 2 s = 120 s. Hetzner servers with
cloud-init user provisioning require over 3 minutes. The timeout should be
configurable per provider, per env config, and via a CLI flag, with a longer
default for Hetzner.

Full description: [commands/provision/improvements.md — Configurable SSH connectivity timeout](commands/provision/improvements.md#4-support-configurable-ssh-connectivity-timeout)
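
The three configuration levels imply a precedence order. The sketch below assumes the most specific source wins (CLI flag, then env config, then provider default, then the current 120 s); that ordering, the key `ssh_connectivity_timeout_secs`, and the 300 s Hetzner value are all illustrative assumptions, not settings the deployer has today.

```python
from typing import Mapping, Optional

DEFAULT_ATTEMPTS = 60       # from the document: 60 probes
DEFAULT_INTERVAL_SECS = 2   # from the document: 2 s between probes
DEFAULT_TIMEOUT_SECS = DEFAULT_ATTEMPTS * DEFAULT_INTERVAL_SECS  # 120 s

# Hypothetical per-provider defaults; 300 s is an example value only.
PROVIDER_DEFAULTS = {"hetzner": 300}


def resolve_ssh_timeout(
    provider: str,
    env_config: Mapping[str, int],
    cli_timeout_secs: Optional[int] = None,
) -> int:
    """Pick the SSH connectivity timeout from the most specific source available."""
    if cli_timeout_secs is not None:
        return cli_timeout_secs
    if "ssh_connectivity_timeout_secs" in env_config:  # hypothetical config key
        return int(env_config["ssh_connectivity_timeout_secs"])
    return PROVIDER_DEFAULTS.get(provider, DEFAULT_TIMEOUT_SECS)
```

With this ordering, a one-off slow provisioning run can be handled with a CLI override without editing the environment file.
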

---

### I-08 — Detect passphrase-protected SSH keys early and warn

The deployer does not check whether the configured SSH private key has a
passphrase. When running inside Docker (no agent, no TTY), a passphrase-protected
key silently fails every attempt. This should be caught at `create environment`
or `validate` time, with a clear actionable warning.

Full description: [commands/provision/improvements.md — Detect passphrase-protected keys early](commands/provision/improvements.md#7-detect-passphrase-protected-ssh-keys-early-and-warn-the-user)
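
Such a check does not need the passphrase itself, because the key file states whether it is encrypted. The sketch below is one possible approach, not the deployer's implementation: it looks for the plain-text markers used by legacy PEM and PKCS#8 keys, and for the current OpenSSH format it reads the cipher name that follows the `openssh-key-v1` magic string (`none` means unencrypted). The function name is hypothetical.

```python
import base64
import struct

OPENSSH_MAGIC = b"openssh-key-v1\x00"


def is_passphrase_protected(key_text: str) -> bool:
    """Best-effort check of whether a private key file is encrypted."""
    # Legacy PEM and PKCS#8 keys announce encryption in plain text.
    if "Proc-Type: 4,ENCRYPTED" in key_text or "BEGIN ENCRYPTED PRIVATE KEY" in key_text:
        return True
    if "BEGIN OPENSSH PRIVATE KEY" not in key_text:
        return False
    # Join the base64 body, skipping the BEGIN/END armor lines.
    payload = "".join(
        line.strip()
        for line in key_text.splitlines()
        if line.strip() and not line.startswith("-----")
    )
    blob = base64.b64decode(payload)
    if not blob.startswith(OPENSSH_MAGIC):
        raise ValueError("not an openssh-key-v1 container")
    # The first field after the magic is a length-prefixed cipher name.
    offset = len(OPENSSH_MAGIC)
    (name_len,) = struct.unpack(">I", blob[offset:offset + 4])
    cipher_name = blob[offset + 4:offset + 4 + name_len]
    return cipher_name != b"none"
```

Running this at `create environment` or `validate` time would let the deployer warn before any server is created, instead of failing every probe attempt later.
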

---

### I-09 — Add `wait-for-ssh` command or `provision --resume` flag

When `provision` fails at `WaitSshConnectivity`, the server itself is healthy
but the environment must be destroyed and recreated from scratch. A
`wait-for-ssh` command or `provision --resume` flag would allow retrying only
the SSH probe step against an already-created server, saving the full
`tofu apply` + cloud-init cycle.

Full description: [commands/provision/improvements.md — `wait-for-ssh` command](commands/provision/improvements.md#5-add-a-wait-for-ssh-command-or---resume-flag-on-provision)

---

### I-10 — Include IPv6 address in `provision` output

The `provision` JSON output only includes the IPv4 instance IP. Hetzner
assigns an IPv6 address and /64 network to every server, but these are only
visible in the raw Tofu state file. Exposing them in the output would save
operators from consulting the state file during post-provision steps like
floating IP setup.

Full description: [commands/provision/improvements.md — Include IPv6 in provision output](commands/provision/improvements.md#6-include-ipv6-address-in-provision-command-output)

---

## `run` command

### I-11 — Add lightweight post-start health check to `run`

The `Running` state only indicates that `docker compose up -d` returned exit
code 0, not that services are healthy. A lightweight poll of `docker compose ps`
after startup (until no container is in `starting` or `restarting` state) would
catch fast-failing containers (such as the tracker URL-encoding crash in this
deployment) without duplicating the full `test` smoke-test logic.

Full description: [commands/run/improvements.md — `Running` state does not guarantee healthy services](commands/run/improvements.md#improvement-running-state-does-not-guarantee-services-are-healthy)
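
The poll described above can be sketched as follows. This is an assumption-laden illustration rather than the deployer's code: it expects the text produced by `docker compose ps --format json`, accepts either a JSON array or one object per line, and reads the `Name`, `State`, and `Health` keys. The function names, attempt count, and interval are hypothetical.

```python
import json
import time
from typing import Callable, Dict, List

UNSETTLED = {"starting", "restarting"}


def parse_compose_ps(output: str) -> List[Dict]:
    """Parse `docker compose ps --format json` text (array or one object per line)."""
    output = output.strip()
    if not output:
        return []
    if output.startswith("["):
        return json.loads(output)
    return [json.loads(line) for line in output.splitlines() if line.strip()]


def unsettled_containers(containers: List[Dict]) -> List[str]:
    """Names of containers still starting or stuck in a restart loop."""
    return [
        c.get("Name", "?")
        for c in containers
        if c.get("State") in UNSETTLED or c.get("Health") in UNSETTLED
    ]


def wait_until_settled(
    fetch: Callable[[], str], attempts: int = 10, interval_secs: float = 1.0
) -> bool:
    """Poll until no container is unsettled; False if the budget runs out."""
    for _ in range(attempts):
        if not unsettled_containers(parse_compose_ps(fetch())):
            return True
        time.sleep(interval_secs)
    return False
```

In practice `fetch` would wrap a `subprocess.run(["docker", "compose", "ps", "--format", "json"], capture_output=True, text=True)` call and return its `stdout`. A container that keeps crashing under a restart policy stays in `restarting`, so the poll returns `False` and `run` can report the failure instead of recording `Running`.
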

---

## Cross-cutting

### I-12 — Add floating IP support to environment config and DNS checks

The `test` command compares resolved DNS addresses against the bare instance IP.
When a floating IP is in use (the recommended production setup), every domain
produces a false-positive DNS warning. The environment config should allow
specifying a `floating_ip` so the deployer uses it as the expected DNS target
and can optionally auto-assign it during provisioning.

Full description: [commands/improvements.md — Deployer not aware of floating IPs](commands/improvements.md#improvement-deployer-is-not-aware-of-floating-ips)
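
The change to the DNS check is small: prefer the floating IP as the expected target when one is configured. The sketch below uses hypothetical function names and documentation-range addresses; only the `floating_ip` field name comes from the recommendation above.

```python
from typing import Iterable, Optional


def expected_dns_target(instance_ip: str, floating_ip: Optional[str] = None) -> str:
    """Use the floating IP as the expected DNS target when one is configured."""
    return floating_ip if floating_ip else instance_ip


def dns_mismatch(
    resolved: Iterable[str], instance_ip: str, floating_ip: Optional[str] = None
) -> bool:
    """True when none of the resolved addresses is the expected target."""
    return expected_dns_target(instance_ip, floating_ip) not in set(resolved)
```

With `floating_ip` set, a domain that resolves to the floating address no longer triggers a warning, while a domain still pointing at the bare instance IP does.
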

---

## Post-provision / operational

### I-13 — Write netplan floating IP config with correct permissions from the start

The post-provision guide writes the netplan file with `sudo tee`, which creates
it world-readable. Netplan then logs a warning requiring `chmod 600`. Using
`sudo install -m 600 /dev/stdin /etc/netplan/60-floating-ip.yaml` instead
avoids the warning and the manual fix step.

Full description: [post-provision/dns-setup.md — Improvements](post-provision/dns-setup.md#improvements)

---

## Summary

| ID   | Area           | Description                                                  |
| ---- | -------------- | ------------------------------------------------------------ |
| I-01 | `create`       | Document `instance_name` auto-generation in template         |
| I-02 | `create`       | Default bind addresses to `[::]` for public trackers         |
| I-03 | `create`       | Prompt for database choice or note SQLite dev default        |
| I-04 | `provision`    | Distinguish SSH failure reason in probe loop                 |
| I-05 | `provision`    | Classify `error_kind` more precisely for SSH auth failures   |
| I-06 | `provision`    | Include per-attempt SSH failure details in trace file        |
| I-07 | `provision`    | Make SSH connectivity timeout configurable                   |
| I-08 | `provision`    | Detect passphrase-protected SSH keys early and warn          |
| I-09 | `provision`    | Add `wait-for-ssh` command or `provision --resume` flag      |
| I-10 | `provision`    | Include IPv6 address in provision output                     |
| I-11 | `run`          | Add lightweight post-start health check                      |
| I-12 | Cross-cutting  | Add floating IP support to env config and DNS checks         |
| I-13 | Post-provision | Write netplan config with correct permissions from the start |
