# Improvements & Recommendations from Hetzner Demo Tracker Deployment

This document collects every deployer improvement and recommendation identified
during this deployment in one place. Each entry links to its full description in
the relevant document.

---

## `create` command

### I-01 — Document `instance_name: null` auto-generation in template

The `create template` output contains `"instance_name": null` without explaining
that a null value makes the deployer auto-generate the name
`torrust-tracker-vm-{env_name}`. Because the template is JSON, which has no
comment syntax, the practical fix is a `_comment` field (or a note printed
alongside the template) describing this behavior.

Full description: [commands/create/problems.md — instance_name: null unexplained](commands/create/problems.md#problem-template-generates-instance_name-null-with-no-explanation)
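
A minimal Rust sketch of the behavior the template note would describe. `resolve_instance_name` and the `_comment` line are hypothetical; only the `torrust-tracker-vm-{env_name}` pattern comes from this deployment.

```rust
/// Hypothetical helper: `None` corresponds to `"instance_name": null`
/// in the environment JSON.
fn resolve_instance_name(instance_name: Option<&str>, env_name: &str) -> String {
    match instance_name {
        // An explicit value is used verbatim.
        Some(name) => name.to_string(),
        // null falls back to the pattern observed in this deployment.
        None => format!("torrust-tracker-vm-{env_name}"),
    }
}

/// Template lines with a sibling `_comment` field, since JSON has no comments.
/// The wording of the comment is illustrative.
const INSTANCE_NAME_TEMPLATE: &str = concat!(
    "\"_comment\": \"instance_name null = auto-generate torrust-tracker-vm-{env_name}\",\n",
    "\"instance_name\": null"
);
```

Keeping the fallback in one function means the template comment and the actual behavior can be tested against the same string.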

---

### I-02 — Default bind addresses to `[::]` (dual-stack) for public trackers

The `create template` command defaults bind addresses to `0.0.0.0`, which
accepts IPv4 connections only. Public trackers should bind to `[::]`, which on
Linux also accepts IPv4 clients (as IPv4-mapped addresses) as long as
`net.ipv6.bindv6only` is `0`, the kernel default. The template generator should
either default to `[::]` or include a note about the trade-off (for example,
that `[::]` requires IPv6 support in the host kernel).

Full description: [commands/create/problems.md — Template defaults to `0.0.0.0`](commands/create/problems.md#problem-template-defaults-bind-addresses-to-0000-ipv4-only)
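
A sketch of the proposed default, assuming a hypothetical `default_bind_address` helper in the template generator; port `6969` in the usage below is only an example value.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr, SocketAddr};

/// Hypothetical template default: dual-stack wildcard for public trackers,
/// IPv4 wildcard otherwise.
fn default_bind_address(public_tracker: bool, port: u16) -> SocketAddr {
    let ip = if public_tracker {
        // `[::]`: on Linux with net.ipv6.bindv6only = 0 (the default), a socket
        // bound here also accepts IPv4 clients as IPv4-mapped addresses.
        IpAddr::V6(Ipv6Addr::UNSPECIFIED)
    } else {
        // `0.0.0.0`: IPv4 only.
        IpAddr::V4(Ipv4Addr::UNSPECIFIED)
    };
    SocketAddr::new(ip, port)
}
```

The rendered value for a public tracker would be `[::]:6969`, which is also the string form the template would contain.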

---

### I-03 — Prompt for database choice or note SQLite dev default

The `create template` command silently selects SQLite. It should either prompt
for a database choice interactively or add a note stating that SQLite is the
development default and MySQL is recommended for production.

Full description: [commands/create/problems.md — Template silently defaults to SQLite](commands/create/problems.md#problem-template-silently-defaults-to-sqlite--no-database-choice-presented)
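
A sketch covering both options, assuming a hypothetical `Database` enum: an interactive answer parser where an empty answer keeps SQLite, plus the note to print or embed when SQLite is chosen.

```rust
/// Hypothetical database choices offered by the template generator.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Database {
    Sqlite,
    Mysql,
}

/// Parses an interactive answer; an empty answer keeps the SQLite default.
fn parse_database_choice(answer: &str) -> Option<Database> {
    match answer.trim().to_ascii_lowercase().as_str() {
        "" | "sqlite" => Some(Database::Sqlite),
        "mysql" => Some(Database::Mysql),
        _ => None, // caller re-prompts on unknown input
    }
}

/// Note shown (or embedded in the template) next to the chosen driver.
fn database_note(db: Database) -> &'static str {
    match db {
        Database::Sqlite => "SQLite is the development default; MySQL is recommended for production.",
        Database::Mysql => "MySQL selected (recommended for production).",
    }
}
```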

---

## `provision` command

### I-04 — Distinguish SSH failure reason in the probe loop

The SSH probe logs the same generic "still waiting" message for every failed
attempt, whether the port is unreachable (TCP timeout) or sshd is up but rejects
authentication. Logging a distinct message per failure type would significantly
reduce investigation time, because it tells the operator immediately whether to
look at the network path or at keys and user provisioning.

Full description: [commands/provision/improvements.md — Distinguish SSH failure reason](commands/provision/improvements.md#1-distinguish-ssh-failure-reason-in-the-probe-loop)
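
A sketch of per-attempt classification, assuming the probe shells out to the OpenSSH client and can read its stderr. The matched strings are the usual OpenSSH messages, but matching on text is fragile; an SSH library would expose these cases as distinct errors. `ProbeFailure`, `classify` and `probe_log_line` are hypothetical.

```rust
/// Hypothetical classification of one failed SSH probe attempt.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ProbeFailure {
    /// No TCP connection within the timeout (filtered port, host not up, no route).
    Unreachable,
    /// The host answered but nothing listens on the SSH port yet.
    Refused,
    /// sshd is up but rejected the key (e.g. the user does not exist yet).
    AuthRejected,
    /// Anything the patterns below do not recognise.
    Other,
}

/// Classifies an attempt from the OpenSSH client's stderr text.
fn classify(stderr: &str) -> ProbeFailure {
    if stderr.contains("Permission denied") {
        ProbeFailure::AuthRejected
    } else if stderr.contains("Connection refused") {
        ProbeFailure::Refused
    } else if stderr.contains("timed out") || stderr.contains("No route to host") {
        ProbeFailure::Unreachable
    } else {
        ProbeFailure::Other
    }
}

/// Builds a per-attempt log line that states the failure reason.
fn probe_log_line(attempt: u32, max_attempts: u32, failure: ProbeFailure) -> String {
    let reason = match failure {
        ProbeFailure::Unreachable => "SSH port unreachable (TCP timeout); check firewall and network",
        ProbeFailure::Refused => "connection refused; sshd not listening yet",
        ProbeFailure::AuthRejected => "sshd reachable but authentication rejected; check key and user provisioning",
        ProbeFailure::Other => "unrecognised SSH error; see full log",
    };
    format!("SSH probe attempt {attempt}/{max_attempts}: {reason}")
}
```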

---

### I-05 — Classify `error_kind` more precisely for SSH auth failures

A `WaitSshConnectivity` failure is always recorded as `NetworkConnectivity` in
the environment JSON, even when the root cause is authentication rejection,
which is not a network problem. A more specific `SshAuthenticationFailed`
variant would direct investigation to the right layer immediately.

Full description: [commands/provision/improvements.md — Classify error_kind more precisely](commands/provision/improvements.md#2-classify-error_kind-more-precisely-for-auth-failures)
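
A sketch of the classification rule, assuming the probe keeps a per-attempt outcome list. Both enums are hypothetical subsets; the rule shown uses the last attempt, since it reflects the server's state when the budget ran out (an alternative would be "any attempt reached sshd").

```rust
/// Hypothetical subset of the `error_kind` values stored in the environment JSON.
#[derive(Debug, PartialEq)]
enum ErrorKind {
    NetworkConnectivity,
    SshAuthenticationFailed,
}

/// Hypothetical per-attempt outcome recorded by the probe loop.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AttemptOutcome {
    TcpTimeout,
    AuthRejected,
}

/// If sshd answered and rejected the login on the final attempt, the network
/// path worked, so the failure is classified as an authentication problem.
fn classify_wait_ssh_failure(attempts: &[AttemptOutcome]) -> ErrorKind {
    match attempts.last() {
        Some(AttemptOutcome::AuthRejected) => ErrorKind::SshAuthenticationFailed,
        _ => ErrorKind::NetworkConnectivity,
    }
}
```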

---

### I-06 — Include per-attempt failure details in the provision trace file

The trace file records only a final summary. A condensed per-phase breakdown of
the SSH probe (how many attempts timed out and how many were rejected by sshd)
would give operators something immediately actionable without having to analyze
`data/logs/log.txt`.

Full description: [commands/provision/improvements.md — Per-attempt details in trace file](commands/provision/improvements.md#3-include-per-attempt-failure-details-in-the-trace-file)
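
One way to build the "condensed per-phase breakdown" is to collapse consecutive attempts with the same outcome into ranges. `Outcome` and `summarize` are hypothetical names for this sketch.

```rust
/// Hypothetical per-attempt outcome of the SSH probe.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Outcome {
    TcpTimeout,
    AuthRejected,
}

impl Outcome {
    fn label(self) -> &'static str {
        match self {
            Outcome::TcpTimeout => "TCP timeout",
            Outcome::AuthRejected => "rejected by sshd",
        }
    }
}

/// Collapses consecutive attempts with the same outcome into phases.
fn summarize(attempts: &[Outcome]) -> String {
    // (first attempt, last attempt, outcome), with 1-based attempt numbers.
    let mut phases: Vec<(usize, usize, Outcome)> = Vec::new();
    for (i, &outcome) in attempts.iter().enumerate() {
        let n = i + 1;
        if let Some(phase) = phases.last_mut() {
            if phase.2 == outcome {
                phase.1 = n; // extend the current phase
                continue;
            }
        }
        phases.push((n, n, outcome)); // outcome changed: start a new phase
    }
    phases
        .iter()
        .map(|&(first, last, outcome)| {
            if first == last {
                format!("attempt {first}: {}", outcome.label())
            } else {
                format!("attempts {first}-{last}: {}", outcome.label())
            }
        })
        .collect::<Vec<_>>()
        .join("; ")
}
```

Three timeouts followed by two rejections would be written to the trace file as `attempts 1-3: TCP timeout; attempts 4-5: rejected by sshd`, which shows at a glance when the server became reachable.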

---

### I-07 — Make SSH connectivity timeout configurable

The probe budget is hardcoded at 60 attempts × 2 s = 120 s. Hetzner servers
with cloud-init user provisioning need more than 3 minutes (over 180 s) before
SSH logins succeed, so the probe gives up well before the server is ready. The
timeout should be configurable at three levels (a per-provider default, the
environment config, and a CLI flag), with a longer default for Hetzner.

Full description: [commands/provision/improvements.md — Configurable SSH connectivity timeout](commands/provision/improvements.md#4-support-configurable-ssh-connectivity-timeout)
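
A sketch of the precedence rule (CLI flag, then environment config, then provider default) and of converting a budget into attempts at the current 2 s interval. The `Provider` enum is hypothetical, and the 300 s Hetzner default is an illustrative assumption chosen only to exceed the observed 3+ minutes.

```rust
use std::time::Duration;

/// Hypothetical provider list; only Hetzner gets a longer default here.
#[derive(Debug, Clone, Copy)]
enum Provider {
    Hetzner,
    Other,
}

/// Interval between probe attempts (the current hardcoded value).
const PROBE_INTERVAL: Duration = Duration::from_secs(2);

/// Provider defaults: 120 s matches today's 60 × 2 s budget; 300 s is illustrative.
fn provider_default(provider: Provider) -> Duration {
    match provider {
        Provider::Hetzner => Duration::from_secs(300),
        Provider::Other => Duration::from_secs(120),
    }
}

/// Precedence: CLI flag > environment config > provider default.
fn resolve_ssh_timeout(cli: Option<Duration>, env_config: Option<Duration>, provider: Provider) -> Duration {
    cli.or(env_config).unwrap_or_else(|| provider_default(provider))
}

/// Number of probe attempts that fit in the budget (at least one).
fn probe_attempts(timeout: Duration) -> u32 {
    let n = timeout.as_secs() / PROBE_INTERVAL.as_secs();
    n.max(1) as u32
}
```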

---

### I-08 — Detect passphrase-protected SSH keys early and warn

The deployer does not check whether the configured SSH private key is protected
by a passphrase. Inside Docker there is no SSH agent and no TTY to prompt for
the passphrase, so such a key makes every probe attempt fail without any hint of
the cause. This should be caught at `create environment` or `validate` time with
a clear, actionable warning.

Full description: [commands/provision/improvements.md — Detect passphrase-protected keys early](commands/provision/improvements.md#7-detect-passphrase-protected-ssh-keys-early-and-warn-the-user)
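
One dependency-free way to detect a passphrase is to read the key's cipher name. In the OpenSSH private key format, the base64 body starts with `openssh-key-v1\0` followed by a length-prefixed cipher name, which is `none` for unencrypted keys; legacy PEM keys mark encryption with a `Proc-Type: 4,ENCRYPTED` header. The sketch below assumes only those two formats plus PKCS#8 and uses hypothetical helper names. (An alternative that needs the OpenSSH tools installed is checking whether `ssh-keygen -y -P "" -f <key>` succeeds.)

```rust
/// Minimal base64 decoder (standard alphabet, padding optional).
fn base64_decode(input: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let (mut buf, mut bits) = (0u32, 0u32);
    for c in input.bytes() {
        let v = match c {
            b'A'..=b'Z' => c - b'A',
            b'a'..=b'z' => c - b'a' + 26,
            b'0'..=b'9' => c - b'0' + 52,
            b'+' => 62,
            b'/' => 63,
            b'=' => break,
            b'\r' | b'\n' | b' ' => continue,
            _ => return None,
        };
        buf = (buf << 6) | u32::from(v);
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buf >> bits) as u8); // emit the top 8 pending bits
            buf &= (1 << bits) - 1; // keep only the leftover bits
        }
    }
    Some(out)
}

/// Some(true) = passphrase-protected, Some(false) = not, None = unrecognised.
fn is_passphrase_protected(key_file: &str) -> Option<bool> {
    // Legacy PEM and PKCS#8 mark encryption in plain text.
    if key_file.contains("Proc-Type: 4,ENCRYPTED") || key_file.contains("BEGIN ENCRYPTED PRIVATE KEY") {
        return Some(true);
    }
    if !key_file.contains("BEGIN OPENSSH PRIVATE KEY") {
        // Other PEM private keys without those markers are unencrypted.
        return if key_file.contains("PRIVATE KEY-----") { Some(false) } else { None };
    }
    // OpenSSH format: "openssh-key-v1\0" || u32 length || cipher name || ...
    let body: String = key_file.lines().filter(|l| !l.starts_with("-----")).collect();
    let bytes = base64_decode(&body)?;
    let rest = bytes.strip_prefix(&b"openssh-key-v1\0"[..])?;
    if rest.len() < 4 {
        return None;
    }
    let len = u32::from_be_bytes([rest[0], rest[1], rest[2], rest[3]]) as usize;
    let cipher = rest.get(4..)?.get(..len)?;
    // Unencrypted keys use the cipher name "none".
    Some(cipher != &b"none"[..])
}
```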

---

### I-09 — Add `wait-for-ssh` command or `provision --resume` flag

When `provision` fails at `WaitSshConnectivity`, the server itself is healthy,
yet the environment must be destroyed and recreated from scratch. A
`wait-for-ssh` command or a `provision --resume` flag would retry only the SSH
probe step against the already-created server, saving a full `tofu apply` +
cloud-init cycle.

Full description: [commands/provision/improvements.md — `wait-for-ssh` command](commands/provision/improvements.md#5-add-a-wait-for-ssh-command-or---resume-flag-on-provision)
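
A sketch of resume logic: restart the step list at the step that failed, so steps that already created the server are skipped. Only `WaitSshConnectivity` is named in this report; the other steps and `steps_to_run` are hypothetical placeholders.

```rust
/// Hypothetical provision steps in execution order.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Step {
    RenderTemplates,
    TofuInit,
    TofuApply,
    WaitSshConnectivity,
}

const ALL_STEPS: [Step; 4] = [
    Step::RenderTemplates,
    Step::TofuInit,
    Step::TofuApply,
    Step::WaitSshConnectivity,
];

/// `None` = fresh run; `Some(step)` = resume from the step that failed.
fn steps_to_run(failed_at: Option<Step>) -> Vec<Step> {
    match failed_at {
        None => ALL_STEPS.to_vec(),
        Some(failed) => {
            let start = ALL_STEPS.iter().position(|&s| s == failed).unwrap_or(0);
            ALL_STEPS[start..].to_vec()
        }
    }
}
```

Under this model, `wait-for-ssh` is simply `steps_to_run(Some(Step::WaitSshConnectivity))`, which never touches `tofu apply`.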

---

### I-10 — Include IPv6 address in `provision` output

The `provision` JSON output includes only the IPv4 instance IP. Hetzner assigns
every server an IPv6 address and a /64 network, but these are visible only in
the raw OpenTofu state file. Exposing them in the output would spare operators
from consulting the state file during post-provision steps such as floating IP
setup.

Full description: [commands/provision/improvements.md — Include IPv6 in provision output](commands/provision/improvements.md#6-include-ipv6-address-in-provision-command-output)
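
A sketch of an extended output shape. The field names are hypothetical, and the JSON is hand-rolled only to keep the example dependency-free (a real implementation would more likely derive `serde::Serialize`). The values would come from the state file; the `hcloud_server` resource exposes them as `ipv6_address` and `ipv6_network`.

```rust
use std::fmt::Display;
use std::net::{Ipv4Addr, Ipv6Addr};

/// Hypothetical shape of an extended `provision` output.
struct ProvisionOutput {
    instance_ip: Ipv4Addr,
    instance_ipv6: Option<Ipv6Addr>,
    ipv6_network: Option<(Ipv6Addr, u8)>, // network address + prefix length
}

/// Renders `Some(x)` as a JSON string and `None` as `null`.
fn json_str_or_null<T: Display>(v: Option<T>) -> String {
    match v {
        Some(x) => format!("\"{x}\""),
        None => "null".to_string(),
    }
}

impl ProvisionOutput {
    fn to_json(&self) -> String {
        let network = self.ipv6_network.map(|(addr, len)| format!("{addr}/{len}"));
        format!(
            "{{\"instance_ip\":\"{}\",\"instance_ipv6\":{},\"ipv6_network\":{}}}",
            self.instance_ip,
            json_str_or_null(self.instance_ipv6),
            json_str_or_null(network),
        )
    }
}
```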

---

## `run` command

### I-11 — Add lightweight post-start health check to `run`

The `Running` state only indicates that `docker compose up -d` returned exit
code 0, not that the services are healthy. A lightweight poll of
`docker compose ps` after startup, repeated until no container is in a
`starting` or `restarting` state, would catch fast-failing containers (such as
the tracker URL-encoding crash in this deployment) without duplicating the full
`test` smoke-test logic.

Full description: [commands/run/improvements.md — `Running` state does not guarantee healthy services](commands/run/improvements.md#improvement-running-state-does-not-guarantee-services-are-healthy)
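
A sketch of the polling logic, assuming container rows have already been parsed from something like `docker compose ps --all --format json` (parsing omitted; `--all` keeps exited containers visible). Docker reports `restarting` as a container state, while `starting` is the health status of containers that define a healthcheck, so the sketch checks both fields. `Container`, `wait_until_settled` and the injected `fetch` closure are hypothetical.

```rust
use std::thread::sleep;
use std::time::Duration;

/// One container row, e.g. parsed from `docker compose ps` output.
struct Container {
    name: String,
    state: String,          // e.g. "running", "restarting", "exited"
    health: Option<String>, // e.g. "starting", "healthy", "unhealthy"
}

fn is_settling(c: &Container) -> bool {
    c.state == "restarting" || c.health.as_deref() == Some("starting")
}

fn is_failed(c: &Container) -> bool {
    c.state != "running" || c.health.as_deref() == Some("unhealthy")
}

/// Polls until nothing is starting or restarting, then reports containers
/// that are not running or are unhealthy.
fn wait_until_settled<F>(mut fetch: F, max_polls: u32, interval: Duration) -> Result<(), String>
where
    F: FnMut() -> Vec<Container>,
{
    for _ in 0..max_polls {
        let containers = fetch();
        if containers.iter().any(is_settling) {
            sleep(interval);
            continue;
        }
        let failed: Vec<&str> = containers
            .iter()
            .filter(|c| is_failed(c))
            .map(|c| c.name.as_str())
            .collect();
        return if failed.is_empty() {
            Ok(())
        } else {
            Err(format!("unhealthy containers: {}", failed.join(", ")))
        };
    }
    Err(format!("containers still starting after {max_polls} polls"))
}
```

Injecting `fetch` keeps the settle-then-check logic testable without Docker; in the deployer it would wrap the remote `docker compose ps` call.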

---

## Cross-cutting

### I-12 — Add floating IP support to environment config and DNS checks

The `test` command compares resolved DNS addresses against the bare instance IP.
When a floating IP is in use (the recommended production setup), every domain
produces a false-positive DNS warning, even though DNS correctly points at the
floating IP. The environment config should allow specifying a `floating_ip` so
that the deployer uses it as the expected DNS target and can optionally assign
it automatically during provisioning.

Full description: [commands/improvements.md — Deployer not aware of floating IPs](commands/improvements.md#improvement-deployer-is-not-aware-of-floating-ips)
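
A minimal sketch of the DNS expectation, assuming a hypothetical `floating_ip` field next to the instance IP. Resolution itself is left out: `resolved` stands for the addresses the `test` command already obtains for a domain.

```rust
use std::net::IpAddr;

/// Hypothetical subset of the environment config.
struct EnvConfig {
    instance_ip: IpAddr,
    floating_ip: Option<IpAddr>,
}

impl EnvConfig {
    /// DNS should point at the floating IP when one is configured.
    fn expected_dns_target(&self) -> IpAddr {
        self.floating_ip.unwrap_or(self.instance_ip)
    }
}

/// Returns a warning only when the domain does not resolve to the expected target.
fn check_domain(config: &EnvConfig, domain: &str, resolved: &[IpAddr]) -> Option<String> {
    let expected = config.expected_dns_target();
    if resolved.contains(&expected) {
        None
    } else {
        Some(format!("{domain} does not resolve to {expected}"))
    }
}
```

A fuller check would likely compare A and AAAA records separately, since a floating IPv4 and the server's IPv6 address can coexist.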

---

## Post-provision / operational

### I-13 — Write netplan floating IP config with correct permissions from the start

The post-provision guide writes the netplan file with `sudo tee`, which creates
it with the default mode (typically `644`), so it is world-readable. Netplan
then warns that the file permissions are too open, and the guide needs an extra
`chmod 600` step. Using
`sudo install -m 600 /dev/stdin /etc/netplan/60-floating-ip.yaml` instead sets
the correct mode when the file is written, which avoids both the warning and the
manual fix.

Full description: [post-provision/dns-setup.md — Improvements](post-provision/dns-setup.md#improvements)
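
The same principle (create the file with its final mode instead of fixing it afterwards) can be sketched in Rust for Unix using `OpenOptionsExt::mode` plus an atomic rename. `write_private_file` is a hypothetical helper; if the deployer automated this step, it would more likely go through remote configuration tooling, such as an Ansible task with an explicit `mode`.

```rust
use std::fs::{self, OpenOptions};
use std::io::Write;
use std::os::unix::fs::OpenOptionsExt;
use std::path::Path;

/// Writes `contents` so the file is never world-readable: a temporary file is
/// created with mode 0600 (the umask can only narrow it), then renamed over
/// the target.
fn write_private_file(path: &Path, contents: &[u8]) -> std::io::Result<()> {
    let tmp = path.with_extension("tmp");
    {
        let mut file = OpenOptions::new()
            .write(true)
            .create_new(true) // never reuse a file with unknown permissions
            .mode(0o600)
            .open(&tmp)?;
        file.write_all(contents)?;
        file.sync_all()?;
    }
    // rename keeps the temporary file's 0600 mode and replaces any existing file.
    fs::rename(&tmp, path)
}
```

Because of `create_new`, a leftover `.tmp` file from an interrupted run makes the call fail instead of silently reusing it.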

---

## Summary

| ID   | Area           | Description                                                  |
| ---- | -------------- | ------------------------------------------------------------ |
| I-01 | `create`       | Document `instance_name` auto-generation in template         |
| I-02 | `create`       | Default bind addresses to `[::]` for public trackers         |
| I-03 | `create`       | Prompt for database choice or note SQLite dev default        |
| I-04 | `provision`    | Distinguish SSH failure reason in probe loop                 |
| I-05 | `provision`    | Classify `error_kind` more precisely for SSH auth failures   |
| I-06 | `provision`    | Include per-attempt SSH failure details in trace file        |
| I-07 | `provision`    | Make SSH connectivity timeout configurable                   |
| I-08 | `provision`    | Detect passphrase-protected SSH keys early and warn          |
| I-09 | `provision`    | Add `wait-for-ssh` command or `provision --resume` flag      |
| I-10 | `provision`    | Include IPv6 address in provision output                     |
| I-11 | `run`          | Add lightweight post-start health check                      |
| I-12 | Cross-cutting  | Add floating IP support to env config and DNS checks         |
| I-13 | Post-provision | Write netplan config with correct permissions from the start |