feat(agent): Improve nginx vhost generation, CLI help, and service discovery#160
Conversation
Compose updates synced service metadata to disk but discarded any error from the save, so the compose file could update while service metadata silently went stale, leaving the two inconsistent. The save error is now surfaced and the update fails instead of diverging.
In a multi-service deployment the primary service was chosen by Go map iteration order, which is random, so the selected service could vary between runs and pick an unintended one. A service named app or web that exposes ports is now preferred, falling back to the previous first-with- ports behaviour, making the choice deterministic.
ACME challenge requests, especially 404s for challenges already cleaned up, flooded the nginx access and error logs and buried real errors. Logging is now turned off on the challenge locations so the noise no longer obscures genuine problems.
Code Review SummaryThis PR improves the agent's robustness and CLI usability. Key changes include a move to Cobra, deterministic service discovery, and much-improved Nginx vhost generation (WebSocket support, OCSP stapling logic, and service probing). 🚀 Key Improvements
💡 Minor Suggestions
|
| existing.Networking.Service = newMeta.Networking.Service | ||
| } | ||
| d.SaveMetadata(name, existing) | ||
| if err := d.SaveMetadata(name, existing); err != nil { |
There was a problem hiding this comment.
Propagating this error is a good safety improvement. However, since this typically runs inside a discovery loop, consider if a failure to save one service's metadata should crash the entire sync or just be logged as a warning to allow other services to be discovered.
| if err := d.SaveMetadata(name, existing); err != nil { | |
| if err := d.SaveMetadata(name, existing); err != nil { | |
| log.Printf("error: failed to save metadata for %s: %v", name, err) | |
| continue | |
| } |
| return false, false | ||
| } | ||
|
|
||
| probe := fmt.Sprintf("nc -z -w2 %s %d", service, port) |
There was a problem hiding this comment.
Using nc -z is efficient for probing. However, if the nginx container is heavily hardened (e.g., using a distroless or minimal alpine image), nc might not be present. While interpretProbe handles the missing tool, you might consider using /dev/tcp if the shell is bash for broader compatibility in some environments.
| probe := fmt.Sprintf("nc -z -w2 %s %d", service, port) | |
| probe := fmt.Sprintf("timeout 2 sh -c 'cat < /dev/null > /dev/tcp/%s/%d'", service, port) |
Top-level help only showed the root flags, so the update, setup, and version subcommands were undiscoverable, and a bare invocation tried to start the server and failed on a missing default config. The CLI now uses cobra: help lists every subcommand, a bare invocation prints that help instead of failing, and passing a config (or the serve command) still starts the agent. The single-dash -config and -version forms keep working, so existing launch commands are unaffected.
Generated vhosts pinned the proxy read and send timeout at 60s, which is too short for long-lived WebSocket connections: an idle socket with no frames for 60s was closed, causing reconnect loops on quiet connections. A domain can now set its own timeout (defaulting to 60s), so WebSocket routes can hold connections open.
Every generated location sent Connection: upgrade on every request, even on vhosts proxying plain HTTP, which disabled upstream keep-alive and sent a stray header. The header is now driven by a managed map so it is only sent when a client actually requests a WebSocket upgrade; ordinary HTTP requests keep their connection reusable.
ssl_stapling was emitted for every SSL vhost, so certificates without an OCSP responder logged a warning on every config test and reload, burying real errors in the output. Stapling is now enabled only when the domain's certificate actually advertises a responder, and stays on when the cert cannot be read so behaviour is unchanged on error.
A domain could be mapped to a service or port where nothing listens, and the vhost was generated and reloaded with no signal, leaving a silently dead route. Proxy setup now probes each target through the same path the proxy resolves and surfaces a warning when it is unreachable, without failing the setup or warning when the target simply can't be probed.
9ba3ffa to
b05710c
Compare
The simplest, self-contained items from the Albacore umbrella (#158), one commit each.
Fixes
SaveMetadataerrors during a compose update are now propagated instead of silently dropped, so the compose file andservice.ymlcan't diverge unnoticed.apporwebbefore falling back to the first with ports. Previously the pick depended on Go map iteration order and could vary between runs.access_log off/log_not_found off, so cleaned-up-challenge 404s stop burying real errors in the nginx logs.CLI (#130)
cmd/agentnow uses cobra. Top-level help lists every subcommand (serve,setup,update,version), which the hand-rolledflagusage never did. A bareflatrun-agentprints that help instead of trying to start the daemon and failing on a missing default config. Passing--config(or the newservecommand) still starts the agent, so the systemd launch (flatrun-agent --config ...) is unchanged and the installer needs no update. cobra was already in the module graph, so this adds no new external dependency.nginx vhost generation (#156)
proxy_timeout), defaulting to 60s, so long-lived sockets aren't closed after 60s idle.Connection: upgradeis sent only when a client requests an upgrade, via a managedmapsnippet, instead of unconditionally on every request. The snippet is written into the conf.d include dir so existing installs pick it up without re-applying the base config.ssl_staplingis emitted only when the certificate advertises an OCSP responder, removing the per-reload warning on certs without one. It stays enabled when the cert can't be read.Closes #110, #111, #112, #130, #156.