
support saving bin logs on server #36

Merged
tridge merged 23 commits into ArduPilot:main from tridge:pr-bin-logs
May 11, 2026
Conversation


@tridge tridge commented May 11, 2026

still working on the details, but basic support works

tridge and others added 23 commits May 11, 2026 06:35
Reserves bit 3 of the existing flags word for the upcoming ArduPilot
binary-log (.bin) recorder. No struct change — bin files reuse
KeyEntry.tlog_retention_days (per spec, same retention rules as
.tlog). keydb.py setflag/clearflag binlog work for free via
FLAG_NAMES, and KeyEntry.__str__ already enumerates flag names so
'keydb.py list' shows 'binlog' alongside 'tlog'.

Tests cover the bit's round-trip through pack/unpack, coexistence
with FLAG_TLOG on one entry, and the CLI smoke path.
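
The flag plumbing described above can be sketched in Python. This is illustrative only, not the real keydb_lib.py: FLAG_BINLOG at bit 3 comes from the commit message, but the position of the existing tlog bit and the helper names here are assumptions.

```python
FLAG_TLOG = 1 << 0    # assumed position of the existing tlog bit
FLAG_BINLOG = 1 << 3  # bit 3, per the commit message

# name -> bit mapping, the mechanism that makes setflag/clearflag
# and __str__ flag enumeration work "for free" for new flags
FLAG_NAMES = {"tlog": FLAG_TLOG, "binlog": FLAG_BINLOG}

def set_flag(flags: int, name: str) -> int:
    return flags | FLAG_NAMES[name]

def clear_flag(flags: int, name: str) -> int:
    return flags & ~FLAG_NAMES[name]

def flag_names(flags: int) -> list:
    # what a __str__ that enumerates FLAG_NAMES would surface
    return [n for n, bit in FLAG_NAMES.items() if flags & bit]
```

Because set/clear and listing all key off the one dict, adding a flag is a single-entry change.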

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move TlogWriter's per-day sessionN-scan out of tlog.cpp into a small
shared helper next_session_n() in new session.h / session.cpp. The
helper scans for BOTH session*.tlog AND session*.bin, so when the
upcoming binlog feature lands, a single fork's .tlog and .bin files
share the same N: per-child code computes the number once at fork
start and passes it to TlogWriter::open() (now takes an explicit
session_n; the internal scan is gone). The dir-creation helper
mkpath() is also lifted into session.cpp so binlog can reuse it.

No behaviour change for existing tlog deployments. Existing
session1.tlog / session2.tlog naming is preserved; the new scan
just happens to also notice .bin files (none of which exist yet).
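
The shared helper's behaviour can be sketched in Python (the real helper is C++ in session.cpp; the highest-seen-plus-one policy here is an assumption inferred from the sessionN naming):

```python
import os
import re

def next_session_n(day_dir):
    """Return the next session number for a per-day directory,
    scanning BOTH session*.tlog AND session*.bin so a fork's
    .tlog and .bin share one N. Sketch only."""
    highest = 0
    if os.path.isdir(day_dir):
        for name in os.listdir(day_dir):
            m = re.fullmatch(r"session(\d+)\.(tlog|bin)", name)
            if m:
                highest = max(highest, int(m.group(1)))
    return highest + 1
```

Computing this once at fork start and passing it into both writers is what keeps the pair of files in step.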

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When KEY_FLAG_BINLOG is set on an entry, the per-port-pair child
runs an ArduPilot "dataflash over MAVLink" sink for the user-side
link:

  * On the first REMOTE_LOG_DATA_BLOCK (msgid 184) from the user,
    open logs/<port2>/<YYYY-MM-DD>/sessionN.bin (lazy — an idle
    child never litters the disk). N is the shared session_n from
    next_session_n() so .bin pairs with .tlog from the same fork.
  * Each block's 200 payload bytes are sparse-written at offset
    seqno * 200. Out-of-order blocks fill holes in-place; the file
    always spans (highest seen seqno + 1) blocks.
  * Each block gets a REMOTE_LOG_BLOCK_STATUS=ACK back through the
    user-side link. Forward jumps (seqno > highest_seen + 1) queue
    NACKs for the missing seqnos; tick() drains them, throttled to
    one NACK per missing seqno per 0.1s. We abandon a missing block
    after 60s wall time or once highest_seen is 200 ahead — both
    match MAVProxy's mavproxy_dataflash_logger.py.
  * Both REMOTE_LOG_* msgids (184, 185) are stripped from the
    user→engineer forward path so the support engineer's live
    session isn't polluted by log traffic. Engineer→user direction
    is left alone (per spec).
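
The sparse-write arithmetic in the second bullet reduces to a seek-and-write; a minimal sketch (the seekable-file shape is the only assumption):

```python
BLOCK_SIZE = 200  # REMOTE_LOG_DATA_BLOCK payload size

def write_block(f, seqno, payload):
    """Sparse-write one 200-byte block at offset seqno * 200.
    Out-of-order blocks fill holes in place; writing the highest
    seqno extends the file to (seqno + 1) * 200 bytes."""
    assert len(payload) == BLOCK_SIZE
    f.seek(seqno * BLOCK_SIZE)
    f.write(payload)
```

On a real filesystem the holes cost no disk space until filled, which is why an out-of-order burst is cheap to absorb.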

A few supporting pieces:

  * MAVLink::send_buf() — new thin public wrapper around the
    private send_data() that lets binlog (and any future caller)
    push *already-finalised* bytes through this link's transport.
    Needed because pack_chan() finalises the message (trims a
    trailing zero status byte for NACKs, writes CRC bytes into the
    payload area at the trimmed offset); calling send_message()
    afterwards would re-finalise, re-trim, see the CRC byte as
    "live payload", and emit the CRC byte on the wire where the
    receiver expects status — silent payload corruption on every
    NACK. The new send_buf() skips that second finalize.
  * Parse-guard relaxation in main_loop: the user-side
    receive_message() was gated on conn2_count > 0; binlog also
    needs to parse user packets, so the guard becomes
    `conn2_count > 0 || binlog_enabled`. Tlog behaviour is
    unchanged (still requires an engineer; pre-existing
    limitation, not regressed).

Tests in tests/test_binlog_capture.py drive a real proxy with a
synthetic vehicle (raw socket emitting hand-built DATA_BLOCKs):
  - blocks land at the right offsets, ACKs come back
  - REMOTE_LOG_* not seen on engineer side
  - forced gap triggers a NACK for the missing seqno
  - sessionN.tlog and sessionN.bin share N when both flags set

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cleanup loop's per-file predicate (cleanup.cpp:26) becomes
is_session_file() and accepts both '.tlog' and '.bin'. The
retention rule (tlog_retention_days * 86400 by mtime) and the
empty-date-dir prune both apply uniformly — per spec, one
retention value covers both file types per entry.
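
The retention arithmetic can be sketched in Python (the real loop is C++ in cleanup.cpp; the real predicate presumably also anchors on the session<N> prefix — this fragment only shows the rule quoted above):

```python
import os
import time

def is_session_file(name):
    # broadened predicate: both recording types age out together
    return name.endswith(".tlog") or name.endswith(".bin")

def expired(path, retention_days, now=None):
    # retention rule from the commit: retention_days * 86400 by mtime
    now = time.time() if now is None else now
    return now - os.path.getmtime(path) > retention_days * 86400
```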

Webadmin's session-file regex (webadmin/tlogs.py:24) broadens to
session\d+\.(tlog|bin). The listing pages and the
send_from_directory download endpoint flow through this regex
unchanged, so .bin files surface in the same per-date table and
download through the same path-traversal-safe route. The
private/no-store Cache-Control header from commit fc24fac
applies to .bin too (same sensitivity as .tlog).
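
The broadened pattern is small enough to demonstrate directly; this sketch shows why the extension cap keeps stray files (the session1.log / session1.pem cases in the tests below) out of the listing:

```python
import re

# the pattern quoted above, anchored via fullmatch so the
# extension alternation caps the filename
SESSION_FILE_RE = re.compile(r"session\d+\.(tlog|bin)")

def is_listable(filename):
    return SESSION_FILE_RE.fullmatch(filename) is not None
```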

Tests in tests/webadmin/test_tlog_routes.py:
  * owner + admin listings show .bin alongside .tlog
  * owner downloads a .bin successfully
  * .bin download carries Cache-Control: private, no-store
  * a session1.log / session1.pem in the same dir is still 404
    (regex caps the extension)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
FireVPS was logging repeated [CRITICAL] WORKER TIMEOUT followed by
[ERROR] "Perhaps out of memory?" (gunicorn's stock SIGKILL line —
not an actual OOM). Every traceback ended in _sslobj.read() on a
TLS-wrapped socket; the public 8080 attracts a steady drip of
scanners that complete a TLS handshake then disappear, and the
default sync worker has no way to abandon a stalled connection —
the arbiter eventually kills it for missing the 30s heartbeat.

Switch to `-k gthread --threads 4 --timeout 60 --graceful-timeout 30`:
the arbiter heartbeat now ticks from the worker's main thread
regardless of what any individual request thread is blocked on, so
a scanner can hold one thread without taking the whole worker
down. -w 1 is still enough for the UI's load; threading is what
unlocks concurrency.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
KEY_FLAG_BINLOG was reachable only via 'keydb.py setflag <port2>
binlog' — the original plan said the CLI was enough, but in
practice owners and admins need a web UI affordance to turn .bin
recording on/off without shelling into the server.

Adds binlog_enabled (BooleanField) to OwnerEditForm and
AdminEditForm with help text noting the firmware-side
LOG_BACKEND_TYPE mavlink bit prerequisite. Routes set/clear
FLAG_BINLOG on save (mirroring the existing FLAG_TLOG handling)
and the first-enable default for retention now fires when either
flag transitions from off to on. Templates render the new
checkbox next to the existing tlog one in both owner.html and
admin_edit.html.

The retention label is updated to "Tlog + bin retention (days, …)"
since the same tlog_retention_days field governs cleanup for both
file types per spec.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A plain lexical sort puts session10.tlog between session1.tlog and
session2.tlog, which is exactly when you most want to scan a
listing visually (~10+ sessions in a day means something is
churning). Split each filename into digit-runs + lowercased
non-digit runs and use that as the key, so the rendered order is
session1, session2, …, session9, session10, session11, session20.

Applied to both _list_sessions (the per-date file list — primary
target) and _list_dates (YYYY-MM-DD already sorts correctly under
either scheme, but using the same helper keeps the call sites
symmetric).
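
The key described above is a few lines in Python (sketch of the idea, not necessarily the exact helper):

```python
import re

def natural_key(name):
    """Split into digit runs (compared as ints) and lowercased
    non-digit runs, so session10 sorts after session9 instead of
    between session1 and session2."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]
```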

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Production found a corrupt session22.bin on FireVPS: 611 MB apparent
file size, only ~12 MB on disk, file starts with all zeros (an
ArduPilot .bin must begin with FMT records), and mavlogdump.py
refused to parse it. Root cause: a vehicle that was already logging
over MAVLink when SupportProxy activated has a seqno counter
already in the millions. We were obediently opening the file on
the *first* DATA_BLOCK regardless of seqno and sparse-writing at
offset seqno*200, so the resulting file is mostly holes with a
few clusters of real data — and crucially has no FMT records at
byte 0 so DFReader can't anchor.

Gate file open on seqno == 0 (matches MAVProxy's
mavproxy_dataflash_logger.py:344-348). Any DATA_BLOCK arriving
before that is silently discarded — both the file open and the
ACK side-effect. The vehicle restarting its log stream gives us a
fresh seqno=0; until then we wait.

This is intentionally strict for now; we may relax later (e.g. by
sending REMOTE_LOG_BLOCK_STATUS=START to nudge the vehicle into
restarting, or by accepting mid-stream seqnos when the user
explicitly opts in) but the strict gate is the simplest
correctness anchor.
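
The gate's control flow can be sketched in Python (the real code is C++ in BinlogWriter; the class shape here is hypothetical, only the discard-until-seqno-0 rule comes from the commit):

```python
class BinlogGate:
    """Strict start gate: every DATA_BLOCK before seqno == 0 is
    silently discarded — no file open, no ACK side-effect."""

    def __init__(self):
        self.opened = False
        self.acks = []     # seqnos we have acknowledged
        self.blocks = {}   # seqno -> payload actually written

    def handle_block(self, seqno, payload):
        if not self.opened:
            if seqno != 0:
                return          # mid-stream vehicle: wait for a restart
            self.opened = True  # fresh log stream anchors at byte 0
        self.blocks[seqno] = payload
        self.acks.append(seqno)
```

This is what guarantees FMT records land at byte 0, so DFReader can anchor.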

Tests in tests/test_binlog_capture.py:

* test_strict_start_gate_discards_pre_seqno_0 — drives DATA_BLOCKs
  at seqno=1000..1003 through a fresh binlog-enabled child, asserts
  no .bin file is created and no ACKs are returned. Then sends
  seqno=0 + seqno=1 and asserts the file opens and is exactly 400
  bytes (no sparse extension to whatever the earlier seqnos were).

* test_bin_parses_with_dfreader — packs a minimal-but-valid
  ArduPilot bin payload (one FMT-of-FMT record padded to 200 bytes)
  into a DATA_BLOCK with seqno=0, drives it through the proxy, then
  opens the resulting sessionN.bin with
  pymavlink.DFReader.DFReader_binary and asserts the FMT record
  parses out. This is the test that would have caught the FireVPS
  corruption — without it we had no end-to-end "the file is
  actually a valid bin log" check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ArduPilot pre-arm rejected the vehicle on FireVPS with
"logging failed" whenever LOG_BACKEND_TYPE included the mavlink
backend (=3). Root cause: AP_Logger_MAVLink::logging_failed()
returns !_sending_to_client, and that flag only flips to true
when the vehicle receives a REMOTE_LOG_BLOCK_STATUS with status=ACK
and seqno=MAV_REMOTE_LOG_DATA_BLOCK_START (2147483646)
— see AP_Logger_MAVLink.cpp:240-251. We weren't sending it, so
the vehicle thought no GCS-side logger was listening and refused
to arm. (As a side-effect, the same START also resets the
vehicle's seqno counter to 0, which our strict file-open gate
already requires.)

BinlogWriter::tick() gains a "before any DATA_BLOCK" phase: while
binlog is enabled and any_block_seen is false, emit
REMOTE_LOG_BLOCK_STATUS(seqno=START_MAGIC, status=ACK) at 1 Hz,
throttled by last_start_sent_s. After the first DATA_BLOCK arrives
we fall through to the existing ACK/NACK pump; the continuous ACK
traffic from real blocks resets ArduPilot's 10-second
_last_response_time client-timeout for us.

target_system / target_component on the outgoing START are 0
(broadcast) — we haven't latched a vehicle sysid yet, and
ArduPilot's handle_ack() picks the proxy's identity off the
message header's src fields, not the body's target fields.

The tick site in supportproxy.cpp now fires on
`binlog_enabled && have_conn1` instead of `binlog.is_open()`,
since the START phase predates the file open.
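
The throttled START phase reduces to a timestamp check per tick; a Python sketch (class and field names are illustrative — the real logic lives in BinlogWriter::tick()):

```python
START_MAGIC = 2147483646  # MAV_REMOTE_LOG_DATA_BLOCK_START

class StartPhase:
    """Emit REMOTE_LOG_BLOCK_STATUS(seqno=START_MAGIC, status=ACK)
    at 1 Hz until the first DATA_BLOCK arrives."""

    def __init__(self):
        self.any_block_seen = False
        self.last_start_sent_s = 0.0
        self.sent = []  # stand-in for the wire

    def tick(self, now_s):
        if self.any_block_seen:
            return  # fall through to the ACK/NACK pump instead
        if now_s - self.last_start_sent_s >= 1.0:
            self.sent.append(("REMOTE_LOG_BLOCK_STATUS",
                              START_MAGIC, "ACK"))
            self.last_start_sent_s = now_s
```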

Two tests in tests/test_binlog_capture.py:

* test_proxy_sends_remote_log_start_when_idle drives HEARTBEATs at
  ~1 Hz (the way a real ArduPilot does — main_loop's select wakes
  on each one, giving tick a chance to fire) and asserts ≥ 2 STARTs
  arrive on the user-side socket in 3 s.

* test_proxy_stops_sending_start_after_first_data_block sends
  seqno=0,1 right away, drains the ACKs, then watches for 2 s and
  asserts no further STARTs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MAX_ACKS_PER_TICK = 10 was carried over from MAVProxy's 100 Hz
idle-loop pump, where 10/loop already delivers 1 kHz of throughput.
Our main_loop wakes on each incoming packet rather than on a fixed
schedule, and under a TCP burst one recv() can return ~45 MAVLink
frames in a single call — the cap would let the ACK queue age 4+
ticks before catching up, prolonging the vehicle's
"pending-block" pressure exactly when the vehicle is logging
hardest.

Drop the cap; drain the whole pending_acks queue per tick. Each
ACK is a small UDP send; the cost difference is irrelevant next to
the benefit of freeing the vehicle's queue faster.

Field motivation: FireVPS DMS messages showed ~1 % message drop
during ArduPilot pre-arm bursts (~0.25 % in level flight). The
dominant cause is vehicle-side semaphore contention in
_WritePrioritisedBlock — we can't eliminate that from the proxy,
but the uncapped ACK pump is the one knob on our side that
strictly helps.

NACKs keep their MAX_NACKS_PER_TICK throttle so a wide gap can't
flood the wire with status messages.
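
The resulting asymmetric pump — uncapped ACKs, throttled NACKs — can be sketched as (names and the NACK cap value are illustrative):

```python
MAX_NACKS_PER_TICK = 5  # illustrative; the real constant may differ

def drain(pending_acks, pending_nacks):
    """One tick of the status pump: send every queued ACK, but
    only a capped batch of NACKs so a wide gap can't flood the
    wire with status messages."""
    out = [("ACK", s) for s in pending_acks]
    pending_acks.clear()
    out += [("NACK", s) for s in pending_nacks[:MAX_NACKS_PER_TICK]]
    del pending_nacks[:MAX_NACKS_PER_TICK]
    return out
```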

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The retention field is shared by .tlog and .bin recordings (the cleanup
loop already ages both), so the tlog-specific name was misleading. Rename
the C struct field, the Python helper (set_tlog_retention -> set_log_retention),
the default constant, and the __str__ output. The on-disk struct layout
is unchanged: same 4-byte float at the same offset, so existing keys.tdb
entries decode without migration.

The FLAG_TLOG bit, the 'tlog' flag name, and tlog-specific tests keep
their names — only the shared retention field is renamed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The hourly cleanup worker now ages out both .tlog and .bin recordings
under the shared retention field, so the tlog-specific naming on the
loop entry point and the printf strings was misleading. Rename
tlog_cleanup_loop / tlog_cleanup_once and the surrounding log lines;
the file predicate and per-entry behaviour are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The listing and download blueprint now serves both .tlog and .bin
session files, and the retention form field is shared, so the
tlog-specific URLs, blueprint names, template names, and form labels
were misleading. Rename:

  webadmin/tlogs.py        -> webadmin/logs.py
  /admin/tlogs/<port2>/... -> /admin/logs/<port2>/...
  /me/tlogs/...            -> /me/logs/...
  admin_tlogs.html         -> admin_logs.html
  owner_tlogs.html         -> owner_logs.html
  form.tlog_retention_days -> form.log_retention_days
  blueprint admin_tlogs    -> admin_logs
  blueprint owner_tlogs    -> owner_logs

The tlog_enabled / binlog_enabled checkboxes keep their names since
they map 1:1 to the per-flag bits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Claim one slot from KeyEntry.reserved[16] for fc_sysid (uint32, 0..255).
On-disk byte layout unchanged: existing records decode with fc_sysid=0,
which is the match-any default — so this is fully forward/backward
compatible without any migration step.

The field is consumed by the binlog reboot-detection path in a follow-up
commit. 0 = monitor packets from any MAVLink sysid (default); non-zero
restricts monitoring to that sysid so multi-FC setups can pin the
detection to the specific autopilot.

Adds:
  - keydb.h: uint32_t fc_sysid; reserved shrinks from [16] to [15]
  - keydb_lib.py: KeyEntry.fc_sysid, pack/unpack, set_fc_sysid helper,
    __str__ surfaces non-zero value
  - keydb.py: setsysid PORT2 SYSID CLI subcommand
  - tests: round-trip, range validation, CLI smoke, legacy-record
    zero-extends to fc_sysid=0
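
The layout-compatibility argument can be illustrated with a struct fragment in Python. This models only the claimed region — a uint32 taken from the front of the reserved array — and the slot size/ordering are assumptions, not the real keydb layout:

```python
import struct

RESERVED_SLOTS = 16
FMT_OLD = "<" + "I" * RESERVED_SLOTS          # reserved[16]
FMT_NEW = "<I" + "I" * (RESERVED_SLOTS - 1)   # fc_sysid + reserved[15]

def unpack_fc_sysid(blob):
    # old records were all-zero here, so they decode as
    # fc_sysid=0 — the match-any default — with no migration
    return struct.unpack(FMT_NEW, blob)[0]
```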

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the FC reboots mid-session, ArduPilot resets _sending_to_client
to false and (once nudged) restarts REMOTE_LOG_DATA_BLOCK at seqno=0
with a fresh log. Previously SupportProxy would:

  1. Silently stop logging — the proxy's START sender stops the
     moment any_block_seen latches true, so a post-reboot vehicle is
     stuck with no streaming.
  2. Corrupt the bin file if streaming ever resumes — the seqno=0
     gate in handle_block() only fires when fp is null, so the new
     boot's seqno=0/1/2/... overwrite the old log's offsets 0/200/...,
     and seen_bitmap/highest_seen/NACK state are carried over from
     the old log. The resulting file is two boots mashed together
     and won't round-trip through DFReader.

This change:

* Adds BinlogWriter::observe(msg), called from the user-side receive
  path for every decoded MAVLink message. It watches SYSTEM_TIME
  packets from MAV_COMP_ID_AUTOPILOT1 (filtered by the per-entry
  fc_sysid if non-zero) for a >= 10 s backward jump in time_boot_ms —
  the unambiguous signature of an FC reboot. time_boot_ms is
  monotonic from boot; GPS sync corrections affect time_unix_usec,
  not this field.

* On detection, BinlogWriter::rotate_for_reboot() closes the current
  sessionN.bin, resets per-log state (seen_bitmap, highest_seen,
  pending_acks, nack_state, last_system_time_boot_ms_), zeros
  last_start_sent_s so the next tick() emits an immediate START,
  re-runs next_session_n() to find a fresh N, and opens
  sessionN+1.bin. Per-vehicle state (target_system, fc_sysid_filter_,
  any_block_seen) is preserved.

* Adds a 5 s keep-alive START (START_KEEPALIVE_S) emitted while
  any_block_seen is true. ArduPilot's handle_ack(seqno=START) just
  sets _sending_to_client = true — redundant on an already-streaming
  vehicle — so this is safe defence-in-depth that also covers
  _sending_to_client being cleared by something other than a reboot
  (e.g. its 10 s response-timeout firing during a long network
  stall). The existing inline START emit is factored into
  send_start_packet().

* BinlogWriter::open() now stores port2 and base_dir for
  rotate_for_reboot() to reuse, so the public API on handle_block()
  stays unchanged.

* supportproxy.cpp's listen_port gains fc_sysid, populated from
  KeyEntry.fc_sysid at every DB reload, and the per-pair child
  calls set_fc_sysid_filter() once at fork.

Tlogs intentionally do NOT rotate — the .tlog is the engineer's
session capture and the support call has not ended; only the
vehicle's log has. So after a reboot, sessionN.tlog covers the whole
engineer call while sessionN.bin / sessionN+1.bin / ... split per
vehicle boot.
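
The detection rule in observe() can be sketched in Python (the real code is C++; class shape and the exact handling of the first sample are assumptions, the >= 10 s backward jump, compid filter, and optional sysid pin come from the description above):

```python
MAV_COMP_ID_AUTOPILOT1 = 1
REBOOT_JUMP_MS = 10_000  # >= 10 s backward jump in time_boot_ms

class RebootDetector:
    """Watch SYSTEM_TIME from the autopilot component for the
    unambiguous reboot signature: time_boot_ms going backwards."""

    def __init__(self, fc_sysid_filter=0):
        self.fc_sysid_filter = fc_sysid_filter  # 0 = match any sysid
        self.last_boot_ms = None

    def observe(self, sysid, compid, time_boot_ms):
        if compid != MAV_COMP_ID_AUTOPILOT1:
            return False  # other nodes have their own clocks
        if self.fc_sysid_filter and sysid != self.fc_sysid_filter:
            return False
        rebooted = (self.last_boot_ms is not None and
                    self.last_boot_ms - time_boot_ms >= REBOOT_JUMP_MS)
        self.last_boot_ms = time_boot_ms
        return rebooted
```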

Tests (tests/test_binlog_capture.py):
  - test_reboot_via_system_time_rotates: SYSTEM_TIME backward jump
    from compid=AUTOPILOT1 produces session2.bin while session1.bin
    keeps its original offset-0 data.
  - test_reboot_ignored_from_camera_compid: SYSTEM_TIME from
    compid=CAMERA doesn't rotate (other nodes have their own clocks).
  - test_reboot_blocked_by_fc_sysid_filter: with fc_sysid=1 on the
    entry, a sysid=2 SYSTEM_TIME doesn't rotate.
  - test_keepalive_start_after_first_block: START continues to fire
    at ~5 s cadence post-streaming so a future reboot recovers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both owner and admin can read/set the per-entry MAVLink sysid filter
that pins binlog reboot detection to a specific autopilot. 0 = match
any (default); non-zero = monitor SYSTEM_TIME packets only from that
sysid. Plumbed through both forms (IntegerField, NumberRange 0..255),
both routes (form.fc_sysid.data <-> ke.fc_sysid), and both templates
(rendered next to log_retention_days).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "mode" field in webui.json was overloaded: start_proxy.sh keys on
mode=standalone to launch gunicorn, while Flask keyed on mode=apache
to flip BEHIND_PROXY (which activates the ProxyFix x_prefix=1 wrap so
url_for() honours X-Forwarded-Prefix). On a host where we want both —
nginx fronting the webui — those two were mutually exclusive.

Add a separate "behind_proxy": true webui.json field that flips
BEHIND_PROXY independently. Keep the legacy mode=apache form working.

start_proxy.sh now parses behind_proxy and, when true:
  - exports WEBADMIN_BEHIND_PROXY=1 (config.py already maps this env
    var to app.config['BEHIND_PROXY']);
  - forces the no-TLS gunicorn launch even if fullchain.pem/privkey.pem
    exist (those pem files are for the supportproxy daemon's WSS
    endpoints, not for the webui — behind nginx the webui wants plain
    HTTP over loopback);
  - leaves WEBADMIN_INSECURE_COOKIES unset so the session cookie keeps
    its Secure flag (the reverse proxy forwards X-Forwarded-Proto:
    https, ProxyFix's x_proto=1 promotes request.is_secure).

No webadmin route, template, or test changes — the ProxyFix wrap at
webadmin/__init__.py:92-95 already exists; this commit just adds the
trigger that flips it on in a way compatible with running gunicorn
ourselves.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two units so the web UI can be restarted without dropping live
engineer/user MAVLink sessions on the proxy side:

* systemd/supportproxy.service — execs the supportproxy binary
  with WorkingDirectory=/home/support/proxy so it picks up
  keys.tdb / connections.tdb / proxy.log from the right place.
  Output goes to journal AND appended to proxy.log so existing
  tail-and-grep habits keep working.

* systemd/supportproxy-webadmin.service — runs the new
  start_webadmin.sh wrapper. Distinct lifecycle from the daemon.
  Output to journal + webui.log.

* start_webadmin.sh — foreground wrapper for the webadmin gunicorn
  (matching the lifecycle systemd expects, vs the backgrounded
  nohup launch start_proxy.sh does for cron deploys). Re-uses the
  same webui.json schema as start_proxy.sh: host / port /
  behind_proxy. Auto-creates .webadmin_secret if missing, handles
  the same three TLS branches (behind_proxy / standalone-HTTPS /
  plain-HTTP-fallback) as start_proxy.sh, activates the venv, then
  execs gunicorn.

README updated to document the systemd path as the recommended
production deploy, with the explicit warning not to run cron AND
systemd simultaneously (they'd race on the pidof check).

The cron-based start_proxy.sh stays for sites without systemd.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Restart=on-failure does cover crashes (SIGSEGV / SIGABRT / OOM-kill
all count as failures), but it doesn't restart on clean exit (rare
but possible) or on SIGTERM from outside the service manager.

The cron-based model that these units replace was "if pidof shows
nothing, launch it" — i.e. respawn under any condition. Match that
with Restart=always, which still respects deliberate `systemctl
stop` / `systemctl restart` so it doesn't fight the operator.

Keep StartLimitBurst=5 / StartLimitIntervalSec=60 so a binary that's
genuinely broken trips into failed state for inspection rather than
crash-looping forever in the log.
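
Put together, the relevant fragment of such a unit would look like this. Illustrative only — paths and the ExecStart name are assumptions; the directive placement follows systemd's documented sections:

```ini
[Unit]
Description=SupportProxy daemon
# Rate-limit directives belong here; under [Service] systemd
# silently ignores them and applies defaults.
StartLimitBurst=5
StartLimitIntervalSec=60

[Service]
WorkingDirectory=/home/support/proxy
ExecStart=/home/support/proxy/supportproxy
# Respawn on any exit, matching the old cron pidof model, while
# still honouring a deliberate systemctl stop / restart.
Restart=always
```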

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
One-shot migration script for SupportProxy hosts moving from the
apache2+cron deployment model to nginx (TLS termination + /dashboard
proxy) plus the new systemd units. Idempotent-ish: re-runnable if a
single step fails.

Driven by SP_USER and SP_DOMAIN env vars so it works for any host
(FireVPS uses SP_USER=fire, support.ardupilot.org would use
SP_USER=support). Optional SP_PURGE_APACHE=1 to apt-purge apache2
once the new stack has bedded in; the default leaves apache
stopped+disabled so rollback is a single `systemctl start apache2` away.

Steps:
  1. Stop + disable apache2 (preserves install for rollback).
  2. apt install nginx + python3-certbot-nginx.
  3. /etc/nginx/sites-available/$SP_DOMAIN — HTTPS via existing LE
     cert, /dashboard proxied to 127.0.0.1:8080 with the
     X-Forwarded-Prefix dance.
  4. /var/www/$SP_DOMAIN/index.html — landing page linking to
     /dashboard/.
  5. nginx -t + enable + reload.
  6. Switch the certbot renewer from apache installer to nginx so
     future `certbot renew` runs don't try to drive apache.
  7. Patch ~/proxy/webui.json: host=127.0.0.1, behind_proxy=true.
  8. Install the two systemd units (with /home/support ->
     /home/$SP_USER substitution), reap any old supportproxy /
     udpproxy / webadmin procs, daemon-reload, enable --now.
  9. Print verification curls.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three issues caught while migrating FireVPS:

* systemd[1]: "Unknown key 'StartLimitIntervalSec' in section [Service]"
  — those directives belong under [Unit], not [Service]. systemd
  silently ignored them and applied defaults. Move to [Unit] in both
  units.

* supportproxy-webadmin exited 127 ("command not found") because
  start_webadmin.sh's `exec gunicorn ...` runs under systemd's
  default PATH (no ~/.local/bin), and FireVPS uses a pip-user-install
  for gunicorn rather than a venv. The existing
  `source venv/bin/activate` line gracefully no-ops when there's
  no venv, but PATH
  was left without ~/.local/bin. Prepend it explicitly.

* migrate_to_systemd.sh had `$SUDO -u "$SP_USER" tee ...` which
  expands to `-u fire tee ...` (i.e. tries to run `-u` as a command)
  when SUDO is empty because we're already root. Write the file with
  $SUDO tee and chown it to the daemon user afterwards.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
For hosts where nginx + /dashboard + certbot-via-nginx are already
configured correctly AND the existing vhost has custom directives
that must NOT be overwritten (e.g. support.ardupilot.org's
^/$ -> ardupilot.org/dev/docs/support_proxy.html redirect), pass
SP_SKIP_NGINX=1 to run only the webui.json + systemd + verify
steps. Skips apache stop, nginx apt install, vhost write, landing
page write, nginx -t/reload, and the certbot renewer migration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The cert dir is only referenced when the script writes a new nginx
vhost. With SP_SKIP_NGINX=1 the existing nginx config is the source
of truth and the cert may live at a different LE name (e.g.
support.ardupilot.org's cert is at live/neon.ardupilot.org/ because
that's the host that originally requested it). Conditional the
check on SP_SKIP_NGINX so support.ardupilot.org doesn't fail
preflight.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tridge tridge merged commit 33f5658 into ArduPilot:main May 11, 2026
2 checks passed