Skip to content

minor fixes from testing#37

Merged
tridge merged 8 commits into
ArduPilot:mainfrom
tridge:pr-tweaks2
May 12, 2026
Merged

minor fixes from testing#37
tridge merged 8 commits into
ArduPilot:mainfrom
tridge:pr-tweaks2

Conversation

@tridge
Copy link
Copy Markdown
Contributor

@tridge tridge commented May 12, 2026

No description provided.

tridge and others added 8 commits May 11, 2026 21:09
…open

Addresses three findings from the codex review of the unsigned-by-
default user-side port surface:

* Forward-jump cap (100 MB). handle_block() drops any block whose
  resulting offset would expand the file by more than 100 MiB in a
  single seqno step. Without this an attacker (or buggy peer) sending
  seqno=0 followed by seqno=2^32-1 would sparse-extend the file to
  ~800 GB and grow seen_bitmap to 512 MB of RSS. 100 MiB easily
  covers a ~30-minute network outage at 400 blocks/s.

* Per-port2 on-disk quota (1 GiB). BinlogWriter caches the size of
  the other .tlog/.bin files under logs/<port2>/ at open() and
  refreshes periodically. Any block whose write would push the
  per-port-pair total past 1 GiB is silently dropped (no ACK), so
  the vehicle's pending queue ages it out naturally. Cleanup.cpp's
  hourly pass enforces the same cap by deleting oldest files until
  under 1 GiB — runs independently of per-entry retention so even
  retention=0 ("keep forever") entries can't fill the disk.

* Deferred file open after reboot rotation. rotate_for_reboot()
  previously close()d the old file and immediately open()d the new
  one, defeating the strict seqno=0 gate that was supposed to keep
  a delayed pre-reboot block from sparse-writing into sessionN+1.bin.
  Now rotate_for_reboot() only pre-computes pending_session_n_ for
  the next open; the gate at handle_block:fp==nullptr re-engages
  and only opens sessionN+1.bin when a fresh seqno=0 arrives from
  the rebooted vehicle. The keep-alive START in tick() is
  unaffected (it doesn't depend on fp) so the vehicle is still
  nudged into streaming.

Tests in test_binlog_capture.py: test_seqno_jump_above_100mb_rejected,
test_disk_quota_blocks_writes_when_over_cap,
test_late_old_block_after_rotation_dropped.

Tests in test_log_cleanup.py:
test_quota_deletes_oldest_even_with_retention_zero — uses sparse
files (truncate) to fake a 1.2 GiB pre-existing state without
consuming real disk.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
KEY_FLAG_TLOG was supposed to mean "record this port pair's user-
side traffic from the first packet", but the parse loops at both
the UDP and TCP receive sites were gated on:

    if (conn2_count > 0 || binlog_enabled)

If only KEY_FLAG_TLOG was set and the user started sending before
an engineer connected (or no engineer ever connected), those
packets were read off the socket and discarded — never parsed,
never written to sessionN.tlog. Silent data loss.

Add tlog_enabled to both gates. Cost is the parse overhead the
binlog path already pays. Regression test in test_tlog_capture.py
drives only user-side HEARTBEATs with no engineer connecting and
asserts the user frames land in session1.tlog.

Reported by codex review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three small UX changes on the admin "Add a new entry" form, plus
the validator change that goes with them:

* "Generate" button next to the passphrase input. Produces 12 random
  characters from [A-Za-z0-9] via window.crypto.getRandomValues()
  (cryptographically secure, no Math.random()), then flips the field
  to type=text so the admin can read and copy the value before
  submitting. The existing password-toggle eye widget still lets
  them switch it back to dots.

* port2 auto-suggest. When the admin types port1, port2 is
  pre-filled with port1+1000 (the SupportProxy convention). The
  auto-fill stops as soon as the admin types into port2 manually,
  so an explicit choice is never clobbered.

* Port range validation tightened from 1..65535 to 10000..60000 in
  the AdminAddForm. Stays clear of well-known ports and most of the
  ephemeral allocation range. Uniqueness (no overlap with existing
  port1 or port2 in keys.tdb) was already enforced by
  keydb_lib.add_entry(); the route's existing CLIError handler
  surfaces the message via flash.

Three new tests in tests/webadmin/test_admin.py:
test_admin_rejects_port_below_range,
test_admin_rejects_port_above_range,
test_admin_rejects_overlapping_port.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The original ask was "show the current passphrase" on the admin's
edit page, but KeyEntry only stores the SHA-256 of the passphrase
(which IS the MAVLink signing key) — the original string is
discarded on set_passphrase(). So "show" isn't a read of existing
data without a schema change to store the passphrase, which the
user opted not to do.

Instead, a Generate button next to the New passphrase / Confirm
passphrase pair: clicking it fills BOTH fields with a fresh
12-character [A-Za-z0-9] value via crypto.getRandomValues() and
flips both to type=text so the admin can see + copy the new
passphrase before submitting. EqualTo validator passes
immediately. Saving the form rotates the passphrase via the
existing set_passphrase() path.

The same admin-add.js module already had a single-field generator
for the AdminAddForm — refactored to share the random-string +
button-construction helpers. The dual-field generator attaches
wherever both new_passphrase and confirm_passphrase IDs exist on
the page, which covers admin_edit.html and owner.html naturally.

No new tests: the JS is purely client-side and the underlying
set-passphrase route is already covered by existing tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Brings the old add_partner.sh workflow into the admin UI. The add
form gains a "Count" field (default 1, capped at 50). With count=1
it behaves exactly as before — explicit port1/port2/name. With
count>1 the route derives, for entry i (1-indexed):
  port1 = base + i - 1
  port2 = port1 + 1000
  name  = "<name><i>"
all sharing one passphrase, matching the script. The explicit port2
form field is ignored when count>1 (admin-add.js makes it readonly
and pins it to the base port2 to signal that).

The whole batch is range-checked up front (every derived port must
be 10000..60000) and created in a single TDB transaction, so a
collision with an existing entry rolls the entire batch back — no
partial create.

After a successful add the route flashes a 'partner_text' message:
a plain-text blurb listing each connection's user port, support
engineer port, and passphrase, plus instructions to change the
passphrase at https://support.ardupilot.org/dashboard (updated from
the old docs-page wording). base.html renders that category as a
read-only <textarea> with a "Copy to clipboard" button (wired in
admin-add.js).

Tests in tests/webadmin/test_admin.py:
test_admin_can_create_multiple_consecutive,
test_multi_create_is_atomic_on_overlap,
test_multi_create_rejects_out_of_range_derived_port.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The /me (owner) page both auto-refreshes every 5 s (to keep the
live-connections table current) and carries the owner's edit form.
The refresh swaps the whole <main> for the server's render — which
still has the OLD checkbox/field state, since the form wasn't
submitted — so toggling "record tlogs" (or editing any field) and
then waiting more than ~5 s silently reverted the change before the
user could click Save.

Fix: auto-refresh.js now checks whether any form control under
<main> differs from the state the server sent (compared against the
DOM's default* properties) and skips the swap while that's true.
It retries on the next tick; once the user submits (full page
reload) or resets the form, the form is clean again and refreshing
resumes. Pages with no form (connections list, log lists) are
unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
On a host that uses the systemd units, the supportproxy daemon and
the webadmin gunicorn are managed by systemd with Restart=always.
But update_server.sh's tail did `pkill supportproxy; ./start_proxy.sh`,
and start_proxy.sh's `pidof supportproxy || nohup ... &` would launch
a competing copy in the ~5 s gap before systemd's Restart= respawned
its own. Whichever copy won the listening-port race left the other
spamming "Failed to open TCP port N - Address already in use" on
every 5 s keys.tdb reload — and the systemd unit thrashed (restart
counter in the hundreds). Same race took down supportproxy-webadmin
when its gunicorn lost port 8080.

Two fixes:

* start_proxy.sh now exits 0 immediately if `systemctl is-enabled
  supportproxy.service` succeeds (a read-only query, works as the
  non-root deploy user). A stray cron entry pointing at it, or
  update_server.sh calling it, becomes a harmless no-op on systemd
  hosts.

* update_server.sh detects systemd-managed hosts. On those it just
  kills the daemon + webadmin (they run as the deploy user) and
  waits out systemd's RestartSec so Restart=always brings the freshly
  built binary back — no start_proxy.sh call, no competing copy. On
  legacy cron hosts it keeps the old pkill + start_proxy.sh path.

No passwordless sudo is needed (we don't `systemctl restart` — we
let Restart= do it).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "Send this to the partner" textarea defaulted to wrap="soft", so
on a narrow browser window the lines wrapped mid-content ("support
engineer port:" / "41001" on two lines) and the box looked cramped.
Add wrap="off" — long lines now get a horizontal scrollbar instead
of wrapping, and the Copy button still copies the unwrapped value.

Also reformat _build_partner_text() so every line is <= ~50 columns
(the 66-char intro line was the worst offender), so even at a modest
width the box rarely needs to scroll horizontally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tridge tridge merged commit 3e335a1 into ArduPilot:main May 12, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant