
Commit e4e1a5b

TeoSlayer, teovl, and claude authored
feat(pilotctl): agent-first CLI overhaul — bounded output, filters, styling (#247)
* feat(pilotctl): agent-first CLI overhaul — bounded output, filters, styling

Inbox was unusable for agents (23 MB --json dumps, oldest-first, 80-char
mid-token truncation) and several commands shared the same disease. This
makes every high-traffic command bounded, filterable, non-interactive,
and visually scannable, without breaking --json consumers.

- inbox: newest-first, default --limit 10,
  --latest/--from/--since/--full/read <id>/--clear --before;
  --json bounded (23 MB -> 3 KB)
- received: same flag surface ported (mtime-ordered; sender metadata
  unavailable in dataexchange filenames)
- peers: summary + exceptions-only view (surfaces unencrypted peers),
  colorized --all, --limit/--search
- trust: newest-first, --limit 20, --search, one-way trust flagged
- daemon status: fix contradiction (stale PID file reported "stopped"
  while live socket data printed); socket is now the source of truth
- info: grouped identity/network/traffic/skills layout
- ping: 5s default timeout (was 30s), relay-convergence hint on failure
- send-message/ping --json: add "to" resolved-address field
- send-message --wait, ping, bench, traceroute: animated elapsed line on
  stderr (TTY-only, erased on completion, no-op for pipes/--json)
- config/skills status/updates/network list: aligned key-value, per-tool
  status dots, word-boundary wrap, dead MEMBERS column dropped
- handshake/approve/untrust: next-step hints
- context <command>: single-command spec (18 KB -> ~440 B)

New style layer (cmd/pilotctl/style.go): semantic ANSI helpers gated on
TTY + NO_COLOR/PILOT_NO_COLOR/TERM=dumb; tests pipe stdout so assertions
stay plain-text.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(cli-reference): regenerate for new received flags

received gained --limit, --since, --clear --before in the CLI overhaul.
The cli-reference-check gate caught the stale summary line.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(send-file): bounded ACK wait, progress, throughput · docs rewrite

Fixes the "send-file hangs ~120s then EOFs" report originally filed in
BUG-updater-version-skew.md. Two distinct things ship here:

1. cmdSendFile reliability (M0 of the reliable-file-transfer proposal):
   - --timeout flag (default 90s) bounds the ACK wait. Was unbounded
     before, hitting SO_KEEPALIVE around 120s with an opaque EOF. On
     expiry we close the conn (unblocks the read goroutine), then
     surface a clear hint pointing at pilotctl ping <peer>.
   - Progress line on stderr via startWaitProgress, gated on TTY + not
     --json so agents don't see control chars.
   - Result JSON now carries elapsed_ms and throughput_mbps.
   - Receiver "ERR …" ACK already errored; tightened the hint.
   - parseFlags split for --timeout: positional args come from `pos`.

2. Docs:
   - BUG-updater-version-skew.md rewritten end-to-end. The RSS-stale
     mechanism the original claimed is wrong (updater hits the GitHub
     API directly per updater.go:247); the real cause is that
     pilot-updater ships but is never started (no launchd/systemd unit,
     not embedded in daemon). Also documents the missing pilot-gateway
     binary in v1.11.0 (confirmed by `tar tzf`).
   - PROPOSAL-reliable-file-transfer.md is the new home for the real
     fix: TypeFileStream wire type, INIT/CHUNK/ACK/DONE/ABORT/RESUME
     state machine, sliding-window backpressure, end-to-end SHA-256,
     resume protocol, backward-compat fallback to TypeFile. Six
     milestones; M0 ships in this commit.

No wire-format change. No new deps. Full pilotctl suite green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(transport): dual-route key-exchange for dual-NAT convergence

When two peers are both behind NAT (e.g. Mac home-NAT ↔ GCP VM stateful
conntrack), the direct PILA key-exchange frame never lands, and the
tunnel only reconverges after slow blackhole detection flips the peer to
relay mode, measured 28s–3min on the canonical Mac↔VM rig, far longer
than the dial/send timeouts, so send-file/send-message time out and the
crypto state desyncs.

sendKeyExchangeToNode now ALSO pushes the key-exchange via the beacon
relay whenever the peer is not yet relay-flagged and a beacon is
available. The relay copy converges in ~1 RTT. It is a no-op once the
peer is relay-flagged (the primary send already went via relay), and
relayProbeLoop keeps probing direct so a genuine direct path still
upgrades the peer out of relay. Best-effort: a failed relay copy falls
back to the existing slow path.

Adds routing.SendRelayFrame (forced-relay send primitive, ignores the
per-peer relay flag and blackhole heuristic) and the ClearRekeyGaveUp /
ClearLastRekeyReq rekey-state shims.

Verified on the canonical Mac↔VM dual-NAT rig:
- G2 liveness: idle 5min (and 90min) then small msg arrives, no reset.
- Small msg ACK in ~0.42s (was 28s–3min).
- 64KB send-file byte-perfect (sha256 match), incl. from a cold daemon
  restart (fresh in-memory peer table); tunnel re-converges in ~12s.
- No regressions: 0 panics, 0 relay-copy failures on either end.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(transport): NAT hole-punch direct-upgrade + streamed send-file + prefer-direct

Make P2P transfers actually go direct (and stay direct) across NAT, and
ship large files reliably.

- daemon.go: rewrite relayProbeLoop → tryDirectUpgrade. The old loop
  sent a one-way SendDirectProbe every 5 min, which a stateful
  NAT/firewall always drops (no conntrack pinhole). Now it fires a
  beacon-coordinated RequestHolePunch to open the pinhole on both NATs,
  then pushes encrypted probes at the peer's REAL address so the peer's
  ClearRelayOnDirect promotes the path. Unpins blackhole-pinned
  (non-relay-only) peers, resolves fresh when uncached, and runs every
  15 s (was 5 min).
- tunnel.go: add SendDirectProbeTo: encrypted probe to an explicit real
  address (the upgrade primitive; the stored peers[] entry for a relay
  peer is the beacon placeholder).
- ipc.go: handlePreferDirect: drop tunnel + cached resolution so the
  next dial re-runs resolve + punch; unpin relay.
- pilotctl/main.go: send-file streams by default (TypeFileStream) and
  falls back to single-frame TypeFile when the peer never sends an
  INIT-ACK (back-compat); --no-stream forces legacy; reports
  transport/sha256/throughput. Adds `prefer-direct` command +
  --prefer-direct/--timeout flags.

Verified on Mac↔GCP-VM (true dual-NAT) and a fresh throwaway VM: tunnel
goes relay=False via hole-punch in ~8 s through a default-deny firewall,
holds direct through a 50 MB transfer (no flip), byte-perfect sha256,
~7-15× the relay throughput. Survives a cold restart of both ends.

go.mod points common/dataexchange at branch commits (pseudo-versions)
pending their tagged releases; the version-bump to proper tags happens
at release time (v1.11.1), which is intentionally held for review.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(changelog): prepare 1.11.1 — NAT-reliable transfer (held for review)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Teodor Calin <teodor@vulturelabs.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 388805b commit e4e1a5b

21 files changed

Lines changed: 2547 additions & 451 deletions

CHANGELOG.md

Lines changed: 31 additions & 0 deletions
@@ -7,6 +7,37 @@ project uses [Semantic Versioning](https://semver.org/).
 Detailed per-release notes are on the
 [GitHub Releases page](https://github.com/TeoSlayer/pilotprotocol/releases).
 
+## [Unreleased]
+
+Reliable P2P data transfer across NAT. Tag intentionally held for review.
+
+### Added
+- **Chunked, ACK'd, resumable file transfer (`TypeFileStream`).** `pilotctl
+  send-file` now streams files in 48 KiB chunks with per-chunk ACKs, an
+  end-to-end SHA-256 integrity check, and automatic resume from the last
+  contiguous byte after an interrupted transfer. Replaces the single
+  atomic frame that stalled large transfers on any non-trivial path.
+  Backward compatible: falls back to the legacy `TypeFile` path when the
+  receiver is too old to answer the stream handshake. `--no-stream` forces
+  the legacy path.
+- **`pilotctl prefer-direct <peer>`** and **`send-file --prefer-direct`**
+  drop a peer's tunnel + cached resolution so the next dial re-runs the
+  full resolve + NAT hole-punch flow and prefers the direct path.
+- `send-file` reports `transport`, `sha256`, and `throughput_mbps`; adds
+  `--timeout`.
+
+### Fixed
+- **NAT traversal now actually establishes (and holds) a direct path.** The
+  relay→direct upgrade sent a one-way probe that a stateful NAT/firewall
+  always dropped, so peers stayed on the beacon relay indefinitely. The
+  daemon now runs a beacon-coordinated hole-punch and immediately probes
+  the peer's real address to promote the path, retrying every 15 s (was
+  5 min). Result on the dual-NAT rig: relay→direct in ~8 s, held through a
+  50 MB transfer, ~7–15× the relay throughput.
+- **Dual-NAT key-exchange convergence.** Key exchange is now sent over both
+  the direct and relay paths, so two NAT'd peers reconverge in ~1 RTT
+  instead of waiting 28 s–3 min for blackhole detection.
+
 ## [1.11.2] - 2026-06-15
 
 ### Added
