| 1 | +# QUIC Agent Tunnel Protocol |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +The agent tunnel enables Devolutions Gateway to proxy RDP/SSH/Kerberos connections through agents deployed in private networks. |
| 6 | +Agents connect **outbound** to the gateway — no inbound firewall rules needed. |
| 7 | +The tunnel uses QUIC (via Cloudflare's quiche library) as its transport, secured with mutual TLS. |
| 8 | + |
| 9 | +## Trust Model |
| 10 | + |
| 11 | +### Private PKI |
| 12 | + |
| 13 | +The gateway runs its own Certificate Authority (CA). |
| 14 | +No system trust store or public CA is involved. |
| 15 | + |
| 16 | +``` |
| 17 | +Gateway CA (ECDSA P-256, self-signed, 10-year validity) |
| 18 | + ├── Server Cert (signed by CA, 1-year validity, SAN = gateway hostname) |
| 19 | + ├── Agent Cert A (signed by CA, 1-year validity, SAN = urn:uuid:{agent_id}) |
| 20 | + └── Agent Cert B (signed by CA, 1-year validity, SAN = urn:uuid:{agent_id}) |
| 21 | +``` |
| 22 | + |
| 23 | +The CA is generated automatically on first gateway startup. |
| 24 | +Agent certificates are issued during enrollment. |
| 25 | + |
| 26 | +### CSR-Based Enrollment (Private Key Never Leaves Agent) |
| 27 | + |
| 28 | +``` |
| 29 | +Agent Gateway |
| 30 | + │ │ |
| 31 | + │ Generate ECDSA P-256 key pair locally │ |
| 32 | + │ Generate CSR (contains public key only) │ |
| 33 | + │ Write private key to disk immediately │ |
| 34 | + │ │ |
| 35 | + │── POST /jet/agent-tunnel/enroll ─────────>│ |
| 36 | + │ Bearer: <enrollment-token> │ |
| 37 | + │ Body: { agent_name, csr_pem } │ |
| 38 | + │ │ Verify CSR signature |
| 39 | + │ │ Extract public key from CSR |
| 40 | + │ │ Generate agent UUID |
| 41 | + │ │ Sign cert with CA (embed UUID in SAN) |
| 42 | + │ │ |
| 43 | + │<──────────────────────────────────────────│ |
| 44 | + │ { agent_id, client_cert_pem, │ |
| 45 | + │ gateway_ca_cert_pem, quic_endpoint } │ |
| 46 | + │ │ |
| 47 | + │ Write cert + CA cert to disk │ |
| 48 | + │ │ |
| 49 | + │ Private key NEVER transmitted │ |
| 50 | +``` |
| 51 | + |
| 52 | +The enrollment token is either: |
| 53 | +- A **one-time token** (UUID, 122-bit entropy) generated by the gateway webapp/DVLS — consumed on use, cannot be replayed. |
| 54 | +- A **static secret** configured in gateway.json — compared in constant time. |
| 55 | + |
| 56 | +The enrollment endpoint (`/jet/agent-tunnel/enroll`) bypasses the normal JWT auth middleware and uses its own Bearer token validation. |
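The static-secret check described above can be sketched as follows. This is a minimal std-only illustration, not the gateway's actual code: `constant_time_eq`, `bearer_token`, and `static_secret_matches` are hypothetical names, and a production implementation would more likely rely on a vetted crate such as `subtle` rather than a hand-rolled loop.

```rust
/// Compare two byte strings without returning early on the first mismatch.
/// The length check is not constant-time, so only the secret's length can leak.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        // Accumulate differences instead of branching on them.
        diff |= x ^ y;
    }
    diff == 0
}

/// Extract the token from an `Authorization` header value (hypothetical helper).
fn bearer_token(header_value: &str) -> Option<&str> {
    header_value
        .strip_prefix("Bearer ")
        .map(str::trim)
        .filter(|token| !token.is_empty())
}

/// Hypothetical check for the static-secret enrollment case.
fn static_secret_matches(header_value: &str, configured_secret: &str) -> bool {
    match bearer_token(header_value) {
        Some(token) => constant_time_eq(token.as_bytes(), configured_secret.as_bytes()),
        None => false,
    }
}
```

A header without the `Bearer ` prefix, or with an empty token, is rejected before any comparison takes place.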
| 57 | + |
| 58 | +## Transport: QUIC over UDP |
| 59 | + |
| 60 | +### Why QUIC |
| 61 | + |
| 62 | +- **Outbound-only** — agent initiates the connection, traverses NAT/firewalls. |
| 63 | +- **Built-in TLS 1.3** — mutual authentication is part of the QUIC handshake, not a separate layer. |
| 64 | +- **Stream multiplexing** — one connection carries many independent bidirectional streams without head-of-line blocking. |
| 65 | +- **Connection migration** — IP changes don't break the connection (disabled in our config for simplicity). |
| 66 | + |
| 67 | +### QUIC Configuration |
| 68 | + |
| 69 | +Both sides configure matching transport parameters: |
| 70 | + |
| 71 | +| Parameter | Value | Why | |
| 72 | +|---|---|---| |
| 73 | +| `max_idle_timeout` | 120,000 ms | Close connection after 2 min of silence. | |
| 74 | +| `max_recv/send_udp_payload_size` | 1,350 bytes | Conservative MTU to avoid fragmentation. | |
| 75 | +| `initial_max_data` | 10 MB | Total connection-level flow control window. | |
| 76 | +| `initial_max_stream_data_bidi_local` | 1 MB | Per-stream flow control (locally-opened streams). | |
| 77 | +| `initial_max_stream_data_bidi_remote` | 1 MB | Per-stream flow control (remotely-opened streams). | |
| 78 | +| `initial_max_streams_bidi` | 100 | Max concurrent bidirectional streams. | |
| 79 | +| `disable_active_migration` | true | Tell the peer not to actively migrate the connection to a new address. | |
| 80 | + |
| 81 | +ALPN protocol identifier: `devolutions-agent-tunnel` |
| 82 | + |
| 83 | +### mTLS Handshake |
| 84 | + |
| 85 | +``` |
| 86 | +Agent Gateway |
| 87 | + │ │ |
| 88 | + │── QUIC Initial (ClientHello) ────────────>│ |
| 89 | + │ Client cert: agent-{uuid}-cert.pem │ |
| 90 | + │ │ |
| 91 | + │<──── QUIC Handshake (ServerHello) ────────│ |
| 92 | + │ Server cert: agent-tunnel-server.pem │ |
| 93 | + │ │ |
| 94 | + │ Verify: server cert signed by CA? │ Verify: client cert signed by CA? |
| 95 | + │ (using gateway-ca.pem from enrollment) │ (using own CA cert) |
| 96 | + │ │ Extract agent UUID from SAN |
| 97 | + │ │ Register in AgentRegistry |
| 98 | + │ │ |
| 99 | + │ QUIC connection established │ |
| 100 | +``` |
| 101 | + |
| 102 | +Both sides call `verify_peer(true)`. |
| 103 | +A rogue agent cannot connect without a CA-signed certificate. |
| 104 | +A rogue server cannot impersonate the gateway without the CA-signed server certificate. |
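Once the client certificate has been verified, the gateway needs the agent UUID from the `urn:uuid:{agent_id}` SAN. The sketch below shows only that string-level step, under the assumption that the URI SAN has already been extracted from the certificate by an X.509 parser; `agent_id_from_san` is a hypothetical helper name.

```rust
/// Parse an agent ID out of a URI SAN of the form `urn:uuid:{agent_id}`.
/// Returns the 36-character textual UUID if it is well-formed.
fn agent_id_from_san(san_uri: &str) -> Option<&str> {
    let id = san_uri.strip_prefix("urn:uuid:")?;
    let bytes = id.as_bytes();
    if bytes.len() != 36 {
        return None;
    }
    // Hyphens sit at fixed offsets; every other position must be a hex digit.
    let well_formed = bytes.iter().enumerate().all(|(i, b)| match i {
        8 | 13 | 18 | 23 => *b == b'-',
        _ => b.is_ascii_hexdigit(),
    });
    if well_formed {
        Some(id)
    } else {
        None
    }
}
```

Rejecting anything that is not a well-formed UUID keeps a hostname SAN (as used by the server certificate) from being mistaken for an agent identity.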
| 105 | + |
| 106 | +## Stream Model |
| 107 | + |
| 108 | +QUIC provides multiplexed bidirectional streams on a single connection. |
| 109 | +Stream IDs follow the QUIC specification: |
| 110 | + |
| 111 | +| ID pattern | Initiator | Type | Usage | |
| 112 | +|---|---|---|---| |
| 113 | +| 0, 4, 8, … | Client (agent) | Bidirectional | Control stream (stream 0 only) | |
| 114 | +| 1, 5, 9, … | Server (gateway) | Bidirectional | Session proxy streams | |
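The ID patterns in the table follow from the two low bits of a QUIC stream ID: bit 0 selects the initiator and bit 1 selects the direction. A small sketch of that classification, with hypothetical type and function names:

```rust
#[derive(Debug, PartialEq, Eq)]
enum Initiator {
    Client, // the agent
    Server, // the gateway
}

#[derive(Debug, PartialEq, Eq)]
enum Direction {
    Bidirectional,
    Unidirectional,
}

/// Classify a QUIC stream ID by its two least significant bits.
fn classify_stream(id: u64) -> (Initiator, Direction) {
    let initiator = if id & 0x1 == 0 {
        Initiator::Client
    } else {
        Initiator::Server
    };
    let direction = if id & 0x2 == 0 {
        Direction::Bidirectional
    } else {
        Direction::Unidirectional
    };
    (initiator, direction)
}

/// Stream 0 is the only client-initiated stream this protocol uses.
fn is_control_stream(id: u64) -> bool {
    id == 0
}

/// Session proxy streams are server-initiated and bidirectional (1, 5, 9, ...).
fn is_session_stream(id: u64) -> bool {
    classify_stream(id) == (Initiator::Server, Direction::Bidirectional)
}
```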
| 115 | + |
| 116 | +### Stream 0: Control Stream |
| 117 | + |
| 118 | +Always open. |
| 119 | +Carries periodic control messages between agent and gateway. |
| 120 | + |
| 121 | +### Streams 1, 5, 9, …: Session Proxy Streams |
| 122 | + |
| 123 | +One stream per proxied connection (RDP session, SSH session, etc.). |
| 124 | +Gateway opens the stream and sends a `ConnectMessage`. |
| 125 | +After setup, the stream carries raw TCP bytes bidirectionally. |
| 126 | + |
| 127 | +## Message Encoding |
| 128 | + |
| 129 | +All messages use **length-prefixed bincode**: |
| 130 | + |
| 131 | +``` |
| 132 | +┌─────────────────────────┬──────────────────────────────┐ |
| 133 | +│ 4 bytes (big-endian u32)│ N bytes (bincode payload) │ |
| 134 | +│ message_length = N │ │ |
| 135 | +└─────────────────────────┴──────────────────────────────┘ |
| 136 | +``` |
| 137 | + |
| 138 | +bincode is a compact binary serialization format native to Rust's serde ecosystem. |
| 139 | +No schema files needed — types are defined directly in Rust structs. |
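The framing itself is independent of bincode. As a minimal sketch, assuming the payload has already been serialized to bytes, encoding and splitting a frame could look like this (`encode_frame` and `split_frame` are hypothetical names):

```rust
/// Prefix `payload` with its length as a big-endian u32.
/// Returns None if the payload does not fit in a u32 length.
fn encode_frame(payload: &[u8]) -> Option<Vec<u8>> {
    if payload.len() > u32::MAX as usize {
        return None;
    }
    let len = payload.len() as u32;
    let mut frame = Vec::with_capacity(4 + payload.len());
    frame.extend_from_slice(&len.to_be_bytes());
    frame.extend_from_slice(payload);
    Some(frame)
}

/// Try to split one complete frame off the front of `buf`.
/// Returns the payload and the number of bytes consumed,
/// or None if more data is needed.
fn split_frame(buf: &[u8]) -> Option<(&[u8], usize)> {
    if buf.len() < 4 {
        return None;
    }
    let len = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    let end = 4usize.checked_add(len)?;
    let payload = buf.get(4..end)?;
    Some((payload, end))
}
```

Because QUIC streams deliver bytes rather than messages, a reader has to buffer until `split_frame` returns a complete payload.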
| 140 | + |
| 141 | +### Size Limits |
| 142 | + |
| 143 | +| Message type | Max size | Enforced where | |
| 144 | +|---|---|---| |
| 145 | +| Control messages | 1 MiB (`MAX_CONTROL_MESSAGE_SIZE`) | Gateway + Agent | |
| 146 | +| Session messages | 64 KiB (`MAX_SESSION_MESSAGE_SIZE`) | Gateway + Agent | |
| 147 | + |
| 148 | +The length prefix is checked **before** deserialization. |
| 149 | +bincode deserialization is bounded with `bincode::options().with_limit()` to prevent crafted payloads with huge internal `Vec` lengths from causing OOM. |
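Checking the length prefix before allocating or deserializing can be sketched as below. The constant names and values come from the table above; `FrameError` and `check_declared_len` are hypothetical, and treating the limit as inclusive is an assumption.

```rust
const MAX_CONTROL_MESSAGE_SIZE: usize = 1024 * 1024; // 1 MiB
const MAX_SESSION_MESSAGE_SIZE: usize = 64 * 1024; // 64 KiB

#[derive(Debug, PartialEq, Eq)]
enum FrameError {
    TooLarge { declared: usize, max: usize },
}

/// Validate the 4-byte big-endian length prefix against `max`
/// before any buffer is allocated for the payload.
fn check_declared_len(header: [u8; 4], max: usize) -> Result<usize, FrameError> {
    let declared = u32::from_be_bytes(header) as usize;
    if declared > max {
        return Err(FrameError::TooLarge { declared, max });
    }
    Ok(declared)
}
```

The same declared length can then serve as the byte limit passed to the bounded bincode deserializer, so a payload cannot claim more data than the frame actually carries.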
| 150 | + |
| 151 | +## Control Messages (Stream 0) |
| 152 | + |
| 153 | +```rust |
| 154 | +enum ControlMessage { |
| 155 | + RouteAdvertise { |
| 156 | + protocol_version: u16, // Currently 2 |
| 157 | + epoch: u64, // Route data version number |
| 158 | + subnets: Vec<Ipv4Network>, // e.g. [10.0.0.0/8, 192.168.1.0/24] |
| 159 | + domains: Vec<DomainAdvertisement>, // e.g. [{domain: "contoso.local", auto_detected: true}] |
| 160 | + }, |
| 161 | + Heartbeat { |
| 162 | + timestamp_ms: u64, // Agent's current time (ms since epoch) |
| 163 | + active_stream_count: u32, // Number of active proxy sessions |
| 164 | + }, |
| 165 | + HeartbeatAck { |
| 166 | + timestamp_ms: u64, // Echo back agent's timestamp (for RTT calculation) |
| 167 | + }, |
| 168 | +} |
| 169 | +``` |
| 170 | + |
| 171 | +### RouteAdvertise |
| 172 | + |
| 173 | +Tells the gateway what networks and domains this agent can reach. |
| 174 | + |
| 175 | +- Sent immediately after handshake with `epoch=1`. |
| 176 | +- Re-sent every 30 seconds with the same epoch (gateway refreshes `last_seen`, does not update routes). |
| 177 | +- If route data changes, epoch is incremented (gateway replaces old routes). |
| 178 | +- Stale epochs (older than current) are ignored. |
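The epoch rules above amount to a three-way comparison on the gateway side. The following sketch uses a simplified, hypothetical `AgentRoutes` record (subnets as plain strings, domains omitted) and assumes that a stale advertisement does not refresh `last_seen`, which the text does not specify.

```rust
#[derive(Debug, PartialEq, Eq)]
enum RouteUpdate {
    Replaced,     // newer epoch: routes swapped
    Refreshed,    // same epoch: only last_seen updated
    IgnoredStale, // older epoch: nothing changes
}

/// Simplified per-agent route state (hypothetical shape).
struct AgentRoutes {
    epoch: u64,
    subnets: Vec<String>,
    last_seen_ms: u64,
}

impl AgentRoutes {
    /// Apply a RouteAdvertise with the given epoch and subnets.
    fn apply(&mut self, epoch: u64, subnets: Vec<String>, now_ms: u64) -> RouteUpdate {
        if epoch < self.epoch {
            return RouteUpdate::IgnoredStale;
        }
        self.last_seen_ms = now_ms;
        if epoch == self.epoch {
            return RouteUpdate::Refreshed;
        }
        self.epoch = epoch;
        self.subnets = subnets;
        RouteUpdate::Replaced
    }
}
```

Starting the record at epoch 0 means the first advertisement (`epoch=1`) is always treated as a replacement.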
| 179 | + |
| 180 | +**Domain auto-detection:** |
| 181 | +- Windows: reads `USERDNSDOMAIN` environment variable, falls back to `GetComputerNameExW(ComputerNameDnsDomain)`. |
| 182 | +- Linux: parses `/etc/resolv.conf` for `search` or `domain` directives. |
| 183 | +- Auto-detected domains are tagged `auto_detected: true`. |
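For the Linux case, the parsing step might look like the sketch below. It works on the file contents as a string so it can be tested without touching `/etc/resolv.conf`; `domains_from_resolv_conf` is a hypothetical name, and merging every `search` and `domain` entry (rather than letting the last directive win, as the resolver does) is an assumption.

```rust
/// Collect domain names from `search` and `domain` directives.
/// Names are lowercased, trailing dots are removed, duplicates are dropped.
fn domains_from_resolv_conf(contents: &str) -> Vec<String> {
    let mut domains: Vec<String> = Vec::new();
    for line in contents.lines() {
        let line = line.trim();
        // Comment lines start with '#' or ';'.
        if line.starts_with('#') || line.starts_with(';') {
            continue;
        }
        let mut words = line.split_whitespace();
        match words.next() {
            Some("search") | Some("domain") => {
                for word in words {
                    let name = word.trim_end_matches('.').to_ascii_lowercase();
                    if !name.is_empty() && !domains.contains(&name) {
                        domains.push(name);
                    }
                }
            }
            _ => {}
        }
    }
    domains
}
```

Each returned name would then be advertised with `auto_detected: true`.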
| 184 | + |
| 185 | +### Heartbeat / HeartbeatAck |
| 186 | + |
| 187 | +- Agent sends `Heartbeat` every 60 seconds. |
| 188 | +- Gateway replies with `HeartbeatAck` echoing the timestamp. |
| 189 | +- Agent computes `RTT = now - echoed_timestamp`. |
| 190 | +- Gateway uses heartbeat to update `last_seen`. |
| 191 | +- The gateway considers an agent offline after 90 seconds without a heartbeat. |
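The two calculations in this list are simple enough to sketch directly. Function names are hypothetical, timestamps are milliseconds as in the message definitions, and treating exactly 90 seconds as still online is an assumption.

```rust
const OFFLINE_THRESHOLD_MS: u64 = 90_000;

/// Agent side: round-trip time from the timestamp echoed in HeartbeatAck.
/// Saturates at zero if the clock stepped backwards in between.
fn rtt_ms(now_ms: u64, echoed_timestamp_ms: u64) -> u64 {
    now_ms.saturating_sub(echoed_timestamp_ms)
}

/// Gateway side: an agent is offline once `last_seen` is older than the threshold.
fn is_offline(now_ms: u64, last_seen_ms: u64) -> bool {
    now_ms.saturating_sub(last_seen_ms) > OFFLINE_THRESHOLD_MS
}
```

Because the agent compares the echoed value against its own clock, the RTT does not depend on the gateway's clock being synchronized.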
| 192 | + |
| 193 | +## Session Messages (Streams 1, 5, 9, …) |
| 194 | + |
| 195 | +### ConnectMessage (Gateway → Agent) |
| 196 | + |
| 197 | +```rust |
| 198 | +struct ConnectMessage { |
| 199 | + session_id: Uuid, // Unique session identifier |
| 200 | + target: String, // e.g. "10.0.0.5:3389" or "dc01.contoso.local:88" |
| 201 | +} |
| 202 | +``` |
| 203 | + |
| 204 | +Gateway opens a new server-initiated bidirectional stream and sends this message. |
| 205 | + |
| 206 | +### ConnectResponse (Agent → Gateway) |
| 207 | + |
| 208 | +```rust |
| 209 | +enum ConnectResponse { |
| 210 | + Success, |
| 211 | + Error { code: u16, reason: String }, |
| 212 | +} |
| 213 | +``` |
| 214 | + |
| 215 | +### After ConnectResponse::Success |
| 216 | + |
| 217 | +The stream becomes a **raw byte pipe** — no more framing. |
| 218 | +Every byte written to one end comes out the other end, in order. |
| 219 | +This carries the actual protocol traffic (RDP, SSH, Kerberos, etc.). |
| 220 | + |
| 221 | +``` |
| 222 | +Browser ↔ WebSocket ↔ Gateway ↔ QUIC stream ↔ Agent ↔ TCP ↔ Target |
| 223 | +``` |
| 224 | + |
| 225 | +## Session Proxy Flow (Complete) |
| 226 | + |
| 227 | +``` |
| 228 | +1. User opens SSH session to 10.0.0.5 in webapp |
| 229 | +2. Webapp gets session token (JWT with destination=10.0.0.5:22) |
| 230 | +3. Browser opens WebSocket to /jet/fwd/tcp/{session_id} |
| 231 | +4. Gateway routing: 10.0.0.5 matches agent's 10.0.0.0/8 subnet |
| 232 | +5. Gateway opens QUIC stream (stream_id=1) to agent |
| 233 | +6. Gateway sends ConnectMessage { session_id, target: "10.0.0.5:22" } |
| 234 | +7. Agent receives ConnectMessage on stream 1 |
| 235 | +8. Agent validates: 10.0.0.5 is in its advertised 10.0.0.0/8? YES |
| 236 | +9. Agent connects TCP to 10.0.0.5:22 |
| 237 | +10. Agent sends ConnectResponse::Success |
| 238 | +11. Bidirectional data flow: |
| 239 | + Browser → WS → Gateway → QUIC stream 1 → Agent → TCP → SSH server |
| 240 | + SSH server → TCP → Agent → QUIC stream 1 → Gateway → WS → Browser |
| 241 | +``` |
| 242 | + |
| 243 | +## Agent Subnet Validation |
| 244 | + |
| 245 | +When the agent receives a `ConnectMessage`, it: |
| 246 | + |
| 247 | +1. Resolves the target hostname to IP addresses via DNS. |
| 248 | +2. Keeps only the IPs that fall within the agent's configured `advertise_subnets`. |
| 249 | +3. Drops IPv6 addresses (only IPv4 subnet matching is supported). |
| 250 | +4. If no IPs remain, replies with `ConnectResponse::Error` ("target not in advertised subnets"). |
| 251 | +5. Otherwise, opens a TCP connection to the first remaining IP. |
| 252 | + |
| 253 | +This prevents a compromised gateway from using the agent as an open proxy to arbitrary hosts. |
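The filtering step can be sketched with the standard library alone. `Ipv4Subnet` below is a hypothetical stand-in for the `Ipv4Network` type used in `RouteAdvertise`, and `pick_target` is an illustrative name; DNS resolution is assumed to have happened already.

```rust
use std::net::{IpAddr, Ipv4Addr};

/// Minimal IPv4 subnet: network address plus prefix length.
#[derive(Debug, Clone, Copy)]
struct Ipv4Subnet {
    network: Ipv4Addr,
    prefix_len: u8,
}

impl Ipv4Subnet {
    /// True if `ip` shares the subnet's first `prefix_len` bits.
    fn contains(&self, ip: Ipv4Addr) -> bool {
        let mask = if self.prefix_len == 0 {
            0
        } else {
            u32::MAX << (32 - u32::from(self.prefix_len.min(32)))
        };
        (u32::from(ip) & mask) == (u32::from(self.network) & mask)
    }
}

/// Return the first resolved IPv4 address inside an advertised subnet.
/// IPv6 addresses never match and are skipped.
fn pick_target(resolved: &[IpAddr], advertised: &[Ipv4Subnet]) -> Option<Ipv4Addr> {
    resolved.iter().find_map(|ip| match ip {
        IpAddr::V4(v4) if advertised.iter().any(|s| s.contains(*v4)) => Some(*v4),
        _ => None,
    })
}
```

A `None` result corresponds to the `ConnectResponse::Error` case; otherwise the agent connects to the returned address.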
| 254 | + |
| 255 | +## Auto-Reconnect |
| 256 | + |
| 257 | +The agent's tunnel task runs an outer reconnection loop: |
| 258 | + |
| 259 | +``` |
| 260 | +loop { |
| 261 | +    let start = now(); |
| 262 | +    match run_single_connection().await { |
| 263 | +        Ok(()) => return,                      // Graceful shutdown |
| 264 | +        Err(e) => log(e),                      // Connection lost |
| 265 | +    } |
| 266 | +    if now() - start > 30s { backoff = 1s; }   // Was working, reset |
| 267 | +    sleep(backoff);                            // Respects shutdown signal |
| 268 | +    backoff = (backoff * 2 * jitter).min(60s); // jitter: random factor in [0.75, 1.25] |
| 269 | +} |
| 270 | +``` |
| 271 | + |
| 272 | +- Initial backoff: 1 second. |
| 273 | +- Max backoff: 60 seconds. |
| 274 | +- Jitter: ±25% to prevent thundering herd. |
| 275 | +- Backoff resets after 30 seconds of stable connection. |
| 276 | +- Shutdown signal is checked during backoff sleep. |
| 277 | +- Each reconnection re-reads config (supports config changes) and re-resolves DNS. |
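The backoff arithmetic from the list above can be isolated into two pure functions. This is a sketch with hypothetical names; the jitter factor is passed in so the logic stays deterministic, and a real agent would draw it from a random number generator.

```rust
use std::time::Duration;

const INITIAL_BACKOFF: Duration = Duration::from_secs(1);
const MAX_BACKOFF: Duration = Duration::from_secs(60);
const STABLE_AFTER: Duration = Duration::from_secs(30);

/// Reset the backoff if the previous connection stayed up long enough.
fn backoff_before_sleep(current: Duration, connected_for: Duration) -> Duration {
    if connected_for > STABLE_AFTER {
        INITIAL_BACKOFF
    } else {
        current
    }
}

/// Double the backoff, apply a jitter factor in [0.75, 1.25], cap at the maximum.
fn next_backoff(current: Duration, jitter: f64) -> Duration {
    let jitter = jitter.clamp(0.75, 1.25);
    current.mul_f64(2.0 * jitter).min(MAX_BACKOFF)
}
```

With a neutral jitter of 1.0 the sleep sequence is 1 s, 2 s, 4 s, and so on, until it reaches the 60 s cap.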
| 278 | + |
| 279 | +## Security Measures Summary |
| 280 | + |
| 281 | +| Measure | Where | |
| 282 | +|---|---| |
| 283 | +| mTLS (mutual certificate verification) | QUIC handshake | |
| 284 | +| Private PKI (no system trust store) | cert.rs | |
| 285 | +| CSR-based enrollment (key never leaves agent) | enrollment.rs, cert.rs | |
| 286 | +| One-time enrollment tokens (122-bit UUID) | enrollment_store.rs | |
| 287 | +| Constant-time secret comparison | agent_enrollment.rs | |
| 288 | +| Agent subnet validation (no open proxy) | tunnel.rs | |
| 289 | +| Bounded bincode deserialization | connection.rs, tunnel.rs | |
| 290 | +| Buffer size limits on control/proxy streams | connection.rs | |
| 291 | +| Bounded read channels (backpressure) | stream.rs | |
| 292 | +| Max connection limit (1000) | listener.rs | |
| 293 | +| File permissions 0o600 on private keys (Unix) | cert.rs, enrollment.rs | |
| 294 | +| Agent name validation (printable ASCII, max 255) | agent_enrollment.rs | |
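As an illustration of the last row, agent name validation could be as small as the sketch below. `is_valid_agent_name` is a hypothetical name; counting the space character as printable and rejecting empty names are assumptions not stated in the table.

```rust
/// Accept names of 1 to 255 bytes made only of printable ASCII (0x20 to 0x7e).
fn is_valid_agent_name(name: &str) -> bool {
    !name.is_empty()
        && name.len() <= 255
        && name.bytes().all(|b| (0x20..=0x7e).contains(&b))
}
```

Checking bytes rather than characters means any multi-byte UTF-8 sequence is rejected, since all of its bytes fall outside the printable ASCII range.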