
Commit 0ae3790

feat: QUIC agent tunnel — protocol, listener, agent client
Add QUIC-based agent tunnel core infrastructure. Agents in private networks connect outbound to Gateway via QUIC/mTLS, advertise reachable subnets and domains, and proxy TCP connections on behalf of Gateway.

Protocol (agent-tunnel-proto crate):
- RouteAdvertise with subnets + domain advertisements
- ConnectMessage/ConnectResponse for session stream setup
- Heartbeat/HeartbeatAck for liveness detection
- Protocol version negotiation (v2)

Gateway (agent_tunnel module):
- QUIC listener with mTLS authentication
- Agent registry with subnet/domain tracking
- Certificate authority for agent enrollment
- Enrollment token store (one-time tokens)
- Bidirectional proxy stream multiplexing

Agent (devolutions-agent):
- QUIC client with auto-reconnect and exponential backoff
- Agent enrollment with config merge (preserves existing settings)
- Domain auto-detection (Windows: USERDNSDOMAIN, Linux: resolv.conf)
- Subnet validation on incoming connections
- Certificate file permissions (0o600 on Unix)

API endpoints:
- POST /jet/agent-tunnel/enroll: agent enrollment
- GET /jet/agent-tunnel/agents: list agents
- GET /jet/agent-tunnel/agents/{id}: get agent
- DELETE /jet/agent-tunnel/agents/{id}: delete agent
- POST /jet/agent-tunnel/agents/resolve-target: routing diagnostics

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent e4052b2 commit 0ae3790

37 files changed

Lines changed: 4850 additions & 96 deletions

Cargo.lock

Lines changed: 306 additions & 87 deletions

PROTOCOL.md

Lines changed: 294 additions & 0 deletions
@@ -0,0 +1,294 @@
# QUIC Agent Tunnel Protocol

## Overview

The agent tunnel enables Devolutions Gateway to proxy RDP/SSH/Kerberos connections through agents deployed in private networks.
Agents connect **outbound** to the gateway, so no inbound firewall rules are needed.
The tunnel uses QUIC (via Cloudflare's quiche library) with mutual TLS for transport.

## Trust Model

### Private PKI

The gateway runs its own Certificate Authority (CA).
No system trust store or public CA is involved.

```
Gateway CA (ECDSA P-256, self-signed, 10-year validity)
├── Server Cert  (signed by CA, 1-year validity, SAN = gateway hostname)
├── Agent Cert A (signed by CA, 1-year validity, SAN = urn:uuid:{agent_id})
└── Agent Cert B (signed by CA, 1-year validity, SAN = urn:uuid:{agent_id})
```

The CA is generated automatically on first gateway startup.
Agent certificates are issued during enrollment.

### CSR-Based Enrollment (Private Key Never Leaves Agent)

```
Agent                                    Gateway
│                                           │
│  Generate ECDSA P-256 key pair locally    │
│  Generate CSR (contains public key only)  │
│  Write private key to disk immediately    │
│                                           │
│── POST /jet/agent-tunnel/enroll ─────────>│
│  Bearer: <enrollment-token>               │
│  Body: { agent_name, csr_pem }            │  Verify CSR signature
│                                           │  Extract public key from CSR
│                                           │  Generate agent UUID
│                                           │  Sign cert with CA (embed UUID in SAN)
│                                           │
│<──────────────────────────────────────────│
│  { agent_id, client_cert_pem,             │
│    gateway_ca_cert_pem, quic_endpoint }   │
│                                           │
│  Write cert + CA cert to disk             │
│                                           │
│  Private key NEVER transmitted            │
```

The enrollment token is either:
- A **one-time token** (UUID, 122-bit entropy) generated by the gateway webapp/DVLS. It is consumed on use and cannot be replayed.
- A **static secret** configured in `gateway.json` and compared in constant time.

The enrollment endpoint (`/jet/agent-tunnel/enroll`) bypasses the normal JWT auth middleware and uses its own Bearer token validation.
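
The static-secret check can be illustrated with a small, std-only sketch. The helper names (`constant_time_eq`, `bearer_token`, `is_valid_static_secret`) are hypothetical and not taken from the gateway code; a production implementation would more likely rely on a vetted crate such as `subtle`, since an optimizing compiler gives no timing guarantees for hand-written loops.

```rust
/// Compare two byte strings without returning early on the first mismatch.
/// Only the contents are treated as secret here, not the length.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    // Accumulate all differences so the loop always visits every byte.
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}

/// Extract the token from an `Authorization` header value of the form
/// `Bearer <token>`. Returns `None` for other schemes or an empty token.
fn bearer_token(header_value: &str) -> Option<&str> {
    header_value
        .strip_prefix("Bearer ")
        .map(str::trim)
        .filter(|token| !token.is_empty())
}

/// Hypothetical validation of the static enrollment secret.
fn is_valid_static_secret(header_value: &str, configured_secret: &str) -> bool {
    match bearer_token(header_value) {
        Some(token) => constant_time_eq(token.as_bytes(), configured_secret.as_bytes()),
        None => false,
    }
}
```

One-time tokens would follow a different path (lookup and removal from a token store), which this sketch does not cover.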

## Transport: QUIC over UDP

### Why QUIC

- **Outbound-only**: the agent initiates the connection, so it traverses NAT and firewalls.
- **Built-in TLS 1.3**: mutual authentication is part of the QUIC handshake, not a separate layer.
- **Stream multiplexing**: one connection carries many independent bidirectional streams without head-of-line blocking between streams.
- **Connection migration**: QUIC can survive IP changes, but active migration is disabled in this configuration (`disable_active_migration = true`) for simplicity.

### QUIC Configuration

Both sides configure matching transport parameters:

| Parameter | Value | Why |
|---|---|---|
| `max_idle_timeout` | 120,000 ms | Close connection after 2 min of silence. |
| `max_recv_udp_payload_size`, `max_send_udp_payload_size` | 1,350 bytes | Conservative payload size to avoid IP fragmentation. |
| `initial_max_data` | 10 MB | Total connection-level flow control window. |
| `initial_max_stream_data_bidi_local` | 1 MB | Per-stream flow control (locally-opened streams). |
| `initial_max_stream_data_bidi_remote` | 1 MB | Per-stream flow control (remotely-opened streams). |
| `initial_max_streams_bidi` | 100 | Max concurrent bidirectional streams. |
| `disable_active_migration` | true | Don't migrate on IP change. |

ALPN protocol identifier: `devolutions-agent-tunnel`

### mTLS Handshake

```
Agent                                    Gateway
│                                           │
│── QUIC Initial (ClientHello) ────────────>│
│  Client cert: agent-{uuid}-cert.pem       │
│                                           │
│<──── QUIC Handshake (ServerHello) ────────│
│  Server cert: agent-tunnel-server.pem     │
│                                           │
│  Verify: server cert signed by CA?        │  Verify: client cert signed by CA?
│  (using gateway-ca.pem from enrollment)   │  (using own CA cert)
│                                           │  Extract agent UUID from SAN
│                                           │  Register in AgentRegistry
│                                           │
│  QUIC connection established              │
```

Both sides call `verify_peer(true)`.
A rogue agent cannot connect without a CA-signed certificate.
A rogue server cannot impersonate the gateway without the CA-signed server certificate.

## Stream Model

QUIC provides multiplexed bidirectional streams on a single connection.
Stream IDs follow the QUIC specification:

| ID pattern | Initiator | Type | Usage |
|---|---|---|---|
| 0, 4, 8, … | Client (agent) | Bidirectional | Control stream (stream 0 only) |
| 1, 5, 9, … | Server (gateway) | Bidirectional | Session proxy streams |
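
The ID patterns in the table follow from the two low bits of a QUIC stream ID: bit 0 identifies the initiator and bit 1 the direction. A minimal sketch of that classification is shown below; the type and function names are illustrative and do not come from the tunnel crates.

```rust
#[derive(Debug, PartialEq, Eq)]
enum Initiator {
    Client, // the agent
    Server, // the gateway
}

#[derive(Debug, PartialEq, Eq)]
enum Direction {
    Bidirectional,
    Unidirectional,
}

/// Decode the two least significant bits of a QUIC stream ID.
fn classify_stream(stream_id: u64) -> (Initiator, Direction) {
    let initiator = if stream_id & 0x1 == 0 {
        Initiator::Client
    } else {
        Initiator::Server
    };
    let direction = if stream_id & 0x2 == 0 {
        Direction::Bidirectional
    } else {
        Direction::Unidirectional
    };
    (initiator, direction)
}

/// Stream 0 is the only client-initiated stream this protocol uses.
fn is_control_stream(stream_id: u64) -> bool {
    stream_id == 0
}

/// Session proxy streams are server-initiated and bidirectional: 1, 5, 9, …
fn is_session_stream(stream_id: u64) -> bool {
    classify_stream(stream_id) == (Initiator::Server, Direction::Bidirectional)
}
```

Under this scheme the n-th session stream opened by the gateway has ID `4 * n + 1`.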

### Stream 0: Control Stream

Always open.
Carries periodic control messages between agent and gateway.

### Streams 1, 5, 9, …: Session Proxy Streams

One stream per proxied connection (RDP session, SSH session, etc.).
Gateway opens the stream and sends a `ConnectMessage`.
After setup, the stream carries raw TCP bytes bidirectionally.

## Message Encoding

All messages use **length-prefixed bincode**:

```
┌─────────────────────────┬──────────────────────────────┐
│ 4 bytes (big-endian u32)│ N bytes (bincode payload)    │
│ message_length = N      │                              │
└─────────────────────────┴──────────────────────────────┘
```

bincode is a compact binary serialization format native to Rust's serde ecosystem.
No schema files are needed: types are defined directly in Rust structs.

### Size Limits

| Message type | Max size | Enforced where |
|---|---|---|
| Control messages | 1 MiB (`MAX_CONTROL_MESSAGE_SIZE`) | Gateway + Agent |
| Session messages | 64 KiB (`MAX_SESSION_MESSAGE_SIZE`) | Gateway + Agent |

The length prefix is checked **before** deserialization.
bincode deserialization is bounded with `bincode::options().with_limit()` to prevent crafted payloads with huge internal `Vec` lengths from causing OOM.
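
The framing and the "check the prefix first" rule can be sketched with the standard library alone. In this sketch the payload is treated as opaque bytes (the bincode step is left out), and `write_frame` / `read_frame` are hypothetical helper names; only the two constants mirror names from the table above.

```rust
use std::io::{self, Read, Write};

const MAX_CONTROL_MESSAGE_SIZE: u32 = 1024 * 1024; // 1 MiB
const MAX_SESSION_MESSAGE_SIZE: u32 = 64 * 1024; // 64 KiB

/// Write a 4-byte big-endian length prefix followed by the payload.
fn write_frame<W: Write>(writer: &mut W, payload: &[u8], max_size: u32) -> io::Result<()> {
    if payload.len() > max_size as usize {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "payload exceeds size limit",
        ));
    }
    let len = payload.len() as u32; // safe: bounded by max_size above
    writer.write_all(&len.to_be_bytes())?;
    writer.write_all(payload)
}

/// Read one frame, rejecting an oversized length prefix before allocating
/// or reading the payload.
fn read_frame<R: Read>(reader: &mut R, max_size: u32) -> io::Result<Vec<u8>> {
    let mut prefix = [0u8; 4];
    reader.read_exact(&mut prefix)?;
    let len = u32::from_be_bytes(prefix);
    if len > max_size {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "declared length exceeds size limit",
        ));
    }
    let mut payload = vec![0u8; len as usize];
    reader.read_exact(&mut payload)?;
    Ok(payload)
}
```

Rejecting the prefix before allocating means a peer cannot force a large buffer allocation just by sending four bytes.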

## Control Messages (Stream 0)

```rust
enum ControlMessage {
    RouteAdvertise {
        protocol_version: u16,             // Currently 2
        epoch: u64,                        // Route data version number
        subnets: Vec<Ipv4Network>,         // e.g. [10.0.0.0/8, 192.168.1.0/24]
        domains: Vec<DomainAdvertisement>, // e.g. [{domain: "contoso.local", auto_detected: true}]
    },
    Heartbeat {
        timestamp_ms: u64,        // Agent's current time (ms since epoch)
        active_stream_count: u32, // Number of active proxy sessions
    },
    HeartbeatAck {
        timestamp_ms: u64, // Echo back agent's timestamp (for RTT calculation)
    },
}
```

### RouteAdvertise

Tells the gateway what networks and domains this agent can reach.

- Sent immediately after handshake with `epoch=1`.
- Re-sent every 30 seconds with the same epoch (gateway refreshes `last_seen`, does not update routes).
- If route data changes, epoch is incremented (gateway replaces old routes).
- Stale epochs (older than current) are ignored.
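
The epoch rules above can be sketched as a small state update on the gateway side. Everything here is illustrative: `AgentRoutes`, `AdvertiseOutcome`, and `apply_advertise` are hypothetical names, subnets are kept as plain strings, and the sketch assumes that a stale advertisement does not refresh `last_seen`, which the text does not specify.

```rust
use std::time::Instant;

#[derive(Debug, PartialEq, Eq)]
enum AdvertiseOutcome {
    Replaced,     // newer epoch: routes replaced
    Refreshed,    // same epoch: only last_seen updated
    IgnoredStale, // older epoch: nothing changes
}

struct AgentRoutes {
    epoch: u64,
    subnets: Vec<String>,
    last_seen: Instant,
}

impl AgentRoutes {
    /// Start at epoch 0 so the agent's first advertisement (epoch 1) is applied.
    fn new(now: Instant) -> Self {
        AgentRoutes { epoch: 0, subnets: Vec::new(), last_seen: now }
    }

    fn apply_advertise(&mut self, epoch: u64, subnets: Vec<String>, now: Instant) -> AdvertiseOutcome {
        if epoch < self.epoch {
            return AdvertiseOutcome::IgnoredStale;
        }
        self.last_seen = now;
        if epoch == self.epoch {
            AdvertiseOutcome::Refreshed
        } else {
            self.epoch = epoch;
            self.subnets = subnets;
            AdvertiseOutcome::Replaced
        }
    }
}
```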

**Domain auto-detection:**
- Windows: reads `USERDNSDOMAIN` environment variable, falls back to `GetComputerNameExW(ComputerNameDnsDomain)`.
- Linux: parses `/etc/resolv.conf` for `search` or `domain` directives.
- Auto-detected domains are tagged `auto_detected: true`.
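
For the Linux case, a minimal parser might look like the sketch below. It takes the file contents as a string so it can be exercised without touching `/etc/resolv.conf`. The function name is hypothetical, and the choices to collect every listed name, lowercase it, and strip a trailing dot are assumptions of this sketch rather than documented agent behavior.

```rust
/// Collect domain names from `search` and `domain` directives in
/// resolv.conf-style text. Duplicates are dropped, order is preserved.
fn domains_from_resolv_conf(contents: &str) -> Vec<String> {
    let mut domains: Vec<String> = Vec::new();
    for line in contents.lines() {
        // Drop comments introduced by '#' or ';'.
        let line = line
            .split(|c: char| c == '#' || c == ';')
            .next()
            .unwrap_or("")
            .trim();
        let mut fields = line.split_whitespace();
        match fields.next() {
            Some("search") | Some("domain") => {
                for name in fields {
                    let name = name.trim_end_matches('.').to_ascii_lowercase();
                    if !name.is_empty() && !domains.contains(&name) {
                        domains.push(name);
                    }
                }
            }
            _ => {} // nameserver, options, blank lines, etc.
        }
    }
    domains
}
```

A caller would typically pass the result of `std::fs::read_to_string("/etc/resolv.conf")` and tag each returned name as auto-detected.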

### Heartbeat / HeartbeatAck

- Agent sends `Heartbeat` every 60 seconds.
- Gateway replies with `HeartbeatAck` echoing the timestamp.
- Agent computes `RTT = now - echoed_timestamp`.
- Gateway uses heartbeat to update `last_seen`.
- Agent offline threshold: 90 seconds without a heartbeat.
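
Because the echoed timestamp originates from the agent's own clock, the RTT calculation needs no clock synchronization with the gateway. A small sketch of both calculations follows; the function names are hypothetical, and tracking `last_seen` as milliseconds is an assumption made for brevity.

```rust
const OFFLINE_THRESHOLD_MS: u64 = 90_000;

/// Agent side: RTT from the timestamp echoed in `HeartbeatAck`.
/// Returns `None` if the local clock moved backwards in the meantime.
fn rtt_ms(now_ms: u64, echoed_timestamp_ms: u64) -> Option<u64> {
    now_ms.checked_sub(echoed_timestamp_ms)
}

/// Gateway side: an agent counts as offline once more than 90 seconds
/// have passed since it was last seen.
fn is_offline(now_ms: u64, last_seen_ms: u64) -> bool {
    now_ms.saturating_sub(last_seen_ms) > OFFLINE_THRESHOLD_MS
}
```

With a 60-second heartbeat interval, the 90-second threshold tolerates a heartbeat that arrives up to 30 seconds late.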

## Session Messages (Streams 1, 5, 9, …)

### ConnectMessage (Gateway → Agent)

```rust
struct ConnectMessage {
    session_id: Uuid, // Unique session identifier
    target: String,   // e.g. "10.0.0.5:3389" or "dc01.contoso.local:88"
}
```

Gateway opens a new server-initiated bidirectional stream and sends this message.

### ConnectResponse (Agent → Gateway)

```rust
enum ConnectResponse {
    Success,
    Error { code: u16, reason: String },
}
```

### After ConnectResponse::Success

The stream becomes a **raw byte pipe** with no further framing.
Every byte written to one end comes out the other end, in order.
This carries the actual protocol traffic (RDP, SSH, Kerberos, etc.).

```
Browser ↔ WebSocket ↔ Gateway ↔ QUIC stream ↔ Agent ↔ TCP ↔ Target
```

## Session Proxy Flow (Complete)

```
1. User opens SSH session to 10.0.0.5 in webapp
2. Webapp gets session token (JWT with destination=10.0.0.5:22)
3. Browser opens WebSocket to /jet/fwd/tcp/{session_id}
4. Gateway routing: 10.0.0.5 matches agent's 10.0.0.0/8 subnet
5. Gateway opens QUIC stream (stream_id=1) to agent
6. Gateway sends ConnectMessage { session_id, target: "10.0.0.5:22" }
7. Agent receives ConnectMessage on stream 1
8. Agent validates: 10.0.0.5 is in its advertised 10.0.0.0/8? YES
9. Agent connects TCP to 10.0.0.5:22
10. Agent sends ConnectResponse::Success
11. Bidirectional data flow:
    Browser → WS → Gateway → QUIC stream 1 → Agent → TCP → SSH server
    SSH server → TCP → Agent → QUIC stream 1 → Gateway → WS → Browser
```

## Agent Subnet Validation

When the agent receives a `ConnectMessage`, it:

1. DNS-resolves the target hostname to IP addresses.
2. Filters: only keep IPs that fall within the agent's configured `advertise_subnets`.
3. IPv6 addresses are dropped (only IPv4 subnet matching is supported).
4. If no reachable IPs remain → `ConnectResponse::Error` ("target not in advertised subnets").
5. Otherwise → TCP connect to the first reachable IP.

This prevents a compromised gateway from using the agent as an open proxy to arbitrary hosts.
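
Steps 2 to 5 amount to a filter over the resolved addresses. The sketch below uses only the standard library; `Ipv4Subnet` is a stand-in for whatever network type the agent actually uses, and `first_reachable` is a hypothetical name. DNS resolution (step 1) is assumed to have happened already.

```rust
use std::net::{IpAddr, Ipv4Addr};

/// Minimal IPv4 CIDR block, e.g. 10.0.0.0/8.
#[derive(Clone, Copy)]
struct Ipv4Subnet {
    network: Ipv4Addr,
    prefix_len: u8,
}

impl Ipv4Subnet {
    fn contains(&self, addr: Ipv4Addr) -> bool {
        // Clamp the prefix so a malformed value cannot overflow the shift.
        let bits = u32::from(self.prefix_len.min(32));
        let mask = if bits == 0 { 0 } else { u32::MAX << (32 - bits) };
        (u32::from(addr) & mask) == (u32::from(self.network) & mask)
    }
}

/// Return the first resolved IPv4 address inside an advertised subnet.
/// IPv6 addresses are dropped; `None` maps to `ConnectResponse::Error`.
fn first_reachable(resolved: &[IpAddr], advertised: &[Ipv4Subnet]) -> Option<Ipv4Addr> {
    resolved
        .iter()
        .filter_map(|ip| match ip {
            IpAddr::V4(v4) => Some(*v4),
            IpAddr::V6(_) => None,
        })
        .find(|v4| advertised.iter().any(|net| net.contains(*v4)))
}
```

Filtering after resolution matters: a hostname that looks internal can still resolve to an address outside the advertised subnets.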

## Auto-Reconnect

The agent's tunnel task runs an outer reconnection loop:

```
let mut backoff = 1s;
loop {
    let start = now();
    match run_single_connection().await {
        Ok(()) => return,           // Graceful shutdown
        Err(e) => log(e),           // Connection lost
    }
    if start.elapsed() > 30s {      // Connection was stable, reset
        backoff = 1s;
    }
    sleep(backoff);                 // Interrupted by the shutdown signal
    backoff = (backoff * 2 * jitter).min(60s); // jitter in [0.75, 1.25]
}
```

- Initial backoff: 1 second.
- Max backoff: 60 seconds.
- Jitter: ±25% to prevent thundering herd.
- Backoff resets after 30 seconds of stable connection.
- Shutdown signal is checked during backoff sleep.
- Each reconnection re-reads the config (so configuration changes take effect) and re-resolves DNS.
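
The backoff arithmetic from the list above can be isolated into pure functions, which makes it easy to test. This is a sketch under stated assumptions: the function names are hypothetical, and the jitter factor is passed in by the caller (for example from a random number generator) instead of being generated here.

```rust
use std::time::Duration;

const INITIAL_BACKOFF: Duration = Duration::from_secs(1);
const MAX_BACKOFF: Duration = Duration::from_secs(60);
const STABLE_AFTER: Duration = Duration::from_secs(30);

/// Reset the backoff if the previous connection stayed up long enough.
fn backoff_before_sleep(current: Duration, connection_lifetime: Duration) -> Duration {
    if connection_lifetime > STABLE_AFTER {
        INITIAL_BACKOFF
    } else {
        current
    }
}

/// Double the backoff, apply jitter, and cap at the maximum.
/// `jitter_factor` is clamped to [0.75, 1.25], i.e. ±25%.
fn next_backoff(current: Duration, jitter_factor: f64) -> Duration {
    let factor = jitter_factor.max(0.75).min(1.25);
    current.mul_f64(2.0 * factor).min(MAX_BACKOFF)
}
```

Without jitter the sequence is 1 s, 2 s, 4 s, 8 s, 16 s, 32 s, then 60 s for every later attempt until a connection stays up for more than 30 seconds.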

## Security Measures Summary

| Measure | Where |
|---|---|
| mTLS (mutual certificate verification) | QUIC handshake |
| Private PKI (no system trust store) | cert.rs |
| CSR-based enrollment (key never leaves agent) | enrollment.rs, cert.rs |
| One-time enrollment tokens (122-bit UUID) | enrollment_store.rs |
| Constant-time secret comparison | agent_enrollment.rs |
| Agent subnet validation (no open proxy) | tunnel.rs |
| Bounded bincode deserialization | connection.rs, tunnel.rs |
| Buffer size limits on control/proxy streams | connection.rs |
| Bounded read channels (backpressure) | stream.rs |
| Max connection limit (1000) | listener.rs |
| File permissions 0o600 on private keys (Unix) | cert.rs, enrollment.rs |
| Agent name validation (printable ASCII, max 255) | agent_enrollment.rs |
