A modern, API-first BGP daemon in Rust, inspired by GoBGP's ergonomics and "drive it via gRPC" operating model.
Author: lance0
Status: pre-1.0 hardening (P0/P1/P2/P2.5 complete, publishing prep)
Last updated: 2026-05-21
API-first routing control plane. gRPC is the primary interface for all configuration and operations. The config file is a convenience for initial boot state — once the daemon is running, gRPC owns the truth. Clients in Python, Go, Rust, and Node should have a clean, typed experience from day one.
Interop correctness over feature breadth. RFC-compliant session behavior and attribute encoding/decoding, validated against real peers (FRR, BIRD, Junos, Arista EOS, Cisco IOS-XE/NX-OS where possible). A small feature set that works correctly is worth more than a large one that doesn't.
Observable by default. Prometheus metrics, structured logs, and machine-parseable errors everywhere. Operators should never have to guess what the daemon is doing or why a session flapped.
Safe, boring, maintainable. Minimal unsafe (one module for TCP MD5/GTSM socket options). Fuzzed wire decoder. Explicit resource limits. No clever tricks — just correct, auditable Rust.
This is not a full routing suite replacement. rustbgpd will not implement OSPF, IS-IS, LDP, full VRF support, or a complete policy language in v1. It will not attempt every BGP extension at once (Confederation, VPNv4/v6, MPLS-EVPN encap, etc.). The goal is a reliable, API-driven BGP speaker — not a kitchen sink.
Route server mode (IX-style). Many peers, simple policies, RIB dump and monitoring, API-driven automation.
Programmable edge speaker. Inject and withdraw prefixes programmatically. Minimal, reliable session handling.
EVPN Route Reflector (VXLAN-EVPN DC fabric). iBGP route reflector for Type 1-5 RFC 7432 routes between VTEPs; control plane only, VTEPs handle their own DF election and data-plane encapsulation. See ADR-0050.
EVPN VTEP — bidirectional (Phase 2: declarative instance schema, FDB reconciler, local MAC + MAC+IP origination, VTEP convergence). Local EVI/VNI domain types (crates/evpn) and an [[evpn_instances]] TOML schema with a read-only EvpnService.ListEvpnInstances gRPC surface (declarative EVPN instance schema, ADR-0052). The EVPN VXLAN VTEP dataplane (Linux FDB reconciler) programs remote-MAC FDB entries from received Type 2 routes (ADR-0054). EVPN local MAC origination subscribes to RTNLGRP_NEIGH and emits Type 2 routes per RFC 7432 §15.1 mobility sequencing, plus one Type 3 IMET per L2VNI carrying the PMSI Tunnel attribute (Type-2 + Type-3 IMET, ADR-0055). advertise_svi_mac originates a Type 2 for the bridge's own MAC (RFC 9135 §6.1) on instance-Ready by surfacing the bridge link-layer address through InstanceDataplaneStatus.bridge_mac; sticky_macs (ADR-0056) marks origination with the RFC 7432 §15.4 sticky bit. MAC-with-IP origination closes the MAC+IP path: with bridge link set ... neigh_suppress on, ARP/ND-snooped (IP, MAC) bindings on the bridge's neighbour table drive MAC+IP Type 2 origination under the FRR-style replace model — one Type 2 per MAC at any time, IpAdded upgrades from MAC-only to MAC+IP, last IpRemoved downgrades back. Mobility events propagate sub-second via the EVPN-keyed EvpnRouteEvent broadcast in crates/rib; the 5 s QueryEvpnRoutes poll stays as a Lagged / cold-start backstop (EVPN VTEP convergence). RR-only deployments (empty [[evpn_instances]]) spawn no kernel-facing tasks for either direction.
Later: the EVPN runtime convergence remainder (L3VNI/device/table IP-VRF identity redefine — restart-required by design — and non-teardown mixed edits; the rest of the ApplyEvpnRuntime shape set already commits live — see evpn-enablement.md for the full live-vs-fail-closed breakdown), VPNv4/v6, MPLS-EVPN encap. (Shipped since: duplicate-MAC remote-route suppression + manual clear, production-default apply_bum_enforcement / apply_aliasing_ecmp enforcement, auto-derived Route Targets per RFC 8365 §5.1.2.1, receive-side RFC 9135 overlay-index Type 5 recursion with fail-closed unresolved / ambiguous gateways, and controller Type 5 Gateway Address injection.)
Split protocol core from I/O. The codec and FSM must be testable without sockets. The FSM is a pure state machine that consumes messages and timer events, and produces messages and state transitions. It never touches a socket, never spawns a task, never calls tokio::time directly.
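A minimal sketch of what such a pure transition function could look like. The `Event`, `State`, and `Action` names and the reduced transition set are illustrative assumptions, not the daemon's actual types; the real FSM has more states (Active) and many more events.

```rust
/// Inputs the FSM consumes: decoded messages and timer expirations.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    ManualStart,
    TcpConnected,
    OpenReceived,
    KeepaliveReceived,
    HoldTimerExpired,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Idle,
    Connect,
    OpenSent,
    OpenConfirm,
    Established,
}

/// Outputs the I/O layer carries out on the FSM's behalf.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    StartConnect,
    SendOpen,
    SendKeepalive,
    SendNotification { code: u8, subcode: u8 },
    ResetHoldTimer,
    CloseConnection,
}

/// Pure transition function: no sockets, no spawned tasks, no clock access.
pub fn step(state: State, event: Event) -> (State, Vec<Action>) {
    match (state, event) {
        (State::Idle, Event::ManualStart) => (State::Connect, vec![Action::StartConnect]),
        (State::Connect, Event::TcpConnected) => (State::OpenSent, vec![Action::SendOpen]),
        (State::OpenSent, Event::OpenReceived) => (
            State::OpenConfirm,
            vec![Action::SendKeepalive, Action::ResetHoldTimer],
        ),
        (State::OpenConfirm, Event::KeepaliveReceived)
        | (State::Established, Event::KeepaliveReceived) => {
            (State::Established, vec![Action::ResetHoldTimer])
        }
        // Hold Timer Expired is NOTIFICATION error code 4 (RFC 4271 §4.5).
        (State::OpenSent, Event::HoldTimerExpired)
        | (State::OpenConfirm, Event::HoldTimerExpired)
        | (State::Established, Event::HoldTimerExpired) => (
            State::Idle,
            vec![
                Action::SendNotification { code: 4, subcode: 0 },
                Action::CloseConnection,
            ],
        ),
        // Simplification for this sketch: every other combination is ignored.
        (s, _) => (s, vec![]),
    }
}
```

Because `step` only maps values to values, a test can drive it with a scripted event sequence and assert on the returned actions without opening a socket or advancing a timer.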
Make invalid states unrepresentable. Types and enums for message and attribute invariants. If the type system can prevent a bug, it should.
Limits everywhere. Max prefixes per peer, max attribute sizes, max message size, explicit queueing policy. Every resource has a defined behavior under pressure, and exceeding limits produces a structured error, not a crash.
Interop test before "feature complete." Correctness is measured by real peers in containers, not unit tests alone.
Errors are first-class. Every error condition — BGP NOTIFICATION, channel overflow, config rejection — produces a structured, machine-parseable event. Operators and automation get rich error codes, not strings.
For crate dependency graph, runtime model, ownership model, data flow, lifecycle flows, backpressure model, and the "where to change X" guide, see ARCHITECTURE.md.
Path attribute representation: The wire crate uses a typed + raw hybrid model. Known attributes (ORIGIN, AS_PATH, NEXT_HOP, etc.) are decoded into typed Rust enums. Unknown attributes are preserved as RawAttribute { flags, type_code, data: Bytes } alongside typed ones. This is a hard architectural requirement — the daemon must re-emit unknown optional transitive attributes byte-for-byte with the Partial bit set correctly. Dropping unknown transitive attributes is a protocol correctness bug.
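The hybrid model can be sketched roughly as follows. Only `RawAttribute` and its three fields come from the text above; the `PathAttribute` variants shown, the `decode_attribute` function, and the use of `Vec<u8>` in place of `Bytes` are assumptions made to keep the example dependency-free.

```rust
use std::net::Ipv4Addr;

/// Unknown attribute kept exactly as received (flags, type code, payload).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RawAttribute {
    pub flags: u8,
    pub type_code: u8,
    pub data: Vec<u8>, // the design uses Bytes; Vec<u8> keeps this std-only
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Origin {
    Igp,
    Egp,
    Incomplete,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PathAttribute {
    Origin(Origin),
    NextHop(Ipv4Addr),
    Unknown(RawAttribute),
}

/// Hypothetical decoder for two known type codes (1 = ORIGIN, 3 = NEXT_HOP).
/// Returns None when a known type code carries a malformed payload.
pub fn decode_attribute(flags: u8, type_code: u8, data: &[u8]) -> Option<PathAttribute> {
    match (type_code, data) {
        (1, [0]) => Some(PathAttribute::Origin(Origin::Igp)),
        (1, [1]) => Some(PathAttribute::Origin(Origin::Egp)),
        (1, [2]) => Some(PathAttribute::Origin(Origin::Incomplete)),
        (3, [a, b, c, d]) => Some(PathAttribute::NextHop(Ipv4Addr::new(*a, *b, *c, *d))),
        (1, _) | (3, _) => None,
        // Anything unrecognized is preserved byte-for-byte for later re-emission.
        _ => Some(PathAttribute::Unknown(RawAttribute {
            flags,
            type_code,
            data: data.to_vec(),
        })),
    }
}
```

The point of the `Unknown` arm is that nothing about the attribute is interpreted or normalized, so re-encoding it later can reproduce the original bytes.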
RIB snapshot model: Snapshots are generation-based, not deep copies. The RIB stores immutable per-prefix route sets behind Arc. Paginated gRPC queries iterate a snapshot handle while the active RIB advances generations without blocking readers. This avoids O(n) cloning on every query.
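One way to picture the generation model is the sketch below. All type and method names are hypothetical, and it takes a shortcut: `Arc::make_mut` shallow-copies the top-level map on the first write after a snapshot is taken, whereas a persistent map would avoid even that. Route sets themselves are shared, never deep-copied.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

type Prefix = (u32, u8); // hypothetical (address, prefix length) pair
type RouteSet = Arc<Vec<String>>; // immutable per-prefix route set

#[derive(Default)]
pub struct Rib {
    generation: u64,
    table: Arc<BTreeMap<Prefix, RouteSet>>,
}

/// A reader's view: pins one generation without blocking the writer.
pub struct Snapshot {
    pub generation: u64,
    table: Arc<BTreeMap<Prefix, RouteSet>>,
}

impl Rib {
    /// Taking a snapshot is a reference-count bump, not a copy.
    pub fn snapshot(&self) -> Snapshot {
        Snapshot {
            generation: self.generation,
            table: Arc::clone(&self.table),
        }
    }

    /// Replaces the route set for one prefix and advances the generation.
    pub fn upsert(&mut self, prefix: Prefix, routes: Vec<String>) {
        // Copies the map of Arc pointers only if a snapshot still holds it.
        Arc::make_mut(&mut self.table).insert(prefix, Arc::new(routes));
        self.generation += 1;
    }
}

impl Snapshot {
    pub fn routes(&self, prefix: &Prefix) -> Option<&[String]> {
        self.table.get(prefix).map(|r| r.as_slice())
    }

    pub fn len(&self) -> usize {
        self.table.len()
    }
}
```

A paginated query would hold one `Snapshot` across all pages; the difference between the live generation and the oldest held snapshot is the kind of value a lag metric such as `rib_snapshot_generation_lag` would report.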
Redesign triggers (instrumented from day one):
- rib_update_latency_p99: if p99 exceeds 10ms under sustained load, evaluate sharding or batch coalescing.
- rib_channel_backpressure_total: any non-zero sustained rate means session tasks are stalling.
- adjribout_channel_drops_total: non-zero means a peer is falling behind.
- rib_snapshot_generation_lag: high lag means a slow consumer is pinning old state.
The threshold for triggering a redesign conversation is: sustained p99 RIB latency above 10ms, or any backpressure-induced session flap in the interop test suite.
rustbgpd defines its own .proto files from day one. No GoBGP proto reuse.
Rationale: GoBGP's protos carry Go-specific patterns and years of accumulated feature baggage. Anyone writing automation against rustbgpd is writing new client code regardless. Our protos should map 1:1 to Rust domain types — NeighborState as a proper enum, AFI/SAFI as typed enums, not integers. A GoBGP-compat adapter can be written later if anyone actually asks for it.
Eleven separate gRPC services (Global, Config, Neighbor, Policy, PeerGroup, Rib, Bfd, Event, Injection, Control, Evpn), not one. This forces API boundary clarity, prevents god-service creep, enables permission scoping (for example, read-only listeners for monitoring), and mirrors internal architecture.
// Global daemon configuration and identity
service GlobalService {
rpc GetGlobal(GetGlobalRequest) returns (GlobalState);
rpc SetGlobal(SetGlobalRequest) returns (SetGlobalResponse);
}
// Neighbor lifecycle and state
service NeighborService {
rpc AddNeighbor(AddNeighborRequest) returns (AddNeighborResponse);
rpc DeleteNeighbor(DeleteNeighborRequest) returns (DeleteNeighborResponse);
rpc ListNeighbors(ListNeighborsRequest) returns (ListNeighborsResponse);
rpc GetNeighborState(GetNeighborStateRequest) returns (NeighborState);
rpc EnableNeighbor(EnableNeighborRequest) returns (EnableNeighborResponse);
rpc DisableNeighbor(DisableNeighborRequest) returns (DisableNeighborResponse);
rpc SoftResetIn(SoftResetInRequest) returns (SoftResetInResponse);
}
// RIB queries — paginated unary for point-in-time, streaming for live watch
service RibService {
rpc ListReceivedRoutes(ListRoutesRequest) returns (ListRoutesResponse);
rpc ListBestRoutes(ListRoutesRequest) returns (ListRoutesResponse);
rpc ListAdvertisedRoutes(ListRoutesRequest) returns (ListRoutesResponse);
rpc ExplainAdvertisedRoute(ExplainAdvertisedRouteRequest) returns (ExplainAdvertisedRouteResponse);
rpc ExplainBestPath(ExplainBestPathRequest) returns (ExplainBestPathResponse);
rpc ListRouteEvents(ListRouteEventsRequest) returns (ListRouteEventsResponse);
rpc WatchRoutes(WatchRoutesRequest) returns (stream RouteEvent);
rpc ListFlowSpecRoutes(ListFlowSpecRequest) returns (ListFlowSpecResponse);
}
// Route injection and withdrawal
service InjectionService {
rpc AddPath(AddPathRequest) returns (AddPathResponse);
rpc DeletePath(DeletePathRequest) returns (DeletePathResponse);
rpc AddFlowSpec(AddFlowSpecRequest) returns (AddFlowSpecResponse);
rpc DeleteFlowSpec(DeleteFlowSpecRequest) returns (DeleteFlowSpecResponse);
}
// Policy CRUD and chain assignment
service PolicyService { /* 14 RPCs: List/Get/Set/Delete for policies, neighbor sets, chains */ }
// Peer group CRUD
service PeerGroupService { /* 6 RPCs: List/Get/Set/Delete groups, Set/Clear neighbor membership */ }
// Daemon control and health
service ControlService {
rpc Shutdown(ShutdownRequest) returns (ShutdownResponse);
rpc GetHealth(HealthRequest) returns (HealthResponse);
rpc GetMetrics(MetricsRequest) returns (MetricsResponse);
rpc TriggerMrtDump(TriggerMrtDumpRequest) returns (TriggerMrtDumpResponse);
}
Paginated unary (default). ListRoutesRequest includes a page_size (max results per page, capped server-side) and an opaque page_token (cursor). The RIB snapshots at the start of the first page request; subsequent pages iterate the same snapshot for consistency. No lock is held on the RIB task: the snapshot is a read-only, generation-based handle, not a deep copy.
message ListRoutesRequest {
string neighbor_address = 1; // filter by peer (empty = all)
AddressFamily afi_safi = 2; // address family filter
uint32 page_size = 3; // max results (server-capped at 10000)
string page_token = 4; // opaque cursor for next page
}
message ListRoutesResponse {
repeated Route routes = 1;
string next_page_token = 2; // empty = no more pages
uint64 total_count = 3; // total matching routes (for UI/progress)
}
Streaming watch (opt-in). WatchRoutes returns a live stream of RouteEvent messages (add, withdraw, best-path change). Backpressure is applied via a bounded server-side channel: if the consumer falls behind, the stream is terminated with a RESOURCE_EXHAUSTED status and the client must reconnect. This prevents a slow consumer from becoming a DoS vector.
Recent event history. ListRouteEvents exposes the same unicast
best-path event shape from a bounded in-memory RIB ring for after-the-fact
debugging. It is a diagnostic timeline, not durable audit storage: the ring is
process-local, fixed-size, and resets on daemon restart.
Watch stream semantics:
- Delivery guarantee: Best effort. Events may be dropped if the consumer is slow. This is not an "at least once" stream — it is a live feed with finite buffer.
- Ordering: Ordered per peer event queue, not globally. Events from the same peer arrive in order; events across peers may interleave arbitrarily.
- Reconnect model: No cursor or resume token. On reconnect, clients issue a paginated snapshot query (ListBestRoutes or ListReceivedRoutes) to establish current state, then resume watching for deltas. This is simple, correct, and avoids server-side cursor tracking overhead.
- Payload scope: RouteEvent contains route identifiers (prefix, peer, AFI/SAFI) and minimal metadata (event type, timestamp). Full route details (attributes, path) are retrieved via List* RPCs. This keeps the stream lightweight and prevents accidental performance traps from fat streaming payloads.
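The bounded-buffer behavior described above can be illustrated with a standard-library channel. This is a stand-in under stated assumptions: it uses `std::sync::mpsc::sync_channel` rather than whatever async channel the daemon uses, and `Subscriber`, `Publish`, and `subscribe` are hypothetical names.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RouteEvent {
    pub prefix: String,
    pub peer: String,
    pub withdrawn: bool,
}

/// Producer-side handle for one watch stream.
pub struct Subscriber {
    tx: SyncSender<RouteEvent>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Publish {
    Delivered,
    /// The stream should be closed (for gRPC, with RESOURCE_EXHAUSTED).
    Terminated,
}

/// Creates a watch stream with a finite buffer of `capacity` events.
pub fn subscribe(capacity: usize) -> (Subscriber, Receiver<RouteEvent>) {
    let (tx, rx) = sync_channel(capacity);
    (Subscriber { tx }, rx)
}

impl Subscriber {
    /// Never blocks the producer: a full or closed queue ends the stream
    /// instead of stalling the RIB.
    pub fn publish(&self, event: RouteEvent) -> Publish {
        match self.tx.try_send(event) {
            Ok(()) => Publish::Delivered,
            Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => Publish::Terminated,
        }
    }
}
```

After a `Terminated` result the client follows the reconnect model above: take a fresh paginated snapshot, then open a new watch.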
Errors are domain-typed, not collapsed into BGP semantics. gRPC responses use proper status codes with an ErrorDetail detail payload:
message ErrorDetail {
oneof kind {
BgpProtocolError bgp = 1;
ResourceLimitError resource = 2;
ConfigError config = 3;
}
}
message BgpProtocolError {
uint32 error_code = 1; // RFC 4271 §4.5 error code
uint32 error_subcode = 2; // RFC 4271 §4.5 error subcode
string description = 3; // human-readable description
string peer_address = 4; // peer involved
}
message ResourceLimitError {
string limit_name = 1; // e.g., "max_prefixes", "channel_capacity"
uint64 current_value = 2; // current usage
uint64 max_value = 3; // configured limit
string peer_address = 4; // peer involved, if applicable
}
message ConfigError {
string field_path = 1; // e.g., "neighbors[0].hold_time"
string message = 2; // validation failure description
string provided_value = 3; // what was given
}
No generic INTERNAL with a string. Machine-parseable errors for every failure path. Each error domain carries its own context fields.
The boot config file (TOML) provides initial state. At startup, the daemon loads the file, translates it into the equivalent of gRPC commands, and applies them. From that point forward, gRPC owns runtime state.
The contract:
- Peers can be added, removed, enabled, and disabled at runtime via gRPC. Zero restarts required.
- Neighbor add/delete mutations made via gRPC are persisted back to the config file via atomic write (temp file + rename).
- SIGHUP triggers a config reload: diff_neighbors() computes the delta and ReconcilePeers applies structured per-peer add/delete operations.
- If the file changes on disk, a restart picks up the new file state.
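The reload delta can be sketched as a comparison of two neighbor maps. The function name `diff_neighbors` appears in the contract above, but its signature, the `NeighborConfig` fields, the `PeerOp` type, and the choice to express a changed neighbor as delete followed by add are all assumptions of this example.

```rust
use std::collections::BTreeMap;
use std::net::IpAddr;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NeighborConfig {
    pub remote_asn: u32,
    pub hold_time: u16,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PeerOp {
    Add(IpAddr, NeighborConfig),
    Delete(IpAddr),
}

/// Computes per-peer operations that turn `old` into `new`.
/// A neighbor whose settings changed is emitted as Delete then Add.
pub fn diff_neighbors(
    old: &BTreeMap<IpAddr, NeighborConfig>,
    new: &BTreeMap<IpAddr, NeighborConfig>,
) -> Vec<PeerOp> {
    let mut ops = Vec::new();
    // Deletes first, so a changed peer is torn down before it is re-added.
    for (addr, cfg) in old {
        if new.get(addr) != Some(cfg) {
            ops.push(PeerOp::Delete(*addr));
        }
    }
    for (addr, cfg) in new {
        if old.get(addr) != Some(cfg) {
            ops.push(PeerOp::Add(*addr, cfg.clone()));
        }
    }
    ops
}
```

Unchanged neighbors produce no operations, so a reload that touches one peer leaves every other session alone.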
[global]
asn = 65001
router_id = "10.0.0.1"
listen_port = 179
[global.telemetry]
prometheus_addr = "0.0.0.0:9179"
log_format = "json"
[[neighbors]]
address = "10.0.0.2"
remote_asn = 65002
description = "peer-frr-lab"
hold_time = 90
max_prefixes = 100_000
[[neighbors]]
address = "10.0.0.3"
remote_asn = 65001
description = "ibgp-reflector"
hold_time = 90
[[neighbors.policy]]
import = "allow-all"
export = "deny-all"
Shutdown is triggered by SIGTERM or by the Shutdown gRPC RPC:
- Stop accepting new gRPC commands.
- Send NOTIFICATION/Cease (Administrative Shutdown, subcode 2) to every established peer.
- Wait up to 5 seconds for TCP sends to flush. Hard-drop after the timeout — don't hang.
- Drop all sessions and close listener sockets.
- Flush final telemetry (last metrics scrape, final log entries).
- Exit.
Neighbor add/delete mutations made via gRPC are persisted back to the config file (ADR-0043). Full route-state persistence remains deferred — restart replays the config file and re-learns routes from peers.
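The "temp file + rename" persistence step can be sketched with the standard library alone. The function name and the `.toml.tmp` suffix are hypothetical; the sketch also omits syncing the parent directory, which a durable implementation on Linux would likely add.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Writes `contents` so that readers observe either the old file or the
/// new one, never a partially written file.
pub fn write_config_atomically(path: &Path, contents: &str) -> io::Result<()> {
    // The temp file sits next to the target so the rename stays on one
    // filesystem, where it replaces the target atomically on POSIX systems.
    let tmp = path.with_extension("toml.tmp");
    let mut file = File::create(&tmp)?;
    file.write_all(contents.as_bytes())?;
    // Flush file data to disk before the rename makes it visible.
    file.sync_all()?;
    fs::rename(&tmp, path)
}
```

If the process dies before the rename, the original config file is untouched and only a stale temp file is left behind.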
Every operationally significant event emits a structured log entry with typed fields:
{
"event": "notification_sent",
"peer": "198.51.100.1",
"code": 3,
"subcode": 1,
"description": "UPDATE Message Error / Malformed Attribute List",
"timestamp": "2026-02-27T14:30:00Z"
}
{
"event": "session_state_change",
"peer": "198.51.100.1",
"from": "OpenConfirm",
"to": "Established",
"timestamp": "2026-02-27T14:30:01Z"
}
Categories of structured events:
- Session state transitions (every FSM transition, not just Established)
- NOTIFICATIONs sent and received (with full code/subcode)
- RIB changes (route learned, route withdrawn, best-path change)
- Policy actions (route filtered, max-prefix exceeded)
- Resource limit hits (channel full, prefix limit reached)
- gRPC command results (neighbor added, path injected, errors)
Implement OPEN, KEEPALIVE, NOTIFICATION. FSM transitions and timer handling. Session reaches Established and stays there.
Exit criteria:
- Establish and hold for 30+ minutes with steady keepalives against FRR (container) and BIRD (container).
- Survive peer restart: peer goes down, comes back, session re-establishes cleanly.
- Survive TCP reset: unexpected connection drop, FSM returns to Idle/Active, retries on schedule.
- Correct NOTIFICATION on malformed OPEN (wrong ASN, bad hold time, unsupported capability).
- Prometheus metrics capture all state transitions and flap events.
- Structured log events for every FSM transition.
UPDATE processing is where most BGP implementations accumulate subtle bugs. rustbgpd validates every attribute against RFC 4271 with explicit, auditable checks.
| Validation | RFC Reference | Behavior on Failure |
|---|---|---|
| Mandatory attributes present (ORIGIN, AS_PATH, NEXT_HOP for eBGP) | RFC 4271 §5.1.2 | NOTIFICATION (3, 3) — Missing Well-known Attribute |
| No duplicate attributes in a single UPDATE | RFC 4271 §5 | NOTIFICATION (3, 1) — Malformed Attribute List |
| Attribute flags match type (well-known, transitive, etc.) | RFC 4271 §4.3 | NOTIFICATION (3, 4) — Attribute Flags Error |
| Attribute ordering (well-known before optional) | RFC 4271 §4.3 | Accept out-of-order but log; strict mode configurable |
| AS_PATH segment type valid (AS_SET, AS_SEQUENCE) | RFC 4271 §4.3 | NOTIFICATION (3, 11) — Malformed AS_PATH |
| AS_PATH length consistent with segment encoding | RFC 4271 §4.3 | NOTIFICATION (3, 11) — Malformed AS_PATH |
| 4-byte ASN handling (AS_TRANS mapping) | RFC 6793 | Map AS_TRANS correctly; reject inconsistent mappings |
| NEXT_HOP is valid IP, not 0.0.0.0, not multicast | RFC 4271 §5.1.3 | NOTIFICATION (3, 8) — Invalid NEXT_HOP Attribute |
| ORIGIN value is valid (IGP, EGP, INCOMPLETE) | RFC 4271 §4.3 | NOTIFICATION (3, 6) — Invalid ORIGIN Attribute |
| Attribute length does not exceed UPDATE length | RFC 4271 §4.3 | NOTIFICATION (3, 1) — Malformed Attribute List |
| Total path attributes length consistent with UPDATE length | RFC 4271 §4.3 | NOTIFICATION (3, 1) — Malformed Attribute List |
| Unrecognized well-known attribute | RFC 4271 §5 | NOTIFICATION (3, 2) — Unrecognized Well-known Attribute |
| Unrecognized optional non-transitive attribute | RFC 4271 §5 | Ignore without NOTIFICATION and do not propagate; emit a structured event (no silent drops) |
| Unrecognized optional transitive attribute | RFC 4271 §5 | Pass through, set Partial bit (see policy below) |
| Attribute exceeds configured max size | rustbgpd limit | NOTIFICATION (3, 1) + structured event |
Every validation failure produces a structured log event with the peer address, attribute type code, raw bytes (truncated), and the RFC section violated. No silent drops.
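Two rows of the table (duplicate attributes and missing well-known attributes) can be expressed as a small check that returns the NOTIFICATION code pair. This is a simplified sketch: `UpdateError` and `check_attribute_set` are hypothetical names, the input is reduced to a list of attribute type codes, and the iBGP-only LOCAL_PREF requirement is left out.

```rust
use std::collections::HashSet;

pub const ORIGIN: u8 = 1;
pub const AS_PATH: u8 = 2;
pub const NEXT_HOP: u8 = 3;

/// Structured result: NOTIFICATION code/subcode plus the offending type code.
#[derive(Debug, PartialEq, Eq)]
pub struct UpdateError {
    pub code: u8,
    pub subcode: u8,
    pub attr_type: Option<u8>,
}

/// `type_codes` lists the attributes of one UPDATE in wire order;
/// `has_nlri` says whether the UPDATE announces any prefixes.
pub fn check_attribute_set(type_codes: &[u8], has_nlri: bool) -> Result<(), UpdateError> {
    let mut seen = HashSet::new();
    for &t in type_codes {
        if !seen.insert(t) {
            // Same attribute twice: (3, 1) Malformed Attribute List.
            return Err(UpdateError { code: 3, subcode: 1, attr_type: Some(t) });
        }
    }
    if has_nlri {
        for &required in &[ORIGIN, AS_PATH, NEXT_HOP] {
            if !seen.contains(&required) {
                // (3, 3) Missing Well-known Attribute.
                return Err(UpdateError { code: 3, subcode: 3, attr_type: Some(required) });
            }
        }
    }
    Ok(())
}
```

Returning a typed error instead of logging inline lets the caller both send the NOTIFICATION and emit the structured event from the same value.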
When rustbgpd re-advertises an unrecognized optional transitive attribute, it ensures the Partial bit (flag 0x20) is set. The attribute bytes and all other flags are preserved unchanged — only the Partial bit is OR'd. If the Partial bit was already set on receipt, this is a no-op.
Rationale: rustbgpd has not validated the semantics of the attribute, so marking it Partial is the correct conservative signal to downstream peers. This matches the behavior of most production implementations and avoids ambiguity about whether the daemon "understood" the attribute. This is not configurable in v1.
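The flag handling reduces to one bitwise OR. The sketch below uses the attribute flag bit values from RFC 4271 §4.3; the function name `reemit_flags` is hypothetical.

```rust
pub const FLAG_OPTIONAL: u8 = 0x80;
pub const FLAG_TRANSITIVE: u8 = 0x40;
pub const FLAG_PARTIAL: u8 = 0x20;

/// Flags to use when re-advertising an unrecognized attribute.
/// Only optional transitive attributes get the Partial bit; every other
/// flag bit is passed through unchanged.
pub fn reemit_flags(flags: u8) -> u8 {
    let mask = FLAG_OPTIONAL | FLAG_TRANSITIVE;
    if (flags & mask) == mask {
        flags | FLAG_PARTIAL
    } else {
        flags
    }
}
```

For example, flags `0xC0` (optional, transitive) become `0xE0`, and flags that already carry the Partial bit come back unchanged.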
Decode UPDATEs. Support IPv4 unicast NLRI. Support attributes: ORIGIN, AS_PATH (2-byte and 4-byte as negotiated), NEXT_HOP, LOCAL_PREF (iBGP), MED (optional, low effort). Store in Adj-RIB-In. Expose via ListReceivedRoutes.
Exit criteria:
- RIB dump matches peer's advertised routes for a controlled prefix set.
- Fuzz harness in CI for the UPDATE decoder (at least smoke-level coverage).
- Structured events for every route learned and withdrawn.
Loc-RIB best-path selection — minimal but deterministic. The comparison function is a total ordering: it must never return equality for distinct paths (from distinct peers).
Best-path rules (implemented), applied in order:
1. Highest LOCAL_PREF (default 100 if absent)
2. Shortest AS_PATH (AS_SET counts as 1, per RFC 4271 §9.1.2.2)
3. Lowest ORIGIN (IGP < EGP < INCOMPLETE)
4. Lowest MED (deterministic: always-compare across all peers, not just same-AS)
5. eBGP over iBGP (only RouteOrigin::Ebgp; Local uses LOCAL_PREF/AS_PATH)
5.5. Shortest CLUSTER_LIST length (RFC 4456 §9)
5.6. Lowest ORIGINATOR_ID (RFC 4456 §9), only when both routes carry the attribute
6. Lowest peer address (final disambiguator; guarantees strict ordering)
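The rule chain above maps naturally onto chained `Ordering` comparisons. The following is a reduced sketch, not the project's code: the `Route` fields are assumptions (AS_PATH length is precomputed and an absent MED is assumed to be stored as 0), and the route-reflection steps 5.5 and 5.6 are omitted.

```rust
use std::cmp::Ordering;
use std::net::IpAddr;

/// Declaration order gives IGP < EGP < INCOMPLETE via the derived Ord.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Origin {
    Igp,
    Egp,
    Incomplete,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RouteOrigin {
    Ebgp,
    Ibgp,
    Local,
}

#[derive(Debug, Clone)]
pub struct Route {
    pub local_pref: Option<u32>,
    pub as_path_len: usize,
    pub origin: Origin,
    pub med: u32,
    pub origin_type: RouteOrigin,
    pub peer: IpAddr,
}

/// Returns Ordering::Less when `a` is the preferred path.
pub fn best_path_cmp(a: &Route, b: &Route) -> Ordering {
    let pref = |r: &Route| r.local_pref.unwrap_or(100);
    // Only Ebgp is preferred at this step; Ibgp and Local rank equal.
    let ebgp_rank = |r: &Route| if r.origin_type == RouteOrigin::Ebgp { 0u8 } else { 1u8 };
    pref(b)
        .cmp(&pref(a)) // reversed operands: highest LOCAL_PREF wins
        .then_with(|| a.as_path_len.cmp(&b.as_path_len))
        .then_with(|| a.origin.cmp(&b.origin))
        .then_with(|| a.med.cmp(&b.med))
        .then_with(|| ebgp_rank(a).cmp(&ebgp_rank(b)))
        .then_with(|| a.peer.cmp(&b.peer)) // final disambiguator
}
```

Because the last step compares peer addresses, two paths from distinct peers can never compare as `Equal`, which is what makes the ordering total.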
Implementation choices (ADR-0014):
- best_path_cmp() is a standalone function, not Ord on Route. Domain-specific ordering doesn't belong as a trait impl; multiple orderings may be needed.
- Deterministic MED (always-compare) matches GoBGP default. Simpler and avoids ordering sensitivity.
- Route carries origin_type: RouteOrigin (Ebgp/Ibgp/Local) for eBGP-over-iBGP preference (step 5) and iBGP split-horizon. Note: Local sorts equal to iBGP at step 5; local routes win via LOCAL_PREF or shorter AS_PATH, not an explicit origin preference.
- LocRib lives inside RibManager: same single-task ownership pattern, no new locks.
- Incremental recompute: only prefixes affected by each update are re-evaluated.
Exposed via ListBestRoutes gRPC endpoint with offset pagination.
Exit criteria:
- Deterministic outcomes for all decision inputs, verified by property tests (antisymmetry, transitivity, totality).
- Stable best-path selection with multiple paths from multiple peers.
- Structured debug events for best-path changes.
- 388 tests pass (v0.2.0), clippy clean, fmt clean.
Inject and withdraw routes via gRPC (AddPath / DeletePath). Build Adj-RIB-Out per neighbor. Advertise to peers and propagate withdrawals correctly. v1 policy: import/export allow/deny lists + max-prefix guard. TCP MD5 authentication and GTSM/TTL security.
Implementation choices:
- Adj-RIB-Out lives inside RibManager: same single-task ownership, no new locks (ADR-0015).
- Per-peer outbound channel (mpsc, capacity 4096) created in PeerSession, sender registered via PeerUp message on Established.
- Outbound UPDATEs bypass the pure FSM, consistent with inbound pattern.
- Injected routes stored under sentinel peer 0.0.0.0 in standard Adj-RIB-In, participating in normal best-path selection and distribution.
- UpdateMessage::build() high-level constructor for outbound UPDATEs.
- eBGP outbound: prepend local ASN to AS_PATH, set NEXT_HOP to session's local IPv4 socket address (reachable, not router-id), strip LOCAL_PREF.
- iBGP outbound: ensure LOCAL_PREF present (default 100), pass NEXT_HOP through.
- TCP MD5 and GTSM require socket2::Socket for pre-connect setsockopt calls (ADR-0016). Only unsafe code in the project, isolated to socket_opts module.
- Policy engine: first-match-wins evaluation with match conditions (prefix, community, AS_PATH regex) and route modifications (LOCAL_PREF, MED, communities, AS_PATH prepend, next-hop). Separate import/export policies.
Exit criteria:
- A client can programmatically announce a prefix and verify it appears on the peer.
- Withdrawals propagate correctly.
- Max-prefix enforcement drops session with NOTIFICATION when exceeded.
- Resource limits enforced and observable via metrics.
- 284 tests pass (M3), clippy clean, fmt clean.
Dynamic peer management, per-peer policy, typed communities, real-time route event streaming.
Implementation choices:
- PeerManager uses the same channel-based single-task ownership pattern as RibManager (ADR-0017). Commands arrive via bounded mpsc, replies via oneshot.
- Shared types (PeerManagerCommand, PeerInfo) live in crates/api/src/peer_types.rs to avoid circular dependencies between the binary and API crates.
- Per-peer export policy: RibManager stores per-peer policies from PeerUp, resolves via export_policy_for() (per-peer overrides global). Config supports per-neighbor import_policy / export_policy sections.
- Typed COMMUNITIES (RFC 1997): PathAttribute::Communities(Vec<u32>) replaces opaque Unknown for type code 8. Each u32 is (ASN << 16) | value.
- WatchRoutes uses tokio::sync::broadcast (ADR-0018): zero overhead with no subscribers, independent receivers, lagged subscribers get an error instead of blocking.
- PeerHandle::query_state() enables FSM state queries from PeerManager without shared mutable state.
- Starting with zero configured neighbors is now valid; peers can be added entirely via gRPC.
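The `(ASN << 16) | value` packing for standard communities can be illustrated with a few helper functions. The function names are hypothetical; the 4-byte big-endian wire layout follows RFC 1997.

```rust
/// Packs the two 16-bit halves of a standard community into one u32.
pub fn encode_community(asn: u16, value: u16) -> u32 {
    (u32::from(asn) << 16) | u32::from(value)
}

/// Splits a community back into (ASN, value).
pub fn decode_community(community: u32) -> (u16, u16) {
    ((community >> 16) as u16, (community & 0xFFFF) as u16)
}

/// Conventional "ASN:value" display form.
pub fn format_community(community: u32) -> String {
    let (asn, value) = decode_community(community);
    format!("{}:{}", asn, value)
}

/// Decodes a COMMUNITIES attribute payload (type code 8): a sequence of
/// 4-byte big-endian values. Returns None if the length is not a multiple of 4.
pub fn parse_communities(data: &[u8]) -> Option<Vec<u32>> {
    if data.len() % 4 != 0 {
        return None;
    }
    Some(
        data.chunks_exact(4)
            .map(|c| u32::from_be_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

With this packing, the well-known NO_EXPORT community `0xFFFFFF01` displays as `65535:65281`.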
Exit criteria:
- Dynamic peer add/remove via gRPC, verified end-to-end.
- Per-peer export policy enforcement (different peers see different routes).
- Communities decoded, exposed in gRPC, injected via AddPath.
- WatchRoutes streams real-time route events to multiple subscribers.
- 10-peer interop validated against FRR 10.3.1 (17/17 automated tests pass).
- 306 tests pass (M4), clippy clean, fmt clean.
Added 2026-04 per ADR-0050. Extends the RIB / transport / gRPC stack with a parallel typed-NLRI family for RFC 7432 routes, following the FlowSpec pattern (ADR-0035). Scope is RR role only: reflect all 5 route types between VTEP peers per RFC 4456 without local EVI state or data-plane integration.
Parallel tables, not Prefix extension. Prefix is Copy and participates in longest-prefix-match semantics — neither fits EVPN routes, which are variable-length typed TLVs. AdjRibIn, AdjRibOut, and LocRib each gain HashMap<EvpnRouteKey, EvpnRibRoute> tables alongside flowspec_routes. The compiler enforces parallel method coverage; FlowSpec already proved the pattern scales.
Split payload from identity. EvpnRoute carries the full RFC 7432 wire payload (labels, optional IPs, gateway) — needed to round-trip through reflection. EvpnRouteKey carries only the identifying fields per route type and is Copy + Eq + Hash — suitable as the RIB HashMap key. EAD per-ES and EAD per-EVI share wire format but get distinct key variants so the RIB never collapses them.
Reflection reuses existing RFC 4456 helper. stage_evpn_routes builds a synthetic Route probe carrying only peer / router-id / origin-type metadata and passes it to the existing should_suppress_ibgp_inner. Same pattern FlowSpec uses — no EVPN-specific reflection logic.
Best-path: type-specific head + shared BGP body. evpn_tiebreak_simple runs a Type-2-specific MAC Mobility head (sticky flag + sequence per RFC 7432 §15.1), then falls through to the standard BGP chain (LocalPref → AS_PATH → MED → eBGP>iBGP → peer). Type 1/4 DF-election tiebreaks are not implemented — the RR reflects, downstream VTEPs elect. Types 3/5 have no type-specific head.
Policy uses placeholder prefix. EVPN RouteContext carries a synthesized 0.0.0.0/0 prefix — the existing context fields (extended communities, communities, AS_PATH, peer metadata) are what operators actually filter on. RT-based filtering works through the existing match_community clause. A dedicated match_evpn_route_type clause is a Phase 1.5 item if operators need it.
Next-hop preserved across reflection. Outbound EVPN MP_REACH_NLRI carries the originating VTEP's loopback IP as next-hop, not the RR's address. This is what lets downstream VTEPs build VXLAN tunnels correctly — the RR is a control-plane waypoint, not a data-plane middlebox.
Withdrawal wire framing from keys. Outbound EVPN withdrawals emit MP_UNREACH_NLRI with routes reconstructed from EvpnRouteKey via evpn_route_from_key. Unknown label / optional fields are zeroed; receivers identify by key only, so round-trip fidelity is unnecessary on the withdrawal path.
Phase 1 hardening (the RR enablement ladder in evpn-enablement.md) covers reflection of all five RFC 7432 route types, GR + LLGR + Enhanced Route Refresh, MAC mobility / sticky preservation, multi-homing Type 4 ES reflection (Type 1 EAD-per-EVI is wire-codec-tested but not gated end-to-end — FRR origination requires VLAN-aware bridge + SVI which is Phase 3 scope), scale validation (50k Type 2 + churn), and controller-driven injection for Type 2 / Type 3. What remains:
- VTEP mode: local EVI / VRF / VNI state and kernel FDB MAC learning are shipped (declarative instance schema, FDB reconciler, local MAC + MAC+IP origination, VTEP convergence); the daemon now both programs remote MACs into the kernel FDB and originates local Type 2 + Type 3 IMET routes from kernel-learned MACs. advertise_svi_mac originates the bridge's own MAC on instance-Ready, sticky_macs (ADR-0056) marks origination with the RFC 7432 §15.4 sticky bit, MAC-with-IP origination adds MAC+IP Type 2 origination via ARP/ND suppression under the FRR replace model (requires bridge neigh_suppress on), and EVPN VTEP convergence switches the originator from a 5 s poll to a push-notified RIB broadcast for sub-second mobility convergence. Later EVPN work added remote duplicate-MAC suppression + manual clear, so the remaining VTEP tail is native overlay-index local origination / recursion-path interop and standards features outside the Linux/VXLAN alpha boundary.
- Multi-homing execution: EVPN multi-homing (ESI, Type-1/Type-4) plus BUM-flood suppression + DF election cover rustbgpd-as-VTEP DF election (RFC 7432 §8 + RFC 8584), Type 1/4 origination, opt-in Non-DF BUM suppression, ESI-aware Type 2 origination, aliasing projection, and receive-side EAD-per-ES mass-withdraw filtering. ADR-0059 closes the aliasing-ECMP receive-path data path via FDB nexthop groups (shipped on main, M40 hosted smoke validated against FRR EVPN-MH 10.3.1); aliasing-ECMP hardening (PRs #91 / #92 / #93) followed up with the apply_aliasing_ecmp per-instance off-switch, periodic RTM_GETNEXTHOP drift recovery, and homogeneous IPv6 alias members. The MAC-churn variant of the BUM-state soak passed 2026-05-16 (docs/soak-gate8b-mac-churn-24h.md), which unblocks flipping the apply_bum_enforcement and apply_aliasing_ecmp defaults to true; the flip itself is a separate release decision.
- Symmetric Interface-less IRB: EVPN symmetric IRB (Type-5 / L3VNI) ships end-to-end in v0.18.0 (RFC 9136 §4.4.2 / ADR-0058). The [[evpn_ip_vrfs]] config object, IpVrfStatus readiness probe, Linux VRF + L3VXLAN dumps, per-IP-VRF kernel-route observation, Type 5 origination via RibUpdate::InjectEvpn, remote Type 5 import through the transactional L3OwnedState model with four-phase apply ordering, Router MAC conflict detection, and the M39 hosted smoke are all on main. Auto-derived RTs (RFC 8365 §5.1.2.1) shipped in v0.25.0. Receive-side RFC 9135 overlay-index Type 5 recursion now resolves non-zero Gateway Address routes through unambiguous linked Type 2 MAC/IP state while leaving unresolved or ambiguous gateways fail-closed.
- Controller injection beyond Type 2 / Type 3 / Type 5: Type 5 IP-Prefix injection, including non-zero Gateway Address for targeted overlay-index testing, is exposed in the injection RPCs. Type 1 / Type 4 multi-homing route injection is not exposed; native daemon Type 1/4 origination exists via [[ethernet_segments]].
- RFC 9251 Route Types 6-8 (IGMP multicast), RFC 9572 Route Types 9-11 (BUM segmentation), RFC 7623 PBB-EVPN, MPLS encap, Add-Path for EVPN (RFC 9252) (Phase 5).
Primary targets (containerlab-based, run in CI):
- FRR (bgpd)
- BIRD
- GoBGP (as peer)
Stretch targets (lab environments):
- Junos vMX/vPTX
- Arista cEOS
- Cisco (if available)
containerlab is the test harness — not "where feasible," but the default. Every interop scenario is a reproducible topology file.
libFuzzer harnesses for:
- Message decoding (all message types)
- Attribute decoding (all supported attributes)
- NLRI parsing (IPv4 unicast)
Short fuzz runs on every PR. Extended fuzz on nightly CI schedule.
- encode(decode(x)) == x roundtrip invariants for all valid message types.
- Decoder rejects: length mismatches, invalid attribute flags, truncated NLRI, oversized attributes beyond configured limits.
- FSM property: no invalid state transitions for any sequence of valid inputs.
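As a small-scale illustration of the roundtrip invariant, the sketch below encodes and decodes only the fixed 19-byte BGP message header (RFC 4271 §4.1). The type and function names are hypothetical, and the 4096-byte maximum assumes extended messages are not negotiated.

```rust
pub const HEADER_LEN: usize = 19;
pub const MAX_MESSAGE_LEN: usize = 4096;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Header {
    pub length: u16,
    pub msg_type: u8,
}

#[derive(Debug, PartialEq, Eq)]
pub enum DecodeError {
    Truncated,
    BadMarker,
    BadLength(u16),
    BadType(u8),
}

/// 16-byte all-ones marker, 2-byte big-endian length, 1-byte type.
pub fn encode_header(header: Header) -> [u8; HEADER_LEN] {
    let mut out = [0xFF; HEADER_LEN];
    out[16..18].copy_from_slice(&header.length.to_be_bytes());
    out[18] = header.msg_type;
    out
}

pub fn decode_header(buf: &[u8]) -> Result<Header, DecodeError> {
    if buf.len() < HEADER_LEN {
        return Err(DecodeError::Truncated);
    }
    if buf[..16].iter().any(|&b| b != 0xFF) {
        return Err(DecodeError::BadMarker);
    }
    let length = u16::from_be_bytes([buf[16], buf[17]]);
    if !(HEADER_LEN..=MAX_MESSAGE_LEN).contains(&usize::from(length)) {
        return Err(DecodeError::BadLength(length));
    }
    // 1 OPEN, 2 UPDATE, 3 NOTIFICATION, 4 KEEPALIVE, 5 ROUTE-REFRESH.
    let msg_type = buf[18];
    if !(1..=5).contains(&msg_type) {
        return Err(DecodeError::BadType(msg_type));
    }
    Ok(Header { length, msg_type })
}
```

A property test would generate arbitrary 19-byte buffers and assert that whenever `decode_header` succeeds, `encode_header` reproduces the input exactly; a fuzz target would additionally assert that no input causes a panic.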
- Unit tests (every PR)
- Fuzz smoke — short run (every PR)
- Extended fuzz (nightly)
- Interop tests via containerlab (every PR, against FRR and BIRD at minimum)
- Clippy + deny(warnings) + cargo deny for dependency audit
This section defines the security stance for rustbgpd. Not all items are v1 implementations, but the posture is established now so that design decisions don't foreclose security later.
Supported platforms (v1): Linux (x86_64, aarch64). TCP MD5, GTSM via IP_TTL, and certain socket options are Linux-specific. macOS and BSD may work for development builds but are not tested or supported targets. This is stated explicitly to prevent bug reports about platform-specific socket behavior.
TCP MD5 (RFC 2385): Supported in v1. This is table stakes for any BGP daemon deployed in production — most peers will require it. Implemented via setsockopt(TCP_MD5SIG) on the listener and per-peer outbound sockets. Linux only.
TCP-AO (RFC 5925): Staged via ADR-0062. Static-neighbor tcp_ao TOML is
validated and installed on Linux startup sockets: active-open sessions install
the key before connect(), and the passive BGP listener installs configured
peer keys before listen(). Runtime key rotation, dynamic-neighbor wildcard
MKTs, multi-key rollover, and protected interop smoke remain follow-up work.
GTSM (RFC 5082): Supported in v1 as a configurable option (ttl_security = true per neighbor). Sets IP_TTL to 255 on outbound and checks inbound TTL >= 254. Because routers decrement TTL on every hop, packets injected from farther away arrive below that threshold and are rejected, which prevents most remote session hijacking.
- Max inbound TCP connections per source IP: configurable, default 5 per minute.
- Max total pending connections: configurable, default 100.
- Connections from unconfigured peers are dropped immediately after TCP accept — no BGP processing.
- All rate limit events produce structured log entries.
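The connection limits above could be enforced with a small sliding-window limiter. The sketch below is an assumption about shape, not the daemon's code: `ConnLimiter`, `Admit`, `on_accept`, and `on_resolved` are hypothetical names, and the caller is expected to turn every non-`Accepted` outcome into a structured log entry.

```rust
use std::collections::{HashMap, VecDeque};
use std::net::IpAddr;
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Admit {
    Accepted,
    SourceRateExceeded,
    PendingLimitExceeded,
}

/// Hypothetical limiter: per-source sliding window plus a global pending cap.
pub struct ConnLimiter {
    per_source_max: usize,
    window: Duration,
    max_pending: usize,
    pending: usize,
    recent: HashMap<IpAddr, VecDeque<Instant>>,
}

impl ConnLimiter {
    pub fn new(per_source_max: usize, window: Duration, max_pending: usize) -> Self {
        Self { per_source_max, window, max_pending, pending: 0, recent: HashMap::new() }
    }

    /// Called right after TCP accept, before any BGP processing.
    pub fn on_accept(&mut self, src: IpAddr, now: Instant) -> Admit {
        if self.pending >= self.max_pending {
            return Admit::PendingLimitExceeded;
        }
        let seen = self.recent.entry(src).or_default();
        // Drop timestamps that have aged out of the window.
        while let Some(&t) = seen.front() {
            if now.duration_since(t) >= self.window {
                seen.pop_front();
            } else {
                break;
            }
        }
        if seen.len() >= self.per_source_max {
            return Admit::SourceRateExceeded;
        }
        seen.push_back(now);
        self.pending += 1;
        Admit::Accepted
    }

    /// Called when a pending connection becomes a session or is closed.
    pub fn on_resolved(&mut self) {
        self.pending = self.pending.saturating_sub(1);
    }
}
```

Passing `now` explicitly keeps the limiter deterministic under test. A production version would also evict idle source entries so the map itself stays bounded.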
- Never panic on malformed input. Any input from the network is untrusted. Panics on malformed BGP messages are security vulnerabilities.
- Always NOTIFICATION. Every malformed message produces the correct NOTIFICATION error code per RFC 4271, followed by session teardown. No silent drops, no "log and ignore."
- Always log. Every malformed message produces a structured event with peer address, message type, error description, and truncated raw bytes for forensic analysis.
- Fuzz everything. The wire decoder is the attack surface. It runs under continuous fuzzing in CI.
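One way to make "always NOTIFICATION, always log" hard to bypass is to route every decode failure through a single total mapping. The sketch below assumes hypothetical names (`DecodeError`, `Notification`, `MalformedEvent`, `malformed_event`) and a hypothetical 64-byte cap on logged bytes; the code and subcode values are the ones RFC 4271 assigns.

```rust
use std::net::IpAddr;

/// A subset of decode failures, for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DecodeError {
    ConnectionNotSynchronized,
    BadMessageLength,
    BadMessageType,
    MalformedAttributeList,
    AttributeFlagsError,
    AttributeLengthError,
    InvalidNetworkField,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Notification {
    pub code: u8,
    pub subcode: u8,
}

/// Total mapping: the match is exhaustive, so a new error variant
/// cannot be added without choosing its NOTIFICATION.
pub fn notification_for(err: DecodeError) -> Notification {
    let (code, subcode) = match err {
        // Error code 1: Message Header Error
        DecodeError::ConnectionNotSynchronized => (1, 1),
        DecodeError::BadMessageLength => (1, 2),
        DecodeError::BadMessageType => (1, 3),
        // Error code 3: UPDATE Message Error
        DecodeError::MalformedAttributeList => (3, 1),
        DecodeError::AttributeFlagsError => (3, 4),
        DecodeError::AttributeLengthError => (3, 5),
        DecodeError::InvalidNetworkField => (3, 10),
    };
    Notification { code, subcode }
}

/// Assumed cap on raw bytes kept for forensics.
pub const MAX_LOGGED_BYTES: usize = 64;

/// Structured event carrying the fields listed above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MalformedEvent {
    pub peer: IpAddr,
    pub msg_type: u8,
    pub error: DecodeError,
    pub notification: Notification,
    pub raw_prefix: Vec<u8>,
}

pub fn malformed_event(peer: IpAddr, msg_type: u8, error: DecodeError, raw: &[u8]) -> MalformedEvent {
    let keep = raw.len().min(MAX_LOGGED_BYTES);
    MalformedEvent {
        peer,
        msg_type,
        error,
        notification: notification_for(error),
        raw_prefix: raw[..keep].to_vec(),
    }
}
```

Because the event already contains the NOTIFICATION to send, the session task can log and tear down from the same value, which leaves no path for "log and ignore".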
Bounded channels, prefix limits, and backpressure behavior are detailed in ARCHITECTURE.md — Failure and Backpressure Model. Additional guards:
- UPDATE attribute size limits enforced at decode time. Oversized attributes are rejected before allocation.
- gRPC request size limits enforced by tonic configuration.
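Rejecting oversized attributes "before allocation" amounts to checking the declared length against the limit and the remaining buffer before touching the value bytes. A minimal sketch, assuming hypothetical names (`RawAttr`, `AttrError`, `split_attr`) and a caller-supplied `max_attr_len`; the header layout and Extended Length flag follow RFC 4271 §4.3.

```rust
/// Extended Length bit in the attribute flags octet (RFC 4271 §4.3).
const FLAG_EXTENDED_LENGTH: u8 = 0x10;

#[derive(Debug, PartialEq, Eq)]
pub enum AttrError {
    Truncated,
    Oversized { declared: usize, limit: usize },
}

/// Borrowed view of one path attribute; nothing is copied or allocated.
#[derive(Debug, PartialEq, Eq)]
pub struct RawAttr<'a> {
    pub flags: u8,
    pub type_code: u8,
    pub value: &'a [u8],
}

/// Splits one attribute off the front of `buf`, returning it and the remainder.
pub fn split_attr<'a>(
    buf: &'a [u8],
    max_attr_len: usize,
) -> Result<(RawAttr<'a>, &'a [u8]), AttrError> {
    if buf.len() < 3 {
        return Err(AttrError::Truncated);
    }
    let flags = buf[0];
    let type_code = buf[1];
    let (declared, header_len) = if flags & FLAG_EXTENDED_LENGTH != 0 {
        if buf.len() < 4 {
            return Err(AttrError::Truncated);
        }
        (usize::from(u16::from_be_bytes([buf[2], buf[3]])), 4)
    } else {
        (usize::from(buf[2]), 3)
    };
    // Limit check comes first, so an attacker-chosen length never sizes anything.
    if declared > max_attr_len {
        return Err(AttrError::Oversized { declared, limit: max_attr_len });
    }
    if buf.len() - header_len < declared {
        return Err(AttrError::Truncated);
    }
    let (value, rest) = buf[header_len..].split_at(declared);
    Ok((RawAttr { flags, type_code, value }, rest))
}
```

Returning borrowed slices means typed decoding can decide later whether an owned copy is needed at all.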
When max_total_routes is exceeded, the offending session is torn down with NOTIFICATION Cease (Out of Resources, subcode 8) as defined in RFC 4486 §3. The structured event includes the peer address, the route that triggered the limit, and the current total count.
Interop note: Cease subcodes are defined in RFC 4486, not RFC 4271. If interop testing reveals a peer that rejects unknown Cease subcodes, the fallback is generic Cease (code 6, subcode 0). This is documented in INTEROP.md per peer.
This is a deliberate choice. The alternative, partial acceptance (rejecting individual prefixes while keeping the session established), introduces per-UPDATE partial semantics that generate subtle correctness bugs and are difficult to reason about operationally. Tearing down the session is explainable, safe, and what operators expect.
If the global limit is hit, it means either the limit is configured too low or the peer is sending more routes than expected — both conditions warrant human attention, not silent partial behavior.
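The teardown decision, including the per-peer fallback to a generic Cease, fits in one function. The names here (`global_limit_action`, `peer_rejects_cease_subcodes`) are hypothetical; the numeric values are Cease (code 6) from RFC 4271 and subcode 8 from RFC 4486.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Notification {
    pub code: u8,
    pub subcode: u8,
}

/// Cease error code (RFC 4271).
pub const CEASE: u8 = 6;
/// Generic Cease with no subcode.
pub const SUBCODE_UNSPECIFIC: u8 = 0;
/// Out of Resources (RFC 4486).
pub const SUBCODE_OUT_OF_RESOURCES: u8 = 8;

/// Returns the NOTIFICATION to send to the offending peer, or `None` if the
/// global limit still holds. `peer_rejects_cease_subcodes` stands in for a
/// per-peer interop quirk flag (an assumption, not a documented config key).
pub fn global_limit_action(
    total_routes: u64,
    max_total_routes: u64,
    peer_rejects_cease_subcodes: bool,
) -> Option<Notification> {
    if total_routes <= max_total_routes {
        return None;
    }
    let subcode = if peer_rejects_cease_subcodes {
        SUBCODE_UNSPECIFIC
    } else {
        SUBCODE_OUT_OF_RESOURCES
    };
    Some(Notification { code: CEASE, subcode })
}
```

There is no branch that accepts part of an UPDATE: the outcome is either "within limit" or "tear down", which is the property the preceding paragraphs argue for.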
- gRPC listens on a configurable address (default: localhost only).
- No built-in TLS in v1. For non-loopback exposure, front rustbgpd with an mTLS/TLS-authenticated proxy.
- Per-listener access mode (read_only/read_write) controls which RPCs are available. The eleven-service split supports per-service auth policies when finer-grained authorization is added.
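The per-listener access mode reduces to a small predicate evaluated before a request reaches its handler. This sketch is illustrative only: `AccessMode`, `RpcKind`, and `permits` are hypothetical names, and how the daemon actually classifies each RPC is not specified here.

```rust
/// Mirrors the read_only / read_write listener setting.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AccessMode {
    ReadOnly,
    ReadWrite,
}

/// Hypothetical classification of an RPC by its effect on daemon state.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RpcKind {
    /// Queries and streams that never change state.
    Read,
    /// Anything that adds, changes, or removes state.
    Mutate,
}

/// Reads are always allowed; mutations need a read_write listener.
pub fn permits(mode: AccessMode, kind: RpcKind) -> bool {
    match (mode, kind) {
        (_, RpcKind::Read) => true,
        (AccessMode::ReadWrite, RpcKind::Mutate) => true,
        (AccessMode::ReadOnly, RpcKind::Mutate) => false,
    }
}
```

Keeping the check at the listener boundary means a read-only listener can be exposed to monitoring tools without relying on each handler to refuse writes.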
| Limit | Default | Notes |
|---|---|---|
| Max message size | 4096 bytes (65535 with RFC 8654) | 4096 by default; raised per-session only when Extended Messages is negotiated |
| Max attributes per UPDATE | 256 | Safety bound |
| Max prefixes per neighbor | 1,000,000 | NOTIFICATION on exceed |
| Max total routes | 10,000,000 | Backpressure, not crash |
| Bounded channel size | 4096 | Per-session and RIB channels |
| Connect retry interval | 5s | Reduced from RFC 4271 default of 120s |
| Hold time | 90s | Negotiated per-peer |
All limits are configurable via TOML and overridable per-peer via gRPC.
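The defaults in the table, together with per-peer overrides, might be modelled as below. Field and type names (`Limits`, `PeerOverrides`, `for_peer`, `effective_max_message_size`) are assumptions for illustration and do not reflect the actual TOML keys or proto fields.

```rust
use std::time::Duration;

/// Global limits; defaults mirror the table above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Limits {
    pub max_message_size: u16,
    pub max_attributes_per_update: usize,
    pub max_prefixes_per_neighbor: u64,
    pub max_total_routes: u64,
    pub channel_capacity: usize,
    pub connect_retry: Duration,
    pub hold_time: Duration,
}

impl Default for Limits {
    fn default() -> Self {
        Self {
            max_message_size: 4096,
            max_attributes_per_update: 256,
            max_prefixes_per_neighbor: 1_000_000,
            max_total_routes: 10_000_000,
            channel_capacity: 4096,
            connect_retry: Duration::from_secs(5),
            hold_time: Duration::from_secs(90),
        }
    }
}

/// Per-peer overrides; `None` means "inherit the global value".
#[derive(Debug, Clone, Default)]
pub struct PeerOverrides {
    pub max_prefixes_per_neighbor: Option<u64>,
    pub connect_retry: Option<Duration>,
    pub hold_time: Option<Duration>,
}

impl Limits {
    /// Effective limits for one peer after applying its overrides.
    pub fn for_peer(&self, o: &PeerOverrides) -> Limits {
        Limits {
            max_prefixes_per_neighbor: o
                .max_prefixes_per_neighbor
                .unwrap_or(self.max_prefixes_per_neighbor),
            connect_retry: o.connect_retry.unwrap_or(self.connect_retry),
            hold_time: o.hold_time.unwrap_or(self.hold_time),
            ..self.clone()
        }
    }

    /// The larger bound applies only once Extended Messages (RFC 8654) is negotiated.
    pub fn effective_max_message_size(&self, extended_negotiated: bool) -> u16 {
        if extended_negotiated { u16::MAX } else { self.max_message_size }
    }
}
```

Modelling overrides as `Option` fields keeps "not set" distinct from "set to the default", so a later change to a global value still reaches every peer that did not override it.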
See ARCHITECTURE.md — Where to Change X for a task-oriented guide. The crate dependency graph and runtime model are also in ARCHITECTURE.md.
- Plugin-based policy engine (WASM or embedded DSL) — only after core stability
This matrix tracks every protocol behavior: its RFC basis, implementation status, and interop validation. It is the source of truth for what rustbgpd does and does not do, and it stays current as the project evolves. Milestone targets (M0–M4) indicate planned implementation phase — not current status.
| Behavior | RFC | Target Milestone | Interop Targets | Notes |
|---|---|---|---|---|
| OPEN / KEEPALIVE / NOTIFICATION | 4271 §4.2–4.5 | M0 | FRR, BIRD | — |
| FSM state transitions | 4271 §8 | M0 | FRR, BIRD | Includes retry and error paths |
| 4-byte ASN capability | 6793 | M0 | FRR, BIRD | AS_TRANS mapping |
| UPDATE decode (IPv4 unicast) | 4271 §4.3 | M1 | FRR, BIRD | — |
| ORIGIN attribute | 4271 §5.1.1 | M1 | FRR, BIRD | — |
| AS_PATH attribute | 4271 §5.1.2 | M1 | FRR, BIRD | 2-byte and 4-byte |
| NEXT_HOP attribute | 4271 §5.1.3 | M1 | FRR, BIRD | Validation per RFC |
| LOCAL_PREF attribute | 4271 §5.1.5 | M1 | FRR, BIRD | iBGP only |
| MED attribute | 4271 §5.1.4 | M1 | FRR, BIRD | Optional, same-AS comparison configurable |
| Unknown transitive attr pass-through | 4271 §5 | M1 | FRR | Partial bit set, raw bytes preserved |
| Best-path selection | 4271 §9.1.2 | M2 | FRR, BIRD | Total ordering, see decision rules |
| UPDATE encoding / Adj-RIB-Out | 4271 §9.2 | M3 | FRR, BIRD | — |
| Route injection via gRPC | rustbgpd | M3 | FRR | — |
| Max-prefix enforcement | rustbgpd | M3 | FRR | NOTIFICATION Cease |
| TCP MD5 authentication | 2385 | M3 | FRR | Linux only |
| GTSM (TTL security) | 5082 | M3 | FRR | Configurable per-peer |
| Route server mode (many peers) | — | M4 | FRR, BIRD, GoBGP | No transit by default |
| MP-BGP (IPv6 unicast) | 4760 | v0.2.0 | FRR | MP_REACH_NLRI / MP_UNREACH_NLRI, Prefix enum, AFI/SAFI negotiation |
| Communities (standard) | 1997 | M4 | FRR | Typed decode/encode, gRPC exposure |
| Extended communities | 4360 | v0.3.0+ | FRR | RT, RO, 4-byte AS (ADR-0025/0026) |
| FlowSpec | 8955 | post-v0.3.0 | — | IPv4/IPv6 unicast FlowSpec implemented; speaker-mode hardening continues |
| Graceful restart (receiving speaker) | 4724 | v0.3.0 | FRR | Stale demotion, per-family EoR, two-phase timer (ADR-0024) |
| LLGR (two-phase GR timer) | 9494 | post-v0.3.0 | FRR | Implemented; GR-stale → LLGR-stale promotion, configurable stale time |
| TCP-AO | 5925 | Post-v1 | — | Static-neighbor startup install; dynamic / rollover follow-ups deferred |
| BMP exporter | 7854 | post-v0.3.0 | — | Implemented (ADR-0041); reconnect replay + periodic stats + coordinated-shutdown termination |
| MRT dump export | 6396 | post-v0.3.0 | — | Implemented (ADR-0044); TABLE_DUMP_V2 periodic + on-demand, gzip optional |
| RPKI / RTR client | 8210 | post-v0.3.0 | — | Implemented (ADR-0034); runtime gRPC management deferred |
This matrix is updated with every milestone. "Interop Tested" means validated by a documented containerlab or privileged-netns procedure. CI-gated rows are called out explicitly; privileged kernel dataplane smokes run locally until a privileged runner is available.
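As an example of what a single row covers, the AS_TRANS mapping noted for the 4-byte ASN capability can be sketched in a few lines. Function names are hypothetical; the constants (AS_TRANS = 23456, capability code 65, length 4) come from RFC 6793.

```rust
/// Reserved 2-octet ASN that stands in for a 4-octet ASN (RFC 6793).
pub const AS_TRANS: u16 = 23456;
/// Capability code for "Support for 4-octet AS number" (RFC 6793).
pub const CAP_FOUR_OCTET_AS: u8 = 65;

/// Value for the 2-octet My Autonomous System field of OPEN.
/// ASNs that do not fit in 16 bits are advertised as AS_TRANS.
pub fn open_my_as(local_asn: u32) -> u16 {
    u16::try_from(local_asn).unwrap_or(AS_TRANS)
}

/// Encoded capability TLV carrying the real 4-octet ASN: code, length, value.
pub fn four_octet_as_capability(local_asn: u32) -> [u8; 6] {
    let a = local_asn.to_be_bytes();
    [CAP_FOUR_OCTET_AS, 4, a[0], a[1], a[2], a[3]]
}
```

A peer that understands the capability reads the true ASN from the TLV; a peer that does not sees AS_TRANS in the OPEN header, which is the case the interop column is meant to exercise.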
- v1: Linux (x86_64, aarch64). These are the only tested and supported targets.
- macOS and BSD may compile and run for development purposes but are not CI-tested. Platform-specific socket options (TCP_MD5SIG, IP_TTL for GTSM) are Linux-only.
- Windows is not supported.
- Must not break: FRR. Core FRR interop rows are gated on every PR.
- Should not break: BIRD and GoBGP as peers. BIRD has documented M0 validation; GoBGP rows run in the interop suite but are not the primary compatibility gate.
- Best effort: Junos, Arista cEOS, Cisco. Lab-tested when available, not CI-gated.
gRPC proto definitions are treated with semver discipline:
- Pre-1.0: Breaking changes allowed with a changelog entry and migration notes.
- Post-1.0: No breaking changes to existing RPCs or message fields. New fields are additive. New RPCs are additive. Deprecation requires a full minor version cycle before removal.
Milestone-based releases. Each milestone (M0–M4) is a tagged release with:
- Passing CI (unit tests, fuzz smoke, interop)
- Updated compatibility matrix
- Updated CHANGELOG
- Migration notes if protos changed
- Bug fixes and test improvements: PR directly.
- New protocol behavior: Requires an issue with RFC citation and proposed interop test plan before implementation.
- Architectural changes: Requires design discussion in an issue or discussion thread. No surprise features.
- All PRs must pass CI, including interop tests, and must not violate any design constraint.
- Vulnerabilities are reported via email (address TBD) or GitHub security advisories.
- Critical vulnerabilities (remote crash, session hijack) are patched and released within 72 hours of confirmation.
- The wire decoder is the primary attack surface and runs under continuous fuzzing.
rustbgpd is:
- API-first BGP control plane — gRPC is the primary interface, not CLI
- Correctness and observability focused — tested against real peers, observable by default
- Rust-native, GoBGP-shaped — familiar operating model, memory-safe implementation
- Not a kitchen sink routing suite — does one thing well
The 8 non-negotiable constraints are defined in ARCHITECTURE.md — Design Invariants. They cover: pure FSM, independent wire crate, bounded channels, no silent drops, no panics on malformed input, structured protocol violation events, enforced resource limits, and interop-tested features.