# Architecture Overview

## Vision
Offline-First Multi-Agent Autonomy SDK enables a group of robots to operate collaboratively without a central server, using local state machines, conflict‑free replicated data types (CRDTs) for state synchronization, opportunistic mesh networking, and bounded consensus.

## Core Principles
1. **Offline‑first** – Every agent remains fully functional when disconnected.
2. **Decentralized** – No single point of failure; peer‑to‑peer communication.
3. **Eventually consistent** – State converges across the swarm via CRDTs.
4. **Resource‑aware** – Agents monitor and adapt to local constraints (CPU, battery, bandwidth).
5. **Pluggable transport** – Support for various network layers (Wi‑Fi, Bluetooth, LoRa, etc.).

## High‑Level Architecture

```mermaid
graph TB
    subgraph "Agent Node"
        AC[Agent Core]
        LP[Local Planner]
        SS["State Sync (CRDT)"]
        MT[Mesh Transport]
        RM[Resource Monitor]

        AC --> LP
        AC --> SS
        AC --> MT
        AC --> RM
        SS --> MT
        MT --> SS
    end

    subgraph "Swarm"
        N1[Agent 1]
        N2[Agent 2]
        N3[Agent 3]
        N1 -- Mesh Network --> N2
        N2 -- Mesh Network --> N3
        N3 -- Mesh Network --> N1
    end

    subgraph "External"
        SIM["Simulation (Gazebo / ROS2)"]
        CLI[Python CLI]
        SIM --> AC
        CLI --> AC
    end
```

## Component Responsibilities

### 1. Mesh Transport
- **Purpose**: Reliable, unordered, peer‑to‑peer message passing over ad‑hoc networks.
- **Features**:
  - Discovery (mDNS, manual peer list)
  - Connection management (TCP, WebRTC, QUIC)
  - Message routing (flooding, greedy perimeter stateless routing)
  - Quality‑of‑Service (priority, retransmission)
- **Technology**: Rust crate built on `libp2p` or `smol-net`.
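
Flooding, the simplest routing strategy listed above, is usually paired with duplicate suppression and a hop limit so that a message does not circulate forever. The following Python sketch shows that idea with in-memory neighbor links. `Node`, `Message`, `ttl`, and `link` are hypothetical names for illustration, not the Mesh Transport crate's API; real delivery would be asynchronous over `libp2p` or a similar stack.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    msg_id: str    # hypothetical globally unique ID (e.g. origin peer + sequence number)
    payload: bytes
    ttl: int       # remaining hop budget


@dataclass
class Node:
    name: str
    neighbors: list[Node] = field(default_factory=list)
    seen: set[str] = field(default_factory=set)
    delivered: list[bytes] = field(default_factory=list)

    def receive(self, msg: Message) -> None:
        # Duplicate suppression: ignore any message ID already processed.
        if msg.msg_id in self.seen:
            return
        self.seen.add(msg.msg_id)
        self.delivered.append(msg.payload)
        # Rebroadcast to every neighbor while hop budget remains.
        if msg.ttl > 0:
            forwarded = Message(msg.msg_id, msg.payload, msg.ttl - 1)
            for peer in self.neighbors:
                peer.receive(forwarded)


def link(x: Node, y: Node) -> None:
    """Create a bidirectional in-memory link (stand-in for a real connection)."""
    x.neighbors.append(y)
    y.neighbors.append(x)


# Line topology a - b - c - d; a hop budget of 2 reaches a, b and c only.
a, b, c, d = (Node(n) for n in "abcd")
link(a, b)
link(b, c)
link(c, d)
a.receive(Message("a:1", b"hello", ttl=2))
print([n.name for n in (a, b, c, d) if n.delivered])  # ['a', 'b', 'c']
```

Because each node remembers the message IDs it has handled, cycles in the mesh (such as the three-agent ring in the diagram) do not cause endless rebroadcasts, while the hop limit bounds how far a message travels.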

### 2. State Sync (CRDT)
- **Purpose**: Maintain a shared, eventually‑consistent key‑value store across agents.
- **Features**:
  - CRDT types (`aw-map` add‑wins map, `lseq-tree` sequence)
  - Conflict‑free merge of concurrent updates
  - Tombstone‑free garbage collection
  - Version vectors / dotted version vectors
- **Technology**: Rust crate leveraging `automerge` or a custom CRDT implementation.
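
The tombstone-free, version-vector approach listed above can be illustrated with an add-wins set that tracks a causal context instead of tombstones, in the style of an ORSWOT (observed-remove set without tombstones); an add-wins map applies the same rule per key. This Python sketch assumes full-state merges and uses hypothetical names (`AWSet`, `merge`); it is not the `automerge` API or the SDK's own implementation.

```python
from __future__ import annotations

Dot = tuple[str, int]  # (replica ID, per-replica event counter)


class AWSet:
    """Add-wins observed-remove set that keeps no tombstones."""

    def __init__(self, replica: str) -> None:
        self.replica = replica
        self.entries: dict[str, set[Dot]] = {}  # element -> live dots
        self.context: dict[str, int] = {}       # version vector of observed dots

    def _seen(self, dot: Dot) -> bool:
        replica, counter = dot
        return self.context.get(replica, 0) >= counter

    def add(self, element: str) -> None:
        counter = self.context.get(self.replica, 0) + 1
        self.context[self.replica] = counter
        # The new dot supersedes every dot observed so far for this element.
        self.entries[element] = {(self.replica, counter)}

    def remove(self, element: str) -> None:
        # No tombstone: the removed dots remain covered by self.context.
        self.entries.pop(element, None)

    def merge(self, other: AWSet) -> None:
        merged: dict[str, set[Dot]] = {}
        for element in self.entries.keys() | other.entries.keys():
            mine = self.entries.get(element, set())
            theirs = other.entries.get(element, set())
            # Keep dots held by both sides, plus dots the other side has not
            # seen yet; drop dots the other side saw and then removed.
            dots = (
                (mine & theirs)
                | {d for d in mine - theirs if not other._seen(d)}
                | {d for d in theirs - mine if not self._seen(d)}
            )
            if dots:
                merged[element] = dots
        self.entries = merged
        for replica, counter in other.context.items():
            self.context[replica] = max(self.context.get(replica, 0), counter)

    def __contains__(self, element: str) -> bool:
        return element in self.entries


a, b = AWSet("a"), AWSet("b")
a.add("goal:dock")
b.merge(a)
b.remove("goal:dock")  # b removes the element it observed...
a.add("goal:dock")     # ...while a concurrently re-adds it
a.merge(b)
b.merge(a)
print("goal:dock" in a, "goal:dock" in b)  # True True (add wins)
```

A removed element leaves no tombstone: its dots stay covered by the version vector, which is enough for a peer to distinguish "removed after I saw it" from "added and not yet seen". A concurrent re-add creates a dot the remover has not observed, so the add wins.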

### 3. Local Planner
- **Purpose**: Execute autonomous tasks based on local state and shared swarm intent.
- **Features**:
  - Finite‑state machine (FSM) definition and execution
  - Task scheduling and interruption
  - Integration with the ROS2 navigation stack
- **Technology**: Rust crate with a `behavior-tree` or `smach`‑like DSL.
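
A transition table is one compact way to express the FSM and interruption features above. The Python sketch below is illustrative only: the states, events, and `Planner` class are hypothetical (the `MOVING` state mirrors the `agent/status = moving` example in the data flow), and a behavior-tree or `smach`-style DSL would express the same logic differently.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    MOVING = auto()
    CHARGING = auto()


class Event(Enum):
    GOAL_RECEIVED = auto()
    GOAL_REACHED = auto()
    BATTERY_LOW = auto()
    BATTERY_FULL = auto()


# (current state, event) -> next state; pairs not listed are ignored.
TRANSITIONS: dict[tuple[State, Event], State] = {
    (State.IDLE, Event.GOAL_RECEIVED): State.MOVING,
    (State.MOVING, Event.GOAL_REACHED): State.IDLE,
    # Interruption: a low battery preempts whatever task is active.
    (State.IDLE, Event.BATTERY_LOW): State.CHARGING,
    (State.MOVING, Event.BATTERY_LOW): State.CHARGING,
    (State.CHARGING, Event.BATTERY_FULL): State.IDLE,
}


class Planner:
    def __init__(self) -> None:
        self.state = State.IDLE
        self.history: list[State] = [self.state]

    def handle(self, event: Event) -> State:
        """Apply one event and return the (possibly unchanged) state."""
        next_state = TRANSITIONS.get((self.state, event))
        if next_state is not None:
            self.state = next_state
            self.history.append(next_state)
        return self.state


planner = Planner()
for event in (Event.GOAL_RECEIVED, Event.BATTERY_LOW, Event.GOAL_REACHED, Event.BATTERY_FULL):
    planner.handle(event)
print([s.name for s in planner.history])  # ['IDLE', 'MOVING', 'CHARGING', 'IDLE']
```

In this sketch the interrupted goal is simply dropped, and `GOAL_REACHED` is ignored while charging; a real planner would likely store the goal and resume it after `BATTERY_FULL`.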

### 4. Resource Monitor
- **Purpose**: Observe local hardware constraints and adjust agent behavior.
- **Metrics**: CPU usage, battery level, network latency, memory pressure.
- **Actions**: Throttle planning frequency, reduce communication rate, switch to low‑power mode.
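
The mapping from metrics to actions can be written as a small pure function, which keeps the policy easy to test in isolation. In this Python sketch, the `Metrics` and `Policy` types, the nominal rates, and every threshold are hypothetical placeholders, not values defined by the SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    cpu_percent: float      # 0-100
    battery_percent: float  # 0-100
    latency_ms: float       # round-trip latency to mesh neighbors
    memory_percent: float   # 0-100


@dataclass(frozen=True)
class Policy:
    planning_hz: float      # planner tick rate
    sync_interval_s: float  # delay between CRDT delta broadcasts
    low_power: bool


def choose_policy(m: Metrics) -> Policy:
    # Start from hypothetical nominal settings and degrade them step by step.
    planning_hz, sync_interval_s, low_power = 10.0, 1.0, False
    if m.cpu_percent > 80 or m.memory_percent > 90:
        planning_hz /= 2       # throttle planning frequency
    if m.latency_ms > 500:
        sync_interval_s *= 4   # reduce communication rate on a slow mesh
    if m.battery_percent < 20:
        planning_hz = min(planning_hz, 2.0)
        sync_interval_s = max(sync_interval_s, 10.0)
        low_power = True       # switch to low-power mode
    return Policy(planning_hz, sync_interval_s, low_power)


print(choose_policy(Metrics(cpu_percent=95, battery_percent=15, latency_ms=800, memory_percent=50)))
# Policy(planning_hz=2.0, sync_interval_s=10.0, low_power=True)
```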

### 5. Agent Core
- **Purpose**: Glue component that orchestrates the modules above.
- **Lifecycle**: Initialization, event loop, graceful shutdown.
- **API**: Exposes a unified Rust trait and Python bindings.
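
The three lifecycle phases can be sketched with `asyncio`, in the spirit of the Python binding. `AgentCore`, its methods, and the tick interval are hypothetical; the method bodies only record which phase ran, where a real core would drive the transport, CRDT, planner, and monitor.

```python
from __future__ import annotations

import asyncio


class AgentCore:
    """Hypothetical lifecycle skeleton: initialization, event loop, shutdown."""

    def __init__(self, tick_s: float = 0.01) -> None:
        self.tick_s = tick_s
        self.log: list[str] = []
        self._stop: asyncio.Event | None = None

    async def start(self) -> None:
        # Initialization: e.g. join the mesh, load the local CRDT replica.
        self.log.append("init")

    async def step(self) -> None:
        # One event-loop iteration: e.g. merge deltas, run the planner.
        self.log.append("tick")

    async def shutdown(self) -> None:
        # Graceful shutdown: e.g. flush pending deltas, leave the mesh.
        self.log.append("shutdown")

    def request_stop(self) -> None:
        if self._stop is not None:
            self._stop.set()

    async def run(self, max_ticks: int | None = None) -> None:
        # Create the Event inside the running loop so it binds to that loop.
        self._stop = asyncio.Event()
        await self.start()
        ticks = 0
        try:
            while not self._stop.is_set():
                await self.step()
                ticks += 1
                if max_ticks is not None and ticks >= max_ticks:
                    break
                await asyncio.sleep(self.tick_s)
        finally:
            # Runs on normal exit, stop requests, and errors alike.
            await self.shutdown()


core = AgentCore()
asyncio.run(core.run(max_ticks=3))
print(core.log)  # ['init', 'tick', 'tick', 'tick', 'shutdown']
```

The `try`/`finally` guarantees that the shutdown phase runs even if a step raises, which is the property "graceful shutdown" usually implies.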

## Data Flow
1. The agent starts and joins the mesh network via Mesh Transport.
2. The agent subscribes to shared CRDT keys (e.g., `swarm/goal`).
3. The Local Planner reads its local CRDT replica and decides the next action.
4. Actions may update the CRDT (e.g., `agent/status = moving`).
5. Mesh Transport propagates CRDT deltas to neighbors.
6. The Resource Monitor may throttle outgoing messages when the battery is low.
7. During a network partition, each agent continues from its last known state; replicas merge when connectivity resumes.
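
Steps 2 to 7 can be traced with a minimal last-writer-wins (LWW) map. This is a deliberate simplification: the store described above uses add-wins and other CRDT types, and the transport ships deltas, whereas this Python sketch merges whole states and breaks ties with a Lamport-style logical clock plus the agent ID. `LWWMap` and its methods are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

Stamp = tuple[int, str]  # (logical clock, agent ID); the agent ID breaks ties


@dataclass
class LWWMap:
    agent: str
    clock: int = 0
    data: dict[str, tuple[Stamp, str]] = field(default_factory=dict)

    def set(self, key: str, value: str) -> None:
        self.clock += 1
        self.data[key] = ((self.clock, self.agent), value)

    def get(self, key: str) -> str | None:
        entry = self.data.get(key)
        return entry[1] if entry else None

    def merge(self, other: LWWMap) -> None:
        # Keep, per key, the entry with the larger (clock, agent) stamp.
        for key, (stamp, value) in other.data.items():
            mine = self.data.get(key)
            if mine is None or stamp > mine[0]:
                self.data[key] = (stamp, value)
        # Never fall behind a clock already observed.
        self.clock = max(self.clock, other.clock)


a, b = LWWMap("agent1"), LWWMap("agent2")
a.set("swarm/goal", "survey-north")   # step 2: shared swarm intent
b.merge(a)                            # step 5: state reaches a neighbor
# Step 7: partition; both agents keep acting on their local replica.
a.set("agent1/status", "moving")      # step 4
b.set("agent2/status", "moving")
b.set("swarm/goal", "return-home")
# Connectivity resumes: merging in both directions converges.
a.merge(b)
b.merge(a)
print(a.get("swarm/goal"), a.data == b.data)  # return-home True
```

Because each merge keeps the entry with the larger `(clock, agent)` stamp, the result does not depend on merge order, which is what lets agents continue independently during a partition.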

## Development Roadmap

### Phase 1 – Foundation (Current)
- Mesh Transport (basic peer discovery + messaging)
- State Sync (single‑type CRDT map)
- Integration test with two nodes

### Phase 2 – Autonomy
- Local Planner FSM
- Resource Monitor skeleton
- Python bindings for all components

### Phase 3 – Realism
- ROS2 integration
- Gazebo simulation with multiple robots
- Performance benchmarking

### Phase 4 – Production
- CI/CD, packaging (Debian, PyPI, crates.io)
- Comprehensive documentation
- Security audit

## Technology Stack
- **Language**: Rust (core), Python (bindings & high‑level API)
- **Networking**: `rust-libp2p` or custom `smol-net`
- **CRDT**: `automerge` (Rust crate) or custom implementation
- **Simulation**: ROS2 Humble, Gazebo Classic / Ignition
- **Build**: Cargo workspace, `pyo3`, `maturin`
- **CI**: GitHub Actions, `cargo test`, `pytest`

## Directory Layout
See `README.md` for the exact folder structure.

## Contributing
Please read `CONTRIBUTING.md` (to be created) for guidelines on code style, testing, and pull requests.

---
*Last updated: 2026‑03‑26*