|
| 1 | +# Mosaico M0 integration demo |
| 2 | + |
| 3 | +End-to-end pipeline demonstrating that **medkit fault snapshots flow into Mosaico as queryable, structured forensic data** - with no robot hardware and no recording 24/7. |
| 4 | + |
| 5 | +> Robot stack runs in Docker. Inject high noise on the simulated LiDAR. The medkit gateway detects it via the standard `/diagnostics` ROS topic, confirms the fault, and flushes its 15-second pre-fault + 10-second post-fault ring buffer to an `.mcap` file. A small Python bridge listens on the gateway's `/faults/stream` SSE endpoint, downloads the bag via REST, and ingests it into mosaicod via Apache Arrow Flight using Mosaico's own Python SDK. From `docker compose up` to a queryable `Sequence` in mosaicod takes about a minute. |
| 6 | +
|
| 7 | +There are two variants of the stack: |
| 8 | + |
| 9 | +- **Single-robot** (`docker-compose.yml`): one sensor-demo + one bridge. Good for stepping through the pipeline the first time and for the notebook. |
| 10 | +- **Fleet** (`docker-compose.fleet.yml`): three sensor-demos (warehouse-A, warehouse-B, outdoor-yard) each with its own bridge, all ingesting into one shared mosaicod. Produces three heterogeneous fault snapshots (LiDAR noise, IMU failure, LiDAR drift) under distinct `robot_id` metadata so cross-robot queries actually have something to filter on. |
| 11 | + |
| 12 | +## Quick start |
| 13 | + |
| 14 | +```bash |
| 15 | +git clone https://github.com/selfpatch/selfpatch_demos.git |
| 16 | +cd selfpatch_demos/demos/mosaico_integration |
| 17 | + |
| 18 | +# Bring up postgres + mosaicod + sensor-demo + bridge |
| 19 | +docker compose up -d |
| 20 | + |
| 21 | +# Wait until everything is healthy (sensor-demo healthcheck takes ~30s on first run) |
| 22 | +docker compose ps |
| 23 | + |
| 24 | +# Inject the LiDAR HIGH_NOISE fault. The bridge picks up the SSE event, |
| 25 | +# downloads the bag and ingests it into mosaicod within ~5s. |
| 26 | +./scripts/trigger-fault.sh |
| 27 | + |
| 28 | +# Watch the bridge do its thing |
| 29 | +docker compose logs -f bridge |
| 30 | + |
| 31 | +# Open the notebook (locally or via VS Code) |
| 32 | +jupyter notebook notebooks/mosaico_demo.ipynb |
| 33 | +``` |
| 34 | + |
| 35 | +The notebook connects to `localhost:16726` (mosaicod Arrow Flight) and runs four queries against your freshly-ingested fault snapshot, ending with a three-panel time-series plot showing the LiDAR noise spike alongside a stationary IMU - exactly the kind of cross-topic forensic correlation Mosaico is designed for. |
| 36 | + |
| 37 | +### Fleet variant |
| 38 | + |
| 39 | +```bash |
| 40 | +# Three robots + three bridges + shared mosaicod. Ports 18081/2/3 for the |
| 41 | +# robots, 16726 for mosaicod (same as single-robot). |
| 42 | +docker compose -f docker-compose.fleet.yml up -d |
| 43 | + |
| 44 | +# Wait ~25s after all containers are healthy so the ring buffer has a |
| 45 | +# pre-fault baseline, then inject three different fault signatures: |
| 46 | +# robot-01 warehouse-A : LiDAR noise_stddev = 0.5 (range std spike) |
| 47 | +# robot-02 warehouse-B : IMU failure |
| 48 | +# robot-03 outdoor-yard: LiDAR drift_rate = 0.5 (range mean shift) |
| 49 | +./scripts/trigger-fleet-faults.sh |
| 50 | + |
| 51 | +# Each bridge independently ingests its robot's snapshot. After ~45s you |
| 52 | +# will have three Sequences in mosaicod with distinct robot_id metadata. |
| 53 | +docker compose -f docker-compose.fleet.yml logs -f bridge-01 bridge-02 bridge-03 |
| 54 | + |
| 55 | +docker compose -f docker-compose.fleet.yml down -v |
| 56 | +``` |
| 57 | + |
| 58 | +## Architecture |
| 59 | + |
| 60 | +``` |
| 61 | +docker compose stack (network: mosaico_m0_net) |
| 62 | +
|
| 63 | + ┌─────────────┐ ┌─────────────────────────┐ |
| 64 | + │ postgres │◄────────┤ mosaicod │ |
| 65 | + │ :5432 │ │ ghcr.io/mosaico-labs/ │ |
| 66 | + │ (internal) │ │ mosaicod:v0.3.0 │ |
| 67 | + └─────────────┘ │ --local-store /data │ |
| 68 | + │ port 6726 ──> host:16726 |
| 69 | + └─────────────────────────┘ |
| 70 | + ▲ |
| 71 | + │ Arrow Flight |
| 72 | + │ (RosbagInjector via mosaicolabs SDK) |
| 73 | + │ |
| 74 | + ┌─────────────────────────┐ ┌────────┴────────┐ |
| 75 | + │ sensor-demo │ │ bridge │ |
| 76 | + │ built from sibling │◄─►│ python:3.11 │ |
| 77 | + │ ../sensor_diagnostics/ │ │ + mosaicolabs │ |
| 78 | + │ │ │ pinned 8e090cd |
| 79 | + │ - ros2_medkit gateway │ │ + httpx │ |
| 80 | + │ - lidar/imu/gps/camera │ │ │ |
| 81 | + │ sim nodes │ │ Subscribes: │ |
| 82 | + │ - diagnostic_bridge │ │ GET /api/v1/ │ |
| 83 | + │ - fault_manager │ │ faults/ │ |
| 84 | + │ - rosbag ring buffer │ │ stream │ |
| 85 | + │ (10s pre + 2s post) │ │ (SSE) │ |
| 86 | + │ │ │ │ |
| 87 | + │ port 8080 ──> host:18080│ │ Downloads: │ |
| 88 | + └─────────────────────────┘ │ GET /api/v1/ │ |
| 89 | + │ apps/ │ |
| 90 | + │ diagnostic- │ |
| 91 | + │ bridge/ │ |
| 92 | + │ bulk-data/ │ |
| 93 | + │ rosbags/... │ |
| 94 | + └─────────────────┘ |
| 95 | +``` |
| 96 | + |
| 97 | +**License-safe**: mosaicod runs as the unmodified upstream Docker image. The bridge is a separate Python process that talks Apache Arrow Flight (a public Apache standard) to mosaicod via Mosaico's own Python SDK. We never link or modify mosaicod or its Rust crates. |
| 98 | + |
| 99 | +## What lands in Mosaico (verified end to end on this stack) |
| 100 | + |
| 101 | +| ROS topic | ROS message type | Mosaico ontology | Status | |
| 102 | +|------------------------|----------------------------------------|---------------------------------|-------------------------------------| |
| 103 | +| `/sensors/scan` | `sensor_msgs/msg/LaserScan` | `LaserScan` (`futures.laser`) | ✅ via [Mosaico PR #368][pr368] | |
| 104 | +| `/sensors/imu` | `sensor_msgs/msg/Imu` | `IMU` | ✅ shipped adapter | |
| 105 | +| `/sensors/fix` | `sensor_msgs/msg/NavSatFix` | `GPS` | ✅ shipped adapter | |
| 106 | +| `/sensors/image_raw` | `sensor_msgs/msg/Image` | `Image` | ✅ shipped adapter | |
| 107 | +| `/diagnostics` | `diagnostic_msgs/msg/DiagnosticArray` | (none) | ⚠️ silently dropped, no adapter yet | |
| 108 | + |
| 109 | +The `/diagnostics` drop is the only gap. We use it on the medkit side to flag the fault (via diagnostic_bridge → fault_manager), but it does not reach Mosaico storage. For M1 we would either ship a `DiagnosticArray` adapter or define a dedicated `MedkitFault` ontology and write its adapter. |
| 110 | + |
| 111 | +## Mosaico SDK pin: PR #368 (LaserScanAdapter) |
| 112 | + |
| 113 | +The `LaserScanAdapter` we need lives in [mosaico-labs/mosaico#368][pr368], still in `draft` at the time we built this stack. The bridge `Dockerfile` pins to commit `8e090cd` and installs the SDK in editable mode from a local clone. Once PR #368 lands in a Mosaico release we will swap the pin for a clean `pip install mosaicolabs==<version>`. |
| 114 | + |
| 115 | +The same draft works on the **read** side too, as long as the consumer imports `mosaicolabs.models.futures.laser` to register the `laser_scan` ontology in the global registry. The notebook does this in its first cell. |
| 116 | + |
| 117 | +[pr368]: https://github.com/mosaico-labs/mosaico/pull/368 |
| 118 | + |
| 119 | +## Smart snapshots, not 24/7 recording |
| 120 | + |
| 121 | +This is the value prop: each entry in the Mosaico catalog is **only the 25 seconds around a confirmed fault** (15 s pre-fault baseline + 10 s post-fault), not hours of "nothing is happening" telemetry. Three fleet snapshots together weigh roughly 1.5 GB of MCAP - a naive 24/7 recording of the same four sensors at the same rates would be closer to 120 GB per robot per day. The pipeline preserves the signal and discards the noise. |
| 122 | + |
| 123 | +## Files in this directory |
| 124 | + |
| 125 | +| Path | What | |
| 126 | +|---|---| |
| 127 | +| `docker-compose.yml` | Single-robot stack: postgres + mosaicod + sensor-demo + bridge | |
| 128 | +| `docker-compose.fleet.yml` | Fleet stack: postgres + mosaicod + 3×(sensor-demo + bridge) | |
| 129 | +| `bridge/bridge.py` | Subscribes SSE, downloads bag, calls `RosbagInjector`. Honors `POST_FAULT_WAIT_SEC` (default 12s) before download so the post-fault ring segment is finalized | |
| 130 | +| `bridge/Dockerfile` | python:3.11-slim + Mosaico SDK pinned to PR #368 commit `8e090cd` | |
| 131 | +| `bridge/requirements.txt` | `httpx>=0.27,<0.30` | |
| 132 | +| `medkit_overrides/medkit_params.yaml` | Sensor-demo medkit config: 15s pre + 10s post ring buffer, single 2 GB bag cap, `auto_cleanup: false` | |
| 133 | +| `notebooks/mosaico_demo.ipynb` | Connect, list, query, plot - 7 cells | |
| 134 | +| `scripts/trigger-fault.sh` | Single-robot: inject high noise on lidar-sim on `localhost:18080` | |
| 135 | +| `scripts/trigger-fleet-faults.sh` | Fleet: inject three different fault signatures on robots 01/02/03 | |
| 136 | +| `docs/lidar_noise_plot.png` | The output of the notebook's money shot plot | |
| 137 | + |
| 138 | +## Verified end-to-end on this machine |
| 139 | + |
| 140 | +| What | Status | |
| 141 | +|---|---| |
| 142 | +| `docker compose build bridge` (PR #368 sanity import passes) | ✅ | |
| 143 | +| `docker compose up -d` brings four containers healthy | ✅ | |
| 144 | +| medkit gateway responds at `localhost:18080/api/v1/health` | ✅ | |
| 145 | +| `./scripts/trigger-fault.sh` injects fault, gateway returns CONFIRMED | ✅ | |
| 146 | +| Bridge SSE connects, picks up `fault_confirmed` event | ✅ | |
| 147 | +| Bridge resolves entity `apps/diagnostic-bridge` and downloads ~500 MB MCAP (25 s of four sensors) | ✅ | |
| 148 | +| `RosbagInjector` finalizes 4 TopicWriters (`/sensors/{scan,imu,fix,image_raw}`) | ✅ | |
| 149 | +| `MosaicoClient.list_sequences()` shows the new sequence within ~25 s of fault confirmation | ✅ | |
| 150 | +| Notebook reads back `LaserScan` data with `range_min`, `range_max`, `ranges`, `intensities`, `frame_id` populated | ✅ | |
| 151 | +| `IMU.Q.acceleration.z.between(...)` filter returns sequences | ✅ | |
| 152 | + |
| 153 | +## Known surprises (we hit them so you don't have to) |
| 154 | + |
| 155 | +1. **Medkit gateway path prefix**: SSE lives at `GET /api/v1/faults/stream`, not `GET /faults/stream`. Same for `/api/v1/components/...`. The bridge bakes the prefix into every URL. |
| 156 | +2. **`reporting_sources` ≠ SOVD entity ID**: medkit reports the ROS publisher node name (`/bridge/diagnostic_bridge`), not the SOVD entity that owns the bag. The bridge enumerates apps + components via the gateway and HEAD-probes for `bulk-data/rosbags/{fault_code}` until one returns 200. |
| 157 | +3. **Faults from the legacy diagnostic path land under `apps/diagnostic-bridge`**, not `apps/lidar-sim`. The diagnostic_bridge node is what owns the snapshot bag in this demo. |
| 158 | +4. **Mosaico read-side registry**: even with PR #368 installed, you must `import mosaicolabs.models.futures.laser` before reading `LaserScan` data. Otherwise the topic reader raises `No ontology registered with tag 'laser_scan'`. The bridge does not need this (write side resolves adapters by ROS msg type) but the notebook does. |
| 159 | +5. **Not all 5 listed topics actually land in Mosaico**: `/diagnostics` drops silently because no adapter is registered. The medkit ring buffer captures it; Mosaico just does not know what to do with it. See the table above. |
| 160 | +6. **Initial post-fault wait**: medkit holds the rosbag2 writer open for `duration_after_sec` (10s in this config) after `fault_confirmed`. The bridge waits `POST_FAULT_WAIT_SEC` seconds (default 12) before downloading so the trailing ring segment is finalized. |
| 161 | +7. **Gateway port conflict on dev boxes**: the single-robot stack publishes on `18080` and `16726`; the fleet stack uses `18081/18082/18083` for the three gateways with `16726` shared by mosaicod. Adjust if you prefer defaults. |
| 162 | +8. **`rosbag2` file splitting**: if `max_bag_size_mb` is hit mid-recording, `rosbag2` splits into `_0.mcap`, `_1.mcap`, ... and the medkit gateway serves only the first split. The 2 GB cap in `medkit_params.yaml` is there to prevent splitting for any realistic 25 s snapshot. |
| 163 | + |
| 164 | +## Troubleshooting |
| 165 | + |
| 166 | +- `docker compose build bridge` fails on the import sanity check → PR #368 has been force-pushed or branch was deleted. Update the `MOSAICO_PIN` build arg in `bridge/Dockerfile` to a current commit on `issue/367`. |
| 167 | +- `./scripts/trigger-fault.sh` returns curl error 22 → the gateway is up but needs `{"execution_type": "now"}` in the POST body. The script already does that; verify your gateway is actually `localhost:18080`. |
| 168 | +- Bridge logs `Could not resolve entity for fault_code=X` → enumerate `/api/v1/apps` and `/api/v1/components` manually with `curl` and check whether any of them list the fault under `bulk-data/rosbags`. If none do, the gateway has not registered the bag yet (post-fault timer hasn't fired). Wait a few seconds and re-trigger. |
| 169 | +- Notebook raises `No ontology registered with tag 'laser_scan'` → the `import mosaicolabs.models.futures.laser` cell did not run. Re-run it. |
| 170 | +- `docker compose pull mosaicod` is slow on first run → the upstream image is ~110 MB, distroless. Expect 30-90s on a slow link. |
| 171 | + |
| 172 | +## Cleanup |
| 173 | + |
| 174 | +```bash |
| 175 | +docker compose down -v # removes containers + named volumes |
| 176 | +``` |
0 commit comments