Skip to content

Commit f48e837

Browse files
author
Michał Fąferek
committed
feat(mosaico_integration): Mosaico ingestion demo, single + fleet variants
Docker Compose demo showing medkit fault snapshots flowing into mosaicod as queryable, structured forensic data. Runs on any laptop, no robot hardware, no 24/7 recording. Two variants share the same bridge/mosaicod components: Single-robot (docker-compose.yml): postgres + mosaicod + one sensor-demo + one bridge. Proves the pipeline: SSE fault event -> bag download via REST -> Arrow Flight ingest via mosaicolabs SDK -> listable sequence in mosaicod. Fleet (docker-compose.fleet.yml): three sensor-demos (warehouse-A, warehouse-B, outdoor-yard) each with its own bridge, sharing one mosaicod + postgres. trigger-fleet-faults.sh injects three heterogeneous fault signatures: robot-01 warehouse-A LIDAR noise_stddev=0.5 (range std spikes) robot-02 warehouse-B IMU failure (different sensor) robot-03 outdoor-yard LIDAR drift_rate=0.5 (range mean shifts) Each bridge tags ingested sequences with distinct robot_id metadata, so cross-robot .Q queries have something real to filter on. Ring buffer in medkit_params.yaml is sized to 15 s pre + 10 s post so every snapshot carries enough pre-fault baseline for drift-vs-noise comparison. max_bag_size_mb raised to 2000 so rosbag2 does not split the recording mid-flight (the gateway only serves the first split). The bridge honors POST_FAULT_WAIT_SEC (default 12 s) before downloading so the longer duration_after_sec finishes first. License-safe by construction: mosaicod runs as the unmodified upstream Docker image (ghcr.io/mosaico-labs/mosaicod:v0.3.0), the bridge talks Apache Arrow Flight (a public Apache standard) via the mosaicolabs Python SDK in a separate process. No linking, no modification of mosaicod or its Rust crates. LaserScanAdapter is pinned to Mosaico PR #368 commit 8e090cd until it lands in a Mosaico release. Verified end-to-end: 3 fault injections in the fleet variant produce 3 sequences in mosaicod, all listable via MosaicoClient.list_sequences() from an external Python process and tagged with the expected metadata. Cold start to last verified ingest ~75 s.
1 parent 142b243 commit f48e837

11 files changed

Lines changed: 1424 additions & 0 deletions

File tree

Lines changed: 176 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,176 @@
1+
# Mosaico M0 integration demo
2+
3+
End-to-end pipeline demonstrating that **medkit fault snapshots flow into Mosaico as queryable, structured forensic data** - with no robot hardware and no recording 24/7.
4+
5+
> Robot stack runs in Docker. Inject high noise on the simulated LiDAR. The medkit gateway detects it via the standard `/diagnostics` ROS topic, confirms the fault, and flushes its 15-second pre-fault + 10-second post-fault ring buffer to an `.mcap` file. A small Python bridge listens on the gateway's `/faults/stream` SSE endpoint, downloads the bag via REST, and ingests it into mosaicod via Apache Arrow Flight using Mosaico's own Python SDK. From `docker compose up` to a queryable `Sequence` in mosaicod takes about a minute.
6+
7+
There are two variants of the stack:
8+
9+
- **Single-robot** (`docker-compose.yml`): one sensor-demo + one bridge. Good for stepping through the pipeline the first time and for the notebook.
10+
- **Fleet** (`docker-compose.fleet.yml`): three sensor-demos (warehouse-A, warehouse-B, outdoor-yard) each with its own bridge, all ingesting into one shared mosaicod. Produces three heterogeneous fault snapshots (LiDAR noise, IMU failure, LiDAR drift) under distinct `robot_id` metadata so cross-robot queries actually have something to filter on.
11+
12+
## Quick start
13+
14+
```bash
15+
git clone https://github.com/selfpatch/selfpatch_demos.git
16+
cd selfpatch_demos/demos/mosaico_integration
17+
18+
# Bring up postgres + mosaicod + sensor-demo + bridge
19+
docker compose up -d
20+
21+
# Wait until everything is healthy (sensor-demo healthcheck takes ~30s on first run)
22+
docker compose ps
23+
24+
# Inject the LiDAR HIGH_NOISE fault. The bridge picks up the SSE event,
25+
# downloads the bag and ingests it into mosaicod within ~5s.
26+
./scripts/trigger-fault.sh
27+
28+
# Watch the bridge do its thing
29+
docker compose logs -f bridge
30+
31+
# Open the notebook (locally or via VS Code)
32+
jupyter notebook notebooks/mosaico_demo.ipynb
33+
```
34+
35+
The notebook connects to `localhost:16726` (mosaicod Arrow Flight) and runs four queries against your freshly-ingested fault snapshot, ending with a three-panel time-series plot showing the LiDAR noise spike alongside a stationary IMU - exactly the kind of cross-topic forensic correlation Mosaico is designed for.
36+
37+
### Fleet variant
38+
39+
```bash
40+
# Three robots + three bridges + shared mosaicod. Ports 18081/2/3 for the
41+
# robots, 16726 for mosaicod (same as single-robot).
42+
docker compose -f docker-compose.fleet.yml up -d
43+
44+
# Wait ~25s after all containers are healthy so the ring buffer has a
45+
# pre-fault baseline, then inject three different fault signatures:
46+
# robot-01 warehouse-A : LiDAR noise_stddev = 0.5 (range std spike)
47+
# robot-02 warehouse-B : IMU failure
48+
# robot-03 outdoor-yard: LiDAR drift_rate = 0.5 (range mean shift)
49+
./scripts/trigger-fleet-faults.sh
50+
51+
# Each bridge independently ingests its robot's snapshot. After ~45s you
52+
# will have three Sequences in mosaicod with distinct robot_id metadata.
53+
docker compose -f docker-compose.fleet.yml logs -f bridge-01 bridge-02 bridge-03
54+
55+
docker compose -f docker-compose.fleet.yml down -v
56+
```
57+
58+
## Architecture
59+
60+
```
61+
docker compose stack (network: mosaico_m0_net)
62+
63+
┌─────────────┐ ┌─────────────────────────┐
64+
│ postgres │◄────────┤ mosaicod │
65+
│ :5432 │ │ ghcr.io/mosaico-labs/ │
66+
│ (internal) │ │ mosaicod:v0.3.0 │
67+
└─────────────┘ │ --local-store /data │
68+
│ port 6726 ──> host:16726
69+
└─────────────────────────┘
70+
71+
│ Arrow Flight
72+
│ (RosbagInjector via mosaicolabs SDK)
73+
74+
┌─────────────────────────┐ ┌────────┴────────┐
75+
│ sensor-demo │ │ bridge │
76+
│ built from sibling │◄─►│ python:3.11 │
77+
│ ../sensor_diagnostics/ │ │ + mosaicolabs │
78+
│ │ │ pinned 8e090cd
79+
│ - ros2_medkit gateway │ │ + httpx │
80+
│ - lidar/imu/gps/camera │ │ │
81+
│ sim nodes │ │ Subscribes: │
82+
│ - diagnostic_bridge │ │ GET /api/v1/ │
83+
│ - fault_manager │ │ faults/ │
84+
│ - rosbag ring buffer │ │ stream │
85+
│ (10s pre + 2s post) │ │ (SSE) │
86+
│ │ │ │
87+
│ port 8080 ──> host:18080│ │ Downloads: │
88+
└─────────────────────────┘ │ GET /api/v1/ │
89+
│ apps/ │
90+
│ diagnostic- │
91+
│ bridge/ │
92+
│ bulk-data/ │
93+
│ rosbags/... │
94+
└─────────────────┘
95+
```
96+
97+
**License-safe**: mosaicod runs as the unmodified upstream Docker image. The bridge is a separate Python process that talks Apache Arrow Flight (a public Apache standard) to mosaicod via Mosaico's own Python SDK. We never link or modify mosaicod or its Rust crates.
98+
99+
## What lands in Mosaico (verified end to end on this stack)
100+
101+
| ROS topic | ROS message type | Mosaico ontology | Status |
102+
|------------------------|----------------------------------------|---------------------------------|-------------------------------------|
103+
| `/sensors/scan` | `sensor_msgs/msg/LaserScan` | `LaserScan` (`futures.laser`) | ✅ via [Mosaico PR #368][pr368] |
104+
| `/sensors/imu` | `sensor_msgs/msg/Imu` | `IMU` | ✅ shipped adapter |
105+
| `/sensors/fix` | `sensor_msgs/msg/NavSatFix` | `GPS` | ✅ shipped adapter |
106+
| `/sensors/image_raw` | `sensor_msgs/msg/Image` | `Image` | ✅ shipped adapter |
107+
| `/diagnostics` | `diagnostic_msgs/msg/DiagnosticArray` | (none) | ⚠️ silently dropped, no adapter yet |
108+
109+
The `/diagnostics` drop is the only gap. We use it on the medkit side to flag the fault (via diagnostic_bridge → fault_manager), but it does not reach Mosaico storage. For M1 we would either ship a `DiagnosticArray` adapter or define a dedicated `MedkitFault` ontology and write its adapter.
110+
111+
## Mosaico SDK pin: PR #368 (LaserScanAdapter)
112+
113+
The `LaserScanAdapter` we need lives in [mosaico-labs/mosaico#368][pr368], still in `draft` at the time we built this stack. The bridge `Dockerfile` pins to commit `8e090cd` and installs the SDK in editable mode from a local clone. Once PR #368 lands in a Mosaico release we will swap the pin for a clean `pip install mosaicolabs==<version>`.
114+
115+
The same draft works on the **read** side too, as long as the consumer imports `mosaicolabs.models.futures.laser` to register the `laser_scan` ontology in the global registry. The notebook does this in its first cell.
116+
117+
[pr368]: https://github.com/mosaico-labs/mosaico/pull/368
118+
119+
## Smart snapshots, not 24/7 recording
120+
121+
This is the value prop: each entry in the Mosaico catalog is **only the 25 seconds around a confirmed fault** (15 s pre-fault baseline + 10 s post-fault), not hours of "nothing is happening" telemetry. Three fleet snapshots together weigh roughly 1.5 GB of MCAP - a naive 24/7 recording of the same four sensors at the same rates would be closer to 120 GB per robot per day. The pipeline preserves the signal and discards the noise.
122+
123+
## Files in this directory
124+
125+
| Path | What |
126+
|---|---|
127+
| `docker-compose.yml` | Single-robot stack: postgres + mosaicod + sensor-demo + bridge |
128+
| `docker-compose.fleet.yml` | Fleet stack: postgres + mosaicod + 3×(sensor-demo + bridge) |
129+
| `bridge/bridge.py` | Subscribes SSE, downloads bag, calls `RosbagInjector`. Honors `POST_FAULT_WAIT_SEC` (default 12s) before download so the post-fault ring segment is finalized |
130+
| `bridge/Dockerfile` | python:3.11-slim + Mosaico SDK pinned to PR #368 commit `8e090cd` |
131+
| `bridge/requirements.txt` | `httpx>=0.27,<0.30` |
132+
| `medkit_overrides/medkit_params.yaml` | Sensor-demo medkit config: 15s pre + 10s post ring buffer, single 2 GB bag cap, `auto_cleanup: false` |
133+
| `notebooks/mosaico_demo.ipynb` | Connect, list, query, plot - 7 cells |
134+
| `scripts/trigger-fault.sh` | Single-robot: inject high noise on lidar-sim on `localhost:18080` |
135+
| `scripts/trigger-fleet-faults.sh` | Fleet: inject three different fault signatures on robots 01/02/03 |
136+
| `docs/lidar_noise_plot.png` | The output of the notebook's money shot plot |
137+
138+
## Verified end-to-end on this machine
139+
140+
| What | Status |
141+
|---|---|
142+
| `docker compose build bridge` (PR #368 sanity import passes) ||
143+
| `docker compose up -d` brings four containers healthy ||
144+
| medkit gateway responds at `localhost:18080/api/v1/health` ||
145+
| `./scripts/trigger-fault.sh` injects fault, gateway returns CONFIRMED ||
146+
| Bridge SSE connects, picks up `fault_confirmed` event ||
147+
| Bridge resolves entity `apps/diagnostic-bridge` and downloads ~500 MB MCAP (25 s of four sensors) ||
148+
| `RosbagInjector` finalizes 4 TopicWriters (`/sensors/{scan,imu,fix,image_raw}`) ||
149+
| `MosaicoClient.list_sequences()` shows the new sequence within ~25 s of fault confirmation ||
150+
| Notebook reads back `LaserScan` data with `range_min`, `range_max`, `ranges`, `intensities`, `frame_id` populated ||
151+
| `IMU.Q.acceleration.z.between(...)` filter returns sequences ||
152+
153+
## Known surprises (we hit them so you don't have to)
154+
155+
1. **Medkit gateway path prefix**: SSE lives at `GET /api/v1/faults/stream`, not `GET /faults/stream`. Same for `/api/v1/components/...`. The bridge bakes the prefix into every URL.
156+
2. **`reporting_sources` ≠ SOVD entity ID**: medkit reports the ROS publisher node name (`/bridge/diagnostic_bridge`), not the SOVD entity that owns the bag. The bridge enumerates apps + components via the gateway and HEAD-probes for `bulk-data/rosbags/{fault_code}` until one returns 200.
157+
3. **Faults from the legacy diagnostic path land under `apps/diagnostic-bridge`**, not `apps/lidar-sim`. The diagnostic_bridge node is what owns the snapshot bag in this demo.
158+
4. **Mosaico read-side registry**: even with PR #368 installed, you must `import mosaicolabs.models.futures.laser` before reading `LaserScan` data. Otherwise the topic reader raises `No ontology registered with tag 'laser_scan'`. The bridge does not need this (write side resolves adapters by ROS msg type) but the notebook does.
159+
5. **Not all 5 listed topics actually land in Mosaico**: `/diagnostics` drops silently because no adapter is registered. The medkit ring buffer captures it; Mosaico just does not know what to do with it. See the table above.
160+
6. **Initial post-fault wait**: medkit holds the rosbag2 writer open for `duration_after_sec` (10s in this config) after `fault_confirmed`. The bridge waits `POST_FAULT_WAIT_SEC` seconds (default 12) before downloading so the trailing ring segment is finalized.
161+
7. **Gateway port conflict on dev boxes**: the single-robot stack publishes on `18080` and `16726`; the fleet stack uses `18081/18082/18083` for the three gateways with `16726` shared by mosaicod. Adjust if you prefer defaults.
162+
8. **`rosbag2` file splitting**: if `max_bag_size_mb` is hit mid-recording, `rosbag2` splits into `_0.mcap`, `_1.mcap`, ... and the medkit gateway serves only the first split. The 2 GB cap in `medkit_params.yaml` is there to prevent splitting for any realistic 25 s snapshot.
163+
164+
## Troubleshooting
165+
166+
- `docker compose build bridge` fails on the import sanity check → PR #368 has been force-pushed or branch was deleted. Update the `MOSAICO_PIN` build arg in `bridge/Dockerfile` to a current commit on `issue/367`.
167+
- `./scripts/trigger-fault.sh` returns curl error 22 → the gateway is up but needs `{"execution_type": "now"}` in the POST body. The script already does that; verify your gateway is actually `localhost:18080`.
168+
- Bridge logs `Could not resolve entity for fault_code=X` → enumerate `/api/v1/apps` and `/api/v1/components` manually with `curl` and check whether any of them list the fault under `bulk-data/rosbags`. If none do, the gateway has not registered the bag yet (post-fault timer hasn't fired). Wait a few seconds and re-trigger.
169+
- Notebook raises `No ontology registered with tag 'laser_scan'` → the `import mosaicolabs.models.futures.laser` cell did not run. Re-run it.
170+
- `docker compose pull mosaicod` is slow on first run → the upstream image is ~110 MB, distroless. Expect 30-90s on a slow link.
171+
172+
## Cleanup
173+
174+
```bash
175+
docker compose down -v # removes containers + named volumes
176+
```
Lines changed: 48 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,48 @@
1+
# Bridge container for Mosaico M0 demo.
2+
#
3+
# Subscribes to medkit /faults/stream SSE, downloads each fault snapshot
4+
# bag from the gateway REST API, and ingests it into mosaicod via the
5+
# mosaicolabs Python SDK over Apache Arrow Flight.
6+
#
7+
# License-safe pattern: bridge is a separate Python process that talks
8+
# the public Apache Arrow Flight protocol to an unmodified mosaicod
9+
# Docker image. We do not link or modify mosaicod or its Rust crates.
10+
#
11+
# We pin to PR #368 commit 8e090cd because it adds the LaserScanAdapter
12+
# we need (sensor_msgs/msg/LaserScan -> Mosaico futures.LaserScan
13+
# ontology). Once PR #368 lands in a Mosaico release, this will be
14+
# replaced with a plain `pip install mosaicolabs==<version>`.
15+
16+
FROM python:3.11-slim
17+
18+
ENV PYTHONUNBUFFERED=1
19+
ENV PIP_NO_CACHE_DIR=1
20+
ENV PIP_DISABLE_PIP_VERSION_CHECK=1
21+
22+
# git is needed for `pip install -e <repo>` from a local checkout.
23+
RUN apt-get update \
24+
&& apt-get install -y --no-install-recommends git ca-certificates \
25+
&& rm -rf /var/lib/apt/lists/*
26+
27+
# Pin Mosaico SDK to PR #368 (LaserScanAdapter draft) commit 8e090cd.
28+
ARG MOSAICO_REPO=https://github.com/mosaico-labs/mosaico.git
29+
ARG MOSAICO_PIN=8e090cd
30+
RUN git clone "${MOSAICO_REPO}" /opt/mosaico \
31+
&& cd /opt/mosaico \
32+
&& git checkout "${MOSAICO_PIN}" \
33+
&& git rev-parse HEAD > /opt/mosaico/.pinned_sha
34+
35+
# Install the SDK in editable mode plus our own deps. Mosaico's SDK is a
36+
# Poetry project; pip can install it directly via the pyproject.toml.
37+
COPY requirements.txt /tmp/requirements.txt
38+
RUN pip install -r /tmp/requirements.txt \
39+
&& pip install /opt/mosaico/mosaico-sdk-py
40+
41+
# Sanity check at build time: import the SDK and the LaserScan ontology
42+
# so we fail fast if PR #368 ever drifts.
43+
RUN python -c "from mosaicolabs import MosaicoClient; from mosaicolabs.ros_bridge import RosbagInjector, ROSInjectionConfig; from mosaicolabs.models.futures.laser import LaserScan; print('mosaicolabs SDK + LaserScan ontology import OK')"
44+
45+
WORKDIR /app
46+
COPY bridge.py /app/bridge.py
47+
48+
CMD ["python", "-u", "/app/bridge.py"]

0 commit comments

Comments
 (0)