Commit 37e7d8d

WaylandYang and claude committed
paper: replace illustrative latency figure with measured data
Figure 5 was illustrative — hand-typed numbers with a fabricated six-layer breakdown, captioned "Anticipated". This commit replaces it with real measurements.

New tool: tools/bench_latency.py issues N timed bridge.call() round-trips over a chosen transport, reports median / p50 / p90 / p99 / IQR, and records the result into docs/paper/figures/latency_data.json (one keyed entry per transport, merged across runs).

Measured (1000 samples each, set_brightness call):

  loopback (in-process)        median  0.02 ms — protocol only
  UART WROOM-32 (CH340)        median 15.56 ms
  UART ESP32-S3 (native USB)   median 15.60 ms

Finding worth noting: the CH340 USB-UART bridge and the S3's native USB-Serial/JTAG land within 0.05 ms of each other — the round-trip is dominated by serial transit and device-side handling, not by the host bridge or the bridging chip. The protocol's own encode/decode is 0.02 ms, i.e. negligible.

make_figures.py: fig_latency() now reads latency_data.json and plots median bars with IQR whiskers — the figure is derived from a recorded, version-controlled measurement, never hand-typed. The fabricated 6-layer stack is gone (per-layer timing can't be measured without instrumenting the firmware; honest totals are better than fake breakdowns).

main.tex: the §latency paragraph and caption rewritten from "Anticipated / projections" to measured. No direct latency bar against IoT-MCP's 205 ms — that figure has a different measurement scope; a fair head-to-head is left to the limitations campaign.

figures/README.md: synthetic-vs-measured status updated. Fig 5 is now measured; Fig 3 (footprint) and Fig 4 (hallucination) remain synthetic and are documented as such.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent eb41f2d commit 37e7d8d

7 files changed

Lines changed: 371 additions & 59 deletions

docs/paper/figures/README.md

Lines changed: 19 additions & 12 deletions
@@ -26,18 +26,25 @@ README / web preview) into this directory.
 
 ## What's synthetic vs measured
 
-**As of paper v0.1**: every numeric value in figures 3–5 is illustrative.
-The architecture diagram and the wire-format byte counts (Fig. 1, top half
-of Fig. 2) are exact.
-
-Once the hardware campaign is complete:
-
-- Fig. 3 — replace DCP "target" bars with measured values across
-ESP32, ESP32-C3, nRF52840.
-- Fig. 4 — replace the synthetic rejection-rate values with measured
-results from running 1000 LLM-generated adversarial calls against each
-baseline; will need to define the attack-generation procedure.
-- Fig. 5 — replace with median + IQR from 1000 round-trips per transport.
+- **Fig. 1 (arch), Fig. 2 (wire format)** — exact. The diagram and the
+byte counts are derived from the protocol itself.
+- **Fig. 5 (latency)** — **measured.** `fig_latency()` reads
+`latency_data.json`, produced by `tools/bench_latency.py`: 1000 timed
+round-trips per transport, median + IQR. Currently covers loopback,
+ESP32-WROOM-32 (CH340), and ESP32-S3 (native USB).
+- **Fig. 3 (footprint)** — partially synthetic. The DCP bar is a design
+target; the IoT-MCP / Direct-MCP / Matter bars are cited or typical.
+Needs a rework: the DCP layer is now measured (~14 KB over an empty
+sketch) and the figure should distinguish measured-DCP from cited
+baselines cleanly, including the flash-vs-RAM axis.
+- **Fig. 4 (hallucination)** — synthetic, and labelled as such in the
+figure footnote. Making it real needs an LLM adversarial-call
+benchmark (≈1000 generated calls per baseline, with a defined
+attack-generation procedure) — this is the v0.4 paper campaign.
+
+To re-measure latency: connect the board, run
+`python tools/bench_latency.py --serial <PORT> --label "..." --key <key>`,
+then `python make_figures.py`.
 
 Edit `make_figures.py` directly — each figure is one function. Re-run.
 Don't hand-edit the PDFs.
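
Review note: `tools/bench_latency.py` itself is not visible in this view, so the following is only a rough sketch of how the recorded statistics and the keyed merge could work. The names `summarize`, `time_calls`, and `merge_entry` are hypothetical, and the quantile method (`inclusive`) and the sorted-key output are assumptions, not the tool's confirmed behaviour.

```python
import json
import statistics
import time
from pathlib import Path


def summarize(samples_ms):
    """Return summary statistics for a list of latencies in milliseconds."""
    # quantiles(n=100) returns the 99 cut points p1..p99 (index 89 is p90).
    pct = statistics.quantiles(samples_ms, n=100, method="inclusive")
    q1, _, q3 = statistics.quantiles(samples_ms, n=4, method="inclusive")
    return {
        "n": len(samples_ms),
        "min": min(samples_ms),
        "max": max(samples_ms),
        "mean": statistics.fmean(samples_ms),
        "median": statistics.median(samples_ms),
        "p90": pct[89],
        "p99": pct[98],
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


def time_calls(call, n):
    """Time n invocations of call() and return the latencies in ms."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        call()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def merge_entry(path, key, entry):
    """Write one keyed entry into the JSON file, keeping all other keys."""
    path = Path(path)
    data = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    data[key] = entry
    path.write_text(json.dumps(data, indent=2, sort_keys=True) + "\n",
                    encoding="utf-8")
    return data
```

In this sketch a run would amount to `merge_entry(path, key, summarize(time_calls(call, 1000)))`, where `call` stands in for whatever performs one round-trip; re-running with a different key leaves earlier entries in place.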

docs/paper/figures/latency.pdf

15 KB
Binary file not shown.

docs/paper/figures/latency.png

22.7 KB

docs/paper/figures/latency_data.json

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+{
+  "loopback": {
+    "n": 1000,
+    "min": 0.0205,
+    "max": 0.6076,
+    "mean": 0.0258,
+    "median": 0.0231,
+    "p50": 0.0231,
+    "p90": 0.028,
+    "p99": 0.0809,
+    "stdev": 0.0207,
+    "q1": 0.0224,
+    "q3": 0.0241,
+    "iqr": 0.0017,
+    "label": "DCP loopback (in-process)",
+    "intent": "set_brightness",
+    "measured_at": "2026-05-22"
+  },
+  "uart_s3": {
+    "n": 1000,
+    "min": 14.5338,
+    "max": 16.7196,
+    "mean": 15.6137,
+    "median": 15.5984,
+    "p50": 15.5979,
+    "p90": 16.0979,
+    "p99": 16.5544,
+    "stdev": 0.3721,
+    "q1": 15.3554,
+    "q3": 15.875,
+    "iqr": 0.5196,
+    "label": "DCP UART 115200 (ESP32-S3, native USB)",
+    "intent": "set_brightness",
+    "measured_at": "2026-05-22"
+  },
+  "uart_wroom": {
+    "n": 1000,
+    "min": 14.6038,
+    "max": 16.7556,
+    "mean": 15.5651,
+    "median": 15.562,
+    "p50": 15.562,
+    "p90": 16.0722,
+    "p99": 16.52,
+    "stdev": 0.3791,
+    "q1": 15.274,
+    "q3": 15.8171,
+    "iqr": 0.5431,
+    "label": "DCP UART 115200 (ESP32-WROOM-32, CH340)",
+    "intent": "set_brightness",
+    "measured_at": "2026-05-22"
+  }
+}
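
Review note: the "within 0.05 ms" claim in the commit message can be checked against the recorded entries. A minimal sketch, with the medians and quartiles copied by hand from the data above; `median_gap` and `iqr_consistent` are hypothetical helper names, not part of the repository.

```python
# Values copied from latency_data.json as recorded in this commit (ms).
entries = {
    "loopback": {"median": 0.0231, "q1": 0.0224, "q3": 0.0241, "iqr": 0.0017},
    "uart_s3": {"median": 15.5984, "q1": 15.3554, "q3": 15.875, "iqr": 0.5196},
    "uart_wroom": {"median": 15.562, "q1": 15.274, "q3": 15.8171, "iqr": 0.5431},
}


def median_gap(a, b):
    """Absolute difference between two transports' medians, in ms."""
    return abs(entries[a]["median"] - entries[b]["median"])


def iqr_consistent(key, tol=1e-4):
    """Check that the stored IQR matches q3 - q1 up to rounding."""
    e = entries[key]
    return abs((e["q3"] - e["q1"]) - e["iqr"]) <= tol


gap = median_gap("uart_s3", "uart_wroom")
share = entries["loopback"]["median"] / entries["uart_wroom"]["median"]
print(f"UART median gap: {gap:.4f} ms")
print(f"loopback median as a share of the WROOM round-trip: {share:.2%}")
```

The gap between the two UART medians comes out at about 0.036 ms, which supports the stated bound, and the stored `iqr` fields agree with `q3 - q1` for all three entries.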

docs/paper/figures/make_figures.py

Lines changed: 61 additions & 37 deletions
@@ -12,6 +12,8 @@
 """
 from __future__ import annotations
 
+import json
+
 import matplotlib.pyplot as plt
 import matplotlib.patches as mpatches
 from matplotlib.patches import FancyBboxPatch, FancyArrowPatch
@@ -317,45 +319,67 @@ def fig_hallucination():
 
 
 # ---------------------------------------------------------------------------
-# Figure 5 — End-to-end latency by transport (illustrative).
+# Figure 5 — End-to-end latency by transport (measured).
+#
+# Data comes from latency_data.json, produced by tools/bench_latency.py.
+# Each entry is 1000 timed round-trips of a set_brightness call. The figure
+# is always derived from that recorded measurement — never hand-typed.
 
 def fig_latency():
-    fig, ax = plt.subplots(figsize=(6.5, 2.6))
-
-    transports = ["DCP\nloopback", "DCP\nUART 115200", "DCP\nMQTT (LAN)", "DCP\nBLE", "IoT-MCP\n[ref]"]
-    encode = [0.3, 0.6, 1.0, 1.0, 3.0]
-    wire = [0.0, 6.0, 3.0, 12.0, 12.0]
-    decode = [0.4, 0.7, 0.8, 0.8, 4.0]
-    handler = [0.5, 0.5, 0.5, 0.5, 1.0]
-    response_wire = [0.0, 4.0, 2.5, 8.0, 10.0]
-    response_decode = [0.3, 0.6, 0.7, 0.7, 3.0]
-
-    layers = [
-        ("encode", encode, C["dcp_lt"]),
-        ("wire out", wire, C["dcp"]),
-        ("device decode", decode, C["rawmcp"]),
-        ("handler", handler, C["openapi"]),
-        ("wire back", response_wire, C["matter"]),
-        ("host decode", response_decode, "#bbbbbb"),
-    ]
-
-    bottom = np.zeros(len(transports))
-    for name, vals, color in layers:
-        ax.bar(transports, vals, bottom=bottom, color=color, label=name,
-               edgecolor="white", linewidth=0.4, width=0.55)
-        bottom += np.array(vals)
-
-    for i, total in enumerate(bottom):
-        ax.text(i, total + 0.8, f"{total:.1f} ms",
-                ha="center", va="bottom", fontsize=8, color="#333", fontweight="bold")
-
-    ax.set_ylabel("end-to-end latency (ms)")
-    ax.set_ylim(0, max(bottom) * 1.18)
-    ax.legend(loc="upper left", frameon=False, ncol=3, fontsize=7,
-              bbox_to_anchor=(0.0, 1.15))
-    ax.tick_params(axis="x", length=0)
-    ax.set_title("End-to-end call latency, broken down (illustrative)",
-                 loc="left", pad=18, fontsize=10)
+    data_path = HERE / "latency_data.json"
+    if not data_path.exists():
+        raise FileNotFoundError(
+            "latency_data.json missing — run tools/bench_latency.py first "
+            "(--loopback and --serial <port>) to record measurements.")
+    data = json.loads(data_path.read_text(encoding="utf-8"))
+
+    # Baseline first, then the hardware transports.
+    order = ["loopback", "uart_wroom", "uart_s3"]
+    rows = [(k, data[k]) for k in order if k in data]
+    if not rows:
+        raise ValueError("latency_data.json has no recognised transport keys")
+
+    short = {
+        "loopback": "loopback\n(in-process baseline)",
+        "uart_wroom": "UART 115200\nWROOM-32 · CH340",
+        "uart_s3": "UART 115200\nESP32-S3 · native USB",
+    }
+    palette = {"loopback": C["dcp_lt"], "uart_wroom": C["dcp"], "uart_s3": C["dcp"]}
+
+    fig, ax = plt.subplots(figsize=(6.5, 2.5))
+    y = np.arange(len(rows))
+    medians = [d["median"] for _, d in rows]
+    err_low = [d["median"] - d["q1"] for _, d in rows]
+    err_high = [d["q3"] - d["median"] for _, d in rows]
+    colors = [palette.get(k, C["dcp"]) for k, _ in rows]
+
+    ax.barh(y, medians, height=0.55, color=colors, edgecolor="white",
+            linewidth=0.5, xerr=[err_low, err_high],
+            error_kw=dict(ecolor="#333", capsize=3, lw=0.8))
+    ax.set_yticks(y)
+    ax.set_yticklabels([short.get(k, k) for k, _ in rows])
+    ax.invert_yaxis()
+    ax.set_xlim(0, max(medians) * 1.28)
+    ax.set_xlabel("round-trip latency (ms) — bar = median, whiskers = IQR")
+    ax.tick_params(axis="y", length=0)
+
+    for yi, (_, d) in zip(y, rows):
+        ax.text(d["median"] + max(medians) * 0.02, yi,
+                f"{d['median']:.2f} ms", va="center", ha="left",
+                fontsize=8, color="#333", fontweight="bold")
+
+    n = rows[0][1]["n"]
+    ax.set_title(f"Measured round-trip call latency "
+                 f"(set_brightness, N={n} per transport)",
+                 loc="left", fontsize=9.5, pad=8)
+
+    fig.subplots_adjust(bottom=0.30, top=0.86, left=0.26, right=0.97)
+    fig.text(0.5, 0.03,
+             "Measured by tools/bench_latency.py. The loopback row is the "
+             "protocol's own encode/decode cost with no wire;\nthe two UART "
+             "rows are real hardware. CH340 and native-USB transports land "
+             "within 0.05 ms of each other.",
+             ha="center", va="bottom", fontsize=7, color="#666", style="italic")
     save(fig, "latency")
 
 
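
Review note: `fig_latency()` takes N for the title from the first row only and derives whisker lengths from `q1`, `median`, and `q3` without checking them. A pre-check along the following lines might be worth considering. This is a sketch under assumptions: `validate_rows` is a hypothetical helper that does not exist in `make_figures.py`, and `rows` is taken to be the list of `(key, entry)` pairs built in the function.

```python
REQUIRED_FIELDS = ("n", "median", "q1", "q3")


def validate_rows(rows):
    """Validate (key, entry) pairs before plotting; return the shared N."""
    if not rows:
        raise ValueError("no latency rows to plot")
    for key, entry in rows:
        missing = [f for f in REQUIRED_FIELDS if f not in entry]
        if missing:
            raise ValueError(f"{key}: missing fields {missing}")
        # Whisker lengths are median - q1 and q3 - median; both must be >= 0.
        if not entry["q1"] <= entry["median"] <= entry["q3"]:
            raise ValueError(f"{key}: expected q1 <= median <= q3")
    sample_counts = {entry["n"] for _, entry in rows}
    if len(sample_counts) != 1:
        # The title states a single "N per transport", so mixed N would mislead.
        raise ValueError(f"transports have different n: {sorted(sample_counts)}")
    return sample_counts.pop()
```

With such a helper, `n = validate_rows(rows)` could stand in for `n = rows[0][1]["n"]`, so a re-measurement with a different sample count fails loudly instead of producing a title that is wrong for some bars.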

docs/paper/main.tex

Lines changed: 23 additions & 10 deletions
@@ -500,20 +500,33 @@ \subsection{Conformance suite}
 \texttt{dcp.framing.wrap/unwrap}. We expect future C, Rust, and Go ports
 to write equivalent runners.
 
-Figure~\ref{fig:latency} sketches the expected end-to-end latency profile
-across DCP's transports, decomposed into encode, wire-transit, decode,
-and handler-execution components. Wire-transit dominates on slow
-transports (UART, BLE); encode/decode are negligible. We include
-IoT-MCP's reported $\sim$\,205\,ms~\cite{iotmcp2025} as the reference
-baseline for the future measurement campaign.
+Figure~\ref{fig:latency} reports \emph{measured} end-to-end latency for a
+typical \texttt{set\_brightness} call--reply round-trip. Each bar is the
+median of 1000 timed calls; whiskers show the inter-quartile range. The
+measurements were taken with \texttt{tools/bench\_latency.py} against an
+in-process loopback transport and two physical boards. The loopback row
+isolates the protocol's own encode/decode cost at 0.02\,ms --- negligible
+relative to any real link. The two UART rows --- an ESP32-WROOM-32 behind
+a CH340 USB-UART bridge, and an ESP32-S3 over its native USB-Serial/JTAG
+interface --- both land at $\sim$15.6\,ms and within 0.05\,ms of each
+other. The round-trip is therefore dominated by serial-line transit and
+device-side handling, not by the host bridge or the choice of USB
+bridging chip.
+
+We do not plot a direct bar against IoT-MCP's reported
+$\sim$\,205\,ms~\cite{iotmcp2025}: that figure is an average response
+time measured under a different methodology and scope, and a fair
+head-to-head requires the controlled campaign described in our
+limitations.
 
 \begin{figure}[h]
 \centering
 \includegraphics[width=\linewidth]{latency.pdf}
-\caption{Anticipated end-to-end latency for a typical
-call--reply round-trip, broken down by stage. DCP figures are
-projections from microbenchmarks of the Python and C++ code paths in
-isolation; IoT-MCP is the reported value from~\cite{iotmcp2025}.}
+\caption{Measured end-to-end latency for a typical call--reply
+round-trip. Bars are medians over 1000 timed calls per transport;
+whiskers are the inter-quartile range. Source data:
+\texttt{docs/paper/figures/latency\_data.json}, produced by
+\texttt{tools/bench\_latency.py}.}
 \label{fig:latency}
 \end{figure}
 
