device-context-protocol
diff --git a/docs/paper/figures/README.md b/docs/paper/figures/README.md
Lines changed: 19 additions & 12 deletions
diff --git a/docs/paper/figures/latency.pdf b/docs/paper/figures/latency.pdf
15 KB
diff --git a/docs/paper/figures/latency.png b/docs/paper/figures/latency.png
22.7 KB
diff --git a/docs/paper/figures/latency_data.json b/docs/paper/figures/latency_data.json
Lines changed: 53 additions & 0 deletions
diff --git a/docs/paper/figures/make_figures.py b/docs/paper/figures/make_figures.py
Lines changed: 61 additions & 37 deletions
diff --git a/docs/paper/main.tex b/docs/paper/main.tex
Lines changed: 23 additions & 10 deletions
@@ -26,18 +26,25 @@ README / web preview) into this directory.
 
 ## What's synthetic vs measured
 
-**As of paper v0.1**: every numeric value in figures 3–5 is illustrative.
-The architecture diagram and the wire-format byte counts (Fig. 1, top half
-of Fig. 2) are exact.
-
-Once the hardware campaign is complete:
-
-- Fig. 3 — replace DCP "target" bars with measured values across
-  ESP32, ESP32-C3, nRF52840.
-- Fig. 4 — replace the synthetic rejection-rate values with measured
-  results from running 1000 LLM-generated adversarial calls against each
-  baseline; will need to define the attack-generation procedure.
-- Fig. 5 — replace with median + IQR from 1000 round-trips per transport.
+- **Fig. 1 (arch), Fig. 2 (wire format)** — exact. The diagram and the
+  byte counts are derived from the protocol itself.
+- **Fig. 5 (latency)** — **measured.** `fig_latency()` reads
+  `latency_data.json`, produced by `tools/bench_latency.py`: 1000 timed
+  round-trips per transport, median + IQR. Currently covers loopback,
+  ESP32-WROOM-32 (CH340), and ESP32-S3 (native USB).
+- **Fig. 3 (footprint)** — partially synthetic. The DCP bar is a design
+  target; the IoT-MCP / Direct-MCP / Matter bars are cited or typical.
+  Needs a rework: the DCP layer is now measured (~14 KB over an empty
+  sketch) and the figure should distinguish measured-DCP from cited
+  baselines cleanly, including the flash-vs-RAM axis.
+- **Fig. 4 (hallucination)** — synthetic, and labelled as such in the
+  figure footnote. Making it real needs an LLM adversarial-call
+  benchmark (≈1000 generated calls per baseline, with a defined
+  attack-generation procedure) — this is the v0.4 paper campaign.
+
+To re-measure latency: connect the board and, from the repository root, run
+`python tools/bench_latency.py --serial <PORT> --label "..." --key <key>`,
+then re-run `python make_figures.py` from this directory.
 
 Edit `make_figures.py` directly — each figure is one function. Re-run.
 Don't hand-edit the PDFs.
@@ -0,0 +1,53 @@
+{
+  "loopback": {
+    "n": 1000,
+    "min": 0.0205,
+    "max": 0.6076,
+    "mean": 0.0258,
+    "median": 0.0231,
+    "p50": 0.0231,
+    "p90": 0.028,
+    "p99": 0.0809,
+    "stdev": 0.0207,
+    "q1": 0.0224,
+    "q3": 0.0241,
+    "iqr": 0.0017,
+    "label": "DCP loopback (in-process)",
+    "intent": "set_brightness",
+    "measured_at": "2026-05-22"
+  },
+  "uart_s3": {
+    "n": 1000,
+    "min": 14.5338,
+    "max": 16.7196,
+    "mean": 15.6137,
+    "median": 15.5984,
+    "p50": 15.5979,
+    "p90": 16.0979,
+    "p99": 16.5544,
+    "stdev": 0.3721,
+    "q1": 15.3554,
+    "q3": 15.875,
+    "iqr": 0.5196,
+    "label": "DCP UART 115200 (ESP32-S3, native USB)",
+    "intent": "set_brightness",
+    "measured_at": "2026-05-22"
+  },
+  "uart_wroom": {
+    "n": 1000,
+    "min": 14.6038,
+    "max": 16.7556,
+    "mean": 15.5651,
+    "median": 15.562,
+    "p50": 15.562,
+    "p90": 16.0722,
+    "p99": 16.52,
+    "stdev": 0.3791,
+    "q1": 15.274,
+    "q3": 15.8171,
+    "iqr": 0.5431,
+    "label": "DCP UART 115200 (ESP32-WROOM-32, CH340)",
+    "intent": "set_brightness",
+    "measured_at": "2026-05-22"
+  }
+}
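
The summary fields recorded in `latency_data.json` can be reproduced from raw round-trip samples with the Python standard library. The sketch below is illustrative only: `summarize` is a hypothetical helper, not code taken from `tools/bench_latency.py`, and the `inclusive` quartile method is an assumption, so its values may differ slightly from what the benchmark script writes. The percentile fields (`p50`, `p90`, `p99`) are left out because this change does not show which interpolation rule the script uses.

```python
import statistics


def summarize(samples_ms):
    """Reduce raw round-trip times (ms) to median/IQR-style summary fields.

    Hypothetical helper; the quartile method is an assumption, not
    necessarily what tools/bench_latency.py uses.
    """
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=4) returns the three quartile cut points [q1, q2, q3].
    q1, _, q3 = statistics.quantiles(samples_ms, n=4, method="inclusive")
    return {
        "n": len(samples_ms),
        "min": min(samples_ms),
        "max": max(samples_ms),
        "mean": statistics.fmean(samples_ms),
        "median": statistics.median(samples_ms),
        "stdev": statistics.stdev(samples_ms),  # sample standard deviation
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


if __name__ == "__main__":
    # Tiny made-up sample set, purely to show the call shape.
    print(summarize([1.0, 2.0, 3.0, 4.0, 5.0]))
```

Reporting median and IQR rather than mean and standard deviation keeps the summary robust to occasional outliers, such as the 0.6076 ms maximum in the loopback entry.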
@@ -12,6 +12,8 @@
 """
 from __future__ import annotations
 
+import json
+
 import matplotlib.pyplot as plt
 import matplotlib.patches as mpatches
 from matplotlib.patches import FancyBboxPatch, FancyArrowPatch
@@ -317,45 +319,67 @@ def fig_hallucination():
 
 
 # ---------------------------------------------------------------------------
-# Figure 5 — End-to-end latency by transport (illustrative).
+# Figure 5 — End-to-end latency by transport (measured).
+#
+# Data comes from latency_data.json, produced by tools/bench_latency.py.
+# Each entry is 1000 timed round-trips of a set_brightness call. The figure
+# is always derived from that recorded measurement — never hand-typed.
 
 def fig_latency():
-    fig, ax = plt.subplots(figsize=(6.5, 2.6))
-
-    transports = ["DCP\nloopback", "DCP\nUART 115200", "DCP\nMQTT (LAN)", "DCP\nBLE", "IoT-MCP\n[ref]"]
-    encode  = [0.3, 0.6,  1.0, 1.0, 3.0]
-    wire    = [0.0, 6.0,  3.0, 12.0, 12.0]
-    decode  = [0.4, 0.7,  0.8, 0.8, 4.0]
-    handler = [0.5, 0.5,  0.5, 0.5, 1.0]
-    response_wire = [0.0, 4.0, 2.5, 8.0, 10.0]
-    response_decode = [0.3, 0.6, 0.7, 0.7, 3.0]
-
-    layers = [
-        ("encode",          encode,          C["dcp_lt"]),
-        ("wire out",        wire,            C["dcp"]),
-        ("device decode",   decode,          C["rawmcp"]),
-        ("handler",         handler,         C["openapi"]),
-        ("wire back",       response_wire,   C["matter"]),
-        ("host decode",     response_decode, "#bbbbbb"),
-    ]
-
-    bottom = np.zeros(len(transports))
-    for name, vals, color in layers:
-        ax.bar(transports, vals, bottom=bottom, color=color, label=name,
-               edgecolor="white", linewidth=0.4, width=0.55)
-        bottom += np.array(vals)
-
-    for i, total in enumerate(bottom):
-        ax.text(i, total + 0.8, f"{total:.1f} ms",
-                ha="center", va="bottom", fontsize=8, color="#333", fontweight="bold")
-
-    ax.set_ylabel("end-to-end latency (ms)")
-    ax.set_ylim(0, max(bottom) * 1.18)
-    ax.legend(loc="upper left", frameon=False, ncol=3, fontsize=7,
-              bbox_to_anchor=(0.0, 1.15))
-    ax.tick_params(axis="x", length=0)
-    ax.set_title("End-to-end call latency, broken down (illustrative)",
-                 loc="left", pad=18, fontsize=10)
+    data_path = HERE / "latency_data.json"
+    if not data_path.exists():
+        raise FileNotFoundError(
+            "latency_data.json missing — run tools/bench_latency.py first "
+            "(--loopback and --serial <port>) to record measurements.")
+    data = json.loads(data_path.read_text(encoding="utf-8"))
+
+    # Baseline first, then the hardware transports.
+    order = ["loopback", "uart_wroom", "uart_s3"]
+    rows = [(k, data[k]) for k in order if k in data]
+    if not rows:
+        raise ValueError("latency_data.json has no recognised transport keys")
+
+    short = {
+        "loopback":   "loopback\n(in-process baseline)",
+        "uart_wroom": "UART 115200\nWROOM-32 · CH340",
+        "uart_s3":    "UART 115200\nESP32-S3 · native USB",
+    }
+    palette = {"loopback": C["dcp_lt"], "uart_wroom": C["dcp"], "uart_s3": C["dcp"]}
+
+    fig, ax = plt.subplots(figsize=(6.5, 2.5))
+    y = np.arange(len(rows))
+    medians  = [d["median"] for _, d in rows]
+    err_low  = [d["median"] - d["q1"] for _, d in rows]
+    err_high = [d["q3"] - d["median"] for _, d in rows]
+    colors   = [palette.get(k, C["dcp"]) for k, _ in rows]
+
+    ax.barh(y, medians, height=0.55, color=colors, edgecolor="white",
+            linewidth=0.5, xerr=[err_low, err_high],
+            error_kw=dict(ecolor="#333", capsize=3, lw=0.8))
+    ax.set_yticks(y)
+    ax.set_yticklabels([short.get(k, k) for k, _ in rows])
+    ax.invert_yaxis()
+    ax.set_xlim(0, max(medians) * 1.28)
+    ax.set_xlabel("round-trip latency (ms)  —  bar = median, whiskers = IQR")
+    ax.tick_params(axis="y", length=0)
+
+    for yi, (_, d) in zip(y, rows):
+        ax.text(d["median"] + max(medians) * 0.02, yi,
+                f"{d['median']:.2f} ms", va="center", ha="left",
+                fontsize=8, color="#333", fontweight="bold")
+
+    n = rows[0][1]["n"]
+    ax.set_title(f"Measured round-trip call latency  "
+                 f"(set_brightness, N={n} per transport)",
+                 loc="left", fontsize=9.5, pad=8)
+
+    fig.subplots_adjust(bottom=0.30, top=0.86, left=0.26, right=0.97)
+    fig.text(0.5, 0.03,
+             "Measured by tools/bench_latency.py. The loopback row is the "
+             "protocol's own encode/decode cost with no wire;\nthe two UART "
+             "rows are real hardware. CH340 and native-USB transports land "
+             "within 0.05 ms of each other.",
+             ha="center", va="bottom", fontsize=7, color="#666", style="italic")
     save(fig, "latency")
 
 
 
@@ -500,20 +500,33 @@ \subsection{Conformance suite}
 \texttt{dcp.framing.wrap/unwrap}. We expect future C, Rust, and Go ports
 to write equivalent runners.
 
-Figure~\ref{fig:latency} sketches the expected end-to-end latency profile
-across DCP's transports, decomposed into encode, wire-transit, decode,
-and handler-execution components. Wire-transit dominates on slow
-transports (UART, BLE); encode/decode are negligible. We include
-IoT-MCP's reported $\sim$\,205\,ms~\cite{iotmcp2025} as the reference
-baseline for the future measurement campaign.
+Figure~\ref{fig:latency} reports \emph{measured} end-to-end latency for a
+typical \texttt{set\_brightness} call--reply round-trip. Each bar is the
+median of 1000 timed calls; whiskers show the inter-quartile range. The
+measurements were taken with \texttt{tools/bench\_latency.py} against an
+in-process loopback transport and two physical boards. The loopback row
+isolates the protocol's own encode/decode cost at 0.02\,ms --- negligible
+relative to any real link. The two UART rows --- an ESP32-WROOM-32 behind
+a CH340 USB-UART bridge, and an ESP32-S3 over its native USB-Serial/JTAG
+interface --- both land at $\sim$15.6\,ms and within 0.05\,ms of each
+other. Since loopback is several hundred times faster and the two USB paths
+agree, the round-trip is dominated by serial-line transit and device-side
+handling, not by host-side encode/decode or the USB-serial path.
+
+We do not plot a direct bar against IoT-MCP's reported
+$\sim$\,205\,ms~\cite{iotmcp2025}: that figure is an average response
+time measured under a different methodology and scope, and a fair
+head-to-head requires the controlled campaign described in our
+limitations.
 
 \begin{figure}[h]
 \centering
 \includegraphics[width=\linewidth]{latency.pdf}
-\caption{Anticipated end-to-end latency for a typical
-call--reply round-trip, broken down by stage. DCP figures are
-projections from microbenchmarks of the Python and C++ code paths in
-isolation; IoT-MCP is the reported value from~\cite{iotmcp2025}.}
+\caption{Measured end-to-end latency for a typical call--reply
+round-trip. Bars are medians over 1000 timed calls per transport;
+whiskers are the inter-quartile range. Source data:
+\texttt{docs/paper/figures/latency\_data.json}, produced by
+\texttt{tools/bench\_latency.py}.}
 \label{fig:latency}
 \end{figure}
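
The two numeric claims in the latency paragraph (UART medians within 0.05 ms of each other; loopback cost of about 0.02 ms) can be checked against the recorded medians. This is a minimal sketch, with the medians copied by hand from `latency_data.json` in this change. The helper names are hypothetical and not part of the repository.

```python
# Medians in milliseconds, copied from latency_data.json in this change.
MEDIANS_MS = {
    "loopback": 0.0231,
    "uart_wroom": 15.562,
    "uart_s3": 15.5984,
}


def uart_gap_ms(medians):
    """Absolute difference between the two UART medians, in ms."""
    return abs(medians["uart_s3"] - medians["uart_wroom"])


def loopback_share(medians):
    """Loopback median as a fraction of the slower UART median."""
    slowest_uart = max(medians["uart_wroom"], medians["uart_s3"])
    return medians["loopback"] / slowest_uart


if __name__ == "__main__":
    print(f"UART median gap: {uart_gap_ms(MEDIANS_MS):.4f} ms")
    print(f"loopback / UART: {loopback_share(MEDIANS_MS):.2%}")
```

In a real check the dictionary would be loaded from the JSON file rather than typed in, so that the prose and the figure cannot drift from the data.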