Commit 59ca90d

feat(otel): optional OpenTelemetry export — metrics (alt to Prometheus) + GenAI spans
Add an optional OTLP export path, in two independent off-by-default parts.

Metrics export (mockserver.otelMetricsEnabled): OtelMetricsExporter bridges the existing Metrics.Name gauges (the same set exposed for Prometheus, incl. the LLM and chaos counters) to OTLP as observable gauges that read the current values, so the Prometheus and OTLP views stay consistent. An alternative to the Prometheus endpoint, not a replacement.

GenAI span export (mockserver.otelTracesEnabled): HttpLlmResponseActionHandler calls GenAiSpans.recordCompletion on each served completion (streaming and non-streaming), emitting one span with GenAI semantic-convention attributes: gen_ai.system (mapped to the semconv registry value per provider, not the raw enum name), gen_ai.request.model, gen_ai.usage.input/output_tokens, gen_ai.response.finish_reasons (string[]), and a namespaced mockserver.gen_ai.tool_call_count. Explicit spans MockServer codes: NO auto-instrumentation, and no prompt/response content is captured (privacy).

Both: OTLP HTTP/protobuf with the JDK HttpClient sender (okhttp sender excluded, no gRPC); share mockserver.otelEndpoint (a base collector URL, /v1/metrics and /v1/traces appended by OtelEndpoints); started/stopped in LifeCycle; fail-soft (a setup error logs one line and never stops the server or affects a response).

io.opentelemetry is relocated to shaded_package.io.opentelemetry in the uber-jar (verified: the shaded mockserver-netty-no-dependencies jar builds cleanly).

Deps: opentelemetry-bom 1.45.0; api, sdk-common, sdk-metrics, sdk-trace, exporter-otlp (+ sender-jdk), sdk-testing (test).

Docs: docs/code/llm-mocking.md (OpenTelemetry section + source refs), consumer configuration-properties page, roadmap, changelog.

Tests: OtelMetricsExporterTest (observable gauge reads the metric; disabled→null), GenAiSpansTest (semconv attributes incl. string[] finish_reasons, provider→system mapping, no-op when disabled), OtelEndpointsTest (base/trailing-slash/signal-path edge cases). All green.
1 parent b2acd76 commit 59ca90d

16 files changed

Lines changed: 666 additions & 2 deletions


changelog.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
77
## [Unreleased]
88

99
### Added
10+
- Added optional **OpenTelemetry (OTLP) export**, in two independent, off-by-default parts. (1) **Metrics export** — MockServer's existing metrics (the same explicitly-defined gauges already exposed for Prometheus: `REQUESTS_RECEIVED_COUNT`, `RESPONSE_EXPECTATIONS_MATCHED_COUNT`, the LLM/SSE/chaos counters, etc.) can also be pushed to an OTLP collector as an alternative to Prometheus (`mockserver.otelMetricsEnabled`). Implemented as OTel observable gauges reading the current values, so the Prometheus and OTLP views stay in lock-step. (2) **GenAI span export** — MockServer emits one explicit OpenTelemetry GenAI semantic-convention span per LLM completion it serves (`gen_ai.system`, `gen_ai.request.model`, `gen_ai.usage.input_tokens`/`output_tokens`, `gen_ai.response.finish_reasons`, tool-call count) (`mockserver.otelTracesEnabled`). These are spans MockServer codes deliberately — **no auto-instrumentation** is added. Both use the OTLP HTTP/protobuf exporter with the JDK HttpClient sender (no gRPC/OkHttp), share `mockserver.otelEndpoint`, and are fail-soft (a setup error logs one line and never stops the server or affects a response). `io.opentelemetry.*` is relocated in the shaded JAR. See the configuration properties page.
1011
- Added **drift detection** for LLM fixtures (`detect_llm_drift` MCP tool): replays a recorded cassette's exchanges against the live provider (via the runtime-LLM client SPI) and reports **structural** drift — new/removed fields and type changes in the responses — not semantic differences, so benign wording changes never flag. Built on a reusable, pure `StructuralShapeDiff` and a `DriftDetector` that **fails closed** per exchange (a network error or non-2xx live response is reported as could-not-check, never as drift, never thrown). Off unless a runtime backend is configured. Intended for an opt-in/scheduled CI lane (real API keys + tokens), never the per-commit build. See the AI/MCP tools page and `docs/code/llm-mocking.md`.
1112
- Completed the **VCR (record/replay) toolkit** for LLM fixtures with three additions. (1) **Strict mode**: `load_expectations_from_file` accepts `strict` (or set `mockserver.llmVcrStrict`), which registers a low-priority catch-all per cassette path so a request matching no recorded fixture returns HTTP 599 instead of silently falling through. (2) **Body-field redaction**: `record_llm_fixtures` accepts `redactBodyFields` (or set `mockserver.fixtureBodyRedactFields`) to redact named JSON fields from recorded request/response bodies, complementing the existing header redaction. (3) **Replay field normalisation**: `load_expectations_from_file` accepts `normalizeRequestBodyFields` to drop volatile JSON fields from each recorded request body and match the remainder loosely (ignoring extra fields), so per-run values (request ids, timestamps) do not block replay. These are operational settings exposed via config and MCP. See the AI/MCP tools and configuration properties pages.
1213
- Added declarative **LLM fault/chaos profiles** for resilience testing, attachable to any mock LLM response (`mock_llm_completion`, each `create_llm_conversation` turn, the Java `LlmConversationBuilder`, and raw expectation JSON via a `chaos` block). Supports probabilistic provider errors (e.g. 429/529 with a `Retry-After` header), mid-stream truncation of an SSE stream (keep a leading fraction of events), and appending a malformed (broken-JSON) SSE chunk. Errors are deterministic at probability 0.0/1.0 and reproducible at fractional probabilities via a `seed`; truncation and malformed-SSE are always deterministic. A new `LLM_CHAOS_INJECTED_COUNT` metric tracks injections. The dashboard conversation wizard exposes the profile per turn. See the AI/MCP tools page and `docs/code/llm-mocking.md`.

docs/code/llm-mocking.md

Lines changed: 11 additions & 0 deletions
@@ -300,6 +300,15 @@ Adding a provider = implement `LlmClient` + one `register(...)` line — the sam
300300

301301
This SPI is never on the deterministic assertion/matching path. The features that consume it (drift detection, semantic matching) are tracked in `docs/plans/mockserver-llm-mocking.md`.
302302

303+
## OpenTelemetry export
304+
305+
Optional, off-by-default OTLP export, in two independent parts (both fail-soft — a setup error logs one line and never affects the server or a response; `io.opentelemetry` is relocated in the shaded jar):
306+
307+
- **Metrics** (`org.mockserver.metrics.OtelMetricsExporter`, `mockserver.otelMetricsEnabled`) — bridges the existing `Metrics.Name` gauges (the same set exposed for Prometheus, including the LLM/SSE/chaos counters) to OTLP as observable gauges that read the current values, so Prometheus and OTLP stay consistent. An alternative to the Prometheus endpoint.
308+
- **GenAI spans** (`org.mockserver.telemetry.GenAiSpanExporter` + `GenAiSpans`, `mockserver.otelTracesEnabled`) — `HttpLlmResponseActionHandler` calls `GenAiSpans.recordCompletion(provider, model, completion)` on each served completion (streaming and non-streaming), emitting one span with GenAI semantic-convention attributes (`gen_ai.system`, `gen_ai.request.model`, `gen_ai.usage.*`, `gen_ai.response.finish_reasons`, tool-call count). These are spans MockServer codes deliberately — **no auto-instrumentation**. `GenAiSpans` is a process-wide no-op until `GenAiSpanExporter` installs a tracer.
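
The attribute set carried by each GenAI span can be pictured with a plain-Java stand-in. This is a minimal sketch, not the code in `GenAiSpans`: the `Provider` enum, the class name and the `attributes(...)` helper are hypothetical, and a map replaces the real OpenTelemetry span so the example has no dependencies. It illustrates two points from the description above: the provider is mapped to a `gen_ai.system` value rather than exposing the raw enum name, and finish reasons are carried as a list of strings.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a plain map standing in for the attributes of one GenAI span. */
final class GenAiSpanAttributesSketch {

    /** Hypothetical provider enum; MockServer's real provider type may differ. */
    enum Provider { OPENAI, ANTHROPIC, BEDROCK }

    /** Maps a provider to a gen_ai.system value instead of using the raw enum name. */
    static String genAiSystem(Provider provider) {
        switch (provider) {
            case OPENAI: return "openai";
            case ANTHROPIC: return "anthropic";
            case BEDROCK: return "aws.bedrock";
            default: throw new IllegalArgumentException("unmapped provider: " + provider);
        }
    }

    /** Builds the attribute set for one served completion. No prompt or response text is included. */
    static Map<String, Object> attributes(Provider provider, String model,
                                          long inputTokens, long outputTokens,
                                          List<String> finishReasons, int toolCallCount) {
        Map<String, Object> attrs = new LinkedHashMap<>();
        attrs.put("gen_ai.system", genAiSystem(provider));
        attrs.put("gen_ai.request.model", model);
        attrs.put("gen_ai.usage.input_tokens", inputTokens);
        attrs.put("gen_ai.usage.output_tokens", outputTokens);
        // A string array in the semantic conventions, hence a list rather than a single value.
        attrs.put("gen_ai.response.finish_reasons", finishReasons);
        // Namespaced because it is a MockServer-specific attribute.
        attrs.put("mockserver.gen_ai.tool_call_count", toolCallCount);
        return attrs;
    }
}
```

Keeping the mapping in one place means a provider added later fails loudly until it is given an explicit `gen_ai.system` value.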
309+
310+
Both use the OTLP HTTP/protobuf exporter with the JDK HttpClient sender (no gRPC/OkHttp) and share `mockserver.otelEndpoint` (a base collector URL; `/v1/metrics` and `/v1/traces` appended per signal, resolved by `telemetry.OtelEndpoints`).
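
The per-signal endpoint resolution can be sketched as follows. This is an illustration of the documented behaviour (base URL plus signal path, tolerant of a trailing slash or an already-suffixed value), not the source of `OtelEndpoints`: the class and method names are hypothetical, and substituting `http://localhost:4318` for an empty value is an assumption standing in for the exporter's own default.

```java
/** Illustrative only: resolves a base collector URL to a per-signal OTLP/HTTP endpoint. */
final class OtlpEndpointSketch {

    // Assumed stand-in for the OTLP exporter default used when nothing is configured.
    static final String DEFAULT_BASE = "http://localhost:4318";

    /** signalPath is "/v1/metrics" or "/v1/traces". */
    static String resolve(String configured, String signalPath) {
        String base = configured == null ? "" : configured.trim();
        if (base.isEmpty()) {
            base = DEFAULT_BASE;
        }
        base = stripTrailingSlashes(base);
        // Accept a value that already names a signal path and normalise it back to the base.
        for (String known : new String[] {"/v1/metrics", "/v1/traces"}) {
            if (base.endsWith(known)) {
                base = stripTrailingSlashes(base.substring(0, base.length() - known.length()));
                break;
            }
        }
        return base + signalPath;
    }

    private static String stripTrailingSlashes(String s) {
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == '/') {
            end--;
        }
        return s.substring(0, end);
    }
}
```

With this shape, one configured value serves both signals: `resolve(url, "/v1/metrics")` for the metrics exporter and `resolve(url, "/v1/traces")` for the span exporter.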
311+
303312
## Drift detection
304313

305314
`detect_llm_drift` (MCP) closes the loop on stale cassettes: it replays a recorded cassette's exchanges against the **live** provider and reports structural drift in the responses. Built from two pieces in `org.mockserver.llm.drift`:
@@ -381,3 +390,5 @@ Key source files under `mockserver/mockserver-core/src/main/java/org/mockserver/
381390
| `fixture/FixtureRedactor.java` | Masks sensitive headers and (optional) JSON body fields when recording fixtures |
382391
| `llm/drift/StructuralShapeDiff.java` | Pure JSON shape diff (added/removed/type-changed paths) |
383392
| `llm/drift/DriftDetector.java` + `DriftReport.java` | Replays a cassette against the live provider and reports structural drift, fail-closed |
393+
| `metrics/OtelMetricsExporter.java` | Optional OTLP metrics export bridging the Prometheus gauges (off by default) |
394+
| `telemetry/GenAiSpanExporter.java` + `GenAiSpans.java` + `OtelEndpoints.java` | Optional explicit GenAI span export per served completion (off by default) |
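
The "observable gauge" bridging used by `OtelMetricsExporter` can be pictured with a dependency-free sketch. Everything below is hypothetical (class name, methods, and the use of `AtomicLong` as the metric store); the real exporter registers callbacks with the OpenTelemetry SDK instead. The point it illustrates is that a gauge holds a callback rather than a copy, so whatever is exported is read from the same value the Prometheus endpoint would report.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Illustrative only: gauges that read the current metric value at export time. */
final class ObservableGaugeSketch {

    // Single source of truth, standing in for MockServer's metric values.
    private final Map<String, AtomicLong> values = new LinkedHashMap<>();
    // Callbacks registered once; each one reads the live value when invoked.
    private final Map<String, LongSupplier> gauges = new LinkedHashMap<>();

    /** Registers a metric and a gauge callback that observes it. */
    void register(String name) {
        AtomicLong value = new AtomicLong();
        values.put(name, value);
        gauges.put(name, value::get);
    }

    /** Normal metric update path; the gauge is not touched. The name must be registered first. */
    void increment(String name) {
        values.get(name).incrementAndGet();
    }

    /** What a periodic reader would do on each export tick: invoke every callback. */
    Map<String, Long> collect() {
        Map<String, Long> snapshot = new LinkedHashMap<>();
        gauges.forEach((name, supplier) -> snapshot.put(name, supplier.getAsLong()));
        return snapshot;
    }
}
```

Because nothing is copied between export ticks, there is no second counter that could drift away from the value served to Prometheus.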

docs/plans/mockserver-llm-mocking.md

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ The original RFC (RFC-1 LLM Response Builder + RFC-2 Stateful Scripted Conversat
3333
| 8 | MCP/A2A conformance contract testing (`run_mcp_contract_test`) | ❌ Not started — see Item assessments below |
3434
| 9a | Normalised prompt matching (deterministic) | ✅ Shipped — `NormalizationOptions` on `ConversationPredicates`, applied by `PromptNormalizer` before the text predicates (whitespace/case/JSON-key-sort/volatile-field drop) |
3535
| 9b | Semantic prompt matching (runtime LLM / embeddings) | ❌ Not started — opt-in only, not for assertions |
36-
| 10 | OTel GenAI / OpenInference span export | ❌ Not started |
36+
| 10 | OTel GenAI span export (+ metrics export) | ✅ Shipped — explicit GenAI spans per served completion (`GenAiSpanExporter`/`GenAiSpans`, `mockserver.otelTracesEnabled`) AND metrics export bridging the existing Prometheus gauges to OTLP (`OtelMetricsExporter`, `mockserver.otelMetricsEnabled`), as an alternative to Prometheus. Explicit spans only — no auto-instrumentation. Both off by default, fail-soft, OTLP HTTP/JDK-sender, `io.opentelemetry` relocated in shade. |
3737
| 11 | Correlated agent-run session / call-graph view | ❌ Not started |
3838
| 12 | Prompt-injection / adversarial-response harness | ❌ Not started |
3939
| 13 | Drift detection (fixtures vs real API in CI) | ✅ Shipped — `detect_llm_drift` MCP tool over `StructuralShapeDiff` + `DriftDetector` (replays cassette via runtime-LLM SPI, diffs response shape, fails closed); off unless a backend resolves; opt-in CI lane |

jekyll-www.mock-server.com/mock_server/configuration_properties.html

Lines changed: 20 additions & 0 deletions
@@ -888,6 +888,26 @@ <h2>Streaming Proxy Configuration:</h2>
888888
<pre class="code" style="padding: 2px;"><code class="code">-Dmockserver.llmVcrStrict="true"</code></pre>
889889
</div>
890890

891+
<button id="button_configuration_otel" class="accordion title"><strong>OpenTelemetry Export (Metrics &amp; GenAI Spans)</strong></button>
892+
<div class="panel title">
893+
<p>MockServer can export to an OpenTelemetry (OTLP) collector, in two independent parts that are each <strong>off by default</strong> and <strong>fail-soft</strong> (a startup error logs one line and never stops the server or affects a response). Both use the OTLP HTTP/protobuf exporter with the JDK HTTP client (no gRPC/OkHttp) and share the same endpoint.</p>
894+
<p><strong>1. Metrics export</strong> &mdash; push MockServer's explicitly-defined metrics (request counts, expectation-match counts, action counts including the LLM and chaos counters) to OTLP, as an alternative to the Prometheus endpoint. Implemented as observable gauges reading the current values, so the Prometheus and OTLP views stay consistent. It does <strong>not</strong> add tracing or automatic instrumentation.</p>
895+
<p>Type: <span class="keyword">boolean</span> Default: <span class="this_value">false</span></p>
896+
<pre class="prettyprint lang-java code"><code class="code">ConfigurationProperties.otelMetricsEnabled(boolean enabled)</code></pre>
897+
<pre class="code" style="padding: 2px;"><code class="code">-Dmockserver.otelMetricsEnabled=... &nbsp; MOCKSERVER_OTEL_METRICS_ENABLED=...</code></pre>
898+
<p>Export interval (seconds), default <span class="this_value">60</span>:</p>
899+
<pre class="code" style="padding: 2px;"><code class="code">-Dmockserver.otelMetricsExportIntervalSeconds=... &nbsp; MOCKSERVER_OTEL_METRICS_EXPORT_INTERVAL_SECONDS=...</code></pre>
900+
<p><strong>2. GenAI span export</strong> &mdash; emit one OpenTelemetry GenAI semantic-convention span per LLM completion MockServer serves, carrying provider (<span class="inline_code">gen_ai.system</span>), model, token usage and finish reason. These are spans MockServer codes deliberately &mdash; <strong>no</strong> auto-instrumentation is added.</p>
901+
<p>Type: <span class="keyword">boolean</span> Default: <span class="this_value">false</span></p>
902+
<pre class="prettyprint lang-java code"><code class="code">ConfigurationProperties.otelTracesEnabled(boolean enabled)</code></pre>
903+
<pre class="code" style="padding: 2px;"><code class="code">-Dmockserver.otelTracesEnabled=... &nbsp; MOCKSERVER_OTEL_TRACES_ENABLED=...</code></pre>
904+
<p><strong>OTLP endpoint (shared)</strong> &mdash; the collector base URL (e.g. <span class="inline_code">http://localhost:4318</span>); the <span class="inline_code">/v1/metrics</span> and <span class="inline_code">/v1/traces</span> paths are appended per signal. Empty uses the OTLP default (<span class="inline_code">http://localhost:4318</span>).</p>
905+
<pre class="prettyprint lang-java code"><code class="code">ConfigurationProperties.otelEndpoint(String baseUrl)</code></pre>
906+
<pre class="code" style="padding: 2px;"><code class="code">-Dmockserver.otelEndpoint=... &nbsp; MOCKSERVER_OTEL_ENDPOINT=...</code></pre>
907+
<p>Example (both signals to a collector):</p>
908+
<pre class="code" style="padding: 2px;"><code class="code">-Dmockserver.otelMetricsEnabled="true" -Dmockserver.otelTracesEnabled="true" -Dmockserver.otelEndpoint="http://otel-collector:4318"</code></pre>
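<p>The same configuration can presumably be applied from Java through the <span class="inline_code">ConfigurationProperties</span> setters added in this commit. A minimal sketch, assuming the setters are called before MockServer is started; the class name, collector host and 30-second interval are illustrative values only:</p>
<pre class="prettyprint lang-java code"><code class="code">import org.mockserver.configuration.ConfigurationProperties;

public class OtelExportSetup {
    public static void main(String[] args) {
        // Enable both signals; each is off by default.
        ConfigurationProperties.otelMetricsEnabled(true);
        ConfigurationProperties.otelTracesEnabled(true);
        // Base collector URL; /v1/metrics and /v1/traces are appended per signal.
        ConfigurationProperties.otelEndpoint("http://otel-collector:4318");
        // Export metrics every 30 seconds instead of the default 60.
        ConfigurationProperties.otelMetricsExportIntervalSeconds(30);
        // Start MockServer after this point so the exporters see these settings.
    }
}</code></pre>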
909+
</div>
910+
891911
<button id="button_configuration_stream_idle_timeout_seconds" class="accordion title"><strong>Streaming Response Idle Timeout</strong></button>
892912
<div class="panel title">
893913
<p>The maximum time in seconds a streaming response connection may be idle (no chunk received from the upstream server) before MockServer closes it and logs the captured portion as truncated. This replaces the fixed global socket timeout for streaming responses, which would otherwise terminate long-lived LLM completions. The timeout resets on every chunk received, so a slow-but-active stream is never cut off prematurely.</p>

mockserver/mockserver-core/pom.xml

Lines changed: 39 additions & 0 deletions
@@ -287,6 +287,45 @@
287287
<artifactId>prometheus-metrics-model</artifactId>
288288
</dependency>
289289

290+
<!-- opentelemetry (optional metrics and GenAI span export; off by default).
291+
OTLP HTTP/protobuf via the JDK HttpClient sender — okhttp sender excluded
292+
to avoid pulling okhttp + kotlin-stdlib into the shaded jar. -->
293+
<dependency>
294+
<groupId>io.opentelemetry</groupId>
295+
<artifactId>opentelemetry-api</artifactId>
296+
</dependency>
297+
<dependency>
298+
<groupId>io.opentelemetry</groupId>
299+
<artifactId>opentelemetry-sdk-common</artifactId>
300+
</dependency>
301+
<dependency>
302+
<groupId>io.opentelemetry</groupId>
303+
<artifactId>opentelemetry-sdk-metrics</artifactId>
304+
</dependency>
305+
<dependency>
306+
<groupId>io.opentelemetry</groupId>
307+
<artifactId>opentelemetry-sdk-trace</artifactId>
308+
</dependency>
309+
<dependency>
310+
<groupId>io.opentelemetry</groupId>
311+
<artifactId>opentelemetry-exporter-otlp</artifactId>
312+
<exclusions>
313+
<exclusion>
314+
<groupId>io.opentelemetry</groupId>
315+
<artifactId>opentelemetry-exporter-sender-okhttp</artifactId>
316+
</exclusion>
317+
</exclusions>
318+
</dependency>
319+
<dependency>
320+
<groupId>io.opentelemetry</groupId>
321+
<artifactId>opentelemetry-exporter-sender-jdk</artifactId>
322+
</dependency>
323+
<dependency>
324+
<groupId>io.opentelemetry</groupId>
325+
<artifactId>opentelemetry-sdk-testing</artifactId>
326+
<scope>test</scope>
327+
</dependency>
328+
290329
<!-- test -->
291330
<dependency>
292331
<groupId>junit</groupId>

mockserver/mockserver-core/src/main/java/org/mockserver/configuration/ConfigurationProperties.java

Lines changed: 58 additions & 0 deletions
@@ -93,6 +93,10 @@ public class ConfigurationProperties {
9393
private static final String MOCKSERVER_LLM_REQUEST_TIMEOUT_MILLIS = "mockserver.llmRequestTimeoutMillis";
9494
private static final String MOCKSERVER_FIXTURE_BODY_REDACT_FIELDS = "mockserver.fixtureBodyRedactFields";
9595
private static final String MOCKSERVER_LLM_VCR_STRICT = "mockserver.llmVcrStrict";
96+
private static final String MOCKSERVER_OTEL_METRICS_ENABLED = "mockserver.otelMetricsEnabled";
97+
private static final String MOCKSERVER_OTEL_TRACES_ENABLED = "mockserver.otelTracesEnabled";
98+
private static final String MOCKSERVER_OTEL_ENDPOINT = "mockserver.otelEndpoint";
99+
private static final String MOCKSERVER_OTEL_METRICS_EXPORT_INTERVAL_SECONDS = "mockserver.otelMetricsExportIntervalSeconds";
96100
private static final String MOCKSERVER_USE_SEMICOLON_AS_QUERY_PARAMETER_SEPARATOR = "mockserver.useSemicolonAsQueryParameterSeparator";
97101
private static final String MOCKSERVER_ASSUME_ALL_REQUESTS_ARE_HTTP = "mockserver.assumeAllRequestsAreHttp";
98102
private static final String MOCKSERVER_HTTP2_ENABLED = "mockserver.http2Enabled";
@@ -1068,6 +1072,60 @@ public static void llmVcrStrict(boolean strict) {
10681072
setProperty(MOCKSERVER_LLM_VCR_STRICT, "" + strict);
10691073
}
10701074

1075+
/**
1076+
* When true, MockServer's explicitly-defined metrics (the same gauges exposed
1077+
* for Prometheus) are also exported via OpenTelemetry OTLP. Off by default.
1078+
* No spans or auto-instrumentation are added — metrics only.
1079+
*/
1080+
public static boolean otelMetricsEnabled() {
1081+
return Boolean.parseBoolean(readPropertyHierarchically(PROPERTIES, MOCKSERVER_OTEL_METRICS_ENABLED, "MOCKSERVER_OTEL_METRICS_ENABLED", "" + false));
1082+
}
1083+
1084+
public static void otelMetricsEnabled(boolean enabled) {
1085+
setProperty(MOCKSERVER_OTEL_METRICS_ENABLED, "" + enabled);
1086+
}
1087+
1088+
/**
1089+
* When true, MockServer emits explicit GenAI semantic-convention spans for LLM
1090+
* traffic it serves (one span per completion, carrying provider, model, token
1091+
* usage and finish reason) via OpenTelemetry OTLP. Off by default. These are
1092+
* spans MockServer codes deliberately — no auto-instrumentation is added.
1093+
*/
1094+
public static boolean otelTracesEnabled() {
1095+
return Boolean.parseBoolean(readPropertyHierarchically(PROPERTIES, MOCKSERVER_OTEL_TRACES_ENABLED, "MOCKSERVER_OTEL_TRACES_ENABLED", "" + false));
1096+
}
1097+
1098+
public static void otelTracesEnabled(boolean enabled) {
1099+
setProperty(MOCKSERVER_OTEL_TRACES_ENABLED, "" + enabled);
1100+
}
1101+
1102+
/**
1103+
* Base OTLP HTTP endpoint for the collector (e.g. {@code http://localhost:4318}).
1104+
* The {@code /v1/metrics} and {@code /v1/traces} paths are appended per signal.
1105+
* Empty uses the OTLP exporter defaults ({@code http://localhost:4318}). A value
1106+
* that already ends in {@code /v1/metrics} or {@code /v1/traces} is accepted and
1107+
* normalised to the base.
1108+
*/
1109+
public static String otelEndpoint() {
1110+
return readPropertyHierarchically(PROPERTIES, MOCKSERVER_OTEL_ENDPOINT, "MOCKSERVER_OTEL_ENDPOINT", "");
1111+
}
1112+
1113+
public static void otelEndpoint(String endpoint) {
1114+
setProperty(MOCKSERVER_OTEL_ENDPOINT, endpoint);
1115+
}
1116+
1117+
/**
1118+
* How often (seconds) OTel metrics are exported. Default 60; values below 1 are clamped to 1.
1119+
*/
1120+
public static long otelMetricsExportIntervalSeconds() {
1121+
// clamp to >= 1s; a zero/negative interval would make PeriodicMetricReader throw
1122+
return Math.max(1L, readLongProperty(MOCKSERVER_OTEL_METRICS_EXPORT_INTERVAL_SECONDS, "MOCKSERVER_OTEL_METRICS_EXPORT_INTERVAL_SECONDS", 60L));
1123+
}
1124+
1125+
public static void otelMetricsExportIntervalSeconds(long seconds) {
1126+
setProperty(MOCKSERVER_OTEL_METRICS_EXPORT_INTERVAL_SECONDS, "" + seconds);
1127+
}
1128+
10711129
public static long regexMatchingTimeoutMillis() {
10721130
return readLongProperty(MOCKSERVER_REGEX_MATCHING_TIMEOUT_MILLIS, "MOCKSERVER_REGEX_MATCHING_TIMEOUT_MILLIS", 5000L);
10731131
}
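
The getters added above share one pattern: read a value by property name or environment-variable name, fall back to a default, then parse it. A dependency-free sketch of that pattern follows. The lookup order (system property, then environment variable, then default) and the fallback to 60 on an unparseable number are assumptions made for illustration; `readPropertyHierarchically` and `readLongProperty` may behave differently, and the class and method names here are hypothetical. The clamp to at least one second mirrors `otelMetricsExportIntervalSeconds()`.

```java
import java.util.Map;

/** Illustrative only: property lookup with a default, boolean parsing, and interval clamping. */
final class PropertyLookupSketch {

    /** Assumed order: system property, then environment variable, then the default. */
    static String read(Map<String, String> systemProps, Map<String, String> env,
                       String propertyName, String envName, String defaultValue) {
        String value = systemProps.get(propertyName);
        if (value == null || value.isEmpty()) {
            value = env.get(envName);
        }
        return (value == null || value.isEmpty()) ? defaultValue : value;
    }

    /** Boolean.parseBoolean treats anything other than "true" (ignoring case) as false. */
    static boolean metricsEnabled(Map<String, String> systemProps, Map<String, String> env) {
        return Boolean.parseBoolean(read(systemProps, env,
            "mockserver.otelMetricsEnabled", "MOCKSERVER_OTEL_METRICS_ENABLED", "false"));
    }

    /** Default 60 seconds, never less than 1 second. */
    static long exportIntervalSeconds(Map<String, String> systemProps, Map<String, String> env) {
        long parsed;
        try {
            parsed = Long.parseLong(read(systemProps, env,
                "mockserver.otelMetricsExportIntervalSeconds",
                "MOCKSERVER_OTEL_METRICS_EXPORT_INTERVAL_SECONDS", "60"));
        } catch (NumberFormatException e) {
            parsed = 60L; // assumption: an unparseable value falls back to the default
        }
        return Math.max(1L, parsed);
    }
}
```

Clamping in the getter keeps a zero or negative setting from ever reaching the periodic reader, which fits the fail-soft goal of never letting a configuration mistake stop the server.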
