[Envoy AI Gateway](https://aigateway.envoyproxy.io/) is a gateway/proxy for AI/LLM API traffic
(OpenAI, Anthropic, AWS Bedrock, Azure OpenAI, Google Gemini, etc.) built on top of Envoy Proxy.
It natively emits GenAI metrics following
[OpenTelemetry GenAI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/),
and also emits MCP (Model Context Protocol) metrics and access logs via OTLP.

SkyWalking receives OTLP metrics and logs directly on its gRPC port (11800) — no OpenTelemetry
Collector is needed between the AI Gateway and SkyWalking OAP.
[Envoy AI Gateway getting started](https://aigateway.envoyproxy.io/docs/getting-started/) for installation.

### Data flow
1. Envoy AI Gateway processes LLM API requests and MCP requests, recording GenAI metrics and MCP metrics.
2. The AI Gateway pushes metrics and access logs via OTLP gRPC to SkyWalking OAP.
3. SkyWalking OAP parses metrics with [MAL](../../concepts-and-designs/mal.md) rules and access logs
   with [LAL](../../concepts-and-designs/lal.md) rules.

Configure the AI Gateway to push OTLP to SkyWalking by setting these environment variables:

| Env Var | Value | Purpose |
| --- | --- | --- |
| `OTEL_SERVICE_NAME` | Per-deployment gateway name (e.g., `my-ai-gateway`) | SkyWalking service name |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | `http://skywalking-oap:11800` | SkyWalking OAP gRPC receiver |
| `OTEL_EXPORTER_OTLP_PROTOCOL` | `grpc` | OTLP transport |
| `OTEL_METRICS_EXPORTER` | `otlp` | Enable OTLP metrics push |
| `OTEL_LOGS_EXPORTER` | `otlp` | Enable OTLP access log push |
| `OTEL_RESOURCE_ATTRIBUTES` | See below | Routing + instance + layer |

**Required resource attributes** (in `OTEL_RESOURCE_ATTRIBUTES`):
- `job_name=envoy-ai-gateway` — Fixed routing tag for MAL/LAL rules. Same for all AI Gateway deployments.
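
On Kubernetes, these variables can be set on the AI Gateway container spec. A minimal sketch, assuming a Deployment whose container runs the gateway (the container layout and the `my-ai-gateway` name are illustrative; the `OTEL_RESOURCE_ATTRIBUTES` value below shows only the `job_name` attribute — append the other required attributes from this section to the same comma-separated string):

```yaml
# Illustrative env entries for the AI Gateway container; adjust names to your deployment.
env:
  - name: OTEL_SERVICE_NAME
    value: my-ai-gateway                # per-deployment SkyWalking service name
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: http://skywalking-oap:11800  # SkyWalking OAP gRPC receiver
  - name: OTEL_EXPORTER_OTLP_PROTOCOL
    value: grpc
  - name: OTEL_METRICS_EXPORTER
    value: otlp                         # enable OTLP metrics push
  - name: OTEL_LOGS_EXPORTER
    value: otlp                         # enable OTLP access log push
  - name: OTEL_RESOURCE_ATTRIBUTES
    value: job_name=envoy-ai-gateway    # plus the other required attributes listed above
```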

#### Service Metrics

| Monitoring Panel | Unit | Metric Name | Description |
| --- | --- | --- | --- |
| Request CPM | calls/min | meter_envoy_ai_gw_request_cpm | Requests per minute |
| Request Latency Avg | ms | meter_envoy_ai_gw_request_latency_avg | Average request duration |
| Request Latency Percentile | ms | meter_envoy_ai_gw_request_latency_percentile | P50/P75/P90/P95/P99 |
| Input Token Rate | tokens/min | meter_envoy_ai_gw_input_token_rate | Input (prompt) tokens per minute |
| Output Token Rate | tokens/min | meter_envoy_ai_gw_output_token_rate | Output (completion) tokens per minute |
| TTFT Avg | ms | meter_envoy_ai_gw_ttft_avg | Time to First Token (streaming only) |
| TTFT Percentile | ms | meter_envoy_ai_gw_ttft_percentile | P50/P75/P90/P95/P99 TTFT |
| TPOT Avg | ms | meter_envoy_ai_gw_tpot_avg | Time Per Output Token (streaming only) |
| TPOT Percentile | ms | meter_envoy_ai_gw_tpot_percentile | P50/P75/P90/P95/P99 TPOT |

#### Provider Breakdown Metrics

| Monitoring Panel | Unit | Metric Name | Description |
| --- | --- | --- | --- |
| Provider Request CPM | calls/min | meter_envoy_ai_gw_provider_request_cpm | Requests by provider |
| Provider Token Rate | tokens/min | meter_envoy_ai_gw_provider_token_rate | Token rate by provider |
| Provider Latency Avg | ms | meter_envoy_ai_gw_provider_latency_avg | Latency by provider |

#### Model Breakdown Metrics

| Monitoring Panel | Unit | Metric Name | Description |
| --- | --- | --- | --- |
| Model Request CPM | calls/min | meter_envoy_ai_gw_model_request_cpm | Requests by model |
| Model Token Rate | tokens/min | meter_envoy_ai_gw_model_token_rate | Token rate by model |
| Model Latency Avg | ms | meter_envoy_ai_gw_model_latency_avg | Latency by model |
| Model TTFT Avg | ms | meter_envoy_ai_gw_model_ttft_avg | TTFT by model |
| Model TPOT Avg | ms | meter_envoy_ai_gw_model_tpot_avg | TPOT by model |

#### Instance Metrics

All service-level metrics are also available per instance (pod) with `meter_envoy_ai_gw_instance_` prefix,
including per-provider and per-model breakdowns.

### MCP Metrics

When the AI Gateway is configured with MCP (Model Context Protocol) routes, SkyWalking collects
MCP-specific metrics. These appear in the **MCP** tab on the service and instance dashboards.

#### MCP Service Metrics

| Monitoring Panel | Unit | Metric Name | Description |
| --- | --- | --- | --- |
| MCP Request CPM | calls/min | meter_envoy_ai_gw_mcp_request_cpm | MCP requests per minute |
| MCP Request Latency Avg | ms | meter_envoy_ai_gw_mcp_request_latency_avg | Average MCP request duration |
| MCP Request Latency Percentile | ms | meter_envoy_ai_gw_mcp_request_latency_percentile | P50/P75/P90/P95/P99 |
| MCP Method CPM | calls/min | meter_envoy_ai_gw_mcp_method_cpm | Requests by MCP method (initialize, tools/list, tools/call, etc.) |
| MCP Error CPM | calls/min | meter_envoy_ai_gw_mcp_error_cpm | MCP error requests per minute |
| MCP Initialization Latency Avg | ms | meter_envoy_ai_gw_mcp_initialization_latency_avg | Average MCP session initialization time |
| MCP Initialization Latency Percentile | ms | meter_envoy_ai_gw_mcp_initialization_latency_percentile | P50/P75/P90/P95/P99 |
| MCP Capabilities CPM | calls/min | meter_envoy_ai_gw_mcp_capabilities_cpm | Capabilities negotiated by type |

#### MCP Backend Breakdown Metrics

| Monitoring Panel | Unit | Metric Name | Description |
| --- | --- | --- | --- |
| Backend Request CPM | calls/min | meter_envoy_ai_gw_mcp_backend_request_cpm | Requests by MCP backend |
| Backend Latency Avg | ms | meter_envoy_ai_gw_mcp_backend_request_latency_avg | Latency by MCP backend |
| Backend Method CPM | calls/min | meter_envoy_ai_gw_mcp_backend_method_cpm | Requests by backend and method |
| Backend Error CPM | calls/min | meter_envoy_ai_gw_mcp_backend_error_cpm | Errors by MCP backend |
| Backend Init Latency Avg | ms | meter_envoy_ai_gw_mcp_backend_initialization_latency_avg | Init latency by backend |

#### MCP Instance Metrics

All MCP service-level metrics are also available per instance with `meter_envoy_ai_gw_mcp_instance_` prefix.

### Access Log Sampling

Access logs are tagged with `ai_route_type` (`llm` or `mcp`) for filtering in the log query UI.
The `ai_route_type` tag is searchable by default.

**LLM route logs:**
- **Error responses** (HTTP status >= 400) — always persisted.
- **Upstream failures** — always persisted.
- **High token cost** (>= 10,000 total tokens) — persisted for cost anomaly detection.
- Normal successful responses with low token counts are dropped.

**MCP route logs:**
- **Error responses** (HTTP status >= 400) — always persisted.
- Normal MCP requests are dropped (MCP observability is covered by metrics).

The sampling policy can be adjusted in `lal/envoy-ai-gateway.yaml`.