Skip to content

Commit e3a2ff7

Browse files
vastinADOT Patch workflow
andauthored
fix(serviceevents): align route/operation labels with Application Signals (#784)
## Description Service Events emitted operation/route labels that diverged from CloudWatch Application Signals for two cases. This PR aligns Service Events **to** App Signals (one-way; App Signals behavior is untouched). ### 1. Django leading slash Django routes were prepended with a leading slash (`GET /billings/<int:owner_id>/...`), whereas App Signals emits them slash-less (`GET billings/<int:owner_id>/...`). The slash-prepend in `django_instrumentation` is removed so matched Django routes match App Signals. ### 2. Unmatched-route sentinel Unmatched requests (404s, scanner/bot traffic) were collapsed to a `<unmatched>` sentinel, while App Signals collapses an unresolved span name to its **first path segment**. The sentinel is replaced with a shared `unmatched_route_label()` that delegates to App Signals' `extract_api_path_value`, so both produce the same first-segment label (`/wp-admin/setup.php` → `/wp-admin`) across Flask, FastAPI, and Django. This keeps the cardinality bound on scanner traffic while matching App Signals. ### Docs Because Django routes are now recorded slash-less, `config.py` documents that Django endpoint include/exclude patterns and latency thresholds must omit the leading slash (e.g. `GET api/*`), while Flask/FastAPI keep it (e.g. `GET /api/*`). ## Testing - `pytest` serviceevents suite: 714 passing - `black --check`, `isort --check`, `codespell`: clean - Contract-test helpers updated (`route_label`/`operation_label`; Django overrides return slash-less). ## Note App Signals' extractor (`_aws_span_processing_util.extract_api_path_value`) is **not** modified — Service Events only delegates to it. --------- Co-authored-by: ADOT Patch workflow <adot-patch-workflow@github.com>
1 parent e9750eb commit e3a2ff7

11 files changed

Lines changed: 165 additions & 84 deletions

File tree

aws-opentelemetry-distro/src/amazon/opentelemetry/distro/serviceevents/config.py

Lines changed: 23 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -186,6 +186,10 @@ class ServiceEventsConfig:
186186
# Per-endpoint latency thresholds for latency-triggered incident snapshots
187187
# Format: "METHOD /route:threshold_ms,METHOD /route:threshold_ms,..."
188188
# Example: "POST /api/checkout:500,GET /api/health:50,GET /api/reports:5000"
189+
# Django note: routes are matched WITHOUT a leading slash (e.g. "api/checkout"), to
190+
# mirror Application Signals. Write Django patterns slash-less, e.g. "POST api/checkout:500".
191+
# Exception: unmatched/scanner requests (no route) are recorded with a leading-slash
192+
# first-segment label (e.g. "/wp-admin"), so thresholds targeting them keep the slash.
189193
latency_thresholds: List[str] = field(default_factory=list) # OTEL_AWS_SERVICE_EVENTS_LATENCY_THRESHOLDS
190194

191195
# Incident Snapshot Request Payload Capture Settings. Hardcoded off — no longer a
@@ -195,6 +199,12 @@ class ServiceEventsConfig:
195199
# Endpoint Filtering - glob patterns in format "METHOD /route" or "* /route" or "METHOD *"
196200
# If include_patterns is set, only track matching endpoints; then exclude_patterns removes from that set
197201
# Example: "GET /api/*,POST /api/*" or "* /health,* /metrics"
202+
# Django note: routes are recorded WITHOUT a leading slash (e.g. "api/users"), to mirror
203+
# Application Signals. Write Django patterns slash-less, e.g. "GET api/*"; Flask/FastAPI
204+
# routes carry the leading slash, so their patterns keep it, e.g. "GET /api/*".
205+
# Exception: unmatched/scanner requests (no route) are recorded with a leading-slash
206+
# first-segment label (e.g. "/wp-admin"), so filters targeting them keep the slash even
207+
# on Django, e.g. "* /wp-admin".
198208
endpoint_include_patterns: List[str] = field(
199209
default_factory=list
200210
) # OTEL_AWS_SERVICE_EVENTS_ENDPOINT_INCLUDE_PATTERNS
@@ -430,6 +440,12 @@ def get_latency_threshold_patterns(self) -> List[Tuple[str, float]]:
430440
- "GET /api/*:100" - any GET to /api/* routes
431441
- "* *:200" - all endpoints (catch-all)
432442
443+
Django note: routes are recorded WITHOUT a leading slash (to mirror Application
444+
Signals), so Django thresholds must omit it, e.g. "GET api/users:500". Flask/FastAPI
445+
routes carry the leading slash, e.g. "GET /api/users:500". Exception: unmatched/scanner
446+
requests (no route) are recorded with a leading-slash first-segment label (e.g.
447+
"/wp-admin"), so thresholds targeting them keep the slash even on Django.
448+
433449
Returns:
434450
List of (pattern, threshold_ms) tuples. Order matters - first match wins.
435451
"""
@@ -481,8 +497,14 @@ def should_track_endpoint(self, route: str, method: str) -> bool:
481497
- "?" matches any single character
482498
- Format: "METHOD /route" (e.g., "GET /api/*", "* /health", "POST /api/users")
483499
500+
Django note: routes are recorded WITHOUT a leading slash (to mirror Application
501+
Signals), so Django patterns must omit it, e.g. "GET api/*". Flask/FastAPI routes
502+
carry the leading slash, e.g. "GET /api/*". Exception: unmatched/scanner requests
503+
(no route) are recorded with a leading-slash first-segment label (e.g. "/wp-admin"),
504+
so patterns targeting them keep the slash even on Django, e.g. "* /wp-admin".
505+
484506
Args:
485-
route: The endpoint route (e.g., "/api/users")
507+
route: The endpoint route (e.g., "/api/users" for Flask/FastAPI, "api/users" for Django)
486508
method: The HTTP method (e.g., "GET", "POST")
487509
488510
Returns:
Lines changed: 28 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -1,19 +1,36 @@
11
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
22
# SPDX-License-Identifier: Apache-2.0
33

4-
"""Shared constants for the framework instrumentations (Flask, FastAPI, Django).
4+
"""Shared route helpers for the framework instrumentations (Flask, FastAPI, Django).
55
66
Keeping these in one module makes cross-framework behavior structural rather than a
77
promise enforced by comments — a single edit here changes every framework at once.
88
"""
99

10-
# Route label for a request that matched no URL rule (unmatched 404s, scanner/bot traffic
11-
# to nonexistent URLs like /wp-admin or /.env). Recording the raw path for these would make
12-
# every probed URL its own metric series — a cardinality explosion. Collapsing to one
13-
# sentinel keeps the 404-rate signal without the cardinality, identically across frameworks.
14-
#
15-
# Note: this value also flows through ServiceEventsConfig.should_track_endpoint(route, ...).
16-
# With the default (empty) endpoint filters every request is tracked, so unmatched requests
17-
# are still recorded under this label. A customer who sets endpoint_include_patterns scopes
18-
# tracking to matching routes only, so unmatched requests are intentionally excluded then.
19-
UNMATCHED_ROUTE = "<unmatched>"
10+
from typing import Optional
11+
12+
from amazon.opentelemetry.distro._aws_span_processing_util import extract_api_path_value
13+
14+
15+
def unmatched_route_label(raw_path: Optional[str]) -> str:
16+
"""Route label for a request that matched no URL rule (unmatched 404s, scanner/bot
17+
traffic to nonexistent URLs like /wp-admin or /.env).
18+
19+
Recording the raw path for these would make every probed URL its own metric series —
20+
a cardinality explosion. Instead, collapse to the first path segment, e.g.
21+
"/wp-admin/setup.php" -> "/wp-admin". This is exactly what Application Signals does
22+
for a span whose name can't be resolved to a route
23+
(_aws_span_processing_util.extract_api_path_value), so ServiceEvents and Application
24+
Signals produce the same operation label for unmatched requests. The first-segment
25+
bound keeps cardinality from a single nonexistent prefix in check while still
26+
distinguishing the common scanner targets.
27+
28+
Note: this value also flows through ServiceEventsConfig.should_track_endpoint(route,
29+
...). With the default (empty) endpoint filters every request is tracked, so unmatched
30+
requests are still recorded (under their first-segment label). A customer who sets
31+
endpoint_include_patterns scopes tracking to matching routes only, so unmatched
32+
requests are excluded unless a pattern matches the first-segment label.
33+
"""
34+
# extract_api_path_value treats None/"" as "/" and always returns a leading-slash,
35+
# first-segment value, identically across frameworks.
36+
return extract_api_path_value(raw_path or "")

aws-opentelemetry-distro/src/amazon/opentelemetry/distro/serviceevents/instrumentation/django_instrumentation.py

Lines changed: 16 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -17,7 +17,7 @@
1717
import time
1818
from typing import Any, Optional
1919

20-
from amazon.opentelemetry.distro.serviceevents.instrumentation._constants import UNMATCHED_ROUTE
20+
from amazon.opentelemetry.distro.serviceevents.instrumentation._constants import unmatched_route_label
2121
from amazon.opentelemetry.distro.serviceevents.instrumentation.flask_instrumentation import (
2222
_extract_error_from_call_path,
2323
)
@@ -35,8 +35,8 @@
3535
_serviceevents_config = None
3636

3737
# Django only calls process_view after a URL resolves to a view, so an unmatched 404 never
38-
# gets a route stored and _finalize_request falls back to the shared UNMATCHED_ROUTE sentinel
39-
# (see _constants for why the raw path is not used).
38+
# gets a route stored and _finalize_request falls back to the shared first-segment unmatched
39+
# label derived from the raw path (see _constants for the cardinality rationale).
4040

4141

4242
def _get_request_body(request) -> Optional[Any]:
@@ -81,21 +81,24 @@ def _get_request_body(request) -> Optional[Any]:
8181

8282
def _get_route_pattern(request) -> str:
8383
"""
84-
Get the route pattern (e.g. /users/<int:id>/) from the Django request.
84+
Get the route pattern (e.g. users/<int:id>/) from the Django request.
8585
8686
Args:
8787
request: Django HttpRequest object
8888
8989
Returns:
90-
Route pattern string (e.g., "/users/<int:id>/")
90+
Route pattern string (e.g., "users/<int:id>/")
9191
"""
9292
# Try to get the route pattern from resolver_match (available after URL resolution)
9393
if hasattr(request, "resolver_match") and request.resolver_match is not None:
94-
# resolver_match.route is the URL pattern (e.g., "users/<int:id>/")
94+
# resolver_match.route is the URL pattern (e.g., "users/<int:id>/"). Django stores
95+
# it without a leading slash by convention; return it verbatim so the operation
96+
# label matches Application Signals, which derives the same value from span.name
97+
# (the upstream OTel Django instrumentation also leaves it slash-less). Flask and
98+
# FastAPI route patterns already carry a leading slash natively, so all three
99+
# frameworks agree with App Signals without further normalization here.
95100
route = getattr(request.resolver_match, "route", None)
96101
if route:
97-
if not route.startswith("/"):
98-
route = "/" + route
99102
return route
100103

101104
# Fallback to path_info
@@ -159,11 +162,13 @@ def _finalize_request(request, response, exception): # pylint: disable=too-many
159162

160163
# Get route and method from stored values (set in process_view). A missing
161164
# _serviceevents_route means process_view never ran — i.e. the URL matched no
162-
# urlpattern (unmatched 404). Use a single sentinel for those instead of the raw
163-
# path so scanner/bot traffic to nonexistent URLs can't explode metric cardinality.
165+
# urlpattern (unmatched 404). Collapse those to the first path segment instead of
166+
# the raw path so scanner/bot traffic to nonexistent URLs can't explode metric
167+
# cardinality, matching Application Signals' unmatched-route handling.
164168
route = getattr(request, "_serviceevents_route", None)
165169
if route is None:
166-
route = UNMATCHED_ROUTE
170+
raw_path = getattr(request, "path_info", None) or getattr(request, "path", None)
171+
route = unmatched_route_label(raw_path)
167172
method = getattr(request, "_serviceevents_method", request.method)
168173

169174
# Extract error info if error occurred

aws-opentelemetry-distro/src/amazon/opentelemetry/distro/serviceevents/instrumentation/fastapi_instrumentation.py

Lines changed: 7 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -17,7 +17,7 @@
1717
from typing import Any, Dict, Optional
1818
from urllib.parse import parse_qs
1919

20-
from amazon.opentelemetry.distro.serviceevents.instrumentation._constants import UNMATCHED_ROUTE
20+
from amazon.opentelemetry.distro.serviceevents.instrumentation._constants import unmatched_route_label
2121
from amazon.opentelemetry.distro.serviceevents.instrumentation.flask_instrumentation import (
2222
_capture_active_trace_context,
2323
_extract_error_from_call_path,
@@ -98,9 +98,10 @@ def _get_route_pattern(scope) -> str:
9898
if resolved is not None:
9999
return resolved
100100

101-
# No route matched (404 / scanner traffic). Collapse to a single sentinel rather than
102-
# the raw path so probed URLs can't explode metric cardinality. Matches Flask/Django.
103-
return UNMATCHED_ROUTE
101+
# No route matched (404 / scanner traffic). Collapse to the first path segment rather
102+
# than the raw path so probed URLs can't explode metric cardinality. Matches
103+
# Flask/Django and Application Signals' unmatched-route handling.
104+
return unmatched_route_label(scope.get("path"))
104105

105106

106107
def _resolve_route_template(scope) -> Optional[str]:
@@ -304,8 +305,8 @@ async def send_wrapper(message):
304305

305306
# Re-resolve the route now that the app has run. At middleware entry
306307
# scope["route"] is unset, so `route` was whatever _get_route_pattern could
307-
# determine pre-routing: a pre-resolved template, or the <unmatched> sentinel
308-
# when nothing matched. Starlette/FastAPI routing populates scope["route"] in
308+
# determine pre-routing: a pre-resolved template, or the first-segment
309+
# unmatched label when nothing matched. Starlette/FastAPI routing populates scope["route"] in
309310
# place during dispatch, so by here it resolves to the template (/users/{id}).
310311
# Use the template for the exported endpoint telemetry below to avoid a
311312
# per-URL metric cardinality explosion. (The pre-await operation context

aws-opentelemetry-distro/src/amazon/opentelemetry/distro/serviceevents/instrumentation/flask_instrumentation.py

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@
1414
import time
1515
from typing import Any, Dict, Optional
1616

17-
from amazon.opentelemetry.distro.serviceevents.instrumentation._constants import UNMATCHED_ROUTE
17+
from amazon.opentelemetry.distro.serviceevents.instrumentation._constants import unmatched_route_label
1818
from amazon.opentelemetry.distro.serviceevents.instrumentation._safety import never_raises
1919
from amazon.opentelemetry.distro.serviceevents.python_monitor import (
2020
_ServiceEventsMonitorState,
@@ -349,9 +349,9 @@ def _get_route_pattern(request) -> str:
349349
return f"/{request.endpoint}"
350350

351351
# No url_rule and no endpoint means the URL matched no route (404 / scanner traffic).
352-
# Collapse to a single sentinel rather than the raw path so probed URLs can't explode
353-
# metric cardinality.
354-
return UNMATCHED_ROUTE
352+
# Collapse to the first path segment rather than the raw path so probed URLs can't
353+
# explode metric cardinality, matching Application Signals' unmatched-route handling.
354+
return unmatched_route_label(getattr(request, "path", None))
355355

356356

357357
def _extract_error_from_call_path(exception, route, method) -> Optional[Dict]:

aws-opentelemetry-distro/tests/amazon/opentelemetry/distro/serviceevents/instrumentation/test_django_instrumentation.py

Lines changed: 23 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -20,12 +20,12 @@ class TestGetRoutePattern(TestCase):
2020
"""Tests for _get_route_pattern."""
2121

2222
def test_get_route_pattern_from_resolver_match(self):
23-
"""resolver_match.route is returned when available."""
23+
"""resolver_match.route is returned verbatim (slash-less, as Django stores it)."""
2424
request = MagicMock()
2525
request.resolver_match = MagicMock()
2626
request.resolver_match.route = "users/<int:id>"
2727
result = _get_route_pattern(request)
28-
self.assertEqual(result, "/users/<int:id>")
28+
self.assertEqual(result, "users/<int:id>")
2929

3030
def test_get_route_pattern_from_path_info(self):
3131
"""Falls back to path_info when resolver_match is None."""
@@ -44,13 +44,14 @@ def test_get_route_pattern_from_path(self):
4444
result = _get_route_pattern(request)
4545
self.assertEqual(result, "/api/users/42")
4646

47-
def test_get_route_pattern_adds_leading_slash(self):
48-
"""Leading slash is added if route doesn't start with one."""
47+
def test_get_route_pattern_preserves_slashless_route(self):
48+
"""A slash-less Django route is returned as-is (no leading slash added), to
49+
match Application Signals which derives the same value from span.name."""
4950
request = MagicMock()
5051
request.resolver_match = MagicMock()
5152
request.resolver_match.route = "api/orders"
5253
result = _get_route_pattern(request)
53-
self.assertEqual(result, "/api/orders")
54+
self.assertEqual(result, "api/orders")
5455

5556

5657
class TestGetRequestBody(TestCase):
@@ -404,17 +405,17 @@ def test_call_handles_exception(self):
404405

405406
@patch("amazon.opentelemetry.distro.serviceevents.instrumentation.django_instrumentation.set_current_operation")
406407
def test_process_view_sets_operation_context(self, mock_set_operation):
407-
"""process_view calls set_current_operation with a METHOD /route string."""
408+
"""process_view calls set_current_operation with a "METHOD route" string."""
408409
mock_get_response = MagicMock()
409410
middleware = ServiceEventsDjangoMiddleware(mock_get_response)
410411
request = self._make_request()
411412

412413
middleware.process_view(request, MagicMock(), [], {})
413414

414415
mock_set_operation.assert_called_once()
415-
# The operation should be "METHOD /route"
416+
# The operation should be "METHOD route" (Django route is slash-less, matching App Signals)
416417
call_args = mock_set_operation.call_args[0][0]
417-
self.assertEqual(call_args, "GET /api/test")
418+
self.assertEqual(call_args, "GET api/test")
418419

419420
@patch("amazon.opentelemetry.distro.serviceevents.instrumentation.django_instrumentation.set_current_operation")
420421
def test_process_view_begins_investigation(self, mock_set_operation):
@@ -503,7 +504,7 @@ def test_process_view_swallows_trace_capture_failure(self, mock_set_operation):
503504

504505
self.assertIsNone(result)
505506
# Setup block ran before the trace capture failed, so route context was stored.
506-
self.assertEqual(request._serviceevents_route, "/api/test")
507+
self.assertEqual(request._serviceevents_route, "api/test")
507508

508509
@patch(
509510
"amazon.opentelemetry.distro.serviceevents.instrumentation.django_instrumentation.set_current_operation",
@@ -561,8 +562,9 @@ def _make_request(self, method="GET", path="/api/test", route="api/test", view_n
561562
request.META = {}
562563
request._serviceevents_start_time = 1000000000
563564
request._serviceevents_skip = False
564-
# Set serviceevents context attributes (normally set by process_view)
565-
request._serviceevents_route = "/" + route if not route.startswith("/") else route
565+
# Set serviceevents context attributes (normally set by process_view). The route is
566+
# stored verbatim (Django routes are slash-less, matching App Signals).
567+
request._serviceevents_route = route
566568
request._serviceevents_method = method
567569
request._serviceevents_path = path
568570
request._serviceevents_endpoint = view_name
@@ -588,7 +590,7 @@ def test_records_endpoint_metric(self, mock_clear):
588590

589591
mock_ec.record_request.assert_called_once()
590592
call_kwargs = mock_ec.record_request.call_args[1]
591-
self.assertEqual(call_kwargs["route"], "/api/test")
593+
self.assertEqual(call_kwargs["route"], "api/test")
592594
self.assertEqual(call_kwargs["method"], "GET")
593595
self.assertEqual(call_kwargs["status_code"], 200)
594596
self.assertIsNone(call_kwargs["error_info"])
@@ -693,8 +695,9 @@ def test_skips_when_no_start_time(self, mock_clear):
693695
mock_ec.record_request.assert_not_called()
694696

695697
@patch("amazon.opentelemetry.distro.serviceevents.instrumentation.django_instrumentation.clear_current_operation")
696-
def test_unmatched_route_collapses_to_sentinel(self, mock_clear):
697-
"""An unmatched-route 404 (process_view never ran) records the sentinel, not the raw path."""
698+
def test_unmatched_route_collapses_to_first_segment(self, mock_clear):
699+
"""An unmatched-route 404 (process_view never ran) records the first path segment,
700+
not the full raw path — matching Application Signals."""
698701
mock_ec = MagicMock()
699702
django_mod._endpoint_collector = mock_ec
700703

@@ -728,14 +731,14 @@ def test_unmatched_route_collapses_to_sentinel(self, mock_clear):
728731

729732
mock_ec.record_request.assert_called_once()
730733
call_kwargs = mock_ec.record_request.call_args[1]
731-
# Must be the collapsed sentinel, NOT the raw probed path.
732-
self.assertEqual(call_kwargs["route"], "<unmatched>")
733-
self.assertNotIn("wp-admin", call_kwargs["route"])
734+
# Must be the collapsed first segment, NOT the full raw probed path.
735+
self.assertEqual(call_kwargs["route"], "/wp-admin")
736+
self.assertNotIn("setup-config.php", call_kwargs["route"])
734737
self.assertEqual(call_kwargs["status_code"], 404)
735738

736739
@patch("amazon.opentelemetry.distro.serviceevents.instrumentation.django_instrumentation.clear_current_operation")
737-
def test_matched_route_unaffected_by_sentinel(self, mock_clear):
738-
"""A normally-resolved request still records its real route, not the sentinel."""
740+
def test_matched_route_unaffected_by_unmatched_label(self, mock_clear):
741+
"""A normally-resolved request still records its real route, not the unmatched label."""
739742
mock_ec = MagicMock()
740743
django_mod._endpoint_collector = mock_ec
741744

@@ -750,7 +753,7 @@ def test_matched_route_unaffected_by_sentinel(self, mock_clear):
750753
_finalize_request(request, response, None)
751754

752755
call_kwargs = mock_ec.record_request.call_args[1]
753-
self.assertEqual(call_kwargs["route"], "/users/<int:id>")
756+
self.assertEqual(call_kwargs["route"], "users/<int:id>")
754757

755758
@patch("amazon.opentelemetry.distro.serviceevents.instrumentation.django_instrumentation.clear_current_operation")
756759
def test_passes_correct_request_data(self, mock_clear):

0 commit comments

Comments
 (0)