Adaptive Sampling Support by majanjua-amzn · Pull Request #576 · aws-observability/aws-otel-python-instrumentation

majanjua-amzn · 2026-01-13T01:18:21Z

Background

AWS X-Ray sampling rules now support an adaptive sampling configuration. As part of that effort, the ADOT SDKs must be updated to ingest the new fields and to make the functional changes needed to [1] boost the sampling rate based on detected anomalies and [2] detect and capture anomalies based on a configuration local to the SDK.

The same changes have been made for the ADOT Java SDK here in the upstream OTel Contrib repo: open-telemetry/opentelemetry-java-contrib#2147

Overall, the goal of this PR is to meet the same needs as the one in ADOT Java with some additional improvements:

- Appropriately generate anomaly statistics, send them through GetSamplingTargets, and adjust sampling behaviour according to the response (sampling boost)
- Read configuration local to the SDK to allow users to set the definition of anomalies in their system (status code, operation, and/or latency)
- Capture anomalies based on the local configuration
- [NEW/REQUIRED] General sampling statistics improvements: Do not call GetSamplingTargets if there are no sampling or anomaly statistics
- [NEW/REQUIRED] Added a new attribute aws.xray.adaptive_sampling_configured to identify spans that were generated from an SDK with a local adaptive sampling configuration
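
The statistics improvement listed above (not calling GetSamplingTargets when there is nothing to report) could look roughly like the sketch below. This is an illustration only: `RuleStatistics`, its fields, and `should_call_get_sampling_targets` are hypothetical names, not the identifiers used in this PR.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RuleStatistics:
    """Hypothetical per-rule counters collected since the last report."""

    rule_name: str
    request_count: int = 0
    sampled_count: int = 0
    anomaly_count: int = 0

    def is_empty(self) -> bool:
        # Nothing was matched, sampled, or flagged for this rule.
        return (
            self.request_count == 0
            and self.sampled_count == 0
            and self.anomaly_count == 0
        )


def should_call_get_sampling_targets(stats: Iterable[RuleStatistics]) -> bool:
    """Return True only if at least one rule has something to report."""
    return any(not s.is_empty() for s in stats)
```

A poller built this way would check `should_call_get_sampling_targets(...)` at each reporting interval and simply skip the API call when it returns False.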

Changes

- Linked the sampler, processor, and exporter: when a span ends, the processor forwards it to the adaptive sampling code, which decides whether to capture it as an anomaly and, if so, does so through the exporter
- Implemented anomaly detection logic (same as in ADOT Java)
- Implemented local SDK configuration parsing logic (YAML)
- Implemented a cache of trace IDs so the same trace is not counted twice in the anomaly statistics
- Fixed sampling statistics to count only the root span of a trace, completely eliminating the need to report statistics for downstream services
- Implemented GetSamplingTargets skipping logic when there are no sampling statistics or sampling boost statistics
- Unit tests and related files
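
To illustrate the trace ID cache mentioned above, here is a minimal sketch of a bounded, thread-safe "seen" set. The class name `SeenTraceCache`, the `max_size` default, and the least-recently-seen eviction policy are all assumptions made for this example; the PR's actual cache may be implemented differently.

```python
import threading
from collections import OrderedDict


class SeenTraceCache:
    """Hypothetical bounded cache that remembers recently seen trace IDs."""

    def __init__(self, max_size: int = 1000) -> None:
        self._max_size = max_size
        self._seen = OrderedDict()  # trace_id -> None, ordered by recency
        self._lock = threading.Lock()

    def first_time(self, trace_id: int) -> bool:
        """Return True if trace_id has not been seen yet, and record it."""
        with self._lock:
            if trace_id in self._seen:
                # Refresh recency so active traces are evicted last.
                self._seen.move_to_end(trace_id)
                return False
            self._seen[trace_id] = None
            if len(self._seen) > self._max_size:
                # Drop the least recently seen trace ID.
                self._seen.popitem(last=False)
            return True
```

Anomaly counters would then be incremented only when `first_time(trace_id)` returns True, so several anomalous spans in one trace contribute a single count.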

Testing

Unit tests for each component (maintaining and increasing the code coverage)
Rigorous manual E2E tests using 3 services, A (root) -> B -> C (generates anomalies):
- Tested basic anomaly detection without any local configuration, where service C generates a 500 response: Appropriately detects and captures anomalies + responds to boost sent by server
- Tested with local configuration with errorCodeRegex: "^500|501$" where service C generates a 500 response: Appropriately detects and captures anomalies + responds to boost sent by server
- Tested with local configuration with errorCodeRegex: "^500|501$", operations: ["GET /status"], where service C generates a 500 response from /status/c/500: Appropriately detects and captures anomalies + responds to boost sent by server
- Tested with local configuration with errorCodeRegex: "^500|501$", highLatencyMs: 2000, where service C generates a 3-second span with a 200 or 500 response: Appropriately not treated as an anomaly when 200, and as an anomaly when 500
- Tested local configuration with highLatencyMs: 2000, where service C generates a 3-second span with a 200 response: Appropriately treated as an anomaly
- Tested that the anomaly counts, request and total counts, etc., are all correct relative to the number of API invocations done in the last 10 seconds
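
The outcomes of the manual tests above are consistent with a condition whose configured fields must all match. The sketch below encodes that reading. It is an assumption drawn from the listed test results, not the PR's code: `AnomalyCondition`, `is_anomaly`, exact-string operation matching, and the use of `re.search` are all hypothetical choices, and an empty condition is treated as matching nothing.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AnomalyCondition:
    """Hypothetical in-memory form of one local anomaly condition."""

    error_code_regex: Optional[str] = None
    operations: List[str] = field(default_factory=list)
    high_latency_ms: Optional[int] = None


def is_anomaly(cond: AnomalyCondition, status_code: int,
               operation: str, latency_ms: float) -> bool:
    """Return True when every configured field of the condition matches."""
    if (cond.error_code_regex is None and not cond.operations
            and cond.high_latency_ms is None):
        return False  # assumption: an empty condition matches nothing
    if cond.operations and operation not in cond.operations:
        return False
    if cond.error_code_regex is not None and not re.search(
            cond.error_code_regex, str(status_code)):
        return False
    if cond.high_latency_ms is not None and latency_ms < cond.high_latency_ms:
        return False
    return True
```

Under this reading, `errorCodeRegex` combined with `highLatencyMs: 2000` flags a 3-second span only when the status code also matches, while `highLatencyMs: 2000` alone flags any 3-second span.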

Links

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

### Background

Recently, a new field was added to the X-Ray GetSamplingRules API that was not accounted for in the AWS X-Ray Remote Sampler implementation done in ADOT Python. As a result, enabling this new field would cause a failure and cease the parsing of any other rules in a given API response.

Example: Received 10 rules from the API, third of which has the SamplingRateBoost field. The SDK will successfully parse the first two, fail on the third, then stop there. As such, the SDK will only have 2/10 of the sampling rules and will not be able to effectively make sampling decisions based on the sampling rules set by the user. Any unmatched spans will use the _FallbackSampler.

### Changes

- Add usage of `kwargs` in X-Ray sampling API related objects, e.g. SamplingRule, SamplingTarget, etc.
- Add unit tests proving additional fields do not cause errors.

### Testing

- Unit tests
- Tested in depth as part of #576, which this change was a part of but is now separated out to get it in more quickly

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
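
As a rough illustration of the `kwargs` change described above, the sketch below shows how accepting `**kwargs` lets a rule object tolerate fields it does not know about. `SamplingRuleSketch` and `parse_rules` are hypothetical names, the constructor lists only a few of the real rule fields, and the value given for `SamplingRateBoost` is a placeholder rather than the field's actual shape.

```python
from typing import Any, Dict, List


class SamplingRuleSketch:
    """Hypothetical, trimmed-down stand-in for a sampling rule object."""

    def __init__(self, RuleName=None, Priority=None, FixedRate=None,
                 ReservoirSize=None, **kwargs: Any) -> None:
        self.RuleName = RuleName
        self.Priority = Priority
        self.FixedRate = FixedRate
        self.ReservoirSize = ReservoirSize
        # Fields this version does not know about land in kwargs and are
        # ignored instead of raising TypeError.


def parse_rules(records: List[Dict[str, Any]]) -> List[SamplingRuleSketch]:
    """Build one rule object per API record, tolerating extra fields."""
    return [SamplingRuleSketch(**record) for record in records]


records = [
    {"RuleName": "a", "Priority": 1, "FixedRate": 0.1, "ReservoirSize": 1},
    {"RuleName": "b", "Priority": 2, "FixedRate": 0.2, "ReservoirSize": 1},
    # Placeholder value: the real shape of SamplingRateBoost is not shown here.
    {"RuleName": "c", "Priority": 3, "FixedRate": 0.3, "ReservoirSize": 1,
     "SamplingRateBoost": {}},
]
rules = parse_rules(records)
```

Without `**kwargs`, constructing the third rule would raise `TypeError` for the unexpected keyword argument, which matches the failure mode described in the Background.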

wangzlei

LGTM

majanjua-amzn self-assigned this Jan 13, 2026

majanjua-amzn requested a review from a team as a code owner January 13, 2026 01:18

majanjua-amzn added the bug, enhancement, and python labels Jan 13, 2026

majanjua-amzn force-pushed the adaptive-sampling branch from 573ecdf to 11a96d7 Compare January 13, 2026 16:49

majanjua-amzn requested a review from wangzlei January 13, 2026 16:49

majanjua-amzn force-pushed the adaptive-sampling branch 8 times, most recently from d9a38a2 to a39c57b Compare January 14, 2026 01:22

majanjua-amzn mentioned this pull request Jan 14, 2026

Fix: Support new fields in X-Ray API responses #577

Merged

majanjua-amzn force-pushed the adaptive-sampling branch from a39c57b to df9a816 Compare January 19, 2026 22:41

majanjua-amzn marked this pull request as draft January 19, 2026 22:45

majanjua-amzn force-pushed the adaptive-sampling branch 4 times, most recently from 9d2eba7 to 930a1ec Compare January 20, 2026 00:49

majanjua-amzn marked this pull request as ready for review January 20, 2026 18:13

majanjua-amzn removed the bug label Jan 21, 2026

wangzlei reviewed Jan 21, 2026

Comment thread ...lemetry-distro/src/amazon/opentelemetry/distro/sampler/_aws_xray_adaptive_sampling_config.py

wangzlei reviewed Jan 21, 2026

Comment thread aws-opentelemetry-distro/src/amazon/opentelemetry/distro/aws_span_metrics_processor.py

wangzlei reviewed Jan 22, 2026

Comment thread aws-opentelemetry-distro/src/amazon/opentelemetry/distro/sampler/_rule_cache.py

Comment thread aws-opentelemetry-distro/src/amazon/opentelemetry/distro/aws_span_metrics_processor.py

majanjua-amzn force-pushed the adaptive-sampling branch from cdf83b8 to a6d3d22 Compare January 24, 2026 01:07

majanjua-amzn force-pushed the adaptive-sampling branch 2 times, most recently from 6bb1e54 to cc9c0b8 Compare January 24, 2026 01:28

wangzlei approved these changes Jan 26, 2026

majanjua-amzn added 2 commits January 27, 2026 14:50

Adaptive Sampling Support

88c6a3e

Ensure thread safety + parsing errors don't crash instrumentation

45a83e4

majanjua-amzn force-pushed the adaptive-sampling branch from 1ba476e to 45a83e4 Compare January 27, 2026 22:51

majanjua-amzn enabled auto-merge (squash) January 27, 2026 23:15

majanjua-amzn disabled auto-merge January 27, 2026 23:17

majanjua-amzn merged commit 2b4d0ac into main Jan 27, 2026
25 of 27 checks passed

majanjua-amzn deleted the adaptive-sampling branch January 27, 2026 23:25

majanjua-amzn mentioned this pull request Jan 28, 2026

[Python] Add API for testing downstream error code aws-observability/aws-application-signals-test-framework#545

Merged
