| 1 | +# AWS Durable Execution SDK - OpenTelemetry Plugin |
| 2 | + |
| 3 | +> **Experimental Feature:** This plugin is currently experimental. Functionality may change without notice between releases. It is not recommended for production workloads at this time. |
| 4 | + |
| 5 | +OpenTelemetry instrumentation plugin for the AWS Lambda Durable Execution SDK for Java. It emits distributed traces that span multiple Lambda invocations of a single durable execution by generating deterministic trace and span IDs, so that spans from different invocations are stitched into one coherent trace. |
| 6 | + |
| 7 | +## Features |
| 8 | + |
| 9 | +- **Deterministic Trace IDs**: All invocations of the same durable execution share a single trace, derived from the X-Ray trace header or execution ARN |
| 10 | +- **Span-per-Operation**: Each durable operation (step, wait, map, etc.) gets its own span with accurate timing |
| 11 | +- **Attempt Spans**: Each user function execution (step attempt, child context run) gets a span, including retries |
| 12 | +- **Log Correlation**: Injects `trace_id` and `span_id` into SLF4J MDC for end-to-end observability |
| 13 | +- **Self-Contained Setup**: No manual TracerProvider configuration required beyond the exporter |
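To illustrate the deterministic trace ID idea from the feature list, here is a minimal sketch that derives a stable 32-character hex ID from an execution ARN. It is not the plugin's actual algorithm: the class name `DeterministicTraceIdSketch`, the choice of SHA-256, and the example ARN are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical helper, not part of the plugin's API.
final class DeterministicTraceIdSketch {

    /** Hashes the ARN and keeps the first 16 bytes as a 32-character lowercase hex trace ID. */
    static String traceIdFromArn(String executionArn) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha256.digest(executionArn.getBytes(StandardCharsets.UTF_8));
            // OpenTelemetry trace IDs are 16 bytes, so only part of the digest is used.
            return HexFormat.of().formatHex(digest, 0, 16);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        // Made-up ARN, used only to show that the same input yields the same ID.
        String arn = "arn:aws:lambda:us-east-1:123456789012:function:my-fn/execution/example";
        System.out.println(traceIdFromArn(arn));
        System.out.println(traceIdFromArn(arn));
    }
}
```

Because the ID depends only on the input string, every invocation that sees the same execution identifier computes the same trace ID without any shared state.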
| 14 | + |
| 15 | +## Installation |
| 16 | + |
| 17 | +```xml |
| 18 | +<dependency> |
| 19 | + <groupId>software.amazon.lambda.durable</groupId> |
| 20 | + <artifactId>aws-durable-execution-sdk-java-plugin-otel</artifactId> |
| 21 | + <version>0.1.0</version> |
| 22 | +</dependency> |
| 23 | +``` |
| 24 | + |
| 25 | +You also need the OpenTelemetry SDK and an exporter: |
| 26 | + |
| 27 | +```xml |
| 28 | +<dependency> |
| 29 | + <groupId>io.opentelemetry</groupId> |
| 30 | + <artifactId>opentelemetry-sdk</artifactId> |
| 31 | + <version>1.63.0</version> |
| 32 | +</dependency> |
| 33 | +<dependency> |
| 34 | + <groupId>io.opentelemetry</groupId> |
| 35 | + <artifactId>opentelemetry-exporter-otlp</artifactId> |
| 36 | + <version>1.63.0</version> |
| 37 | +</dependency> |
| 38 | +``` |
| 39 | + |
| 40 | +## Quick Start Using X-Ray/CloudWatch Tracing |
| 41 | + |
| 42 | +1. Add the ADOT Lambda Layer to your function and set `AWS_LAMBDA_EXEC_WRAPPER=/opt/otel-handler` |
| 43 | +2. Enable X-Ray Active Tracing on the function |
| 44 | +3. Register `OpenTelemetryDurablePlugin` in your handler's `DurableConfig` |
| 45 | +4. Grant X-Ray write permissions |
| 46 | + |
| 47 | +### 1. ADOT Lambda Layer |
| 48 | + |
| 49 | +This plugin requires the [AWS Distro for OpenTelemetry (ADOT) Lambda layer](https://aws-otel.github.io/docs/getting-started/lambda) to export traces from your Lambda function. |
| 50 | + |
| 51 | +The layer ARN follows the format: |
| 52 | + |
| 53 | +``` |
| 54 | +arn:aws:lambda:<region>:<account-id>:layer:AWSOpenTelemetryDistroJava:<version> |
| 55 | +``` |
| 56 | + |
| 57 | +The account ID varies by region. Refer to the [ADOT Lambda Layer ARNs](https://aws-otel.github.io/docs/getting-started/lambda#aws-lambda-layer-for-opentelemetry-arns) page for region-specific ARNs, account IDs, and the latest version number. |
| 58 | + |
| 59 | +**AWS CLI:** |
| 60 | + |
| 61 | +```bash |
| 62 | +aws lambda update-function-configuration \ |
| 63 | + --function-name your-function-name \ |
| 64 | + --layers "arn:aws:lambda:<region>:<account-id>:layer:AWSOpenTelemetryDistroJava:<version>" |
| 65 | +``` |
| 66 | + |
| 67 | +You must also set the `AWS_LAMBDA_EXEC_WRAPPER` environment variable: |
| 68 | + |
| 69 | +```bash |
| 70 | +aws lambda update-function-configuration \ |
| 71 | + --function-name your-function-name \ |
| 72 | + --environment "Variables={AWS_LAMBDA_EXEC_WRAPPER=/opt/otel-handler}" |
| 73 | +``` |
| 74 | + |
| 75 | +**CloudFormation / SAM:** |
| 76 | + |
| 77 | +```yaml |
| 78 | +MyFunction: |
| 79 | + Type: AWS::Serverless::Function |
| 80 | + Properties: |
| 81 | + Layers: |
| 82 | + - !Sub arn:aws:lambda:${AWS::Region}:<account-id>:layer:AWSOpenTelemetryDistroJava:<version> |
| 83 | + Environment: |
| 84 | + Variables: |
| 85 | + AWS_LAMBDA_EXEC_WRAPPER: /opt/otel-handler |
| 86 | +``` |
| 87 | + |
| 88 | +**CDK (Java):** |
| 89 | + |
| 90 | +```java |
| 91 | +import software.amazon.awscdk.services.lambda.*; |
| 92 | + |
| 93 | +var adotLayer = LayerVersion.fromLayerVersionArn(this, "AdotLayer", |
| 94 | + String.format("arn:aws:lambda:%s:<account-id>:layer:AWSOpenTelemetryDistroJava:<version>", |
| 95 | + this.getRegion())); |
| 96 | + |
| 97 | +Function.Builder.create(this, "MyFunction") |
| 98 | + .runtime(Runtime.JAVA_17) |
| 99 | + .handler("com.example.MyHandler::handleRequest") |
| 100 | + .code(Code.fromAsset("target/my-function.jar")) |
| 101 | + .layers(List.of(adotLayer)) |
| 102 | + .environment(Map.of("AWS_LAMBDA_EXEC_WRAPPER", "/opt/otel-handler")) |
| 103 | + .build(); |
| 104 | +``` |
| 105 | + |
| 106 | +### 2. AWS X-Ray Active Tracing |
| 107 | + |
| 108 | +Enable active tracing on your Lambda function so that the `_X_AMZN_TRACE_ID` environment variable is populated at invocation time. The plugin reads the X-Ray trace header from this variable and uses it to derive deterministic trace IDs that remain consistent across all invocations of the same durable execution. |
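For orientation, the X-Ray trace header has the form `Root=1-<8 hex>-<24 hex>;Parent=<16 hex>;Sampled=<0|1>`. The sketch below shows one way such a header could be parsed and its root ID mapped to a 32-character OpenTelemetry trace ID. The class `XRayHeaderSketch` and its methods are hypothetical and are not the plugin's `XRayContextExtractor`; the sample header value is illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the plugin's API.
final class XRayHeaderSketch {

    /** Splits "Root=...;Parent=...;Sampled=..." into key/value pairs. */
    static Map<String, String> parse(String header) {
        Map<String, String> fields = new HashMap<>();
        for (String part : header.split(";")) {
            int eq = part.indexOf('=');
            if (eq > 0) {
                fields.put(part.substring(0, eq).trim(), part.substring(eq + 1).trim());
            }
        }
        return fields;
    }

    /** Converts an X-Ray root ID "1-<8 hex>-<24 hex>" to a 32-character trace ID. */
    static String toOtelTraceId(String xrayRoot) {
        if (xrayRoot == null) {
            throw new IllegalArgumentException("Header has no Root field");
        }
        String[] segments = xrayRoot.split("-");
        if (segments.length != 3 || segments[1].length() != 8 || segments[2].length() != 24) {
            throw new IllegalArgumentException("Unexpected X-Ray root ID: " + xrayRoot);
        }
        // Drop the version prefix and join the timestamp and random parts.
        return segments[1] + segments[2];
    }

    public static void main(String[] args) {
        // Lambda populates this variable when active tracing is enabled;
        // fall back to a sample value when running locally.
        String header = System.getenv("_X_AMZN_TRACE_ID");
        if (header == null) {
            header = "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1";
        }
        System.out.println(toOtelTraceId(parse(header).get("Root")));
    }
}
```

If the variable is missing at runtime, that usually points to active tracing not being enabled, which matches the last row of the troubleshooting table.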
| 109 | + |
| 110 | +**AWS Console:** Lambda > Configuration > Monitoring and operations tools > Active tracing > Enable |
| 111 | + |
| 112 | +**AWS CLI:** |
| 113 | + |
| 114 | +```bash |
| 115 | +aws lambda update-function-configuration \ |
| 116 | + --function-name your-function-name \ |
| 117 | + --tracing-config Mode=Active |
| 118 | +``` |
| 119 | + |
| 120 | +**CloudFormation** (with SAM's `AWS::Serverless::Function`, set `Tracing: Active` instead): |
| 121 | + |
| 122 | +```yaml |
| 123 | +MyFunction: |
| 124 | + Type: AWS::Lambda::Function |
| 125 | + Properties: |
| 126 | + TracingConfig: |
| 127 | + Mode: Active |
| 128 | +``` |
| 129 | + |
| 130 | +**CDK (Java):** |
| 131 | + |
| 132 | +```java |
| 133 | +Function.Builder.create(this, "MyFunction") // runtime, handler, code, and layers as in step 1 |
| 134 | + .tracing(Tracing.ACTIVE) |
| 135 | + .build(); |
| 136 | +``` |
| 137 | + |
| 138 | +### 3. In Your Lambda Handler |
| 139 | + |
| 140 | +```java |
| 141 | +import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter; |
| 142 | +import io.opentelemetry.sdk.trace.SdkTracerProvider; |
| 143 | +import io.opentelemetry.sdk.trace.export.SimpleSpanProcessor; |
| 144 | +import software.amazon.lambda.durable.DurableConfig; |
| 145 | +import software.amazon.lambda.durable.DurableContext; |
| 146 | +import software.amazon.lambda.durable.DurableHandler; |
| 147 | +import software.amazon.lambda.durable.otel.OpenTelemetryDurablePlugin; |
| 148 | + |
| 149 | +public class MyHandler extends DurableHandler<MyInput, MyOutput> { |
| 150 | + |
| 151 | + @Override |
| 152 | + protected DurableConfig createConfiguration() { |
| 153 | + // OTLP exporter sends spans to the ADOT collector (localhost:4317 by default) |
| 154 | + var otlpExporter = OtlpGrpcSpanExporter.getDefault(); |
| 155 | + |
| 156 | + var otelPlugin = new OpenTelemetryDurablePlugin( |
| 157 | + SdkTracerProvider.builder() |
| 158 | + .addSpanProcessor(SimpleSpanProcessor.create(otlpExporter))); |
| 159 | + |
| 160 | + return DurableConfig.builder().withPlugins(otelPlugin).build(); |
| 161 | + } |
| 162 | + |
| 163 | + @Override |
| 164 | + public MyOutput handleRequest(MyInput input, DurableContext context) { |
| 165 | + var result = context.step("fetch-data", String.class, stepCtx -> { |
| 166 | + return fetchData(input.getId()); |
| 167 | + }); |
| 168 | + |
| 169 | + context.wait("cool-down", java.time.Duration.ofSeconds(5)); |
| 170 | + |
| 171 | + context.step("process", Void.class, stepCtx -> { |
| 172 | + process(result); |
| 173 | + return null; |
| 174 | + }); |
| 175 | + |
| 176 | + return new MyOutput(result); |
| 177 | + } |
| 178 | +} |
| 179 | +``` |
| 180 | + |
| 181 | +That's it. The plugin handles TracerProvider setup, deterministic ID generation, and span lifecycle internally. |
| 182 | + |
| 183 | +### 4. Grant Permissions |
| 184 | + |
| 185 | +The function's execution role needs the `AWSXRayDaemonWriteAccess` managed policy (or equivalent permissions) to write traces to X-Ray. |
| 186 | + |
| 187 | +## Trace Structure |
| 188 | + |
| 189 | +The plugin creates spans at three levels: |
| 190 | + |
| 191 | +``` |
| 192 | +durable.invocation |
| 193 | +├── durable.step:fetch-data |
| 194 | +│ └── durable.step:fetch-data [attempt 1] |
| 195 | +├── durable.wait:cool-down |
| 196 | +└── durable.step:process |
| 197 | + └── durable.step:process [attempt 1] |
| 198 | +``` |
| 199 | + |
| 200 | +- **Invocation span** — one per Lambda invocation, covers the entire invocation lifecycle |
| 201 | +- **Operation span** — one per durable operation, named after your step/wait names |
| 202 | +- **Attempt span** — one per user function execution (retries produce additional attempt spans) |
| 203 | + |
| 204 | +## Configuration |
| 205 | + |
| 206 | +### Constructor Options |
| 207 | + |
| 208 | +```java |
| 209 | +// Default: X-Ray context extraction, MDC enabled |
| 210 | +new OpenTelemetryDurablePlugin(tracerProviderBuilder); |
| 211 | + |
| 212 | +// Custom context extractor, MDC enabled |
| 213 | +new OpenTelemetryDurablePlugin(tracerProviderBuilder, contextExtractor); |
| 214 | + |
| 215 | +// Full configuration |
| 216 | +new OpenTelemetryDurablePlugin(tracerProviderBuilder, contextExtractor, enableMdc); |
| 217 | +``` |
| 218 | + |
| 219 | +| Parameter | Description | Default | |
| 220 | +|-----------|-------------|---------| |
| 221 | +| `tracerProviderBuilder` | `SdkTracerProviderBuilder` with your exporter/processor configured | Required | |
| 222 | +| `contextExtractor` | Extracts parent trace context from the Lambda environment | `XRayContextExtractor` | |
| 223 | +| `enableMdc` | If true, injects `trace_id`/`span_id` into SLF4J MDC | `true` | |
| 224 | + |
| 225 | +### Environment Variables for ADOT Layer |
| 226 | + |
| 227 | +| Variable | Description | Default | |
| 228 | +|----------|-------------|---------| |
| 229 | +| `OTEL_EXPORTER_OTLP_ENDPOINT` | Endpoint for the OTLP exporter | Set by ADOT layer | |
| 230 | +| `AWS_LAMBDA_EXEC_WRAPPER` | Set to `/opt/otel-handler` for ADOT layer instrumentation | — | |
| 231 | +| `OTEL_TRACES_SAMPLER` | Sampler to use (e.g., `traceidratio` for ratio-based sampling) | `always_on` | |
| 232 | +| `OTEL_TRACES_SAMPLER_ARG` | Argument for the sampler (e.g., `0.3` to sample 30%) | — | |
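To make the effect of `traceidratio` with an argument such as `0.3` more concrete, the sketch below shows the general idea behind ratio-based sampling: the decision is a pure function of the trace ID, so every invocation that shares a deterministic trace ID reaches the same decision. This is a simplified illustration, not the OpenTelemetry SDK's exact implementation; the class name `RatioSamplingSketch` and the use of the low 63 bits are assumptions.

```java
// Hypothetical helper, not part of the OpenTelemetry SDK or the plugin.
final class RatioSamplingSketch {

    /** Returns true when a 32-character hex trace ID should be kept for a ratio in [0.0, 1.0]. */
    static boolean shouldSample(String traceId, double ratio) {
        if (ratio >= 1.0) {
            return true;
        }
        if (ratio <= 0.0) {
            return false;
        }
        // Read the last 16 hex characters (low 64 bits) and drop one bit
        // so the value is a non-negative long.
        long lowBits = Long.parseUnsignedLong(traceId.substring(16), 16) >>> 1;
        long upperBound = (long) (ratio * Long.MAX_VALUE);
        return lowBits < upperBound;
    }

    public static void main(String[] args) {
        String traceId = "5759e988bd862e3fe1be46a994272793";
        // The same trace ID always produces the same decision for a given ratio.
        System.out.println(shouldSample(traceId, 0.3));
        System.out.println(shouldSample(traceId, 0.3));
    }
}
```

With a decision derived from the trace ID alone, a durable execution is either traced in all of its invocations or in none of them, as long as each invocation uses the same ratio.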
| 233 | + |
| 234 | +## Verification |
| 235 | + |
| 236 | +After deploying your function with the plugin configured: |
| 237 | + |
| 238 | +1. **Invoke your durable function** — trigger at least one execution that includes multiple steps or a wait/resume cycle. |
| 239 | + |
| 240 | +2. **Check CloudWatch console** — Navigate to CloudWatch > Traces. You should see a trace with: |
| 241 | + - An invocation span per Lambda invocation |
| 242 | + - Child spans for each durable operation (named after your step names) |
| 243 | + - All invocations of the same execution grouped under one trace ID |
| 244 | + |
| 245 | +3. **Check log correlation** — Verify that your logs include `trace_id` and `span_id` fields matching the spans in the trace view. |
| 246 | + |
| 247 | +4. **Confirm sampling** — If you set `OTEL_TRACES_SAMPLER=traceidratio` with an arg less than 1.0, verify that only the expected proportion of traces appear. |
| 248 | + |
| 249 | +### Troubleshooting |
| 250 | + |
| 251 | +| Symptom | Likely Cause | |
| 252 | +|---------|-------------| |
| 253 | +| No traces appear | ADOT layer not configured, or `AWS_LAMBDA_EXEC_WRAPPER` not set | |
| 254 | +| Traces appear but are fragmented | X-Ray active tracing not enabled on the Lambda function | |
| 255 | +| Missing spans for some operations | `OTEL_TRACES_SAMPLER_ARG` set below 1.0 | |
| 256 | +| `_X_AMZN_TRACE_ID` not populated | X-Ray active tracing not enabled | |
| 257 | + |
| 258 | +## Local Development |
| 259 | + |
| 260 | +For local testing, use a logging exporter to print spans to stdout: |
| 261 | + |
| 262 | +```java |
| 263 | +import io.opentelemetry.exporter.logging.LoggingSpanExporter; |
| 264 | + |
| 265 | +var otelPlugin = new OpenTelemetryDurablePlugin( |
| 266 | + SdkTracerProvider.builder() |
| 267 | + .addSpanProcessor(SimpleSpanProcessor.create(LoggingSpanExporter.create()))); |
| 268 | +``` |
| 269 | + |
| 270 | +## API Reference |
| 271 | + |
| 272 | +### `OpenTelemetryDurablePlugin` |
| 273 | + |
| 274 | +The main plugin class. Implements `DurableExecutionPlugin` from the core SDK. |
| 275 | + |
| 276 | +```java |
| 277 | +new OpenTelemetryDurablePlugin(SdkTracerProviderBuilder tracerProviderBuilder) |
| 278 | +new OpenTelemetryDurablePlugin(SdkTracerProviderBuilder tracerProviderBuilder, ContextExtractor contextExtractor) |
| 279 | +new OpenTelemetryDurablePlugin(SdkTracerProviderBuilder tracerProviderBuilder, ContextExtractor contextExtractor, boolean enableMdc) |
| 280 | +``` |
| 281 | + |
| 282 | +### `XRayContextExtractor` |
| 283 | + |
| 284 | +Default context extractor. Reads the `_X_AMZN_TRACE_ID` environment variable to derive trace context. |
| 285 | + |
| 286 | +### `ContextExtractor` |
| 287 | + |
| 288 | +Interface for custom context extractor implementations. |
| 289 | + |
| 290 | +## Requirements |
| 291 | + |
| 292 | +- Java 17+ |
| 293 | +- AWS Durable Execution SDK for Java 1.2.1+ |
| 294 | +- OpenTelemetry SDK 1.20.0+ |
| 295 | + |
| 296 | +## License |
| 297 | + |
| 298 | +Apache-2.0 |