diff --git a/aws/durable_function_event_forwarder/README.md b/aws/durable_function_event_forwarder/README.md
new file mode 100644
index 000000000..94606d5f6
--- /dev/null
+++ b/aws/durable_function_event_forwarder/README.md
@@ -0,0 +1,91 @@
+# Datadog Lambda Durable Function Event Forwarder
+
+A self-contained CloudFormation template that captures AWS Lambda Durable
+Function execution status change events and delivers them to the Datadog
+HTTP intake via Amazon Data Firehose. Records arrive at Datadog as the
+raw EventBridge envelope; any reshaping (field renaming, ARN qualifier
+stripping, timestamp parsing) is handled on the Datadog side by a
+logs processing pipeline.
+
+## Architecture
+
+```
+EventBridge rule -> Firehose -> Datadog HTTP intake (raw EventBridge JSON)
+                       \
+                        -> S3 backup bucket (failed records only)
+```
+
+- The EventBridge rule subscribes to `aws.lambda` source events with
+  detail-type `Durable Execution Status Change` and routes them to
+  Firehose.
+- Firehose forwards each record unchanged to
+  `https://aws-kinesis-http-intake.logs.${DdSite}/v1/input` using the
+  Datadog API key as the `X-Amz-Firehose-Access-Key` header. The stack
+  does **not** attach any custom metadata to Firehose's outbound
+  requests; tagging and reshaping are handled on the Datadog side.
+- Records Firehose cannot deliver within its 60-second retry window are
+  written to the S3 backup bucket (`S3BackupMode: FailedDataOnly`); under
+  normal operation the bucket stays empty.
+
+## Parameters
+
+| Parameter | Required | Default | Description |
+| --- | --- | --- | --- |
+| `DdApiKey` | one of three | "" | Plaintext Datadog API key (`NoEcho`). Takes precedence when more than one key source is set. |
+| `DdApiKeySecretArn` | one of three | "" | ARN of a Secrets Manager secret whose `SecretString` is the API key. Resolved via `{{resolve:secretsmanager:...}}`; used only when `DdApiKey` is empty. |
+| `DdApiKeySsmParameterName` | one of three | "" | Name (not ARN, starting with `/`) of an SSM SecureString parameter holding the API key. Resolved via `{{resolve:ssm-secure:...}}`; used only when the other two are empty. |
+| `DdSite` | no | `datadoghq.com` | Datadog site; used to build the Firehose destination URL. |
+| `Statuses` | no | "" | EventBridge `detail.status` values to forward (uppercase, comma-delimited). Empty (the default) forwards **all** statuses. |
+| `FunctionArnFilter1` … `FunctionArnFilter5` | no | "" | Up to 5 independent function-ARN filters. Each accepts an **unqualified** function ARN or an EventBridge wildcard over one (for example `arn:aws:lambda:us-east-2:123456789012:function:my-durable-*`). Do not add a version/alias suffix; `:*` is appended automatically. If all five are empty, the rule matches every function in the region. |
+| `BufferIntervalSeconds` | no | `60` | Firehose buffer interval in seconds (60–900). |
+
+`Rules.ApiKeyRequired` asserts that at least one of the three API key
+parameters is set and otherwise fails the stack action with a clear message.
+
+## Outputs
+
+| Output | Description |
+| --- | --- |
+| `DeliveryStreamArn` | Firehose delivery stream ARN. |
+| `BackupBucketName` | S3 bucket name for failed records. |
+| `EventRuleArn` | EventBridge rule ARN. |
+| `ForwarderVersion` | Template version (from `Mappings.Constants`). |
+
+## Forwarded log shape
+
+The stack does **no transformation in AWS**. Firehose forwards each
+EventBridge record to Datadog verbatim, so Datadog receives the raw
+envelope. See AWS's
+[Monitoring durable functions](https://docs.aws.amazon.com/lambda/latest/dg/durable-monitoring.html#durable-monitoring-eventbridge)
+for the full event schema and the five `status` values (`RUNNING`,
+`SUCCEEDED`, `FAILED`, `TIMED_OUT`, `STOPPED`):
+
+```json
+{
+  "version": "0",
+  "id": "d019b03c-a8a3-9d58-85de-241e96206538",
+  "detail-type": "Durable Execution Status Change",
+  "source": "aws.lambda",
+  "account": "123456789012",
+  "time": "2025-11-20T13:08:22Z",
+  "region": "us-east-1",
+  "resources": [],
+  "detail": {
+    "durableExecutionArn": "arn:aws:lambda:us-east-1:123456789012:function:my-function:$LATEST/durable-execution/090c4189-b18b-4296-9d0c-cfd01dc3a122/9f7d84c9-ea3d-3ffc-b3e5-5ec51c34ffc9",
+    "durableExecutionName": "order-123",
+    "functionArn": "arn:aws:lambda:us-east-1:123456789012:function:my-function:2",
+    "status": "RUNNING",
+    "startTimestamp": "2025-11-20T13:08:22.345Z"
+  }
+}
+```
+
+Terminal states (`SUCCEEDED`, `STOPPED`, `FAILED`, `TIMED_OUT`) also include
+an `endTimestamp`.
+
+### Datadog-side processing pipeline
+
+Install the **AWS Lambda** integration in Datadog; its out-of-the-box logs
+pipeline is provisioned automatically and reshapes these events (field
+renaming, ARN qualifier stripping, timestamp parsing, human-readable
+message). No manual pipeline setup is required.
diff --git a/aws/durable_function_event_forwarder/template.yaml b/aws/durable_function_event_forwarder/template.yaml
new file mode 100644
index 000000000..5630c4562
--- /dev/null
+++ b/aws/durable_function_event_forwarder/template.yaml
@@ -0,0 +1,437 @@
+AWSTemplateFormatVersion: "2010-09-09"
+Description: >-
+  Captures AWS Lambda Durable Function execution status change events from
+  EventBridge and forwards them, unmodified, to the Datadog HTTP intake via
+  Amazon Data Firehose.
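The README states that Datadog receives the raw EventBridge envelope. On the wire, Firehose does not POST that envelope bare; it wraps a batch of records in its own HTTP endpoint delivery request. The Python sketch below approximates that request as we understand AWS's published Firehose HTTP endpoint delivery specification: a JSON body with `requestId`, a millisecond `timestamp`, and `records[].data` holding each envelope base64-encoded, plus the `X-Amz-Firehose-Access-Key` header carrying the template's `AccessKey`. `intake_url`, `build_firehose_request`, and `decode_records` are hypothetical helpers for illustration only, and the header set is deliberately partial.

```python
import base64
import json
import time
import uuid


def intake_url(dd_site: str) -> str:
    # Same expression as the template's !Sub on EndpointConfiguration.Url.
    return f"https://aws-kinesis-http-intake.logs.{dd_site}/v1/input"


def build_firehose_request(envelopes, api_key):
    """Hypothetical helper approximating one Firehose HTTP endpoint delivery.

    Each EventBridge envelope becomes one record whose "data" field is the
    base64-encoded JSON; the stack itself adds no transformation.
    Only a subset of the headers Firehose sends is modelled here.
    """
    request_id = str(uuid.uuid4())
    headers = {
        "Content-Type": "application/json",
        "X-Amz-Firehose-Request-Id": request_id,
        # The template's AccessKey value is sent verbatim in this header.
        "X-Amz-Firehose-Access-Key": api_key,
    }
    body = {
        "requestId": request_id,
        "timestamp": int(time.time() * 1000),  # epoch milliseconds
        "records": [
            {"data": base64.b64encode(json.dumps(e).encode("utf-8")).decode("ascii")}
            for e in envelopes
        ],
    }
    return headers, body


def decode_records(body):
    """What a receiver recovers from the body: the original envelopes, unchanged."""
    return [json.loads(base64.b64decode(r["data"])) for r in body["records"]]
```

Decoding the records round-trips to the exact envelopes that went in, which is the "no transformation in AWS" property the README describes.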
+
+Metadata:
+  AWS::CloudFormation::Interface:
+    ParameterGroups:
+      - Label:
+          default: Datadog API key (at least one required)
+        Parameters:
+          - DdApiKey
+          - DdApiKeySecretArn
+          - DdApiKeySsmParameterName
+      - Label:
+          default: Datadog routing
+        Parameters:
+          - DdSite
+      - Label:
+          default: Event filters (optional)
+        Parameters:
+          - Statuses
+          - FunctionArnFilter1
+          - FunctionArnFilter2
+          - FunctionArnFilter3
+          - FunctionArnFilter4
+          - FunctionArnFilter5
+      - Label:
+          default: Tuning
+        Parameters:
+          - BufferIntervalSeconds
+    ParameterLabels:
+      DdApiKey: { default: API key (plaintext) }
+      DdApiKeySecretArn: { default: Secrets Manager secret ARN }
+      DdApiKeySsmParameterName: { default: SSM SecureString parameter name }
+      DdSite: { default: Datadog site }
+      Statuses: { default: Statuses to forward (optional) }
+      FunctionArnFilter1: { default: Function ARN filter 1 (optional) }
+      FunctionArnFilter2: { default: Function ARN filter 2 (optional) }
+      FunctionArnFilter3: { default: Function ARN filter 3 (optional) }
+      FunctionArnFilter4: { default: Function ARN filter 4 (optional) }
+      FunctionArnFilter5: { default: Function ARN filter 5 (optional) }
+      BufferIntervalSeconds: { default: Firehose buffer interval (seconds) }
+
+Mappings:
+  Constants:
+    DdDurableEventForwarder:
+      Version: "0.1.0"
+
+Parameters:
+  # ---- Datadog API key (at least one of the three is required) ----
+  DdApiKey:
+    Type: String
+    NoEcho: true
+    Default: ""
+    Description: >-
+      Datadog API key. Provide a plaintext value here OR set DdApiKeySecretArn
+      OR DdApiKeySsmParameterName instead.
+  DdApiKeySecretArn:
+    Type: String
+    Default: ""
+    AllowedPattern: "^$|^arn:.*:secretsmanager:.*"
+    Description: >-
+      ARN of a Secrets Manager secret whose SecretString is the Datadog API
+      key.
+  DdApiKeySsmParameterName:
+    Type: String
+    Default: ""
+    AllowedPattern: "^$|^/[a-zA-Z0-9/_.-]+$"
+    Description: >-
+      Name (not ARN) of an SSM Parameter Store SecureString parameter that
+      holds the Datadog API key.
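All three API key parameters default to `""`, and `Rules.ApiKeyRequired` only requires that at least one be non-empty, so a stack can end up with more than one set. The nested `!If` on the delivery stream's `AccessKey` then decides which source wins. The Python sketch below mirrors that logic; `api_key_rule_passes` and `rendered_access_key` are hypothetical names, and the function returns the dynamic-reference string CloudFormation would resolve at deploy time, not the resolved key itself.

```python
from typing import Optional


def api_key_rule_passes(dd_api_key: str, secret_arn: str, ssm_name: str) -> bool:
    # Rules.ApiKeyRequired: at least one of the three must be non-empty.
    return any(v != "" for v in (dd_api_key, secret_arn, ssm_name))


def rendered_access_key(dd_api_key: str, secret_arn: str, ssm_name: str) -> Optional[str]:
    """Value the nested !If on AccessKey renders, before dynamic references resolve.

    None stands in for AWS::NoValue, which Rules.ApiKeyRequired makes unreachable.
    """
    if dd_api_key != "":  # UseApiKey
        return dd_api_key
    if secret_arn != "":  # UseApiKeySecret
        return "{{resolve:secretsmanager:" + secret_arn + ":SecretString}}"
    if ssm_name != "":  # UseApiKeySsm
        return "{{resolve:ssm-secure:" + ssm_name + "}}"
    return None
```

For example, a stack that sets both `DdApiKey` and `DdApiKeySsmParameterName` sends the plaintext key and ignores the SSM parameter.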
+
+  # ---- Routing ----
+  DdSite:
+    Type: String
+    Default: datadoghq.com
+    AllowedPattern: .+
+    Description: Datadog site to deliver events to.
+
+  # ---- Event filters ----
+  Statuses:
+    Type: CommaDelimitedList
+    Default: ""
+    Description: >-
+      Comma-separated list of execution status values to forward. Valid values
+      are RUNNING, SUCCEEDED, FAILED, TIMED_OUT, and STOPPED. Leave empty (the
+      default) to forward every status.
+  # Up to 5 independent function-ARN filters. CloudFormation has no
+  # native iteration that fits AWS::Events::Rule.EventPattern (a Json blob,
+  # not a schema-typed list), so each slot is exposed as its own optional
+  # parameter. Each populated slot emits one wildcard matcher: the supplied
+  # UNqualified ARN with ":*" appended, since the event's functionArn always
+  # carries a version/alias qualifier. The AllowedPattern rejects a trailing
+  # qualifier so a pasted qualified ARN fails at deploy time instead of
+  # silently matching nothing. Slots left empty are removed from the
+  # EventPattern via AWS::NoValue, so they have no effect on the rendered rule.
+  FunctionArnFilter1:
+    Type: String
+    Default: ""
+    AllowedPattern: "^$|^arn:aws[a-z-]*:lambda:[a-z0-9-*]*:[0-9*]*:function:[a-zA-Z0-9_*-]+$"
+    Description: >-
+      Optional UNqualified Lambda function ARN, or an EventBridge wildcard
+      pattern over one (for example
+      "arn:aws:lambda:us-east-2:123456789012:function:my-durable-*"), used to
+      restrict which functions' events are captured. Do not include a version
+      or alias suffix - ":*" is appended automatically to match any qualifier.
+      Scope by region and account by including them in the pattern. If all
+      five FunctionArnFilterN parameters are empty, the rule matches every
+      function in this region.
+  FunctionArnFilter2:
+    Type: String
+    Default: ""
+    AllowedPattern: "^$|^arn:aws[a-z-]*:lambda:[a-z0-9-*]*:[0-9*]*:function:[a-zA-Z0-9_*-]+$"
+    Description: Optional additional unqualified function ARN or wildcard pattern.
+  FunctionArnFilter3:
+    Type: String
+    Default: ""
+    AllowedPattern: "^$|^arn:aws[a-z-]*:lambda:[a-z0-9-*]*:[0-9*]*:function:[a-zA-Z0-9_*-]+$"
+    Description: Optional additional unqualified function ARN or wildcard pattern.
+  FunctionArnFilter4:
+    Type: String
+    Default: ""
+    AllowedPattern: "^$|^arn:aws[a-z-]*:lambda:[a-z0-9-*]*:[0-9*]*:function:[a-zA-Z0-9_*-]+$"
+    Description: Optional additional unqualified function ARN or wildcard pattern.
+  FunctionArnFilter5:
+    Type: String
+    Default: ""
+    AllowedPattern: "^$|^arn:aws[a-z-]*:lambda:[a-z0-9-*]*:[0-9*]*:function:[a-zA-Z0-9_*-]+$"
+    Description: Optional additional unqualified function ARN or wildcard pattern.
+
+  # ---- Tuning ----
+  BufferIntervalSeconds:
+    Type: Number
+    Default: 60
+    MinValue: 60
+    MaxValue: 900
+    Description: >-
+      Firehose buffer interval in seconds. Increasing this trades freshness
+      for fewer outbound requests; the maximum (900) is fine for low-volume
+      durable-execution streams.
+
+
+Conditions:
+  UseApiKey: !Not [!Equals [!Ref DdApiKey, ""]]
+  UseApiKeySecret: !Not [!Equals [!Ref DdApiKeySecretArn, ""]]
+  UseApiKeySsm: !Not [!Equals [!Ref DdApiKeySsmParameterName, ""]]
+  # Statuses is a CommaDelimitedList; an empty default joins to "" so this is
+  # false, which drops the status key from the EventPattern (forward all).
+  HasStatusFilter: !Not [!Equals [!Join ["", !Ref Statuses], ""]]
+  HasFilter1: !Not [!Equals [!Ref FunctionArnFilter1, ""]]
+  HasFilter2: !Not [!Equals [!Ref FunctionArnFilter2, ""]]
+  HasFilter3: !Not [!Equals [!Ref FunctionArnFilter3, ""]]
+  HasFilter4: !Not [!Equals [!Ref FunctionArnFilter4, ""]]
+  HasFilter5: !Not [!Equals [!Ref FunctionArnFilter5, ""]]
+  HasFunctionFilter: !Or
+    - !Condition HasFilter1
+    - !Condition HasFilter2
+    - !Condition HasFilter3
+    - !Condition HasFilter4
+    - !Condition HasFilter5
+  # When neither a status nor a function filter is set, the detail block is
+  # omitted entirely - an empty "detail: {}" is rejected by EventBridge.
+  HasDetailFilter: !Or
+    - !Condition HasStatusFilter
+    - !Condition HasFunctionFilter
+
+Rules:
+  ApiKeyRequired:
+    Assertions:
+      - Assert: !Or
+          - !Not [!Equals [!Ref DdApiKey, ""]]
+          - !Not [!Equals [!Ref DdApiKeySecretArn, ""]]
+          - !Not [!Equals [!Ref DdApiKeySsmParameterName, ""]]
+        AssertDescription: >-
+          One of DdApiKey, DdApiKeySecretArn, or DdApiKeySsmParameterName
+          must be set.
+
+Resources:
+  # ---------------------------------------------------------------------------
+  # Firehose backup bucket. Receives only records that fail to deliver to the
+  # Datadog endpoint (S3BackupMode: FailedDataOnly), so it stays empty under
+  # normal operation. Retained on stack deletion to preserve any failed
+  # records the operator may need to inspect or replay.
+  # ---------------------------------------------------------------------------
+  BackupBucket:
+    Type: AWS::S3::Bucket
+    DeletionPolicy: Retain
+    UpdateReplacePolicy: Retain
+    Properties:
+      BucketEncryption:
+        ServerSideEncryptionConfiguration:
+          - ServerSideEncryptionByDefault:
+              SSEAlgorithm: AES256
+      PublicAccessBlockConfiguration:
+        BlockPublicAcls: true
+        BlockPublicPolicy: true
+        IgnorePublicAcls: true
+        RestrictPublicBuckets: true
+      OwnershipControls:
+        Rules:
+          - ObjectOwnership: BucketOwnerEnforced
+
+  BackupBucketPolicy:
+    Type: AWS::S3::BucketPolicy
+    Properties:
+      Bucket: !Ref BackupBucket
+      PolicyDocument:
+        Version: "2012-10-17"
+        Statement:
+          - Sid: EnforceSSL
+            Effect: Deny
+            Principal: "*"
+            Action: s3:*
+            Resource:
+              - !GetAtt BackupBucket.Arn
+              - !Sub "${BackupBucket.Arn}/*"
+            Condition:
+              Bool:
+                aws:SecureTransport: "false"
+
+  # ---------------------------------------------------------------------------
+  # Firehose delivery stream. HTTP endpoint destination targets the Datadog
+  # Firehose-specific intake (which speaks the Firehose protocol - do not use
+  # the standard /api/v2/logs endpoint here). Backup mode is FailedDataOnly so
+  # the bucket only receives records that could not be delivered.
+  # ---------------------------------------------------------------------------
+  FirehoseLogGroup:
+    Type: AWS::Logs::LogGroup
+    Properties:
+      LogGroupName: !Sub "/aws/kinesisfirehose/${AWS::StackName}"
+      RetentionInDays: 7
+
+  FirehoseHttpLogStream:
+    Type: AWS::Logs::LogStream
+    Properties:
+      LogGroupName: !Ref FirehoseLogGroup
+      LogStreamName: HttpEndpointDelivery
+
+  FirehoseS3LogStream:
+    Type: AWS::Logs::LogStream
+    Properties:
+      LogGroupName: !Ref FirehoseLogGroup
+      LogStreamName: S3Backup
+
+  FirehoseRole:
+    Type: AWS::IAM::Role
+    Properties:
+      AssumeRolePolicyDocument:
+        Version: "2012-10-17"
+        Statement:
+          - Effect: Allow
+            Principal:
+              Service: firehose.amazonaws.com
+            Action: sts:AssumeRole
+      Policies:
+        - PolicyName: FirehoseDelivery
+          PolicyDocument:
+            Version: "2012-10-17"
+            Statement:
+              - Effect: Allow
+                Action:
+                  - s3:AbortMultipartUpload
+                  - s3:GetBucketLocation
+                  - s3:GetObject
+                  - s3:ListBucket
+                  - s3:ListBucketMultipartUploads
+                  - s3:PutObject
+                Resource:
+                  - !GetAtt BackupBucket.Arn
+                  - !Sub "${BackupBucket.Arn}/*"
+              - Effect: Allow
+                Action:
+                  - logs:PutLogEvents
+                Resource:
+                  - !GetAtt FirehoseLogGroup.Arn
+
+  DeliveryStream:
+    Type: AWS::KinesisFirehose::DeliveryStream
+    Properties:
+      DeliveryStreamType: DirectPut
+      HttpEndpointDestinationConfiguration:
+        EndpointConfiguration:
+          Name: Datadog
+          # Firehose's Url field accepts only an https:// URL with an
+          # optional path and no query string, so DdSite is the only
+          # parameterized part. No static metadata is attached either:
+          # CommonAttributes is deliberately left empty (see the
+          # RequestConfiguration comment below).
+          Url: !Sub "https://aws-kinesis-http-intake.logs.${DdSite}/v1/input"
+          # The API key becomes the X-Amz-Firehose-Access-Key header on each
+          # request and is stored opaquely by Firehose. Precedence follows the
+          # nesting below: DdApiKey, then the secret, then the SSM parameter.
+          # The two dynamic references resolve at deploy time, so the key never
+          # appears in the template source, stack parameters, or stack events.
+          AccessKey: !If
+            - UseApiKey
+            - !Ref DdApiKey
+            - !If
+              - UseApiKeySecret
+              - !Sub "{{resolve:secretsmanager:${DdApiKeySecretArn}:SecretString}}"
+              - !If
+                - UseApiKeySsm
+                - !Sub "{{resolve:ssm-secure:${DdApiKeySsmParameterName}}}"
+                - !Ref AWS::NoValue
+        BufferingHints:
+          IntervalInSeconds: !Ref BufferIntervalSeconds
+          SizeInMBs: 4
+        RetryOptions:
+          DurationInSeconds: 60
+        # Datadog's Firehose intake does not interpret common-attributes
+        # header keys (dd-service / dd-source / dd-tags) as log metadata -
+        # it surfaces each as a raw tag with the literal key, and tag
+        # values can't contain commas so a joined dd-tags value would be
+        # mangled. We explicitly set CommonAttributes: [] (instead of
+        # omitting RequestConfiguration entirely) because CloudFormation
+        # does not push outright property removals to Firehose - omission
+        # would leave previously-configured attributes live on the stream.
+        # Datadog's AWS integration auto-tags service/source/region/
+        # aws_account from the raw envelope's source field and the
+        # Firehose ARN, so we get those for free. Any extra metadata
+        # (service override, env, version, custom tags) is set by a
+        # Datadog log processing pipeline against these events.
+        RequestConfiguration:
+          CommonAttributes: []
+        CloudWatchLoggingOptions:
+          Enabled: true
+          LogGroupName: !Ref FirehoseLogGroup
+          LogStreamName: !Ref FirehoseHttpLogStream
+        RoleARN: !GetAtt FirehoseRole.Arn
+        S3BackupMode: FailedDataOnly
+        S3Configuration:
+          BucketARN: !GetAtt BackupBucket.Arn
+          RoleARN: !GetAtt FirehoseRole.Arn
+          BufferingHints:
+            IntervalInSeconds: 300
+            SizeInMBs: 5
+          CompressionFormat: GZIP
+          CloudWatchLoggingOptions:
+            Enabled: true
+            LogGroupName: !Ref FirehoseLogGroup
+            LogStreamName: !Ref FirehoseS3LogStream
+        # Firehose forwards each EventBridge envelope to Datadog unchanged;
+        # all reshaping (function ARN qualifier stripping, detail.*
+        # flattening, ISO timestamp parsing) is configured on the Datadog
+        # side via a logs processing pipeline. We explicitly set
+        # Enabled: false instead of omitting ProcessingConfiguration -
+        # CloudFormation does not push outright property removals to
+        # Firehose, so omitting it would leave a previously-attached Lambda
+        # processor live on the stream.
+        ProcessingConfiguration:
+          Enabled: false
+
+  # ---------------------------------------------------------------------------
+  # EventBridge rule. Captures aws.lambda "Durable Execution Status Change"
+  # events and routes them to Firehose. Each filter is an unqualified
+  # function ARN with ":*" appended, because the event's detail.functionArn
+  # always carries a version/alias qualifier.
+  # ---------------------------------------------------------------------------
+  EventRule:
+    Type: AWS::Events::Rule
+    Properties:
+      Description: >-
+        Routes Lambda Durable Function execution status-change events to the
+        Datadog Firehose delivery stream.
+      State: ENABLED
+      EventPattern:
+        source:
+          - aws.lambda
+        detail-type:
+          - Durable Execution Status Change
+        # detail is omitted entirely when neither a status nor a function
+        # filter is set (an empty "detail: {}" is rejected by EventBridge),
+        # so the default rule matches on source + detail-type alone.
+        detail: !If
+          - HasDetailFilter
+          - # Status key omitted when Statuses is empty, so the rule forwards
+            # every status by default.
+            status: !If [HasStatusFilter, !Ref Statuses, !Ref AWS::NoValue]
+            # One wildcard matcher per populated filter slot. The user supplies
+            # an UNqualified function ARN (or a wildcard over one) and we append
+            # ":*" - the durable-execution detail.functionArn is always
+            # version/alias-qualified (see AWS "Monitoring durable functions"
+            # docs), so the ":*" is what actually matches and a bare-ARN matcher
+            # would never fire. Empty slots resolve to AWS::NoValue and are
+            # stripped from the rendered list by CloudFormation, so the
+            # EventPattern ends up with exactly N matchers where N is the count
+            # of populated FunctionArnFilterN slots.
+            functionArn: !If
+              - HasFunctionFilter
+              - - !If [HasFilter1, {wildcard: !Sub "${FunctionArnFilter1}:*"}, !Ref AWS::NoValue]
+                - !If [HasFilter2, {wildcard: !Sub "${FunctionArnFilter2}:*"}, !Ref AWS::NoValue]
+                - !If [HasFilter3, {wildcard: !Sub "${FunctionArnFilter3}:*"}, !Ref AWS::NoValue]
+                - !If [HasFilter4, {wildcard: !Sub "${FunctionArnFilter4}:*"}, !Ref AWS::NoValue]
+                - !If [HasFilter5, {wildcard: !Sub "${FunctionArnFilter5}:*"}, !Ref AWS::NoValue]
+              - !Ref AWS::NoValue
+          - !Ref AWS::NoValue
+      Targets:
+        - Id: FirehoseTarget
+          Arn: !GetAtt DeliveryStream.Arn
+          RoleArn: !GetAtt EventBridgeRole.Arn
+
+  EventBridgeRole:
+    Type: AWS::IAM::Role
+    Properties:
+      AssumeRolePolicyDocument:
+        Version: "2012-10-17"
+        Statement:
+          - Effect: Allow
+            Principal:
+              Service: events.amazonaws.com
+            Action: sts:AssumeRole
+      Policies:
+        - PolicyName: PutToFirehose
+          PolicyDocument:
+            Version: "2012-10-17"
+            Statement:
+              - Effect: Allow
+                Action:
+                  - firehose:PutRecord
+                  - firehose:PutRecordBatch
+                Resource: !GetAtt DeliveryStream.Arn
+
+Outputs:
+  DeliveryStreamArn:
+    Description: ARN of the Firehose delivery stream.
+    Value: !GetAtt DeliveryStream.Arn
+  BackupBucketName:
+    Description: S3 bucket that captures records Firehose could not deliver to the Datadog intake.
+    Value: !Ref BackupBucket
+  EventRuleArn:
+    Description: ARN of the EventBridge rule capturing durable execution events.
+    Value: !GetAtt EventRule.Arn
+  ForwarderVersion:
+    Description: Version of this forwarder template.
+    Value: !FindInMap [Constants, DdDurableEventForwarder, Version]
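The filter parameters, Conditions, and `EventRule.EventPattern` interact in a few non-obvious ways: an empty `Statuses` arrives as `[""]`, every function filter gets `:*` appended, and `detail` is dropped entirely when no filter is set. The Python sketch below approximates that rendering, plus a deliberately simplified matcher, so a filter value can be sanity-checked before deploying. It is not CloudFormation's or EventBridge's implementation: `filter_allowed`, `render_event_pattern`, `wildcard_matches`, and `rule_matches` are hypothetical helpers, the matcher only understands the keys this template emits, and it assumes `AllowedPattern` must match the whole parameter value.

```python
import re
from typing import List

# Copied verbatim from FunctionArnFilterN.AllowedPattern in the template.
FILTER_PATTERN = r"^$|^arn:aws[a-z-]*:lambda:[a-z0-9-*]*:[0-9*]*:function:[a-zA-Z0-9_*-]+$"


def filter_allowed(value: str) -> bool:
    """True if the value would pass FunctionArnFilterN's AllowedPattern."""
    return re.fullmatch(FILTER_PATTERN, value) is not None


def render_event_pattern(statuses: List[str], arn_filters: List[str]) -> dict:
    """Approximate the EventPattern CloudFormation renders for EventRule.

    `statuses` is Statuses as a CommaDelimitedList (the empty default arrives
    as [""]); `arn_filters` holds the five FunctionArnFilterN values. Leaving a
    key out stands in for AWS::NoValue.
    """
    pattern = {
        "source": ["aws.lambda"],
        "detail-type": ["Durable Execution Status Change"],
    }
    detail = {}
    if "".join(statuses) != "":  # HasStatusFilter
        detail["status"] = list(statuses)
    # HasFilterN: one wildcard matcher per non-empty slot, with ":*" appended.
    matchers = [{"wildcard": f"{arn}:*"} for arn in arn_filters if arn != ""]
    if matchers:  # HasFunctionFilter
        detail["functionArn"] = matchers
    if detail:  # HasDetailFilter: never emit an empty "detail": {}
        pattern["detail"] = detail
    return pattern


def wildcard_matches(pattern: str, value: str) -> bool:
    """Simplified wildcard match: each "*" matches any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value) is not None


def rule_matches(pattern: dict, event: dict) -> bool:
    """Toy evaluator that understands only the keys this template emits."""
    if event.get("source") not in pattern["source"]:
        return False
    if event.get("detail-type") not in pattern["detail-type"]:
        return False
    want = pattern.get("detail", {})
    got = event.get("detail", {})
    if "status" in want and got.get("status") not in want["status"]:
        return False
    if "functionArn" in want:
        arn = got.get("functionArn", "")
        if not any(wildcard_matches(m["wildcard"], arn) for m in want["functionArn"]):
            return False
    return True
```

With an unfiltered stack, `render_event_pattern([""], [""] * 5)` has no `detail` key at all. A matcher built from a bare ARN without the `:*` suffix does not match a qualified `functionArn` such as `arn:aws:lambda:us-east-1:123456789012:function:my-function:2`, which is why the template always appends it.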