
feat(cloudwatch-logs): Add Entity config support to sink #6864

Open
bagmarnikhil wants to merge 4 commits into opensearch-project:main from bagmarnikhil:feature/cwl-sink-entity-config
Contributor

@bagmarnikhil bagmarnikhil commented May 15, 2026

Add an optional 'entity' configuration block on the CloudWatch Logs sink that attaches CloudWatch Entity metadata (key_attributes + attributes) to every PutLogEvents request, enabling entity-based correlation in CloudWatch.

When 'entity' is omitted the sink behaves identically to before. When configured, an SDK Entity is built and passed to the dispatcher via the builder; the dispatcher conditionally sets it on each PLE request.
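The conditional attachment described above could look roughly like the following. This is a minimal sketch, not the PR's code: `EntitySketch`, `RequestSketch`, and `DispatcherSketch` are hypothetical stand-ins for the SDK Entity, the PutLogEvents request, and the dispatcher, and only the null-versus-present behavior is meant to carry over.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the SDK Entity: two string maps.
final class EntitySketch {
    final Map<String, String> keyAttributes;
    final Map<String, String> attributes;

    EntitySketch(Map<String, String> keyAttributes, Map<String, String> attributes) {
        this.keyAttributes = Map.copyOf(keyAttributes);
        this.attributes = Map.copyOf(attributes);
    }
}

// Hypothetical stand-in for a PutLogEvents request.
final class RequestSketch {
    final String logGroup;
    final String logStream;
    final Optional<EntitySketch> entity;

    RequestSketch(String logGroup, String logStream, Optional<EntitySketch> entity) {
        this.logGroup = logGroup;
        this.logStream = logStream;
        this.entity = entity;
    }
}

// Hypothetical stand-in for the dispatcher that builds each request.
final class DispatcherSketch {
    private final String logGroup;
    private final String logStream;
    private final EntitySketch entity; // null when the 'entity' block is omitted

    DispatcherSketch(String logGroup, String logStream, EntitySketch entity) {
        this.logGroup = logGroup;
        this.logStream = logStream;
        this.entity = entity;
    }

    // The entity is attached only when one was configured, so a sink
    // without 'entity' builds the same request it did before.
    RequestSketch newRequest() {
        return new RequestSketch(logGroup, logStream, Optional.ofNullable(entity));
    }
}
```

Constructing `DispatcherSketch` with a null entity yields requests whose `entity` is empty, which models the backward-compatible path.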

If CloudWatch rejects the entity, the request is still considered successful (events are released), a WARN log records the RejectedEntityInfo error type, and a new 'cloudWatchLogsEntityRejected' counter is incremented as the primary alarmable signal.
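As an illustration of the non-fatal rejection path, here is a small sketch under stated assumptions: `ResponseSketch` and `RejectionHandlerSketch` are hypothetical names, the `AtomicLong` fields stand in for the real metrics and event handles, and `System.err` stands in for the WARN log. The metric is recorded before the events are released, matching the ordering described later in this PR.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the SDK response. rejectedEntityErrorType is
// null when CloudWatch accepted the entity or no entity was sent.
final class ResponseSketch {
    final String rejectedEntityErrorType;

    ResponseSketch(String rejectedEntityErrorType) {
        this.rejectedEntityErrorType = rejectedEntityErrorType;
    }
}

final class RejectionHandlerSketch {
    // Stands in for the 'cloudWatchLogsEntityRejected' counter.
    final AtomicLong entityRejected = new AtomicLong();
    // Stands in for releasing event handles.
    final AtomicLong eventsReleased = new AtomicLong();

    // Called after a PutLogEvents call that did not throw. Always returns
    // true: an entity rejection does not turn the request into a failure.
    boolean onSuccess(ResponseSketch response, int eventCount) {
        if (response != null && response.rejectedEntityErrorType != null) {
            entityRejected.incrementAndGet();
            System.err.println("WARN entity rejected: " + response.rejectedEntityErrorType);
        }
        eventsReleased.addAndGet(eventCount);
        return true;
    }
}
```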

Validation is intentionally minimal: only @NotEmpty on key_attributes. AWS-owned limits (max entries, allowed key names, length caps) are enforced server-side and surfaced via the rejection metric, avoiding client-side drift when the Entity API contract changes.

Tests follow TDD: each test was written before the code that makes it pass. New unit tests cover EntityConfig defaults, deserialization, and @NotEmpty validation; @Valid cascade from CloudWatchLogsSinkConfig; the new counter; dispatcher entity-on-request, no-entity, and rejection paths; and sink wiring of EntityConfig -> Entity -> dispatcher. The TestSinkOperationWithEntity integration test exercises the SDK end-to-end.

Acceptance criteria from the spec are met: backward compatibility, entity attached when configured, non-fatal rejection with metric and log, all existing tests green, new tests cover new code paths, license headers on new files.

Refs: data-prepper-plugins/cloudwatch-logs/specs/entity-config.md


Issues Resolved

Resolves #6860

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.


github-actions Bot commented May 19, 2026

✅ License Header Check Passed

All newly added files have proper license headers. Great work! 🎉

Signed-off-by: Nikhil Bagmar <nikhilbagmar73@gmail.com>
Signed-off-by: Nikhil Bagmar <nikhilbagmar73@gmail.com>
- Remove duplicate @Test annotations from auto-merge
- Add Hamcrest imports for entity assertion matchers
- Update EXPECTED_DISPATCHER_ARITY to 11 (entity field added)

Signed-off-by: Nikhil Bagmar <nikhilbagmar73@gmail.com>
bagmarnikhil force-pushed the feature/cwl-sink-entity-config branch from f18af61 to 60dc039 on May 21, 2026 21:26
…rejection check

- Default keyAttributes and attributes to Collections.emptyMap()
- Return Collections.unmodifiableMap() from both getters
- Move entity rejection metric before releaseEventHandles
- Add unmodifiable map and createResources exception tests

Signed-off-by: Nikhil Bagmar <nikhilbagmar73@gmail.com>
Member

@dlvenable dlvenable left a comment


Thanks @bagmarnikhil for this contribution!

@NotEmpty
private Map<String, String> keyAttributes = Collections.emptyMap();

@JsonProperty("attributes")
Collaborator


attributes has no @NotNull. Explicit attributes: null in YAML deserializes to null and unmodifiableMap(null) throws NPE; either add @NotNull or null-guard the getter.
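A null-guarded getter along the lines suggested here might look like this. It is a sketch only: `EntityConfigSketch` is a simplified, hypothetical stand-in for EntityConfig without Jackson annotations, and the setter plays the role of deserializing an explicit `attributes: null`.

```java
import java.util.Collections;
import java.util.Map;

// Simplified, hypothetical stand-in for EntityConfig.
final class EntityConfigSketch {
    private Map<String, String> attributes = Collections.emptyMap();

    // Plays the role of deserialization, which may assign null.
    void setAttributes(Map<String, String> attributes) {
        this.attributes = attributes;
    }

    // Null-guarded getter: a null field is reported as an empty map instead
    // of reaching Collections.unmodifiableMap(null), which throws
    // NullPointerException.
    Map<String, String> getAttributes() {
        return attributes == null
                ? Collections.emptyMap()
                : Collections.unmodifiableMap(attributes);
    }
}
```

The alternative mentioned in the comment, adding @NotNull, rejects the configuration at validation time instead of silently treating null as empty; which behavior is preferable depends on how strict the sink wants to be.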

dlqObjects = getDlqObjectsFromResponse(putLogEventsResponse);
}
if (putLogEventsResponse != null && putLogEventsResponse.rejectedEntityInfo() != null) {
cloudWatchLogsMetrics.increaseEntityRejectedCounter(1);
Collaborator


A misconfigured entity will return rejectedEntityInfo on every successful PutLogEvents. This WARN will fire per batch under load; consider NOISY marker or rate-limited logging like the SDK exception path.
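One way to rate-limit the WARN while still counting every rejection is a small time-based gate. The sketch below is an assumption-laden illustration, not the project's existing rate-limiting helper: `WarnGateSketch` is a hypothetical name, and the caller passes in the current time (for example from `System.nanoTime()`) so the logic stays easy to test.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical gate that lets at most one log line through per interval.
final class WarnGateSketch {
    private static final long NEVER = Long.MIN_VALUE;

    private final long minIntervalNanos;
    private final AtomicLong lastLoggedNanos = new AtomicLong(NEVER);

    WarnGateSketch(long minIntervalNanos) {
        this.minIntervalNanos = minIntervalNanos;
    }

    // Returns true when the caller should emit the WARN now. The first call
    // always passes; later calls pass once the interval has elapsed. The
    // compare-and-set ensures only one of several racing threads wins.
    boolean tryAcquire(long nowNanos) {
        long last = lastLoggedNanos.get();
        boolean due = last == NEVER || nowNanos - last >= minIntervalNanos;
        return due && lastLoggedNanos.compareAndSet(last, nowNanos);
    }
}
```

In the rejection branch, the counter would still be incremented on every batch and only the log statement would be wrapped in `if (gate.tryAcquire(System.nanoTime()))`, so the metric remains accurate under load.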

public class EntityConfig {

@JsonProperty("key_attributes")
@NotEmpty
Collaborator


@NotEmpty only rejects empty maps. Values can still be empty or blank strings and produce a server rejection on every request; consider @Size(max=…) and @NotBlank on values to fail fast.
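To make the fail-fast idea concrete, here is a plain-Java sketch of the check the annotations would express. It deliberately avoids Bean Validation so it stays self-contained; `KeyAttributeCheckSketch` and its method are hypothetical, and it covers only the empty-map and blank-value cases, not any AWS-side limits.

```java
import java.util.Map;

// Hypothetical helper: rejects an empty map and any null or blank value.
final class KeyAttributeCheckSketch {
    private KeyAttributeCheckSketch() {
    }

    static void requireNonBlankValues(Map<String, String> keyAttributes) {
        if (keyAttributes == null || keyAttributes.isEmpty()) {
            throw new IllegalArgumentException("key_attributes must not be empty");
        }
        for (Map.Entry<String, String> entry : keyAttributes.entrySet()) {
            String value = entry.getValue();
            if (value == null || value.isBlank()) {
                throw new IllegalArgumentException(
                        "key_attributes." + entry.getKey() + " must not be blank");
            }
        }
    }
}
```

A check like this would surface a blank value at pipeline startup rather than as a rejection on every request, at the cost of duplicating one server-side rule on the client.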

Contributor Author


Thanks for the suggestion. I intentionally kept validation minimal here — the Entity API has a complex set of constraints (key names restricted to 5 allowed values, min 2 / max 4 entries for keyAttributes, value lengths 1–512, max 10 attributes, etc.).

Replicating a subset client-side (e.g. @NotBlank on values) creates a false sense of completeness while still drifting if AWS changes the contract. The rejection metric (cloudWatchLogsEntityRejected) and the WARN log are designed to be the primary signal for misconfiguration.

This avoids coupling to server-side limits that may evolve.

private int retryCount;
private boolean createLogGroup;
private boolean createLogStream;
private Entity entity;
Collaborator


The inner Uploader declares its fields private final, but the outer dispatcher leaves entity (and the older fields above) non-final. Flagging in case you want the new field to follow the inner class's immutability convention.

Contributor Author


The existing fields in the outer dispatcher (retryCount, createLogGroup, createLogStream, etc.) are all non-final. Making only entity final would be inconsistent with the rest of the class.

Happy to make it final if we want to refactor all the outer fields to final in a follow-up, but keeping it consistent with the existing style for now.

assertThat(entityConfig.getAttributes(), aMapWithSize(0));
}

@Test
Collaborator


GIVEN_entity_config_with_empty_key_attributes… and GIVEN_entity_config_with_default_key_attributes… exercise the same constraint (empty map default vs explicitly-set empty map). Collapse them into one parameterized test.

Development

Successfully merging this pull request may close these issues.

Support Entity attributes in CloudWatch Logs sink
