fix(lifecycle): release invocation context after platform report to prevent memory leak #1050
Force-pushed from da979a7 to 53cbf0b
Thanks for the fix!
I'm worried that telemetry events can sometimes arrive out of order, e.g. PlatformReport arriving before other events such as PlatformRuntimeDone. To capture this, can we add a log when context_buffer.get(request_id) returns None?
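The guard being asked for can be sketched as follows. This is an illustrative stand-in, not the actual bottlecap types: `ContextBuffer`, its fields, and the `on_platform_report` signature here are simplified assumptions, and `eprintln!` stands in for the project's logging macros.

```rust
use std::collections::HashMap;

// Hypothetical minimal stand-in for the extension's context buffer.
struct ContextBuffer {
    contexts: HashMap<String, String>,
}

impl ContextBuffer {
    fn get(&self, request_id: &str) -> Option<&String> {
        self.contexts.get(request_id)
    }
}

// Log (rather than silently return) when a PlatformReport arrives
// before its context was buffered, so out-of-order events are visible.
fn on_platform_report(buffer: &ContextBuffer, request_id: &str) -> bool {
    match buffer.get(request_id) {
        Some(_context) => {
            // process runtime duration and enhanced metrics here
            true
        }
        None => {
            eprintln!("no context found for request_id {request_id}");
            false
        }
    }
}
```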
```rust
debug!(
    "Context released (buffer size after remove: {})",
    self.context_buffer.size()
);
```
Do we expect this to be used frequently for debugging? If not, maybe change it to the "trace" level.

https://github.com/DataDog/datadog-lambda-extension/blob/main/bottlecap/src/lifecycle/invocation/context.rs#L202 will log in case the request id could not be found.
Force-pushed from 53cbf0b to d0b4b3d
…revent memory leak

Contexts added to ContextBuffer on every invocation were never removed after processing, causing unbounded memory growth across warm invocations. The growth was most visible when DD_CAPTURE_LAMBDA_PAYLOAD=true with large response payloads (issue #1049), but it affects all invocations.

Remove the context at the end of on_platform_report, which is the last point in the lifecycle where downstream code still reads context fields (runtime_duration_ms for post-runtime metrics, enhanced_metric_data for network/CPU metrics). Both the on-demand and managed instance paths are fixed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
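The shape of the fix can be sketched as below. This is a simplified illustration under assumed names (`ContextBuffer`, `insert`, `remove`, `size`), not the actual bottlecap implementation: the point is only that the entry is dropped once the last reader is done with it.

```rust
use std::collections::HashMap;

// Illustrative stand-in for the extension's context buffer.
struct ContextBuffer {
    contexts: HashMap<String, Vec<u8>>,
}

impl ContextBuffer {
    fn insert(&mut self, request_id: &str, context: Vec<u8>) {
        self.contexts.insert(request_id.to_string(), context);
    }
    fn remove(&mut self, request_id: &str) -> Option<Vec<u8>> {
        self.contexts.remove(request_id)
    }
    fn size(&self) -> usize {
        self.contexts.len()
    }
}

fn on_platform_report(buffer: &mut ContextBuffer, request_id: &str) {
    // ...read runtime_duration_ms / enhanced_metric_data first...
    // PlatformReport is the last lifecycle event that needs the
    // context, so removing it here caps the buffer instead of
    // letting it grow across warm invocations.
    buffer.remove(request_id);
}
```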
I remember we removed this at some point because we could have multiple invocations whose data was not yet available for us to process/flush while we were receiving the report and doing what this PR does. It might be good to check what the real reason was and how we can avoid it.
Force-pushed from d0b4b3d to 03f60b4
The buffer size increase was introduced in #801. It might be worth digging into the history to find when we removed the code that this PR is adding back.
Found them. I think my change is safe: by the time on_platform_report is called, the cold start span has already been sent in on_platform_runtime_done, so the context is no longer needed for that purpose. The remove() at the end just cleans up after all legitimate reads are complete. Let me know if I missed something.
@litianningdatadog thanks for finding them! I think we might need to verify this with a load test of hundreds of thousands of invocations and see if we are missing any data. I'm worried about a Lambda with high throughput (i.e. a Go service) that might generate more data without allowing flushes to happen as we expect, in which case we could be losing data. Happy to reintroduce the code if we confirm that everything is in place, mainly because this is one of those tricky removals.
Stress testing shows no sign of missed requests. |
JIRA: https://datadoghq.atlassian.net/browse/SVLS-8625
GitHub issue reported: #1049
What does this PR do?
Fixes a memory leak in the extension where `ContextBuffer` accumulated `Context` objects indefinitely across warm Lambda invocations.

Root cause: `on_platform_report()` reads the `Context` for runtime duration and CPU/network enhanced metrics, but never calls `context_buffer.remove()` afterwards. Each warm invocation appended a new `Context` to the buffer; none were ever freed.

Fix: Call `self.context_buffer.remove(request_id)` at the end of `on_platform_report()`, after all processing for the invocation is complete.

Impact: Most visible when `DD_CAPTURE_LAMBDA_PAYLOAD=true`: each retained `Context` holds the full captured request/response payload as span metadata (~500 KB per invocation). Without the fix, a Lambda function invoked thousands of times would accumulate hundreds of MB of leaked memory in the extension process.

Testing
```
buffer size after remove: 52
buffer size after remove: 1
```
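A back-of-envelope check of the impact estimate above (~500 KB retained per warm invocation, never freed before the fix; the helper name is ours, for illustration only):

```rust
// Rough leak size in MB for a given invocation count and per-invocation
// payload size in KB, matching the PR description's estimate.
fn leaked_mb(invocations: u64, payload_kb: u64) -> u64 {
    invocations * payload_kb / 1024
}
```

At 500 KB per invocation, a thousand warm invocations already pin down roughly half a GB in the extension process, consistent with the "hundreds of MB" figure in the description.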