Commit 7bc7e7c
fix: handle_end_invocation race condition when extracting payload (#1024)
## Overview
1. The first thing
[handle_end_invocation](https://github.com/DataDog/datadog-lambda-extension/blob/2c61aca453a3ab41df58d407e1659705cdcee273/bottlecap/src/lifecycle/listener.rs#L163)
does is spawn a task, let's call it `anonymousTask`
2. `handle_end_invocation` then [immediately returns
200](https://github.com/DataDog/datadog-lambda-extension/blob/2c61aca453a3ab41df58d407e1659705cdcee273/bottlecap/src/lifecycle/listener.rs#L181)
so the tracer can continue
3. `anonymousTask` is busy with a complex body in `extract_request_body`
for the time being
4. Because the tracer has continued, eventually `PlatformRuntimeDone`
[is
processed](https://github.com/DataDog/datadog-lambda-extension/blob/2c61aca453a3ab41df58d407e1659705cdcee273/bottlecap/src/lifecycle/invocation/processor_service.rs#L481)
5. Given our customer is not managed (`initialization_type: SnapStart`)
then `PlatformRuntimeDone` [tries to
pair_platform_runtime_done_event](https://github.com/DataDog/datadog-lambda-extension/blob/2c61aca453a3ab41df58d407e1659705cdcee273/bottlecap/src/lifecycle/invocation/processor.rs#L496)
which is `None` because `anonymousTask` is still busy with the body
6. We then [jump to
process_on_platform_runtime_done](https://github.com/DataDog/datadog-lambda-extension/blob/2c61aca453a3ab41df58d407e1659705cdcee273/bottlecap/src/lifecycle/invocation/processor.rs#L506)
7. Span and trace ids [are not there
yet](https://github.com/DataDog/datadog-lambda-extension/blob/2c61aca453a3ab41df58d407e1659705cdcee273/bottlecap/src/lifecycle/invocation/processor.rs#L517),
and they are never checked again after this
8. `anonymousTask` finally completes, but that's irrelevant because
`send_ctx_spans` is only run on `PlatformRuntimeDone` which assumes
[universal_instrumentation_end has already been
sent](https://github.com/DataDog/datadog-lambda-extension/blob/2c61aca453a3ab41df58d407e1659705cdcee273/bottlecap/src/lifecycle/listener.rs#L177)
## Why this looks likely
In the customer's logs we can see
* `05:11:48.463` `datadog.trace.agent.core.DDSpan - Finished span
(WRITTEN): DDSpan [ t_id=2742542901019652192`
* `05:11:48.489` PlatformRuntimeDone received
* `05:11:48.630` REPORT RequestId `1db22159-7200-43c8-bec1-11b89df4f099`
(last log emitted in an execution)
* `05:11:53.784` START RequestId: `8c801767-e21b-43f7-bd11-078bb64bc430`
(new request id, 5s later)
* `05:11:53.789` [Received end invocation request from
headers:{""x-datadog-trace-id"":
""2742542901019652192"...](https://github.com/DataDog/datadog-lambda-extension/blob/2c61aca453a3ab41df58d407e1659705cdcee273/bottlecap/src/lifecycle/listener.rs#L258)
-> we are now trying to finish the span after the request is long gone 🙃
In this specific run, the lambda even had time to stop before continuin
with the anonymous task from `handle_end_invocation`.
## Motivation
[SLES-2666](https://datadoghq.atlassian.net/browse/SLES-2666)
## Performance
This PR makes the reading of the body synchronous with the response.
This will delay handing over execution to outside the extension until
the body is read. But that is irrelevant because it is a requirement to
read the body and send `universal_instrumentation_end` before
relinquishing control.
## Testing
Suggestions very welcome
[SLES-2666]:
https://datadoghq.atlassian.net/browse/SLES-2666?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
---------
Co-authored-by: Jordan Gonzalez <30836115+duncanista@users.noreply.github.com>1 parent cd9d1bb commit 7bc7e7c
1 file changed
+11
-8
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
168 | 168 | | |
169 | 169 | | |
170 | 170 | | |
| 171 | + | |
| 172 | + | |
| 173 | + | |
| 174 | + | |
| 175 | + | |
| 176 | + | |
| 177 | + | |
| 178 | + | |
| 179 | + | |
| 180 | + | |
| 181 | + | |
171 | 182 | | |
172 | 183 | | |
173 | | - | |
174 | | - | |
175 | | - | |
176 | | - | |
177 | | - | |
178 | | - | |
179 | | - | |
180 | | - | |
181 | 184 | | |
182 | 185 | | |
183 | 186 | | |
| |||
0 commit comments