Skip to content

fix(otlp-http): limit response body size to prevent memory exhaustion#3501

Open
SATVIKsynopsis wants to merge 19 commits into
open-telemetry:mainfrom
SATVIKsynopsis:oltp-http
Open

fix(otlp-http): limit response body size to prevent memory exhaustion#3501
SATVIKsynopsis wants to merge 19 commits into
open-telemetry:mainfrom
SATVIKsynopsis:oltp-http

Conversation

@SATVIKsynopsis

@SATVIKsynopsis SATVIKsynopsis commented May 7, 2026

Copy link
Copy Markdown

Fixes #3439

Changes

I traced all HTTP response handling to export_http_once in opentelemetry-oltp/src/exporter/http/mod.rs and since all three signal exporters funnel through this one function, fixing it here covered everything. I also added a MAX_RESPONSE_BODY_BYTES (10 mb) cap on 2xx response bodies and stopped reading non-2xx bodies entirely. I also added a test with a mock client that returns an oversized body to verify the cap works.

Merge requirement checklist

  • CONTRIBUTING guidelines followed
  • Unit tests added/updated (if applicable)
  • Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
  • Changes in public API reviewed (if applicable)

@SATVIKsynopsis SATVIKsynopsis requested a review from a team as a code owner May 7, 2026 13:03
@linux-foundation-easycla

linux-foundation-easycla Bot commented May 7, 2026

Copy link
Copy Markdown

CLA Signed

The committers listed above are authorized under a signed CLA.

@codecov

codecov Bot commented May 7, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 97.43590% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 82.9%. Comparing base (8882149) to head (2faf365).

Files with missing lines Patch % Lines
opentelemetry-http/src/lib.rs 97.4% 1 Missing ⚠️
Additional details and impacted files
@@          Coverage Diff          @@
##            main   #3501   +/-   ##
=====================================
  Coverage   82.9%   82.9%           
=====================================
  Files        130     130           
  Lines      27484   27523   +39     
=====================================
+ Hits       22800   22839   +39     
  Misses      4684    4684           

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@SATVIKsynopsis SATVIKsynopsis changed the title OLTP HTTP tracing for non-2xx bodies fix(otlp-http): limit response body size to prevent memory exhaustion May 7, 2026
Comment thread opentelemetry-otlp/src/exporter/http/mod.rs Outdated
Comment thread opentelemetry-otlp/src/exporter/http/mod.rs Outdated
Comment thread opentelemetry-otlp/src/exporter/http/mod.rs Outdated
@SATVIKsynopsis

Copy link
Copy Markdown
Author

The current fix checks the size after the full body is already buffered. To truly prevent the allocation, the body needs to be read in chunks and aborted early as suggested earlier as well. I have an approach using bytes_stream() for reqwest async, Read::take() for reqwest blocking and body.frame() for hyper. should i go ahead with this or is there a preferred streaming pattern in this codebase i should follow?

@lalitb

lalitb commented May 8, 2026

Copy link
Copy Markdown
Member

The current fix checks the size after the full body is already buffered. To truly prevent the allocation, the body needs to be read in chunks and aborted early as suggested earlier as well. I have an approach using bytes_stream() for reqwest async, Read::take() for reqwest blocking and body.frame() for hyper. should i go ahead with this or is there a preferred streaming pattern in this codebase i should follow?

Yes, that approach sounds right. The key requirement is that the limit is enforced while the body is being read, before it becomes Response<Bytes>. Using bytes_stream() for reqwest async, Read::take() for reqwest blocking, and body.frame() for hyper sounds like the right direction here. Please make sure each path aborts as soon as the accumulated bytes exceed MAX_RESPONSE_BODY_BYTES, so OtlpHttpClient only receives already-bounded Bytes.

Can we try to accommodate this in next release - #3508 - @cijothomas if it can be fixed soon

@SATVIKsynopsis

SATVIKsynopsis commented May 9, 2026

Copy link
Copy Markdown
Author

Updated the fix to use chunked reading in all three HTTP client implementations as in reqwest async, reqwest blocking and hyper so the limit is enforced while the body is being read before it becomes Response<Bytes>. adding unit tests for the body limit in opentelemetry-http directly would require a mock HTTP server. happy to add one if you can suggest a preferred test crate for this codebase.

Comment thread opentelemetry-http/src/lib.rs Outdated
Comment thread opentelemetry-http/src/lib.rs Outdated
@SATVIKsynopsis

Copy link
Copy Markdown
Author

I have added the Err case in hyper client and resolved the nitpick.

@SATVIKsynopsis

Copy link
Copy Markdown
Author

Hi, just checking in. let me know if any further changes are needed or if there's anything blocking the review. Happy to address any feedback.

Comment thread opentelemetry-http/src/lib.rs

@lalitb lalitb left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Since this is a behavior change in the built-in opentelemetry-http clients, we should add a short vNext changelog entry for opentelemetry-http before merging/releasing.

@SATVIKsynopsis

Copy link
Copy Markdown
Author

LGTM. Since this is a behavior change in the built-in opentelemetry-http clients, we should add a short vNext changelog entry for opentelemetry-http before merging/releasing.

Thanks. will add a short vNext changelog entry for before merging/releasing.

@SATVIKsynopsis

Copy link
Copy Markdown
Author

I have added the vNext changelog.

Comment thread opentelemetry-http/src/lib.rs Outdated
Comment thread opentelemetry-http/CHANGELOG.md Outdated
@cijothomas

Copy link
Copy Markdown
Member

I also added a test with a mock client that returns an oversized body to verify the cap works.

Is that included in the PR? I am not seeing tests added.?

Comment thread opentelemetry-http/src/lib.rs Outdated
let status = response.status();
let headers = std::mem::take(response.headers_mut());

let mut body_bytes = bytes::BytesMut::new();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we start with some capacity other than 0 to avoid reallocation?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Let me check whether we can leverage the response body size hint or content length to preallocate an appropriate capacity rather than starting from 0.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i checked both implementations and was able to leverage the response size information for pre allocation and in both cases the capacity is capped at MAX_RESPONSE_BODY_BYTES before allocating.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let mut body_bytes = bytes::BytesMut::new();

I don't see pre-allocation still?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right, that's still showing the previous version. the pre allocation change is currently only in my local branch using the response size hint/content length); I'll push it shortly.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@SATVIKsynopsis

SATVIKsynopsis commented Jun 9, 2026

Copy link
Copy Markdown
Author

I also added a test with a mock client that returns an oversized body to verify the cap works.

Is that included in the PR? I am not seeing tests added.?

I had added a test earlier in opentelemetry-otlp but forgot to move it when the logic was moved to opentelemetry-http. I'll add coverage in opentelemetry-http using a small Axum test server.

Comment thread opentelemetry-http/src/lib.rs Outdated
.take(MAX_RESPONSE_BODY_BYTES as u64 + 1)
.read_to_end(&mut body_bytes)?;
if body_bytes.len() > MAX_RESPONSE_BODY_BYTES {
return Err("response body too large".into());

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct. If we return Err from here, I think this would cause OTLP Exporter to consider the request as failed, and likely initiate retry waves.
If the response is too big, we don't need to retry (we shouldn't).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. returning an Err here would indeed make the exporter treat it as a request failure and potentially retry so then i will take a look at how the response body is consumed by the oltp exporter and adjust the handling so oversized responses don't trigger retries.

@cijothomas cijothomas left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. We need to do a more holistic approach to handle this - when response size is bigger, this PR returns Err and the OTLP Exporter would retry.
We should prevent that. Most likely, this'd mean conditionally reading the response body (we only need the body for partial response case, and then for loggging purposes.)

(Also - small note on pre-allocating the capacity for reading response).

@SATVIKsynopsis

Copy link
Copy Markdown
Author

Thanks for the feedback. looking through the oltp http flow, successful response bodies appear to be consumed primarily for partial success parsing so for oversized responses, would you prefer that we treat them similarly to an empty response body like skip partial success parsing and continue without retrying or would you rather move/condition the response body reading so it's only performed when needed for partial success or logging scenarios?

@SATVIKsynopsis

SATVIKsynopsis commented Jun 10, 2026

Copy link
Copy Markdown
Author

I have added the pre-allocation step in all three, added the http axum mock server, and the oltp exporter retry fix. All tests pass now. happy to address any feedback.

Comment thread opentelemetry-http/src/lib.rs Outdated
let mut response = self.execute(request)?.error_for_status()?;
let status = response.status();
let headers = std::mem::take(response.headers_mut());
let mut body_bytes = Vec::new();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can also be pre-allocated right?

Comment thread opentelemetry-http/Cargo.toml Outdated
tokio = { workspace = true, features = ["time"], optional = true }

[dev-dependencies]
axum = { version = "0.8" }

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand its dev-dev only, but wondering if we can avoid this ?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll switch to a minimal raw TCP test server instead so we don't need to add axum.

@SATVIKsynopsis

Copy link
Copy Markdown
Author

I have added the pre-allocation on reqwest-blocking and added a minimal TCP test instead of axum.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

OTLP HTTP exporter buffers unbounded response bodies into memory

3 participants