Cap HTTP response body size to bound adversarial-target memory use (CWE-770)#161

Open
OffByQuant wants to merge 1 commit into google:master from OffByQuant:security/cap-http-response-body

@OffByQuant

Summary

Tsunami's HTTP egress paths buffered response bodies into memory with no size cap on three independent code paths. Because Tsunami probes adversarial endpoints by definition, a compromised target can detect the TsunamiSecurityScanner User-Agent on inbound requests and serve an unbounded chunked response — each chunk arrives within the read-timeout window so the read timeout never fires, and the body buffer grows until the JVM heap or the Python plugin server process is exhausted (CWE-770 / CWE-400). The end result: a compromised asset can deterministically crash the scanner that is probing it, producing the exact false-negative "scanned, no findings" outcome an attacker wants from a detection tool.

This patch caps response-body reads on all three paths.

Affected sites

  1. common/.../OkHttpHttpClient.java :: parseResponse — backs every detector going through send / sendAsync. okResponse.body().bytes() only enforces a 2 GB ceiling on bodies whose Content-Length is advertised; chunked-encoded responses with no Content-Length bypass it entirely.
  2. common/.../OkHttpHttpClient.java :: sendAsIs — used by the crafted-URL probes (path-traversal probes, percent-encoded edge cases) that are most likely to fire against hostile targets. ByteString.readFrom drains the raw HttpURLConnection stream until EOF with no ceiling at all.
  3. plugin_server/py/.../requests_http_client.py :: send + _parse_response — session.send() defaults to stream=False, which buffers the entire body inside the Response object before returning. res.content then returns those cached bytes unconditionally; a malicious chunked peer OOMs the Python plugin server before _parse_response is ever entered.

Fix

Introduce a configurable max_response_body_bytes cap (default 100 MB) enforced at every body-ingestion point.

  • Reads are drained chunk-by-chunk into a bounded buffer.
  • Content-Length, when advertised, is checked against the cap up front (cheap pre-flight rejection).
  • Once the running total exceeds the cap, the underlying connection is closed and the call fails with IOException (Java) / IOError (Python).
  • The affected probe errors out cleanly; the surrounding scan continues, including all subsequent detectors against the same and other targets.
  • No detector code is touched, and no detector behavior changes for any response under the cap.
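
The bounded drain described above can be sketched roughly as follows. This is a minimal illustration over a generic file-like stream, not the code in this patch: the name read_bounded and its parameters are hypothetical, and the advertised length is assumed to have been parsed from Content-Length by the caller.

```python
import io
from typing import BinaryIO, Optional


def read_bounded(
    stream: BinaryIO,
    max_bytes: int,
    advertised_length: Optional[int] = None,
    chunk_size: int = 8192,
) -> bytes:
    """Read a stream to EOF, failing once more than max_bytes would be buffered.

    Hypothetical helper for illustration; names are not taken from the patch.
    """
    # Pre-flight: reject up front if the peer already advertised an oversized body.
    if advertised_length is not None and advertised_length > max_bytes:
        stream.close()
        raise IOError(
            f"advertised body of {advertised_length} bytes exceeds cap of {max_bytes}"
        )

    buffer = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # EOF reached without crossing the cap.
            return bytes(buffer)
        # Check before appending so the buffer itself never grows past max_bytes.
        if len(buffer) + len(chunk) > max_bytes:
            stream.close()
            raise IOError(f"response body exceeds cap of {max_bytes} bytes")
        buffer.extend(chunk)
```

Because the check runs per chunk, a peer that never sends EOF is cut off after at most max_bytes of buffered data plus one in-flight chunk, regardless of whether a Content-Length was sent.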

Java wiring

  • New HttpClientModule.@MaxResponseBodyBytes Guice qualifier + provider resolves CLI → config → default in the same null-coalescing pattern already used for the timeout fields:
    • CLI: --http-client-max-response-body-mb (in HttpClientCliOptions)
    • Config: common.net.http.maxResponseBodyMb (in HttpClientConfigProperties)
    • Default: 100 MB (OkHttpHttpClient.DEFAULT_MAX_RESPONSE_BODY_BYTES)
  • New OkHttpHttpClient constructor + OkHttpHttpClientBuilder.setMaxResponseBodyBytes setter expose the cap. The legacy 6-arg constructor is preserved (it now delegates to the 7-arg form with the default), so external callers stay source-compatible.
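
For readers less familiar with the null-coalescing pattern, the CLI → config → default resolution amounts to the following. This is a language-neutral sketch written in Python rather than the Guice provider itself; the function name is hypothetical, and the assumption that 1 MB means 1024 * 1024 bytes is mine, not something stated in this PR.

```python
from typing import Optional

DEFAULT_MAX_RESPONSE_BODY_MB = 100  # default stated in this PR
BYTES_PER_MB = 1024 * 1024          # assumption: binary megabytes


def resolve_max_response_body_bytes(
    cli_mb: Optional[int], config_mb: Optional[int]
) -> int:
    """Pick the first value that is set: CLI flag, then config, then default."""
    if cli_mb is not None:
        chosen_mb = cli_mb
    elif config_mb is not None:
        chosen_mb = config_mb
    else:
        chosen_mb = DEFAULT_MAX_RESPONSE_BODY_MB
    return chosen_mb * BYTES_PER_MB
```

A CLI value therefore always wins over the config file, and the default only applies when neither is set.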

Python wiring

  • New --max_response_body_mb absl flag in plugin_server.py plumbed through RequestsHttpClientBuilder.set_max_response_body_bytes into RequestsHttpClient.
  • session.send() now passes stream=True on both the sync and async paths so the body never lands in the Response object before we have a chance to cap it.
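
A rough sketch of what the streamed path looks like with requests is shown below. It relies only on Session.send(..., stream=True), Response.headers, Response.iter_content and Response.close, which are part of the requests API; the function name send_with_cap and its parameters are hypothetical and do not mirror the actual RequestsHttpClient code.

```python
def send_with_cap(session, prepared_request, max_body_bytes, chunk_size=64 * 1024):
    """Send a prepared request and return its body, enforcing a size cap.

    `session` is expected to behave like requests.Session and
    `prepared_request` like requests.PreparedRequest (illustrative sketch).
    """
    # stream=True defers the body download until we iterate over it.
    response = session.send(prepared_request, stream=True)
    try:
        # Cheap pre-flight check on the advertised length, if any.
        advertised = response.headers.get("Content-Length")
        if advertised is not None and advertised.isdigit():
            if int(advertised) > max_body_bytes:
                raise IOError(
                    f"advertised Content-Length {advertised} exceeds cap "
                    f"of {max_body_bytes} bytes"
                )

        body = bytearray()
        for chunk in response.iter_content(chunk_size=chunk_size):
            # Check before appending so the buffer stays within the cap.
            if len(body) + len(chunk) > max_body_bytes:
                raise IOError(f"response body exceeds cap of {max_body_bytes} bytes")
            body.extend(chunk)
        return bytes(body)
    except Exception:
        # Drop the connection so the peer cannot keep feeding us data.
        response.close()
        raise
```

The pre-flight check alone is not sufficient, since a chunked response carries no Content-Length; the per-chunk check is what bounds memory in that case.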

Tests

  • OkHttpHttpClientTest:
    • send_whenResponseBodyExceedsCap_throwsIOException
    • sendAsIs_whenResponseBodyExceedsCap_throwsIOException
    • send_whenResponseBodyWithinCap_returnsExpectedHttpResponse
  • requests_http_client_test:
    • test_send_when_response_body_exceeds_cap_raises_io_error (cap hit on streamed bytes)
    • test_send_when_advertised_content_length_exceeds_cap_raises_io_error (cap hit on advertised Content-Length pre-flight)
    • test_send_when_response_body_within_cap_returns_expected_response (golden-path regression)

Compatibility

  • Default cap of 100 MB is orders of magnitude larger than any legitimate HTML/JSON detector response. No existing detector should observe behavior changes.
  • Java legacy 6-arg OkHttpHttpClient constructor preserved with default cap, so any out-of-tree caller (unlikely given the package-private constructor, but listed for completeness) continues to compile.
  • Operators who legitimately need to ingest very large bodies (e.g. a multi-GB diagnostics endpoint) can raise the cap via --http-client-max-response-body-mb (Java) or --max_response_body_mb (Python), or lower it to harden further against malicious targets.

Test plan

  • CI runs ./gradlew test against the patched Java sources (I was unable to run gradle locally — no wrapper checked into the repo and no system gradle on my dev box; relying on this PR's CI).
  • CI runs the Python plugin-server pytest suite against the patched requests_http_client_test.py (requests-mock + absl-py not in my local env; relying on CI).
  • Manual sanity test: run the scanner against a target that streams a chunked response with no Content-Length and verify the affected probe errors out with the cap message rather than crashing the JVM / Python process.
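
For the manual sanity test, a local fixture along these lines could stand in for the hostile target. It is a standard-library sketch only: the handler name, the serve helper, the port and the chunk size are all hypothetical choices, and it binds to loopback so it is reachable only from the test machine.

```python
import http.server


def encode_chunk(payload: bytes) -> bytes:
    """Frame a payload as one HTTP/1.1 chunk: hex length, CRLF, data, CRLF."""
    return b"%x\r\n" % len(payload) + payload + b"\r\n"


class EndlessChunkedHandler(http.server.BaseHTTPRequestHandler):
    """Test fixture: answers GET with a chunked body that never terminates."""

    protocol_version = "HTTP/1.1"  # chunked transfer encoding requires HTTP/1.1

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Transfer-Encoding", "chunked")  # no Content-Length
        self.end_headers()
        chunk = encode_chunk(b"A" * 65536)
        try:
            while True:
                self.wfile.write(chunk)
        except (BrokenPipeError, ConnectionResetError):
            # The client closed the connection, e.g. after hitting its cap.
            pass
        self.close_connection = True


def serve(port: int = 8080) -> None:
    """Run the fixture on loopback only; the port is an arbitrary choice."""
    server = http.server.HTTPServer(("127.0.0.1", port), EndlessChunkedHandler)
    server.serve_forever()
```

Calling serve() and pointing the scanner at http://127.0.0.1:8080/ should, with the patch applied, make the affected probe fail with the cap error while the scanner process stays up.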

References

  • CWE-770: Allocation of Resources Without Limits or Throttling
  • CWE-400: Uncontrolled Resource Consumption
  • OkHttp ResponseBody documentation — note on body size and streaming

@google-cla

google-cla Bot commented Apr 29, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@OffByQuant

OffByQuant commented Apr 29, 2026

Author

@googlebot I signed it!

The scanner's HTTP egress paths buffered response bodies into memory with
no size ceiling on three independent code paths:

  * common/.../OkHttpHttpClient.java :: parseResponse — used by every
    detector going through send / sendAsync. Calls
    okResponse.body().bytes(), which only enforces a 2 GB cap on bodies
    advertising a Content-Length; that ceiling is bypassed entirely by
    chunked encoding with no Content-Length header.
  * common/.../OkHttpHttpClient.java :: sendAsIs — used by the
    crafted-URL probes that fire against hostile targets. Calls
    ByteString.readFrom on the raw HttpURLConnection input stream with
    no ceiling at all.
  * plugin_server/py/.../requests_http_client.py :: send +
    _parse_response — uses session.send() with the default
    stream=False, which buffers the entire body into the Response
    object before returning, then reads res.content unconditionally.

Because Tsunami probes adversarial endpoints by definition, an
attacker-controlled target can detect the TsunamiSecurityScanner
User-Agent on inbound requests and serve an unbounded chunked response.
Each chunk arrives within the read-timeout window so the read timeout
never fires, and the body buffer grows until the JVM heap or Python
process is exhausted (CWE-770 / CWE-400). The result: a compromised
asset can deterministically crash the scanner that is probing it,
producing the false-negative "scanned, no findings" outcome an attacker
wants from a detection tool.

Fix: introduce a configurable max-response-body cap (default 100 MB)
enforced at every body-ingestion point. Reads are drained chunk by
chunk into a bounded buffer; once the cap is exceeded, the underlying
connection is closed and the call fails with IOException / IOError so
the affected probe errors out cleanly while the surrounding scan
continues.

Wiring:
  * Java: new HttpClientModule.@MaxResponseBodyBytes provider resolves
    --http-client-max-response-body-mb (CLI) → maxResponseBodyMb
    (config) → 100 MB default, mirroring the existing timeout flag
    pattern. New OkHttpHttpClient constructor + builder setter expose
    the cap; the legacy 6-arg constructor is preserved with the
    default value to keep external callers source-compatible.
  * Python: new --max_response_body_mb absl flag plumbed through
    RequestsHttpClientBuilder.set_max_response_body_bytes into
    RequestsHttpClient. session.send() now passes stream=True so the
    body never lands in the Response object before we get a chance to
    cap it.

Tests:
  * OkHttpHttpClientTest: send / sendAsIs throw IOException when the
    body exceeds the cap; bodies within the cap pass through unchanged.
  * requests_http_client_test: send raises IOError on advertised
    Content-Length over cap, on actual streamed bytes over cap, and
    passes through small bodies unchanged.

Disclosure note: the unbounded reads were originally surfaced by an
external security review. The patch is local to the HTTP-client layer;
no detector code changes and no detector behavior changes for any
response under the cap. Couldn't run gradle/pytest locally (no wrapper
checked in, no project-wide pytest harness configured); relying on CI.

OffByQuant force-pushed the security/cap-http-response-body branch from a0035e7 to 42e39b4 on April 29, 2026 07:17