Skip to content

Fix for Issue #1282: Guard deserialization against unbounded allocation from declared sizes#1284

Open
joszamama wants to merge 10 commits into
EsotericSoftware:masterfrom
joszamama:fix/string-collection-map-allocation
Open

Fix for Issue #1282: Guard deserialization against unbounded allocation from declared sizes#1284
joszamama wants to merge 10 commits into
EsotericSoftware:masterfrom
joszamama:fix/string-collection-map-allocation

Conversation

@joszamama

Copy link
Copy Markdown

Summary

Several serializers read a declared length, count, or size and allocate from it before validating it against the available input. A small corrupt or malicious message that declares a multi-billion element array, string, collection, or map triggers a multi-gigabyte allocation and an OutOfMemoryError. This continues the discussion in #1282 (primitive arrays) and extends the same guard to the other read paths that share the pattern, plus an opt-in size limit for callers that decode untrusted input.

Kryo assumes a trusted source, so these are robustness fixes, not a change to that threat model. The default behavior is unchanged.

The pattern

new T[declaredLength] (and new char[charCount], new ArrayList<>(declaredSize), new HashMap(declaredSize), ...) runs before any element is read, so the declared size alone drives the allocation. A 6 to 10 byte message can request gigabytes.

What this changes

When an Input is backed by a fixed buffer (no InputStream), the number of elements cannot exceed the number of bytes remaining, because every element occupies at least one byte. That is the bound used here.

  • Primitive arrays and byte[] (Input.readInts/readLongs/.../readBytes, and the ByteBufferInput, UnsafeInput, UnsafeByteBufferInput variants): validate the declared length against the bytes remaining before allocating. This is the Unbounded new int[length] from declared array length in Input.readInts → OutOfMemoryError on malformed input #1282 case.
  • Strings (readUtf8Chars in Input and ByteBufferInput): the same guard on the UTF-8 char count.
  • String[], closure captured args, and CompatibleFieldSerializer field counts: the same guard. Each element is read as a string or a class-tagged object, so it is at least one byte.
  • Collections and Maps (CollectionSerializer, MapSerializer, including the Arrays.asList, JDK immutable, and PriorityQueue create overrides): clamp the initial capacity to the bytes remaining instead of throwing. The collection or map still grows to the real element count, so valid input is never rejected, while a corrupt size no longer forces a large pre-allocation. The clamp is applied in read() before create() so the create overrides are covered too.

Why throw in some places and clamp in others

Throwing is only safe where every element is guaranteed to consume at least one byte (primitives, bytes, chars, strings, class-tagged objects). There, a declared length greater than the bytes remaining is provably malformed, so no valid input is rejected.

Collections and maps can hold elements that serialize to zero bytes (for example a registered no-field element type written with a known serializer), so a valid collection can legitimately declare more elements than there are bytes remaining at that point. Throwing there would reject valid input. Clamping the capacity avoids that: it caps the pre-allocation but lets the structure grow, so the result is always correct.

What is intentionally left unchanged

ObjectArraySerializer (generic Object[]) is not guarded by the bytes-remaining check. Its elements can serialize to zero bytes and the array is fixed size, so neither the throw nor the clamp can bound it without rejecting valid input. The maxArraySize limit below still applies to it.

maxArraySize: the configurable limit (default off), and why

The bytes-remaining guard only applies to a fixed buffer. For an InputStream-backed Input the buffered window is not the total size, so a declared size cannot be checked against it without rejecting valid large input. To let callers bound streaming (and buffered) deserialization, this adds:

input.setMaxArraySize(n); // default Integer.MAX_VALUE

validateArrayLength and the collection and map size reads reject any declared size above the limit, on both stream-backed and buffer-backed input.

The default is Integer.MAX_VALUE, that is, unlimited:

  • By default nothing changes. A valid payload always has length <= bytes remaining, so the
    buffer guard never fires on valid input, and the unlimited cap never fires at all. Existing callers see today's behavior, and the in-memory case is still protected automatically.
  • Callers that decode untrusted data opt in with a value suited to their application. This is the only mechanism that covers stream-backed input, which cannot be bounded automatically.

A finite default was considered but not chosen. A finite default would reject legitimately large payloads from existing users and would change behavior on upgrade. Keeping it unlimited preserves backward compatibility and matches Kryo's trusted-source model: a caller that exposes Kryo to untrusted input sets the bound it expects.

Backwards compatibility

  • Default behavior is unchanged. Large valid arrays, strings, collections, and maps still deserialize. The guard rejects only a declared size larger than the input can supply.
  • The one observable change for malformed input is the exception type: KryoBufferUnderflowException or KryoException instead of OutOfMemoryError or NegativeArraySizeException. It is catchable and thrown before the allocation.
  • Stream-backed input is unchanged unless setMaxArraySize is called.

New tests

  • InputOutputTest.testArrayLengthValidation, testMaliciousArrayLength: the primitive array and byte[] guards, including an end-to-end int[] decode.
  • InputOutputTest.testMaliciousStringLength: the string char-count guard on Input and ByteBufferInput.
  • InputOutputTest.testMaxArraySize: the limit on a stream-backed input for arrays.
  • UnsafeInputOutputTest.testArrayLengthValidation: the unsafe array variants.
  • CollectionSerializerTest.testMaliciousCollectionSize, testMaxCollectionSize: the collection clamp and the stream limit.
  • MapSerializerTest.testMaliciousMapSize, testMaxMapSize: the map clamp and the stream limit.

Verification

The full module test suite passes. It was also run on the project's JDK matrix (Temurin 8, 11, 17, 21, 25), building on 11 and testing on each, as in CI.

…pt length (EsotericSoftware#1282)

When reading from a fixed buffer (no InputStream), a primitive array or byte[] was allocated from the declared length before checking that the input held enough data, so a tiny corrupt payload declaring a multi-billion element length triggered an OutOfMemoryError. The declared length is now validated against the remaining bytes (every element needs at least one byte) before allocating, in Input, ByteBufferInput, UnsafeInput and UnsafeByteBufferInput.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens Kryo deserialization against declared-size-driven, unbounded pre-allocation by validating/clamping declared lengths before allocating arrays/strings/collection/map backing storage, and introduces an opt-in Input#setMaxArraySize(int) limit for callers handling untrusted input (especially stream-backed inputs).

Changes:

  • Add Input.maxArraySize with get/setMaxArraySize and a shared validateArrayLength guard used by multiple read paths.
  • Apply declared-size validation to primitive bulk array reads, byte[] reads, UTF-8 char buffer growth, String[], closure captured args, and CompatibleFieldSerializer field counts.
  • Clamp collection/map initial capacities based on remaining bytes for buffer-backed inputs; add tests covering malicious sizes and max-size enforcement.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
src/com/esotericsoftware/kryo/io/Input.java Adds maxArraySize, validateArrayLength, and uses it to guard allocations in array/string/byte[] reads.
src/com/esotericsoftware/kryo/io/ByteBufferInput.java Applies validateArrayLength to allocations in ByteBuffer-backed bulk reads and UTF-8 char buffer growth.
src/com/esotericsoftware/kryo/unsafe/UnsafeInput.java Applies validateArrayLength to unsafe bulk primitive array allocations.
src/com/esotericsoftware/kryo/unsafe/UnsafeByteBufferInput.java Applies validateArrayLength to unsafe ByteBuffer bulk primitive array allocations.
src/com/esotericsoftware/kryo/serializers/CollectionSerializer.java Clamps declared collection size for initial capacity and enforces maxArraySize.
src/com/esotericsoftware/kryo/serializers/MapSerializer.java Clamps declared map size for initial capacity and enforces maxArraySize.
src/com/esotericsoftware/kryo/serializers/DefaultArraySerializers.java Guards String[] allocation using Input.validateArrayLength.
src/com/esotericsoftware/kryo/serializers/CompatibleFieldSerializer.java Guards field-name array allocation using Input.validateArrayLength.
src/com/esotericsoftware/kryo/serializers/ClosureSerializer.java Guards closure captured-args array allocation using Input.validateArrayLength.
test/com/esotericsoftware/kryo/io/InputOutputTest.java Adds tests for array-length validation, malicious string/array sizes, and maxArraySize.
test/com/esotericsoftware/kryo/io/UnsafeInputOutputTest.java Adds tests for unsafe array-length validation.
test/com/esotericsoftware/kryo/serializers/CollectionSerializerTest.java Adds malicious-size and maxArraySize tests for collections.
test/com/esotericsoftware/kryo/serializers/MapSerializerTest.java Adds malicious-size and maxArraySize tests for maps.
Comments suppressed due to low confidence (1)

src/com/esotericsoftware/kryo/io/Input.java:965

  • validateArrayLength(length) only compares the element count to bytes remaining (1 byte/element lower bound). For fixed-width bulk reads like readInts(int length) (4 bytes/element), malformed input can still allocate very large arrays when length <= (limit-position) but (long)length*4 > (limit-position), causing OOM before the later underflow. Additionally, the subsequent length << 2 / length << 3 calculations can overflow to a negative value, which may lead to incorrect optional(...) behavior and even ArrayIndexOutOfBoundsException while bulk-reading. Consider adding bytes-per-element aware validation using long required = (long)length * BYTES_PER_ELEMENT (and checking required fits in int) for the fixed-width array methods in Input/ByteBufferInput/unsafe variants.
	/** Reads an int array in bulk. This may be more efficient than reading them individually. */
	public int[] readInts (int length) throws KryoException {
		int[] array = new int[validateArrayLength(length)];
		if (optional(length << 2) == length << 2) {
			byte[] buffer = this.buffer;
			int p = this.position;
			for (int i = 0; i < length; i++, p += 4) {

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/com/esotericsoftware/kryo/io/Input.java
Comment thread src/com/esotericsoftware/kryo/io/Input.java
Comment thread src/com/esotericsoftware/kryo/serializers/CollectionSerializer.java
Comment thread src/com/esotericsoftware/kryo/serializers/MapSerializer.java
Comment thread src/com/esotericsoftware/kryo/io/ByteBufferInput.java
Comment thread src/com/esotericsoftware/kryo/unsafe/UnsafeByteBufferInput.java
Comment thread src/com/esotericsoftware/kryo/unsafe/UnsafeInput.java
@theigl

theigl commented Jun 18, 2026

Copy link
Copy Markdown
Collaborator

@joszamama: This looks great already.

Except for the issues raised by Copilot (from a quick glance, most of them seem valid), I would ask you for two more changes:

  1. Document the new configuration option for Input in the readme and add javadoc for the option.
  2. Add a benchmark for array deserialization to kryo-benchmarks and check if there are any performance regressions. My guess would be that the overhead is negligible, but it would be great to know for sure.

It might make sense to separate array and collection maxArraySize from the max size for strings. We could configure reasonable defaults for the next major release and the value for maxArraySize could probably be much lower than the one for strings. What do you think?

@theigl

theigl commented Jun 18, 2026

Copy link
Copy Markdown
Collaborator

It might make sense to separate array and collection maxArraySize from the max size for strings. We could configure reasonable defaults for the next major release and the value for maxArraySize could probably be much lower than the one for strings. What do you think?

Alternatively, we can start out with the single configuration option, keep it as simple as possible and split it later as needed.

Account for element width and reject sizes whose byte count overflows an int, preventing over-allocation and the length<<n shift overflow. Also reject a negative maxArraySize.
@joszamama

Copy link
Copy Markdown
Author

Thanks @theigl. Pushed a follow-up addressing the review.

Copilot points

  • The fixed-width array readers (readInts, readLongs, readFloats, readDoubles, readShorts, readChars) now validate by byte count instead of element count. I added a validateArrayLength(int length, int bytesPerElement) overload and use it in Input, ByteBufferInput, UnsafeInput and UnsafeByteBufferInput. It rejects a declared size whose total byte size is larger than the bytes remaining (buffer-backed) or overflows an int (stream-backed), so the length << n shift can no longer overflow. The one-byte paths (byte arrays, UTF-8, var-length, objects) keep the existing check.
  • setMaxArraySize now rejects negative values with an IllegalArgumentException.
  • Renamed the message from "Array size" to "Declared size" in Input, CollectionSerializer and MapSerializer, since it also fires for strings, collections and maps.

Docs

Added a "Limiting deserialized size" section to the README and javadoc on getMaxArraySize, setMaxArraySize and both validateArrayLength overloads.

Benchmark

Added readDoubles/writeDoubles to ArrayBenchmark (it already covered ints and longs) and ran the array read benchmarks with and without the guard to isolate its cost. The overhead is within noise:

array read unguarded guarded
readInts 0.230 ± 0.074 0.228 ± 0.065
readLongs 0.514 ± 0.082 0.513 ± 0.086
readDoubles 0.511 ± 0.086 0.505 ± 0.098

(s/op, SingleShotTime, batchSize 12M, single fork, local JDK 24 on macOS arm64.) The check runs once per array read.

@joszamama

Copy link
Copy Markdown
Author

It might make sense to separate array and collection maxArraySize from the max size for strings. We could configure reasonable defaults for the next major release and the value for maxArraySize could probably be much lower than the one for strings. What do you think?

Alternatively, we can start out with the single configuration option, keep it as simple as possible and split it later as needed.

I'd keep the single option for now. Splitting strings from arrays and collections fits better together with choosing finite defaults, which is a behavior change (hence, better scheduled for the next major release?) I am more in the security side than on the maintenance side, so I am sure your opinion would be more valuable.

@theigl theigl left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Only two minor comments.

I'll let Copilot review once more.

Comment on lines 173 to +176
/** Used by {@link #read(Kryo, Input, Class)} to create the new object. This can be overridden to customize object creation, eg
* to call a constructor with arguments. The default implementation uses {@link Kryo#newInstance(Class)} with a special case
* for HashMap. */
private static int clampSize (Input input, int size) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JavaDoc belongs to the create method. Please move it there.

Also move clampSize to Input (instance method) and re-use it between Map and Collection serializers.


public int[] readInts (int length) throws KryoException {
int[] array = new int[length];
int[] array = new int[validateArrayLength(length, 4)];

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use constants Integer.BYTES, Float.BYTES, Long.BYTES etc.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.

Comment thread src/com/esotericsoftware/kryo/io/Input.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants