Fix for Issue #1282: Guard deserialization against unbounded allocation from declared sizes#1284
Conversation
…pt length (EsotericSoftware#1282) When reading from a fixed buffer (no InputStream), a primitive array or byte[] was allocated from the declared length before checking that the input held enough data, so a tiny corrupt payload declaring a multi-billion element length triggered an OutOfMemoryError. The declared length is now validated against the remaining bytes (every element needs at least one byte) before allocating, in Input, ByteBufferInput, UnsafeInput and UnsafeByteBufferInput.
There was a problem hiding this comment.
Pull request overview
This PR hardens Kryo deserialization against declared-size-driven, unbounded pre-allocation by validating/clamping declared lengths before allocating arrays/strings/collection/map backing storage, and introduces an opt-in Input#setMaxArraySize(int) limit for callers handling untrusted input (especially stream-backed inputs).
Changes:
- Add
Input.maxArraySizewithget/setMaxArraySizeand a sharedvalidateArrayLengthguard used by multiple read paths. - Apply declared-size validation to primitive bulk array reads,
byte[]reads, UTF-8 char buffer growth,String[], closure captured args, andCompatibleFieldSerializerfield counts. - Clamp collection/map initial capacities based on remaining bytes for buffer-backed inputs; add tests covering malicious sizes and max-size enforcement.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| src/com/esotericsoftware/kryo/io/Input.java | Adds maxArraySize, validateArrayLength, and uses it to guard allocations in array/string/byte[] reads. |
| src/com/esotericsoftware/kryo/io/ByteBufferInput.java | Applies validateArrayLength to allocations in ByteBuffer-backed bulk reads and UTF-8 char buffer growth. |
| src/com/esotericsoftware/kryo/unsafe/UnsafeInput.java | Applies validateArrayLength to unsafe bulk primitive array allocations. |
| src/com/esotericsoftware/kryo/unsafe/UnsafeByteBufferInput.java | Applies validateArrayLength to unsafe ByteBuffer bulk primitive array allocations. |
| src/com/esotericsoftware/kryo/serializers/CollectionSerializer.java | Clamps declared collection size for initial capacity and enforces maxArraySize. |
| src/com/esotericsoftware/kryo/serializers/MapSerializer.java | Clamps declared map size for initial capacity and enforces maxArraySize. |
| src/com/esotericsoftware/kryo/serializers/DefaultArraySerializers.java | Guards String[] allocation using Input.validateArrayLength. |
| src/com/esotericsoftware/kryo/serializers/CompatibleFieldSerializer.java | Guards field-name array allocation using Input.validateArrayLength. |
| src/com/esotericsoftware/kryo/serializers/ClosureSerializer.java | Guards closure captured-args array allocation using Input.validateArrayLength. |
| test/com/esotericsoftware/kryo/io/InputOutputTest.java | Adds tests for array-length validation, malicious string/array sizes, and maxArraySize. |
| test/com/esotericsoftware/kryo/io/UnsafeInputOutputTest.java | Adds tests for unsafe array-length validation. |
| test/com/esotericsoftware/kryo/serializers/CollectionSerializerTest.java | Adds malicious-size and maxArraySize tests for collections. |
| test/com/esotericsoftware/kryo/serializers/MapSerializerTest.java | Adds malicious-size and maxArraySize tests for maps. |
Comments suppressed due to low confidence (1)
src/com/esotericsoftware/kryo/io/Input.java:965
validateArrayLength(length)only compares the element count to bytes remaining (1 byte/element lower bound). For fixed-width bulk reads likereadInts(int length)(4 bytes/element), malformed input can still allocate very large arrays whenlength <= (limit-position)but(long)length*4 > (limit-position), causing OOM before the later underflow. Additionally, the subsequentlength << 2/length << 3calculations can overflow to a negative value, which may lead to incorrectoptional(...)behavior and evenArrayIndexOutOfBoundsExceptionwhile bulk-reading. Consider adding bytes-per-element aware validation usinglong required = (long)length * BYTES_PER_ELEMENT(and checkingrequiredfits inint) for the fixed-width array methods inInput/ByteBufferInput/unsafe variants.
/** Reads an int array in bulk. This may be more efficient than reading them individually. */
public int[] readInts (int length) throws KryoException {
int[] array = new int[validateArrayLength(length)];
if (optional(length << 2) == length << 2) {
byte[] buffer = this.buffer;
int p = this.position;
for (int i = 0; i < length; i++, p += 4) {
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@joszamama: This looks great already. Except for the issues raised by Copilot (from a quick glance, most of them seem valid), I would ask you for two more changes:
It might make sense to separate array and collection |
Alternatively, we can start out with the single configuration option, keep it as simple as possible and split it later as needed. |
Account for element width and reject sizes whose byte count overflows an int, preventing over-allocation and the length<<n shift overflow. Also reject a negative maxArraySize.
|
Thanks @theigl. Pushed a follow-up addressing the review. Copilot points
Docs Added a "Limiting deserialized size" section to the README and javadoc on Benchmark Added
(s/op, SingleShotTime, batchSize 12M, single fork, local JDK 24 on macOS arm64.) The check runs once per array read. |
I'd keep the single option for now. Splitting strings from arrays and collections fits better together with choosing finite defaults, which is a behavior change (hence, better scheduled for the next major release?) I am more in the security side than on the maintenance side, so I am sure your opinion would be more valuable. |
theigl
left a comment
There was a problem hiding this comment.
Looks great! Only two minor comments.
I'll let Copilot review once more.
| /** Used by {@link #read(Kryo, Input, Class)} to create the new object. This can be overridden to customize object creation, eg | ||
| * to call a constructor with arguments. The default implementation uses {@link Kryo#newInstance(Class)} with a special case | ||
| * for HashMap. */ | ||
| private static int clampSize (Input input, int size) { |
There was a problem hiding this comment.
The JavaDoc belongs to the create method. Please move it there.
Also move clampSize to Input (instance method) and re-use it between Map and Collection serializers.
|
|
||
| public int[] readInts (int length) throws KryoException { | ||
| int[] array = new int[length]; | ||
| int[] array = new int[validateArrayLength(length, 4)]; |
There was a problem hiding this comment.
Please use constants Integer.BYTES, Float.BYTES, Long.BYTES etc.
Summary
Several serializers read a declared length, count, or size and allocate from it before validating it against the available input. A small corrupt or malicious message that declares a multi-billion element array, string, collection, or map triggers a multi-gigabyte allocation and an
OutOfMemoryError. This continues the discussion in #1282 (primitive arrays) and extends the same guard to the other read paths that share the pattern, plus an opt-in size limit for callers that decode untrusted input.Kryo assumes a trusted source, so these are robustness fixes, not a change to that threat model. The default behavior is unchanged.
The pattern
new T[declaredLength](andnew char[charCount],new ArrayList<>(declaredSize),new HashMap(declaredSize), ...) runs before any element is read, so the declared size alone drives the allocation. A 6 to 10 byte message can request gigabytes.What this changes
When an
Inputis backed by a fixed buffer (noInputStream), the number of elements cannot exceed the number of bytes remaining, because every element occupies at least one byte. That is the bound used here.byte[](Input.readInts/readLongs/.../readBytes, and theByteBufferInput,UnsafeInput,UnsafeByteBufferInputvariants): validate the declared length against the bytes remaining before allocating. This is the Unbounded new int[length] from declared array length in Input.readInts → OutOfMemoryError on malformed input #1282 case.readUtf8CharsinInputandByteBufferInput): the same guard on the UTF-8 char count.String[], closure captured args, andCompatibleFieldSerializerfield counts: the same guard. Each element is read as a string or a class-tagged object, so it is at least one byte.CollectionSerializer,MapSerializer, including theArrays.asList, JDK immutable, andPriorityQueuecreate overrides): clamp the initial capacity to the bytes remaining instead of throwing. The collection or map still grows to the real element count, so valid input is never rejected, while a corrupt size no longer forces a large pre-allocation. The clamp is applied inread()beforecreate()so the create overrides are covered too.Why throw in some places and clamp in others
Throwing is only safe where every element is guaranteed to consume at least one byte (primitives, bytes, chars, strings, class-tagged objects). There, a declared length greater than the bytes remaining is provably malformed, so no valid input is rejected.
Collections and maps can hold elements that serialize to zero bytes (for example a registered no-field element type written with a known serializer), so a valid collection can legitimately declare more elements than there are bytes remaining at that point. Throwing there would reject valid input. Clamping the capacity avoids that: it caps the pre-allocation but lets the structure grow, so the result is always correct.
What is intentionally left unchanged
ObjectArraySerializer(genericObject[]) is not guarded by the bytes-remaining check. Its elements can serialize to zero bytes and the array is fixed size, so neither the throw nor the clamp can bound it without rejecting valid input. ThemaxArraySizelimit below still applies to it.maxArraySize: the configurable limit (default off), and why
The bytes-remaining guard only applies to a fixed buffer. For an
InputStream-backedInputthe buffered window is not the total size, so a declared size cannot be checked against it without rejecting valid large input. To let callers bound streaming (and buffered) deserialization, this adds:validateArrayLengthand the collection and map size reads reject any declared size above the limit, on both stream-backed and buffer-backed input.The default is
Integer.MAX_VALUE, that is, unlimited:buffer guard never fires on valid input, and the unlimited cap never fires at all. Existing callers see today's behavior, and the in-memory case is still protected automatically.
A finite default was considered but not chosen. A finite default would reject legitimately large payloads from existing users and would change behavior on upgrade. Keeping it unlimited preserves backward compatibility and matches Kryo's trusted-source model: a caller that exposes Kryo to untrusted input sets the bound it expects.
Backwards compatibility
KryoBufferUnderflowExceptionorKryoExceptioninstead ofOutOfMemoryErrororNegativeArraySizeException. It is catchable and thrown before the allocation.setMaxArraySizeis called.New tests
InputOutputTest.testArrayLengthValidation,testMaliciousArrayLength: the primitive array andbyte[]guards, including an end-to-endint[]decode.InputOutputTest.testMaliciousStringLength: the string char-count guard onInputandByteBufferInput.InputOutputTest.testMaxArraySize: the limit on a stream-backed input for arrays.UnsafeInputOutputTest.testArrayLengthValidation: the unsafe array variants.CollectionSerializerTest.testMaliciousCollectionSize,testMaxCollectionSize: the collection clamp and the stream limit.MapSerializerTest.testMaliciousMapSize,testMaxMapSize: the map clamp and the stream limit.Verification
The full module test suite passes. It was also run on the project's JDK matrix (Temurin 8, 11, 17, 21, 25), building on 11 and testing on each, as in CI.