compress2/decompress2 corrupt all-zeros buffer when length is not a multiple of typesize #665

@FabianIsensee

Description

blosc2.compress2() followed by blosc2.decompress2() fails to round-trip an
all-zeros input whose byte length is not a multiple of typesize.
compress2 emits a 32-byte "all-zeros special value" frame that decompress2
then refuses to decode, raising:

ValueError: Error while decompressing, check the src data and/or the dparams

The data is silently lost at compress time (the 32-byte frame does not encode
the true length correctly), so this is a data-corruption bug, not merely a
decode-side error.

Environment

  • python-blosc2: 4.5.1
  • c-blosc2: 3.1.4 (2026-06-17)
  • Python: 3.12.11
  • numpy: 2.1.2 (not required to reproduce)
  • OS: Linux 6.17 x86_64 (glibc 2.39)

Also reproduced on python-blosc2 4.3.3 with an earlier c-blosc2, so this is
not a recent regression.

Minimal reproduction (no numpy)

import blosc2

data = bytes(707658)            # all zeros; 707658 % 8 == 2  (NOT a multiple of typesize 8)
c = blosc2.compress2(data, typesize=8)
print(len(c))                   # -> 32  (all-zeros special-value frame)
blosc2.decompress2(c)           # -> ValueError: Error while decompressing, ...

Trigger conditions (both required)

  1. The input is all zeros (triggers blosc2's zero special-value frame; the
    compressed output is 32 bytes regardless of input size).
  2. The input byte length is not a multiple of typesize.

The codec does not matter: reproduced with ZSTD, LZ4, and BLOSCLZ.

If either condition does not hold, the round-trip succeeds.
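
The conditions above can be written as a small predicate, which may be useful for flagging affected payloads before they are compressed. This is only a sketch: `hits_zero_frame_bug` is a hypothetical helper name, not part of blosc2, and it encodes nothing beyond the two input conditions reported in this issue.

```python
def hits_zero_frame_bug(data: bytes, typesize: int) -> bool:
    """Return True if `data` matches the trigger conditions reported above.

    Hypothetical helper (not a blosc2 API). It only checks the inputs;
    it does not call blosc2 at all.
    """
    if typesize < 1:
        raise ValueError("typesize must be >= 1")
    # Condition 1: every byte is zero (bytes.count accepts an int 0..255).
    is_all_zeros = data.count(0) == len(data)
    # Condition 2: the length leaves a trailing partial element.
    has_partial_element = len(data) % typesize != 0
    return is_all_zeros and has_partial_element
```

With the inputs from this report, `hits_zero_frame_bug(bytes(707658), 8)` is true, while the same length with `typesize=1`, or `bytes(707656)` with `typesize=8`, is not flagged.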

Controls (all behave correctly)

import os

import blosc2

# Length IS a multiple of typesize -> OK
blosc2.decompress2(blosc2.compress2(bytes(707656), typesize=8))      # OK (707656 % 8 == 0)

# typesize=1 -> every length is a multiple -> OK
blosc2.decompress2(blosc2.compress2(bytes(707658), typesize=1))      # OK

# Non-zero data at the same (non-multiple) length -> OK
blosc2.decompress2(blosc2.compress2(b"\x07" * 707658, typesize=8))   # OK (clen=86, not the 32-byte zero frame)

# Random/incompressible data at the same length -> OK
blosc2.decompress2(blosc2.compress2(os.urandom(707658), typesize=8)) # OK

Divisibility sweep (all-zeros, typesize=8)

| length | length % 8 | result |
|--------|------------|--------|
| 80000  | 0          | OK     |
| 80001  | 1          | FAIL   |
| 80007  | 7          | FAIL   |
| 80008  | 0          | OK     |
| 707656 | 0          | OK     |
| 707658 | 2          | FAIL   |
| 707664 | 0          | OK     |

Same pattern for typesize=4 (fails unless len % 4 == 0) and typesize=2
(fails unless len % 2 == 0). typesize=1 always succeeds.
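
A sweep like the one above could be scripted roughly as follows. This is a sketch, not the exact script used for the table: `sweep` is a hypothetical helper, and the compressor and decompressor are passed in as callables so the harness itself does not depend on blosc2.

```python
from typing import Callable, Iterable, List, Tuple


def sweep(
    lengths: Iterable[int],
    typesize: int,
    compress: Callable[[bytes, int], bytes],
    decompress: Callable[[bytes], bytes],
) -> List[Tuple[int, int, str]]:
    """Round-trip an all-zeros buffer of each length and record the outcome.

    Returns rows of (length, length % typesize, "OK" or "FAIL").
    """
    rows = []
    for n in lengths:
        data = bytes(n)  # all-zeros buffer of n bytes
        try:
            ok = decompress(compress(data, typesize)) == data
        except Exception:
            # Any decode/encode error counts as a failed round-trip.
            ok = False
        rows.append((n, n % typesize, "OK" if ok else "FAIL"))
    return rows


# Intended use with blosc2 (assumes blosc2 is installed):
#
#   import blosc2
#   rows = sweep(
#       [80000, 80001, 80007, 80008, 707656, 707658, 707664],
#       8,
#       lambda d, ts: blosc2.compress2(d, typesize=ts),
#       blosc2.decompress2,
#   )
```

Calling it again with `typesize` set to 4, 2 and 1 covers the other cases mentioned above.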

Related observation

The blosc1-compatibility API guards against this by rejecting non-multiple
lengths up front:

blosc2.compress(bytes(707658), typesize=8)
# ValueError: len(src) can only be a multiple of typesize (8).

compress2 instead accepts the same input and produces a frame that cannot be
decompressed. It should either apply the same validation, or (preferably)
correctly handle a trailing partial element in the all-zeros special-value path.

Impact

Real-world hit: we compress arbitrary numpy arrays as raw byte streams. An
all-zeros region (e.g. a cleared/blank segmentation tile) of
166*49*87 = 707658 bytes silently produced an undecodable frame, surfacing
only at decompress time on the receiving end. Passing typesize=1 is a safe
workaround for byte-stream payloads, but the underlying compress2/decompress2
inconsistency looks like a genuine bug.

Workaround

Pass typesize=1 when compressing a raw byte stream (or otherwise ensure the
length is a multiple of typesize).
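
One way to apply this workaround without giving up the preferred typesize for well-aligned buffers is to choose the typesize per call. A minimal sketch, assuming a hypothetical helper named `safe_typesize` (not a blosc2 API):

```python
def safe_typesize(nbytes: int, typesize: int) -> int:
    """Return `typesize` if it evenly divides `nbytes`, otherwise 1.

    Hypothetical helper: falls back to typesize=1, the workaround
    described above, whenever the buffer has a trailing partial element.
    """
    if typesize < 1:
        raise ValueError("typesize must be >= 1")
    return typesize if nbytes % typesize == 0 else 1


# Intended use (assumes blosc2 is installed):
#
#   import blosc2
#   c = blosc2.compress2(data, typesize=safe_typesize(len(data), 8))
```

The fallback is deliberately conservative: it is applied to every non-multiple length, not only all-zeros buffers, so non-zero data of such a length may compress somewhat less well, since the shuffle filters operate per typesize.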


---

*Repo to file against: https://github.com/Blosc/python-blosc2 (route to
c-blosc2 if the fault is in the special-value frame codec).*
