HDF5 lib is more restrictive than file format spec #6409

@Apollo3zehn

Describe the bug

I am the author of PureHDF, a C# library that reads and writes HDF5 files without depending on the HDF5 C library. I recently received a merge request in which a user of the library reports that files written by PureHDF can no longer be opened in HDFView or in any other tool that relies on the C library.

The merge request proposes to use the minimum number of bytes required to encode the chunk dimensions instead of using a hardcoded value of 8 bytes.

For now I have rejected this proposal. The spec describes the Dimension Size Encoded Length field of the Chunked Storage Property Description as "This is the size in bytes used to encode Dimension Size." The spec therefore does not require this to be the minimal size needed to encode the dimension lengths, so in PureHDF I opted to always use 8 bytes.
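To make the two strategies concrete, here is a minimal sketch in Python of encoding chunk dimension sizes with either the minimal length or a fixed length of 8 bytes. The function names are hypothetical, little-endian byte order is assumed, and this is neither PureHDF's nor the C library's actual code.

```python
def minimal_encoded_length(dims):
    """Smallest number of bytes that can hold the largest dimension size."""
    largest = max(dims)
    # bit_length() is 0 for the value 0, so clamp the result to at least 1 byte.
    return max(1, (largest.bit_length() + 7) // 8)


def encode_dims(dims, encoded_length):
    """Encode each dimension size as an unsigned little-endian integer
    occupying exactly `encoded_length` bytes (assumed byte order)."""
    return b"".join(d.to_bytes(encoded_length, "little") for d in dims)


chunk_dims = [100, 200]

# Strategy proposed in the merge request: use the minimal length.
minimal = encode_dims(chunk_dims, minimal_encoded_length(chunk_dims))

# Strategy described above for PureHDF: always use 8 bytes.
fixed = encode_dims(chunk_dims, 8)
```

Both byte strings describe the same dimension sizes; only the value stored in Dimension Size Encoded Length (1 versus 8 here) and the amount of zero padding differ.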

The problem is that a commit from about 6 months ago introduced a check that the minimal number of bytes is used, and the library raises an error otherwise: https://github.com/HDFGroup/hdf5/blame/develop/src/H5Dchunk.c#L858

As a result, files written with the fixed length of 8 bytes can no longer be read.

If you can confirm that my interpretation of the spec is correct, namely that implementations are free to choose the number of bytes used to encode the chunk dimension sizes, it would be great if you could remove the recently introduced check linked above for file reading operations.

Expected behavior

To comply with the spec, the HDF5 lib should, when reading, accept the actual value of the Dimension Size Encoded Length field stored in the file instead of expecting it to be the minimal length required to encode the dimensions.
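The difference between the requested behavior and the current behavior can be sketched as a decoder with an optional strict mode. This is an illustration in Python under stated assumptions (little-endian byte order, encoded lengths of 1 to 8 bytes, a hypothetical `decode_dims` function); it is not the C library's implementation of the check.

```python
def decode_dims(buf, ndims, encoded_length, strict=False):
    """Decode `ndims` dimension sizes, each stored in `encoded_length` bytes.

    With strict=False the encoded length from the file is accepted as is.
    With strict=True the decoder mimics the behavior reported in this issue
    and rejects any length that is not the minimal one.
    """
    # Assumed valid range for the encoded length.
    if not 1 <= encoded_length <= 8:
        raise ValueError("encoded length must be between 1 and 8 bytes")
    if len(buf) < ndims * encoded_length:
        raise ValueError("buffer too short for the declared dimensions")

    dims = [
        int.from_bytes(buf[i * encoded_length:(i + 1) * encoded_length], "little")
        for i in range(ndims)
    ]

    if strict:
        minimal = max(1, (max(dims).bit_length() + 7) // 8)
        if encoded_length != minimal:
            raise ValueError("encoded length is not the minimal length")
    return dims


# Two dimension sizes written with a fixed length of 8 bytes each.
buf = (100).to_bytes(8, "little") + (200).to_bytes(8, "little")
dims = decode_dims(buf, ndims=2, encoded_length=8)  # lenient: accepted
```

Calling `decode_dims(buf, 2, 8, strict=True)` on the same buffer raises `ValueError`, which corresponds to the failure described above for files written with 8 bytes per dimension.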
