Don't alter OME-XML BigEndian values for int8/uint8 data #151

Merged: sbesson merged 3 commits into glencoesoftware:master from melissalinkert:mixed-pixel-types on Apr 22, 2026
Conversation

@melissalinkert
Member

The endianness coming from the Zarr library may be different in this case, since it doesn't technically matter what the endianness is for int8/uint8 data. However, this change prevents an endianness mismatch when a uint8 Image is followed by a uint16 (or greater) Image.

Without this change, relevant test data would have thrown a FormatException with the (incorrectly formatted) Endian mismatch... message. With this change, conversion should succeed. This may need a little reworking when we switch to zarr-java for v2 reading.
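
The check described above can be sketched as follows. This is a minimal illustration, not the raw2ometiff implementation: the class `EndianCheck`, the method `firstMismatch`, and the parallel-array inputs are hypothetical names chosen for this example. It assumes each series reports a byte width and a big-endian flag, and it skips single-byte series when comparing.

```java
/** Illustrative only; these names are hypothetical and not part of raw2ometiff. */
final class EndianCheck {

  /**
   * Returns the index of the first multi-byte series whose endianness differs
   * from that of the first multi-byte series, or -1 if all of them agree.
   * Series with one byte per pixel (int8/uint8) are skipped, because byte
   * order has no meaning for them.
   */
  static int firstMismatch(int[] bytesPerPixel, boolean[] bigEndian) {
    if (bytesPerPixel.length != bigEndian.length) {
      throw new IllegalArgumentException("Input arrays must have the same length");
    }
    Boolean expected = null; // unknown until the first multi-byte series is seen
    for (int i = 0; i < bytesPerPixel.length; i++) {
      if (bytesPerPixel[i] == 1) {
        continue; // int8/uint8: ignore whatever endianness the library reported
      }
      if (expected == null) {
        expected = bigEndian[i];
      } else if (expected.booleanValue() != bigEndian[i]) {
        return i;
      }
    }
    return -1;
  }
}
```

With byte widths `{1, 1, 2}` and flags `{false, false, true}`, nothing is flagged: the two uint8 series are ignored and the uint16 series sets the expected value.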

I wanted to add a test for this case, but the fake file structure doesn't currently allow creating multiple series with different pixel types, so will need to give that some more thought.

Member

@Yajing826 Yajing826 left a comment

I was able to successfully run the conversion and load the data into OMERO Plus. Thanks!

Member

@sbesson sbesson left a comment

Tested with a Zarr v2 dataset containing a mixture of arrays with "|u1" and ">u2" dtypes.

Without this PR, the conversion fails with the Endian mismatch error (including unreplaced {} placeholders), and it succeeds with this PR included.
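
As an aside on the `{}` placeholders in that error: one plausible cause (an assumption, not confirmed in this thread) is that logger-style `{}` placeholders were passed to an exception constructor, which does not substitute them. A minimal sketch of building the message explicitly is shown below; `MismatchMessage` and `build` are hypothetical names, not raw2ometiff code.

```java
/** Illustrative only; hypothetical helper, not part of raw2ometiff. */
final class MismatchMessage {

  /**
   * Builds the error text with String.format, so the series index and the
   * expected endianness are substituted before the exception is created.
   */
  static String build(int series, boolean expectedBigEndian) {
    return String.format(
      "Endian mismatch in series %d (expected %b)", series, expectedBigEndian);
  }
}
```

The resulting string can then be passed to whichever exception type the caller throws.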

A few questions:

  • should we wait for #152 to be included to retest this PR? should we also test it against v3 data?
  • has this always been an issue or is this a recent regression?
  • even though endianness is not relevant for single byte data types, should it also be the responsibility of bioformats2raw to try and maintain consistency between the underlying Zarr array metadata and Pixels.BigEndian?

In terms of testing, I assume it should be possible to create a sample OME-XML with a mixture of data types?

@melissalinkert
Member Author

If an RC with this change is needed very urgently (defer to @Yajing826), then I would say test this first, and then double-check this case with v2 and v3 as part of the review of #152. If an RC with this change is not needed urgently, then it might be better to get #152 in first and then we can re-evaluate this.

> • has this always been an issue or is this a recent regression?

I believe this has always been an issue. It will only appear in the case where there is an int8/uint8 series before a higher byte depth image, which isn't super common with the data that we typically convert.

> • even though endianness is not relevant for single byte data types, should it also be the responsibility of bioformats2raw to try and maintain consistency between the underlying Zarr array metadata and Pixels.BigEndian?

It really depends on what happens in the underlying library. For jzarr:

https://github.com/zarr-developers/jzarr/blob/main/src/main/java/com/bc/zarr/ZarrHeader.java#L66-L72
https://github.com/zarr-developers/jzarr/blob/main/src/main/java/com/bc/zarr/ZarrHeader.java#L115-L117

So that forces writing int8/uint8 with undefined endianness in all cases, but will return the system-native endianness when reading the same data back. I don't see how we can work around that, but definitely worth re-evaluating for zarr-java.
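
The read-side behavior described above can be illustrated with a small sketch. This is not the jzarr code linked above: `DtypeOrder`, `byteOrderOf`, and `itemSize` are hypothetical names. It assumes Zarr v2 dtype strings of the form seen in this thread (`|u1`, `<u2`, `>u2`), where the first character is the byte-order marker and the trailing digits give the size in bytes, and it maps `|` to the platform's native order to mirror what is described for reading.

```java
import java.nio.ByteOrder;

/** Illustrative only; hypothetical helper, not the jzarr or zarr-java API. */
final class DtypeOrder {

  /** Maps the leading byte-order character of a Zarr v2 dtype to a ByteOrder. */
  static ByteOrder byteOrderOf(String dtype) {
    switch (dtype.charAt(0)) {
      case '<':
        return ByteOrder.LITTLE_ENDIAN;
      case '>':
        return ByteOrder.BIG_ENDIAN;
      case '|':
        // No byte order is stored; fall back to the platform's native order,
        // so the reported value can differ from machine to machine.
        return ByteOrder.nativeOrder();
      default:
        throw new IllegalArgumentException("Unsupported dtype: " + dtype);
    }
  }

  /** Returns the item size in bytes, e.g. 1 for "|u1" and 2 for ">u2". */
  static int itemSize(String dtype) {
    return Integer.parseInt(dtype.substring(2));
  }
}
```

Because the result for `|u1` depends on the platform, a consumer comparing it against a multi-byte array's `>` or `<` marker can see a mismatch that has no effect on the actual pixel data.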

@melissalinkert
Member Author

And for zarr-java, note that the only int8/uint8 data types allowed for v2 are unspecified endianness:

https://github.com/zarr-developers/zarr-java/blob/main/src/main/java/dev/zarr/zarrjava/v2/DataType.java

@Yajing826
Member

> If an RC with this change is needed very urgently (defer to @Yajing826), then I would say test this first, and then double-check this case with v2 and v3 as part of the review of #152. If an RC with this change is not needed urgently, then it might be better to get #152 in first and then we can re-evaluate this.

It's not urgent, I only needed it for internal testing, and so far no customer has had this issue directly.

Member

@sbesson sbesson left a comment

Using the private source file, I generated various OME-Zarr datasets both with bioformats2raw 0.11.0 (v2) and bioformats2raw 0.12.0rc2 (v2 and v3).

After conversion, confirmed that the versions result in mixed endianness and that the switch to zarr-java uses little endian - see glencoesoftware/bioformats2raw#317

omero@ngff:/mnt/data/seb/raw2ometiff_151$ grep dtype failing_dataset_0.11.0.zarr/*/0/.zarray
failing_dataset_0.11.0.zarr/0/0/.zarray:  "dtype" : "|u1",
failing_dataset_0.11.0.zarr/1/0/.zarray:  "dtype" : "|u1",
failing_dataset_0.11.0.zarr/2/0/.zarray:  "dtype" : ">u2",
omero@ngff:/mnt/data/seb/raw2ometiff_151$ grep dtype failing_dataset_0.12.0rc3_v2.zarr/*/0/.zarray
failing_dataset_0.12.0rc3_v2.zarr/0/0/.zarray:  "dtype" : "|u1",
failing_dataset_0.12.0rc3_v2.zarr/1/0/.zarray:  "dtype" : "|u1",
failing_dataset_0.12.0rc3_v2.zarr/2/0/.zarray:  "dtype" : "<u2",
omero@ngff:/mnt/data/seb/raw2ometiff_151$ grep endian failing_dataset_0.12.0rc3_v3.zarr/*/0/zarr.json
failing_dataset_0.12.0rc3_v3.zarr/0/0/zarr.json:          "endian" : "little"
failing_dataset_0.12.0rc3_v3.zarr/0/0/zarr.json:          "endian" : "little"
failing_dataset_0.12.0rc3_v3.zarr/1/0/zarr.json:          "endian" : "little"
failing_dataset_0.12.0rc3_v3.zarr/1/0/zarr.json:          "endian" : "little"
failing_dataset_0.12.0rc3_v3.zarr/2/0/zarr.json:          "endian" : "little"
failing_dataset_0.12.0rc3_v3.zarr/2/0/zarr.json:          "endian" : "little"

Trying to convert these Zarr datasets both with raw2ometiff 0.9.0 and this PR yields the following result

| Dataset | raw2ometiff 0.9.0 | raw2ometiff PR 151 |
|---|---|---|
| failing_dataset_0.11.0.zarr | fail (Endian mismatch in series {} (expected {})) | pass |
| failing_dataset_0.12.0-rc3_v2.zarr | pass | fail (Endian mismatch in series 2 (expected false)) |
| failing_dataset_0.12.0-rc3_v3.zarr | fail ('.zgroup' expected but is not readable or missing in store) | fail (Endian mismatch in series 2 (expected false)) |

While this fixes the behavior for the Zarr datasets converted with bioformats2raw 0.11.0, there seems to be another issue when converting with bioformats2raw 0.12.0rc3.

@melissalinkert
Member Author

> While this fixes the behavior for the Zarr datasets converted with bioformats2raw 0.11.0, there seems to be another issue when converting with bioformats2raw 0.12.0rc3

As discussed earlier, this appears to be the result of the mismatch between the OME-XML and Zarr array metadata that is fixed in glencoesoftware/bioformats2raw#317. If I convert the test dataset with glencoesoftware/bioformats2raw#317 using both --ngff-version 0.4 and --ngff-version 0.5, and then convert both output Zarrs using this PR, then I see no error.

@Yajing826, if you could please verify that the two PRs together resolve this issue with the following steps:

  1. convert the test dataset to Zarr with bioformats2raw 0.11.0
  2. convert the Zarr in step (1) to OME-TIFF with this PR
  3. convert the test dataset to Zarr with glencoesoftware/bioformats2raw#317 and --ngff-version 0.4
  4. convert the test dataset to Zarr with glencoesoftware/bioformats2raw#317 and --ngff-version 0.5
  5. convert the Zarr in step (3) to OME-TIFF with this PR
  6. convert the Zarr in step (4) to OME-TIFF with this PR
  7. check that the OME-TIFFs in steps (2), (5), and (6) are usable

Member

@sbesson sbesson left a comment

Retested in the context of glencoesoftware/bioformats2raw#317 (review) and with the Zarr v2/v3 datasets generated with the build of bioformats2raw, everything is passing as expected.

| Dataset | raw2ometiff 0.9.0 | raw2ometiff PR 151 |
|---|---|---|
| failing_dataset_0.11.0.zarr | fail (Endian mismatch in series {} (expected {})) | pass |
| failing_dataset_0.12.0-rc3_v2.zarr | pass | fail (Endian mismatch in series 2 (expected false)) |
| failing_dataset_0.12.0-rc3_v3.zarr | fail ('.zgroup' expected but is not readable or missing in store) | fail (Endian mismatch in series 2 (expected false)) |
| failing_dataset_0.12.0-SNAPSHOT_v2.zarr | pass | pass |
| failing_dataset_0.12.0-SNAPSHOT_v3.zarr | fail ('.zgroup' expected but is not readable or missing in store) | pass |

Member

@Yajing826 Yajing826 left a comment

Thanks for the test instructions @melissalinkert!

I covered the testing scenarios as follows:

  1. convert test image to zarr using bioformats2raw-0.11.0
  2. convert test image to zarr using bioformats2raw-0.12.0-SNAPSHOT with --ngff-version 0.4
  3. convert test image to zarr using bioformats2raw-0.12.0-SNAPSHOT with --ngff-version 0.5

The outputs from all three scenarios were converted to ome.tiff using the present PR with raw2ometiff-0.10.0-SNAPSHOT, and loaded into OMERO Plus for visual inspection. All conversions ran without errors and all three outputs looked fine within OMERO Plus.

Member

@erindiel erindiel left a comment

Approving based on the functional testing documented here. 🫶

@sbesson sbesson merged commit 0d01215 into glencoesoftware:master Apr 22, 2026
4 checks passed