Make consolidated metadata optional #449

Closed
tomwhite wants to merge 5 commits into sgkit-dev:main from tomwhite:vcf_zarr_exists

Conversation

@tomwhite
Member

@tomwhite tomwhite commented Feb 10, 2026

Some Zarr stores don't support consolidated metadata (e.g. Icechunk), or some users may not want to use it since it can get out of date with the data in the store.

Fixes #276

)

consolidate_metadata = click.option(
"--consolidate-metadata",
Member Author

To turn off consolidated metadata you would do:

vcf2zarr convert --consolidate-metadata=false ...

But perhaps this is better (more consistent with other options):

vcf2zarr convert --no-consolidated-metadata ...

Thoughts?
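
For reference, a minimal sketch of how the second spelling could be declared, assuming click's paired boolean flag syntax. The `convert` command here is a toy stand-in, not the real `vcf2zarr convert`, and the exact name of the off switch (`--no-consolidate-metadata` below) is a hypothetical choice:

```python
import click
from click.testing import CliRunner


@click.command()
@click.option(
    # click's "--flag/--no-flag" form declares both spellings as one boolean option.
    "--consolidate-metadata/--no-consolidate-metadata",
    default=True,
    show_default=True,
    help="Write consolidated metadata after conversion (hypothetical flag pair).",
)
def convert(consolidate_metadata):
    # Toy stand-in for the real conversion: just report the parsed setting.
    click.echo(f"consolidate_metadata={consolidate_metadata}")


def run(args):
    """Invoke the toy command in-process and return its stripped output."""
    result = CliRunner().invoke(convert, args)
    return result.output.strip()
```

With this form, passing neither flag keeps the default, and `--no-consolidate-metadata` turns it off without needing `=false`.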

@coveralls
Collaborator

Coverage Status

coverage: 97.3% (+0.009%) from 97.291%
when pulling 98d637f on tomwhite:vcf_zarr_exists
into cdd4c85 on sgkit-dev:main.

@jeromekelleher
Member

I think it's worth stepping back here and asking why we consolidate metadata at all - I think it's just for xarray support? I'd be happier dropping it entirely tbh.

@tomwhite
Member Author

It can be useful for reducing latency for metadata operations with cloud stores when there's a large number of groups - something that xarray does benefit from, which is discussed a bit here: zarr-developers/zarr-python#3119.

That said, it's easy enough to add it (or remove it) from a VCZ store if it's needed (or not). So I don't have a strong opinion either way.
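
A rough sketch of the "remove" direction, assuming a Zarr v2-format store in a local directory, where the consolidated copy is the single root key `.zmetadata`. The helper names are hypothetical and nothing here is bio2zarr API; adding it back would typically be a call to `zarr.consolidate_metadata` on the store:

```python
from pathlib import Path

# Assumption: Zarr v2 layout, where consolidated metadata lives in one root key.
CONSOLIDATED_KEY = ".zmetadata"


def has_consolidated_metadata(store_path):
    """Return True if the local v2 store has a consolidated metadata key."""
    return (Path(store_path) / CONSOLIDATED_KEY).is_file()


def remove_consolidated_metadata(store_path):
    """Delete the consolidated copy and report whether one was present.

    Only the root .zmetadata file is removed; the per-group and per-array
    metadata files (.zgroup, .zarray, .zattrs) are left untouched.
    """
    path = Path(store_path) / CONSOLIDATED_KEY
    existed = path.is_file()
    if existed:
        path.unlink()
    return existed
```

Other store types (cloud object stores, Zarr v3 format) keep this information elsewhere, so this only illustrates the local v2 case.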

@jeromekelleher
Member

I think it's probably redundant as far as bio2zarr is concerned and VCZ should be agnostic to whether it's there or not. My recollection is that I added it in the early days just to get sgkit/xarray support working, and never really examined the assumption. If a particular client needs the consolidated metadata then I think they should be responsible for managing it (which as you say is easy).

It would be a breaking change, in that older versions of bio2zarr wouldn't support newer VCZ repos, but I think that's a pretty minor problem in practice.

Does vcztools require consolidated metadata?

@tomwhite
Member Author

> Does vcztools require consolidated metadata?

No it doesn't (I just tried it without).

I'll rework this PR (or create a new one) to remove it.

@jeromekelleher
Member

Great, thanks! Making things simpler is good.

@tomwhite
Member Author

Closing in favour of #450

Development

Successfully merging this pull request may close these issues.

Inspect fails for datasets with out consolidated metadata

3 participants