Skip to content

Add a --zarr-format CLI option#483

Merged
jeromekelleher merged 4 commits intosgkit-dev:mainfrom
tomwhite:zarr-format-cli-option
Apr 24, 2026
Merged

Add a --zarr-format CLI option#483
jeromekelleher merged 4 commits intosgkit-dev:mainfrom
tomwhite:zarr-format-cli-option

Conversation

@tomwhite
Copy link
Copy Markdown
Member

First draft of #482. Still needs a test for the new option. Also not sure whether to do other formats (plink etc) here or in follow ups.

Copy link
Copy Markdown
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - I think we should stabilise on this as a long term interface now.

Comment thread bio2zarr/vcz.py Outdated
zarr_format_kwargs = dict(
zarr_format=self.zarr_format or zarr_utils.ZARR_FORMAT
)
root = zarr.open(store=self.path, mode="a", **zarr_format_kwargs)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make a zarr_utils.open_zarr function which takes zarr_format as an argument to hide these details.

Comment thread bio2zarr/zarr_utils.py Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's change this to

DEFAULT_ZARR_FORMAT = int(os.environ.get("BIO2ZARR_DEFAULT_ZARR_FORMAT", "2"))

and change the comment. This feels like a reasonable thing to do long-term so we can get rid of the caveat.

Comment thread bio2zarr/cli.py Outdated
"--zarr-format",
type=click.Choice([2, 3]),
default=None,
help="Zarr format version of output",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document that this is 2 by default, but this behaviour can be changed by using the BIO2ZARR_DEFAULT_ZARR_FORMAT env var.

@jeromekelleher
Copy link
Copy Markdown
Member

Also not sure whether to do other formats (plink etc) here or in follow ups.

May as well do it in one go? Should be a straightforward change, and better to know now if there are any hidden complications.

Remove unnecessary monkeypatching

Rename ZARR_FORMAT to DEFAULT_ZARR_FORMAT and similarly for the env var

Add zarr_format to more signatures

Add open_zarr_append convenience function
Add plink zarr_format test

Add vcf zarr_format test
@tomwhite tomwhite force-pushed the zarr-format-cli-option branch from e447ed9 to c903d26 Compare April 24, 2026 10:37
@tomwhite tomwhite marked this pull request as ready for review April 24, 2026 10:41
@tomwhite
Copy link
Copy Markdown
Member Author

Thanks for the suggestions @jeromekelleher. I've implemented all of them and added some tests to check the zarr_format API explicitly.

Copy link
Copy Markdown
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, couple of minor suggestions.

Comment thread bio2zarr/vcz.py Outdated
def __init__(self, source_type, path, zarr_format=None):
self.source_type = source_type
self.path = pathlib.Path(path)
self.zarr_format = zarr_format or zarr_utils.DEFAULT_ZARR_FORMAT
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.zarr_format = zarr_format or zarr_utils.DEFAULT_ZARR_FORMAT
self.zarr_format = zarr_format

Let's keep that decision about what "None" means localised

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread bio2zarr/zarr_utils.py Outdated

def open_zarr_append(store, zarr_format):
"""Open a Zarr store for append using the given Zarr format"""
zarr_format_kwargs = dict(zarr_format=zarr_format or DEFAULT_ZARR_FORMAT)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be happier if this was explicit, so that things like zarr_format=[] or zarr_format=False don't silently get turned into 2:

if zarr_format is None:
    zarr_format = DEFAULT_ZARR_FORMAT

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@jeromekelleher jeromekelleher added this pull request to the merge queue Apr 24, 2026
Merged via the queue into sgkit-dev:main with commit 9c46ead Apr 24, 2026
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants