Skip to content

Support to use zarr.sync.ProcessSynchronizer(path) with S3 as path #1224

Description

@vietnguyengit

Hi everyone,

I've been digging around to see if there's already an existing way to use zarr.sync.ProcessSynchronizer(path) with S3 as path, but no luck.

My scenario is I have a Lambda function that listens to S3 events and writes NetCDF files to a Zarr store (on S3), each Lambda call will process one NetCDF file.

As Lambda is a distributed system, 10 new files uploaded will trigger 10 different processes that try to write to the Zarr store pretty much at the same time, and I experience some data corruption issues.

Using zarr.sync.ProcessSynchronizer() in xarray.dataset.to_zarr(synchronizer=...) for DirectoryStore seems to solve this write consistency issue.

But storing Zarr store on S3 is important to us, and cloud-optimised format like Zarr should be able to fully support S3. So I wonder if this is a bug or a non-existing feature or I just don't know it yet.

Please advise.

Thanks everyone.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions