ODD PI 26.3 Objective 1: 🤖 Develop + Maintain the Virtual Zarr Ecosystem #346

### Motivation

Virtual stores provide a single entry point to a dataset composed of many files. For NASA datasets this enables:

* Less pre-processing is required before data is “analysis-ready”.
* Users do not need to know the underlying data format or storage location.
* Greater interoperability through a common API for reading, writing, and analyzing complex and heterogeneous NASA datasets.
* Better performance and reduced costs, because only the data the user needs is sent over the internet.
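
As a rough illustration of the points above, a virtual store can be thought of as a manifest that maps chunk keys to byte ranges inside existing files. The sketch below is a minimal, self-contained model of that idea, not VirtualiZarr's or Icechunk's actual data structures; `ChunkRef`, `FILES`, `MANIFEST`, the file names, and the chunk keys are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkRef:
    """Location of one chunk inside an existing archive file (illustrative type)."""
    path: str    # file that physically holds the chunk
    offset: int  # byte offset of the chunk within that file
    length: int  # chunk size in bytes


# Hypothetical archive: two "files" held in memory so the sketch runs offline.
FILES = {
    "granule_a.nc": b"HEADER-A" + b"chunk-0-bytes" + b"padding",
    "granule_b.nc": b"HDR-B" + b"chunk-1-bytes",
}

# The manifest is the single entry point: chunk key -> where the bytes live.
MANIFEST = {
    "temperature/c/0": ChunkRef("granule_a.nc", offset=8, length=13),
    "temperature/c/1": ChunkRef("granule_b.nc", offset=5, length=13),
}


def read_chunk(key: str) -> bytes:
    """Return only the bytes for one chunk, not the whole file."""
    ref = MANIFEST[key]
    data = FILES[ref.path]
    return data[ref.offset : ref.offset + ref.length]


def bytes_transferred(keys: list[str]) -> int:
    """Total bytes a reader would fetch for the requested chunks."""
    return sum(MANIFEST[k].length for k in keys)
```

In this toy model the reader addresses chunks by key and never needs the file names or formats, and a request for one chunk moves fewer bytes than the file that contains it.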

This PI, we have several parallel tasks to support the ecosystem of virtual Zarr stores at NASA.

Each sub-task has its own motivation:

* **Task 1. Parse manifests back out of Icechunk:** The inability to modify or inspect virtual stores reduces Icechunk adoption, despite its reliability and performance benefits relative to Kerchunk. Being able to parse manifests back out also mitigates the risk of depending on Icechunk and its core maintainers.
* **Task 2. VirtualiZarr Maintenance:** We are core maintainers of VirtualiZarr, which is the library used to parse and write virtual stores. We reserve time to address bugs and questions as they arrive so the library is well-maintained.
* **Task 3. (Stretch) Design Virtual Data Cubes for NISAR and BIOMASS:** NISAR data delivery creates the first real opportunity to test virtual data cubes at scale; a proof of concept (POC) now can shape the long-term architecture before patterns solidify.
* **Task 4. (Stretch) Provide bearer-token HTTP support in Icechunk:** This is a direct request from PO.DAAC. Many of their users still rely on HTTPS access to datasets because they cannot work from us-west-2, where the Earthdata Cloud buckets are located. These users cannot use Icechunk stores until Icechunk supports bearer-token HTTP, because requests must pass along a token from Earthdata Login.
    * See https://github.com/zarr-developers/VirtualiZarr/issues/978
* **Task 5. Virtual GEOS-CF maintenance and virtualizarr-data-pipelines:** Continue maintaining the virtual GEOS-CF (v2) dataset as new data is continuously produced. Requirements surfaced by this work will also be propagated into virtualizarr-data-pipelines to improve its template for building pipelines that virtualize datasets.
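
To make the Task 4 requirement concrete, the sketch below shows what a bearer-token HTTPS read of a single virtual chunk might look like at the HTTP level. It uses only the Python standard library and is not Icechunk's API; `build_range_request`, the URL, and the token value are hypothetical placeholders, and the request is built but never sent.

```python
from urllib.request import Request


def build_range_request(url: str, token: str, offset: int, length: int) -> Request:
    """Build (but do not send) an HTTPS GET for one byte range with a bearer token."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    last_byte = offset + length - 1  # HTTP byte ranges are inclusive
    headers = {
        "Authorization": f"Bearer {token}",      # token passed along on every request
        "Range": f"bytes={offset}-{last_byte}",  # fetch only the chunk's bytes
    }
    return Request(url, headers=headers, method="GET")


# Placeholder URL and token, for illustration only.
req = build_range_request("https://example.com/data/granule.nc", "EDL_TOKEN", 8, 13)
```

The key point is that every chunk read needs both headers: the `Range` header selects the chunk, and the `Authorization` header carries the user's token, which is the piece a store client has to be able to attach.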

### Sub-tasks (stand-in for acceptance criteria)

- [x] 1. Parse manifests back out of Icechunk
- [ ] 2. VirtualiZarr Maintenance
- [ ] 3. Stretch: Design Virtual Data Cubes for NISAR and BIOMASS: Deliver a proof-of-concept virtual data cube for NISAR and BIOMASS; define the path to a science-ready production workflow. Provide guidance to data producers/providers such as ASF/ESA on best practices for hosting data.
- [ ] 5. Virtual GEOS-CF maintenance and virtualizarr-data-pipelines improvement

### LOE

1 FTE (for non-stretch tasks)

### Dependencies

MAAP for AC 3

### Related PRs

- [changes to address feedback, virtual-stores-feasibility-report#52](https://github.com/NASA-IMPACT/virtual-stores-feasibility-report/pull/52)
