Allow use of obstore as a Zarr store #723

Merged
tomwhite merged 1 commit into main from zarr-v3-obstore
Jun 27, 2025

@tomwhite
Member

Fixes #715

All tests pass with

CUBED_SPEC__STORAGE_OPTIONS__USE_OBSTORE=True pytest

Note, however, that this doesn't exercise obstore in every test, since many tests construct hardcoded Spec objects.
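As a rough illustration of how the environment variable above relates to the nested `storage_options` setting, here is a minimal sketch. It assumes a donfig-style convention: strip the `CUBED_` prefix, lowercase the name, treat `__` as a nesting separator, and parse the value as a Python literal. The helper `env_to_nested` is hypothetical and is not part of cubed or donfig.

```python
import ast


def env_to_nested(environ, prefix="CUBED_"):
    """Hypothetical helper: turn PREFIX_A__B__C=value into {"a": {"b": {"c": value}}}."""
    result = {}
    for name, raw in environ.items():
        if not name.startswith(prefix):
            continue
        # "SPEC__STORAGE_OPTIONS__USE_OBSTORE" -> ["spec", "storage_options", "use_obstore"]
        keys = name[len(prefix):].lower().split("__")
        try:
            # Parse Python literals such as "True" or "2"; keep other strings unchanged.
            value = ast.literal_eval(raw)
        except (SyntaxError, ValueError):
            value = raw
        node = result
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return result


config = env_to_nested({"CUBED_SPEC__STORAGE_OPTIONS__USE_OBSTORE": "True"})
print(config)
```

Under that assumption, the variable ends up as `spec.storage_options.use_obstore`, which is why it only affects tests that take their Spec from configuration rather than constructing one directly.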

@tomwhite tomwhite marked this pull request as ready for review June 17, 2025 15:11
@tomwhite
Member Author

I ran the benchmarks on a local machine with an SSD, using the following local_processes_obstore.yaml:

spec:
  work_dir: "$PWD/temp/"
  allowed_mem: "2GB"
  executor_name: "processes"
  storage_options:
    use_obstore: True

export CUBED_CONFIG=tests/configs/local_processes_obstore.yaml
pytest -vs --benchmark

The times were essentially the same as in #492 (comment), and plain Zarr Python v3 again gave similar numbers.

┌─────────────────────────────────────────────────┬────────────────────────────┬────────────────────┐
│                      name                       │           start            │      duration      │
│                     varchar                     │         timestamp          │       double       │
├─────────────────────────────────────────────────┼────────────────────────────┼────────────────────┤
│ test_quadratic_means_xarray[50-new-optimizer]   │ 2025-06-27 11:14:36.641891 │ 2.9726290702819824 │
│ test_quadratic_means_xarray[500-new-optimizer]  │ 2025-06-27 11:15:26.2486   │   30.7053439617157 │
│ test_quadratic_means_xarray[5000-new-optimizer] │ 2025-06-27 11:17:21.658876 │  291.1440191268921 │
└─────────────────────────────────────────────────┴────────────────────────────┴────────────────────┘

It will be interesting to see if obstore does better on high-latency stores like S3.

@tomwhite tomwhite merged commit 2c04fda into main Jun 27, 2025
20 checks passed
@tomwhite
Member Author

tomwhite commented Sep 3, 2025

I just re-ran this with the latest code (obstore 0.8.1), and the obstore Zarr store is now 12% faster than the default local store on this benchmark (252.9s vs 287.7s). In the table below, the first row is the default local store and the second row is the obstore store.

┌─────────────────────────────────────────────────┬────────────────────────────┬────────────────────┐
│                      name                       │           start            │      duration      │
│                     varchar                     │         timestamp          │       double       │
├─────────────────────────────────────────────────┼────────────────────────────┼────────────────────┤
│ test_quadratic_means_xarray[5000-new-optimizer] │ 2025-09-03 10:44:27.731067 │  287.7307958602905 │
│ test_quadratic_means_xarray[5000-new-optimizer] │ 2025-09-03 10:52:01.361027 │ 252.89893579483032 │
└─────────────────────────────────────────────────┴────────────────────────────┴────────────────────┘
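For reference, the 12% figure can be reproduced from the two durations in the table. This is a small sketch that assumes "faster" means the relative reduction in wall-clock time; the variable names are illustrative only.

```python
# Durations in seconds, copied from the table above.
default_s = 287.7307958602905   # default local store
obstore_s = 252.89893579483032  # obstore Zarr store

# Relative reduction in run time when switching to obstore.
reduction = (default_s - obstore_s) / default_s

# Equivalent speedup factor (how many times faster the obstore run is).
speedup = default_s / obstore_s

print(f"time reduction: {reduction:.1%}")
print(f"speedup factor: {speedup:.2f}x")
```

Both numbers come from a single run of each configuration, so small differences may be within run-to-run noise.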

@tomwhite
Member Author

tomwhite commented Sep 3, 2025

See #731 (comment) for comparison with zarrs-python.


Development

Successfully merging this pull request may close these issues.

Use obstore for intermediate storage
