
Follow-up benchmarks for kerchunk vs. NetCDF across file-count ranges, frequency, and access patterns #62

@tomvothecoder

Description

@tomvothecoder

Summary

Add follow-up benchmarks to better understand when kerchunk becomes preferable to the native NetCDF engine for our CMIP use case.

Goals

  • Refine the crossover point where kerchunk starts to outperform NetCDF
  • Test whether daily-frequency datasets change the result
  • Validate expectations for repeated access and remote data access

Benchmark additions

1. File-count bins near the crossover

Test datasets in narrower file-count ranges:

  • 25-49
  • 50-99
  • 100-149
  • 150-199
  • 200-299
  • 300-499
  • 500+
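The bin edges above can be encoded in a small helper so each benchmarked dataset is assigned to exactly one range. This is an illustrative sketch; the names `BIN_EDGES`, `BIN_LABELS`, and `file_count_bin` are not part of the benchmark suite.

```python
import bisect

# Lower edges of the file-count bins listed above; the last bin (500+) is open-ended.
BIN_EDGES = [25, 50, 100, 150, 200, 300, 500]
BIN_LABELS = ["25-49", "50-99", "100-149", "150-199", "200-299", "300-499", "500+"]


def file_count_bin(n_files):
    """Return the bin label for a dataset with n_files files, or None if below 25."""
    if n_files < BIN_EDGES[0]:
        return None
    # bisect_right counts how many lower edges are <= n_files; subtract 1 for the index.
    return BIN_LABELS[bisect.bisect_right(BIN_EDGES, n_files) - 1]
```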

2. Higher-frequency datasets

Add daily-frequency datasets in addition to the current cases.

3. Repeated access

Measure both:

  • first open
  • repeated open of the same dataset
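One way to separate the two measurements is a small timing harness that records the first (cold) call apart from the mean of later (warm) calls, which can benefit from OS page cache or library-level caching. A minimal sketch; `time_repeated` is a hypothetical helper, and in the real benchmark `open_fn` would wrap the NetCDF or kerchunk open call.

```python
import time


def time_repeated(open_fn, n_repeats=3):
    """Time a dataset-open callable: the first call (cold) separately
    from the mean of subsequent calls (warm)."""
    start = time.perf_counter()
    open_fn()
    first = time.perf_counter() - start

    repeats = []
    for _ in range(n_repeats):
        start = time.perf_counter()
        open_fn()
        repeats.append(time.perf_counter() - start)

    return {
        "first_open_s": first,
        "repeat_open_mean_s": sum(repeats) / len(repeats),
    }
```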

4. Remote access

Run the same comparisons for remote data access.
This covers the case where data is not colocated with compute, which the current results have not yet tested.

Dataset sampling

Use 3 datasets per file-count bin for the initial benchmark pass to keep the batch job within a reasonable runtime. If results are noisy or the crossover remains unclear, follow up with additional samples in the most relevant bins.
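The per-bin sampling could be done deterministically (seeded) so the initial pass is reproducible and follow-up passes can add samples without reshuffling earlier picks. A sketch under that assumption; `sample_per_bin` and its arguments are illustrative, not an existing API.

```python
import random


def sample_per_bin(datasets_by_bin, k=3, seed=0):
    """Pick up to k datasets per file-count bin, deterministically via seed,
    so the batch job stays small and the selection is reproducible."""
    rng = random.Random(seed)
    sampled = {}
    for bin_label, datasets in datasets_by_bin.items():
        sampled[bin_label] = rng.sample(datasets, min(k, len(datasets)))
    return sampled
```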

Operations to compare

  • open
  • load
  • temporal average
  • spatial average
  • subset
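The reduction operations above can be sketched as plain array operations to show what each benchmark step computes; the actual runs would use xarray equivalents (e.g. a mean over the `time` dimension, a cosine-of-latitude weighted mean, and label- or index-based slicing). Everything below is a numpy stand-in with illustrative names, not the benchmark code itself.

```python
import numpy as np


def temporal_average(data):
    """Mean over the leading time axis."""
    return data.mean(axis=0)


def spatial_average(data, lat):
    """Cosine-of-latitude weighted mean over the trailing (lat, lon) axes."""
    weights = np.cos(np.deg2rad(lat))[:, np.newaxis]  # shape (n_lat, 1)
    weighted_sum = (data * weights).sum(axis=(-2, -1))
    total_weight = (weights * np.ones(data.shape[-2:])).sum()
    return weighted_sum / total_weight


def subset(data, t0, t1):
    """Slice a time range by index."""
    return data[t0:t1]
```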

Notes

Current results suggest:

  • the native NetCDF engine is generally favored when data is colocated with compute at NERSC
  • kerchunk becomes more attractive as file counts increase
  • remote access has not yet been tested
