### Concurrent I/O operations
Zarr uses asynchronous I/O internally to enable concurrent reads and writes across multiple chunks.
Concurrency is controlled at the **store level** — each store instance can have its own concurrency
limit, set via the `concurrency_limit` parameter when creating the store.

```python
import zarr

# Local filesystem store with custom concurrency limit
store = zarr.storage.LocalStore("data/my_array.zarr", concurrency_limit=64)

# Remote store with higher concurrency for network I/O
from obstore.store import S3Store
store = zarr.storage.ObjectStore(S3Store.from_url("s3://bucket/path"), concurrency_limit=128)
```
Higher concurrency values can improve throughput when:

Lower concurrency values may be beneficial when:
- Memory is constrained (each concurrent operation requires buffer space)
- Using Zarr within a parallel computing framework (see below)
Set `concurrency_limit=None` to disable the concurrency limit entirely.
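The effect of a store-level limit can be sketched with a plain `asyncio.Semaphore` (a simplified model of the idea, not Zarr's actual implementation): at most `concurrency_limit` operations are in flight at once, and the rest queue.

```python
import asyncio

async def gather_with_limit(coros, concurrency_limit):
    # At most `concurrency_limit` coroutines run at once; the rest wait
    # on the semaphore, just as extra chunk requests would queue.
    sem = asyncio.Semaphore(concurrency_limit)

    async def run(coro):
        async with sem:
            return await coro

    return await asyncio.gather(*(run(c) for c in coros))

stats = {"in_flight": 0, "peak": 0}

async def fake_chunk_read(i):
    # Stand-in for one chunk's I/O; tracks how many reads overlap.
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(0.01)
    stats["in_flight"] -= 1
    return i

results = asyncio.run(gather_with_limit([fake_chunk_read(i) for i in range(20)], 4))
print(stats["peak"])  # never exceeds the limit of 4
```

With the limit set to `None`, nothing throttles the queue and the peak would equal the number of pending operations.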
### Using Zarr with Dask
[Dask](https://www.dask.org/) is a popular parallel computing library that works well with Zarr for processing large arrays. When using Zarr with Dask, it's important to consider the interaction between Dask's thread pool and the store's concurrency limit.

**Important**: When using many Dask threads, you may need to reduce the store's `concurrency_limit` and Zarr's `threading.max_workers` setting to avoid creating too many concurrent operations. The total number of concurrent I/O operations can be roughly estimated as the number of Dask threads multiplied by the store's `concurrency_limit`.

For example, if you're running Dask with 10 threads and a store concurrency limit of 64, you could potentially have up to 640 concurrent operations, which may overwhelm your storage system or cause memory issues.
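That worst-case estimate is just a multiplication; as a quick sanity check:

```python
# Rough upper bound on concurrent I/O: each Dask thread can drive up to
# `concurrency_limit` operations against the store at once.
dask_threads = 10
store_concurrency_limit = 64
max_concurrent_ops = dask_threads * store_concurrency_limit
print(max_concurrent_ops)  # 640
```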
**Recommendation**: When using Dask with many threads, configure concurrency settings:
```python
import zarr
import dask.array as da

# Create store with reduced concurrency limit for Dask workloads
store = zarr.storage.LocalStore("data/large_array.zarr", concurrency_limit=4)
zarr.config.set({
    'threading.max_workers': 4,  # Limit Zarr's internal thread pool
})

# Open Zarr array
z = zarr.open_array(store=store, mode='r')

# Create Dask array from Zarr array
arr = da.from_array(z, chunks=z.chunks)

result = arr.mean(axis=0).compute()
```
**Configuration guidelines for Dask workloads**:
255
257
256
-
-`async.concurrency`: Controls the maximum number of concurrent async I/O operations. Start with a lower value (e.g., 4-8) when using many Dask threads.
257
-
-`threading.max_workers`: Controls Zarr's internal thread pool size for blocking operations (defaults to CPU count). Reduce this to avoid thread contention with Dask's scheduler.
258
+
-`concurrency_limit` (per-store): Controls the maximum number of concurrent async I/O operations for a given store. Start with a lower value (e.g., 4-8) when using many Dask threads.
259
+
-`threading.max_workers` (global config): Controls Zarr's internal thread pool size for blocking operations (defaults to CPU count). Reduce this to avoid thread contention with Dask's scheduler.
You may need to experiment with different values to find the optimal balance for your workload. Monitor your system's resource usage and adjust these settings based on whether your storage system or CPU is the bottleneck.
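One way to run that experiment systematically is a small timing harness. The sketch below is a generic helper, not part of Zarr's API; `make_workload` is a hypothetical factory you would write yourself to open the array with a given `concurrency_limit` and perform a representative read.

```python
import time

def best_of(fn, repeats=3):
    """Return the fastest wall-clock time of `repeats` runs of `fn`."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return min(times)

def tune_concurrency(limits, make_workload):
    """Time a workload at each candidate limit; return (best_limit, timings).

    `make_workload(limit)` must return a zero-argument callable that opens
    the store with that concurrency limit and runs a representative read.
    """
    timings = {limit: best_of(make_workload(limit)) for limit in limits}
    return min(timings, key=timings.get), timings
```

For example, `tune_concurrency([4, 8, 16, 32], make_workload)` returns the fastest limit along with all timings. Prefer the fastest setting, but watch memory as well: a limit that is faster but exhausts buffer space is not a win.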