docs/user-guide/performance.md (5 additions, 45 deletions)
### Concurrent I/O operations
For latency-sensitive storage backends like HTTP and cloud object storage, Zarr uses asynchronous I/O internally to enable concurrent reads and writes across multiple chunks. Zarr does not impose its own concurrency limits; storage backends are expected to manage their own concurrency constraints (e.g., connection pool sizes, rate limits). If you need to limit concurrency for a particular backend, configure it at the storage layer (e.g., via fsspec or obstore options).
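Since any cap on in-flight requests lives in the storage layer rather than in Zarr, it may help to see the mechanism such backends typically use. The sketch below is a toy illustration using plain `asyncio` only; `fetch_chunk`, the chunk keys, and the sleep are hypothetical stand-ins for real storage reads, not Zarr API:

```python
import asyncio

async def fetch_chunk(key, sem, stats):
    # Acquire the semaphore before "talking to storage"; this is the same
    # mechanism a connection pool size or rate limiter uses to bound
    # concurrent requests.
    async with sem:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # stand-in for a network read
        stats["active"] -= 1
        return key

async def read_all(limit=4, n_chunks=32):
    sem = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}
    # Launch all chunk reads at once; the semaphore, not the caller,
    # decides how many run concurrently.
    await asyncio.gather(*(fetch_chunk(i, sem, stats) for i in range(n_chunks)))
    return stats["peak"]

peak = asyncio.run(read_all())
print(f"peak concurrency: {peak}")  # never exceeds the semaphore limit of 4
```

Real backends expose the equivalent knob through their own options (for example, connection-pool or client settings in fsspec filesystems or obstore stores); consult your backend's documentation for the exact parameter name.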
### Thread pool size (`threading.max_workers`)
### Using Zarr with Dask
[Dask](https://www.dask.org/) is a popular parallel computing library that works well with Zarr for processing large arrays. When using Zarr with Dask, it's important to consider the interaction between Dask's thread pool and Zarr's internal thread pool.
**Recommendation**: When using Dask with many threads, reduce Zarr's internal thread pool to avoid thread contention:
```python
import zarr
import dask.array as da
# Limit Zarr's internal thread pool
zarr.config.set({
    'threading.max_workers': 4,
})
# Open Zarr array
z = zarr.open_array("data/large_array.zarr", mode='r')
```

You may need to experiment with different values to find the optimal balance for your workload. Monitor your system's resource usage and adjust this setting based on whether your storage system or CPU is the bottleneck.
### Thread safety and process safety
Zarr arrays are designed to be thread-safe for concurrent reads and writes from multiple threads within the same process. However, proper synchronization is required when writing to overlapping regions from multiple threads.
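To make the overlapping-write caveat concrete, here is a minimal Zarr-free sketch using plain `threading`, with a Python list standing in for an array region. The names (`buf`, `add_one`) and index ranges are illustrative only; the point is that two writers touching the same indices need a lock so no read-modify-write update is lost:

```python
import threading

buf = [0] * 8            # stand-in for a shared array region
lock = threading.Lock()

def add_one(start, stop):
    for i in range(start, stop):
        # Without the lock, two threads doing `buf[i] += 1` on the same
        # index can interleave their read and write, dropping an update.
        with lock:
            buf[i] += 1

t1 = threading.Thread(target=add_one, args=(0, 6))  # overlaps t2 on indices 4-5
t2 = threading.Thread(target=add_one, args=(4, 8))
t1.start(); t2.start()
t1.join(); t2.join()
print(buf)  # → [1, 1, 1, 1, 2, 2, 1, 1]
```

When writers touch disjoint regions (e.g., whole distinct chunks), no such synchronization is needed, which is why aligning writes to chunk boundaries is the usual way to avoid locking altogether.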