Commit abbc4e5

add numpy array encoding/decoding benchmark
1 parent d9b1b93 commit abbc4e5

2 files changed: 32 additions, 2 deletions

README.md

Lines changed: 2 additions & 2 deletions
@@ -103,13 +103,13 @@ newdata
 ```
 
 PyJData supports multiple N-D array data compression/decompression methods (i.e. codecs), similar
-to HDF5 filters. The currently supported filters include `zlib`, `gzip`, `lz4`, `lzma`, and various
+to HDF5 filters. Currently supported codecs include `zlib`, `gzip`, `lz4`, `lzma`, `base64` and various
 `blosc2` compression methods, including `blosc2blosclz`, `blosc2lz4`, `blosc2lz4hc`, `blosc2zlib`,
 `blosc2zstd`. To apply a selected compression method, one simply set `{'compression':'method'}` as
 the option to `jdata.encode` or `jdata.save` function; `jdata.load` or `jdata.decode` automatically
 decompress the data based on the `_ArrayZipType_` annotation present in the data. Only `blosc2`
 compression methods support multi-threading. To set the thread number, one should define a `nthread`
-value in the option for both encoding and decoding.
+value in the option (`opt`) for both encoding and decoding.
 
 
 ## Utility
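
The README paragraph in this hunk describes a dispatch pattern: the encoder picks a codec from the `compression` option, and the decoder picks the matching one from the `_ArrayZipType_` annotation stored with the data. The sketch below illustrates that pattern with the standard library only. It is not PyJData's implementation: `CODECS`, `encode_array` and `decode_array` are hypothetical names, only `zlib` and `lzma` are covered, and the other annotation keys (`_ArrayType_`, `_ArraySize_`, `_ArrayZipSize_`, `_ArrayZipData_`, `_ArrayData_`) are assumed from the JData annotation convention rather than taken from this commit.

```python
import base64
import lzma
import zlib
from array import array

# Hypothetical codec table for this sketch: name -> (compress, decompress).
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def encode_array(values, shape, opt=None):
    """Pack a flat list of floats into an annotated dict, compressed if requested."""
    opt = opt or {}
    node = {"_ArrayType_": "double", "_ArraySize_": list(shape)}
    method = opt.get("compression")
    if method is None:
        # No codec requested: store the values as-is.
        node["_ArrayData_"] = list(values)
        return node
    compress, _ = CODECS[method]
    raw = array("d", values).tobytes()
    node["_ArrayZipType_"] = method  # the decoder dispatches on this key
    node["_ArrayZipSize_"] = [1, len(values)]
    node["_ArrayZipData_"] = base64.b64encode(compress(raw)).decode("ascii")
    return node


def decode_array(node):
    """Recover the flat list of floats; the codec is read from the annotation."""
    method = node.get("_ArrayZipType_")
    if method is None:
        return list(node["_ArrayData_"])
    _, decompress = CODECS[method]
    out = array("d")
    out.frombytes(decompress(base64.b64decode(node["_ArrayZipData_"])))
    return out.tolist()
```

The caller names a codec only when encoding, for example `encode_array(vals, (2, 2), {"compression": "zlib"})`; `decode_array` needs no option because the annotation travels with the data, which mirrors how the README says `jdata.load` and `jdata.decode` behave.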

test/benchcodecs.py

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+import jdata as jd
+import numpy as np
+import time
+import os
+
+print("jdata version:" + jd.__version__)
+
+codecs = ["zlib", "lzma", "lz4", "blosc2blosclz", "blosc2lz4", "blosc2lz4hc", "blosc2zlib", "blosc2zstd"]
+
+def benchmark(codec, x):
+    t0 = time.time()
+    jd.save(x, "matrix_" + codec + suffix, {"compression": codec, "nthread": 8})
+    dt = time.time() - t0  # saving time
+    res = {"codec": codec, "save": dt}
+    y = jd.load("matrix_" + codec + suffix, {"nthread": 8})  # loading
+    res["load"] = time.time() - t0 - dt  # loading time
+    res["size"] = os.path.getsize("matrix_" + codec + suffix)
+    res["sum"] = y.sum()
+    print(res)
+    return res
+
+
+x = np.eye(10000)
+suffix = '.jdb'
+res = list(map(benchmark, codecs, [x] * len(codecs)))
+# print(np.array(res))
+
+suffix = '.jdt'
+res = list(map(benchmark, codecs, [x] * len(codecs)))
+# print(np.array(res))
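
The script above derives the load time by subtraction (`time.time() - t0 - dt`) and reads `suffix` from module scope. As a point of comparison, here is a minimal standard-library sketch of the same measurement shape (encode time, decode time, output size, round-trip check) that takes a separate timestamp per phase with `time.perf_counter`. It does not call `jdata` at all: `CODECS`, `identity_bytes` and `bench` are hypothetical names, only `zlib` and `lzma` are compared, and a 200 x 200 identity matrix stands in for `np.eye(10000)` to keep the run short.

```python
import lzma
import time
import zlib
from array import array

# Hypothetical codec table for this sketch: name -> (compress, decompress).
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def identity_bytes(n):
    """Raw bytes of an n x n identity matrix of doubles, row-major."""
    flat = array("d", bytes(8 * n * n))  # n*n zero-valued doubles
    for i in range(n):
        flat[i * n + i] = 1.0
    return flat.tobytes()


def bench(codec, raw):
    """Time one compress/decompress round trip for a single codec."""
    compress, decompress = CODECS[codec]
    t0 = time.perf_counter()
    packed = compress(raw)
    t1 = time.perf_counter()
    restored = decompress(packed)
    t2 = time.perf_counter()
    return {
        "codec": codec,
        "encode": t1 - t0,       # seconds spent compressing
        "decode": t2 - t1,       # seconds spent decompressing
        "size": len(packed),     # compressed size in bytes
        "ok": restored == raw,   # round-trip check
    }


raw = identity_bytes(200)
results = [bench(name, raw) for name in CODECS]
for r in results:
    print(r)
```

`time.perf_counter` is a monotonic, high-resolution clock, so short phases are not distorted by wall-clock adjustments. A single run per codec is still noisy; repeating each measurement and keeping the minimum is a common refinement.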
