
Commit 52cc164

HDF5 filters (compression) (#1644)
* Preparation work: Refactoring
* Basic compression/filtering in HDF5
* Configure generic filters via JSON object
* Full support for the set_filter API
* Fix: captured structured bindings are a C++20 extension
* Refactoring to satisfy the Github bot
* Fix includes
* Switch to JSON config for NVidia compiler's benefit
* Verbose CI debugging lets goo
* Revert "Verbose CI debugging lets goo". This reverts commit abefc3a.
* Use Blosc2 filter not yet integrated into CI
* Add compression example
* Add HDF5-Blosc2 to some Linux workflow
* Update .github/workflows/dependencies/install_hdf5_blosc2
* Add Python example
* Some documentation fixes
* Fix install_hdf5_blosc2 script
* Complete examples
* ADIOS2 shorthand: dataset.operators may also be a single element
* Fix indentation
* Fix patch URL
* Update documentation and tests for ADIOS2
* Deactivate tests for HDF5-Blosc2
* Add documentation
* Some more consistency in examples
* Install with sudo rights
* Erase unnecessary line from example
* Fix datatypes in Python example
* Use CMake flag directly...
* Reset extended write example to dev. Compression example is moved to 15_compression now.
* Do we need -L/usr/local/lib ??
* Try if HDF5 finds the filter on its own...
* Ok that works, so cleanup
* Explicitly set chunks = "auto"
* CI fixes
* Add HDF5-Blosc2 to further CI runs
* Add hdf5plugin to some Python runs
* Skip patch in Clang runs
* Fix includes
* Fixes
* Further fixes
* Remove blosc filter from some runs again. This is too bothersome to set up and the runs that we have are enough.
* Add missing dataset definition
* Pull the Blosc2 stuff down in the example file
* Ditch self-compiled Blosc2 plugin, use hdf5plugin package
* CI fixes
* Try installing the deb package for h5pl...
* tmp: check if python example for hdf5+blosc2 runs
* fixes
* Move hdf5plugin Python tests to other runs
* Revert "tmp: check if python example for hdf5+blosc2 runs". This reverts commit b81437b.
* ....
* ...
* Install hdf5plugin into venv
* Remove CI debugging
* Cleanup
1 parent b567766 commit 52cc164

13 files changed

Lines changed: 1118 additions & 152 deletions


Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+#!/usr/bin/env bash
+
+version_major=1.14
+version_minor=6
+build_var=ubuntu-2404_gcc
+
+cd /opt
+wget "https://github.com/HDFGroup/hdf5_plugins/releases/download/hdf5-${version_major}.${version_minor}/hdf5_plugins-${version_major}-${build_var}.deb" >&2
+sudo dpkg -i "hdf5_plugins-${version_major}-${build_var}.deb" >&2
+rm "hdf5_plugins-${version_major}-${build_var}.deb"
+echo "/HDF_Group/HDF5/${version_major}.${version_minor}/lib/plugin/"
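The script prints the plugin directory on stdout so that callers can capture and export it. A minimal Python sketch of the same mechanism (the path below is what the script would echo for HDF5 1.14.6 and is an assumption; substitute whatever it prints on your system):

```python
import os

# Path echoed by the install script above for HDF5 1.14.6 (an assumption;
# adjust for your actual HDF5 version and install prefix).
plugin_dir = "/HDF_Group/HDF5/1.14.6/lib/plugin/"

# HDF5 consults HDF5_PLUGIN_PATH when dynamically loading filter plugins,
# so it must be set before the first HDF5 call in the process.
os.environ["HDF5_PLUGIN_PATH"] = plugin_dir
print(os.environ["HDF5_PLUGIN_PATH"])
```

This mirrors the `export HDF5_PLUGIN_PATH="$(... install_hdf5_plugins)"` lines added to the workflows below.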

.github/workflows/linux.yml

Lines changed: 18 additions & 4 deletions
@@ -97,9 +97,12 @@ jobs:
           sudo apt-get update
           sudo apt-get install clang-11 gfortran libopenmpi-dev python3
           sudo .github/workflows/dependencies/install_spack
+
       - name: Build
         env: {CC: clang-11, CXX: clang++-11, CXXFLAGS: -Werror}
         run: |
+          # Use this to make the HDF5 plugins available from the C/C++ API.
+          export HDF5_PLUGIN_PATH="$(sudo -E .github/workflows/dependencies/install_hdf5_plugins)"
           sudo ln -s "$(which cmake)" /usr/bin/cmake
           eval $(spack env activate --sh .github/ci/spack-envs/clang11_nopy_ompi_h5_ad2/)
           spack install
@@ -172,16 +175,20 @@
         run: |
           sudo apt-get update
           sudo apt-get remove openmpi* libopenmpi* *hdf5* || true
-          sudo apt-get install g++ gfortran python3
+          sudo apt-get install g++ gfortran python3 python3-venv
+
           sudo .github/workflows/dependencies/install_spack

+
           # Need to build this manually due to broken MPICH package in Ubuntu 24.04
           # https://bugs.launchpad.net/ubuntu/+source/mpich/+bug/2072338
           sudo .github/workflows/dependencies/install_mpich

       - name: Build
         env: {CC: gcc, CXX: g++, MPICH_CC: gcc, MPICH_CXX: g++, CXXFLAGS: -Werror}
         run: |
+          # Use this to make the HDF5 plugins available from the C/C++ API.
+          export HDF5_PLUGIN_PATH="$(sudo -E .github/workflows/dependencies/install_hdf5_plugins)"
           cmake --version
           mpiexec --version
           mpicxx --version
@@ -190,9 +197,13 @@
           eval $(spack env activate --sh .github/ci/spack-envs/gcc13_py312_mpich_h5_ad2/)
           spack install

+          python -m venv venv
+          source venv/bin/activate
+          pip install mpi4py numpy hdf5plugin
+
           share/openPMD/download_samples.sh build
           cmake -S . -B build \
-            -DopenPMD_USE_PYTHON=OFF \
+            -DopenPMD_USE_PYTHON=ON \
             -DopenPMD_USE_MPI=ON \
             -DopenPMD_USE_HDF5=ON \
             -DopenPMD_USE_ADIOS2=ON \
@@ -238,6 +249,8 @@ jobs:
       - name: Build
         env: {CC: gcc-12, CXX: g++-12, CXXFLAGS: -Werror}
         run: |
+          # Use this to make the HDF5 plugins available from the C/C++ API.
+          export HDF5_PLUGIN_PATH="$(sudo -E .github/workflows/dependencies/install_hdf5_plugins)"
           sudo ln -s "$(which cmake)" /usr/bin/cmake
           eval $(spack env activate --sh .github/ci/spack-envs/gcc12_py36_ompi_h5_ad2/)
           spack install
@@ -248,7 +261,8 @@
             -DopenPMD_USE_MPI=ON \
             -DopenPMD_USE_HDF5=ON \
             -DopenPMD_USE_ADIOS2=ON \
-            -DopenPMD_USE_INVASIVE_TESTS=ON
+            -DopenPMD_USE_INVASIVE_TESTS=ON \
+            -DCMAKE_VERBOSE_MAKEFILE=ON
           cmake --build build --parallel 4
           ctest --test-dir build --output-on-failure

@@ -261,6 +275,7 @@
         run: |
           sudo apt-get update
           sudo apt-get install g++ libopenmpi-dev libhdf5-openmpi-dev python3 python3-numpy python3-mpi4py python3-pandas python3-h5py-mpi python3-pip
+          python3 -m pip install jsonschema==4.* referencing
           # TODO ADIOS2
       - name: Build
         env: {CXXFLAGS: -Werror, PKG_CONFIG_PATH: /usr/lib/x86_64-linux-gnu/pkgconfig}
@@ -278,7 +293,6 @@
           cmake --build build --parallel 4
           ctest --test-dir build --output-on-failure

-          python3 -m pip install jsonschema==4.* referencing
           cd share/openPMD/json_schema
           PATH="../../../build/bin:$PATH" make -j 2
           # We need to exclude the thetaMode example since that has a different
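One run above installs the `hdf5plugin` wheel into a venv instead of building a Blosc2 plugin by hand. Importing `hdf5plugin` registers its bundled compression filters with libhdf5 for the current process, which is what lets the Python compression example run without a self-compiled plugin. A guarded availability check (the `hdf5plugin` package itself is an assumption of the CI venv, not of this sketch):

```python
def hdf5plugin_available() -> bool:
    """Return True if the hdf5plugin package can be imported.

    Importing hdf5plugin has the side effect of registering its bundled
    HDF5 filter plugins (Blosc2, Zstd, ...) for this process.
    """
    try:
        import hdf5plugin  # noqa: F401
        return True
    except ImportError:
        return False

print(hdf5plugin_available())
```

In the CI run this import happens implicitly inside the Python compression example.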

.github/workflows/tooling.yml

Lines changed: 10 additions & 0 deletions
@@ -22,6 +22,11 @@ jobs:
           sudo apt-get install clang clang-tidy gfortran libopenmpi-dev python-is-python3
           SPACK_VER=1.0.1 sudo -E .github/workflows/dependencies/install_spack
           echo "SPACK VERSION: $(spack --version)"
+
+          # Use this to make the HDF5 plugins available from the C/C++ API.
+          export HDF5_PLUGIN_PATH="$(sudo -E .github/workflows/dependencies/install_hdf5_plugins)"
+          echo "$HDF5_PLUGIN_PATH"
+          ls "$HDF5_PLUGIN_PATH"
       - name: Build
         env: {CC: clang, CXX: clang++}
         run: |
@@ -52,6 +57,11 @@
           sudo apt-get install clang-19 libc++-dev libc++abi-dev python3 gfortran libopenmpi-dev python3-numpy
           SPACK_VER=1.0.1 sudo -E .github/workflows/dependencies/install_spack
           echo "SPACK VERSION: $(spack --version)"
+
+          # Use this to make the HDF5 plugins available from the C/C++ API.
+          export HDF5_PLUGIN_PATH="$(sudo -E .github/workflows/dependencies/install_hdf5_plugins)"
+          echo "$HDF5_PLUGIN_PATH"
+          ls "$HDF5_PLUGIN_PATH"
       - name: Build
         env: {CC: mpicc, CXX: mpic++, OMPI_CC: clang-19, OMPI_CXX: clang++-19, CXXFLAGS: -Werror, OPENPMD_HDF5_CHUNKS: none, OPENPMD_TEST_NFILES_MAX: 100}
         run: |

CMakeLists.txt

Lines changed: 2 additions & 0 deletions
@@ -718,6 +718,7 @@ set(openPMD_EXAMPLE_NAMES
     12_span_write
     13_write_dynamic_configuration
     14_toml_template
+    15_compression
 )
 set(openPMD_PYTHON_EXAMPLE_NAMES
     2_read_serial
@@ -734,6 +735,7 @@ set(openPMD_PYTHON_EXAMPLE_NAMES
     11_particle_dataframe
     12_span_write
     13_write_dynamic_configuration
+    15_compression
 )

 if(openPMD_USE_INVASIVE_TESTS)

docs/source/backends/hdf5.rst

Lines changed: 13 additions & 0 deletions
@@ -25,6 +25,19 @@ Virtual file drivers are configured via JSON/TOML.
 Refer to the page on :ref:`JSON/TOML configuration <backendconfig-hdf5>` for further details.


+Filters (compression)
+*********************
+
+HDF5 supports so-called filters for transformations such as compression on datasets.
+These can be permanent (applied to an entire dataset) or transient (applied to individual I/O operations).
+The openPMD-api currently supports permanent filters.
+Pipelines of multiple subsequent filters are supported.
+Refer also to `this documentation <https://web.ics.purdue.edu/~aai/HDF5/html/Filters.html>`_.
+
+Filters are applied via :ref:`JSON/TOML configuration <backendconfig-hdf5>`; see there for detailed instructions on how to apply filters.
+There are also extended examples on how to apply compression options to ADIOS2 and HDF5 in the examples: `Python <https://github.com/openPMD/openPMD-api/blob/dev/examples/15_compression.py>`_ / `C++ <https://github.com/openPMD/openPMD-api/blob/dev/examples/15_compression.cpp>`_.
+
+
 Backend-Specific Controls
 -------------------------
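To make the documented filter configuration concrete, here is a hedged sketch of a dataset-level JSON configuration requesting a permanent filter pipeline (the builtin shuffle filter followed by zlib), using the key layout spelled out in the JSON/TOML configuration page; how the resulting string is handed to openPMD-api (e.g. as a dataset options argument) depends on the binding and is not shown here:

```python
import json

# Hedged sketch: a two-stage permanent filter pipeline. The key path
# follows the "hdf5.datasets.permanent_filters" spelling documented in
# this commit; the zlib entry uses its distinct type/aggression form.
config = {
    "hdf5": {
        "datasets": {
            "permanent_filters": [
                {"id": "shuffle"},                  # builtin filter, by name
                {"type": "zlib", "aggression": 5},  # zlib has its own API
            ]
        }
    }
}
options = json.dumps(config)
print(options)
```

A shuffle-before-compression pipeline is a common choice because byte shuffling typically improves the compression ratio of a subsequent deflate stage.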

docs/source/details/backendconfig.rst

Lines changed: 20 additions & 2 deletions
@@ -185,8 +185,8 @@ Explanation of the single keys:
 Additionally, specifying ``"disk_override"``, ``"buffer_override"`` or ``"new_step_override"`` will take precedence over options specified without the ``_override`` suffix, allowing to invert the normal precedence order.
 This way, a data producing code can hardcode the preferred flush target per ``flush()`` call, but users can e.g. still entirely deactivate flushing to disk in the ``Series`` constructor by specifying ``preferred_flush_target = buffer_override``.
 This is useful when applying the asynchronous IO capabilities of the BP5 engine.
-* ``adios2.dataset.operators``: This key contains a list of ADIOS2 `operators <https://adios2.readthedocs.io/en/latest/components/components.html#operator>`_, used to enable compression or dataset transformations.
-  Each object in the list has two keys:
+* ``adios2.dataset.operators``: This key contains either a single ADIOS2 `operator <https://adios2.readthedocs.io/en/latest/components/components.html#operator>`_ or a list of operators, used to enable compression or dataset transformations.
+  Each operator is an object with two keys:

   * ``type`` supported ADIOS operator type, e.g. zfp, sz
   * ``parameters`` is an associative map of string parameters for the operator (e.g. compression levels)
@@ -247,6 +247,24 @@ Explanation of the single keys:
   An explicit chunk size can be specified as a list of positive integers, e.g. ``hdf5.dataset.chunks = [10, 100]``. Note that this specification should only be used per-dataset, e.g. in ``resetDataset()``/``reset_dataset()``.

   Chunking generally improves performance and only needs to be disabled in corner-cases, e.g. when heavily relying on independent, parallel I/O that non-collectively declares data records.
+* ``hdf5.datasets.permanent_filters``: Either a single HDF5 permanent filter specification or a list of HDF5 permanent filter specifications.
+  Each filter specification is a JSON/TOML object, but there are multiple options:
+
+  * Zlib: The Zlib filter has a distinct API in HDF5, and the configuration for Zlib in openPMD is hence also different. It is activated by the mandatory key ``type = "zlib"`` and configured by the optional integer key ``aggression``.
+    Example: ``{"type": "zlib", "aggression": 5}``.
+  * Filters identified by their global ID `registered with the HDF group <https://github.com/HDFGroup/hdf5_plugins/blob/master/docs/RegisteredFilterPlugins.md>`_.
+    They are activated by the mandatory integer key ``id`` containing this global ID.
+    All other keys are optional:
+
+    * ``type = "by_id"`` may optionally be specified for clarity and consistency.
+    * The string key ``flags`` can take the values ``"mandatory"`` or ``"optional"``, indicating whether HDF5 should abort execution if the filter cannot be applied for some reason.
+    * The key ``cd_values`` points to a list of nonnegative integers.
+      These are filter-specific configuration options.
+      Refer to the specific filter's documentation.
+
+    As an alternative to an integer ID, the key ``id`` may also be of string type, identifying one of the six builtin filters of HDF5: ``"deflate"``, ``"shuffle"``, ``"fletcher32"``, ``"szip"``, ``"nbit"``, ``"scaleoffset"``.
+
+
 * ``hdf5.vfd.type`` selects the HDF5 virtual file driver.
   Currently available are:
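A hedged sketch of the ``by_id`` filter form described above, as a plain dictionary. The numeric ID 307 is used purely as an illustration of a registered filter ID (consult the HDF Group registry linked above for the filter you actually want), and the ``cd_values`` entry is a made-up example value:

```python
# Illustrative "by_id" permanent filter specification. Both the filter ID
# 307 and the cd_values below are example values, not recommendations.
filter_by_id = {
    "type": "by_id",      # optional, for clarity and consistency
    "id": 307,            # global filter ID from the HDF Group registry
    "flags": "optional",  # HDF5 will not abort if the filter is unavailable
    "cd_values": [9],     # filter-specific configuration integers
}

# A builtin filter can instead be selected by name via a string-typed id:
filter_builtin = {"id": "fletcher32"}
print(filter_by_id["id"], filter_builtin["id"])
```

Either object (or a list mixing several of them) would go under the ``hdf5.datasets.permanent_filters`` key.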

examples/13_write_dynamic_configuration.cpp

Lines changed: 6 additions & 7 deletions
@@ -47,6 +47,7 @@ type = "bp4"

 # ADIOS2 allows adding several operators
 # Lists are given in TOML by using double brackets
+# For specifying a single operator only, the list may be skipped.
 [[adios2.dataset.operators]]
 type = "zlib"

@@ -192,14 +193,12 @@ CFG.CHUNKS = [10]
         "resizable": true,
         "adios2": {
             "dataset": {
-                "operators": [
-                    {
-                        "type": "zlib",
-                        "parameters": {
-                            "clevel": 9
-                        }
+                "operators": {
+                    "type": "zlib",
+                    "parameters": {
+                        "clevel": 9
                     }
-                ]
+                }
             }
         }
     })END";

examples/13_write_dynamic_configuration.py

Lines changed: 3 additions & 2 deletions
@@ -31,6 +31,7 @@

 # ADIOS2 allows adding several operators
 # Lists are given in TOML by using double brackets
+# For specifying a single operator only, the list may be skipped.
 [[adios2.dataset.operators]]
 type = "zlib"

@@ -106,12 +107,12 @@ def main():
         }
     }
     config['adios2']['dataset'] = {
-        'operators': [{
+        'operators': {
             'type': 'zlib',
             'parameters': {
                 'clevel': 9
             }
-        }]
+        }
     }

     temperature = iteration.meshes["temperature"]
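Both example changes rely on the new ADIOS2 shorthand from this commit: ``adios2.dataset.operators`` may be a single operator object instead of a one-element list. A sketch of how the two spellings relate (the normalization shown is illustrative, not openPMD-api's actual parser):

```python
# Both configurations request the same zlib compression; the second uses
# the single-element shorthand introduced by this commit.
as_list = {
    "adios2": {"dataset": {"operators": [
        {"type": "zlib", "parameters": {"clevel": 9}},
    ]}}
}
as_single = {
    "adios2": {"dataset": {"operators":
        {"type": "zlib", "parameters": {"clevel": 9}},
    }}
}

def normalized(cfg):
    # Illustrative normalization: wrap a lone operator object into a list.
    ops = cfg["adios2"]["dataset"]["operators"]
    return ops if isinstance(ops, list) else [ops]

print(normalized(as_single) == normalized(as_list))
```

The list form remains necessary whenever more than one operator is chained.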
