Make lazy parsing (defer_iteration_parsing) more discoverable by giving a hint on the command line (#1802)
* Base impl: Hint when parsing takes too long
Only file-based so far
* Factor out timeout functionality
* Do this also for g/v-encodings
* Env variables, better warning message
* Add some docs
* Adapt examples
* Revert dataframe example
It accesses all Iterations at once
* Dataframe: Use snapshots API for iterating
To support Streaming workflows
* Formatting
* Fix extra quote
* Inline Comment: Seconds
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Formatting
---------
Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
docs/source/details/backendconfig.rst: 28 additions, 0 deletions
@@ -94,6 +94,34 @@ Using the Streaming API (i.e. ``SeriesInterface::readIteration()``) will do this
Parsing eagerly might be very expensive for a Series with many iterations, but it avoids bugs caused by forgotten calls to ``Iteration::open()``.
In complex environments, calling ``Iteration::open()`` on an already open Iteration does no harm (and additional ``open()`` calls incur no additional runtime cost).

By default, the library prints a warning that suggests deferred Iteration parsing when opening a Series takes a long time.
The timeout can be tuned with the JSON/TOML key ``hint_lazy_parsing_timeout`` (integer, seconds):
if set to a positive value, the library prints periodic warnings to stderr when eager parsing of Iterations takes longer than the specified number of seconds (default: ``20``). Setting this option to ``0`` disables the warnings.

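The behavior described above can be pictured with a small watchdog. This is a hypothetical Python sketch, not the library's implementation: the name ``hint_if_slow`` is invented for illustration, the warning period is assumed to equal the timeout, and fractional seconds are accepted only to keep the demo short (the documented option is an integer number of seconds).

```python
import sys
import threading
import time
from contextlib import contextmanager


@contextmanager
def hint_if_slow(timeout_s, message, stream=sys.stderr):
    """Print `message` every `timeout_s` seconds while the with-block runs.

    A timeout of 0 disables the hint, mirroring the documented option.
    (Hypothetical helper; not part of any library API.)
    """
    if timeout_s <= 0:
        # Hint disabled: run the block without a watchdog thread.
        yield
        return

    done = threading.Event()

    def watchdog():
        # Event.wait returns False on timeout and True once `done` is set,
        # so the loop prints one hint per elapsed period until the block ends.
        while not done.wait(timeout_s):
            print(message, file=stream)

    worker = threading.Thread(target=watchdog, daemon=True)
    worker.start()
    try:
        yield
    finally:
        done.set()
        worker.join()


# Usage: time.sleep stands in for slow eager parsing of many Iterations.
with hint_if_slow(0.05, "hint: consider defer_iteration_parsing"):
    time.sleep(0.2)
```

Using an ``Event`` rather than a plain sleep lets the watchdog stop promptly once the guarded work finishes, so a fast operation never prints a hint.
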
Environment variables may alternatively be used for options concerning deferred iteration parsing:

* Environment variable ``OPENPMD_DEFER_ITERATION_PARSING``: if set to a truthy value (e.g. ``1``), the Series will be opened with deferred iteration parsing as if ``{"defer_iteration_parsing": true}`` had been supplied.
* Environment variable ``OPENPMD_HINT_LAZY_PARSING_TIMEOUT``: accepts integral values equivalent to the ``hint_lazy_parsing_timeout`` key.

Examples:

.. code-block:: bash

   # enable lazy parsing via env var
   export OPENPMD_DEFER_ITERATION_PARSING=1

   # disable the parsing hint/warning
   export OPENPMD_HINT_LAZY_PARSING_TIMEOUT=0

Or in a Series constructor JSON/TOML configuration:

.. code-block:: json

   {
     "defer_iteration_parsing": true,
     "hint_lazy_parsing_timeout": 20
   }

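From Python, the same two settings can be assembled programmatically. The sketch below is illustrative only: the helper names ``lazy_parsing_options`` and ``lazy_parsing_env`` are hypothetical, and the commented-out ``Series`` call assumes the openPMD-api Python bindings with a placeholder file pattern.

```python
import json
import os


def lazy_parsing_options(defer=True, hint_timeout_s=20):
    """Return a JSON options string with the two keys documented above."""
    return json.dumps(
        {
            "defer_iteration_parsing": bool(defer),
            "hint_lazy_parsing_timeout": int(hint_timeout_s),
        }
    )


def lazy_parsing_env(defer=True, hint_timeout_s=20, env=None):
    """Set the equivalent environment variables in `env` (default: os.environ)."""
    env = os.environ if env is None else env
    if defer:
        env["OPENPMD_DEFER_ITERATION_PARSING"] = "1"
    else:
        # Unset rather than guess which strings count as falsy.
        env.pop("OPENPMD_DEFER_ITERATION_PARSING", None)
    env["OPENPMD_HINT_LAZY_PARSING_TIMEOUT"] = str(int(hint_timeout_s))
    return env


# Lazy parsing on, hint disabled (timeout 0).
options = lazy_parsing_options(defer=True, hint_timeout_s=0)
print(options)

# Assumed usage with the openPMD-api Python bindings (path is a placeholder):
# import openpmd_api as io
# series = io.Series("data_%T.h5", io.Access.read_only, options)
```

If the environment-variable route is taken instead, the variables presumably need to be set before the Series is opened, since they are described as affecting how the Series is opened.
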
The key ``resizable`` can be passed to ``Dataset`` options.
If set to ``{"resizable": true}``, this declares that the ``Extent`` of a ``Dataset`` may be increased via ``resetDataset()`` at a later time, i.e., after it has first been declared (and potentially written).
For HDF5, resizable Datasets come with a performance penalty.