
Commit 3588a3d

authored
Merge pull request #1411 from PowerGridModel/feature/documentation-consistency
Documentation: dataset terminology consistency
2 parents f849516 + f927311 commit 3588a3d

3 files changed

Lines changed: 133 additions & 47 deletions


docs/user_manual/dataset-terminology.md

Lines changed: 94 additions & 6 deletions
@@ -11,6 +11,92 @@ attribute.
 For detailed data types used throughout `power-grid-model`, please refer to
 [Python API Reference](../api_reference/python-api-reference.md).
 
+## Buffer Type
+
+Defines how component data is ordered in memory. Two buffer types are supported: row-based and column-based (columnar).
+
+### Row (row-based, row-major)
+
+Attributes of the same component are stored contiguously before moving to the next component.
+
+### Columnar (column-based, column-major)
+
+Values of the same attribute are stored contiguously across all components, i.e., data is grouped by attribute.
+
+## Buffer Representation
+
+Defines whether component data can be interpreted as a dense 2D matrix.
+
+### Dense
+
+Dense buffers represent data as a rectangular matrix.
+This representation implies that all scenarios contain the same number of component entries.
+
+### Sparse
+
+Component data is stored as a flattened 1D buffer.
+
+Scenario boundaries are defined using an index pointer (`indptr`).
+The `indptr` defines how the flattened buffer is segmented into per-scenario ranges.
+
+Sparse buffers may be either uniform or non-uniform.
+
+## Component Dataset Independence
+
+Defines whether all scenarios operate on the same component IDs.
+
+### Independent
+
+All scenarios modify the same component IDs in the same order.
+
+Each scenario starts from the original input dataset without carrying over changes from previous scenarios; therefore,
+a reset is required between scenarios.
+
+### Dependent
+
+Different scenarios may modify different components.
+
+Each scenario starts from the original input dataset without carrying over changes from previous scenarios; therefore,
+a reset is required between scenarios.
+
+## Component Data Uniformity
+
+Defines whether all scenarios contain the same number of component entries.
+Uniformity is independent of buffer representation.
+
+### Uniform
+
+All scenarios contain the same number of component entries.
+
+- Dense buffers are always uniform (by construction)
+- Sparse buffers may also be uniform
+
+### Non-uniform
+
+Scenarios contain different numbers of component entries.
+
+- Only possible in sparse representation
+
+## Serialization Representation
+
+Defines how datasets are serialized. Three serialization representations are supported: compact list, named map,
+and mixed.
+
+### Compact List
+
+Uses positional arrays instead of named attributes.
+The attribute names, and their order, are stored separately in the `attributes` field.
+
+Generated when using `use_compact_list=True`.
+
+### Named Map
+
+Uses explicit attribute names per component.
+
+### Mixed
+
+Combination of compact list and named map (only possible in manual construction, e.g., validation datasets).
+
 ## Data structures
 
 ```{mermaid}
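The two buffer types defined in this hunk can be pictured with plain Python containers. The sketch below is illustrative only and does not use the `power-grid-model` API: a list of per-component records stands in for a row-based buffer, a dict of per-attribute lists stands in for a columnar one, and the helpers `rows_to_columns` and `columns_to_rows` are hypothetical names.

```python
# Hypothetical helpers: plain-Python stand-ins for row-based and columnar buffers.
# Row-based: one record per component, all attributes of that component kept together.
# Columnar: one list per attribute, all values of that attribute kept together.


def rows_to_columns(rows: list[dict]) -> dict[str, list]:
    """Regroup row-based records into one list per attribute."""
    if not rows:
        return {}
    return {attr: [row[attr] for row in rows] for attr in rows[0]}


def columns_to_rows(columns: dict[str, list]) -> list[dict]:
    """Regroup per-attribute lists back into one record per component."""
    attrs = list(columns)
    n = len(columns[attrs[0]]) if attrs else 0
    return [{attr: columns[attr][i] for attr in attrs} for i in range(n)]


# Three nodes with illustrative values
row_nodes = [
    {"id": 1, "u_rated": 10.5e3},
    {"id": 2, "u_rated": 10.5e3},
    {"id": 3, "u_rated": 0.4e3},
]
col_nodes = rows_to_columns(row_nodes)
print(col_nodes)  # {'id': [1, 2, 3], 'u_rated': [10500.0, 10500.0, 400.0]}
assert columns_to_rows(col_nodes) == row_nodes
```

In the library, the row-based form corresponds to a structured numpy array and the columnar form to `ColumnarData`, a dictionary of attribute names to individual numpy arrays, as listed under Data structures.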
@@ -75,7 +161,7 @@ graph TD
 elements of all components) for a single scenario.
 - **{py:class}`BatchDataset <power_grid_model.data_types.BatchDataset>`:** A data type storing update and or output
 data for one or more scenarios.
-A batch dataset can contain sparse or dense data, depending on the component.
+A batch dataset can contain dense or sparse representations per component.
 
 - **{py:class}`ComponentData <power_grid_model.data_types.ComponentData>`:** The data corresponding to the component.
 - **{py:class}`DataArray <power_grid_model.data_types.DataArray>`:** A data array can be a single or a batch array.
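Because a batch dataset can choose dense or sparse per component, code consuming it has to inspect each component separately. The sketch below assumes, following the `SparseBatchArray` description, that sparse data is a mapping with `indptr` and `data` keys; dense data is modelled as a list of per-scenario lists instead of a 2D numpy array, and `representation` and `n_scenarios` are hypothetical helper names.

```python
# Hypothetical classifier for per-component batch data (plain lists stand in for numpy arrays).
# Dense: a list holding one equally sized list of entries per scenario.
# Sparse: a dict with "indptr" (scenario boundaries) and "data" (all entries flattened).


def representation(component_data) -> str:
    if isinstance(component_data, dict) and "indptr" in component_data and "data" in component_data:
        return "sparse"
    if isinstance(component_data, list):
        return "dense"
    raise TypeError("unrecognised component data layout")


def n_scenarios(component_data) -> int:
    if representation(component_data) == "sparse":
        return len(component_data["indptr"]) - 1  # indptr has n_scenarios + 1 entries
    return len(component_data)  # dense: one row per scenario


batch_update = {
    # dense: 2 scenarios x 1 sym_load update each
    "sym_load": [[{"id": 7, "p_specified": 1e6}], [{"id": 7, "p_specified": 2e6}]],
    # sparse: scenario 0 updates one line, scenario 1 updates none
    "line": {"indptr": [0, 1, 1], "data": [{"id": 4, "from_status": 0}]},
}

for name, data in batch_update.items():
    print(name, representation(data), n_scenarios(data))
# sym_load dense 2
# line sparse 2
```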
@@ -85,10 +171,11 @@ graph TD
 - **{py:class}`BatchArray <power_grid_model.data_types.BatchArray>`:** Multiple batches of data can be represented
 in sparse or dense forms.
 - **{py:class}`DenseBatchArray <power_grid_model.data_types.DenseBatchArray>`:** A 2D structured numpy array
-containing a list of components of the same type for each scenario.
+containing a list of components of the same type for each scenario. This implies all scenarios contain the
+same number of components (uniform structure).
 - **{py:class}`SparseBatchArray <power_grid_model.data_types.SparseBatchArray>`:** A typed dictionary with a 1D
 numpy array of `Indexpointer` type under `indptr` key and `SingleArray` under `data` key which is all components
-flattened over all batches.
+flattened across scenarios, with scenario boundaries defined by `indptr`.
 - **{py:class}`ColumnarData <power_grid_model.data_types.ColumnarData>`:** A dictionary of attributes as keys and
 individual numpy arrays as values.
 This format is described in more detail in
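The `DenseBatchArray` and `SparseBatchArray` descriptions imply a simple conversion: flatten the per-scenario rows and record where each scenario starts. A minimal plain-Python sketch of that idea follows; lists replace numpy arrays, and `dense_to_sparse` and `sparse_to_dense` are hypothetical helper names, not library functions.

```python
from itertools import accumulate


def dense_to_sparse(dense: list[list]) -> dict:
    """Flatten a per-scenario 2D layout into {"indptr": ..., "data": ...}."""
    data = [entry for scenario in dense for entry in scenario]
    # indptr starts at 0 and has len(dense) + 1 entries; scenario k is data[indptr[k]:indptr[k + 1]]
    indptr = [0, *accumulate(len(scenario) for scenario in dense)]
    return {"indptr": indptr, "data": data}


def sparse_to_dense(sparse: dict) -> list[list]:
    """Rebuild the 2D layout; only valid when every scenario has the same length (uniform)."""
    indptr, data = sparse["indptr"], sparse["data"]
    counts = {indptr[k + 1] - indptr[k] for k in range(len(indptr) - 1)}
    if len(counts) > 1:
        raise ValueError("non-uniform sparse data cannot be represented densely")
    return [data[indptr[k] : indptr[k + 1]] for k in range(len(indptr) - 1)]


dense = [["a0", "b0"], ["a1", "b1"], ["a2", "b2"]]  # 3 scenarios x 2 entries
sparse = dense_to_sparse(dense)
print(sparse["indptr"])  # [0, 2, 4, 6]
assert sparse_to_dense(sparse) == dense
```

Converting back to the dense form only succeeds for uniform data, which mirrors the statement that a dense buffer implies the same number of components in every scenario.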
@@ -183,9 +270,10 @@ The batch size is the number of scenarios.
 - **n_scenarios:** The total number of scenarios in the batch.
 (Same as Batch Size)
 
-- **n_component_elements_per_scenario:** The number of elements of a specific component for each scenario.
-This can be an integer (for dense batches), or a list of integers for sparse batches, where each integer in the list
-represents the number of elements of a specific component for the scenario corresponding to the index of the integer.
+- **n_component_elements_per_scenario:** The number of component instances per scenario, regardless of representation
+format (dense or sparse). For dense batches this is a single integer; for sparse batches it is a list of integers,
+where each integer is the number of elements of that component in the scenario with the same index as the
+integer.
 
 - **Sub-batch:** When computing in parallel, all scenarios in batch calculation are distributed over threads.
 Each thread handles a subset of the `Batch`, called a `Sub-batch`.
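The sketch below derives `n_component_elements_per_scenario` from a sparse `indptr` and then applies the uniformity and independence definitions from the terminology section (uniform: equal counts in every scenario; independent: the same IDs in the same order in every scenario). It works on plain lists, the function names are hypothetical, and it shows that a uniform batch is not necessarily independent.

```python
def elements_per_scenario(indptr: list[int]) -> list[int]:
    """Number of entries per scenario: consecutive differences of indptr."""
    return [indptr[k + 1] - indptr[k] for k in range(len(indptr) - 1)]


def is_uniform(indptr: list[int]) -> bool:
    """True if every scenario holds the same number of entries."""
    return len(set(elements_per_scenario(indptr))) <= 1


def is_independent(indptr: list[int], ids: list[int]) -> bool:
    """True if every scenario lists exactly the same IDs in the same order."""
    scenarios = [ids[indptr[k] : indptr[k + 1]] for k in range(len(indptr) - 1)]
    return all(s == scenarios[0] for s in scenarios)


# Uniform and independent: both scenarios update IDs 1 and 2
print(elements_per_scenario([0, 2, 4]), is_uniform([0, 2, 4]), is_independent([0, 2, 4], [1, 2, 1, 2]))
# [2, 2] True True

# Uniform but dependent: same counts, different IDs
print(is_uniform([0, 2, 4]), is_independent([0, 2, 4], [1, 2, 3, 4]))
# True False

# Non-uniform: only expressible in the sparse representation
print(elements_per_scenario([0, 1, 3]), is_uniform([0, 1, 3]))
# [1, 2] False
```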

docs/user_manual/serialization.md

Lines changed: 33 additions & 35 deletions
@@ -40,16 +40,16 @@ data.
 
 #### JSON schema attributes object
 
-[`Attributes`](#json-schema-attributes-object) contains specified attributes per [`Component`](#json-schema-component)
-type (e.g.: `"node"`).
-It is only required for those components that contain `HomogeneousComponentData` objects and that data needs to follow
-the attributes listed in this object.
-It may be empty if for data for all instances certain component is `InhomogeneousComponentData`.
-It reduces compression when a dataset largely follows the exact same pattern.
+[`Attributes`](#json-schema-attributes-object) defines the attribute list and ordering
+for each [`Component`](#json-schema-component) (e.g.: `"node"`) and is used when component data is represented
+using the compact list format (`use_compact_list=True`).
+
+The order of attributes in this section determines the order of values in the compact list representation.
+It is only required for components stored as `DenseComponentData`; `SparseComponentData` names its attributes explicitly.
 
 - [`Attributes`](#json-schema-attributes-object): `Object`
-- [`Component`](#json-schema-component): [`ComponentAttributes`](#json-schema-component-attributes) containing the
-desired [`Attribute`](#json-schema-attribute)s for that [`Component`](#json-schema-component).
+- [`Component`](#json-schema-component): [`ComponentAttributes`](#json-schema-component-attributes)
+defining the ordered list of [`Attribute`](#json-schema-attribute)s for that component.
 
 For example, for an `"update"` dataset that contains only updates to the `"from_status"` attribute of `"branch"`
 components, it may be `{"branch": ["from_status"]}`.
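To illustrate how the `attributes` object drives the compact list form, the sketch converts named-map entries into positional lists and back using only the standard `json` module. The layout is simplified (the `attributes` mapping and the component data are passed separately rather than inside a full root object), and `to_compact` and `from_compact` are hypothetical helper names.

```python
import json

# Attribute order for the compact list form, analogous to {"branch": ["from_status"]} above
attributes = {"line": ["id", "from_status", "to_status"]}

named_map = {
    "line": [
        {"id": 4, "from_status": 0, "to_status": 1},
        {"id": 5, "from_status": 1, "to_status": 1},
    ]
}


def to_compact(data: dict, attributes: dict) -> dict:
    """Named map -> positional lists, ordered by the attributes object."""
    return {
        comp: [[entry[attr] for attr in attributes[comp]] for entry in entries]
        for comp, entries in data.items()
    }


def from_compact(data: dict, attributes: dict) -> dict:
    """Positional lists -> named map, pairing each value with its attribute name."""
    return {
        comp: [dict(zip(attributes[comp], values)) for values in rows]
        for comp, rows in data.items()
    }


compact = to_compact(named_map, attributes)
print(json.dumps(compact))  # {"line": [[4, 0, 1], [5, 1, 1]]}
assert from_compact(compact, attributes) == named_map
```

The positional form omits the repeated attribute names, which is why it relies on the separately stored ordering.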
@@ -80,8 +80,8 @@ E.g.: `"id"`.
 
 #### JSON schema dataset object
 
-The [`Dataset`](#json-schema-dataset-object) object is either a [`SingleDataset`](#json-schema-single-dataset-object) if
-the [`is_batch`](#json-schema-root-object) field in the [`PowerGridModelRoot`](#json-schema-root-object) object is
+The [`Dataset`](#json-schema-dataset-object) object is either a [`SingleDataset`](#json-schema-single-dataset-object)
+if the [`is_batch`](#json-schema-root-object) field in the [`PowerGridModelRoot`](#json-schema-root-object) object is
 `false`, or a [`BatchDataset`](#json-schema-batch-dataset-object) otherwise.
 
 - [`Dataset`](#json-schema-dataset-object): [`SingleDataset`](#json-schema-single-dataset-object) |
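The `is_batch` rule can be sketched as a small dispatcher. For illustration this assumes that a batch dataset's `data` is a list with one single-dataset object per scenario; the roots below contain only `is_batch` and `data`, so they are not complete documents, and `scenario_datasets` is a hypothetical name.

```python
import json


def scenario_datasets(root: dict) -> list[dict]:
    """Return one single-dataset object per scenario; a single dataset counts as one."""
    if root["is_batch"]:
        return list(root["data"])  # assumption: a list of per-scenario objects
    return [root["data"]]


single = json.loads('{"is_batch": false, "data": {"node": [{"id": 1}]}}')
batch = json.loads('{"is_batch": true, "data": [{"line": [{"id": 4, "from_status": 0}]}, {}]}')
print(len(scenario_datasets(single)), len(scenario_datasets(batch)))  # 1 2
```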
@@ -124,35 +124,33 @@ remains the same.
 
 #### JSON schema component data object
 
-A [`ComponentData`](#json-schema-component-data-object) object is either a
-[`HomogeneousComponentData`](#json-schema-homogeneous-component-data-object) object or an
-[`InhomogeneousComponentData`](#json-schema-inhomogeneous-component-data-object) object
+A [`ComponentData`](#json-schema-component-data-object) represents the data of a single component instance.
 
 - [`ComponentData`](#json-schema-component-data-object):
-[`HomogeneousComponentData`](#json-schema-homogeneous-component-data-object) |
-[`InhomogeneousComponentData`](#json-schema-inhomogeneous-component-data-object)
+[`DenseComponentData`](#json-schema-component-data-object-dense-representation) |
+[`SparseComponentData`](#json-schema-component-data-object-sparse-representation)
+
+#### JSON schema component data object (dense representation)
 
-#### JSON schema homogeneous component data object
+A [`DenseComponentData`](#json-schema-component-data-object-dense-representation) object
+stores values in a fixed positional order defined by the `attributes` field in the root object.
 
-A [`HomogeneousComponentData`](#json-schema-homogeneous-component-data-object) object contains the actual values of a
-certain component following the exact order of the attributes listed in the [`attributes`](#json-schema-root-object)
-field in the [`PowerGridModelRoot`](#json-schema-root-object) object.
+- [`DenseComponentData`](#json-schema-component-data-object-dense-representation): `Array`
+- [`AttributeValue`](#json-schema-attribute-value): values in the exact order defined by the component's attribute
+list.
 
-- [`HomogeneousComponentData`](#json-schema-homogeneous-component-data-object): `Array`
-- [`AttributeValue`](#json-schema-attribute-value): the value of each attribute.
+#### JSON schema component data object (sparse representation)
 
-#### JSON schema inhomogeneous component data object
+A [`SparseComponentData`](#json-schema-component-data-object-sparse-representation) object
+stores values grouped by attribute.
 
-An [`InhomogeneousComponentData`](#json-schema-inhomogeneous-component-data-object) object contains actual values per
-attribute of a certain component.
-Contrary to the [`HomogeneousComponentData`](#json-schema-homogeneous-component-data-object), it lists the names of the
-attributes for which the values are specified, so the attributes may be in arbitrary order and do not have to follow the
-schema listed in the [`attributes`](#json-schema-root-object) field in the
-[`PowerGridModelRoot`](#json-schema-root-object) object.
+Unlike [`DenseComponentData`](#json-schema-component-data-object-dense-representation),
+it explicitly stores attribute names, allowing attributes to appear in arbitrary order and vary between components or
+scenarios.
 
-- [`InhomogeneousComponentData`](#json-schema-inhomogeneous-component-data-object): `Object`
-- [`Attribute`](#json-schema-attribute): [`AttributeValue`](#json-schema-attribute-value): the value of each attribute
-per attribute.
+- [`SparseComponentData`](#json-schema-component-data-object-sparse-representation): `Object`
+- [`Attribute`](#json-schema-attribute):
+[`AttributeValue`](#json-schema-attribute-value)
 
 #### JSON schema attribute value
 
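Because the entries of one component may mix `DenseComponentData` (arrays ordered by `attributes`) and `SparseComponentData` (objects keyed by attribute name), a reader has to branch per entry. A minimal sketch of that rule follows; the document layout is simplified to an `attributes` and a `data` key, the source values are illustrative, and `decode_entries` is a hypothetical name.

```python
import json


def decode_entries(entries: list, attribute_order: list[str]) -> list[dict]:
    """Normalise mixed entries to dicts: arrays follow attribute_order, objects are kept as-is."""
    decoded = []
    for entry in entries:
        if isinstance(entry, list):  # dense: positional values
            if len(entry) != len(attribute_order):
                raise ValueError("array length does not match the attributes list")
            decoded.append(dict(zip(attribute_order, entry)))
        elif isinstance(entry, dict):  # sparse: named values, any order
            decoded.append(dict(entry))
        else:
            raise TypeError(f"unexpected entry type: {type(entry).__name__}")
    return decoded


doc = json.loads("""
{"attributes": {"source": ["id", "node", "status"]},
 "data": {"source": [[15, 1, 1], {"status": 1, "node": 2, "id": 16, "u_ref": 1.0}]}}
""")
sources = decode_entries(doc["data"]["source"], doc["attributes"]["source"])
print(sources)
# [{'id': 15, 'node': 1, 'status': 1}, {'status': 1, 'node': 2, 'id': 16, 'u_ref': 1.0}]
```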
@@ -255,11 +253,11 @@ The type is listed for each attribute in [Components](components.md).
 
 The following example contains an input dataset.
 The nodes and sym_loads are represented using
-[`HomogeneousComponentData`](#json-schema-homogeneous-component-data-object),
-the lines are represented using [`InomogeneousComponentData`](#json-schema-inhomogeneous-component-data-object),
+[`DenseComponentData`](#json-schema-component-data-object-dense-representation),
+the lines are represented using [`SparseComponentData`](#json-schema-component-data-object-sparse-representation),
 while the sources are represented using a mixture of
-[`HomogeneousComponentData`](#json-schema-homogeneous-component-data-object) and
-[`InomogeneousComponentData`](#json-schema-inhomogeneous-component-data-object).
+[`DenseComponentData`](#json-schema-component-data-object-dense-representation) and
+[`SparseComponentData`](#json-schema-component-data-object-sparse-representation).
 
 ```json
 {

src/power_grid_model/_core/power_grid_model.py

Lines changed: 6 additions & 6 deletions
@@ -613,11 +613,11 @@ def calculate_power_flow( # noqa: PLR0913
 - key: Component type name to be updated in batch.
 - value:
 
-- For homogeneous update batch (a 2D numpy structured array):
+- For dense (uniform) update batch (a 2D numpy structured array):
 
 - Dimension 0: Each batch.
 - Dimension 1: Each updated element per batch for this component type.
-- For inhomogeneous update batch (a dictionary containing two keys):
+- For sparse (non-uniform) update batch (a dictionary containing two keys):
 
 - indptr: A 1D numpy int64 array with length n_batch + 1. Given batch number k, the
 update array for this batch is data[indptr[k]:indptr[k + 1]]. This is the concept of
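The docstring's indexing rule, row `k` for a dense update and `data[indptr[k]:indptr[k + 1]]` for a sparse one, can be expressed as a single accessor. This sketch uses plain lists in place of numpy arrays (numpy supports the same slicing syntax), and `scenario_update` is a hypothetical name, not part of `power_grid_model`.

```python
def scenario_update(value, k: int):
    """Return the update entries of scenario k for one component type."""
    if isinstance(value, dict):  # sparse: {"indptr": ..., "data": ...}
        indptr, data = value["indptr"], value["data"]
        if not 0 <= k < len(indptr) - 1:
            raise IndexError(k)
        return data[indptr[k] : indptr[k + 1]]
    return value[k]  # dense: dimension 0 indexes the scenario


dense = [["u0"], ["u1"], ["u2"]]  # n_batch = 3, one update per scenario
sparse = {"indptr": [0, 0, 2, 3], "data": ["x", "y", "z"]}  # len(indptr) == n_batch + 1
print([scenario_update(dense, k) for k in range(3)])  # [['u0'], ['u1'], ['u2']]
print([scenario_update(sparse, k) for k in range(3)])  # [[], ['x', 'y'], ['z']]
```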
@@ -800,11 +800,11 @@ def calculate_state_estimation( # noqa: PLR0913
 - key: Component type name to be updated in batch.
 - value:
 
-- For homogeneous update batch (a 2D numpy structured array):
+- For dense (uniform) update batch (a 2D numpy structured array):
 
 - Dimension 0: Each batch.
 - Dimension 1: Each updated element per batch for this component type.
-- For inhomogeneous update batch (a dictionary containing two keys):
+- For sparse (non-uniform) update batch (a dictionary containing two keys):
 
 - indptr: A 1D numpy int64 array with length n_batch + 1. Given batch number k, the
 update array for this batch is data[indptr[k]:indptr[k + 1]]. This is the concept of
@@ -964,11 +964,11 @@ def calculate_short_circuit( # noqa: PLR0913
 - key: Component type name to be updated in batch
 - value:
 
-- For homogeneous update batch (a 2D numpy structured array):
+- For dense (uniform) update batch (a 2D numpy structured array):
 
 - Dimension 0: each batch
 - Dimension 1: each updated element per batch for this component type
-- For inhomogeneous update batch (a dictionary containing two keys):
+- For sparse (non-uniform) update batch (a dictionary containing two keys):
 
 - indptr: A 1D numpy int64 array with length n_batch + 1. Given batch number k, the
 update array for this batch is data[indptr[k]:indptr[k + 1]]. This is the concept of

0 commit comments
