Merge pull request #1411 from PowerGridModel/feature/documentation-consistency

zhen0427 · web-flow · commit 3588a3d7c004 · 2026-06-10T07:11:56.000Z
Documentation: dataset terminology consistency
diff --git a/docs/user_manual/dataset-terminology.md b/docs/user_manual/dataset-terminology.md
@@ -11,6 +11,92 @@ attribute.
 For detailed data types used throughout `power-grid-model`, please refer to
  [Python API Reference](../api_reference/python-api-reference.md).
 
+## Buffer Type
+
+Defines how component data is ordered in memory. Two buffer types are supported: row-based and columnar.
+
+### Row (row-based, row-major)
+
+Attributes of the same component are stored contiguously before moving to the next component.
+
+### Columnar (column-based, column-major)
+
+Attributes are grouped across components by attribute type.
+
+## Buffer Representation
+
+Defines whether component data can be interpreted as a dense 2D matrix.
+
+### Dense
+
+Dense buffers represent data as a rectangular matrix.
+This representation implies that all scenarios contain the same number of component entries.
+
+### Sparse
+
+Component data is stored as a flattened 1D buffer.
+
+Scenario boundaries are defined using an index pointer (`indptr`).
+The `indptr` defines how the flattened buffer is segmented into per-scenario ranges.
+
+Sparse buffers may be either uniform or non-uniform.
+
+## Component Dataset Independency
+
+Defines whether all scenarios operate on the same component IDs.
+
+### Independent
+
+All scenarios modify the same component IDs in the same order.
+
+Each scenario starts from the original input dataset, without carrying over changes from previous scenarios;
+a reset is therefore required between scenarios.
+
+### Dependent
+
+Different scenarios may modify different components.
+
+Each scenario starts from the original input dataset, without carrying over changes from previous scenarios;
+a reset is therefore required between scenarios.
+
+## Component Data Uniformity
+
+Defines whether all scenarios contain the same number of component entries.
+Uniformity is independent of buffer representation.
+
+### Uniform
+
+All scenarios contain the same number of component entries.
+
+- Dense buffers are always uniform (by construction)
+- Sparse buffers may also be uniform
+
+### Non-uniform
+
+Scenarios contain different numbers of component entries.
+
+- Only possible in sparse representation
+
+## Serialization Representation
+
+Defines how datasets are serialized. Three serialization representations are supported: compact list, named map,
+and mixed.
+
+### Compact List
+
+Uses positional arrays instead of named attributes.
+The attributes present in the dataset are stored separately.
+
+Generated when serializing with `use_compact_list=True`.
+
+### Named Map
+
+Uses explicit attribute names per component.
+
+### Mixed
+
+Combination of compact list and named map (only possible in manual construction, e.g. validation datasets).
+
 ## Data structures
 
 ```{mermaid}
@@ -75,7 +161,7 @@ graph TD
     elements of all components) for a single scenario.
   - **{py:class}`BatchDataset <power_grid_model.data_types.BatchDataset>`:** A data type storing update and or output
     data for one or more scenarios.
-    A batch dataset can contain sparse or dense data, depending on the component.
+    A batch dataset can contain dense or sparse representations per component.
 
 - **{py:class}`ComponentData <power_grid_model.data_types.ComponentData>`:** The data corresponding to the component.
   - **{py:class}`DataArray <power_grid_model.data_types.DataArray>`:** A data array can be a single or a batch array.
@@ -85,10 +171,11 @@ graph TD
     - **{py:class}`BatchArray <power_grid_model.data_types.BatchArray>`:** Multiple batches of data can be represented
       in sparse or dense forms.
       - **{py:class}`DenseBatchArray <power_grid_model.data_types.DenseBatchArray>`:** A 2D structured numpy array
-        containing a list of components of the same type for each scenario.
+        containing a list of components of the same type for each scenario. This implies all scenarios contain the
+        same number of components (uniform structure).
       - **{py:class}`SparseBatchArray <power_grid_model.data_types.SparseBatchArray>`:** A typed dictionary with a 1D
         numpy array of `Indexpointer` type under `indptr` key and `SingleArray` under `data` key which is all components
-        flattened over all batches.
+        flattened across scenarios, with scenario boundaries defined by `indptr`.
   - **{py:class}`ColumnarData <power_grid_model.data_types.ColumnarData>`:** A dictionary of attributes as keys and
     individual numpy arrays as values.
     This format is described in more detail in
@@ -183,9 +270,10 @@ The batch size is the number of scenarios.
 - **n_scenarios:** The total number of scenarios in the batch.
   (Same as Batch Size)
 
-- **n_component_elements_per_scenario:** The number of elements of a specific component for each scenario.
-  This can be an integer (for dense batches), or a list of integers for sparse batches, where each integer in the list
-  represents the number of elements of a specific component for the scenario corresponding to the index of the integer.
+- **n_component_elements_per_scenario:** The number of component instances per scenario, independent of representation
+  format (dense or sparse). This can be an integer (for dense batches), or a list of integers for sparse batches,
+  where each integer in the list represents the number of elements of a specific component for the scenario
+  corresponding to the index of the integer.
 
 - **Sub-batch:** When computing in parallel, all scenarios in batch calculation are distributed over threads.
   Each thread handles a subset of the `Batch`, called a `Sub-batch`.
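
As a rough illustration of the layout terms added to `dataset-terminology.md`, the sketch below regroups a row-based set of records into a columnar layout, then converts between the named map and compact list forms. Plain Python lists and dictionaries stand in for the real buffers. The sample attributes (`id`, `status`) and the helpers `to_columnar`, `to_compact_list` and `to_named_map` are hypothetical names chosen for illustration; they are not part of the `power-grid-model` API.

```python
# Row-based: the attributes of one component are stored together.
row_based = [
    {"id": 4, "status": 1},
    {"id": 5, "status": 0},
]


def to_columnar(rows):
    """Regroup row-based records by attribute (columnar layout)."""
    if not rows:
        return {}
    return {name: [row[name] for row in rows] for name in rows[0]}


def to_compact_list(named_map, attributes):
    """Drop attribute names; keep values in the order given by `attributes`."""
    return [named_map[name] for name in attributes]


def to_named_map(compact_list, attributes):
    """Re-attach attribute names to a positional list of values."""
    if len(compact_list) != len(attributes):
        raise ValueError("value count does not match attribute count")
    return dict(zip(attributes, compact_list))


columnar = to_columnar(row_based)
attributes = ["id", "status"]  # stored once, separately from the values
compact = [to_compact_list(row, attributes) for row in row_based]
```

In this sketch each row doubles as a named map, so the compact list form only makes sense together with the separately stored `attributes` order.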
diff --git a/docs/user_manual/serialization.md b/docs/user_manual/serialization.md
@@ -40,16 +40,16 @@ data.
 
 #### JSON schema attributes object
 
-[`Attributes`](#json-schema-attributes-object) contains specified attributes per [`Component`](#json-schema-component)
-type (e.g.: `"node"`).
-It is only required for those components that contain `HomogeneousComponentData` objects and that data needs to follow
-the attributes listed in this object.
-It may be empty if for data for all instances certain component is `InhomogeneousComponentData`.
-It reduces compression when a dataset largely follows the exact same pattern.
+[`Attributes`](#json-schema-attributes-object) defines the attribute list and ordering
+for each [`Component`](#json-schema-component) (e.g.: `"node"`). It is used when component data is represented
+using the compact list format (`use_compact_list=True`).
+
+The order of attributes in this section determines the order of values in the compact list representation.
+This is independent of whether the component data is stored as `DenseComponentData` or `SparseComponentData`.
 
 - [`Attributes`](#json-schema-attributes-object): `Object`
-  - [`Component`](#json-schema-component): [`ComponentAttributes`](#json-schema-component-attributes) containing the
-    desired [`Attribute`](#json-schema-attribute)s for that [`Component`](#json-schema-component).
+  - [`Component`](#json-schema-component): [`ComponentAttributes`](#json-schema-component-attributes)
+    defining the ordered list of [`Attribute`](#json-schema-attribute)s for that component.
 
 For example, for an `"update"` dataset that contains only updates to the `"from_status"` attribute of `"branch"`
 components, it may be `{"branch": ["from_status"]}`.
@@ -80,8 +80,8 @@ E.g.: `"id"`.
 
 #### JSON schema dataset object
 
-The [`Dataset`](#json-schema-dataset-object) object is either a [`SingleDataset`](#json-schema-single-dataset-object) if
-the [`is_batch`](#json-schema-root-object) field in the [`PowerGridModelRoot`](#json-schema-root-object) object is
+The [`Dataset`](#json-schema-dataset-object) object is either a [`SingleDataset`](#json-schema-single-dataset-object)
+if the [`is_batch`](#json-schema-root-object) field in the [`PowerGridModelRoot`](#json-schema-root-object) object is
 `false`, or a [`BatchDataset`](#json-schema-batch-dataset-object) otherwise.
 
 - [`Dataset`](#json-schema-dataset-object): [`SingleDataset`](#json-schema-single-dataset-object) |
@@ -124,35 +124,33 @@ remains the same.
 
 #### JSON schema component data object
 
-A [`ComponentData`](#json-schema-component-data-object) object is either a
-[`HomogeneousComponentData`](#json-schema-homogeneous-component-data-object) object or an
-[`InhomogeneousComponentData`](#json-schema-inhomogeneous-component-data-object) object
+A [`ComponentData`](#json-schema-component-data-object) represents the data of a single component instance.
 
 - [`ComponentData`](#json-schema-component-data-object):
-  [`HomogeneousComponentData`](#json-schema-homogeneous-component-data-object) |
-  [`InhomogeneousComponentData`](#json-schema-inhomogeneous-component-data-object)
+  [`DenseComponentData`](#json-schema-component-data-object-dense-representation) |
+  [`SparseComponentData`](#json-schema-component-data-object-sparse-representation)
+
+#### JSON schema component data object (dense representation)
 
-#### JSON schema homogeneous component data object
+A [`DenseComponentData`](#json-schema-component-data-object-dense-representation) object
+stores values in a fixed positional order defined by the `attributes` field in the root object.
 
-A [`HomogeneousComponentData`](#json-schema-homogeneous-component-data-object) object contains the actual values of a
-certain component following the exact order of the attributes listed in the [`attributes`](#json-schema-root-object)
-field in the [`PowerGridModelRoot`](#json-schema-root-object) object.
+- [`DenseComponentData`](#json-schema-component-data-object-dense-representation): `Array`
+  - [`AttributeValue`](#json-schema-attribute-value): values in the exact order defined by the component's attribute
+    list.
 
-- [`HomogeneousComponentData`](#json-schema-homogeneous-component-data-object): `Array`
-  - [`AttributeValue`](#json-schema-attribute-value): the value of each attribute.
+#### JSON schema component data object (sparse representation)
 
-#### JSON schema inhomogeneous component data object
+A [`SparseComponentData`](#json-schema-component-data-object-sparse-representation) object
+stores values keyed by attribute name.
 
-An [`InhomogeneousComponentData`](#json-schema-inhomogeneous-component-data-object) object contains actual values per
-attribute of a certain component.
-Contrary to the [`HomogeneousComponentData`](#json-schema-homogeneous-component-data-object), it lists the names of the
-attributes for which the values are specified, so the attributes may be in arbitrary order and do not have to follow the
-schema listed in the [`attributes`](#json-schema-root-object) field in the
-[`PowerGridModelRoot`](#json-schema-root-object) object.
+Unlike [`DenseComponentData`](#json-schema-component-data-object-dense-representation),
+it explicitly stores attribute names, allowing attributes to appear in arbitrary order and vary between components or
+scenarios.
 
-- [`InhomogeneousComponentData`](#json-schema-inhomogeneous-component-data-object): `Object`
-  - [`Attribute`](#json-schema-attribute): [`AttributeValue`](#json-schema-attribute-value): the value of each attribute
-    per attribute.
+- [`SparseComponentData`](#json-schema-component-data-object-sparse-representation): `Object`
+  - [`Attribute`](#json-schema-attribute):
+    [`AttributeValue`](#json-schema-attribute-value)
 
 #### JSON schema attribute value
 
@@ -255,11 +253,11 @@ The type is listed for each attribute in [Components](components.md).
 
 The following example contains an input dataset.
 The nodes and sym_loads are represented using
-[`HomogeneousComponentData`](#json-schema-homogeneous-component-data-object),
-the lines are represented using [`InomogeneousComponentData`](#json-schema-inhomogeneous-component-data-object),
+[`DenseComponentData`](#json-schema-component-data-object-dense-representation),
+the lines are represented using [`SparseComponentData`](#json-schema-component-data-object-sparse-representation),
 while the sources are represented using a mixture of
-[`HomogeneousComponentData`](#json-schema-homogeneous-component-data-object) and
-[`InomogeneousComponentData`](#json-schema-inhomogeneous-component-data-object).
+[`DenseComponentData`](#json-schema-component-data-object-dense-representation) and
+[`SparseComponentData`](#json-schema-component-data-object-sparse-representation).
 
 ```json
 {
diff --git a/src/power_grid_model/_core/power_grid_model.py b/src/power_grid_model/_core/power_grid_model.py
@@ -613,11 +613,11 @@ def calculate_power_flow(  # noqa: PLR0913
                     - key: Component type name to be updated in batch.
                     - value:
 
-                        - For homogeneous update batch (a 2D numpy structured array):
+                        - For dense (uniform) update batch (a 2D numpy structured array):
 
                             - Dimension 0: Each batch.
                             - Dimension 1: Each updated element per batch for this component type.
-                        - For inhomogeneous update batch (a dictionary containing two keys):
+                        - For sparse (non-uniform) update batch (a dictionary containing two keys):
 
                             - indptr: A 1D numpy int64 array with length n_batch + 1. Given batch number k, the
                               update array for this batch is data[indptr[k]:indptr[k + 1]]. This is the concept of
@@ -800,11 +800,11 @@ def calculate_state_estimation(  # noqa: PLR0913
                     - key: Component type name to be updated in batch.
                     - value:
 
-                        - For homogeneous update batch (a 2D numpy structured array):
+                        - For dense (uniform) update batch (a 2D numpy structured array):
 
                             - Dimension 0: Each batch.
                             - Dimension 1: Each updated element per batch for this component type.
-                        - For inhomogeneous update batch (a dictionary containing two keys):
+                        - For sparse (non-uniform) update batch (a dictionary containing two keys):
 
                             - indptr: A 1D numpy int64 array with length n_batch + 1. Given batch number k, the
                               update array for this batch is data[indptr[k]:indptr[k + 1]]. This is the concept of
@@ -964,11 +964,11 @@ def calculate_short_circuit(  # noqa: PLR0913
                     - key: Component type name to be updated in batch
                     - value:
 
-                        - For homogeneous update batch (a 2D numpy structured array):
+                        - For dense (uniform) update batch (a 2D numpy structured array):
 
                             - Dimension 0: each batch
                             - Dimension 1: each updated element per batch for this component type
-                        - For inhomogeneous update batch (a dictionary containing two keys):
+                        - For sparse (non-uniform) update batch (a dictionary containing two keys):
 
                             - indptr: A 1D numpy int64 array with length n_batch + 1. Given batch number k, the
                               update array for this batch is data[indptr[k]:indptr[k + 1]]. This is the concept of
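
The `indptr` convention described in these docstrings (`data[indptr[k]:indptr[k + 1]]`) can be sketched as follows. Plain Python lists stand in for the numpy arrays, which slice the same way. The helpers `split_sparse_batch`, `elements_per_scenario` and `is_uniform`, the sample values, and the validation of `indptr` are assumptions made for illustration; they are not part of `power-grid-model`.

```python
def split_sparse_batch(indptr, data):
    """Return the per-scenario slices data[indptr[k]:indptr[k + 1]]."""
    # Assumed invariant (CSR-style): indptr starts at 0 and ends at len(data).
    if not indptr or indptr[0] != 0 or indptr[-1] != len(data):
        raise ValueError("indptr must start at 0 and end at len(data)")
    return [data[start:stop] for start, stop in zip(indptr, indptr[1:])]


def elements_per_scenario(indptr):
    """Number of component entries in each scenario."""
    return [stop - start for start, stop in zip(indptr, indptr[1:])]


def is_uniform(indptr):
    """True when every scenario holds the same number of entries."""
    return len(set(elements_per_scenario(indptr))) <= 1


# Three scenarios updating 2, 0 and 1 elements respectively.
indptr = [0, 2, 2, 3]
data = [{"id": 4}, {"id": 5}, {"id": 4}]
scenarios = split_sparse_batch(indptr, data)
```

A sparse buffer whose `indptr` has equal steps (for example `[0, 2, 4, 6]`) is uniform and could equally be expressed as a dense 2D array; unequal steps can only be expressed in the sparse form.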