[hist] Document histogram terminology

hahnjo · hahnjo · commit b172ac63ee9a · 2025-07-02T08:44:31.000+02:00
diff --git a/hist/histv7/doc/Terminology.md b/hist/histv7/doc/Terminology.md
@@ -0,0 +1,54 @@
+# Histogram Terminology
+
+This document collects, defines, and explains terms that are used in ROOT's histogram package.
+The goal is to start from a common understanding, which should avoid ambiguities and ease discussions.
+It also helps (future) developers to navigate the code because classes and methods are named accordingly.
+The list is ordered alphabetically, though dependent terms are kept together with their parent.
+It is supposed to be exhaustive; any missing term should be added when needed.
+
+An *axis* is a bin configuration in one dimension.
+A *regular axis* has equidistant bins in the interval $[a, b)$.
+A *variable bin axis* is configured with explicit bin edges $[e_{n}, e_{n+1})$.
+A *categorical axis* has a unique label per bin.
+*Axes* is the plural of axis and usually means the bin configurations for all dimensions of a histogram.
+
+A *bin content* is the value of a single bin.
+The *bin content type* can be an integer type, a floating-point type, the special `RDoubleBinWithError`, or a user-defined type.
+
+A *bin error* is the Poisson error of a bin content.
+With the special `RDoubleBinWithError`, it is the square root of the sum of weights squared: $\sqrt{\sum w_i^2}$.
+Otherwise it is the square root of the bin content, which is only correct with unweighted filling.
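+
+As a worked illustration (the weights are made-up values, not taken from ROOT): filling one bin with weights $0.5$, $1.5$, and $2$ gives a bin content of $4$.
+With `RDoubleBinWithError`, the bin error is
+$$\sqrt{0.5^2 + 1.5^2 + 2^2} = \sqrt{6.5} \approx 2.55$$
+whereas the square root of the bin content would give $\sqrt{4} = 2$, which underestimates the error in this case.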
+
+A *bin index* (plural *indices*) refers to a single bin of a dimension; an array of indices refers to a bin in a histogram.
+A *normal bin* is inside an axis and its index starts from 0.
+*Underflow* and *overflow* bins, also called *flow bins*, are outside the axis and their indices have special values.
+The *invalid bin index* is another special value.
+
+A *bin index range* is a range from `begin` (inclusive) to `end` (exclusive).
+For this purpose, the underflow bin is ordered before all normal bins, while the overflow bin is placed after them.
+As the `end` is exclusive, the invalid bin index is ordered last to make it possible to include the overflow bin.
+
+*Filling* a histogram means adding an entry to it.
+*Concurrent filling* makes it possible to modify the same histogram without (external) synchronization.
+
+A *histogram* is the combination of an axes configuration and storage of bin contents.
+For most use cases, it also includes (global) *histogram statistics*.
+These include the number of entries, the sum of weights, and the sum of weights squared.
+The number of *effective entries* can be computed as the ratio $$\frac{(\sum w_i)^2}{\sum w_i^2}$$.
+Furthermore, for each dimension the histogram statistics include the sum of weights times value and the sum of weights times value squared.
+This allows computing the arithmetic mean and the standard deviation of the values before binning.
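+
+As a worked illustration with made-up weights: three entries with weights $0.5$, $1.5$, and $2$ have $\sum w_i = 4$ and $\sum w_i^2 = 6.5$, so the number of effective entries is $16 / 6.5 \approx 2.46$; with unweighted filling, it equals the number of entries.
+For the mean and standard deviation, a sketch using the standard weighted formulas, where $x_i$ and $w_i$ are notation assumed here for the filled values of one dimension and their weights:
+$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}, \qquad \sigma = \sqrt{\frac{\sum w_i x_i^2}{\sum w_i} - \bar{x}^2}$$
+Whether the implementation applies any further correction is not specified in this document.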
+
+A *linearized index* runs from 0 up to, but not including, the total number of bins, potentially including flow bins.
+For a single axis, it places the flow bins after the normal bins.
+The *global index* is a combination of the linearized indices from all axes.
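+
+As an illustration under assumptions that this document does not fix (the relative order of the two flow bins and a row-major combination of axes): for an axis with $N$ normal bins and flow bins enabled, the normal bins would get linearized indices $0, \dots, N-1$, the underflow bin $N$, and the overflow bin $N+1$.
+For two axes with $T_0$ and $T_1$ total bins and linearized indices $l_0$ and $l_1$, one possible global index is
+$$g = l_0 \cdot T_1 + l_1$$
+which maps every pair $(l_0, l_1)$ to a distinct value in $0, \dots, T_0 T_1 - 1$.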
+
+A *profile* is a histogram that computes the arithmetic mean and standard deviation per bin.
+During filling, it accepts an additional `double` value and accumulates its sum and sum of squares.
+
+*Slicing* means extracting a subset of the normal bins in each dimension.
+Bin contents of excluded normal bins are added to the flow bins.
+
+A *snapshot* is a consistent clone of the histogram during concurrent filling.
+
+A *weight* is an optional floating-point value passed during filling.
+If not specified, it defaults to $1$; filling without an explicit weight is also called unweighted filling.