| 1 | +# Histogram Terminology |
| 2 | + |
| 3 | +This document collects, defines, and explains terms that are used in ROOT's histogram package. |
| 4 | +The goal is to start from a common understanding, which should avoid ambiguities and ease discussions. |
| 5 | +It also helps (future) developers to navigate the code because classes and methods are named accordingly. |
| 6 | +The list is ordered alphabetically, though dependent terms are kept together with their parent. |
| 7 | +It is supposed to be exhaustive; any missing term should be added when needed. |
| 8 | + |
| 9 | +An *axis* is a bin configuration in one dimension. |
| 10 | +A *regular axis* has equidistant bins in the interval $[a, b)$. |
| 11 | +A *variable bin axis* is configured with explicit bin edges $e_n$; bin $n$ covers the interval $[e_{n}, e_{n+1})$. |
| 12 | +A *categorical axis* has a unique label per bin. |
| 13 | +*Axes* is the plural of axis and usually means the bin configurations for all dimensions of a histogram. |
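
To illustrate the two numeric axis types, here is a minimal sketch of how a value could be mapped to a normal bin. The names `FindRegularBin`, `FindVariableBin`, and `kOutsideAxis` are hypothetical and are not part of ROOT's histogram package; the sketch only assumes the half-open intervals described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sentinel for values outside the axis (underflow or overflow).
constexpr std::size_t kOutsideAxis = static_cast<std::size_t>(-1);

// Hypothetical helper: index of the normal bin containing x on a regular
// axis with nBins equidistant bins in the interval [low, high).
std::size_t FindRegularBin(double x, double low, double high, std::size_t nBins)
{
   assert(nBins > 0 && low < high);
   if (!(x >= low && x < high))
      return kOutsideAxis; // also catches NaN
   const double width = (high - low) / nBins;
   const auto bin = static_cast<std::size_t>((x - low) / width);
   // Guard against floating-point rounding close to the upper edge.
   return std::min(bin, nBins - 1);
}

// Hypothetical helper: bin n of a variable bin axis covers [edges[n], edges[n+1]).
std::size_t FindVariableBin(double x, const std::vector<double> &edges)
{
   assert(edges.size() >= 2 && std::is_sorted(edges.begin(), edges.end()));
   if (!(x >= edges.front() && x < edges.back()))
      return kOutsideAxis;
   // Find the first edge strictly greater than x; the bin starts one edge earlier.
   const auto it = std::upper_bound(edges.begin(), edges.end(), x);
   return static_cast<std::size_t>(it - edges.begin()) - 1;
}
```

Because the intervals are half-open, a value equal to the upper end of the axis does not belong to any normal bin.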
| 14 | + |
| 15 | +A *bin content* is the value of a single bin. |
| 16 | +The *bin content type* can be an integer type, a floating-point type, the special `RDoubleBinWithError`, or a user-defined type. |
| 17 | + |
| 18 | +A *bin error* is the Poisson error of a bin content. |
| 19 | +With the special `RDoubleBinWithError`, it is the square root of the sum of weights squared: $\sqrt{\sum w_i^2}$. |
| 20 | +Otherwise it is the square root of the bin content, which is only correct with unweighted filling. |
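
The difference between the two error definitions can be sketched as follows. `WeightedBin` and `PoissonError` are hypothetical names for illustration; in particular, `WeightedBin` is not the actual definition of `RDoubleBinWithError`.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical bin content type that tracks what is needed for the error.
struct WeightedBin {
   double fSum = 0;  // sum of weights
   double fSum2 = 0; // sum of weights squared

   void Fill(double w = 1.0)
   {
      fSum += w;
      fSum2 += w * w;
   }
   // Square root of the sum of weights squared.
   double Error() const { return std::sqrt(fSum2); }
};

// Error of a plain counter: only correct with unweighted filling.
inline double PoissonError(double content)
{
   assert(content >= 0);
   return std::sqrt(content);
}
```

With unweighted filling both definitions agree because every $w_i^2$ equals $1$. With weights $3$ and $4$, the bin content is $7$ and the bin error is $\sqrt{9 + 16} = 5$, whereas $\sqrt{7}$ would be wrong.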
| 21 | + |
| 22 | +A *bin index* (plural *indices*) refers to a single bin of a dimension; an array of indices refers to a bin in a histogram. |
| 23 | +A *normal bin* is inside an axis and its index starts from 0. |
| 24 | +*Underflow* and *overflow* bins, also called *flow bins*, are outside the axis and their index has a special value. |
| 25 | +The *invalid bin index* is another special value. |
| 26 | + |
| 27 | +A *bin index range* is a range from `begin` (inclusive) to `end` (exclusive). |
| 28 | +Within such a range, the underflow bin is ordered before all normal bins and the overflow bin after them. |
| 29 | +As the `end` is exclusive, the invalid bin index is ordered last to make it possible to include the overflow bin. |
| 30 | + |
| 31 | +*Filling* a histogram means adding an entry to it. |
| 32 | +*Concurrent filling* allows modifying the same histogram concurrently without (external) synchronization. |
| 33 | + |
| 34 | +A *histogram* is the combination of an axes configuration and storage of bin contents. |
| 35 | +For most use cases, it also includes (global) *histogram statistics*. |
| 36 | +These comprise the number of entries, the sum of weights, and the sum of weights squared. |
| 37 | +The number of *effective entries* can be computed as the ratio $\frac{(\sum w_i)^2}{\sum w_i^2}$. |
| 38 | +Furthermore, for each dimension the histogram statistics include the sum of weights times value and the sum of weights times value squared. |
| 39 | +This allows computing the arithmetic mean and the standard deviation of the values before binning. |
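
As a rough sketch of how these sums could be accumulated and used, consider the following one-dimensional example. The type `Stats1D` and its members are hypothetical, and the weighted standard deviation formula $\sqrt{\sum w_i x_i^2 / \sum w_i - \bar{x}^2}$ is an assumption about which estimator is meant.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical statistics for a single dimension; not ROOT's actual class.
struct Stats1D {
   std::size_t fNEntries = 0;
   double fSumW = 0;   // sum of weights
   double fSumW2 = 0;  // sum of weights squared
   double fSumWX = 0;  // sum of weights times value
   double fSumWX2 = 0; // sum of weights times value squared

   void Fill(double x, double w = 1.0)
   {
      ++fNEntries;
      fSumW += w;
      fSumW2 += w * w;
      fSumWX += w * x;
      fSumWX2 += w * x * x;
   }
   double EffectiveEntries() const
   {
      assert(fSumW2 > 0);
      return fSumW * fSumW / fSumW2;
   }
   double Mean() const
   {
      assert(fSumW != 0);
      return fSumWX / fSumW;
   }
   double StdDev() const
   {
      const double mean = Mean();
      const double variance = fSumWX2 / fSumW - mean * mean;
      // Clamp tiny negative values caused by floating-point rounding.
      return std::sqrt(variance > 0 ? variance : 0);
   }
};
```

With unweighted filling the number of effective entries equals the number of entries; for two entries with weights $1$ and $3$ it is $16 / 10 = 1.6$.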
| 40 | + |
| 41 | +A *linearized index* runs from 0 up to (but not including) the total number of bins, potentially including flow bins. |
| 42 | +For a single axis, it places the flow bins after the normal bins. |
| 43 | +The *global index* is a combination of the linearized indices from all axes. |
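
One possible way to realize these two indices is sketched below. Several details are assumptions made only for illustration: the underflow bin is placed directly after the normal bins and the overflow bin last, and the global index is formed in row-major order. The names `AxisBin`, `Linearize`, and `GlobalIndex` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of a bin on one axis.
struct AxisBin {
   enum Kind { kNormal, kUnderflow, kOverflow } fKind;
   std::size_t fIndex; // only meaningful for kNormal
};

// Linearized index on a single axis: normal bins first, then the flow bins.
std::size_t Linearize(AxisBin bin, std::size_t nNormalBins)
{
   switch (bin.fKind) {
   case AxisBin::kNormal:
      assert(bin.fIndex < nNormalBins);
      return bin.fIndex;
   case AxisBin::kUnderflow: return nNormalBins;    // assumed position
   case AxisBin::kOverflow: return nNormalBins + 1; // assumed position
   }
   return 0; // not reached for valid input
}

// Global index from per-axis linearized indices, assuming row-major order.
// totalBins[d] is the number of bins on axis d, including flow bins if enabled.
std::size_t GlobalIndex(const std::vector<std::size_t> &linearized,
                        const std::vector<std::size_t> &totalBins)
{
   assert(linearized.size() == totalBins.size());
   std::size_t global = 0;
   for (std::size_t d = 0; d < linearized.size(); ++d) {
      assert(linearized[d] < totalBins[d]);
      global = global * totalBins[d] + linearized[d];
   }
   return global;
}
```

Placing the flow bins after the normal bins keeps the linearized index of a normal bin equal to its bin index, regardless of whether flow bins are present.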
| 44 | + |
| 45 | +A *profile* is a histogram that computes the arithmetic mean and standard deviation per bin. |
| 46 | +During filling, it accepts an additional `double` value and accumulates its sum and sum of squares. |
| 47 | + |
| 48 | +*Slicing* means extracting a subset of the normal bins in each dimension. |
| 49 | +Bin contents of excluded normal bins are added to the flow bins. |
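
For one dimension, slicing could look like the following sketch. `Hist1D` and `Slice` are hypothetical names, and the sketch assumes that excluded bins below the kept range go to the underflow bin and those above it to the overflow bin.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical one-dimensional bin storage for illustration only.
struct Hist1D {
   std::vector<double> fNormal; // contents of the normal bins
   double fUnderflow = 0;
   double fOverflow = 0;
};

// Keep the normal bins [begin, end) and fold the excluded ones into the flow bins.
Hist1D Slice(const Hist1D &h, std::size_t begin, std::size_t end)
{
   assert(begin <= end && end <= h.fNormal.size());
   Hist1D out;
   out.fUnderflow = h.fUnderflow;
   out.fOverflow = h.fOverflow;
   for (std::size_t i = 0; i < begin; ++i)
      out.fUnderflow += h.fNormal[i]; // excluded bins below the range
   out.fNormal.assign(h.fNormal.begin() + begin, h.fNormal.begin() + end);
   for (std::size_t i = end; i < h.fNormal.size(); ++i)
      out.fOverflow += h.fNormal[i]; // excluded bins above the range
   return out;
}
```

Adding the excluded contents to the flow bins keeps the sum over all bins unchanged by the slice.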
| 50 | + |
| 51 | +A *snapshot* is a consistent clone of the histogram during concurrent filling. |
| 52 | + |
| 53 | +A *weight* is an optional floating-point value passed during filling. |
| 54 | +It defaults to $1$ if not specified; filling without an explicit weight is called *unweighted filling*. |