Commit b172ac6

[hist] Document histogram terminology
1 parent 3921bec commit b172ac6

1 file changed

Lines changed: 54 additions & 0 deletions

hist/histv7/doc/Terminology.md

@@ -0,0 +1,54 @@
1+
# Histogram Terminology
2+
3+
This document collects, defines, and explains terms that are used in ROOT's histogram package.
4+
The goal is to start from a common understanding, which should avoid ambiguities and ease discussions.
5+
It also helps (future) developers to navigate the code because classes and methods are named accordingly.
6+
The list is ordered alphabetically, though dependent terms are kept together with their parent.
7+
It is supposed to be exhaustive; any missing term should be added when needed.
8+
9+
An *axis* is a bin configuration in one dimension.
10+
A *regular axis* has equidistant bins in the interval $[a, b)$.
11+
A *variable bin axis* is configured with explicit bin edges $[e_{n}, e_{n+1})$.
12+
A *categorical axis* has a unique label per bin.
13+
*Axes* is the plural of axis and usually means the bin configurations for all dimensions of a histogram.
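The difference between a regular axis and a variable bin axis shows up most clearly when looking up the bin for a value. The following is a minimal sketch, not ROOT's implementation: `FindRegularBin` and `FindVariableBin` are hypothetical helpers, and returning `-1` for values outside the axis is an assumption made for illustration only.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helpers, not part of ROOT. Both return the index of the
// normal bin (starting from 0), or -1 if x lies outside the axis.

// Regular axis: nBins > 0 equidistant bins in the interval [a, b).
long FindRegularBin(double x, double a, double b, std::size_t nBins)
{
   if (!(x >= a && x < b))
      return -1; // outside the axis; this check also rejects NaN
   const double width = (b - a) / static_cast<double>(nBins);
   const auto bin = static_cast<std::size_t>((x - a) / width);
   // Guard against floating-point rounding close to the upper edge.
   return static_cast<long>(std::min(bin, nBins - 1));
}

// Variable bin axis: sorted edges e_0 < ... < e_N define bins [e_n, e_{n+1}).
long FindVariableBin(double x, const std::vector<double> &edges)
{
   if (edges.size() < 2 || !(x >= edges.front() && x < edges.back()))
      return -1;
   // Find the first edge strictly greater than x; the bin starts one edge earlier.
   const auto it = std::upper_bound(edges.begin(), edges.end(), x);
   return static_cast<long>(it - edges.begin()) - 1;
}
```

The regular axis needs only arithmetic, while the variable bin axis needs a binary search over its explicit edges.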
14+
15+
A *bin content* is the value of a single bin.
16+
The *bin content type* can be an integer type, a floating-point type, the special `RDoubleBinWithError`, or a user-defined type.
17+
18+
A *bin error* is the Poisson error of a bin content.
19+
With the special `RDoubleBinWithError`, it is the square root of the sum of weights squared: $\sqrt{\sum w_i^2}$.
20+
Otherwise it is the square root of the bin content, which is only correct with unweighted filling.
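To illustrate why the two error definitions differ under weighted filling, here is a small sketch. `BinWithError` is a hypothetical stand-in that only mirrors the idea behind `RDoubleBinWithError`; its member names and interface are assumptions, not ROOT's.

```cpp
#include <cmath>

// Hypothetical bin content that tracks what is needed for its own error.
struct BinWithError {
   double fSum = 0.0;  // sum of weights
   double fSum2 = 0.0; // sum of weights squared

   void Fill(double w = 1.0)
   {
      fSum += w;
      fSum2 += w * w;
   }
   // Square root of the sum of weights squared.
   double Error() const { return std::sqrt(fSum2); }
};

// Fallback for plain bin content types: only correct for unweighted filling.
inline double PoissonError(double content)
{
   return std::sqrt(content);
}
```

With unit weights both functions agree because the sum of weights equals the sum of weights squared. After filling with weights 3 and 4, `Error()` gives 5 while `PoissonError` of the content 7 gives about 2.65.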
21+
22+
A *bin index* (plural *indices*) refers to a single bin of a dimension; an array of indices refers to a bin in a histogram.
23+
*Normal bins* are inside an axis and their indices start from 0.
24+
*Underflow* and *overflow* bins, also called *flow bins*, are outside the axis and their index has a special value.
25+
The *invalid bin index* is another special value.
26+
27+
A *bin index range* is a range from `begin` (inclusive) to `end` (exclusive).
28+
For this purpose, the underflow bin is ordered before all normal bins while the overflow bin is placed after them.
29+
As the `end` is exclusive, the invalid bin index is ordered last to make it possible to include the overflow bin.
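The ordering can be made concrete by mapping every bin index to a position. This is a sketch under assumptions: the `BinIndex` struct and its `Kind` enum are hypothetical, since the actual special values used for flow bins and the invalid bin index are not specified here.

```cpp
#include <cstddef>

// Hypothetical representation of a bin index.
struct BinIndex {
   enum Kind { kUnderflow, kNormal, kOverflow, kInvalid };
   Kind fKind;
   std::size_t fIndex; // only meaningful for kNormal
};

// Position in the ordering used by ranges on an axis with nNormalBins:
// underflow, normal bins 0 .. nNormalBins - 1, overflow, invalid.
std::size_t OrderingPosition(BinIndex i, std::size_t nNormalBins)
{
   switch (i.fKind) {
   case BinIndex::kUnderflow: return 0;
   case BinIndex::kNormal: return 1 + i.fIndex;
   case BinIndex::kOverflow: return 1 + nNormalBins;
   default: return 2 + nNormalBins; // invalid bin index is ordered last
   }
}

// Number of bins covered by the half-open range [begin, end).
std::size_t RangeSize(BinIndex begin, BinIndex end, std::size_t nNormalBins)
{
   return OrderingPosition(end, nNormalBins) - OrderingPosition(begin, nNormalBins);
}
```

A range over all bins including both flow bins then runs from the underflow bin to the invalid bin index, which is the only way to include the overflow bin with an exclusive `end`.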
30+
31+
*Filling* a histogram means adding an entry to it.
32+
*Concurrent filling* allows modifying the same histogram concurrently without (external) synchronization.
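One possible way to fill without external synchronization is an atomic update of the bin content. The sketch below is an assumption about how this could be done, not a description of ROOT's implementation; `AtomicAdd` is a hypothetical helper built on a compare-and-swap loop from the C++ standard library.

```cpp
#include <atomic>

// Hypothetical helper: atomically add a weight to a floating-point bin content.
inline void AtomicAdd(std::atomic<double> &bin, double w)
{
   double old = bin.load(std::memory_order_relaxed);
   // On failure, compare_exchange_weak refreshes old with the current value,
   // so the loop retries with up-to-date data.
   while (!bin.compare_exchange_weak(old, old + w, std::memory_order_relaxed)) {
   }
}
```

Several threads can call `AtomicAdd` on the same bin and no update is lost, at the cost of retries under contention.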
33+
34+
A *histogram* is the combination of an axes configuration and storage of bin contents.
35+
For most use cases, it also includes (global) *histogram statistics*.
36+
These include the number of entries, the sum of weights, and the sum of weights squared.
37+
The number of *effective entries* can be computed as the ratio $\frac{(\sum w_i)^2}{\sum w_i^2}$.
38+
Furthermore, for each dimension the histogram statistics include the sum of weights times value and the sum of weights times value squared.
39+
This allows computing the arithmetic mean and the standard deviation of the values before binning.
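The following sketch shows how these sums could yield the derived quantities for one dimension. `Stats1D` and its members are hypothetical names, and the standard deviation uses the weighted population formula $\sqrt{\sum w_i x_i^2 / \sum w_i - \bar{x}^2}$, which is an assumption about the intended definition.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical accumulator for the statistics of a single dimension.
// The derived quantities assume at least one entry with non-zero weight.
struct Stats1D {
   unsigned long long fEntries = 0;
   double fSumW = 0.0;   // sum of weights
   double fSumW2 = 0.0;  // sum of weights squared
   double fSumWX = 0.0;  // sum of weights times value
   double fSumWX2 = 0.0; // sum of weights times value squared

   void Fill(double x, double w = 1.0)
   {
      ++fEntries;
      fSumW += w;
      fSumW2 += w * w;
      fSumWX += w * x;
      fSumWX2 += w * x * x;
   }
   double EffectiveEntries() const { return fSumW * fSumW / fSumW2; }
   double Mean() const { return fSumWX / fSumW; }
   double StdDev() const
   {
      const double mean = Mean();
      // Clamp at zero to absorb floating-point rounding.
      return std::sqrt(std::max(0.0, fSumWX2 / fSumW - mean * mean));
   }
};
```

Because only running sums are stored, the mean and standard deviation reflect the original values rather than the bin centers.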
40+
41+
A *linearized index* ranges from 0 to the total number of bins minus one, where the total potentially includes flow bins.
42+
For a single axis, it places the flow bins after the normal bins.
43+
The *global index* is a combination of the linearized indices from all axes.
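As an illustration, the sketch below linearizes flow bins on one axis and combines per-axis indices into a global index. Two details are assumptions made for the example: that the underflow bin comes before the overflow bin after the normal bins, and that axes are combined in row-major order. All function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Linearized index on one axis with nNormal normal bins: normal bin i maps to i,
// flow bins are placed after the normal bins (their relative order is assumed).
std::size_t LinearizeUnderflow(std::size_t nNormal)
{
   return nNormal;
}
std::size_t LinearizeOverflow(std::size_t nNormal)
{
   return nNormal + 1;
}

// Combine per-axis linearized indices into one global index (row-major, assumed).
// totalBins[d] is the total number of bins of axis d, including flow bins if enabled.
std::size_t GlobalIndex(const std::vector<std::size_t> &linearized,
                        const std::vector<std::size_t> &totalBins)
{
   std::size_t global = 0;
   for (std::size_t d = 0; d < linearized.size(); ++d)
      global = global * totalBins[d] + linearized[d];
   return global;
}
```

For two axes with 10 and 5 normal bins plus flow bins, the totals are 12 and 7, so the global index covers 84 bins in a single contiguous storage.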
44+
45+
A *profile* is a histogram that computes the arithmetic mean and standard deviation per bin.
46+
During filling, it accepts an additional `double` value and accumulates its sum and sum of squares.
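A single profile bin can be sketched as follows. `ProfileBin` is a hypothetical type for unweighted filling, and the standard deviation is again taken as the population formula, which is an assumption.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>

// Hypothetical content of one profile bin (unweighted filling only).
struct ProfileBin {
   std::size_t fCount = 0;
   double fSumY = 0.0;  // sum of the additional values
   double fSumY2 = 0.0; // sum of their squares

   void Fill(double y)
   {
      ++fCount;
      fSumY += y;
      fSumY2 += y * y;
   }
   double Mean() const { return fCount ? fSumY / fCount : 0.0; }
   double StdDev() const
   {
      if (fCount == 0)
         return 0.0;
      const double mean = Mean();
      // Clamp at zero to absorb floating-point rounding.
      return std::sqrt(std::max(0.0, fSumY2 / fCount - mean * mean));
   }
};
```

The axes select the bin as in a regular histogram; only the accumulated quantities per bin differ.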
47+
48+
*Slicing* means extracting a subset of the normal bins in each dimension.
49+
Bin contents of excluded normal bins are added to the flow bins.
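For one dimension, slicing could look like the sketch below. `Hist1D` and `Slice` are hypothetical, and the choice to add excluded bins below the slice to the underflow bin and those above it to the overflow bin is an assumption about how the flow bins receive the contents.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical one-dimensional histogram with explicit flow bins.
struct Hist1D {
   double fUnderflow = 0.0;
   double fOverflow = 0.0;
   std::vector<double> fBins; // normal bins
};

// Keep the normal bins in [begin, end) and fold the excluded ones into the flow bins.
Hist1D Slice(const Hist1D &h, std::size_t begin, std::size_t end)
{
   Hist1D out;
   out.fUnderflow = h.fUnderflow;
   out.fOverflow = h.fOverflow;
   for (std::size_t i = 0; i < h.fBins.size(); ++i) {
      if (i < begin)
         out.fUnderflow += h.fBins[i]; // excluded below the slice
      else if (i >= end)
         out.fOverflow += h.fBins[i]; // excluded above the slice
      else
         out.fBins.push_back(h.fBins[i]);
   }
   return out;
}
```

Folding excluded bins into the flow bins keeps the sum over all bin contents unchanged by the slice.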
50+
51+
A *snapshot* is a consistent clone of the histogram during concurrent filling.
52+
53+
A *weight* is an optional floating-point value passed during filling.
54+
It defaults to $1$ if not specified, which is also called unweighted filling.
