Commit 4c4511b

dudarboh and vepadulano committed
[df] Add example on thread-safe usage of RNGs
Using only one random number generator in an application running with ROOT::EnableImplicitMT() is a common pitfall. This may introduce race conditions that are not trivial for the user to understand.

This commit introduces an example of using random number generators in an RDataFrame computation graph in a parallel and thread-safe way. Two strategies are exemplified:

* Generating non-deterministic random numbers by using one generator per RDataFrame slot. In this case, different RDataFrame runs may produce different results.
* Generating deterministic random numbers by using one generator per RDataFrame slot and then using a unique identifier per entry to seed the generator. In this case, different RDataFrame runs produce the same results. This example assumes that such a unique identifier exists: it must either come from the dataset or be generated somehow by RDataFrame. The example shows the usage of the 'rdfentry_' column, which in the case of the RDataFrame constructor taking one integer number represents a truly unique identifier (i.e. each entry has a different, unique value of 'rdfentry_' irrespective of how many slots are present and running). At the moment, this would not work with any other RDataFrame constructor.

As a note on the implementation: another possible way to implement the examples above would have been to instantiate 'thread_local' generators and seed them appropriately. This is not possible because cling does not work well with the 'static' and 'thread_local' keywords in JITted code, so we choose to demonstrate the approach with the RDataFrame slots.

Co-authored-by: Vincenzo Eduardo Padulano <vincenzo.eduardo.padulano@cern.ch>
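For readers curious about the 'thread_local' alternative mentioned in the commit message, the following is a minimal sketch of what it could look like in compiled (not JITted) C++. It uses plain std::thread workers rather than RDataFrame, and the names DrawGaussThreadLocal and MeanOfParallelDraws are hypothetical helpers that are not part of this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <thread>
#include <vector>

// Hypothetical helper (not part of the commit): each thread lazily creates its
// own generator and distribution on first use, so no state is shared.
double DrawGaussThreadLocal()
{
   thread_local std::mt19937 gen{std::random_device{}()};
   thread_local std::normal_distribution<double> gaus{0., 1.};
   return gaus(gen);
}

// Hypothetical helper: draw nPerThread numbers on each of nThreads threads
// and return the mean over all draws.
double MeanOfParallelDraws(unsigned int nThreads, std::size_t nPerThread)
{
   assert(nThreads > 0 && nPerThread > 0);
   std::vector<double> partialSums(nThreads, 0.);
   std::vector<std::thread> workers;
   for (unsigned int t = 0; t < nThreads; ++t) {
      workers.emplace_back([&partialSums, t, nPerThread] {
         double sum = 0.;
         for (std::size_t i = 0; i < nPerThread; ++i)
            sum += DrawGaussThreadLocal();
         partialSums[t] = sum; // each thread writes only to its own element
      });
   }
   for (auto &w : workers)
      w.join();

   double total = 0.;
   for (double s : partialSums)
      total += s;
   return total / (static_cast<double>(nThreads) * static_cast<double>(nPerThread));
}
```

With a few hundred thousand draws in total, MeanOfParallelDraws(4, 100000) should return a value close to 0. Like the per-slot strategy, this variant is seeded from std::random_device and is therefore not reproducible between runs.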
1 parent d3d1a47 commit 4c4511b

4 files changed

Lines changed: 229 additions & 0 deletions

tutorials/analysis/dataframe/df041_ThreadSafeRNG.C

Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
+/// \file
+/// \ingroup tutorial_dataframe
+/// \notebook -nodraw
+/// Usage of multithreading mode with random generators.
+///
+/// This example illustrates how to define functions that generate random numbers and use them in an RDataFrame
+/// computation graph in a thread-safe way.
+///
+/// Using only one random number generator in an application running with ROOT::EnableImplicitMT() is a common pitfall.
+/// This pitfall creates race conditions resulting in a distorted random distribution. In the example, this issue is
+/// solved by creating one random number generator per RDataFrame processing slot, thus allowing for parallel and
+/// thread-safe access. The example also illustrates the difference between non-deterministic and deterministic random
+/// number generation.
+///
+/// \macro_code
+/// \macro_image
+/// \macro_output
+///
+/// \date February 2026
+/// \author Bohdan Dudar (JGU Mainz), Fernando Hueso-González (IFIC, CSIC-UV), Vincenzo Eduardo Padulano (CERN)
+
+#include <iomanip> // std::setprecision
+#include <iostream>
+#include <memory>
+#include <TCanvas.h>
+#include <ROOT/RDataFrame.hxx>
+
+#include "df041_ThreadSafeRNG.hxx"
+
+// Canvas that should survive the running of this macro
+std::unique_ptr<TCanvas> myCanvas;
+
+void df041_ThreadSafeRNG()
+{
+   myCanvas = std::make_unique<TCanvas>("myCanvas", "myCanvas", 1000, 500);
+   myCanvas->Divide(3, 1);
+
+   unsigned int nEntries{10000000};
+
+   // 1. Single thread for reference
+   auto df1 = ROOT::RDataFrame(nEntries).Define("x", GetNormallyDistributedNumberFromGlobalGenerator);
+   auto h1 = df1.Histo1D({"h1", "Single thread (no MT)", 1000, -4, 4}, {"x"});
+   myCanvas->cd(1);
+   h1->DrawCopy();
+
+   // 2. One generator per RDataFrame slot, with random_device seeding
+   // Notes and Caveats:
+   // - How many numbers are drawn from each generator is not deterministic
+   //   and the result is not deterministic between runs.
+   unsigned int nSlots{8};
+   ROOT::EnableImplicitMT(nSlots);
+   // Before running the RDataFrame computation graph, we reinitialize the generators (one per slot), so they can
+   // be used accordingly during the execution.
+   ReinitializeGenerators(nSlots);
+   auto df2 = ROOT::RDataFrame(nEntries).Define("x", GetNormallyDistributedNumberPerSlotGenerator, {"rdfslot_"});
+   auto h2 = df2.Histo1D({"h2", "Thread-safe (MT, non-deterministic)", 1000, -4, 4}, {"x"});
+   myCanvas->cd(2);
+   h2->DrawCopy();
+
+   // 3. One generator per RDataFrame slot, with entry seeding
+   // Notes and Caveats:
+   // - With RDataFrame(INTEGER_NUMBER) constructor (as in the example),
+   //   the result is deterministic and identical on every run
+   // - With RDataFrame(TTree) constructor, the result is not guaranteed to be deterministic.
+   //   To make it deterministic, use something from the dataset to act as the event identifier
+   //   instead of rdfentry_, and use it as a seed.
+
+   // Before running the RDataFrame computation graph, we reinitialize the generators (one per slot), so they can
+   // be used accordingly during the execution.
+   ReinitializeGenerators(nSlots);
+   auto df3 = ROOT::RDataFrame(nEntries).Define("x", GetNormallyDistributedNumberPerSlotGeneratorForEntry,
+                                                {"rdfslot_", "rdfentry_"});
+   auto h3 = df3.Histo1D({"h3", "Thread-safe (MT, deterministic)", 1000, -4, 4}, {"x"});
+   myCanvas->cd(3);
+   h3->DrawCopy();
+
+   std::cout << std::fixed << std::setprecision(3) << "Final distributions                : " << "Mean " << " +- "
+             << "StdDev" << std::endl;
+   std::cout << std::fixed << std::setprecision(3) << "Theoretical                        : " << "0.000" << " +- "
+             << "1.000" << std::endl;
+   std::cout << std::fixed << std::setprecision(3) << "Single thread (no MT)              : " << h1->GetMean() << " +- "
+             << h1->GetStdDev() << std::endl;
+   std::cout << std::fixed << std::setprecision(3) << "Thread-safe (MT, non-deterministic): " << h2->GetMean() << " +- "
+             << h2->GetStdDev() << std::endl;
+   std::cout << std::fixed << std::setprecision(3) << "Thread-safe (MT, deterministic)    : " << h3->GetMean() << " +- "
+             << h3->GetStdDev() << std::endl;
+}
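The caveat in step 3 of the macro suggests seeding from a dataset identifier when the data frame is built from a TTree. One way this could look is sketched below, assuming the dataset provides a run number and an event number. The function name GetGaussForEvent and the `run` and `event` columns are hypothetical and do not appear in this commit. The sketch recreates the generator on every call, which avoids any shared state but may cost performance.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper (not part of the commit): derive a reproducible normal
// variate from identifiers that are assumed to be unique per event.
double GetGaussForEvent(std::uint32_t run, std::uint64_t event)
{
   // std::seed_seq consumes 32-bit words, so the 64-bit event number is split
   // into its high and low halves to avoid discarding information.
   std::seed_seq seq{run, static_cast<std::uint32_t>(event >> 32),
                     static_cast<std::uint32_t>(event & 0xFFFFFFFFu)};
   // Fresh generator and distribution: no state survives between calls, so the
   // result depends only on (run, event) and not on the thread or slot.
   std::mt19937 gen(seq);
   std::normal_distribution<double> gaus{0., 1.};
   return gaus(gen);
}
```

In an RDataFrame graph this could be attached with something like `Define("x", GetGaussForEvent, {"run", "event"})`, provided the dataset actually has columns with those names and matching types.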
tutorials/analysis/dataframe/df041_ThreadSafeRNG.hxx

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+#ifndef ROOT_TUTORIALS_ANALYSIS_DATAFRAME_DF041
+#define ROOT_TUTORIALS_ANALYSIS_DATAFRAME_DF041
+
+#include <random>
+#include <vector>
+
+// NOTE: these globals are intentionally NOT protected by a mutex.
+// The function below is only safe to call from a single thread (used as reference).
+inline std::random_device globalRandomDevice{};
+inline std::mt19937 globalGenerator(globalRandomDevice());
+inline std::normal_distribution<double> globalGaus(0., 1.);
+
+inline double GetNormallyDistributedNumberFromGlobalGenerator()
+{
+   return globalGaus(globalGenerator);
+}
+
+// One generator per slot, initialized once before the event loop
+inline std::vector<std::mt19937> generators;
+inline std::vector<std::normal_distribution<double>> gaussians;
+
+inline void ReinitializeGenerators(unsigned int nSlots)
+{
+   std::random_device rd;
+   generators.resize(nSlots);
+   for (auto &gen : generators)
+      gen.seed(rd());
+   gaussians.resize(nSlots, std::normal_distribution<double>(0., 1.));
+}
+
+inline double GetNormallyDistributedNumberPerSlotGenerator(unsigned int slot)
+{
+   return gaussians[slot](generators[slot]);
+}
+
+inline double GetNormallyDistributedNumberPerSlotGeneratorForEntry(unsigned int slot, unsigned long long entry)
+{
+   // We want to generate a random number distributed according to a normal distribution in a thread-safe way and such
+   // that it is reproducible across different RDataFrame runs, i.e. given the same input to the generator it will
+   // produce the same value. This is one way to do it. It assumes that the input argument represents a unique entry ID,
+   // such that any thread processing an RDataFrame task will see this number once throughout the entire execution of
+   // the computation graph.
+   // Calling both the `reset` and `seed` methods is fundamental here to ensure reproducibility: without `seed` the
+   // generator of this slot would continue from a state that depends on which and how many entries the slot has
+   // already processed, and without `reset` the distribution could return a value cached during a previous call.
+   // Alternatively, if both the generator and the distribution objects were recreated from scratch at every function
+   // call (instead of being reused from the per-slot vectors), then the next two method calls would not be necessary,
+   // at the cost of a possible performance degradation.
+   gaussians[slot].reset();
+   generators[slot].seed(entry);
+   return gaussians[slot](generators[slot]);
+}
+
+#endif // ROOT_TUTORIALS_ANALYSIS_DATAFRAME_DF041
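To see why the `reset` plus `seed` pattern in the header makes the result independent of how entries are distributed over slots, the following standalone sketch may help. It does not use RDataFrame: the struct SlotRNGs and the function ProcessEntries are hypothetical, and a simple round-robin assignment of entries to slots stands in for the real (non-deterministic) scheduling.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for the per-slot generators and distributions.
struct SlotRNGs {
   std::vector<std::mt19937> generators;
   std::vector<std::normal_distribution<double>> gaussians;

   explicit SlotRNGs(std::size_t nSlots)
      : generators(nSlots), gaussians(nSlots, std::normal_distribution<double>(0., 1.))
   {
   }

   double DrawForEntry(std::size_t slot, unsigned long long entry)
   {
      assert(slot < generators.size());
      gaussians[slot].reset(); // drop any value cached by a previous call
      // std::mt19937 uses only the low 32 bits of the seed, so entry IDs that
      // differ only above bit 31 would produce the same value.
      generators[slot].seed(static_cast<std::mt19937::result_type>(entry));
      return gaussians[slot](generators[slot]);
   }
};

// Assumption for illustration: entry e is handled by slot e % nSlots.
std::vector<double> ProcessEntries(std::size_t nSlots, unsigned long long nEntries)
{
   SlotRNGs rngs(nSlots);
   std::vector<double> values(nEntries);
   for (unsigned long long e = 0; e < nEntries; ++e)
      values[e] = rngs.DrawForEntry(e % nSlots, e);
   return values;
}
```

Because every call starts from a freshly seeded generator and a reset distribution, ProcessEntries(1, n) and ProcessEntries(8, n) return the same values entry by entry, whichever slot handled each entry.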
tutorials/analysis/dataframe/df041_ThreadSafeRNG.py

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
+# \file
+# \ingroup tutorial_dataframe
+# \notebook -nodraw
+# Usage of multithreading mode with random generators.
+#
+# This example illustrates how to define functions that generate random numbers and use them in an RDataFrame
+# computation graph in a thread-safe way.
+#
+# Using only one random number generator in an application running with ROOT.EnableImplicitMT() is a common pitfall.
+# This pitfall creates race conditions resulting in a distorted random distribution. In the example, this issue is
+# solved by creating one random number generator per RDataFrame processing slot, thus allowing for parallel and
+# thread-safe access. The example also illustrates the difference between non-deterministic and deterministic random
+# number generation.
+#
+# \macro_code
+# \macro_image
+# \macro_output
+#
+# \date February 2026
+# \author Bohdan Dudar (JGU Mainz), Fernando Hueso-González (IFIC, CSIC-UV), Vincenzo Eduardo Padulano (CERN)
+
+import os
+
+import ROOT
+
+
+def df041_ThreadSafeRNG():
+
+    # First, we declare the functions needed by the RDataFrame computation graph to the interpreter
+    ROOT.gInterpreter.Declare(
+        f'#include "{os.path.join(str(ROOT.gROOT.GetTutorialDir()), "analysis", "dataframe", "df041_ThreadSafeRNG.hxx")}"'
+    )
+
+    myCanvas = ROOT.TCanvas("myCanvas", "myCanvas", 1000, 500)
+    myCanvas.Divide(3, 1)
+
+    nEntries = 10000000
+
+    # 1. Single thread for reference
+    df1 = ROOT.RDataFrame(nEntries).Define("x", "GetNormallyDistributedNumberFromGlobalGenerator()")
+    h1 = df1.Histo1D(("h1", "Single thread (no MT)", 1000, -4, 4), "x")
+    myCanvas.cd(1)
+    h1.DrawCopy()
+
+    # 2. One generator per RDataFrame slot, with random_device seeding
+    # Notes and Caveats:
+    # - How many numbers are drawn from each generator is not deterministic
+    #   and the result is not deterministic between runs.
+    nSlots = 8
+    ROOT.EnableImplicitMT(nSlots)
+    # Before running the RDataFrame computation graph, we reinitialize the generators (one per slot), so they can
+    # be used accordingly during the execution.
+    ROOT.ReinitializeGenerators(nSlots)
+    df2 = ROOT.RDataFrame(nEntries).Define("x", "GetNormallyDistributedNumberPerSlotGenerator(rdfslot_)")
+    h2 = df2.Histo1D(("h2", "Thread-safe (MT, non-deterministic)", 1000, -4, 4), "x")
+    myCanvas.cd(2)
+    h2.DrawCopy()
+
+    # 3. One generator per RDataFrame slot, with entry seeding
+    # Notes and Caveats:
+    # - With RDataFrame(INTEGER_NUMBER) constructor (as in the example),
+    #   the result is deterministic and identical on every run
+    # - With RDataFrame(TTree) constructor, the result is not guaranteed to be deterministic.
+    #   To make it deterministic, use something from the dataset to act as the event identifier
+    #   instead of rdfentry_, and use it as a seed.
+
+    # Before running the RDataFrame computation graph, we reinitialize the generators (one per slot), so they can
+    # be used accordingly during the execution.
+    ROOT.ReinitializeGenerators(nSlots)
+    df3 = ROOT.RDataFrame(nEntries).Define(
+        "x", "GetNormallyDistributedNumberPerSlotGeneratorForEntry(rdfslot_, rdfentry_)"
+    )
+    h3 = df3.Histo1D(("h3", "Thread-safe (MT, deterministic)", 1000, -4, 4), "x")
+    myCanvas.cd(3)
+    h3.DrawCopy()
+
+    print(f"{'{:<40}'.format('Final distributions')}: Mean +- StdDev")
+    print(f"{'{:<40}'.format('Theoretical')}: 0.000 +- 1.000")
+    print(f"{'{:<40}'.format('Single thread (no MT)')}: {h1.GetMean():.3f} +- {h1.GetStdDev():.3f}")
+    print(f"{'{:<40}'.format('Thread-safe (MT, non-deterministic)')}: {h2.GetMean():.3f} +- {h2.GetStdDev():.3f}")
+    print(f"{'{:<40}'.format('Thread-safe (MT, deterministic)')}: {h3.GetMean():.3f} +- {h3.GetStdDev():.3f}")
+
+    # We draw the canvas with block=True to stop the execution before end of the
+    # function and to be able to interact with the canvas until necessary
+    myCanvas.Draw(block=True)
+
+
+if __name__ == "__main__":
+    df041_ThreadSafeRNG()

tutorials/analysis/dataframe/index.md

Lines changed: 1 addition & 0 deletions
@@ -64,6 +64,7 @@ A collection of building block examples for your analysis.
 | df036_missingBranches.C | df036_missingBranches.py | Deal with missing values due to a missing branch when switching to a new file in a chain. |
 | df037_TTreeEventMatching.C | df037_TTreeEventMatching.py | Deal with missing values due to not finding a matching event in an auxiliary dataset. |
 | df040_RResultPtr_lifetimeManagement.C | | Lifetime management of RResultPtr and the underlying objects. |
+| df041_ThreadSafeRNG.C | | Thread-safe usage of random number generators in multithreading. |
 
 
 \anchor readwrite
