bindings/pyroot/pythonizations/doc/index.md
Lines changed: 5 additions & 5 deletions
@@ -6,7 +6,7 @@
6
6
ROOT is a C++ framework used across HEP for data storage, analysis and visualisation. Its full API is available directly in Python through dynamic bindings powered by [cppyy](https://cppyy.readthedocs.io/). Every ROOT class you see in the
7
7
C++ documentation is accessible from Python under the `ROOT` module.
8
8
9
-
On top of that, a set of [pythonizations](@ref Pythonizations) adapt selected classes to feel more natively Pythonic: operator overloading, iterators, NumPy interoperability, and more.
9
+
On top of that, a set of @ref Pythonizations adapt selected classes to feel more natively Pythonic: operator overloading, iterators, NumPy interoperability, and more.
10
10
11
11
12
12
# Installation
@@ -100,19 +100,19 @@ h.Fill(data)
100
100
101
101
# Write it to a ROOT file
102
102
with ROOT.TFile.Open("output.root", "RECREATE") as f:
103
-
h.Write()
103
+
f.WriteObject(h, "my_histogram")
104
104
~~~
105
105
106
-
Now we create an RDataFrame from scratch, define a new column with a Python lambda and draw a histogram:
106
+
Now we create an @ref dataframe - ROOT's high-level interface for columnar data analysis - from scratch, define a new column and draw a histogram:
bindings/pyroot/pythonizations/python/ROOT/_pythonization/dataloader.md
Lines changed: 12 additions & 38 deletions
@@ -3,7 +3,7 @@
3
3
\brief Feed ROOT data directly into models for machine learning training.
4
4
5
5
6
-
`RDataLoader` streams ROOT data into machine learning frameworks as batches ready for training. It takes any [RDataFrame](@refPy_RDataFrame) as input, giving you access to the full ROOT ecosystem for filtering, defining new variables and applying selections; it delivers batches of your dataset for [NumPy](https://numpy.org/devdocs/reference/generated/numpy.ndarray.html), [TensorFlow](https://www.tensorflow.org/api_docs/python/tf/data/Dataset) and [PyTorch](https://docs.pytorch.org/docs/main/tensors.html) through a simple iteration interface.
6
+
`RDataLoader` streams ROOT data into machine learning frameworks as batches ready for training. It takes any @ref dataframe as input, giving you access to the full ROOT ecosystem for filtering, defining new variables and applying selections; it delivers batches of your dataset for [NumPy](https://numpy.org/devdocs/reference/generated/numpy.ndarray.html), [TensorFlow](https://www.tensorflow.org/api_docs/python/tf/data/Dataset) and [PyTorch](https://docs.pytorch.org/docs/main/tensors.html) through a simple iteration interface.
7
7
8
8
\note `RDataLoader` is part of `ROOT.Experimental.ML` and is currently experimental. The API may change between ROOT releases.
9
9
@@ -28,7 +28,7 @@ A one-page quick reference covering the API.
28
28
29
29
## Getting your data ready
30
30
31
-
`RDataLoader` takes an `RDataFrame` as input. This means your data preparation (selecting events, computing
31
+
`RDataLoader` takes an @ref dataframe as input. This means your data preparation (selecting events, computing
32
32
new variables, applying cuts, etc.) all happens before the loader is created, using the full power of `RDataFrame`:
33
33
34
34
~~~{.py}
@@ -38,13 +38,9 @@ import ROOT
38
38
# Open a ROOT file and create an RDataFrame
39
39
rdf = ROOT.RDataFrame("events", "file.root")
40
40
41
-
# Define a Python callback to compute a new variable
42
-
def invariant_mass(E: float, p: float) -> float:
43
-
return math.sqrt(E**2 - p**2)
44
-
45
41
# Apply selections and compute derived features
46
42
rdf = rdf.Filter("nMuons >= 2") \
47
-
.Define("inv_mass", invariant_mass, ["E", "p"])
43
+
.Define("inv_mass", "sqrt(E*E - p*p)")
48
44
~~~
49
45
50
46
Then pass your `RDataFrame` to `RDataLoader`:
@@ -138,7 +134,7 @@ dl = RDataLoader(
138
134
# events with fewer than 10 jets are zero-padded
139
135
~~~
140
136
141
-
\warning Every RVec column in `columns` must appear in `max_vec_sizes`.
137
+
\warning Every vector column in `columns` must appear in `max_vec_sizes`.
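
Reviewer sketch, not part of the diff: the warning follows from the fact that a batch needs one fixed width per vector column. The pure-Python snippet below illustrates zero-padding to a fixed size. It is not the loader's implementation; `pad_to_fixed_size` and `jet_pt_per_event` are hypothetical names, and truncating vectors longer than the maximum is an assumption made here for illustration.

```python
def pad_to_fixed_size(values, max_size, fill=0.0):
    """Return exactly max_size entries: truncate long vectors, zero-pad short ones."""
    trimmed = list(values[:max_size])  # assumption: extra entries are dropped
    return trimmed + [fill] * (max_size - len(trimmed))


# Hypothetical per-event jet pT values; each event has a different jet count.
jet_pt_per_event = [[52.1, 33.4], [], [80.0, 41.5, 27.3, 20.9]]

# With a maximum size of 3, every event becomes a row of the same width.
padded = [pad_to_fixed_size(event, max_size=3) for event in jet_pt_per_event]
```

Every row of `padded` now has the same length, so the rows can be stacked into a rectangular batch. Without a declared maximum size for a column, there is no width to pad to.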
142
138
143
139
## Iterating Batches
144
140
@@ -212,6 +208,14 @@ train, val = train_val.train_test_split(test_size=0.176)
212
208
213
209
## Advanced Features
214
210
211
+
### Eager loading
212
+
213
+
By default the loader reads data lazily, one chunk of data at a time. For small datasets that fit in memory and will be iterated many times, eager loading pays a one-time cost at construction and then serves batches every epoch from memory:
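
Reviewer sketch, not part of the diff: the trade-off described above can be illustrated without ROOT. The snippet below is a toy model under stated assumptions, not the `RDataLoader` API. `read_chunks`, `LazyBatches` and `EagerBatches` are hypothetical names, and a counter stands in for the cost of reading from disk.

```python
def read_chunks(n_chunks, chunk_size, counter):
    """Hypothetical stand-in for reading chunks from disk; counts each read."""
    for i in range(n_chunks):
        counter["reads"] += 1
        yield list(range(i * chunk_size, (i + 1) * chunk_size))


class LazyBatches:
    """Re-reads every chunk from the source on each pass (epoch)."""

    def __init__(self, n_chunks, chunk_size):
        self.n_chunks = n_chunks
        self.chunk_size = chunk_size
        self.counter = {"reads": 0}

    def __iter__(self):
        return read_chunks(self.n_chunks, self.chunk_size, self.counter)


class EagerBatches:
    """Reads everything once at construction, then serves batches from memory."""

    def __init__(self, n_chunks, chunk_size):
        self.counter = {"reads": 0}
        self._chunks = list(read_chunks(n_chunks, chunk_size, self.counter))

    def __iter__(self):
        return iter(self._chunks)


lazy = LazyBatches(n_chunks=4, chunk_size=2)
eager = EagerBatches(n_chunks=4, chunk_size=2)

# Iterate three epochs over both loaders.
for _epoch in range(3):
    lazy_epoch = list(lazy)
    eager_epoch = list(eager)

lazy_reads = lazy.counter["reads"]    # grows with the number of epochs
eager_reads = eager.counter["reads"]  # fixed after construction
```

Both loaders yield the same batches. The lazy one repeats its reads every epoch, while the eager one pays for them once and holds the whole dataset in memory, which is why eager loading only suits datasets that fit in memory.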
Correct class imbalance by oversampling the minority or undersampling the majority. You can do this by passing two RDataFrames:
@@ -244,33 +248,3 @@ dl = RDataLoader(rdf,
244
248
for X, y, w in dl.as_torch():
245
249
loss = (loss_fn(model(X), y) * w).mean()
246
250
~~~
247
-
248
-
### Eager loading
249
-
250
-
By default the loader reads data lazily, one chunk of data at a time. For small datasets that fit in memory and will be iterated many times, eager loading pays a one-time cost at construction and then serves every epoch from memory: