Commit 7f2d80a

committed
starting analyzer scripts
1 parent f0a2af5 commit 7f2d80a

15 files changed

Lines changed: 955 additions & 14 deletions

analyzer/README.md

Lines changed: 316 additions & 0 deletions
@@ -0,0 +1,316 @@
# GEMC Analyzer

`analyzer` is a small Python package for reading GEMC output files and plotting
variables by name. It currently focuses on CSV and ROOT output from `gstreamer`,
with a reader structure that can be extended to other formats later.

## Dependencies

The main analyzer API uses the standard scientific Python stack:

```sh
python3 -m pip install pandas numpy matplotlib
```

ROOT output support also requires `uproot`:

```sh
python3 -m pip install uproot
```

ROOT prerequisites:

- GEMC must be built with ROOT support and the ROOT streamer plugin available.
- The run must use `gstreamer` format `root`.
- Reading ROOT files from Python does not require importing C++ ROOT; the analyzer
  uses `uproot`.

The dependency-free SVG helper in `analyzer/svg_plot.py` only uses the Python
standard library. It is useful on minimal systems where `pandas`, `numpy`, and
`matplotlib` are not installed.

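Before choosing between the main API and the SVG helper, it can help to check which of these packages are importable in the current environment. The following is a minimal standard-library sketch; the helper name `missing_packages` is hypothetical and is not part of `analyzer`.

```python
from importlib.util import find_spec

# Packages named in this README; uproot is only needed for ROOT files.
CORE = ("pandas", "numpy", "matplotlib")
ROOT_EXTRA = ("uproot",)


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


missing_core = missing_packages(CORE)
missing_root = missing_packages(ROOT_EXTRA)
print("missing core packages:", missing_core or "none")
print("missing ROOT packages:", missing_root or "none")
```

If any core package is reported missing, the SVG helper described above is the fallback for CSV files.
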
## GEMC Output Model

The CSV streamer writes two flattened files per worker thread:

```text
<rootname>_t<thread>_true_info.csv
<rootname>_t<thread>_digitized.csv
```

For one thread and `filename: b2`, the files are typically:

```text
b2_t0_true_info.csv
b2_t0_digitized.csv
```

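The naming pattern above can be written down as a small helper. This is only a sketch of the convention as described in this README; `csv_paths` and `digitized_files` are hypothetical names, not functions provided by `analyzer`.

```python
from pathlib import Path


def csv_paths(rootname, thread=0):
    """Build the two per-thread CSV paths from the naming pattern above."""
    stem = f"{rootname}_t{thread}"
    return Path(f"{stem}_true_info.csv"), Path(f"{stem}_digitized.csv")


def digitized_files(rootname):
    """List every per-thread digitized CSV that exists for a root name."""
    root = Path(rootname)
    return sorted(root.parent.glob(f"{root.name}_t*_digitized.csv"))


true_info, digitized = csv_paths("tmp/b2")
print(true_info.as_posix())   # tmp/b2_t0_true_info.csv
print(digitized.as_posix())   # tmp/b2_t0_digitized.csv
```
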
CSV files are read with:

```python
import pandas as pd

# path is one of the CSV files listed above
pd.read_csv(path, sep=",", skipinitialspace=True)
```

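The `skipinitialspace=True` option matters when each comma is followed by a space, which is how the column listings in this README are written. The sketch below shows the effect with the standard-library `csv` module, which accepts an option of the same name. The sample rows are made up for illustration and are not real GEMC output.

```python
import csv
import io

# Made-up sample in a "comma + space" layout; values are illustrative only.
sample = "evn, pid, totEdep\n1, 11, 0.012\n1, 22, 0.003\n"

# Without skipinitialspace the space stays attached to names and values.
raw_header = next(csv.reader(io.StringIO(sample)))
print(raw_header)  # ['evn', ' pid', ' totEdep']

# With skipinitialspace=True the space after each comma is dropped.
rows = list(csv.DictReader(io.StringIO(sample), skipinitialspace=True))
print(rows[0]["pid"], rows[0]["totEdep"])  # 11 0.012
```
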
The CSV rows include event context columns such as:

```text
evn, timestamp, thread_id, detector
```

The digitized B2 output includes columns like:

```text
hitn, pid, tid, E, time, totEdep
```

The ROOT streamer writes one ROOT file per worker thread. For one thread and
`filename: b2`, the file is typically:

```text
b2_t0.root
```

The file contains trees named:

```text
event_header
run_header
true_info_<detector>
digitized_<detector>
```

ROOT detector trees store vector branches. The analyzer flattens each vector
element into one `DataFrame` row.

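The flattening step can be pictured with plain Python lists. This sketch is not the analyzer's implementation. It assumes each tree entry holds one list per vector branch plus a scalar event number, and the sample values are made up.

```python
# Made-up event records: each vector branch holds one list per event.
# These values are illustrative, not GEMC output.
events = [
    {"evn": 1, "pid": [11, 22], "totEdep": [0.012, 0.003]},
    {"evn": 2, "pid": [], "totEdep": []},
    {"evn": 3, "pid": [11], "totEdep": [0.020]},
]


def flatten(events, vector_columns):
    """Emit one row per vector element, repeating the per-event scalars."""
    rows = []
    for event in events:
        scalars = {k: v for k, v in event.items() if k not in vector_columns}
        vectors = [event[name] for name in vector_columns]
        for values in zip(*vectors):
            rows.append({**scalars, **dict(zip(vector_columns, values))})
    return rows


rows = flatten(events, ["pid", "totEdep"])
for row in rows:
    print(row)
```

In this sketch an event with empty vectors, such as event 2, contributes no rows. Whether the analyzer treats empty events the same way is not stated in this README.
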
## Python API

Read one digitized CSV file:

```python
from analyzer import read_output

output = read_output("tmp/b2_t0_digitized.csv")
df = output.get_frame("digitized")
print(df.columns)
```

Read a CSV root name when both files exist:

```python
from analyzer import read_output

output = read_output("tmp/b2_t0", kind="csv")
print(output.summary())
```

Plot a histogram of `totEdep` from the digitized data:

```python
from analyzer import plot_variable, read_output

output = read_output("tmp/b2_t0_digitized.csv")
plot_variable(
    output,
    "totEdep",
    data="digitized",
    bins=30,
    xlim=(0.0, 0.1),
    show=True,
)
```

Read ROOT output:

```python
from analyzer import read_output

output = read_output("tmp/b2_t0.root", kind="root")
df = output.get_frame("digitized", detector="flux")
```

## Command Line Usage

Run `python3 -m analyzer` from the GEMC source directory, where the `analyzer`
package directory is visible to Python.

The `-m` flag takes a module name, not a filesystem path. Do not run
`python3 -m ../analyzer`. If your shell is inside `tmp/`, either move back to
the source directory or set `PYTHONPATH=..`.

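The reason `PYTHONPATH=..` works is that `-m` looks the module name up on Python's module search path. The sketch below demonstrates that lookup with a throwaway package; `gemc_demo_pkg` is a stand-in name invented for this example, and editing `sys.path` in-process only approximates what `PYTHONPATH` does for a new interpreter.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path

# Build a throwaway package in a temp directory (stand-in for analyzer/).
with tempfile.TemporaryDirectory() as source_dir:
    package = Path(source_dir) / "gemc_demo_pkg"
    package.mkdir()
    (package / "__init__.py").write_text("")
    (package / "__main__.py").write_text("print('running gemc_demo_pkg')\n")

    # Not found yet: source_dir is not on the module search path.
    before = importlib.util.find_spec("gemc_demo_pkg")

    # Roughly what PYTHONPATH=<source_dir> does for a new interpreter.
    sys.path.insert(0, source_dir)
    importlib.invalidate_caches()
    try:
        after = importlib.util.find_spec("gemc_demo_pkg")
    finally:
        sys.path.remove(source_dir)

print(before)             # None
print(after is not None)  # True
```
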
Print a summary:

```sh
python3 -m analyzer tmp/b2_t0_digitized.csv
```

Plot a digitized variable with matplotlib:

```sh
python3 -m analyzer tmp/b2_t0_digitized.csv totEdep --kind csv --xlim 0.0 0.1
```

Save a plot instead of showing it:

```sh
python3 -m analyzer tmp/b2_t0_digitized.csv totEdep --kind csv --save tmp/b2_totEdep.png
```

Plot ROOT output with matplotlib:

```sh
python3 -m analyzer tmp/b2_t0.root totEdep --kind root --detector flux --save tmp/b2_totEdep.png
```

## Dependency-Free SVG Plot

If `pandas`, `numpy`, or `matplotlib` is unavailable, create an SVG histogram
directly from the CSV file:

```sh
python3 -B analyzer/svg_plot.py tmp/b2_t0_digitized.csv totEdep --out tmp/b2_totEdep.svg --bins 30
```

Add an x-axis range with:

```sh
python3 -B analyzer/svg_plot.py tmp/b2_t0_digitized.csv totEdep --out tmp/b2_totEdep.svg --bins 30 --xlim 0.0 0.1
```

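To give a sense of what a dependency-free histogram involves, here is a short standard-library sketch. It is not the code in `analyzer/svg_plot.py`. The function names are hypothetical, the choice to drop values outside the range is an assumption, and the input values are made up.

```python
def histogram(values, bins, xlim):
    """Count values into equal-width bins over the closed range xlim."""
    low, high = xlim
    width = (high - low) / bins
    counts = [0] * bins
    for value in values:
        if low <= value <= high:
            # The upper edge belongs to the last bin.
            index = min(int((value - low) / width), bins - 1)
            counts[index] += 1
    return counts


def svg_bars(counts, width=300, height=120):
    """Render counts as a bare-bones SVG bar chart (no axes or labels)."""
    bar_width = width / len(counts)
    peak = max(counts) or 1
    bars = []
    for i, count in enumerate(counts):
        bar_height = height * count / peak
        bars.append(
            f'<rect x="{i * bar_width:.1f}" y="{height - bar_height:.1f}" '
            f'width="{bar_width:.1f}" height="{bar_height:.1f}" fill="steelblue"/>'
        )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        + "".join(bars)
        + "</svg>"
    )


# Made-up energy deposits, for illustration only; 0.5 falls outside the range.
counts = histogram([0.01, 0.03, 0.03, 0.09, 0.1, 0.5], bins=5, xlim=(0.0, 0.1))
print(counts)  # [1, 2, 0, 0, 2]
svg = svg_bars(counts)
```

Writing the returned string to a `.svg` file is enough for a browser to display the bars.
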
## Run The B2 Example

Run these commands from the GEMC source directory:

```sh
mkdir -p tmp
```

Build the B2 geometry into a local SQLite database (adjust the `PYTHONPATH`
value if your GEMC checkout lives elsewhere):

```sh
PYTHONDONTWRITEBYTECODE=1 PYTHONPATH=/opt/projects/gemc/src/api \
python3 examples/basic/b2/b2.py -f sqlite -sql tmp/gemc.db
```

Run GEMC with CSV output rooted at `tmp/b2`:

```sh
build/gemc examples/basic/b2/b2.yaml \
'-gsystem=[{name: b2, factory: sqlite}]' \
'-gstreamer=[{format: csv, filename: tmp/b2}]' \
-sql=tmp/gemc.db \
-n=20
```

With one worker thread, this produces:

```text
tmp/b2_t0_digitized.csv
tmp/b2_t0_true_info.csv
```

Inspect the digitized CSV header:

```sh
head -1 tmp/b2_t0_digitized.csv
```

Expected columns include:

```text
evn, timestamp, thread_id, detector, hitn, pid, tid, E, time, totEdep
```

Create the `totEdep` plot with the main analyzer API:

```sh
python3 -m analyzer tmp/b2_t0_digitized.csv totEdep --kind csv --save tmp/b2_totEdep.png
```

Or create the same style of histogram without third-party Python packages:

```sh
python3 -B analyzer/svg_plot.py tmp/b2_t0_digitized.csv totEdep --out tmp/b2_totEdep.svg --bins 30
```

### Run B2 With ROOT Output

To produce ROOT output instead of CSV, keep the same `tmp/gemc.db` and run:

```sh
build/gemc examples/basic/b2/b2.yaml \
'-gsystem=[{name: b2, factory: sqlite}]' \
'-gstreamer=[{format: root, filename: tmp/b2}]' \
-sql=tmp/gemc.db \
-n=20
```

With one worker thread, this produces:

```text
tmp/b2_t0.root
```

Read the ROOT file from Python if you want to inspect or manipulate the data
before plotting:

```python
from analyzer import plot_variable, read_output

output = read_output("tmp/b2_t0.root", kind="root")
print(output.summary())

df = output.get_frame("digitized", detector="flux")
print(df[["pid", "totEdep"]].head())

plot_variable(
    output,
    "totEdep",
    data="digitized",
    detector="flux",
    bins=30,
    show=True,
)
```

The Python inspection step is not required for plotting. To plot directly from
the command line, use:

```sh
python3 -m analyzer tmp/b2_t0.root totEdep --kind root --detector flux --save tmp/b2_root_totEdep.png
```

If your shell is inside `tmp/`, do not use `python3 -m ../analyzer`. The `-m`
flag accepts a module name, not a relative path. Use either:

```sh
cd ..
python3 -m analyzer tmp/b2_t0.root totEdep --kind root --detector flux --save tmp/b2_root_totEdep.png
```

or:

```sh
PYTHONPATH=.. python3 -m analyzer b2_t0.root totEdep --kind root --detector flux --save b2_root_totEdep.png
```

If matplotlib reports that its default cache directory is not writable, set a
writable `MPLCONFIGDIR`:

```sh
PYTHONPATH=.. MPLCONFIGDIR=. python3 -m analyzer b2_t0.root totEdep --kind root --detector flux --save b2_root_totEdep.png
```

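When the analyzer is used from a Python script rather than the shell, the same `MPLCONFIGDIR` workaround can be applied in-process. This sketch assumes a temporary directory is an acceptable cache location; the variable has to be set before matplotlib is first imported.

```python
import os
import tempfile

# Point matplotlib at a directory that is known to be writable. This must
# happen before the first "import matplotlib" in the process.
config_dir = tempfile.mkdtemp(prefix="mplconfig-")
os.environ["MPLCONFIGDIR"] = config_dir

print(os.access(config_dir, os.W_OK))  # True

# After this point, matplotlib and the analyzer plotting code can be imported.
```

A fixed directory under `tmp/` would work as well and avoids rebuilding the cache on every run.
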
## Extending Readers

New formats should return a `GemcOutput` object from `analyzer.dataset`.
Populate one or more of these maps:

```python
GemcOutput(
    true_info={"name": true_info_dataframe},
    digitized={"name": digitized_dataframe},
    headers={"event_header": event_header_dataframe},
)
```

Then add the format selection to `read_output()` in `analyzer/readers.py`.
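
As a rough illustration of the shape a new reader might take, the sketch below parses a tab-separated stream into the `digitized` map. Everything in it is hypothetical: `OutputStandIn` only mimics the three keyword arguments shown above and is not the real `GemcOutput`, `read_tsv_digitized` is an invented name, and the rows are kept as plain dictionaries where a real reader would build `DataFrame` objects.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class OutputStandIn:
    """Stand-in with the three maps shown above; NOT the real GemcOutput."""
    true_info: dict = field(default_factory=dict)
    digitized: dict = field(default_factory=dict)
    headers: dict = field(default_factory=dict)


def read_tsv_digitized(stream, name):
    """Hypothetical reader: parse tab-separated digitized hits from a stream."""
    rows = list(csv.DictReader(stream, delimiter="\t"))
    # A real reader would wrap rows in a pandas DataFrame before storing them.
    return OutputStandIn(digitized={name: rows})


# Made-up two-hit sample, for illustration only.
sample = io.StringIO("pid\ttotEdep\n11\t0.012\n22\t0.003\n")
output = read_tsv_digitized(sample, name="flux")
print(sorted(output.digitized))       # ['flux']
print(len(output.digitized["flux"]))  # 2
```

Leaving `true_info` and `headers` empty matches the "one or more of these maps" wording above.
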

analyzer/__init__.py

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
"""Python helpers for reading and plotting GEMC output."""

__all__ = ["GemcOutput", "plot_variable", "read_output"]


# Module-level __getattr__ (PEP 562): each submodule is imported only when
# the corresponding name is first accessed on the package.
def __getattr__(name):
    if name == "GemcOutput":
        from .dataset import GemcOutput

        return GemcOutput
    if name == "plot_variable":
        from .plotting import plot_variable

        return plot_variable
    if name == "read_output":
        from .readers import read_output

        return read_output
    raise AttributeError(f"module 'analyzer' has no attribute {name!r}")

analyzer/__main__.py

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
"""Command line entry point for ``python -m analyzer``."""

from .cli import main

if __name__ == "__main__":
    main()
715 Bytes
Binary file not shown.
312 Bytes
Binary file not shown.
3.42 KB
Binary file not shown.
4.99 KB
Binary file not shown.
5.74 KB
Binary file not shown.
8.62 KB
Binary file not shown.
