aechavez edited this page Apr 28, 2022 · 83 revisions

In the past, our analysis was done through the kickstart repository ldmx-analysis. Frequent software changes made this difficult to maintain, so our group developed a Python-based analysis framework that is more resilient to periods of rapid development. This part of the tutorial walks you through analyzing a collection of ROOT files using pyEcalVeto.

NOTE: There are plans to refactor pyEcalVeto to make it more user-friendly and readable. This page will be updated once this is done.

Processing ROOT Files Interactively

  1. The ROOT files for this tutorial are provided under inputs in the TutorialFiles folder. Start by navigating to pyEcalVeto and examining treeMaker.py. This script processes each event and calculates a slew of kinematic variables, some of which we feed to a machine learning model called a boosted decision tree (BDT). For now, we'll use this script to analyze the tutorial files. Let's also make a directory to hold the output.
cd /nfs/slac/g/ldmx/users/<USER>/ldmx-sw-v3.0.0/LDMX-scripts/pyEcalVeto
ldmx python3 treeMaker.py --help
mkdir outputs

The second command should have brought up some useful information on how to use the script. A quick rundown: Tell the script to run in batch mode with the --batch flag, specify your inputs either as a list of files with the -i flag or as a list of directories with the --indirs flag, label each group of files with the -g flag, specify your outputs with the -o flag, and tell the script how many events to process for each file group with the -m flag.

  2. Now we'll have the analysis script run over each file and output the results from the first 500 events of each file. We'll also label each output file by its process. The following command does all of this.
ldmx python3 treeMaker.py -i $PWD/../TutorialFiles/inputs/0.001_input.root $PWD/../TutorialFiles/inputs/0.01_input.root $PWD/../TutorialFiles/inputs/0.1_input.root $PWD/../TutorialFiles/inputs/1.0_input.root -g 0.001 0.01 0.1 1.0 -o $PWD/outputs $PWD/outputs $PWD/outputs $PWD/outputs -m 500

Once the script finishes processing the files, you can go ahead and delete the newly created scratch directory. Sometimes it isn't able to do this on its own, but this will hopefully be fixed in the future.
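The long command in this step repeats the same pattern four times, so it can also be assembled in a loop. The following is a minimal dry-run sketch, assuming you are still in the pyEcalVeto directory; the variable names (masses, inputs, outdirs, cmd) are illustrative and not part of pyEcalVeto. It only prints the command, so you can check it before running it.

```shell
# Dry-run sketch: build the argument lists for the four tutorial mass points.
# masses, inputs, outdirs, and cmd are illustrative names, not part of pyEcalVeto.
masses="0.001 0.01 0.1 1.0"
inputs=""
outdirs=""
for m in $masses; do
    # One input file and one output directory per mass point, in matching order.
    inputs="$inputs $PWD/../TutorialFiles/inputs/${m}_input.root"
    outdirs="$outdirs $PWD/outputs"
done
cmd="ldmx python3 treeMaker.py -i$inputs -g $masses -o$outdirs -m 500"
# Print the assembled command for inspection.
echo "$cmd"
```

If the printed line matches the command shown in the step above, copy it into your terminal to run it.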

  3. Navigate to the output directory and open up the 0.001 GeV signal file in ROOT. Let's examine the number of reconstructed hits read out from the ECal.
cd outputs
root 0.001_unsorted.root
new TBrowser()

Browse through the file and select the nReadoutHits leaf under the EcalVeto branch. It should be the first leaf. If all goes as expected, you should see the following histogram.

Submitting Batch Jobs

  1. Oftentimes you'll need to run over a large number of files. This is where batch submission comes into play. Start by setting LSB_JOB_REPORT_MAIL=Y to receive email updates about your jobs' progress. You'll need to set this variable every time you open up a new terminal if you want to receive updates.
  2. Navigate to your workspace and submit some batch jobs. This is done through the bsub command. You can set which queue to submit a job to with the -q flag (available options are short, medium, and long), specify how long a job is expected to run in minutes with the -W flag, and set how many cores you want to use with the -n flag. Let's have treeMaker.py run over each file as before and process all of the events this time. We'll submit the jobs to the short queue, running each on a single core with an expected run time of 5 minutes.
cd /nfs/slac/g/ldmx/users/<USER>/ldmx-sw-v3.0.0
bsub -q short -W 5 -n 1 -R "select[centos7] span[hosts=1]" singularity run --home $PWD $PWD/ldmx_dev_latest.sif . python3 $PWD/LDMX-scripts/pyEcalVeto/treeMaker.py --batch -i $PWD/LDMX-scripts/TutorialFiles/inputs/0.001_input.root -g 0.001 -o $PWD/LDMX-scripts/pyEcalVeto/outputs
bsub -q short -W 5 -n 1 -R "select[centos7] span[hosts=1]" singularity run --home $PWD $PWD/ldmx_dev_latest.sif . python3 $PWD/LDMX-scripts/pyEcalVeto/treeMaker.py --batch -i $PWD/LDMX-scripts/TutorialFiles/inputs/0.01_input.root -g 0.01 -o $PWD/LDMX-scripts/pyEcalVeto/outputs
bsub -q short -W 5 -n 1 -R "select[centos7] span[hosts=1]" singularity run --home $PWD $PWD/ldmx_dev_latest.sif . python3 $PWD/LDMX-scripts/pyEcalVeto/treeMaker.py --batch -i $PWD/LDMX-scripts/TutorialFiles/inputs/0.1_input.root -g 0.1 -o $PWD/LDMX-scripts/pyEcalVeto/outputs
bsub -q short -W 5 -n 1 -R "select[centos7] span[hosts=1]" singularity run --home $PWD $PWD/ldmx_dev_latest.sif . python3 $PWD/LDMX-scripts/pyEcalVeto/treeMaker.py --batch -i $PWD/LDMX-scripts/TutorialFiles/inputs/1.0_input.root -g 1.0 -o $PWD/LDMX-scripts/pyEcalVeto/outputs

It's crucial that you run bsub from the directory where your singularity image file (.sif) is located. If your image has a different name than the one shown here, make sure to point the command to the correct file. Note the judicious application of absolute file paths. This is good practice when working from inside the container, as it can be finicky about the locations of files sometimes.
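The environment variable from step 1 and the four near-identical bsub commands from step 2 can be combined in a short script. This is a dry-run sketch under stated assumptions: it is run from the directory that holds the .sif file, and BASE, IMAGE, and njobs are illustrative variable names, not anything defined by LSF or pyEcalVeto. It prints each bsub command instead of submitting it.

```shell
# Receive LSF email reports for jobs submitted from this shell session.
export LSB_JOB_REPORT_MAIL=Y

# Dry-run sketch: print one bsub command per mass point.
# BASE, IMAGE, and njobs are illustrative names; adjust IMAGE if your .sif
# file is named differently.
BASE="$PWD"   # assumes the current directory holds the .sif file
IMAGE="$BASE/ldmx_dev_latest.sif"
njobs=0
for m in 0.001 0.01 0.1 1.0; do
    # The -R string is wrapped in single quotes so that the printed command
    # keeps its double quotes and can be pasted back into a terminal.
    echo bsub -q short -W 5 -n 1 -R '"select[centos7] span[hosts=1]"' \
        singularity run --home "$BASE" "$IMAGE" . \
        python3 "$BASE/LDMX-scripts/pyEcalVeto/treeMaker.py" --batch \
        -i "$BASE/LDMX-scripts/TutorialFiles/inputs/${m}_input.root" \
        -g "$m" -o "$BASE/LDMX-scripts/pyEcalVeto/outputs"
    njobs=$((njobs + 1))
done
```

Each printed line should match one of the bsub commands above and can be pasted into the terminal to submit that job. Building every path from one absolute base directory keeps the paths consistent inside the container.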

  3. Navigate to the output directory and open up the 1.0 GeV signal file in ROOT. Let's examine the transverse RMS deviation of ECal hits.
cd LDMX-scripts/pyEcalVeto/outputs
root 1.0_unsorted.root
new TBrowser()

Browse through the file and select the showerRMS leaf under the EcalVeto branch. It should be the fourth leaf down from nReadoutHits. If all goes as expected, you should see the following histogram.
