CCQE spline smoothing: a new workflow by kjplows · Pull Request #495 · GENIE-MC/Generator

kjplows · 2026-05-25T14:24:47Z

CCQE cross sections for tunes like the "argon tunes" AR23_20i_00_000 and the upcoming AR25_20i_00_000 rely on Monte Carlo integration of the probe + struck nucleon system. The NewQELXSec algorithm handles polling from the nuclear ground state to provide this initial struck nucleon: it is configurable via NumNucleonThrows to perform N polls, and report the average calculated cross section from all N polls as the spline result, per knot.
This has the unintended consequence of adding a statistical uncertainty to the spline knots. For states using fairly "peaked" ground states, such as LFG (AR23_20i_00_000), this uncertainty is small (if still noticeable on the scale of the cross section). For a more realistic spectral function (such as in AR25_20i_00_000), the uncertainty is quite large.

To prove my point: below, a comparison of the result of numu + 40Ar CCQE scattering, with N=500 (current NewQELXSec default) nucleons of LocalFGM_SpectralFunctionLikeWithCorrelation and SpectralFunc ground states (i.e. AR23_20i_00_000 and AR25_20i_00_000). (Note the difference of scale peak, which is caused by a switching off of the RPA correction in the AR25_20i_00_000 tune). The jitter leads to significant instability in the tail of O(1 GeV).

In a previous PR #468 , I attemtped an early solution to the problem, by running a Gaussian kernel density estimator to "smooth out" these splines: neighbouring points would hopefully "pull" each knot towards what is assumed to be the "true" underlying cross section.
In this PR, I have implemented a new production workflow (currently optimised for Fermilab grid nodes, but extensible to any node that can run singularity SL7), which instead of running gmkspl once with N nucleons polled, instead runs gmkspl N times with one nucleon polled each time, incrementing the random number seed by 1 every time. The results are collected in a parquet file using python3 (required packages are installed on the worker node on workflow execution), and the parquet plus an xml file using the mean of these throws are returned.
The files can be found in $GENIE/src/contrib/ccqe_spline_smoothing/:

run_splines_parquet_FNAL-SL7.sh : the script that generates the executable to be run, and the DAG file to be created to be run via jobsub_submit --dag.
example.cmd : An example of how this is run. It is similar to (if somewhat less sophisticated than) @nusense 's production scripts. This example takes an input configuration file pdgs_for_splines.cfg that tells the workflow which neutrino and target PDGs to make splines for, and relies on the sbnd UPS setup to retrieve tagged SBN-wide version genie v3_06_02_sbn3 at qualifier e26:prof and use tune AR25_20i_01_001 (tune is described in New CMCs and small fixes for GENIE #490), and sets N=1000, for 250 knots with emax = 100 GeV.
The config file reads:

# syntax: nu_pdg tgt_pdg pretty-neutrino pretty-tgt
12 1000060120 nue 12C
-12 1000260560 nuebar 56Fe
14 1000180400 numu 40Ar
-14 1000080160 numubar 16O

so four splines will be made: nue at 12C, nuebar at 56Fe, numu at 40Ar, and numubar at 16O.

A second plot: Demonstration of the error of the mean. This uses the SpectralFunction ground state from AR25_20i_00_000,. Each pink 'x' is one result from an N=500 run, of which 100 runs were performed. The median of these runs is the dashed red line (in this energy range, functionally identical to the mean): the red band is the error on the median (again, almost identical to the error on the mean).

Finally, I include a script construct_representative_splines.py (alongside a requirements.txt file that can be used alongside pyenv + Python 3.13.5 to run it on AL9 machines) that takes as input the parquets from the workflow above and runs a Gaussian process regression from sklearn (many more details about Gaussian processes can be seen in the User Guide) and make a smoothed version of the spline.
Grosso modo, the model is as follows: Suppose that the cross sections reported by the N runs we took follow a true latent distribution dσ/dΕν (Εν) with heteroskedastic noise ε overlaid on top. We know at each knot the mean, <σ>, and the error, δσ: from that, a Gaussian process regressor learns the characteristic "length scale" of the latent distribution, and distinguishes between the latent distribution and noise.
This script writes out a summary parquet with the mean, error-on-mean, prediction, and error on the prediction, and a GENIE XML file with one spline containing the predicted cross section on each knot.

For brevity, only a few plots will follow, but more are available if desired.

First, a comparison between taking the mean over N=10 000 nucleons (solid red points) and the Gaussian process regression (orange dashed line and coloured band) for numu on 40Ar, close to the peak: black points with error bars are the result of running this workflow over N = 1000 nucleons and are used as input to the Gaussian process regressor.

Next, a comparison between taking the mean over N=1000 nucleons (solid points) and the Gaussian process regression (coloured bands) for four tunes: G18_10a_02_11b, AR23_20i_00_000, AR25_20i_00_000, and AR25_20i_01_001, all for numu on 40Ar, in the region <= 100 GeV (top plot linear in Enu, bottom logaritmic in Enu):

Jitter is evident in all four tunes (two use LFG, two use SF); the Gaussian process reduces these.

Third, equivalent plots for numubar on 16O, nue on 12C, and nuebar on 56Fe (the three other nuclei for which spectral function implementations exist):

Finally, two more scripts are provided:

replace_splines.py : Given a "base xml file" (such as a gxspl-NUsmall.xml) and a revision xml file (such as the output of construct_representative_spline.py), it searches the base xml for the spline(s) contained in the revision xml, and produces an output xml. The output is identical to base, apart from the splines contained in the revision xml which are drawn from the revision.
reduce_gxspl_CCQES_for_sbn.awk : A convenience awk script (based off of the scripts available in the genie_xsec larsoft.opensciencegrid.org area that @nusense wrote) that keeps only CC-QES (not charm/strange) splines for nu(e, mu)(bar), on only Ar40, O16, C12, Fe56.

A final plot: comparison between unsmoothed and smoothed numu + Ar40 spline knots, where the unsmoothed knots come from the base xml, and the smoothed knots from the output of running replace_splines.py with revision made by python3 construct_representative_splines.py -i AR25_20i_00_000/numu_40Ar.parquet -o AR25_20i_00_000/smoothed_numu_40Ar.xml -t AR25_20i_00_000 -p 14 -n 1000180400:

With my apologies for a verbose PR, and my sincere thanks for discussions and help along the way, I once again ask @nusense and @sjgardiner for review. I would consider #468 superseded by this PR.

… smoothing them post hoc.

sjgardiner · 2026-07-10T18:19:51Z

Thanks a lot, @kjplows, sorry for only getting to this now. I've started reviewing this but am not done yet. Since you have confirmed here that it is obsolete, I will close PR #468 to avoid confusion.

Production and analysis workflow for generating many CCQE splines and…

1ed311f

… smoothing them post hoc.

kjplows self-assigned this May 25, 2026

kjplows requested review from nusense and sjgardiner May 25, 2026 14:43

kjplows marked this pull request as ready for review May 25, 2026 14:43

sjgardiner mentioned this pull request Jul 10, 2026

Adds a SplinePostProcessor singleton class to Generator #468

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

CCQE spline smoothing: a new workflow#495

CCQE spline smoothing: a new workflow#495
kjplows wants to merge 1 commit into
GENIE-MC:masterfrom
kjplows:feature/kplows-smoothing-scripts

kjplows commented May 25, 2026 •

edited

Loading

Uh oh!

sjgardiner commented Jul 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

kjplows commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sjgardiner commented Jul 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

kjplows commented May 25, 2026 •

edited

Loading