CCQE spline smoothing: a new workflow#495
Open
kjplows wants to merge 1 commit into
Open
Conversation
… smoothing them post hoc.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
CCQE cross sections for tunes like the "argon tunes" AR23_20i_00_000 and the upcoming AR25_20i_00_000 rely on Monte Carlo integration of the probe + struck nucleon system. The
NewQELXSecalgorithm handles polling from the nuclear ground state to provide this initial struck nucleon: it is configurable viaNumNucleonThrowsto performNpolls, and report the average calculated cross section from allNpolls as the spline result, per knot.This has the unintended consequence of adding a statistical uncertainty to the spline knots. For states using fairly "peaked" ground states, such as LFG (AR23_20i_00_000), this uncertainty is small (if still noticeable on the scale of the cross section). For a more realistic spectral function (such as in AR25_20i_00_000), the uncertainty is quite large.
To prove my point: below, a comparison of the result of numu + 40Ar CCQE scattering, with

N=500(currentNewQELXSecdefault) nucleons ofLocalFGM_SpectralFunctionLikeWithCorrelationandSpectralFuncground states (i.e. AR23_20i_00_000 and AR25_20i_00_000). (Note the difference of scale peak, which is caused by a switching off of the RPA correction in the AR25_20i_00_000 tune). The jitter leads to significant instability in the tail of O(1 GeV).In a previous PR #468 , I attemtped an early solution to the problem, by running a Gaussian kernel density estimator to "smooth out" these splines: neighbouring points would hopefully "pull" each knot towards what is assumed to be the "true" underlying cross section.
In this PR, I have implemented a new production workflow (currently optimised for Fermilab grid nodes, but extensible to any node that can run singularity SL7), which instead of running
gmksplonce withNnucleons polled, instead runsgmksplNtimes with one nucleon polled each time, incrementing the random number seed by 1 every time. The results are collected in aparquetfile using python3 (required packages are installed on the worker node on workflow execution), and the parquet plus an xml file using the mean of these throws are returned.The files can be found in
$GENIE/src/contrib/ccqe_spline_smoothing/:jobsub_submit --dag.pdgs_for_splines.cfgthat tells the workflow which neutrino and target PDGs to make splines for, and relies on thesbndUPS setup to retrieve tagged SBN-wide versiongenie v3_06_02_sbn3at qualifiere26:profand use tuneAR25_20i_01_001(tune is described in New CMCs and small fixes for GENIE #490), and setsN=1000, for 250 knots with emax = 100 GeV.The config file reads:
so four splines will be made: nue at 12C, nuebar at 56Fe, numu at 40Ar, and numubar at 16O.
A second plot: Demonstration of the error of the mean. This uses the SpectralFunction ground state from AR25_20i_00_000,. Each pink 'x' is one result from an

N=500run, of which 100 runs were performed. The median of these runs is the dashed red line (in this energy range, functionally identical to the mean): the red band is the error on the median (again, almost identical to the error on the mean).Finally, I include a script
construct_representative_splines.py(alongside arequirements.txtfile that can be used alongsidepyenv+ Python 3.13.5 to run it on AL9 machines) that takes as input the parquets from the workflow above and runs a Gaussian process regression fromsklearn(many more details about Gaussian processes can be seen in the User Guide) and make a smoothed version of the spline.Grosso modo, the model is as follows: Suppose that the cross sections reported by the
Nruns we took follow a true latent distributiondσ/dΕν (Εν)with heteroskedastic noiseεoverlaid on top. We know at each knot the mean,<σ>, and the error,δσ: from that, a Gaussian process regressor learns the characteristic "length scale" of the latent distribution, and distinguishes between the latent distribution and noise.This script writes out a summary parquet with the mean, error-on-mean, prediction, and error on the prediction, and a GENIE XML file with one spline containing the predicted cross section on each knot.
For brevity, only a few plots will follow, but more are available if desired.
First, a comparison between taking the mean over
N=10 000nucleons (solid red points) and the Gaussian process regression (orange dashed line and coloured band) for numu on 40Ar, close to the peak: black points with error bars are the result of running this workflow overN = 1000nucleons and are used as input to the Gaussian process regressor.Next, a comparison between taking the mean over


N=1000nucleons (solid points) and the Gaussian process regression (coloured bands) for four tunes:G18_10a_02_11b,AR23_20i_00_000,AR25_20i_00_000, andAR25_20i_01_001, all for numu on 40Ar, in the region <= 100 GeV (top plot linear in Enu, bottom logaritmic in Enu):Jitter is evident in all four tunes (two use LFG, two use SF); the Gaussian process reduces these.
Third, equivalent plots for numubar on 16O, nue on 12C, and nuebar on 56Fe (the three other nuclei for which spectral function implementations exist):



Finally, two more scripts are provided:
gxspl-NUsmall.xml) and arevision xml file(such as the output ofconstruct_representative_spline.py), it searches the base xml for the spline(s) contained in the revision xml, and produces an output xml. The output is identical to base, apart from the splines contained in the revision xml which are drawn from the revision.genie_xseclarsoft.opensciencegrid.org area that @nusense wrote) that keeps onlyCC-QES(not charm/strange) splines for nu(e, mu)(bar), on only Ar40, O16, C12, Fe56.A final plot: comparison between unsmoothed and smoothed numu + Ar40 spline knots, where the unsmoothed knots come from the base xml, and the smoothed knots from the output of running

replace_splines.pywith revision made bypython3 construct_representative_splines.py -i AR25_20i_00_000/numu_40Ar.parquet -o AR25_20i_00_000/smoothed_numu_40Ar.xml -t AR25_20i_00_000 -p 14 -n 1000180400:With my apologies for a verbose PR, and my sincere thanks for discussions and help along the way, I once again ask @nusense and @sjgardiner for review. I would consider #468 superseded by this PR.