# A first determination of parton distributions with theoretical uncertainties

## Abstract

The parton distribution functions (PDFs) which characterize the structure of the proton are currently one of the dominant sources of uncertainty in the predictions for most processes measured at the Large Hadron Collider (LHC). Here we present the first extraction of the proton PDFs that accounts for the missing higher order uncertainty (MHOU) in the fixed-order QCD calculations used in PDF determinations. We demonstrate that the MHOU can be included as a contribution to the covariance matrix used for the PDF fit, and then introduce prescriptions for the computation of this covariance matrix using scale variations. We validate our results at next-to-leading order (NLO) by comparison to the known next order (NNLO) corrections. We then construct variants of the NNPDF3.1 NLO PDF set that include the effect of the MHOU, and assess their impact on the central values and uncertainties of the resulting PDFs.

The search for new physics at present [1] and future [2] high-energy colliders, and specifically at the LHC, has turned from the mapping of the energy frontier to the exploration of the precision frontier: looking for subtle deviations from Standard Model predictions. In this endeavor, an accurate estimate of uncertainties associated with these predictions is crucial. At present, these uncertainties have two main origins. The first is the missing higher order uncertainty (MHOU) from the truncation of the QCD perturbative expansion. The second is related to knowledge of the structure of the colliding protons, as encoded in the parton distributions (PDFs) [3].

PDFs are extracted by comparing theoretical predictions to experimental data. Currently, PDF uncertainties only account for the propagated statistical and systematic errors on the measurements used in their determination. However, the same MHOUs that affect predictions at the LHC also affect predictions for the various processes that enter the PDF determination. These are currently neglected, perhaps because they are believed to be generally less important than experimental uncertainties. However, as PDFs become more precise, in particular thanks to ever tighter constraints from LHC data [4], MHOUs in PDF determinations will eventually become significant. Already in recent PDF sets making extensive use of LHC data, such as NNPDF3.1 [5], the shift between PDFs at next-to-leading order (NLO) and the next order (NNLO) is sometimes larger than the PDF uncertainties from the experimental data.

Here we present the first PDF extraction that systematically accounts for the MHOU in the QCD calculations used to extract them. MHOUs are routinely estimated by varying the arbitrary renormalization \(\mu _r\) and factorization \(\mu _f\) scales of perturbative computations [1], though alternative methods have also been proposed [6, 7, 8]. Our inclusion of the MHOU in a PDF fit involves two steps: first we establish how theoretical uncertainties can be included in such a fit through a covariance matrix [9, 10], and then we find a way of computing and validating the covariance matrix associated with the MHOU using scale variations [11]. By producing variants of NNPDF3.1 which include the MHOU, we are then able to finally address the long-standing question of their impact on state-of-the-art PDF sets. A detailed discussion of our results is presented in a companion paper [12], to which we refer for full computational details, definitions, proofs and results.

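The theory covariance matrix is constructed as

\[ S_{ij} = \frac{1}{N}\sum_{k} \varDelta_i^{(k)}\,\varDelta_j^{(k)}, \tag{1} \]

where \(\varDelta_i^{(k)} = T_i^{(k)} - T_i^{(0)}\) is the shift with respect to the central theory prediction for the *i*-th cross-section, \(T_i^{(0)}\), due to the theory uncertainty, and *N* is a normalization factor determined by the number of independent nuisance parameters. Since theory uncertainties are independent of the experimental ones, the two can be combined in quadrature: the \(\chi ^2\) used to assess the agreement of theory and data is given by

\[ \chi^2 = \sum_{i,j=1}^{N_\mathrm{dat}} \left(D_i - T_i^{(0)}\right) \left(C+S\right)^{-1}_{ij} \left(D_j - T_j^{(0)}\right), \]

where \(D_i\) is the central experimental value of the *i*-th datapoint, and \(C_{ij}\) the experimental covariance matrix. More details of the implementation of the theory covariance matrix in PDF fits may be found in Refs. [9, 10].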
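As an illustration of combining the two sources of uncertainty in quadrature, the following is a minimal sketch of a \(\chi ^2\) evaluation, assuming the usual quadratic form in which the experimental covariance matrix \(C_{ij}\) is replaced by \(C_{ij}+S_{ij}\). The function name `chi2_with_theory` and all numerical inputs are hypothetical and chosen only for illustration; they are not taken from the fit.

```python
import numpy as np


def chi2_with_theory(data, theory, cov_exp, cov_th):
    """Return (D - T)^T (C + S)^{-1} (D - T) for 1-D data and theory arrays."""
    residual = np.asarray(data, dtype=float) - np.asarray(theory, dtype=float)
    total_cov = np.asarray(cov_exp, dtype=float) + np.asarray(cov_th, dtype=float)
    # Solve (C + S) x = residual rather than forming the inverse explicitly.
    return float(residual @ np.linalg.solve(total_cov, residual))


# Toy inputs, invented for illustration (not taken from the fit).
D = np.array([1.0, 2.0])      # central experimental values
T0 = np.array([0.9, 2.2])     # central theory predictions
C = np.diag([0.01, 0.04])     # uncorrelated experimental covariance
S = np.full((2, 2), 0.01)     # one fully correlated theory shift of size 0.1

chi2_exp_only = chi2_with_theory(D, T0, C, np.zeros((2, 2)))
chi2_exp_plus_th = chi2_with_theory(D, T0, C, S)
print(chi2_exp_only, chi2_exp_plus_th)
```

In this toy case the \(\chi ^2\) decreases once \(S_{ij}\) is added, as expected when a positive semi-definite contribution is added to the covariance matrix.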

Table 1: Classification of datasets into process types

Process type | Datasets |
---|---|
DIS NC | NMC, SLAC, BCDMS, HERA NC |
DIS CC | NuTeV, CHORUS, HERA CC |
DY | CDF, D0, ATLAS, CMS, LHCb ( |
JET | ATLAS, CMS inclusive jets |
TOP | ATLAS, CMS total + differential cross-sections |

Next, we formulate a variety of prescriptions for how to construct Eq. (1) by picking a set of scale variations and correlation patterns. A simple possibility is the 3-point prescription, in which we vary both scales coherently (thus setting \(k_f=k_r\)) by a fixed amount about the central value, independently for each process. More sophisticated prescriptions vary the two scales independently, but by the same amount, and assume that while \(\mu _r\) is only correlated within a given process, \(\mu _f\) is fully correlated among processes. This assumption is based on the observation that \(\mu _f\) variations estimate the MHOU in the evolution equations, which are universal (process-independent), though it is an approximation given that the evolution of different PDFs is governed by different anomalous dimensions, which do not necessarily share the same MHO corrections.
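To make the simplest case concrete, the following is a minimal sketch of a 3-point construction for the data points of a single process, assuming that the covariance matrix of Eq. (1) is an average of outer products of the shift vectors \(\varDelta ^{(k)}\). The helper `theory_covmat`, the factor of 2 used for the scale variation, the normalization `norm=2`, and all numbers are assumptions made for illustration; the precise prescriptions are those of Ref. [12].

```python
import numpy as np


def theory_covmat(shifts, norm):
    """S_ij = (1/norm) * sum_k Delta_i^(k) Delta_j^(k); shifts has shape (n_var, n_dat)."""
    shifts = np.atleast_2d(np.asarray(shifts, dtype=float))
    return shifts.T @ shifts / norm


# Toy predictions for two data points of a single process (invented numbers).
T_central = np.array([10.0, 20.0])   # k_f = k_r = 1
T_up = np.array([10.5, 21.0])        # k_f = k_r = 2 (factor of 2 is an assumption)
T_down = np.array([9.6, 19.2])       # k_f = k_r = 1/2

# Coherent variation of both scales gives two shift vectors about the centre.
shifts_3pt = np.stack([T_up - T_central, T_down - T_central])
S_3pt = theory_covmat(shifts_3pt, norm=2)   # norm = 2 is an assumed normalization
print(S_3pt)
```

Because the matrix is a sum of outer products, it is symmetric and positive semi-definite by construction, and its rank cannot exceed the number of independent shift vectors.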

We then proceed to the validation of the resulting covariance matrices at NLO. We use the same experimental data and theory calculations as in the NNPDF3.1 \(\alpha _s\) study [13] with two minor differences: the value of the lower kinematic cut has been increased from \(Q_{\mathrm{min}}^2=2.69\hbox { GeV}^2\) to \(13.96\hbox { GeV}^2\) in order to ensure the validity of the perturbative QCD expansion when scales are varied downwards, and the HERA \(F_2^b\) and fixed-target Drell-Yan cross-sections have been removed, for technical reasons related to difficulties in implementing scale variation. In total we then have \(N_\mathrm{dat}=2819\) data points. The theory covariance matrix \(S_{ij}\) has been constructed by means of the ReportEngine software [14] taking as input the scale-varied NLO theory cross-sections \(T_i(k_f,k_r)\), provided by APFEL [15] for the DIS structure functions and by APFELgrid [16] combined with APPLgrid [17] for the hadronic cross-sections.

The validation of the full covariance matrix including correlations is more subtle. We first diagonalize \({\widehat{S}}_{ij}\), finding the (orthonormal) eigenvectors \(e^a_i\) which correspond to positive eigenvalues \((s^a)^2\): these span a subspace *S* orthogonal to the large null subspace. The dimension \(N_S\) of *S* depends on the total number of independent scale variations, the number of processes, and the correlation pattern. Its determination is nontrivial: it requires first computing the total number of distinct scale variations for any pair of processes, i.e., the total number of vectors \(\varDelta ^{(k)}\) in Eq. (1), and then determining the full set of linear relations between them in order to establish how many of them are independent (see Ref. [12]).
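The diagonalization step can be sketched numerically as follows. This is a toy illustration, assuming the covariance matrix is built from a small set of shift vectors, one of which is a linear combination of the others; the function `positive_eigenspace`, the tolerance `rel_tol`, and the input vectors are hypothetical and do not correspond to the actual dataset.

```python
import numpy as np


def positive_eigenspace(cov, rel_tol=1e-10):
    """Positive eigenvalues (s^a)^2 and orthonormal eigenvectors e^a (columns), largest first."""
    eigvals, eigvecs = np.linalg.eigh(np.asarray(cov, dtype=float))
    # Treat eigenvalues below rel_tol * largest as numerical zeros (null subspace).
    keep = eigvals > rel_tol * eigvals.max()
    order = np.argsort(eigvals[keep])[::-1]
    return eigvals[keep][order], eigvecs[:, keep][:, order]


# Three toy shift vectors over five data points; the third is the sum of the
# first two, so only two of them are linearly independent.
d1 = np.array([1.0, 0.0, 1.0, 0.0, 1.0])
d2 = np.array([0.0, 1.0, 0.0, 1.0, 0.0])
shifts = np.stack([d1, d2, d1 + d2])
S_hat = shifts.T @ shifts / len(shifts)

s_squared, e_vecs = positive_eigenspace(S_hat)
N_S = len(s_squared)   # dimension of the subspace with positive eigenvalues
print(N_S, s_squared)
```

Here the five-dimensional toy space has a two-dimensional subspace with positive eigenvalues, showing how linear relations among the shift vectors reduce the dimension below the number of variations.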

The validation can be considered successful if the angle \(\theta =\arcsin ( |\delta ^{\mathrm{miss}}_i|/|\delta _i|)\) is small, meaning that the NNLO-NLO shift lies substantially within the subspace *S* estimated by the scale variations, and furthermore if \(|\delta ^a|\simeq |s^a|\), so that the size of the shift along each eigenvector is correctly estimated by the corresponding eigenvalue. Using the 9-point prescription, for individual processes we find \(\theta =3^{\circ }, 14^{\circ }, 22^{\circ }, 32^{\circ }, 16^{\circ }\) for top, jets, DY, NC and CC DIS respectively. For the complete dataset with the same prescription we find \(\theta =26^{\circ }\).
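The angle criterion can be illustrated with a short sketch. It assumes that \(\delta _i\) is the shift vector, that \(\delta ^a\) are its projections on the orthonormal eigenvectors \(e^a_i\), and that \(\delta ^{\mathrm{miss}}_i\) is the component of the shift left over after subtracting its projection onto *S*; this reading of the notation, the function `validation_angle`, and the toy vectors are assumptions made for illustration.

```python
import numpy as np


def validation_angle(shift, basis):
    """Angle in degrees between `shift` and the span of the orthonormal columns of `basis`."""
    shift = np.asarray(shift, dtype=float)
    basis = np.asarray(basis, dtype=float)
    projections = basis.T @ shift        # delta^a = e^a . delta
    in_subspace = basis @ projections    # component of the shift inside S
    missing = shift - in_subspace        # delta^miss, orthogonal to S
    ratio = np.linalg.norm(missing) / np.linalg.norm(shift)
    theta = np.degrees(np.arcsin(np.clip(ratio, 0.0, 1.0)))
    return theta, projections


# Toy example: S is the x-y plane of a 3-dimensional "data space".
E = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
delta = np.array([1.0, 0.0, 1.0])   # half of the squared length lies outside S

theta, delta_a = validation_angle(delta, E)
print(theta, delta_a)
```

A shift lying entirely within *S* would give \(\theta =0^{\circ }\), while one orthogonal to *S* would give \(\theta =90^{\circ }\); the projections `delta_a` are the quantities to be compared with \(|s^a|\).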

The projected shifts and eigenvalues are compared in Fig. 2. The size of the eigenvalues generally falls as the projected shifts get smaller. For the six largest eigenvalues, the eigenvalue is always larger than the shift and, in all but two cases, of very similar size to the shift. The seventh eigenvalue is smaller than, but of the same order as, the shift, while the eighth eigenvalue significantly underestimates the shift. However, given that the eighth eigenvalue is already one order of magnitude smaller than the first, this means that most of the shift is well described by the theory covariance matrix, and somewhat overestimated by it in just a few cases. We conclude that the validation is successful: remarkably, the pattern of correlations of theory shifts in a 2819-dimensional vector space is well captured by just 28 nuisance parameters.

We can now proceed to an NLO global PDF determination with a theory covariance matrix \(S_{ij}\) computed using the 9-point prescription. From the point of view of the NNPDF fitting methodology, the addition of the theory contribution to the covariance matrix does not entail any changes: we follow the procedure of Ref. [18], but with the covariance matrix \(C_{ij}\) now replaced by \(C_{ij}+S_{ij}\), both in the Monte Carlo replica generation and in the fitting. In Table 2 we show some fit quality estimators for the PDF sets obtained using only the experimental covariance matrix and for those that also include the theory covariance matrix, computed with two different prescriptions.

Table 2: The central \(\chi ^2\) per datapoint and the average uncertainty reduction \(\phi \) for the 3-point and 9-point fits

| \(C\) | \(C+ S^{(\mathrm{3pt})}\) | \(C+ S^{(\mathrm{9pt})}\) |
---|---|---|---|
\(\chi ^2\) | 1.139 | 1.139 | 1.109 |
\(\phi \) | 0.314 | 0.394 | 0.415 |
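The replacement of \(C_{ij}\) by \(C_{ij}+S_{ij}\) in the Monte Carlo replica generation can be sketched as follows. This is only a schematic illustration, assuming Gaussian pseudodata drawn around the central experimental values; the actual procedure of Ref. [18] contains further details that are not reproduced here. The function `generate_replicas`, the seed, and all numerical inputs are hypothetical.

```python
import numpy as np


def generate_replicas(data, cov_exp, cov_th, n_rep, seed=0):
    """Draw n_rep Gaussian pseudodata replicas with mean `data` and covariance C + S."""
    rng = np.random.default_rng(seed)
    total_cov = np.asarray(cov_exp, dtype=float) + np.asarray(cov_th, dtype=float)
    return rng.multivariate_normal(np.asarray(data, dtype=float), total_cov, size=n_rep)


# Toy inputs (invented numbers, not taken from the fit).
D = np.array([1.0, 2.0])      # central experimental values
C = np.diag([0.01, 0.04])     # experimental covariance
S = np.full((2, 2), 0.01)     # fully correlated theory covariance

replicas = generate_replicas(D, C, S, n_rep=20000)
sample_cov = np.cov(replicas, rowvar=False)   # should approach C + S
print(sample_cov)
```

With enough replicas the sample covariance of the pseudodata reproduces \(C_{ij}+S_{ij}\), so the theory correlations are propagated into the replica ensemble.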

In Fig. 4 we compare at \(Q=10\) GeV the gluon and quark singlet PDFs obtained at NLO with and without a theory covariance matrix, normalized to the latter. We also show the central NNLO result when the theory covariance matrix is not included. Three features of this comparison are apparent. First, when including the MHOU, the increase in PDF uncertainty in the data region is quite moderate, in agreement with the \(\phi \) values of Table 2. Second, the NLO-NNLO shift is fully compatible with the overall uncertainty. Finally, the central value is also modified by the inclusion of \(S_{ij}\) in the fit, as the balance between different data sets adjusts according to their relative theoretical precision. Interestingly, the central prediction shifts towards the known NNLO result, showing that, thanks to the inclusion of the MHOU, the overall fit quality has improved.

It is important to understand that the meaning of PDFs and their uncertainties changes once the theory covariance matrix is included: the error bands, e.g. in Fig. 4, have a different meaning depending on whether or not it is included. When it is included, PDF uncertainties account for data and methodological uncertainties, but also for MHOUs. Also, their central values now optimize the agreement with data based on a \(\chi ^2\) which includes MHOUs.

The usage of these PDFs is accordingly different. First, they should be combined with hard cross-sections which also include MHOU. The MHOU on the prediction and the PDF uncertainty (now also including MHOUs) should be combined in the standard way (i.e. in quadrature), since with a universal PDF it is not possible to keep track of the correlations (which surely exist) between MHOU in processes used for PDF determination, and the MHOU in the prediction itself. This neglected correlation is likely to be a small effect in most situations [12], and it leads to a conservative uncertainty estimate. Second, it is important to keep in mind that MHOUs in the theory prediction must be included in the computation of the \(\chi ^2\) when assessing the agreement of these PDFs with new data, since, as we have seen, their central value is shifted as a consequence of the inclusion of the MHOUs.

In summary, we have presented the first global PDF analysis that accounts for the MHOU associated with the fixed order QCD perturbative calculations used in the fit. The inclusion of the MHOU shifts central values by an amount that is not negligible on the scale of the PDF uncertainty, moving the NLO result towards the NNLO result. PDF uncertainties increase moderately, because of the improvement of fit quality due to the rebalancing of datasets according to their theoretical precision. For this to be effective, the correlations in \(S_{ij}\) play a crucial role. These correlations are rather more extensive than those related to experimental systematics, since all different measurements of the same process are correlated through their common MHO corrections, and different processes are correlated through MHO corrections to perturbative evolution. A more accurate treatment of these correlations (especially those related to perturbative evolution) will be the subject of future studies.

Our results pave the way towards a fully consistent treatment of MHOU for precision LHC phenomenology. The NLO results presented here will be upgraded to NNLO, facilitated by tools such as the APPLfast grid interface to the NNLOJET program [19]. We thus anticipate that the upcoming NNPDF4.0 PDF set will be able to fully account for MHOU both at NLO and NNLO, as well as other sources of theory uncertainty, such as those related to nuclear corrections [10, 20].

## Notes

### Acknowledgements

R. D. B. is supported by the UK Science and Technology Facilities Council through Grant ST/P000630/1. S. F. is supported by the European Research Council under the European Union’s Horizon 2020 research and innovation Programme (Grant agreement n.740006). T. G. is supported by The Scottish Funding Council, grant H14027. Z. K. is supported by the European Research Council Consolidator Grant “NNLOforLHC2”. E. R. N. is supported by the European Commission through the Marie Skłodowska-Curie Action ParDHonS_FFs.TMDs (Grant number 752748). R. L. P. and M. W. are supported by the STFC Grant ST/R504737/1. J. R. is supported by the European Research Council Starting Grant “PDF4BSM” and by the Netherlands Organization for Scientific Research (NWO). L. R. is supported by the European Research Council Starting Grant “REINVENT” (Grant number 714788). M. U. is partially supported by the STFC Grant ST/L000385/1 and funded by the Royal Society grants DH150088 and RGF/EA/180148. C. V. is supported by the STFC grant ST/R504671/1.

## References

- 1. D. de Florian et al., Handbook of LHC Higgs cross sections: 4. Deciphering the nature of the Higgs sector. https://doi.org/10.2172/1345634, https://doi.org/10.23731/CYRM-2017-002 (2016). arXiv:1610.07922
- 2. M. Cepeda et al. (2019). arXiv:1902.00134
- 3. J. Gao, L. Harland-Lang, J. Rojo, Phys. Rep. **742**, 1 (2018). arXiv:1709.04922
- 4.
- 5.
- 6.
- 7.
- 8. E. Bagnaschi, M. Cacciari, A. Guffanti, L. Jenniches, JHEP **02**, 133 (2015). arXiv:1409.5036
- 9. R.D. Ball, A. Deshpande, The proton spin, semi-inclusive processes, and measurements at a future electron ion collider, in *From My Vast Repertoire ... : Guido Altarelli’s Legacy*, ed. by A. Levy, S. Forte, G. Ridolfi (World Scientific, Singapore, 2018). arXiv:1801.04842
- 10. R.D. Ball, E.R. Nocera, R.L. Pearson, Eur. Phys. J. C **79**(3), 282 (2019). arXiv:1812.09074
- 11. R.L. Pearson, C. Voisey, Nucl. Part. Phys. Proc. **300–302**, 24 (2018). arXiv:1810.01996
- 12. R. Abdul Khalek et al. (2019). arXiv:1906.10698
- 13. R.D. Ball, S. Carrazza, L. Del Debbio, S. Forte, Z. Kassabov, J. Rojo, E. Slade, M. Ubiali, Eur. Phys. J. C **78**(5), 408 (2018). arXiv:1802.03398
- 14. Z. Kassabov, Reportengine: a framework for declarative data analysis. https://doi.org/10.5281/zenodo.2571601 (2019)
- 15. V. Bertone, S. Carrazza, J. Rojo, Comput. Phys. Commun. **185**, 1647 (2014). arXiv:1310.1394
- 16. V. Bertone, S. Carrazza, N.P. Hartland, Comput. Phys. Commun. **212**, 205 (2017). arXiv:1605.02070
- 17.
- 18.
- 19. T. Gehrmann et al., PoS RADCOR2017, 074 (2018). arXiv:1801.06415
- 20. R. Abdul Khalek, J.J. Ethier, J. Rojo, Eur. Phys. J. C **79**, 471 (2019). arXiv:1904.00018

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Funded by SCOAP³