
Analysis at Particle Level

Inclusive b Jet Production in Proton-Proton Collisions

Part of the book series: Springer Theses


Abstract

The chapter is organised as follows. First, the disagreement in the b-tagged fraction observed in the previous chapter is investigated, and a correction to the simulation is applied to fix it. Then, the b-jet and n-jet cross sections are extracted simultaneously using advanced unfolding techniques; the treatment of the systematic uncertainties in the unfolding is also discussed.

The method of Least Squares is seen to be our best course when we have thrown overboard a certain portion of our data—a sort of sacrifice which has often to be made by those who sail the stormy seas of Probability.

— Francis Ysidro Edgeworth


Notes

  1.

    This reference is the standard reference given in HEP. However, the same technique has also been published in other fields of science [9, 10]. In astronomy and optics, for instance, it is known as Lucy-Richardson deconvolution.

References

  1. Marchesini I, Skovpen K. Discussions on calibration with JP. Private communication


  2. Kuusela M (2014) Introduction to unfolding in high energy physics. http://smat.epfl.ch/~kuusela/talks/ETH_Jul_2014.pdf. Accessed 30 Nov 2017

  3. Schmitt S (2012) TUnfold: an algorithm for correcting migration effects in high energy physics. JINST 7: T10003. https://doi.org/10.1088/1748-0221/7/10/T10003, arXiv:1205.6201 [physics.data-an]

  4. Schmitt S (2017) Data unfolding methods in high energy physics. EPJ Web Conf 137:11008. https://doi.org/10.1051/epjconf/201713711008, arXiv:1611.01927 [physics.data-an]

  5. Hocker A, Kartvelishvili V (1996) SVD approach to data unfolding. Nucl Instrum Meth A372:469–481. https://doi.org/10.1016/0168-9002(95)01478-0, arXiv:hep-ph/9509307 [hep-ph]

  6. Schmitt S (2017) Personal homepage. http://www.desy.de/~sschmitt/tunfold.html. Accessed 24 Sept 2017

  7. D’Agostini G (1995) A multidimensional unfolding method based on Bayes’ theorem. Nucl Instrum Meth A362:487–498. https://doi.org/10.1016/0168-9002(95)00274-X


  8. D’Agostini G (2010) Improved iterative Bayesian unfolding. arXiv:1010.0632 [physics.data-an]

  9. Mülthei HN, Schorr B (1987) On an iterative method for the unfolding of spectra. Nucl Instrum Methods Phys Res Sect A: Accel Spectrom Detect Assoc Equipm 257(2):371–377. ISSN: 0168-9002. https://doi.org/10.1016/0168-9002(87)90759-5

  10. Shustov AE, Ulin SE (2015) Matrix of response functions for deconvolution of gamma-ray spectra. Phys Proc 74(Supplement C):399–404. Fundamental research in particle physics and cosmophysics. ISSN: 1875-3892. https://doi.org/10.1016/j.phpro.2015.09.210, http://www.sciencedirect.com/science/article/pii/S1875389215014091

  11. Adye T (2011) Unfolding algorithms and tests using RooUnfold. Presented at PHYSTAT 2011, CERN, Geneva, Switzerland, January 2011; CERN Yellow Report, pp 313–318. arXiv:1105.1160. https://cds.cern.ch/record/1349242


Author information


Correspondence to Patrick L. S. Connor.

Appendices

8.3 Details About Fit of Purity

In this appendix, we describe step by step the determination of the purity. First, we show the templates. Then we investigate the different ways to constrain charm in the CSVv2-tagged region, comparing pythia 8 and MadGraph. Finally, to justify our approach in the non-CSVv2-tagged region, we show that the fit is not stable there.

8.3.1 Templates

The templates were sketched in Fig. 8.2 in the discussion of Sect. 8.1; the templates in the CSVv2-tagged (non-CSVv2-tagged) region are shown in Fig. 8.13 (Fig. 8.14) in bins of \(p_T\) and y. For readability, the statistical uncertainties are not shown; they grow with increasing JP values.

Tagged region. In the CSVv2-tagged region, the templates peak at different values for the different flavours; however, the peaks become less distinct at higher \(p_T\) and higher y. One also observes that the c templates lie halfway between the light and b templates.

Non-tagged region. In the non-CSVv2-tagged region, on the other hand, although the templates differ at low \(p_T\), they are very similar at high \(p_T\). Moreover, the light component has roughly 50 times larger statistics than the heavy-flavour components.

Fig. 8.13

The shape of the JP discriminant is shown for the different flavours in the CSVv2-tagged sample. Each grid corresponds to a \((p_T,y)\) bin; the colours represent the different flavours

Fig. 8.14

The shape of the JP discriminant is shown for the different flavours in the non-CSVv2-tagged sample. Each grid corresponds to a \((p_T,y)\) bin; the colours represent the different flavours

8.3.2 Determination of the Purity in the CSVv2-Tagged Region

In Figs. 8.15 and 8.16, the fit is investigated with the pythia 8 and MadGraph samples, respectively, for different ways of constraining the c component:

  • as an independent component, i.e. just as in Eq. 8.5 (corresponding to the blue circles);

  • together with the light component, i.e. \(N_\text {data}^\text {total} = p_\text {udsg+c} (N_\text {MC}^\text {udsg} + N_\text {MC}^\text {c}) + p_\text {b} N_\text {MC}^\text {b}\) (corresponding to the purple squares);

  • or together with the bottom component, i.e. \(N_\text {data}^\text {total} = p_\text {udsg} N_\text {MC}^\text {udsg} + p_\text {b+c} ( N_\text {MC}^\text {c} + N_\text {MC}^\text {b})\) (corresponding to the orange stars).
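The three configurations differ only in which MC templates share a scale factor. The following Python sketch illustrates this with a weighted linear least-squares fit in one \((p_T,y)\) bin. It assumes Poisson-like uncertainties on the data only, whereas the actual fit of Eq. 8.5 may use a different estimator; the toy JP templates, the function `fit_scale_factors` and the configuration names are hypothetical.

```python
import numpy as np

# Hypothetical charm-constraint configurations: each tuple shares one scale factor.
CONFIGS = {
    "c independent": [("udsg",), ("c",), ("b",)],
    "c with light": [("udsg", "c"), ("b",)],
    "c with b": [("udsg",), ("c", "b")],
}


def fit_scale_factors(data, templates, groups):
    """Weighted least-squares fit of data ~ sum_g p_g * (sum of the templates in g).

    data: JP histogram of the data in one (pT, y) bin, shape (n_bins,)
    templates: dict mapping flavour -> MC JP histogram, shape (n_bins,)
    groups: list of flavour tuples, one scale factor per tuple
    """
    # One column per group: the summed MC templates of the flavours in that group.
    X = np.column_stack([sum(templates[f] for f in g) for g in groups])
    # Approximate Poisson uncertainty of the data; empty bins get unit uncertainty.
    sigma = np.sqrt(np.where(data > 0, data, 1.0))
    p, *_ = np.linalg.lstsq(X / sigma[:, None], data / sigma, rcond=None)
    return p


# Toy JP templates (hypothetical shapes): light at low JP, b at high JP, c in between.
jp = np.linspace(0.05, 2.45, 10)
templates = {
    "udsg": 5000 * np.exp(-jp / 0.3),
    "c": 800 * np.exp(-0.5 * ((jp - 0.8) / 0.4) ** 2),
    "b": 3000 * np.exp(-0.5 * ((jp - 1.5) / 0.5) ** 2),
}
# Noise-free pseudo-data in which c and b are both scaled by the same factor 1.2.
data = templates["udsg"] + 1.2 * (templates["c"] + templates["b"])

for name, groups in CONFIGS.items():
    print(name, np.round(fit_scale_factors(data, templates, groups), 3))
```

Because the toy data were generated with c and b scaled by the same factor, only the configurations able to express this (charm independent, or charm constrained with bottom) recover the input exactly; the toy illustrates the mechanics of the constraints, not which one is physically preferred.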

In all cases, the configuration in which charm is constrained together with bottom seems optimal: the correction is expected to be minimal at low \(p_T\) (the disagreement in the fraction ratio mainly arises at high \(p_T\)), and this configuration best satisfies that expectation. The findings were also confirmed using cMVAv2 (not shown here), although the result was of lower quality because cMVAv2 is more strongly correlated with JP.

The ratios of the JP discriminant in bins of \(p_T\) and y are shown before (after) the fit in Fig. 8.17 (Fig. 8.18).

Fig. 8.15

Purity in simulation before and after the fit in the CSVv2-tagged region with pythia 8. The columns (rows) correspond to the rapidity bins (flavours). Different configurations to constrain charm are considered

Fig. 8.16

Purity in simulation before and after the fit in the CSVv2-tagged region with MadGraph. The columns (rows) correspond to the rapidity bins (flavours). Different configurations to constrain charm are considered

Fig. 8.17

Ratio of simulation to data for the JP discriminant in the CSVv2-tagged region, in bins of \((p_T,y)\) before the fit; the colours stand for the flavour. The \(\chi ^2\) per n.d.f. is given

Fig. 8.18

Ratio of simulation to data for the JP discriminant in the CSVv2-tagged region, in bins of \((p_T,y)\) after the fit with b and c constrained together; the colours stand for the flavour. The \(\chi ^2\) per n.d.f. is given

8.3.3 Determination of the Purity in the Non-CSVv2-Tagged Region

In the non-CSVv2-tagged region, the light contribution has far larger statistics than the HF components. An attempt to fit with the same approach as in the CSVv2-tagged region (previous subsection) is shown in Fig. 8.19. Only the configuration with b and c constrained together converged; for the other configurations, the fit systematically returned NaN in almost all bins, except at low \(p_T\) in the central region. It is therefore the only configuration shown here. This failure justifies the solution mentioned in Sect. 8.1: the renormalisation factor of the \(b+c\) component is propagated from the CSVv2-tagged to the non-CSVv2-tagged region, and the light component is only rescaled to match the data. The result of this procedure, with the different charm constraints, is shown in Fig. 8.20.
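The fallback procedure can be sketched for a single \((p_T,y)\) bin as follows. This is a minimal illustration assuming the factors act on the total yields of each flavour in that bin; the helper `propagate_purity` and the input numbers are hypothetical.

```python
def propagate_purity(n_data, n_udsg, n_c, n_b, p_bc):
    """Hypothetical helper for one (pT, y) bin of the non-CSVv2-tagged region.

    p_bc is the b+c scale factor fitted in the CSVv2-tagged region; the light
    scale factor is chosen such that the total simulated yield equals n_data.
    Returns the light scale factor and the corrected flavour fractions.
    """
    n_hf = p_bc * (n_c + n_b)          # heavy flavour, reusing the tagged-region factor
    p_udsg = (n_data - n_hf) / n_udsg  # the light component absorbs the remainder
    # The corrected total equals n_data by construction, so the fractions sum to one.
    fractions = {
        "udsg": p_udsg * n_udsg / n_data,
        "c": p_bc * n_c / n_data,
        "b": p_bc * n_b / n_data,
    }
    return p_udsg, fractions


p_udsg, fractions = propagate_purity(n_data=1000.0, n_udsg=900.0, n_c=20.0, n_b=30.0, p_bc=1.2)
print(round(p_udsg, 4), fractions)
```

No fit is performed in this region, so the ill-conditioned separation of nearly identical templates is avoided altogether; the price is the assumption that the b+c correction is the same in both regions.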

The ratios of the JP discriminant in bins of \(p_T\) and y are shown before (after) the fit in Fig. 8.21 (Fig. 8.22).

Fig. 8.19

Purity in simulation before and after the fit in the non-CSVv2-tagged region with pythia 8. The columns (rows) correspond to the rapidity bins (flavours). Only the case where charm and bottom are constrained together is considered

Fig. 8.20

Purity in simulation before and after propagating the renormalisation of the b and c components and rescaling the light component in the non-CSVv2-tagged region with pythia 8. The columns (rows) correspond to the rapidity bins (flavours). Different configurations to constrain charm are considered

Fig. 8.21

Ratio of simulation to data for the JP discriminant in the non-CSVv2-tagged region, in bins of \((p_T,y)\) before the fit; the colours stand for the flavour. The \(\chi ^2\) per n.d.f. is given

Fig. 8.22

Ratio of simulation to data for the JP discriminant in the non-CSVv2-tagged region, in bins of \((p_T,y)\) after the fit with b and c constrained together; the colours stand for the flavour. The \(\chi ^2\) per n.d.f. is given

8.4 Details About Unfolding Procedure

We give additional details on the unfolding procedure.

First we show the regularisation matrix obtained from pythia 8. Then we discuss the treatment of the statistical uncertainties. Finally, we present additional checks.

8.4.1 Control Plots for Tikhonov Regularisation

The \(\mathbf {L}\) matrix obtained with the pythia 8 sample after model reweighting is shown in Fig. 8.23. Since only \(p_T\) is regularised, only the diagonal rapidity cells are filled. The constraint index corresponds to the row index in Eq. 8.14, i.e. to a constraint on three consecutive bins at particle level; since each constraint spans three bins, there are fewer constraints than bins, and the \(\mathbf {L}\) matrix is therefore not square.

The effect of the regularisation on the unfolded spectra can be checked with the product \(\mathbf {L} \mathbf {x}\) (second term in Eq. 8.11), which is shown for the three different scenarios in Fig. 8.24. It shows explicitly which bins need the most regularisation: those at high \(p_T\), especially in the third rapidity bin.
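A minimal sketch of such a block-diagonal matrix and of the product \(\mathbf {L} \mathbf {x}\), assuming a plain second-difference (curvature) constraint on three consecutive \(p_T\) bins. The actual rows of Eq. 8.14 are built from pythia 8 and may carry different weights; the hypothetical `x_mc` option only mimics this by regularising the ratio to the MC. The binning and spectrum are toy values.

```python
import numpy as np


def curvature_block(n_pt):
    """Second-difference matrix: one row per triplet of consecutive pT bins."""
    L = np.zeros((n_pt - 2, n_pt))
    for k in range(n_pt - 2):
        L[k, k:k + 3] = (1.0, -2.0, 1.0)
    return L


def regularisation_matrix(n_pt, n_y, n_flav, x_mc=None):
    """Block-diagonal L for bins ordered as (flavour, y, pT), with pT running fastest.

    Only pT is regularised, so no constraint connects different rapidity or
    flavour blocks. If x_mc is given (hypothetical option), each column is
    divided by the MC prediction, so that L @ x measures the curvature of x / x_mc.
    """
    L = np.kron(np.eye(n_y * n_flav), curvature_block(n_pt))
    if x_mc is not None:
        L = L / np.asarray(x_mc, dtype=float)[None, :]
    return L


n_pt, n_y, n_flav = 6, 4, 2  # hypothetical binning
L = regularisation_matrix(n_pt, n_y, n_flav)
print(L.shape)  # (32, 48): fewer constraints than bins, hence not square

# The product L @ x shows where the regularisation acts most strongly.
x = np.tile(np.array([900.0, 400.0, 150.0, 50.0, 12.0, 2.0]), n_y * n_flav)
print(np.round(L @ x, 1)[:4])
```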

Fig. 8.23

\(\mathbf {L}\) matrix constructed from pythia 8. The x axis (y axis) stands for the \((p_T,y,\text {flavour})\) bins (constraint index). The values are given in arbitrary units; the blue (red) entries correspond to positive (negative) entries. The level of transparency is proportional to the absolute value

Fig. 8.24

Product of the regularisation matrix and the unfolded result in bins of rapidity (columns) and flavour (rows), shown for the three scenarios

8.4.2 Treatment of Statistical Uncertainties

The statistical uncertainties from the MC (via the RM) and from the data (from the measurement) are both considered; the former is included as an additional uncertainty in the unfolding procedure, while the latter is part of the unfolded result.

The covariance matrix of the data at particle level is obtained by propagating the measurement covariance through the linear transformation \(\mathbf {x} = \mathbf {B} \mathbf {y}\):

$$\begin{aligned} \mathbf {V_x}&= \mathbf {B} \mathbf {V_y} \mathbf {B}^\intercal \end{aligned}$$
(8.20)

where \(\mathbf {B}\), which operates the transformation, is defined as follows:

$$\begin{aligned} \mathbf {B}&= \mathbf {E} \mathbf {A}^\intercal \mathbf {V_y}^{-1} \end{aligned}$$
(8.21)

with

$$\begin{aligned} \mathbf {E}&= \left( \mathbf {A}^\intercal \mathbf {V_y}^{-1} \mathbf {A} + \tau ^2 \mathbf {L}^\intercal \mathbf {L} \right) ^{-1} \end{aligned}$$
(8.22)

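In the case of no regularisation (\(\tau = 0\)), the transformation reduces to \(\mathbf {B} = \left( \mathbf {A}^\intercal \mathbf {V_y}^{-1} \mathbf {A} \right) ^{-1} \mathbf {A}^\intercal \mathbf {V_y}^{-1}\), the generalised least-squares solution; if \(\mathbf {A}\) is square and invertible, this simplifies further to \(\mathbf {B} = \mathbf {A}^{-1}\), as expected for matrix inversion.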
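Eqs. 8.20 to 8.22 can be transcribed directly with NumPy. The sketch below assumes no bias vector (consistent with the regularisation term \(\mathbf {L} \mathbf {x}\)) and uses explicit inverses for readability rather than numerical robustness; the toy matrix \(\mathbf {A}\), spectrum and single curvature constraint are hypothetical.

```python
import numpy as np


def tikhonov_unfold(y, V_y, A, L, tau):
    """Unfolded spectrum and covariance following Eqs. 8.20-8.22 (no bias vector).

    y: detector-level measurement (n_y,), V_y: its covariance (n_y, n_y)
    A: probability matrix (n_y, n_x), L: regularisation matrix (n_c, n_x)
    """
    V_y_inv = np.linalg.inv(V_y)
    E = np.linalg.inv(A.T @ V_y_inv @ A + tau**2 * L.T @ L)  # Eq. 8.22
    B = E @ A.T @ V_y_inv                                     # Eq. 8.21
    x = B @ y                                                 # unfolded spectrum
    V_x = B @ V_y @ B.T                                       # Eq. 8.20
    return x, V_x


A = np.array([[0.8, 0.1, 0.0],
              [0.2, 0.8, 0.2],
              [0.0, 0.1, 0.8]])
x_true = np.array([1000.0, 500.0, 200.0])
y = A @ x_true
V_y = np.diag(y)                       # Poisson-like measurement covariance
L = np.array([[1.0, -2.0, 1.0]])       # single curvature constraint
x0, V0 = tikhonov_unfold(y, V_y, A, L, tau=0.0)  # plain inversion
x1, V1 = tikhonov_unfold(y, V_y, A, L, tau=0.5)  # regularised
```

With \(\tau = 0\) and this square, invertible \(\mathbf {A}\), the result is \(\mathbf {A}^{-1}\mathbf {y}\); a non-zero \(\tau\) pulls \(\mathbf {x}\) towards smaller curvature and reduces its variance, at the price of a bias.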

The covariance matrix before (after) unfolding can be seen in Fig. 8.25 (Fig. 8.26). The input covariance matrix contains only positive entries; off-diagonal entries correspond to correlations among jets coming from the same events. A single-count observable would give a purely diagonal covariance matrix; here, since we are measuring a multi-count observable, there are significant off-diagonal contributions, which matter in the unfolding (see Eq. 8.10). The output covariance matrix contains negative entries (hence the chessboard pattern): neighbouring bins are constrained together and are therefore (anti)correlated.
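The origin of the positive off-diagonal entries can be illustrated by building the covariance of a multi-count histogram directly from the per-event jet contents. This sketch assumes independent events with Poisson-distributed event counts, for which \(\mathrm{Cov}(N_i,N_j)=\sum_\text{events} n_i n_j\), with \(n_i\) the number of jets of a given event in bin i; the function name and toy events are hypothetical.

```python
import numpy as np


def multicount_covariance(event_bin_indices, n_bins):
    """Covariance of a histogram filled with several entries per event.

    event_bin_indices: iterable over events, each a sequence of bin indices
    (one per jet). For independent, Poisson-distributed events,
    Cov(N_i, N_j) is estimated by the sum over events of n_i * n_j.
    """
    V = np.zeros((n_bins, n_bins))
    for bins in event_bin_indices:
        n = np.bincount(np.asarray(bins, dtype=int), minlength=n_bins)  # jets per bin in this event
        V += np.outer(n, n)
    return V


events = [[0, 1], [0], [1, 1]]  # toy events: bin index of each jet
V = multicount_covariance(events, n_bins=2)
print(V)
```

With one entry per event the matrix would be diagonal and equal to the entry counts; here the two-jet events add off-diagonal terms and inflate the diagonal beyond the plain count (5 instead of 3 entries in the second bin).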

Fig. 8.25

Covariance matrix from measurement. The large sectors correspond to the flavour bins, the cells to the rapidity bins and the small matrices to the \(p_T\) bins. The level of transparency denotes the magnitude of the content in arbitrary units. All entries are positive. Off-diagonal entries show correlations among jets coming from the same events

Fig. 8.26

Total covariance matrix after unfolding procedure. The large sectors correspond to the flavour bins, the cells to the rapidity bins and the small matrices to the \(p_T\) bins. The level of transparency denotes the magnitude of the content in arbitrary units. The positive (negative) entries are coloured in blue (red)

8.4.3 Additional Checks

We present here some additional checks to validate the unfolding, systematically comparing the results obtained with the D’Agostini and Tikhonov regularisations.

8.4.3.1 Backfolding

The backfolding consists of applying the PM to the particle-level spectrum:

$$\begin{aligned} \mathbf {y'} = \mathbf {A} \mathbf {x} \end{aligned}$$
(8.23)

The backfolded spectrum \(\mathbf {y'}\) can be compared with the measurement \(\mathbf {y}\). The difference is expected to be of the order of the statistical fluctuations; however, since the backfolded spectrum still carries the imprint of the regularisation (either from the MC prior with D’Agostini or from the \(\mathbf {L}\) matrix with Tikhonov), some residual deviations are also expected.
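A minimal sketch of this comparison, assuming the uncertainties of the measurement are taken from the diagonal of its covariance matrix; `backfold_summary` and the toy numbers are hypothetical. The ratio corresponds to the presentation with the measurement at one, and the pulls express the difference in units of the statistical uncertainty.

```python
import numpy as np


def backfold(A, x):
    """Eq. 8.23: fold the particle-level spectrum back to detector level."""
    return A @ x


def backfold_summary(A, x, y, V_y):
    """Ratio to the measurement and per-bin pulls of the backfolded spectrum."""
    y_back = backfold(A, x)
    sigma = np.sqrt(np.diag(V_y))  # statistical uncertainty of the measurement
    return y_back / y, (y_back - y) / sigma


A = np.array([[0.9, 0.1], [0.1, 0.9]])
x = np.array([100.0, 100.0])  # e.g. an unfolded spectrum
y = np.array([110.0, 90.0])   # measurement
ratio, pull = backfold_summary(A, x, y, np.diag(y))
print(ratio, pull)
```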

The backfolding after the two algorithms is shown in Fig. 8.27. The curves agree with the measurement within the statistical uncertainties, both for \(\hat{n}\) (top) and \(\hat{b}\) jets (bottom). The remaining fluctuations are similar for the two backfolded spectra (obtained with the different algorithms) and for the simulation, and give an estimate of the effect of the regularisation.

Fig. 8.27

The backfolding is compared for Tikhonov (blue) and D’Agostini (yellow) algorithms with the measurement (black, at one) and with pythia 8 using CSVv2 (red). Different numbers of iterations are shown for D’Agostini. The rows (columns) correspond to the flavour (rapidity)

8.4.3.2 \(\chi ^2\) of Agreement

The \(\chi ^2\) of agreement is defined in Eq. 8.10. It is shown on the left-hand side of Fig. 8.28 for the different numbers of iterations and for the unfolding obtained with Tikhonov. One observes that the D’Agostini unfolding converges to a value close to that of Tikhonov.

8.4.3.3 \(\chi ^2\) of Change

The \(\chi ^2\) of change is defined as follows:

$$\begin{aligned} \chi ^2&= (\mathbf {x}_i - \mathbf {x}_{i-1})^\intercal \mathbf {V}^{-1} (\mathbf {x}_i - \mathbf {x}_{i-1}) \end{aligned}$$
(8.24)

where i denotes the iteration. It is shown on the right-hand side of Fig. 8.28 for the change from \(2^i\) to \(2^{i+1}\) iterations. The change becomes smaller and smaller, indicating convergence.
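To illustrate how the iterations and the \(\chi ^2\) of change interact, the following sketch implements the iterative Bayesian update of Ref. [7] on a toy 3-bin problem and evaluates Eq. 8.24 between \(2^i\) and \(2^{i+1}\) iterations. The covariance \(\mathbf {V}\) of Eq. 8.24 is not specified here, so a diagonal Poisson-like matrix is assumed; the response matrix, measurement and flat prior are hypothetical.

```python
import numpy as np


def dagostini(y, A, prior, n_iter):
    """Iterative Bayesian (D'Agostini) unfolding, a sketch following Ref. [7].

    A[j, i] is the probability for particle-level bin i to be measured in
    detector-level bin j; its column sums are the efficiencies.
    """
    eff = A.sum(axis=0)
    x = np.asarray(prior, dtype=float)
    for _ in range(n_iter):
        folded = A @ x  # expected detector-level spectrum for the current estimate
        # Bayes' theorem: P(i | j) = A[j, i] x_i / folded_j, then correct for efficiency.
        x = x / eff * (A.T @ (y / folded))
    return x


def chi2_change(x_new, x_old, V):
    """Eq. 8.24 for a user-supplied covariance V."""
    d = x_new - x_old
    return float(d @ np.linalg.solve(V, d))


A = np.array([[0.7, 0.2, 0.0],
              [0.3, 0.6, 0.3],
              [0.0, 0.2, 0.7]])
y = np.array([820.0, 740.0, 300.0])      # hypothetical measurement
prior = np.array([600.0, 600.0, 600.0])  # hypothetical flat MC prior
results = {n: dagostini(y, A, prior, n) for n in [1, 2, 4, 8, 16, 32, 64]}
iters = sorted(results)
for n_old, n_new in zip(iters, iters[1:]):
    V = np.diag(results[n_new])  # assumption: Poisson-like diagonal covariance
    print(f"{n_old} -> {n_new}: chi2 = {chi2_change(results[n_new], results[n_old], V):.3g}")
```

With unit efficiencies, as in this toy matrix, each iteration conserves the total number of entries, and the true spectrum is a fixed point of the update when the data are exactly the folded truth.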

Fig. 8.28

\(\chi ^2\)s of agreement (left) and of change (right). The bins correspond to different numbers of iterations in D’Agostini unfolding. The result obtained with D’Agostini (Tikhonov) unfolding is shown by a continuous (dashed) line

8.4.3.4 Bottom Line Test

In the Bottom Line Test (BLT), we compare the agreement of simulation and data before unfolding, after unfolding and after backfolding by computing the following \(\chi ^2\): 

$$\begin{aligned} \chi ^2_\text {BLT}&= \left( \mathbf {z}_\text {data} - \mathbf {z}_\text {MC} \right) ^\intercal \mathbf {V}^{-1}_\text {data} \left( \mathbf {z}_\text {data} - \mathbf {z}_\text {MC} \right) \end{aligned}$$
(8.25)

where \(\mathbf {z}=\mathbf {y}\) (before unfolding) or \(\mathbf {z}=\mathbf {x}\) (after unfolding), with the corresponding data covariance matrix. The values are compared for the Tikhonov algorithm and for different numbers of iterations of the D’Agostini algorithm. If the unfolding is performed correctly, i.e. if only the effect of the detector is corrected for, the agreement is not expected to change significantly between the different levels.
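Eq. 8.25 is a generalised \(\chi ^2\) built with the data covariance only. A minimal sketch, assuming the covariance may be close to singular after unfolding (hence the optional pseudo-inverse, a choice of this sketch and not necessarily of the analysis); the function name and the numbers are hypothetical.

```python
import numpy as np


def chi2_blt(z_data, z_mc, V_data, pseudo_inverse=False):
    """Eq. 8.25: chi^2 between data and MC using the data covariance only.

    Set pseudo_inverse=True if V_data is (nearly) singular, which can happen
    for unfolded covariance matrices (an assumption of this sketch).
    """
    d = np.asarray(z_data, dtype=float) - np.asarray(z_mc, dtype=float)
    if pseudo_inverse:
        return float(d @ np.linalg.pinv(V_data) @ d)
    # Solving the linear system avoids forming the inverse explicitly.
    return float(d @ np.linalg.solve(V_data, d))


# Detector level: measurement vs. simulation with a Poisson-like covariance.
y_data = np.array([105.0, 48.0, 20.0])
y_mc = np.array([100.0, 50.0, 22.0])
print(chi2_blt(y_data, y_mc, np.diag(y_data)))
```

The same function applies at particle level by passing the unfolded spectrum, the MC truth and the unfolded covariance matrix.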

The test performed with the pythia 8 sample can be seen in Fig. 8.29, where the values of Eq. 8.25 are shown in bins of the number of iterations of the D’Agostini unfolding, and as a single line for the Tikhonov unfolding:

  • The BLT of the unfolding is shown on the left-hand side of the figure. Increasing the number of iterations does not improve the global agreement of the result obtained with the D’Agostini algorithm. Moreover, the result obtained with D’Agostini has larger uncertainties and therefore leads to a lower \(\chi ^2_\text {BLT}\) than the Tikhonov algorithm.

  • The BLT of the backfolding is shown on the right-hand side of the figure. In contrast to the BLT at particle level, \(\chi ^2_\text {BLT}\) varies with the number of D’Agostini iterations, which is likely related to the treatment of the uncertainties. However, it tends towards values of the same order as those obtained with the Tikhonov algorithm and with the measurement. The fact that the backfolding after the Tikhonov algorithm has a lower \(\chi ^2_\text {BLT}\) than the measurement is explained by the regularisation: unlike the measurement, the backfolded spectrum is still regularised.

Fig. 8.29

Bottom Line Test, on the left (right) at hadron level (detector level) for the unfolding (backfolding). The iteration value corresponds to the number of iterations used in the D’Agostini unfolding (continuous line), while the Tikhonov unfolding (dashed line) and the measurement (dashed line) have only one value


Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Connor, P.L.S. (2019). Analysis at Particle Level. In: Inclusive b Jet Production in Proton-Proton Collisions. Springer Theses. Springer, Cham. https://doi.org/10.1007/978-3-030-34383-5_8
