Just as Chap. 1 is not meant to be a complete introduction to electroweak symmetry breaking but is aimed at introducing the aspects of Higgs physics most relevant to the LHC, this section cannot cover the entire field of QCD. Instead, we will focus on QCD as it impacts LHC physics and searches for new physics at the LHC, like for example the Higgs searches discussed in the first part of the lecture.

In Sect. 2.1 we will introduce the most important process at the LHC, the Drell–Yan process or lepton pair production. This process will lead us through all of the introduction into QCD. Ultraviolet divergences and renormalization we will only mention in passing, to get an idea how much of the treatment of ultraviolet and infrared divergences works the same way. After discussing in detail infrared divergences in Sects. 2.3–2.5 we will spend some time on modern approaches to combining QCD matrix element calculations at leading order and next-to-leading order in perturbative QCD with parton showers. This last part is fairly close to current research with the technical details changing rapidly. Therefore, we will rely on toy models to illustrate the different approaches.

2.1 Drell–Yan Process

Most text books on QCD start from a very simple class of QCD processes, namely deep inelastic scattering. These are processes with the HERA initial state \(e^\pm p.\) The problem with this approach is that for the LHC era such processes are not very useful. What we would like to understand instead are processes of the kind \(pp \to W\!\mathop{+}\hbox{jets}, pp \to H\!\mathop{+}\hbox{jets}, pp \to t\bar{t}\!\mathop{+}\hbox{jets},\) or the production of new particles with or without jets. These kinds of signal and background processes and their relevance in an LHC analysis we already mentioned in Sect. 1.4.3.

From a QCD perspective such processes are very complex, so we need to step back a little and start with a simple question: we know how to compute the production rate and distributions for photon and Z production for example at LEP, \(e^+e^- \to \gamma,Z \to \ell^+ \ell^-.\) What is the production rate for this final state at the LHC, how do we account for quarks inside the protons, and what are the best-suited kinematic variables to use at a hadron collider?

2.1.1 Gauge Boson Production

The simplest question we can ask at the LHC is: How do we compute the production of a weak gauge boson? This process is referred to as Drell–Yan production. In our first attempts we will explicitly not care about additional jets, so we assume the proton to consist of quarks and gluons and simply compute the process \(q \bar{q} \to \gamma,Z\) under the assumption that the quarks are partons inside protons. Gluons do not couple to electroweak gauge bosons, so we only have to consider valence quark versus sea anti-quark scattering in the initial state. Modulo the \(SU(2)\) and \(U(1)\) charges which describe the \(Zf \bar{f}\) and \(\gamma \bar{f} f\) couplings in the Feynman rules

$$\begin{array}{lr}\fbox{$- i \gamma^{\mu} \left( \ell {\mathbb{P}}_{L} + r{\mathbb{P}}_{R} \right)$} \quad \hbox{with}\, \ell = \frac{e}{s_{\it w} c_{\it w}} \left( T_3 - 2 Q s_{\it w}^2 \right) \quad\quad &r = \ell \Big|_{T_3=0} \quad ({\it Zf} \bar{\it f}) \\ \ell = r = Q e &(\gamma {\it f} \bar{\it f}),\\ \end{array}$$
(2.1)

with \(T_3 = \pm 1/2,\) the matrix element and the squared matrix element for the partonic process

$$ q \bar{q} \to \gamma,\!Z $$
(2.2)

will be the same as the corresponding matrix element squared for \(e^+ e^- \to \gamma,\!Z,\) with an additional color factor. The general amplitude for massless fermions is

$$ {\fancyscript{M}}= - i \bar{{\it v}}(k_2) \gamma^\mu \left( \ell {{\mathbb{P}}_L} + r {{\mathbb{P}}_R} \right) u(k_1) \varepsilon_\mu. $$
(2.3)

Massless fermions are a good approximation at the LHC for all particles except for the top quark. For the bottom quark we need to be careful with some aspects of this approximation, but the first two generation quarks and all leptons are usually assumed to be massless in LHC simulations. Once we arrive at infrared divergences in LHC cross sections we will specifically discuss ways of regulating them without introducing masses.

Squaring the matrix element in Eq. 2.3 means adding the same structure once again, just walking through it in the opposite direction. Luckily, we do not have to care about factors of \((-i)\) since we are only interested in the absolute value squared. Because the chiral projectors \({\mathbb{P}}_{L,R} = (\hbox{1\hspace{-3pt}I} \mp \gamma_5)/2\) are real and \(\gamma_5^T=\gamma_5\) is symmetric the left and right-handed gauge boson vertices described by the Feynman rules in Eq. 2.1 do not change under transposition, so for the production of a massive Z boson we obtain

$$ \begin{aligned}[b] \left|{\fancyscript{M}} \right|^2 &= \sum_{\rm spin,pol,color} \bar{u}(k_1) \gamma^{\it v} \left( \ell {{\mathbb{P}}_L} + r {{\mathbb{P}}_R} \right) {\it v}(k_2) \bar{{\it v}}(k_2) \gamma^\mu\\ & \quad \times \left( \ell {{\mathbb{P}}_L} + r {{\mathbb{P}}_R} \right) u(k_1) \varepsilon_\mu \varepsilon^*_{\it v}\\ &= N_c \hbox{Tr}\big[{/\!\!\!k}_1 \gamma^{\it v} \left(\ell{{\mathbb{P}}_L} + r {{\mathbb{P}}_R} \right) {/\!\!\!k}_2 \gamma^\mu\\ &\quad \times \left( \ell {{\mathbb{P}}_L} + r {{\mathbb{P}}_R} \right) \big]\left( - g_{\mu {\it v}} + \frac{q_\mu q_{\it v}}{m_Z^2} \right) \quad \hbox {in unitary gauge} \\ &= N_c \hbox{Tr}\left[\,\,{/\!\!\!k}_1 \gamma^{\it v} \left( \ell {{\mathbb{P}}_L} + r {{\mathbb{P}}_R} \right) \right. \\ &\quad \times\left.\left( \ell {{\mathbb{P}}_L} + r {{\mathbb{P}}_R} \right) {/\!\!\!k}_2 \gamma^\mu \right] \left( - g_{\mu {\it v}} + \frac{q_\mu q_{\it v}}{m_Z^2} \right) \quad \hbox{with}\, \{ \gamma_\mu,\gamma_5 \}= 0 \\ &= N_c \hbox{Tr}\left[ {/\!\!\!k}_1 \gamma^{\it v} \left( \ell^2 \frac{\hbox{1\hspace{-3pt}I}}{2} + r^2\frac{\hbox{1\hspace{-4pt}I}}{2} \right) {/\!\!\!k}_2 \gamma^\mu \right]\\ &\quad \times\left( - g_{\mu {\it v}} +\frac{q_\mu q_{\it v}} {m_Z^2} \right) \quad \hbox{symmetric polarization sum} \\ &= \frac{N_c}{2} \left( \ell^2 + r^2 \right) \hbox{Tr}\left[ {/\!\!\!k}_1 \gamma^{\it v} {/\!\!\!k}_2 \gamma^\mu \right]\left( - g_{\mu {\it v}} + \frac{q_\mu q_{\it v}}{m_Z^2} \right)\\ &= 2 N_c \left( \ell^2 + r^2 \right) \left[ k_1^\mu k_2^{\it v} + k_1^{\it v} k_2^\mu - (k_1 k_2) g^{\mu {\it v}} \right]\\ & \quad \times\left( - g_{\mu {\it v}} + \frac{q_\mu q_{\it v}}{m_Z^2} \right)\\ &= 2 N_c \left( \ell^2 + r^2 \right) \left[\vphantom{\left.+ 2 \frac{(-k_1 k_2)^2}{m_Z^2} - \frac{(k_1 k_2)q^2}{m_Z^2} \right]} - 2 (k_1 k_2) + 4 (k_1 k_2)\right.\\ &\quad \left.+\,\,2 \frac{(-k_1 k_2)^2}{m_Z^2} - \frac{(k_1 k_2)q^2}{m_Z^2} \right] \quad \hbox{with}\,\,(q k_1) = - (k_1 k_2) \\ &= 2 N_c \left( \ell^2 + r^2 \right) \left[2 (k_1 k_2)+ \frac{q^4}{2m_Z^2} - \frac{q^4}{2m_Z^2} \right] \hbox {with}\,\,q^2 = (k_1 + k_2)^2 \\ &= 2 N_c \left( \ell^2 + r^2 \right) q^2\\ &= 2 N_c \left( \ell^2 + r^2 \right) m_Z^2\\ \end{aligned} $$
(2.4)

The momenta are \(k_1\) and \(k_2\) for the massless incoming (anti-) fermions and \(q\,{=}\,-k_1-k_2\) for the outgoing gauge boson. The color factor \(N_c\) accounts for the number of \(SU(3)\) states which can be combined to form a color singlet like the Z.

An interesting aspect coming out of our calculation is that the \(1/m_Z\)-dependent terms in the polarization sum do not contribute—as far as the matrix element squared is concerned the Z boson could as well be transverse. This reflects the fact that the Goldstone modes do not couple to massless fermions, just like the Higgs boson. It also means that we obtain the corresponding matrix element squared for photon production simply by not replacing \(q^2 \to m_Z^2\) in the last step.

What is still missing is an averaging factor for initial-state spins and colors, only the sum is included in Eq. 2.4. For incoming electrons as well as incoming quarks this factor \(K_{ij}\) includes \(1/4\) for the spins. Since we do not observe color in the initial state, and the color structure of the incoming \(q \bar{q}\) pair has no impact on the Z–production matrix element, we also average over the color. This means another factor \(1/N_c^2\) in the averaged matrix element, which altogether becomes (in four space–time dimensions)

$$ K_{ij} = \frac{1}{4 N_c^2} $$
(2.5)

In spite of our specific case in Eq. 2.4 looking that way, matrix elements we compute from our Feynman rules are not automatically numbers with mass dimension zero.

If for the invariant mass of the two quarks we introduce the Mandelstam variable \(s = (k_1+k_2)^2 = 2 (k_1 k_2),\) momentum conservation implies \(s = q^2 = m_Z^2.\) In four space–time dimensions (this detail will become important later) we can compute a total cross section from the matrix element squared, for example as given in Eq. 2.4, as

$$ \fbox{$s \dfrac{d \sigma}{d y} \Bigg|_{2 \to 1}=\dfrac{\pi}{(4 \pi)^2} K_{ij} \left( 1 - \tau \right) {\left|{\fancyscript{M}} \right|^2}$} \qquad \tau = \frac {m_Z^2}{s}. $$
(2.6)

The mass of the final state appears in \(\tau\), with \(\tau =0\) for a massless photon. It would be replaced by \(m_W,\) the Higgs mass, or the mass of a Kaluza–Klein graviton if needed.

From LEP we know that such a heavy gauge boson we do not actually observe at colliders. What we should really calculate is the production for example of a pair of fermions through an s-channel Z and \(\gamma,\) where the Z might or might not be on its mass shell. The matrix element for this process we can derive from the same Feynman rules in Eq. 2.1, now for an incoming fermion \(k_1,\) incoming anti-fermion \(k_2,\) outgoing fermion \(p_1\) and outgoing anti-fermion \(p_2.\) To make it easy to switch particles between initial and final states, we can define all momenta as incoming, so momentum conservation means \(k_1+k_2+p_1+p_2=0.\) The additional Mandelstam variables we need to describe this \((2 \to 2)\) process are \(t=(k_1+p_1)^2<0\) and \(u=(k_1+p_2)^2<0,\) as usual with \(s+t+u=0\) for massless final state particles. The \((2 \to 2)\) matrix element for the two sets of incoming and outgoing fermions becomes

$$ \begin{aligned}[b] {\fancyscript{M}}&= (- i)^2 \bar{u}(p_1) \gamma^{\it v} \left( \ell^{\prime}{{\mathbb{P}}_L} + r^{\prime} {{\mathbb{P}}_R} \right) {\it v}(p_2) \\ &\quad\times \bar{{\it v}}(k_2) \gamma^\mu \left( \ell {{\mathbb{P}}_L} + r {{\mathbb{P}}_R} \right) u(k_1) \frac{i}{q^2-m_Z^2} \left( - g_{\mu {\it v}} + \frac{q_\mu q_{\it v}}{m_Z^2} \right). \end{aligned} $$
(2.7)

The couplings to the gauge boson are \(\ell\) and r for the incoming quarks and \(\ell^{\prime}\) and \(r^{\prime}\) for the outgoing leptons. When we combine the four different spinors and their momenta correctly the matrix element squared factorizes into twice the trace we have computed before. The corresponding picture is two fermion currents interacting with each other through a gauge boson. All we have to do is combine the traces properly. If the incoming trace includes the indices \(\mu\) and \({\it v}\) and the outgoing trace the indices \(\rho\) and \(\sigma,\) the Z boson links \(\mu\) and \(\rho\) as well as \({\it v}\) and \(\sigma.\)

To make the results a little more compact we continue computing this process for a massless photon instead of the Z boson, i.e. for the physical scenario where the initial-state fermions do not have enough energy to excite the intermediate Z boson. The specific features of an intermediate massive Z boson we postpone to Sect. 2.1.2. The assumption of a massless photon simplifies the couplings to \((\ell^2 + r^2) = 2Q^2 e^2\) and the polarization sums to \(-g_{\mu {\it v}}\!\!:\)

$$ \begin{aligned}[b] {\left|{\fancyscript{M}} \right|^2} &= 4 N_c (2 Q^2 e^2) (2 {Q^{\prime}}^2 e^2) \frac{1}{q^4} \left[ k_1^\mu k_2^{\it v} + k_1^{\it v} k_2^\mu - (k_1 k_2) g^{\mu {\it v}} \right] \left( - g_{\mu \rho} \right)\\ & \times\left[ p_1^\rho p_2^\sigma + p_1^\sigma p_2^\rho - (p_1 p_2) g^{\rho \sigma} \right] \left( - g_{{\it v} \sigma} \right)\\ &=\frac{16 N_c Q^2 {Q^{\prime}}^2 e^4}{q^4} \left[ k_{1 \rho} k_2^{\it v} + k_1^{\it v} k_{2 \rho} - (k_1 k_2) g_\rho^{\it v} \right] \left[ p_1^\rho p_{2 {\it v}} + p_{1 {\it v}} p_2^\rho - (p_1 p_2) g^\rho_{\it v} \right] \\ &= \frac{16 N_c Q^2 {Q^{\prime}}^2 e^4}{q^4} \Big[ 2 (k_1 p_1) (k_2 p_2) + 2 (k_1 p_2) (k_2 p_1) \\ & - 2 (k_1 k_2) (p_1 p_2) - 2 (k_1 k_2) (p_1 p_2) + 4 (k_1 k_2) (p_1 p_2) \Big] \\ &= \frac{32 N_c Q^2 {Q^{\prime}}^2 e^4}{q^4} \left[ (k_1 p_1) (k_2 p_2) + (k_1 p_2) (k_2 p_1) \right]\\ &= \frac{32 N_c Q^2 {Q^{\prime}}^2 e^4}{s^2} \left[\frac{t^2}{4}+\frac{u^2}{4}\right]\\ &= \frac{8 N_c Q^2 {Q^{\prime}}^2 e^4}{s^2}\left[ s^2 +2st + 2 t^2 \right]\\ &= 8 N_c Q^2 {Q^{\prime}}^2 e^4 \left[ 1 + 2 \frac{t}{s}+2\frac{t^2}{s^2}\right] \end{aligned} $$
(2.8)

We can briefly check if this number is indeed positive, using the definition of the Mandelstam variable t for massless external particles in terms of the polar angle \(t = s (-1 + \cos \theta)/2 = -s \cdots 0{:}\) the upper phase space boundary \(t=0\) inserted into the brackets in Eq. 2.8 gives \([\cdots]=1,\) just as the lower boundary \(t=-s\) with \([\cdots]=1-2+2=1\). For the central value \(t=-s/2\) the minimum value of the brackets is \([\cdots]=1 -1 + 0.5 = 0.5.\)

The azimuthal angle \(\phi\) plays no role at colliders, unless you want to compute gravitational effects on Higgs production at ATLAS and CMS. Any LHC Monte Carlo will either randomly generate a reference angle \(\phi\) for the partonic process or pick one and keep it fixed.

The two-particle phase space integration for massless particles then gives us

$$ \fbox{$s^2 \dfrac{d \sigma}{d t}\Bigg|_{2 \to 2} = \dfrac{\pi}{(4 \pi)^2} K_{ij} {\left|{\fancyscript{M}} \right|^2}$} \qquad t = \frac{s}{2}\left( -1 + \cos \theta \right)\!. $$
(2.9)

For our Drell–Yan process we then find the differential cross section in four space–time dimensions, using \(\alpha = e^2/(4 \pi)\)

$$ \begin{aligned} [b] \frac{d \sigma}{d t} &= \frac{1}{s^2} \frac{\pi}{(4 \pi)^2} \frac{1}{4 N_c} 8 Q^2 {Q^{\prime}}^2 (4 \pi \alpha)^2 \left( 1 + 2 \frac{t}{s} + 2 \frac{t^2}{s^2}\right)\\ &= \frac{1}{s^2}\frac{2 \pi\alpha^2}{N_c}Q^2 {Q^{\prime}}^2 \left(1+2 \frac{t}{s}+2 \frac{t^2}{s^2}\right)\!{,} \end{aligned} $$
(2.10)

which we can integrate over the polar angle or the Mandelstam variable t to compute the total cross section

$$ \begin{aligned} [b] \sigma &= \frac{1}{s^2} \frac{2 \pi\alpha^2}{N_c} Q^2 {Q^{\prime}}^2 \int\nolimits_{-s}^0 dt \left( 1 + 2 \frac{t}{s} + 2 \frac{t^2}{s^2}\right)\\ &=\frac{1}{s^2}\frac{2 \pi\alpha^2}{N_c} Q^2 {Q^{\prime}}^2 \left[ t + \frac{t^2}{s}+\frac{2t^3}{3s^2} \right]_{-s}^{0}\\ &=\frac{1}{s^2}\frac{2 \pi\alpha^2}{N_c} Q^2 {Q^{\prime}}^2 \left( s - \frac{s^2}{s}+\frac{2s^3}{3s^2}\right)\\ &= \frac{1}{s} \frac{2 \pi\alpha^2}{N_c} Q^2 {Q^{\prime}}^2 \frac{2}{3} \quad \Rightarrow \quad \fbox{$\sigma(q\bar{q} \to \ell^+ \ell^-) \Bigg|_{\rm QED}= \dfrac{4 \pi\alpha^2}{3 N_c s}Q_\ell^2 Q_q^2$} \end{aligned} $$
(2.11)
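As a quick cross-check we can let a computer algebra system redo the t integration and the prefactors of Eq. 2.11. The following sketch uses sympy; all symbol names are of course just chosen for this illustration.

```python
# Symbolic check of Eqs. 2.10 and 2.11: integrate the angular bracket over t.
import sympy as sp

s, t, alpha, Q, Qp, Nc = sp.symbols('s t alpha Q Qprime N_c', positive=True)

bracket = sp.integrate(1 + 2*t/s + 2*t**2/s**2, (t, -s, 0))          # -> 2*s/3
sigma = sp.simplify(2*sp.pi*alpha**2/(Nc*s**2) * Q**2 * Qp**2 * bracket)

print(bracket)   # 2*s/3
print(sigma)     # 4*pi*alpha**2*Q**2*Qprime**2/(3*N_c*s), as in Eq. 2.11
```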

As a side remark—in the history of QCD, the same process but read right-to-left played a crucial role, namely the production rate of quarks in \(e^+ e^-\) scattering. For small enough energies we can neglect the Z exchange contribution. At leading order we can then compute the corresponding production cross sections for muon pairs and for quark pairs in \(e^+ e^-\) collisions. Moving the quarks into the final state means that we do not average over the color in the initial state, but sum over all possible color combinations, which in Eq. 2.9 gives us an averaging factor \(K_{ij} = 1/4.\) Everything else stays the same as for the Drell–Yan process

$$ \fbox{$R \equiv \dfrac{\sigma(e^+ e^- \to {\rm hadrons})}{\sigma(e^+ e^- \to \ell^+ \ell^-)}$}=\frac{\sum_{\rm quarks} \frac{4 \pi\alpha^2 N_c} {3 s} Q_e^2 Q_q^2}{\frac{4 \pi\alpha^2}{3 s} Q_e^2 Q_\ell^2}= N_c \left( 3 \frac{1}{9}+2\frac{4}{9} \right) \mathop{=} \frac{11 N_c}{9} , $$
(2.12)

for example for five quark flavors where the top quark is too heavy to be produced at the given \(e^+ e^-\) collider energy. For those interested in the details we did take one shortcut: hadrons are also produced in the hadronic decays of \(e^+ e^- \to \tau^+ \tau^-\) which we strictly speaking need to subtract. This way, R as a function of the collider energy is a beautiful measurement of the electric and color charges of the quarks in QCD.
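The number in Eq. 2.12 is trivial to reproduce numerically, for example with the following lines, where the five light quark charges are put in by hand:

```python
# Leading-order R ratio of Eq. 2.12 for five active flavors (d, u, s, c, b).
charges = [-1/3, 2/3, -1/3, 2/3, -1/3]
Nc = 3
print(Nc * sum(Q**2 for Q in charges))   # 11*Nc/9 = 3.67 for Nc = 3
```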

2.1.2 Massive Intermediate States

Before we move on to describing incoming quarks inside protons we should briefly consider the second Feynman diagram contributing to the Drell–Yan production rate in Eq. 2.11, the on-shell or off-shell Z boson

$$ |{\fancyscript{M}}|^2 = |{\fancyscript{M}}_\gamma + {\fancyscript{M}}_Z |^2=|{\fancyscript{M}}_\gamma |^2 + |{\fancyscript{M}}_Z |^2 + 2 \hbox {Re}\, {\fancyscript{M}}_\gamma {\fancyscript{M}}_Z^{*}. $$
(2.13)

Interference occurs in phase space regions where for both intermediate states the invariant masses of the muon pair are the same. For the photon the on-shell pole is not a problem. It has zero mass, which means that we hit the pole \(1/q^2\) in the matrix element squared only in the limit of zero incoming energy. Strictly speaking we never hit it, because the energy of the incoming particles has to be large enough to produce the final state particles with their tiny but finite masses and with some kind of momentum driving them through the detector.

A problem arises when we consider the intermediate Z boson. In that case, the propagator contributes as \({\left|{\fancyscript{M}} \right|^2} \propto 1/(s-m_Z^2)^2\) which diverges for on-shell Z bosons. Before we can ask what such a pole means for LHC simulations we have to recall how we deal with it in field theory. There, we encounter the same issue when we solve for example the Klein–Gordon equation. The Green function for a field obeying this equation is the inverse of the Klein–Gordon operator

$$ \left(\square + m^2 \right) G(x-x^{\prime})=\delta^4 (x-x^{\prime}) . $$
(2.14)

Fourier transforming \(G(x-x^{\prime})\) into momentum space we find

$$ \begin{aligned}[b] G(x-x^{\prime}) &= \int \frac{d^4 q}{(2 \pi)^4} e^{-i q \cdot (x-x^{\prime})} \tilde{G}(q)\\ \left( \square + m^2 \right) G(x-x^{\prime}) &= \int \frac{d^4 q}{(2 \pi)^4} \left( \square + m^2 \right) e^{-i q \cdot (x-x^{\prime})} \tilde{G}(q)\\ &=\int \frac{d^4 q}{(2 \pi)^4}\left( (i q)^2 + m^2 \right) e^{-i q \cdot (x-x^{\prime})} \tilde{G}(q)\\ &=\int \frac{d^4 q}{(2 \pi)^4}e^{-i q \cdot (x-x^{\prime})} \left( -q^2 + m^2 \right) \tilde{G}(q)\\ &{\stackrel{!}{=}}\delta^4 (x-x^{\prime}) = \int \frac{d^4 q}{(2 \pi)^4} e^{-i q \cdot (x-x^{\prime})}\\ \Leftrightarrow \quad ( -q^2 + m^2 )& \tilde{G}(q) = 1 \quad \Leftrightarrow \quad \tilde{G}(q) \sim - \frac {1}{q^2 - m^2}. \end{aligned} $$
(2.15)

The problem with the Green function in momentum space is that as an inverse it is not defined for \(q^2 = m^2.\) We usually avoid this problem by slightly shifting this pole following the Feynman i \(\varepsilon\) prescription to \(m^2 \to m^2 - i \varepsilon,\) or equivalently deforming our integration contours appropriately. The sign of this infinitesimal shift we need to understand because it will become relevant for phenomenology when we introduce an actual finite decay width of intermediate states.

In the Feynman \(i \varepsilon\) prescription the sign is crucial to correctly complete the \(q_0\) integration of the Fourier transform in the complex plane

$$\begin{aligned}[b] &\int\nolimits_{-\infty}^\infty d q_0 \frac{e^{-i q_0 x_0}}{q^2-m^2+i \varepsilon}\\ & \quad = \left( \theta(x_0) + \theta(-x_0) \right) \int\nolimits_{-\infty}^\infty d q_0 \frac{e^{-i q_0 x_0}}{q_0^2- (\omega^2-i \varepsilon )} \quad \hbox{with} \quad \omega^2 = {\vec q}^2 + m^2\\ &\quad = \left( \theta(x_0) + \theta(-x_0) \right) \int\nolimits_{-\infty}^\infty d q_0 \frac{e^{-i q_0 x_0}}{(q_0-\sqrt{\omega^2-i \varepsilon})(q_0+\sqrt{\omega^2-i \varepsilon})}\\ &\quad = \left( \theta(x_0) \oint\nolimits_{C_2} + \theta(-x_0) \oint\nolimits_{C_1} \right) dq_0 \frac{e^{-i q_0 x_0}}{(q_0 -\omega (1-i \varepsilon^{\prime}))(q_0 + \omega (1-i \varepsilon^{\prime}))}, \end{aligned} $$
(2.16)

defining \(\varepsilon^{\prime}=\varepsilon/(2 \omega^2).\) In the last step we close the integration contour along the real \(q_0\) axis in the complex \(q_0\) plane. Because the integrand has to vanish for large \(q_0,\) we have to make sure the exponent \(-i x_0 i \hbox {Im} q_0 = x_0 \hbox {Im} q_0\) is negative. For \(x_0>0\) this means \(\hbox {Im} q_0 <0\) and vice versa. This argument forces \(C_1\) to close for positive and \(C_2\) for negative imaginary parts in the complex \(q_0\) plane.

The contour integrals we can solve using Cauchy’s formula , keeping in mind that the integrand has two poles at \(q_0 = \pm \omega (1 - i \varepsilon^{\prime}).\) They lie in the upper (lower) half plane for negative (positive) real parts of \(q_0.\) The contour \(C_1\) through the upper half plane includes the pole at \(q_0 \sim -\omega\) while the contour \(C_2\) includes the pole at \(q_0 \sim \omega,\) all assuming \(\omega > 0\):

$$ \begin{aligned}[b] \int\nolimits_{-\infty}^\infty d q_0 \frac{e^{-i q_0 x_0}}{q^2-m^2+i \varepsilon} &= 2 \pi i \!\!\left[ \theta(x_0) \frac{(-1) e^{-i \omega x_0}}{\omega + \omega (1-i \varepsilon^{\prime})} + \theta(-x_0) \frac{e^{i \omega x_0}} {-\omega - \omega (1-i \varepsilon^{\prime})} \right] \\ &\mathop = \limits^{{\varepsilon^{\prime} \to 0}} - i \frac{\pi}{\omega}\left[ \theta(x_0) e^{-i \omega x_0} + \theta(-x_0) e^{i \omega x_0} \right]\!{.} \end{aligned} $$
(2.17)

The factor \((-1)\) in the \(C_2\) integration arises because Cauchy’s integration formula requires us to integrate counter-clockwise, while going from negative to positive \(\hbox {Re} q_0\) the contour \(C_2\) is defined clockwise. Using this result we can complete the four–dimensional Fourier transform from Eq. 2.15

$$\begin{aligned} G(x) &= \int d^4 q e^{-i (q \cdot x)} \tilde{G}(q) \\ &= \int d^4 q \frac{e^{-i(q \cdot x)}}{q^2-m^2+i \varepsilon} \\ &= -i \pi \int d^3 {\vec q} e^{i\vec{q}\vec{x}} \frac{1} {{\omega}} \left[ {\theta}(x_0) e^{-i {\omega} x_0} + {\theta}(-x_0) e^{i {\omega} x_0} \right] \\ &= -i \pi \int d^4 q e^{i\vec{q}\vec{x}} \frac{1}{{\omega}} \left[{\theta}(x_0) e^{-i q_0 x_0} {\delta}(q_0 - {\omega}) + {\theta}(-x_0) e^{-i q_0 x_0} {\delta}(q_0 + {\omega}) \right] \\ &= -i \pi \int d^4 q e^{-i (q \cdot x)} \frac{1}{{\omega}} \left[{\theta}(x_0){\delta}({\omega} - q_0) + {\theta}(-x_0) {\delta}({\omega} + q_0) \right] \\ &= -i \pi \int d^4 q e^{-i (q \cdot x)} \frac{1}{{\omega}} 2 {\omega} \left[ {\theta}(x_0) {\delta}({\omega}^2 - q_0^2) + {\theta}(-x_0) {\delta}({\omega}^2 - q_0^2) \right] \\ &= -2 \pi i \int d^4 q e^{-i (q \cdot x)} \left[{\theta}(x_0) + {\theta}(-x_0) \right] {\delta}(q_0^2 - {\omega}^2) \\ &= -2 \pi i \int d^4 q e^{-i (q \cdot x)} \left[{\theta}(x_0) + {\theta}(-x_0) \right] {\delta}(q^2 - m^2), \end{aligned} $$
(2.18)

with \(q_0^2 - \omega^2 = q^2 - m^2.\) This is exactly the usual decomposition of the propagator function \(\Delta_F(x) = \theta(x_0) \Delta^+(x) + \theta(-x_0) \Delta^-(x)\) into positive and negative energy contributions.

Let us briefly recapitulate what would have happened if we instead had chosen the Feynman parameter \(\varepsilon < 0.\) All steps leading to the propagator function in Eq. 2.18 we summarize in Table 2.1. For the wrong sign of \(i \varepsilon\) the two poles in the complex \(q_0\) plane would be mirrored across the real axis. The solution with \(\hbox{Re}\,\,q_0 > 0\) would sit in the quadrant with \(\hbox{Im}\,\,q_0>0\) and the second pole at a negative real and imaginary part. To be able to close the integration path in the upper half plane in the mathematically positive direction this pole with positive real part would have to be matched up with \(\theta(-x_0).\) The residue in the Cauchy integral would now include a factor \(+1/(2 \omega).\) At the end, the two poles would give the same result as for the correct sign of \(i \varepsilon,\) except with a wrong overall sign.

Table 2.1 Contributions to the propagator function Eq. 2.18 for both signs of \(i \varepsilon\)

When we are interested in the kinematic distributions of on-shell massive states the situation is a little different. Measurements of differential distributions for example at LEP include information on the physical width of the decaying particle, which means we cannot simply apply the Feynman \(i \varepsilon\) prescription as if we were dealing with an asymptotically stable state. From the same couplings governing the Z decay, the Z propagator receives corrections, for example including fermion loops:

Such one-particle irreducible diagrams can occur in the same propagator repeatedly. Schematically written as a scalar they are of the form

$$ \begin{aligned}[b] \frac{i}{q^2-m_0^2+i \varepsilon} +&\frac{i}{q^2-m_0^2+i \varepsilon} (-i M^2) \dfrac{i}{q^2-m_0^2+i \varepsilon} \\ +& \dfrac{i}{q^2-m_0^2+i \varepsilon} (-i M^2) \dfrac{i}{q^2-m_0^2+i \varepsilon} (-i M^2) \dfrac{i}{q^2-m_0^2+i \varepsilon} + \cdots \\ =& \dfrac{i}{q^2-m_0^2+i \varepsilon} \sum_{n=0} \left( \dfrac{M^2}{q^2-m_0^2+i \varepsilon} \right)^n \\ =&\dfrac{i}{q^2-m_0^2+i \varepsilon} \dfrac{1}{1 - {\dfrac{M^2} {q^2-m_0^2+i \varepsilon}}} \quad \hbox {summing the geometric series} \\ =& \dfrac{i}{q^2-m_0^2+i \varepsilon-M^2}, \end{aligned} $$
(2.19)

where we denote the loop as \(M^2\) for reasons which will become obvious later. Requiring that the residue of the propagator be unity at the pole we renormalize the wave function and the mass in the corresponding process. For example for a massive scalar or gauge boson with a real correction \(M^2(q^2)\) this reads

$$ \fbox{$\dfrac{i}{q^2-m_0^2-M^2(q^2)} = \dfrac{iZ}{q^2-m^2} \quad \hbox{for}\,q^2 \sim m^2$}\,, $$
(2.20)

including a renormalized mass m and a wave function renormalization constant Z.

The important step in our argument is that in analogy to the effective ggH coupling discussed in Sect. 1.4.1 the one-loop correction \(M^2\) depends on the momentum flowing through the propagator. Above a certain threshold it can develop an imaginary part because the momentum flowing through the diagram is large enough to produce on-shell states in the loop. Just as for the ggH coupling such absorptive parts appear when a real decay like \(Z \to \ell^+ \ell^-\) becomes kinematically allowed. After splitting \(M^2(q^2)\) into its real and imaginary parts we know what to do with the real part: the solution to \(q^2 - m_0^2 - \hbox{Re} M^2(q^2) \stackrel{!}{=} 0\) defines the renormalized particle mass \(q^2 = m^2\) and the wave function renormalization Z. The imaginary part looks like the Feynman \(i \varepsilon\) term discussed before

$$ \begin{aligned}[b] \frac{i}{q^2-m_0^2+i \varepsilon-\hbox{Re} M^2(q^2)-i \hbox{Im} M^2} &= \frac{iZ}{q^2-m^2+i \varepsilon-i Z \hbox{Im} M^2} \\ & \equiv \frac{iZ}{q^2-m^2+i m \Gamma} \\ \Leftrightarrow \quad \Gamma &= - \frac{Z}{m} \hbox{Im} M^2(q^2=m^2), \end{aligned} $$
(2.21)

for \(\varepsilon \to 0\) and finite \(\Gamma \ne 0.\) The link between the self energy matrix element squared \(M^2\) and the partial width we can illustrate remembering one way to compute scalar integrals or one-loop amplitudes by gluing them together using tree-level amplitudes. The Cutkosky cutting rule discussed in Sect. 1.4.1 tells us schematically written \(\hbox{Im}\, M^{2} \sim M^{2} |_{\rm cut} \equiv \Gamma\) because cutting the one-loop bubble diagram at the one possible place is nothing but squaring the tree-level matrix element for the decay \(Z \to \ell^+ \ell^-.\) One thing that we need to keep track of, apart from the additional factor m due to dimensional analysis, is the sign of the \(i m \Gamma\) term which just like the \(i \varepsilon\) prescription is fixed by causality.

Going back to the Drell–Yan process \(q \bar{q} \to \ell^+ \ell^-\) we now know that for massive unstable particles the Feynman epsilon which we need to define the Green function for internal states acquires a finite value, proportional to the total width of the unstable particle. This definition of a propagator of an unstable particle in the s channel is what we need for the second Feynman diagram contributing to the Drell–Yan process: \(q\bar{q} \to Z^* \to \ell^+ \ell^-.\) The resulting shape of the propagator squared is a Breit–Wigner propagator

$$ \fbox{$\sigma(q\bar{q} \to Z \to \ell^+ \ell^-) \propto \left| \dfrac{1}{s-m_Z^2+i m \Gamma} \right|^2 = \frac{1}{(s-m_Z^2)^2 + m^2 \Gamma^2}$}\,. $$
(2.22)

When taking everything into account, the \((2 \to 2)\) production cross section also includes the squared matrix element for the decay \(Z \to \ell^+ \ell^-\) in the numerator. In the narrow width approximation , the \((2 \to 2)\) matrix element factorizes into the production process times the branching ratio for \(Z \to \ell^+ \ell^-,\) simply by definition of the Breit–Wigner distribution

$$ \lim_{\Gamma \to 0} \frac{\Gamma_{\ell \ell}}{(s-m_Z^2)^2 + m_Z^2 \Gamma_{\rm{tot}}^{2}} = \Gamma_{\ell \ell} \frac{\pi} {m_Z \Gamma_{\rm {tot}}} \delta(s-m_Z^2) \equiv \frac{\pi}{m_Z} \hbox{BR}(Z \to \ell \ell) \delta(s-m_Z^2). $$
(2.23)

The additional factor \(\pi\) will be absorbed in the different one-particle and two-particle phase space definitions. We immediately see that this narrow width approximation is only exact for scalar particles. It does not keep information about the structure of the matrix element, e.g. when a non-trivial structure of the numerator gives us the spin and angular correlations between the production and decay processes.
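To get a feeling for how fast the limit in Eq. 2.23 is approached we can integrate the Breit–Wigner shape numerically for decreasing widths. The following sketch uses scipy and an illustrative Z-like mass; the two printed numbers converge onto each other as the width shrinks.

```python
# Numerical illustration of the narrow width approximation, Eq. 2.23.
import numpy as np
from scipy.integrate import quad

m = 91.19                                # GeV, Z-like mass for illustration
for Gamma in [10.0, 2.5, 0.1]:           # decreasing total width in GeV
    bw = lambda s: 1.0 / ((s - m**2)**2 + m**2 * Gamma**2)
    integral, _ = quad(bw, 0.0, 10 * m**2, points=[m**2])
    print(Gamma, integral, np.pi / (m * Gamma))   # the two numbers converge
```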

Equation 2.23 uses a mathematical relation we might want to remember for life, and that is the definition of the one-dimensional Dirac delta distribution in three ways and including all factors of 2 and \(\pi\)

$$ \delta(x) = \int \frac{d q}{2 \pi} e^{-i x q} = \lim_{\sigma \to 0} \frac{1}{\sigma \sqrt{\pi}} e^{-x^2/\sigma^2} = \lim_{\Gamma \to 0} \frac{1}{\pi} \frac{\Gamma}{x^2 + \Gamma^2}. $$
(2.24)

The second distribution is a Gaussian and the third one we would refer to as a Breit–Wigner shape while most other people call it a Cauchy distribution .

Now, we know everything necessary to compute all Feynman diagrams contributing to muon pair production at a hadron collider. Strictly speaking, the two amplitudes interfere, so we end up with three distinct contributions: \(\gamma\) exchange, Z exchange and the \(\gamma -Z\) interference terms. They have the properties

  • For small energies the \(\gamma\) contribution dominates and can be linked to R.

  • On the Z pole the rate is regularized by the Z width and the Z contribution dominates over the photon.

  • In the tails of the Breit–Wigner distribution we expect \(Z - \gamma\) interference.

  • For large energies we are again dominated by the photon channel.

  • Quantum effects allow unstable particles like the Z to decay off-shell, defining a Breit–Wigner propagator.

  • In the limit of vanishing width the Z contribution factorizes into \(\sigma \cdot \hbox{BR.}\)

2.1.3 Parton Densities

At the end of Sect. 2.1.1 the discussion of different energy regimes for R experimentally makes sense—at an \(e^+ e^-\) collider we can tune the energy of the initial state. At hadron colliders the situation is very different. The energy distribution of incoming quarks as parts of the colliding protons has to be taken into account. If we assume that quarks move collinearly with the surrounding proton, i.e. that at the LHC incoming partons have zero \(p_T,\) we can define a probability distribution for finding a parton just depending on the respective fraction of the proton’s momentum. For this momentum fraction \(x = 0\cdots 1\) the parton density function (pdf) is denoted as \(f_i(x),\) where i denotes the different partons in the proton, for our purposes u,d,c,s,g and depending on the details b. All incoming partons we assume to be massless.

In contrast to so-called structure functions a pdf is not an observable. It is a distribution in the mathematical sense, which means it has to produce reasonable results when integrated over together with a test function. Different parton densities have very different behavior—for the valence quarks (uud) they peak somewhere around \(x \lesssim 1/3,\) while the gluon pdf is small at \(x\sim 1\) and grows very rapidly towards small x. For some typical part of the relevant parameter space \((x = 10^{-3} \ldots 10^{-1})\) it roughly scales like \(f_g(x) \propto x^{-2}.\) Towards smaller x values it becomes even steeper. This steep gluon distribution was initially not expected and means that for small enough x LHC processes will dominantly be gluon fusion processes.

While we cannot actually compute parton distribution functions \(f_i(x)\) as a function of the momentum fraction x there are a few predictions we can make based on symmetries and properties of the hadrons. Such arguments for example lead to sum rules:

The parton distributions inside an anti-proton are linked to those inside a proton through CP symmetry, which is an exact symmetry of QCD. Therefore, for all partons we know

$$ f^{\bar{p}}_q(x) = f_{\bar{q}}(x) \quad \quad \quad f^{\bar{p}}_{\bar{q}}(x) = f_q(x) \quad \quad \quad f^{\bar{p}}_g(x) = f_g(x) $$
(2.25)

for all values of x.

If the proton consists of three valence quarks uud, plus quantum fluctuations from the vacuum which can either involve gluons or quark–antiquark pairs, the contribution from the sea quarks has to be symmetric in quarks and antiquarks. The expectation values for the signed numbers of up and down quarks inside a proton have to fulfill

$$ \langle N_u \rangle = \int\nolimits_0^1 dx \left( f_u(x) - f_{\bar{u}}(x) \right) = 2 \quad \quad \quad \langle N_d \rangle = \int\nolimits_0^1 dx \left( f_d(x) - f_{\bar{d}}(x) \right) = 1. $$
(2.26)

Similarly, the total momentum of the proton has to consist of the sum of all parton momenta. We can write this as the expectation value of \(\sum x_i\)

$$ \left\langle \sum x_i \right\rangle = \int\nolimits_0^1 dx x \left( \sum_q f_q(x) + \sum_{\bar{q}} f_{\bar{q}}(x) + f_g(x) \right) = 1 $$
(2.27)

What makes this prediction interesting is that we can compute the same sum only taking into account the measured quark and antiquark parton densities. We find that the momentum sum rule only comes to 1/2. Half of the proton momentum is carried by gluons.
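Both sum rules can be checked numerically once we have a pdf set at hand. A minimal sketch, assuming the LHAPDF Python bindings and some installed set (the set name CT10nlo, the scale of 100 GeV and the lower integration cutoff are merely illustrative choices):

```python
# Numerical check of the number sum rules, Eq. 2.26, and the momentum
# sum rule, Eq. 2.27, using LHAPDF; xfxQ(pid, x, Q) returns x*f(x, Q).
import lhapdf
from scipy.integrate import quad

pdf = lhapdf.mkPDF("CT10nlo", 0)    # example set and member
Q = 100.0                           # factorization scale in GeV, illustrative
x_min = 1e-5

def f(pid, x):
    return pdf.xfxQ(pid, x, Q) / x

N_u = quad(lambda x: f(2, x) - f(-2, x), x_min, 1.0)[0]    # should come out ~2
N_d = quad(lambda x: f(1, x) - f(-1, x), x_min, 1.0)[0]    # should come out ~1

pids = [1, 2, 3, 4, 5, -1, -2, -3, -4, -5, 21]             # quarks and gluon
momentum = quad(lambda x: sum(pdf.xfxQ(p, x, Q) for p in pids),
                x_min, 1.0)[0]                             # should come out ~1

print(N_u, N_d, momentum)
```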

Given the correct definition and normalization of the pdf we can now compute the hadronic cross section from its partonic counterpart, like the QED result in Eq. 2.11, as

$$ \fbox{$\sigma_{\rm tot} = \displaystyle\int\nolimits_{0}^{1} dx_{1} \int\nolimits_{0}^{1} dx_{2} \sum\limits_{ij} f_i(x_1) f_j(x_2) \hat{\sigma}_{ij}(x_1 x_2 S)$}\,, $$
(2.28)

where i,j are the incoming partons with the momentum fractions \(x_{i,j}.\) The partonic energy of the scattering process is \(s=x_1 x_2 S\) with the LHC center-of-mass energy of eventually \(\sqrt{S}=14\,\hbox{TeV}.\) The partonic cross section \(\hat{\sigma}\) corresponds to the cross sections \(\sigma\) computed for example in Eq. 2.11. It has to include all the necessary \(\theta\) and \(\delta\) functions for energy–momentum conservation. When we express a general n-particle cross section \(\hat{\sigma}\) including the phase space integration, the \(x_i\) integrations and the phase space integrations can of course be interchanged, but Jacobians will make life hard when we attempt to get them right. In Sect. 2.1.5 we will discuss an easier way to compute kinematic distributions instead of the fully integrated total rate in Eq. 2.28.

2.1.4 Hadron Collider Kinematics

Hadron colliders have a particular kinematic feature in that event by event we do not know the longitudinal velocity of the initial state, i.e. the relative longitudinal boost from the laboratory frame to the partonic center of mass. This sensitivity to longitudinal boosts is reflected in the choice of kinematic variables. The first thing we can do is consider the projection of all momenta onto the transverse plane. These transverse components are trivially invariant under longitudinal boosts because the two are orthogonal to each other.

In addition, for the production of a single electroweak gauge boson we remember that the produced particle does not have any momentum transverse to the beam direction. This reflects the fact that the incoming quarks are collinear with the protons, i.e. they have zero transverse momentum. Such a gauge boson not recoiling against anything else cannot develop a finite transverse momentum. Of course, once we decay this gauge boson, for example into a pair of muons, each muon will have transverse momentum, only their vector sum will be zero:

$$ \sum_{\rm final\,state} \vec{p}_{T,j} = {\vec{0}}. $$
(2.29)

This is a relation between two-dimensional, not three-dimensional vectors. For several particles in the final state we can define an azimuthal angle in the plane transverse to the beam direction. While differences of azimuthal angles, e.g. two such differences between three particles in the final state, are observables, the overall angle is a symmetry of the detector as well as of our physics.

In addition to the transverse plane we need to parameterize the longitudinal momenta in a way which makes it easy to implement longitudinal boosts. In Eq. 2.28 we integrate over the two momentum fractions \(x_{1,2}\) and can at best determine their product \(x_1 x_2 = s/S\) from the final state kinematics. Our task is to replace both, \(x_1\) and \(x_2\) with more physical variables which should be well behaved under longitudinal boosts.

A longitudinal boost for example from the rest frame of a massive particle reads

$$\begin{aligned} [b] \left(\begin{array}{l} E \\ p_{L} \end{array}\right)&= \hbox{exp} \left[ y \left(\begin{array}{ll} 0 & 1 \\ 1 & 0 \end{array}\right) \right] \left(\begin{array}{l} m \\ 0 \end{array}\right)\\ &= \left[ \hbox{1\hspace{-3pt}I} + y \left( \begin{array}{ll} 0 & 1 \\ 1 & 0 \end{array}\right)+ \frac{y^2}{2} \hbox{1\hspace{-3pt}I} +\frac{y^3}{6} \left( \begin{array}{ll} 0 & 1 \\ 1 & 0 \end{array}\right)\cdots \right] \left( \begin{array}{l} m \\ 0 \end{array}\right) \\ \end{aligned} $$
$$\begin{aligned} [b] &= \left[ \hbox{1\hspace{-3pt}I} \sum_{j {\rm even}} \frac{y^j}{j!} + \left( \begin{array}{ll} 0 & 1 \\ 1 & 0 \end{array}\right) \sum_{j {\rm odd}} \frac{y^j}{j!} \right] \left( \begin{array}{ll} m \\ 0\end{array}\right) \\ &= \left[ \hbox{1\hspace{-3pt}I} \cosh y + \left( \begin{array}{ll} 0 & 1 \\ 1 & 0 \end{array}\right) \sinh y \right] \left( \begin{array}{l} m \\ 0 \end{array}\right)= m \left( \begin{array}{l} \cosh y \\ \sinh y \end{array}\right)\!. \end{aligned} $$
(2.30)

In the first line we already see that the combination of two such exponentials involving \(y_1\) and \(y_2\) is additive \(y = y_1 + y_2.\) The rapidity y as defined in Eq. 2.30 we can re-write in a way which allows us to compute it from the four-momentum for example in the LHC lab frame

$$ \frac{1}{2} \log \frac{E+p_L}{E-p_L} = \frac{1}{2} \log \frac{\cosh y + \sinh y}{\cosh y - \sinh y} = \frac{1}{2} \log \frac{e^y}{e^{- y}} = y. $$
(2.31)

We can explicitly check that the rapidity is indeed additive by applying a second longitudinal boost to \((E, p_L)\) in Eq. 2.30

$$ \begin{aligned} \left(\begin{array}{l}E^{\prime} \\ p_L^{\prime}\end{array}\right) &= \hbox{exp} \left[ y^{\prime} \left(\begin{array}{ll} 0 & 1 \\ 1 & 0 \end{array}\right)\right] \left(\begin{array}{l} E \\ p_L \end{array}\right)= \left[\hbox{1\hspace{-3pt}I} \cosh y^{\prime} + \left(\begin{array}{ll} 0 & 1 \\ 1 & 0 \end{array}\right) \sinh y^{\prime} \right] \left(\begin{array}{l} E \\ p_L \end{array}\right) \\ &= \left(\begin{array}{l}E \cosh y^{\prime} + p_L \sinh y^{\prime} \\ p_L \cosh y^{\prime} + E \sinh y^{\prime}\end{array}\right)\!{,} \end{aligned} $$
(2.32)

which gives for the combined rapidity, following its extraction in Eq. 2.31

$$ \frac{1}{2} \log \frac{E^{\prime}+p_{L}^{\prime}}{E^{\prime}-p_{L}^{\prime}} = \frac{1}{2} \log \frac{(E + p_L) (\cosh y^{\prime} + \sinh y^{\prime})}{(E - p_L) (\cosh y^{\prime} - \sinh y^{\prime})} = \frac{1}{2} \log \frac{E+p_L} {E-p_L} + y^{\prime} = y + y^{\prime} . $$
(2.33)

This combination of several longitudinal boosts is important in the case of massless particles. They do not have a rest frame, which means we can only boost them from one finite-momentum frame to the other. For such massless particles we can simplify the formula for the rapidity Eq. 2.31, instead expressing the rapidity in terms of the polar angle \(\theta.\) We use the fact that (only) for massless particles \(E = |{\vec p}|\)

$$ \begin{aligned} [b] y &= \frac{1}{2} \log \frac{E+p_L}{E-p_L} = \frac{1}{2} \log \frac{|{\vec p}|+p_L}{|{\vec p}|-p_L} = \frac{1}{2} \log \frac{1 + \cos \theta}{1 - \cos \theta} = \frac{1}{2} \log \frac{1}{\tan^2 {\frac{\theta}{2}}} \\ & = - \log \tan \frac{\theta}{2} \equiv \eta \end{aligned} $$
(2.34)

This pseudo-rapidity \(\eta\) is more handy, but coincides with the actual rapidity only for massless particles. To get an idea about the experimental setup at the LHC—in CMS and ATLAS we can observe particles to polar angles of 10\(^{\circ}\) or even 1.3\(^{\circ}\), corresponding to maximum pseudo-rapidities of 2.5–4.5. Because this is about the same range as the range of the azimuthal angle \([0, \pi]\) we define a distance measure inside the detector

$$ \begin{aligned} [b] (\Delta R)^2 &= ( \Delta y)^2 + ( \Delta \phi )^2 \\ &= ( \Delta \eta)^2 + ( \Delta \phi )^2 \quad && \hbox{massless particles} \\ &= \left( \log \frac{\tan {\frac{\theta + \Delta \theta}{2}}} {\tan {\frac{\theta}{2}}} \right)^2 + ( \Delta \phi )^2 \\ &= \frac{(\Delta \theta)^2}{\sin^2 \theta} + ( \Delta \phi )^2 + {\fancyscript{O}}( (\Delta \theta)^3) \end{aligned} $$
(2.35)

The angle \(\theta\) is the polar angle of one of the two particles considered and in our leading approximation can be chosen as either of them without changing Eq. 2.35.
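In practice, rapidity, pseudo-rapidity and \(\Delta R\) are computed directly from the lab-frame four-momenta. A short numpy sketch, with a made-up example momentum chosen massless so that y and \(\eta\) coincide:

```python
# Rapidity (Eq. 2.31), pseudo-rapidity (Eq. 2.34) and the distance measure
# Delta R (Eq. 2.35) from lab-frame four-momenta (E, px, py, pz).
import numpy as np

def rapidity(p):
    E, px, py, pz = p
    return 0.5 * np.log((E + pz) / (E - pz))

def pseudorapidity(p):
    E, px, py, pz = p
    theta = np.arctan2(np.hypot(px, py), pz)      # polar angle
    return -np.log(np.tan(theta / 2.0))

def delta_R(p1, p2):
    dphi = np.arctan2(p1[2], p1[1]) - np.arctan2(p2[2], p2[1])
    dphi = (dphi + np.pi) % (2 * np.pi) - np.pi   # map into [-pi, pi]
    deta = pseudorapidity(p1) - pseudorapidity(p2)
    return np.sqrt(deta**2 + dphi**2)

p = (50.0, 30.0, 0.0, 40.0)             # massless: E^2 = px^2 + py^2 + pz^2
print(rapidity(p), pseudorapidity(p))   # both give log(3) = 1.0986
```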

Still studying the production of a single gauge boson at the LHC we can express the final state kinematics in terms of two parameters, the invariant mass of the final-state particle \(q^2\) and its rapidity. The transverse momentum we already know is zero. The two incoming and approximately massless protons have the momenta

$$ p_1 = (E,0,0,E) \quad p_2 = (E,0,0,-E) \quad S = (2E)^2, $$
(2.36)

which for the momentum of the final-state gauge boson in terms of the parton momentum fractions means, when we include the definition of the rapidity in Eq. 2.30

$$ \begin{aligned} q = x_{1} p_{1} + x_{2} p_{2} = E \left(\begin{array}{c} x_{1}+x_{2} \\ 0 \\ 0 \\ x_{1}-x_{2} \end{array}\right) &\stackrel{!}{=} \sqrt{q^2} \left(\begin{array}{c} \cosh y \\ 0 \\ 0 \\ \sinh y \end{array}\right) = 2 E \sqrt{x_{1} x_{2}} \left(\begin{array}{c} \cosh y \\ 0 \\ 0 \\ \sinh y \end{array}\right) \\ \Leftrightarrow \quad \cosh y &= \frac{x_{1}+x_{2}}{2\sqrt{x_{1} x_{2}}} = \frac{1}{2} \left(\sqrt{\frac{x_{1}} {x_{2}}} + \sqrt{\frac{x_{2}}{x_{1}}} \right) \\ \quad \Leftrightarrow \quad e^y &= \sqrt{\frac{x_{1}} {x_{2}}}. \end{aligned} $$
(2.37)

This result can be combined with \(x_1 x_2 = q^2/S\) to obtain

$$ x_1 = \sqrt{\frac{q^2}{S}} e^y \quad \quad \quad x_2 = \sqrt{\frac{q^2}{S}} e^{-y}. $$
(2.38)

These relations allow us to for example compute the hadronic total cross section for lepton pair production in QED

$$ \fbox{$\sigma(pp \to \ell^+ \ell^-) \Bigg|_{\rm QED} = \dfrac{4 \pi\alpha^2 Q_\ell^2}{3 N_c} \displaystyle\int\nolimits_{0}^{1} dx_{1} dx_{2} \sum_{j} Q_{j}^{2} f_{j}(x_1) f_{\bar{j}}(x_2) \dfrac{1}{q^2}$}\,, $$
(2.39)

which we still want to express in terms of the kinematic final-state observables \(q^2\) and y instead of the hadronic phase space variables \(x_{1,2}.\) Remember that the partonic or quark–antiquark cross section \(\hat{\sigma}\) is already integrated over the (symmetric) azimuthal angle \(\phi\) and over the polar angle, i.e. the Mandelstam variable t. The transverse momentum of the two leptons is therefore fixed by momentum conservation.

The Jacobian for this change of variables reads

$$ \frac{\partial (q^2,y)}{\partial (x_1,x_2)} = \left|\begin{array}{ll} x_{2}S & x_{1}S \\ 1/(2x_1) & -1/(2 x_2)\end{array}\right| = S = \frac{q^2}{x_{1} x_{2}} , $$
(2.40)

which inserted into Eq. 2.39 gives us

$$ \begin{aligned} [b] \sigma(pp \to \ell^+ \ell^-) \Bigg|_{\rm QED} &= \frac{4 \pi\alpha^2 Q_\ell^2}{3 N_c} \int dq^2 dy \frac{x_1 x_2}{q^2} \frac{1}{q^2} \sum_j Q_j^2 f_j(x_1) f_{\bar{j}}(x_2) \\ &= \frac{4 \pi\alpha^2 Q_\ell^2}{3 N_c} \int dq^2 dy \frac{1}{q^4} \sum_j Q_j^2 x_1 f_j(x_1) x_2 f_{\bar{j}}(x_2) . \end{aligned} $$
(2.41)

In contrast to the original form of the integration over the hadronic phase space this form reflects the kinematic observables. For the Drell–Yan process at leading order the \(q^2\) distribution is the same as \(m_{\ell \ell}^2,\) one of the most interesting distributions to study because of different contributions from the photon, the Z boson, or extra dimensional gravitons. On the other hand, the rapidity integral still suffers from the fact that at hadron colliders we do not know the longitudinal kinematics of the initial state and therefore have to integrate over it.

2.1.5 Phase Space Integration

In the previous example we have computed the simple two-dimensional distribution, by leaving out the double integration in Eq. 2.41

$$ \frac{d \sigma(pp \to \ell^+ \ell^-)}{dq^2 dy} \Bigg|_{\rm QED} = \frac{4 \pi\alpha^2 Q_\ell^2}{3 N_c q^4} \sum_j Q_j^2 x_1 f_j(x_1) x_2 f_{\bar{j}}(x_2). $$
(2.42)
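For concreteness, this double-differential rate can be evaluated with any pdf library. A sketch assuming the LHAPDF Python bindings and the same example set as before; the quark–antiquark assignment to the two protons is included both ways, \(Q_\ell = 1,\) \(\sqrt{S} = 14\,\hbox{TeV},\) and the factorization scale is set to \(\sqrt{q^2}\) for illustration.

```python
# Evaluate dsigma/dq2/dy of Eq. 2.42 (photon exchange only) at one phase
# space point; the result has units of 1/GeV^4 before conversion to pb.
import math
import lhapdf

pdf = lhapdf.mkPDF("CT10nlo", 0)                  # example pdf set
alpha, Nc, S = 1.0 / 137.0, 3, 14000.0**2          # QED coupling, colors, LHC energy
charges = {1: -1/3, 2: 2/3, 3: -1/3, 4: 2/3, 5: -1/3}

def dsigma_dq2dy(q2, y):
    x1 = math.sqrt(q2 / S) * math.exp(+y)          # Eq. 2.38
    x2 = math.sqrt(q2 / S) * math.exp(-y)
    if x1 >= 1.0 or x2 >= 1.0:
        return 0.0
    mu = math.sqrt(q2)                             # factorization scale choice
    lumi = sum(Q**2 * (pdf.xfxQ(+j, x1, mu) * pdf.xfxQ(-j, x2, mu) +
                       pdf.xfxQ(-j, x1, mu) * pdf.xfxQ(+j, x2, mu))
               for j, Q in charges.items())        # x1*f(x1) * x2*f(x2)
    return 4 * math.pi * alpha**2 / (3 * Nc * q2**2) * lumi

print(dsigma_dq2dy(100.0**2, 0.0))
```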

This expression we can numerically evaluate, as sketched above, and compare to experiment. However, the rapidity y and the momentum transfer \(q^2\) are by no means the only distributions we would like to look at. Moreover, over the parton densities \(f(x)\) we will have to integrate numerically, so we will have to rely on numerical integration tools no matter what we are doing. Looking at a simple \((2 \to 2)\) process we can write the total cross section as

$$ \sigma_{\rm tot} = \int d \phi \int d \cos \theta \int d x_1 \int d x_2 F_{\rm PS} {\left|{\fancyscript{M}} \right|^2} = \int_0^1 dy_1 \cdots dy_4 J_{\rm PS}(\vec{y}) {\left|{\fancyscript{M}} \right|^2}, $$
(2.43)

with an appropriate function \(F_{\rm PS}.\) In the second step we have re-written the phase space integral as an integral over the four-dimensional unit cube, with the appropriate Jacobian. Like any integral we can numerically evaluate this phase space integral by binning the variable we integrate over

$$ \int\nolimits_0^1 dy f(y) \quad \longrightarrow \quad \sum_j (\Delta y)_j f(y_j) \sim \Delta y \sum_j f(y_j). $$
(2.44)

Without any loss of generality we assume that the integration boundaries are \(0 \cdots 1.\) The integration variable y we can divide into a discrete set of points \(y_j,\) for example equidistant in y or by choosing a chain of random numbers \(y_j \in [0,1]\). In the latter case we need to keep track of the bin widths \((\Delta y)_j.\) When we extend the integral over N dimensions we can in principle divide each axis into bins and compute the functional values for this grid. For non-equidistant bins generated by random numbers we again keep track of the associated phase space volume for each random number vector. However, once we know these phase space weights for each phase space point there is no reason to consider the set of random numbers as in any way linked to the N axes. All we need is a chain of random points with an associated phase space weight and their transition matrix element, to integrate over the phase space in complete analogy to Eq. 2.44.

The obvious question arises how such random numbers can be chosen in a smart way; but before we discuss how to best evaluate such an integral numerically, let us first illustrate how this integral is much more useful than just providing the total cross section. If we are interested in the distribution of an observable, like for example the distribution of the transverse momentum of a muon in the Drell–Yan process, we need to compute \(d \sigma/d p_T\) as a function of \(p_T.\) In terms of Eq. 2.43 any physical \(y_1\) distribution is given by

$$ \begin{aligned}[b] \sigma &= \int dy_1 \cdots dy_N f({\vec y}) = \int dy_1 \frac{d\sigma}{dy_1} \\ \frac{d\sigma}{dy_1} \Bigg|_{y_1^0} &= \int dy_2 \cdots dy_N f(y_1^0, y_2, \ldots , y_N) = \int dy_1 \cdots dy_N f({\vec y}) \delta(y_1-y_1^0) . \end{aligned} $$
(2.45)

Numerically we can compute this distribution in two ways: one way corresponds to the first line in Eq. 2.45 and means numerically evaluating the \(y_2 \cdots y_N\) integrations and leaving out the \(y_1\) integration. The result will be a function of \(y_1\) which we then evaluate at different points \(y_1^0.\)

The second and much more efficient option corresponds to the second line of Eq. 2.45, with the delta distribution defined for discretized \(y_1.\) First, we define an array with the size given by the number of bins in the \(y_1\) integration. Then, for each \(y_1\) value of the complete \(y_1 \cdots y_N\) integration we decide where the value \(y_1\) goes in this array and add \(f({\vec y})\) to the corresponding column. Finally, we print these columns as a function of \(y_1\) to see the distribution. This set of columns is referred to as a histogram and can be produced using publicly available software. This histogram approach does not sound like much, but imagine we want to compute a distribution \(d\sigma/dp_T,\) where \(p_T({\vec y})\) is a complicated function of the integration variables and kinematic phase space cuts. We then simply evaluate

$$ \frac{d\sigma}{dp_T} = \int dy_1 \cdots dy_N f({\vec y}) \delta \left( p_T({\vec y})-p_T^0 \right) $$
(2.46)

numerically and read off the \(p_T\) distribution as a side product of the calculation of the total rate. Histograms mean that computing a total cross section numerically we can trivially extract all distributions in the same process.
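A toy version of this histogramming strategy, for the transverse momentum of one of the leptons in the partonic process at a fixed partonic energy, could look as follows; the overall normalization and flux factors are left out, so only the shape of the distribution is meaningful.

```python
# Fill a p_T histogram on the fly while integrating over cos(theta),
# in the spirit of Eq. 2.46; weights are the bracket of Eq. 2.10.
import numpy as np

rng = np.random.default_rng(1)
s = 91.19**2                                   # fixed partonic energy in GeV^2
n_points, n_bins = 100000, 40
edges = np.linspace(0.0, np.sqrt(s) / 2, n_bins + 1)
hist = np.zeros(n_bins)

for _ in range(n_points):
    cos_t = rng.uniform(-1.0, 1.0)                  # flat phase space point
    t = s * (-1.0 + cos_t) / 2.0
    weight = 1.0 + 2 * t / s + 2 * t**2 / s**2      # matrix element bracket
    pT = np.sqrt(s) / 2 * np.sqrt(1.0 - cos_t**2)   # lepton transverse momentum
    ibin = np.searchsorted(edges, pT) - 1           # which column to fill
    if 0 <= ibin < n_bins:
        hist[ibin] += weight

print(hist / n_points)                         # shape of the p_T distribution
```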

The procedure outlined above has an interesting interpretation. Imagine we do the entire phase space integration numerically. Just like computing the interesting observables we can compute the momenta of all external particles. These momenta are not all independent, because of energy–momentum conservation, but this can be taken care of. The tool which translates the vector of integration variables \({\vec y}\) into the external momenta is called a phase space generator . Because the phase space is not uniquely defined in terms of the integration variables, the phase space generator also returns the Jacobian \(J_{\rm PS},\) called the phase space weight. If we think of the integration as an integration over the unit cube, this weight needs to be combined with the matrix element squared \({\left|{\fancyscript{M}} \right|^2}.\) Once we compute the unique phase space configuration \((k_1, k_2, p_1 \cdots)_j\) corresponding to the vector \({\vec y}_j,\) the combined weight \(W = J_{\rm PS} {\left|{\fancyscript{M}} \right|^2}\) is the probability that this configuration will appear at the LHC. This means we do not only integrate over the phase space, we really simulate LHC events. The only complication is that the probability of a certain configuration is not only given by the frequency with which it appears, but also by the explicit weight. So when we run our numerical integration through the phase space generator and histogram all the distributions we are interested in we generate weighted events . These events, i.e. the momenta of all external particles and the weight W, we can for example store in a big file.

This simulation is not yet what experimentalists want—they want to represent the probability of a certain configuration appearing only by its frequency. Experimentally measured events do not come with a variable weight, either they are recorded or they are not. This means we have to unweight the events by translating the event weight into frequency.

There are two ways to do that. On the one hand, we can look at the minimum event weight and express all other events in relative probability to this event. Translating this relative event weight into a frequency means replacing an event with the relative weight \(W_j/W_{\rm min}\) by \(W_j/W_{\rm min}\) events in the same phase space point. The problem with this method is that we are really dealing with a binned phase space, so we would not know how to distribute these events in the given bin. Alternatively, we can start from the maximum weight \(W_{\rm max}\) and compute the ratio \(W_j/W_{\rm max} \in [0,1].\) Keeping an event with a given probability means we can generate a flat random number \(r \in [0,1]\) and only keep it if \(W_j/W_{\rm max} > r.\) The challenge in this translation is that we always lose events. If it was not for the experimentalists we would hardly use such unweighted events, but they have good reasons to want such unweighted events which feed best through detector simulations.
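The second method is nothing but a hit-or-miss step. A minimal sketch, with made-up weights standing in for \(W = J_{\rm PS} {\left|{\fancyscript{M}} \right|^2}\):

```python
# Hit-or-miss unweighting: keep an event with probability W_j / W_max.
import numpy as np

rng = np.random.default_rng(2)

def unweight(events, weights):
    w_max = max(weights)
    return [ev for ev, w in zip(events, weights)
            if rng.uniform() < w / w_max]       # kept events carry weight one

weights = rng.exponential(size=10000)           # toy event weights
events = list(range(len(weights)))              # stand-ins for momentum sets
kept = unweight(events, weights)
print(len(kept) / len(events))                  # unweighting efficiency
```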

The last comment is that if the phase space configuration \((k_1, k_2, p_1 \cdots )_j\) can be measured, its weight \(W_j\) better be positive. This is not trivial once we go beyond leading order. There, we need to add several contributions to produce a physical event, like for example different n-particle final states, and there is no guarantee for each of them to be positive. We have to ensure that after adding up all contributions and after integrating over any kind of unphysical degrees of freedom we might have introduced, the probability of a physics configuration is positive. From this point of view negative values for parton densities \(f(x)<0\) are in principle not problematic, as long as we always keep a positive hadronic rate \(d\sigma_{pp \to X}>0.\)

Going back to the numerical phase space integration for many particles, it faces two problems. First, the partonic phase space for n on-shell particles in the final state has \(3(n+2)-3\) dimensions. If we divide each of these directions in 100 bins, the number of phase space points we need to evaluate for a \((2 \to 4)\) process is \(100^{15}=10^{30},\) which is not realistic.

To integrate over a large number of dimensions we use Monte Carlo integration . In this approach we define a distribution \(p_Y(y)\) such that for a one-dimensional integral we can replace the binned discretized phase space integral by a discretized version based on a set of random numbers \(Y_j\) over the integration variables y

$$ \langle g(Y) \rangle = \int\nolimits_0^1 dy p_Y(y) g(y) \quad \longrightarrow \quad \frac{1}{N} \sum_j g(Y_j). $$
(2.47)

All we have to make sure is that the probability of returning \(Y_j\) is given by \(p_Y(y)\) for \(y < Y_j < y + dy.\) As mentioned above, also Eq. 2.47 has the advantage that we can naively generalize it to any number of N dimensions, just by organizing the random numbers \(Y_j\) in one large chain instead of an N-dimensional array. Our N-dimensional phase space integral shown in Eq. 2.43 we can re-write in the same manner

$$ \int\nolimits_0^1 d^Ny f(y) = \int\nolimits_0^1 d^Ny \frac{f(y)}{p_Y(y)} p_Y(y) = \left< \frac{f(Y)}{p_Y(Y)} \right> \rightarrow \frac{1}{N} \sum_j \frac{f(Y_j)}{p_Y(Y_j)}. $$
(2.48)

To compute the integral we have to average over all phase space values of \(f/p_Y.\) In the ideal case where we exactly know the form of the integrand and can map it into our random numbers, the error of the numerical integration will be zero. So what we have to find is a way to encode \(f(Y_j)\) into \(p_Y(Y_j).\) This task is called importance sampling and you can find some documentation for example on the standard implementation VEGAS to look at the details.
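Without going into the VEGAS algorithm itself, the effect of importance sampling is easy to demonstrate by hand. In the sketch below the integrand and the sampling density \(p_Y\) are chosen purely for illustration; the mapped estimate fluctuates far less than the flat one because \(f/p_Y\) is nearly constant.

```python
# Importance sampling as in Eq. 2.48: sample y from p_Y(y) = 1/(2 sqrt(y))
# instead of a flat distribution and average f(Y)/p_Y(Y).
import numpy as np

rng = np.random.default_rng(3)
f = lambda y: 1.0 / np.sqrt(y + 1e-3)      # toy integrand, peaked at small y
N = 100000

y_flat = rng.uniform(size=N)               # flat sampling, p_Y = 1
est_flat = np.mean(f(y_flat))

u = rng.uniform(1e-12, 1.0, size=N)        # avoid u = 0 in the mapping
y_imp = u**2                               # generates p_Y(y) = 1/(2 sqrt(y))
est_imp = np.mean(f(y_imp) / (1.0 / (2.0 * np.sqrt(y_imp))))

print(est_flat, est_imp)                   # exact integral is about 1.937
```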

Technically, VEGAS will call the function which computes the weight \(W = J_{\rm PS} {\left|{\fancyscript{M}} \right|^2}\) for a number of phase space points and average over these points, but including another weight factor \(W_{\rm MC}\) representing the importance sampling. If we want to extract distributions via histograms we have to add the total weight \(W = W_{\rm MC} J_{\rm PS} {\left|{\fancyscript{M}} \right|^2}\) to the columns.

The second numerical challenge is that the matrix elements for interesting processes are by no means flat. We would therefore like to help our adaptive or importance sampling Monte Carlo by defining the integration variables such that the integrand becomes as flat as possible. For example for the integration over the partonic momentum fraction we know that the integrand usually falls off as \(1/x.\) In that situation we can substitute

$$ \int\nolimits_\delta dx \frac{C}{x} = \int\nolimits_{\log \delta} d \log x \left( \frac{d \log x}{dx} \right)^{-1} \frac{C}{x} = \int\nolimits_{\log \delta} d \log x C, $$
(2.49)

to obtain a flat integrand. There exists an even more impressive and relevant example: intermediate particles with Breit–Wigner propagators squared are particularly painful to integrate over the momentum \(s = p^2\) flowing through them

$$ P(s,m) = \frac{1}{(s-m^2)^2 + m^2 \Gamma^2}. $$
(2.50)

For example, a Standard-Model Higgs boson with a mass of 120 GeV has a width around \(0.005\,\hbox{GeV},\) which means that the integration over the invariant mass of the Higgs decay products \(\sqrt{s}\) requires a relative resolution of \(10^{-5}.\) Since this is unlikely to be achievable, what we should really do is find a substitution which produces the inverse Breit–Wigner as a Jacobian and leads to a flat integrand—et voilà

$$ \begin{aligned} [b] \int ds \frac{C}{(s-m^2)^2 + m^2 \Gamma^2} &= \int dz \left( \frac{dz}{ds} \right)^{-1} \frac{C}{(s-m^2)^2 + m^2 \Gamma^2} \\ &= \int dz \frac{(s-m^2)^2 + m^2 \Gamma^2}{m \Gamma} \frac{C} {(s-m^2)^2 + m^2 \Gamma^2} \\ &= \frac{1}{m \Gamma} \int dz C \quad \hbox{with} \quad \tan z =\frac{s - m^2}{m \Gamma} . \end{aligned} $$
(2.51)

This is the most useful phase space mapping in LHC physics. Of course, any adaptive Monte Carlo will eventually converge on such an integrand, but a well-chosen set of integration parameters will speed up simulations very significantly.
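As a numerical check of Eq. 2.51, the following Python sketch, our own toy code using the Higgs-like mass and width quoted above, integrates a single Breit–Wigner once with flat sampling in s and once by drawing z flat and computing \(s = m^2 + m\Gamma \tan z.\) In the mapped case the weight is constant and the error collapses, while flat sampling hardly ever hits the resonance.

import numpy as np

rng = np.random.default_rng(1)

m, gamma = 120.0, 0.005                      # mass and width in GeV, as in the text
s_min, s_max = (m - 50.0)**2, (m + 50.0)**2  # assumed integration window in s
N = 100_000

def breit_wigner(s):
    return 1.0 / ((s - m**2)**2 + m**2 * gamma**2)

# flat sampling in s: the peak covers only ~10^-5 of the range
s_flat = rng.uniform(s_min, s_max, N)
w_flat = breit_wigner(s_flat) * (s_max - s_min)

# mapped sampling, Eq. 2.51: s = m^2 + m*Gamma*tan(z), z drawn flat
z_min = np.arctan((s_min - m**2) / (m * gamma))
z_max = np.arctan((s_max - m**2) / (m * gamma))
z = rng.uniform(z_min, z_max, N)
s_map = m**2 + m * gamma * np.tan(z)
jacobian = ((s_map - m**2)**2 + m**2 * gamma**2) / (m * gamma)   # ds/dz
w_map = breit_wigner(s_map) * jacobian * (z_max - z_min)

exact = (z_max - z_min) / (m * gamma)        # analytic value of the integral
for label, w in (("flat  ", w_flat), ("mapped", w_map)):
    print(label, "estimate =", w.mean(), "+-", w.std() / np.sqrt(N), " exact =", exact)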

2.2 Ultraviolet Divergences

From general field theory we know that when we are for example interested in predicting a cross section with higher precision we need to compute further terms in its perturbative series in \(\alpha_s.\) This computation will lead to ultraviolet divergences which can be absorbed into counter terms for any parameter in the Lagrangian. The crucial feature is that for a renormalizable theory like our Standard Model including a Higgs boson the number of counter terms is finite, which means that once we know all parameters including their counter terms our theory becomes predictive.

In Sect. 2.3 we will see that in QCD processes we also encounter another kind of divergences. They arise from the infrared momentum regime. Infrared divergences are what this lecture is really about, but before dealing with them it is very instructive to see what happens to the much better understood ultraviolet divergences. In Sect. 2.2.1 we will review how such ultraviolet divergences arise and how they are removed. In Sect. 2.2.2 we will review how running parameters appear in this procedure, i.e. how scale dependence is linked to the appearance of divergences. Finally, in Sect. 2.2.3 we will interpret the use of running parameters physically and see that in perturbation theory they re-sum classes of logarithms to all orders in perturbation theory. Later in Sect. 2.3 we will follow exactly the same steps for infrared divergences and develop some crucial features of hadron collider physics.

2.2.1 Counter Terms

Renormalization, i.e. the proper treatment of ultraviolet divergences, is one of the most important things to understand about field theories; more detailed discussions you can find in any book on advanced field theory. The aspect of renormalization which will guide us through this section is the appearance of the renormalization scale.

In perturbation theory, scales automatically arise from the regularization of infrared or ultraviolet divergences. We can see this by writing down a simple scalar loop integral, corresponding to two virtual scalars with masses \(m_{1,2}\) and with the external momentum p flowing through a diagram similar to those summed in Sect. 2.1.2

$$ B(p^2;m_1,m_2) \equiv \int \frac{d^4q}{16 \pi^2} \; \frac{1} {q^2-m_1^2} \frac{1}{(q+p)^2-m_2^2}. $$
(2.52)

Such two-point functions appear for example in the gluon self energy, with massless scalars for ghosts, with a Dirac trace in the numerator for quarks, and with massive scalars for supersymmetric scalar quarks. In those cases the two masses are identical \(m_1 = m_2.\) The integration measure \(1/(16 \pi^2)\) is dictated by the usual Feynman rules for the integration over loop momenta. Counting powers of q in Eq. 2.52 we see that including the integration measure the integrand scales like \(1/q\) in the ultraviolet, so the integral is logarithmically divergent and we have to regularize it. Regularizing means expressing the divergence in a well-defined manner or scheme allowing us to get rid of it by renormalization.

One regularization scheme is to introduce a cutoff \(\Lambda\) into the momentum integral, for example through the so-called Pauli–Villars regularization. Because the ultraviolet behavior of the integrand or integral cannot depend on any parameter living at small energy scales, the parameterization of the ultraviolet divergence in Eq. 2.52 cannot involve the masses \(m_{1,2}\) or the external momentum \(p^2.\) The scalar two-point function has mass dimension zero, so its divergence has to be proportional to \(\log (\Lambda/\mu_R)\) with a dimensionless prefactor and some scale \(\mu_R^2\) which is an artifact of the regularization of such a Feynman diagram.

A more elegant regularization scheme is dimensional regularization . It is designed not to break gauge invariance and naively seems to not introduce a mass scale \(\mu_R.\) When we shift the momentum integration from 4 to \(4 - 2\varepsilon\) dimensions and use analytic continuation in the number of space–time dimensions to renormalize the theory a renormalization scale \(\mu_R\) nevertheless appears once we ensure the two-point function and with it observables like cross sections keep their correct mass dimension

$$ \int \frac{d^4q}{16 \pi^2} \cdots \longrightarrow \mu_R^{2\varepsilon} \; \int \frac{d^{4-2 \varepsilon}q}{16 \pi^2} \cdots = \frac{i \mu_R^{2\varepsilon}}{(4 \pi)^2} \; \left[ \frac{C_{-1}}{\varepsilon} + C_0 + C_1 \varepsilon + {\fancyscript{O}}(\varepsilon^2) \right]. $$
(2.53)

At the end, the scale \(\mu_R\) might become irrelevant and drop out after renormalization and analytic continuation, but to be on the safe side we keep it. The constants \(C_i\) in the series in \(1/\varepsilon\) depend on the loop integral we are considering. To regularize the ultraviolet divergence we assume \(\varepsilon>0\) and find mathematically well-defined poles \(1/\varepsilon.\) Defining scalar integrals with the integration measure \(1/(i \pi^2)\) will for example make \(C_{-1}\) come out of the order \({\fancyscript{O}}(1).\) This is the reason we usually find factors \(1/(4 \pi)^2 = \pi^2/(2 \pi)^4\) in front of the loop integrals.

The poles in \(1/\varepsilon\) will cancel with the universal counter terms once we renormalize the theory. Counter terms we include by shifting parameters in the Lagrangian and the leading order matrix element. They cancel the poles in the combined leading order and virtual one-loop prediction

$$ \begin{aligned}[b] \left| {\fancyscript{M}}_{\rm LO}(g) + {\fancyscript{M}}_{\rm virt} \right|^2 &= \left| {\fancyscript{M}}_{\rm LO}(g) \right|^2 + 2 \hbox{Re}\, {\fancyscript{M}}_{\rm LO}(g) {\fancyscript{M}}_{\rm virt} + \cdots \\ &\to \left| {\fancyscript{M}}_{\rm LO}(g+\delta g) \right|^2 +2 \hbox{Re}\, {\fancyscript{M}}_{\rm LO}(g) {\fancyscript{M}}_{\rm virt} + \cdots \\ \hbox{with} \quad g &\to g^{\rm bare} = g + \delta g \quad \hbox{and} \quad \delta g \propto \alpha_s/\varepsilon . \end{aligned} $$
(2.54)

The dots indicate higher orders in \(\alpha_s,\) for example absorbing the \(\delta g\) corrections in the leading order and virtual interference. As we can see in Eq. 2.54 the counter terms do not come with a factor \(\mu_R^{2 \varepsilon}\) in front. Therefore, while the poles \(1/\varepsilon\) cancel just fine, the scale factor \(\mu_R^{2 \varepsilon}\) will not be matched between the actual ultraviolet divergence and the counter term.

We can keep track of the renormalization scale best by expanding the prefactor of the regularized but not yet renormalized integral in Eq. 2.53 in a Taylor series in \(\varepsilon,\) no questions asked about convergence radii

$$ \begin{aligned} \mu_R^{2\varepsilon} \left[ \frac{C_{-1}}{\varepsilon} + C_0 + {\fancyscript{O}}(\varepsilon) \right] &= e^{2 \varepsilon \log \mu_R} \left[ \frac{C_{-1}}{\varepsilon} + C_0 + {\fancyscript{O}}(\varepsilon) \right] \\ &= \left[ 1 + 2 \varepsilon \log \mu_R + {\fancyscript{O}}(\varepsilon^2) \right] \left[ \frac{C_{-1}}{\varepsilon} + C_0 + {\fancyscript{O}}(\varepsilon) \right] \\ &= \frac{C_{-1}}{\varepsilon} + C_0 + C_{-1} \log \mu_R^2 + {\fancyscript{O}}(\varepsilon) \\ &\to \frac{C_{-1}}{\varepsilon} + C_0 + C_{-1} \log \frac{\mu_R^2}{M^2} + {\fancyscript{O}}(\varepsilon) . \end{aligned} $$
(2.55)

In the last step we have by hand corrected for the fact that \(\log \mu_R^2\) with a mass dimension inside the logarithm cannot appear in our calculations. From somewhere else in our calculation this scale logarithm will be matched by a \(\log M^2\) where \(M^2\) is the typical mass or energy scale in our process. This little argument shows that also in dimensional regularization we introduce a mass scale \(\mu_R\) which appears as \(\log (\mu_R^2/M^2)\) in the renormalized expression for our observables. There is no way of removing ultraviolet divergences without introducing the renormalization scale if we keep track of the mass dimensions of our regularized result.
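The little expansion in Eq. 2.55 can also be checked symbolically. The sketch below is our own illustration using sympy; the symbol C_m1 stands for \(C_{-1},\) and the finite term comes out as \(C_0 + C_{-1}\log\mu_R^2\) as claimed.

import sympy as sp

eps, muR = sp.symbols('epsilon mu_R', positive=True)
Cm1, C0 = sp.symbols('C_m1 C_0')     # C_m1 stands for C_{-1}

expr = muR**(2 * eps) * (Cm1 / eps + C0)
print(sp.series(expr, eps, 0, 1))
# expected, up to ordering: C_m1/epsilon + C_0 + 2*C_m1*log(mu_R) + O(epsilon)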

In Eq. 2.55 there appear two finite contributions to a given observable, the expected \(C_0\) and the renormalization induced \(C_{-1} \log (\mu_R^2/M^2).\) Because the factors \(C_{-1}\) are linked to the counter terms in the theory we can often guess them without actually computing the loop integral, which is very useful in cases where they numerically dominate.

Counter terms as they schematically appear in Eq. 2.54 are not uniquely defined. They need to include a given divergence, to return finite observables, but we are free to add any finite contribution we want. This opens many ways to define a counter term, for example based on physical processes where counter terms cancel not only the pole but also finite contributions at a given order in perturbation theory. Needless to say, such schemes do not automatically work universally. An example for such a physical renormalization scheme is the on-shell scheme for masses, where we define a counter term such that external on-shell particles do not receive any corrections to their masses. For the top mass this means that just like in Eq. 2.54 we replace the leading order mass with the bare mass, for which we then insert the expression in terms of the renormalized mass and the counter term

$$ \begin{aligned} m_{t}^{\rm bare} &= m_t + \delta m_t \\ &= m_t + m_t \frac{\alpha_s C_F}{4 \pi} \left( 3 \left(-\frac{1} {\varepsilon} + \gamma_E-\log (4 \pi) - \log \frac{\mu_R^2} {M^2} \right) -4 + 3 \log \frac{m_t^2}{M^2} \right) \\ &\equiv m_t + m_t \frac{\alpha_s C_F}{4 \pi} \left( -\frac{3} {\tilde \varepsilon} -4 + 3 \log \frac{m_t^2}{M^2} \right) \\ \Leftrightarrow & \quad \frac{1} {\tilde{\varepsilon}\left({\frac{\mu_R}{M}}\right)} \equiv \frac{1} {\varepsilon} - \gamma_E + \log \frac{4 \pi \mu_R^2}{M^2} . \end{aligned} $$
(2.56)

The convenient scale dependent pole \(1/\tilde{\varepsilon}\) includes the universal additional terms like the Euler constant \(\gamma_E\) and the scaling logarithm. This logarithm is the big problem in this universality argument, since we need to introduce the arbitrary energy scale M to separate the universal logarithm of the renormalization scale and the parameter-dependent logarithm of the physical process.

A theoretical problem with this on-shell renormalization scheme is that it is not gauge invariant. On the other hand, it describes for example the kinematic features of top pair production at hadron colliders in a stable perturbation series. This means that once we define a more appropriate scheme for heavy particle masses in collider production mechanisms it better be numerically close to the pole mass. For the computation of total cross sections at hadron colliders or the production thresholds at \(e^+ e^-\) colliders the pole mass is not well suited at all, but as we will see in Chap. 3 this is not where at the LHC we expect to measure particle masses, so we should do fine with something very similar to the pole mass.

Another example for a process dependent renormalization scheme is the mixing of \(\gamma\) and Z propagators. There we choose the counter term of the weak mixing angle such that an on-shell Z boson cannot oscillate into a photon, and vice versa. We can generalize this scheme for mixing scalars as they for example appear in supersymmetry, but it is not gauge invariant with respect to the weak gauge symmetries of the Standard Model either. For QCD corrections, on the other hand, it is the most convenient scheme keeping all exchange symmetries of the two scalars.

To finalize this discussion of process dependent mass renormalization we quote the result for a scalar supersymmetric quark, a squark, where in the on-shell scheme we find

$$ \begin{aligned} [b] m_{\tilde q}^{\rm bare} &= m_{\tilde q} + \delta m_{\tilde q} \\ & = m_{\tilde q} + m_{\tilde q} \frac{\alpha_s C_F}{4 \pi} \Bigg( - \frac{2r}{\tilde \varepsilon} - 1 - 3 r - \left( 1 - 2 r \right) \log r \\ & - \left( 1 - r \right)^2 \log \left| \frac{1} {r} - 1 \right| - 2 r \log \frac{m_{\tilde q}^2}{M^2} \Bigg) \end{aligned} $$
(2.57)

with \(r = m_{\tilde g}^2/m_{\tilde q}^2.\) The interesting aspect of this squark mass counter term is that it also depends on the gluino mass, not just the squark mass itself. The reason why QCD counter terms tend to depend only on the renormalized quantity itself is that the gluon is massless. In the limit of vanishing gluino contribution the squark mass counter term is again only proportional to the squark mass itself

$$ m_{\tilde q}^{\rm bare} \Bigg|_{m_{\tilde g} = 0} = m_{\tilde q} + \delta m_{\tilde q} = m_{\tilde q} + m_{\tilde q} \frac{\alpha_s C_F}{4 \pi} \left( - \frac{1}{\tilde \varepsilon} - 3 + \log \frac{m_{\tilde q}^2}{M^2} \right). $$
(2.58)

One common feature of all mass counter terms listed above is \(\delta m \propto m,\) which means that we actually encounter a multiplicative renormalization

$$ m^{\rm bare} = Z_m m = \left( 1 + \delta Z_m \right) m = \left( 1 + \frac{\delta m}{m} \right) m = m + \delta m, $$
(2.59)

with \(\delta Z_m = \delta m/m\) linking the two ways of writing the mass counter term. This form implies that particles with zero mass will not obtain a finite mass through renormalization. If we remember that chiral symmetry protects a Lagrangian from acquiring fermion masses this means that on-shell renormalization does not break this symmetry. A massless theory cannot become massive by mass renormalization. Regularization and renormalization schemes which do not break symmetries of the Lagrangian are ideal.

When we introduce counter terms in general field theory we usually choose a slightly more model independent scheme—we define a renormalization point. This is the energy scale at which the counter term cancels all higher order contributions, divergent as well as finite. The best known example is the electric charge which we renormalize in the Thomson limit of zero momentum transfer through the photon propagator

$$ e \to e^{\rm bare} = e + \delta e . $$
(2.60)

Looking back at \(\delta m_t\) as defined in Eq. 2.56 we also see a way to define a completely general counter term: if dimensional regularization, i.e. the introduction of \(4 - 2\varepsilon\) dimensions, does not break any of the symmetries of our Lagrangian, like Lorentz symmetry or gauge symmetries, we should simply subtract the ultraviolet pole and nothing else. The only question is: do we subtract \(1/\varepsilon\) (MS scheme) or do we subtract \(1/\tilde \varepsilon\) (\(\overline{\hbox{MS}}\) scheme)? In the \(\overline{\hbox{MS}}\) scheme the counter term is then scale dependent.

Carefully counting, there are three scales present in such a scheme. First, there is the physical scale in the process. In our case of a top self energy this is for example the top mass \(m_t\) appearing in the matrix element for the process \(pp \to t\bar{t}.\) Next, there is the renormalization scale \(\mu_R,\) a reference scale which is part of the definition of any counter term. And last but not least, there is the scale M separating the counter term from the process dependent result, which we can choose however we want, but which as we will see implies a running of the counter term. The role of this scale M will become clear when we go through the example of the running strong coupling \(\alpha_s.\) Of course, we would prefer to choose all three scales the same, but in a complex physical process this might not always be possible. For example, any massive \((2 \to 3)\) production process naturally involves several external physical scales.

Just a side remark for completeness: a one loop integral which has no intrinsic mass scale is the two-point function with zero mass in the loop and zero momentum flowing through the integral: \(B(p^2=0;0,0).\) It appears for example in the self energy corrections of external quarks and gluons. Based on dimensional arguments this integral has to vanish altogether. On the other hand, we know that like any massive two-point function it has to be ultraviolet divergent \(B \sim 1/\varepsilon_{\rm UV}\) because setting all internal and external mass scales to zero is nothing special from an ultraviolet point of view. This can only work if the scalar integral also has an infrared divergence appearing in dimensional regularization. We can then write the entire massless two-point function as

$$ B(p^2=0;0,0) = \int \frac{d^4q}{16 \pi^2} \; \frac{1}{q^2} \frac{1}{(q+p)^2} = \frac{i \pi^2}{16\pi^2} \left( \frac{1} {\varepsilon_{\rm UV}} - \frac{1}{\varepsilon_{\rm IR}} \right), $$
(2.61)

keeping track of the divergent contributions from the infrared and the ultraviolet regimes. For this particular integral they precisely cancel, so the result for \(B(0;0,0)\) is zero, but setting it to zero too early will spoil any ultraviolet and infrared finiteness tests in practical calculations. Treating the two divergences strictly separately and dealing with them one after the other also ensures that for ultraviolet divergences we can choose \(\varepsilon >0\) while for infrared divergences we require \(\varepsilon <0.\)

2.2.2 Running Strong Coupling

To get an idea what these different scales which appear in the process of renormalization mean let us compute such a scale dependent parameter, namely the running strong coupling \(\alpha_s(\mu_R^2)\). The Drell–Yan process is one of the very few relevant processes at hadron colliders where the strong coupling does not appear at tree level, so we cannot use it as our toy process this time. Another simple process where we can study this coupling is bottom pair production at the LHC, where at some energy range we will be dominated by valence quarks: \(q \bar{q} \to b \bar{b}.\) The only Feynman diagram is an s-channel off-shell gluon with a momentum flow \(p^2 \equiv s.\)

At next-to-leading order this gluon propagator will be corrected by self energy loops, where the gluon splits into two quarks or gluons and re-combines before it produces the two final state bottoms. Let us for now assume that all quarks are massless. The Feynman diagrams for the gluon self energy include a quark loop, a gluon loop, and the ghost loop which removes the unphysical degrees of freedom of the gluon inside the loop.

The gluon self energy correction or vacuum polarization, as propagator corrections to gauge bosons are usually labelled, will be a scalar, i.e. all fermion lines close and the Dirac trace is computed inside the loop. In color space the self energy will (hopefully) be diagonal, just like the gluon propagator itself, so we can ignore the color indices for now. In unitary gauge the gluon propagator is proportional to the transverse tensor \(T^{\mu \nu} = g^{\mu\nu} - p^{\nu} p^\mu/p^2.\) As mentioned in the context of the effective gluon-Higgs coupling, the same should be true for the gluon self energy, which we therefore write as \(\Pi^{\mu \nu} \equiv \Pi T^{\mu \nu}.\) A useful simple relation is \(T^{\mu \nu} T_{\nu}^\rho = T^{\mu \rho}\) and \(T^{\mu \nu} g_{\nu}^\rho = T^{\mu \rho}\). Including the gluon, quark, and ghost loops the regularized gluon self energy with a momentum flow \(p^2\) reads

$$ \begin{aligned}[b] -\dfrac{1}{p^2} \Pi\left( \dfrac{\mu_R^2}{p^2} \right) &=&& \dfrac{\alpha_s}{4 \pi} \left( -\dfrac{1}{\tilde \varepsilon} + \log \dfrac{p^2}{M^2} \right) \left( \dfrac{13}{6} N_c - \dfrac{2}{3} n_f \right) + {\fancyscript{O}}(\log m_t^2 ) \\ &\equiv && \alpha_s \left( - \dfrac{1}{\tilde \varepsilon} + \log \dfrac{p^2}{M^2} \right) b_0 + {\fancyscript{O}}(\log m_t^2 ) \\ &&&\hbox{with} \quad \fbox{$b_0 = \dfrac{1}{4 \pi} \left( \dfrac{11}{3} N_c - \dfrac{2}{3} n_f \right)$}\, . \end{aligned} $$
(2.62)

The minus sign arises from the factors i in the propagators, as shown in Eq. 2.19. The number of fermions coupling to the gluons is \(n_f\). From the comments on \(B(p^2;0,0)\) we could guess that the loop integrals will only give a logarithm \(\log p^2\) which is then matched by the logarithm \(\log M^2\) implicitly included in the definition of \(\tilde \varepsilon\). The factor \(b_0\) arises from the one-loop corrections to the gluon self energy, i.e. from diagrams which include one additional factor \(\alpha_s\). Strictly speaking, this form is the first term in a perturbative series in the strong coupling \(\alpha_s = g_s^2/(4 \pi)\). Later on, we will indicate where additional higher order corrections would enter.

In the second step of Eq. 2.62 we have sneaked in additional contributions to the renormalization of the strong coupling from the other one-loop diagrams in the process, replacing the factor 13/6 by a factor 11/3. This is related to the fact that there are actually three types of divergent virtual-gluon diagrams in the physical process \(q \bar{q} \rightarrow b \bar{b}{:}\) the external quark self energies with renormalization factors \(Z_f^{1/2},\) the internal gluon self energy \(Z_A,\) and the vertex corrections \(Z_{Aff}.\) The only physical parameters we can renormalize in this process are the strong coupling and, if finite, the bottom mass. Wave function renormalization constants are not physical. The entire divergence in our \(q \bar{q} \rightarrow b \bar{b}\) process which needs to be absorbed in the strong coupling \(Z_g\) is given by the combination

$$ Z_{Aff} = Z_g Z_A^{1/2} Z_f \quad \Leftrightarrow \quad \dfrac{Z_{Aff}}{Z_A^{1/2} Z_f} \equiv Z_g. $$
(2.63)

We can check this definition of \(Z_g\) by comparing all vertices in which the strong coupling \(g_s\) appears, namely the gluon coupling to quarks, ghosts as well as the triple and quartic gluon vertex. All of them need to have the same divergence structure

$$ \frac{Z_{Aff}}{Z_A^{1/2} Z_f} \,{\stackrel {!}=}\,\frac{Z_{A \eta \eta}}{Z_A^{1/2} Z_\eta} \,{\stackrel {!}=}\,\frac{Z_{3A}}{Z_A^{3/2}}\,{\stackrel {!}= }\,\sqrt{\frac{Z_{4A}}{Z_A^2}}. $$
(2.64)

If we had done the same calculation in QED, i.e. looked for a running electric charge, we would have found that the vacuum polarization diagrams for the photon do account for the entire counter term of the electric charge. The other two renormalization constants \(Z_{Aff}\) and \(Z_f\) cancel because of gauge invariance.

In contrast to QED, the strong coupling diverges in the Thomson limit because QCD is confined towards large distances and weakly coupled at small distances. Lacking a well-enough motivated reference point we are tempted to renormalize \(\alpha_s\) in the \({\overline{\hbox{MS}}}\) scheme. From Eq. 2.62 we know that the ultraviolet pole which needs to be cancelled by the counter term is proportional to the function \(b_0\)

$$ \begin{aligned}[b] g_s^{\rm{bare}} &= Z_g g_s = \left( 1 + \delta Z_g \right) g_s = \left( 1 + {\frac{\delta g_s}{g_s}} \right) g_s\\ (g_s^2)^{\rm{bare}} &= ( Z_g g_s )^2 = \left( 1 + {\frac{\delta g_s}{g_s}} \right)^2 g_s^2 = \left( 1 + 2 {\frac{\delta g_s}{g_s}} \right) g_s^2 = \left( 1 + {\frac{\delta g_s^2}{g_s^2}} \right) g_s^2 \\ \alpha_s^{\rm{bare}} &=\left( 1 + {\frac{\delta \alpha_s}{\alpha_s}} \right) \alpha_s\\ &{\stackrel{!}{=}} \left( 1- {\frac{\Pi}{p^2}} \Bigg|_{\rm{pole}} \right) \alpha_s(M^2) = \left( 1- {\frac{\alpha_s}{\tilde \varepsilon \left( {\frac{\mu_R}{M}} \right) }} b_0 \right) \alpha_s(M^2). \end{aligned} $$
(2.65)

Only in the last step have we explicitly included the scale dependence of the counter term. Because the bare coupling does not depend on any scales, this means that \(\alpha_s\) depends on the unphysical scale M. Similar to the top mass renormalization scheme we can switch to a more physical scheme for the strong coupling as well: we can also absorb the finite contributions of \(\Pi(\mu_R^2/p^2)\) into the strong coupling by simply identifying \(M^2 = p^2.\) Based again on Eq. 2.62 this implies

$$ \alpha_s^{\rm{bare}} = \alpha_s(p^2) \left( 1 - {\frac{\alpha_s b_0}{\tilde \varepsilon}}+ \alpha_s b_0 \log {\frac{p^2}{M^2}} \right) $$
(2.66)

This formula defines a running coupling \(\alpha_s(p^2),\) because the definition of the coupling now has to account for a possible shift between the original argument \(p^2\) and the scale \(M^2\) coming out of the \({\overline{\hbox{MS}}}\) scheme. Since according to Eqs. 2.65 and 2.66 the bare strong coupling can be expressed in terms of \(\alpha_s(M^2)\) as well as in terms of \(\alpha_s(p^2)\) we can link the two scales through

$$ \begin{aligned} [b] \alpha_s(M^2) &= \alpha_s(p^2) + \alpha_s^2 b_0 \log {\frac{p^2}{M^2}}\\ \Leftrightarrow \quad {\frac{d \alpha_s(p^2)}{d \log p^2}} &= - \alpha_s^2 b_0 + {\fancyscript{O}}(\alpha_s^3) \end{aligned} $$
(2.67)

To the given loop order the argument of the strong coupling squared in this formula can be neglected—its effect is of higher order.

In this first formula for the running coupling constant we see that \(b_0\) is positive in the Standard Model. This means that the strong coupling decreases with increasing \(p^2\) and vanishes in the ultraviolet limit. This makes QCD an asymptotically free theory. We can compute the function \(b_0\) in general models by simply adding all contributions of strongly interacting particles in this loop

$$ b_0 = - {\frac{1}{12 \pi}} \; \sum_ {\rm colored\, states} D_j \; T_{R,j}, $$
(2.68)

where we need to know some kind of counting factor \(D_j\) which is \(-11\) for a vector boson (gluon), +4 for a Dirac fermion (quark), +2 for a Majorana fermion (gluino), +1 for a complex scalar (squark) and +1/2 for a real scalar. The color charges are \(T_R=1/2\) for the fundamental representation of SU(3) and \(C_A = N_c\) for the adjoint representation. The masses of the loop particles are not relevant in this approximation because we are only interested in the ultraviolet regime of QCD where all particles can be regarded massless. When we really model the running of \(\alpha_s\) we need to take into account threshold effects of heavy particles, because particles can only contribute to the running of \(\alpha_s\) at scales above their mass scale. This is why the R ratio computed in Eq. 2.12 is so interesting once we vary the energy of the incoming electron–positron pair.
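The counting rule of Eq. 2.68 is easy to automate. The short Python sketch below, our own illustration with an assumed toy particle content, reproduces the Standard Model value \(b_0 = (11 N_c - 2 n_f)/(12\pi)\) for five light flavors and shows how adding a gluino and the corresponding squarks would reduce it.

import math

def b0(states):
    """states: list of (D_j, T_Rj, multiplicity) entering Eq. 2.68."""
    return -sum(D * T * n for D, T, n in states) / (12.0 * math.pi)

Nc, TR, nf = 3, 0.5, 5

sm = [(-11, Nc, 1),        # gluon: adjoint representation, T = C_A = N_c
      (+4, TR, nf)]        # nf Dirac quarks in the fundamental representation
print("SM, nf=5:", b0(sm), " check:", (11 * Nc - 2 * nf) / (12 * math.pi))

# toy SUSY spectrum: one Majorana gluino plus ten complex squarks (L and R for nf=5)
susy = sm + [(+2, Nc, 1), (+1, TR, 10)]
print("SM + gluino + squarks:", b0(susy))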

We can do even better than this fixed order in perturbation theory: while the correction to \(\alpha_s\) in Eq. 2.66 is perturbatively suppressed by the usual factor \(\alpha_s/(4 \pi)\) it includes a logarithm of a ratio of scales which does not need to be small. Instead of simply including these gluon self energy corrections at a given order in perturbation theory we can instead include chains of one-loop diagrams with \(\Pi\) appearing many times in the off-shell gluon propagator. This series of Feynman diagrams is identical to the one we sum for the mass renormalization in Eq. 2.19. It means we replace the off-shell gluon propagator by (schematically written without the factors i)

$$ \begin{aligned}[b] {\frac{T^{\mu \nu}}{p^2}} \rightarrow & {\frac{T^{\mu \nu}}{p^2}} + \left( {\frac{T}{p^2}} \cdot (-T \Pi) \cdot {\frac{T}{p^2}} \right)^{\mu \nu} \\ & + \left( {\frac{T}{p^2}} \cdot (-T \Pi) \cdot {\frac{T}{p^2}} \cdot (-T \Pi) \cdot {\frac{T}{p^2}} \right)^{\mu \nu} + \cdots \\ = & {\frac{T^{\mu \nu}}{p^2}} \sum_{j=0}^\infty \left( -{\frac{\Pi}{p^2}} \right)^j = {\frac{T^{\mu \nu}}{p^2}} \; {\frac{1}{1 + \Pi/p^2}}. \end{aligned} $$
(2.69)

To avoid indices we abbreviate \(T^{\mu \nu} T_{\nu}^\rho = T \cdot T\) which makes sense because of \((T \cdot T \cdot T)^{\mu \nu} = T^{\mu \rho} T^\sigma_\rho T_\sigma^{\nu} = T^{\mu \nu}.\) This re-summation of the logarithm which appears in the next-to-leading order corrections to \(\alpha_s\) moves the finite shift in \(\alpha_s\) shown in Eqs. 2.62 and 2.66 into the denominator, while we assume that the pole will be properly taken care of in any of the schemes we discuss

$$ \alpha_s^{\rm{bare}} = \alpha_s(M^2) - {\frac{\alpha_s^2 b_0}{\tilde{\varepsilon}}} \equiv {\frac{\alpha_s(p^2)}{1 - \alpha_s \; b_0 \; \log {\frac{p^2}{M^2}}}} - {\frac{\alpha_s^2 b_0}{\tilde{\varepsilon}}}. $$
(2.70)

Just as in the case without re-summation, we can use this complete formula to relate the values of \(\alpha_s\) at two reference points, i.e. we consider it a renormalization group equation (RGE) which evolves physical parameters from one scale to another in analogy to the fixed order version in Eq. 2.67

$$ {\frac{1}{\alpha_s(M^2)}} ={\frac{1}{\alpha_s(p^2)}} \left( 1- \alpha_s \;b_0 \;\log {\frac{p^2}{M^2}}\right) = {\frac{1}{\alpha_s(p^2)}}- b_0 \; \log {\frac{p^2}{M^2}}+ {\fancyscript{O}}(\alpha_s). $$
(2.71)

The factor \(\alpha_s\) inside the parentheses we can again evaluate at either of the two scales, the difference is going to be a higher order effect. When we differentiate \(\alpha_s(p^2)\) with respect to the momentum transfer \(p^2\) we find, using the relation \(d/d x (1/\alpha_s) = - 1/\alpha_s^2 \; d \alpha_s/dx\)

$$ \begin{aligned} [b] {\dfrac{1}{\alpha_s}} \; {\dfrac{d \alpha_s}{d \log p^2}}& = - \alpha_s {\dfrac{d}{d \log p^2}} \; {\dfrac{1}{\alpha_s}} = - \alpha_s b_0 + {\fancyscript{O}}(\alpha_s^2)\\ &\fbox{${ p^2 {\dfrac{d \alpha_s}{d p^2}} \equiv {\dfrac{d \alpha_s}{d \log p^2}} = \beta = -\alpha_s^2 \displaystyle \sum_{n=0} b_n \alpha_s^n }$} \end{aligned} $$
(2.72)

This is the famous running of the strong coupling constant including all higher order terms \(b_n.\)

It is customary to replace the renormalization point of \(\alpha_s\) in Eq. 2.70 with a reference scale where the denominator of the re-summed running coupling crosses zero and the running coupling itself should diverge. This is the Landau pole of the strong coupling, as discussed for the Higgs self coupling in Sect. 1.2.3. At one loop order this reads

$$ \begin{aligned}[b] 1 + \alpha_s \;b_0 \; \log {\frac{{\Lambda_{\rm{QCD}}^2}}{M^2}} &\,{\stackrel{!}{=}}\,0 \quad \Leftrightarrow \quad \log {\frac{M^2}{{\Lambda_{\rm{QCD}}^2}}} = {\frac{1}{\alpha_s(M^2) b_0}} \\ \log {\frac{p^2}{M^2}} &= \log {\frac{p^2}{{\Lambda_{\rm{QCD}}^2}}} - {\frac{1}{\alpha_s(M^2) b_0}} \\ {\frac{1}{\alpha_s(p^2)}} &= {\frac{1}{\alpha_s(M^2)}}+ b_0 \; \log {\frac{p^2}{M^2}}\\ &= {\frac{1}{\alpha_s(M^2)}} \;+ b_0 \log {\frac{p^2} {{\Lambda_{\rm{QCD}}^2}}} - {\frac{1}{\alpha_s(M^2)}}= b_0\log {\frac{p^2}{{\Lambda_{\rm{QCD}}^2}}} \\ & \fbox{${\alpha_s(p^2) = {\dfrac{1}{b_0 \log{\dfrac{p^2} {{\Lambda_{\rm{QCD}}^2}}}}}}$} \end{aligned} $$
(2.73)

This scheme can be generalized to any order in perturbative QCD and is not that different from the Thomson limit renormalization scheme of QED, except that with the introduction of \({\Lambda_{\rm{QCD}}}\) we are choosing a reference point which is particularly hard to compute perturbatively. One thing that is interesting in the way we introduce \({\Lambda_{\rm{QCD}}}\) is the fact that we introduce a scale into our theory without ever setting it. All we did was renormalize a coupling which becomes strong at large energies and search for the mass scale of this strong interaction. This trick is called dimensional transmutation.
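To attach numbers to the boxed one-loop result in Eq. 2.73, the following sketch, our own illustration assuming a reference value \(\alpha_s(m_Z^2) \simeq 0.118\) and \(n_f = 5,\) first extracts \(\Lambda_{\rm QCD}\) and then evolves the coupling to a few other scales. Flavor thresholds and the higher coefficients \(b_{n \geq 1}\) are ignored, so the numbers are only indicative.

import math

Nc, nf = 3, 5
b0 = (11 * Nc - 2 * nf) / (12 * math.pi)

mZ, alpha_mZ = 91.19, 0.118                  # assumed reference point, mZ in GeV

# invert alpha_s(mZ^2) = 1/(b0 log(mZ^2/Lambda^2)) for Lambda_QCD
lambda_qcd = mZ * math.exp(-1.0 / (2.0 * b0 * alpha_mZ))
print("one-loop Lambda_QCD (nf=5):", round(lambda_qcd, 4), "GeV")

def alpha_s(p2):
    # boxed result of Eq. 2.73
    return 1.0 / (b0 * math.log(p2 / lambda_qcd**2))

for scale in (10.0, 91.19, 500.0, 2000.0):
    print(f"alpha_s({scale:7.2f} GeV) = {alpha_s(scale**2):.4f}")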

In terms of language, there is a little bit of confusion between field theorists and phenomenologists: up to now we have introduced the renormalization scale \(\mu_R\) as the renormalization point, for example of the strong coupling constant. In the \({\overline{\hbox{MS}}}\) scheme, the subtraction of \(1/\tilde{\varepsilon}\) shifts the scale dependence of the strong coupling to \(M^2\) and moves the logarithm \(\log M^2/\Lambda^2_{\rm{QCD}}\) into the definition of the renormalized parameter. This is what we will from now on call the renormalization scale in the phenomenological sense, i.e. the argument we evaluate \(\alpha_s\) at. Throughout this section we will keep the symbol M for this renormalization scale in the \({\overline{\hbox{MS}}}\) scheme, but from Sect. 2.3 on we will shift back to \(\mu_R\) instead of M as the argument of the running coupling.

2.2.3 Re-Summing Scaling Logarithms

In the last section, Sect. 2.2.2, we have introduced the running strong coupling in a fairly abstract manner. For example, we did not yet link the re-summation of diagrams, which changes the running of \(\alpha_s\) from Eq. 2.67 to Eq. 2.72, to any physics. In what way does the re-summation of the one-loop diagrams for the s channel gluon improve our prediction of the bottom pair production rate at the LHC?

To illustrate those effects we best look at a simple observable which depends on just one energy scale, namely \(p^2.\) The first observable coming to mind is again the Drell–Yan cross section \(\sigma(q \bar{q} \rightarrow \mu^+ \mu^-),\) but since we are not really sure what to do with the parton densities which are included in the actual hadronic observable, we better use an observable at an \(e^+ e^-\) collider. Something that will work and includes \(\alpha_s\) at least in the one-loop corrections is the R parameter given in Eq. 2.12

$$ R = {\frac{\sigma(e^+ e^- \rightarrow {\hbox{hadrons}})}{\sigma(e^+ e^- \rightarrow \mu^+ \mu^-)}} = N_c \sum_{\rm{quarks}} Q_q^2 = {\frac{11 N_c}{9}}. $$
(2.74)

The numerical value at leading order assumes five quarks. Including higher order corrections we can express the result in a power series in the renormalized strong coupling \(\alpha_s.\) In the \({\overline{\hbox{MS}}}\) scheme we subtract \(1/\tilde \varepsilon(\mu_R/M)\) and in general include a scale dependence on M in the individual prefactors \(r_n\)

$$ R \left( {\frac{p^2}{M^2}}, \alpha_s \right) = \sum_{n=0} \; r_n\left( {\frac{p^2}{M^2}} \right) \; \alpha_s^n(M^2) \quad\quad r_0 = {\frac{11 N_c}{9}}. $$
(2.75)

The \(r_n\) we can assume to be dimensionless—if they are not, we can scale R appropriately using \(p^2.\) This implies that the \(r_n\) only depend on ratios of two scales, the externally fixed \(p^2\) on the one hand and the artificial \(M^2\) on the other.

At the same time we know that R is an observable, which means that including all orders in perturbation theory it cannot depend on any artificial scale choices M. Writing this dependence as a total derivative and setting it to zero we find an equation which would be called a Callan–Symanzik equation if instead of the running coupling we had included a running mass

$$ \begin{aligned} 0\,{\stackrel{!}=}& \; M^2 {\frac{d}{d M^2}} R \left( {\frac{p^2}{M^2}}, \alpha_s(M^2) \right)= M^2 \left[ {\frac{\partial}{{\partial} {M^2}}} + {\frac{{\partial}\alpha_s}{{\partial} {M^2}}} {\frac{{\partial}}{{\partial}\alpha_s}} \right]R \left( {\frac{p^2}{M^2}}, \alpha_s \right)\\ =& \left[ M^2 \frac{{\partial}}{{\partial}{M^2}}+ \beta {\frac{{\partial}}{{\partial}\alpha_s}} \right] \;\sum_{n=0} \; r_n\left( {\frac{p^2}{M^2}} \right) \; \alpha_s^n\\ =& \; \sum_{n=1}M^2 {\frac{{\partial}r_n}{{\partial}M^2}} \alpha_s^n + \sum_{n=1}\beta r_n n \alpha_s^{n-1} \quad {\hbox {with}} \quad r_0 = {\frac{11 N_c}{9}} = {\hbox {const}}\\ =& \; M^2 \sum_{n=1}{\frac{{\partial}r_n}{{\partial}M^2}} \alpha_s^n - \sum_{n=1} \sum_{m=0}n r_n \alpha_s^{n+m+1} b_m \quad {\hbox {with}} \quad \beta = -\alpha_s^2 \sum_{m=0} b_m \alpha_s^m\\ =& \; M^2 {\frac{{\partial}r_1}{{\partial}M^2}} \alpha_s + \left( M^2 {\frac{{\partial}r_2}{{\partial}M^2}}- r_1 b_0 \right) \alpha_s^2\\ &+ \left( M^2 {\frac{{\partial}r_3}{{\partial}M^2}} - r_1 b_1 - 2 r_2 b_0 \right) \alpha_s^3+ {\fancyscript{O}}(\alpha_s^4). \end{aligned}$$
(2.76)

In the second line we have to remember that the M dependence of \(\alpha_s\) is already included in the appearance of \(\beta,\) so \(\alpha_s\) should be considered a variable by itself. This perturbative series in \(\alpha_s\) has to vanish in each order of perturbation theory. The non-trivial structure, namely the mix of \(r_n\) derivatives and the perturbative terms in the \(\beta\) function we can read off the \(\alpha_s^3\) term in Eq. 2.76: first, we have the appropriate NNNLO corrections \(r_3.\) Next, we have one loop in the gluon propagator \(b_0\) and two loops for example in the vertex \(r_2.\) And finally, we need the two-loop diagram for the gluon propagator \(b_1\) and a one-loop vertex correction \(r_1.\) The kind-of Callan–Symanzik equation Eq. 2.76 requires

$$ \begin{aligned}[b] {\frac{{\partial}r_1}{{\partial}\log M^2}} &= 0 \\ {\frac{{\partial}r_2}{{\partial}\log M^2}} &= r_1 b_0 \\ {\frac{{\partial}r_3}{{\partial}\log M^2}} &= r_1 b_1 + 2 r_2(M^2) b_0 \\ \cdots \end{aligned} $$
(2.77)

The dependence on the argument \(M^2\) vanishes for \(r_0\) and \(r_1.\) Keeping in mind that there will be integration constants \(c_n\) independent of \(M^2\) and that another, in our simple case unique momentum scale \(p^2\) has to cancel the mass units inside \(\log M^2\) we find

$$ \begin{aligned} [b] r_0 &= c_0 = {\frac{11 N_c}{9}}\\ r_1 &= c_1\\ r_2 &= c_2 + c_1 b_0 \log {\frac{M^2}{p^2}}\\ \end{aligned} $$
$$ \begin{aligned} [b] r_3 &= \int d \log {\frac{{M^{\prime}}^2}{p^2}} \left( c_1 b_1 + 2 \left( c_2 + c_1 b_0 \log {\frac{{M^{\prime}}^2}{p^2}} \right) b_0 \right)\\ &= c_3+ \left( c_1 b_1+ 2 c_2 b_0 \right) \log {\frac{M^2}{p^2}} + c_1 b_0^2 \log^2 {\frac{M^2}{p^2}}\\ \cdots \end{aligned} $$
(2.78)

This chain of \(r_n\) values suggests that we should interpret the apparent fixed-order perturbative series for R in Eq. 2.75 as a series which implicitly includes terms of the order \(\log^{n-1} M^2/p^2\) in each \(r_n.\) These logarithms can become problematic if they grow large enough to spoil the fast convergence in terms of \(\alpha_s\,{\sim}\,0.1,\) namely when we evaluate the observable R at scales far away from the scale choice M for the strong coupling constant.
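As a consistency check of Eq. 2.78 we can verify symbolically that the quoted \(r_2\) and \(r_3\) indeed solve the conditions of Eq. 2.77. The sketch below is our own illustration, with the symbol L standing for \(\log M^2/p^2.\)

import sympy as sp

L = sp.Symbol('L')                     # L = log(M^2/p^2)
c1, c2, c3, b0, b1 = sp.symbols('c1 c2 c3 b0 b1')

r1 = c1
r2 = c2 + c1 * b0 * L
r3 = c3 + (c1 * b1 + 2 * c2 * b0) * L + c1 * b0**2 * L**2

# Eq. 2.77: derivatives with respect to log M^2, i.e. with respect to L
print(sp.simplify(sp.diff(r2, L) - r1 * b0))                   # -> 0
print(sp.simplify(sp.diff(r3, L) - (r1 * b1 + 2 * r2 * b0)))   # -> 0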

Instead of the series in \(r_n\) we can use the conditions in Eq. 2.78 to express \(R\) in terms of the \(c_n\) and collect the logarithms appearing with each \(c_n.\) The geometric series we then resum to

$$ \begin{aligned} [b] R =& \sum_n r_n \left( {\frac{p^2}{M^2}} \right) \; \alpha_s^n(M^2)\\ =& \; c_0+ c_1 \left( 1 + \alpha_s b_0 \log {\frac{M^2}{p^2}} + \alpha_s^2 b_0^2 \log^2 {\frac{M^2}{p^2}} +\cdots \right) \alpha_s(M^2)\\ &+ c_2 \left( 1 + 2 \alpha_s b_0 \log \frac{M^2}{p^2} +\cdots \right) \alpha_s^2(M^2) + \cdots\\ =& \; c_0+ c_1 {\frac{\alpha_s(M^2)}{1 - \alpha_s b_0 \log {\frac{M^2}{p^2}}}} + c_2 \left({\frac{\alpha_s(M^2)}{1 - \alpha_s b_0 \log {\frac{M^2}{p^2}}}} \right)^2 + \cdots\\ \equiv& \sum c_n \; \alpha_s^n(p^2). \end{aligned} $$
(2.79)

In the last step we use what we know about the running coupling from Eq. 2.71. Note that in contrast to the \(r_n\) the integration constants \(c_n\) are by definition independent of \(p^2/M^2\) and therefore better suited as coefficients of a perturbative series in the presence of potentially large logarithms.

This new organization of the QCD perturbation series for \(R\) re-sums all logarithms of the kind \(\log M^2/p^2\) and absorbs them into the running strong coupling evaluated at the scale \(p^2.\) In this manner, all scale dependence in the perturbative series for the dimensionless observable R is moved into \(\alpha_s.\) In Eq. 2.79 we also see that this series in \(c_n\) will never lead to a scale-invariant result when we include a finite order in perturbation theory.
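A quick numerical cross-check of the geometric series in Eq. 2.79, using the one-loop coupling and two widely separated, assumed scales as our own toy numbers: the truncated sum of logarithms only slowly approaches \(c_1\alpha_s(p^2),\) while the single term \(c_1\alpha_s(M^2)\) is far off.

import math

b0 = 23.0 / (12.0 * math.pi)           # nf = 5, one loop
lambda_qcd = 0.088                     # GeV, assumed one-loop value

def alpha_s(p2):
    return 1.0 / (b0 * math.log(p2 / lambda_qcd**2))

M2, p2 = 1000.0**2, 10.0**2            # assumed, widely separated scales
a_M, a_p = alpha_s(M2), alpha_s(p2)
x = a_M * b0 * math.log(M2 / p2)       # expansion parameter of the geometric series

c1 = 1.0                               # toy coefficient
partial = 0.0
for k in range(8):
    partial += x**k
    print(f"{k+1:2d} terms: c1*alpha_s(M^2)*sum = {c1 * a_M * partial:.4f}")
print(f"resummed: c1*alpha_s(p^2)          = {c1 * a_p:.4f}")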

Before moving on we collect the logic of the argument given in this section: when we regularize an ultraviolet divergence we automatically introduce a reference scale \(\mu_R.\) Naively, this could be an ultraviolet cutoff scale, but even the seemingly scale invariant dimensional regularization in the conformal limit of our field theory cannot avoid the introduction of a scale. There are several ways of dealing with such a scale: First, we can renormalize our parameter at a reference point. Second, we can define a running parameter, i.e. absorb the scale logarithm into the \({\overline{\hbox{MS}}}\) counter term. In that case introducing \({\Lambda_{\rm{QCD}}}\) leaves us with a compact form of the running coupling \(\alpha_s(M^2;{\Lambda_{\rm{QCD}}}).\)

Strictly speaking, at each order in perturbation theory the scale dependence should vanish together with the ultraviolet poles, as long as there is only one scale affecting a given observable. However, defining the running strong coupling we sum one-loop vacuum polarization graphs. Even when we compute an observable at a given loop order, we implicitly include higher order contributions. They lead to a dependence of our perturbative result on the artificial scale \(M^2,\) which phenomenologists refer to as renormalization scale dependence.

Using the R ratio we see what our definition of the running coupling means in terms of re-summing logarithms: reorganizing our perturbative series to get rid of the ultraviolet divergence, the running coupling \(\alpha_s(p^2)\) re-sums the scale logarithms \(\log p^2/M^2\) to all orders in perturbation theory. We will need this picture once we introduce infrared divergences in the following section.

2.3 Infrared Divergences

After this brief excursion into ultraviolet divergences and renormalization we can return to the original example, the Drell–Yan process. Last, we wrote down the hadronic cross sections in terms of parton distributions at leading order in Eq. 2.39. At this stage the parton distribution functions (pdfs) are only functions of the collinear momentum fraction of the partons inside the proton, about which from a theory point of view we only know a set of sum rules.

The perturbative question we need to ask for \(\mu^+ \mu^-\) production at the LHC is: what happens if together with the two leptons we produce additional jets which for one reason or another we do not observe in the detector. Such jets could for example come from the radiation of a gluon from the initial state quarks. In Sect. 2.3.1 we will study the kinematics of radiating such jets and specify the infrared divergences this leads to. In Sects. 2.3.2 and 2.3.3 we will show that these divergences have a generic structure and can be absorbed into a re-definition of the parton densities, similar to an ultraviolet renormalization of a Lagrangian parameter. In Sects. 2.3.4 and 2.3.5 we will again follow the example of the ultraviolet divergences and specify what absorbing these divergences means in terms of logarithms appearing in QCD calculations.

Throughout this writeup we will use the terms jets and final state partons synonymously. This is not really correct once we include jet algorithms and hadronization. On the other hand, in Sect. 3.1.2 we will see that the purpose of a jet algorithm is to take us from some kind of energy deposition in the calorimeter to the parton radiated in the hard process. The two should therefore be closely related.

2.3.1 Single Jet Radiation

Let us get back to the radiation of additional partons in the Drell–Yan process. We can start for example by computing the cross section for the partonic process \(q \bar{q} \rightarrow Z g.\) However, this partonic process involves renormalization of ultraviolet divergences as well as loop diagrams which we have to include before we can say anything reasonable, i.e. ultraviolet and infrared finite.

To make life easier and still learn about the structure of collinear infrared divergences we instead look at the crossed process \(q g \rightarrow Z q.\) It should behave similarly to any other \((2 \rightarrow 2)\) jet radiation process, except that it has a different incoming state than the leading-order Drell–Yan process and hence does not involve virtual corrections. This means we do not have to deal with ultraviolet divergences and renormalization, and can concentrate on parton or jet radiation from the initial state. Moreover, let us go back to Z production instead of a photon, to avoid confusion with massless particles which are not radiated jets.

The matrix element squared for this \((2 \rightarrow 2)\) process is—modulo charges and averaging factors, but including all Mandelstam variables

$$ {\left|{\fancyscript{M}} \right|^2}\sim - {\frac{t}{s}} - {\frac{s^2 -2 m_Z^2 (s + t - m_Z^2)}{st}}. $$
(2.80)

The Mandelstam variable t for one massless final state particle can be expressed as \(t = -s (1-\tau) y\) in terms of the rescaled emission angle \(y=(1 - \cos \theta)/2\) and \(\tau = m_Z^2/s.\) Similarly, we obtain \(u = -s (1-\tau) (1-y),\) so as a first check we can confirm that \(t+u=-s(1-\tau) = -s+m_Z^2.\) The collinear limit, in which the outgoing quark is emitted along the beam direction, is given by \(y \rightarrow 0,\) corresponding to negative \(t \rightarrow 0\) with finite \(u=-s+m_Z^2.\) In this limit the matrix element squared can also be written as

$$ {\left|{\fancyscript{M}} \right|^2}\sim {\frac{s^2 - 2 s m_Z^2 + 2 m_Z^4}{s(s-m_Z^2)}} \; {\frac{1}{y}} + {\fancyscript{O}}(y). $$
(2.81)

This expression is divergent for collinear splitting, i.e. for small angles y. We can translate this \(1/y\) divergence for example into the transverse momentum of the outgoing quark or of the Z

$$ s p_T^2 = t u = s^2 (1 - \tau)^2 \; y (1-y) = (s-m_Z^2)^2 y + {\fancyscript{O}}(y^2) $$
(2.82)

In the collinear limit our matrix element squared in Eq. 2.81 becomes

$$ {\left|{\fancyscript{M}} \right|^2}\sim {\frac{s^2 - 2 s m_Z^2 + 2 m_Z^4}{s^2}} \; {\frac{(s-m_Z^2)}{p_T^2}} + {\fancyscript{O}}(p_T^0). $$
(2.83)

The matrix element for the tree-level process \(q g \rightarrow Z q\) has a leading divergence proportional to \(1/p_T^2.\) To compute the total cross section for this process we need to integrate the matrix element over the entire two-particle phase space. Starting from Eq. 2.41 and using the appropriate Jacobian this integration can be written in terms of the reduced angle y. Approximating the matrix element as \(C^{\prime}/y\) or \(C/p_T^2,\) we then integrate

$$ \begin{aligned} [b] \int\nolimits_{y^{\rm{min}}}^{y^{\rm{max}}} d y \; {\frac{C^{\prime}}{y}} =& \int\nolimits_{p_T^{\rm{min}}}^{p_T^{\rm{max}}} d p_T^2 \; {\frac{C}{p_T^2}} = \; 2 \int\nolimits_{p_T^{\rm{min}}}^{p_T^{\rm{max}}} d p_T \; p_T \; {\frac{C}{p_T^2}}\\ \simeq& \; 2 C \int\nolimits_{p_T^{\rm{min}}}^{p_T^{\rm{max}}} d p_T {\frac{1}{p_T}} = 2 C \; \log {\frac{p_T^{\rm{max}}}{p_T^{\rm{min}}}} \end{aligned} $$
(2.84)

The form \(C/p_T^2\) for the matrix element is of course only valid in the collinear limit; in the non-collinear phase space C is not a constant. However, Eq. 2.84 describes well the collinear divergence arising from quark radiation at the LHC.
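To see how well the \(1/p_T^2\) behavior of Eq. 2.83 approximates the full expression, the sketch below, our own check with an assumed partonic energy of \(\sqrt{s}=500\,\hbox{GeV},\) evaluates Eq. 2.80 and Eq. 2.83 for decreasing emission angles y: the two agree in the collinear region and depart at large y, where C is no longer constant.

import math

mZ = 91.19
s = 500.0**2                      # assumed partonic energy squared, in GeV^2
tau = mZ**2 / s

def m2_exact(y):
    # Eq. 2.80 with t = -s(1-tau)*y, modulo charges and averaging factors
    t = -s * (1.0 - tau) * y
    return -t / s - (s**2 - 2 * mZ**2 * (s + t - mZ**2)) / (s * t)

def m2_collinear(y):
    # Eq. 2.83 with s*pT^2 = t*u ~ (s - mZ^2)^2 * y
    pT2 = (s - mZ**2)**2 * y / s
    return (s**2 - 2 * s * mZ**2 + 2 * mZ**4) / s**2 * (s - mZ**2) / pT2

for y in (0.5, 0.1, 0.01, 0.001):
    print(f"y = {y:6.3f}   exact = {m2_exact(y):10.3f}   collinear = {m2_collinear(y):10.3f}")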

Next, we follow the same strategy as for the ultraviolet divergence. First, we regularize the divergence for example using dimensional regularization. Then, we find a well-defined way to get rid of it. Dimensional regularization means writing the two-particle phase space in \(n=4-2 \varepsilon\) dimensions. Just for reference, the complete formula in terms of the angular variable y reads

$$ s \; {\frac{d \sigma}{d y}} = {\frac{\pi (4 \pi)^{-2+\varepsilon}}{\Gamma(1-\varepsilon)}} \; \left( {\frac{\mu_F^2}{m_Z^2}} \right)^\varepsilon \; {\frac{\tau^\varepsilon (1-\tau)^{1-2 \varepsilon}}{y^\varepsilon (1-y)^\varepsilon}} {\left |{\fancyscript{M}} \right|^2} \sim \left( {\frac{\mu_F^2}{m_Z^2}} \right)^\varepsilon \; {\frac{\left |{\fancyscript{M}} \right|^2}{y^\varepsilon (1-y)^\varepsilon}} \,. $$
(2.85)

In the second step we only keep the factors we are interested in. The additional factor \(1/y^\varepsilon\) regularizes the integral at \(y \rightarrow 0,\) as long as \(\varepsilon<0,\) by slightly increasing the suppression of the integrand in the infrared regime. After integrating the leading collinear divergence \(1/y^{1+\varepsilon}\) we are left with a pole \(1/(-\varepsilon).\) This regularization procedure is symmetric in \(y \leftrightarrow (1-y).\) What is important to notice is again the appearance of a scale \(\mu_F^{2 \varepsilon}\) with the n-dimensional integral. This scale arises from the infrared regularization of the phase space integral and is referred to as factorization scale. The actual removal of the infrared pole—corresponding to the renormalization in the ultraviolet case—is called mass factorization and works exactly the same way as renormalizing a parameter: in a well-defined scheme we simply subtract the pole from the fixed-order matrix element squared.

2.3.2 Parton Splitting

From the discussion of the process \(q g \rightarrow Z q\) we can at least hope that after taking care of all other infrared and ultraviolet divergences the collinear structure of the process \(q \bar{q} \rightarrow Z g\) will be similar. In this section we will show that we can indeed write all collinear divergences in a universal form, independent of the hard process, which in our case is the Drell–Yan process. In the collinear limit, the radiation of additional partons or the splitting into additional partons will be described by universal splitting functions.

Infrared divergences occur for massless particles in the initial or final state, so we need to go through all ways in which incoming or outgoing gluons and quarks can split into each other. The description of the factorized phase space, with which we will start, is common to all these different channels. The first and at the LHC most important case is the splitting of one gluon into two, shown in Fig. 2.1, where the two daughter gluons are close to mass shell while the mother has to have a finite positive invariant mass \(p_a^2 \gg p_b^2, p_c^2.\) We again assign the direction of the momenta as \(p_a = - p_b - p_c,\) which means we have to take care of minus signs in the particle energies. The kinematics of this approximately collinear process we can describe in terms of the energy fractions z and \(1-z\) defined as

$$ z = {\frac{|E_b|}{|E_a|}} = 1 - {\frac{|E_c|}{|E_a|}}, $$
(2.86)

which means for the four momentum of the splitting particle

$$ \begin{aligned}[b] p_a^2 & = 2 (p_b p_c) = 2 z (1-z) (1 - \cos \theta ) E_a^2 = z (1-z) E_a^2 \theta^2 + {\fancyscript{O}}(\theta^4) \\ \Leftrightarrow \quad \theta &\equiv \theta_b + \theta_c \simeq {\frac{1}{|E_a|}} \; \sqrt{ {\frac{p_a^2}{z (1-z)}} }, \end{aligned} $$
(2.87)

in the collinear limit and in terms of the opening angle \(\theta\) between \(\vec{p}_b\) and \(\vec{p}_c.\) Because \(p_a^2 >0\) we call this final state splitting configuration time-like branching. For this configuration we can write down the so-called Sudakov decomposition of the four-momenta

$$ - p_a = p_b + p_c =\left( - z p_a + \beta n + p_T \right) \; + \; \left( - (1-z) p_a - \beta n - p_T \right)\!. $$
(2.88)

It defines an arbitrary unit four-vector n, a component \(p_T\) orthogonal to the mother momentum and to n, i.e. \((p_a p_T) = 0 = (n p_T),\) and a free factor \(\beta.\) This way, we can specify n such that it defines the direction of the \(p_b-p_c\) decay plane. In this decomposition we can set only one invariant mass to zero, for example that of a radiated gluon \(p_c^2=0.\) The second final state will have a finite invariant mass \(p_b^2 \neq 0.\)

Fig. 2.1 Splitting of one gluon into two gluons. Figure from Ref. [1]

Relative to \(\vec{p}_a\) we can split the opening angle \(\theta\) for massless partons according to Fig. 2.1

$$ \begin{aligned} [b] \theta &= \theta_b + \theta_c \quad \hbox {and} \quad {\frac{\theta_b}{\theta_c}} = {\frac{p_T}{|E_b|}} \left( {\frac{p_T}{|E_c|}} \right)^{-1} = {\frac{1-z}{z}}\\ \quad \Leftrightarrow \quad \theta &= {\frac{\theta_b}{1-z}} = {\frac{\theta_c}{z}} \end{aligned} $$
(2.89)

Using this specific phase space parameterization we can divide an \((n+1)\)-particle process into an n-particle process and a splitting process of quarks and gluons. First, this requires us to split the \((n+1)\)-particle phase space alone into an n-particle phase space and the collinear splitting. The general \((n+1)\)-particle phase space separating off the n-particle contribution

$$ \begin{aligned} [b] d \Phi_{n+1} &= \cdots{\frac{d^3 \vec{p}_b}{2 (2 \pi)^3 |E_b|}} \; {\frac{d^3 \vec{p}_c}{2 (2 \pi)^3 |E_c|}} \;\\ &= \cdots {\frac{d^3 \vec{p}_a}{2 (2 \pi)^3 |E_a|}} \; {\frac{d^3 \vec{p}_c}{2 (2 \pi)^3 |E_c|}} {\frac{|E_a|}{|E_b|}} \quad {\hbox {at fixed}}{ p_a}\\ &= d \Phi_n \; {\frac{d p_{c,3} d p_T p_T d \phi}{2 (2 \pi)^3 |E_c|}} \; {\frac{1}{z}}\\ &= d \Phi_n \; {\frac{d p_{c,3} d p_T^2 d \phi}{4 (2 \pi)^3 |E_c|}} \; {\frac{1}{z}} \end{aligned} $$
(2.90)

is best expressed in terms of the energy fraction z and the azimuthal angle \(\phi.\)

In other words, separating the \((n+1)\)-particle space into an n-particle phase space and a \((1 \rightarrow 2)\) splitting phase space is possible without any approximation, and all we have to take care of is the correct prefactors in the new parameterization. The third direction of \(p_c\) we can translate into z in a convenient reference frame for the momenta appearing in the Sudakov decomposition

$$ p_a = \left(\begin{array}{c}|E_a| \\ 0 \\ 0 \\ p_{a,3} \end{array}\right) = |E_a| \left(\begin{array}{c}1 \\ 0 \\ 0 \\ 1+\fancyscript{O}(\theta) \end{array}\right) \quad\quad n = \left(\begin{array}{c} 1 \\ 0 \\ 0 \\ -1 \end{array}\right)\quad p_T = \left(\begin{array}{c}0 \\ p_{T,1} \\ p_{T,2} \\ 0 \end{array}\right). $$
(2.91)

This choice has the special feature that \(n^2=0\) which allows us to derive \(\beta\) from the momentum parameterization shown in Eq. 2.88 and the additional condition that \(p_c^2=0\)

$$ \begin{aligned}[b] p_c^2 &= \left( -(1-z) p_a - \beta n - p_T \right)^2 \\ &= (1-z)^2 p_a^2 + p_T^2 + 2 \beta (1-z) (n p_a) \\ &= (1-z)^2 p_a^2 + p_T^2 + 4 \beta (1-z) |E_a| (1+\fancyscript{O}(\theta)) {\stackrel{!}{\,=\,}} 0 \\ \Leftrightarrow \beta &\simeq - {\frac{p_T^2 + (1-z)^2 p_a^2}{4 (1-z) |E_a|}}. \\[5pt] \end{aligned} $$
(2.92)

Starting from Eq. 2.88 for the third momentum component \(p_{c,3},\) expressing \(p_a\) and \(p_T\) following Eq. 2.91 and inserting this value for \(\beta\) gives us

$$\begin{aligned}[b] {\frac{d p_{c,3}}{d z}} &= {\frac{d}{d z}} \left[ -(1-z) |E_a|(1+\fancyscript{O}(\theta)) + \beta \right]\\ &= {\frac{d}{d z}} \left[ -(1-z) |E_a|(1+\fancyscript{O}(\theta)) - {\frac{p_T^2+(1-z)^2 p_a^2}{4 (1-z) |E_a|}} \right] \\ &= |E_a|(1+\fancyscript{O}(\theta)) - {\frac{p_T^2}{4 (1-z)^2 E_a}} + {\frac{p_a^2} {4 |E_a|}} \\ &= {\frac{|E_c|}{1-z}} (1+\fancyscript{O}(\theta)) - {\frac{\theta^2 z^2 E_c^2}{4 (1-z)^2 E_a}} + {\frac{z (1-z) E_a^2 \theta^2 + {\fancyscript{O}}(\theta^4)}{4 |E_a|}} \\ &= {\frac{|E_c|}{1-z}} + {\fancyscript{O}}(\theta) \quad \Leftrightarrow \quad {\frac{d p_{c,3}}{|E_c|}} \simeq {\frac{dz}{1-z}}. \end{aligned} $$
(2.93)

In addition to substituting \(d p_{c,3}\) by dz in Eq. 2.90 we also replace \(d p_T^2\) with \(d p_a^2\) according to

$$ {\frac{p_T^2}{p_a^2}} = {\frac{E_b^2 \theta_b^2}{z (1-z) E_a^2 \theta^2}} = {\frac{z^2 (1-z)^2 E_a^2 \theta^2}{z (1-z) E_a^2 \theta^2}} = z (1-z). $$
(2.94)

This gives us the final result for the separated collinear phase space

$$ \fbox{${ d \Phi_{n+1} = d \Phi_n \; {\dfrac{d z d p_a^2 d \phi}{4 (2 \pi)^3}} = d \Phi_n \; {\dfrac{d z d p_a^2}{4 (2 \pi)^2}} }$}\,, $$
(2.95)

where in the second step we assume azimuthal symmetry and integrate over \(\phi.\)

Adding the transition matrix elements to this factorization of the phase space and ignoring the initial-state flux factor which is common to both processes we can now postulate a full factorization for one collinear emission and in the collinear approximation

$$ \begin{aligned} d \sigma_{n+1} &=\overline{|{\fancyscript{M}}_{n+1}|^2} \; d \Phi_{n+1} \\ &=\overline{|{\fancyscript{M}}_{n+1}|^2} \; d \Phi_n {\frac{d p_a^2 dz}{4(2 \pi)^2}} \\ &= {\frac{2 g_s^2}{p_a^2}} \; \hat{P}(z) \; \overline{|{\fancyscript{M}}_n|^2} \; d \Phi_n {\frac{d p_a^2 dz}{16 \pi^2}} \quad {\hbox {assuming}} \quad \fbox{${ \overline{|{\fancyscript{M}}_{n+1}|^2} = {\dfrac{2 g_s^2}{p_a^2}} \; \hat{P}(z) \; \overline{|{\fancyscript{M}}_n|^2} }$}\,.\\[2mm] \end{aligned} $$
(2.96)

This last step is an assumption which we will now justify step by step by constructing the appropriate splitting kernels \(\hat{P}(z)\) for all different quark and gluon configurations. If Eq. 2.96 holds true it means we can compute the \((n+1)\) particle amplitude squared from the n-particle case convoluted with the appropriate splitting kernel. Using \(d \sigma_n \sim \overline{|{\fancyscript{M}}_n|^2} \; d \Phi_n\) and \(g_s^2 = 4 \pi \alpha_s\) we can write this relation in its most common form

$$ \fbox{${\sigma_{n+1} = \int \sigma_n \; {\dfrac{d p_a^2}{p_a^2}} dz \; {\dfrac{\alpha_s}{2 \pi}} \; \hat{P}(z) }$}\,. $$
(2.97)

Reminding ourselves that relations of the kind \(\overline{|{\fancyscript{M}}_{n+1}|^2} = p \overline{|{\fancyscript{M}}_{n}|^2}\) can typically be summed, for example for the case of successive soft photon radiation in QED, we see that Eq. 2.97 is not the final answer. It does not include the necessary phase space factor \(1/n!\) from identical bosons in the final state which leads to the simple exponentiation.
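To sketch what this exponentiation looks like, assume for a moment that every emission contributes the same, independent splitting factor. Including the phase space factor \(1/N!\) for N identical bosons in the final state, the sum over any number of emissions then gives

$$ \sigma_{\rm{tot}} = \sigma_n \; \sum_{N=0}^{\infty} \frac{1}{N!} \; \left( \int \frac{d p_a^2}{p_a^2} dz \; \frac{\alpha_s}{2 \pi} \; \hat{P}(z) \right)^N = \sigma_n \; \exp \left( \int \frac{d p_a^2}{p_a^2} dz \; \frac{\alpha_s}{2 \pi} \; \hat{P}(z) \right), $$

a purely schematic formula which ignores the boundaries and ordering of the successive phase space integrals, but which shows where the exponential structure comes from.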

As the first parton splitting in QCD we study a gluon splitting into two gluons, shown in Fig. 2.1. To compute its transition amplitude we parameterize all gluon momenta and polarizations. With respect to the scattering plane opened by \(\vec{p}_b\) and \(\vec{p}_c\) all three gluons have two transverse polarizations, one in the plane, \(\varepsilon^\|,\) and one perpendicular to it, \(\varepsilon^\perp.\) In the limit of small scattering angles, the three parallel as well as the three perpendicular polarization vectors are aligned. The perpendicular polarizations are also orthogonal to all three gluon momenta, while the polarizations in the plane are orthogonal only to their own momenta, which altogether means for the three-vectors

$$ \begin{aligned} [b] ( \varepsilon_i^\| \varepsilon_j^\| ) &= -1 + {\fancyscript{O}}(\theta) \quad &( \varepsilon_i^\perp \varepsilon_j^\perp ) &= -1 \quad ( \varepsilon_i^\perp \varepsilon_j^\| ) = {\fancyscript{O}}(\theta) \notag \\ ( \varepsilon_i^\perp p_j ) &= 0 &( \varepsilon_j^\| p_j ) &= 0, \end{aligned} $$
(2.98)

with general \(i \ne j;\) for \(i=j\) these products are exactly minus one and zero, respectively. The finite combinations between polarization vectors and momenta which we need are, in terms of the three-dimensional opening angles \(\angle (\vec{\varepsilon},\vec{p})\)

$$ \begin{aligned} [b] (\varepsilon_a^\| p_b ) &= - E_b \cos \angle (\vec{\varepsilon}_a^\|,\vec{p}_b ) = - E_b \cos \left( {\frac{\pi}{2}} + \theta_b \right) = + E_b \sin \theta_b\\ & \simeq + E_b \theta_b = z (1-z) E_a \theta \\ ( \varepsilon_b^\| p_c ) &= - E_c \cos \angle (\vec{\varepsilon}_b^\|,\vec{p}_c ) = - E_c \cos \left( {\frac{\pi}{2}} - \theta \right) = - E_c \sin \theta \\ &\simeq - E_c \theta = - (1-z) E_a \theta \\ (\varepsilon_c^\| p_b ) &= - E_b \cos \angle (\vec{\varepsilon}_c^\|,\vec{p}_b ) = - E_b \cos \left( {\frac{\pi}{2}} + \theta \right) = + E_b \sin \theta \\ &\simeq + E_b \theta = z E_a \theta . \end{aligned} $$
(2.99)

Using these kinematic relations we can tackle the actual splitting amplitude. For three gluons the splitting amplitude will be proportional to the vertex, now switching back to the symmetric definition of all incoming momenta

$$\begin{aligned} V_{ggg} {=}\,&\, i g f^{abc} \; \varepsilon_a^\alpha \varepsilon_b^\beta \varepsilon_c^\gamma [ g_{\alpha \beta} (p_a - p_b)_\gamma \\ & + g_{\beta \gamma} (p_b - p_c)_\alpha + g_{\gamma \alpha} (p_c - p_a)_\beta]\\ {=}\,&\, i g f^{abc} \; \varepsilon_a^\alpha \varepsilon_b^\beta \varepsilon_c^\gamma [ g_{\alpha \beta} (-p_c - 2 p_b)_\gamma \\ & + g_{\beta \gamma} (p_b - p_c)_\alpha + g_{\gamma \alpha} (2 p_c + p_b)_\beta ] \quad && {\hbox {with}} \; p_a = - p_b - p_c \\ {=}\,&\, i g f^{abc} \; \left[ -2(\varepsilon_a \varepsilon_b) (\varepsilon_c p_b)+ (\varepsilon_b \varepsilon_c) (\varepsilon_a p_b)\right. \\ & \left. \mathop{-}\, (\varepsilon_b \varepsilon_c) (\varepsilon_a p_c) + 2 (\varepsilon_c \varepsilon_a) (\varepsilon_b p_c) \right] && {\hbox {with}} \; (\varepsilon_{\!j} p_{\!j}) = 0 \\ {=}& - 2 i g f^{abc} \left[ (\varepsilon_a \varepsilon_b) (\varepsilon_c p_b) -(\varepsilon_b \varepsilon_c) (\varepsilon_a p_b) \right. \\ & \left.\mathop{-}\,(\varepsilon_c \varepsilon_a) (\varepsilon_b p_c) \right] && {\hbox {with}}\; (\varepsilon_a p_c) = - (\varepsilon_a p_b) \\ {=}\,& - 2 i g f^{abc} \; \left[(\varepsilon_a \varepsilon_b) (\varepsilon_c^\parallel p_b) -(\varepsilon_b \varepsilon_c) (\varepsilon_a^\parallel p_b)\right. \\ & \left. \mathop{-}\,(\varepsilon_c \varepsilon_a) (\varepsilon_b^\parallel p_c) \right] && {\hbox {with}} \; (\varepsilon_i^\perp p_{\!j}) = 0. \end{aligned} $$
(2.100)

Squaring the splitting matrix element to compute the \((n+1)\) and n particle matrix elements squared for the unpolarized case gives us

$$ \begin{aligned} \overline{|{\fancyscript{M}} _{n+1} |^2} &= {\frac{1}{2}} \; \left( {\frac{1}{p_a^2}} \right)^2 \; 4 g_s^2 \; {\frac{1} {N_c^2-1}} \; {\frac{1}{N_a}} \; \left[ \sum_{\rm{3\,terms}} \pm f^{abc} \; (\varepsilon \cdot \varepsilon) (\varepsilon \cdot p) \right]^2 \; \overline{| {\fancyscript{M}}_n |^2} \\ &= {\frac{2g_s^2}{p_a^2}} \; {\frac{f^{abc} f^{abc}}{N_c^2-1}} \; {\frac{1}{N_a}} \; \left[ \sum {\frac{(\varepsilon \cdot \varepsilon)^2 (\varepsilon \cdot p)^2}{p_a^2}} \right] \; \overline{| {\fancyscript{M}}_n |^2}, \end{aligned} $$
(2.101)

where the sum originally includes the three terms in the brackets of Eq. 2.100. Each term in this sum is symmetric in two indices but gets multiplied with the anti-symmetric color factor. The final result will only be finite if we square each term individually as a product of two symmetric and two anti-symmetric terms. In other words, the sum over the external gluons becomes an incoherent polarization sum.

Going through all possible combinations we know what can contribute inside the brackets of Eq. 2.100: \((\varepsilon^\|_a \varepsilon^\|_b)\) as well as \((\varepsilon^\perp_a \varepsilon^\perp_b)\) can be combined with \((\varepsilon^\|_c p_b);\) \((\varepsilon^\|_b \varepsilon^\|_c)\) or \((\varepsilon^\perp_b \varepsilon^\perp_c)\) with \((\varepsilon^\|_a p_b);\) and last but not least we can combine \((\varepsilon^\|_a \varepsilon^\|_c)\) and \((\varepsilon^\perp_a \varepsilon^\perp_c)\) with \((\varepsilon^\|_b p_c).\) These six combinations contribute to the splitting matrix element as

$$ \left.\begin{array}{lll@{\quad}|@{\quad}r@{\quad}|@{\quad}rr} \varepsilon_a & \varepsilon_b & \varepsilon_c & \pm (\varepsilon \cdot \varepsilon) (\varepsilon \cdot p) & {\frac{(\varepsilon \cdot \varepsilon)^2 (\varepsilon \cdot p)^2}{p_a^2}} & \\\hline \| & \| & \| & &\\ \perp & \perp & \| & {(-1) (-z) E_a \theta} & {{\frac{z}{1-z}}} \\\hline \| & \| & \| &&\\ \| & \perp & \perp & {-(-1) (-z)(1-z) E_a \theta} & {z (1-z)} \\\hline \| & \| & \| &&\\ \perp & \| & \perp & {-(-1) (1-z) E_a \theta} & {{\frac{1-z}{z}}}\\ \end{array}\right. $$

These six cases correspond to four different polarizations of the three external gluons. For this incoherent sum in Eq. 2.101 we find

$$ \begin{aligned} \overline{| {\fancyscript{M}}_{n+1} |^2} &= {\frac{2 g_s^2}{p_a^2}} \; N_c \; {\frac{1}{N_a}} \; \left[ \sum {\frac{(\varepsilon \cdot \varepsilon)^2 (\varepsilon \cdot p)^2}{p_a^2}} \right] \; \overline{| {\fancyscript{M}}_n |^2} \quad &&{\hbox {with}} \; f^{abc} f^{abd} = N_c \delta^{cd} \\ &= {\frac{2 g_s^2} {p_a^2}} \; N_c \; \left[ {\frac{z}{1-z}} + {\frac{1-z} {z}} + z (1-z) \right] \; \overline{| {\fancyscript{M}}_n |^2} \quad &&{\hbox {with}} \; N_a = 2 \\ &\equiv {\frac{2 g_s^2}{p_a^2}} \; \hat{P}_{g \leftarrow g}(z) \; \overline{| {\fancyscript{M}}_n |^2}\\ \end{aligned} $$
$$ \begin{aligned} & \Leftrightarrow \quad \fbox{${\hat{P}_{g \leftarrow g}(z) = C_A \left[ {\dfrac{z}{1-z}} + {\dfrac{1-z}{z}} + z (1-z) \right] }$}\,, \end{aligned} $$
(2.102)

using \(C_A = N_c\) and averaging over the color \((N_c^2-1)\) and polarization \(N_a\) of the mother particle a. The factor \(1/2\) in the first line takes into account that for two final state gluons the \((n+1)\)-particle phase space is only half its usual size. The form of the splitting kernel is symmetric when we exchange the two gluons z and \((1-z).\) It diverges if either of the gluons becomes soft. The notation \(\hat{P}_{i \leftarrow j} \sim \hat{P}_{ij}\) is inspired by a matrix notation which we can use to multiply the splitting matrix from the right with the incoming parton vector to get the final parton vector. Following the logic described above, with this calculation we have shown that the factorized form of the \((n+1)\)-particle matrix element squared in Eq. 2.96 holds for the purely gluonic case.
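As a quick numerical cross-check of the table above and of the bracket in Eq. 2.102 we can evaluate the three finite \((\varepsilon \cdot \varepsilon)^2 (\varepsilon \cdot p)^2/p_a^2\) combinations directly from the small-angle kinematics of Eq. 2.99. The following sketch does this in Python; the numerical values chosen for \(E_a,\) z and \(\theta\) are arbitrary and only serve as an illustration.

```python
# Numerical cross-check of the polarization table and of Eq. 2.102.
# The kinematic inputs E_a, z, theta below are arbitrary illustrative values.
import numpy as np

E_a, z, theta = 100.0, 0.3, 1e-4            # small-angle collinear kinematics
E_b, E_c = z * E_a, (1 - z) * E_a           # daughter energies
theta_b = (1 - z) * theta                   # opening angle of parton b
p_a2 = z * (1 - z) * E_a**2 * theta**2      # virtuality of the mother gluon

# the three finite (eps.eps)(eps.p) combinations from Eq. 2.99
eps_c_pb = E_b * theta                      # (eps_c^par . p_b) =  z E_a theta
eps_a_pb = E_b * theta_b                    # (eps_a^par . p_b) =  z (1-z) E_a theta
eps_b_pc = -E_c * theta                     # (eps_b^par . p_c) = -(1-z) E_a theta

entries = np.array([eps_c_pb, eps_a_pb, eps_b_pc]) ** 2 / p_a2
target = np.array([z / (1 - z), z * (1 - z), (1 - z) / z])
print(entries, np.allclose(entries, target))   # reproduces the table entries
print(entries.sum())                           # the bracket of P_gg in Eq. 2.102
```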

The same kind of splitting kernel we can compute for a gluon into two quarks and a quark into a quark and a gluon

$$ g(p_a) \rightarrow q(p_b) + \bar{q}(p_c) \quad \quad {\hbox {and}} \quad \quad q(p_a) \rightarrow q(p_b) + g(p_c). $$
(2.103)

Both splittings include the quark–quark–gluon vertex, coupling the gluon current to the quark and antiquark spinors. The spinors of the massless quark \(u(p_b)\) and the massless antiquark \({\it v}(p_c)\) we can write in terms of two-component spinors

$$ \begin{aligned} [b] u(p) = \sqrt{E}\left(\begin{array}{l}\chi_\pm \\ \pm \chi_\pm\end{array}\right) \quad {\hbox {with}} \quad & \chi_+ = \left(\begin{array}{l}1 \\ \theta/2 \end{array}\right) \quad\quad \hbox {(spin up)} \\ & \chi_- = \left(\begin{array}{l}- \theta/2 \\ 1 \end{array}\right)\quad{\hbox {(spin down)}}. \end{aligned} $$
(2.104)

For the massless antiquark we need to replace \(\theta \rightarrow -\theta\) and take into account the different relative spin–momentum directions \((\sigma \hat{p}),\) leading to the additional sign in the lower two spinor entries. The antiquark spinors then become

$$ \begin{aligned} {\it v}(p) &= - i \sqrt{E} \left(\begin{array}{l}\mp \varepsilon \chi_\pm \\ \varepsilon \chi_\pm \end{array}\right) \quad {\hbox {with}} \quad & \chi_+ &= \left(\begin{array}{l}1 \\ -\theta/2 \end{array}\right) & \varepsilon \chi_+ &= \left(\begin{array}{l}-\theta/2 \\ -1 \end{array}\right) &{\hbox {(spin up)}} \\ & & \chi_- &= \left(\begin{array}{l}\theta/2 \\ 1 \end{array}\right) \quad &\varepsilon \chi_- &= \left(\begin{array}{l}1 \\ -\theta/2 \end{array}\right) & {\hbox {(spin down)}}.\end{aligned} $$
(2.105)

Our calculations we again limit to the leading terms in the small scattering angle \(\theta.\) In addition to the fermion spinors, for the coupling to a gluonic current we need the Dirac matrices which we can conveniently express in terms of the Pauli matrices defined in Sect. 1.1.2

$$ \gamma^0 = \left(\begin{array}{rr}\hbox{1\hspace{-3pt}I} & 0 \\ 0 & -\hbox{1\hspace{-3pt}I} \end{array}\right) \quad\quad \gamma^j = \left(\begin{array}{cl}0 & \sigma^j \\ -\sigma^j & 0 \end{array}\right)\quad \Rightarrow \quad \gamma^0 \gamma^0 = \hbox{1\hspace{-3pt}I} \quad\quad \gamma^0 \gamma^j = \left(\begin{array}{ll}0 & \sigma^j \\ \sigma^j & 0 \end{array}\right) $$
(2.106)

In the notation introduced in Eq. 2.102 we first compute the splitting kernel \(\hat{P}_{q \leftarrow g},\) sandwiching the qqg vertex between an outgoing quark \(\bar{u}_\pm(p_b)\) and an outgoing antiquark \(v_\pm(p_c)\) for all possible spin combinations. We start with all four gluon polarizations, i.e. all four gamma matrices, between two spin-up quarks and their spinors written out in Eqs. 2.104 and 2.105

$$\begin{aligned}[b] {\frac{\bar{u}_+(p_b) \gamma^0 {\it v}_-(p_c)}{-iE}} &= \left( 1, {\frac{\theta_b}{2}}, 1, {\frac{\theta_b}{2}} \right) \left(\begin{array}{llll}1&&& \\ &1&& \\ &&1& \\ &&&1\end{array}\right) \left(\begin{array}{c}1 \\ -\theta_c/2 \\ 1 \\ -\theta_c/2 \end{array}\right)\\ & = \left( 1, {\frac{\theta_b}{2}}, 1, {\frac{\theta_b}{2}} \right) \left(\begin{array}{c}1 \\ -\theta_c/2 \\ 1 \\ -\theta_c/2 \end{array}\right)= 2 \\ {\frac{\bar{u}_+(p_b) \gamma^1 {\it v}_-(p_c)}{-iE}} &= \left( 1, {\frac{\theta_b}{2}}, 1, {\frac{\theta_b} {2}} \right) \left(\begin{array}{llll}&&&1 \\ &&1& \\ &1&& \\ 1&&& \end{array}\right)\left(\begin{array}{c}1 \\ -\theta_c/2 \\ 1 \\ -\theta_c/2 \end{array}\right)\\ &= \left( 1, {\frac{\theta_b}{2}}, 1, {\frac{\theta_b}{2}} \right) \left(\begin{array}{c}-\theta_c/2 \\ 1 \\ -\theta_c/2 \\ 1 \end{array}\right) = \theta_b - \theta_c \\ {\frac{\bar{u}_+(p_b) \gamma^2 {\it v}_-(p_c)}{-iE}} &= \left( 1, {\frac{\theta_b}{2}}, 1, {\frac{\theta_b}{2}} \right) \left(\begin{array}{llll}&&&-i \\ &&i& \\ &-i&& \\ i&&& \end{array}\right)\left(\begin{array}{c}1 \\ -\theta_c/2 \\ 1 \\ -\theta_c/2 \end{array}\right)\\ &= i \left( 1, {\frac{\theta_b} {2}}, 1, {\frac{\theta_b}{2}} \right) \left(\begin{array}{c}\theta_c/2 \\ 1 \\ \theta_c/2 \\ 1 \end{array}\right) = i ( \theta_b + \theta_c) \\ {\frac{\bar{u}_+(p_b) \gamma^3 {\it v}_-(p_c)}{-iE}} &= \left( 1, {\frac{\theta_b}{2}}, 1, {\frac{\theta_b}{2}} \right) \left(\begin{array}{llll}&&1& \\ &&&-1 \\ 1&&& \\ &-1&& \end{array}\right)\left(\begin{array}{c}1 \\ -\theta_c/2 \\ 1 \\ -\theta_c/2 \end{array}\right)\\ &= \left( 1, {\frac{\theta_b}{2}}, 1, {\frac{\theta_b}{2}} \right) \left(\begin{array}{c}1 \\ \theta_c/2 \\ 1 \\ \theta_c/2 \end{array}\right)= 2, \end{aligned} $$
(2.107)

Somewhat surprisingly the unphysical scalar and longitudinal gluon polarizations seem to contribute to this vertex. However, after adding the two unphysical degrees of freedom they cancel because of the form of our metric. For transverse gluons we can compute this vertex factor also for the other diagonal spin combination

$$\begin{aligned}[b] {\frac{\bar{u}_-(p_b) \gamma^1 {\it v}_+(p_c)}{-iE}} &= \left( -{\frac{\theta_b}{2}}, 1, {\frac{\theta_b}{2}}, -1 \right) \left(\begin{array}{llll}&&&1 \\ &&1& \\ &1&& \\ 1&&& \end{array}\right)\left(\begin{array}{c}\theta_c/2 \\ 1 \\ -\theta_c/2 \\ -1 \end{array}\right) \\ &=\left( -{\frac{\theta_b}{2}}, 1, {\frac{\theta_b}{2}}, -1 \right) \left(\begin{array}{c}-1 \\ -\theta_c/2 \\ 1 \\ \theta_c/2 \end{array}\right)= \theta_b - \theta_c \\ {\frac{\bar{u}_-(p_b) \gamma^2 {\it v}_+(p_c)}{-iE}} &= \left( -{\frac{\theta_b}{2}}, 1, {\frac{\theta_b}{2}}, -1 \right) \left(\begin{array}{llll} &&&-i \\ &&i& \\ &-i&& \\ i&&& \end{array}\right)\left(\begin{array}{c}\theta_c/2 \\ 1 \\ -\theta_c/2 \\ -1 \end{array}\right)\\ &= i \left( -{\frac{\theta_b}{2}}, 1, {\frac{\theta_b}{2}}, -1 \right) \left(\begin{array}{c}1 \\ -\theta_c/2 \\ -1 \\ \theta_c/2 \end{array}\right)= -i (\theta_b + \theta_c). \end{aligned} $$
(2.108)

Before collecting the prefactors for this gluon-quark splitting, we also need the same-spin case

$$\begin{aligned}[b] {\frac{\bar{u}_+(p_b) \gamma^1 {\it v}_+(p_c)}{-iE}} &= \left( 1, {\frac{\theta_b}{2}}, 1, {\frac{\theta_b}{2}} \right) \left(\begin{array}{llll}&&&1 \\ &&1& \\ &1&& \\ 1&&& \end{array}\right)\left(\begin{array}{c}\theta_c/2 \\ 1 \\ -\theta_c/2 \\ -1 \end{array}\right)\\\ &= \left( 1, {\frac{\theta_b}{2}}, 1, {\frac{\theta_b}{2}} \right) \left(\begin{array}{c}-1 \\ -\theta_c/2 \\ 1 \\ \theta_c/2 \end{array}\right)= 0 \\ {\frac{\bar{u}_+(p_b) \gamma^2 {\it v}_+(p_c)}{-iE}} &= \left( 1, {\frac{\theta_b}{2}}, 1, {\frac{\theta_b} {2}} \right) \left(\begin{array}{llll}&&&-i \\ &&i& \\ &-i&& \\ i&&&\end{array}\right) \left(\begin{array}{c}\theta_c/2 \\ 1 \\ -\theta_c/2 \\ -1 \end{array}\right)\\ &= i \left( 1, {\frac{\theta_b}{2}}, 1, {\frac{\theta_b}{2}} \right) \left(\begin{array}{c}1 \\ -\theta_c/2 \\ -1 \\ \theta_c/2 \end{array}\right)= 0, \end{aligned} $$
(2.109)

which vanishes. The gluon current can only couple to two fermions via a spin flip. For massless fermions this means that the gluon splitting into two quarks involves two quark spin cases, each of them coupling to two transverse gluon polarizations. Keeping track of all the relevant factors our vertex function for the splitting \(g \rightarrow q \bar{q}\) becomes for each of the two quark spins

$$ \begin{aligned} \displaystyle{ V_{qqg} } & \displaystyle{ = -i g_s T^a \; \bar{u}_\pm(p_b) \gamma_\mu \varepsilon_a^\mu {\it v}_\mp(p_c) \equiv -i g_s T^a \; \varepsilon_a^j \; F_\pm^{(j)}} \\ &\displaystyle{{\frac{|F^{(1)}_+|^2}{p_a^2}} ={\frac{|F^{(1)}_-|^2}{p_a^2}} = {\frac{E_b E_c ( \theta_b - \theta_c )^2}{p_a^2}} = {\frac{E_a^2 z (1-z) (1-z-z)^2 \theta^2}{E_a^2 z (1-z) \theta^2}} = (1-2z)^2} \\ &\displaystyle{ {\frac{|F^{(2)}_+|^2}{p_a^2}} ={\frac{|F^{(2)}_-|^2}{p_a^2}} = {\frac{E_b E_c ( \theta_b + \theta_c )^2}{p_a^2}} = {\frac{E_a^2 z (1-z) (1-z+z)^2 \theta^2}{E_a^2 z (1-z) \theta^2}} = 1,} \end{aligned} $$
(2.110)

omitting irrelevant factors i and \((-1)\) which will drop out once we compute the absolute value squared. In complete analogy to the gluon splitting case we can factorize the \((n+1)\)-particle matrix element into

$$ \begin{aligned}[b] \overline{| {\fancyscript{M}}_{n+1} |^2} &= \left( {\frac{1} {p_a^2}} \right)^2 \; g_s^2 \; {\frac{{\hbox{Tr}} T^a T^a}{N_c^2-1}} \; {\frac{1}{N_a}} \; \left[ |F^{(1)}_+|^2 + |F^{(1)}_-|^2 + |F^{(2)}_+|^2 + |F^{(2)}_-|^2 \right] \; \overline{| {\fancyscript{M}}_n |^2} \\ &= {\frac{g_s^2}{p_a^2}} \; T_R {\frac{N_c^2-1}{N_c^2-1}} \; \left[ (1-2z)^2 + 1 \right] \; \overline{| {\fancyscript{M}}_n |^2} \quad {\hbox {with}} \; {\hbox{Tr}} T^a T^b = T_R \delta^{ab}, \; N_a = 2 \\ &= {\frac{2 g_s^2}{p_a^2}} \; T_R \; \left[ z^2 + (1-z)^2 \right] \; \overline{| {\fancyscript{M}}_n |^2} \\ &\equiv {\frac{2 g_s^2}{p_a^2}} \hat{P}_{q \leftarrow g}(z) \overline{| {\fancyscript{M}}_n |^2} \\ &\Leftrightarrow \quad \fbox{${ \hat{P}_{q \leftarrow g}(z) = T_R \left[ z^2 + (1-z)^2 \right]}$ }\,, \end{aligned} $$
(2.111)

with \(T_R = 1/2\). In the first line we implicitly assume that the internal quark propagator can be written as something like \(u \bar{u}/p_a^2\) and we only need to consider the denominator. This splitting kernel is again symmetric in z and \((1-z)\) because QCD does not distinguish between the outgoing quark and the outgoing antiquark.

The third splitting we compute is gluon radiation off a quark, i.e. \(q(p_a) \rightarrow q(p_b) + g(p_c),\) sandwiching the qqg vertex between an outgoing quark \(\bar{u}_\pm(p_b)\) and an incoming quark \(u_\pm(p_a).\) From the splitting of a gluon into a quark-antiquark pair we already know that we can limit our analysis to the physical gluon polarizations and a spin flip in the quarks. Inserting the spinors from Eq. 2.104 and the two relevant gamma matrices gives us

$$\begin{aligned} {\frac{\bar{u}_+(p_b) \gamma^1 u_+(p_a)}{E}} &= \left( 1, {\frac{\theta_b^*}{2}}, 1, {\frac{\theta_b^*}{2}} \right) \left(\begin{array}{llll}&&&1 \\ &&1& \\ &1&& \\ 1&&& \end{array}\right)\left(\begin{array}{c}1 \\ \theta_a^*/2 \\ 1 \\ \theta_a^*/2 \end{array}\right)\\ &= \left( 1, {\frac{\theta_b^*}{2}}, 1, {\frac{\theta_b^*}{2}} \right) \left(\begin{array}{c} \theta_a^*/2 \\ 1 \\ \theta_a^*/2 \\ 1 \end{array}\right)= \theta_b^* + \theta_a^* \end{aligned} $$
$$\begin{aligned}[b] {\frac{\bar{u}_+(p_b) \gamma^2 u_+(p_a)}{E}} &= \left( 1, {\frac{\theta_b^*}{2}}, 1, {\frac{\theta_b^*}{2}} \right) \left(\begin{array}{llll}&&&-i \\ &&i& \\ &-i&& \\ i&&& \end{array}\right)\left(\begin{array}{c}1 \\ \theta_a^*/2 \\ 1 \\ \theta_a^*/2 \end{array}\right)\\ &= i \left( 1, {\frac{\theta_b^*}{2}}, 1, {\frac{\theta_b^*}{2}} \right) \left(\begin{array}{c}- \theta_a^*/2 \\ 1 \\ -\theta_a^*/2 \\ 1 \end{array}\right)= i ( \theta_b^* - \theta_a^*), \end{aligned} $$
(2.112)

with the angles \(\theta_b^*\) and \(\theta_a^*\) relative to the final state gluon direction \(\vec{p}_c.\) Comparing to the situation shown in Fig. 2.1 for the angle relative to the scattered gluon we find \(\theta_b^* = \theta\) while for the incoming quark \(\theta_a^* = - \theta_c = -z \theta.\) As expected, the spin-down case gives the same result, modulo a complex conjugation

$$\begin{aligned}[b] {\frac{\bar{u}_-(p_b) \gamma^1 u_-(p_a)}{E}} &= \left( -{\frac{\theta_b^*}{2}}, 1, {\frac{\theta_b^*}{2}}, -1 \right) \left(\begin{array}{llll}&&&1 \\ &&1& \\ &1&& \\ 1&&& \end{array}\right)\left(\begin{array}{c}-\theta_a^*/2 \\ 1 \\ \theta_a^*/2 \\ -1 \end{array}\right) \\ &= \left( -{\frac{\theta_b^*}{2}}, 1, {\frac{\theta_b^*}{2}}, -1 \right) \left(\begin{array}{c}-1 \\ \theta_a^*/2 \\ 1 \\ -\theta_a^*/2 \end{array}\right) = \theta_a^* + \theta_b^* \\ {\frac{\bar{u}_-(p_b) \gamma^2 u_-(p_a)}{E}} &= \left( -{\frac{\theta_b^*}{2}}, 1, {\frac{\theta_b^*}{2}}, -1 \right) \left(\begin{array}{llll}&&&-i \\ &&i& \\ &-i&& \\ i&&& \end{array}\right)\left(\begin{array}{c}-\theta_a^*/2 \\ 1 \\ \theta_a^*/2 \\ -1 \end{array}\right)\\ &= i \left( -{\frac{\theta_b^*}{2}}, 1, {\frac{\theta_b^*}{2}}, -1 \right) \left(\begin{array}{c}1 \\ \theta_a^*/2 \\ -1 \\ -\theta_a^*/2 \end{array}\right)= i ( \theta_a^* - \theta_b^* ). \end{aligned} $$
(2.113)

The vertex function for gluon radiation off a quark then becomes

$$ \begin{aligned} V_{qqg} & = -i g_s T^a \; \bar{u}_\pm(p_b) \gamma_\mu \varepsilon_a^\mu u_\pm(p_a) \equiv -i g_s T^a \; \varepsilon_a^j \; F_\pm^{(j)} \notag \\ &{\frac{|F^{(1)}_+|^2}{p_a^2}} ={\frac{|F^{(1)}_-|^2} {p_a^2}} = {\frac{E_a E_b (\theta_a^* + \theta_b^*)^2}{p_a^2}} = {\frac{E_a^2 z (z-1)^2 \theta^2} {E_a^2 z (1-z) \theta^2}} = (1-z) \\ &{\frac{|F^{(2)}_+|^2}{p_a^2}} ={\frac{|F^{(2)}_-|^2}{p_a^2}} = {\frac{E_a E_b (\theta_b^* - \theta_a^*)^2}{p_a^2}} = {\frac{E_a^2 z (1+z)^2 \theta^2}{E_a^2 z (1-z) \theta^2}} = {\frac{(1+z)^2}{1-z}}, \\[5pt] \end{aligned} $$
(2.114)

again dropping irrelevant prefactors. The factorized matrix element for this channel reads

$$ \begin{aligned} \overline{| {\fancyscript{M}}_{n+1} |^2} &= \left( {\frac{1}{p_a^2}} \right)^2 \; g_s^2 \; {\frac{{\hbox{Tr}} T^a T^a}{N_c}} \; {\frac{1}{N_a}} \; \left[ |F^{(1)}_+|^2 + |F^{(1)}_-|^2 + |F^{(2)}_+|^2 + |F^{(2)}_-|^2 \right] \; \overline{| {\fancyscript{M}}_n |^2} \\ &= {\frac{g_s^2}{p_a^2}} \; {\frac{N_c^2-1}{2N_c}} \; {\frac{(1+z)^2 + (1-z)^2}{1-z}} \; \overline{| {\fancyscript{M}}_n |^2} \\ &= {\frac{2 g_s^2}{p_a^2}} \; C_F \; {\frac{1+z^2}{1-z}} \; \overline{| {\fancyscript{M}}_n |^2} \\ &\equiv {\frac{2 g_s^2}{p_a^2}} \hat{P}_{q \leftarrow q}(z) \overline{| {\fancyscript{M}}_n |^2} \\ &\Leftrightarrow \quad \fbox{${ \hat{P}_{q \leftarrow q}(z) = C_F {\dfrac{1+z^2}{1-z}}}$}\,. \end{aligned} $$
(2.115)

The averaging factor \(1/N_a = 1/2\) now accounts for the two spin states of the intermediate quark. Just switching \(z \leftrightarrow (1-z)\) we can read off the kernel for a quark splitting written in terms of the final state gluon

$$ \fbox{${\hat{P}_{g \leftarrow q}(z) = C_F {\dfrac{1+(1-z)^2}{z}} }$}\,. $$
(2.116)

This result finalizes our calculation of all QCD splitting kernels \(\hat{P}_{i \leftarrow j}(z)\) between quarks and gluons. As alluded to earlier, these splitting kernels are universal, just like the counter terms which remove ultraviolet divergences: they do not depend on the hard n-particle matrix element which is part of the original \((n+1)\)-particle process. The four results we show in Eqs. 2.102, 2.111, 2.115, and 2.116. This means that by construction of the kernels \(\hat{P}\) we have shown that the collinear factorization Eq. 2.97 holds at this level in perturbation theory.
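For later numerical illustrations it is convenient to collect the four unregularized kernels in one place. The following Python sketch simply transcribes Eqs. 2.102, 2.111, 2.115 and 2.116 with the usual color factors for \(N_c = 3\) and checks the symmetries noted in the text; it is a convenience for the toy studies below, not part of the derivation.

```python
# Unregularized splitting kernels, Eqs. 2.102, 2.111, 2.115 and 2.116 (N_c = 3).
C_A, C_F, T_R = 3.0, 4.0 / 3.0, 0.5

def P_gg(z):   # g -> g, Eq. 2.102
    return C_A * (z / (1 - z) + (1 - z) / z + z * (1 - z))

def P_qg(z):   # g -> q, Eq. 2.111
    return T_R * (z**2 + (1 - z)**2)

def P_qq(z):   # q -> q, Eq. 2.115
    return C_F * (1 + z**2) / (1 - z)

def P_gq(z):   # q -> g, Eq. 2.116
    return C_F * (1 + (1 - z)**2) / z

# symmetries mentioned in the text: P_gg and P_qg are symmetric under z <-> (1-z),
# and P_gq is obtained from P_qq by this exchange
z = 0.3
assert abs(P_gg(z) - P_gg(1 - z)) < 1e-12
assert abs(P_qg(z) - P_qg(1 - z)) < 1e-12
assert abs(P_gq(z) - P_qq(1 - z)) < 1e-12
```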

Before using this splitting property to describe QCD effects at the LHC we need to look at the splitting of partons in the initial state, meaning \(|p_a^2|, p_c^2 \ll |p_b^2|\) where \(p_b\) is the momentum entering the hard interaction. The difference to the final state splitting is that now we can consider the split parton momentum \(p_b = p_a - p_c\) as a t-channel diagram, so we already know \(p_b^2 = t <0\) from our usual Mandelstam variables argument. This space-like splitting version of Eq. 2.88 we can solve for \(p_b^2\)

$$ \begin{aligned} t \equiv p_b^2 &= (-z p_a + \beta n + p_T )^2 \\ &= p_T^2 - 2 z \beta (p_a n) \quad &&\hbox{\hbox {with}} \; p_a^2 = n^2 = (p_a p_T) = (n p_T) = 0 \\ &= p_T^2 + {\frac{p_T^2 z}{1-z}} \quad &&\hbox{using Eq. 2.92} \\ &= {\frac{p_T^2}{1-z}} = -{\frac{p_{T,1}^2+p_{T,2}^2}{1-z}} <0. \end{aligned} $$
(2.117)

The calculation of the splitting kernels and matrix elements is the same as for the time-like case, with the one exception that for splitting in the initial state the flux factor has to be evaluated at the reduced partonic energy \(E_b = z E_a\) and that the energy fraction entering the parton density needs to be replaced by \(x_b \rightarrow z x_b.\) The factorized matrix element for initial state splitting then reads just like Eq. 2.97

$$ \sigma_{n+1} = \int \sigma_n \; {\frac{d t}{t}} dz \; {\frac{\alpha_s}{2 \pi}} \; \hat{P}(z). $$
(2.118)

How to use this property to make statements about the quark and gluon content in the proton will be the focus of the next section.

2.3.3 DGLAP Equation

What we now know about collinear parton splitting we can use to describe incoming partons. For example in \(pp \rightarrow Z\) production incoming partons inside the protons transform into each other via collinear splitting until they enter the Z production process as quarks. Taking Eq. 2.118 seriously the parton density we insert into Eq. 2.28 then depends on two parameters, the final energy fraction and the virtuality, \(f(x_n,-t_n),\) where the second parameter t is new compared to the purely probabilistic picture in Eq. 2.28. It cannot be neglected unless we convince ourselves that it is unphysical. As we will see later it plays the same role as the artificial renormalization scale which appears when we re-sum the scaling logarithms arising from the counter terms; below it will turn into the factorization scale \(\mu_F.\)

More quantitatively, we start with a quark inside the proton with an energy fraction \(x_0,\) as it enters the hadronic phase space integral shown in Sect. 2.1.4. Since this quark is confined inside the proton it can only have a small transverse momentum, which means its four-momentum squared \(t_0\) is negative and its absolute value \(|t_0|\) is small. The variable t we call virtuality because for incoming partons, which would be on-shell with \(p^2=0,\) it measures the distance to the mass shell. Let us simplify our kinematic argument by assuming that there exists only one splitting, namely successive gluon radiation \(q \rightarrow q g\) off an incoming quark, where the outgoing gluons are not relevant.

In that case each collinear gluon radiation will decrease the quark energy \(x_{j+1}\,{<}\, x_j\) and through recoil increase its virtuality \(|t_{j+1}| = -t_{j+1} > -t_j = |t_j|.\)

From the last section we know what the successive splitting means in terms of splitting probabilities. We can describe how the parton density \(f(x,-t)\) evolves in the \((x-t)\) plane as depicted in Fig. 2.2. The starting point \((x_0,t_0)\) is at least probabilistically given by the energy and kind of the hadron, for example the proton. For a given small virtuality \(|t_0|\) we start at some kind of fixed \(x_0\) distribution. We then interpret each branching as a step strictly downward in \(x_j \rightarrow x_{j+1}\) where the t value we assign to this step is the ever increasing virtuality \(|t_{j+1}|\) after the branching. Each splitting means a synchronous shift in x and t, so the actual path in the \((x-t)\) plane really consists of discrete points. The probability of such a splitting to occur is given by \(\hat{P}_{q \leftarrow q}(z) \equiv \hat{P}(z)\) as it appears in Eq. 2.118

$$ {\frac{\alpha_s}{2\pi}} \; \hat{P}(z) \; {\frac{dt}{t}} \; dz. $$
(2.119)

In this picture we consider this probability a smooth function in t and z. At the end of the path we will probe this evolved parton density, where \(x_n\) and \(t_n\) enter the hard scattering process and its energy-momentum conservation.

Fig. 2.2 Path of an incoming parton in the \((x-t)\) plane. Because we define t as a negative number its axis is labelled \(|t|\)

When we convert a partonic into a hadronic cross section numerically we need to specify the probability of the parton density \(f(x,-t)\) residing in an infinitesimal square \([x_j,x_j+\delta x]\) and, if this second parameter has anything to do with physics, \([|t_j|,|t_j|+\delta t].\) Using our \((x,t)\) plane we compute the flow into this square and out of this square which together defines the net shift in f in the sense of a differential equation, similar to the derivation of Gauss’ theorem for vector fields inside a surface

$$ \delta f_{\rm{in}} - \delta f_{\rm{out}} = \delta f(x,-t). $$
(2.120)

The incoming and outgoing flow we compute from the history of the \((x,t)\) evolution. At this stage our picture becomes a little subtle; the way we define the path between two splittings in Fig. 2.2, it can enter and leave the square either vertically or horizontally, but we have to decide which we choose. If we define a splitting as a vertical drop in x at the target value \(t_{j+1},\) an incoming path hitting the square at some value t can come from any x value above the square. Using this convention and following the fat solid lines in Fig. 2.2 the vertical flow into (and out of) the \((x,t)\) square is proportional to \(\delta t\)

$$ \begin{aligned} \delta f_{\rm{in}}(-t) &= \delta t \; \left( {\frac{\alpha_s \hat{P}}{2\pi t}}\otimes f \right)(x,-t) = {\frac{\delta t}{t}} \int\nolimits_x^1 {\frac{dz}{z}} \; {\frac{\alpha_s}{2\pi}} \; \hat{P}(z) f\left({\frac{x}{z}},-t\right) \\ &= {\frac{\delta t}{t}} \int\nolimits_0^1 {\frac{dz}{z}} \; {\frac{\alpha_s}{2\pi}} \; \hat{P}(z) f\left({\frac{x}{z}},-t\right) \quad {\hbox {assuming}} \; f(x',-t)=0 \; {\hbox {for}} \; x^{\prime} > 1, \end{aligned} $$
(2.121)

where \(\delta t\) is the size of the interval covered by the virtuality value t. We use the definition of a convolution

$$ \begin{aligned} [b] (f \otimes g)(x) &= \int\nolimits_0^1 dx_1 dx_2 f(x_1) g(x_2) \; \delta(x - x_1 x_2) \\ &= \int\nolimits_0^1 {\frac{dx_1}{x_1}} f(x_1) g\left( {\frac{x}{x_1}} \right) = \int\nolimits_0^1 {\frac{dx_2}{x_2}} f\left( {\frac{x}{x_2}} \right) g(x_2). \end{aligned} $$
(2.122)

The outgoing flow we define in complete analogy, again leaving the infinitesimal square vertically. Following the fat solid line in Fig. 2.2 the outgoing flow is also proportional to \(\delta t\)

$$ \delta f_{\rm{out}}(-t) = \delta t \; \int\nolimits_0^1 dy {\frac{\alpha_s \hat{P}(y)}{2 \pi t}} \; f(x,-t) = {\frac{\delta t}{t}} f(x,-t) \int\nolimits_0^1 dy \; {\frac{\alpha_s}{2\pi}} \; \hat{P}(y). $$
(2.123)

The y integration, unlike the z integration for the incoming flow, is not a convolution. This integration appears because we do not know the normalization of the \(\hat{P}(z)\) distribution which we interpret as a probability. The reason why it is not a convolution is that for the outgoing flow we know the starting condition and integrate over the final configurations; this aspect will become important later. Combining Eqs. 2.121 and 2.123 we can compute the change in the parton density of the quarks as

$$\begin{aligned} \delta f(x,-t) &= {\frac{\delta t}{t}} \; \left[ \int\nolimits_0^1 {\frac{dz}{z}} \; {\frac{\alpha_s}{2\pi}} \; \hat{P}(z) \; f\left({\frac{x}{z}},-t\right) - \int\nolimits_0^1 dy \; {\frac{\alpha_s}{2\pi}} \; \hat{P}(y) \; f(x,-t) \right]\\ &= {\frac{\delta t}{t}} \int\nolimits_0^1 {\frac{dz}{z}} \; {\frac{\alpha_s}{2\pi}} \; \left[ \hat{P}(z) - \delta(1-z) \int\nolimits_0^1 dy \hat{P}(y) \right] \; f\left({\frac{x} {z}},-t\right) \\ &\equiv {\frac{\delta t}{t}} \int\nolimits_x^1 {\frac{dz}{z}} \; {\frac{\alpha_s}{2\pi}} \; \hat{P}(z)_+ \; f\left({\frac{x}{z}},-t\right) \\ \Leftrightarrow \quad {\frac{\delta f(x,-t)}{\delta (-t)}} &= {\frac{1}{(-t)}} \int\nolimits_x^1 {\frac{dz}{z}} \; {\frac{\alpha_s}{2\pi}} \; \hat{P}(z)_+ \; f\left({\frac{x} {z}},-t\right)\!, \end{aligned} $$
(2.124)

again assuming \(f(x)=0\) for \(x>1,\) strictly speaking requiring \(\alpha_s\) to only depend on t but not on z, and using the specifically defined plus subtraction scheme

$$ \fbox{${F(z)_+ \equiv F(z) - \delta(1-z) \displaystyle\int\nolimits_0^1 dy \; F(y) }$}\,, $$
(2.125)

or equivalently

$$ \int\nolimits_0^1 dz \; {\frac{f(z)}{(1-z)_+}} = \int\nolimits_0^1 dz \; \left( {\frac{f(z)}{1-z}} - {\frac{f(1)}{1-z}} \right). $$
(2.126)

For the second definition we choose \(F(z) = 1/(1-z),\) multiply it with an arbitrary test function \(f(z)\) and integrate over z. In contrast to the original z integral the plus-subtracted integral is by definition finite in the limit \(z \rightarrow 1\) where some of the splitting kernels diverge. For example the quark splitting kernel including the plus prescription takes the form \(C_F ((1+z^2)/(1-z))_+.\) At this stage the plus prescription is simply a convenient way of writing a complicated combination of splitting kernels, but we will see that it also has a physics meaning.

We can check that the plus prescription indeed acts as a regularization technique for the parton densities. Obviously, the integral over \(f(z)/(1-z)\) is divergent at the boundary \(z \rightarrow 1,\) which we know we can cure using dimensional regularization. The special case \(f(z)=1\) illustrates how dimensional regularization of infrared divergences in the phase space integration Eq. 2.85 works

$$ \int\nolimits_0^1 dz \; {\frac{1} {(1-z)^{1-\varepsilon}}} = \int\nolimits_0^1 dz \; {\frac{1} {z^{1-\varepsilon}}} = {\frac{z^\varepsilon}{\varepsilon}} \Bigg|_0^1 = {\frac{1}{\varepsilon}} \quad {\hbox {with}} \; \varepsilon > 0, $$
(2.127)

for \(4+2 \varepsilon\) dimensions. This change in sign avoids the analytic continuation of the usual value \(n=4-2\varepsilon\) to \(\varepsilon < 0.\) The dimensionally regularized integral we can write as

$$ \begin{aligned} \int\nolimits_0^1 dz \; {\frac{f(z)}{(1-z)^{1-\varepsilon}}} =& \int\nolimits_0^1 dz \; {\frac{f(z)-f(1)}{(1-z)^{1-\varepsilon}}} + f(1) \int\nolimits_0^1 dz \; {\frac{1}{(1-z)^{1-\varepsilon}}} \\ =&\int\nolimits_0^1 dz \; {\frac{f(z)-f(1)}{1-z}} \left( 1 + {\fancyscript{O}}(\varepsilon) \right) + {\frac{f(1)}{\varepsilon}} \\ =& \int\nolimits_0^1 dz \; {\frac{f(z)}{(1-z)_+}} \left( 1 + {\fancyscript{O}}(\varepsilon) \right) + {\frac{f(1)}{\varepsilon}} \quad \hbox{\hbox {by definition}} \\ \int\nolimits_0^1 dz \; {\frac{f(z)}{(1-z)^{1-\varepsilon}}} - {\frac{f(1)}{\varepsilon}} =& \int\nolimits_0^1 dz \; {\frac{f(z)}{(1-z)_+}} \left( 1 + {\fancyscript{O}}(\varepsilon) \right)\!. \end{aligned} $$
(2.128)

The dimensionally regularized integral minus the pole, i.e. the finite part of the dimensionally regularized integral, is the same as the plus-subtracted integral modulo terms of the order \(\varepsilon.\) The third line in Eq. 2.128 shows that the difference between a dimensionally regularized splitting kernel and a plus-subtracted splitting kernel manifests itself as terms proportional to \(\delta(1-z).\) Physically, they represent contributions to a soft-radiation phase space integral.
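We can also check Eq. 2.128 numerically. The sketch below uses the test function \(f(z) = 1+z^2,\) i.e. the numerator of the quark splitting kernel, and a small but finite \(\varepsilon;\) the finite part of the dimensionally regularized integral and the plus-subtracted integral of Eq. 2.126 then agree up to terms of order \(\varepsilon.\)

```python
# Numerical sketch of Eq. 2.128 for the test function f(z) = 1 + z^2.
from scipy.integrate import quad

f = lambda z: 1.0 + z**2
eps = 1e-3

# dimensionally regularized integral with the pole f(1)/eps already split off,
# i.e. the first line of Eq. 2.128
finite_part, _ = quad(lambda z: (f(z) - f(1.0)) * (1 - z)**(eps - 1), 0, 1)

# plus-subtracted integral, Eq. 2.126
plus_part, _ = quad(lambda z: (f(z) - f(1.0)) / (1 - z), 0, 1)

print(finite_part, plus_part)   # both close to -3/2, differing at order eps
```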

Before we move on to introduce a gluon density we can slightly reformulate the splitting kernel \(\hat{P}_{q \leftarrow q}\) in Eq. 2.115. If the plus prescription regularizes the pole at \(z \rightarrow 1,\) what happens when we include the numerator of the regularized function, e.g. the quark splitting kernel, as compared to leaving it out? The finite difference between these results is

$$ \begin{aligned} \left( {\frac{1+z^2}{1-z}} \right)_+ - (1+z^2) \; \left( {\frac{1}{1-z}}\right)_+ &= {\frac{1+z^2} {1-z}} - \delta(1-z) \int\nolimits_0^1 dy \; {\frac{1+y^2}{1-y}} \\ &\quad\enskip- {\frac{1+z^2}{1-z}} + \delta(1-z) \int\nolimits_0^1 dy \; {\frac{1+z^2}{1-y}} \\ & = - \delta(1-z) \int\nolimits_0^1 dy \; \left( {\frac{1+y^2}{1-y}} - {\frac{2}{1-y}} \right) \\ &= \delta(1-z) \int\nolimits_0^1 dy \; {\frac{y^2-1}{y-1}} \\ & = \delta(1-z) \int\nolimits_0^1 dy \; (y+1) = {\frac{3}{2}} \delta(1-z). \end{aligned} $$
(2.129)

We can therefore write the quark’s splitting kernel in two equivalent ways

$$ \fbox{${ P_{q \leftarrow q}(z) = C_F \left( {\dfrac{1+z^2}{1-z}} \right)_+ = C_F \left[ {\dfrac{1+z^2}{(1-z)_+}} + {\dfrac{3}{2}} \delta(1-z) \right] }$}\,. $$
(2.130)
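The equivalence of the two forms in Eq. 2.130 can be checked numerically by folding both with a smooth test function. In the sketch below the choice \(g(z) = z^2\) is arbitrary; the plus prescriptions are written out following Eq. 2.126.

```python
# Numerical check that the two forms of P_qq in Eq. 2.130 agree when integrated
# against a test function g(z); the choice g(z) = z^2 is arbitrary.
from scipy.integrate import quad

C_F = 4.0 / 3.0
g = lambda z: z**2

# first form: C_F [ (1+z^2)/(1-z) ]_+
form1, _ = quad(lambda z: C_F * (g(z) - g(1.0)) * (1 + z**2) / (1 - z), 0, 1)

# second form: C_F [ (1+z^2) [1/(1-z)]_+ + 3/2 delta(1-z) ],
# where the plus distribution now acts on the full product g(z)(1+z^2)
form2, _ = quad(lambda z: C_F * (g(z) * (1 + z**2) - 2.0 * g(1.0)) / (1 - z), 0, 1)
form2 += C_F * 1.5 * g(1.0)

print(form1, form2)   # identical up to numerical precision
```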

The infinitesimal version of Eq. 2.124 is the Dokshitzer–Gribov–Lipatov–Altarelli–Parisi or DGLAP integro-differential equation which describes the scale dependence of the quark parton density. As we already know quarks do not only appear in \(q \rightarrow q\) splitting, but also in gluon splitting. Therefore, we generalize Eq. 2.124 to include the full set of QCD partons, i.e. quarks and gluons. This generalization involves a sum over all allowed splittings and the plus-subtracted splitting kernels. For the quark density on the left hand side it is

$$ \begin{aligned} \fbox{${ {\dfrac{d f_q(x,-t)}{d \log (-t)}} = -t \; {\dfrac{d f_q(x,-t)}{d (-t)}} = \displaystyle\sum\limits_{j=q,g} \int\nolimits_x^1 {\dfrac{dz} {z}} \; {\dfrac{\alpha_s}{2\pi}} \; P_{q \leftarrow j}(z) \; f_j\left({\dfrac{x}{z}},-t\right)}$}\,,\\[3pt]\end{aligned} $$
(2.131)

with \(P_{q \leftarrow j}(z) \equiv \hat{P}_{q \leftarrow j}(z)_+.\) Going back to Eq. 2.124 we add all relevant parton indices and splittings and arrive at

$$ \begin{aligned} \delta f_q(x,-t) &= {\frac{\delta t} {t}} \; \left[ \int\nolimits_0^1 {\frac{dz}{z}} \; {\frac{\alpha_s} {2\pi}} \hat{P}_{q \leftarrow q}(z) \; f_q\left({\frac{x}{z}},-t\right) \!+\! \int\nolimits_0^1 {\frac{dz}{z}} \; {\frac{\alpha_s}{2\pi}} \hat{P}_{q \leftarrow g}(z) f_g\left({\frac{x}{z}},-t\right) \right.\\ & \left. - \int\nolimits_0^1 dy \; {\frac{\alpha_s}{2\pi}} \; \hat{P}_{q \leftarrow q}(y) \; f_q(x,-t) \right]. \end{aligned} $$
(2.132)

Of the three terms on the right-hand side the first and the third together define the plus-subtracted splitting kernel \(P_{q \leftarrow q}(z),\) just following the argument above. The second term is a proper convolution and the only term proportional to the gluon parton density. Quarks can be produced in gluon splitting but cannot vanish into it. Therefore, we have to identify the second term with \(P_{q \leftarrow g}\) in Eq. 2.131 without adding a plus-regulator

$$ \fbox{${ P_{q \leftarrow g}(z) \equiv \hat{P}_{q \leftarrow g}(z) = T_R \left[ z^2 + (1-z)^2 \right] }$}\,. $$
(2.133)

In principle, the splitting kernel \(\hat{P}_{g \leftarrow q}\) also generates a quark, in addition to the final state gluon. However, comparing this to the terms proportional to \(\hat{P}_{q \leftarrow q}\) they both arise from the same splitting, namely a quark density leaving the infinitesimal square in the \((x-t)\) plane via the splitting \(q \rightarrow qg.\) Including the additional \(\hat{P}_{g \leftarrow q}(y)\) would be double counting and should not appear, as the notation \(g \leftarrow q\) already suggests.

The second QCD parton density we have to study is the gluon density. The incoming contribution to the infinitesimal square is given by the sum of four splitting scenarios each leading to a gluon with virtuality \(-t_{j+1}\)

$$ \begin{aligned} \delta f_{\rm{in}}(-t) &= {\frac{\delta t}{t}} \int\nolimits_0^1 {\frac{dz}{z}} \; {\frac{\alpha_s}{2\pi}} \; \Bigg[ \hat{P}_{g \leftarrow g}(z) \left( f_g\left({\frac{x}{z}},-t\right) + f_g\left({\frac{x}{1-z}},-t\right) \right) \\ & + \hat{P}_{g \leftarrow q}(z) \left( f_q\left({\frac{x}{z}},-t\right) + f_{\bar{q}} \left({\frac{x}{z}},-t\right) \right) \Bigg] \\ &= {\frac{\delta t}{t}} \int\nolimits_0^1 {\frac{dz}{z}} \; {\frac{\alpha_s}{2\pi}} \; \Bigg[ 2 \hat{P}_{g \leftarrow g}(z) f_g\left({\frac{x}{z}},-t\right) \\ & + \hat{P}_{g \leftarrow q}(z) \left( f_q\left({\frac{x}{z}},-t\right) + f_{\bar{q}} \left({\frac{x}{z}},-t\right) \right) \Bigg], \end{aligned} $$
(2.134)

using \(P_{g \leftarrow g}(1-z) = P_{g \leftarrow g}(z).\) To leave the volume element in the (x,t) space a gluon can either split into two gluons or split into a quark–antiquark pair of any of the \(n_f\) light flavors. Combining the incoming and outgoing flows we find

$$ \begin{aligned} \delta f_g(x,-t) =& {\frac{\delta t}{t}} \int\nolimits_0^1 {\frac{dz}{z}} \; {\frac{\alpha_s}{2\pi}} \; \Bigg[ 2 \hat{P}_{g \leftarrow g}(z) f_g\left({\frac{x}{z}},-t\right)\\ & + \hat{P}_{g \leftarrow q}(z) \left( f_q\left({\frac{x}{z}},-t\right) + f_{\bar{q}} \left({\frac{x}{z}},-t\right) \right) \Bigg] \\ &- {\frac{\delta t}{t}} \; \int\nolimits_0^1 dy \; {\frac{\alpha_s}{2\pi}} \; \left[\hat{P}_{g \leftarrow g}(y)+ n_{\!f} \hat{P}_{q \leftarrow g}(y)\right] f_g(x,-t) \end{aligned} $$
(2.135)

Again, for the gluon density there exists a contribution to \(\delta f_{\rm{in}}\) proportional to \(f_q\) or \(f_{\bar{q}}\) which is not matched by the outgoing flow. On the other hand, from the quark case we already know how to deal with it, namely by defining, without an additional plus subtraction,

$$ \fbox{$ P_{g \leftarrow q}(z) \equiv \hat{P}_{g \leftarrow q}(z) = C_F {\dfrac{1+(1-z)^2}{z}} $}\,. $$
(2.136)

This ensures that the off-diagonal contribution to the gluon density is taken into account following Eq. 2.131.

Three regularized splitting kernels entering the DGLAP equation we give in Eqs. 2.130, 2.133 and 2.136. The generic structure of the DGLAP equation implies that the two off-diagonal splitting kernels do not include any plus prescription, \(\hat{P}_{i \leftarrow j} = P_{i \leftarrow j}.\) We could have expected this because these off-diagonal kernels are finite in the soft limit \(z \rightarrow 1.\) Applying a plus prescription would only have modified the splitting kernels at the isolated (zero-measure) point \(y=1\) which for a finite value of the integrand does not affect the integral on the right-hand side of the DGLAP equation.

The final splitting kernel \(P_{g \leftarrow g}\) from the diagonal relation on the right-hand side of Eq. 2.135 requires some more work. The y integrals do not involve the parton densities, so we can carry them out explicitly, starting with the gluon splitting into a quark pair

$$ \begin{aligned}[b] - \int\nolimits_0^1 dy \; \frac{\alpha_s} {2\pi} \; n_{\!f} \; \hat{P}_{q \leftarrow g}(y) &= - \frac{\alpha_s} {2\pi} \; n_{\!f} \; T_R \; \int\nolimits_0^1 dy \; \left[ 1 - 2y + 2 y^2 \right]\\ &= - \frac{\alpha_s} {2\pi} \; n_{\!f} \; T_R \; \left[ y - y^2 + \frac{2 y^3} {3} \right]_0^1 \\ &= - \frac{2}{3}\; \frac{\alpha_s} {2\pi} \; n_{\!f} \; T_R . \end{aligned} $$
(2.137)

The second y integral for gluon radiation has to consist of a finite term and a term we can use to define the plus prescription for \(\hat{P}_{g \leftarrow g}\)

$$ \begin{aligned} \int\nolimits_0^1 dy \frac{\alpha_s} {2\pi} \hat{P}_{g \leftarrow g}(y) =& \frac{\alpha_s} {2\pi} C_A \; \int\nolimits_0^1 dy \; \left[ \frac{y} {1-y} + \frac{1-y} {y} + y(1-y) \right] \\=& \frac{\alpha_s} {2\pi} \; C_A \; \int\nolimits_0^1 dy \; \left[ \frac{2 y} {1-y} + y(1-y) \right] \\=& \frac{\alpha_s} {2\pi} \; C_A \; \int\nolimits_0^1 dy \; \left[ \frac{2 (y-1)} {1-y} + y(1-y) \right] +\frac{\alpha_s} {2\pi} \; C_A \; \int\nolimits_0^1 dy \; \frac{2} {1-y} \\=& \frac{\alpha_s} {2\pi} \; C_A \; \int\nolimits_0^1 dy \; \left[ -2 + y - y^2 \right] + \delta(1-z) \; \frac{\alpha_s} {2\pi} \; 2 C_A \; \int\nolimits_0^1 dy \; \frac{z} {1-y} \\ =& \frac{\alpha_s} {2\pi} \; C_A \; \left[ -2 + \frac{1} {2}- \frac{1} {3} \right] +\delta(1-z) \; \frac{\alpha_s} {2\pi} \; 2 C_A \; \int\nolimits_0^1 dy \; \frac{z} {1-y} \\ =& - \frac{\alpha_s} {2\pi} \; \frac{11} {6} \; C_A \; + \delta(1-z) \; \frac{\alpha_s} {2\pi} \; 2 C_A \; \int\nolimits_0^1 dy \; \frac{z} {1-y}. \end{aligned} $$
(2.138)

The contribution proportional to \(\delta(1-z)\) is necessary to give a finite result and is absorbed into the plus prescription of the convoluted \(\hat{P}_{g \leftarrow g}(z),\) including the factor two in front. This defines the last remaining regularized splitting kernel

$$\begin{aligned} \fbox{$P_{g \leftarrow g}(z) = 2 C_A \left( \frac{z} {(1-z)_+} + \frac{1-z} {z} + z(1-z) \right) + \frac{11 C_A} {6} \delta(1-z) - \frac{2 n_f T_R} {3} \delta(1-z)$} \\[5pt] \end{aligned} $$
(2.139)

and concludes our computation of all four regularized splitting functions which appear in the DGLAP equation Eq. 2.131.
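The two finite y integrals which fix the \(\delta(1-z)\) terms in Eq. 2.139 are simple enough to verify numerically; the following sketch just reproduces the values \(2/3\) from Eq. 2.137 and \(-11/6\) from Eq. 2.138.

```python
# Numerical check of the finite y integrals in Eqs. 2.137 and 2.138.
from scipy.integrate import quad

quark_loop, _   = quad(lambda y: 1 - 2 * y + 2 * y**2, 0, 1)   # ->  2/3  (Eq. 2.137)
gluon_finite, _ = quad(lambda y: -2 + y - y**2, 0, 1)          # -> -11/6 (Eq. 2.138)

print(quark_loop, gluon_finite)
```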

Before discussing and solving the DGLAP equation, let us briefly recapitulate: for the full quark and gluon particle content of QCD we have derived the DGLAP equation which describes a (so-called factorization) scale dependence of the quark and gluon parton densities. The universality of the splitting kernels is obvious from the way we derive them—no information on the n-particle process ever enters the derivation.

The DGLAP equation is formulated in terms of four splitting kernels of gluons and quarks which are linked to the splitting probabilities, but which for the DGLAP equation have to be regularized. With the help of a plus-subtraction the kernels \(P_{i \leftarrow j}(z)\) are all finite, including in the soft limit \(z \to 1.\) However, splitting kernels are only regularized when needed, so the finite off-diagonal quark-gluon and gluon-quark splittings are unchanged. This means the plus prescription really acts as an infrared renormalization, moving universal infrared divergences into the definition of the parton densities. The original collinear divergence has vanished as well.

The only approximation we make in the computation of the splitting kernels is that in the y integrals we implicitly assume that the running coupling \(\alpha_s\) does not depend on the momentum fraction. In its standard form and in terms of the factorization scale \(\mu_F^2 \equiv -t\) the DGLAP equation reads

$$\begin{aligned} \fbox{$\frac{d f_i(x,\mu_F)} {d \log \mu_F^2}= \sum\nolimits_j \int\nolimits_x^1 \; \frac{dz} {z} \; \frac{\alpha_s} {2 \pi} \; P_{i \leftarrow j}(z) \; f_j\left(\frac{x} {z},\mu_F\right)= \frac{\alpha_s}{2 \pi} \sum\nolimits_j \left( P_{i \leftarrow j} \otimes f_j \right) (x,\mu_F)$}\,. \\[5pt] \end{aligned} $$
(2.140)

2.3.4 Parton Densities

Solving the integro-differential DGLAP equation Eq. 2.140 for the parton densities is clearly beyond the scope of this writeup. Nevertheless, we will sketch how we would approach this. This will give us some information on the structure of its solutions which we need to understand the physics of the DGLAP equation.

One simplification we can make in this illustration is to postulate eigenvalues in parton space and solve the equation for them. This gets rid of the sum over partons on the right hand side. One such parton density is the non-singlet parton density, defined as the difference of two parton densities \(f_q^{{\rm NS}} = (f_u - f_{\bar{u}}).\) Since gluons cannot distinguish between quarks and antiquarks, the gluon contribution to their evolution cancels, at least in the massless limit. This will be true at arbitrary loop order, since flavor \(SU(3)\) commutes with the QCD gauge group. The corresponding DGLAP equation with leading order splitting kernels now reads

$$ \frac{d f_q^{{\rm NS}}(x,\mu_F)} {d \log \mu_F^2} = \int\nolimits_x^1 \; \frac{dz} {z} \; \frac{\alpha_s} {2 \pi} \; P_{q \leftarrow q}(z) \; f_q^{{\rm NS}}\left( \frac{x} {z},\mu_F\right) =\frac{\alpha_s} {2 \pi} \; \left( P_{q \leftarrow q} \otimes f_q^{{\rm NS}} \right)(x,\mu_F) . $$
(2.141)
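Before constructing the analytic solution we can integrate Eq. 2.141 by brute force, which is useful as a cross-check of the structures derived below. The sketch uses simple Euler steps in \(\log \mu_F^2,\) a fixed coupling, and a toy input shape; the plus prescription is implemented directly through Eq. 2.126. None of the numerical inputs are meant to be realistic.

```python
# Brute-force numerical sketch of the non-singlet DGLAP equation, Eq. 2.141,
# with P_qq = C_F [ (1+z^2)/(1-z) ]_+ and a toy input density.
import numpy as np
from scipy.integrate import quad
from scipy.interpolate import interp1d

C_F, alpha_s = 4.0 / 3.0, 0.2                 # fixed coupling for simplicity

x_grid = np.linspace(0.01, 0.99, 60)
f = x_grid**0.5 * (1 - x_grid)**3             # toy non-singlet input at the starting scale

def rhs(f_int, x):
    """alpha_s/(2 pi) times the plus-prescribed convolution (P_qq x f)(x)."""
    h = lambda z: f_int(x / z) / z if z >= x else 0.0      # f vanishes for x/z > 1
    integrand = lambda z: (1 + z**2) / (1 - z) * (h(z) - h(1.0))
    val, _ = quad(integrand, 0.0, 1.0, points=[x], limit=200)
    return alpha_s / (2 * np.pi) * C_F * val

n_steps, dlog = 20, 0.1                       # evolve over log(mu_F^2/mu_0^2) = 2
for _ in range(n_steps):
    f_int = interp1d(x_grid, f, kind="cubic", bounds_error=False, fill_value=0.0)
    f = f + dlog * np.array([rhs(f_int, x) for x in x_grid])

print(f[:3], f[-3:])   # compared to the input: enhanced at small x, depleted at large x
```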

To solve it we need some kind of transformation which disentangles the convolution, namely a Mellin transform. Starting from a function \(f(x)\) of a real variable x we define the Mellin transform into moment space m

$$\begin{aligned}[b] \fbox{${\fancyscript{M}}[f](m) \equiv \displaystyle\int\limits_0^1 dx \, x^{m-1} f(x) $} \quad\quad f(x) = \frac{1} {2 \pi i} \int\nolimits_{c - i \infty}^{c + i \infty} dm \; \frac{{\fancyscript{M}}[f](m)}{x^m}.\\ \end{aligned} $$
(2.142)

The integration contour for the inverse transformation lies to the right of all singularities of the analytic continuation of \({\fancyscript{M}}[f](m),\) which fixes the offset c. The Mellin transform of a convolution is the product of the two Mellin transforms, which gives us the transformed DGLAP equation

$$ \begin{aligned} [b] {\fancyscript{M}}[P_{q \leftarrow q} \otimes f_q^{{\rm NS}}](m) &= {\fancyscript{M}}\left[\int\nolimits_0^1 \frac{dz}{z} P_{q \leftarrow q} \left(\frac{x} {z} \right) f_q^{{\rm NS}}(z) \right](m) \\ &= {\fancyscript{M}}[P_{q \leftarrow q}](m) \; {\fancyscript{M}}[f_q^{{\rm NS}}](m,\mu_F) \\ \frac{d {\fancyscript{M}}[f_q^{{\rm NS}}](m,\mu_F)} {d \log \mu_F^2} &= \frac{\alpha_s} {2 \pi} \; {\fancyscript{M}}[P_{q \leftarrow q}](m) \; {\fancyscript{M}}[f_q^{{\rm NS}}](m,\mu_F) \end{aligned} $$
(2.143)

and its solution

$$ \begin{aligned}[b] {\fancyscript{M}}[f_q^{{\rm NS}}](m,\mu_F) &= {\fancyscript{M}}[f_q^{{\rm NS}}](m,\mu_{F,0}) \; \exp\, \left( \frac{\alpha_s} {2 \pi} \; {\fancyscript{M}}[P_{q \leftarrow q}](m) \log \frac{\mu_F^2} {\mu_{F,0}^2} \right) \\&= {\fancyscript{M}}[f_q^{{\rm NS}}](m,\mu_{F,0}) \; \left(\frac{\mu_F^2} {\mu_{F,0}^2} \right)^{\frac{\alpha_s} {2 \pi} {\fancyscript{M}}[P_{q \leftarrow q}](m)}\\ &\equiv {\fancyscript{M}}[f_q^{{\rm NS}}](m,\mu_{F,0}) \; \left( \frac{\mu_F^2} {\mu_{F,0}^2} \right)^{\frac{\alpha_s}{2 \pi} \gamma(m)}, \end{aligned} $$
(2.144)

defining \(\gamma(m) = {\fancyscript{M}}[P](m).\) Instead of assuming a fixed \(\alpha_s\) in the transformed DGLAP equation Eq. 2.143 we can include \(\alpha_s(\mu_R^2)\) in the running of the DGLAP equation, identifying the renormalization scale \(\mu_R\) of the strong coupling with the factorization scale \(\mu_F = \mu_R \equiv \mu\). This allows us to replace \(\log \mu^2\) in the DGLAP equation by \(\alpha_s,\) including the leading order Jacobian

$$ \frac{d} {d \log \mu^2} = \frac{d \log \alpha_s} {d \log \mu^2} \; \frac{d} {d \log \alpha_s} = \frac{1} {\alpha_s} \; \frac{d \alpha_s} {d \log \mu^2} \; \frac{d} {d \log \alpha_s} = - \alpha_s b_0 \; \frac{d} {d \log \alpha_s}. $$
(2.145)

This additional factor of \(\alpha_s\) on the left-hand side will cancel the factor \(\alpha_s\) on the right hand side of the DGLAP equation Eq. 2.143

$$ \begin{aligned} [b] \frac{d {\fancyscript{M}}[f_q^{{\rm NS}}](m,\mu)}{d \log \alpha_s} &= - \frac{1} {2 \pi b_0} \gamma(m) \; {\fancyscript{M}}[f_q^{{\rm NS}}](m,\mu) \\ {\fancyscript{M}}[f_q^{{\rm NS}}](m,\mu) &= {\fancyscript{M}}[f_q^{{\rm NS}}](m,\mu_0) \; \exp \left( - \frac{1} {2 \pi b_0} \; \gamma(m) \log \frac{\alpha_s(\mu^2)} {\alpha_s(\mu_0^2)} \right) \\ &= {\fancyscript{M}}[f_q^{{\rm NS}}](m,\mu_{F,0}) \; \left( \frac{\alpha_s(\mu_0^2)} {\alpha_s(\mu^2)} \right)^{\frac{\gamma(m)}{2 \pi b_0}}. \end{aligned} $$
(2.146)

Among other things in this derivation we neglect that some splitting functions have singularities and therefore the Mellin transform is not obviously well-defined. Our convolution is not really a convolution either, because we cut it off at \(Q_0^2\), etc.; but the final structure in Eq. 2.146 really holds.
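As a numerical illustration of Eq. 2.146 we can compute the anomalous dimension \(\gamma(m)\) directly from the plus-prescribed kernel of Eq. 2.130 and evolve a single moment with a one-loop running coupling. The scales, \(\Lambda^2,\) \(n_f\) and the input moment below are toy values, and we assume the usual convention \(b_0 = (11 C_A - 2 n_f)/(12\pi)\) for the coefficient appearing in Eq. 2.145.

```python
# Numerical sketch of the Mellin-space solution Eq. 2.146 for the non-singlet density.
import numpy as np
from scipy.integrate import quad

C_F, n_f = 4.0 / 3.0, 5
b0 = (11 * 3 - 2 * n_f) / (12 * np.pi)

def gamma(m):
    """Mellin moment of P_qq: the plus prescription turns z^(m-1) into z^(m-1) - 1."""
    val, _ = quad(lambda z: (z**(m - 1) - 1.0) * (1 + z**2) / (1 - z), 0, 1)
    return C_F * val

def alpha_s(mu2, Lambda2=0.04):               # one-loop running coupling
    return 1.0 / (b0 * np.log(mu2 / Lambda2))

m, mu0_2, mu_2 = 2.0, 10.0**2, 1000.0**2
moment_0 = 0.3                                 # toy input moment M[f](m, mu_0)
moment = moment_0 * (alpha_s(mu0_2) / alpha_s(mu_2)) ** (gamma(m) / (2 * np.pi * b0))

print(gamma(1.0), gamma(2.0))   # -> 0 and -4/3 C_F
print(moment)                   # the evolved moment, smaller than moment_0
```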

Because we will need it in the next section we emphasize that the same kind of solution appears in pure Yang–Mills theory, i.e. in QCD without quarks. Looking at the different color factors in QCD this limit can also be derived as the leading terms in \(N_c.\) In that case there exists also only one splitting kernel defining an anomalous dimension \(\gamma,\) and we find in complete analogy to Eq. 2.146

$$ \fbox{${\fancyscript{M}}[f_g](m,\mu) = {\fancyscript{M}}[f_g](m,\mu_0) \; \left( {\frac{\alpha_s(\mu_0^2)} {\alpha_s(\mu^2)}} \right)^{{\frac{\gamma(m)}{2 \pi b_0}}}$}\,. $$
(2.147)

To remind ourselves that in this derivation we unify the renormalization and factorization scales we denote them just as \(\mu.\) This solution to the DGLAP equation is not completely determined: as a solution to a differential equation it also includes an integration constant which we express in terms of \(\mu_0.\) The DGLAP equation therefore does not determine parton densities, it only describes their evolution from one scale \(\mu_F\) to another, just like a renormalization group equation in the ultraviolet.

The structure of Eq. 2.147 already shows something we will discuss in more detail in the following Sect. 2.3.5: the splitting probability we find in the exponent. To make sense of such a structure we remind ourselves that such ratios of \(\alpha_s\) values to some power can appear as a result of a re-summed series. Such a series would need to include powers of \(({\fancyscript{M}}[\hat{P}])^n\) summed over n which corresponds to a sum over splittings with a varying number of partons in the final state. Parton densities cannot be formulated in terms of a fixed final state because they include effects from any number of collinearly radiated partons summed over the number of such partons. For the processes we can evaluate using parton densities fulfilling the DGLAP equation this means that they always have the form

$$ \fbox{$ pp \to \mu^+ \mu^- + X $}\qquad \hbox{where}\;{\it X}\; \hbox{includes any number of collinear jets} $$
(2.148)

Why is \(\gamma\) referred to as the anomalous dimension of the parton density? This is best illustrated using a running coupling with a finite mass dimension, like the gravitational coupling \(G_{{\rm Planck}} \sim 1/M_{{\rm Planck}}^{2}.\) When we attach a renormalization constant Z to this coupling we first define a dimensionless running bare coupling g. In n dimensions this gives us

$$ g^{{\rm bare}} = M^{n-2} \; G_{{\rm Planck}} \quad\quad g^{{\rm bare}} \to Z \, g(M^2). $$
(2.149)

For the dimensionless gravitational coupling we can compute the running

$$ \begin{aligned} [b] \frac{d g(M^2)} {d \log M} &= \frac{d} {d \log M} \left( \frac{1} {Z} M^{n-2} \; G_{{\rm Planck}} \right) \\ &= G_{{\rm Planck}} \left( \frac{1} {Z} M \frac{dM^{n-2}} {d M} - \frac{1} {Z^2} \frac{dZ} {d \log M} M^{n-2} \right) \\&= g(M) \left( n - 2 + \eta \right) \quad \hbox{with} \quad \eta = - \frac{1} {Z} \frac{dZ} {d \log M} \end{aligned} $$
(2.150)

Hence, there are two sources of running for the renormalized coupling \(g(M^2){:}\) first, there is the mass dimension of the bare coupling \(n-2,\) and secondly there is \(\eta,\) a quantum effect from the coupling renormalization. For obvious reasons we call \(\eta\) the anomalous dimension of \(G_{{\rm Planck}}.\)

This is similar to the running of the parton densities in Mellin space, as shown in Eq. 2.143 with \(\gamma(m)\) defined in Eq. 2.144, so we refer to \(\gamma\) as an anomalous dimension as well. The entire running of the transformed parton density arises from collinear splitting, parameterized by a finite \(\gamma.\) There is only a slight stumbling block in this analogy: usually, an anomalous dimension arises through renormalization involving an ultraviolet divergence and the renormalization scale. In our case we are discussing an infrared divergence and the factorization scale dependence.

2.3.5 Re-Summing Collinear Logarithms

Remembering how we arrive at the DGLAP equation we notice an analogy to the case of ultraviolet divergences and the running coupling. We start from universal infrared divergences. Those we describe in terms of splitting functions which we regularize using the plus prescription. The DGLAP equation plays the role of a renormalization group equation for example for the running coupling. It links parton densities evaluated at different scales \(\mu_F.\)

In analogy to the scaling logarithms considered in Sect. 2.2.3 we now test if we can point to a type of logarithm the DGLAP equation re-sums by reorganizing our perturbative series of parton splitting. To identify these re-summed logarithms we build a physical model based on collinear splitting but without using the DGLAP equation. We then solve it to see the resulting structure of the solutions and compare it to the structure of the DGLAP solutions in Eq. 2.147.

We start from the basic equation defining the physical picture of parton splitting in Eq. 2.97. Only taking into account gluons in pure Yang–Mills theory it precisely corresponds to the starting point of our discussion leading to the DGLAP equation, schematically written as

$$ \sigma_{n+1} = \int \sigma_n \;\frac{d t} {t} dz \; \frac{\alpha_s} {2 \pi} \; \hat{P}_{g \leftarrow g}(z). $$
(2.151)

What we now need is an exact physical definition of the virtuality variable \(t\) describing initial-state splitting. If we remember that \(t=p_b^2<0\) we can follow Eq. 2.117 and introduce a positive transverse momentum variable \(\vec{p}_T^2\) such that

$$ -t = - \frac{p_T^2} {1-z} = \frac{\vec{p}_T^2}{1-z} > 0 \quad \Rightarrow \quad \frac{dt} {t} = \frac{d p_T^2} {p_T^2} = \frac{d \vec{p}_T^2} {\vec{p}_T^2}. $$
(2.152)

From the definition of \(p_T\) in the Sudakov decomposition Eq. 2.88 we see that \(\vec{p}_T\) is really the transverse three-momentum of the parton pair after splitting.

If beyond the single parton radiation discussed in Sect. 2.3.1 we consider a ladder of successive splittings of one gluon into two and we for a moment forget about the actual parton densities we can write the hadronic cross section in the collinear limit including the appropriate convolution as

$$ \begin{aligned} [b] \sigma_{n+1}(x,\mu_F) &= \int\nolimits_{\mu_0^2}^{\mu_F^2} \frac{d \vec{p}_{T,n}^2} {\vec{p}_{T,n}^2} \frac{\alpha_s(\mu_R^2)} {2 \pi} \; \int\nolimits_{x_0}^1 \frac{dx_n} {x_n} \; \hat{P}_{g \leftarrow g}\left( \frac{x} {x_n} \right) \sigma_n(x_n,\mu_0) \\ &= \int\nolimits_{x_0}^1 \frac{dx_n} {x_n} \; \hat{P}_{g \leftarrow g}\left( \frac{x} {x_n} \right) \sigma_n(x_n,\mu_0) \int\nolimits_{\mu_0^2}^{\mu_F^2} \frac{d \vec{p}_{T,n}^2} {\vec{p}_{T,n}^2} \frac{\alpha_s(\mu_R^2)} {2 \pi}. \end{aligned} $$
(2.153)

The dz in Eq. 2.151 we replace by the proper convolution \(\hat{P} \otimes \sigma_n,\) evaluated at the momentum fraction x. Because the splitting kernel is infrared divergent we cut off the convolution integral at \(x_0.\) Similarly, the transverse momentum integral is bounded by an infrared cutoff \(\mu_0\) and the physical external scale \(\mu_F.\) For the split of the two integrals in Eq. 2.153 it is crucial that \(\mu_0\) is the only scale the final matrix element \(\sigma_1\) depends on.

Identifying \(\mu_F\) with the upper boundary of the transverse momentum integration for collinear splitting is the first assumption we make for our model. The recursion in Eq. 2.153 we can then apply iteratively

$$ \begin{aligned}[b] \sigma_{n+1}(x,\mu_F) \sim& \int\nolimits_{x_0}^1 \frac{dx_n} {x_n} \; \hat{P}_{g \leftarrow g}\left( \frac{x} {x_n} \right) \; \cdots \; \int\nolimits_{x_0}^1 \frac{dx_1} {x_1} \; \hat{P}_{g \leftarrow g} \left(\frac{x_2} {x_1} \right) \; \sigma_1(x_1,\mu_0) \\ \times& \int\nolimits_{\mu_0}^{\mu_F} \frac{d \vec{p}_{T,n}^2} {\vec{p}_{T,n}^2} \frac{\alpha_s(\mu_R^2)} {2 \pi} \; \cdots \; \int\nolimits_{\mu_0}^{\mu_F} \frac{d \vec{p}_{T,1}^2} {\vec{p}_{T,1}^2} \frac{\alpha_s(\mu_R^2)} {2 \pi}. \end{aligned} $$
(2.154)

The two sets of integrals in this equation we will solve one by one, starting with the \(\vec{p}_T\) integrals.

The crucial physics assumption in our multiple-splitting model concerns the integration boundaries, in addition to the global upper limit \(\mu_F{:}\) as the second assumption the transverse momenta of the splittings should be strongly ordered; the first splitting integrated over \(\vec{p}_{T,1}^2\) is bounded from above by the next external scale \(\vec{p}_{T,2}^2,\) which is then bounded by \(\vec{p}_{T,3}^2,\) etc. For the n-fold \(\vec{p}_T^2\) integration this means

$$ \mu_0^2 < \vec{p}_{T,1}^2 < \vec{p}_{T,2}^2 < \cdots < \mu_F^{2} \; $$
(2.155)

The transverse momentum integrals in Eq. 2.154 then become

$$\begin{aligned}[b] \int\nolimits_{\mu_0}^{\mu_F} &\frac{d \vec{p}_{T,n}^2} {\vec{p}_{T,n}^2} \frac{\alpha_s(\vec{p}_{T,n}^2)} {2 \pi} \; \cdots \; \int\nolimits_{\mu_0}^{p_{T,3}} \frac{d \vec{p}_{T,2}^2} {\vec{p}_{T,2}^2} \frac{\alpha_s(\vec{p}_{T,2}^2)}{2 \pi} \; \int\nolimits_{\mu_0}^{p_{T,2}} \frac{d \vec{p}_{T,1}^2} {\vec{p}_{T,1}^2} \frac{\alpha_s(\vec{p}_{T,1}^2)} {2 \pi} \; \cdots \\=& \int\nolimits_{\mu_0}^{\mu_F} \frac{d \vec{p}_{T,n}^2} {\vec{p}_{T,n}^2} \frac{1} {2 \pi b_0 \log \frac{\vec{p}_{T,n}^2} {\Lambda_{{\rm QCD}}^2}} \; \cdots \; \int\nolimits_{\mu_0}^{p_{T,3}} \frac{d \vec{p}_{T,2}^2} {\vec{p}_{T,2}^2} \frac{1} {2 \pi b_0 \log \frac{\vec{p}_{T,2}^2} {\Lambda_{{\rm QCD}}^2}} \; \\ &\quad\times\int\nolimits_{\mu_0}^{p_{T,2}} \frac{d \vec{p}_{T,1}^2} {\vec{p}_{T,1}^2} \frac{1} {2 \pi b_0 \log \frac{\vec{p}_{T,1}^2} {\Lambda_{{\rm QCD}}^2}} \; \cdots \; \\ =& \frac{1} {(2 \pi b_0)^n} \int\nolimits_{\mu_0}^{\mu_F} \frac{d \vec{p}_{T,n}^2} {\vec{p}_{T,n}^2} \frac{1} {\log \frac{\vec{p}_{T,n}^2} {\Lambda_{{\rm QCD}}^2}} \; \cdots \; \int\nolimits_{\mu_0}^{p_{T,3}} \frac{d \vec{p}_{T,2}^2} {\vec{p}_{T,2}^2} \frac{1} {\log \frac{\vec{p}_{T,2}^2} {\Lambda_{{\rm QCD}}^2}} \; \\&\quad\times \int\nolimits_{\mu_0}^{p_{T,2}} \frac{d \vec{p}_{T,1}^2} {\vec{p}_{T,1}^2} \frac{1} {\log \frac{\vec{p}_{T,1}^2} {\Lambda_{{\rm QCD}}^2}} \; \cdots\, . \end{aligned} $$
(2.156)

Note that, just as in Eq. 2.146, as our third assumption we identify the scale of the strong coupling \(\alpha_s\) with the transverse momentum scale of the splitting, \(\mu_R^2 = \vec{p}_T^2.\) These integrals we can solve by switching variables, for example in the last integral

$$ \begin{aligned} \int\nolimits_{\mu_0}^{p_{T,2}} \frac{d \vec{p}_{T,1}^2} {\vec{p}_{T,1}^2} \frac{1} {\log \frac{\vec{p}_{T,1}^2} {\Lambda_{{\rm QCD}}^2}} &= \int\nolimits_{\mu_0}^{p_{T,2}} d \log \log \frac{\vec{p}_{T,1}^2} {\Lambda_{{\rm QCD}}^2} \quad \hbox{with} \quad \frac{d (ax)} {(ax) \log x} = d \log \log x \\ &= \int\nolimits_{\mu_0}^{p_{T,2}} d \left( \log \log \frac{\vec{p}_{T,1}^2} {\Lambda_{{\rm QCD}}^2} - \log \log \frac{\mu_0^2} {\Lambda_{{\rm QCD}}^2} \right) \\ &= \left[ \log \log \frac{\vec{p}_{T,1}^2} {\Lambda_{{\rm QCD}}^2} - \log \log \frac{\mu_0^2} {\Lambda_{{\rm QCD}}^2} \right]_{\vec{p}_{T,1}^2 = \mu_0^2}^{\vec{p}_{T,1}^2 = \vec{p}_{T,2}^2} \\ &= \log \frac{\log \vec{p}_{T,2}^2/\Lambda_{{\rm QCD}}^2} {\log \mu_0^2/\Lambda_{{\rm QCD}}^2}. \end{aligned} $$
(2.157)

In the second line we shift the integration variable by a constant. To simplify the notation we throughout keep the integration boundaries in terms of \(|\vec{p}_{T,j}|.\) The chain of integrals over \(\vec{p}_{T,j}^2\) we can solve, provided the integrand does not have any additional \(\vec{p}_{T,j}^2\) dependence. This gives us

$$\begin{aligned}[b] &\int\nolimits_{\mu_0}^{\mu_F} d \log \frac{\log \vec{p}_{T,n}^2/\Lambda_{{\rm QCD}}^2} {\log \mu_0^2/\Lambda_{{\rm QCD}}^2} \int\nolimits_{\mu_0}^{p_{T,n}} d \log \frac{\log \vec{p}_{T,n-1}^2/\Lambda_{{\rm QCD}}^2} {\log \mu_0^2/\Lambda_{{\rm QCD}}^2}\cdots\\[5pt] &\quad \times \int\nolimits_{\mu_0}^{p_{T,3}} d \log \frac{\log \vec{p}_{T,2}^2/\Lambda_{{\rm QCD}}^2} {\log \mu_0^2/\Lambda_{{\rm QCD}}^2} \int\nolimits_{\mu_0}^{p_{T,2}} d \log \frac{\log \vec{p}_{T,1}^2/\Lambda_{{\rm QCD}}^2} {\log \mu_0^2/\Lambda_{{\rm QCD}}^2} \\[5pt] =& \int\nolimits_{\mu_0}^{\mu_F} d \log \frac{\log \vec{p}_{T,n}^2/\Lambda_{{\rm QCD}}^2} {\log \mu_0^2/\Lambda_{{\rm QCD}}^2} \int\nolimits_{\mu_0}^{p_{T,n}} d \log \frac{\log \vec{p}_{T,n-1}^2/\Lambda_{{\rm QCD}}^2} {\log \mu_0^2/\Lambda_{{\rm QCD}}^2}\cdots \\[5pt] & \quad \times \int\nolimits_{\mu_0}^{p_{T,3}} d \log \frac{\log \vec{p}_{T,2}^2/\Lambda_{{\rm QCD}}^2} {\log \mu_0^2/\Lambda_{{\rm QCD}}^2} \left( \log \frac{\log \vec{p}_{T,2}^2/\Lambda_{{\rm QCD}}^2} {\log \mu_0^2/\Lambda_{{\rm QCD}}^2} \right) \\[8pt] =& \int\nolimits_{\mu_0}^{\mu_F} d \log \frac{\log \vec{p}_{T,n}^2/\Lambda_{{\rm QCD}}^2} {\log \mu_0^2/\Lambda_{{\rm QCD}}^2} \int\nolimits_{\mu_0}^{p_{T,n}} d \log \frac{\log \vec{p}_{T,n-1}^2/\Lambda_{{\rm QCD}}^2} {\log \mu_0^2/\Lambda_{{\rm QCD}}^2} \; \cdots\\[8pt] & \quad \times \frac{1} {2} \left( \log \frac{\log \vec{p}_{T,3}^2/\Lambda_{{\rm QCD}}^2} {\log \mu_0^2/\Lambda_{{\rm QCD}}^2} \right)^2 \\ =& \int\nolimits_{\mu_0}^{\mu_F} d \log \frac{\log \vec{p}_{T,n}^2/\Lambda_{{\rm QCD}}^2} {\log \mu_0^2/\Lambda_{{\rm QCD}}^2} \left( \frac{1} {2} \cdots \frac{1} {n-1} \right) \left( \log \frac{\log \vec{p}_{T,n}^2/\Lambda_{{\rm QCD}}^2} {\log \mu_0^2/\Lambda_{{\rm QCD}}^2} \right)^{n-1} \\ =& \frac{1} {n!} \left( \log \frac{\log \mu_F^2/\Lambda_{{\rm QCD}}^2} {\log \mu_0^2/\Lambda_{{\rm QCD}}^2} \right)^n = \frac{1} {n!} \left( \log \frac{\alpha_s(\mu_0^2)} {\alpha_s(\mu_F^2)} \right)^n. \end{aligned} $$
(2.158)

This is the final result for the chain of transverse momentum integrals in Eq. 2.154. Again, we see that the strong coupling is evaluated at the factorization scale \(\mu_F,\) which means we identify \(\mu_R \equiv \mu_F\) following our third assumption.
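
Before moving on we can convince ourselves of the \(1/n!\) from the strong ordering numerically. The following minimal sketch (with illustrative values of \(\Lambda_{{\rm QCD}},\) \(\mu_0\) and \(\mu_F\) that are assumptions of the example, not part of the derivation) samples \(\log \vec{p}_T^2\) flat and compares the ordered two-fold transverse momentum integral of Eq. 2.156, without the \(1/(2\pi b_0)^2\) prefactor, with the corresponding \(n=2\) term of Eq. 2.158:

```python
import numpy as np

# Monte Carlo check of the n = 2 case of Eq. (2.158); Lambda_QCD = 0.2 GeV,
# mu_0 = 2 GeV and mu_F = 100 GeV are illustrative choices for this sketch.
lam2, mu02, muF2 = 0.2**2, 2.0**2, 100.0**2

rng = np.random.default_rng(1)
N = 1_000_000
# sample u = log(pT^2) flat, so that d pT^2 / pT^2 = du
u1, u2 = rng.uniform(np.log(mu02), np.log(muF2), (2, N))
weight = 1.0 / ((u1 - np.log(lam2)) * (u2 - np.log(lam2)))   # the 1/log(pT^2/Lambda^2) factors
ordered = u1 < u2                                            # strong ordering, Eq. (2.155)
volume = np.log(muF2 / mu02)**2
mc = volume * np.mean(weight * ordered)

L = np.log(np.log(muF2 / lam2) / np.log(mu02 / lam2))
print(mc, L**2 / 2.0)    # both numbers should agree up to the Monte Carlo error
```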

To compute the convolution integrals over the momentum fractions in the same equation

$$ \begin{aligned} \sigma_{n+1}(x,\mu) \sim& \frac{1} {n!} \left( \frac{1} {2 \pi b_0} \log \frac{\alpha_s(\mu_0^2)} {\alpha_s(\mu^2)} \right)^n \; \\ &\times \int\nolimits_{x_0}^1 \frac{dx_n} {x_n} \; \hat{P}_{g \leftarrow g}\left(\frac{x} {x_n}\right) \; \cdots \; \int\nolimits_{x_0}^1 \frac{dx_1} {x_1} \; \hat{P}_{g \leftarrow g}\left( \frac{x_2} {x_1} \right) \; \sigma_1(x_1,\mu_0), \end{aligned} $$
(2.159)

we again Mellin transform the equation into moment space

$$ \begin{aligned}[b] {\fancyscript{M}}[\sigma_{n+1}](m,\mu) \sim& \frac{1} {n!} \left( \frac{1} {2 \pi b_0} \log \frac{\alpha_s(\mu_0^2)} {\alpha_s(\mu^2)} \right)^n \\ & \times {\fancyscript{M}} \left[ \int\nolimits_{x_0}^1 \frac{dx_n} {x_n} \hat{P}_{g \leftarrow g}\left( \frac{x} {x_n} \right) \cdots \int\nolimits_{x_0}^1 \frac{dx_1} {x_1} \hat{P}_{g \leftarrow g}\left(\frac{x_2} {x_1} \right) \sigma_1(x_1,\mu_0) \!\right](m) \\ =& \frac{1} {n!} \left( \frac{1} {2 \pi b_0} \log \frac{\alpha_s(\mu_0^2)} {\alpha_s(\mu^2)} \right)^n \; \gamma(m)^n {\fancyscript{M}}[\sigma_1](m,\mu_0) \\ =& \frac{1} {n!} \left(\frac{1} {2 \pi b_0} \log \frac{\alpha_s(\mu_0^2)} {\alpha_s(\mu^2)} \gamma(m) \right)^n {\fancyscript{M}}[\sigma_1](m,\mu_0), \end{aligned} $$
(2.160)

where we define \(\gamma(m) \equiv {\fancyscript{M}}[P](m).\) We can now sum the production cross sections for n collinear jets and obtain

$$ \begin{aligned}[b] \sum\limits_{n=0}^\infty {\fancyscript{M}}[\sigma_{n+1}](m,\mu) =& {\fancyscript{M}}[\sigma_1](m,\mu_0) \; \sum\limits_n \frac{1} {n!} \left( \frac{1} {2 \pi b_0} \log \frac{\alpha_s(\mu_0^2)} {\alpha_s(\mu^2)} \; \gamma(m) \right)^n \\ =& {\fancyscript{M}}[\sigma_1](m,\mu_0) \; \exp \left( \frac{\gamma(m)} {2 \pi b_0}\; \log \frac{\alpha_s(\mu_0^2)} {\alpha_s(\mu^2)} \right)\,. \end{aligned} $$
(2.161)

This way we can write the sum of the Mellin transforms of the \((n+1)\)-particle production rates as the product of the hard process rate times a ratio of the strong coupling at two scales

$$ \fbox{$\displaystyle\sum_{n=0}^\infty {\fancyscript{M}}[\sigma_{n+1}](m,\mu) = {\fancyscript{M}}[\sigma_1](m,\mu_0) \; \left( \dfrac{\alpha_s(\mu_0^2)} {\alpha_s(\mu^2)} \right)^{\frac{\gamma(m)} {2 \pi b_0}} $}\,. $$
(2.162)

This is the same structure as the DGLAP equation’s solution in Eq. 2.147. It means that we should be able to understand the physics of the DGLAP equation using our model calculation of a gluon ladder emission, including the generically variable number of collinear jets in the form of \(pp \to \mu^+ \mu^- + X,\) as shown in Eq. 2.148.

We should remind ourselves of the three assumptions we need to make to arrive at this form. Two of them concern the transverse momenta of the successive radiation: first, the global upper limit on all transverse momenta should be the factorization scale \(\mu_F;\) second, the transverse momenta of the successive splittings should be strongly ordered. Together they give us a physical picture of the successive splittings as well as a physical interpretation of the factorization scale. Third, the strong coupling should be evaluated at the transverse momentum or factorization scale, so all scales are unified, in accordance with the derivation of the DGLAP equation.

Bending the rules of pure Yang–Mills QCD we can come back to the hard process \(\sigma_1\) as the Drell–Yan process \(q \bar{q} \to Z.\) Each step in n means an additional parton in the final state, so \(\sigma_{n+1}\) is Z production with n collinear partons. On the left-hand side of Eq. 2.162 we have the sum over any number of additional collinear partons; on the right-hand side we see fixed order Drell–Yan production without any additional partons, but with an exponentiated correction factor. Comparing this to the running parton densities we can draw the analogy that any process computed with a scale dependent parton density where the scale dependence is governed by the DGLAP equation includes any number of collinear partons.

The logarithms which are re-summed by scale dependent parton densities we can also identify. Going back to Eq. 2.84 reminds us that we start from the divergent collinear logarithms \(\log p_T^{{\rm max}}/p_T^{{\rm min}}\) arising from the collinear phase space integration. In our model for successive splitting we replace the upper boundary by \(\mu_F.\) The collinear logarithm of successive initial state parton splitting diverges for \(\mu_0 \to 0,\) but it gets absorbed into the parton densities and determines the structure of the DGLAP equation and its solutions. The upper boundary \(\mu_F\) tells us to what extent we assume incoming quarks and gluons to be a coupled system of splitting partons and what the maximum momentum scale of these splittings is. Transverse momenta \(p_T > \mu_F\) generated by hard parton splitting are not covered by the DGLAP equation and hence not a feature of the incoming partons anymore. They belong to the hard process and have to be consistently simulated, as we will see in Sects. 2.5.3 and 2.6. While this scale can be chosen freely we have to make sure that it does not become too large, because at some point the collinear approximation \(C \simeq\) constant in Eq. 2.84 ceases to hold and with it our entire argument. If we do everything correctly, the DGLAP equation thus re-sums logarithms of the maximal transverse momentum of the incoming gluon. These logarithms are universal and arise from simple kinematics.

The ordering of the splittings we have to assume is not relevant unless we simulate this splitting, as we will see in the next section. For the details of this we have to remember that our argument follows from the leading collinear approximation introduced in Sect. 2.3.1. Therefore, the strong \(p_T\) ordering can in practice mean angular ordering or rapidity ordering as well, just applying a linear transformation.

2.4 Scales in LHC Processes

Looking back at Sects. 2.2 and 2.3 we introduced the factorization and renormalization scales step by step completely in parallel: first, computing perturbative higher order contributions to scattering amplitudes we encounter ultraviolet and infrared divergences. Both of them we regularize using dimensional regularization with \(n=4 - 2 \varepsilon<4\) for ultraviolet and \(n>4\) for infrared divergences, linked by analytic continuation. For both kinds of divergences we notice that they are universal, i.e. not process or observable dependent. This allows us to absorb ultraviolet and infrared divergences into a re-definition of the strong coupling and the parton density. This nominally infinite shift of parameters we refer to as renormalization for example of the strong coupling or as mass factorization absorbing infrared divergences into the parton distributions.

After renormalization as well as after mass factorization we are left with a scale artifact. Scales arise as part of the pole subtraction: together with the pole \(1/\varepsilon\) we have a choice of finite contributions which we subtract with this pole. Logarithms of the renormalization and factorization scales will always be part of these finite terms. Moreover, in both cases the re-definition of parameters is not based on fixed order perturbation theory. Instead, it involves summing logarithms which otherwise can become large and spoil the convergence of our perturbative series in \(\alpha_s.\) The only special feature of infrared divergences as compared to ultraviolet divergences is that to identify the resummed logarithms we have to unify both scales to one.

The hadronic production cross section for the Drell–Yan process or other LHC production channels, now including both scales, reads

$$ \sigma_{{\rm tot}}(\mu_F,\mu_R)= \int\nolimits_0^1 dx_1 \int\nolimits_0^1 dx_2 \sum_{ij} f_i(x_1,\mu_F) f_j(x_2, \mu_F) \hat{\sigma}_{ij}(x_1 x_2 S, \alpha_s(\mu_R^2), \mu_F, \mu_R). $$
(2.163)

The Drell–Yan process has the particular feature that at leading order \(\hat{\sigma}_{q\bar{q}}\) only involves weak couplings; it does not include \(\alpha_s\) with its implicit renormalization scale dependence. Strictly speaking, in Eq. 2.163 the parton densities also depend on the renormalization scale because in their extraction we identify both scales. Carefully following their extraction we can separate the two scales if we need to. Lepton pair production and Higgs production in weak boson fusion are the two prominent electroweak production processes at the LHC.
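
To see the structure of Eq. 2.163 at work we can write down a minimal numerical sketch. The toy parton density, the Breit–Wigner stand-in for the partonic cross section and all numbers below are assumptions of the illustration, chosen only to keep the example self-contained:

```python
import numpy as np

# Toy version of the double convolution in Eq. (2.163): two parton densities
# times a partonic cross section, here a Breit-Wigner peaked at 91 GeV. None
# of the inputs are realistic; the point is only the structure pdf x pdf x sigma_hat.
S = 14000.0**2                              # hadronic energy squared in GeV^2

def f_toy(x, muF):
    return (1 - x)**3 / x                   # mu_F dependence ignored in this toy

def sigma_hat(shat):
    m, gamma = 91.0, 2.5
    return 1.0 / ((shat - m**2)**2 + (m * gamma)**2)

rng = np.random.default_rng(0)
N = 2_000_000
u1, u2 = rng.uniform(np.log(1e-6), 0.0, (2, N))     # sample x = e^u flat in u
x1, x2 = np.exp(u1), np.exp(u2)
jacobian = x1 * x2 * np.log(1e6)**2                  # dx = x du for each axis
sigma = np.mean(jacobian * f_toy(x1, 91.0) * f_toy(x2, 91.0) * sigma_hat(x1 * x2 * S))
print(sigma)
```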

The evolution of all running parameters from one renormalization/factorization scale to another is described either by a renormalization group equation in terms of a beta function in the case of renormalization or by the DGLAP equation in the case of mass factorization. Our renormalization group equation for \(\alpha_s\) is a single equation, but in general they are sets of coupled differential equations for all relevant parameters, which again makes them more similar to the DGLAP equation.

There is one formal difference between these two otherwise very similar approaches. The fact that we can absorb ultraviolet divergences into process-independent, i.e. universal counter terms is called renormalizability and has been proven to all orders for the kind of gauge theories we are dealing with. The universality of infrared splitting kernels has not (yet) in general been proven, but on the other hand we have never seen an example where it fails for sufficiently inclusive observables like production rates. For a while we thought there might be a problem with factorization in supersymmetric theories using the \({\overline{\hbox{MS}}}\) scheme, but this issue has been resolved. A summary of the properties of the two relevant scales for LHC physics we show in Table 2.2.

Table 2.2 Comparison of renormalization and factorization scales appearing in LHC cross sections

The way we introduce factorization and renormalization scales clearly labels them as an artifact of perturbation theories with divergences. What actually happens if we include all orders in perturbation theory? For example, the re-summation of the self energy bubbles simply deals with one class of diagrams which have to be included, either order-by-order or rearranged into a re-summation. Once we include all orders in perturbation theory it does not matter according to which combination of couplings and logarithms we order it. An LHC production rate will then not depend on arbitrarily chosen renormalization or factorization scales \(\mu.\)

Practically, in Eq. 2.163 we evaluate the renormalized parameters and the parton densities at some scale. This scale dependence will only cancel once we include all implicit and explicit appearances of the scales at all orders. Whatever scale we choose for the strong coupling or parton densities will eventually be compensated by explicit scale logarithms. In the ideal case, these logarithms are small and do not spoil perturbation theory. In a process with one distinct external scale, like the Z mass, we know that all scale logarithms should have the form \(\log (\mu/m_Z).\) This logarithm vanishes if we evaluate everything at the ‘correct’ external energy scale, namely \(m_Z.\) In that sense we can think of the running coupling as a proper running observable which depends on the external energy of the process. This dependence on the external energy is not a perturbative artifact, because a cross section even to all orders does depend on the energy. The problem in particular for LHC analyses is that after analysis cuts every process will have more than one external energy scale.
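
The compensation between the implicit scale dependence of the running coupling and the explicit scale logarithms we can make tangible in a toy model. The following sketch is an illustration only: the one-loop running, the stand-in hard scale of 125 GeV and the finite coefficient c1 are assumptions, not a real LHC cross section.

```python
import numpy as np

# Toy illustration of how an explicit scale logarithm at NLO compensates the
# scale dependence of the running coupling in a leading-order rate ~ alpha_s^2.
b0 = (33 - 2 * 5) / (12 * np.pi)         # one-loop beta function coefficient, n_f = 5
lam, m, c1 = 0.2, 125.0, 2.0             # Lambda_QCD, hard scale, made-up NLO coefficient

def alphas(mu):
    return 1.0 / (b0 * np.log(mu**2 / lam**2))

def sigma_lo(mu):
    return alphas(mu)**2

def sigma_nlo(mu):
    # the explicit log(mu^2/m^2) cancels the mu dependence of alpha_s(mu)^2 to this order
    return alphas(mu)**2 * (1 + alphas(mu) * (c1 + 2 * b0 * np.log(mu**2 / m**2)))

for mu in (m / 2, m, 2 * m):
    print(mu, sigma_lo(mu) / sigma_lo(m), sigma_nlo(mu) / sigma_nlo(m))
# the NLO ratios stay noticeably closer to unity than the LO ratios
```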

We can turn around the argument of vanishing scale dependence to all orders in perturbation theory. This gives us an estimate of the minimum theoretical error on a rate prediction set by the scale dependence. The appropriate interval of what we consider reasonable scale choices depends on the process and the taste of the people doing this analysis. This error estimate is not at all conservative; for example the renormalization scale dependence of the Drell–Yan production rate or Higgs production in weak boson fusion is zero because \(\alpha_s\) only enters at next-to-leading order. At the same time we know that the next-to-leading order correction to the Drell–Yan cross section is of the order of 30%, which far exceeds the factorization scale dependence. Moreover, the different scaling behavior of a hadronic cross section shown in Table 2.2 implies that for example gluon-induced processes at typical x values around \(10^{-2}\) show a cancellation of the factorization and renormalization scale variation. Estimating theoretical uncertainties from scale dependence therefore requires a good understanding of the individual process and the way it is affected by the two scales.

Guessing the right scale choice for a process is hard, often impossible. For example in Drell–Yan production at leading order there exists only one scale, \(m_Z.\) If we set \(\mu = m_Z\) all scale logarithms vanish. In reality, LHC observables include several different scales. Some of them appear in the hard process, for example in the production of two or three particles with different masses. Others enter through the QCD environment where at the LHC we only consider final state jets above a certain minimal transverse momentum. Even others appear through background rejection cuts in a specific analysis, for example when we only consider the Drell–Yan background for \(m_{\mu \mu} > 1\) TeV to Kaluza–Klein graviton production. Using likelihood methods does not improve the situation because the phase space regions dominated by the signal will introduce specific energy scales which affect the perturbative prediction of the backgrounds. This is one of the reasons why an automatic comparison of LHC events with signal or background predictions is bound to fail once it requires an estimate of the theoretical uncertainty on the background simulation.

All that means that in practice there is no way to define a ‘correct’ scale. On the other hand, there are definitely poor scale choices. For example, using \(1,000 \times m_Z\) as a typical scale in the Drell–Yan process will if nothing else lead to logarithms of the size \(\log 1,000\) whenever a scale logarithm appears. These logarithms eventually have to be cancelled to all orders in perturbation theory, inducing unreasonably large higher order corrections.

When describing jet radiation, we usually introduce a phase-space dependent renormalization scale, evaluating the strong coupling at the transverse momentum of the radiated jet \(\alpha_s(\vec{p}_{T,j}^2).\) This choice gives the best kinematic distributions for the additional partons because in Sect. 2.3.5 we have shown that it re-sums large collinear logarithms.

The transverse momentum of a final state particle is one of the scale choices allowed by factorization; in addition to poor scale choices there also exist wrong scale choices, i.e. scale choices violating physical properties we need. Factorization or the Kinoshita–Lee–Nauenberg theorem, which ensures that soft divergences cancel between real and virtual emission diagrams, are such properties we should not violate; in QED the same property is called the Bloch–Nordsieck cancellation. Imagine picking a factorization scale defined by the partonic initial state, for example the partonic center-of-mass energy \(s = x_1 x_2 S.\) We know that this definition is not unique to the initial state: for any final state it corresponds to the well defined sum of all momenta squared. However, virtual and real gluon emission generate different multiplicities and hence different kinematics in the final state, which means that the two sources of soft divergences get multiplied with numerically different parton densities and no longer cancel. Only scales which are uniquely defined in the final state can serve as factorization scales. For the Drell–Yan process such a scale could be \(m_Z,\) or the mass of heavy new-physics states in their production process. So while there is no such thing as a correct scale choice, there are more or less smart choices, and there are definitely very poor choices, which usually lead to an unstable perturbative behavior.

2.5 Parton Shower

In LHC phenomenology we are usually less interested in fixed-order perturbation theory than in logarithmically enhanced QCD effects. Therefore, we will not deepen our discussion of hadronic rates as shown in Eq. 2.163 based on fixed-order partonic cross sections convoluted with parton densities obeying the DGLAP equation. In Sect. 2.3.5 we have already seen that there exist more functions with the same structure as solutions to the DGLAP equation. In this section we will look for other structures which obey the DGLAP equation, leading us to Sudakov form factors and the parton shower. In Sect. 2.5.2 we will discuss some of the physical properties of the parton shower. Per se it is not clear how jet radiation described by the parton shower and jet radiation described by fixed-order QCD processes are linked. In Sect. 2.5.3 we will discuss ways to combine the two approaches in realistic LHC simulations, bringing us very close to contemporary research topics.

2.5.1 Sudakov Form Factor

The splitting kernels \(\hat{P}_{i \leftarrow j}(z)\) we introduce as something like splitting probabilities, but we never apply a probabilistic approach to parton splitting. The basis of such an interpretation is the Sudakov form factor describing the splitting of a parton i into any of the partons j, based on the factorized form Eq. 2.97

$$ \fbox{$\Delta_i(t) \equiv \Delta_i(t, t_0) = \exp \left( - \sum \limits_j \displaystyle\int \limits_{t_0}^{t} \dfrac{dt^{\prime}} {t^{\prime}} \int \limits_{0}^{1} dy \dfrac{\alpha_s} {2 \pi} \hat{P}_{j \leftarrow i}(y) \right)$}\,. $$
(2.164)

Before we check that such a function can obey the DGLAP equation we confirm that such exponentials appear in probabilistic arguments, similar to our discussion of the central jet veto in Sect. 1.5.2. Using Poisson statistics for something expected to occur p times, the probability of observing it n times is given by

$$ {\fancyscript{P}}(n;p) = \frac{p^n e^{-p}} {n!} \quad\quad {\fancyscript{P}}(0;p) = e^{-p}. $$
(2.165)

If the exponent in the Sudakov form factor in Eq. 2.164 describes the integrated splitting probability of a parton i this means that the Sudakov itself describes a non-splitting probability of the parton i into any final state j.

Based on such probabilistic Sudakov factors we can use a Monte Carlo (i.e. a Markov process without a memory of individual past steps) to compute a chain of parton splittings as depicted in Fig. 2.2. This will describe a quark or a gluon propagating forward in time. Starting from a point \((x_1, t_1)\) in momentum-virtuality space we step by step move to the next splitting point \((x_j, t_j)\). Following the original discussion \(t_2\) is the target virtuality at \(x_2\), and for time-like final state branching the virtuality is positive \(t_j\,>\,0\) in all points j. The Sudakov factor is a function of t, so it gives us the probability of not seeing any branching between \(t_1\) and \(t_2\) as \(\Delta(t_1)/\Delta(t_2)\,<\,1\). The appropriate cutoff scale \(t_0\) drops out of this ratio. Using a flat random number \(r_t\) the \(t_2\) distribution is implicitly given by the solution to

$$ \frac{\Delta(t_1)} {\Delta(t_2)} = r_t \in [0,1] \quad\qquad \hbox{with} \quad t_1 > t_2 > t_0 > 0. $$
(2.166)

Beyond the absolute cutoff scale \(t_0\) we assume that no resolvable branching occurs.

In a second step we need to compute the matching energy fraction \(x_2\) or the ratio \(x_2/x_1\) describing the momentum fraction which is kept in the splitting at \(x_2\). The y integral in the Sudakov factor in Eq. 2.164 gives us this probability distribution which we can again implicitly solve for \(x_2\) using a flat random number \(r_x\)

$$ \frac{\int\nolimits_{0}^{x_2/x_1} dy \frac{\alpha_s} {2\pi} \hat{P}(y)} {\int\nolimits_{0}^{1} dy \frac{\alpha_s} {2\pi} \hat{P}(y)} = r_x \in [0,1] \quad\qquad \hbox{with} \; x_1 > x_2 > 0. $$
(2.167)

For splitting kernels with soft divergences at \(y\,=\,0\) or \(y\,=\,1\) we should include a numerical cutoff in the integration because the probabilistic Sudakov factor and the parton shower do not involve the regularized splitting kernels.

Of the four momentum entries of the radiated parton the two equations Eqs. 2.166 and 2.167 give us two. The on-shell mass constraint fixes a third, so all we are left with is the azimuthal angle distribution. We know from symmetry arguments that QCD splitting is insensitive to this angle, so we can generate it randomly between zero and \(2 \pi\). For final state radiation this describes probabilistic branching in a Monte Carlo program, just based on Sudakov form factors.
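
To make this algorithm concrete, here is a minimal sketch of such a branching chain for a fixed-coupling toy version of pure gluon splitting. The fixed \(\alpha_s,\) the virtuality window and the soft cutoff are assumptions of the toy, not a tuned shower:

```python
import numpy as np

# Toy final-state branching chain based on Eqs. (2.164)-(2.167): fixed alpha_s,
# pure g -> gg splitting kernel, soft divergences regulated by a cutoff eps.
rng = np.random.default_rng(42)
CA, alpha_s, eps = 3.0, 0.12, 0.01
t_hard, t_cut = 1.0e4, 1.0                 # virtualities in GeV^2

y = np.linspace(eps, 1 - eps, 2001)
P = 2 * CA * (y / (1 - y) + (1 - y) / y + y * (1 - y))   # unregularized P_{g<-g}
# for fixed alpha_s the Sudakov factor is a power law, Delta(t) = (t0/t)^c
c = alpha_s / (2 * np.pi) * np.sum(P) * (y[1] - y[0])
cdf = np.cumsum(P); cdf /= cdf[-1]         # cumulative y distribution for Eq. (2.167)

def branch(t1, x1):
    """One splitting: solve Delta(t1)/Delta(t2) = r_t for t2 and Eq. (2.167) for x2."""
    t2 = t1 * rng.random()**(1.0 / c)      # from Delta(t1)/Delta(t2) = (t2/t1)^c = r_t
    if t2 < t_cut:
        return None                        # no resolvable branching above the cutoff
    z = np.interp(rng.random(), cdf, y)    # inverse transform sampling of y = x2/x1
    return t2, x1 * z

t, x, chain = t_hard, 1.0, []
while (step := branch(t, x)) is not None:
    t, x = step
    chain.append((t, x))
print(chain)
```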

The same statement for initial state radiation including parton densities we will put on a more solid or mathematical footing. The derivative of the Sudakov form factor Eq. 2.164

$$ \frac{1} {\Delta_i(t)} \frac{d \Delta_i(t)} {dt} = - \sum \limits_j \frac{1} {t} \int \nolimits_0^1 dy \frac{\alpha_s} {2 \pi} \hat{P}_{j \leftarrow i}(y) $$
(2.168)

is precisely the second term in \(d f(x,t)/dt\) for diagonal splitting, as shown in Eq. 2.124

$$ \begin{aligned}[b] \frac{d f_i(x,t)} {d t} &= \frac{1} {t} \sum_j \left[ \int \nolimits_0^1 \frac{dz} {z} \; \frac{\alpha_s} {2\pi} \; \hat{P}_{i \leftarrow j}(z) \; f_j\left(\frac{x} {z},t\right) - \int \nolimits_0^1 dy \; \frac{\alpha_s} {2\pi} \; \hat{P}_{j \leftarrow i} (y) \; f_i\left(x,t\right) \right] \\ &= \frac{1} {t} \sum_j \int \nolimits_0^1 \frac{dz} {z}\; \frac{\alpha_s} {2\pi} \; \hat{P}_{i \leftarrow j}(z) \; f_j\left(\frac{x} {z},t\right) + \frac{f_i(x,t)} {\Delta_i(t)} \frac{d \Delta_i(t)} {d t}. \end{aligned} $$
(2.169)

This relation suggests considering the derivative of \(f_i/\Delta_i\) instead of the Sudakov factor alone to obtain something like the DGLAP equation

$$ \begin{aligned} \frac{d} {d t} \frac{f_i(x,t)} {\Delta_i(t)} &= \frac{1} {\Delta_i(t)} \; \frac{d f_i(x,t)} {d t} - \frac{f_i(x,t)} {\Delta_i(t)^2} \; \frac{d \Delta_i(t)} {d t} \\&= \frac{1} {\Delta_i(t)} \; \left( \frac{1} {t} \sum_j \int \nolimits_0^1 \frac{dz} {z} \; \frac{\alpha_s} {2\pi} \; \hat{P}_{i \leftarrow j}(z) \; f_j\left(\frac{x} {z},t\right) + \frac{f_i(x,t)} {\Delta_i(t)} \frac{d \Delta_i(t)} {d t} \right) \\ & \quad- \frac{f_i(x,t)} {\Delta_i(t)^2} \; \frac{d \Delta_i(t)} {d t} \\ &= \frac{1} {\Delta_i(t)} \; \frac{1} {t} \; \sum_j \int \nolimits_0^{1-\varepsilon} \frac{dz} {z} \; \frac{\alpha_s} {2\pi} \; \hat{P}_{i \leftarrow j}(z) \; f_j\left(\frac{x} {z},t\right)\!. \end{aligned} $$
(2.170)

In the last step we cancel what corresponds to the plus prescription for diagonal splitting, i.e. we remove the regularization of the splitting kernel at \(z \to 1.\) Therefore, we need to modify the upper integration boundary by a small parameter \(\varepsilon\) which can in principle depend on t. The resulting equation is the diagonal DGLAP equation with unsubtracted splitting kernels, solved by the ratio of parton densities and Sudakov factors

$$ \begin{aligned} \fbox{$t \dfrac{d} {d t} \dfrac{f_i(x,t)} {\Delta_i(t)} = \dfrac{d} {d \log t} \dfrac{f_i(x,t)} {\Delta_i(t)} = \displaystyle\sum_j \int \nolimits_0^{1-\varepsilon} \dfrac{dz} {z} \; \dfrac{\alpha_s} {2\pi} \; \hat{P}_{i \leftarrow j}(z) \; \dfrac{f_j\left(\dfrac{x} {z},t\right)} {\Delta_i(t)}$}\,.\\[5pt] \end{aligned} $$
(2.171)

We can study the structure of these solutions of the unsubtracted DGLAP equation by integrating \(f /\Delta\) between appropriate points in t

$$ \begin{aligned} \frac{f_i(x,t)} {\Delta_i(t)} &- \frac{f_i(x,t_0)} {\Delta_i(t_0)} = \int \nolimits_{t_0}^t \frac{dt^{\prime}} {t^{\prime}} \sum_j \int \nolimits_0^{1-\varepsilon} \frac{dz} {z} \; \frac{\alpha_s} {2\pi} \; \hat{P}_{i \leftarrow j}(z) \; \frac{f_j\left(\frac{x} {z},t^{\prime}\right)} {\Delta_i(t^{\prime})}\\ f_i(x,t) &= \frac{\Delta_i(t)} {\Delta_i(t_0)} f_i(x,t_0) + \int \nolimits_{t_0}^t \frac{dt^{\prime}} {t^{\prime}} \; \frac{\Delta_i(t)} {\Delta_i(t^{\prime})} \sum_j \int \nolimits_0^{1-\varepsilon} \frac{dz} {z} \; \frac{\alpha_s} {2\pi} \; \hat{P}_{i \leftarrow j}(z) \; f_j\left(\frac{x} {z},t^{\prime}\right) \\ &= \Delta_i(t) f_i(x,t_0) +\!\! \int \nolimits_{t_0}^t \frac{dt^{\prime}} {t^{\prime}} \; \frac{\Delta_i(t)} {\Delta_i(t^{\prime})} \sum_j \int \nolimits_0^{1-\varepsilon} \frac{dz} {z} \; \frac{\alpha_s} {2\pi} \; \hat{P}_{i \leftarrow j}(z) \; f_j\left(\frac{x} {z},t^{\prime}\right) \\ &\equiv \Delta_i(t,t_0) f_i(x,t_0) +\!\! \int \nolimits_{t_0}^t \frac{dt^{\prime}} {t^{\prime}} \; \Delta_i(t,t^{\prime}) \sum_j\!\! \int\nolimits_0^{1-\varepsilon}\frac{dz} {z} \; \frac{\alpha_s} {2\pi} \; \hat{P}_{i \leftarrow j}(z) \; f_j\left(\frac{x} {z},t^{\prime}\right)\!, \end{aligned} $$
(2.172)

where we choose \(t_0\) such that \(\Delta(t_0) = 1\) and introduce the notation \(\Delta(t_1, t_2) = \Delta(t_1, t_0) / \Delta(t_2, t_0)\) for the ratio of two Sudakov factors in the last line. This formula for the dependence of the parton density \(f_i(x,t)\) on x and t has a suggestive interpretation: corresponding to Eq. 2.165 the first term can be interpreted as ‘nothing happening to f between \(t_0\) and \(t\)’ because it is weighted by the Sudakov no-branching probability \(\Delta_i.\) The second term includes the ratio of Sudakov factors which just like in Eq. 2.166 means no branching between \(t^{\prime}\) and t. Integrating this factor times the splitting probability over \(t^{\prime} \in [t_0,t]\) implies at least one branching between \(t_0\) and t.

The issue of this interpretation of the Sudakov form factor in conjunction with the parton densities is its numerical usability in a probabilistic approach: starting from a parton density somewhere in \((x - t)\) space we need to evolve it to a fixed point \((x_n, t_n)\) given by the hard subprocess, e.g. \(q \bar q \to Z\) with \(m_Z\) giving the scale and energy fraction of the two quarks. Numerically it would be much easier to simulate backwards evolution where we start from the known kinematics of the hard process and the corresponding point in the \((x - t)\) plane and evolve towards the partons in the proton, ideally to a point where the probabilistic picture of collinear, stable, non-radiating quarks and gluons in the proton holds. This means we need to define a probability that a parton evolved backwards from a space-like \(t_2\,<\,0\) to \(t_1\,<\,0\) with \(|t_2| > |t_1|\) does not radiate or split.

For this final step we define a probability measure for the backwards evolution of partons \(\Pi(t_1,t_2;x).\) Just like the two terms in Eq. 2.172 it links the splitting probability to a probability of an undisturbed evolution. For example, we can write the probability that a parton is generated by a splitting in the interval \([t, t + \delta t],\) evaluated at \((t_2,x)\), as \(dF(t;t_2)\). The measure corresponding to a Sudakov survival probability is then

$$ \Pi(t_1,t_2;x) = 1 - \int\nolimits_{t_1}^{t_2} dF(t;t_2). $$
(2.173)

Comparing the definition of dF to the relevant terms in Eq. 2.172 and replacing \(t \to t_2\) and \(t^{\prime} \to t\) we know what happens for the combination

$$ \begin{aligned} [b] f_i(x,t_2) dF(t;t_2) &= \frac{dt} {t} \; \frac{\Delta_i(t_2)} {\Delta_i(t)} \; \sum_j \int \nolimits_0^{1-\varepsilon} \frac{dz} {z} \; \frac{\alpha_s} {2\pi} \; \hat{P}_{i \leftarrow j}(z) \; f_j\left(\frac{x} {z},t \right) \\&= dt \; \Delta_i(t_2) \; \frac{1} {t} \sum_j \int \nolimits_0^{1-\varepsilon} \frac{dz} {z} \; \frac{\alpha_s} {2\pi} \; \hat{P}_{i \leftarrow j}(z) \; \frac{f_j\left(\frac{x} {z},t \right)} {\Delta_i(t)} \\&= dt \; \Delta_i(t_2) \; \frac{d} {d t} \frac{f_i(x,t)} {\Delta_i(t)} \quad\quad \hbox{using Eq. 2.171}. \end{aligned} $$
(2.174)

This means

$$ \Pi(t_1,t_2;x) = 1 -\frac{f_i(x,t) \Delta_i(t_2)} {f_i(x,t_2) \Delta_i(t)}\Bigg|_{t_1}^{t_2} = \frac{f_i(x,t_1) \Delta_i(t_2)} {f_i(x,t_2) \Delta_i(t_1)}, $$
(2.175)

and gives us a probability measure for backwards evolution: the probability of evolving back from \(t_2\) to \(t_1\) is described by a Markov process with a flat random number as

$$ \frac{f_i(x,t_1) \Delta_i(t_2)} {f_i(x,t_2) \Delta_i(t_1)} = r \in [0,1] \quad\quad \hbox{with} \; |t_2| > |t_1|. $$
(2.176)

While we cannot write down this procedure in a closed form, it shows how we can algorithmically generate initial state as well as final state parton radiation patterns based on the unregularized DGLAP equation and the Sudakov factors solving this equation. One remaining issue is that in our derivation of the collinear re-summation interpretation of the parton shower we assume a strong ordering of the radiated partons, which we will discuss in the next section.
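
A corresponding sketch of a single backwards-evolution step according to Eq. 2.176 only requires a toy Sudakov factor and a toy scale dependence of the parton density; both functions below are made-up assumptions, chosen so that the ratio in Eq. 2.176 decreases monotonically towards the cutoff:

```python
import numpy as np

# Sketch of a single backwards-evolution step, Eq. (2.176): find the scale t1
# at which a parton known at (x, t2) was generated by a branching. The toy
# Sudakov factor and the toy parton density below are made-up assumptions.
rng = np.random.default_rng(3)

def delta(t, c=0.3, t0=1.0):
    return (t0 / t)**c                      # fixed-coupling Sudakov, cf. Eq. (2.164)

def f_toy(x, t):
    return x**-1.2 * (1 + 0.1 * np.log(t))  # mildly scale dependent toy density

def backwards_step(x, t2, t_cut=1.0):
    r = rng.random()
    t1 = np.geomspace(t_cut, t2, 2000)      # scan candidate scales t_cut < t1 < t2
    ratio = f_toy(x, t1) * delta(t2) / (f_toy(x, t2) * delta(t1))
    below = ratio <= r                      # ratio falls monotonically towards t_cut
    return t1[below][-1] if below.any() else None   # None: no branching above t_cut

print(backwards_step(x=0.01, t2=8300.0))
```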

2.5.2 Soft Gluon Emission

To this point we have built our parton shower on collinear parton splitting or radiation and its universal properties indicated by Eq. 2.97. Deriving the diagonal splitting kernels in Eqs. 2.102 and 2.115 we encounter an additional source of infrared divergences, namely soft gluon emission corresponding to energy fractions \(z \to 0, 1.\) Its radiation pattern is also universal, just like the collinear case. One way to study this soft divergence without an overlapping collinear pole is gluon radiation off a massive quark with momentum \(q + k\) and mass m, which could be attached to some hard process as a splitting final state. The initial quark momentum \(q + k\) splits into a hard quark q and a soft gluon k with \(k^2 \ll q^2 = m^2\)

$$ \begin{aligned}[b] {\fancyscript{M}}_{n+1} &= g_s T^a \; \varepsilon_\mu^{\ast}(k) \; \bar{u}(q) \gamma^\mu \frac{{/\!\!\!q} + {/\!\!\!k} + m} {(q+k)^2 - m^2}\; {\fancyscript{M}}_n \\&= g_s T^a \; \varepsilon_\mu^{\ast}(k) \; \bar{u}(q) \left[ -{/\!\!\!q} \gamma^\mu +2 q^\mu +m \gamma^\mu - \gamma^\mu {/\!\!\!k} \right] \; \frac{1} {2(q k) + {\fancyscript{O}}(k^2)} \; {\fancyscript{M}}_n \\ &= g_s T^a \; \varepsilon_\mu^{\ast}(k) \; \bar{u}(q) \frac{q^\mu + {\fancyscript{O}}({/\!\!\!k})} {(q k) + {\fancyscript{O}}(k^2)} \; {\fancyscript{M}}_n \quad \hbox{Dirac equation} \quad \bar{u}(q) ({/\!\!\!q} -m ) = 0 \\ &\sim g_s T^a \; \varepsilon_\mu^{\ast}(k) \; \frac{q^\mu} {(q k)} \; \bar{u}(q) \; {\fancyscript{M}}_n \\&\to g_s \; \varepsilon_\mu^{\ast}(k) \; \left( \sum_j \hat{T}^a_j \frac{q_j^\mu} {(q_j k)} \right) \; \bar{u}(q) \; {\fancyscript{M}}_n \end{aligned} $$
(2.177)

The conventions are similar to Eq. 2.102, \({\fancyscript{M}}_n\) includes all additional terms except for the spinor of the outgoing quark with momentum \(q + k.\) Neglecting the gluon momentum altogether defines the leading term of the eikonal approximation .

In the last step we simply add all possible sources j of gluon radiation. This defines a color operator which we insert into the matrix element and which assumes values of \(+T_{ij}^{a}\) for radiation off a quark, \(-T_{ji}^{a}\) for radiation off an antiquark and \(-i f_{abc}\) for radiation off a gluon. For a color neutral process like our favorite Drell–Yan process adding an additional soft gluon \(q\bar{q} \to Zg\) it returns \(\sum_j \hat{T}_j = 0.\)

The matrix element in Eq. 2.177 we need to square. It includes a polarization sum and will therefore depend on the gauge. We choose the general axial gauge for massless gauge bosons

$$ \sum_{{\rm pol}} \varepsilon_\mu^{\ast}(k) \varepsilon_{\nu}(k) = - g_{\mu \nu} +\frac{k_\mu n_{\nu} + n_\mu k_{\nu}} {(n k)} -n^2 \frac{k_\mu k_{\nu}} {(n k)^2} = - g_{\mu \nu} +\frac{k_\mu n_{\nu} + n_\mu k_{\nu}} {(n k)}, $$
(2.178)

with a light-like reference vector n obeying \(n^2 = 0.\) The matrix element squared then reads

$$ \begin{aligned} \overline{|{\fancyscript{M}}_{n+1}|^{2}} &= g_{s}^{2}\; \left( - g_{\mu \nu} +\frac{k_{\mu} n_{\nu} +n_{\mu} k_{\nu}} {(n k)} \right)\;{\left( \sum_{j} \hat{T}_{j}^{a} \frac{q_{j}^{\mu}}{(q_{j} k)} \right)}^{\dag}\;\left( \sum_{j} \hat{T}_{j}^{a} \frac{q_{j}^{\nu}} {(q_{j} k)} \right) \; \overline{|{\fancyscript{M}}_{n}|^{2}} \\ &= g_s^2 \; \left[ -\left( \sum_{j} \hat{T}_{j}^{a} \frac{q_{j}^\mu} {(q_{j} k)} \right)^{\dag}\left( \sum_{j} \hat{T}_{j}^{a} \frac{q_{j \mu}} {(q_{j} k)} \right) + \frac{2} {(n k)} { \left( \sum_{j} \hat{T}_{j}^{a} \right)}^{\dag}\left( \sum_{j} \hat{T}_{j}^{a} \frac{(q_{j} n)}{(q_{j} k)} \right) \right] \overline{|{\fancyscript{M}}_n|^2}\\ &= - g_s^2 \;{\left( \sum_{j} \hat{T}_j^a \frac{q_j^\mu} {(q_j k)} \right)}^{\dag} \left( \sum_{j}\hat{T}_j^a\frac{q_{j \mu}} {(q_j k)} \right) \; \overline{|{\fancyscript{M}}_n|^2} \quad\quad \hbox{using} \; \sum_j \hat{T}^a_j = 0. \end{aligned}$$
(2.179)

The insertion operator in the matrix element has the form of an insertion current multiplied by its hermitian conjugate. This current describes the universal form of soft gluon radiation off an n-particle process

$$ \fbox{$\overline{|{\fancyscript{M}}_{n+1}|^2} \equiv - g_s^2 \; (J^{\dag} \cdot J) \; \overline{|{\fancyscript{M}}_n|^2}$} \quad \hbox{with} \quad J^{a \mu}(k,\{q_j\}) = \sum_j \hat{T}^a_j \; \frac{q_j} {(q_j k)}. $$
(2.180)

The squared current appearing in the matrix element squared, Eq. 2.179, we can further simplify to

$$ \begin{aligned} (J^{\dag} \cdot J) &= \sum_j \hat{T}^a_j \hat{T}^a_j \; \frac{q_j^2} {(q_j k)^2} + 2 \sum_{i < j} \hat{T}^a_i \hat{T}^a_j \; \frac{(q_i q_j)} {(q_i k)(q_j k)} \\&= \sum_j \hat{T}^a_j \left( - \sum_{i \ne j} \hat{T}^a_i \right) \frac{q_j^2} {(q_j k)^2} + 2 \sum_{i < j} \hat{T}^a_i \hat{T}^a_j \; \frac{(q_i q_j)} {(q_i k)(q_j k)} \\&= - \left( \sum_{i < j} + \sum_{i > j} \right) \hat{T}^a_i \hat{T}^a_j \frac{q_j^2} {(q_j k)^2} + 2 \sum_{i < j} \hat{T}^a_i \hat{T}^a_j \; \frac{(q_i q_j)} {(q_i k)(q_j k)} \\&= 2 \sum_{i < j} \hat{T}^a_i \hat{T}^a_j \; \left( \frac{(q_i q_j)} {(q_i k)(q_j k)} -\frac{q_i^2} {2(q_i k)^2} -\frac{q_j^2} {2(q_j k)^2} \right) \quad \hbox{massive case} \\&= 2 \sum_{i < j} \hat{T}^a_i \hat{T}^a_j \; \frac{(q_i q_j)} {(q_i k)(q_j k)} \hbox{massless partons} \\ &= 2 \sum_{i < j} \hat{T}^a_i \hat{T}^a_j \; \frac{(q_i q_j)} {(q_i k) + (q_j k)} \; \left(\frac{1} {(q_i k)} +\frac{1} {(q_j k)} \right)\!. \end{aligned} $$
(2.181)

In the last step we only bring the eikonal factor into a different form which sometimes comes in handy because it separates the two divergences associated with \(q_i\) and with \(q_j\).

At this point we return to massless QCD partons, keeping in mind that the ansatz Eq. 2.177 ensures that the insertion currents only model soft, not collinear radiation. Just as a side remark at this stage—our definition of the insertion current \(J^{a \mu}\) in Eq. 2.180 we can generalize to colored processes, where the current becomes dependent on the gauge vector n to cancel the n dependence of the polarization sum

$$ J^{a \mu}(k,\{p_j\}) = \sum_j \hat{T}^a_j \; \left( \frac{p_j} {(p_j k)} - \frac{n} {(n k)} \right) $$
(2.182)

This dipole radiation term in Eqs. 2.180 and 2.182 we can study to see how successive soft gluon radiation will be organized for example in terms of emission angles. Given that in the interpretation of the DGLAP equation and its solutions the ordering of the collinear emissions plays a crucial role this is an important question. As it will turn out, the soft emission case will help us understand this feature.

At this stage, calling the terms in Eq. 2.182 a dipole is a little bit of a stretch if we compare it to a multi-pole series. To see the actual dipole structure we would need to look at the color structure. We start by symmetrizing the soft radiation dipole with respect to the two hard momenta in a particular way

$$ \begin{aligned} (J^{\dag} \cdot J)_{ij} &\sim W_{ij} = \frac{(q_i q_j)} {(q_i k)(q_j k)} \\&= \frac{1- \cos \theta_{ij}} {(1-\cos \theta_{ig})(1-\cos \theta_{jg})}\,\, \hbox{in terms of opening angles} \; \theta \\ &= \frac{1} {2} \left( \frac{1- \cos \theta_{ij}} {(1-\cos \theta_{ig})(1-\cos \theta_{jg})} + \frac{1} {1-\cos \theta_{ig}} - \frac{1} {1-\cos \theta_{jg}} \right) + (i \leftrightarrow j) \\&\equiv W^{[i]}_{ij} + W^{[j]}_{ij}. \end{aligned} $$
(2.183)

Each of the two terms we need to integrate over the gluon’s phase space, including the azimuthal angle integration

$$ \begin{aligned} \int \nolimits_0^{2 \pi} d \phi_{ig} W^{[i]}_{ij} &= \frac{1} {2} \int \nolimits_0^{2 \pi} d \phi_{ig} \left(\frac{1- \cos \theta_{ij}} {(1-\cos \theta_{ig})(1-\cos \theta_{jg})} + \frac{1} {1-\cos \theta_{ig}} - \frac{1} {1-\cos \theta_{jg}} \right) \\&= \frac{1} {2} \int \nolimits_0^{2 \pi} d \phi_{ig} \; \left[ \frac{1} {1-\cos \theta_{ig}} + \frac{1} {1-\cos \theta_{jg}} \left(\frac{1- \cos \theta_{ij}} {1-\cos \theta_{ig}} - 1 \right) \right]\!. \end{aligned} $$
(2.184)

To disentangle the different angular integrations we express the three parton vectors in polar coordinates where the initial parton i propagates into the x direction, the interference partner j in the \((x - y)\) plane, and the soft gluon in the full three-dimensional space

$$\begin{aligned}[b] \vec{n}_i &= (1,0,0) \quad \hbox{hard parton} \\ \vec{n}_j &= (\cos \theta_{ij}, \sin \theta_{ij}, 0) \quad \hbox{interference partner} \\ \vec{n}_g &= (\cos \theta_{ig}, \sin \theta_{ig} \cos\phi_{ig}, \sin \theta_{ig} \sin \phi_{ig})\quad\hbox{soft gluon} \\ 1 - \cos \theta_{jg} \equiv 1 - (\vec{n}_j \cdot \vec{n}_g) &= 1- \cos \theta_{ij} \cos\theta_{ig} -\sin \theta_{ij} \sin \theta_{ig} \cos\phi_{ig}. \end{aligned} $$
(2.185)

From the scalar products between these unit vectors we see that of the terms appearing in Eq. 2.184 only the opening angle \(\theta_{jg}\) is related to \(\phi_{ig},\) which for the azimuthal angle integration means

$$ \begin{aligned} \int \nolimits_0^{2 \pi} d \phi_{ig} \; W^{[i]}_{ij} &= \frac{\pi} {1-\cos \theta_{ig}} + \frac{1} {2} \; \left(\frac{1- \cos \theta_{ij}} {1-\cos \theta_{ig}} - 1 \right) \; \int \nolimits_0^{2 \pi} d \phi_{ig} \; \frac{1} {1-\cos \theta_{jg}} \\[3pt] &= \frac{1} {1-\cos \theta_{ig}} \left[ \pi + \frac{\cos \theta_{ig} - \cos \theta_{ij}} {2} \; \int \nolimits_0^{2 \pi} d \phi_{ig} \; \frac{1} {1-\cos \theta_{jg}} \right]\!.\\[5pt] \end{aligned} $$
(2.186)

The azimuthal angle integral in this expression for \(W^{[i]}_{ij}\) we can solve

$$ \begin{aligned}[b] \int\nolimits_0^{2 \pi} d \phi_{ig} \frac{1} {1-\cos \theta_{jg}} &= \int\nolimits_0^{2 \pi} d \phi_{ig} \frac{1} {1- \cos \theta_{ij} \cos \theta_{ig} -\sin \theta_{ij} \sin \theta_{ig} \cos\phi_{ig}} \\[3pt] &= \int\nolimits_0^{2 \pi} d \phi_{ig} \frac{1} {a - b \cos\phi_{ig}} \quad \hbox{with} \; a = 1- \cos \theta_{ij} \cos \theta_{ig}, \quad b = \sin \theta_{ij} \sin \theta_{ig} \\[3pt] &= \oint_{{\rm unit}\;{\rm circle}} d z \; \frac{1} {iz} \frac{1} {a - b \frac{z+1/z} {2}} \\[3pt] &= \frac{2} {i} \oint d z \; \frac{1} {2az - b - b z^2} \\[3pt] &= \frac{2 i} {b} \oint \; \frac{dz} {(z-z_-)(z-z_+)} \quad\quad \hbox{with} \; z_\pm = \frac{a} {b} \pm \sqrt{\frac{a^2}{b^2} - 1} \\[3pt] &= \frac{2 i} {b} \; 2 \pi i \frac{1} {z_--z_+} \quad\quad\quad z_- \; \hbox{inside contour} \\[3pt] &= \frac{2 \pi} {\sqrt{a^2-b^2}} \\ &= \frac{2 \pi} {\sqrt{ (\cos \theta_{ig} - \cos \theta_{ij} )^2 }} = \frac{2 \pi} {|\cos \theta_{ig} - \cos \theta_{ij}|}. \end{aligned} $$
(2.187)

The crucial coordinate transformation is \(z = \exp (i \phi_{ig})\) and \(\cos \phi_{ig} = (z + 1/z)/2\) in the third line. For the entire integral in Eq. 2.184 this gives us

$$ \begin{aligned} \int \nolimits_0^{2 \pi} d \phi_{ig} \; W^{[i]}_{ij} &= \frac{1} {1-\cos \theta_{ig}} \left[ \pi + \frac{\cos \theta_{ig} - \cos \theta_{ij}} {2} \; \frac{2 \pi} {|\cos \theta_{ig} - \cos \theta_{ij}|} \right] \\ &= \frac{\pi} {1-\cos \theta_{ig}} \left[ 1 + \hbox{sign}(\cos \theta_{ig} - \cos \theta_{ij}) \right] \\ &= \left\{\begin{array}{ll}\dfrac{2\pi} {1-\cos \theta_{ig}} &\quad \hbox{if} \quad \theta_{ig} < \theta_{ij} \\ 0 &\quad \hbox{else.}\end{array}\right. \end{aligned} $$
(2.188)

The soft gluon is only radiated at angles between zero and the opening angle of the initial parton i and its hard interference partner or spectator j. The same integral over \(W_{ij}^{[j]}\) gives the same result, with switched roles of i and j. After combining the two permutations the probability of radiating a soft gluon at an angle larger than the earlier branching is zero; successive emission of soft gluons obeys an angular ordering.
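
We can check this angular ordering pattern by direct numerical integration. The sketch below integrates \(W^{[i]}_{ij}\) from Eq. 2.183 over the gluon azimuth for two illustrative emission angles and compares with Eq. 2.188; the angle values are arbitrary choices for the example:

```python
import numpy as np
from scipy.integrate import quad

# Check of Eqs. (2.184)-(2.188): integrate W_ij^[i] over the gluon azimuth and
# compare with 2*pi/(1 - cos theta_ig) inside the cone and zero outside.
def W_i(theta_ij, theta_ig, phi_ig):
    cos_jg = (np.cos(theta_ij) * np.cos(theta_ig)
              + np.sin(theta_ij) * np.sin(theta_ig) * np.cos(phi_ig))
    w = (1 - np.cos(theta_ij)) / ((1 - np.cos(theta_ig)) * (1 - cos_jg))
    return 0.5 * (w + 1 / (1 - np.cos(theta_ig)) - 1 / (1 - cos_jg))

theta_ij = 0.8                                     # opening angle of the i-j pair
for theta_ig in (0.5, 1.1):                        # gluon inside / outside the cone
    numeric, _ = quad(lambda phi: W_i(theta_ij, theta_ig, phi), 0.0, 2 * np.pi)
    analytic = 2 * np.pi / (1 - np.cos(theta_ig)) if theta_ig < theta_ij else 0.0
    print(theta_ig, numeric, analytic)
```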

Going back to collinear radiation, we know from Eqs. 2.83 and 2.152 that the different variables describing parton splitting, namely the angle \(\theta,\) the transverse momentum \(p_T,\) or the virtuality \(\sqrt{t},\) are equivalent as long as we are only interested in the leading logarithm arising from the collinear phase space integration. They can be transformed into each other by a simple linear transformation.

Different implementations of the parton shower order their jet radiation by different variables: historically, of the two most widely used event generators PYTHIA uses the virtuality or the transverse momentum while HERWIG uses the angle. Once we re-sum the collinear logarithms the finite shifts which appear in the transformation between the ordering parameters are also re-summed, so the underlying variable in the parton shower does matter in the comparison to data. This is even more true for the matching of the parton shower and the matrix element which we will discuss in the next section.

In this section we have learned an argument for using the angle to order parton radiation: soft radiation is ordered in the angle \(\theta\), and a parton shower would be well advised to match this behavior. On the other hand, in Sect. 2.3.5 we have learned that the physical interpretation of the collinear logarithms re-summed by the parton shower is based on a transverse momentum ordering. If theory does not define the golden way to order parton radiation we will have to rely on Tevatron or LHC measurements to tell us which implementation works best. This includes the new SHERPA parton shower which is not based on the collinear splitting kernels but on the soft-collinear QCD dipoles introduced in this section.

2.5.3 CKKW and MLM Schemes

The main problem with QCD at the LHC is the range of energy scales of the jets we encounter. Collinear jets with their small transverse momenta are well described by a parton shower. From Sect. 2.3.5 we know that strictly speaking the parton shower only fills the phase space region up to a maximum transverse momentum \(p_T < \mu_F.\) In contrast, hard jets with large transverse momentum are described by matrix elements which we compute using the QCD Feynman rules. They fill the non-collinear part of phase space which is not covered by the parton shower. Because of the collinear logarithmic enhancement we discussed in Sect. 2.3.5 we expect many more collinear and soft jets than hard jets at the LHC.

The natural question then becomes: what is the range of ‘soft’ or ‘collinear’ and what is ‘hard’? Applying a consistency condition we can define collinear jet radiation by the validity of the collinear approximation in Eq. 2.84. The maximum \(p_T\) of a collinear jet is the upper end of the region for which the jet radiation cross section behaves like \(1/p_T\) or the point where the distribution \(p_T d\sigma/dp_T\) leaves its plateau. For harder and harder jets we will at some point become limited by the partonic energy available at the LHC, which means the \(p_T\) distribution of additional jets will start dropping faster than \(1/p_T\). Collinear logarithms will become numerically irrelevant and jets will be described by the regular matrix element squared without any re-summation.

Quarks and gluons produced in association with gauge bosons at the Tevatron behave like collinear jets for \(p_T \lesssim 20\,\hbox{GeV},\) because quarks at the Tevatron are limited in energy. At the LHC, jets produced in association with tops behave like collinear jets to \(p_T \sim 150\,\hbox{GeV},\) jets produced with 500 GeV gluinos behave like collinear jets to \(p_T\) scales larger than 300 GeV. This is not good news, because collinear jets mean many jets, and many jets produce combinatorial backgrounds and ruin the missing momentum resolution of the detector: if we are looking for example for two jets to reconstruct an invariant mass we can simply plot all events as a function of this invariant mass and remove the backgrounds by requiring all events to sit around a peak in \(m_{jj}.\) If we have for example three jets in the event we have to decide which of the three jet-jet combinations should go into this distribution. If this is not possible we have to consider two of the three combinations as uncorrelated ‘background’ events. In other words, we make three histogram entries out of each signal or background event and consider all three background events plus two of the three signal combinations as background. This way the signal-to-background ratio decreases from \(N_S / N_B\) to \(N_S/(3N_B + 2N_S).\) A famous victim of such combinatorics was for a long time the Higgs discovery channel \(pp \to t\bar{t}H\) with \(H \to b\bar{b}.\)
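
For illustration, with \(N_S = 100\) signal and \(N_B = 1,000\) background events this combinatorial dilution degrades the naive \(N_S/N_B = 1/10\) to \(100/(3 \times 1,000 + 2 \times 100) = 1/32.\)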

For theorists this means that at the LHC we have to reliably model collinear and hard jets. For simplicity, in this section we will first limit our discussion to final state radiation, for example off the R-ratio process \(e^+ e^- \to q \bar{q}\) from Sect. 2.1.1. Combining collinear and hard jets in the final state has to proceed in two steps. The first of them has nothing to do with the parton shower: the problem we need to solve is that the parton shower by construction generates a definite number of jets. We can categorize the generated events by counting the number of jets in the final state, i.e. the parton shower events are jet-exclusive. On the other hand, the total rate for the hard process we compute as \(e^+ e^- \to q \bar{q} +X\), with any number of collinear jets in the final state denoted by X. Predictions involving parton densities and the DGLAP equation are jet-inclusive. Any scheme combining the parton shower and hard matrix elements has to follow the path

1. Define jet-exclusive events from the hard matrix elements and the parton shower
2. Combine final states with different numbers of final state particles
3. Reproduce matrix element results in high-\(p_T\) and well separated phase space region
4. Reproduce parton shower results for collinear and soft radiation
5. Interpolate smoothly and avoid double counting of events.

For specific processes at the Tevatron the third and fourth points on this list have actually been tackled by so-called matrix element corrections in the parton shower Monte Carlos PYTHIA and HERWIG.

For example the final state of the process \(e^+ e^- \to q\bar{q} +X\) often involves more than two jets due to final state splitting. Even for the first step of defining jet-exclusive predictions from the matrix element we have to briefly consider the geometry of different jets. To separate jet-inclusive event samples into jet-exclusive event samples we have to define some kind of jet separation parameter. As a start, we radiate a gluon off one of the quark legs, which gives us a \(q \bar{q} g\) final state. This additional gluon can either be collinear with and hence geometrically close to one of the quarks or not. Jet algorithms which decide if we count such a splitting as one or two jets we describe in detail in Sect. 3.1.1. They are based on a choice of collinearity measure \(y_{ij}\) which we can for example construct as a function of the distance in \(R\) space, introduced in Eq. 2.35, and the transverse momenta. We define two jets as collinear and hence as one jet if \(y_{ij} < y_{{\rm resol}}\) where \(y_{{\rm resol}}\) we give to the algorithm. As a result, the number of jets in an event will of course depend on this resolution parameter \(y_{{\rm resol}}.\)

For the second step of combining hard and collinear jet simulation the same resolution parameter appears in a form where it becomes a collinear vs hard matching parameter \(y_{{\rm match}},\) i.e. it allows us to clearly assign each hadron collider event a number of collinear jets and a number of hard jets. Such an event with its given number of more or less hard jets we can then describe either using matrix elements or using a parton shower, where ‘describe’ means computing the relative probability of different phase space configurations. The parton shower will do well for jets with \(y_{ij} < y_{{\rm match}}.\) In contrast, if for our closest jets we find \(y_{ij} > y_{{\rm match}},\) we know that collinear logarithms did not play a major role, so we should use the hard matrix element. If we assign the hard process a typical energy or virtuality scale \(t_{{\rm hard}}\) we can translate the matching parameter \(y_ {{\rm match}}\) into a virtuality scale \(t_{{\rm match}} = y_ {{\rm match}}^2 t_{{\rm hard}}\), below which we do not trust the hard matrix element. For example for the Drell–Yan process the hard scale would be something like the Z mass.

The CKKW jet combination scheme first tackles the problem of defining jet-exclusive final states. While an exclusive rate requires a process to have exactly a given number of jets, an inclusive rate is defined as the number of events in which we for example identify n jets and ignore everything else appearing in the event. For example, additional collinear jets which we usually denote as ‘ + X’ will be included.

The main ingredients for translating one into the other are non-splitting probabilities, i.e. Sudakov factors. They can transform inclusive n-particle rates into exact n-particle rates, with no additional final state jet outside a given resolution scale. Analytically we can compute integrated splitting probabilities \(\Gamma_j(t_{{\rm hard}},t)\) which for quarks and gluons are implicitly defined through the Sudakov factors which we introduce in Eq. 2.164

$$ \begin{aligned} [b] \Delta_q(t_{{\rm hard}},t_{{\rm match}}) =& \exp \left( - \int \nolimits_{t_{{\rm match}}}^{t_{{\rm hard}}} \frac{dt} {t} \int \nolimits_0^1 dy \frac{\alpha_s} {2 \pi} \hat{P}_{q \leftarrow q}(y) \right) \\ \equiv& \exp \left( - \int \nolimits_{t_{{\rm match}}}^{t_{{\rm hard}}} d t \; \Gamma_q(t_{{\rm hard}},t) \right) \\ \Delta_g(t_{{\rm hard}},t_{{\rm match}}) \equiv& \exp \left( - \int \nolimits_{t_{{\rm match}}}^{t_{{\rm hard}}} d t \; \left[ \Gamma_g(t_{{\rm hard}},t) +\Gamma_f(t) \right] \right). \end{aligned} $$
(2.189)

For final state radiation t corresponds to the original \(\sqrt{p_a^2}\) and, moving forward in time, is ordered according to \(t_{{\rm hard}} > t > t_{{\rm match}}.\) The resolution of individual jets we identify with the matrix element-shower matching scale \(t_{{\rm match}}.\) The y integration in the original definition we can carry out in the leading logarithm approximation, giving us

$$ \begin{aligned} [b] \Gamma_q(t_{{\rm hard}},t) &\equiv \Gamma_{q \leftarrow q}(t_{{\rm hard}},t) = \frac{C_F} {\pi} \; \frac{\alpha_s(t)} {t} \left(\frac{1} {2} \log \frac{t_{{\rm hard}}} {t} -\frac{3}{4} \right) \\ \Gamma_g(t_{{\rm hard}},t) &\equiv \Gamma_{g \leftarrow g}(t_{{\rm hard}},t) = \frac{C_A} {\pi} \; \frac{\alpha_s(t)} {t} \left(\frac{1}{2} \log \frac{t_{{\rm hard}}}{t} -\frac{11}{12} \right) \\ \Gamma_f(t) &\equiv \Gamma_{q \leftarrow g}(t) = \frac{n_f} {6\pi} \; \frac{\alpha_s(t)} {t}. \end{aligned} $$
(2.190)

The virtualities \(t_{{\rm hard}} > t\) correspond to the incoming (mother) and outgoing (daughter) parton. Unfortunately, these expressions are only approximately consistent with a probabilistic picture of parton splitting. Terms arising from next-to-leading logarithms spoil the limit \(t_{{\rm match}} \to t_{{\rm hard}},\) in which the probability of no splitting should approach unity. Technically, we can deal with the finite terms in the Sudakov factors by requiring them to be positive semi-definite, i.e. by replacing \(\Gamma(t_{{\rm hard}},t) < 0\) by zero. For the general argument this problem with the analytic expressions for the splitting functions is irrelevant; to avoid unnecessary approximations in the y integration more recent CKKW implementations integrate the splitting kernels numerically.
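To get a feeling for the size of these objects we can evaluate Eqs. 2.189 and 2.190 numerically. The following sketch uses a one-loop running coupling, purely illustrative scale choices, and the clipping of negative \(\Gamma\) values mentioned above:

```python
import math

def alpha_s(t, lam=0.2, nf=5):
    """One-loop running coupling evaluated at the virtuality t (in GeV^2)."""
    b0 = (33.0 - 2.0 * nf) / (12.0 * math.pi)
    return 1.0 / (b0 * math.log(t / lam**2))

def gamma_q(t_hard, t, CF=4.0/3.0):
    """Integrated quark splitting probability of Eq. 2.190, clipped at zero."""
    return max(CF / math.pi * alpha_s(t) / t * (0.5 * math.log(t_hard / t) - 0.75), 0.0)

def sudakov_q(t_hard, t_match, steps=2000):
    """Delta_q(t_hard, t_match) of Eq. 2.189, integrated numerically on a log grid."""
    lo, hi = math.log(t_match), math.log(t_hard)
    dlog = (hi - lo) / steps
    integral = 0.0
    for i in range(steps):
        t = math.exp(lo + (i + 0.5) * dlog)
        integral += gamma_q(t_hard, t) * t * dlog     # dt = t dlog t
    return math.exp(-integral)

t_hard = 91.2**2                                      # hard scale ~ m_Z^2
for t_match in (1.0, 10.0**2, 30.0**2):
    print(t_match, sudakov_q(t_hard, t_match))
```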

To get a first idea how to transform inclusive into exact n-jet rates we compute the probability to see exactly two jets in the process \(e^{+} e^{-} \to q\bar{q}.\) Looking at Fig. 2.3 this means that neither of the two quarks in the final state radiates a resolved gluon between the virtualities \(t_{{\rm hard}}\) (given by the qqZ vertex) and \(t_{{\rm match}} < t_{{\rm hard}}.\) As will become important later we specify that this no-radiation statement assumes a jet resolution given by the end points of the external quark and gluon legs. The probability we have to multiply the inclusive two-jet rate with is then \(\left[ \Delta_q(t_{{\rm hard}},t_{{\rm match}}) \right]^{2}\!,\) once for each quark. Whatever happens at virtualities below \(t_{{\rm match}}\) will be governed by the parton shower and does not matter anymore. Technically, this requires us to define a vetoed parton shower which we will describe in Sect. 2.6.3.

Fig. 2.3

Vetoed showers on two-jet and three-jet contributions. The scale at the gauge boson vertex is \(t_{{\rm hard}}\). The two-jet (three-jet) diagram implies exactly two (three) jets at the resolution scale \(t_{{\rm match}}\), below which we rely on the parton shower. Figure from Ref. [2]

Next, what is the probability that the initial two-jet final state evolves into exactly three jets, again following Fig. 2.3? We know that it contains a factor \(\Delta_q(t_ {{\rm hard}},t_{{\rm match}})\) for one untouched quark.

After splitting at \(t_q\) with the probability \(\Gamma_q(t_{{\rm hard}},t_q)\) the second quark survives to \(t_{{\rm match}},\) giving us a factor \(\Delta_q(t_q,t_{{\rm match}}).\) If we assign the virtuality \(t_g\) to the radiated gluon at the splitting point we find the gluon’s survival probability \(\Delta_g(t_g,t_{{\rm match}}).\) So together we find

$$ \Delta_q(t_{{\rm hard}},t_{{\rm match}}) \; \Gamma_q(t_ {{\rm hard}},t_q) \; \Delta_q(t_q,t_{{\rm match}}) \; \Delta_g(t_g,t_{{\rm match}}) \cdots $$
(2.191)

That is all there is, with the exception of the intermediate quark. If we label the quark virtuality at which the second quark radiates a gluon by \(t_q\) there has to appear another factor describing that the quark, starting from \(t_{{\rm hard}},\) gets to \(t_q\) untouched. Naively we would guess that this probability is given by \(\Delta_q(t_{{\rm hard}},t_q).\) However, this implies no splittings resolved at the fixed lower scale \(t_q,\) but what we really mean is no splitting between \(t_{{\rm hard}}\) and \(t_q\) resolved at a third scale \(t_{{\rm match}} < t_q\) given by the quark leg hitting the parton shower regime. We should therefore compute the probability of no splitting between \(t_{{\rm hard}}\) and \(t_q,\) namely \(\Delta_q(t_{{\rm hard}},t_{{\rm match}}),\) but under the condition that splittings from \(t_q\) down to \(t_{{\rm match}}\) are explicitly allowed.

If zero splittings gives us a probability factor \(\Delta_q(t_{{\rm hard}},t_{{\rm match}}),\) to describe exactly one splitting from \(t_q\) on we add a factor \(\Gamma(t_q,t)\) with an unknown splitting point t. This point t we integrate over between the resolution point \(t_{{\rm match}}\) and the original splitting point \(t_q.\) This is the same argument as in our physical interpretation of the Sudakov factors solving the DGLAP equation Eq. 2.172. For an arbitrary number of possible splittings between \(t_q\) and \(t_{{\rm match}}\) we find the sum

$$ \begin{aligned} \Delta_q(t_{{\rm hard}},t_{{\rm match}}) &\left[ 1 + \int \nolimits_{t_{{\rm match}}}^{t_q} d t \; \Gamma_q(t_q,t) + \hbox{more splittings} \right] \\ \quad &= \Delta_q(t_{{\rm hard}},t_{{\rm match}}) \; \exp \left[\,\, \int \nolimits_{t_{{\rm match}}}^{t_q} d t \; \Gamma_q(t_q,t) \right] = \frac{\Delta_q(t_{{\rm hard}},t_{{\rm match}})} {\Delta_q(t_q,t_{{\rm match}})}.\\[5pt] \end{aligned} $$
(2.192)

The factors \(1/n!\) in the Taylor series appear because for example radiating two jets in the same t interval can proceed ordered in two ways, both of which lead to the same final state. Once again: the probability of nothing happening between \(t_{{\rm hard}}\) and \(t_q\) we compute from the probability of nothing happening between \(t_{{\rm hard}}\) and \(t_{{\rm match}}\) times any number of possible splittings between \(t_q\) and \(t_{{\rm match}}.\)

Collecting all factors from Eqs. 2.191 and 2.192 gives us the probability to find exactly three partons resolved at \(t_{{\rm match}}\) as part of the inclusive sample

$$ \begin{aligned} \Delta_q(t_{{\rm hard}},t_{{\rm match}}) \; \Gamma_q(t_ {{\rm hard}},t_q) \; &\Delta_q(t_q,t_{{\rm match}}) \; \Delta_g(t_g,t_{{\rm match}}) \; \frac{\Delta_q(t_{{\rm hard}},t_ {{\rm match}})} {\Delta_q(t_q,t_{{\rm match}})} \\ &= \Gamma_q(t_{{\rm hard}},t_q) \; [\Delta_q(t_{{\rm hard}},t_ {{\rm match}})]^2 \; \Delta_g(t_g,t_{{\rm match}}).\\[5pt] \end{aligned} $$
(2.193)

This result is what we expect: both quarks go through untouched, just like in the two-parton case. In addition, we need exactly one splitting producing a gluon, and this gluon cannot split further. This example illustrates how we can compute these probabilities using Sudakov factors: adding a gluon corresponds to adding a splitting probability times the survival probability for this gluon, everything else magically drops out. At the end, we only integrate over the splitting point \(t_q.\)
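The same bookkeeping we can translate into a short numerical sketch which assembles the exclusive two-jet probability and the three-jet weight of Eq. 2.193 from Sudakov factors. It repeats the toy helpers of the previous sketch so that it runs on its own; all scale choices are again purely illustrative:

```python
import math

def alpha_s(t, lam=0.2, nf=5):
    b0 = (33.0 - 2.0 * nf) / (12.0 * math.pi)
    return 1.0 / (b0 * math.log(t / lam**2))

def gamma_q(t_hard, t):
    return max(4.0/3.0 / math.pi * alpha_s(t) / t * (0.5 * math.log(t_hard / t) - 0.75), 0.0)

def gamma_g(t_hard, t):
    return max(3.0 / math.pi * alpha_s(t) / t * (0.5 * math.log(t_hard / t) - 11.0/12.0), 0.0)

def gamma_f(t, nf=5):
    return nf / (6.0 * math.pi) * alpha_s(t) / t

def sudakov(gammas, t_hi, t_lo, steps=2000):
    """No-splitting probability exp(-int dt sum_i Gamma_i(t)) between t_lo and t_hi."""
    lo, hi = math.log(t_lo), math.log(t_hi)
    dlog = (hi - lo) / steps
    integral = 0.0
    for i in range(steps):
        t = math.exp(lo + (i + 0.5) * dlog)
        integral += sum(g(t) for g in gammas) * t * dlog
    return math.exp(-integral)

t_hard, t_match = 91.2**2, 10.0**2

delta_q = sudakov([lambda t: gamma_q(t_hard, t)], t_hard, t_match)
P_2jet = delta_q**2                        # both quarks evolve from t_hard to t_match untouched

def w_3jet(t_q, t_g):
    """Integrand of the exclusive three-jet weight of Eq. 2.193, before the t_q integration."""
    delta_g = sudakov([lambda t: gamma_g(t_g, t), gamma_f], t_g, t_match)
    return gamma_q(t_hard, t_q) * delta_q**2 * delta_g

print(P_2jet, w_3jet(40.0**2, 30.0**2))
```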

This discussion allows us to write down the first step of the CKKW algorithm, combining different hard n-jet channels into one consistent set of events. One by one we turn inclusive n-jet events into exact n-jet events. We can write down the slightly simplified algorithm for final state radiation. As a starting point, we compute all leading-order cross sections for n-jet production with a lower jet radiation cutoff at \(t_{{\rm match}}.\) This cutoff ensures that all jets are hard and that all corresponding cross sections \(\sigma_{n,i}\) are finite. The second index \(i\) describes different non-interfering parton configurations for a given number of final state jets, like \(q\bar{q} gg\) and \(q\bar{q} q\bar{q}\) for \(n\,=\,4.\) The purpose of the algorithm is to assign a weight (probability, matrix element squared,...) to a given phase space point, statistically picking the correct process and combining them properly. It proceeds event by event:

  1. For each jet final state \((n,i)\) compute the relative probability \(P_{n,i} = \sigma_{n,i}/ \sum_{k,j} \sigma_{k,j}\)

  2. Select a final state \((n,i)\) with its probability \(P_{n,i}\)

  3. Assign the momenta from the phase space generator to the, assumed hard, external particles

  4. Compute the transition matrix element \({\left|{\fancyscript{M}} \right|^{2}}\) including parton shower below \(t_{{\rm match}}\)

  5. Use a jet algorithm to compute the shower history, i.e. all splitting virtualities \(t_j\) in each event

  6. Check that this history corresponds to possible Feynman diagrams and does not violate any symmetries

  7. For each internal and external line compute the Sudakov non-splitting probability down to \(t_{{\rm match}}\)

  8. Re-weight the \(\alpha_s\) values of each splitting using the \(k_T\) scale from the shower history

  9. Combine matrix element, Sudakovs, and \(\alpha_s\) into a final weight

This final event weight we can use to compute distributions from weighted events or to decide whether to keep or discard an event when producing unweighted events. The construction ensures that the relative weight of the different n-jet rates is identical to the probabilities we initially computed. In step 4 the CKKW event generation first chooses the appropriate hard scale in the event; in step 5 we compute the individual starting scale for the parton shower applied to each of the legs. Following our example, this might be \(t_{{\rm hard}}\) for partons leaving the hard process itself or \(t_g\) for a parton appearing via later splitting.
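Schematically, the per-event weighting looks as follows. This is only a structural sketch: the callbacks matrix_element, cluster_history, sudakov, and alpha_s are hypothetical placeholders, not the interface of any actual generator, and the vetoed parton shower attached afterwards is not shown.

```python
import random

def pick_channel(cross_sections, rng=random.random):
    """Steps 1-2: pick a jet multiplicity and parton configuration according to its rate."""
    total = sum(cross_sections.values())
    r, acc = rng() * total, 0.0
    for channel, sigma in cross_sections.items():
        acc += sigma
        if r <= acc:
            return channel
    return channel

def ckkw_weight(event, matrix_element, cluster_history, sudakov, alpha_s, t_match, t_hard):
    """Steps 4-9 for one event; every physics ingredient is a user-supplied callback:
    matrix_element(event)  -> |M|^2 evaluated with alpha_s(t_hard) at every vertex
    cluster_history(event) -> (splitting_scales, lines), with lines = [(t_start, t_end), ...]
                              for each internal and external leg
    sudakov(t_hi, t_lo)    -> non-splitting probability between two virtualities
    alpha_s(t)             -> running coupling"""
    splitting_scales, lines = cluster_history(event)
    weight = matrix_element(event)
    for t_i in splitting_scales:                       # step 8: alpha_s re-weighting
        weight *= alpha_s(t_i) / alpha_s(t_hard)
    for t_start, t_end in lines:                       # step 7: Sudakov per line, down to t_match
        weight *= sudakov(t_start, max(t_end, t_match))
    return weight                                      # step 9: final event weight
```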

In a second step of the CKKW scheme we match this combined hard matrix element with the parton shower, given the matching point \(t_{{\rm match}}\). From the final experimental resolution scale \(t_{{\rm resol}}\) up to a matching scale \(t_{{\rm match}}\) we rely on the parton shower to describe jet radiation while above the matching scale jet radiation is explicitly forbidden by the Sudakov non-splitting probabilities. Individually, both regimes consistently combine different n-jet processes. All we need to make sure is that there is no double counting.

From the discussion of Eq. 2.192 we know that Sudakovs describing the evolution between two scales and using a third scale as the resolution are going to be the problem. Carefully distinguishing the scale of the actual splitting from the scale of jet resolution is the key. The CKKW scheme starts each parton shower at the point where the parton first appears, and it turns out that we can use this argument to keep the resolution regimes \(y > y_{{\rm match}}\) and \(y < y_{{\rm match}}\) separate. There is a simple way to check this, namely whether the \(y_{{\rm match}}\) dependence drops out of the final combined probabilities. The answer for final state radiation is yes, as proven in the original paper, including a hypothetical next-to-leading logarithm parton shower. The CKKW scheme is implemented in the publicly available SHERPA and MadEvent event generators.

An alternative to the CKKW scheme which has been developed independently but incorporates essentially the same physics is the MLM scheme, for example implemented in ALPGEN or MadEvent. Its main difference to the CKKW scheme is that it avoids computing the survival probabilities using Sudakov form factors. Instead, it vetoes those events which CKKW removes by applying the Sudakov non-splitting probabilities. This way MLM avoids problems with splitting probabilities beyond the leading logarithms, for example the finite terms appearing in Eq. 2.190, which can otherwise lead to a mismatch between the actual shower evolution and the analytic expressions of the Sudakov factors. In addition, the veto approach allows the MLM scheme to combine a set of independently generated n-parton events, which can be convenient.

In the MLM scheme we veto events which are simulated the wrong way after the hard matrix element and the parton shower have defined a set of complete events. This avoids double counting of events which on the one hand are generated with n hard jets from the matrix element and on the other hand appear for example as \((n-1)\) hard jets with an additional jet from the parton shower. After applying a jet algorithm (which in the case of ALPGEN is a cone algorithm and in the case of MadEvent is a \(k_T\) algorithm) we compare the showered event with the un-showered hard event by identifying each reconstructed showered jet with one of the partons we started from. If all jet–parton combinations match and there exist no additional resolved jets we know that the showering has not altered the hard structure of the event. This corresponds to attaching the Sudakov non-splitting probabilities in the CKKW scheme. If there is a significant change between the original hard parton event and the showered event this event has to go. The only exception to this rule is the set of events with the highest jet multiplicity for which additional jets can only come from the parton shower. After defining the proper exclusive n-jet event sets we can again use the parton shower to describe more collinear jet radiation between \(t_{{\rm match}}\) and \(t_{{\rm resol}}.\)
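The veto logic can be sketched in a few lines; the \(\Delta R\)-based matching below is a simplified stand-in, not the actual ALPGEN or MadEvent implementation:

```python
import math

def delta_r(a, b):
    """Geometric separation of two objects given as dicts with 'eta' and 'phi' entries."""
    dphi = abs(a["phi"] - b["phi"])
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(a["eta"] - b["eta"], dphi)

def mlm_accept(hard_partons, showered_jets, r_match=0.7, highest_multiplicity=False):
    """Toy MLM test: every hard parton must match a reconstructed jet, and (except for
    the highest-multiplicity sample) the shower must not produce extra resolved jets."""
    unmatched = list(showered_jets)
    for parton in hard_partons:
        candidates = [j for j in unmatched if delta_r(parton, j) < r_match]
        if not candidates:
            return False                    # a hard parton was washed out by the shower
        unmatched.remove(min(candidates, key=lambda j: delta_r(parton, j)))
    if unmatched and not highest_multiplicity:
        return False                        # the shower generated an additional hard jet
    return True
```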

After combining the samples we still need a backwards evolution of a generated event to know the virtuality scales which fix \(\alpha_s(Q^2).\) As a side effect, if we also know the Feynman diagrams describing an event we can check that a certain splitting with its color structure is actually possible. For the parton shower or splitting simulation we need to know the interval of virtualities over which for example the additional gluon in the previous two-jet example can split. The lower end of this interval is universally given by \(t_{{\rm match}},\) but the upper end we cannot extract event by event from the record. Therefore, to compute the \(\alpha_s\) values at each splitting point we start the parton shower at a universal hard scale, chosen as the hard(est) scale of the process.

Aside from such technical details all merging schemes are conceptually similar enough that we should expect them to reproduce each others’ results, and they largely do. But the devil is in the details, so experiment will tell which scheme as part of which event generator produces the most usable results to understand LHC data.

To summarize, we can use the CKKW and MLM schemes to first combine n-jet events with variable n and then consistently add the parton shower. In other words, we can for example simulate \(Z+n\) jets production at the LHC to arbitrarily large numbers of jets, limited only by computational resources and the physical argument that at some point any additional jet radiation will be described by the parton shower. This combination will describe all jets correctly over the entire collinear and hard phase space. In Fig. 2.4 we show the number of jets expected to be produced in association with a pair of top quarks and a pair of heavy new states at the LHC. The details of these heavy scalar gluons are secondary for the basic features of these distributions; the only parameter which matters is their mass, i.e. the hard scale of the process which sets the factorization scale and defines the upper limit of collinearly enhanced initial-state radiation. We see that heavy states come with many jets radiated at \(p_T \lesssim 30\,\hbox{GeV},\) while most of these jets vanish once we require transverse momenta of at least 100 GeV. This figure tells us that an analysis which asks for a reconstruction of two W-decay jets may well be swamped by combinatorics.

Fig. 2.4

Number of additional jets with a transverse momentum of at least 30, 50 or 100 GeV radiated off top pair production and the production of heavy states at the LHC. An example for such heavy states are scalar gluons with a mass of 300 or 600 GeV, pair-produced in gluon fusion. Figures from Ref. [3]

Looking at the individual columns in Fig. 2.4 there is one thing we have to keep in mind: each of the merged matrix elements combined into this sample is computed at leading order. The emission of real particles is included, virtual corrections are not. In other words, the CKKW and MLM schemes give us all jet distributions, but only to leading order in the strong coupling. When we combine the different jet multiplicities to evaluate total rates, jet merging improves the rate prediction because it includes contributions from all orders in \(\alpha_s,\) provided they come with a potentially large logarithm from jet emission. From all we know, these leading logarithms dominate the higher order QCD corrections for most LHC processes, but it is not obvious how general this feature is and how we can quantify it. This is certainly true for all cases where higher order effects appear unexpectedly large and can be traced back to new partonic processes or phase space configurations opening up at higher jet multiplicities. Systematically curing some of this shortcoming (but at a price) will be the topic of the next section.

Before moving on to an alternative scheme we will illustrate why Higgs or exotics searches at the LHC really care about progress in QCD simulations: one way to look for heavy particles decaying into jets, leptons and missing energy is the variable

$$ \begin{aligned}[b] m_{{\rm eff}} &={/\!\!\!E} _{T} + \sum_j E_{T,j} + \sum_\ell E_{T,\ell} \\ &= {/\!\!\!p} _T + \sum_j p_{T,j} + \sum_\ell p_{T,\ell} \qquad \hbox{(for massless quarks, leptons)} \end{aligned} $$
(2.194)

This variable and its relatives we will discuss in detail in Sect. 3.3.2. For gluon-induced QCD processes the effective mass should be small while the new physics signal’s effective mass scale will be determined by the heavy masses.

For QCD jets as well as for W and Z plus jets backgrounds we can study the production of many jets using the CKKW scheme. Figure 2.5 shows the two critical distributions. First, in the number of hard jets we see the so-called staircase scaling behavior, namely constant ratios of exclusive \((n + 1)\)-jet and n-jet rates \(\sigma_{n+1}/\sigma_n.\) Such a scaling is closely related to the pattern we discuss in Eq. 2.133, in the context of the central jet veto of Sect. 1.5.2. The particularly interesting aspect of staircase scaling is that the constant ratio is the same for jet-inclusive and jet-exclusive cross sections \(P_{{\rm incl}} = P_{{\rm excl}},\) as shown in Eq. 2.134.

Fig. 2.5

Exclusive number of jets and effective mass distributions for pure QCD jet events at the LHC with a center-of-mass energy of 7 TeV and \(p_{T,j} > 50\) GeV. The curves including the \(\alpha_s\) uncertainty and a scale variation (tuning parameter) are computed with SHERPA and a fully merged sample including up to six hard jets. These distributions describe typical backgrounds for searches for jets plus missing energy with fake missing energy, which could originate from supersymmetric squark and gluino production . Figures from Ref. [4]

The consistent variation of \(\alpha_s\) gives a small parametric uncertainty on these rates. A common scaling factor \(\mu/\mu_0\) for all factorization, renormalization and shower scales in the process, following our argument of Sect. 2.4, is strictly speaking not fixed by our physical interpretation in terms of resummation; such a factor as part of the leading logarithm can be factored out as a subleading finite term, so it should really be considered a tuning parameter for each simulation tool. Using the same simulation we also show the effective mass and observe a drop towards large values of \(m_{{\rm eff}}.\) However, this drop is nowhere near as pronounced as in parton shower predictions. This analysis shows that the naive parton shower is not a good description of QCD background processes to the production of heavy particles. Taking a very pragmatic approach and tuning the parton shower to correctly describe LHC data even in this parameter region would most likely violate basic concepts like factorization, so we are well advised to use merging schemes like CKKW or MLM for such predictions.

2.6 Next-to-Leading Orders and Parton Shower

As we know for example for the R ratio from Sect. 2.1.1 the precision of a leading order QCD calculation in terms of the strong coupling constant \(\alpha_s\) is not always sufficient to match the experimental accuracy. In such a case we need to compute observables to higher order in QCD. On the other hand, in Sect. 2.3.5 we have seen that the parton shower does not respect a fixed order perturbation theory. With its collinear logarithm it sums particular terms to all orders in \(\alpha_s.\) So how can we on the one hand compute higher order corrections for example to the Drell–Yan cross section and distributions, and on the other hand consistently combine them with the parton shower?

Such a combination would remove one of the historic shortcomings of parton shower Monte Carlos. Apart from the collinear approximation for jet radiation they were always limited by the fact that in the words of one of the authors they ‘only do shapes’. In other words, the normalization of the simulated event sample will always be leading order in perturbative QCD and hence subject to large theoretical uncertainties. The reason for this shortcoming is that collinear jet radiation relies on a hard process and the corresponding production cross section and works with splitting probabilities, but never touches the total cross section it started from.

As a solution we compute higher order cross sections to normalize the total cross section entering the respective Monte Carlo simulation. This is what we call a K factor: \(K = \sigma^{{\rm improved}}/\sigma^{{\rm MC}} = \sigma^{{\rm improved}}/\sigma^{{\rm LO}}.\) It is crucial to remember that higher order cross sections integrate over unobserved additional jets in the final state. So when we normalize the Monte Carlo we assume that we can first integrate over additional jets and obtain \(\sigma^{{\rm improved}}\) and then just normalize the Monte Carlo which puts back these jets in the collinear approximation. Obviously, we should try to do better than that, and there are two ways to improve this traditional Monte Carlo approach, the MC@NLO scheme and the POWHEG scheme.

2.6.1 Next-to-Leading Order in QCD

When we compute the next-to-leading order correction to a cross section, for example to Drell–Yan production, we consider all contributions of the order \({G}_{\!F} \alpha_s.\) There are three obvious sets of Feynman diagrams we have to square and multiply, namely the Born contribution \(q \bar{q} \to Z,\) the virtual gluon exchange for example between the incoming quarks, and the real gluon emission \(q \bar{q} \to Zg.\) An additional set of diagrams we should not forget are the crossed channels \(q g \to Zq\) and \(\bar{q} g \to Z \bar{q}.\) Only amplitudes with the same external particles can be squared, so we find the matrix-element-squared contributions

$$ \begin{aligned}[b] |{\fancyscript{M}}_B|^2 &\propto G_F \\ 2 \hbox{Re} \; {\fancyscript{M}}_V^{\ast} {\fancyscript{M}}_B & \propto G_F \alpha_s \quad |{\fancyscript{M}}_{Zg}|^2 \propto G_F \alpha_s \quad |{\fancyscript{M}}_{Zq}|^2, |{\fancyscript{M}}_{Z\bar{q}}|^2 \propto G_F \alpha_s . \end{aligned} $$
(2.195)

Strictly speaking, we have to include counter terms, which following Eq. 2.54 are a modification of \(|{\fancyscript{M}}_B|^2.\) These counter terms we add to the interference of Born and virtual gluon diagrams to remove the ultraviolet divergences. However, this is not the issue we want to discuss.

Infrared poles arise from two sources, soft and collinear divergences. To avoid the complication of overlapping collinear and soft divergences we will follow a toy model by Bryan Webber. It describes simplified particle radiation off a hard process: the energy of the system before radiation is \(x_s\) and the energy of the outgoing particle (call it photon or gluon) is x, so \(x<x_s<1.\) When we compute next-to-leading order corrections to a hard process, the different contributions, neglecting crossed channels, are

$$ \frac{d \sigma}{dx} \Big|_B = B \delta(x)\quad \quad\frac{d \sigma} {dx} \Big|_V = \alpha_s \left( -\frac{B} {2\varepsilon} + V \right) \delta(x)\quad \quad\frac{d \sigma} {dx} \Big|_R = \alpha_s \frac{R(x)} {x}. $$
(2.196)

The constant B describes the Born process and the factorizing poles in the virtual contribution. In the definition of the coupling constant \(\alpha_s\) we ignore factors of 2 and \(\pi\) as well as color factors. We immediately see that the integral over x in the real emission rate is logarithmically divergent in the soft limit, similar to the collinear divergences we know. From factorization, which we have illustrated based on the universality of the leading splitting kernels, we know that in the collinear and soft limits the real emission has to follow the Born matrix element

$$ \fbox{$\displaystyle\lim_{x\to 0} R(x) = B$}\,. $$
(2.197)

An observable computed beyond leading order includes contributions from real gluon emission and virtual gluon exchange. If the observable is infrared safe it will have a smooth limit towards vanishing gluon energy \(O(x) \to O(0).\) The virtual corrections alone diverge, but the expectation value including virtual and real gluon contributions after dimensional regularization is finite. Because we are interested in infrared divergences we choose \(n = 4 + 2 \varepsilon\) dimensions with \(\varepsilon > 0,\) just like in Sect. 2.3.3, and schematically obtain the two divergent contributions

$$ \langle O \rangle \sim \int \nolimits_0^1 dx \; \frac{O(x)} {x^{1-2\varepsilon}} - \frac{O(0)} {2 \varepsilon}. $$
(2.198)

This kind of combination has a finite limit for \(\varepsilon \to 0.\) However, for numerical applications and event simulation we need to implement this cancellation differently.

The expectation value of any infrared safe observable over the entire phase space, including Born terms, virtual corrections and real emission, is given by

$$ \begin{aligned} \langle O \rangle\equiv\langle O \rangle_B + \langle O \rangle_V + \langle O \rangle_R = \mu_F^{-2\varepsilon}\int \nolimits_0^1 \; dx \; \frac{O(x)} {x^{-2\varepsilon}} \left[ \frac{d \sigma} {dx} \Big|_B +\frac{d \sigma} {dx} \Big|_V +\frac{d \sigma} {dx} \Big|_R \right]\!.\\[2mm] \end{aligned} $$
(2.199)

The same way the renormalization and factorization scales appear, dimensional regularization now yields an additional factor \(1/x^{-2\varepsilon}.\) Because we know its structure, we will omit this factorization scale factor in the following.

When we compute for example a distribution of the energy of one of the heavy particles in the process, we can extract a histogram from the integral for \(\langle O \rangle\) in Eq. 2.199 and obtain a normalized distribution. The problem is that we have to numerically integrate over x, and the individual parts of the integrand in Eq. 2.199 are not integrable.

There exist two methods to combine the virtual and real contributions to an observable and produce a finite physics result. The first way, historically introduced by the Dutch loop school for example to compute QCD corrections to top pair production, is the numerically highly efficient phase space slicing: we divide the divergent phase space integral into a finite part and a pole by introducing a small parameter \(\Delta,\) which acts like

$$ \begin{aligned} \langle O \rangle_R + \langle O \rangle_V =& \int \nolimits_0^1 \; dx \; \frac{O(x)} {x^{-2\varepsilon}} \; \frac{d \sigma} {dx} \Big|_R + \langle O \rangle_V \\ =& \left( \int \nolimits_0^\Delta + \int \nolimits_\Delta^1 \right) \; dx \; \alpha_s \frac{R(x) O(x)} {x^{1-2\varepsilon}} + \langle O \rangle_V \\ =& \,\alpha_s R(0) \; O(0) \int \nolimits_0^\Delta dx \; \frac{1} {x^{1-2\varepsilon}} + \alpha_s \int \nolimits_\Delta^1 dx \; \frac{R(x) O(x)} {x} + \langle O \rangle_V \\ =& \,\alpha_s B \; O(0) \; \frac{\Delta^{2\varepsilon}} {2\varepsilon} + \alpha_s \int \nolimits_\Delta^1 dx \; \frac{R(x) O(x)} {x} + \langle O \rangle_V \\ =& \,\alpha_s \frac{B\; O(0)} {2} \; \frac{2 \varepsilon \log \Delta + {\fancyscript{O}}(\varepsilon^2)} {\varepsilon} + \alpha_s \int \nolimits_\Delta^1 dx \; \frac{R(x) O(x)} {x} + \alpha_s V O(0) \\ =& \,\alpha_s B O(0) \; \log \Delta + \alpha_s \int \nolimits_\Delta^1 dx \; \frac{R(x) O(x)} {x} + \alpha_s V O(0) + {\fancyscript{O}}(\varepsilon). \end{aligned} $$
(2.200)

The two sources of \(\log \Delta\) dependence have to cancel in the final expression, so we can evaluate the integral at finite but small values of \(\Delta\). An amusing numerical trick is to re-write the explicit \(\log \Delta\) contribution into a NLO-type phase space integral. If the eikonal approximation is given in terms of a Mandelstam variable \(s_4,\) appearing as \(\delta(s_4),\) and the cut-off has mass dimension two we can write

$$ \log \frac{\Delta} {\mu^2} = \int\nolimits_0^{s_4^{{\rm max}}} d s_4 \; \log \frac{\Delta} {\mu^2} \; \delta(s_4) = \int\nolimits_0^{s_4^{{\rm max}}} d s_4 \; \left[ \frac{\log \frac{s_4^{{\rm max}}}{\mu^2}}{s_4^{{\rm max}} - \Delta} -\frac{1} {s_4} \right] $$
(2.201)

and similarly for \(\log^2 \Delta.\) This representation we can integrate along with the real emission phase space. The result will be a finite value for the next-to-leading order rate in the limit \(\Delta \to 0\) and exactly \(\varepsilon = 0.\)
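In the toy model we can check numerically that the slicing parameter indeed drops out. The sketch below picks an arbitrary real emission function with \(R(0)=B,\) an arbitrary infrared safe observable, and invented values for B, V and \(\alpha_s;\) only the structure of the last line of Eq. 2.200 is meant to be faithful:

```python
import math

B, V, alpha_s = 1.0, 0.5, 0.1              # invented toy-model constants

def R(x):                                  # real emission with R(0) = B, cf. Eq. 2.197
    return B * (1.0 + x)

def O(x):                                  # some infrared safe observable
    return 1.0 + 0.3 * x

def nlo_correction_sliced(delta, steps=20000):
    """Last line of Eq. 2.200: explicit log(delta) term, sliced real integral, virtual term."""
    total = alpha_s * B * O(0.0) * math.log(delta) + alpha_s * V * O(0.0)
    lo = math.log(delta)
    dlog = -lo / steps
    for i in range(steps):
        x = math.exp(lo + (i + 0.5) * dlog)
        total += alpha_s * R(x) * O(x) / x * x * dlog      # dx = x dlog x
    return total

for delta in (1e-2, 1e-3, 1e-4):
    print(delta, B * O(0.0) + nlo_correction_sliced(delta))   # Born plus correction
```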

The fundamental shortcoming of phase space slicing is that it requires an analytical integration to produce sensible observable distributions. To avoid cancellations between integrals and replace them by cancellations among integrands we use a subtraction method to define integrable functions under the x integral in Eq. 2.199. While our toy model appears more similar to the Frixione–Kunszt–Signer subtraction scheme than to the Catani–Seymour scheme, both of them really are equivalent at the level of the soft-collinear toy model. Starting from the individually divergent virtual and real contributions we first subtract and then add again a smartly chosen term, in this toy model identical to a plus-subtraction following Eq. 2.125

$$ \begin{aligned} \langle O \rangle_R + \langle O \rangle_V =& \int \nolimits_0^1 \; dx \; \alpha_s \frac{R(x) O(x)} {x^{1-2\varepsilon}} + \langle O \rangle_V\\ =& \int \nolimits_0^1 dx \left( \frac{\alpha_s R(x)O(x)} {x^{1-2\varepsilon}} -\frac{\alpha_s R(0) O(0)} {x^{1-2\varepsilon}} \right) + \int \nolimits_0^1 dx \frac{\alpha_s B O(0)} {x^{1-2\varepsilon}} + \langle O \rangle_V \\ =& \,\alpha_s \int \nolimits_0^1 dx \; \frac{R(x)O(x)-BO(0)} {x} + \alpha_s \frac{B \; O(0)} {2\varepsilon} + \langle O \rangle_V \\ =& \,\alpha_s \int \nolimits_0^1 dx \; \frac{R(x)O(x)-BO(0)} {x} + \alpha_s V O(0) \quad \hbox{using Eq. \,2.196}. \end{aligned} $$
(2.202)

In the subtracted real emission integral we take the limit \(\varepsilon \to 0\) because the asymptotic behavior of \(R(x \to 0)\) regularizes this integral without any dimensional regularization required. The divergence of the added-back integral precisely cancels the pole from the virtual correction. We end up with a perfectly finite x integral for the sum of all three contributions, even in the limit \(\varepsilon = 0,\) i.e. there is no numerically small parameter in the expression

$$ \begin{aligned}[b] \langle O \rangle &= \langle O \rangle_B + \langle O \rangle_V + \langle O \rangle_R = B \; O(0) + \alpha_s V \; O(0) + \alpha_s \int \nolimits_0^1 \; dx \; \frac{R(x)\; O(x) -B \; O(0)} {x} \\ &= \int \nolimits_0^1 \; dx \; \left[ O(0) \; \left( B + \alpha_s V - \alpha_s \frac{B} {x} \right) + O(x) \; \alpha_s \; \frac{R(x)} {x} \right]. \end{aligned} $$
(2.203)

This subtraction procedure is a standard method to compute next-to-leading order corrections involving one-loop virtual contributions and the emission of one additional parton.
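Using the same invented toy functions as in the slicing sketch, the subtraction formula Eq. 2.203 turns into a single finite x integral with no small parameter left:

```python
B, V, alpha_s = 1.0, 0.5, 0.1              # same invented toy-model constants as before

def R(x):
    return B * (1.0 + x)                   # real emission with R(0) = B

def O(x):
    return 1.0 + 0.3 * x                   # infrared safe observable

def nlo_subtracted(steps=20000):
    """Eq. 2.203: Born, virtual, and the subtracted (now finite) real emission integrand."""
    total = B * O(0.0) + alpha_s * V * O(0.0)
    dx = 1.0 / steps
    for i in range(steps):
        x = (i + 0.5) * dx
        total += alpha_s * (R(x) * O(x) - B * O(0.0)) / x * dx
    return total

print(nlo_subtracted())                    # agrees with the sliced result for delta -> 0
```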

As a side remark, we can numerically improve this expression using a distribution relation

$$ \begin{aligned} \int \nolimits_0^1 dx \; \frac{f(x)} {x^{1-2 \varepsilon}} &= \int \nolimits_0^1 dx \; \frac{f(x)- \theta(x_c-x) f(0)} {x^{1-2 \varepsilon}} + f(0) \int \nolimits_0^{x_c} dx \; x^{-1+2 \varepsilon} \\ &= \int \nolimits_0^1 dx \; \frac{f(x)-\theta(x_c-x) f(0)} {x} \left( 1 + 2 \varepsilon \log x + {\fancyscript{O}}(\varepsilon^2) \right) + f(0) \; \frac{x_c^{2 \varepsilon}} {2 \varepsilon} \\ &= \int \nolimits_0^1 dx \; \Bigg( \frac{f(x)-\theta(x_c-x) f(0)} {x} \\ & + 2 \varepsilon \; \frac{f(x)-\theta(x_c-x) f(0)} {x} \log x + \frac{x_c^{2 \varepsilon}} {2 \varepsilon} \; f(x) \delta(x) \Bigg). \end{aligned} $$
(2.204)

In terms of appropriately defined distributions we can write this relation as

$$ \frac{1} {x^{1-2 \varepsilon}} = \frac{x_c^{2 \varepsilon}} {2 \varepsilon} \; \delta(x) + \left( \frac{1} {x} \right)_c + 2 \varepsilon \left( \frac{\log x} {x} \right)_c. $$
(2.205)

This c-subtraction, first introduced as part of the Frixione–Kunszt–Signer subtraction scheme, is defined as

$$ \int \nolimits_0^1 dx \; f(x) \; g(x)_c = \int \nolimits_0^1 dx \; \left( f(x) g(x) - f(0) g(x) \theta(x_c -x) \right)\!, $$
(2.206)

and is a generalization of the plus subtraction defined in Eq. 2.125 which we reproduce choosing \(x_c = 1.\) Linking the delta distribution to the divergent integral over \(1/x\) it is also reminiscent of the principal value integration, but for an endpoint singularity and a dimensionally regularized phase space. Effectively combining phase space subtraction Eq. 2.202 and phase space slicing Eq. 2.200, we include a cutoff in the integrals holding the subtraction terms

$$ \begin{aligned}[b] \langle O \rangle_R =\ & \alpha_s \int \nolimits_0^1 \; dx \; \frac{R(x) O(x)} {x^{1-2\varepsilon}}\\ =\ & \alpha_s \int \nolimits_0^1 dx \; \frac{R(x)O(x)-\theta(x_c-x) B O(0)} {x} \left( 1 + 2 \varepsilon \log x \right)\\&\quad\quad\quad + \alpha_s B O(0) \; \frac{x_c^{2\varepsilon}} {2\varepsilon} + {\fancyscript{O}}(\varepsilon^2).\\ \end{aligned} $$
(2.207)

The dependence on the finite cutoff parameter \(x_c\) drops out of the final result. The numerical behavior, however, should be improved if we subtract the infrared divergence only close to the actual pole where following Eq. 2.197 we understand the behavior of the real emission amplitude.
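Again in the toy model, we can check the \(x_c\) independence of Eq. 2.207 once it is combined with the Born and virtual terms of Eq. 2.196; the sketch keeps the same invented toy functions as before and sets \(\varepsilon = 0:\)

```python
import math

B, V, alpha_s = 1.0, 0.5, 0.1

def R(x): return B * (1.0 + x)
def O(x): return 1.0 + 0.3 * x

def nlo_cutoff_subtracted(x_c, steps=20000):
    """Eq. 2.207 at eps = 0 combined with the Born and virtual terms of Eq. 2.196:
    subtract B*O(0) only for x < x_c and keep the finite log(x_c) remainder."""
    total = B * O(0.0) + alpha_s * V * O(0.0) + alpha_s * B * O(0.0) * math.log(x_c)
    dx = 1.0 / steps
    for i in range(steps):
        x = (i + 0.5) * dx
        subtraction = B * O(0.0) if x < x_c else 0.0
        total += alpha_s * (R(x) * O(x) - subtraction) / x * dx
    return total

for x_c in (1.0, 0.5, 0.1):
    print(x_c, nlo_cutoff_subtracted(x_c))   # the x_c dependence drops out
```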

The formula Eq. 2.203 is, in fact, a little tricky: usually, the Born-type kinematics would come with an explicit factor \(\delta(x),\) which in this special case we can omit because of the integration boundaries. We can re-write the same formula in a more appropriate way to compute distributions, possibly including experimental cuts

$$ \begin{aligned}\fbox{$\dfrac{d \sigma}{d O} = \displaystyle\int \nolimits_0^1 \; dx \; \left[ I(O)_{{\rm LO}} \;\left( B + \alpha_s V - \alpha_s \dfrac{B} {x}\right)+ I(O)_{{\rm NLO}} \; \alpha_s \; \dfrac{R(x)} {x} \right]$}\,. \\[5pt] \end{aligned} $$
(2.208)

The transfer function \(I(O)\) is defined in a way that formally does precisely what we require: at leading order we evaluate \(I(O)\) using the Born kinematics \(x\,=\,0\) while for the real emission kinematics it allows for general \(x\,=\,0 \ldots 1.\)

2.6.2 MC@NLO Method

For example in Eq. 2.199 we integrate over the entire phase space of the additional parton. For a hard additional parton or jet the cross section looks well defined and finite, provided we fully combine real and virtual corrections. An infrared divergence appears when we integrate the real emission over small but finite \(x,\) and we cancel it with an infrared divergence in the virtual corrections proportional to a Born-type momentum configuration \(\delta(x).\) In terms of a histogram in x we encounter the real emission divergence at small x, and this divergence is cancelled by a negative delta distribution at \(x=0.\) Obviously, this will only give a well behaved distribution after integrating over at least a range of x values just above zero.

This soft and collinear subtraction scheme for next-to-leading order calculations leads us to the first method of combining or matching next-to-leading order calculations with a parton shower. Instead of the virtual corrections contributing at \(\delta(x)\) what we would rather want is a smeared virtual correction pole which coincides with the justified collinear approximation and cancels the real emission over the entire low-x range. This contribution we can view as events with a negative weight, i.e. counter-events. Negative events trigger negative reactions with experimentalists, because they cause problems in a chain of probabilistic statements like a detector simulation. Fundamentally, there is really no problem with them as long as any physical prediction we make after adding all leading order and next-to-leading order contributions gives a positive cross section.

Because we know that they describe collinear jet radiation correctly, such a modification will make use of Sudakov factors. We can write them as a function of the energy fraction z and find \(d {\fancyscript{P}} = \alpha_s P(z)/z \, dz.\) Note that we avoid the complicated proper two-dimensional description of Eq. 2.164 in favor of the simpler picture just in terms of particle energy fractions as introduced in the last section.

Once we integrate over the entire phase space this modified subtraction scheme has to give the same result as the next-to-leading order rate. Smearing the integrated soft-collinear subtraction term using the splitting probabilities entering the parton shower means that the MC@NLO subtraction scheme has to be adjusted to the parton shower we use.

Let us consider the perturbatively critical but otherwise perfectly fine observable, the radiated photon spectrum as a function of the (external) energy scale z. We know what this spectrum looks like for the collinear and hard kinematic configurations

$$ \frac{d \sigma} {d z} \Big|_{{\rm LO}} = \alpha_s \; \frac{B P(z)} {z} \quad\quad \frac{d \sigma} {d z} \Big|_ {{\rm NLO}} = \alpha_s \; \frac{R(z)}{z}. $$
(2.209)

The first term describes parton shower radiation from the Born diagram at order \(\alpha_s,\) while the second term is the hard real emission defined in Eq. 2.196. The transfer functions we would have to include in the general form of Eq. 2.208 to arrive at this equation are

$$ \begin{aligned} [b] I(z,1) \Big|_{{\rm LO}} &= \alpha_s \; \frac{P(z)} {z} \\ I(z,x_M) \Big|_{{\rm NLO}} &= \delta(z-x) + \alpha_s \; \frac{P(z)} {z} \; \theta(x_M(x)-z). \end{aligned} $$
(2.210)

The second term in the real radiation transfer function arises because at the next order in perturbation theory the parton shower also acts on the real emission process. It requires that enough energy to radiate a photon with an energy z be available, where \(x_M\) is the energy available at the respective stage of showering, i.e. \(z < x_M.\)

These transfer functions we can include in Eq. 2.208

$$ \begin{aligned} [b] \frac{d \sigma} {d z} &= \int \nolimits_0^1 dx \; \left[ I(z,1) \; \left( B + \alpha_s V - \alpha_s \frac{B} {x} \right) + I(z,x_M) \; \alpha_s \; \frac{R(x)} {x} \right] \\ &= \int \nolimits_0^1 dx \; \left[ \alpha_s \frac{P(z)} {z} \left( B + \alpha_s V - \alpha_s \frac{B} {x} \right) + \left( \delta(x-z) + {\fancyscript{O}}(\alpha_s) \right) \; \alpha_s \frac{R(x)} {x} \right] \\ &= \int \nolimits_0^1 dx \; \left[ \alpha_s \; \frac{B P(z)} {z} + \alpha_s \; \frac{R(z)} {z} \right] + {\fancyscript{O}}(\alpha_s^2) \\ &= \alpha_s \; \frac{B P(z) + R(z)} {z} + {\fancyscript{O}}(\alpha_s^2). \end{aligned} $$
(2.211)

All Born terms proportional to \(\delta(z)\) vanish because their contributions would be unphysical. This already fulfills the first requirement for our scheme, without having done anything except for including a transfer function. Now, we can integrate over z and calculate the total cross section \(\sigma_{{\rm tot}}\) with a cutoff \(z_{{\rm min}}\) for consistency. However, Eq. 2.211 includes an additional term which spoils the result: the same kind of jet radiation is included twice, once through the matrix element and once through the shower. This is precisely the double counting which we avoid in the CKKW scheme. So we are still missing something.

We also knew we would fall short, because our strategy includes a smeared virtual subtraction term which for finite x should cancel the real emission. This subtraction is not yet included. Factorization tells us how to write such a subtraction term using the splitting function P as defined in Eq. 2.209 to turn the real emission term into a finite contribution

$$ \frac{R(x)} {x} \quad \longrightarrow \quad \frac{R(x) - B P(x)} {x}. $$
(2.212)

Because this is an ad hoc subtraction term we also have to add it to the Born-type contribution. This leads us to a modified version of Eq. 2.208, now written for general observables

$$ \begin{aligned} \fbox{$\frac{d \sigma} {d O} = \int \nolimits_0^1 dx \left[ I(O,1) \left( B + \alpha_s V - \frac{\alpha_s B} {x} + \frac{\alpha_s B P(x)} {x} \right) + I(O,x_M) \alpha_s \frac{R(x)-B P(x)} {x} \right]$}\,.\\[2mm] \end{aligned} $$
(2.213)

Looking back at different methods of removing ultraviolet divergences this modification from the minimal soft and collinear subtraction in Eq. 2.208 to a physical subtraction term corresponding to the known radiation pattern reminds us of different renormalization schemes. The minimal \(\overline{\hbox{MS}}\) scheme will always guarantee finite results, but for example the on-shell scheme with its additional finite terms has at least to a certain degree beneficial properties when it comes to understanding its physical meaning. This is the same for the MC@NLO method: we replace the minimal subtraction terms by physically motivated non-minimal subtraction terms such that the radiation pattern of the additional parton is described correctly.

So when we use this form to compute the z spectrum to order \(\alpha_s\) it will in addition to Eq. 2.211 include an integrated subtraction term contributing to the Born-type kinematics

$$ \begin{aligned} \frac{d \sigma} {d z} &\longrightarrow \int \nolimits_0^1 \; dx \; \left[ \alpha_s \; \frac{B P(z)} {z} + \alpha_s \; \delta(x-z) \; \left( \frac{R(x)} {x} - \frac{B P(x)} {x} \right) \right] + {\fancyscript{O}}(\alpha_s^2) \notag\\ &= \int \nolimits_0^1 \; dx \; \alpha_s \; \frac{B P(z) + R(z) - B P(z)} {z} + {\fancyscript{O}}(\alpha_s^2) \notag\\ &= \alpha_s \; \frac{R(z)} {z} + {\fancyscript{O}}(\alpha_s^{2}). \end{aligned} $$
(2.214)

This is exactly the distribution we expect.
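In the toy model the content of Eq. 2.213 boils down to two event classes: Born-type events carrying the modified, finite virtual weight which are then handed to the shower, and hard events weighted by the shower-subtracted real emission. The sketch below uses an invented splitting function P(x) with the required limit \(P(x \to 0) = 1,\) together with the toy B, V, and R(x) from before; the printout only checks that the total rate reproduces the fixed-order result of Eq. 2.203:

```python
B, V, alpha_s = 1.0, 0.5, 0.1

def R(x): return B * (1.0 + x)            # real emission, R(0) = B
def P(x): return 1.0 + 0.5 * x            # toy shower splitting function, P(0) = 1

def integrate(f, steps=20000):
    dx = 1.0 / steps
    return sum(f((i + 0.5) * dx) * dx for i in range(steps))

# Born-type events: modified, finite virtual weight of Eq. 2.213, handed to the shower
w_born = B + alpha_s * V + alpha_s * B * integrate(lambda x: (P(x) - 1.0) / x)

# hard events: real emission minus the shower approximation, finite for x -> 0
def w_hard(x):
    return alpha_s * (R(x) - B * P(x)) / x

sigma_matched = w_born + integrate(w_hard)
sigma_nlo = B + alpha_s * V + alpha_s * integrate(lambda x: (R(x) - B) / x)
print(sigma_matched, sigma_nlo)           # the two total rates agree
```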

Following the above argument the subtraction scheme implemented in the Monte Carlo generator MC@NLO describes hard emission just like a next-to-leading order calculation. This includes the next-to-leading order normalization of the rate as well as the next-to-leading order distributions for those particles produced in the original hard process. For example for W+jets production such corrections to the W and leading jet distributions matter, while for the production of heavy new particles their distributions hardly change at next-to-leading order. The distribution of the first radiated parton is included at leading order, as we see in Eq. 2.214. Finally, additional collinear particle emission is simulated using Sudakov factors, precisely like a parton shower.

Most importantly, it avoids double counting between the first hard emission and the collinear jets, which means it describes the entire \(p_T\) range of jet emission for the first and hardest radiated jet consistently. Those additional jets, which do not feature in the next-to-leading order calculation, are added through the parton shower, i.e. in the collinear approximation. As usual, what looked fairly easy in our toy example is much harder in QCD reality, but the setup is the same.

2.6.3 POWHEG Method

As described in Sect. 2.6.2 the MC@NLO matching scheme for a next-to-leading order correction and the parton shower is based on an extended subtraction scheme. It starts from a given parton shower and avoids double counting by modifying the next-to-leading order corrections. An interesting question is: can we instead keep the structure of the next-to-leading order calculation and apply a modified parton shower? The main ingredients for this structure are the Sudakov factors introduced in Sect. 2.5.1 and used for the CKKW merging scheme in Sect. 2.5.3.

In contrast to the MC@NLO scheme the POWHEG (Positive Weight Hardest Emission Generator) scheme does not introduce counter events or subtraction terms. It considers the next-to-leading order calculation of a cross section a combination of an m-particle and an \((m + 1)\)-particle process and attempts to adjust the parton shower attached to each of these two contributions such that there is no double counting.

Our starting point is the next-to-leading order computation of a cross section following Eq. 2.196. We can combine it with appropriate soft and collinear subtraction terms C in the factorized \((m + 1)\)-particle phase space where for simplicity we assume that the integrated subtraction terms exactly cancel the divergences from the virtual corrections:

$$ \begin{aligned} d \sigma &= B \; d \phi_m + \alpha_s \left( - \frac{B} {2\varepsilon} +V \right) \; d \phi_m + \alpha_s R \; dz\; dt^{\prime} \; d \phi_m \\ &= B \; d \phi_m + \alpha_s V \; d \phi_m + \alpha_s \left( R - C {\mathbb{P}} \right) \; d \phi_m \; dz\; dt^{\prime} \quad \hbox{soft-collinearly subtracted} \\ &= \left[ \alpha_s V + \alpha_s (R - C) {\mathbb{P}}\; dz\; dt^{\prime} \right] \; d \phi_m + B \; d \phi_m \left[ 1 + \frac{\alpha_s R} {B} \left( 1 - {\mathbb{P}} \right) dz\; dt^{\prime} \right].\\[5pt] \end{aligned} $$
(2.215)

The \((m + 1)\)-particle phase space we factorize into the m-particle phase space and the azimuthally symmetric remainder \(dz\,dt^{\prime},\) as suggested by Eq. 2.97. With these two proper integration variables we cannot assume a toy form \(R/x\) for the real emission, so we denote the real emission as R alone. The projector \({\mathbb{P}}\) maps the nominal \((m + 1)\)-particle phase space of the real emission onto the m-particle phase space of the leading order process.

The first term in Eq. 2.215 is suppressed by one power of \(\alpha_s,\) so we can add a parton shower to it without any worry. The second term consists of the Born contribution and the hard emission of one parton, so we have to avoid double counting when defining the appropriate Sudakov factors. Moreover, a serious problem appears in Eq. 2.215 when we interpret it probabilistically: nothing forces the combination of virtual and subtracted real emission in the first brackets to be positive. To cure this shortcoming we can instead combine all m-particle contributions into one term

$$ \begin{aligned} d \sigma &= \left[ B + \alpha_s V + \alpha_s (R - C) {\mathbb{P}}\, dz\,\,dt^{\prime} \right] \; d \phi_m \left[ 1 + \frac{\alpha_s R} {B} \left( 1 - {\mathbb{P}} \right) dz\, dt^{\prime} \right] + {\fancyscript{O}}(\alpha_s^2) \\ &\equiv \overline{B} \; d \phi_m \left[ 1 + \frac{\alpha_s R} {B} \left( 1 - {\mathbb{P}} \right) d z\,\,dt^{\prime} \right] + {\fancyscript{O}}(\alpha_s^2) \\ &= \overline{B} \; d \phi_m \left[ 1 + \frac{\alpha_s R} {B} \theta \left( p_T(t^{\prime},z) - p_T^{{\rm min}} \right) d z\,\,dt^{\prime} \right] + {\fancyscript{O}}(\alpha_s^2), \end{aligned} $$
(2.216)

where the combination \(\overline{B}\) can only become negative if the regularized next-to-leading order contribution over-compensates the Born term, which would indicate a breakdown of perturbation theory. The second term in the brackets needs to project the real emission onto the hard \((m + 1)\)-particle phase space. If we replace the symbolic projection \(( 1 - {\mathbb{P}} )\) by a step function in terms of the transverse momentum of the radiated parton \(p_T(t^{\prime},z)\) we can ensure that it really only appears for hard radiation above \(p_T^ {{\rm min}}\) and at the same time keep the integral over the radiation phase space finite.

From CKKW jet merging we know what we have to do to combine an m-particle process with an \((m+1)\)-particle process, even in the presence of the parton shower: the m-particle process has to be exclusive, which means we need to attach a Sudakov factor \(\Delta\), vetoing additional jet radiation, to the first term in the brackets of Eq. 2.216. In the CKKW scheme the factor in front of the brackets would be B and not \(\overline{B}.\) The introduction of \(\overline{B}\) is nothing but a re-weighting factor for the events contributing to the m-particle configuration which we need to maintain the next-to-leading order normalization of the combined m-particle and \((m+1)\)-particle rates. The second factor \(\alpha_s R/B\) is essentially the multiplicative PYTHIA or HERWIG matrix element correction used for an improved simulation for example of \(W+\!\)jet events.

The appropriate Sudakov factor for the real emission has to veto only hard jet radiation from an additional parton shower. This way we ensure that for the \((m + 1)\)-particle contribution the hardest jet radiation is given by the matrix element R, i.e. no splitting occurs in the hard regime \(p_T > p_T^{{\rm min}}.\) Such a vetoed shower we can define in analogy to the (diagonal) Sudakov survival probability Eq. 2.164 by adding a step function which limits the unwanted splittings to \(p_T > p_T^{{\rm min}}\)

$$ \Delta(t,p_T^{{\rm min}}) = \exp \left( - \int \nolimits_{t_0}^t \frac{dt^{\prime}} {t^{\prime}} \int \nolimits_0^1 dz \ \frac{\alpha_s}{2 \pi} \hat{P}(z) \; \theta \left( p_T(t^{\prime},z) - p_T^{{\rm min}} \right) \right)\!, $$
(2.217)

omitting the resolution \(t_0\) in the argument. This modified Sudakov factor indicates that in contrast to the MC@NLO method we now modify the structure of the parton shower which we combine with the higher order matrix elements.
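A numerical sketch of the vetoed Sudakov factor of Eq. 2.217, with a fixed coupling, the quark splitting kernel as an example, and an invented transverse momentum map \(p_T^2 = z(1-z)\,t^{\prime}\) whose veto condition also regularizes the soft \(z \to 1\) pole:

```python
import math

alpha_s = 0.118                            # fixed coupling, for illustration only
CF = 4.0 / 3.0

def p_hat_qq(z):
    """Unregularized quark splitting kernel P_{q<-q}(z) = C_F (1+z^2)/(1-z)."""
    return CF * (1.0 + z * z) / (1.0 - z)

def pt2(t, z):
    """Invented transverse momentum map p_T^2 = z(1-z) t for the veto condition."""
    return z * (1.0 - z) * t

def vetoed_sudakov(t, pt_min, t0=1.0, steps_t=200, steps_z=2000):
    """Eq. 2.217: no-splitting probability counting only splittings with p_T > pt_min."""
    dlog = (math.log(t) - math.log(t0)) / steps_t
    dz = 1.0 / steps_z
    integral = 0.0
    for i in range(steps_t):
        tp = math.exp(math.log(t0) + (i + 0.5) * dlog)
        for j in range(steps_z):
            z = (j + 0.5) * dz
            if pt2(tp, z) > pt_min**2:
                integral += alpha_s / (2.0 * math.pi) * p_hat_qq(z) * dz * dlog
    return math.exp(-integral)

t_hard = 500.0**2
for pt_min in (10.0, 20.0, 50.0):
    print(pt_min, vetoed_sudakov(t_hard, pt_min))
```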

For the vetoed Sudakov factors to make sense we need to show that they obey a DGLAP equation like Eq. 2.172, including the veto condition in the splitting kernel

$$ \begin{aligned} f(x,t) &= \Delta(t,p_T^{{\rm min}}) f(x,t_0)\\ &+ \int \nolimits_{t_0}^t \frac{dt^{\prime}}{t^{\prime}} \; \Delta(t,t^{\prime},p_T^{{\rm min}}) \int \nolimits_0^1 \frac{dz} {z} \; \frac{\alpha_s} {2\pi} \; \hat{P}(z) \; \theta \left( p_T(t^{\prime},z) - p_T^{{\rm min}} \right) f\left(\frac{x} {z},t^{\prime}\right)\!, \\[5pt] \end{aligned} $$
(2.218)

where we again show the diagonal case to simplify the notation. The proof of this formula starts from Eq. 2.172 with the modification of an explicit veto. Using \(1 = \theta \left( p_T - p_T^{{\rm min}} \right) + \left[ 1 - \theta \left( p_T - p_T^{{\rm min}} \right) \right]\) we find Eq. 2.218 more or less straight away. The bottom line is that we can consistently write down vetoed Sudakov probabilities and build a parton shower out of them.

Inserting both Sudakov factors into Eq. 2.216 gives us for the combined next-to-leading order exclusive contributions

$$ \begin{aligned} \fbox{$d \sigma = \overline{B} \; d \phi_m \left[ \Delta(t,0) + \Delta(t^{\prime},p_T^{{\rm min}}) \dfrac{\alpha_s R}{B}\theta \left( p_T(t^{\prime},z) - p_T^{{\rm min}} \right) d t^{\prime} dz \right] + {\fancyscript{O}}(\alpha_s^2)$}\;.\\[2mm] \end{aligned} $$
(2.219)

The first Sudakov factor is not vetoed which means it is evaluated at \(p_T^{{\rm min}} = 0.\)

Based on the next-to-leading order normalization of the integrated form of Eq. 2.219 we can determine the form of the splitting probability entering the Sudakov factor from the perturbative series: the term in brackets integrated over the entire phase space has to give unity. Starting from Eq. 2.218 we first compute the derivative of the Sudakov factor with respect to one of its integration boundaries, just like in Eq. 2.168

$$ \begin{aligned} \frac{d \Delta(t,p_T^{{\rm min}})}{d t} &= \frac{d} {d t} \exp \left( - \int \nolimits_{t_0}^t \frac{dt^{\prime}} {t^{\prime}} \int \nolimits_0^1 dz \frac{\alpha_s} {2 \pi} \hat{P}(z) \; \theta \left( p_T(t^{\prime},z) - p_T^{{\rm min}} \right) \right) \\ &= \Delta(t,p_T^{{\rm min}}) \; \frac{(-1)} {t} \int \nolimits_0^1 dz \frac{\alpha_s} {2 \pi} \hat{P}(z) \; \theta \left( p_T(t,z) - p_T^{{\rm min}} \right). \end{aligned} $$
(2.220)

Using this relation we indeed find for the integral over the second term in the brackets of Eq. 2.219

$$\begin{aligned}[b] & \int \nolimits_{t_0}^t d t^{\prime} dz \Delta(t^{\prime},p_T^{{\rm min}}) \frac{\alpha_s R} {B} \theta \left( p_T(t^{\prime},z) - p_T^{{\rm min}} \right) \\ &\quad = - \int \nolimits_{t_0}^t dt^{\prime} \; \frac{d \Delta(t^{\prime},p_T^{{\rm min}})} {d t^{\prime}} \; \frac{\int dz \frac{\alpha_s R} {B} \theta \left( p_T(t^{\prime},z) - p_T^{{\rm min}} \right)} {\int dz \frac{\alpha_s} {2 \pi t^{\prime}} \hat{P}(z) \; \theta \left( p_T(t^{\prime},z) - p_T^{{\rm min}} \right)} \\ &\quad = - \int \nolimits_{t_0}^t dt^{\prime} \; \frac{d \Delta(t^{\prime},p_T^{{\rm min}})}{d t^{\prime}} \\ &\quad = 1 - \Delta(t,p_T^{{\rm min}}) \quad\Leftrightarrow \quad \fbox{$\dfrac{\alpha_s R} {B} = \dfrac{\alpha_s} {2 \pi t^{\prime}}\hat{P}(z)$}\;.\\ \end{aligned} $$
(2.221)

Looking back at Eq. 2.97 this corresponds to identifying \(B = \sigma_n\) and \(\alpha_s R = \sigma_{n+1}.\) In the POWHEG scheme the Sudakov factors are based on the simulated splitting probability \(\alpha_s R/B\) instead of the splitting kernels. This replacement is nothing new, though. We can already read it off Eq. 2.97.

A technical detail which we have not mentioned yet is that the POWHEG scheme assumes that our Sudakov factors can be ordered in such a way that the hardest emission always occurs first. Following the discussion in Sect. 2.5.2 we expect any collinear transverse momentum ordering to be disrupted by soft radiation, ordered by the angle. The first emission of the parton shower might well appear at large angles but with small energy, which means it will not be particularly hard.

For the POWHEG shower this soft radiation has to be removed or moved to a lower place in the ordering of the splittings. The condition to treat soft emission separately we know from CKKW merging, namely Eq. 2.192: the scale at which we resolve a parton splitting does not have to be identical with the lower boundary of simulated splittings. We can construct a parton shower taking into account such splitting kernels, defining a truncated shower. This modified shower is the big difference between the MC@NLO scheme and the POWHEG scheme in combining next-to-leading order corrections with a parton shower. In the MC@NLO scheme we modify the next-to-leading order correction for a given shower, but the shower stays the same. In the POWHEG scheme the events get re-weighted according to standard building blocks of a next-to-leading order calculation, but the shower has to be adapted.

In Sects. 2.5.3, 2.6.2 and 2.6.3 we have introduced different ways to simulate jet radiation at the LHC. The main features and shortcomings of the matching and merging approaches we summarize in Table 2.3.

Table 2.3 Comparison of the MC@NLO and CKKW schemes combining collinear and hard jets

At this stage it is up to the competent user to pick the scheme which describes their analysis best. First of all, if there is a well defined and sufficiently hard scale in the process, the old-fashioned Monte Carlo with a tuned parton shower will be fine, and it is by far the fastest method. When for some reason we are mainly interested in one hard jet we can use MC@NLO or POWHEG and benefit from the next-to-leading order normalization. This is the case for example when a gluon splits into two bottoms in the initial state and we are interested in the radiated bottom jet and its kinematics. In cases where we really need a large number of jets correctly described we will end up with CKKW or MLM simulations. However, just like the old-fashioned parton shower Monte Carlo we need to include the normalization of the rate by hand. Or we are lucky and combined versions of CKKW and POWHEG, as currently developed by both groups, will be available.

I am not getting tired of emphasizing that the conceptual progress in QCD describing jet radiation for all transverse momenta is absolutely crucial for LHC analyses. If I were a string theorist I would definitely call this achievement a revolution or even two, like 1917 but with the trombones and cannons of Tchaikovsky’s 1812. In contrast to a lot of other progress in theoretical physics jet merging solves a problem which would otherwise have limited our ability to understand LHC data, no matter what kind of Higgs or new physics we are looking for.

2.7 Further Reading

Just like the Higgs part the QCD part of these lecture notes is something in between a text book chapter and a review of QCD and mostly focused on LHC searches. Some corners I cut, in particular when calculations do not affect the main topic, namely the resummation of logarithms in QCD and the physical meaning of these logarithms. There is no point in giving a list of original references, but I will list a few books and review articles which should come in handy if you would like to know more:

  • I started learning high energy theory including QCD from Otto Nachtmann’s book. I still use his appendices for Feynman rules because I have not seen another book with as few (if not zero) typos [5].

  • Similar, but maybe a little more modern is the Standard Model primer by Cliff Burgess and Guy Moore [6]. At the end of it you will find more literature.

  • The best source to learn QCD at colliders is the pink book by Keith Ellis, James Stirling, and Bryan Webber. It includes everything you ever wanted to know about QCD [1] and more. This QCD section essentially follows its Chap. 5.

  • A little more phenomenology you can find in Günther Dissertori, Ian Knowles and Michael Schmelling’s book [7]. Again, I borrowed some of the discussions in the QCD section from there.

  • If you would like to learn how to for example compute higher order cross sections to Drell–Yan production, Rick Field works it all out [8].

  • For those of you who are now hooked on QCD and jet physics at hadron colliders there are two comprehensive reviews by Steve Ellis [9] and by Gavin Salam [10].

  • Aimed more at perturbative QCD at the LHC is the QCD primer by John Campbell, Joey Huston, and James Stirling [11].

  • Coming to the usual brilliant TASI lectures, there are Dave Soper’s [12] and George Sterman’s [13] notes. Both of them do not exactly use my kind of notations and are comparably formal, but they are a great read if you know something about QCD already. More on the phenomenological side there are Mike Seymour’s lecture notes [14].

  • The only review on leading order jet merging is by Michelangelo Mangano and Tim Stelzer [15]. The original CKKW paper beautifully explains the general idea for final state radiation, and I follow their analysis [2]. For other approaches there is a very concise discussion included with the comparison of the different models [16].

  • To understand MC@NLO there is nothing like the original papers by Bryan Webber and Stefano Frixione [17].

  • The POWHEG method is really nicely described in the original paper by Paolo Nason [18]. Different processes you can find discussed in detail in a later paper by Stefano Frixione, Paolo Nason, and Carlo Oleari [19].

  • Even though they are just hand written and do not include a lot of text it might be useful to have a look at Michael Spira’s QCD lecture notes [20] to view some of the topics from a different angle.