1 Introduction

In 2012 the ATLAS and CMS Collaborations announced the observation [1, 2] of a new boson with a mass, \(m_\text {H}\), of about 125\(\,\text {GeV} \, \)and properties consistent, within uncertainties, with expectations for a standard model (SM) Higgs boson. The Higgs boson is the particle predicted to exist as a consequence of the spontaneous symmetry breaking mechanism acting in the electroweak sector of the SM [3–5]. This mechanism was first suggested nearly fifty years ago [6–11], and introduces a complex scalar field, which also gives masses to the fundamental fermions through a Yukawa interaction. Results using the full available dataset have recently been published by CMS [12–19], and by ATLAS [20–25].

The diphoton decay channel provides a clean final-state topology that allows the mass of the decaying object to be reconstructed with high precision. With the discovery of a low-mass Higgs boson in the diphoton channel in mind, the performance of the electromagnetic calorimeter was made a design priority for CMS. The diphoton decay is mediated by loop diagrams containing charged particles. The top quark loop and the W boson loop diagrams dominate the decay amplitude, though they contribute with opposite sign. The branching fraction is small: it reaches a maximum value of 0.23 % at \(m_\text {H}=125\,\text {GeV} \, \) and falls steeply to values less than 0.1 % above 150\(\,\text {GeV} \, \) [26]. As a consequence, the search reported in this paper is limited to the mass range \(110<m_\text {H}<150\,\text {GeV}\). Despite the small branching fraction and the presence of a large diphoton continuum background, the diphoton decay mode provides an expected signal significance for the 125\(\,\text {GeV}\, \)SM Higgs boson that is one of the highest among all the decay modes.
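As an illustration of how the mass of the decaying object follows from the two photons, the sketch below evaluates the standard invariant mass of two massless particles from their transverse momenta, pseudorapidities, and azimuthal angles. The function name and argument order are illustrative assumptions and are not taken from the analysis software.

```python
import math

def diphoton_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass of two massless particles (same units as pt).

    Uses m^2 = 2 pT1 pT2 [cosh(eta1 - eta2) - cos(phi1 - phi2)].
    """
    m2 = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(m2, 0.0))

# Two back-to-back central photons of 62.5 GeV each.
mass = diphoton_mass(62.5, 0.0, 0.0, 62.5, 0.0, math.pi)
```

Because the mass depends on both the photon energies and the opening angle, its resolution is driven by the calorimeter energy resolution and by the knowledge of the photon directions.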

This paper presents the analysis performed on the full dataset collected in 2011 and 2012, reconstructed with the final detector calibration values, in \(\mathrm {p}\mathrm {p}\) collisions at the Large Hadron Collider (LHC), with an integrated luminosity of 5.1\(\,\text {fb}^{-1}\) at a centre-of-mass energy of 7\(\,\text {TeV}\) (herein referred to as the “7\(\,\text {TeV}\) dataset”) and 19.7\(\,\text {fb}^{-1}\) at 8\(\,\text {TeV}\) (“8\(\,\text {TeV}\) dataset”). The results supersede those previously reported by CMS for this decay mode [27, 28].

The primary production mechanism of the Higgs boson at the LHC is gluon-gluon fusion (ggH) [29] with additional smaller contributions from vector boson fusion (VBF) [30] and production in association with a \(\mathrm {W}\) or \(\text {Z}\) boson (VH) [31] or a \(\text {t}\overline{\text {t}}\) pair (\(\text {t}\overline{\text {t}}\text {H}\)) [32, 33]. Events from specific production mechanisms are identified and classified by the presence of additional objects in the final state. Requiring the presence of two forward jets, in addition to the photon pair, favours events produced by the VBF mechanism, while event classes designed to preferentially select VH or \(\text {t}\overline{\text {t}}\text {H}\) production require the presence of muons, electrons, missing transverse energy from neutrinos, or jets arising from the hadronization of b quarks. To achieve the best sensitivity, the remaining events, and also the dijet events selected as having a VBF signature, are further separated using multivariate classifiers that provide measures of their probability to be signal rather than background. The signal is measured by performing a simultaneous fit to the diphoton invariant mass distributions in the various event classes. The signal model is derived from simulation, while the background is obtained from the fit to data. A very large sample of events is available in which a \(\text {Z}\) boson decays to a pair of electrons; treating the electron showers in these events as if they were from photons allows precise and detailed knowledge to be obtained concerning the accuracy of the simulation of the signal, specifically the simulation of the energy reconstruction and selection of photons, and the simulation of the selection and classification of diphoton events.

With respect to analyses of this decay mode previously reported by CMS there are refinements in methodology, which are described in the main body of the paper. In addition, the analysis uses an improved intercalibration of the electromagnetic calorimeter channels and an improved energy regression algorithm to correct the clustered energy, resulting in better energy resolution. The simulation of the signal and Z boson samples is also improved. The changes in the energy-equivalent noise in the electromagnetic calorimeter during the data-taking period are simulated, and a significantly increased time window is used to simulate the effect of deposited energy coming from interactions in earlier bunch crossings.

The paper is organized as follows. After a brief description of the CMS detector and event reconstruction in Sect. 2 and of the data and simulated samples in Sect. 3, the reconstruction and identification of photons is detailed in Sect. 4. The issue of identifying the diphoton vertex is covered in Sect. 5. In Sect. 6 the event classification is described. The section first describes the construction of a multivariate event classifier which takes as input quantities associated with the two photons, and then goes on to describe the tagging of events by the presence of objects in the final state, in addition to the photon pair, that give the event a signature characteristic of one of the production processes. It concludes by detailing the use of two multivariate event classifiers to additionally subdivide into classes both the untagged events, and the events tagged as coming from the VBF process. Sections 7 and 8 describe, respectively, the signal and background models used in the statistical procedures which provide the results of the analysis, and Sect. 9 discusses the systematic uncertainties taken into account in those procedures. Section 10 outlines three alternative analyses that use specific variations of methodology that provide corroboration of particular aspects of the main analysis. Finally, in Sect. 11 the results of the measurements of the Higgs boson production and its properties are presented and discussed.

2 CMS detector

The central feature of the CMS apparatus is a superconducting solenoid, 13\(\,\text {m}\) in length and with an inner diameter of 6\(\,\text {m}\), which provides an axial magnetic field of 3.8\(\,\text {T}\). The bore of the solenoid is instrumented with both the central tracker and the calorimeters. The steel flux-return yoke outside the solenoid hosts gas ionization detectors used to identify and reconstruct muons.

The CMS experiment uses a right-handed coordinate system, with the origin at the nominal interaction point, the \(x\) axis pointing to the centre of the LHC, the \(y\) axis pointing up (perpendicular to the LHC plane), and the \(z\) axis along the anticlockwise-beam direction. The polar angle \(\theta \) is measured from the positive \(z\) axis and the azimuthal angle \(\phi \) is measured in the \(x\)–\(y\) plane. Transverse energy, denoted by \(E_{\mathrm {T}}\), is defined as the product of energy and \(\sin \theta \), with \(\theta \) being measured with respect to the nominal interaction point. Charged-particle trajectories are measured by the silicon pixel and strip tracker, with full azimuthal coverage within \(|\eta | < 2.5\), where the pseudorapidity \(\eta \) is defined as \(\eta = -\ln [\tan (\theta /2)]\). A lead tungstate crystal electromagnetic calorimeter (ECAL) and a brass/scintillator hadron calorimeter (HCAL) surround the tracking volume and cover the region \(|\eta | < 3\). The ECAL barrel extends to \(|\eta | < 1.48\) while the ECAL endcaps cover the region \(1.48 < |\eta | < 3.0\). A lead/silicon-strip preshower detector is located in front of the ECAL endcap in the region \(1.65 < |\eta | < 2.6\). The preshower detector includes two planes of silicon sensors measuring the \(x\) and \(y\) coordinates of the impinging particles. A steel/quartz-fibre Cherenkov forward calorimeter extends the calorimetric coverage to \(|\eta | < 5.0\). In the region \(|\eta | < 1.74\), the HCAL cells have widths of 0.087 in both \(\eta \) and \(\phi \). In the \(\eta \)–\(\phi \) plane, and for \(|\eta | < 1.48\), the HCAL cells map on to 5\(\times \)5 ECAL crystal arrays to form calorimeter towers projecting radially outwards from points slightly offset from the nominal interaction point. In the endcap, the ECAL arrays matching the HCAL cells contain fewer crystals.
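The definitions of pseudorapidity and transverse energy given above can be written out as a short sketch. The function names are illustrative assumptions; angles are in radians and \(\theta \) is assumed to lie strictly between 0 and \(\pi \).

```python
import math

def pseudorapidity(theta):
    """eta = -ln[tan(theta/2)] for a polar angle theta in (0, pi)."""
    return -math.log(math.tan(theta / 2.0))

def theta_from_eta(eta):
    """Inverse relation: theta = 2 * atan(exp(-eta))."""
    return 2.0 * math.atan(math.exp(-eta))

def transverse_energy(energy, theta):
    """E_T = E * sin(theta)."""
    return energy * math.sin(theta)

# Polar angle corresponding to the edge of the ECAL barrel, |eta| = 1.48.
theta_barrel_edge = theta_from_eta(1.48)
```

A particle emitted perpendicular to the beam axis has \(\eta = 0\), and \(|\eta |\) grows without bound as the direction approaches the beam line.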

Calibration of the ECAL is achieved by exploiting the \(\phi \)-symmetry of the energy flow, and by using photons from \(\mathrm {\pi ^{0}}\rightarrow \mathrm {\gamma }\mathrm {\gamma }\) and \(\eta \rightarrow \mathrm {\gamma }\mathrm {\gamma }\) decays, and electrons from \(\mathrm {W}\rightarrow \mathrm {e}\nu \,\) and \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\,\) decays [34]. Changes in the transparency of the ECAL crystals due to irradiation during the LHC running periods and their subsequent recovery are monitored continuously, and corrected for, using light injected from a laser system [34].

The first level of the CMS trigger system, composed of custom hardware processors, uses information from the calorimeters and muon detectors to select the most interesting events in a fixed time interval of less than 4\(\,\mu \text {s}\). The high-level trigger processor farm further decreases the event rate from around 100\(\,\text {kHz}\) to around 400\(\,\text {Hz}\) before data storage.

A more detailed description of the CMS detector can be found in Ref. [35].

Reconstruction of the photons used in this analysis is described in Sect. 4, and uses a clustering of the energy recorded in the ECAL, known as a “supercluster”, which may be extended in the \(\phi \) direction to form an extended cluster or group of clusters.

The global event reconstruction (also called particle-flow event reconstruction) consists of reconstructing and identifying each particle with an optimized combination of all subdetector information [36, 37]. In this process, the identification of the particle type (photon, electron, muon, charged hadron, neutral hadron) plays an important role in the determination of the particle direction and energy. Photons are identified as ECAL energy clusters not linked to the extrapolation of any charged-particle trajectory to the ECAL. Electrons are identified as a primary charged-particle track associated with ECAL energy clusters corresponding to this track’s extrapolation to the ECAL and to possible bremsstrahlung photons emitted along the way through the tracker material. Muons are identified as a track in the central tracker consistent with either a track or several hits in the muon system, associated with less energy in the calorimeters than would be deposited by a charged hadron or electron. Charged hadrons are identified as charged-particle tracks neither identified as electrons, nor as muons. Finally, neutral hadrons are identified as HCAL energy clusters not linked to any charged hadron trajectory, or as ECAL and HCAL energy excesses with respect to the expected energy deposited by a matching charged hadron.

The energy of photons used in the global event reconstruction is directly obtained from the ECAL measurement. The energy of electrons is determined from a combination of the track momentum at the main interaction vertex, the corresponding ECAL cluster energy, and the energy sum of all bremsstrahlung photons attached to the track. The energy of muons is obtained from the corresponding track momentum. The energy of charged hadrons is determined from a combination of the track momentum and the corresponding ECAL and HCAL energy, calibrated for the nonlinear response of the calorimeters. Finally, the energy of neutral hadrons is obtained from the corresponding calibrated ECAL and HCAL energies.

For each event, hadronic jets are clustered from these reconstructed particles using the infrared- and collinear-safe anti-\(k_{\mathrm {T}}\) algorithm [38] with a size parameter of 0.5. The jet momentum is determined as the vectorial sum of all particle momenta in the jet, and the scale is found in the simulation to be within 5–10 % of the true momentum over the whole transverse momentum spectrum and detector acceptance. Jet energy corrections are derived from simulation, and are confirmed with in situ measurements using the energy balance of dijet and \(\mathrm {\gamma }/\text {Z}+\text {jet}\) events [39]. The jet energy resolution typically amounts to 15 % (8 %) at 10 (100)\(\,\text {GeV}\, \) , to be compared to about 40 % (12 %) obtained when the calorimeters alone are used for jet clustering.

To identify jets originating from the hadronization of bottom quarks, the combined secondary vertex b-tagging algorithm [40] is employed. The algorithm tags jets from b-hadron decays by identifying their displaced decay vertex. The working point of the tagging algorithm used provides an efficiency for identifying b-quark jets of about 70 % and a misidentification probability for jets from light quarks and gluons of about 1 %.

The missing transverse energy vector is taken as the negative vector sum of all reconstructed particle candidate transverse momenta in the global event reconstruction, and its magnitude is referred to as \(E_{\mathrm {T}}^{\text {miss}}\,\).
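The definition of the missing transverse energy vector can be sketched as follows, assuming each reconstructed particle candidate is given as a hypothetical `(pt, phi)` pair. This is an illustration of the vector sum only, not of the CMS reconstruction software.

```python
import math

def missing_transverse_energy(particles):
    """Return (magnitude, phi) of the negative vector sum of transverse momenta.

    `particles` is an iterable of (pt, phi) pairs, with phi in radians.
    """
    particles = list(particles)
    px = -sum(pt * math.cos(phi) for pt, phi in particles)
    py = -sum(pt * math.sin(phi) for pt, phi in particles)
    return math.hypot(px, py), math.atan2(py, px)

# Two balanced particles give (almost) no missing transverse energy.
met, met_phi = missing_transverse_energy([(30.0, 0.0), (30.0, math.pi)])
```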

3 Data sample and simulated events

The events used in the analysis were selected by diphoton triggers with asymmetric transverse energy thresholds and complementary photon selections. One selection requires a loose calorimetric identification based on the shape of the electromagnetic shower and loose isolation requirements on the photon candidates, while the other requires only that the photon candidate has a high value of the \(R_\mathrm {9}\) shower shape variable. High trigger efficiency is maintained by allowing both photons to satisfy either selection. The \(R_\mathrm {9}\) variable is defined as the energy sum of 3\(\times \)3 crystals centred on the most energetic crystal in the supercluster divided by the energy of the supercluster. Photons that convert before reaching the calorimeter tend to have wider showers and lower values of \(R_\mathrm {9}\) than unconverted photons. To cover the entire data taking period two trigger threshold configurations are used: \(E_{\mathrm {T}}\,>26\ (18)\,\text {GeV}\, \) on the leading (trailing) photon, and \(E_{\mathrm {T}}\,>36\ (22)\,\text {GeV}\, \) . The measured trigger efficiency is \(99.4~\%\) for events satisfying the diphoton preselection required for events entering the analysis, as described in Sect. 4.
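The definition of \(R_\mathrm {9}\) can be illustrated with a toy calculation. The sketch assumes that crystal energies are stored in a dictionary keyed by hypothetical integer `(ieta, iphi)` indices on a regular grid, and it ignores detector details such as the wrap-around in \(\phi \) and the actual CMS crystal numbering.

```python
def r9(crystal_energies, supercluster_energy):
    """Energy in the 3x3 array around the most energetic crystal,
    divided by the supercluster energy.

    `crystal_energies` maps (ieta, iphi) integer indices to energies.
    """
    seed = max(crystal_energies, key=crystal_energies.get)
    e3x3 = sum(
        crystal_energies.get((seed[0] + di, seed[1] + dj), 0.0)
        for di in (-1, 0, 1)
        for dj in (-1, 0, 1)
    )
    return e3x3 / supercluster_energy

# Toy supercluster: most energy near the seed, a small deposit further away.
toy = {(0, 0): 80.0, (0, 1): 10.0, (1, 0): 5.0, (0, 3): 5.0}
toy_r9 = r9(toy, supercluster_energy=100.0)
```

In this toy case the deposit three crystals away in \(\phi \) lies outside the 3\(\times \)3 array, so \(R_\mathrm {9}\) is below one, as expected for a shower spread in \(\phi \) by a conversion.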

The Monte Carlo (MC) simulation of detector response employs a detailed description of the CMS detector, and uses \({\textsc {geant}}\,4\) version 9.4 (patch 03) [41]. Simulated events include simulation of the multiple \(\mathrm {p}\mathrm {p}\) interactions taking place in each bunch crossing and are weighted to reproduce the distribution of the number of interactions in data. They thus simulate the effects of pileup, i.e. the presence of signals from multiple \(\mathrm {p}\mathrm {p}\) interactions, in multiple bunch crossings, in each recorded event. The interactions used to simulate pileup are generated with the same versions of pythia [42], 6.424 or 6.426, that are used for other purposes as described below. The pythia tunes used for the underlying event activity are Z2 and Z2* for the 7 and 8\(\,\text {TeV}\ \)samples, respectively [43]. Simulated Higgs boson signal events are used both for training of multivariate discriminants and to construct the signal model used in the statistical procedures employed to extract the results. Sufficient samples have been produced to ensure that the samples of simulated signal events used for construction of the signal model (Sect. 7) are not used for training the multivariate discriminants. The MC signal event samples for the ggH and VBF processes are obtained using the next-to-leading order (NLO) matrix-element generator powheg (version 1.0) [44–48] interfaced with pythia. For the 7\(\,\text {TeV}\ \)samples, events are weighted so that the transverse momentum spectrum of Higgs bosons produced by the ggH process agrees with the next-to-next-to-leading logarithm + NLO distribution computed by hqt (version 1.0) [49–51]. At 8\(\,\text {TeV}\), powheg has been tuned following the recommendations of the LHC Higgs Cross Section Working Group [52] and reproduces the hqt spectrum.
The ggH process cross section is reduced by 2.5 % for all values of \(m_\text {H}\) to account for the interference with nonresonant diphoton production [53]. For the VH and \(\text {t}\overline{\text {t}}\text {H}\) processes pythia is used alone; processes are generated at leading-order by pythia, and higher order diagrams are accounted for only by pythia’s “parton showering” model. The SM Higgs boson cross sections and branching fractions used are taken from Ref. [54]. Samples used for the testing of spin hypotheses were generated with leading-order accuracy by jhugen [55, 56], interfaced to pythia.
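The weighting of simulated events to reproduce the distribution of the number of interactions in data can be sketched as a ratio of normalized histograms. The function name and the histogram contents are illustrative assumptions; the binning and inputs used in the analysis are not specified here.

```python
def pileup_weights(data_hist, mc_hist):
    """Per-bin weights w[n] = P_data(n) / P_mc(n).

    Both inputs are unnormalized histograms of the number of interactions,
    with identical binning. Bins that are empty in simulation get weight 0.
    """
    data_total = float(sum(data_hist))
    mc_total = float(sum(mc_hist))
    weights = []
    for n_data, n_mc in zip(data_hist, mc_hist):
        if n_mc > 0:
            weights.append((n_data / data_total) / (n_mc / mc_total))
        else:
            weights.append(0.0)
    return weights

# Toy example with three bins in the number of interactions.
data_counts = [10, 30, 60]
mc_counts = [50, 30, 20]
weights = pileup_weights(data_counts, mc_counts)
```

Each simulated event is then assigned the weight of the bin corresponding to its number of interactions, so that the weighted simulated distribution has the same shape as the one in data.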

Simulated samples of \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\) , \(\text {Z}\rightarrow \mathrm {\mu ^+}\mathrm {\mu ^-}\) , and \(\text {Z}\rightarrow \mathrm {\mu ^+}\mathrm {\mu ^-}\mathrm {\gamma }\) events used for comparison with data, and for the derivation of energy scale and resolution smearing corrections are generated with MadGraph, sherpa, and powheg  [57], allowing comparisons to be made between the different generators.

Simulated background samples are used only for training multivariate discriminants and defining selection and classification criteria. The background is simulated using a combination of samples. At \(\sqrt{s}=7\,\text {TeV}\ \) the diphoton processes are simulated using a combination of MadGraph 5 [58] interfaced to pythia for processes apart from the gluon-fusion box diagram, and pythia alone for the box diagram. At \(\sqrt{s}=8\,\text {TeV}\ \) the diphoton continuum processes involving two prompt photons are simulated using sherpa 1.4.2 [59]. The sherpa samples give a noticeably improved description of diphoton continuum events accompanied by one or two jets, and enable training of a more effective multivariate discriminant in the case of diphoton-plus-dijet events. The remaining processes, where one of the photon candidates arises from misidentified jet fragments, are simulated using pythia alone. The cross sections of these processes are scaled by \(K\)-factors derived from CMS measurements [60, 61].

4 Photon reconstruction and identification

Photon candidates for the analysis are reconstructed from energy deposits in the ECAL using algorithms that constrain the superclusters in \(\eta \) and \(\phi \) to the shapes expected from electrons and photons with high \(p_{\mathrm {T}}\). The algorithms do not make any hypothesis as to whether the particle originating from the interaction point is a photon or an electron; when reconstructed in this way, electrons from \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\,\) events provide measurements of the photon trigger, reconstruction, and identification efficiencies, and of the photon energy scale and resolution. The clustering algorithms achieve a rather complete (\(\approx \)95 %) collection of the energy of photons and electrons, even those that undergo conversion and bremsstrahlung in the material in front of the ECAL. In the barrel region, superclusters are formed from five-crystal-wide strips in \(\eta \), centred on the locally most energetic crystal (seed), and have a variable extension in \(\phi \). In the endcaps, where the crystals are arranged according to an \(x\)–\(y\) rather than an \(\eta \)–\(\phi \) geometry, matrices of 5\(\times \)5 crystals, which may partially overlap and are centred on a locally most energetic crystal, are summed if they lie within a narrow \(\phi \) road. The photon candidates are required to be within the fiducial region \(|\eta |<2.5\), excluding the barrel-endcap transition region \(1.44 < |\eta | < 1.57\), where the photon reconstruction is suboptimal. The fiducial region requirement is applied to the supercluster position in the ECAL, i.e. the value of \(\eta \) is calculated with respect to the origin of the coordinate system. The exclusion of the barrel-endcap transition region ensures complete clustering of the accepted showers in either the ECAL barrel or endcaps.
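The fiducial requirement described above amounts to a simple condition on the supercluster pseudorapidity. The sketch below is illustrative; the function name is an assumption, and the treatment of values exactly on a boundary follows the strict inequalities as written in the text.

```python
def in_fiducial_region(supercluster_eta):
    """True if |eta| < 2.5 and outside the transition region 1.44 < |eta| < 1.57."""
    abs_eta = abs(supercluster_eta)
    if abs_eta >= 2.5:
        return False
    return not (1.44 < abs_eta < 1.57)

accepted = [eta for eta in (0.3, -1.50, 1.60, -2.2, 2.7) if in_fiducial_region(eta)]
```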

About half of the photons convert in the material upstream of the ECAL. If the resulting charged particle tracks originate sufficiently close to the interaction point so as to pass through three or more tracking layers, conversion track pairs may be reconstructed and matched to the photon candidate.

4.1 Photon energy

The photon energy is computed from the signals recorded by the ECAL. In the region covered by the preshower detector (\(|\eta | > 1.65\)) the signals recorded in it are also considered. In order to obtain the best energy resolution, the calorimeter signals are calibrated and corrected for several detector effects [34]. The variation of crystal transparency during the run is continuously monitored and corrected for using a factor based on the measured change in response to the light from the laser system, with the response for each crystal being computed approximately every 40 minutes. The single-channel response of the ECAL is equalized exploiting the \(\phi \)-symmetry of the energy flow, the mass constraint on the energy of the two photons in \(\mathrm {\pi ^{0}}\) and \(\mathrm {\eta }\) decays, and the momentum constraint on the energy of isolated electrons from \(\mathrm {W}\)- and \(\text {Z}\)-boson decays. Finally, the containment of the shower in the clustered crystals, the shower losses for photons that convert in the material upstream of the calorimeter, and the effects of pileup, are corrected using a multivariate regression technique. The photon energy response distribution is parameterized by a function with a Gaussian core and two power law tails, an extended form of the Crystal Ball function [62]. The regression provides a per-photon estimate of the parameters of the function, and therefore a prediction of the distribution of the ratio of true energy to uncorrected supercluster energy. The most probable value of this distribution is taken as the corrected photon energy. The width of the Gaussian core is further used as a per-photon estimator of the energy uncertainty. 
The regression input variables are a collection of shower shape variables including \(R_\mathrm {9}\) of the supercluster, the ratio of the 5\(\times \)5 crystal energy centred around the seed crystal to the uncorrected supercluster energy sum, the energy-weighted \(\eta \)-width and \(\phi \)-width of the supercluster, and the ratio between the hadronic energy behind the supercluster and the electromagnetic energy of the cluster. The global \(\eta \) coordinate of the supercluster is included, and for the barrel the global \(\phi \) coordinate and the coordinates of the seed cluster with respect to the crystal centre are also included. In the endcap, the ratio of preshower energy to raw supercluster energy is included. Finally, the number of primary vertices and the median energy density \(\rho \) [63] in the event are included in order to allow for the correction of residual energy scale effects due to pileup.
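A function with a Gaussian core and two power-law tails can be sketched as below. The exact parameterization used by the regression is not given in the text, so this is one common unnormalized form of a double-sided Crystal Ball function; the parameter names (`alpha_lo`, `n_lo`, `alpha_hi`, `n_hi`) are assumptions introduced for illustration.

```python
import math

def double_crystal_ball(x, mean, sigma, alpha_lo, n_lo, alpha_hi, n_hi):
    """Unnormalized Gaussian core with power-law tails on both sides.

    The core applies for -alpha_lo <= t <= alpha_hi, with t = (x - mean) / sigma.
    The tails are matched so that the function is continuous at both boundaries.
    """
    t = (x - mean) / sigma
    if t < -alpha_lo:
        norm = math.exp(-0.5 * alpha_lo ** 2)
        base = (alpha_lo / n_lo) * (n_lo / alpha_lo - alpha_lo - t)
        return norm * base ** (-n_lo)
    if t > alpha_hi:
        norm = math.exp(-0.5 * alpha_hi ** 2)
        base = (alpha_hi / n_hi) * (n_hi / alpha_hi - alpha_hi + t)
        return norm * base ** (-n_hi)
    return math.exp(-0.5 * t * t)

# Response of a photon whose true-to-raw energy ratio peaks at 1.0
# with a 1 % core width (toy values).
peak_value = double_crystal_ball(1.0, 1.0, 0.01, 1.5, 3.0, 2.0, 4.0)
```

In this form the most probable value is `mean` and the width of the Gaussian core is `sigma`, which correspond to the two quantities the text uses as the corrected energy scale and the per-photon uncertainty estimate.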

A multistep procedure has been implemented to correct the energy scale in data, and to determine the parameters of Gaussian smearing to be applied to showers in simulated events so as to reproduce the energy resolution seen in data. First, the energy scale in data is equalized with that in simulated events, and residual long-term drifts in the response are corrected, using \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\,\) decays in which the electron showers are reconstructed as photons. The data are corrected as a function of the time at which they were taken, using 8 epochs in the 7\(\,\text {TeV}\ \)dataset and 51 epochs in the 8\(\,\text {TeV}\ \)dataset. Following this, the photon energy resolution predicted by the simulation is made more realistic by adding a Gaussian smearing determined from the comparison between the \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\,\) line-shape in data and in simulated events. The amount of smearing required is extracted differentially in \(|\eta |\) (two bins in the barrel and two in the endcap) and \(R_\mathrm {9}\) (two bins). In the fits from which the required amount of smearing is extracted, the data energy scale is allowed to float, and a residual scale correction for the data is extracted in the same eight bins. A sufficient number of \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\,\) events is available in the 8\(\,\text {TeV}\ \)data to allow a third step, in which the energy scale for the ECAL barrel is further corrected in 20 bins defined by ranges in \(|\eta |\), \(R_\mathrm {9}\), and \(E_{\mathrm {T}}\,\), and the smearing magnitude is allowed to have an energy dependence; the additional energy resolution (\(\sigma /E\)) is parameterized as the quadratic sum of a constant term and a term proportional to \(1/\sqrt{E_{\mathrm {T}}\,}\), and the relative magnitude of the two components extracted from the fits.
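The energy-dependent smearing described for the 8 TeV barrel can be sketched as follows. The symbols `const_term` and `stoch_term` are assumed names for the constant term and for the coefficient of the \(1/\sqrt{E_{\mathrm {T}}}\) term; their values come from the fits and are not reproduced here.

```python
import math
import random

def extra_resolution(et, const_term, stoch_term):
    """Additional sigma/E: quadratic sum of a constant term and a term
    proportional to 1/sqrt(E_T)."""
    return math.sqrt(const_term ** 2 + stoch_term ** 2 / et)

def smear_energy(energy, et, const_term, stoch_term, rng=random):
    """Multiply a simulated energy by a Gaussian factor centred on 1."""
    width = extra_resolution(et, const_term, stoch_term)
    return energy * rng.gauss(1.0, width)

# Toy values: 1 % constant term and a 5 % * sqrt(GeV) stochastic term.
sigma_rel = extra_resolution(et=25.0, const_term=0.01, stoch_term=0.05)
```

With these toy values the two contributions are equal at \(E_{\mathrm {T}} = 25\) GeV, and the constant term dominates at higher \(E_{\mathrm {T}}\).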

Figure 1 shows the invariant mass of electron pairs reconstructed in \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\,\) events in the 8\(\,\text {TeV}\ \)data and in simulated events in which the electron showers are reconstructed as photons, and the full set of corrections to the data, and smearings of the simulated energies, are applied. The selection applied to the diphoton candidates is the same, apart from the inversion of the electron veto, as is applied to diphoton candidates entering the analysis (as described in Sect. 6). There is excellent agreement between the data and the simulation in the core of the distributions. A slight discrepancy is present in the low-mass tail in the endcaps, where the Gaussian smearing is not enough to account for some noticeable non-Gaussian energy loss. The mass peaks are shifted from the true \(\text {Z}\)-boson mass, both in data and simulation, because the electron showers are reconstructed as photons.

Fig. 1

Invariant mass of \(\mathrm {e}^+\mathrm {e}^-\) pairs in \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\,\) events in the 8\(\,\text {TeV}\ \)data (points), and in simulated events (histogram), in which the electron showers are reconstructed as photons, and the full set of photon corrections and smearings are applied. The comparison is shown for (left) events with both showers in the barrel, and (right) the remaining events. For each bin, the ratio of the number of events in data to the number of simulated events is shown in the lower main plot

4.2 Photon preselection

The continuum background to the \(\text {H}\rightarrow \gamma \gamma \) process is mainly due to prompt diphoton production, with a reducible contribution from \(\mathrm {p}\mathrm {p}\rightarrow \gamma + \text {jet}\) and dijet processes where at least one of the objects reconstructed as a photon comes from a jet. Typically these photon candidates come from one or more neutral mesons that take a substantial fraction of the total jet \(p_{\mathrm {T}}\) and are thus relatively isolated from hadronic activity in the detector. In the transverse momentum range of interest, the photons from neutral pion decays are rather collimated and are reconstructed as a single photon. In the events used for the analysis, i.e. after all selection and classification criteria are applied, MC simulation predicts that about 70 % of the total background is due to the irreducible prompt diphoton production.

The photons entering the analysis are required to satisfy preselection criteria similar to, but slightly more stringent than, the trigger requirements. These consist of

  • \(p_\mathrm {T}^{\gamma 1}\,>33\,\text {GeV}\, \) and \(p_\mathrm {T}^{\gamma 2}\,>25\,\text {GeV}\, \) , where \(p_\mathrm {T}^{\gamma 1}\) and \(p_\mathrm {T}^{\gamma 2}\) are the transverse momenta of the leading (in \(p_{\mathrm {T}}\,\)) and subleading photons, respectively.

  • a selection on the hadronic leakage of the shower, measured as the ratio of energy in HCAL cells behind the supercluster to the energy in the supercluster,

  • a loose selection based on isolation and the shape of the shower,

  • an electron veto, which removes the photon candidate if its supercluster is matched to an electron track with no missing hits in the innermost tracker layers, thus excluding almost all \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\,\) events.

The selection requirements are applied with different stringency in four categories defined to match the different selections used in the trigger. The four categories are shown in Table 1.

Table 1 Photon preselection efficiencies for both the 7 and 8\(\,\text {TeV}\ \)datasets measured for \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\,\) events, where the electrons are reconstructed as photons, in four photon categories. The statistical uncertainties in the efficiencies found in simulated events are negligible, and the uncertainties measured in data are discussed in the text

The efficiency of the photon preselection is measured in data using a “tag-and-probe” technique [64]. The efficiency of all preselection criteria, except the electron veto requirement, is measured using \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\,\) events. The efficiency for photons to satisfy the electron veto requirement is measured using \(\text {Z}\rightarrow \mathrm {\mu ^+}\mathrm {\mu ^-}\mathrm {\gamma }\,\) events, in which the photon is produced by final-state radiation, which provide a more than \(99~\%\) pure source of prompt photons. The ratio of the photon efficiency measured in data to that found in simulated \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\,\) events, \(\epsilon _\text {data}/\epsilon _\mathrm {MC}\), is consistent with unity in all categories. The complete set of efficiencies, in data and in simulated \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\,\) events, and the ratios \(\epsilon _\text {data}/\epsilon _\mathrm {MC}\), are shown in Table 1. The systematic uncertainty in the measurement is included in both the efficiencies and the ratio. The statistical uncertainties in the efficiencies measured in simulated events are negligible. The measured \(\epsilon _\text {data}/\epsilon _\mathrm {MC}\) ratios are used to correct the simulated signal sample, and the associated uncertainties are taken into account as systematic uncertainties in the signal extraction procedure. For photons in simulated Higgs boson events the efficiency of the preselection criteria in the four categories ranges from 92 to 99 %.
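The ratio \(\epsilon _\text {data}/\epsilon _\mathrm {MC}\) used to correct the simulation can be illustrated with a simple counting sketch. It assumes background-free probe counts and uncorrelated binomial uncertainties, which is a simplification; the actual tag-and-probe measurement [64] and its systematic uncertainties are not reproduced here, and the function names are hypothetical.

```python
import math

def efficiency(n_pass, n_total):
    """Efficiency and its binomial uncertainty from probe counts."""
    eff = n_pass / n_total
    err = math.sqrt(eff * (1.0 - eff) / n_total)
    return eff, err

def scale_factor(eff_data, err_data, eff_mc, err_mc=0.0):
    """Ratio eff_data / eff_mc with uncorrelated relative errors added in quadrature."""
    ratio = eff_data / eff_mc
    rel_err = math.hypot(err_data / eff_data, err_mc / eff_mc)
    return ratio, ratio * rel_err

# Toy counts: 950 of 1000 probes pass in data, 9600 of 10000 in simulation.
eff_d, err_d = efficiency(950, 1000)
eff_m, err_m = efficiency(9600, 10000)
sf, sf_err = scale_factor(eff_d, err_d, eff_m, err_m)
```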

4.3 Photon identification

A boosted decision tree (BDT), implemented using the tmva [65] framework, is trained to separate prompt photons from photon candidates resulting from misidentification of jet fragments passing the preselection requirements. The following variables are used as inputs to the photon identification BDT:

  1. Lateral shower shape variables, six of which use data from the ECAL crystals, and one of which measures the shower spread in the preshower detector (where it is present). The shape variables obtained in the MC simulation are compared to those observed in \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\,\) and \(\text {Z}\rightarrow \mathrm {\mu ^+}\mathrm {\mu ^-}\mathrm {\gamma }\,\) data samples. No significant differences are observed.

  2. Isolation variables, based on the particle-flow algorithm [37], and using sums of the \(p_{\mathrm {T}}\,\) of photons, and of charged hadrons, within regions of \(\Delta R<0.3\) around the candidate, where \(\Delta R=\sqrt{{(\Delta \phi )^2+(\Delta \eta )^2}}\). Two charged-hadron isolation variables are used: one that considers charged hadrons coming from the vertex chosen for the event (described in Sect. 5), and one that is the largest of all such \(p_{\mathrm {T}}\,\) sums among those made for each reconstructed vertex. The second variable is effective when a photon candidate originating from misidentification of jet fragments comes from a vertex other than the chosen one (Sect. 5 describes the vertex choice).

  3. The energy median density per unit area in the event, \(\rho \). This variable is introduced to allow the BDT classifier to take into account the pileup dependence of the isolation variables.

  4. The pseudorapidity and energy of the supercluster corresponding to the reconstructed photon. These variables are introduced to allow the dependence of the shower topology and isolation variables on \(\eta \) and \(p_{\mathrm {T}}\) to be taken into account.
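The cone-based isolation sums in the list above can be sketched as follows. This is only an illustration of the \(\Delta R<0.3\) sum and of the "largest sum over all vertices" variable; the dictionary layout and function names are assumptions, and any refinements used in the actual particle-flow isolation are not modelled.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into the interval (-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)  # now in [0, 2*pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi


def delta_r(eta1, phi1, eta2, phi2):
    """Delta R = sqrt(dphi^2 + deta^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def isolation_sum(cand, particles, cone=0.3):
    """Sum the pT of all particles within the cone around the candidate."""
    return sum(
        p["pt"]
        for p in particles
        if delta_r(cand["eta"], cand["phi"], p["eta"], p["phi"]) < cone
    )


def worst_vertex_isolation(cand, hadrons_by_vertex, cone=0.3):
    """Largest charged-hadron sum among the sums made for each vertex."""
    return max(
        (isolation_sum(cand, hadrons, cone) for hadrons in hadrons_by_vertex.values()),
        default=0.0,
    )


# Hypothetical candidate and charged hadrons grouped by vertex index.
photon = {"eta": 0.0, "phi": 3.1}
hadrons = {
    0: [{"pt": 2.0, "eta": 0.0, "phi": -3.1}],  # close in phi after wrapping
    1: [{"pt": 7.0, "eta": 0.1, "phi": 3.0}],
}
print(isolation_sum(photon, hadrons[0]), worst_vertex_isolation(photon, hadrons))
```

The wrapping in `delta_phi` matters for candidates near \(\phi =\pm \pi \), where a naive difference would place nearby particles outside the cone.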

Figure 2 shows the photon identification BDT score of the lower-scoring photon in diphoton pairs with an invariant mass, \(m_{\gamma \gamma }\), in the range \(100<m_{\gamma \gamma }<180\,\text {GeV} \, \) , for events passing the preselection in the 8\(\,\text {TeV}\ \)dataset and for simulated background events (histogram with shaded error bands showing the statistical uncertainty). The tall histogram on the right corresponds to simulated Higgs boson signal events. Although the simulated background events are only used for training the BDT, it is worth noting that the agreement of their BDT score distribution with that in data is good. The bump that can be seen in both distributions at a BDT score of slightly above 0.1 corresponds to events where both photons are prompt and, therefore, signal-like.

Fig. 2

Photon identification BDT score of the lower-scoring photon of diphoton pairs with an invariant mass in the range \(100<m_{\gamma \gamma }<180\,\text {GeV} \, \) ,  for events passing the preselection in the 8\(\,\text {TeV}\ \)dataset (points), and for simulated background events (histogram with shaded error bands showing the statistical uncertainty). Histograms are also shown for different components of the simulated background, in which there are either two, one, or zero prompt signal-like photons. The tall histogram on the right (righthand vertical axis) corresponds to simulated Higgs boson signal events

The agreement between data and simulation for photon identification is assessed using electrons from \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\,\) decays, photons from \(\text {Z}\rightarrow \mathrm {\mu ^+}\mathrm {\mu ^-}\mathrm {\gamma }\,\) decays, and the highest-\(p_{\mathrm {T}}\) photon in diphoton events with \(m_{\gamma \gamma }> 160\,\text {GeV}\, \) in which the relative magnitude of the contribution from misidentified jet fragments is small. Figure 3 shows a comparison of the photon identification BDT score for \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\,\) electron showers reconstructed as photons in the barrel, for data and MC simulated events. The events must pass all the preselection requirements, but the electron veto condition is inverted. The systematic uncertainty assigned to the photon identification BDT score is shown as a band, and corresponds to a shift of \(\pm \)0.01 in the score. The comparison is made for the 8\(\,\text {TeV}\ \)dataset, and is shown for two sets of events with different numbers of primary vertices, \(N_\mathrm {vtx}\), to demonstrate the independence of the result from effects coming from pileup. The differences between the distributions for the data and the simulation fall within the assigned systematic uncertainties for both the lower-pileup (\(N_\mathrm {vtx}\le 15\)) and higher-pileup (\(N_\mathrm {vtx}>15\)) sets of events, and the difference between the distributions in the two sets is negligible.

Fig. 3

Comparison of the photon identification BDT score for electron showers in the barrel in \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\,\) events in the 8\(\,\text {TeV}\ \)dataset and MC simulated events, for events passing the preselection, but with the electron veto condition inverted. The systematic uncertainty assigned to the photon identification BDT score is shown as a band. The comparison is shown for two sets of events with different numbers of primary vertices, \(N_\mathrm {vtx}\). For each bin, the ratio of the number of events in data to the number of simulated events is shown in the lower plot

5 Diphoton vertex

The mean number of \(\mathrm {p}\mathrm {p}\) interactions per bunch crossing is 9 in the 7\(\,\text {TeV}\ \)dataset and 21 in the 8\(\,\text {TeV}\ \)dataset. In the longitudinal direction, \(z\), the interaction vertices, built from the reconstructed tracks, have a distribution with an rms spread of about 6 (5)\(\,\text {cm}\) in the 7 (8)\(\,\text {TeV}\ \)dataset.

The diphoton mass resolution has contributions from the resolution of the measurement of the photon energies and the measurement of the angle between the two photons. If the vertex from which the photons originate is known to within about 10\(\,\text {mm}\), then the experimental resolution on the angle between them makes a negligible contribution to the mass resolution. Thus, if the diphoton is associated with the charged particle vertex corresponding to the interaction in which it originated, then the mass resolution will be dominated by the photon energy resolution, since the longitudinal coordinate of the charged particle vertices is known to a precision better than 10\(\,\text {mm}\).
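The dependence of the reconstructed mass on the assumed vertex position can be illustrated with the massless two-body relation \(m_{\gamma \gamma }=\sqrt{2E_1E_2(1-\cos \alpha )}\), where \(\alpha \) is the opening angle between the photons. The sketch below takes each photon direction as the line from the assumed vertex (on the beam axis) to the calorimeter position; the geometry, energies, and function names are illustrative assumptions and do not correspond to any event in the analysis.

```python
import math


def direction(vertex_z, hit):
    """Unit vector from a vertex at (0, 0, vertex_z) to a calorimeter position."""
    dx, dy, dz = hit[0], hit[1], hit[2] - vertex_z
    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    return (dx / norm, dy / norm, dz / norm)


def diphoton_mass(e1, hit1, e2, hit2, vertex_z):
    """Invariant mass of two massless photons for a given vertex hypothesis."""
    u1 = direction(vertex_z, hit1)
    u2 = direction(vertex_z, hit2)
    cos_alpha = sum(a * b for a, b in zip(u1, u2))
    return math.sqrt(2.0 * e1 * e2 * (1.0 - cos_alpha))


# Hypothetical photons: energies in GeV, positions in cm at an assumed
# calorimeter radius of 129 cm.
E1, HIT1 = 63.0, (129.0, 0.0, 50.0)
E2, HIT2 = 63.0, (-129.0, 0.0, -20.0)

m_true = diphoton_mass(E1, HIT1, E2, HIT2, vertex_z=0.0)
m_1cm = diphoton_mass(E1, HIT1, E2, HIT2, vertex_z=1.0)  # 10 mm off
m_5cm = diphoton_mass(E1, HIT1, E2, HIT2, vertex_z=5.0)  # about one rms spread off
print(m_true, m_1cm - m_true, m_5cm - m_true)
```

In this toy configuration a 10 mm displacement shifts the mass by well under 0.2 %, while a displacement comparable to the rms spread of the interaction region gives a visibly larger shift.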

5.1 Diphoton vertex identification

No charged particle tracks result from photons that do not convert, so the diphoton vertex is identified indirectly, using the kinematic properties of the diphoton system and its correlations with the kinematic properties of the recoiling tracks. If either of the photons converts, the direction of the resulting tracks can provide additional information.

Three discriminating variables are calculated for each reconstructed primary vertex: the sum of the squared transverse momenta of the charged particle tracks associated with the vertex, and two variables that quantify the vector and scalar balance of \(p_{\mathrm {T}}\) between the diphoton system and the charged particle tracks associated with the vertex. The three variables are:

  1. \(\sum |{\varvec{p}}_{\mathrm {T}}|^2\),

  2. \(-\sum ({\varvec{p}}_{\mathrm {T}}\,\cdot \frac{\varvec{p}^{\mathrm {\gamma }\mathrm {\gamma }}_\mathrm {T}\,}{|\varvec{p}^{\mathrm {\gamma }\mathrm {\gamma }}_\mathrm {T}\,|})\), and

  3. \((|\sum {\varvec{p}}_{\mathrm {T}}\,| - |\varvec{p}^{\mathrm {\gamma }\mathrm {\gamma }}_\mathrm {T}\,|)/(|\sum {\varvec{p}}_{\mathrm {T}}\,| + |\varvec{p}^{\mathrm {\gamma }\mathrm {\gamma }}_\mathrm {T}\,|)\),

where the sums are over the transverse momentum vectors of the charged tracks, \({\varvec{p}}_{\mathrm {T}}\,\), and \(\varvec{p}^{\mathrm {\gamma }\mathrm {\gamma }}_\mathrm {T}\,\) is the transverse momentum vector of the diphoton system. In addition, if either photon is associated with any charged particle tracks that have been identified as resulting from conversion, then a further variable, \(g_\text {conv}\,\), is used, as defined below. An estimate of the primary vertex longitudinal position, \(z_\mathrm {e}\,\), is obtained from the conversion track(s), and the additional variable \(g_\text {conv}\,\) is defined as the pull between \(z_\mathrm {e}\,\) and the longitudinal position of the reconstructed vertex, \(z_\mathrm {vtx}\,\): \(g_\text {conv}\,=|z_\mathrm {e}\,-z_\mathrm {vtx}\, |/\sigma \), where \(\sigma \) is the uncertainty in \(z_\mathrm {e}\,\). The variables are used as the inputs to a multivariate system based on a BDT to choose the reconstructed vertex to be associated with the diphoton system.
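The three per-vertex variables and the conversion pull defined above can be written out as a short sketch. The variable names (`sumpt2`, `ptbal`, `ptasym`) and the input layout are labels chosen here for illustration; the BDT that combines them is not reproduced.

```python
import math


def vertex_variables(tracks, diphoton_pt):
    """Compute the three discriminating variables for one vertex.

    tracks      -- list of (px, py) for charged tracks associated with the vertex
    diphoton_pt -- (px, py) of the diphoton system
    """
    gx, gy = diphoton_pt
    g_mag = math.hypot(gx, gy)

    # 1. Sum of squared track transverse momenta.
    sumpt2 = sum(px * px + py * py for px, py in tracks)

    # 2. Minus the sum of track pT projected on the diphoton pT direction.
    ptbal = -sum((px * gx + py * gy) / g_mag for px, py in tracks)

    # 3. Asymmetry between |vector sum of track pT| and |diphoton pT|.
    sx = sum(px for px, _ in tracks)
    sy = sum(py for _, py in tracks)
    s_mag = math.hypot(sx, sy)
    ptasym = (s_mag - g_mag) / (s_mag + g_mag)

    return sumpt2, ptbal, ptasym


def conversion_pull(z_e, z_vtx, sigma):
    """g_conv = |z_e - z_vtx| / sigma, with sigma the uncertainty in z_e."""
    return abs(z_e - z_vtx) / sigma


# Hypothetical vertex whose tracks recoil exactly against the diphoton system.
print(vertex_variables([(-30.0, 0.0), (-10.0, 0.0)], (40.0, 0.0)))
print(conversion_pull(z_e=1.2, z_vtx=0.9, sigma=0.15))
```

For a vertex whose tracks balance the diphoton system, the second variable is large and positive and the third is close to zero, which is the pattern expected for the true vertex.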

The vertex finding efficiency, defined as the efficiency that the chosen vertex is within 10\(\,\text {mm}\) of the true vertex location, has been measured using \(\text {Z}\rightarrow \mathrm {\mu ^+}\mathrm {\mu ^-}\,\) events. The performance of the algorithm is evaluated after re-reconstruction of the vertices following removal of the muon tracks, so that the event mimics a diphoton event. The use of tracks from a converted photon to locate the vertex is validated in \(\gamma + \text {jet}\) events. In both cases the ratio of the efficiency measured in data to that measured in MC simulation is within 1 % of unity when viewed as a function of the number of vertices in the event. When viewed as a function of the \(\text {Z}\)-boson \(p_{\mathrm {T}}\,\), the deviation of the ratio from unity increases to a few percent in the region where \(p_{\mathrm {T}}^\mathrm {Z}\,<15\,\text {GeV} \, \) . The measured ratio as a function of the \(\text {Z}\)-boson \(p_{\mathrm {T}}\,\) is used as a correction to the vertex finding efficiency in simulated Higgs boson signal events. The vertex finding efficiency for a Higgs boson of mass 125\(\,\text {GeV} \, \), integrated over its \(p_{\mathrm {T}}\) spectrum, is computed to be 85.4 (79.6) % in the 7 (8)\(\,\text {TeV}\ \)dataset. Figure 4 shows the efficiency with which a diphoton system is assigned to a vertex reconstructed within 10\(\,\text {mm}\) of the true diphoton vertex in simulated Higgs boson events (\(m_\text {H}= 125\,\text {GeV}\)) in the 8\(\,\text {TeV}\ \)dataset, as a function of the transverse momentum of the diphoton system.

Fig. 4

Fraction of diphoton vertices (solid points) assigned, by the vertex assignment BDT, to a reconstructed vertex within 10\(\,\text {mm}\) of their true location in simulated Higgs boson events, \(m_\text {H}= 125\,\text {GeV} \, \) , \(\sqrt{s}=8\,\text {TeV}\ \), as a function of \(p_{\mathrm {T}}^{\gamma \gamma }\,\). Also shown is a band, the centre of which is the mean prediction, from the vertex probability BDT (described in Sect. 5.2), of the probability of correctly locating the vertex. The mean is calculated in \(p_{\mathrm {T}}^{\gamma \gamma }\)  bins, and the width of the band represents the event-to-event uncertainty in the estimates

5.2 Per-event vertex probability

A second vertex-related multivariate discriminant has been designed to estimate, event-by-event, the probability for the vertex assignment to be within 10\(\,\text {mm}\) of the diphoton interaction point. This, in conjunction with the event-by-event estimate of the energy resolution of each photon, is used to estimate the diphoton mass resolution for each individual event, and this estimate is used in the event classification, as described in Sect. 6. The inputs of the vertex probability BDT are

  • the values of the vertex identification BDT output for the three most likely vertices in the event,

  • the total number of reconstructed vertices in the event,

  • the transverse momentum of the diphoton system, \(p_{\mathrm {T}}^{\gamma \gamma }\,\),

  • the distances between the chosen vertex and the second- and third-best vertices,

  • the number of photons with an associated conversion track or tracks.

The vertex probability BDT is tested with simulated signal events as shown in Fig. 4, and the performance in data is tested using \(\text {Z}\rightarrow \mathrm {\mu ^+}\mathrm {\mu ^-}\,\) events. Validation of the vertex probability BDT for events in which conversion tracks are present is achieved using \(\gamma + \text {jet}\) events in which one or more conversion tracks are reconstructed. The probability of identifying a vertex within 10\(\,\text {mm}\) of the true one (the vertex probability) has a linear relationship with the vertex probability BDT score, the parameters of which are obtained from a fit using a sample of simulated signal events. Figure 5 shows the distribution of the vertex probability estimate, obtained from the BDT score, in \(\text {Z}\rightarrow \mathrm {\mu ^+}\mathrm {\mu ^-}\,\) events. The charged particle tracks belonging to the muon pair are used to identify the vertex, and are then removed from the event before re-reconstructing the vertices and passing them to the vertex identification and the vertex probability BDTs. The \(p_{\mathrm {T}}\) of the dimuon pair is used in the BDT calculation in place of \(\varvec{p}^{\mathrm {\gamma }\mathrm {\gamma }}_\mathrm {T}\,\). The vertex identified by the muons is taken to be the true vertex: if the vertex assignment BDT chooses that vertex the assignment is counted as right, and otherwise as wrong. The vertex probability estimates in data (points) are compared to MC simulation (histograms). The comparison is made separately for events in which the vertex assignment BDT assigns the right vertex, and for those in which it assigns a wrong vertex.

Fig. 5

Distribution of the vertex probability estimate in \(\text {Z}\rightarrow \mathrm {\mu ^+}\mathrm {\mu ^-}\,\) events. The vertex probability estimates in 8\(\,\text {TeV}\ \)data (points), are compared to the estimates in MC simulation (histograms). The comparison is made separately for events in which the vertex is assigned to the same (open circles and filled histogram), or to a different vertex (filled circles and outlined histogram), as that identified by the muons

6 Event classification

The analysis uses events with two photon candidates satisfying the preselection requirements (described in Sect. 4.2) with an invariant mass, \(m_{\gamma \gamma }\), in the range \(100<m_{\gamma \gamma }<180\,\text {GeV}\), and with \(p_\mathrm {T}^{\gamma 1}\,>m_{\gamma \gamma }/3\) and \(p_\mathrm {T}^{\gamma 2}\,>m_{\gamma \gamma }/4\). In the rare case of multiple diphoton candidates, the one with the highest \(p_\mathrm {T}^{\gamma 1}\,+p_\mathrm {T}^{\gamma 2}\,\) is selected. The use of \(p_{\mathrm {T}}\) thresholds scaled by \(m_{\gamma \gamma }\) prevents the distortion of the low end of the \(m_{\gamma \gamma }\) spectrum that results if a fixed threshold is used. An additional requirement is applied on the photon identification BDT scores for both photons, which are required to be greater than \(-0.2\) (see Fig. 2). This requirement retains more than 99 % of simulated signal events fulfilling the other analysis selection requirements, while removing about 24 % of events in data. The requirements listed above are referred to as the “full diphoton preselection”.
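The requirements of the full diphoton preselection can be summarised in a short sketch. The dictionary keys and function names are illustrative, the per-photon preselection is assumed to have been applied already, and the point at which the highest-\(p_{\mathrm {T}}\)-sum candidate is chosen relative to the other requirements is an assumption made here.

```python
def passes_full_diphoton_preselection(cand):
    """Apply the mass window, scaled pT thresholds, and photon ID BDT cut."""
    m = cand["mass"]
    lead = max(cand["pt1"], cand["pt2"])
    sub = min(cand["pt1"], cand["pt2"])
    return (
        100.0 < m < 180.0
        and lead > m / 3.0
        and sub > m / 4.0
        and min(cand["id1"], cand["id2"]) > -0.2
    )


def choose_candidate(candidates):
    """Among passing candidates, pick the one with the highest pT sum."""
    passing = [c for c in candidates if passes_full_diphoton_preselection(c)]
    if not passing:
        return None
    return max(passing, key=lambda c: c["pt1"] + c["pt2"])


# Hypothetical candidates (masses and pT in GeV, id = photon ID BDT score).
candidates = [
    {"mass": 125.0, "pt1": 60.0, "pt2": 40.0, "id1": 0.3, "id2": 0.1},
    {"mass": 125.0, "pt1": 45.0, "pt2": 30.0, "id1": 0.2, "id2": -0.5},
    {"mass": 120.0, "pt1": 70.0, "pt2": 35.0, "id1": 0.1, "id2": 0.0},
]
print(choose_candidate(candidates))
```

Because both thresholds scale with \(m_{\gamma \gamma }\), the acceptance in \(p_{\mathrm {T}}/m_{\gamma \gamma }\) is the same across the mass window, which is what avoids sculpting the low-mass end of the spectrum.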

To achieve the best analysis performance, the events are separated into classes based on both their mass resolution and their relative probability to be due to signal rather than background. The first step in the classification of the events involves the extraction of those tagged by the presence of objects in the final state, in addition to the photon pair, that give the event a signature characteristic of one of the production processes. The remaining untagged events, which constitute the majority (\(\approx \)99 %) of the events used in the analysis, are classified according to a variable constructed using multivariate techniques.

The classification procedure, which is described in detail below, results in 11 event classes for the 7\(\,\text {TeV}\ \)dataset and 14 for the 8\(\,\text {TeV}\ \)dataset. The event classes, and the expected number of SM Higgs boson events and estimated background in those classes, are set out later, in Table 3, together with the composition of the expected SM Higgs boson signal in terms of the production processes, and the diphoton mass resolution expected for the signal in each of the classes. To ensure that the classes are mutually exclusive, events are tested against the class selection requirements in a fixed order as described in Sect. 6.4.

6.1 Multivariate event classifier

A multivariate event classifier, the diphoton BDT, is constructed to satisfy the following criteria:

  1. 1.

    The diphoton BDT should assign a high score to events that have

    1. (a)

      good diphoton mass resolution,

    2. (b)

      high probability of being signal rather than background.

  2. 2.

    The classifier should not select events according to the mass of the diphoton system relative to the particular mass of the Higgs boson signal used for training.

The classifier incorporates a per-event estimate of the diphoton mass resolution, the identification BDT scores of both photons, and the kinematic properties of the diphoton system, except for \(m_{\gamma \gamma }\). To avoid any dependence on \(m_\text {H}\), the transverse momenta and resolutions are divided by \(m_{\gamma \gamma }\).

The complete list of variables used in the BDT is the same as used in previous versions of the analysis [28]: the scaled photon transverse momenta (\(p_\mathrm {T}^{\gamma 1}\,/m_{\gamma \gamma }\) and \(p_\mathrm {T}^{\gamma 2}\,/m_{\gamma \gamma }\)), the pseudorapidities of both photons, the photon identification BDT classifier values for both photons, the cosine of the angle between the two photons in the transverse plane, the expected relative diphoton mass resolutions under the hypotheses of selecting the correct/a wrong interaction vertex, and also the probability of selecting the correct vertex.

The diphoton mass resolution depends on several factors: the location of the associated energy deposits in the calorimeter; whether or not one or both photons converted in the detector volume in front of the calorimeter; and the probability that the true diphoton vertex has been identified. Events in which one of the photons has a low identification BDT score are more likely to be due to background processes. The Higgs signal-to-background ratio, \(S/B\), varies with the kinematic properties of the diphoton system mainly through the \(\eta \) of the photons (highest \(S/B\) when both are in the barrel), and \(p_{\mathrm {T}}^{\gamma \gamma }\,\) (highest \(S/B\) for large \(p_{\mathrm {T}}^{\gamma \gamma }\,\)). The BDT is trained using a simulated signal sample having a mass, \(m_\text {H}=123\,\text {GeV} \, \) , near the centre of the mass range of the analysis. The relative abundance of events from different production processes in the sample is set according to the expectations for a SM Higgs boson with that mass.

The multivariate classifier assigns a score to each event. It has been verified that selecting simulated background events with high diphoton BDT score does not result in any peak in the diphoton invariant mass distribution of the selected events. Figure 6 shows, for the 8\(\,\text {TeV}\ \)dataset, how the BDT performs on simulated SM \(\text {H}\rightarrow \mathrm {\gamma }\mathrm {\gamma }\,\) signal events with \(m_\text {H}=125\,\text {GeV} \, \) , and on data satisfying the full diphoton preselection. The classifier score has been transformed such that the sum of signal events from all processes has a uniform, flat, distribution. This transformation assists visualization of the performance of the BDT. The outlined histogram, following the data points, is for simulated background events. The vertical dashed lines indicate the boundaries of the untagged event classes, the determination of which is described in Sect. 6.3. Given that the data are completely dominated by background events, it can be seen that the signal-to-background ratio increases substantially with the classifier score, and that the VBF, VH, and \(\text {t}\overline{\text {t}}\text {H}\) processes tend to achieve high scores, due to their significantly harder \(p_{\mathrm {T}}^{\gamma \gamma }\) spectrum [66, 67].
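One simple way to obtain a transformation of the kind described, under which the signal score distribution becomes flat, is to map each raw score to the cumulative fraction of signal events at or below it. The sketch below uses an unweighted empirical cumulative distribution; the function names and toy scores are assumptions, and the weighting of the production processes used in the analysis is not modelled.

```python
import bisect


def make_flattening_transform(signal_scores):
    """Return a function mapping a raw score to the empirical signal CDF value."""
    ordered = sorted(signal_scores)
    n = len(ordered)

    def transform(score):
        # Fraction of signal events with raw score <= the given score.
        return bisect.bisect_right(ordered, score) / n

    return transform


# Hypothetical raw classifier scores for simulated signal events.
raw_signal = [0.9, 0.5, 0.7, 0.8]
transform = make_flattening_transform(raw_signal)

flat_signal = sorted(transform(s) for s in raw_signal)
print(flat_signal)      # evenly spaced values in (0, 1]
print(transform(0.75))  # any event, data or simulation, uses the same map
```

The same monotonic map is applied to data and to simulated background, so the ordering of events by score is unchanged and only the horizontal scale of the distribution is altered.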

Fig. 6

Transformed diphoton BDT classifier score for events satisfying the full diphoton preselection in the 8\(\,\text {TeV}\ \)data (points with error bars, left axis), and for simulated signal events from the four production processes (solid filled histograms, right axis). The outlined histogram, following the data points, is for simulated background events. The vertical dashed lines show the boundaries of the untagged event classes, with the leftmost dashed line representing the score below which events are discarded and not used in the final analysis (described in Sect. 6.3)

Figure 7 shows a comparison of the transformed classifier score for \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\,\) data and for MC simulated events, in which for both cases the electrons are reconstructed as photons. The electron showers in the events satisfy the full diphoton preselection requirements with the electron veto condition inverted. The classifier score has been subjected to the same transformation as was used for Fig. 6. The score for \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\,\) events peaks at low values whilst Higgs boson signal events have a flat distribution, reflecting the differences between the two types of event, but it can be seen that sufficient numbers of \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\,\) events are present even at high values of the classifier score to enable the agreement between data and MC simulation to be adequately tested there. The good agreement between MC simulation and data for \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\,\) events constitutes an important check that the modeling of the BDT input variables and their correlations in the simulation of the Higgs boson signal is accurate. The simulated events have been weighted so that the \(\text {Z}\)-boson \(p_{\mathrm {T}}\) distribution matches that observed in \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\,\) data. The band indicates the systematic uncertainty resulting from propagating to the diphoton BDT event classifier both the uncertainty associated with the photon identification BDT score (which corresponds to a shift of \(\pm \)0.01 of the score) and the uncertainty in the per-photon estimate of the energy resolution (which amounts to a scaling of its value by \(\pm \)10 %). Since the magnitudes of these two uncertainties were chosen to cover the discrepancies between data and simulation in the tails of the distributions of the two variables, the resulting uncertainty in the diphoton BDT event classifier appears to be slightly overestimated.

Fig. 7

Transformed diphoton BDT classifier score for \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\,\) events in 8\(\,\text {TeV}\ \)data, and in MC simulation, in which the electrons are reconstructed as photons. The distribution of simulated events is represented by a histogram, and the data by points with error bars. For each bin, the ratio of the number of events in data to the number of simulated events is shown in the lower plot. The bands in the two plots indicate the systematic uncertainty related to the MC cluster shape uncertainty (see text). The vertical dashed lines show the boundaries of the untagged event classes, with the leftmost dashed line representing the score below which events are discarded and not used in the final analysis (described in Sect. 6.3)

6.2 Events tagged by exclusive signatures

Selections enriched in Higgs boson production mechanisms other than ggH can be made by requiring, in addition to the diphoton pair, the presence of other objects which provide signatures of the production mechanism. Higgs bosons produced by VBF are accompanied by a pair of jets separated by a large rapidity gap. Those resulting from the VH production mechanism may be accompanied by one or more charged leptons, large \(E_{\mathrm {T}}^{\text {miss}}\,\), or jets from the decay of the \(\mathrm {W}\) or \(\text {Z}\) boson. Those resulting from \(\text {t}\overline{\text {t}}\text {H}\) production are, as a result of the decay of the top quarks, accompanied by b quarks, and may be accompanied by charged leptons or additional jets.

The tagging of dijet events, targeting VBF production, significantly increases the overall sensitivity of the analysis and precision on the measured signal strength, and increases the sensitivity to deviations of the Higgs boson couplings from their expected values. The tagging aimed at the VH process increases the sensitivity to deviations of the couplings, and the \(\text {t}\overline{\text {t}}\text {H}\) tagging further probes the compatibility of the observed signal with a SM Higgs boson.

The \(p_{\mathrm {T}}\) spectrum of Higgs bosons produced by the VBF, VH, and \(\text {t}\overline{\text {t}}\text {H}\) processes is significantly harder than that of Higgs bosons produced by ggH, or of background diphotons. This results in a harder leading-photon \(p_{\mathrm {T}}\) spectrum. In the tagged-class selections advantage is taken of this difference by raising the \(p_{\mathrm {T}}\) requirement on the leading photon.

6.2.1 Dijet-tagged event selection and BDT classifiers for VBF production

Vector boson fusion production results in two forward jets, originating from the two scattered quarks. Separating events tagged by the presence of dijets compatible with the VBF process into specific event classes not only increases the separation between signal and background, it also increases the separation between signal production processes. In the purest VBF dijet-tagged class the signal is expected to have a contribution of only 18 % from ggH production. A loose preselection of dijet events is defined and a dijet BDT is trained to separate VBF signal from diphoton background using samples of MC events satisfying this dijet preselection. Signal events from ggH satisfying the dijet preselection are included as background in the training. Details of the dijet preselection and the BDT input variables are given below. A further, “combined”, BDT is then trained. This BDT has only three input variables: the score of the dijet BDT, the score of the diphoton BDT, and the transverse momentum of the diphoton system divided by its mass, \(p_{\mathrm {T}}^{\gamma \gamma }\,/m_{\gamma \gamma }\). Events for the VBF dijet-tagged classes are selected, from those satisfying the loose dijet preselection, by placing a minimum requirement on their combined BDT score, and the selected events are then classified using that score.

The dijet preselection is applied to diphoton events satisfying the full diphoton preselection and requires the leading (in \(p_{\mathrm {T}}\,\)) and subleading jets in the event, within \(|\eta |<4.7\), to have \(p_{\mathrm {T}}\,>30\) and 20\(\,\text {GeV}\), respectively, and for the pair to have an invariant mass \(m_\mathrm {jj}\,>250\,\text {GeV}\). The pseudorapidity requirement (\(|\eta |<4.7\)) is more restrictive than the full detector acceptance (\(|\eta |\lesssim 5\)), to avoid the use of jets for which the energy corrections are large and less reliable, and is found to decrease the signal acceptance by \(<\)2 %. Additionally, the \(p_{\mathrm {T}}\) threshold of the leading photon is raised, requiring \(p_\mathrm {T}^{\gamma 1}\,>m_{\gamma \gamma }/2\) for VBF dijet-tagged events.

The jet energy measurement is calibrated to correct for detector effects using samples of dijet, \(\gamma + \text {jet}\), and \(\text {Z}+ \text {jet}\) events [39]. The energy from pileup interactions and from the underlying event is also included in the reconstructed jets. This energy is subtracted using an \(\eta \)-dependent transverse momentum density calculated with the jet areas technique [63, 68, 69], evaluated on an event-by-event basis. Particles produced in pileup interactions may be clustered into jets of relatively large \(p_{\mathrm {T}}\) , referred to as pileup jets. These pileup jets are largely removed using selection criteria based on the width of the jet or the compatibility of the tracks in a jet with the primary vertex [70]. Finally, jets within \(\Delta R<0.5\) of either of the photons are rejected to exclude the possibility of photons having been included in the reconstruction of the jet.

The variables used in the dijet BDT are the scaled transverse momenta of the photons, \(p_\mathrm {T}^{\gamma 1}\,/m_{\gamma \gamma }\) and \(p_\mathrm {T}^{\gamma 2}\,/m_{\gamma \gamma }\), the transverse momenta of the leading and subleading jets, \(p_\mathrm {T}^\mathrm {j1}\,\) and \(p_\mathrm {T}^\mathrm {j2}\,\), the dijet invariant mass, \(m_\mathrm {jj}\,\), the difference between the pseudorapidities of the jets, \(|\Delta \eta _\text {jj}|\), the difference between the average pseudorapidity of the two jets and the pseudorapidity of the diphoton system, \(|\eta _{\gamma \gamma }-(\eta _\text {j1}+\eta _\text {j2})/2|\) [71], and the absolute difference in the azimuthal angle between the diphoton system and the dijet system, \(\Delta \phi _{\mathrm {\gamma }\mathrm {\gamma }{\mathrm {jj}}}\,\). Because of the large theoretical uncertainty in the cross section due to higher-order contributions to the ggH process accompanied by two jets in the region very close to \(\Delta \phi _{\mathrm {\gamma }\mathrm {\gamma }{\mathrm {jj}}}\,=\pi \) [54, 72], the maximum value of the variable is restricted to \(\pi -0.2\); events with \(\Delta \phi _{\mathrm {\gamma }\mathrm {\gamma }{\mathrm {jj}}}\,>\pi -0.2\) are treated as if the value was \(\pi -0.2\).
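The dijet preselection and two of the angular inputs described above, including the capping of \(\Delta \phi _{\mathrm {\gamma }\mathrm {\gamma }{\mathrm {jj}}}\,\) at \(\pi -0.2\), can be sketched as follows. The function names and the dictionary layout for jets are illustrative assumptions; the pileup-jet rejection and the \(\Delta R\) requirement between jets and photons are assumed to have been applied beforehand.

```python
import math

DPHI_CAP = math.pi - 0.2  # maximum value passed to the dijet BDT


def passes_dijet_preselection(jet1, jet2, m_jj, lead_photon_pt, m_gg):
    """jet1/jet2 are the leading/subleading jets, dicts with 'pt' and 'eta'."""
    return (
        abs(jet1["eta"]) < 4.7
        and abs(jet2["eta"]) < 4.7
        and jet1["pt"] > 30.0
        and jet2["pt"] > 20.0
        and m_jj > 250.0
        and lead_photon_pt > m_gg / 2.0
    )


def capped_delta_phi(phi_gg, phi_jj):
    """Absolute azimuthal separation in [0, pi], capped at pi - 0.2."""
    dphi = abs(phi_gg - phi_jj) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return min(dphi, DPHI_CAP)


def eta_centrality(eta_gg, eta_j1, eta_j2):
    """|eta_gg - (eta_j1 + eta_j2) / 2|."""
    return abs(eta_gg - 0.5 * (eta_j1 + eta_j2))


# Hypothetical event.
j1 = {"pt": 50.0, "eta": 2.5}
j2 = {"pt": 25.0, "eta": -2.0}
print(passes_dijet_preselection(j1, j2, m_jj=400.0, lead_photon_pt=70.0, m_gg=125.0))
print(capped_delta_phi(0.0, math.pi), eta_centrality(0.1, j1["eta"], j2["eta"]))
```

Capping the variable rather than rejecting events keeps back-to-back topologies in the sample while preventing the BDT from exploiting the region where the ggH prediction is least reliable.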

6.2.2 Lepton-, dijet-, and \(E_{\mathrm {T}}^{\text {miss}}\)-tagged event classes for VH production

The selection requirements for the classes aimed at selecting events produced by the VH process have been obtained by minimizing the expected uncertainty in the measurement of signal strength of the process, using data in control regions to estimate the background and MC signal samples to estimate the signal efficiency. Four classes are defined: events with a muon or an electron are separated into two classes, according to whether there is significant \(E_{\mathrm {T}}^{\text {miss}}\,\) or another lepton in the event, or there is not; a third class selects events with two or more jets; and the fourth class consists of events with large \(E_{\mathrm {T}}^{\text {miss}}\,\). The leading photon in the events selected for the lepton classes and for the \(E_{\mathrm {T}}^{\text {miss}}\,\)-tagged class is required to satisfy \(p_\mathrm {T}^{\gamma 1}\,>3m_{\gamma \gamma }/8\); for the dijet-tagged VH class the requirement is tighter, \(p_\mathrm {T}^{\gamma 1}\,>m_{\gamma \gamma }/2\).

Muons are reconstructed with the particle-flow algorithm and are required to be within \(|\eta |<2.4\). A tight selection is applied, based on the quality of the track and the number of hits in the tracker and muon spectrometer. A strict match between the tracker and the muon spectrometer segments is also applied to reduce the contamination from muons produced in decays of hadrons and from beam halo interactions. Finally, a loose particle-flow isolation requirement is applied.

Electrons are identified as clusters of energy deposited in the ECAL matched to tracks. Electron candidates are required to have an ECAL supercluster within the same fiducial region as for photons. Electron identification is based on a multivariate technique [14]. The electron track has to fulfil requirements on the transverse and longitudinal impact parameter with respect to the electron vertex and cannot have more than one missing hit in the innermost layers of the tracker. Electrons from conversions are excluded as described in Ref. [73] and a loose particle-flow isolation requirement is applied.

The tightly selected lepton class (“VH tight \(\ell \)”) is characterised by the full signature of a leptonically decaying \(\mathrm {W}\) or \(\text {Z}\) boson, and requires, in addition to the electron or muon, the presence of \(E_{\mathrm {T}}^{\text {miss}}\,>45\,\text {GeV}\, \) or another lepton of the same flavour as the first and with opposite sign. For the lepton plus \(E_{\mathrm {T}}^{\text {miss}}\,\) signature the \(p_{\mathrm {T}}\) of the lepton is required to be greater than 20\(\,\text {GeV} \, \) . For the dilepton signature the lepton \(p_{\mathrm {T}}\) requirement is relaxed to \(p_{\mathrm {T}}\,>10\,\text {GeV} \, \) , but the invariant mass of the pair is required to be between 70 and 110\(\,\text {GeV} \, \) . For the loose lepton class (“VH loose \(\ell \)”) only a single electron or muon with \(p_{\mathrm {T}}\,>20\,\text {GeV}\, \) is required but additional requirements are made to reduce background from leptonic decays of \(\text {Z}\) bosons with initial- or final-state radiation: muons and electrons are required to be separated from the closest photon by \(\Delta R>1.0\), and the invariant mass of electron-photon pairs is required to be more than 10\(\,\text {GeV}\, \)away from the \(\text {Z}\)-boson mass. In addition, a conversion veto is applied to the electrons to reduce the number of electrons originating from photon conversions.

Events selected for the dijet-tagged VH class are required to have a pair of jets with \(p_{\mathrm {T}}\,>40\,\text {GeV} \, \), within the region \(|\eta |<2.4\), and with an invariant mass within the range \(60<m_\mathrm {jj}\,<120\,\text {GeV} \, \) ; additional jets may also be present. The \(p_{\mathrm {T}}\) of the diphoton system is required to satisfy \(p_{\mathrm {T}}^{\gamma \gamma }\,>13m_{\gamma \gamma }/12\). The selection also exploits the expected angular distribution of the diphoton pair with respect to the dijet pair from the vector boson decay. The angle, \(\theta ^\star \), that the diphoton system makes, in the diphoton-dijet centre-of-mass frame, with respect to the direction of motion of the diphoton-dijet system in the lab frame is computed. The distribution of \(\cos \theta ^\star \) for signal events coming from VH production is rather flat, whereas background and signal events from ggH production result in \(\cos \theta ^\star \) distributions strongly peaked at \(|{\cos \theta ^\star }|=1\). Consequently \(|{\cos \theta ^\star }|<0.5\) is required.
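As an illustration of the angular variable defined above, the following minimal sketch computes \(\cos \theta ^\star \) from the diphoton and dijet four-momenta. It is not the analysis code: the function names and the (E, px, py, pz) tuple convention are assumptions made here for illustration only.

```python
import math

def add(p, q):
    """Sum two four-vectors given as (E, px, py, pz) tuples (illustrative convention)."""
    return tuple(a + b for a, b in zip(p, q))

def boost_to_rest_frame(p, frame):
    """Boost four-vector p into the rest frame of the massive four-vector `frame`."""
    energy = frame[0]
    bx, by, bz = frame[1] / energy, frame[2] / energy, frame[3] / energy
    b2 = bx * bx + by * by + bz * bz
    if b2 == 0.0:
        return tuple(p)  # `frame` is already at rest in the lab
    gamma = 1.0 / math.sqrt(1.0 - b2)
    bp = bx * p[1] + by * p[2] + bz * p[3]
    # Standard Lorentz boost with velocity -beta
    coef = (gamma - 1.0) * bp / b2 - gamma * p[0]
    return (gamma * (p[0] - bp),
            p[1] + coef * bx,
            p[2] + coef * by,
            p[3] + coef * bz)

def cos_theta_star(diphoton, dijet):
    """Cosine of the angle between the diphoton momentum in the diphoton-dijet
    rest frame and the lab-frame direction of motion of the diphoton-dijet system."""
    system = add(diphoton, dijet)
    rest = boost_to_rest_frame(diphoton, system)
    sx, sy, sz = system[1:]
    gx, gy, gz = rest[1:]
    norm = math.sqrt(sx * sx + sy * sy + sz * sz) * math.sqrt(gx * gx + gy * gy + gz * gz)
    if norm == 0.0:
        raise ValueError("angle undefined: vanishing system or diphoton momentum")
    return (sx * gx + sy * gy + sz * gz) / norm

def passes_angular_requirement(diphoton, dijet, max_abs_cos=0.5):
    """Requirement |cos(theta*)| < 0.5 quoted in the text."""
    return abs(cos_theta_star(diphoton, dijet)) < max_abs_cos
```

Here a configuration in which the diphoton system is emitted perpendicular to the boost direction gives \(\cos \theta ^\star =0\) and passes the requirement, whereas one emitted along the boost direction gives \(|{\cos \theta ^\star }|=1\) and fails.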

For the \(E_{\mathrm {T}}^{\text {miss}}\,\) tag, additional selection criteria are applied on the azimuthal angular separation between the diphoton system and the \(E_{\mathrm {T}}^{\text {miss}}\,\) direction, \(|\Delta \phi _{\mathrm {\gamma }\mathrm {\gamma }E_{\mathrm {T}}^{\text {miss}}\,}|>2.1\), and between the diphoton system and the leading jet in the event, \(|\Delta \phi _{\mathrm {\gamma }\mathrm {\gamma }\mathrm {j}^1}|<2.7\). Discrepancies between data and simulated events in the direction and magnitude of the \(E_{\mathrm {T}}^{\text {miss}}\,\) vector have been studied in detail and a set of corrections derived, some of which need to be applied to simulated events, and others to data. The corrected \(E_{\mathrm {T}}^{\text {miss}}\,\) is required to satisfy \(E_{\mathrm {T}}^{\text {miss}}\,>70\,\text {GeV} \, \) .

In addition to the requirements described above, a minimum requirement is also made on the diphoton BDT classifier score for entry into the event classes tagging VH production. The severity of the requirement is optimized for each class: 0.17 for the two lepton-tagged classes, 0.62 for the \(E_{\mathrm {T}}^{\text {miss}}\,\)-tagged class, and 0.76 for the VH dijet-tagged class, where the numerical scale is the classifier score shown in Figs. 6 and 7.

6.2.3 Event classes tagged for \(\text {t}\overline{\text {t}}\text {H}\)  production

The production of Higgs bosons in association with top quarks has a small cross section, and so the overall cross section times branching fraction of the decay to photons is only 0.3 \(\text {\,fb}\)  at NLO. Therefore, in the full dataset only a handful of events are expected. To maximize signal efficiency we devise event selections that collect both leptonic and hadronic decays of the top quarks, defining both a lepton-tagged and a multijet-tagged event class.

As for the VH event classes, the selection requirements for the classes aimed at selecting events produced by the \(\text {t}\overline{\text {t}}\text {H}\) process have been obtained by minimizing the expected uncertainty in the measurement of signal strength of the process, using data in control regions to estimate the background, and MC signal samples to estimate the signal efficiency. The leading photon is required to have \(p_\mathrm {T}^{\gamma 1}\,>m_{\gamma \gamma }/2\). Jets are required to have \(p_{\mathrm {T}}\,>25\,\text {GeV}\, \) and both classes require the presence of at least one b-tagged jet. The lepton tag is then defined by requiring at least one more jet in the event and at least one electron or muon with \(p_{\mathrm {T}}\,>20\,\text {GeV} \, \) , and the multijet tag is defined by the requirement of at least four more jets in the event and no lepton. Requirements are also made on the minimum diphoton BDT classifier score for entry into the two classes tagging \(\text {t}\overline{\text {t}}\text {H}\,\): 0.17 for the lepton class, and 0.48 for the multijet class, where the numerical scale is the classifier score shown in Figs. 6 and 7. For the 7\(\,\text {TeV}\ \)dataset the events in the two classes are combined after selection to form a single \(\text {t}\overline{\text {t}}\text {H}\,\) event class.

6.3 Classification of VBF dijet-tagged and untagged events

Classes for the VBF dijet-tagged events and the untagged events are defined using the scores of the classification BDTs: the combined dijet-diphoton BDT score is used to select and define the dijet-tagged classes, and the diphoton BDT score defines the untagged class into which the untagged events are placed. The BDT score requirements that constitute the event class boundaries are set by an optimization procedure, using simulated event samples, aimed at minimizing the expected uncertainty in the signal strength. To avoid biases, the simulated events are divided into three non-overlapping sets, which are then used only for the training of the BDTs, or the optimization of event class boundaries, or to model the signal in the extraction of the final results. The number of available simulated events limits the statistical precision in the optimization procedure. For some background processes in which one or more of the photon candidates result from misidentified jet fragments, the number of simulated events is small. This produces a very uneven and spiky distribution of the event classifier scores for the simulated background in the range of BDT scores where these processes contribute, even though such contributions are rare. For the event class boundary optimization procedure, the distributions of the event classifier BDT scores are therefore smoothed, using an adaptive-width Gaussian smoothing in the RooFit package [74]. Differences in performance of less than about 2 % are indistinguishable from statistical fluctuations and are regarded as insignificant.

As a result of the optimization procedure, four untagged event classes and two VBF dijet-tagged classes are defined for the 7\(\,\text {TeV}\ \)dataset. For the 8\(\,\text {TeV}\ \)dataset five untagged and three dijet-tagged classes are defined. Events that fail the requirement on the combined dijet-diphoton BDT score to enter the VBF dijet-tagged classes may enter other event classes. Untagged events that have a diphoton BDT score less than the lower boundaries of the untagged classes in the two datasets are not used in the final statistical analysis. The goal of the optimization setting the diphoton BDT score requirements, which define the untagged classes, is to minimize the expected uncertainty in the overall signal strength measurement. The goal of the optimization for the setting of the combined dijet-diphoton BDT score boundaries, which define the VBF dijet-tagged classes, is to minimize the expected uncertainty in the signal strength associated with the VBF production mechanism. When optimizing the boundaries for the 7\(\,\text {TeV}\ \)dataset, for which the number of MC background events available is particularly limited, the number of dijet-tagged classes is limited to two and the lower boundary of the lowest dijet-tagged class is fixed so that the same efficiency times acceptance is obtained for VBF signal events as in the 8\(\,\text {TeV}\ \)dataset.

Figure 8 shows the combined dijet-diphoton BDT score for events satisfying the dijet preselection in 8\(\,\text {TeV}\ \)data, and for simulated signal events from the four production processes. The outlined histogram is for simulated background events; the shaded error bands on the histogram show the statistical uncertainty in the simulation. The VBF dijet-tagged class boundaries used for the 8\(\,\text {TeV}\ \)dataset are shown by vertical dashed lines. The classifier score is transformed such that signal events produced by the VBF process have a uniform, flat, distribution across the full range of the score. This allows the visualization of the extent to which signal events produced by the VBF process are favoured over background (which predominates in the data), and signal events produced by other processes. Events with scores below the lower boundary fail the VBF dijet-tagged selection, but remain candidates for inclusion in other classes.

Fig. 8

Score of the combined dijet-diphoton BDT for events satisfying the dijet preselection in 8\(\,\text {TeV}\ \)data (points with error bars, left axis) and for simulated signal events from the four production processes (histograms, right axis). The outlined histogram is for simulated background events; the shaded error bands on the histogram show the statistical uncertainty in the simulation. The vertical dashed lines show the boundaries of the event classes, with the leftmost dashed line representing the score below which events are not included in the VBF dijet-tagged classes, but remain candidates for inclusion in other classes. The classifier score is transformed such that signal events produced by the VBF process have a uniform, flat, distribution

The lower boundary on the untagged event class with the lowest signal-to-background ratio controls the total number of events used in the analysis and the overall signal efficiency times acceptance of the analysis (see Fig. 6). The boundary excludes events with very low score in the diphoton BDT for which the background is poorly modelled by MC simulation. Exclusion of these events has the advantage of allowing a better assessment of the expected sensitivity of the analysis, but the exact placement of the boundary is of little consequence.

It is found that, within the statistical uncertainty described above, it makes no difference if the optimization goal is the expected overall uncertainty in signal strength, the expected significance of the signal, or the expected uncertainty in the measured signal strength associated with the VBF production mechanism. It is also found that the performance maxima that fix the event class boundaries are rather shallow, so that the boundaries can be moved without significantly changing the expected performance. Adding further event classes for either the untagged or the VBF dijet-tagged events does not significantly improve the expected performance.

The overall efficiency times acceptance for SM Higgs boson events with \(m_\text {H}=125\,\text {GeV}\, \) is 49.3 % (48.6 %) in the 8 (7)\(\,\text {TeV}\ \)analysis. Investigating the properties of the simulated signal events in the untagged classes reveals, as expected, that the best untagged class (“untagged 0”) contains events in which the diphoton system has high \(p_{\mathrm {T}}\)  (almost all events have \(p_{\mathrm {T}}^{\gamma \gamma }\,>80\,\text {GeV}\)), while the second best class (“untagged 1”) is dominated by events in which both photons are unconverted and situated in the central barrel region of the ECAL.

6.4 Procedure of classification

In total there are 14 event classes for the analysis of the 8\(\,\text {TeV}\ \)dataset and 11 for the analysis of the 7\(\,\text {TeV}\ \)dataset. To ensure that the classes are mutually exclusive, events are tested against the class selection requirements in a fixed order: first the production-signature tagged classes ranked by expected signal-to-background ratio, then the untagged classes. Once selected, events are no longer candidates for inclusion in other classes. The ordering is that shown in Table 2, which lists the classes together with their key selection requirements.
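The fixed-order assignment described above can be sketched as a first-match search over an ordered list of selections. The class names, event fields, and thresholds below are hypothetical placeholders and do not reproduce the requirements of Table 2.

```python
def classify(event, ordered_classes):
    """Return the name of the first class whose selection the event passes.

    `ordered_classes` is a list of (name, predicate) pairs in priority order.
    Because the search stops at the first match, the classes are mutually
    exclusive by construction. Returns None if no class accepts the event.
    """
    for name, passes in ordered_classes:
        if passes(event):
            return name
    return None

# Toy priority list: tagged classes first, then untagged classes.
# All names, fields, and cut values are illustrative assumptions.
TOY_CLASSES = [
    ("tagged A", lambda ev: ev["has_lepton"] and ev["bdt"] > 0.2),
    ("tagged B", lambda ev: ev["n_jets"] >= 2 and ev["bdt"] > 0.5),
    ("untagged 0", lambda ev: ev["bdt"] > 0.8),
    ("untagged 1", lambda ev: ev["bdt"] > 0.0),
]
```

An event that satisfies both a tagged and an untagged selection is assigned to the tagged class, since that class is tested first.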

Table 2 Event classes for the 7 and 8\(\,\text {TeV}\ \)datasets and some of their main selection requirements

7 Signal model

A parametric signal model is constructed separately for each event class and for each production mechanism from a fit of the simulated invariant mass shape, after applying the corrections determined from comparisons of data and simulation for \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\,\) and \(\text {Z}\rightarrow \mathrm {\mu ^+}\mathrm {\mu ^-}\mathrm {\gamma }\,\) events, for nine values of \(m_\text {H}\) in the range \(110\le m_\text {H}\le 150\,\text {GeV} \, \) , at 5\(\,\text {GeV}\, \)intervals. The two possible cases regarding diphoton vertex identification, correct vertex and wrong (misidentified) vertex, are fitted separately. Good descriptions of the distributions, including the tails, can be achieved using a sum of Gaussian functions, where the means are not required to be identical. The fits are first performed for the \(m_\text {H}=125\,\text {GeV}\, \) MC sample to determine the number of Gaussian functions to be used and the starting values of their parameters for the further fits to the other eight samples. As many as five Gaussian functions are used, although in most cases the use of two or three results in a good fit. Signal models for intermediate values of \(m_\text {H}\) are obtained by linear interpolation of the fitted parameters.
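A minimal sketch of such a parametric model is given below, assuming each component is described by a (fraction, mean, width) triple and that all three quantities are interpolated linearly between neighbouring mass points. The parameter layout, function names, and any numerical values used with it are illustrative assumptions, not the fitted values of the analysis.

```python
import math

def gaussian(x, mean, sigma):
    """Normalized Gaussian density."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def signal_pdf(x, components):
    """Sum of Gaussians; `components` is a list of (fraction, mean, sigma)
    whose fractions sum to one. The means need not coincide."""
    return sum(frac * gaussian(x, mean, sigma) for frac, mean, sigma in components)

def interpolate_components(m_h, fitted):
    """Linearly interpolate the fitted parameters to an intermediate mass.

    `fitted` maps each simulated mass point to its list of components;
    every mass point must use the same number of Gaussians, in the same order.
    """
    masses = sorted(fitted)
    if len(masses) < 2 or not masses[0] <= m_h <= masses[-1]:
        raise ValueError("m_h must lie between at least two fitted mass points")
    for lo, hi in zip(masses, masses[1:]):
        if lo <= m_h <= hi:
            t = (m_h - lo) / (hi - lo)
            return [tuple(a + t * (b - a) for a, b in zip(c_lo, c_hi))
                    for c_lo, c_hi in zip(fitted[lo], fitted[hi])]
```

With mass points every 5\(\,\text {GeV}\), a hypothesis of, say, 122.5\(\,\text {GeV}\) would use parameters halfway between those fitted at 120 and 125\(\,\text {GeV}\).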

Table 3 shows the number of expected signal events from a SM Higgs boson with \(m_\text {H}=125\,\text {GeV}\, \) as well as the background density at that mass for each of the event classes in the 7 and 8\(\,\text {TeV}\ \)datasets. The background estimate is obtained from a fit to the data, as described in Sect. 8, and is given as the differential rate, \(\text {d}N/\text {d}m_{\gamma \gamma }\) (events/GeV), at \(m_{\gamma \gamma }=125\,\text {GeV} \, \) . The table also shows the fraction of each Higgs boson production process (as predicted by MC simulation) as well as the mass resolution, measured both by half the width of the narrowest interval containing 68.3 % of the invariant mass distribution, \(\sigma _\text {eff}\), and by the full width at half maximum of the distribution divided by 2.35, \(\sigma _\mathrm {HM}\).

Table 3 Expected number of SM Higgs boson events (\(m_\text {H}=125\,\text {GeV}\, \)) and estimated background (“Bkg.”) at \(m_{\gamma \gamma }=125\,\text {GeV}\, \) for all event classes of the 7 and 8\(\,\text {TeV}\ \)datasets. The composition of the SM Higgs boson signal in terms of the production processes and its mass resolution is also given. The number corresponding to the production process making the largest contribution to each event class is highlighted in boldface. Numbers are omitted for production processes representing less than 0.05 % of the total signal. The variables used to characterize the resolution, \(\sigma _\text {eff}\) and \(\sigma _\mathrm {HM}\), are defined in the text

It can be seen that in all classes \(\sigma _\text {eff}>\sigma _\mathrm {HM}\) since the tails of the signal mass distribution are always somewhat larger relative to the width of the core of the distribution than would be the case for a Gaussian distribution. Untagged events with the best mass resolution are assigned to the best event classes. Even ignoring the improving mass resolution, and considering a wide window to include all the signal events, the signal-to-background ratio improves by an order of magnitude going from the worst to the best untagged class, a significantly larger variation than the change in resolution. The highest signal-to-background ratio is achieved in the tagged classes, many of which manage to also achieve high levels of purity with respect to contamination from the ggH process.
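The two resolution measures can be illustrated with a short sketch that evaluates them for a density sampled on a uniform grid. The function names are assumptions made here, the density is assumed to be unimodal, and the results carry a discretization error of the order of the grid spacing.

```python
import math

def sigma_eff(xs, density, content=0.683):
    """Half the width of the narrowest interval containing `content` of the
    distribution, for a density sampled on the uniform grid `xs`."""
    step = xs[1] - xs[0]
    target = content * sum(density)
    best_bins = None
    running = 0.0
    hi = 0
    # Two-pointer scan: for each left edge, extend the right edge until the
    # window holds the requested fraction, and keep the shortest window.
    for lo in range(len(xs)):
        while hi < len(xs) and running < target:
            running += density[hi]
            hi += 1
        if running < target:
            break
        if best_bins is None or hi - lo < best_bins:
            best_bins = hi - lo
        running -= density[lo]
    return 0.5 * best_bins * step

def sigma_hm(xs, density):
    """Full width at half maximum divided by 2.35 (unimodal density assumed)."""
    half = 0.5 * max(density)
    above = [x for x, value in zip(xs, density) if value >= half]
    return (above[-1] - above[0]) / 2.35

def gaussian(x, mean, sigma):
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
```

For a single Gaussian the two measures nearly coincide. Adding a broad second component, as in the sum-of-Gaussians signal model, widens \(\sigma _\text {eff}\) more than \(\sigma _\mathrm {HM}\), which is the behaviour noted in the text.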

The mass resolution achieved has improved significantly with respect to analyses of this decay mode previously reported by CMS [28], due to improved intercalibration of the ECAL, complemented by the improved supercluster energy correction regression described in Sect. 4.1. For events in which both photons are in the barrel the \(\sigma _\text {eff}\) has been reduced by around 5 % in 7\(\,\text {TeV}\ \)data, and by more than 20 % in 8\(\,\text {TeV}\ \)data. When at least one photon is in the endcap region the \(\sigma _\text {eff}\) has been reduced by around 20 % in 7\(\,\text {TeV}\ \)data, and by more than 30 % in 8\(\,\text {TeV}\ \)data. The reduction in \(\sigma _\mathrm {HM}\), representing the core of the distribution, is slightly larger, generally an additional 5 % better, when compared to \(\sigma _\text {eff}\).

8 Statistical methodology

To extract a result or measurement a simultaneous binned maximum-likelihood fit to the diphoton invariant mass distributions in all the event classes is performed over the range \(100<m_{\gamma \gamma }<180\,\text {GeV} \, \) . Binned fits are used for speed of computation, and the bin size chosen, 250\(\,\text {MeV}\) , is sufficiently small compared to the mass resolution that no information is lost. It has been verified that a binned fit with this bin size gives the same result as an unbinned fit. The signal model is derived from MC simulation after applying the corrections determined from data/MC comparisons of \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\,\) and \(\text {Z}\rightarrow \mathrm {\mu ^+}\mathrm {\mu ^-}\mathrm {\gamma }\,\) events, as described in the previous section. The background is evaluated by fitting the \(m_{\gamma \gamma }\) distribution in data, without reference to the MC simulation. Thus the likelihood to be evaluated in a signal-plus-background fit is

$$\begin{aligned} \mathcal {L}\,=\mathcal {L}\,(\text {data}|s(p,m_{\gamma \gamma })+f(m_{\gamma \gamma })), \end{aligned}$$
(1)

where \(p\) comprises those parameters of the signal, such as \(m_\text {H}\) or the signal strength, that are allowed to vary in the fit, \(s(p,m_{\gamma \gamma })\) is the parametric signal model, and \(f(m_{\gamma \gamma })\) the background fit function.
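A minimal sketch of the binned likelihood of Eq. (1) for a single event class is shown below, assuming independent Poisson-distributed bin contents and a midpoint approximation to the bin integrals. Only the signal strength is varied here, and the function names, the signal yield argument, and the callables are illustrative assumptions.

```python
import math

def expected_counts(edges, mu, signal_yield, signal_pdf, background_density):
    """Expected events per bin for the model mu * s(m) + f(m).

    `signal_pdf` is a normalized density in mass, `background_density` is in
    events per GeV; both are evaluated at the bin centre (midpoint rule).
    """
    counts = []
    for lo, hi in zip(edges, edges[1:]):
        mid = 0.5 * (lo + hi)
        counts.append((mu * signal_yield * signal_pdf(mid) + background_density(mid)) * (hi - lo))
    return counts

def minus_two_log_likelihood(observed, expected):
    """-2 ln L for independent Poisson bins, omitting the model-independent
    ln(n!) term, which cancels in likelihood ratios."""
    total = 0.0
    for n, nu in zip(observed, expected):
        if nu <= 0.0:
            raise ValueError("expected yield must be positive in every bin")
        total += 2.0 * (nu - n * math.log(nu))
    return total
```

In a simultaneous fit the contributions of all event classes would be summed, with shared parameters such as the signal strength common to every class.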

The chosen test statistic, used to determine how signal- or background-like the data are, is based on the profile likelihood ratio. Systematic uncertainties are incorporated into the analysis via nuisance parameters and treated according to the frequentist paradigm. A description of the general methodology can be found in Refs. [75, 76]. Unless stated otherwise, the results presented here are obtained using asymptotic formulae [77], including updates introduced in the RooStats package [78].

It is important that the choice of background fit function does not bias the estimate of background obtained from the fit for any signal mass hypothesis, \(m_\text {H}\), in the range of the search.

A change has been made with respect to the method used to obtain previous results, which is described in Ref. [28]. Previously, a single fit function was chosen for each class after a study of the potential bias on the estimated background. The potential bias using the chosen function was required to be negligible. The number of degrees of freedom of the fit was increased until the bias became at least five times smaller than the statistical uncertainty in the number of fitted events in a mass window corresponding to the full width at half maximum of the corresponding signal model, for any mass in the range \(110\le m_\text {H}\le 150\,\text {GeV} \, \) .

For the results reported in this paper a method, the discrete profiling method, has been developed [79] to treat the uncertainty associated with the choice of the function used to fit the background, in a similar way to systematic uncertainties associated with the measurements. The choice of the function used to fit the background, in any particular event class, is included as a discrete nuisance parameter in the likelihood function used to extract the result. All reasonable families of functions should be considered, although in practice it is found that the choice needs to be made between functions in the same families as were previously considered: exponentials, power-law functions, polynomials in the Bernstein basis, and Laurent series. When performing either a background-only fit, or a signal-plus-background fit, by minimizing the value of twice the negative logarithm of the likelihood all functions in these families are tried, with a penalty term added to account for the number of free parameters in the fitting function.

The penalized likelihood function, \(\widetilde{\mathcal {L}}\,_f\), for a single fixed background fitting function, \(f\), is defined as

$$\begin{aligned} -2\ln \widetilde{\mathcal {L}}\,_f=-2\ln \mathcal {L}\,_f+kN_{f}, \end{aligned}$$
(2)

where \(\mathcal {L}\,_f\) is the unpenalized likelihood function, \(N_{f}\) is the number of free parameters in \(f\), and \(k\) is a constant. When measuring a quantity, \(p\), the likelihood ratio, \(q(p)\), is used:

$$\begin{aligned} q(p)=-2\ln \frac{\widetilde{\mathcal {L}}\,(\text {data}|p,\hat{\theta }_p,\hat{f}_p)}{\widetilde{\mathcal {L}}\,(\text {data}|\hat{p},\hat{\theta },\hat{f})}, \end{aligned}$$
(3)

where the numerator represents the maximum of \(\widetilde{\mathcal {L}}\,\) given \(p\), achieved for the best-fit values of the nuisance parameters, \(\theta =\hat{\theta }_p\), and a particular background function, \(f=\hat{f}_p\). The denominator corresponds to the global maximum of \(\widetilde{\mathcal {L}}\,\), where \(p=\hat{p}\), \(\theta =\hat{\theta }\), and \(f=\hat{f}\). Choosing the functional form of the background that maximizes \(\widetilde{\mathcal {L}}\,\) for any particular value of \(p\) yields confidence intervals on \(p\) that can only be wider than those obtained using the single fixed functional form from the global best fit, \(f=\hat{f}\).
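The envelope construction behind Eqs. (2) and (3) can be sketched as follows. Each candidate background function is represented by a callable returning its profiled \(-2\ln \mathcal {L}\) at a given value of \(p\); the data structure, the function names, and the use of the asymptotic threshold \(q(p)\le 1\) for a 68 % interval are assumptions of this sketch rather than details taken from the analysis.

```python
def penalized(nll2, n_params, k=1.0):
    """Eq. (2): add the penalty k * N_f to -2 ln L."""
    return nll2 + k * n_params

def envelope(p, candidates, k=1.0):
    """Minimum penalized -2 ln L over all candidate functions at fixed p.

    `candidates` is a list of (name, n_params, nll2_of_p) tuples.
    Returns (value, name of the function that attains it).
    """
    return min((penalized(nll2(p), n_params, k), name)
               for name, n_params, nll2 in candidates)

def q_curve(p_grid, candidates, k=1.0):
    """Eq. (3) on a grid: envelope minus its global minimum."""
    values = [envelope(p, candidates, k)[0] for p in p_grid]
    floor = min(values)
    return [v - floor for v in values]

def interval(p_grid, q_values, threshold=1.0):
    """Outermost grid points with q <= threshold (assumes a connected region)."""
    inside = [p for p, q in zip(p_grid, q_values) if q <= threshold]
    return inside[0], inside[-1]
```

Because the envelope lies at or below the curve of any single function at every \(p\) while sharing its global minimum with the best one, the interval read off the envelope can only be as wide as, or wider than, the interval from the globally best function alone.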

Two values of \(k\), which sets the magnitude of the penalty for increasing the number of free parameters in the fit, have been tested in detail. The values of \(k=1\) and \(k=2\) can be justified, respectively, by the \(\chi ^\mathrm{2}\ p\)-value and the Akaike information criterion [80]. It is found in tests made with pseudo-experiments that with a value of \(k=1\) the method gives consistently good coverage and negligible bias.

In order to test coverage and bias we generate pseudo-data. To do that we need first to fit the data, thus facing a problem similar to, but not to be confused with, the original problem of choosing the background fit function to model the background in the analysis. The method used to generate pseudo-data is as follows. For each event class in turn, functions from each of the families used in the discrete profiling method, and listed above, are fit to the data. In each family, the number of degrees of freedom (number of exponentials, number of terms in the series, degree of the polynomial, etc.) is increased until the \(\chi ^2\) between N+1 degrees of freedom and N degrees of freedom for the fit to data shows no significant improvement (\(p\text {-value}<0.05\) obtained from the F-distribution [81]). At that point the function with N degrees of freedom is retained as representative of that family of functions. For each event class, the fits to the data with the retained representative functions for that class, are used to generate pseudo-background distributions.

The discrete profiling method is applied to pseudo-experiments in which signals having a range of strengths, from half to twice that of the SM, are added to the pseudo-background. The tests have demonstrated that the discrete profiling method provides good coverage of the uncertainty associated with the choice of the function, for all the functions considered as generators of background, and provides an estimate of the signal strength with negligible bias. The criterion used for this is similar and approximately equivalent to that used previously [28]: the median of the distribution of the pull on the signal strength, \((\mu _\text {meas.}-\mu _\text {true})/\sigma _{\mu _\text {meas.}}\), should be less than 0.14. This value is chosen because satisfaction of this criterion ensures that any underestimation of the uncertainty in the signal strength is less than 1 %.
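The bias criterion can be sketched as below. The function names are hypothetical, and the criterion is applied here to the absolute value of the median, which is an assumption of this sketch. The last function gives one possible reading of the 1 % figure, namely that a bias of 0.14 standard deviations added in quadrature inflates the uncertainty by just under 1 %; this interpretation is not stated explicitly in the text.

```python
import math
import statistics

def pull(mu_meas, mu_true, sigma_meas):
    """Pull on the signal strength for one pseudo-experiment."""
    return (mu_meas - mu_true) / sigma_meas

def median_pull(results, mu_true):
    """Median pull over pseudo-experiments; `results` holds (mu_meas, sigma_meas) pairs."""
    return statistics.median(pull(mu, mu_true, sigma) for mu, sigma in results)

def passes_bias_criterion(results, mu_true, max_median=0.14):
    """True if the magnitude of the median pull is below the quoted threshold."""
    return abs(median_pull(results, mu_true)) < max_median

def quadrature_inflation(bias_in_sigma=0.14):
    """Relative growth of the uncertainty if the bias is added in quadrature."""
    return math.sqrt(1.0 + bias_in_sigma ** 2) - 1.0
```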

The \(m_{\gamma \gamma }\) distributions in the 25 event classes in the 7 and 8\(\,\text {TeV}\ \)data samples, together with the results of a simultaneous fit of the signal-plus-background model, are shown in Figs. 9–16. The \(m_{\gamma \gamma }\) distribution of the combined event classes is shown in Sect. 11. The distributions are labeled with the \(\sqrt{s}\) and integrated luminosity of the combined datasets, reflecting the fact that the signal-plus-background fit is a simultaneous fit to the 25 event classes. Data points are drawn for all bins, including those in which there are no events. The error bars are calculated using the Garwood procedure [82] to provide correct coverage of the Poisson uncertainty. The \(1\sigma \) and \(2\sigma \) uncertainty bands shown for the background component of the fit include the uncertainty due to the choice of function and the uncertainty in the fitted parameters, and are computed from the variation in pseudo-experiments on the fitted background yield in bins corresponding to those used to display the data. These bands do not contain the Poisson uncertainty that must be included when the full uncertainty in the number of background events in any given mass range is estimated. The fit is performed on the data from all event class distributions simultaneously, with a single overall value of the signal strength free to vary in the fit.
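For reference, a central Garwood interval for an observed count \(n\) can be obtained by solving the two Poisson tail conditions numerically, as in the sketch below. The confidence level of 68.27 % and the bisection approach are assumptions made here for illustration; the direct summation of the Poisson terms is only suitable for the modest counts typical of a single histogram bin.

```python
import math

def poisson_cdf(n, lam):
    """P(X <= n) for X ~ Poisson(lam), by direct summation."""
    if n < 0:
        return 0.0
    term = math.exp(-lam)
    total = term
    for k in range(1, n + 1):
        term *= lam / k
        total += term
    return min(total, 1.0)

def _bisect(func, lo, hi, iterations=200):
    """Root of `func` on [lo, hi], assuming a single sign change."""
    sign_lo = func(lo) > 0.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if (func(mid) > 0.0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def garwood_interval(n, cl=0.6827):
    """Central interval [lower, upper] for the Poisson mean given n observed events.

    lower solves P(X >= n | lower) = alpha / 2 (taken as 0 for n = 0),
    upper solves P(X <= n | upper) = alpha / 2, with alpha = 1 - cl.
    """
    alpha = 1.0 - cl
    if n == 0:
        lower = 0.0
    else:
        lower = _bisect(lambda lam: 1.0 - poisson_cdf(n - 1, lam) - alpha / 2.0, 0.0, float(n))
    upper_bound = n + 10.0 * math.sqrt(n + 1.0) + 10.0  # safely above the solution
    upper = _bisect(lambda lam: poisson_cdf(n, lam) - alpha / 2.0, float(n), upper_bound)
    return lower, upper
```

For an empty bin the interval runs from zero to \(-\ln (\alpha /2)\approx 1.84\), which is why data points with no events still carry a non-zero upper error bar.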

Fig. 9

Events in the four untagged classes of the 7\(\,\text {TeV}\ \)dataset, binned as a function of \(m_{\gamma \gamma }\), together with the result of a fit of the signal-plus-background model. The \(1\sigma \) and \(2\sigma \) uncertainty bands shown for the background component of the fit include the uncertainty due to the choice of function and the uncertainty in the fitted parameters. These bands do not contain the Poisson uncertainty that must be included when the full uncertainty in the number of background events in any given mass range is estimated

Fig. 10

Events in the five untagged classes of the 8\(\,\text {TeV}\ \)dataset, binned as a function of \(m_{\gamma \gamma }\), together with the result of a fit of the signal-plus-background model. The \(1\sigma \) and \(2\sigma \) uncertainty bands shown for the background component of the fit include the uncertainty due to the choice of function and the uncertainty in the fitted parameters. These bands do not contain the Poisson uncertainty that must be included when the full uncertainty in the number of background events in any given mass range is estimated

Fig. 11

Events in the two VBF dijet-tagged classes of the 7\(\,\text {TeV}\ \)dataset, binned as a function of \(m_{\gamma \gamma }\), together with the result of a fit of the signal-plus-background model. The \(1\sigma \) and \(2\sigma \) uncertainty bands shown for the background component of the fit include the uncertainty due to the choice of function and the uncertainty in the fitted parameters. These bands do not contain the Poisson uncertainty that must be included when the full uncertainty in the number of background events in any given mass range is estimated

Fig. 12

Events in the three VBF dijet-tagged classes of the 8\(\,\text {TeV}\ \)dataset, binned as a function of \(m_{\gamma \gamma }\), together with the result of a fit of the signal-plus-background model. The \(1\sigma \) and \(2\sigma \) uncertainty bands shown for the background component of the fit include the uncertainty due to the choice of function and the uncertainty in the fitted parameters. These bands do not contain the Poisson uncertainty that must be included when the full uncertainty in the number of background events in any given mass range is estimated

Fig. 13

Events in the VH-tagged classes of the 7\(\,\text {TeV}\ \)dataset, binned as a function of \(m_{\gamma \gamma }\), together with the result of a fit of the signal-plus-background model. The \(1\sigma \) and \(2\sigma \) uncertainty bands shown for the background component of the fit include the uncertainty due to the choice of function and the uncertainty in the fitted parameters. These bands do not contain the Poisson uncertainty that must be included when the full uncertainty in the number of background events in any given mass range is estimated

Fig. 14

Events in the VH-tagged classes of the 8\(\,\text {TeV}\ \)dataset, binned as a function of \(m_{\gamma \gamma }\), together with the result of a fit of the signal-plus-background model. The \(1\sigma \) and \(2\sigma \) uncertainty bands shown for the background component of the fit are computed from the fit uncertainty in the background yield in bins corresponding to those used to display the data. These bands do not contain the Poisson uncertainty that must be included when the full uncertainty in the number of background events in any given mass range is estimated

Fig. 15

Events in the \(\text {t}\overline{\text {t}}\text {H}\,\)-tagged class of the 7\(\,\text {TeV}\ \)dataset, binned as a function of \(m_{\gamma \gamma }\), together with the result of a fit of the signal-plus-background model for \(m_\text {H}=124.7\,\text {GeV}\, \). The \(1\sigma \) and \(2\sigma \) uncertainty bands shown for the background component of the fit include the uncertainty due to the choice of function and the uncertainty in the fitted parameters. These bands do not contain the Poisson uncertainty that must be included when the full uncertainty in the number of background events in any given mass range is estimated

Fig. 16

Events in the two \(\text {t}\overline{\text {t}}\text {H}\,\)-tagged classes of the 8\(\,\text {TeV}\ \)dataset, binned as a function of \(m_{\gamma \gamma }\), together with the result of a fit of the signal-plus-background model. The \(1\sigma \) and \(2\sigma \) uncertainty bands shown for the background component of the fit are computed from the fit uncertainty in the background yield in bins corresponding to those used to display the data. These bands do not contain the Poisson uncertainty that must be included when the full uncertainty in the number of background events in any given mass range is estimated

9 Systematic uncertainties

The uncertainty related to the background modelling, and how it is handled, has been discussed in the previous section. The systematic uncertainties related to the signal model are described below. A useful measure of the relative importance of the various systematic uncertainties can be obtained by tabulating their contributions to the total uncertainty in the final results for the best-fit signal strength and the best-fit mass. This is done in Tables 7 and 8 in Sect. 11 where the results of the analysis are discussed.

The systematic uncertainties assigned to all events are

  • PDF and theory uncertainties: the theory systematic uncertainties in the production cross section and the diphoton branching fraction follow the recommendations of the LHC Higgs Cross Section Working Group [54, 83]. As can be seen in Table 7, these uncertainties make up the largest contribution to the uncertainty in the signal strength, and are dominated by the uncertainty in the ggH process cross section, which arises both from the missing higher orders and from the parton distribution functions. The effect of these theory uncertainties on the overall acceptance and on the classification of the accepted events is included by varying the \(p_{\mathrm {T}}\,\) and rapidity distributions of the simulated Higgs boson events as they are changed by the theory uncertainties.

  • Integrated luminosity: the luminosity uncertainty is estimated as described in Refs. [84, 85], and amounts to a 2.2 % (2.6 %) uncertainty in the signal yield in the 7 (8)\(\,\text {TeV}\ \)datasets, respectively.

  • Vertex finding efficiency: the uncertainty in the vertex finding efficiency is taken from the uncertainty in the measurement of the corresponding data/MC scale factor obtained using \(\text {Z}\rightarrow \mathrm {\mu ^+}\mathrm {\mu ^-}\,\) events. We assign an additional 1 % uncertainty in the vertex finding efficiency, related to the amount of activity resulting in charged particle tracks in signal events, which is derived by varying the pythia underlying event tunes in ggH events. Since the vertex-finding efficiency varies considerably with \(p_{\mathrm {T}}^{\gamma \gamma }\,\), there is an uncertainty in the overall efficiency coming from the uncertainty in the signal \(p_{\mathrm {T}}\)  distribution, leading to a further uncertainty of 0.2 % to be added to the uncertainty in the data/MC scale factor for both the 7 and 8\(\,\text {TeV}\ \)datasets.

  • Trigger efficiency: the uncertainty in the trigger efficiency is extracted from \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\,\) events using a tag-and-probe technique. Rescaling is used to take into account the difference in the \(R_\mathrm {9}\) distributions of electrons and photons. The uncertainty value obtained is slightly less than 1 %, but an uncertainty of 1 % has been assigned.

The systematic uncertainties related to individual photons are

  • Photon energy scale uncertainty resulting from electron/photon differences: an important source of uncertainty in the energy scale of photons is the imperfect modelling of the difference between electrons and photons by the MC simulation, the most important cause of which is an imperfect description of the material between the interaction point and the ECAL. Studies of electron bremsstrahlung, photon conversion vertices, and the multiple scattering of pions suggest a deficit of material in the simulation. Although the deficit is almost certainly in specific structures and localized regions—and this hypothesis is supported by the studies—the data/MC discrepancies are slightly smaller than what would be caused by a 10 % uniform deficit of material in the region \(|\eta |<1.0\) and a 20 % uniform deficit for \(|\eta |>1.0\). The resulting uncertainty in the energy scale has been assessed using simulated samples in which the tracker material is increased uniformly by 10 and 20 %, and an uncertainty, with differing magnitude in eight bins (\(\eta \): three barrel and one endcap, and \(R_\mathrm {9}\): two bins) is assigned to photon energies. The systematic uncertainty in the energy scale ranges from 0.03 % in the central ECAL barrel up to 0.3 % in the outer endcap. Two nuisance parameters, one for \(|\eta |<1.0\) and one for the remainder of the \(\eta \) range used in the analysis, are introduced to model this uncertainty, which is fully correlated between the 7 and 8\(\,\text {TeV}\ \)datasets. Another difference between data and simulation, relevant to electron-photon differences, is the modelling of the varying fraction of scintillation light reaching the photodetector as a function of the longitudinal depth in the crystal at which it was emitted. 
Ensuring adequate uniformity was a major accomplishment in the lead tungstate crystal development, achieved by depolishing one face of each barrel crystal, but an uncertainty in the degree of uniformity achieved remains [86, 87]. In addition, the uniformity is modified by the radiation-induced loss of transparency of the crystals. The effect of the uncertainty, including the effect of radiation-induced transparency loss, has been simulated. It results in a difference in the energy scale between electrons and unconverted photons which is not present in the standard simulation. The magnitude of the uncertainty in the photon energy scale is 0.04 % for photons with \(R_\mathrm {9}>0.94\) and 0.06 % for those with \(R_\mathrm {9}<0.94\), but the signs of the energy shifts are opposed, and the two anti-correlated uncertainties result in an uncertainty of about 0.015 % in the mass scale. A further small uncertainty is added to account for imperfect electromagnetic shower simulation by \({\textsc {geant}}\,4\) version 9.4.p03. A simulation made with an improved shower description, using the Seltzer–Berger model for the bremsstrahlung energy spectrum [88], changes the energy scale for both electrons and photons. The much smaller changes in the difference between the electron and photon energy scales, although mostly consistent with zero, are interpreted as a limitation on our knowledge of the correct simulation of the showers, leading to a further uncertainty of 0.05 %.

  • Energy scale nonlinearity: possible differences between MC simulation and data in the extrapolation from shower energies typical of electrons from \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\,\) decays, to those typical of photons from \(\text {H}\rightarrow \mathrm {\gamma }\mathrm {\gamma }\,\) decays, have been investigated with \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\,\) data samples by binning the events according to the scalar sum of the \(E_{\mathrm {T}}\,\) of the two electron showers, and by studying electron showers in \(\mathrm {W}\rightarrow \mathrm {e}\nu \,\) events in which the electron \(p_{\mathrm {T}}\,\) is also measured by the tracker. The differential nonlinearity in the measurement of photon energies has an effect of up to 0.1 % on the diphoton mass scale for diphoton masses close to \(m_{\gamma \gamma }=125\,\text {GeV}\, \). In the best untagged event class, in which the diphoton transverse momentum is particularly high, the effect is up to 0.2 %. The uncertainties are not completely correlated between the 7 and 8\(\,\text {TeV}\ \)datasets, since the energy response regression (Sect. 4.1), which would be strongly implicated in any nonlinearity, uses independent sets of regression weights for the two datasets. Moreover, \(E_{\mathrm {T}}\,\)-dependent scale corrections have been applied at 8\(\,\text {TeV}\ \)for barrel photons, while the corrections at 7\(\,\text {TeV}\ \)are not \(E_{\mathrm {T}}\,\)-dependent. Studies suggest that there may be as much as 20 % correlation between the uncertainties in the energy scale nonlinearities in the 7 and 8\(\,\text {TeV}\ \)datasets, and this correlation is included in the implementation of the uncertainties. This uncertainty makes a significant contribution to the uncertainty in the measured Higgs boson mass, as can be seen in Table 8.

  • Measuring and correcting the energy scale in data, and the energy resolution in simulation: the energy scale and resolution in data are measured with electrons from \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\,\) decays. The statistical uncertainties in the measurements are small, but the methodology, which is described in Sect. 4.1, gives rise to a number of systematic uncertainties related to the imperfect agreement between data and MC simulation. These are estimated and accounted for in the same eight bins (4 bins in \(|\eta |\) and 2 bins in \(R_\mathrm {9}\)) as are used to derive the scale corrections and the resolution smearings for simulated events. The uncertainties range from 0.05 % for unconverted photons in the ECAL central barrel, to 0.1 % for converted photons in the ECAL outer endcaps. In addition, for the barrel region, the uncertainty in the energy dependence of the Gaussian smearing applied to the simulation is also accounted for. The energy dependence of the smearing is controlled by a parameter that shares the smearing between a constant term and a term proportional to \(1/\sqrt{E_{\mathrm {T}}\,}\), and the uncertainty pertains to this sharing. Finally, there is an overall uncertainty that accounts for possible misdescription of the \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\,\) line-shape in simulation.

  • Photon identification BDT score, and estimate of the per-photon energy resolution: the uncertainties in these two quantities are discussed together since they are studied in the same way, and the dominant underlying cause of the observed differences between data and simulation is, almost certainly, the imperfect simulation of the shower shape, despite the fact that no obvious differences between data and simulation can be observed when the shower shape variables are examined individually. The combined contribution of the uncertainties in these two quantities dominates the experimental contribution to the systematic uncertainty in the signal strength, and has been labelled “shower shape modelling” in Table 7. The agreement between data and simulation is examined when the photon candidates are electron showers reconstructed as photons in \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\,\) events, photons in \(\text {Z}\rightarrow \mathrm {\mu ^+}\mathrm {\mu ^-}\mathrm {\gamma }\,\) events, and leading photons in preselected diphoton events where \(m_{\gamma \gamma }>160\,\text {GeV}\, \). It is found that among the input variables to the diphoton BDT, only the distributions of the photon identification BDT score and the per-photon energy resolution estimate show significant differences between data and simulation. A variation of \(\pm \)0.01 on the photon identification BDT score, together with an uncertainty in the per-photon energy resolution estimate, parameterized as a rescaling of the resolution estimate by \(\pm \)10 % about its nominal value, fully covers the differences observed in all three of the above data samples.

  • Photon preselection efficiency: the uncertainty in the photon preselection efficiency is taken as the uncertainty in the data/MC preselection efficiency scale factors, which are measured using \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\,\) events with a tag-and-probe technique (see Table 1).

The effect of the single photon uncertainties is propagated to the diphoton quantities: diphoton efficiency, diphoton mass scale, and diphoton mass resolution. For instance, to obtain the magnitude of the mass-scale uncertainty resulting from a particular photon energy uncertainty, which may relate only to certain photons (such as barrel photons with \(R_\mathrm {9}>0.94\)), the energy of photons in simulated signal events to which the uncertainty applies is shifted by the \(1\,\sigma \) single photon uncertainty. The resulting shift of the mean of the diphoton mass distribution in each event class is determined. This shift corresponds to the effect of the single photon energy uncertainty on the diphoton mass scale and may be different for each event class. The effects of single photon uncertainties on the diphoton selection efficiency and diphoton resolution are determined in a similar way.
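The propagation described above can be illustrated with a minimal sketch. It assumes massless photons, so that \(m_{\gamma \gamma }=\sqrt{2E_1E_2(1-\cos \theta )}\), and uses a toy event sample with invented kinematics; the function names and all numbers are hypothetical and are not taken from the analysis.

```python
import numpy as np

def diphoton_mass(e1, e2, cos_theta):
    """Invariant mass of two massless photons with opening angle theta."""
    return np.sqrt(2.0 * e1 * e2 * (1.0 - cos_theta))

def mass_scale_shift(e1, e2, cos_theta, affected1, affected2, rel_unc):
    """Relative shift of the mean diphoton mass in one event class when the
    energies of the affected photons are scaled up by the 1-sigma relative
    uncertainty rel_unc. Unaffected photons keep their nominal energy."""
    nominal = diphoton_mass(e1, e2, cos_theta)
    shifted = diphoton_mass(
        e1 * np.where(affected1, 1.0 + rel_unc, 1.0),
        e2 * np.where(affected2, 1.0 + rel_unc, 1.0),
        cos_theta,
    )
    return shifted.mean() / nominal.mean() - 1.0

# Toy "event class" with invented kinematics (illustration only).
rng = np.random.default_rng(1)
n = 10_000
e1 = rng.uniform(50.0, 100.0, n)
e2 = rng.uniform(40.0, 80.0, n)
cos_theta = rng.uniform(-1.0, 0.0, n)
# Hypothetical flags: does the uncertainty apply to this photon?
affected1 = rng.random(n) < 0.5
affected2 = rng.random(n) < 0.5

shift = mass_scale_shift(e1, e2, cos_theta, affected1, affected2, rel_unc=0.001)
```

In this toy model the mass-scale shift equals the photon energy uncertainty only when both photons in every event are affected; when a single photon is affected the event mass scales as \(\sqrt{1+\delta }\), which is why the resulting shift depends on the composition of each event class.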

The sources of systematic uncertainty for the event classes targeting specific production modes are

  • Uncertainties in jet requirements: the largest uncertainty related to the tagging of production processes comes from a theory uncertainty and concerns the probability of producing additional jets in gluon-fusion Higgs boson production. The Stewart–Tackmann procedure [72] recommended by the LHC Higgs Cross Section Working Group [54] has been used to quantify the uncertainty in the yield of ggH events in the VBF dijet-tagged classes. The resulting uncertainty agrees comfortably with our previous estimation [28] derived by varying the underlying event tunes in ggH events produced by pythia, and that method is retained to estimate the uncertainty associated with additional jet production in the yield of ggH events in the \(\text {t}\overline{\text {t}}\text {H}\) multijet-tagged class. There is a further contribution to the uncertainty in the yield of ggH events in the \(\text {t}\overline{\text {t}}\text {H}\) multijet-tagged class arising from the uncertainty in the probability of gluon splitting to \(\text {b}\overline{\text {b}}\,\), which is estimated from the discrepancy observed between data and powheg simulation in the fraction of additional b-tagged jets in samples of \(\text {t}\overline{\text {t}}\,\)+jets events, where the \(\text {t}\overline{\text {t}}\,\) pair is identified by the presence of two charged leptons in the final state. Additionally, since few events from the simulated signal samples of ggH are selected for the \(\text {t}\overline{\text {t}}\text {H}\) multijet-tagged class, there is a contribution due to the limited sample size. For the VBF dijet-tagged classes, the VH dijet-tagged class, and the \(\text {t}\overline{\text {t}}\text {H}\) multijet-tagged class there is an uncertainty in the effect of the algorithm used to reject jets from pileup (in the 8\(\,\text {TeV}\ \)dataset only). Further small contributions are due to the uncertainties in the jet energy scale and resolution corrections.

  • Lepton identification efficiency: for both electrons and muons, the uncertainty in the identification efficiency is computed by varying the data/simulation efficiency scale factor by its uncertainty. The resulting differences in the selection efficiency for the event classes tagged by leptons range from 0.2 to 0.5 %, depending on the event category, and are taken as systematic uncertainties.

  • \(E_{\mathrm {T}}^{\text {miss}}\,\) selection efficiency: systematic uncertainties due to \(E_{\mathrm {T}}^{\text {miss}}\) reconstruction are estimated both in signal events in which real \(E_{\mathrm {T}}^{\text {miss}}\) is expected (such as in \(\mathrm {W}(\ell \nu )\text {H}\) production) and in the other Higgs production mechanisms. For WH events the uncertainty is estimated by computing the selection efficiency with and without the \(E_{\mathrm {T}}^{\text {miss}}\) corrections applied, and taking the difference in efficiency of 2.6 % as a systematic uncertainty. For the other processes, ggH, VBF, and \(\text {t}\overline{\text {t}}\text {H}\,\), what is uncertain is the fraction of events in the tail of the \(E_{\mathrm {T}}^{\text {miss}}\) distribution. This is evaluated by comparing diphoton data and simulated events in control samples enriched in \(\mathrm {\gamma }\)+jet events, which have a similar \(E_{\mathrm {T}}^{\text {miss}}\) distribution to the Higgs signal events. The systematic uncertainty amounts to 4 %.

  • b-tagging efficiency: the uncertainty in the b-tagging efficiency used in the selection for the \(\text {t}\overline{\text {t}}\text {H}\,\)-tagged classes is evaluated by varying the measured b-tagging efficiency scale factors between data and simulation within their uncertainty. The resulting uncertainty in the signal yield is 1.3 % in the lepton-tagged class and 1.1 % in the multijet-tagged class.

10 Alternative analyses

Three alternative analyses are performed using particular variations of methodology, which help to provide verification of different aspects of the analysis described in the previous sections.

10.1 Cut-based analysis

The first of these, the “cut-based” analysis described in Ref. [28], does not use multivariate techniques for selection or classification of events. Photon identification is performed by dividing photons into four mutually exclusive categories depending on whether the photon is in the barrel or endcap, and on whether or not it has \(R_\mathrm {9}>0.94\). The identification selection requirements are then particular to the category, and use a subset of the discriminating variables that are used in the multivariate photon identification described in Sect. 4.3.

Four mutually exclusive diphoton event classes are constructed by splitting the events according to the same categorization criteria as are used for single photons in the photon identification. Subsequently these four classes are each split according to the transverse momentum of the diphoton system. The four event classes are

  0. Both photons are in the barrel and have \(R_\mathrm {9}>0.94\).

  1. Both photons are in the barrel and at least one of them fails the requirement of \(R_\mathrm {9}>0.94\).

  2. At least one photon is in the endcap and both photons have \(R_\mathrm {9}>0.94\).

  3. At least one photon is in the endcap and at least one of them fails the requirement \(R_\mathrm {9}>0.94\).

Photons with a high value of the \(R_\mathrm {9}\) variable are predominantly unconverted and have a better energy resolution than those with a lower value, and photon candidates with a high value of \(R_\mathrm {9}\) are also less likely to arise from misidentification of jet fragments. Similarly, photons in the barrel both have better energy resolution and are more likely to be signal photons. Thus, the classification serves a similar purpose to the one using the BDT event classifier: events with good diphoton mass resolution, resulting from photons with good energy resolution, and with better signal-to-background ratio are grouped together. Each of the four event classes is then split into two according to the transverse momentum of the diphoton system. Since the \(p_{\mathrm {T}}^{\gamma \gamma }\) spectrum resulting from Higgs bosons produced by the VBF, VH, or \(\text {t}\overline{\text {t}}\text {H}\) processes is significantly harder than that of the diphoton background, this separation improves the sensitivity of the analysis by increasing the expected signal-to-background ratio in the high-\(p_{\mathrm {T}}^{\gamma \gamma }\) event classes. The magnitude of the improvement in sensitivity is about 5 %, and has a very weak dependence on the precise value of the \(p_{\mathrm {T}}^{\gamma \gamma }\) threshold chosen. To avoid modification of the shape of the invariant mass spectrum by the threshold, the classification uses the ratio \(p_{\mathrm {T}}^{\gamma \gamma }\,/m_{\gamma \gamma }\), with a threshold value of 0.32, corresponding to \(p_{\mathrm {T}}^{\gamma \gamma }\,=40\,\text {GeV}\, \) at \(m_{\gamma \gamma }=125\,\text {GeV}\, \).
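The untagged classification of the cut-based analysis can be summarized in a short sketch. The thresholds are those quoted in the text; the function name, the argument names, and the returned pair (class index from the list above, numbered from 0 in the order listed, plus a high-\(p_{\mathrm {T}}^{\gamma \gamma }\) flag) are conventions chosen here for illustration only.

```python
R9_CUT = 0.94         # R9 threshold quoted in the text
PT_OVER_M_CUT = 0.32  # threshold on pT(gamma gamma) / m(gamma gamma) from the text

def cut_based_class(in_barrel_1, r9_1, in_barrel_2, r9_2, pt_gg, m_gg):
    """Return (base_class, high_pt) for an untagged diphoton event.

    base_class counts the four classes of the text from 0 in the order
    listed; high_pt marks the high pT/m half of the class. The return
    format is a hypothetical convention, not the analysis's own labelling.
    """
    both_barrel = in_barrel_1 and in_barrel_2
    both_high_r9 = r9_1 > R9_CUT and r9_2 > R9_CUT
    if both_barrel:
        base_class = 0 if both_high_r9 else 1
    else:
        base_class = 2 if both_high_r9 else 3
    high_pt = pt_gg / m_gg > PT_OVER_M_CUT
    return base_class, high_pt

# The ratio threshold corresponds to 40 GeV at a diphoton mass of 125 GeV.
pt_threshold_at_125 = PT_OVER_M_CUT * 125.0
```

Cutting on the ratio rather than on \(p_{\mathrm {T}}^{\gamma \gamma }\) itself means that the effective \(p_{\mathrm {T}}^{\gamma \gamma }\) threshold scales with the diphoton mass, so the same fraction of the spectrum is selected at every mass.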

Event classes tagged by signatures of VBF, VH, and \(\text {t}\overline{\text {t}}\text {H}\) production are also included in the cut-based analysis. The event classes tagged for VH and \(\text {t}\overline{\text {t}}\text {H}\) production are defined in exactly the same way as described in Sect. 6.2, with the exception that the minimum requirements on the diphoton BDT scores are replaced by the cut-based photon identification requirements. A dijet tag is defined to select signal events produced by the VBF process by requiring a pair of jets satisfying requirements on the same variables as are used by the main analysis in the dijet BDT described in Sect. 6.2.1. These selection requirements are listed in Table 4. The tagged events are subdivided into two classes depending on whether they additionally satisfy tighter requirements on the \(p_{\mathrm {T}}\) of the second jet and the dijet mass, \(p_\mathrm {T}^\mathrm {j2}\,>30\,\text {GeV}\, \!\!, m_\mathrm {jj}\,>500\,\text {GeV}\, \).

Signal and background models are constructed in the same way as in the main analysis and are fitted to the \(m_{\gamma \gamma }\) distributions. Since this analysis does not use multivariate techniques for event selection or for event classification, it provides some degree of cross-checking on their use in the main analysis.

Table 4 Selection requirements for the VBF dijet tag in the cut-based and dijet 2D analyses. The variables are defined in Sect. 6.2.1

10.2 Sideband background model analysis

The second alternative analysis approach, the “sideband background model” analysis described in Ref. [28], uses the same multivariate techniques as the standard analysis to select the events, but employs a very different procedure to model the background. For any given mass hypothesis, \(m_\text {H}\), a signal region is defined as the \({\pm }2~\%\) range centred on \(m_\text {H}\). A contiguous set of sidebands is defined in the mass distribution on either side of the signal region, from which the background is extracted. Each sideband is defined to have the same width of \({\pm }2~\%\) relative to the diphoton mass that corresponds to its centre. A total of eight sidebands are defined, four on either side of the signal region. Six sidebands are used to obtain the background estimate, with a sideband on either side of the signal region left unused in order to avoid signal contamination.
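As an illustration of this construction, the sketch below computes contiguous windows whose half-width is 2 % of their own centre. Requiring that neighbouring windows share an edge fixes the ratio between successive centres at \((1+f)/(1-f)\) with \(f=0.02\). The function name and the integer indexing are hypothetical, and the sketch assumes that the two unused sidebands are those adjacent to the signal region.

```python
def sideband_windows(m_h, frac=0.02, n_per_side=4):
    """Return {k: (low, centre, high)} for the signal region (k = 0) and
    n_per_side contiguous sidebands on each side (k = +-1, ..., +-n).

    Every window spans centre * (1 -+ frac). Contiguity requires
    centre_{k+1} * (1 - frac) == centre_k * (1 + frac).
    """
    ratio = (1.0 + frac) / (1.0 - frac)
    windows = {}
    for k in range(-n_per_side, n_per_side + 1):
        centre = m_h * ratio**k
        windows[k] = (centre * (1.0 - frac), centre, centre * (1.0 + frac))
    return windows

windows = sideband_windows(125.0)
# Assumption: the sidebands next to the signal region (k = +-1) are skipped.
used_sidebands = [k for k in windows if abs(k) >= 2]
```

For \(m_\text {H}=125\,\text {GeV}\) the signal region spans 122.5 to 127.5\(\,\text {GeV}\), and because the windows have constant relative width, the sidebands become broader in absolute terms towards higher mass.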

The result is extracted by counting events in the signal region, in bins that are defined using two-dimensional (2D) distributions of the diphoton BDT score and the diphoton mass in the form \(\Delta m/m_\text {H}\), where \(\Delta m=m_{\gamma \gamma }-m_\text {H}\) and \(m_\text {H}\) is the Higgs boson mass hypothesis. The distributions, for simulated signal and background events, are in the form of histograms, and after applying a smoothing algorithm to them, seven event bins are defined for the untagged events by defining regions ranked by signal-to-background ratio in the 2D plane. For the tagged events, the event bins correspond to the tagged classes described in Sect. 6.2.

The overall normalization of the background model is obtained from a parametric fit to the inclusive mass spectrum, with the signal region excluded from the fit, and it is easy to account for the small uncertainty associated with the choice of function in this single fit. The number of events in each event bin is obtained from the data in each of the six sidebands. It is assumed that, for any sideband, the fraction of events in each bin is a linear function of the central mass of the sideband, and that there is negligible signal contamination in the sidebands. These assumptions have been verified within the assigned systematic uncertainties. The sideband background analysis does not rely on a parametric fit to the \(m_{\gamma \gamma }\) distribution to model the background shape in the signal region, and thus provides a valuable cross-check of the background modelling used in the main analysis.
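The linearity assumption suggests the following simple estimate, given here only as a sketch. It fits a straight line to the fraction of events that fall in a given event bin as a function of the sideband central mass, evaluates it at \(m_\text {H}\), and scales by the overall background normalization. The use of an unweighted least-squares fit, the function name, and the example numbers are assumptions made for illustration; the treatment of uncertainties in the actual analysis is not reproduced.

```python
import numpy as np

def bin_background(sideband_centres, bin_fractions, m_h, n_bkg_signal_region):
    """Estimate the background in one event bin of the signal region.

    sideband_centres    : central masses of the sidebands used
    bin_fractions       : fraction of each sideband's events in this bin
    n_bkg_signal_region : total background in the signal region, e.g. from
                          the fit to the inclusive mass spectrum
    """
    # Straight-line fit of the bin fraction versus sideband central mass
    # (unweighted least squares, an assumption of this sketch).
    slope, intercept = np.polyfit(sideband_centres, bin_fractions, 1)
    fraction_at_mh = slope * m_h + intercept
    return n_bkg_signal_region * fraction_at_mh

# Invented example: six sidebands with an exactly linear fraction.
centres = np.array([110.0, 115.0, 120.0, 130.0, 135.0, 140.0])
fractions = 0.10 + 0.001 * (centres - 125.0)
estimate = bin_background(centres, fractions, m_h=125.0, n_bkg_signal_region=1000.0)
```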

10.3 Dijet 2D analysis

The third alternative analysis, the “dijet 2D” analysis, uses a different method for extracting the signal produced by the VBF production process. The dijet invariant mass, \(m_\mathrm {jj}\,\), of the pair of jets that accompany the production of a Higgs boson by the VBF mechanism, tends to be larger than that of pairs of jets found in either background events or in events produced by the ggH process. The analysis takes advantage of this by extracting the VBF signal in a parametric 2D fit of signal and background in the (\(m_{\gamma \gamma }\), \(m_\mathrm {jj}\,\)) plane. The initial selection of events for the analysis makes a requirement on the photon identification BDT score (Sect. 4.3). Dijet-tagged events are required to satisfy the same requirements as for the VBF dijet tag in the cut-based analysis, shown in Table 4. The invariant mass of the dijet pair is required to satisfy \(m_\mathrm {jj}\,>250\,\text {GeV}\, \), and the selected events in the 7 and 8\(\,\text {TeV}\ \)datasets are divided into two and four event classes, respectively, based solely on the estimated diphoton mass resolution. The remaining events, not selected for the VBF dijet-tagged classes, are classified in the same way as in the main analysis. The 2D fit is applied to the events in the dijet-tagged classes using parametric 2D signal and background models. The signal in the other event classes is extracted using a one-dimensional fit to the \(m_{\gamma \gamma }\) distribution, as in the main analysis. This analysis provides an alternative approach to extracting the VBF signal, which provides most of the sensitivity in the measurement of vector-boson-initiated production.

11 Results

Figure 17 shows the \(m_{\gamma \gamma }\) distribution of the combined data in the 7 and 8\(\,\text {TeV}\ \)samples, together with the sum of the signal-plus-background fits to the 25 event classes, which result in a best-fit mass \(m_\text {H}=124.7\,\text {GeV}\, \). The uncertainty bands shown on the background component of the fit include the uncertainty due to the choice of function and the uncertainty in the fitted parameters. These bands do not contain the Poisson uncertainty which must be included when the full uncertainty in the number of background events in any given mass range is estimated. The excess of events over the background expectation visible near \(m_{\gamma \gamma }=125\,\text {GeV}\, \) can be seen more clearly after subtraction of the background component, shown in the lower plot.

Fig. 17

Sum of the 25 signal-plus-background model fits to the event classes in both the 7 and 8\(\,\text {TeV}\ \)datasets, together with the data binned as a function of \(m_{\gamma \gamma }\). The \(1\sigma \) and \(2\sigma \) uncertainty bands shown for the background component of the fit are computed from the fit uncertainty in the background yield in bins corresponding to those used to display the data. These bands do not contain the Poisson uncertainty that must be included when the full uncertainty in the number of background events in any given mass range is estimated. The lower plot shows the residual data after subtracting the fitted background component

11.1 Significance of the signal and its strength

The local \(p\)-value quantifies the probability for the background to produce a fluctuation as large as, or larger than, the apparent signal observed, within a specified search range and uncorrected for the “look-elsewhere effect” [89]. Figure 18 shows the local \(p\)-value, in the mass range \(110<m_\text {H}<150\,\text {GeV}\), calculated separately for the 7 and 8\(\,\text {TeV}\ \)datasets as well as their combination. Lines indicating the \(p\)-values expected for a SM Higgs boson, for the three cases, are also shown. The values of expected significance have been calculated using the background expectation obtained from the signal-plus-background fit, the so-called post-fit expectation. The post-fit model corresponds to the parametric bootstrap described in the statistics literature [90, 91], and includes information gained in the fit regarding the values of all parameters, including the best-fit mass.
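For orientation, a local \(p\)-value can be converted to a significance in units of \(\sigma \), and back, with the short sketch below. It assumes the one-sided Gaussian-tail convention, \(p=1-\Phi (Z)\), which is not stated explicitly in this section; the function names are chosen for illustration.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sigma 1)

def p_from_significance(z):
    """One-sided tail probability beyond z standard deviations.
    Uses cdf(-z), which is numerically safer than 1 - cdf(z) for large z."""
    return _STD_NORMAL.cdf(-z)

def significance_from_p(p_local):
    """Inverse of p_from_significance (one-sided convention assumed)."""
    return -_STD_NORMAL.inv_cdf(p_local)

p_5sigma = p_from_significance(5.0)
```

Under this convention a significance of \(5\,\sigma \) corresponds to a local \(p\)-value of about \(2.9\times 10^{-7}\).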

Fig. 18

Local \(p\)-values as a function of \(m_\text {H}\) for the 7\(\,\text {TeV}\ \), 8\(\,\text {TeV}\ \), and the combined dataset. The values of the expected significance, calculated using the background expectation obtained from the signal-plus-background fit, are shown as dashed lines

The significance of the minimum of the local \(p\)-value, at 124.7\(\,\text {GeV}\), is 5.7 \(\sigma \) where a local significance of 5.2 \(\sigma \) is expected from the SM Higgs boson. To better visualize the excess of events, with respect to the background expectation, and its significance, the diphoton mass spectrum is plotted with each event used in the analysis weighted by a factor depending on the category in which it falls. The weight is proportional to \(S/(S+B)\), where \(S\) and \(B\) are the numbers of expected signal and background events, respectively, counted in a mass window corresponding to \({\pm }1\sigma _\text {eff}\) and centred on \(m_{\gamma \gamma }=124.7\,\text {GeV}\). The background is calculated from the signal-plus-background fit. The motivation for this choice of weights is explained in Ref. [92]. The weighted data, the weighted signal model, and the weighted background model are normalized such that the integral of the weighted signal model matches the number of signal events obtained from the best fit. The resulting distribution, and the corresponding background-subtracted spectrum, are shown in Fig. 19.
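The weighting and normalization just described can be sketched as follows. The per-class inputs (signal and background in the \({\pm }1\sigma _\text {eff}\) window, and the fitted signal yield of the class) are supplied by the caller; the function name and the example numbers are invented for illustration and do not correspond to the yields of the analysis.

```python
import numpy as np

def class_weights(s_window, b_window, s_fit):
    """Per-class event weights proportional to S/(S+B).

    s_window, b_window : expected signal and background in the +-1 sigma_eff
                         window of each event class
    s_fit              : fitted signal yield of each event class
    The weights are rescaled so that the weighted signal integral equals
    the unweighted number of fitted signal events.
    """
    s_window = np.asarray(s_window, dtype=float)
    b_window = np.asarray(b_window, dtype=float)
    s_fit = np.asarray(s_fit, dtype=float)
    raw = s_window / (s_window + b_window)
    scale = s_fit.sum() / np.sum(raw * s_fit)
    return scale * raw

# Two hypothetical classes: one clean, one background dominated.
weights = class_weights(s_window=[10.0, 5.0], b_window=[20.0, 95.0], s_fit=[15.0, 8.0])
```

Each data event, and each component of the model, in a given class is then multiplied by the weight of that class before the classes are summed.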

Fig. 19

Diphoton mass spectrum weighted by the ratio \(S/(S+B)\) in each event class, together with the background-subtracted weighted mass spectrum

The signal strength is quantified by \(\mu =\sigma /\sigma _\mathrm {SM}\,\), where \(\sigma /\sigma _\mathrm {SM}\) denotes the production cross section times the relevant branching fractions, relative to the SM expectation. In Fig. 20 the combined best-fit signal strength, \(\hat{\mu }\,\), is shown as a function of the Higgs boson mass hypothesis, both for the standard analysis (top) and for the cut-based analysis (bottom). The two analyses agree well across the entire mass range. In addition to the signal around 125\(\,\text {GeV}\), both analyses see a small upward fluctuation at 150\(\,\text {GeV}\), which is found to have a maximum local significance of just over \(2\,\sigma \) at \(m_\text {H}=151\,\text {GeV}\, \)—slightly beyond the mass range of our analysis.

Fig. 20

Best-fit signal strength, \(\hat{\mu }\,\), shown as a function of the mass hypothesis, \(m_\text {H}\). The results are shown for the standard analysis (top), and for the cut-based cross-check analysis (bottom)

The best-fit signal strength for the main analysis, when the value of \(m_\text {H}\) is treated as an unconstrained parameter in the fit, is \(\hat{\mu }\,=1.14^{+0.26}_{-0.23}\), with the corresponding best-fit mass being \(\widehat{m}_\text {H}\,=124.7\,\text {GeV}\). The expected uncertainties in the best-fit signal strength, at this mass, are \(+0.24\) and \(-0.22\). The values of the best-fit signal strength, derived separately for the 7 and 8\(\,\text {TeV}\ \)datasets, are listed in Table 5. For the cut-based analysis the corresponding value is \(\hat{\mu }\,=1.29^{+0.29}_{-0.26}\) at \(\widehat{m}_\text {H}\,=124.6\,\text {GeV}\), and for the sideband background model analysis the value measured is \(\hat{\mu }\,=1.06^{+0.26}_{-0.23}\) at \(\widehat{m}_\text {H}\,=124.7\,\text {GeV}\). These values are shown in Table 6 together with the expected uncertainty, and the corresponding values for the main analysis.

Table 5 Values of the best-fit signal strength, \(\hat{\mu }\,\), when \(m_\text {H}\) is treated as an unconstrained parameter, for the 7\(\,\text {TeV}\), 8\(\,\text {TeV}\), and combined datasets. The corresponding best-fit value of \(m_\text {H}\), \(\widehat{m}_\text {H}\), is also given
Table 6 Expected and observed best-fit values of the signal strength for a SM Higgs boson signal in the alternative analyses, together with their uncertainties, indicating the expected uncertainty in the measurement at the best-fit values of \(m_\text {H}\), and the best-fit values obtained from the data. The corresponding values for the main analysis are shown for comparison

The uncertainty in the signal strength may be separated into statistical and systematic contributions, with the latter further divided into those having, or not, a theoretical origin: \(\hat{\mu }\,=1.14\pm 0.21\,\text {(stat)}\,\ ^{+0.09}_{-0.05}\,\text {(syst)}\,\ ^{+0.13}_{-0.09}\,\text {(theo)}\,\), where the statistical contribution includes all uncertainties in the background modelling. The separation of contributions can be taken further and Table 7 lists a finer breakdown of the contributions to the systematic uncertainty, where the contributions of the 81 nuisance parameters in the analysis are grouped according to their physical origin, as relevant to the signal strength uncertainty.

Table 7 Magnitude of the uncertainty in the best-fit signal strength, \(\hat{\mu }\,\), induced by the systematic uncertainties in the signal model. To obtain the values, the quadratic subtraction, needed to remove the statistical uncertainty, is made for the positive and negative uncertainties separately. The values quoted are the average magnitudes of the positive and negative uncertainties. The statistical uncertainty includes all uncertainties in the background modelling
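As a quick consistency check of the decomposition of the signal strength uncertainty quoted above, and assuming that the contributions combine in quadrature, the statistical, systematic, and theoretical components can be summed and compared with the total. The sketch also shows the quadratic subtraction mentioned in the caption of Table 7. The helper names are chosen here for illustration.

```python
import math

def quad_sum(*parts):
    """Combine independent uncertainty components in quadrature."""
    return math.sqrt(sum(p * p for p in parts))

def quad_subtract(total, part):
    """Remove one component from a total uncertainty in quadrature."""
    return math.sqrt(total**2 - part**2)

# Components of the signal strength uncertainty quoted in the text.
total_up = quad_sum(0.21, 0.09, 0.13)    # stat, syst (+), theo (+)
total_down = quad_sum(0.21, 0.05, 0.09)  # stat, syst (-), theo (-)

# Non-statistical part recovered from the total by quadratic subtraction.
non_stat_up = quad_subtract(total_up, 0.21)
```

To two decimal places the sums reproduce the total uncertainties of \(+0.26\) and \(-0.23\) on \(\hat{\mu }\,\).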

In Fig. 21 the best-fit signal strength, \(\hat{\mu }\,\), is shown for each event class in the combined 7 and 8\(\,\text {TeV}\ \)datasets, fixing \(m_\text {H}=124.7\,\text {GeV}\, \) in the fits. The horizontal bars indicate \({\pm }1\,\sigma \) uncertainties in the values, and the vertical line and band indicate the best-fit signal strength in the combined fit to the data and its uncertainty. The signal-plus-background fit for the VH tight-lepton tagged class in the 7\(\,\text {TeV}\ \)dataset, when done alone, does not converge because in this class and in the region of \(m_{\gamma \gamma }\) where the signal is expected there are no events in the data. No value for the signal strength in this class is shown in the figure. The \(\chi ^2\) probability of the values for the 24 remaining classes being compatible with the overall best-fit signal strength is 74 %.

Fig. 21

Values of \(\hat{\mu }\,\) measured individually for all event classes in the 7 and 8\(\,\text {TeV}\ \)datasets, fixing \(m_\text {H}=124.7\,\text {GeV}\, \). The horizontal bars indicate \({\pm }1\,\sigma \) uncertainties in the values, and the vertical line and band indicate the best-fit signal strength in the combined fit to the data and its uncertainty

11.2 Mass measurement

The four main Higgs boson production mechanisms can be associated with either fermion couplings (ggH and \(\text {t}\overline{\text {t}}\text {H}\)) or vector boson couplings (VBF and VH). To make the measurement of the mass of the observed resonance less model dependent, the signal strengths of the production processes involving the Higgs boson coupling to fermions and of those involving the coupling to vector bosons are allowed to vary independently. The two signal strength modifiers are denoted \(\mu _{\text {g}\text {g}\text {H}, \text {t}\overline{\text {t}}\text {H}\,}\) and \(\mu _\text {VBF, VH}\). Figure 22 (top) shows the resulting scan of the negative-log-likelihood ratio, \(q\), defined in Equation 3, as a function of the mass hypothesis, where \(\mu _{\text {g}\text {g}\text {H}, \text {t}\overline{\text {t}}\text {H}\,}\) and \(\mu _\text {VBF, VH}\) are treated as unconstrained parameters in the fit, giving the mass of the observed boson as \(124.70\pm 0.34\,\text {GeV}\, \).
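To illustrate how an interval is read from such a scan, the sketch below locates the minimum of \(q\) on a grid of mass hypotheses and interpolates the two points at which \(q\) crosses a threshold. It assumes that \(q\) is normalized as \(-2\Delta \ln L\), so that the \(1\,\sigma \) interval corresponds to \(q\le 1\) in the asymptotic approximation, and that \(q\) is monotonic on each side of the minimum. The scan used here is a toy parabola built from the quoted result, not the likelihood of the analysis.

```python
import numpy as np

def interval_from_scan(m_grid, q_values, threshold=1.0):
    """Return (best_m, low, high): the grid point minimizing q and the
    masses where q crosses `threshold` on either side, found by linear
    interpolation. Assumes q is monotonic on each side of the minimum."""
    m = np.asarray(m_grid, dtype=float)
    q = np.asarray(q_values, dtype=float)
    i_min = int(np.argmin(q))
    # Left branch: q decreases with m, so reverse it to get increasing q.
    low = np.interp(threshold, q[: i_min + 1][::-1], m[: i_min + 1][::-1])
    # Right branch: q already increases with m.
    high = np.interp(threshold, q[i_min:], m[i_min:])
    return m[i_min], low, high

# Toy parabolic scan centred on 124.70 GeV with width 0.34 GeV.
m_grid = np.linspace(123.0, 126.5, 701)
q_toy = ((m_grid - 124.70) / 0.34) ** 2
best_m, low, high = interval_from_scan(m_grid, q_toy)
```

For a real scan the curve is generally not a perfect parabola, so the lower and upper uncertainties obtained in this way need not be equal.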

Figure 22 (bottom) shows a map of the value of \(q\) in a two-dimensional scan of the (\(m_\text {H}\), \(\mu \)) plane. Here only a single signal strength modifier is allowed to vary, thus requiring \(\mu =\mu _{\text {g}\text {g}\text {H}, \text {t}\overline{\text {t}}\text {H}\,}=\mu _\text {VBF, VH}\), and the mass measured is unchanged. If the mass is measured in the 7 and 8\(\,\text {TeV}\ \)datasets separately the values are found to differ by less than \(1\sigma \). The uncertainty in the measured mass can be separated into statistical and systematic contributions: \(\widehat{m}_\text {H}\,=124.70\pm 0.31\,\text {(stat)}\,\pm 0.15\,\text {(syst)}\,\,\text {GeV}\, \). Systematic uncertainties from theory play a negligible role. However, the effect of interference between ggH and the continuum diphoton background produced via quark loops has not been taken into account. This interference is expected to result in a downward shift of the observed mass [93, 94]. Taking the parameterization given in Ref. [94] we expect a shift of less than 20\(\,\text {MeV}\) in our analysis.
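The total uncertainty of 0.34\(\,\text {GeV}\) is consistent with adding the statistical and systematic components in quadrature. The short sketch below checks this arithmetic, assuming the two components are uncorrelated; the function names are illustrative and are not taken from the analysis software.

```python
import math


def combine_in_quadrature(stat, syst):
    """Total uncertainty for two uncorrelated components."""
    return math.hypot(stat, syst)


def subtract_in_quadrature(total, stat):
    """Component that remains after removing `stat` from `total` in quadrature."""
    return math.sqrt(total**2 - stat**2)


# Values quoted in the text, in GeV.
total = combine_in_quadrature(0.31, 0.15)
print(f"total = {total:.2f} GeV")
```

The second function is the inverse operation, the quadratic subtraction of the statistical uncertainty from the total. Applying it to rounded inputs can shift the last quoted digit.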

Fig. 22

(top) Scan of the likelihood ratio, \(q\), as a function of the hypothesised mass when \(\mu _{\text {g}\text {g}\text {H}, \text {t}\overline{\text {t}}\text {H}\,}\) and \(\mu _\text {VBF, VH}\) are allowed to vary independently. (bottom) Map of \(q(m_\text {H},\mu )\) showing the \(1\,\sigma \) and \(2\,\sigma \) regions, and the best-fit point \((\widehat{m}_\text {H}\,, \hat{\mu }\,)=(124.70\,\text {GeV}\, \!\!, 1.14)\)

The calibration of the energy scale is achieved using \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\,\) events as a reference, as described in Sect. 4.1. Systematic uncertainties related to individual photons as described in Sect. 9 are propagated to the signal model, where they result in uncertainties in the signal peak position and width. The three main sources of systematic uncertainty in the energy scale that contribute to the uncertainty in the measured mass are shown in Table 8, where the contributions of the 81 nuisance parameters in the analysis are grouped according to their physical origin, as relevant to the mass uncertainty. The largest contributions are due to the possible imperfect simulation of (i) differences in detector response to electrons and photons arising from a number of factors that have been discussed in Sect. 9, and (ii) the energy scale nonlinearity in the extrapolation from the \(\text {Z}\)-boson mass to the Higgs boson mass. A further contribution comes from the uncertainties in the setting of the energy scale itself, that is, in the procedure and methodology of using the measurement of the invariant mass in \(\text {Z}\rightarrow \mathrm {e}^+\mathrm {e}^-\) events in which the electron showers are reconstructed as photons. Other sources of systematic uncertainty contribute little.

Additional possible sources of uncertainty that have been investigated and found to be negligible are: a possible bias related to the choice of background parameterization, which has been studied using pseudo-experiments and found to be less than 10\(\,\text {MeV}\); the effect of the switch of preamplifier when very large signals, \(E\gtrsim 200\,\text {GeV}\, \) in the barrel and \(E_{\mathrm {T}}\,\gtrsim 80\,\text {GeV}\, \) in the endcaps, are digitized using a preamplifier with lower gain; and the imperfect simulation of signals from interactions in previous bunch crossings.

Table 8 Magnitude of the uncertainty in the best fit mass induced by the systematic uncertainties in the signal model. These numbers have been obtained by quadratic subtraction of the statistical uncertainty. The statistical uncertainty includes all uncertainties in the background modelling

11.3 Production mechanisms and coupling modifiers

Figure 23 shows the \(1\,\sigma \) and \(2\,\sigma \) contours, computed as the variations around the likelihood maximum, for the signal strength modifiers \(\mu _{\text {g}\text {g}\text {H}, \text {t}\overline{\text {t}}\text {H}\,}\) and \(\mu _\text {VBF, VH}\). The best-fit values of these signal strength modifiers, when they are both allowed to vary, and \(m_\text {H}\) is treated as an unconstrained parameter in the fit, are found to be \(\hat{\mu }\,_{\text {g}\text {g}\text {H}, \text {t}\overline{\text {t}}\text {H}\,}=1.13^{+0.37}_{-0.31}\) and \(\hat{\mu }\,_\text {VBF, VH}=1.16^{+0.63}_{-0.58}\). These numbers are tabulated in Table 9, together with the expected uncertainty in each signal strength modifier.

Fig. 23

Map of the likelihood ratio \(q(\mu _{\text {g}\text {g}\text {H}\!\!, \text {t}\overline{\text {t}}\text {H}\,},\mu _\text {VBF, VH})\) with \(m_\text {H}\) treated as an unconstrained parameter. The \(1\,\sigma \) and \(2\,\sigma \) uncertainty contours are shown. The cross indicates the best-fit values, (\(\hat{\mu }\,_{\text {g}\text {g}\text {H}, \text {t}\overline{\text {t}}\text {H}\,}\!\!, \hat{\mu }\,_\text {VBF, VH})=(1.13, 1.16)\), and the diamond represents the SM expectation

If the signal strengths of all four production processes are allowed to vary independently in the fit, the values of \(\sigma /\sigma _\mathrm {SM}\,\) measured for each process are compatible with the expectations for a SM Higgs boson, as shown in Fig. 24. The signal mass, common to all four processes, is treated as an unconstrained parameter in the fit. The horizontal bars indicate \({\pm }1\,\sigma \) uncertainties in the values. For comparison, the dijet 2D analysis obtains the value \(\hat{\mu }\,_\text {VBF}=1.6^{+0.9}_{-0.7}\), whereas the result of the main analysis, shown in the plot, is \(\hat{\mu }\,_\text {VBF}=1.6^{+0.8}_{-0.7}\). Table 10 shows the four signal strengths observed, with the contributions to their uncertainties separated into statistical and systematic components. The systematic uncertainty has been separated, where feasible, into the contributions from theoretical uncertainties and other (experimental) uncertainties.

Table 9 Expected and observed best-fit values of the signal strength modifiers \(\mu _{\text {g}\text {g}\text {H}, \text {t}\overline{\text {t}}\text {H}\,}\) and \(\mu _\text {VBF, VH}\) for a SM Higgs boson signal together with their uncertainties, indicating the expected uncertainty in the measurement and the best-fit values obtained from the data
Fig. 24

Best-fit signal strength, \(\hat{\mu }\,\), measured for each of the production processes in a combined fit where the signal strengths of all four processes have been allowed to vary independently in the fit. The signal mass, common to all four processes, is treated as an unconstrained parameter in the fit. The horizontal bars indicate \({\pm }1\,\sigma \) uncertainties in the values for the individual processes. The band corresponds to \({\pm }1\,\sigma \) uncertainties in the value obtained from the combined fit with a single signal strength

Various parameterizations of the couplings can be used to further test the compatibility of the observed new particle with the predictions for a SM Higgs boson [54]. Figure 25 shows two-dimensional likelihood scans of \(\kappa _\mathrm {f}\) versus \(\kappa _\mathrm {V}\) (top) and \(\kappa _\mathrm {g}\) versus \(\kappa _{\gamma }\) (bottom). The variables \(\kappa _\mathrm {V}\) and \(\kappa _\mathrm {f}\) are, respectively, the coupling modifiers of the new particle to vector bosons and to fermions; alternatively, \(\kappa _{\gamma }\) and \(\kappa _\mathrm {g}\) are the effective coupling modifiers to photons and to gluons; all four variables are expressed relative to the SM expectations. For each scan a fixed value of \(m_\text {H}=124.7\,\text {GeV}\, \) is used, and it has been verified that allowing \(m_\text {H}\) to vary produces an indistinguishable result. The best-fit points are \((\kappa _\mathrm {V}, \kappa _\mathrm {f})=(1.06, 1.05)\), and \((\kappa _{\gamma }, \kappa _\mathrm {g})=(1.14, 0.90)\).

Table 10 Best-fit signal strength modifiers for the four production processes. The total uncertainty for each process is separated into statistical (stat) and systematic contributions. The systematic uncertainty has been separated, where feasible, into the contributions from theoretical (theo), and experimental (exp) uncertainties. To obtain the values, the quadratic subtraction, needed to remove the statistical uncertainty, is made for the positive and negative uncertainties separately. The values quoted are the average magnitudes of the positive and negative uncertainties
Fig. 25

Maps of the likelihood ratio \(q(\kappa _\mathrm {V},\kappa _\mathrm {f})\) (top), and \(q(\kappa _{\gamma },\kappa _\mathrm {g})\) (bottom), showing the \(1\,\sigma \) and \(2\,\sigma \) uncertainty contours. The crosses indicate the best-fit values, and the diamonds indicate the SM expectation

11.4 Decay width

It is possible to set a limit on the width of the observed signal, albeit a limit far in excess of the SM expectation of 4\(\,\text {MeV}\) for \(m_\text {H}=125\,\text {GeV}\, \). To accommodate the natural width of the Higgs boson, the Gaussian components used in the signal model of the SM analysis, where the signal width is assumed to be negligible as compared to the detector resolution, are replaced by an analytic convolution of a Breit–Wigner distribution (modelling a nonzero decay width) with a Gaussian distribution (modelling the detector resolution).
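A minimal numerical sketch of such a convolution is given below. It assumes a non-relativistic Breit–Wigner (Lorentzian) line shape and a single Gaussian resolution term, and it evaluates the convolution integral with the trapezoidal rule rather than analytically as in the analysis. The function names and the resolution value of 1.5 GeV are hypothetical; the width of 2.4 GeV is used only as an example input.

```python
import math


def gaussian(x, sigma):
    """Unit-normalized Gaussian centred at zero (detector resolution)."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def breit_wigner(x, gamma):
    """Non-relativistic Breit-Wigner centred at zero with full width gamma."""
    half = 0.5 * gamma
    return half / (math.pi * (x * x + half * half))


def voigt(m, m_h, gamma, sigma, n_steps=4001, n_sigma=8.0):
    """Breit-Wigner convolved with a Gaussian, evaluated at mass m.

    The integral runs over the Gaussian smearing variable t in
    [-n_sigma*sigma, +n_sigma*sigma]. The grid step must stay well below
    gamma, so a much narrower width needs a larger n_steps.
    """
    lo = -n_sigma * sigma
    step = 2.0 * n_sigma * sigma / (n_steps - 1)
    total = 0.0
    for i in range(n_steps):
        t = lo + i * step
        weight = 0.5 if i in (0, n_steps - 1) else 1.0  # trapezoid end points
        total += weight * gaussian(t, sigma) * breit_wigner(m - m_h - t, gamma)
    return total * step


# Example values in GeV; sigma is a hypothetical resolution.
peak = voigt(124.7, 124.7, gamma=2.4, sigma=1.5)
print(f"peak density = {peak:.4f} per GeV")
```

A nonzero width lowers and broadens the peak relative to the resolution-only Gaussian, and this broadening is what the fit to the data constrains.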

A profile likelihood estimator is used to calculate upper limits on the width of the observed boson whilst allowing the Higgs boson mass to vary in the fit. Figure 26 shows a scan of the negative-log-likelihood ratio as a function of the observed new particle’s decay width for the combined 7 and 8\(\,\text {TeV}\ \)dataset. The observed (expected) upper limit on the width is found to be 2.4 (3.1)\(\,\text {GeV}\, \)at a 95 % confidence level (CL).

Fig. 26

Scan of the negative-log-likelihood ratio as a function of the Higgs boson decay width. The observed (expected) upper limit on the width is found to be 2.4 (3.1)\(\,\text {GeV}\, \)at a 95 % CL

11.5 Search for additional Higgs-boson-like states

To search for a possible additional Higgs-boson-like state, \(\text {H}'\), in the mass range \(110\le m_{\text {H}'}\le 150\,\text {GeV}\), the observed signal around 125\(\,\text {GeV}\, \)is added to the background model and its mass and signal strength are allowed to vary in the fit. An additional, independent signal model is introduced as a second Higgs boson, for which the exclusion limits are calculated using the modified frequentist method and the \(\mathrm {CL}_\mathrm {s}\) criterion [95, 96]. In order to set limits for the combined 7 and 8\(\,\text {TeV}\ \)datasets it is necessary to make an assumption about the ratio of cross sections of the new state at 7 and 8\(\,\text {TeV}\). By expressing the limit in terms of the SM cross section times branching fraction we implicitly assume that the ratio is that of the SM. The resulting exclusion limit is shown in Fig. 27. Sufficiently far from 125\(\,\text {GeV}\), the limit obtained is the same as when searching for a single SM Higgs boson. The shading indicates a window with a width of 10\(\,\text {GeV}\), centred at the best-fit mass, where the expected sensitivity to a second Higgs boson is severely degraded due to the presence of the already observed state.

Fig. 27

Exclusion limit on the signal strength, \(\sigma \mathrm {'}/\sigma _\mathrm {SM}\), for a second Higgs-boson-like state with SM couplings taking the observed state at 125\(\,\text {GeV}\, \)as part of the background. The shading indicates a window with a width of 10\(\,\text {GeV}\), centred at the best-fit mass, where the expected sensitivity to a second Higgs boson is severely degraded due to the presence of the already observed state

A further particular case of interest is when the second state couples only to fermions, for example in the alignment limit of some two-Higgs-doublet models [97]. We also examine the case where the second state couples only to bosons at the tree level. Figure 28 shows the exclusion limits obtained when the observed signal near 125\(\,\text {GeV}\, \)is added to the background model and its mass and signal strength are allowed to vary in the fit, and an additional state produced (top) only by the gluon-fusion process, or (bottom) only by the VBF and VH processes. The limits are given in terms of the SM cross section times branching fraction for those processes. Even for the VBF and VH processes, which have lower cross sections, an additional state with SM-like signal strength is excluded or disfavoured over much of the mass range.

Fig. 28

Exclusion limits on \(\sigma \mathrm {'}/\sigma _\mathrm {SM}\,\) for a second Higgs-boson-like state produced with gluon–gluon fusion only (top) or VBF and VH only (bottom) taking the observed state at 125\(\,\text {GeV}\, \)as part of the background. The shading indicates a window with a width of 10 GeV, centred at the best-fit mass, where the expected sensitivity to a second Higgs boson is severely degraded due to the presence of the already observed state

The shaded regions in Figs. 27 and 28, where the expected sensitivity to a second Higgs boson is severely degraded due to the presence of the already observed state, are probed by a dedicated search using the high resolution of the diphoton channel to provide sensitivity to a pair of states separated by only a few \(\,\text {GeV}\, \). The signal model is re-parameterized with two signals, having masses \(m_{\text {H}'}\) and \(m_{\text {H}'}+\Delta m\). The relative strength of the two signals, parameterized by the variable \(x\), is allowed to vary such that the two signals are modulated by \(\mu x\) and \(\mu (1-x)\) respectively, where \(\mu \) is the total signal strength and \(x\) is the fraction of signal contained in the state lower in mass. A two-dimensional scan of \(\Delta m\) and \(x\) is obtained, while allowing both \(m_{\text {H}'}\) and \(\mu \) to vary as free parameters in the fit. Figure 29 shows the expected (upper plot) and observed (lower plot) negative-log-likelihood ratio in the \((x, \Delta m)\) plane. Sensitivity is expected in regions where \(\Delta m\) is close to or greater than the experimental mass resolution and where the two signal strengths are similar. The black cross shows the best-fit value, and the lines correspond to the \(1\sigma \) and \(2\sigma \) uncertainty contours for the SM (i.e. a single state). It can be seen that a region of the parameter space is disfavoured at more than \(2\sigma \): where the ratio of the signal strengths is between 0.2 and 0.8 and the mass difference is greater than values ranging between 2.5 and 4\(\,\text {GeV}\, \)depending on the ratio of the signal strengths. The somewhat asymmetrical shape of the excluded region and the position of the best-fit value are a reflection of the slightly asymmetrical mass peak seen in Fig. 19, also reflected in the figures showing the local \(p\)-value and the exclusion limit as a function of \(m_\text {H}\).
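The two-state parameterization can be sketched as follows. Each state is modelled here as a single unit-normalized Gaussian, which is a simplifying assumption made for illustration; the function names and the numerical values in the example are hypothetical.

```python
import math


def gaussian(m, mean, sigma):
    """Unit-normalized Gaussian in the diphoton mass m."""
    z = (m - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def two_state_signal(m, m_low, delta_m, mu, x, sigma):
    """Signal density for two nearby states.

    The lower-mass state at m_low carries strength mu*x and the state at
    m_low + delta_m carries mu*(1 - x), so the total strength is mu.
    """
    lower = x * gaussian(m, m_low, sigma)
    upper = (1.0 - x) * gaussian(m, m_low + delta_m, sigma)
    return mu * (lower + upper)


# Hypothetical example: 30% of the signal in the lower state, 3 GeV apart.
density = two_state_signal(125.0, m_low=124.7, delta_m=3.0, mu=1.14, x=0.3, sigma=1.5)
print(f"signal density at 125 GeV = {density:.4f}")
```

With \(x=1\) or \(x=0\) the model reduces to a single state, which is why sensitivity is expected only when the two strengths are comparable and \(\Delta m\) is not small compared with the resolution.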

Fig. 29

Map of the values of the likelihood ratio \(q(x,\Delta m)\) for two near mass-degenerate states parameterized by \(x\) (the fraction of signal in the lower mass state) and \(\Delta m\) (the mass difference between the states). The black cross shows the best-fit value, and the lines correspond to the \(1\sigma \) and \(2\sigma \) uncertainty contours for the SM (single state) expectation (upper plot) and the observation (lower plot)

11.6 Testing spin hypotheses

The Landau–Yang theorem forbids the direct decay of a spin-1 particle into a pair of photons [98, 99]. However, it is of interest to compare the hypothesis of a spin-2 “graviton-like” model with minimal couplings, \(2^{+}_{m}\,\), [55], to that of a spin-0 SM-Higgs-boson-like, \(0^{+}\,\), model. As the \(2^{+}_{m}\,\) is just one of many possible realizations of the spin-2 tensor structure, an attempt has been made to make the analysis as model independent as possible. Tests have been performed for hypotheses in which the \(2^{+}_{m}\) resonance is produced entirely by gluon-fusion (\(\text {g}\text {g}\)), in which it is produced entirely by quark-antiquark annihilation (\(\text {q}\overline{\text {q}}\)), and for cases in which it is produced by a mixture of the two processes. The cosine of the scattering angle in the Collins–Soper frame, \(\cos \theta ^{*}_{\text {\tiny {CS}}}\)  [100], is used to discriminate between the two hypotheses. The angle is defined, in the diphoton rest frame, as that between the collinear photons and the line that bisects the acute angle between the colliding protons:

$$\begin{aligned} \cos \theta ^{*}_{\text {\tiny {CS}}}\,=2\times \frac{E^{\gamma 2}p^{\gamma 1}_{z}-E^{\gamma 1}p^{\gamma 2}_{z}}{m_{\gamma \gamma }\sqrt{m_{\gamma \gamma }^{2}+(p_{\mathrm {T}}^{\gamma \gamma }\,)^{2}}}, \end{aligned}$$
(4)

where \(E^{\gamma 1}\) and \(E^{\gamma 2}\) are the energies of the leading and subleading photons, \(p^{\gamma 1}_{z}\) and \(p^{\gamma 2}_{z}\) are the \(z\) components of their momenta, and \(m_{\gamma \gamma }\) and \(p_{\mathrm {T}}^{\gamma \gamma }\,\) are the invariant mass and transverse momentum of the diphoton system. In the rest frame of a spin-0 boson the decay photons are isotropic, and so, before the acceptance requirements, the distribution of \(\cos \theta ^{*}_{\text {\tiny {CS}}}\) is uniform under the \(0^{+}\) hypothesis. In general this is not the case for the decay of a spin-2 particle.
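Equation (4) can be evaluated directly from the two photon four-momenta, as in the sketch below. The function name and the tuple layout \((E, p_x, p_y, p_z)\) are choices made here for illustration, and the example momenta are hypothetical.

```python
import math


def cos_theta_cs(photon1, photon2):
    """Cosine of the Collins-Soper angle following Eq. (4).

    Each photon is a tuple (E, px, py, pz) in the laboratory frame, with
    photon1 the leading photon and z along the beam direction.
    """
    e1, px1, py1, pz1 = photon1
    e2, px2, py2, pz2 = photon2
    # Squared transverse momentum and invariant mass of the diphoton system.
    pt2 = (px1 + px2) ** 2 + (py1 + py2) ** 2
    m2 = (e1 + e2) ** 2 - pt2 - (pz1 + pz2) ** 2
    numerator = 2.0 * (e2 * pz1 - e1 * pz2)
    return numerator / (math.sqrt(m2) * math.sqrt(m2 + pt2))


# Hypothetical diphoton system with m = 125 GeV, boosted along the beam axis.
leading = (106.25, 50.0, 0.0, 93.75)
subleading = (50.0, -50.0, 0.0, 0.0)
print(cos_theta_cs(leading, subleading))
```

The value is unchanged by a boost of the diphoton system along the beam axis, so the example gives the same result as the corresponding decay evaluated in the diphoton rest frame.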

To increase the sensitivity, the events are categorized using the same four diphoton event classes used in the cut-based analysis, described in Sect. 10.1, but without the additional classification based on \(p_{\mathrm {T}}^{\gamma \gamma }\,\) used there. Within each diphoton class, the events are binned in \(|{\cos \theta ^{*}_{\text {\tiny {CS}}}}|\) to discriminate between the different spin hypotheses. The events are thus split into 20 event classes, four \((\eta , R_\mathrm {9})\) diphoton classes with five \(|{\cos \theta ^{*}_{\text {\tiny {CS}}}}|\) bins each, for both the 7 and 8\(\,\text {TeV}\ \)datasets, giving a total of 40 event classes.

Although the acceptance times efficiency, \(A\times \epsilon \), varies considerably as a function of \(|{\cos \theta ^{*}_{\text {\tiny {CS}}}}|\) , this variation is, for gluon-fusion production, independent of the spin-parity models tested. This is also true in the restricted ranges of \(\eta \) and \(R_\mathrm {9}\) defined by the diphoton classes, which allows the extraction of the signal yield in bins of \(|{\cos \theta ^{*}_{\text {\tiny {CS}}}}|\) in a reasonably model independent way. Figure 30 shows \(A\times \epsilon \) for \(0^{+}\) (all SM production modes), \(2^{+}_{m}\) (gluon-fusion) and \(2^{+}_{m}\) (\(\text {q}\overline{\text {q}}\) production) as a function of \(|{\cos \theta ^{*}_{\text {\tiny {CS}}}}|\,\), as calculated for the 8\(\,\text {TeV}\ \)dataset. The \(|{\cos \theta ^{*}_{\text {\tiny {CS}}}}|\) bin boundaries are shown by vertical dashed lines. The value of \(A\times \epsilon \) for the \(2^{+}_{m}\) models divided by \(A\times \epsilon \) for SM is shown below, where the bands indicate the spread of values among the four diphoton classes. It can be seen that the ratio is flat, independent of \(|{\cos \theta ^{*}_{\text {\tiny {CS}}}}|\) , except at the highest values of \(|{\cos \theta ^{*}_{\text {\tiny {CS}}}}|\) where the relative contribution from SM VBF production is significant. The events in the region where the ratio falls from its flat level, \(0.75<|{\cos \theta ^{*}_{\text {\tiny {CS}}}}|\,<1.0\), are collected in a separate bin, and the \(|{\cos \theta ^{*}_{\text {\tiny {CS}}}}|\) bin boundaries for the remaining events are chosen to maintain approximately the same event yield in each bin.

Fig. 30

Product of acceptance and efficiency \(A\times \epsilon \) for \(0^{+}\) (all SM production modes), \(2^{+}_{m}\) (gluon-fusion) and \(2^{+}_{m}\) (\(\text {q}\overline{\text {q}}\) production) as a function of \(|{\cos \theta ^{*}_{\text {\tiny {CS}}}}|\,\), as calculated for the 8\(\,\text {TeV}\ \)dataset. The value of \(A\times \epsilon \) for the \(2^{+}_{m}\) models divided by \(A\times \epsilon \) for SM is shown below, where the bands indicate the spread of values among the four diphoton classes. The \(|{\cos \theta ^{*}_{\text {\tiny {CS}}}}|\) bin boundaries are shown by vertical dashed lines

Figure 31 shows histograms of the expected signal strength, \(\mu \), relative to the SM expectation in the five bins of \(|{\cos \theta ^{*}_{\text {\tiny {CS}}}}|\,\) for the SM, and for two \(2^{+}_{m}\,\) models: where the \(2^{+}_{m}\,\) resonance is produced entirely by gluon-fusion (\(\text {g}\text {g}\)), and where it is produced entirely by quark-antiquark annihilation (\(\text {q}\overline{\text {q}}\)). The expected values in the five bins are obtained by constructing a representative pseudo-data model in which the overall signal strength has been set to be that obtained from fitting the model in question, plus background, to the data. When generating pseudo-experiments for a particular model, the values of all the free parameters, including the signal nuisance parameters, the background shape parameters, and the overall signal strength, are set to their best-fit values obtained by fitting the model in question to the data with a single overall value of the signal strength. The post-fit expected value of the signal strength for the SM signal model is thus that which is observed when simultaneously fitting the 40 event classes with a single signal strength, i.e. \(1.31^{+0.33}_{-0.31}\). The observed \(\mu \) values in the five bins shown in the figure are obtained from a simultaneous fit of the SM-signal-plus-background model to the 40 event classes, with five signal strength variables (one for each \(|{\cos \theta ^{*}_{\text {\tiny {CS}}}}|\,\) bin) and a common \(m_\text {H}\) allowed to vary.

Fig. 31

Histograms showing signal strength in five bins of \(|{\cos \theta ^{*}_{\text {\tiny {CS}}}}|\,\) expected for SM, for \(2^{+}_{m}\,\) produced by \(\text {g}\text {g}\), and for \(2^{+}_{m}\) produced by \(\text {q}\overline{\text {q}}\). The signal strength observed in the data is shown by the black points

The separation between the two models is extracted using a test statistic defined as twice the negative logarithm of the ratio of the likelihoods for the \(0^{+}\,\) signal plus background hypothesis and the \(2^{+}_{m}\,\) signal plus background hypothesis when performing a simultaneous fit of all forty event classes together, \(q=-2\,{\ln (\mathcal {L}_{2^{+}_{m} + \text {bkg}}/\mathcal {L}_{0^+ + \text {bkg}})}\). The test is made under the assumption that the \(2^{+}_{m}\) state is produced entirely by either gluon-fusion, or entirely by quark-antiquark annihilation, or by three intermediate mixtures of \(\text {g}\text {g}\) and \(\text {q}\overline{\text {q}}\) spin-2 production. The fraction of the spin-2 state produced by \(\text {q}\overline{\text {q}}\) annihilation is parameterized by the variable \(f_{\text {q}\overline{\text {q}}}\,\), so that the total signal plus background, \(f(m_\text {H})\), is given by

$$\begin{aligned} f(m_\text {H})&= \mu [(1-f_{\text {q}\overline{\text {q}}}\,)\times S_{\text {g}\text {g}}^{2^{+}_{m}\,}(m_\text {H})\nonumber \\&+f_{\text {q}\overline{\text {q}}}\,\times S_{\text {q}\overline{\text {q}}}^{2^{+}_{m}\,}(m_\text {H})]+B(m_\text {H}), \end{aligned}$$
(5)

where \(S_{\text {g}\text {g}}^{2^{+}_{m}\,}(m_\text {H})\) is the \(\text {g}\text {g}\)-produced \(2^{+}_{m}\,\) signal, \(S_{\text {q}\overline{\text {q}}}^{2^{+}_{m}\,}(m_\text {H})\) the \(\text {q}\overline{\text {q}}\)-produced \(2^{+}_{m}\,\) signal, \(\mu \) is a signal strength modifier, and \(B(m_\text {H})\) is the background. Figure 32 shows the values of the test statistic as a function of \(f_{\text {q}\overline{\text {q}}}\,\). Table 11 gives the values of \(1-\mathrm {CL}_\mathrm {s}\,\), expected and observed, which measures the extent to which the spin-2 model is disfavoured, for different values of \(f_{\text {q}\overline{\text {q}}}\,\). The hypothesis of the signal being \(2^{+}_{m}\,\) is disfavoured for all values of \(f_{\text {q}\overline{\text {q}}}\,\) tested. When produced entirely by gluon fusion, it is disfavoured with a \(1-\mathrm {CL}_\mathrm {s}\,\) value of 94 % (92 % expected). When produced entirely by \(\text {q}\overline{\text {q}}\) annihilation it is disfavoured with a \(1-\mathrm {CL}_\mathrm {s}\,\) value of 85 % (83 % expected). Intermediate mixtures, where there is less sensitivity to distinguish between the models, are somewhat less disfavoured.
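The structure of Eq. (5) and of the test statistic \(q\) can be illustrated with a toy calculation. The sketch below assumes a binned Poisson likelihood evaluated at fixed parameter values, whereas the analysis fits the signal strength and nuisance parameters. All yields are invented toy numbers with no physical meaning, and the function names are hypothetical.

```python
import math


def spin2_expectation(mu, f_qq, s_gg, s_qq, bkg):
    """Eq. (5) per bin: mu*[(1 - f_qq)*S_gg + f_qq*S_qq] + B."""
    return [mu * ((1.0 - f_qq) * g + f_qq * q) + b
            for g, q, b in zip(s_gg, s_qq, bkg)]


def poisson_log_likelihood(observed, expected):
    """Sum over bins of ln Poisson(n | nu)."""
    return sum(n * math.log(nu) - nu - math.lgamma(n + 1.0)
               for n, nu in zip(observed, expected))


def spin_test_statistic(observed, spin2_expected, spin0_expected):
    """q = -2 ln(L_spin2 / L_spin0) for fixed expectations."""
    log_l2 = poisson_log_likelihood(observed, spin2_expected)
    log_l0 = poisson_log_likelihood(observed, spin0_expected)
    return -2.0 * (log_l2 - log_l0)


# Toy yields in five |cos(theta*)| bins (invented for illustration).
bkg = [100.0] * 5
s_gg = [14.0, 12.0, 10.0, 8.0, 6.0]
s_qq = [6.0, 8.0, 10.0, 12.0, 14.0]
spin0 = [110.0] * 5          # toy spin-0 signal plus background
observed = [110] * 5         # toy data equal to the spin-0 expectation

for f_qq in (0.0, 0.5, 1.0):
    q = spin_test_statistic(observed, spin2_expectation(1.0, f_qq, s_gg, s_qq, bkg), spin0)
    print(f"f_qq = {f_qq:.1f}: q = {q:.3f}")
```

In this toy the two production modes have opposite slopes, so an equal mixture reproduces the spin-0 shape and \(q\) vanishes there. This mirrors, in a deliberately exaggerated way, the reduced sensitivity for intermediate mixtures.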

Fig. 32

Test statistic for pseudo-experiments generated under the SM, \(0^{+}\) , hypothesis (open squares) and the graviton-like, \(2^{+}_{m}\) , hypothesis (open diamonds), as a function of the fraction, \(f_{\text {q}\overline{\text {q}}}\,\), of \(\text {q}\overline{\text {q}}\) production. The observed distribution in the data is shown by the black points

Table 11 Expected and observed values of \(1-\mathrm {CL}_\mathrm {s}\,\) for the \(2^{+}_{m}\) signal hypothesis with respect to the \(0^{+}\) hypothesis, for different mixtures of \(\text {g}\text {g}\) and \(\text {q}\overline{\text {q}}\) production

12 Summary

We report the observation of the diphoton decay mode of the recently discovered Higgs boson and measurement of some of its properties. The analysis uses the entire dataset collected by the CMS experiment in proton-proton collisions during the 2011 and 2012 LHC running periods. The data samples correspond to integrated luminosities of 5.1\(\,\text {fb}^\text {-1}\)at \(\sqrt{s}=7\,\text {TeV}\ \) and 19.7\(\,\text {fb}^\text {-1}\)at 8\(\,\text {TeV}\). The selected events are subdivided into classes, designed to enhance the overall sensitivity and to increase the sensitivity to individual Higgs production mechanisms, and the results of the search in all classes are reported.

A clear signal is observed in the diphoton channel at a mass of 124.7\(\,\text {GeV}\, \)with a local significance of \(5.7\,\sigma \), where a significance of \(5.2\,\sigma \) is expected for the standard model Higgs boson. The mass is measured to be \(124.70\pm 0.34\,\text {GeV} \, =124.70\pm 0.31\,\text {(stat)}\,\pm 0.15\,\text {(syst)}\,\,\text {GeV}\, \) , and the best-fit signal strength relative to the standard model prediction is \(1.14^{+0.26}_{-0.23}=1.14\pm 0.21\,\text {(stat)}\,^{+0.09}_{-0.05}\,\text {(syst)}\,^{+0.13}_{-0.09}\,\text {(theo)}\). The best-fit values for the signal strength modifiers associated with the ggH and \(\text {t}\overline{\text {t}}\text {H}\) production mechanisms, and with the VBF and VH mechanisms are found to be \(\hat{\mu }\,_{\text {g}\text {g}\text {H}, \text {t}\overline{\text {t}}\text {H}\,}=1.13^{+0.37}_{-0.31}\) and \(\hat{\mu }\,_\text {VBF, VH}= 1.16^{+0.63}_{-0.58}\).

A direct upper limit on the natural width of the state is set at 2.4\(\,\text {GeV}\, \)(3.1\(\,\text {GeV}\, \)expected) at a 95 % confidence level, and additional SM-like Higgs bosons are excluded at a 95 % confidence level in a large fraction of the mass range between 110 and 150\(\,\text {GeV}\, \). The SM spin-0 hypothesis for the observed state is compared to a graviton-like spin-2 hypothesis with minimal couplings. The hypothesis of the signal being \(2^{+}_{m}\,\) is disfavoured. When produced entirely by gluon fusion, it is disfavoured with a \(1-\mathrm {CL}_\mathrm {s}\,\) value of 94 % (92 % expected).

All the results are compatible with the expectations from a standard model Higgs boson.