5.7 Systematic Uncertainties

Neither the signal model, which is based on simulated observations, nor the data-driven background model used in this analysis is perfectly known; hence a set of nuisance parameters has to be included in the statistical model to account for this lack of certainty, as generally discussed in Section 3.1.4.1. Each nuisance parameter, which can affect the signal component, the background component or both, effectively leads to an increase of the uncertainty on the parameters of interest. For analyses in which upper limits are set, such as this one, the presence of these unknown parameters increases the total interval width. The effect of these parameters on the final statistical estimates is often referred to as systematic uncertainty. A list of the sources of systematic uncertainty considered in this analysis, and their estimated relative effect on the expected upper limit for SM Higgs pair production, is provided in Table 5.4.

Table 5.4: List of systematic uncertainties considered in this analysis, and their relative impact on the expected limit for the SM HH production. The relative impact is obtained by fixing the nuisance parameters corresponding to each source and recalculating the expected limit.
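The caption describes the impact as the change in the expected limit when the nuisance parameters of one source are fixed. A minimal sketch of that bookkeeping follows, assuming the impact is quoted as the fractional reduction of the limit. The exact normalisation used for Table 5.4 is not stated in the text, and the function name `relative_impact` is hypothetical.

```python
def relative_impact(limit_full: float, limit_fixed: float) -> float:
    """Fractional reduction of the expected limit when one source is fixed.

    limit_full  : expected limit with all nuisance parameters floating
    limit_fixed : expected limit recomputed with the nuisance parameters
                  of a single source fixed
    Assumption: the impact is normalised to the full expected limit.
    """
    if limit_full <= 0:
        raise ValueError("the expected limit must be positive")
    return (limit_full - limit_fixed) / limit_full
```

The two limits themselves would come from the full statistical procedure; this helper only turns them into a table entry.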

The main sources of uncertainty in this analysis are those associated with the data-driven background model. For each classifier output bin, an independent nuisance parameter is included that accounts for the possible variation of the background prediction due to the limited statistics of the artificial events used for building the background model and to the accuracy limitations found during the bias correction procedure described in Section 5.6.2. Because the data-driven technique described in the previous section does not provide a way to estimate the normalisation of the background, the background normalisation is added as a nuisance parameter that is left fully unconstrained.
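The structure of this background model can be illustrated with a toy binned likelihood. The sketch below assumes a Poisson term per classifier output bin, a unit Gaussian constraint on each per-bin nuisance parameter, and no constraint term for the normalisation. The constraint form and all names (`background_nll`, `rel_unc`, `thetas`, `norm`) are assumptions made for illustration, not the analysis implementation.

```python
import numpy as np

def background_nll(observed, shape, rel_unc, thetas, norm):
    """Toy negative log-likelihood for a background-only binned model.

    observed : observed counts per classifier output bin
    shape    : background shape per bin (arbitrary normalisation)
    rel_unc  : relative uncertainty assigned to each bin
    thetas   : per-bin nuisance parameters (assumed unit Gaussian constraint)
    norm     : background normalisation, left fully unconstrained
    """
    observed, shape, rel_unc, thetas = (
        np.asarray(a, dtype=float) for a in (observed, shape, rel_unc, thetas)
    )
    # Each bin is scaled by the common normalisation and its own nuisance.
    expected = norm * shape * (1.0 + rel_unc * thetas)
    expected = np.clip(expected, 1e-9, None)  # keep the logarithm finite
    # Poisson term, dropping the constant log(observed!) contribution.
    poisson = np.sum(expected - observed * np.log(expected))
    # Gaussian penalty on the per-bin nuisances; none is applied to norm.
    constraint = 0.5 * np.sum(thetas ** 2)
    return poisson + constraint
```

Because `norm` carries no penalty term, its value is determined entirely by the data in the fit, which is what leaving the normalisation fully unconstrained means in practice.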

Regarding systematic uncertainties due to nuisance parameters of the simulation-based signal distribution, the most relevant factors are the uncertainties in the measured differences between data and simulation in b-tagging efficiencies. These are estimated by recomputing the signal distribution weighted by a factor that accounts for a one standard deviation variation of each of the relevant nuisance parameters and interpolating in between, as described in Section 3.1.3.4. The uncertainty due to the modelling of the pile-up contribution is included by considering the different effect of pile-up reweighting when a \(\pm4.6\%\) variation on the total inelastic cross section value at 13 TeV is allowed [181]. The effects of the modelling uncertainties in jet energy resolution and scale are estimated by smearing or shifting the reconstructed jet energy, respectively, according to their corresponding uncertainties as a function of the jet \(p_T\) and \(\lvert \eta \rvert\), and evaluating the effect on the final summary statistic. For all the mentioned sources of uncertainty, the effects on both the classifier output distribution and its normalisation have been considered.
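The interpolation between the nominal and the one standard deviation templates can be sketched as follows. This example assumes a piecewise-linear vertical interpolation in each bin; the scheme actually used is the one described in Section 3.1.3.4 and may differ. The function name `morph_template` and the parameter `theta` are hypothetical.

```python
import numpy as np

def morph_template(nominal, up, down, theta):
    """Piecewise-linear vertical interpolation of a binned template.

    theta = 0 returns the nominal template, theta = +1 the up variation
    and theta = -1 the down variation; values with |theta| > 1 are
    extrapolated linearly and clipped at zero.
    """
    nominal, up, down = (np.asarray(a, dtype=float) for a in (nominal, up, down))
    # Use the up-side shift for positive theta and the down-side shift otherwise.
    shift = (up - nominal) if theta >= 0 else (nominal - down)
    return np.clip(nominal + theta * shift, 0.0, None)

# Illustrative two-bin templates (made-up numbers).
nominal = [10.0, 20.0]
up = [12.0, 18.0]    # signal distribution recomputed with the +1 sigma weights
down = [9.0, 21.0]   # signal distribution recomputed with the -1 sigma weights
half_up = morph_template(nominal, up, down, 0.5)
```

Since the templates are not renormalised, a single nuisance parameter changes both the shape and the overall yield of the signal distribution.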

After correcting for the observed discrepancies between data and simulation, the uncertainty on the trigger efficiency amounts to a 2% effect on the signal normalisation. The total signal component normalisation is also affected by the uncertainty in the measurement of the integrated luminosity \(\mathcal{L}_\textrm{int}\), which has been estimated to be \(2.5\%\) for the 2016 data-taking period [182]. The effect of the theoretical uncertainties that affect the simulation samples is modelled using per-event weights provided by the simulation software. In particular, the effect of a variation of the renormalisation \(\mu_\textrm{R}\) and factorisation \(\mu_\textrm{F}\) scales on the signal efficiency is estimated by taking the maximum and the minimum difference with respect to the nominal efficiency when varying \(\mu_\textrm{R}\) and \(\mu_\textrm{F}\) individually, as well as both together, up and down by a factor of two. For estimating the total signal efficiency variation due to parton distribution function (PDF) uncertainties, the PDF4LHC recommendations [183] are followed, computing the variation as the standard deviation of a set of 100 MC replicas of the NNPDF 3.0 set [178].
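The scale envelope and the PDF replica spread can both be computed from per-event weights. The sketch below assumes each variation is available as an array of event weights alongside a boolean selection mask. The function names and the input layout are hypothetical, and the sample standard deviation (`ddof=1`) is an assumption about the exact estimator.

```python
import numpy as np

def efficiency(weights, passed):
    """Weighted selection efficiency: passing weight sum over total weight sum."""
    weights = np.asarray(weights, dtype=float)
    passed = np.asarray(passed, dtype=bool)
    return weights[passed].sum() / weights.sum()

def scale_envelope(nominal_w, scale_ws, passed):
    """Minimum and maximum relative efficiency shift over the scale variations.

    scale_ws is a sequence of weight arrays, one per (mu_R, mu_F) variation.
    """
    eff_nom = efficiency(nominal_w, passed)
    rel = np.array([efficiency(w, passed) / eff_nom - 1.0 for w in scale_ws])
    return rel.min(), rel.max()

def pdf_uncertainty(nominal_w, replica_ws, passed):
    """Relative efficiency uncertainty as the spread over the PDF replicas."""
    eff_nom = efficiency(nominal_w, passed)
    effs = np.array([efficiency(w, passed) for w in replica_ws])
    # Sample standard deviation around the replica mean (assumed estimator).
    return effs.std(ddof=1) / eff_nom
```

For the procedure in the text, `scale_ws` would hold the six weight sets obtained by varying \(\mu_\textrm{R}\) and \(\mu_\textrm{F}\) individually and together by a factor of two, and `replica_ws` the 100 replica weight sets.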