6.2 Problem Statement

Let us consider a set of \(n\) i.i.d. observations \(D = \{\boldsymbol{x}_0,...,\boldsymbol{x}_{n-1}\}\) where \(\boldsymbol{x} \in \mathcal{X} \subseteq \mathbb{R}^d\), and a generative model which implicitly defines a probability density \(p(\boldsymbol{x} | \boldsymbol{\theta})\) used to model the data. The generative model is a function of the vector of parameters \(\boldsymbol{\theta} \in \mathcal{\Theta} \subseteq \mathbb{R}^p\), which includes both relevant and nuisance parameters. We want to learn a function \(\boldsymbol{s} : \mathcal{D} \subseteq \mathbb{R}^{d\times n} \rightarrow \mathcal{S} \subseteq \mathbb{R}^{b}\) that computes a summary statistic of the dataset and reduces its dimensionality, so that likelihood-free inference methods can be applied effectively. From here onwards, \(b\) will be used to denote the dimensionality of the summary statistic \(\boldsymbol{s}(D)\).
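As a concrete (and entirely hand-crafted, hypothetical) example of such a dimensionality reduction, one could map the \(n \times d\) dataset to a \(b\)-dimensional vector of normalised histogram counts of a single feature. The function name, binning choices, and feature selection below are illustrative assumptions, not part of the formal problem statement; a minimal sketch using NumPy:

```python
import numpy as np

def summary(D, b=10, lo=-3.0, hi=3.0):
    """Map a dataset D of shape (n, d) to a b-dimensional summary statistic.

    Illustrative choice: a normalised histogram of the first feature.
    Observations outside [lo, hi] are simply dropped by np.histogram.
    """
    counts, _ = np.histogram(D[:, 0], bins=b, range=(lo, hi))
    return counts / len(D)

# n = 1000 observations of dimension d = 4
D = np.random.default_rng(1).normal(size=(1000, 4))
s = summary(D)
print(s.shape)  # (10,)
```

A learned summary statistic would replace the fixed histogram by a parametrised function (e.g. a neural network) with the same signature, mapping \(\mathbb{R}^{d \times n}\) to \(\mathbb{R}^{b}\).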

While there might be infinitely many ways to construct a summary statistic \(\boldsymbol{s} (D)\), we are only interested in those that are informative about the subset of interest \(\boldsymbol{\omega} \in \mathcal{\Omega} \subseteq \mathcal{\Theta}\) of the model parameters. The concept of statistical sufficiency is especially useful to evaluate whether summary statistics are informative. In the absence of nuisance parameters, classical sufficiency can be characterised by means of the factorisation criterion (see Section 3.1.3 for more details): \[ p(D|\boldsymbol{\omega}) = h(D) g(\boldsymbol{s}(D) | \boldsymbol{\omega} ) \qquad(6.1)\] where \(h\) and \(g\) are non-negative functions. If \(p(D | \boldsymbol{\omega})\) can be factorised as indicated, the summary statistic \(\boldsymbol{s}(D)\) will yield the same inference about the parameters \(\boldsymbol{\omega}\) as the full set of observations \(D\). When nuisance parameters have to be accounted for in the inference procedure, alternative notions of sufficiency are commonly used, such as partial or marginal sufficiency [190], [191]. Nonetheless, for the problems of relevance in this work, the probability density is not available in closed form, so the general task of finding a sufficient summary statistic cannot be tackled directly. Hence, alternative methods to build summary statistics have to be followed.
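The practical content of the factorisation criterion in Eq. 6.1 is that likelihood ratios between parameter values depend on the data only through \(\boldsymbol{s}(D)\), since \(h(D)\) cancels. This can be checked numerically for a textbook case not taken from this work: for i.i.d. \(\mathcal{N}(\omega, 1)\) observations, the sample sum is sufficient, so two datasets with equal sums yield identical log-likelihood ratios. A small sketch under these assumptions:

```python
import numpy as np

def gauss_loglik(data, omega):
    # log p(D | omega) for i.i.d. N(omega, 1) observations
    return np.sum(-0.5 * (data - omega) ** 2 - 0.5 * np.log(2 * np.pi))

# Two different datasets sharing the sufficient statistic s(D) = sum(D) = 4.0
d1 = np.array([0.5, 1.5, 2.0])
d2 = np.array([1.0, 1.0, 2.0])

# Log-likelihood ratios between two parameter values: h(D) cancels,
# so the ratio depends on the data only through s(D)
r1 = gauss_loglik(d1, 1.3) - gauss_loglik(d1, 0.7)
r2 = gauss_loglik(d2, 1.3) - gauss_loglik(d2, 0.7)
print(np.isclose(r1, r2))  # True
```

For the simulator-based models considered here no such closed-form factorisation is available, which is precisely why this direct check cannot be carried out.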

For simplicity, let us consider a problem where we are only interested in performing statistical inference on a single one-dimensional model parameter \(\boldsymbol{\omega} = \{ \omega_0\}\) given some observed data. Suppose we are given a summary statistic \(\boldsymbol{s}\) and a statistical procedure to obtain an unbiased interval estimate of the parameter of interest which accounts for the effect of nuisance parameters. The resulting interval can be characterised by its width \(\Delta \omega_0 = \hat{\omega}^{+}_0- \hat{\omega}^{-}_0\), defined by some criterion so as to contain on average, upon repeated sampling, a given fraction of the probability density, e.g. a central \(68.3\%\) interval. The expected size of the interval depends on the summary statistic \(\boldsymbol{s}\) chosen: in general, summary statistics that are more informative about the parameters of interest will provide narrower confidence or credible intervals on their value. Under this figure of merit, the problem of choosing an optimal summary statistic can be formally expressed as finding a summary statistic \(\boldsymbol{s}^{\ast}\) that minimises the interval width: \[ \boldsymbol{s}^{\ast} = \textrm{arg min}_{\boldsymbol{s}} \Delta \omega_0. \qquad(6.2)\] The above construction can be extended to several parameters of interest by considering the interval volume or any other function of the resulting confidence or credible regions.
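The interval-width figure of merit can be illustrated with a Monte Carlo comparison of two candidate one-dimensional summary statistics. The specific setting below (Gaussian data, sample mean versus sample median, central 68.3% interval of the sampling distribution) is an illustrative assumption rather than an example from this work; it shows how a more informative statistic yields a narrower interval upon repeated sampling:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials, omega0 = 100, 5000, 0.0

# Sampling distributions of two candidate summary statistics for omega_0
means = np.array([rng.normal(omega0, 1.0, n).mean() for _ in range(trials)])
medians = np.array([np.median(rng.normal(omega0, 1.0, n)) for _ in range(trials)])

def central_width(samples, frac=0.683):
    # Width of the central interval containing a fraction `frac` of the samples
    lo, hi = np.quantile(samples, [(1 - frac) / 2, (1 + frac) / 2])
    return hi - lo

w_mean, w_median = central_width(means), central_width(medians)
print(w_mean < w_median)  # the mean is the more informative statistic here
```

In the setting of Eq. 6.2 the candidate statistics would not be fixed functions but members of a parametrised family, and the minimisation would be carried out over that family.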