6 Inference-Aware Neural Optimisation
An approximate answer to the right question is worth a great deal more than a precise answer to the wrong question.
– John Tukey
By this point, it should be evident that powerful statistical inference is the ultimate objective of all experimental high-energy physics analyses. Supervised learning based on simulated observations or on data acquired from control regions, and in particular probabilistic classification, provides a way to extract approximate estimates of the latent variables of the generative model. Those latent-variable estimates are in turn very useful for constructing powerful summary statistics for statistical inference. While this approach is very common in experimental high-energy physics, complex computer simulations are also required in many other scientific disciplines, where inference is similarly challenging because the likelihood of the observed data is intractable. Summary statistics based on supervised learning algorithms can be asymptotically optimal if the generative model is fully specified, as is the case for the output of soft classification in mixture models where the parameters of interest are the mixture coefficients, as demonstrated in Section 4.3.1. Unfortunately, their usefulness can decrease rapidly when additional uncertain parameters affect the generative model.
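To make the optimality property concrete, the following minimal sketch (an illustrative one-dimensional Gaussian mixture, not taken from any analysis in this work; all distributions and values are assumptions chosen for the example) uses the fact that for a fully specified mixture the optimal soft-classifier output is known in closed form, and the per-event likelihood depends on the observation only through that output, so the classifier score is a sufficient summary statistic for the mixture coefficient:

```python
import jax
import jax.numpy as jnp
from jax.scipy.stats import norm

# Illustrative mixture: signal ~ N(1, 1), background ~ N(-1, 1),
# with mixture coefficient mu_true (all values are arbitrary choices).
mu_true, n = 0.3, 10_000
k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
is_sig = jax.random.uniform(k1, (n,)) < mu_true
x = jnp.where(is_sig,
              1.0 + jax.random.normal(k2, (n,)),
              -1.0 + jax.random.normal(k3, (n,)))

# Optimal soft-classifier output for this fully specified generative model:
# s(x) = p_s(x) / (p_s(x) + p_b(x)), a monotone function of the likelihood ratio.
p_s, p_b = norm.pdf(x, 1.0, 1.0), norm.pdf(x, -1.0, 1.0)
s = p_s / (p_s + p_b)

# Per-event likelihood: p(x | mu) = p_b(x) * ((1 - mu) + mu * r(x)) with
# r = p_s / p_b = s / (1 - s), so the mu-dependence enters only through s(x).
def nll(mu):
    r = s / (1.0 - s)
    return -jnp.sum(jnp.log((1.0 - mu) + mu * r))

grid = jnp.linspace(0.01, 0.99, 981)
mu_hat = grid[jnp.argmin(jax.vmap(nll)(grid))]
print(f"mu_true = {mu_true}, mu_hat = {float(mu_hat):.3f}")
```

In this idealised setting no information about the mixture coefficient is lost by replacing each observation with the classifier score; it is precisely this property that degrades once the generative model acquires additional uncertain parameters.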
As a practical example, in the analysis presented in Chapter 5 the limiting factor for experimental sensitivity was not the choice of summary statistic but rather the lack of detailed knowledge about the expected contribution from background processes, which had to be addressed by the inclusion of nuisance parameters. The technique presented in this chapter, referred to as INFERNO and published in [186], is an attempt to construct non-linear summary statistics from a statistical perspective that directly addresses the goal of the final inference. The key contribution required to achieve this goal is to leverage the technology developed for modern machine learning techniques in order to build inference-aware loss functions that approximate the expected uncertainty on the parameters of interest, accounting for the effect of nuisance parameters.
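Before the formal treatment in the following sections, the sketch below illustrates how such a loss function can be assembled with automatic differentiation. It is a minimal toy under simplifying assumptions, not the published implementation: a two-parameter model where the signal strength mu is the parameter of interest and a shift delta of the background distribution is the only nuisance parameter; the yields, network shape, and function names are all illustrative choices.

```python
import jax
import jax.numpy as jnp

def summary(params, x):
    # Tiny network ending in a softmax over bins: a differentiable "histogram".
    h = jnp.tanh(x @ params["W1"] + params["b1"])
    return jax.nn.softmax(h @ params["W2"] + params["b2"], axis=-1)

def expected_counts(theta, params, x_sig, x_bkg, s_yield=50.0, b_yield=1000.0):
    mu, delta = theta          # parameter of interest and nuisance parameter
    f_s = summary(params, x_sig).mean(axis=0)
    f_b = summary(params, x_bkg + delta).mean(axis=0)  # nuisance moves the background
    return mu * s_yield * f_s + b_yield * f_b

def inferno_loss(params, x_sig, x_bkg, theta0=jnp.array([1.0, 0.0])):
    # Poisson negative log-likelihood for the Asimov (expected) dataset.
    def nll(theta):
        n_exp = expected_counts(theta, params, x_sig, x_bkg)
        n_obs = expected_counts(theta0, params, x_sig, x_bkg)
        return jnp.sum(n_exp - n_obs * jnp.log(n_exp))
    # The Hessian at theta0 approximates the Fisher information; its inverse
    # approximates the covariance of the maximum-likelihood estimator.
    cov = jnp.linalg.inv(jax.hessian(nll)(theta0))
    return cov[0, 0]  # expected variance of mu, nuisance effects included

# Gradients flow through the whole construction, so the network weights can
# be trained by gradient descent to minimise the expected uncertainty on mu.
k1, k2, k3, k4 = jax.random.split(jax.random.PRNGKey(0), 4)
d, width, nbins = 1, 16, 10
params = {"W1": 0.1 * jax.random.normal(k1, (d, width)), "b1": jnp.zeros(width),
          "W2": 0.1 * jax.random.normal(k2, (width, nbins)), "b2": jnp.zeros(nbins)}
x_sig = 1.0 + jax.random.normal(k3, (512, d))
x_bkg = -1.0 + jax.random.normal(k4, (512, d))
grads = jax.grad(inferno_loss)(params, x_sig, x_bkg)
```

Note the role of the nuisance parameter in this construction: because delta enters the expected counts, the off-diagonal terms of the Fisher information propagate into the variance of mu, so a summary statistic that is strongly correlated with the background shift is penalised even if it separates signal from background well.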