6.4 Related Work
Classification and regression models have been used implicitly to construct summary statistics for inference in several scientific disciplines. For example, in experimental particle physics, the mixture model structure of the problem makes it amenable to supervised classification based on simulated datasets [193], [194]. While a classification objective can be used to learn powerful feature representations and increase the sensitivity of an analysis, it does not take into account the details of the inference procedure or the effect of nuisance parameters, as the solution proposed here does.
The first known effort to include the effect of nuisance parameters in classification, and to explain the relation between classification and the likelihood ratio, was made by Neal [195]. In that work, Neal proposes training a classifier that takes a function of the nuisance parameters as an additional input, together with a per-observation regression model of the expectation value, for use in inference. Cranmer et al. [188] improved on this concept by using a parametrised classifier to approximate the likelihood ratio, which is then calibrated to perform statistical inference. In contrast with those works, we do not consider a classification objective at all; the neural network is directly optimised based on an inference-aware loss. Additionally, once the summary statistic has been learnt, the likelihood can be trivially constructed and used for classical or Bayesian inference without a dedicated calibration step. Furthermore, the approach presented in this work can also be extended, as done by Baldi et al. [134], by parametrising the model on a subset of the inference parameters to obtain a parametrised family of summary statistics with a single model.
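To make the connection concrete, the sketch below illustrates the likelihood-ratio trick underlying the parametrised-classifier approach of Cranmer et al. [188]: a classifier trained to separate samples drawn at a parameter value theta from samples drawn at a reference value approximates s(x) = p(x|theta) / (p(x|theta) + p(x|theta_ref)), so that s / (1 - s) recovers the likelihood ratio. The Gaussian toy model, network size and all hyperparameters are illustrative assumptions, not the setup used in the cited works.

```python
# Minimal sketch of the likelihood-ratio trick with a parametrised
# classifier, in the spirit of Cranmer et al. [188]. The Gaussian toy
# model and all hyperparameters are illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
theta_ref = 0.0                                     # reference parameter value

# Simulate labelled data: class 1 from p(x | theta), class 0 from
# p(x | theta_ref), with theta provided as an additional input.
n = 20_000
theta = rng.uniform(-2.0, 2.0, size=n)
x1 = rng.normal(loc=theta, scale=1.0)               # samples from p(x | theta)
x0 = rng.normal(loc=theta_ref, scale=1.0, size=n)   # samples from p(x | theta_ref)

X = np.concatenate([np.stack([x1, theta], 1), np.stack([x0, theta], 1)])
y = np.concatenate([np.ones(n), np.zeros(n)])

clf = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=300).fit(X, y)

def likelihood_ratio(x, theta):
    """Approximate r(x; theta) = p(x|theta) / p(x|theta_ref) as s / (1 - s)."""
    s = clf.predict_proba(np.array([[x, theta]]))[0, 1]
    s = np.clip(s, 1e-6, 1.0 - 1e-6)                # guard against s in {0, 1}
    return s / (1.0 - s)
```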
Recently, Brehmer et al. [196]–[198] further extended the approach of parametrised classifiers to better exploit the latent-space structure of generative models from complex scientific simulators. Additionally, they propose a family of approaches that include a direct regression of the likelihood ratio and/or the likelihood score in the training losses. While extremely promising, the best-performing solutions are designed for a subset of the inference problems at the LHC and require considerable changes in the way inference is carried out. The aim of the algorithm proposed here is different: we try to learn sample summary statistics that may act as a plug-in replacement for classifier-based dimensionality reduction and can be applied to general likelihood-free problems where the effect of the parameters can be modelled or approximated.
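As a rough illustration of the score-regression idea in [196]–[198], the sketch below assumes the simulator can expose the joint score, i.e. the gradient of log p(x, z | theta) with respect to theta for each simulated event; regressing it on the observed data with a squared-error loss converges to the otherwise intractable marginal score, which can serve as a locally sufficient summary statistic. The toy latent-variable model is an illustrative assumption.

```python
# Sketch of score regression in the spirit of Brehmer et al. [196]-[198]:
# the mean-squared-error regression of the tractable joint score converges
# to the score of the intractable marginal likelihood. The toy model (a
# Gaussian whose latent mean carries the theta dependence) is an
# illustrative assumption.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
theta0 = 1.0

# Toy simulator: latent z ~ N(theta, 0.5), observed x ~ N(z, 1).
n = 20_000
z = rng.normal(theta0, 0.5, size=n)
x = rng.normal(z, 1.0)

# Joint score of p(x, z | theta) w.r.t. theta, tractable given z:
# d/dtheta log N(z; theta, 0.5) = (z - theta) / 0.5**2
joint_score = (z - theta0) / 0.5**2

# Regressing the joint score on x approximates the marginal score
# d/dtheta log p(x | theta0), without ever computing it explicitly.
reg = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=500)
reg.fit(x.reshape(-1, 1), joint_score)
summary = reg.predict(x.reshape(-1, 1))   # learned score as summary statistic
```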
Within the field of Approximate Bayesian Computation (ABC), there have been some attempts to use neural networks as a dimensionality reduction step to generate summary statistics. For example, Jiang et al. [199] successfully construct a summary statistic by directly regressing the parameters of interest, thereby approximating the posterior mean given the data, which can then be used directly as a summary statistic.
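A minimal sketch of that strategy follows: a regressor trained on pairs of parameters and data drawn from the prior and the simulator approximates the posterior mean E[theta | x], and its output is used as the summary statistic inside standard rejection ABC. The toy simulator, prior and tolerance below are illustrative assumptions, not the setup of Jiang et al. [199].

```python
# Sketch of posterior-mean regression as an ABC summary statistic, in the
# spirit of Jiang et al. [199]. Toy simulator, prior and tolerance are
# illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def simulate(theta, n_obs=10):
    """Toy simulator: n_obs i.i.d. Gaussian draws per parameter value."""
    return rng.normal(theta[:, None], 1.0, size=(theta.size, n_obs))

# Training set: theta from the prior, x from the simulator.
theta_train = rng.uniform(-3.0, 3.0, size=5_000)
x_train = simulate(theta_train)

reg = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=500)
reg.fit(x_train, theta_train)          # s(x) ~= E[theta | x]

# Rejection ABC with the learned summary: keep prior draws whose
# simulated summary lies within a tolerance of the observed one.
x_obs = simulate(np.array([1.5]))
s_obs = reg.predict(x_obs)

theta_prop = rng.uniform(-3.0, 3.0, size=20_000)
s_prop = reg.predict(simulate(theta_prop))
posterior_sample = theta_prop[np.abs(s_prop - s_obs) < 0.1]
```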
A different path is taken by Louppe et al. [200], where the authors present an adversarial training procedure to enforce a pivotal property on a predictive model. Our main concern with that approach is that a classifier which is pivotal with respect to the nuisance parameters might be optimal neither for classification nor for statistical inference. Instead of aiming to be pivotal, the summary statistics learnt by our algorithm attempt to find a transformation that directly reduces the expected effect of the nuisance parameters on the inference of the parameters of interest.
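For comparison, the compact sketch below shows the kind of adversarial setup proposed by Louppe et al. [200]: a classifier is trained jointly against an adversary that tries to recover the nuisance parameter from the classifier output, pushing the classifier towards pivotality. The toy data, architectures and trade-off coefficient lam are illustrative assumptions.

```python
# Compact sketch of adversarial training for a pivotal classifier, in the
# spirit of Louppe et al. [200]. Toy data, architectures and the trade-off
# coefficient are illustrative assumptions.
import torch
from torch import nn

torch.manual_seed(0)

# Toy data: the location of the positive class depends on a nuisance z.
n = 4096
z = torch.randn(n, 1)                          # nuisance value per event
y = torch.randint(0, 2, (n, 1)).float()        # class label
x = torch.randn(n, 1) + y * (1.0 + z)          # feature shifted by z for class 1

clf = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
adv = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
opt_clf = torch.optim.Adam(clf.parameters(), lr=1e-3)
opt_adv = torch.optim.Adam(adv.parameters(), lr=1e-3)
bce, mse, lam = nn.BCEWithLogitsLoss(), nn.MSELoss(), 10.0

for step in range(1000):
    # Adversary step: predict the nuisance from the classifier output.
    opt_adv.zero_grad()
    mse(adv(torch.sigmoid(clf(x)).detach()), z).backward()
    opt_adv.step()
    # Classifier step: classify well while making the adversary fail.
    opt_clf.zero_grad()
    logits = clf(x)
    (bce(logits, y) - lam * mse(adv(torch.sigmoid(logits)), z)).backward()
    opt_clf.step()
```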