6.4 Related Work

Classification and regression models have been used implicitly to construct summary statistics for inference in several scientific disciplines. In experimental particle physics, for example, the mixture-model structure of the problem makes it amenable to supervised classification based on simulated datasets [193], [194]. While a classification objective can be used to learn powerful feature representations and increase the sensitivity of an analysis, it does not account for the details of the inference procedure or for the effect of nuisance parameters, as the solution proposed here does.
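
As a reminder of why a classifier output is an effective summary statistic, consider balanced signal and background samples with densities $p_s(x)$ and $p_b(x)$: the Bayes-optimal classifier output is a strictly monotonic function of the per-observation likelihood ratio, so reducing the data to this single dimension preserves the discriminating information:

```latex
% Bayes-optimal classifier output for balanced classes
% (the so-called likelihood-ratio trick)
s^{*}(x) = \frac{p_s(x)}{p_s(x) + p_b(x)}
         = \frac{1}{1 + 1/r(x)},
\qquad
r(x) = \frac{p_s(x)}{p_b(x)}
```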

The first known effort to include the effect of nuisance parameters in classification, and to explain the relation between classification and the likelihood ratio, was by Neal [195]. In that work, Neal proposes training a classifier that takes a function of the nuisance parameters as an additional input, together with a per-observation regression model of the expectation value, to be used for inference. Cranmer et al. [188] improved on this concept by using a parametrised classifier to approximate the likelihood ratio, which is then calibrated before performing statistical inference. In contrast with these works, we do not consider a classification objective at all: the neural network is optimised directly with an inference-aware loss. Additionally, once the summary statistic has been learnt, the likelihood can be constructed trivially and used for classical or Bayesian inference without a dedicated calibration step. Furthermore, the approach presented in this work can also be extended, as done by Baldi et al. [134], by taking a subset of the inference parameters as additional inputs, so as to obtain a parametrised family of summary statistics from a single model.
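
A minimal sketch of the parametrised-classifier idea follows; the one-dimensional toy simulator, parameter range and network settings are illustrative assumptions rather than the setups of [195] or [188]. The parameter value is appended to the input features so that a single network learns a whole family of decision functions, and for balanced classes its output can be inverted into an approximate per-observation likelihood ratio.

```python
# Illustrative sketch of a parametrised classifier in the spirit of
# Neal [195] and Cranmer et al. [188]; the toy simulator and network
# settings below are assumptions made for this example.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

def simulate(theta, n, signal):
    """Toy simulator: background is N(0, 1), signal is N(theta, 1)."""
    return rng.normal(theta if signal else 0.0, 1.0, size=(n, 1))

# Training set spanning a range of parameter values, with theta
# appended to the observed feature as an additional network input.
X, y = [], []
for theta in rng.uniform(0.5, 2.0, size=500):
    for is_signal in (0, 1):
        x = simulate(theta, 10, is_signal)
        X.append(np.hstack([x, np.full((10, 1), theta)]))
        y.append(np.full(10, is_signal))
X, y = np.vstack(X), np.concatenate(y)

clf = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=300).fit(X, y)

def approx_ratio(x, theta):
    """For balanced classes s = p_s / (p_s + p_b), so r = s / (1 - s)."""
    inputs = np.hstack([x, np.full((len(x), 1), theta)])
    s = np.clip(clf.predict_proba(inputs)[:, 1], 1e-6, 1 - 1e-6)
    return s / (1.0 - s)
```

In [188] the resulting ratio estimate is additionally calibrated before being used for inference; that step is omitted from this sketch.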

Recently, Brehmer et al. [196]–[198] further extended the approach of parametrised classifiers to better exploit the latent-space structure of generative models from complex scientific simulators. Additionally, they propose a family of approaches whose training losses include a direct regression of the likelihood ratio and/or the likelihood score. While extremely promising, the best-performing of these solutions are designed for a subset of the inference problems at the LHC, and they require considerable changes in the way inference is carried out. The aim of the algorithm proposed here is different: we try to learn sample summary statistics that may act as a plug-in replacement for classifier-based dimensionality reduction and can be applied to general likelihood-free problems where the effect of the parameters can be modelled or approximated.
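
Schematically, and simplifying the notation of [196]–[198], one member of this family regresses a network output $\hat{r}(x)$ onto the joint likelihood ratio, which is tractable in the simulator latent space $z$ even when the marginal likelihood is not; the following form is a simplified paraphrase rather than the exact published losses:

```latex
% Schematic regression-of-likelihood-ratio loss (simplified): the minimiser
% of the expected loss is the marginal ratio p(x|theta_0)/p(x|theta_1)
L[\hat{r}] = \mathbb{E}_{x, z \sim p(x, z \mid \theta_1)}
  \left[ \left( \hat{r}(x) - r(x, z \mid \theta_0, \theta_1) \right)^{2} \right],
\qquad
r(x, z \mid \theta_0, \theta_1) = \frac{p(x, z \mid \theta_0)}{p(x, z \mid \theta_1)}
```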

Within the field of Approximate Bayesian Computation (ABC), there have been attempts to use neural networks as a dimensionality reduction step to generate summary statistics. For example, Jiang et al. [199] directly regress the parameters of interest from the simulated data, thereby approximating the posterior mean given the data, which can then be used as a summary statistic for ABC.
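
The following sketch illustrates this strategy on a toy problem; the simulator, prior range and ABC tolerance are assumptions made for the example rather than the settings of [199].

```python
# Illustrative sketch of the posterior-mean summary statistic of
# Jiang et al. [199]: a network regresses theta from simulated data and
# its output, an estimate of E[theta | x], serves as the ABC summary.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
N_OBS = 20  # observations per simulated dataset (toy choice)

def simulate(theta):
    """Toy simulator: a dataset of N_OBS draws from N(theta, 1)."""
    return rng.normal(theta, 1.0, size=N_OBS)

# Training pairs (x_i, theta_i), with theta_i drawn from the prior.
thetas = rng.uniform(-3.0, 3.0, size=5000)
X = np.stack([simulate(t) for t in thetas])
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=300).fit(X, thetas)

def abc_rejection(x_obs, n_proposals=20_000, eps=0.05):
    """ABC rejection sampling with the learnt summary s(x) = net(x)."""
    s_obs = net.predict(x_obs[None, :])[0]
    proposals = rng.uniform(-3.0, 3.0, size=n_proposals)
    s_sim = net.predict(np.stack([simulate(t) for t in proposals]))
    return proposals[np.abs(s_sim - s_obs) < eps]  # approx. posterior draws

posterior_samples = abc_rejection(simulate(1.0))
```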

A different path is taken by Louppe et al. [200], where the authors present an adversarial training procedure that enforces a pivotal property on a predictive model. Our main concern with that approach is that a classifier which is pivotal with respect to the nuisance parameters might be optimal neither for classification nor for statistical inference. Instead of aiming to be pivotal, the summary statistics learnt by our algorithm attempt to find a transformation that directly reduces the expected effect of the nuisance parameters on the inference of the parameters of interest.
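
For concreteness, a minimal sketch of such an adversarial setup is shown below, assuming a toy dataset in which the nuisance parameter only affects the background; the architectures, penalty weight and training schedule are illustrative assumptions, not the configuration of [200].

```python
# Illustrative sketch of the adversarial scheme of Louppe et al. [200]:
# a classifier f is penalised whenever an adversary r can infer the
# nuisance parameter z from its output, pushing f towards a pivotal
# (z-independent) decision function. Toy data and hyper-parameters
# below are assumptions made for this example.
import torch
import torch.nn as nn

f = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))  # classifier
r = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))  # adversary
opt_f = torch.optim.Adam(f.parameters(), lr=1e-3)
opt_r = torch.optim.Adam(r.parameters(), lr=1e-3)
bce, mse, lam = nn.BCEWithLogitsLoss(), nn.MSELoss(), 10.0

def batch(n=256):
    """Toy data: the nuisance z shifts only the background distribution."""
    y = torch.randint(0, 2, (n, 1)).float()    # class label
    z = torch.randn(n, 1)                      # nuisance parameter
    x = torch.randn(n, 2) + y + (1 - y) * z    # features depend on y and z
    return x, y, z

for step in range(1000):
    x, y, z = batch()
    # (1) adversary step: learn to predict z from the classifier output
    opt_r.zero_grad()
    loss_r = mse(r(torch.sigmoid(f(x)).detach()), z)
    loss_r.backward()
    opt_r.step()
    # (2) classifier step: classify well AND fool the adversary
    opt_f.zero_grad()
    logits = f(x)
    loss_f = bce(logits, y) - lam * mse(r(torch.sigmoid(logits)), z)
    loss_f.backward()
    opt_f.step()
```

The classifier is penalised in proportion to how well the adversary recovers z from its output; at the saddle point the classifier output carries no information about the nuisance parameter, which is precisely the pivotal property that, as argued above, is not necessarily optimal for inference.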