Journal of High Energy Physics, 2018:34

(Machine) learning to do more with less

Open Access
Regular Article - Experimental Physics


Determining the best method for training a machine learning algorithm is critical to maximizing its ability to classify data. In this paper, we compare the standard “fully supervised” approach (which relies on knowledge of event-by-event truth-level labels) with a recent proposal that instead utilizes class ratios as the only discriminating information provided during training. This so-called “weakly supervised” technique has access to less information than the fully supervised method and yet is still able to yield impressive discriminating power. In addition, weak supervision seems particularly well suited to particle physics since quantum mechanics is incompatible with the notion of mapping an individual event onto any single Feynman diagram. We examine the technique in detail, both analytically and numerically, with a focus on its robustness to mischaracterization of the training samples. Weakly supervised networks turn out to be remarkably insensitive to a class of systematic mismodeling. Furthermore, we demonstrate that the event-level outputs for weakly versus fully supervised networks are probing different kinematics, even though the numerical quality metrics are essentially identical. This implies that it should be possible to improve the overall classification ability by combining the output from the two types of networks. For concreteness, we apply this technology to a signature of beyond the Standard Model physics to demonstrate that all these impressive features continue to hold in a scenario of relevance to the LHC. Example code is provided on GitHub.
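To make the idea of training on class ratios alone more concrete, here is a minimal toy sketch. It is not the authors' code, and every detail of it is an assumption made for illustration: a single Gaussian-distributed feature, a logistic model, two mixed training samples with signal fractions 0.3 and 0.7, and a loss that penalizes the squared difference between each sample's mean output and its known signal fraction. All function and variable names are hypothetical. The per-event labels are generated but used only to score the result, never during training.

```python
import numpy as np

def make_mixed_sample(rng, n_events, signal_fraction):
    """Toy mixed sample: signal ~ N(+1, 1), background ~ N(-1, 1)."""
    n_sig = int(round(signal_fraction * n_events))
    sig = rng.normal(+1.0, 1.0, n_sig)
    bkg = rng.normal(-1.0, 1.0, n_events - n_sig)
    x = np.concatenate([sig, bkg])
    labels = np.concatenate([np.ones(n_sig), np.zeros(n_events - n_sig)])
    return x, labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_weak(samples, fractions, n_steps=500, lr=1.0):
    """Gradient descent on sum_k (mean_i f(x_i) - fraction_k)^2.

    Only the per-sample signal fractions enter the loss; no event labels.
    """
    w, b = 0.0, 0.0
    for _ in range(n_steps):
        grad_w, grad_b = 0.0, 0.0
        for x, frac in zip(samples, fractions):
            s = sigmoid(w * x + b)
            residual = s.mean() - frac      # mean output vs. known class ratio
            ds = s * (1.0 - s)              # derivative of the sigmoid
            grad_w += 2.0 * residual * np.mean(ds * x)
            grad_b += 2.0 * residual * np.mean(ds)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

rng = np.random.default_rng(0)
fractions = [0.3, 0.7]                      # assumed known class ratios
train = [make_mixed_sample(rng, 5000, f) for f in fractions]
w, b = train_weak([x for x, _ in train], fractions)

# Score on a labeled test sample (labels are used for evaluation only).
x_test, y_test = make_mixed_sample(rng, 5000, 0.5)
accuracy = np.mean((sigmoid(w * x_test + b) > 0.5) == (y_test == 1))
print(f"w={w:.2f}, b={b:.2f}, test accuracy={accuracy:.3f}")
```

In this toy setup the two samples must have different signal fractions; otherwise the ratio-based loss carries no information about which direction in feature space separates the classes.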


Keywords: Beyond Standard Model; Hadron-Hadron scattering (experiments); Particle correlations and fluctuations; Supersymmetry


Open Access

This article is distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits any use, distribution and reproduction in any medium, provided the original author(s) and source are credited.



Copyright information

© The Author(s) 2018

Authors and Affiliations

  1. Institute of Theoretical Science, University of Oregon, Eugene, U.S.A.
