Abstract
Determining the best method for training a machine learning algorithm is critical to maximizing its ability to classify data. In this paper, we compare the standard “fully supervised” approach (which relies on knowledge of event-by-event truth-level labels) with a recent proposal that instead utilizes class ratios as the only discriminating information provided during training. This so-called “weakly supervised” technique has access to less information than the fully supervised method and yet is still able to yield impressive discriminating power. In addition, weak supervision seems particularly well suited to particle physics since quantum mechanics is incompatible with the notion of mapping an individual event onto any single Feynman diagram. We examine the technique in detail, both analytically and numerically, with a focus on its robustness to mischaracterization of the training samples. Weakly supervised networks turn out to be remarkably insensitive to a class of systematic mismodeling. Furthermore, we demonstrate that the event-level outputs of weakly and fully supervised networks probe different kinematics, even though the numerical quality metrics are essentially identical. This implies that it should be possible to improve the overall classification ability by combining the outputs of the two types of networks. For concreteness, we apply the technique to a signature of beyond the Standard Model physics to demonstrate that these features continue to hold in a scenario of relevance to the LHC. Example code is provided on GitHub.
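To make the idea of training on class ratios concrete, the following is a minimal sketch and not the code released with the paper. It assumes a toy setup chosen purely for illustration: two mixed samples with known signal fractions of 0.8 and 0.2, two Gaussian features (one discriminating, one pure noise), a single logistic unit in place of a neural network, and a squared-difference loss between each sample's mean output and its known fraction. All function names and parameter values are hypothetical. Truth labels are generated but used only to evaluate the result, never during training.

```python
import numpy as np

def make_mixed_sample(rng, n_events, signal_fraction):
    """Toy sample: feature 0 separates the classes, feature 1 is pure noise.
    Truth labels are returned only so the result can be evaluated later."""
    is_signal = rng.random(n_events) < signal_fraction
    features = rng.normal(0.0, 1.0, size=(n_events, 2))
    features[:, 0] += np.where(is_signal, 1.0, -1.0)
    return features, is_signal

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_on_fractions(samples, fractions, learning_rate=1.0, n_steps=500):
    """Gradient descent on the sum over samples of (mean output - fraction)^2.
    Only the per-sample class ratio enters the loss (assumed loss form)."""
    weights, bias = np.zeros(2), 0.0
    for _ in range(n_steps):
        grad_w, grad_b = np.zeros(2), 0.0
        for features, fraction in zip(samples, fractions):
            output = sigmoid(features @ weights + bias)
            residual = output.mean() - fraction
            slope = output * (1.0 - output)  # derivative of the sigmoid
            grad_w += 2.0 * residual * (slope[:, None] * features).mean(axis=0)
            grad_b += 2.0 * residual * slope.mean()
        weights -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weights, bias

def rank_auc(scores, is_signal):
    """Area under the ROC curve from the rank-sum (Mann-Whitney) statistic."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_sig = is_signal.sum()
    n_bkg = len(scores) - n_sig
    return (ranks[is_signal].sum() - n_sig * (n_sig + 1) / 2.0) / (n_sig * n_bkg)

rng = np.random.default_rng(0)
fractions = [0.8, 0.2]  # assumed class ratios, the only training information
samples = [make_mixed_sample(rng, 5000, f)[0] for f in fractions]  # labels dropped
weights, bias = train_on_fractions(samples, fractions)

# Evaluate on an independent sample, using truth labels for scoring only.
test_features, test_truth = make_mixed_sample(rng, 5000, 0.5)
auc = rank_auc(test_features @ weights + bias, test_truth)
print(f"weights={weights}, bias={bias:.3f}, AUC={auc:.3f}")
```

In this toy setup the two samples must have different signal fractions. If the fractions were equal, the loss could be satisfied by a constant output and would carry no information about which feature separates the classes.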
Open Access
This article is distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits any use, distribution and reproduction in any medium, provided the original author(s) and source are credited.
Additional information
ArXiv ePrint: 1706.09451
About this article
Cite this article
Cohen, T., Freytsis, M. & Ostdiek, B. (Machine) learning to do more with less. J. High Energ. Phys. 2018, 34 (2018). https://doi.org/10.1007/JHEP02(2018)034
DOI: https://doi.org/10.1007/JHEP02(2018)034