Branes with brains: exploring string vacua with deep reinforcement learning

  • James Halverson
  • Brent Nelson
  • Fabian Ruehle
Open Access
Regular Article - Theoretical Physics


We propose deep reinforcement learning as a model-free method for exploring the landscape of string vacua. As a concrete application, we utilize an artificial intelligence agent known as an asynchronous advantage actor-critic to explore type IIA compactifications with intersecting D6-branes. As different string background configurations are explored by changing D6-brane configurations, the agent receives rewards and punishments related to string consistency conditions and proximity to Standard Model vacua. These are in turn utilized to update the agent’s policy and value neural networks to improve its behavior. By reinforcement learning, the agent’s performance in both tasks is significantly improved, and for some tasks it finds a factor of \( \mathcal{O}(200) \) more solutions than a random walker. In one case, we demonstrate that the agent learns a human-derived strategy for finding consistent string models. In another case, where no human-derived strategy exists, the agent learns a genuinely new strategy that achieves the same goal twice as efficiently per unit time. Our results demonstrate that the agent learns to solve various string theory consistency conditions simultaneously, which are phrased in terms of non-linear, coupled Diophantine equations.
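To make the reward idea concrete, here is a minimal sketch of how a penalty could be attached to violations of coupled, non-linear Diophantine equations, together with a random-walker baseline of the kind the agent is compared against. The two equations, the function names (`violations`, `reward`, `random_walk`), and the step rule are hypothetical toy choices for illustration only. They are not the paper's actual string consistency conditions, its reward scheme, or its actor-critic agent.

```python
import random

# Hypothetical toy constraints (an assumption, NOT the paper's consistency
# conditions): two coupled, non-linear Diophantine equations in (a, b, c):
#   a * b - c      = 0
#   a + b + c - 11 = 0
def violations(state):
    """Return the absolute violation of each toy equation."""
    a, b, c = state
    return (abs(a * b - c), abs(a + b + c - 11))


def reward(state):
    """Zero when both equations hold; increasingly negative otherwise."""
    return -sum(violations(state))


def random_walk(start, steps, seed=0):
    """Baseline walker: change one integer by +/-1 per step and
    collect the distinct solutions it happens to visit."""
    rng = random.Random(seed)
    state = list(start)
    found = set()
    for _ in range(steps):
        i = rng.randrange(len(state))      # pick a coordinate at random
        state[i] += rng.choice((-1, 1))    # nudge it up or down
        if reward(tuple(state)) == 0:      # both equations satisfied
            found.add(tuple(state))
    return found


if __name__ == "__main__":
    print(reward((2, 3, 6)))                    # a solution of the toy system
    print(len(random_walk((0, 0, 0), 2000)))    # solutions found by chance
```

A learning agent would replace the uniform choices in `random_walk` with a policy that is updated from the rewards it receives. The random walker gives a baseline against which the number of solutions found can be compared.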


Superstring Vacua, D-branes


Open Access

This article is distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits any use, distribution and reproduction in any medium, provided the original author(s) and source are credited.



Copyright information

© The Author(s) 2019

Authors and Affiliations

  1. Department of Physics, Northeastern University, Boston, U.S.A.
  2. CERN, Theoretical Physics Department, Geneva 23, Switzerland
  3. Rudolf Peierls Centre for Theoretical Physics, Oxford University, Oxford, U.K.
