Structural and Multidisciplinary Optimization, Volume 59, Issue 5, pp 1521–1542

A method for model selection using reinforcement learning when viewing design as a sequential decision process

  • Jaskanwal P. S. Chhabra
  • Gordon P. Warn
Research Paper


Abstract

In an emerging paradigm, design is viewed as a sequential decision process (SDP) in which mathematical models of increasing fidelity are applied in sequence to systematically contract the set of design alternatives. The key idea behind the SDP is that sequencing models of increasing fidelity provides successively tighter bounds on the decision criteria, removing inefficient designs from the tradespace with the guarantee that an antecedent model only removes design solutions that would also be dominated when analyzed with the more detailed, higher-fidelity model. In general, efficiency in the SDP is achieved by using less expensive (low-fidelity) models early in the design process and reserving high-fidelity models for later. However, the set of multi-fidelity models and discrete decision states gives rise to a combinatorial number of possible model sequences, some of which require significantly fewer model evaluations than others. Unfortunately, the optimal modeling policy cannot be determined at the outset of the SDP because the computational cost of executing each model on each design, and the discriminatory power of the resulting bounds, are unknown a priori. In this paper, the model selection problem is formulated as a finite Markov decision process (MDP), and an online reinforcement learning (RL) algorithm, namely Q-learning, is used to obtain and follow an approximately optimal modeling policy, thereby overcoming this limitation of the current SDP. The outcome is a Reinforcement Learning based Design (RL-D) methodology that learns efficient model sequences from sample estimates of the computational cost and discriminatory power of the different models, obtained while analyzing design alternatives in the tradespace throughout the design process. Through application to two design examples, the RL-D methodology is shown to (1) effectively identify an approximately optimal modeling policy and (2) efficiently converge upon a choice set.
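The core mechanism in the abstract can be illustrated with a small, self-contained sketch. The problem setup below is a toy stand-in, not the paper's actual formulation: three hypothetical analysis models ("low", "medium", "high") with assumed per-design costs and discriminatory powers, a state given by the number of surviving design alternatives, a reward equal to the negative computational cost, and an episode that ends once the tradespace has contracted to a choice set. Tabular Q-learning then estimates which model to apply at each stage.

```python
import random

# Toy setup (assumed, for illustration only): action -> (cost per design
# evaluation, probability that the model's bounds eliminate a dominated design).
MODELS = {
    "low":    (1.0, 0.3),
    "medium": (5.0, 0.6),
    "high":   (20.0, 0.9),
}
ACTIONS = list(MODELS)
TARGET = 5  # episode ends once at most this many designs remain (the choice set)

def step(n_designs, action, rng):
    """Apply a model to the current set; return (remaining designs, reward)."""
    cost, power = MODELS[action]
    # Stochastic contraction: higher-fidelity bounds remove dominated designs
    # with higher probability, but every design must be evaluated, so cost
    # scales with the current set size.
    removed = sum(rng.random() < power for _ in range(max(n_designs - TARGET, 0)))
    return n_designs - removed, -cost * n_designs

def q_learning(episodes=2000, n0=50, alpha=0.1, gamma=1.0, eps=0.1, seed=0):
    """Learn Q(state, action) online; state = number of surviving designs."""
    rng = random.Random(seed)
    Q = {}  # (n_designs, action) -> estimated return
    for _ in range(episodes):
        n = n0
        while n > TARGET:
            # Epsilon-greedy action selection over the candidate models.
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda m: Q.get((n, m), 0.0))
            n2, r = step(n, a, rng)
            best_next = 0.0 if n2 <= TARGET else max(
                Q.get((n2, m), 0.0) for m in ACTIONS)
            q = Q.get((n, a), 0.0)
            # Standard Q-learning update (Watkins and Dayan 1992).
            Q[(n, a)] = q + alpha * (r + gamma * best_next - q)
            n = n2
    return Q

Q = q_learning()
# Greedy model choice for the initial tradespace of 50 designs.
policy = max(ACTIONS, key=lambda m: Q.get((50, m), 0.0))
```

With these assumed costs and powers, the learned policy reflects the trade-off the paper describes: cheap models are attractive while the set is large and expensive evaluations are deferred until fewer alternatives remain. The actual RL-D methodology learns such a policy from sample estimates gathered during the design process itself rather than from a simulator.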


Keywords: Reinforcement learning · Tradespace · Decision making under uncertainty · Sequential decision process · Design · Multi-fidelity


Funding information

This study received support from the National Science Foundation (NSF) under NSF Grant CMMI-1455444, and the Graduate Excellence Fellowship provided by the College of Engineering at Pennsylvania State University.

Compliance with ethical standards


Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation or the Pennsylvania State University.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. The Pennsylvania State University, State College, USA
