Abstract
Chapter 1 argued that causal modeling allows risk managers to predict the probable consequences of alternative actions, thereby supporting rational (consequence-driven) deliberation and decision-making. This is practical when enough knowledge and data are available to create and validate causal models, using technical methods such as influence diagrams or simulation models, or more black-box statistical methods such as Granger causality testing and intervention analysis. But what should a decision-maker do when not enough is known to construct a reliable causal model? How can risk analysts help improve policy and decision-making when the correct probabilistic causal relation between alternative acts and their probable consequences is unknown? This is the challenge of risk management with model uncertainty. It drives technical debates and policy clashes in problems ranging from preparing for climate change to managing emerging diseases to operating complex and hazardous facilities safely.
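The contrast can be made concrete with a small numerical sketch (the acts, models, and loss figures below are hypothetical illustrations, not results from the chapter). When a single causal model is trusted, the rational act minimizes expected loss under that model; under model uncertainty, robust decision rules such as maximin and minimax regret instead hedge across a set of plausible models:

```python
# Minimal sketch of act selection under model uncertainty.
# Rows: candidate acts; columns: plausible causal models M1, M2, M3.
# Entries are expected losses of each act under each model
# (illustrative numbers only).
import numpy as np

acts = ["act_A", "act_B", "act_C"]  # hypothetical alternatives
losses = np.array([
    [2.0, 9.0, 3.0],   # act_A's expected loss under M1, M2, M3
    [4.0, 4.5, 4.0],   # act_B
    [3.0, 6.0, 8.0],   # act_C
])

# Single best-guess model (here, M1): pick the act minimizing loss under it.
best_guess = acts[np.argmin(losses[:, 0])]

# Maximin (robust): minimize the worst-case loss across all plausible models.
maximin = acts[np.argmin(losses.max(axis=1))]

# Minimax regret: for each model, regret = loss minus the best achievable
# loss under that model; pick the act whose largest regret is smallest.
regret = losses - losses.min(axis=0)
minimax_regret = acts[np.argmin(regret.max(axis=1))]

print(best_guess, maximin, minimax_regret)  # -> act_A act_B act_B
```

In this toy example the best-guess rule and the robust rules recommend different acts, which is precisely the situation that makes model uncertainty consequential for policy.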