Abstract
This work investigates the parameter conditions under which reinforcement learning works and how those parameters affect performance. We break the problem into two parts. The first part searches a large parameter space for subregions in which reinforcement learning is generally successful; we call these convergent subregions, meaning regions of the parameter space in which reinforcement learning runs frequently converge. The second part examines these convergent subregions more closely to understand how the parameters affect learning performance and which parameters are most influential. The problem domains analyzed later in this work share very similar experimental methodologies and analysis procedures, so rather than repeating the methodology for each problem domain, we present the methods once in this chapter.
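The idea of a convergent subregion can be illustrated with a small sketch. The abstract does not specify the chapter's actual sampling or region-finding procedure, so everything here is an assumption made for illustration: two hypothetical parameters (`alpha`, `gamma`) on the unit square, plain uniform random sampling, a toy stand-in for a learning run (`toy_run_converges`), and a simple grid of cells in which the fraction of converged runs is estimated.

```python
import random


def toy_run_converges(alpha, gamma, rng):
    """Hypothetical stand-in for one reinforcement learning run.

    Returns True if the run 'converged'. Convergence is made likely only
    when alpha < 0.4 and gamma > 0.6; this region is invented for the demo.
    """
    p = 0.9 if (alpha < 0.4 and gamma > 0.6) else 0.05
    return rng.random() < p


def convergence_by_cell(n_samples=4000, n_bins=5, seed=0):
    """Sample (alpha, gamma) uniformly in [0, 1)^2 and estimate the
    fraction of converged runs in each cell of an n_bins x n_bins grid."""
    rng = random.Random(seed)
    counts = [[0] * n_bins for _ in range(n_bins)]
    hits = [[0] * n_bins for _ in range(n_bins)]
    for _ in range(n_samples):
        alpha, gamma = rng.random(), rng.random()
        i = min(int(alpha * n_bins), n_bins - 1)  # cell index along alpha
        j = min(int(gamma * n_bins), n_bins - 1)  # cell index along gamma
        counts[i][j] += 1
        hits[i][j] += toy_run_converges(alpha, gamma, rng)
    # Fraction of converged runs per cell (0.0 for an empty cell).
    return [
        [hits[i][j] / counts[i][j] if counts[i][j] else 0.0
         for j in range(n_bins)]
        for i in range(n_bins)
    ]


def convergent_cells(fractions, threshold=0.5):
    """Return the (i, j) cells whose convergence fraction meets the threshold."""
    return [
        (i, j)
        for i, row in enumerate(fractions)
        for j, f in enumerate(row)
        if f >= threshold
    ]


if __name__ == "__main__":
    for cell in convergent_cells(convergence_by_cell()):
        print(cell)
```

In this toy setting the cells that pass the threshold approximate the convergent subregion; the second part of the analysis would then be restricted to parameter settings drawn from those cells.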
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Gatti, C. (2015). Methodology. In: Design of Experiments for Reinforcement Learning. Springer Theses. Springer, Cham. https://doi.org/10.1007/978-3-319-12197-0_4
DOI: https://doi.org/10.1007/978-3-319-12197-0_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12196-3
Online ISBN: 978-3-319-12197-0
eBook Packages: Engineering, Engineering (R0)