Abstract
Agent-based models can be calibrated to replicate real-world data sets, but choosing the best set of parameters to achieve this result can be difficult. To validate a model, the real-world data set is often divided into a training and a test set. The training set is used to calibrate the parameters, and the test set is used to determine if the calibrated model represents the real-world data. The difference between the real-world data and the simulated data is determined using an error measure. When using evolutionary computation to choose the parameters, this error measure becomes the fitness function, and choosing the appropriate measure becomes even more crucial for a successful calibration process. We survey the effect of five different error measures in the context of a toy problem and a real-world problem (simulating online news consumption). We use each error measure in turn to calibrate on the training data set, and then examine the results of all five error measures on both the training and test data sets. For the toy problem, one measure was the Pareto-dominant choice for calibration, but no error measure dominated all the others for the real-world problem. Additionally, we observe the counterintuitive result that calibrating using one measure may sometimes lead to better performance on a second measure than could be achieved by calibrating using that second measure directly.
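The workflow described above (calibrate on the training set with one error measure, then score the result with every measure on both training and test sets) can be sketched as follows. This is only an illustrative sketch under stated assumptions: the three measures shown are common choices and are not claimed to be the five surveyed in the chapter, `toy_model` is a hypothetical deterministic stand-in for an agent-based model, the "real" data are synthetic, and an exhaustive grid search replaces the evolutionary search purely to keep the example short and reproducible.

```python
import math
import random


# Illustrative error measures (assumed for this sketch, not taken from the chapter).
def mean_absolute_error(real, sim):
    return sum(abs(r - s) for r, s in zip(real, sim)) / len(real)


def root_mean_squared_error(real, sim):
    return math.sqrt(sum((r - s) ** 2 for r, s in zip(real, sim)) / len(real))


def max_absolute_error(real, sim):
    return max(abs(r - s) for r, s in zip(real, sim))


MEASURES = {
    "mae": mean_absolute_error,
    "rmse": root_mean_squared_error,
    "max": max_absolute_error,
}


def toy_model(rate, steps):
    """Hypothetical stand-in for a simulation: a saturating adoption curve."""
    return [1.0 - math.exp(-rate * t) for t in steps]


def calibrate(measure, real, steps, candidates):
    """Pick the candidate parameter that minimizes the given error measure.

    Grid search is used here in place of an evolutionary algorithm; the
    error measure plays the role of the fitness function to be minimized.
    """
    return min(candidates, key=lambda rate: measure(real, toy_model(rate, steps)))


def cross_evaluate(train, test, candidates):
    """Calibrate once per measure, then score each result with every measure."""
    (train_steps, train_real), (test_steps, test_real) = train, test
    table = {}
    for fit_name, fit_measure in MEASURES.items():
        rate = calibrate(fit_measure, train_real, train_steps, candidates)
        train_sim = toy_model(rate, train_steps)
        test_sim = toy_model(rate, test_steps)
        table[fit_name] = {
            "rate": rate,
            "train": {n: m(train_real, train_sim) for n, m in MEASURES.items()},
            "test": {n: m(test_real, test_sim) for n, m in MEASURES.items()},
        }
    return table


# Synthetic "real-world" data: the toy curve plus seeded Gaussian noise.
rng = random.Random(0)
steps = list(range(30))
observed = [y + rng.gauss(0.0, 0.05) for y in toy_model(0.3, steps)]

train = (steps[:20], observed[:20])   # used to calibrate the parameter
test = (steps[20:], observed[20:])    # held out to check the calibrated model
candidates = [i / 100 for i in range(5, 100)]

table = cross_evaluate(train, test, candidates)
for name, row in table.items():
    print(name, row["rate"], row["train"], row["test"])
```

In this sketch the search is exhaustive, so on the training set each measure is always minimized by its own calibration. A case where calibrating on one measure beats calibrating directly on another can therefore only show up here on the test set; on training data it would require a stochastic search that does not reliably find the optimum.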
Notes
1. Our goal here is not to argue for the superiority of genetic algorithms for model calibration, but to examine the use of different error measures as fitness functions. We expect our findings to generalize to other metaheuristic search algorithms, but this should be confirmed in future work.
2. Our apologies to J.R.R. Tolkien.
Acknowledgements
We thank Uri Wilensky for his support for F.S., and Northwestern’s Quest HPCC for providing computational resources for this work. We also acknowledge support from Google under the Google Marketing Research Award.
Copyright information
© 2014 Springer Japan
About this paper
Cite this paper
Stonedahl, F., Rand, W. (2014). When Does Simulated Data Match Real Data?. In: Chen, SH., Terano, T., Yamamoto, R., Tai, CC. (eds) Advances in Computational Social Science. Agent-Based Social Systems, vol 11. Springer, Tokyo. https://doi.org/10.1007/978-4-431-54847-8_19
DOI: https://doi.org/10.1007/978-4-431-54847-8_19
Publisher Name: Springer, Tokyo
Print ISBN: 978-4-431-54846-1
Online ISBN: 978-4-431-54847-8
eBook Packages: Business and Economics; Economics and Finance (R0)