
When Does Simulated Data Match Real Data?

Comparing Model Calibration Functions Using Genetic Algorithms

  • Conference paper

Part of the book series: Agent-Based Social Systems (ABSS, volume 11)

Abstract

Agent-based models can be calibrated to replicate real-world data sets, but choosing the best set of parameters to achieve this result can be difficult. To validate a model, the real-world data set is often divided into a training and a test set. The training set is used to calibrate the parameters, and the test set is used to determine if the calibrated model represents the real-world data. The difference between the real-world data and the simulated data is determined using an error measure. When using evolutionary computation to choose the parameters, this error measure becomes the fitness function, and choosing the appropriate measure becomes even more crucial for a successful calibration process. We survey the effect of five different error measures in the context of a toy problem and a real-world problem (simulating online news consumption). We use each error measure in turn to calibrate on the training data set, and then examine the results of all five error measures on both the training and test data sets. For the toy problem, one measure was the Pareto-dominant choice for calibration, but no error measure dominated all the others for the real-world problem. Additionally, we observe the counterintuitive result that calibrating using one measure may sometimes lead to better performance on a second measure than could be achieved by calibrating using that second measure directly.
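The calibrate-then-cross-evaluate procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the exponential-decay `simulate` function stands in for an agent-based model, the three error measures (`mae`, `rmse`, `max_abs`) are illustrative placeholders rather than the five measures studied in the paper, and the small genetic algorithm in `calibrate` is a hypothetical stand-in for whatever search tool was actually used. All function and parameter names are invented for this example.

```python
import math
import random


def simulate(params, times):
    """Toy stand-in for an agent-based model: a * exp(-b * t). (Assumption.)"""
    a, b = params
    return [a * math.exp(-b * t) for t in times]


# Illustrative error measures (assumptions, not the five from the paper).
def mae(real, sim):
    return sum(abs(r - s) for r, s in zip(real, sim)) / len(real)


def rmse(real, sim):
    return math.sqrt(sum((r - s) ** 2 for r, s in zip(real, sim)) / len(real))


def max_abs(real, sim):
    return max(abs(r - s) for r, s in zip(real, sim))


MEASURES = {"mae": mae, "rmse": rmse, "max_abs": max_abs}


def calibrate(error, times, real, rng, pop_size=30, generations=40):
    """Minimal genetic algorithm: the error measure is the fitness to minimize."""

    def fitness(params):
        return error(real, simulate(params, times))

    pop = [(rng.uniform(0, 10), rng.uniform(0, 2)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        elite = pop[: pop_size // 5]  # the best fifth survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)  # two distinct parents from the elite
            # Averaging crossover followed by Gaussian mutation.
            child = tuple((x + y) / 2 + rng.gauss(0, 0.1) for x, y in zip(p1, p2))
            children.append(child)
        pop = elite + children
    return min(pop, key=fitness)


def cross_evaluate(seed=0):
    """Calibrate with each measure in turn, then score with every measure."""
    rng = random.Random(seed)
    times = [0.25 * i for i in range(40)]
    # Synthetic "real-world" data: known parameters plus noise. (Assumption.)
    real = [5.0 * math.exp(-0.3 * t) + rng.gauss(0, 0.1) for t in times]
    # Interleaved split: even indices form the training set, odd the test set.
    splits = (
        ("train", times[0::2], real[0::2]),
        ("test", times[1::2], real[1::2]),
    )
    table = {}
    for cal_name, cal_error in MEASURES.items():
        best = calibrate(cal_error, splits[0][1], splits[0][2], rng)
        table[cal_name] = {
            (split, name): measure(y, simulate(best, t))
            for split, t, y in splits
            for name, measure in MEASURES.items()
        }
    return table


if __name__ == "__main__":
    for cal_name, row in cross_evaluate().items():
        print(cal_name, {k: round(v, 4) for k, v in row.items()})
```

Each row of the resulting table corresponds to one calibration measure, and each column to one evaluation measure on either the training or the test set. Comparing rows column by column is what would reveal whether one calibration measure dominates the others, or whether calibrating on one measure happens to score better on a second measure than calibrating on that second measure directly.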


Notes

  1. Our goal here is not to argue for the superiority of genetic algorithms for model calibration, but to examine the use of different error measures as fitness functions. We expect our findings to generalize to other metaheuristic search algorithms, but this should be confirmed in future work.

  2. Our apologies to J.R.R. Tolkien.


Acknowledgements

We thank Uri Wilensky for his support for F.S., and Northwestern’s Quest HPCC for providing computational resources for this work. We also acknowledge support from Google under the Google Marketing Research Award.

Corresponding author

Correspondence to Forrest Stonedahl.


Copyright information

© 2014 Springer Japan

About this paper

Cite this paper

Stonedahl, F., Rand, W. (2014). When Does Simulated Data Match Real Data?. In: Chen, SH., Terano, T., Yamamoto, R., Tai, CC. (eds) Advances in Computational Social Science. Agent-Based Social Systems, vol 11. Springer, Tokyo. https://doi.org/10.1007/978-4-431-54847-8_19
