Abstract
This chapter discusses how to choose the best computer model for simulating a real-world phenomenon by validating the model’s output against historical, real-world data. Four families of techniques used in validation are discussed. The first is based on comparing statistical summaries of the historical data and the model output. The second applies where the models and data are stochastic, so that distributions of variables must be compared and a metric is used to measure their closeness. After exploring the desirable properties of such a measure, the chapter compares the third and fourth methods, both from information theory, for measuring the closeness of patterns, using an example from strategic market competition. The techniques can, however, be used for validating computer models in any domain.
Notes
- 1.
For an overview of types of computer simulation modelling, see Gilbert and Troitzsch (2005).
- 2.
We distinguish between the broader “measure” and the narrower “metric” (a metric is a measure, but a measure is not necessarily a metric), as discussed in Sect. 13.2 below.
- 3.
- 4.
As Guerini and Moneta (2017) observe, the appearance of many measures to validate agent-based simulation models is an indication of “the vitality of the agent-based community.”
- 5.
This chapter, in effect, focuses on techniques of output validation (see Chap. 30, Sects. 4.2, 5.1 and 5.2 by Fagiolo et al. in this volume), going into greater detail about three of the six measures they discuss.
- 6.
This chapter puts the work of Marks (2013) into a wider context.
- 7.
A temperature of 100 K is twice as hot as 50 K, but 100 \(^{\circ }\)C is not twice as hot as 50 \(^{\circ }\)C: K is a ratio scale, but \(^{\circ }\)C is only an interval scale (which answers “by how much?”); “hotter” and “colder” form only an ordered scale.
- 8.
Lacking only symmetry, it is a quasi-metric; lacking only the identity of indiscernibles, it is a semi-metric; lacking only the triangle inequality, it is a pseudo-metric.
- 9.
Guerini and Moneta (2017) present a new method of validation, based on comparing structures of vector autoregressive models estimated from both model and historical data.
- 10.
- 11.
It is a semi-quasi-metric.
- 12.
The K-L measure is defined only if \(p_i = 0\) whenever \(\pi_i = 0\); the measure is written out explicitly in the first display following these Notes.
- 13.
As Akaike (1973) first showed, the negative of K-L information is Boltzmann’s entropy, so minimizing the K-L distance is equivalent to maximizing the entropy; hence the term “maximum entropy principle.” But, as Burnham and Anderson (2002) point out, the entropy is maximized subject to a constraint: the model of the information in the data. A good model contains the information in the historical data, leaving only “noise,” and it is this noise (or entropy, or uncertainty) that is maximized under the maximum entropy principle. Minimizing K-L information loss then results in an approximating model g that loses a minimum amount of the information in the data f. The K-L information loss is averaged negative entropy, hence the expectation with respect to f. Fagiolo et al. (2007, p. 211) note further that “K-L distance can be an arbitrarily bad choice from a decision-theoretic perspective ... if the set of models does not contain the true underlying model ... then we will not want to select a model based on K-L distance.” This is because “K-L distance looks for where models make the most different predictions—even if these differences concern aspects of the data behaviour that are unimportant to us.”
- 14.
Although, as Lamperti (2018b) points out, so long as the simulated data are always compared with the historical data, and not with simulated data from other models, GSL-div might still allow model choice.
- 15.
The three models differ in more than the frequencies of the eight states (Table 13.1): each model contains three distinct mappings from state to action, and, as deterministic finite automata (Marks 1992), they are ergodic, with emergent periodicities. Model A has a period of 13 weeks, Model B of 6 weeks, and Model C of 8 weeks. It is not clear that the historical data exhibit ergodicity, absence of which will make simulation initial conditions significant (Fagiolo et al. 2007). Initial conditions might determine the periodicity of the simulation model.
- 16.
- 17.
Figures 2 and 3 of Marks (2013) plot these behaviours. State 000 corresponds to all three players choosing High prices; State 001 corresponds to Players 1 and 2 choosing High prices and Player 3 choosing a Low price, etc.
- 18.
This number was determined by a Monte Carlo bootstrap simulation of 100,000 pairs of sets of four quasi-random time series, calculating the SSM between each pair of sets and examining the resulting distribution. The lowest observed SSM of 64 appeared twice, that is, with a frequency of 2/100,000, or 0.002 percent. (A sketch of this kind of bootstrap appears after these Notes.)
- 19.
See further discussion in Marks (2013), Appendix 2.
- 20.
It is not correct to call the function r a possibility distribution function, since it does not distribute any fixed value among the elements of the set X: \(1 \le \sum_{x \in X} r(x) \le |X|\).
- 21.
It might be objected that this reordering loses information. But this overlooks the fact that the order of the states is arbitrary. It should not be forgotten that the definition of the states with more than 1 week’s memory captures dynamic elements of interaction.
- 22.
Normalization here means \(r_1 = 1\), not \(\sum r_i = 1\).
- 23.
For clarity, we have included the \((i=1)\)th element, \((r_1 - r_2) \log_2 1\), which is always zero, by construction, consistent with Eq. (13.2).
- 24.
We could also define a normalized GHM; a candidate form is sketched in the last display following these Notes.
- 25.
Exploration of these differences awaits further research.
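Supplement to Notes 12 and 13 (a sketch, not the chapter’s own equation): in the standard discrete form, with \(p\) and \(\pi\) the two distributions over the same set of states as in Note 12, the K-L information is
\[ D(p \,\Vert\, \pi) \;=\; \sum_i p_i \log_2 \frac{p_i}{\pi_i} \;=\; -H(p) - \sum_i p_i \log_2 \pi_i , \qquad H(p) = -\sum_i p_i \log_2 p_i , \]
which is finite only if \(p_i = 0\) whenever \(\pi_i = 0\) (Note 12). For a fixed \(p\), minimizing \(D\) over candidate models \(\pi\) is therefore equivalent to maximizing the expected log-likelihood \(\sum_i p_i \log_2 \pi_i\), which underlies the model-selection use of K-L information discussed in Note 13. The base-2 logarithm is an assumption of convenience here; changing the base only rescales the measure by a constant.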
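Supplement to Notes 17 and 18: the Python sketch below illustrates the kind of Monte Carlo bootstrap described in Note 18. The series length, the pooled-count stand-in for the State Similarity Measure (SSM), and the function names are illustrative assumptions only; the actual SSM of Marks (2010, 2013) is computed from windowed state counts of the four series, so this sketch reproduces the bootstrap logic but not the published threshold of 64.

```python
import random
from collections import Counter

N_STATES = 8      # the eight joint pricing states of Table 13.1; state k read as a 3-bit
                  # string, e.g. 0 = 000 = all three players pricing High (Note 17)
SERIES_LEN = 50   # length of each weekly series (assumed for illustration)
SET_SIZE = 4      # each set holds four series, as in Note 18

def random_series(length=SERIES_LEN):
    """One quasi-random time series of states 0..7 (placeholder generator)."""
    return [random.randrange(N_STATES) for _ in range(length)]

def state_counts(series_set):
    """Pool a set's series and count the visits to each state."""
    counts = Counter()
    for series in series_set:
        counts.update(series)
    return [counts[k] for k in range(N_STATES)]

def ssm_stand_in(set_a, set_b):
    """City-block distance between the two sets' pooled state counts: a
    simplified stand-in for the SSM, which uses windowed counts."""
    ca, cb = state_counts(set_a), state_counts(set_b)
    return sum(abs(a - b) for a, b in zip(ca, cb))

def bootstrap_distribution(n_trials):
    """Distribution of the stand-in SSM over pairs of random sets; a
    significance threshold can be read off its lower tail (Note 18)."""
    scores = Counter()
    for _ in range(n_trials):
        set_a = [random_series() for _ in range(SET_SIZE)]
        set_b = [random_series() for _ in range(SET_SIZE)]
        scores[ssm_stand_in(set_a, set_b)] += 1
    return scores

if __name__ == "__main__":
    dist = bootstrap_distribution(10_000)   # Note 18 uses 100,000 pairs
    lowest = min(dist)
    print(f"lowest score {lowest} occurred {dist[lowest]} times in 10,000 trials")
```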
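Supplement to Notes 20–24 (a sketch under the assumption that Eq. (13.2) takes the standard form of Klir’s (2006) generalized Hartley measure, or U-uncertainty; the chapter’s own symbols may differ): with the possibility values reordered so that \(1 = r_1 \ge r_2 \ge \dots \ge r_n\) (Notes 21 and 22) and with \(r_{n+1} := 0\),
\[ \mathrm{GHM}(r) \;=\; \sum_{i=1}^{n} (r_i - r_{i+1}) \log_2 i , \]
in which the \(i = 1\) term \((r_1 - r_2)\log_2 1\) is identically zero (Note 23). Because each \(r_i \in [0,1]\) and \(r_1 = 1\), the sum \(\sum_{x \in X} r(x)\) lies between 1 and \(|X| = n\) (Note 20). The maximum value \(\log_2 n\) is attained when \(r_i = 1\) for all \(i\), so one normalized GHM (Note 24) would simply divide by \(\log_2 n\).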
References
Akaike, H. (1973). Information theory as an extension of the maximum likelihood principle. In B. N. Petrov, & F. Csaki (Eds.), Second International Symposium on Information Theory (pp. 267–281). Budapest: Akademiai Kiado.
Burnham, K. P., & Anderson, D. R. (2002). Model selection and multimodel inference: A practical information-theoretic approach (2nd ed.). New York: Springer.
Chen, S.-H., Chang, C.-L., & Du, Y.-R. (2012). Agent-based economic models and econometrics. The Knowledge Engineering Review, 27(2), 187–219.
Fagiolo, G., Moneta, A., & Windrum, P. (2007). A critical guide to empirical validation of agent-based models in economics: Methodologies, procedures, and open problems. Computational Economics, 30(3), 195–226.
Fagiolo, G., Guerini, M., Lamperti, F., Moneta, A., & Roventini, A. (2019). Validation of agent-based models in economics and finance. In C. Beisbart, & N. J. Saam (Eds.), Computer simulation validation (pp. 763–787). Cham: Springer.
Ferson, S., Oberkampf, W. L., & Ginzburg, L. (2008). Model validation and predictive capability for the thermal challenge problem. Computer Methods in Applied Mechanics and Engineering, 197, 2408–2430.
Gilbert, N., & Troitzsch, K. G. (2005). Simulation for the social scientist (2nd ed.). Open University Press.
Guerini, M., & Moneta, A. (2017). A method for agent-based models validation. Journal of Economic Dynamics & Control, 82, 125–141.
Hartley, R. V. L. (1928). Transmission of information. The Bell System Technical Journal, 7(3), 535–563.
Klir, G. J. (2006). Uncertainty and information: Foundations of generalized information theory. New York: Wiley.
Krause, E. F. (1986). Taxicab geometry: An adventure in non-Euclidean geometry. New York: Dover. (First published by Addison-Wesley in 1975.)
Kullback, S., & Leibler, R. A. (1951). On information and sufficiency. Annals of Mathematical Statistics, 22, 79–86.
Lamperti, F. (2018a). An information theoretic criterion for empirical validation of simulation models. Econometrics and Statistics, 5, 83–106.
Lamperti, F. (2018b). Empirical validation of simulated models through the GSL-div: An illustrative application. Journal of Economic Interaction and Coordination, 13, 143–171.
Lin, J. (1991). Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1), 145–151.
Liu, Y., Chen, W., Arendt, P., & Huang, H.-Z. (2010). Towards a better understanding of model validation metrics. In 13th AIAA/ISSMO Multidisciplinary Analysis Optimization Conference.
Mankin, J. B., O’Neill, R. V., Shugart, H. H., & Rust, B. W. (1977). The importance of validation in ecosystem analysis. In G. S. Innis (Ed.), New directions in the analysis of ecological systems, Part 1 (Simulation Council Proceedings Series, Vol. 5, pp. 63–71). La Jolla, California: Simulation Councils. Reprinted in H. H. Shugart, & R. V. O’Neill (Eds.), Systems ecology (pp. 309–317). Stroudsburg, Pennsylvania: Dowden, Hutchinson and Ross.
Marks, R. E. (1992). Breeding hybrid strategies: Optimal behaviour for oligopolists. Journal of Evolutionary Economics, 2, 17–38.
Marks, R. E. (2007). Validating simulation models: A general framework and four applied examples. Computational Economics, 30(3), 265–290. http://www.agsm.edu.au/bobm/papers/s1.pdf.
Marks, R. E. (2010). Comparing two sets of time-series: The state similarity measure. In 2010 Joint Statistical Meetings Proceedings-Statistics: A Key to Innovation in a Data-centric World, Statistical Computing Section (pp. 539–551). Alexandria, VA: American Statistical Association.
Marks, R. E. (2013). Validation and model selection: Three similarity measures compared. Complexity Economics, 2(1), 41–61. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.401.6982&rep=rep1&type=pdf.
Marks, R. E. (2016). Monte Carlo. In D. Teece, & M. Augier (Eds.), The Palgrave encyclopedia of strategic management. London: Palgrave.
Marks, R. E., Midgley, D. F., & Cooper, L. G. (1995). Adaptive behavior in an oligopoly. In J. Biethahn, & V. Nissen (Eds.), Evolutionary algorithms in management applications (pp. 225–239). Berlin: Springer.
Midgley, D. F., Marks, R. E., & Cooper, L. G. (1997). Breeding competitive strategies. Management Science, 43(3), 257–275.
Midgley, D. F., Marks, R. E., & Kunchamwar, D. (2007). The building and assurance of agent-based models: An example and challenge to the field. Journal of Business Research, 60(8), 884–893. (Special Issue: Complexities in Markets).
Oberkampf, W. L., & Roy, C. J. (2010). Model accuracy assessment (Chap. 12). In Verification and validation in scientific computing (pp. 469–554). Cambridge: Cambridge University Press.
Ramer, A. (1989). Conditional possibility measures. International Journal of Cybernetics and Systems, 20, 233–247. Reprinted in D. Dubois, H. Prade, & R. R. Yager, (Eds.). (1993). Readings in fuzzy sets for intelligent systems (pp. 233–240). San Mateo, California: Morgan Kaufmann Publishers.
Rényi, A. (1970). Probability theory. Amsterdam: North-Holland (Chapter 9, Introduction to information theory, pp. 540–616).
Roy, C. J., & Oberkampf, W. L. (2011). A comprehensive framework for verification, validation, and uncertainty quantification in scientific computing. Computer Methods in Applied Mechanics and Engineering, 200, 2131–2144.
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379–423, 623–656.
Acknowledgements
I should like to thank Dan MacKinlay for his mention of the K-L information loss measure, Arthur Ramer for his mention of the Hartley or U-uncertainty metric and his suggestions, and Vessela Daskalova for her mention of the “cityblock” metric. The efforts of the editors of this volume and anonymous referees were very constructive, and have greatly improved this chapter’s presentation.
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Marks, R.E. (2019). Validation Metrics: A Case for Pattern-Based Methods. In: Beisbart, C., Saam, N. (eds) Computer Simulation Validation. Simulation Foundations, Methods and Applications. Springer, Cham. https://doi.org/10.1007/978-3-319-70766-2_13
DOI: https://doi.org/10.1007/978-3-319-70766-2_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70765-5
Online ISBN: 978-3-319-70766-2