Abstract
According to a standard point of view, statistical modelling consists in establishing a parsimonious representation of a random phenomenon, generally based on the knowledge of an expert in the application field: the aim of a model is to provide a better understanding of the data and of the underlying mechanism that produced them. Data Mining and KDD, on the other hand, deal with predictive modelling: models are merely algorithms, and the quality of a model is assessed by how well it predicts new observations. In this communication, we develop some general considerations about both aspects of modelling.
Copyright information
© 2008 Physica-Verlag Heidelberg
About this paper
Cite this paper
Saporta, G. (2008). Models for Understanding Versus Models for Prediction. In: Brito, P. (ed.) COMPSTAT 2008. Physica-Verlag HD. https://doi.org/10.1007/978-3-7908-2084-3_26
DOI: https://doi.org/10.1007/978-3-7908-2084-3_26
Publisher Name: Physica-Verlag HD
Print ISBN: 978-3-7908-2083-6
Online ISBN: 978-3-7908-2084-3
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)