Summary
We consider statistical edits defined on a metric data space spanned by the nonkey attributes (variables) of a given database. Integrity constraints are defined on this data space based on definitions, behavioral equations or a balance equation system. As an example, think of a set of business or economic indicators. The variables are linked by the four basic arithmetic operations only. Assuming a multivariate Gaussian distribution and an errors-in-variables model, estimation of the unknown (latent) variables can be carried out by a generalized least-squares (GLS) procedure. The drawback of this approach is that the equations form a non-linear equation system due to multiplication and division of variables, and that one generally assumes independence between all variables owing to a lack of information in real applications. As there exists no finite-parameter density family that is closed under all four arithmetic operations, we use MCMC simulation techniques, cf. Smith and Gelfand (1992) and Chib (2004), to derive the “exact” distributions in the non-normal case and under cross-correlation. The research can be viewed as an extension of Köppen and Lenz (2005) in the sense of studying the robustness of the GLS approach with respect to non-normality and correlation.
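To make the simulation idea concrete, the following is a minimal sketch of a sampling-resampling step in the spirit of Smith and Gelfand (1992), applied to a single multiplicative constraint. It is not the chapter's implementation. The indicator names (price p, quantity q, sales s), the constraint s = p * q, the Gaussian measurement errors, the independence of the errors and all numeric values are assumptions chosen purely for illustration.

```python
import math
import random


def sir_product_constraint(p_obs, q_obs, s_obs, sd_p, sd_q, sd_s,
                           n_draws=20000, n_keep=5000, seed=0):
    """Sampling-importance-resampling for the hypothetical constraint s = p * q.

    All three quantities are observed with independent Gaussian errors
    (an assumption of this sketch, not a statement about the chapter).
    Returns resampled (p, q) pairs approximating their joint distribution
    given all three observations.
    """
    rng = random.Random(seed)
    # Step 1: draw latent (p, q) from Gaussians centred on their observations.
    draws = [(rng.gauss(p_obs, sd_p), rng.gauss(q_obs, sd_q))
             for _ in range(n_draws)]
    # Step 2: weight each draw by the Gaussian likelihood of the observed
    # sales figure, given the sales value implied by the constraint.
    weights = [math.exp(-0.5 * ((s_obs - p * q) / sd_s) ** 2)
               for p, q in draws]
    # Step 3: resample with replacement, proportionally to the weights.
    return rng.choices(draws, weights=weights, k=n_keep)


# Illustrative, made-up data: the observations violate the constraint,
# since 10 * 5 = 50 while the reported sales figure is 56.
kept = sir_product_constraint(p_obs=10.0, q_obs=5.0, s_obs=56.0,
                              sd_p=0.5, sd_q=0.25, sd_s=2.0)
s_values = [p * q for p, q in kept]
s_mean = sum(s_values) / len(s_values)
print(f"reconciled sales estimate: {s_mean:.2f}")
```

The resampled draws satisfy the constraint by construction, and their spread shows the non-Gaussian shape that a product of variables induces. A linearized GLS fit would approximate that shape by a Gaussian.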
References
John Aitchison. The Statistical Analysis of Compositional Data. Kluwer, 1986.
Adelchi Azzalini and Antonella Capitanio. Statistical Applications of the Multivariate Skew Normal Distribution, Journal of the Royal Statistical Society. Series B, 61, 579-602, 1999.
Adelchi Azzalini and Alessandra Dalla Valle. The Multivariate Skew-Normal Distribution, Biometrika, 83, 715-726, 1996.
Carlo Batini and Monica Scannapieco. Data Quality Concepts, Methodologies and Techniques, Springer, 2006.
Siddhartha Chib. Markov Chain Monte Carlo Technology. In Handbook of Computational Statistics - Concepts and Methods, pages 71–102. Springer, 2004.
I. P. Fellegi and D. Holt. A Systematic Approach to Automatic Edit and Imputation, JASA, 71, 17-35, 1976.
W. Keith Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57:97–109, 1970.
Veit Köppen and Hans-J. Lenz. Simulation of non-linear stochastic equation systems. In A.N. Pepelyshev, S.M. Ermakov, V.B. Melas, eds., Proceedings of the Fifth Workshop on Simulation, pages 373–378, St. Petersburg, Russia, July 2005. NII Chemistry Saint Petersburg University Publishers.
Hans-J. Lenz and Roland M. Müller. On the solution of fuzzy equation systems. In G. Della Riccia, H-J. Lenz, and R. Kruse, eds., Computational Intelligence in Data Mining, CISM Courses and Lectures. Springer, New York, 2000.
Hans-J. Lenz and Egmar Rödel. Statistical quality control of data. In Peter Gritzmann, Rainer Hettich, Reiner Horst, and Ekkehard Sachs, editors, 16th Symposium on Operations Research, pages 341–346. Physica Verlag, Heidelberg, 1991.
Gunar E. Liepins and V.R.R. Uppuluri. Data Quality Control Theory and Pragmatics, Marcel Dekker, 1991.
Beat Schmid. Bilanzmodelle. Simulationsverfahren zur Verarbeitung unscharfer Teilinformationen, ORL-Bericht No. 40, ORL Institut, ETH Zürich, 1979.
Adrian F. M. Smith and Alan E. Gelfand. Bayesian statistics without tears: A sampling-resampling perspective. The American Statistician, 46(2):84–88, May 1992.
G. Barrie Wetherill and Marion E. Gerson. Computer Aids to Data Quality Control, The Statistician, 36, 598-592, 1987.
Copyright information
© 2010 Physica-Verlag Heidelberg
About this chapter
Cite this chapter
Köppen, V., Lenz, HJ. (2010). Data Quality Control Based on Metric Data Models. In: Lenz, HJ., Wilrich, PT., Schmid, W. (eds) Frontiers in Statistical Quality Control 9. Physica-Verlag HD. https://doi.org/10.1007/978-3-7908-2380-6_17
DOI: https://doi.org/10.1007/978-3-7908-2380-6_17
Publisher Name: Physica-Verlag HD
Print ISBN: 978-3-7908-2379-0
Online ISBN: 978-3-7908-2380-6
eBook Packages: Mathematics and Statistics (R0)