Data Quality Control Based on Metric Data Models

Summary

We consider statistical edits defined on a metric data space spanned by the non-key attributes (variables) of a given database. Integrity constraints on this data space are derived from definitions, behavioral equations, or a system of balance equations; a set of business or economic indicators is a typical example. The variables are linked by the four basic arithmetic operations only. Assuming a multivariate Gaussian distribution and an errors-in-variables model, the unknown (latent) variables can be estimated by a generalized least-squares (GLS) procedure. This approach has two drawbacks: the equations form a non-linear system because variables are multiplied and divided, and, for lack of information in real applications, one generally assumes independence between all variables. As no finite-parameter family of densities is closed under all four arithmetic operations, we use MCMC simulation techniques, cf. Smith and Gelfand (1992) and Chib (2004), to derive the “exact” distributions in the non-normal case and under cross-correlation. The research can be viewed as an extension of Köppen and Lenz (2005) in that it studies the robustness of the GLS approach with respect to non-normality and correlation.
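
To make the GLS idea concrete, the following is a minimal sketch of the simplest special case only: a single linear balance equation with independent measurement errors. It is not the chapter's procedure, which also covers non-linear constraints from multiplication and division. The function name `gls_reconcile`, the indicator names (revenue, cost, profit) and all numbers are hypothetical and chosen for illustration. Under these assumptions, the GLS estimate subtracts from each observation a share of the constraint residual proportional to that observation's variance.

```python
def gls_reconcile(observed, variances, coeffs):
    """GLS adjustment for ONE linear constraint sum(coeffs[i] * x[i]) == 0.

    Assumes independent errors (diagonal covariance), as the summary says
    is commonly done in practice. Returns adjusted values that satisfy the
    constraint exactly.
    """
    # How far the raw observations are from fulfilling the balance equation.
    residual = sum(a * x for a, x in zip(coeffs, observed))
    # Variance of that residual under independence.
    scale = sum(a * a * v for a, v in zip(coeffs, variances))
    if scale == 0:
        raise ValueError("all constrained variables have zero variance")
    # Each variable absorbs a share of the residual proportional to its variance.
    return [x - v * a * residual / scale
            for x, v, a in zip(observed, variances, coeffs)]


# Hypothetical indicators linked by: revenue - cost - profit = 0
observed = [100.0, 60.0, 35.0]   # reported revenue, cost, profit (inconsistent)
variances = [4.0, 4.0, 1.0]      # assumed error variances
coeffs = [1.0, -1.0, -1.0]       # coefficients of the balance equation

estimates = gls_reconcile(observed, variances, coeffs)
print(estimates)
```

In this sketch the less precisely measured variables (revenue and cost) are adjusted more than profit. A large adjustment relative to a variable's assumed standard deviation would flag the record as a candidate for a data quality check.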

References

  • John Aitchison. The Statistical Analysis of Compositional Data. Kluwer, 1986.

  • Adelchi Azzalini and Antonella Capitanio. Statistical Applications of the Multivariate Skew Normal Distribution, Journal of the Royal Statistical Society. Series B, 61, 579-602, 1999.

  • Adelchi Azzalini and Alessandra Dalla Valle. The Multivariate Skew-Normal Distribution, Biometrika, 83, 715-726, 1996.

  • Carlo Batini and Monica Scannapieco. Data Quality Concepts, Methodologies and Techniques, Springer, 2006.

  • Siddhartha Chib. Markov Chain Monte Carlo Technology. In Handbook of Computational Statistics: Concepts and Methods, pages 71–102. Springer, 2004.

  • I. P. Fellegi and D. Holt. A Systematic Approach to Automatic Edit and Imputation, JASA, 71, 17-35, 1976.

  • W. Keith Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57:97–109, 1970.

  • Veit Köppen and Hans-J. Lenz. Simulation of non-linear stochastic equation systems. In A.N. Pepelyshev, S.M. Ermakov, V.B. Melas, eds., Proceeding of the Fifth Workshop on Simulation, pages 373–378, St. Petersburg, Russia, July 2005. NII Chemistry Saint Petersburg University Publishers.

  • Hans-J. Lenz and Roland M. Müller. On the solution of fuzzy equation systems. In G. Della Riccia, H-J. Lenz, and R. Kruse, eds., Computational Intelligence in Data Mining, CISM Courses and Lectures. Springer, New York, 2000.

  • Hans-J. Lenz and Egmar Rödel. Statistical quality control of data. In Peter Gritzmann, Rainer Hettich, Reiner Horst, and Ekkehard Sachs, editors, 16th Symposium on Operations Research, pages 341–346. Physica Verlag, Heidelberg, 1991.

  • Gunar E. Liepins and V.R.R. Uppuluri. Data Quality Control Theory and Pragmatics, Marcel Dekker, 1991.

  • Beat Schmid. Bilanzmodelle. Simulationsverfahren zur Verarbeitung unscharfer Teilinformationen, ORL-Bericht No. 40, ORL Institut, ETH Zürich, 1979.

  • Adrian F. M. Smith and Alan E. Gelfand. Bayesian statistics without tears: A sampling-resampling perspective. The American Statistician, 46(2):84–88, May 1992.

  • G. Barrie Wetherill and Marion E. Gerson. Computer Aids to Data Quality Control, The Statistician, 36, 598-592, 1987.

Author information

Corresponding author

Correspondence to Veit Köppen.

Copyright information

© 2010 Physica-Verlag Heidelberg

About this chapter

Cite this chapter

Köppen, V., Lenz, HJ. (2010). Data Quality Control Based on Metric Data Models. In: Lenz, HJ., Wilrich, PT., Schmid, W. (eds) Frontiers in Statistical Quality Control 9. Physica-Verlag HD. https://doi.org/10.1007/978-3-7908-2380-6_17
