Abstract

Statistical methods have been used to handle noisy and incomplete data. In this note, I argue that AI, and AI modelling techniques in particular, has an important role to play in helping to ensure data quality. The management of outliers is used to illustrate this point. Two different ways of using domain knowledge to help distinguish noisy outlying data from noise-free outliers are described, and AI modelling techniques have been found particularly useful for this type of knowledge-based outlier analysis.

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Liu, X. (1999). AI Modelling for Data Quality Control. In: Gammerman, A. (eds) Causal Models and Intelligent Data Management. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-58648-4_10

  • DOI: https://doi.org/10.1007/978-3-642-58648-4_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-63682-0

  • Online ISBN: 978-3-642-58648-4

  • eBook Packages: Springer Book Archive
