AI Modelling for Data Quality Control

Liu, Xiaohui

doi:10.1007/978-3-642-58648-4_10

Abstract

Statistical methods have been used to handle noisy and incomplete data. In this note, I argue that AI, and AI modelling techniques in particular, has an important role to play in helping to ensure data quality. The management of outliers is used to illustrate this point. Two different ways of using domain knowledge to help distinguish noisy outliers from noise-free outliers are described, and AI modelling techniques have been found particularly useful for this type of knowledge-based outlier analysis.


Author information

Authors and Affiliations

Department of Computer Science, Birkbeck College, University of London, Malet Street, London, WC1E 7HX, UK
Xiaohui Liu


Editor information

Editors and Affiliations

Department of Computer Science, University of London, Royal Holloway, Surrey, TW20 0EX, UK
Alex Gammerman

About this chapter

Cite this chapter

Liu, X. (1999). AI Modelling for Data Quality Control. In: Gammerman, A. (eds) Causal Models and Intelligent Data Management. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-58648-4_10

DOI: https://doi.org/10.1007/978-3-642-58648-4_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-63682-0
Online ISBN: 978-3-642-58648-4
eBook Packages: Springer Book Archive
