Abstract
Statistical methods have been used to handle noisy and incomplete data. In this note, I shall argue that AI, and AI modelling techniques in particular, has an important role to play in helping to ensure data quality. The management of outliers is used to illustrate this point. Two different ways of using domain knowledge to help distinguish between noisy outlying data and noise-free outliers are described; AI modelling techniques have been found particularly useful for this type of knowledge-based outlier analysis.
© 1999 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Liu, X. (1999). AI Modelling for Data Quality Control. In: Gammerman, A. (eds) Causal Models and Intelligent Data Management. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-58648-4_10
DOI: https://doi.org/10.1007/978-3-642-58648-4_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-63682-0
Online ISBN: 978-3-642-58648-4
eBook Packages: Springer Book Archive