Abstract
Outliers are difficult to handle because some are measurement errors, while others may represent phenomena of interest, something “significant” from the viewpoint of the application domain. Statistical methods for managing outliers do not distinguish between these two possibilities. In our previous work, we suggested a method for distinguishing them by modelling “real measurements”, that is, how measurements should be distributed in a domain of interest. In this paper, we make the distinction by modelling measurement errors instead. The proposed method is better suited to applications where it is difficult to obtain relevant knowledge about real measurements. Test data collected from a recent glaucoma case-finding study in a general practice are used to evaluate the method.
Copyright information
© 1997 Springer-Verlag
About this paper
Cite this paper
Wu, J.X., Cheng, G., Liu, X. (1997). Reasoning about outliers by modelling noisy data. In: Liu, X., Cohen, P., Berthold, M. (eds) Advances in Intelligent Data Analysis: Reasoning about Data. IDA 1997. Lecture Notes in Computer Science, vol 1280. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0052870
DOI: https://doi.org/10.1007/BFb0052870
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63346-4
Online ISBN: 978-3-540-69520-2
eBook Packages: Springer Book Archive