Reasoning about outliers by modelling noisy data

Wu, John X; Cheng, Gongxian; Liu, Xiaohui

doi:10.1007/BFb0052870

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1280)

Included in the following conference series:

International Symposium on Intelligent Data Analysis

Abstract

Outliers are difficult to handle because some of them can be measurement errors, while others may represent phenomena of interest, something “significant” from the viewpoint of the application domain. Statistical methods for managing outliers do not distinguish between these two possibilities. In our previous work, we suggested a method for distinguishing between them by modelling “real measurements”, that is, how measurements should be distributed in a domain of interest. In this paper, we make the distinction by modelling measurement errors instead. The proposed method is better suited to applications where it is difficult to obtain relevant knowledge about real measurements. Test data collected from a recent glaucoma case-finding study in a general practice are used to evaluate the method.

Author information

Authors and Affiliations

Moorfields Eye Hospital, Glaxo Department of Ophthalmic Epidemiology, Institute of Ophthalmology, Bath Street, EC1V 9EL, London, UK
John X Wu
Department of Computer Science, Birkbeck College, University of London, Malet Street, WC1E 7HX, London, UK
Gongxian Cheng & Xiaohui Liu

Editor information

Xiaohui Liu, Paul Cohen, Michael Berthold

Cite this paper

Wu, J.X., Cheng, G., Liu, X. (1997). Reasoning about outliers by modelling noisy data. In: Liu, X., Cohen, P., Berthold, M. (eds) Advances in Intelligent Data Analysis. Reasoning about Data. IDA 1997. Lecture Notes in Computer Science, vol 1280. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0052870

DOI: https://doi.org/10.1007/BFb0052870
Published: 19 May 2006
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63346-4
Online ISBN: 978-3-540-69520-2
eBook Packages: Springer Book Archive
