Outlier Detection Using Replicator Neural Networks

Hawkins, Simon; He, Hongxing; Williams, Graham; Baxter, Rohan

doi:10.1007/3-540-46145-0_17

Conference paper
First Online: 01 January 2002

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2454)

Abstract

We consider the problem of finding outliers in large multivariate databases. Outlier detection can be applied during the data cleansing process of data mining to identify problems with the data itself, and to fraud detection where groups of outliers are often of particular interest. We use replicator neural networks (RNNs) to provide a measure of the outlyingness of data records. The performance of the RNNs is assessed using a ranked score measure. The effectiveness of the RNNs for outlier detection is demonstrated on two publicly available databases.
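The abstract does not give the network architecture or the exact scoring formula, so the following is only a minimal sketch of the general idea: train a small network to reproduce its own input, then treat each record's reconstruction error as its outlyingness and rank records by it. The single tanh hidden layer, the mean squared error score, the toy data, and the names `train_replicator` and `outlier_scores` are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def train_replicator(X, n_hidden=1, lr=0.05, epochs=2000, seed=0):
    """Fit a one-hidden-layer network that maps each record back to itself.

    Assumed architecture (not taken from the paper): tanh hidden layer,
    linear output layer, full-batch gradient descent on squared error.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)           # hidden activations, shape (n, n_hidden)
        err = H @ W2 + b2 - X              # reconstruction minus input, shape (n, d)
        # Gradients of 0.5 * squared error per record, averaged over records.
        gW2 = H.T @ err / n
        gb2 = err.mean(axis=0)
        dH = (err @ W2.T) * (1.0 - H ** 2)  # backpropagate through tanh
        gW1 = X.T @ dH / n
        gb1 = dH.mean(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return W1, b1, W2, b2

def outlier_scores(X, params):
    """Mean squared reconstruction error per record (higher = more outlying)."""
    W1, b1, W2, b2 = params
    X_hat = np.tanh(X @ W1 + b1) @ W2 + b2
    return ((X - X_hat) ** 2).mean(axis=1)

# Toy data (hypothetical): 200 records near the line y = x, plus one
# record far off that line, stored last at index 200.
rng = np.random.default_rng(1)
t = rng.uniform(-1.0, 1.0, 200)
inliers = np.column_stack([t, t + rng.normal(0.0, 0.05, 200)])
X = np.vstack([inliers, [[2.0, -2.0]]])

params = train_replicator(X)
scores = outlier_scores(X, params)
ranking = np.argsort(scores)[::-1]  # record indices, most outlying first
print("most outlying record index:", int(ranking[0]))
```

Because the hidden layer is narrower than the input, the network can only reproduce the dominant structure of the data, so records that do not fit that structure are reconstructed poorly and rise to the top of the ranking.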

Author information

Authors and Affiliations

CSIRO Mathematical and Information Sciences, GPO Box 664, 2601, Canberra, ACT, Australia
Simon Hawkins, Hongxing He, Graham Williams & Rohan Baxter

Editor information

Editors and Affiliations

Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, 606-8501, Kyoto, Japan
Yahiko Kambayashi
Institute for Computer Science and Business Informatics, University of Vienna, Liebiggasse 4, 1010, Vienna, Austria
Werner Winiwarter
Center for Spatial Information Science (CSIS), University of Tokyo, 4-6-1, Komaba, Meguro-ku, 153-8904, Tokyo, Japan
Masatoshi Arikawa

About this paper

Cite this paper

Hawkins, S., He, H., Williams, G., Baxter, R. (2002). Outlier Detection Using Replicator Neural Networks. In: Kambayashi, Y., Winiwarter, W., Arikawa, M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2002. Lecture Notes in Computer Science, vol 2454. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46145-0_17

DOI: https://doi.org/10.1007/3-540-46145-0_17
Published: 02 September 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44123-6
Online ISBN: 978-3-540-46145-6
eBook Packages: Springer Book Archive
