A Fast Outlier Detection Algorithm for Big Datasets

van Hieu, Duong; Meesad, Phayung

doi:10.1007/978-3-319-40415-8_16

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 463)

Abstract

This paper proposes a fast outlier detection algorithm for big datasets that combines a cell-based method with a rank-difference outlier detection method and a new weighted distance definition. First, the cell-based method transforms a dataset containing a very large number of objects into a significantly smaller set of weighted cells, based on predefined lower-bound and upper-bound sizes. A weighted distance function is defined to measure the distance between two cells from their coordinates and weights. Next, a rank-based outlier detection method with different depths is used to calculate outlier scores for the cells. Finally, the cells are ranked by score, and outlier objects are identified from the ranked cells and eliminated from the provided dataset. According to the experimental results, the proposed method is appropriate for datasets that have a very large number of objects.
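
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual definitions, so everything below is an assumption made for illustration: a single fixed `cell_size` stands in for the lower-bound and upper-bound sizes, `weighted_distance` is a hypothetical formula (centroid distance divided by the square root of the product of the cell weights), and the score is a simple average reverse-neighbor rank computed at one depth `k`, not the rank-difference method the authors use.

```python
import math
from collections import defaultdict


def build_cells(points, cell_size):
    """Group points into grid cells: {cell_key: (centroid, weight, member_indices)}."""
    members = defaultdict(list)
    for i, p in enumerate(points):
        key = tuple(math.floor(x / cell_size) for x in p)
        members[key].append(i)
    cells = {}
    for key, idx in members.items():
        dim = len(points[idx[0]])
        centroid = tuple(sum(points[i][d] for i in idx) / len(idx) for d in range(dim))
        cells[key] = (centroid, len(idx), idx)  # weight = number of objects in the cell
    return cells


def weighted_distance(a, b):
    """Hypothetical weighted distance (an assumption, not the paper's definition):
    Euclidean distance between centroids, shrunk when both cells are heavy."""
    (ca, wa, _), (cb, wb, _) = a, b
    return math.dist(ca, cb) / math.sqrt(wa * wb)


def rank_scores(cells, k):
    """Score each cell by the average rank it holds in the neighbor
    orderings of its own k nearest cells (higher = more outlying)."""
    keys = list(cells)
    # order[q] lists every other cell, nearest to q first
    order = {
        q: sorted((o for o in keys if o != q),
                  key=lambda o: weighted_distance(cells[q], cells[o]))
        for q in keys
    }
    scores = {}
    for p in keys:
        neighbours = order[p][:k]
        if not neighbours:  # a single cell has nothing to be ranked against
            scores[p] = 0.0
            continue
        scores[p] = sum(order[q].index(p) + 1 for q in neighbours) / len(neighbours)
    return scores


def detect_outliers(points, cell_size, k, n_cells):
    """Return the indices of objects that fall in the n_cells highest-scoring cells."""
    cells = build_cells(points, cell_size)
    scores = rank_scores(cells, k)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return sorted(i for key in ranked[:n_cells] for i in cells[key][2])
```

Because scoring runs over cells and not over individual objects, the quadratic neighbor search applies to the number of occupied cells, which is the reduction the abstract relies on for large datasets.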

Author information

Authors and Affiliations

Faculty of Information Technology, King Mongkut’s University of Technology North Bangkok, Bangkok, 10800, Thailand
Duong van Hieu & Phayung Meesad

Corresponding author

Correspondence to Duong van Hieu.

Editor information

Editors and Affiliations

Faculty of Information Technology, King Mongkut's University of Technology North Bangkok, Bangkok, Thailand
Phayung Meesad
Faculty of Information Technology, King Mongkut's University of Technology North Bangkok, Bangkok, Thailand
Sirapat Boonkrong
Lehrgebiet Kommunikationsnetze, FernUniversität in Hagen, Hagen, Germany
Herwig Unger

About this paper

Cite this paper

van Hieu, D., Meesad, P. (2016). A Fast Outlier Detection Algorithm for Big Datasets. In: Meesad, P., Boonkrong, S., Unger, H. (eds) Recent Advances in Information and Communication Technology 2016. Advances in Intelligent Systems and Computing, vol 463. Springer, Cham. https://doi.org/10.1007/978-3-319-40415-8_16

DOI: https://doi.org/10.1007/978-3-319-40415-8_16
Published: 12 June 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40414-1
Online ISBN: 978-3-319-40415-8
eBook Packages: Engineering, Engineering (R0)
