Abstract
This paper proposes a fast outlier detection algorithm for big datasets that combines a cell-based method with a rank-difference outlier detection method, linked by a new weighted distance definition. First, the cell-based method transforms a dataset with a very large number of objects into a significantly smaller set of weighted cells, using predefined lower-bound and upper-bound sizes. A weighted distance function then measures the distance between two cells from their coordinates and weights. Next, a rank-based outlier detection method with different depths calculates an outlier score for each cell. Finally, the cells are ranked by score, and outlier objects are identified from the ranked cells and eliminated from the dataset. Experimental results indicate that the proposed method is appropriate for datasets that have a very large number of objects.
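The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, because the abstract does not give the actual definitions. The sketch makes three assumptions: a fixed `cell_size` stands in for the lower-bound and upper-bound cell sizes, `weighted_distance` is a placeholder weighting that inflates distances involving sparse cells, and a single neighborhood size `k` replaces the "different depths" of the rank-based step. All function and parameter names are hypothetical.

```python
from collections import Counter
from math import dist, sqrt


def cell_of(point, cell_size):
    """Grid index of the cell that contains `point`."""
    return tuple(int(x // cell_size) for x in point)


def to_cells(points, cell_size):
    """Compress points into weighted cells: {cell index: number of points}."""
    return dict(Counter(cell_of(p, cell_size) for p in points))


def weighted_distance(a, wa, b, wb):
    # Placeholder weighting (an assumption, not the paper's definition):
    # Euclidean distance between cell indices, inflated for sparse cells.
    return dist(a, b) / sqrt(min(wa, wb))


def rank_scores(cells, k):
    """Rank-based score: mean rank of a cell in its k nearest cells' lists."""
    keys = list(cells)
    # For every cell, all other cells ordered by increasing weighted distance.
    order = {
        c: sorted(
            (o for o in keys if o != c),
            key=lambda o: weighted_distance(c, cells[c], o, cells[o]),
        )
        for c in keys
    }
    scores = {}
    for c in keys:
        neighbours = order[c][:k]
        # A cell that ranks low in its neighbours' lists gets a high score.
        ranks = [order[q].index(c) + 1 for q in neighbours]
        scores[c] = sum(ranks) / len(ranks)
    return scores


def remove_outliers(points, cell_size, k, n_outlier_cells):
    """Drop every point that falls in one of the highest-scoring cells."""
    cells = to_cells(points, cell_size)
    scores = rank_scores(cells, k)
    ranked = sorted(scores, key=scores.get, reverse=True)
    flagged = set(ranked[:n_outlier_cells])
    return [p for p in points if cell_of(p, cell_size) not in flagged]
```

Scoring cells instead of objects is what keeps the quadratic ranking step affordable: its cost depends on the number of cells, not on the number of objects. For a dense 20 × 20 block of points plus one distant point, `remove_outliers(points, 5, 3, 1)` flags only the cell that holds the distant point.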
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
van Hieu, D., Meesad, P. (2016). A Fast Outlier Detection Algorithm for Big Datasets. In: Meesad, P., Boonkrong, S., Unger, H. (eds) Recent Advances in Information and Communication Technology 2016. Advances in Intelligent Systems and Computing, vol 463. Springer, Cham. https://doi.org/10.1007/978-3-319-40415-8_16
DOI: https://doi.org/10.1007/978-3-319-40415-8_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40414-1
Online ISBN: 978-3-319-40415-8
eBook Packages: Engineering, Engineering (R0)