Abstract
Anomaly detection is a crucial problem in the field of data mining. However, prevailing anomaly detection algorithms are serial in nature and fail to handle huge volumes of data. In this paper, we propose two parallel local density based algorithms, namely MapReduce based Local Outlier Factor (MRLOF) and Spark based Local Outlier Factor (SLOF). Each of the proposed algorithms has a time complexity of \( O(N) \), an improvement over the Simplified LOF (Local Outlier Factor), which has a time complexity of \( O(N^{2}) \), where \( N \) is the data size. We conducted extensive experiments with MRLOF and SLOF on various real-life and synthetic datasets. The proposed algorithms are shown to outperform the serial Simplified LOF.
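To make the quadratic serial baseline concrete, the following is a minimal sketch of a brute-force Simplified LOF in Python. It is not the authors' implementation and does not reproduce MRLOF or SLOF. It assumes the usual Simplified LOF formulation, in which the local density of a point is the inverse of its k-distance and the score is the mean ratio of each neighbour's density to the point's own density. The function names, the Euclidean metric, and the assumption that the data contain no duplicate points (so every k-distance is non-zero) are illustrative choices. The nested scan over all pairs of points is what makes the cost grow quadratically in N for a fixed k.

```python
import heapq
import math


def k_nearest(points, k):
    """For each point, return (k-distance, indices of its k nearest neighbours).

    Brute force: every point is compared with every other point,
    so the number of distance computations is N * (N - 1).
    """
    result = []
    for i, p in enumerate(points):
        candidates = ((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        nearest = heapq.nsmallest(k, candidates)  # sorted ascending by distance
        k_dist = nearest[-1][0]                   # distance to the k-th neighbour
        result.append((k_dist, [j for _, j in nearest]))
    return result


def simplified_lof(points, k):
    """Return one outlier score per point (hypothetical helper, for illustration).

    density(p) = 1 / k_dist(p), hence
    score(p)   = mean over neighbours o of density(o) / density(p)
               = mean over neighbours o of k_dist(p) / k_dist(o).
    Assumes no duplicate points, so that k_dist(o) > 0.
    """
    knn = k_nearest(points, k)
    scores = []
    for k_dist_p, neighbours in knn:
        ratios = [k_dist_p / knn[o][0] for o in neighbours]
        scores.append(sum(ratios) / len(ratios))
    return scores


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
    for point, score in zip(data, simplified_lof(data, k=2)):
        print(point, round(score, 3))
```

Scores close to 1 indicate a point whose local density matches that of its neighbours, while scores well above 1 indicate a point in a much sparser region than its neighbours. In the toy data, the four corner points of the unit square score near 1 and the isolated point scores far higher.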
Acknowledgements
The authors would like to thank the Council of Scientific and Industrial Research (CSIR), New Delhi, India, for financial support of this research work (File No. 09/085(0111)/2014.EMR.1).
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Sinha, A., Jana, P.K. (2018). Efficient Algorithms for Local Density Based Anomaly Detection. In: Negi, A., Bhatnagar, R., Parida, L. (eds) Distributed Computing and Internet Technology. ICDCIT 2018. Lecture Notes in Computer Science, vol 10722. Springer, Cham. https://doi.org/10.1007/978-3-319-72344-0_30
DOI: https://doi.org/10.1007/978-3-319-72344-0_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-72343-3
Online ISBN: 978-3-319-72344-0
eBook Packages: Computer Science (R0)