Abstract
Business intelligence is one of the applications that can benefit from techniques and methodologies for identifying anomalies in unlabelled big data. To address this problem, we present a model that identifies anomalies in big data within a Spark environment. We use Spark, an open-source software framework, to analyze the data; Spark provides powerful APIs for machine learning and soft computing algorithms. To detect anomalous instances at big-data scale, Apache Spark is installed on top of Hadoop, and an Adaptive Neuro-Fuzzy Inference System (ANFIS) is implemented in Spark. Hadoop HDFS is used as the data source, and the data are fetched into Spark as resilient distributed datasets (RDDs). Experimental results show that the proposed method runs in a fault-tolerant manner and records anomalous instances accurately in the distributed environment.
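The abstract does not give the details of the ANFIS model, so the following is only a minimal sketch of what a generic first-order Sugeno ANFIS forward pass might look like for a single record. The Gaussian membership functions, the two hand-set rules in `EXAMPLE_RULES`, the `threshold` value, and all function names are assumptions made for illustration; they are not taken from the paper, and the training step that would tune these parameters is omitted.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical rule representation: one Gaussian membership function
# (center, width) per input, plus linear consequent coefficients
# (one weight per input followed by a bias term).
Rule = Tuple[List[Tuple[float, float]], List[float]]


def gaussian(x: float, center: float, width: float) -> float:
    """Gaussian membership degree of x."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


def anfis_forward(x: Sequence[float], rules: Sequence[Rule]) -> float:
    """First-order Sugeno ANFIS forward pass for a single record."""
    strengths = []
    outputs = []
    for memberships, coeffs in rules:
        # Layers 1-2: fuzzify each input, then combine with the product T-norm.
        w = 1.0
        for xi, (center, width) in zip(x, memberships):
            w *= gaussian(xi, center, width)
        strengths.append(w)
        # Layer 4: linear consequent, weights dotted with inputs plus bias.
        f = sum(p * xi for p, xi in zip(coeffs, x)) + coeffs[-1]
        outputs.append(f)
    total = sum(strengths)
    if total == 0.0:
        # No rule fires (memberships underflowed to zero); return a neutral score.
        return 0.0
    # Layers 3 and 5: normalise firing strengths and take the weighted sum.
    return sum(w * f for w, f in zip(strengths, outputs)) / total


def is_anomaly(x: Sequence[float], rules: Sequence[Rule],
               threshold: float = 0.5) -> bool:
    """Flag a record whose ANFIS score exceeds the (assumed) threshold."""
    return anfis_forward(x, rules) > threshold


# Two hand-set rules over two inputs: a "normal" region around (0, 0)
# with output 0, and an "anomalous" region around (5, 5) with output 1.
EXAMPLE_RULES: List[Rule] = [
    ([(0.0, 1.0), (0.0, 1.0)], [0.0, 0.0, 0.0]),
    ([(5.0, 1.0), (5.0, 1.0)], [0.0, 0.0, 1.0]),
]
```

Because the function scores one record at a time and keeps no state, it could in principle be applied to each element of an RDD with a per-record transformation such as `map`, which is one plausible way an ANFIS scorer might be distributed in Spark.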
Acknowledgments
This work is partially supported by the Indian Institute of Technology (ISM), Govt. of India. The authors express their gratitude to the Department of Computer Science & Engineering, Indian Institute of Technology (ISM), Dhanbad, India, for arranging the necessary computing facilities.
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Santosh, T., Ramesh, D. (2018). Spark Based ANFIS Approach for Anomaly Detection Using Big Data. In: Bhattacharyya, P., Sastry, H., Marriboyina, V., Sharma, R. (eds) Smart and Innovative Trends in Next Generation Computing Technologies. NGCT 2017. Communications in Computer and Information Science, vol 828. Springer, Singapore. https://doi.org/10.1007/978-981-10-8660-1_34
DOI: https://doi.org/10.1007/978-981-10-8660-1_34
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8659-5
Online ISBN: 978-981-10-8660-1
eBook Packages: Computer Science, Computer Science (R0)