Spark Based ANFIS Approach for Anomaly Detection Using Big Data

  • Conference paper
Smart and Innovative Trends in Next Generation Computing Technologies (NGCT 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 828)

Abstract

Business intelligence is one of the applications that can benefit from techniques and methodologies for handling anomalies in unlabelled big data. To address this issue, we present a model that identifies anomalies in big data within a Spark environment. For the analysis we use Spark, an open-source software framework that provides powerful APIs for machine learning and soft computing algorithms. To handle and detect anomalous instances in big data, Apache Spark is installed on top of Hadoop, and an Adaptive Neuro-Fuzzy Inference System (ANFIS) is implemented in Spark. Hadoop's HDFS serves as the data source, and the data is fetched into Spark as resilient distributed datasets (RDDs). Experimental results show that the proposed method performs well in a fault-tolerant manner and accurately records anomalous instances in the distributed environment.
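The abstract does not give the ANFIS formulation used in the paper, so the following is only a minimal sketch of a generic first-order Sugeno-type ANFIS forward pass in plain Python, followed by a simple threshold test for flagging a record. The Gaussian membership functions, the grid-style rule base, the parameter layout, and the names `anfis_forward` and `is_anomaly` are all assumptions made for illustration and are not taken from the paper.

```python
import math
from itertools import product


def gaussian(x, center, sigma):
    """Gaussian membership degree of x (assumed membership shape)."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def anfis_forward(x, mfs, consequents):
    """Forward pass of a generic first-order Sugeno ANFIS (illustrative only).

    x           -- list of n input values
    mfs         -- per input, a list of (center, sigma) membership parameters
    consequents -- one coefficient list [p_1, ..., p_n, r] per rule; rules are
                   ordered as itertools.product over the membership functions
    """
    # Layer 1: membership degree of each input in each of its fuzzy sets.
    degrees = [[gaussian(xi, c, s) for (c, s) in mfs_i]
               for xi, mfs_i in zip(x, mfs)]
    # Layer 2: firing strength of each rule (product of one degree per input).
    strengths = [math.prod(combo) for combo in product(*degrees)]
    # Layer 3: normalise the firing strengths so they sum to 1.
    total = sum(strengths)
    if total == 0.0:
        raise ValueError("all rule firing strengths underflowed to zero")
    weights = [w / total for w in strengths]
    # Layer 4: linear consequent of each rule, f = p . x + r.
    outputs = [sum(p * xi for p, xi in zip(coef[:-1], x)) + coef[-1]
               for coef in consequents]
    # Layer 5: weighted sum of the rule outputs.
    return sum(w * f for w, f in zip(weights, outputs))


def is_anomaly(x, mfs, consequents, threshold):
    """Flag a record when its ANFIS score exceeds a chosen (hypothetical) threshold."""
    return anfis_forward(x, mfs, consequents) > threshold
```

In a Spark job, a per-record scoring function of this kind could in principle be applied to an RDD loaded from HDFS with `rdd.map(...)`; how the authors distribute training and scoring is not described in the abstract.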

Acknowledgments

This work is partially supported by Indian Institute of Technology (ISM), Govt. of India. The authors wish to express their gratitude and thanks to the Department of Computer Science & Engineering, Indian Institute of Technology (ISM), Dhanbad, India for providing their support in arranging necessary computing facilities.

Corresponding author

Correspondence to Dharavath Ramesh.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Santosh, T., Ramesh, D. (2018). Spark Based ANFIS Approach for Anomaly Detection Using Big Data. In: Bhattacharyya, P., Sastry, H., Marriboyina, V., Sharma, R. (eds) Smart and Innovative Trends in Next Generation Computing Technologies. NGCT 2017. Communications in Computer and Information Science, vol 828. Springer, Singapore. https://doi.org/10.1007/978-981-10-8660-1_34

  • DOI: https://doi.org/10.1007/978-981-10-8660-1_34

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-8659-5

  • Online ISBN: 978-981-10-8660-1

  • eBook Packages: Computer Science, Computer Science (R0)
