Testbeds, Attacks, and Dataset Generation for Big Data Cluster: A System Application for Big Data Platform Security Analysis

  • Swagata Paul
  • Sajal Saha
  • R. T. Goswami
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1119)

Abstract

A big data cluster consists of a number of network-connected computers, called nodes, and offers a large data store and substantial processing power. End users submit both data and applications to the cluster, and all the nodes work together to compute results from the data. During data processing, many processes run on different nodes and exchange data using regular network protocols. One or more nodes may fail to participate properly because of poor hardware or operating-system health. Some nodes may be targeted by known network attacks such as denial of service (DoS), which slow down the whole cluster, while others may receive unknown attacks generated by the big data job itself. The system therefore requires a mechanism to detect nodes that are under attack or that are generating attacks, and then to isolate them. Detecting such attacks requires analyzing the cumulative network traffic of all the nodes in the cluster, so the traffic of every node participating in a data processing job must be collected simultaneously. This work presents an efficient testbed for generating external and internal attacks and for creating datasets for different attacks. The proposed architecture captures network traffic from all nodes of the cluster and stores it for subsequent attack detection.
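To make the capture-and-aggregate idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each node's capture has already been parsed into simple packet records and that node clocks are synchronised; the `PacketRecord` type and all function names are hypothetical. It merges the per-node streams into one time-ordered, cluster-wide dataset and then applies a simple median-based volume rule, a stand-in for a real attack detector, to flag a node with unusually heavy traffic such as a DoS target.

```python
"""Sketch: merge per-node packet records into one cluster-wide dataset and
flag nodes with outlier traffic volume.

Hypothetical assumptions (not from the paper): captures are already parsed
into PacketRecord tuples, node clocks are synchronised, and a median-based
volume rule stands in for a real attack detector.
"""
import heapq
from collections import Counter
from statistics import median
from typing import Dict, Iterable, List, NamedTuple


class PacketRecord(NamedTuple):
    timestamp: float  # capture time in seconds
    node: str         # cluster node whose interface captured the packet
    src: str          # source IP address
    dst: str          # destination IP address
    length: int       # packet length in bytes


def merge_captures(per_node: Dict[str, List[PacketRecord]]) -> List[PacketRecord]:
    """Merge per-node captures into one stream ordered by timestamp."""
    # Sort each node's capture first, then do a k-way merge on timestamp.
    streams = [sorted(records, key=lambda r: r.timestamp)
               for records in per_node.values()]
    return list(heapq.merge(*streams, key=lambda r: r.timestamp))


def packets_per_node(records: Iterable[PacketRecord]) -> Counter:
    """Count how many packets each node captured in the merged dataset."""
    return Counter(r.node for r in records)


def flag_outliers(counts: Counter, factor: float = 3.0) -> List[str]:
    """Flag nodes that captured more than `factor` times the median count."""
    if not counts:
        return []
    med = median(counts.values())
    return sorted(node for node, n in counts.items() if n > factor * med)


if __name__ == "__main__":
    # Toy data: node n3 receives a burst of small packets, as in a flood.
    captures = {
        "n1": [PacketRecord(0.1, "n1", "10.0.0.1", "10.0.0.2", 60),
               PacketRecord(0.4, "n1", "10.0.0.1", "10.0.0.3", 60)],
        "n2": [PacketRecord(0.2, "n2", "10.0.0.2", "10.0.0.1", 60)],
        "n3": [PacketRecord(0.3 + i * 0.001, "n3", "10.0.0.9", "10.0.0.3", 40)
               for i in range(20)],
    }
    merged = merge_captures(captures)
    print(flag_outliers(packets_per_node(merged)))  # prints ['n3']
```

A k-way merge keeps the combined dataset in global time order without re-sorting everything, which matters when per-node captures are large. Note that a packet seen on both the sending and the receiving node appears twice in the merged stream; a real pipeline would need to deduplicate or attribute each packet to one node.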

Keywords

Big data cluster, Attack detection framework, Big data security analysis

Notes

Acknowledgements

We conducted this research at the Big Data Lab, Techno International New Town (formerly Techno India College of Technology), Kolkata. We received the professional version of Tableau through the Tableau Academic Programs.

Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  • Swagata Paul (1)
  • Sajal Saha (2)
  • R. T. Goswami (1)
  1. Techno International New Town, Kolkata, India
  2. The Assam Kaziranga University, Koraikhowa, Jorhat, India