Abstract
Applying intelligence to machines is a need in today's world, and this need has driven the evolution of machine learning. Analyzing data with machine learning algorithms is an active research area, but the analysis runs into problems when the data grows into big data. This paper compares several classification-based machine learning algorithms, namely Decision Tree Learning, Naïve Bayes, Random Forest, and Support Vector Machines, on big data using Apache Spark. The accuracy of each algorithm is evaluated to determine which one gives fast and accurate results.
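The comparison described above can be illustrated with a minimal sketch, offered under stated assumptions rather than as the authors' implementation. The paper runs its classifiers on Apache Spark; the plain-Python stand-in below only shows how accuracy (the fraction of correct predictions) and elapsed time might be collected and ranked. The names `accuracy`, `compare`, and the two toy classifiers are hypothetical, and timing only the prediction step is an assumption, since the abstract does not say what "fast" measures.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Return the fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(actual) == 0:
        raise ValueError("cannot compute accuracy on an empty label set")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def compare(
    classifiers: Dict[str, Callable[[Sequence[float]], int]],
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
) -> List[Tuple[str, float, float]]:
    """Score each classifier and return (name, accuracy, seconds) tuples.

    Results are sorted by highest accuracy first; ties are broken by
    the shorter prediction time (an assumed ranking rule).
    """
    results = []
    for name, predict in classifiers.items():
        start = time.perf_counter()
        predictions = [predict(row) for row in features]
        elapsed = time.perf_counter() - start
        results.append((name, accuracy(predictions, labels), elapsed))
    results.sort(key=lambda r: (-r[1], r[2]))
    return results


# Toy data and toy "classifiers" (hypothetical, for illustration only).
features = [[0.1], [0.4], [0.6], [0.9]]
labels = [0, 0, 1, 1]
toy_classifiers = {
    "threshold_0.5": lambda row: int(row[0] > 0.5),
    "always_one": lambda row: 1,
}

ranking = compare(toy_classifiers, features, labels)
for name, acc, seconds in ranking:
    print(f"{name}: accuracy={acc:.2f}, time={seconds:.6f}s")
```

In a Spark setting, the prediction functions would be replaced by trained models and the accuracy computed by the framework's own evaluator; the ranking logic would stay the same.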
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Mogha, G., Ahlawat, K., Singh, A.P. (2018). Performance Analysis of Machine Learning Techniques on Big Data Using Apache Spark. In: Panda, B., Sharma, S., Roy, N. (eds) Data Science and Analytics. REDSET 2017. Communications in Computer and Information Science, vol 799. Springer, Singapore. https://doi.org/10.1007/978-981-10-8527-7_2
DOI: https://doi.org/10.1007/978-981-10-8527-7_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8526-0
Online ISBN: 978-981-10-8527-7
eBook Packages: Computer Science, Computer Science (R0)