Abstract
Processing speed and scalability are among the main challenges in big data. Solid-state drives (SSDs) allow faster processing than hard disk drives (HDDs). In this work, Spark is combined with the Hadoop framework on SSD storage for greater scalability and faster processing. Apache Spark is a general-purpose engine for large-scale data processing on clusters, and it can scale to more than 8000 nodes in a cluster. Spark allows code reuse across batch, interactive, and streaming applications. Whereas MapReduce programs are generally written in Java, Spark supports not only Java but also Python and Scala, a newer language with some attractive properties for manipulating data. Spark runs up to 100 times faster than Hadoop MapReduce in memory and up to 10 times faster on disk. This paper integrates Spark with the Hadoop ecosystem on SSDs, which increases processing speed.
Acknowledgements
We would like to thank Dr. Muhammad Sitheeq and Prof. Rosna P. Haroon of the Computer Science Department at Ilahia College of Engineering for their valuable feedback.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Soumya, K., Arunkumar, M. (2018). SSD Implementation and Spark Integration. In: Satapathy, S., Joshi, A. (eds) Information and Communication Technology for Intelligent Systems (ICTIS 2017) - Volume 1. ICTIS 2017. Smart Innovation, Systems and Technologies, vol 83. Springer, Cham. https://doi.org/10.1007/978-3-319-63673-3_30
DOI: https://doi.org/10.1007/978-3-319-63673-3_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63672-6
Online ISBN: 978-3-319-63673-3
eBook Packages: Engineering, Engineering (R0)