SSD Implementation and Spark Integration

Soumya, K.; Arunkumar, M.

doi:10.1007/978-3-319-63673-3_30

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 83)

Included in the following conference series:

International Conference on Information and Communication Technology for Intelligent Systems

Abstract

One of the main challenges in Big Data is processing speed and scalability. Solid State Drives (SSDs) enable faster processing than hard disk drives (HDDs). Here, alongside SSDs, Spark is combined with the Hadoop framework for greater scalability and faster processing. Apache Spark is a general-purpose engine for large-scale data processing on a cluster, and it can scale to more than 8000 nodes in a cluster. Spark allows code reuse across batch, interactive, and streaming applications. Whereas Hadoop MapReduce jobs are generally written in Java, Spark supports not only Java but also Python and Scala, a newer language with attractive properties for manipulating data. Spark is reported to run up to 100 times faster than Hadoop MapReduce in memory and 10 times faster on disk. This paper integrates Spark with the Hadoop ecosystem on SSD storage to increase processing speed.
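Much of Spark's in-memory speed advantage comes from keeping working data in memory across jobs, whereas chained MapReduce jobs reread their input from storage each time (which is also where faster SSD reads help). The Python sketch below models only that difference by counting storage reads. `ToyStorage`, `run_mapreduce_style` and `run_spark_style` are hypothetical names for illustration, not part of the Spark or Hadoop APIs, and no timing figures are implied. In real Spark code the analogous step is calling `cache()` or `persist()` on an RDD or DataFrame.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical stand-in for a block store (e.g. HDFS on HDD or SSD);
# it only counts how many times the full input is read.
class ToyStorage:
    def __init__(self, lines: List[str]) -> None:
        self._lines = list(lines)
        self.reads = 0

    def read(self) -> List[str]:
        self.reads += 1
        return list(self._lines)

Job = Callable[[List[str]], Dict[str, int]]

def word_count(lines: List[str]) -> Dict[str, int]:
    # Map: emit each word; reduce: sum the occurrences per word.
    return dict(Counter(word for line in lines for word in line.split()))

def long_word_count(lines: List[str]) -> Dict[str, int]:
    # A second "job" over the same input: keep only words longer than 4 characters.
    return {w: c for w, c in word_count(lines).items() if len(w) > 4}

def run_mapreduce_style(storage: ToyStorage, jobs: List[Job]) -> List[Dict[str, int]]:
    # Each chained job reads its input from storage again.
    return [job(storage.read()) for job in jobs]

def run_spark_style(storage: ToyStorage, jobs: List[Job]) -> List[Dict[str, int]]:
    # Read once and keep the dataset in memory, analogous to RDD.cache().
    cached = storage.read()
    return [job(cached) for job in jobs]

if __name__ == "__main__":
    data = ["spark on hadoop", "spark on ssd storage", "hadoop mapreduce on hdd"]
    jobs: List[Job] = [word_count, long_word_count] * 3

    mr_store, spark_store = ToyStorage(data), ToyStorage(data)
    mr_results = run_mapreduce_style(mr_store, jobs)
    spark_results = run_spark_style(spark_store, jobs)

    assert mr_results == spark_results  # same answers either way
    print("MapReduce-style reads:", mr_store.reads)  # 6
    print("Spark-style reads:", spark_store.reads)   # 1
```

With six jobs over the same input, the MapReduce-style runner reads storage six times and the Spark-style runner once, while both return identical results. In this simplified picture, caching reduces the number of reads, and an SSD makes each remaining read faster.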

Acknowledgements

We would like to thank Dr. Muhammad Sitheeq and Prof. Rosna P. Haroon of the Computer Science Department at Ilahia College of Engineering for their valuable feedback.

Author information

Authors and Affiliations

Computer Science Department, ICET, Mulavoor, India
K. Soumya & M. Arunkumar

Corresponding author

Correspondence to K. Soumya.

Editor information

Editors and Affiliations

Department of CSE, PVP Siddhartha Institute of Technology, Vijayawada, Andhra Pradesh, India
Suresh Chandra Satapathy
Sabar Institute of Technology for Girls, Ahmedabad, Gujarat, India
Amit Joshi

About this paper

Cite this paper

Soumya, K., Arunkumar, M. (2018). SSD Implementation and Spark Integration. In: Satapathy, S., Joshi, A. (eds) Information and Communication Technology for Intelligent Systems (ICTIS 2017) - Volume 1. ICTIS 2017. Smart Innovation, Systems and Technologies, vol 83. Springer, Cham. https://doi.org/10.1007/978-3-319-63673-3_30

DOI: https://doi.org/10.1007/978-3-319-63673-3_30
Published: 08 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63672-6
Online ISBN: 978-3-319-63673-3
eBook Packages: Engineering, Engineering (R0)
