
SSD Implementation and Spark Integration

  • Conference paper
Information and Communication Technology for Intelligent Systems (ICTIS 2017) - Volume 1 (ICTIS 2017)

Part of the book series: Smart Innovation, Systems and Technologies ((SIST,volume 83))

Abstract

Processing speed and scalability are two of the main challenges in Big Data. A solid-state drive (SSD) allows faster processing than a hard disk drive (HDD). Here, in addition to the SSD, Spark is combined with the Hadoop framework for greater scalability and faster processing. Apache Spark is a general-purpose engine for large-scale data processing on a cluster. It is a framework that can scale to more than 8000 nodes in a cluster. Spark allows code reuse across batch, interactive, and streaming applications. Spark is much faster than MapReduce. MapReduce was generally coded in Java; Spark supports not only Java but also Python and Scala, a newer language with some attractive properties for manipulating data. Spark runs up to 100 times faster than Hadoop MapReduce in memory and 10 times faster on disk. This paper attempts to integrate Spark with the Hadoop ecosystem together with an SSD in order to increase processing speed.
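The abstract does not say how Spark is directed to the SSD, so the following is only a minimal sketch of one common approach, not the authors' method: pointing Spark's scratch space (shuffle and spill files) at SSD-backed directories through the `spark.local.dir` property. The mount points and both helper functions are hypothetical names introduced for illustration.

```python
# Hypothetical SSD mount points (an assumption); replace with real worker paths.
SSD_DIRS = ["/mnt/ssd1/spark-tmp", "/mnt/ssd2/spark-tmp"]


def ssd_spark_conf(ssd_dirs):
    """Return Spark properties that place scratch files on the given directories."""
    if not ssd_dirs:
        raise ValueError("at least one SSD directory is required")
    return {
        # spark.local.dir accepts a comma-separated list of scratch directories.
        "spark.local.dir": ",".join(ssd_dirs),
    }


def to_submit_args(conf):
    """Flatten the properties into spark-submit style --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Print the arguments so they can be pasted after `spark-submit`.
    print(" ".join(to_submit_args(ssd_spark_conf(SSD_DIRS))))
```

When Spark runs on YARN, the local directories configured for the node manager take precedence over `spark.local.dir`, so in a Hadoop deployment the SSD paths would likely need to be set on the YARN side instead.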



Acknowledgements

We would like to thank Dr. Muhammad Sitheeq and Prof. Rosna P. Haroon of the Computer Science Department at Ilahia College of Engineering for their valuable feedback.

Author information

Corresponding author

Correspondence to K. Soumya.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Soumya, K., Arunkumar, M. (2018). SSD Implementation and Spark Integration. In: Satapathy, S., Joshi, A. (eds) Information and Communication Technology for Intelligent Systems (ICTIS 2017) - Volume 1. ICTIS 2017. Smart Innovation, Systems and Technologies, vol 83. Springer, Cham. https://doi.org/10.1007/978-3-319-63673-3_30

  • DOI: https://doi.org/10.1007/978-3-319-63673-3_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63672-6

  • Online ISBN: 978-3-319-63673-3

  • eBook Packages: Engineering, Engineering (R0)
