Skip to main content

A Study on Big Data Processing Frameworks: Spark and Storm

  • Conference paper
  • First Online:
Smart Intelligent Computing and Applications

Part of the book series: Smart Innovation, Systems and Technologies ((SIST,volume 160))

Abstract

Today’s internet world impose a trade-off between Peta-byte to Exa-byte being created in digital computer world attributable enormous volume of unstructured datasets being generating from diverse social sites, IOT, Google, Twitter, Yahoo, monitoring surroundings through sensors, etc., is big data (BD). Because second to second doubles the datasets volume size but the shortage of smooth dynamic processing, analysis and scalability techniques. Because the recent high-speed decade we applied only extant methods and common tools about the gigabyte data process and perform computations on whole world huge data. Apache open free source Hadoop is the latest BD weapon can process zetta-byte dimensions of databases by its most developed and popular components as HDFS and map reduce (MR), to get done excellent storage features magnificent and reliable processing on zetta-byte of datasets. MR likes more famous software, popular framework for handling BD existing issues with full parallel, highly distributed, and most scalable manner. Despite, Hadoop, map and reduces tasks have more limitations like poor allocating custom resources, stream way processing, shortage of latency, the deficit of efficient performance, imperfection of optimization, the real-time trend of computations and diverse logical elucidation. We significant most modern progressive features computing procedures. This examination paper shows Apache fastest spark tool, world latest and fastest tool is apache storm has efficient frameworks to conquer those limitations.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 169.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 219.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 219.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Gandomi, A., Haider, M.: Beyond the hype: big data concepts, methods, and analytics. Int. J. Inf. Manag. 35(2), 137–144 (2015)

    Article  Google Scholar 

  2. Oguntimilehin, A., Ademola, E.: A review of big data management, benefits and challenges. J. Emerg. Trends Comput. Inf. Sci. 5(6), 433–438 (2014)

    Google Scholar 

  3. Singh, D., Reddy, C.K.: A survey on platforms for big data analytics. J. Big Data 2(1), 8 (2014)

    Article  Google Scholar 

  4. Landset, S., Khoshgoftaar, T.M., Richter, A.N., Hasanin, T.: A survey of open source tools for machine learning with big data in the hadoop ecosystem. J. Big Data 2(1), 1 (2015)

    Article  Google Scholar 

  5. Alexandrov, A., Bergmann, R., Ewen, S., Freytag, J.-C., Hueske, F., Heise, A., Kao, O., Leich, M., Leser, U., Markl, V., Naumann, F., Peters, M., Rheinländer, A., Sax, M.J., Schelter, S., Höger, M., Tzoumas, K., Warneke, D.: The stratosphere platform for big data analytics. VLDB J. 23(6), 939–964 (2014)

    Article  Google Scholar 

  6. Hindman, B., Konwinski, A., Zaharia, M., Ghodsi, A., Joseph, A.D., Katz, R.H., Shenker, S., Stoica, I.: Mesos: a platform for fine-grained resource sharing in the data center. In: NSDI, vol. 11, pp. 22–22 (2011)

    Google Scholar 

  7. Liu, X., Iftikhar, N., Xie, X.: Survey of real-time processing systems for big data. In: Proceedings of the 18th International Database Engineering & Applications Symposium, pp. 356–361. ACM (2014)

    Google Scholar 

  8. Chen, C.P., Zhang, C.Y.: Data-intensive applications, challenges, techniques and technologies: a survey on big data. Inf. Sci. 275, 314–347 (2014)

    Article  Google Scholar 

  9. Marcu, O.C., Costan, A., Antoniu, G., Perez-Hernandez, M.S.: Spark versus flink: understanding performance in big data analytics frameworks. In: 2016 IEEE International Conference on Cluster Computing (CLUSTER), pp. 433–442. IEEE (2016)

    Google Scholar 

  10. Zhang, H., Chen, G., Ooi, B.C., Tan, K.-L., Zhang, M.: In-memory big data management and processing: a survey. IEEE Trans. Knowl. Data Eng. 27(7), 1920–1948 (2015)

    Article  Google Scholar 

  11. Malewicz, G., Austern, M.H., Bik, A.J., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, pp. 135–146. ACM (2010)

    Google Scholar 

  12. Zhang, S., He, B., Dahlmeier, D., Zhou, A.C., Heinze, T.: Revisiting the design of data stream processing systems on multi-core processors. In: 2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 659–670. IEEE (2017)

    Google Scholar 

  13. Shi, J., Qiu, Y., Minhas, U.F., Jiao, L., Wang, C., Reinwald, B., Ozcan, F.: Clash of the titans: map reduce vs. spark for large scale data analytics. Proc. VLDB Endow 8(13), 2110–2121 (2015)

    Article  Google Scholar 

  14. Dea, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)

    Article  Google Scholar 

  15. Polato, I., Re, R., Goldman, A., Kon, F.: A comprehensive view of hadoop research a systematic literature review. J. Netw. Comput. Appl. 46, 1–25 (2014)

    Article  Google Scholar 

  16. Yao, Y., Wang, J., Sheng, B., Lin, J., Mi, N.: Haste: hadoop yarn scheduling based on task-dependency and resource demand. In: Proceedings of the 2014 IEEE (2014)

    Google Scholar 

  17. Chambers, C., Raniwala, A., Perry, F., Adams, S., Henry, R.R., Bradshaw, R., Weizenbaum, N.: FlumeJava: easy, efficient data-parallel pipelines. In: Proceedings of the 31st ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’10, pp. 363–375, New York, NY, USA (2010)

    Google Scholar 

  18. Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, HotCloud’10, pp. 10–10, Berkeley, CA, USA (2010)

    Google Scholar 

  19. Xin, R.S., Gonzalez, J.E., Franklin, M.J., Stoica, I.: GraphX: a resilient distributed graph system on spark. In: First International Workshop on Graph Data Management Experiences and Systems, GRADES 13, pp. 2:1–2:6, ACM. New York, NY, USA (2013)

    Google Scholar 

  20. Hunt, P., Konar, M., Junqueira, F.P., Reed, B.: Zookeeper: wait-free coordination for internet-scale systems. In: Proceedings of the 2010 USENIX Conference on USENIX Annual Technical Conference, USENIXATC’10, pp. 11–11, Berkeley, CA, USA (2010)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to N. Deshai .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Deshai, N., Venkataramana, S., Sekhar, B.V.D.S., Srinivas, K., Saradhi Varma, G.P. (2020). A Study on Big Data Processing Frameworks: Spark and Storm. In: Satapathy, S., Bhateja, V., Mohanty, J., Udgata, S. (eds) Smart Intelligent Computing and Applications . Smart Innovation, Systems and Technologies, vol 160. Springer, Singapore. https://doi.org/10.1007/978-981-32-9690-9_43

Download citation

Publish with us

Policies and ethics