A Study on Big Data Processing Frameworks: Spark and Storm

Deshai, N.; Venkataramana, S.; Sekhar, B. V. D. S.; Srinivas, K.; Saradhi Varma, G. P.

doi:10.1007/978-981-32-9690-9_43

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 160)

Abstract

Today's internet generates data at the petabyte to exabyte scale: enormous volumes of unstructured datasets are produced by diverse sources such as social sites, IoT devices, Google, Twitter, Yahoo, and sensors that monitor their surroundings. This is big data (BD). Dataset volumes double from second to second, yet smooth, dynamic techniques for processing, analysis, and scalability are in short supply. Over the recent high-speed decade, only existing methods and common tools designed for gigabyte-scale data have been applied to process and compute over the world's huge datasets. Apache Hadoop, a free and open-source project, is the latest BD tool; it can process zettabyte-scale datasets through its most developed and popular components, HDFS and MapReduce (MR), which provide excellent storage and reliable processing. MR is a popular software framework for handling BD problems in a fully parallel, highly distributed, and scalable manner. Nevertheless, Hadoop and its map and reduce tasks have several limitations: poor allocation of custom resources, lack of stream processing, poor latency, inefficient performance, limited optimization, and weak support for real-time computation and diverse logical analysis. We highlight the most modern computing approaches and their advanced features. This paper examines Apache Spark, a fast processing tool, and Apache Storm, one of the latest and fastest tools, as efficient frameworks for overcoming those limitations.
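As a rough illustration of the map and reduce tasks the abstract refers to, the following is a minimal single-process sketch of the MapReduce programming model applied to word counting. It is written in plain Python and does not use Hadoop, Spark, or Storm; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made for illustration and do not come from the chapter.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit one (word, 1) pair per word in an input line."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle, and reduce in sequence over a list of text lines."""
    pairs = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, chosen only to exercise the three steps.
    sample = ["spark and storm", "storm handles streams", "spark handles batches"]
    print(word_count(sample))
```

In a real cluster the map and reduce steps would run in parallel on many machines, with intermediate results written out between phases; this sketch only shows the data flow, so it says nothing about the latency and streaming limitations the abstract attributes to Hadoop.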

Author information

Authors and Affiliations

Department of Information Technology, S.R.K.R Engineering College Affiliated to JNTUK, Bhimavaram, Andhra Pradesh, India
N. Deshai, S. Venkataramana, B. V. D. S. Sekhar, K. Srinivas & G. P. Saradhi Varma


Corresponding author

Correspondence to N. Deshai.

Editor information

Editors and Affiliations

School of Computer Engineering, KIIT Deemed to be University, Bhubaneswar, Odisha, India
Suresh Chandra Satapathy
Department of Electronics and Communication Engineering, Shri Ramswaroop Memorial Group of Professional Colleges, Lucknow, Uttar Pradesh, India
Vikrant Bhateja
School of Computer Applications, KIIT Deemed to be University, Bhubaneswar, Odisha, India
J. R. Mohanty
School of Computer and Information Science, University of Hyderabad, Hyderabad, Telangana, India
Siba K. Udgata

About this paper

Cite this paper

Deshai, N., Venkataramana, S., Sekhar, B.V.D.S., Srinivas, K., Saradhi Varma, G.P. (2020). A Study on Big Data Processing Frameworks: Spark and Storm. In: Satapathy, S., Bhateja, V., Mohanty, J., Udgata, S. (eds) Smart Intelligent Computing and Applications. Smart Innovation, Systems and Technologies, vol 160. Springer, Singapore. https://doi.org/10.1007/978-981-32-9690-9_43

DOI: https://doi.org/10.1007/978-981-32-9690-9_43
Published: 04 October 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-32-9689-3
Online ISBN: 978-981-32-9690-9
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)