Abstract
Today's internet generates data on the scale of petabytes to exabytes. This enormous volume of unstructured datasets, produced by social sites, IoT devices, Google, Twitter, Yahoo, sensors that monitor their surroundings, and similar sources, is known as big data (BD). Dataset volumes grow from second to second, yet techniques for smooth, dynamic processing, analysis, and scalability remain in short supply. Over the past decade, only existing methods and common tools designed for gigabyte-scale data have been applied to process and compute on the world's huge datasets. Apache Hadoop, an open-source BD platform, can process zettabyte-scale datasets through its most developed and popular components, HDFS and MapReduce (MR), which provide storage and reliable processing. MR is a popular framework for handling BD problems in a parallel, distributed, and scalable manner. Nevertheless, Hadoop and its map and reduce tasks have several limitations: poor allocation of custom resources, lack of stream processing, high latency, inefficient performance, limited optimization, and weak support for real-time computation and diverse analytical logic. We highlight more modern and advanced computing approaches. This paper shows that two fast and recent tools, Apache Spark and Apache Storm, provide efficient frameworks for overcoming these limitations.
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Deshai, N., Venkataramana, S., Sekhar, B.V.D.S., Srinivas, K., Saradhi Varma, G.P. (2020). A Study on Big Data Processing Frameworks: Spark and Storm. In: Satapathy, S., Bhateja, V., Mohanty, J., Udgata, S. (eds) Smart Intelligent Computing and Applications. Smart Innovation, Systems and Technologies, vol 160. Springer, Singapore. https://doi.org/10.1007/978-981-32-9690-9_43
DOI: https://doi.org/10.1007/978-981-32-9690-9_43
Publisher Name: Springer, Singapore
Print ISBN: 978-981-32-9689-3
Online ISBN: 978-981-32-9690-9
eBook Packages: Intelligent Technologies and Robotics (R0)