The Fault Tolerance of Big Data Systems

Wu, Xing; Du, Zhikang; Dai, Shuji; Liu, Yazhou

doi:10.1007/978-981-10-3996-6_5

Part of the book series: Communications in Computer and Information Science (CCIS, volume 686)

Included in the following conference series:

International Workshop on Management of Information, Processes and Cooperation

Abstract

The big data era arrives when the size of the data itself becomes part of the problem. Big data technologies are a new generation of technologies and architectures designed to extract value economically from very large volumes of a wide variety of data by enabling high-velocity capture, discovery, and/or analysis. Fault tolerance is of great importance for big data systems, which remain exposed to software and hardware faults after they are developed. This paper introduces some popular applications and case studies of big data mining. The individual components of a big data architecture are parallel and distributed, covering distributed data processing, distributed storage, and distributed memory; this paper briefly introduces the Hadoop architecture of big data systems. It then presents recent fault tolerance work in big data systems, including batch computing, stream computing, Spark, and software-defined networks, which reflects the considerable effort devoted to making massive big data systems more capable, and compares these approaches with one another.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (project 61303094), the Science and Technology Commission of Shanghai Municipality (16511102400), and the Innovation Program of Shanghai Municipal Education Commission (14YZ024).

Author information

Authors and Affiliations

School of Computer Engineering and Science, Shanghai University, Shanghai, China
Xing Wu, Zhikang Du & Shuji Dai
Key Laboratory of Image and Video Understanding for Social Safety, Nanjing University of Science and Technology, Nanjing, China
Xing Wu & Yazhou Liu

Corresponding author

Correspondence to Xing Wu.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
Jian Cao
School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, China
Jianxun Liu

About this paper

Cite this paper

Wu, X., Du, Z., Dai, S., Liu, Y. (2017). The Fault Tolerance of Big Data Systems. In: Cao, J., Liu, J. (eds) Management of Information, Process and Cooperation. MIPaC 2016. Communications in Computer and Information Science, vol 686. Springer, Singapore. https://doi.org/10.1007/978-981-10-3996-6_5

DOI: https://doi.org/10.1007/978-981-10-3996-6_5
Published: 23 February 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3995-9
Online ISBN: 978-981-10-3996-6
eBook Packages: Computer Science, Computer Science (R0)
