Abstract
Most cloud computing storage systems use a distributed file system (DFS) to store big data; examples include the Hadoop Distributed File System (HDFS) and the Google File System (GFS). These DFSs achieve high reliability and availability by replicating data and storing it as multiple copies. On the other hand, this technique increases storage and resource consumption.
This paper addresses these issues by presenting a decentralized hybrid model. The model, called CPRIF, combines a cloud provider (CP) with a proposed service that we call Redundant Independent Files (RIF). The CP provides HDFS without replication, and the RIF acts as a service layer that splits data into three parts and uses the XOR operation to generate a fourth part as parity. These four parts are stored as independent HDFS files on the CP. The generated parity file not only guarantees the reliability and security of the data but also reduces storage space, resource consumption, and operational costs. It also improves write and read performance.
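The split-plus-parity scheme described above can be illustrated with a minimal sketch. This is an assumption-laden toy in Python over in-memory byte strings (the paper's actual RIF layer operates on HDFS files), and the helper names `split_into_parts`, `xor_parity`, and `recover` are hypothetical, not from the paper:

```python
# Toy sketch of a RIF-style scheme: split data into three parts,
# XOR them into a fourth parity part, and recover any one lost part.
# Hypothetical helpers; the real system stores each part as an HDFS file.

def split_into_parts(data: bytes, n: int = 3) -> list[bytes]:
    """Split data into n equal-length parts, zero-padding the last."""
    part_len = -(-len(data) // n)  # ceiling division
    padded = data.ljust(part_len * n, b"\x00")
    return [padded[i * part_len:(i + 1) * part_len] for i in range(n)]

def xor_parity(parts: list[bytes]) -> bytes:
    """XOR the parts byte-wise to produce the parity part."""
    parity = bytearray(len(parts[0]))
    for part in parts:
        for i, b in enumerate(part):
            parity[i] ^= b
    return bytes(parity)

def recover(surviving: list[bytes]) -> bytes:
    """Recover a single lost part: XOR of the three surviving parts."""
    return xor_parity(surviving)

data = b"hello world!"
parts = split_into_parts(data)          # three data parts
parity = xor_parity(parts)              # fourth (parity) part
# Simulate losing parts[1]; rebuild it from the other two parts + parity.
rebuilt = recover([parts[0], parts[2], parity])
```

Because `p0 ^ p1 ^ p2 ^ parity = 0`, any single missing part is the XOR of the remaining three, which is why one parity part tolerates the loss of any one of the four stored files while using roughly 4/3 of the original data size instead of the 3x of default HDFS replication.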
The proposed model was implemented on a cloud computing storage platform that we built using three physical servers (Dell T320) running a total of 12 virtual nodes. The TeraGen benchmark tool and Java code were used to test the model. Experimental results show that the proposed model decreases storage space by 35% compared to other models and improves data writing and reading performance by about 34%.
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this paper
Kaseb, M.R., Khafagy, M.H., Ali, I.A., Saad, E.M. (2018). Redundant Independent Files (RIF): A Technique for Reducing Storage and Resources in Big Data Replication. In: Rocha, Á., Adeli, H., Reis, L.P., Costanzo, S. (eds) Trends and Advances in Information Systems and Technologies. WorldCIST'18 2018. Advances in Intelligent Systems and Computing, vol 745. Springer, Cham. https://doi.org/10.1007/978-3-319-77703-0_18
DOI: https://doi.org/10.1007/978-3-319-77703-0_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77702-3
Online ISBN: 978-3-319-77703-0
eBook Packages: Engineering (R0)