A Data Preparation Approach for Cloud Storage Based on Containerized Parallel Patterns

  • Diana Carrizales
  • Dante D. Sánchez-Gallegos (Email author)
  • Hugo Reyes
  • J. L. Gonzalez-Compean
  • Miguel Morales-Sandoval
  • Jesus Carretero
  • Alejandro Galaviz-Mosqueda
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11874)


In this paper, we present the design, implementation, and evaluation of an efficient data preparation and retrieval approach for cloud storage. The approach includes a deduplication subsystem that indexes the hash of each content to identify duplicated data. Avoiding duplicated content reduces reprocessing time during uploads, as well as other costs related to outsourced data management tasks. The proposed data preparation scheme enables organizations to add properties such as security, reliability, and cost-efficiency to their contents before sending them to the cloud. It also creates recovery schemes that allow organizations to share preprocessed contents with partners and end-users. The approach further includes an engine that encapsulates preprocessing applications into virtual containers (VCs) to create parallel patterns that improve the efficiency of the data preparation and retrieval processes. In a case study, real repositories of satellite images and organizational files were prepared for migration to the cloud by using processes such as compression, encryption, encoding for fault tolerance, and access control. The experimental evaluation revealed the feasibility of using a data preparation approach for organizations to mitigate risks that could still arise in the cloud. It also revealed the efficiency of the deduplication process in reducing data preparation tasks and the efficacy of parallel patterns in improving the end-user service experience.
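The two mechanisms named in the abstract, hash-indexed deduplication and parallel preparation, can be illustrated with a minimal sketch. It is not the authors' implementation. The names `DedupIndex`, `prepare`, and `prepare_all` are hypothetical, SHA3-256 is an assumed hash choice, compression stands in for the full preparation chain, and a thread pool stands in for the containerized workers described in the paper.

```python
import hashlib
import zlib
from concurrent.futures import ThreadPoolExecutor


class DedupIndex:
    """Hypothetical in-memory index: content hash -> prepared content."""

    def __init__(self):
        self._prepared = {}

    @staticmethod
    def digest(content: bytes) -> str:
        # Assumption: SHA3-256. The abstract only says "the hash of each content".
        return hashlib.sha3_256(content).hexdigest()

    def lookup(self, key: str):
        return self._prepared.get(key)

    def store(self, key: str, prepared: bytes) -> None:
        self._prepared[key] = prepared


def prepare(content: bytes) -> bytes:
    # Stand-in for the preparation chain (compression, encryption, encoding).
    # Only compression is performed here.
    return zlib.compress(content)


def prepare_all(contents, index, workers=4):
    """Prepare only contents whose hash is not indexed yet.

    Returns the hash of every input and the number of contents
    that actually had to be processed.
    """
    keys = [DedupIndex.digest(c) for c in contents]

    # Keep one copy of each content that is new to the index.
    pending = {}
    for key, content in zip(keys, contents):
        if index.lookup(key) is None and key not in pending:
            pending[key] = content

    # Assumption: threads replace the virtual containers of the paper.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(prepare, pending.values()))

    for key, prepared in zip(pending.keys(), results):
        index.store(key, prepared)
    return keys, len(pending)
```

In this sketch, a repeated upload of the same content costs one hash computation and one index lookup, while the preparation step is skipped entirely.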


Keywords: Deduplication systems; Virtual containers; Parallel patterns; Content delivery; Cloud storage



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Diana Carrizales (1)
  • Dante D. Sánchez-Gallegos (1, Email author)
  • Hugo Reyes (1)
  • J. L. Gonzalez-Compean (1)
  • Miguel Morales-Sandoval (1)
  • Jesus Carretero (2)
  • Alejandro Galaviz-Mosqueda (3)

  1. Cinvestav Tamaulipas, Ciudad Victoria, Mexico
  2. Universidad Carlos III de Madrid, Madrid, Spain
  3. CICESE, Ensenada, Mexico
