HIVE-EC: Erasure Code Functionality in HIVE Through Archiving

  • Aatish Chiniah
  • Mungur Utam Avinash Einstein
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 887)

Abstract

Most research on erasure codes in cloud storage concentrates on finding optimal solutions that reduce either storage capacity or bandwidth consumption. In this paper, our goal is to provide erasure code (EC) functionality directly from the application layer. For this purpose, we reviewed several application-layer languages, namely Hive, Pig, and Oozie, and opted to add EC support to Hive. We developed several Hive commands that allow Hive tables to be first archived and then encoded or decoded with different parameters, such as join and union. We tested our implementation using the MovieLens dataset, both locally and on the cloud, and compared its performance against a replicated system.
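The archive-then-encode idea can be illustrated with a minimal sketch. This is not the authors' implementation and does not use Hive: the function names (`archive`, `unarchive`, `encode`, `decode`) are hypothetical, the table is assumed to be a small list of rows, and a single XOR parity block stands in for a production erasure code such as Reed-Solomon. It tolerates the loss of any one block.

```python
import json


def archive(rows):
    """Serialize table rows into one byte string (a stand-in for archiving)."""
    return json.dumps(rows).encode("utf-8")


def unarchive(data):
    """Inverse of archive(): recover the rows from the byte string."""
    return json.loads(data.decode("utf-8"))


def encode(data, k):
    """Split data into k equal-size blocks and append one XOR parity block.

    Returns the k + 1 blocks and the original length, which is needed
    to strip the zero padding after decoding.
    """
    block_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(block_len * k, b"\0")
    blocks = [padded[i * block_len:(i + 1) * block_len] for i in range(k)]
    parity = bytes(block_len)  # all-zero block
    for block in blocks:
        parity = bytes(x ^ y for x, y in zip(parity, block))
    return blocks + [parity], len(data)


def decode(blocks, original_len):
    """Rebuild the data from k + 1 blocks, at most one of which is None."""
    missing = [i for i, block in enumerate(blocks) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost block")
    blocks = list(blocks)
    if missing:
        block_len = len(next(b for b in blocks if b is not None))
        rebuilt = bytes(block_len)
        # XOR of all surviving blocks equals the lost block.
        for block in blocks:
            if block is not None:
                rebuilt = bytes(x ^ y for x, y in zip(rebuilt, block))
        blocks[missing[0]] = rebuilt
    # Drop the parity block and the padding.
    return b"".join(blocks[:-1])[:original_len]


if __name__ == "__main__":
    rows = [[1, "Toy Story", 5], [2, "Jumanji", 3]]  # made-up sample rows
    blocks, n = encode(archive(rows), k=4)
    blocks[2] = None  # simulate losing one block
    print(unarchive(decode(blocks, n)) == rows)
```

With k = 4 this sketch stores 5 blocks for every 4 blocks of data, whereas a replicated system stores a full copy of the data per replica.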

Keywords

Cloud storage; Erasure code; Hive; Hadoop; Archiving

Notes

Acknowledgement

We thank Associate Professor Anwitaman Datta of NTU, Singapore, for his constant support and expert reviews, which greatly assisted the research.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Aatish Chiniah (1)
  • Mungur Utam Avinash Einstein (1)
  1. Faculty of ICDT, University of Mauritius, Reduit, Mauritius
