HIVE-EC: Erasure Code Functionality in HIVE Through Archiving

  • Conference paper
Advances in Information and Communication Networks (FICC 2018)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 887)

Abstract

Most research on cloud storage with erasure codes (EC) concentrates on finding optimal solutions that require either less storage capacity or less bandwidth. In this paper, our goal is to provide erasure code functionality directly from the application layer. For this purpose, we review several application-layer languages, namely Hive, Pig and Oozie, and opt to add EC support to Hive. We develop several Hive commands that allow Hive tables to be first archived and then encoded or decoded with different parameters, such as join and union. We test our implementation with the MovieLens dataset, both locally and on the cloud, and compare its performance against that of a replicated system.
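The abstract describes archiving a Hive table first and then erasure-coding the archive, and it compares the result against replication. The sketch below illustrates that pipeline in Python under stated assumptions. The hypothetical `archive_rows` helper serializes rows into one blob and is not Hive's actual archive format. The codec is the simplest erasure code: k data chunks plus a single XOR parity chunk, which tolerates the loss of any one chunk. The abstract does not specify the paper's own codec or parameters. Reed-Solomon codes, which are common in practice, generalize this idea to m parity chunks. All helper names, the chunk layout and the sample rows are illustrative only.

```python
from functools import reduce
from typing import List, Optional, Sequence, Tuple


def archive_rows(rows: Sequence[tuple]) -> bytes:
    """Hypothetical archive step: one tab-separated line per row, UTF-8 encoded."""
    return "".join("\t".join(map(str, r)) + "\n" for r in rows).encode("utf-8")


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(blob: bytes, k: int) -> Tuple[List[bytes], int]:
    """Split blob into k zero-padded data chunks and append one XOR parity chunk.

    Returns the k + 1 chunks and the original length (needed to strip padding).
    """
    size = -(-len(blob) // k)  # ceiling division
    padded = blob.ljust(size * k, b"\0")
    data = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, data)
    return data + [parity], len(blob)


def decode(chunks: Sequence[Optional[bytes]], length: int) -> bytes:
    """Rebuild the blob when at most one chunk (data or parity) is None."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("a single-parity code can repair only one lost chunk")
    repaired = list(chunks)
    if missing:
        # XOR of all surviving chunks equals the missing one.
        repaired[missing[0]] = reduce(xor_bytes, [c for c in chunks if c is not None])
    return b"".join(repaired[:-1])[:length]  # drop parity, then strip padding


def overhead_replication(r: int) -> float:
    """Bytes stored per byte of data with r-way replication."""
    return float(r)


def overhead_ec(k: int, m: int) -> float:
    """Bytes stored per byte of data with a (k, m) erasure code (this sketch implements m = 1 only)."""
    return (k + m) / k


if __name__ == "__main__":
    # Hypothetical rows shaped like a movies table (id, title, genres).
    rows = [(1, "movie_a", "Comedy"), (2, "movie_b", "Drama|Romance"), (3, "movie_c", "Action")]
    blob = archive_rows(rows)
    chunks, length = encode(blob, k=4)
    chunks[2] = None  # simulate one unavailable storage node
    assert decode(chunks, length) == blob
    print(overhead_replication(3), overhead_ec(4, 1), overhead_ec(6, 3))  # 3.0 1.25 1.5
```

The overhead helpers use the standard formula. With r-way replication, r bytes are stored for every byte of data. With a (k, m) code, (k + m)/k bytes are stored. For example, 3-way replication costs 3.0, while a (6, 3) code costs 1.5 and still tolerates the loss of any three chunks.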

Acknowledgement

We thank Associate Professor Anwitaman Datta of NTU, Singapore, for his constant support and expert reviews, which greatly assisted the research.

Author information

Corresponding author

Correspondence to Aatish Chiniah.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Chiniah, A., Einstein, M.U.A. (2019). HIVE-EC: Erasure Code Functionality in HIVE Through Archiving. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Advances in Information and Communication Networks. FICC 2018. Advances in Intelligent Systems and Computing, vol 887. Springer, Cham. https://doi.org/10.1007/978-3-030-03405-4_21

  • DOI: https://doi.org/10.1007/978-3-030-03405-4_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-03404-7

  • Online ISBN: 978-3-030-03405-4

  • eBook Packages: Engineering, Engineering (R0)
