H-DB: Yet Another Big Data Hybrid System of Hadoop and DBMS

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8285)

Abstract

As data volumes explode, analytics applications require much higher performance and scalability. Traditional DBMSs, however, scale poorly and cannot handle big data easily. At the same time, because of the complex relational data model, the large amount of historical data, and the independent demands of subsystems, it is not suitable to replace a traditional DBMS completely with either a shared-nothing MPP architecture (e.g. Hadoop) or an existing hybrid architecture (e.g. HadoopDB). In this paper, considering the feasibility and versatility of building a hybrid system, we propose a novel prototype, H-DB, which uses DBMSs as the underlying storage and execution units and Hadoop as an index layer and a cache. H-DB not only retains the analytical DBMS but can also handle the demands of applications whose data is growing rapidly. The experiments show that H-DB meets these demands, outperforms the original system, and would be appropriate for analogous big data applications.




Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Luo, T., Chen, G., Zhang, Y. (2013). H-DB: Yet Another Big Data Hybrid System of Hadoop and DBMS. In: Kołodziej, J., Di Martino, B., Talia, D., Xiong, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2013. Lecture Notes in Computer Science, vol 8285. Springer, Cham. https://doi.org/10.1007/978-3-319-03859-9_28

  • DOI: https://doi.org/10.1007/978-3-319-03859-9_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-03858-2

  • Online ISBN: 978-3-319-03859-9

  • eBook Packages: Computer Science, Computer Science (R0)
