H-DB: Yet Another Big Data Hybrid System of Hadoop and DBMS

Luo, Tao; Chen, Guoliang; Zhang, Yunquan

doi:10.1007/978-3-319-03859-9_28

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8285)

Abstract

As data volumes explode, analytics applications demand much higher performance and scalability. Traditional DBMSs, however, face serious scalability obstacles and cannot easily handle big data. At the same time, because of the complex relational data model, the large amount of historical data, and the independent demands of subsystems, it is not suitable to replace them completely with either a shared-nothing MPP architecture (e.g. Hadoop) or an existing hybrid architecture (e.g. HadoopDB). In this paper, considering the feasibility and versatility of building a hybrid system, we propose a novel prototype, H-DB, which takes DBMSs as the underlying storage and execution units and Hadoop as an index layer and a cache. H-DB not only retains the analytical DBMS but can also handle the demands of rapidly exploding data applications. Experiments show that H-DB meets these demands, outperforms the original system, and would be appropriate for analogous big data applications.
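
The abstract does not describe H-DB's internals, so the following is only a toy illustration of the general idea it names: several DBMSs act as storage and execution units, with an index layer and a cache sitting in front of them. Everything here is an assumption made for illustration: the class name `HybridRouter`, the `records` table, the round-robin placement, and the use of in-memory SQLite databases and Python dictionaries in place of real DBMS nodes and Hadoop.

```python
import sqlite3


class HybridRouter:
    """Toy stand-in: an index maps each key to a DBMS unit; a cache holds recent lookups."""

    def __init__(self, num_units):
        # Each in-memory SQLite database plays the role of one underlying DBMS.
        self.units = [sqlite3.connect(":memory:") for _ in range(num_units)]
        for db in self.units:
            db.execute("CREATE TABLE records (key TEXT PRIMARY KEY, value INTEGER)")
        self.index = {}      # key -> unit number (hypothetical index layer)
        self.cache = {}      # key -> value (hypothetical result cache)
        self.cache_hits = 0

    def put(self, key, value):
        # New keys are placed round-robin; known keys stay on their unit.
        unit = self.index.setdefault(key, len(self.index) % len(self.units))
        self.units[unit].execute(
            "INSERT OR REPLACE INTO records VALUES (?, ?)", (key, value)
        )
        self.cache.pop(key, None)  # drop any stale cached value

    def get(self, key):
        # Serve from the cache when possible.
        if key in self.cache:
            self.cache_hits += 1
            return self.cache[key]
        # Otherwise consult the index and query only the unit that owns the key.
        unit = self.index.get(key)
        if unit is None:
            return None
        row = self.units[unit].execute(
            "SELECT value FROM records WHERE key = ?", (key,)
        ).fetchone()
        value = row[0] if row else None
        self.cache[key] = value
        return value
```

In this sketch a lookup touches at most one database, and repeated lookups of the same key are answered from the cache without touching any database. How H-DB actually builds its index and cache on Hadoop is covered in the full paper, not here.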

Author information

Authors and Affiliations

School of Computer Science and Technology, University of Science and Technology of China, 230027, Hefei, China
Tao Luo & Guoliang Chen
State Key Laboratory of Computer Architecture, Institute of Computing Technology, CAS, 100190, Beijing, China
Yunquan Zhang

Editor information

Editors and Affiliations

Institute of Computer Science, Cracow University of Technology, Warszawska 24, 31-155, Cracow, Poland
Joanna Kołodziej
Dipartimento di Ingegneria, Seconda Universita’ di Napoli, 81031, Aversa, CE, Italy
Beniamino Di Martino
DIMES and ICAR-CNR, c/o Università della Calabria, 87036, Rende, CS, Italy
Domenico Talia
College of Computing and Information Sciences, Rochester Institute of Technology, 14623, Rochester, NY, USA
Kaiqi Xiong

Cite this paper

Luo, T., Chen, G., Zhang, Y. (2013). H-DB: Yet Another Big Data Hybrid System of Hadoop and DBMS. In: Kołodziej, J., Di Martino, B., Talia, D., Xiong, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2013. Lecture Notes in Computer Science, vol 8285. Springer, Cham. https://doi.org/10.1007/978-3-319-03859-9_28

DOI: https://doi.org/10.1007/978-3-319-03859-9_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03858-2
Online ISBN: 978-3-319-03859-9
eBook Packages: Computer Science, Computer Science (R0)
