Abstract
As data volumes explode, analytics applications demand much higher performance and scalability. Traditional DBMSs, however, run into serious scalability obstacles and cannot easily handle big data. At the same time, because of the complex relational data model, the large amount of historical data, and the independent demands of subsystems, it is not suitable to replace the DBMS completely with either a shared-nothing MPP architecture (e.g. Hadoop) or an existing hybrid architecture (e.g. HadoopDB). In this paper, considering the feasibility and versatility of building a hybrid system, we propose a novel prototype, H-DB, which takes DBMSs as the underlying storage and execution units and Hadoop as an index layer and a cache. H-DB not only retains the analytical DBMS but can also handle the demands of rapidly growing data applications. Experiments show that H-DB meets these demands, outperforms the original system, and would be appropriate for analogous big data applications.
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Luo, T., Chen, G., Zhang, Y. (2013). H-DB: Yet Another Big Data Hybrid System of Hadoop and DBMS. In: Kołodziej, J., Di Martino, B., Talia, D., Xiong, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2013. Lecture Notes in Computer Science, vol 8285. Springer, Cham. https://doi.org/10.1007/978-3-319-03859-9_28
DOI: https://doi.org/10.1007/978-3-319-03859-9_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03858-2
Online ISBN: 978-3-319-03859-9
eBook Packages: Computer Science, Computer Science (R0)