Abstract
Cloud computing has greatly changed the way information is stored and applications are run. For one or more clusters to work as a cloud, a middleware framework such as Apache Hadoop [17] is needed to provide reliability, scalability and distributed computing. Once the infrastructure has been established, a software framework can be installed on top of it to serve as the connection between the infrastructure and the applications developed by users. In this work, that software is a NoSQL database. This paper deals with the problem of searching data in some widespread NoSQL databases used in cloud computing. Two categories of NoSQL databases are compared: a column-oriented key-value store, HBase [6], and a highly available graph database, Neo4j [11]. HBase is a distributed, scalable storage system that runs on top of HDFS and has been designed based on Google’s BigTable [4]. Neo4j has been designed and developed as a reliable database optimized for graph structures instead of tables; it is a robust, scalable, high-performance and highly available database that supports ACID transactions and queries written in the Cypher language. The aim of this paper is to create a novel system that will decide when a query must be sent for execution to a key-value store or to a graph database. To that end, an experimental pure performance comparison has been made between Apache HBase and Neo4j for a variety of queries, which were programmed using the systems' APIs and the Java language.
References
Angles, R., Gutierrez, C.: Survey of graph database models. ACM Comput. Surv. 40(1), 1:1–1:39 (2008)
Brewer, E.: CAP twelve years later: how the “rules” have changed. Computer 45(2), 23–29 (2012)
Cai, L., Huang, S., Chen, L., Zheng, Y.: Performance analysis and testing of HBase based on its architecture. In: 2013 IEEE/ACIS 12th International Conference on Computer and Information Science (ICIS), pp. 353–358, June 2013
Chang, F., Dean, J., Ghemawat, S., Hsieh, W.C., Wallach, D.A., Burrows, M., Chandra, T., Fikes, A., Gruber, R.E.: Bigtable: a distributed storage system for structured data. In: Proceedings of the 7th Symposium on Operating Systems Design and Implementation, pp. 205–218. OSDI 2006, USENIX Association, Berkeley, CA, USA (2006)
DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., Sivasubramanian, S., Vosshall, P., Vogels, W.: Dynamo: amazon’s highly available key-value store. SIGOPS Oper. Syst. Rev. 41(6), 205–220 (2007)
George, L.: HBase: The Definitive Guide. O’Reilly Media Inc., Sebastopol (2011)
Holzschuher, F., Peinl, R.: Performance of graph query languages: comparison of Cypher, Gremlin and native access in Neo4j. In: Proceedings of the Joint EDBT/ICDT 2013 Workshops, EDBT 2013, NY, USA, pp. 195–204. ACM, New York (2013)
Kostylev, E.V., Reutter, J.L., Vrgoc, D.: Containment of data graph queries. In: ICDT, pp. 131–142 (2014)
Chodorow, K., Dirolf, M.: MongoDB: The Definitive Guide. O’Reilly Media, Sebastopol (2010)
Lakshman, A., Malik, P.: Cassandra: a decentralized structured storage system. ACM SIGOPS Oper. Syst. Rev. 44(2), 35–40 (2010)
Neo4j.org: Neo4j - the world’s leading graph database. http://www.neo4j.org/, Accessed on 16 June 2014
Nishimura, S., Das, S., Agrawal, D., Abbadi, A.: MD-HBase: a scalable multi-dimensional data infrastructure for location aware services. In: 2011 12th IEEE International Conference on Mobile Data Management (MDM), vol. 1, pp. 7–16, June 2011
Robinson, I., Webber, J., Eifrem, E.: Graph Databases. O’Reilly Media, Inc., Sebastopol (2013)
Shvachko, K., Kuang, H., Radia, S., Chansler, R.: The Hadoop distributed file system. In: 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), pp. 1–10, May 2010
Thusoo, A., Sarma, J.S., Jain, N., Shao, Z., Chakka, P., Anthony, S., Liu, H., Wyckoff, P., Murthy, R.: Hive: a warehousing solution over a map-reduce framework. Proc. VLDB Endow. 2(2), 1626–1629 (2009)
Vicknair, C., Macias, M., Zhao, Z., Nan, X., Chen, Y., Wilkins, D.: A comparison of a graph database and a relational database: a data provenance perspective. In: Proceedings of the 48th Annual Southeast Regional Conference, ACM SE 2010, NY, USA, pp. 42:1–42:6. ACM, New York (2010)
White, T.: Hadoop: The Definitive Guide, 3rd edn. O’Reilly Media Inc., Sebastopol (2012)
Wood, P.T.: Query languages for graph databases. SIGMOD Rec. 41(1), 50–60 (2012)
Acknowledgments
We thank the C. Caratheodory Research Program of the University of Patras, Greece, for supporting this research.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Kendea, M., Gkantouna, V., Rapti, A., Sioutas, S., Tzimas, G., Tsolis, D. (2016). Graph DBs vs. Column-Oriented Stores: A Pure Performance Comparison. In: Karydis, I., Sioutas, S., Triantafillou, P., Tsoumakos, D. (eds) Algorithmic Aspects of Cloud Computing. ALGOCLOUD 2015. Lecture Notes in Computer Science, vol 9511. Springer, Cham. https://doi.org/10.1007/978-3-319-29919-8_5
DOI: https://doi.org/10.1007/978-3-319-29919-8_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-29918-1
Online ISBN: 978-3-319-29919-8
eBook Packages: Computer Science, Computer Science (R0)