Abstract
Processing large volumes of RDF data requires sophisticated tools. In recent years, much effort has been spent on optimizing native RDF stores and on repurposing relational query engines for large-scale RDF processing. Concurrently, a number of new data management systems, grouped under the NoSQL (for “not only SQL”) umbrella, rapidly rose to prominence and today represent a popular alternative to classical databases. Although NoSQL systems are increasingly used to manage RDF data, their key advantages and drawbacks in this context remain difficult to grasp. This work is, to the best of our knowledge, the first systematic attempt at characterizing and comparing NoSQL stores for RDF processing. In the following, we describe four different NoSQL stores and compare their key characteristics when running standard RDF benchmarks on a popular cloud infrastructure, using both single-machine and distributed deployments.
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cudré-Mauroux, P. et al. (2013). NoSQL Databases for RDF: An Empirical Evaluation. In: Alani, H., et al. The Semantic Web – ISWC 2013. ISWC 2013. Lecture Notes in Computer Science, vol 8219. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41338-4_20
DOI: https://doi.org/10.1007/978-3-642-41338-4_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41337-7
Online ISBN: 978-3-642-41338-4
eBook Packages: Computer Science, Computer Science (R0)