A Distributed RDF Storage and Query Model Based on HBase

Li, Keran; Wu, Bin; Wang, Bai

doi:10.1007/978-3-319-23531-8_1

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9391)

Included in the following conference series:

International Conference on Web-Age Information Management

Abstract

We now live in an interconnected world, and the amount of heterogeneous data such as RDF is continually increasing. Much work has been done on managing huge amounts of RDF data, but solutions based on RDBMSs have significant scalability issues given the magnitude of today's data. In this paper we describe our solution for storing and querying RDF data in the cloud based on HBase and MapReduce. A vertical-partitioning-like model is used in HBase to reduce table size and to achieve good SPARQL query performance. For complex queries on large data, we propose using cascading MapReduce jobs on HBase to improve efficiency. Our experiments on LUBM show that our system can store large RDF graphs and achieve good query efficiency.
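The abstract names two ideas without showing the paper's actual HBase schema or job plan, so the following is only an in-memory Python sketch of the general techniques under stated assumptions. It assumes classic vertical partitioning: one two-column (subject, object) table per predicate, standing in for an HBase table keyed by subject. It also assumes that a multi-pattern SPARQL basic graph pattern is answered by a cascade of reduce-side joins, where each join of the running result with the next triple pattern plays the role of one MapReduce job (map, shuffle, reduce are simulated in a single process). The function names, the `?`-prefixed variable convention and the LUBM-style sample triples are hypothetical and do not come from the paper.

```python
from collections import defaultdict
from itertools import product

# Hypothetical LUBM-style sample triples (illustrative only, not the paper's data).
TRIPLES = [
    ("stu1", "rdf:type", "ub:GraduateStudent"),
    ("stu2", "rdf:type", "ub:GraduateStudent"),
    ("stu3", "rdf:type", "ub:UndergraduateStudent"),
    ("stu1", "ub:takesCourse", "course1"),
    ("stu2", "ub:takesCourse", "course2"),
    ("stu3", "ub:takesCourse", "course1"),
    ("prof1", "ub:teacherOf", "course1"),
    ("prof2", "ub:teacherOf", "course2"),
]


def is_var(term):
    return term.startswith("?")


def vertical_partition(triples):
    """Group (s, p, o) triples into one (subject, object) table per predicate."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return dict(tables)


def match_pattern(tables, pattern):
    """Scan only the predicate's table and return one binding dict per match."""
    s, p, o = pattern
    if is_var(p):
        raise ValueError("this sketch requires a bound predicate")
    bindings = []
    for subj, obj in tables.get(p, []):
        b = {}
        if is_var(s):
            b[s] = subj
        elif s != subj:
            continue
        if is_var(o):
            if o in b and b[o] != obj:  # same variable used as subject and object
                continue
            b[o] = obj
        elif o != obj:
            continue
        bindings.append(b)
    return bindings


def compatible(a, b):
    """Two bindings can be merged if they agree on every shared variable."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def mapreduce_join(left, right, key_var):
    """Simulate one reduce-side join job keyed on key_var."""
    # Map: emit (join key, (tag, binding)) for records from both inputs.
    mapped = [(b[key_var], ("L", b)) for b in left]
    mapped += [(b[key_var], ("R", b)) for b in right]
    # Shuffle: group the tagged records by join key.
    groups = defaultdict(lambda: {"L": [], "R": []})
    for key, (tag, binding) in mapped:
        groups[key][tag].append(binding)
    # Reduce: merge every compatible left/right pair that shares the key.
    out = []
    for key in sorted(groups):
        for lb, rb in product(groups[key]["L"], groups[key]["R"]):
            if compatible(lb, rb):
                out.append({**lb, **rb})
    return out


def evaluate(tables, patterns):
    """Answer a basic graph pattern as a cascade of joins, one job per extra pattern."""
    result = match_pattern(tables, patterns[0])
    seen = {t for t in patterns[0] if is_var(t)}
    for pattern in patterns[1:]:
        pvars = [t for t in pattern if is_var(t)]
        shared = [v for v in pvars if v in seen]
        if not shared:
            raise ValueError("this sketch does not handle cross products")
        result = mapreduce_join(result, match_pattern(tables, pattern), shared[0])
        seen.update(pvars)
    return result


if __name__ == "__main__":
    tables = vertical_partition(TRIPLES)
    query = [
        ("?x", "rdf:type", "ub:GraduateStudent"),
        ("?x", "ub:takesCourse", "?c"),
        ("?p", "ub:teacherOf", "?c"),
    ]
    for row in evaluate(tables, query):
        print(row)
```

Running the script prints the two graduate students together with their course and its teacher. The design point the sketch tries to capture is that each triple pattern touches only the table for its own predicate, so a query over three predicates never reads the rest of the data; the cost is that a pattern with an unbound predicate would have to scan every table, which this sketch simply rejects.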

Acknowledgments

This work is supported in part by the National Key Basic Research and Development (973) Program of China (No. 2013CB329606) and the Co-construction Project of Beijing Municipal Commission of Education.

Author information

Authors and Affiliations

Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing, 100876, China
Keran Li, Bin Wu & Bai Wang

Corresponding author

Correspondence to Keran Li.

Editor information

Editors and Affiliations

School of Computer Engineering, Nanyang Technological University, Singapore, Singapore
Xiaokui Xiao
Advanced Digital Sciences Center, Singapore, Singapore
Zhenjie Zhang

About this paper

Cite this paper

Li, K., Wu, B., Wang, B. (2015). A Distributed RDF Storage and Query Model Based on HBase. In: Xiao, X., Zhang, Z. (eds) Web-Age Information Management. WAIM 2015. Lecture Notes in Computer Science, vol 9391. Springer, Cham. https://doi.org/10.1007/978-3-319-23531-8_1

DOI: https://doi.org/10.1007/978-3-319-23531-8_1
Published: 21 October 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23530-1
Online ISBN: 978-3-319-23531-8
eBook Packages: Computer Science (R0)
