International Conference on Web-Age Information Management

WAIM 2015: Web-Age Information Management, pp. 3-15

A Distributed RDF Storage and Query Model Based on HBase

  • Keran Li
  • Bin Wu
  • Bai Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9391)

Abstract

We now live in an interconnected world, and the amount of heterogeneous information such as RDF data is continually increasing. Much work has been done on managing huge amounts of RDF data. Solutions based on RDBMS, however, have significant scalability issues given the magnitude of modern data. In this paper we describe our solution for storing and querying RDF data in the cloud based on HBase and MapReduce. A vertical-partitioning-like model is used in HBase to reduce table size and to obtain good SPARQL query performance. For complex queries on large data, we propose using cascading MapReduce jobs on HBase to improve efficiency. Our experiments on the LUBM benchmark show that our system can store large RDF graphs and achieve good query efficiency.
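The two ideas in the abstract, a vertical-partitioning-like layout and cascading MapReduce joins, can be illustrated with a small in-memory simulation. This is a sketch under stated assumptions, not the authors' implementation: plain Python lists and dicts stand in for HBase tables and MapReduce jobs, and the toy triples, predicate names, and the functions `vertically_partition`, `mr_join`, and `query` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy triples loosely in the spirit of LUBM; not the paper's data.
TRIPLES = [
    ("Student1", "rdf:type", "GraduateStudent"),
    ("Student2", "rdf:type", "GraduateStudent"),
    ("Student3", "rdf:type", "UndergraduateStudent"),
    ("Student1", "takesCourse", "Course0"),
    ("Student2", "takesCourse", "Course1"),
    ("Student3", "takesCourse", "Course0"),
    ("Prof0", "teacherOf", "Course0"),
    ("Prof1", "teacherOf", "Course1"),
]


def vertically_partition(triples):
    """Split triples into one (subject, object) table per predicate.

    Assumption: a list per predicate stands in for a per-predicate HBase
    table; real row-key and column-family design is not modelled here.
    """
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return tables


def mr_join(left, right, lkey, rkey):
    """Simulate one reduce-side join job over two lists of tuples.

    Map: emit each row under its join-key value, tagged by side.
    Shuffle: group rows by key. Reduce: emit the cross product per key,
    dropping the duplicated join column from the right-hand row.
    """
    groups = defaultdict(lambda: {"L": [], "R": []})
    for row in left:
        groups[row[lkey]]["L"].append(row)
    for row in right:
        groups[row[rkey]]["R"].append(row)
    out = []
    for key in sorted(groups):
        for l in groups[key]["L"]:
            for r in groups[key]["R"]:
                out.append(l + tuple(v for i, v in enumerate(r) if i != rkey))
    return out


def query(tables):
    """?x rdf:type GraduateStudent . ?x takesCourse ?c . ?t teacherOf ?c"""
    # Selection on the bound object scans only the rdf:type table.
    grads = [(s,) for s, o in tables["rdf:type"] if o == "GraduateStudent"]
    # Job 1 joins on ?x; its output feeds job 2, which joins on ?c (a cascade).
    x_c = mr_join(grads, tables["takesCourse"], lkey=0, rkey=0)
    return mr_join(x_c, tables["teacherOf"], lkey=1, rkey=1)


if __name__ == "__main__":
    for row in query(vertically_partition(TRIPLES)):
        print(row)  # rows of (?x, ?c, ?t)
```

In this sketch each triple pattern touches only the table for its own predicate, which is the benefit vertical partitioning aims for, and each join variable gets its own job whose output becomes the next job's input. How the paper actually lays out its HBase tables and chains its jobs may differ.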

Keywords

RDF · Heterogeneous · HBase · Vertical partition · MapReduce

Notes

Acknowledgments

This work is supported in part by the National Key Basic Research and Development (973) Program of China (No. 2013CB329606), and the Co-construction Project of Beijing Municipal Commission of Education.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing, China