Distributed RDFS Reasoning with MapReduce

  • Conference paper
Information Sciences and Systems 2014

Abstract

We live in the big data age, in which many computational tasks either generate or consume large datasets. This makes parallel and distributed computing key to scalability. MapReduce is a programming model for processing large datasets in a parallel and distributed fashion on a cluster of computers. As the size and complexity of RDFS documents increase rapidly, RDFS reasoning must adopt big data solutions. The output of one RDFS reasoning job can serve as input to another job, and that output grows as the input documents get bigger. In this study, an indexing method is proposed to speed up RDFS reasoning over Hadoop clusters. We also explore the utility of caching and of the Hadoop ecosystem tools Apache Hive and Apache Pig for this task. Experimental evaluations on the DBpedia and Freebase datasets show that the indexing method is quite effective and offers a scalable solution. The performance of caching and Apache Hive is also found to be acceptable.
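To make the idea of RDFS reasoning as a chain of MapReduce jobs concrete, the following is a minimal in-memory sketch, not the indexing method proposed in the paper. It assumes a single entailment rule, the standard RDFS rule rdfs9 (if `x rdf:type C` and `C rdfs:subClassOf D`, then `x rdf:type D`), and simulates the shuffle step with a dictionary instead of Hadoop. All function names and the `ex:` example triples are hypothetical.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def map_triple(triple):
    """Map phase: key each relevant triple by the class the rule joins on."""
    s, p, o = triple
    if p == RDF_TYPE:
        # (x rdf:type C) is keyed by C
        yield o, ("instance", s)
    elif p == SUBCLASS_OF:
        # (C rdfs:subClassOf D) is keyed by C
        yield s, ("superclass", o)


def reduce_class(values):
    """Reduce phase: pair every instance of a class with every superclass."""
    instances = [v for tag, v in values if tag == "instance"]
    superclasses = [v for tag, v in values if tag == "superclass"]
    for x in instances:
        for d in superclasses:
            yield (x, RDF_TYPE, d)


def run_pass(triples):
    """One simulated MapReduce job: map, group by key, reduce."""
    groups = defaultdict(list)
    for t in triples:
        for key, value in map_triple(t):
            groups[key].append(value)
    derived = set()
    for values in groups.values():
        derived.update(reduce_class(values))
    return derived


def rdfs9_closure(triples):
    """Feed each job's output into the next job until nothing new is derived."""
    closure = set(triples)
    while True:
        new = run_pass(closure) - closure
        if not new:
            return closure
        closure |= new


if __name__ == "__main__":
    data = {
        ("ex:rex", RDF_TYPE, "ex:Dog"),
        ("ex:Dog", SUBCLASS_OF, "ex:Mammal"),
        ("ex:Mammal", SUBCLASS_OF, "ex:Animal"),
    }
    for triple in sorted(rdfs9_closure(data)):
        print(triple)
```

The loop in `rdfs9_closure` mirrors the observation in the abstract that the output of one reasoning job becomes the input of another: each pass can only extend a type assertion by one step up the class hierarchy, so deeper hierarchies need more passes over an ever larger set of triples.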

Author information

Corresponding author

Correspondence to Osman Abul.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Cetin, Y., Abul, O. (2014). Distributed RDFS Reasoning with MapReduce. In: Czachórski, T., Gelenbe, E., Lent, R. (eds) Information Sciences and Systems 2014. Springer, Cham. https://doi.org/10.1007/978-3-319-09465-6_32

  • DOI: https://doi.org/10.1007/978-3-319-09465-6_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09464-9

  • Online ISBN: 978-3-319-09465-6

  • eBook Packages: Computer Science (R0)
