Processing RDF Using Hadoop

Ali, Mehreen; Sriram Bharat, K.; Ranichandra, C.

doi:10.1007/978-3-642-31552-7_40

Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 177)

Abstract

The basic inspiration of the Semantic Web is to broaden the existing human-readable web by encoding some of the semantics of resources in a machine-understandable form. Various formats and technologies make this possible. They comprise the Resource Description Framework (RDF); an assortment of data interchange formats such as RDF/XML, N3 and N-Triples; and representations such as RDF Schema (RDFS) and the Web Ontology Language (OWL), all of which help to describe the concepts, terms and associations in a particular knowledge domain. Frameworks for semantic web technologies exist at present, but they have limitations for large RDF graphs. Storing and efficiently querying a large number of RDF triples is therefore a challenging and important problem. We propose a framework built on Hadoop to store and retrieve massive numbers of RDF triples by taking advantage of the cloud computing paradigm. Hadoop permits the development of reliable, scalable, efficient, cost-effective distributed computing using very simple Java interfaces. Hadoop includes a distributed file system, HDFS, which we use to store RDF data. The Hadoop MapReduce framework is used to answer queries. A MapReduce job divides the input data set into independent units that are processed in parallel by the map tasks, whose outputs then serve as inputs to the reduce tasks. The framework takes care of scheduling tasks, monitoring them and re-executing failed tasks. The uniqueness of our approach is its efficient, automatic allocation of data and work across machines, which in turn exploits the underlying parallelism of the CPU cores. Results confirm that our proposed framework offers multi-fold efficiencies and benefits, including on-demand processing, operational scalability, competence, cost efficiency and local access to enormous data, in contrast to the various traditional approaches.
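
The map and reduce flow described in the abstract can be illustrated with a small single-process sketch. It is not the authors' implementation and does not use the Hadoop API: the function names (`parse_ntriple`, `map_task`, `shuffle`, `reduce_task`, `run_query`), the simplified N-Triples parsing and the sample data are all assumptions made for illustration. Each inner list stands in for one independent input split; on a real cluster the splits would be HDFS blocks and the map tasks would run in parallel on different machines.

```python
from collections import defaultdict


def parse_ntriple(line):
    """Split one simplified N-Triples line into (subject, predicate, object).

    Assumption: terms contain no embedded spaces before the object, and the
    line ends with the usual " ." terminator.
    """
    subject, predicate, obj = line.strip().rstrip(" .").split(" ", 2)
    return subject, predicate, obj


def map_task(split, predicate):
    """Map phase: emit (subject, object) for every triple with the predicate."""
    for line in split:
        s, p, o = parse_ntriple(line)
        if p == predicate:
            yield s, o


def shuffle(mapped_pairs):
    """Group map output by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_task(subject, objects):
    """Reduce phase: collect all objects found for one subject."""
    return subject, sorted(objects)


def run_query(splits, predicate):
    """Answer the pattern `?s <predicate> ?o`, run sequentially here."""
    mapped = [pair for split in splits for pair in map_task(split, predicate)]
    grouped = shuffle(mapped)
    return dict(reduce_task(s, objs) for s, objs in grouped.items())


# Hypothetical sample data: two input splits of N-Triples lines.
SPLITS = [
    ['<a> <knows> <b> .', '<a> <name> "Alice" .'],
    ['<a> <knows> <c> .', '<b> <knows> <c> .'],
]

if __name__ == "__main__":
    print(run_query(SPLITS, "<knows>"))
```

Grouping by subject before the reduce step mirrors the way the map output is partitioned by key, so every value for one subject reaches the same reduce task regardless of which split produced it.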

Author information

Authors and Affiliations

MS (Software Engineering), VIT University, Vellore, India
Mehreen Ali & K. Sriram Bharat
VIT University, Vellore, India
C. Ranichandra

Corresponding author

Correspondence to Mehreen Ali.

Editor information

Editors and Affiliations

Department of Computer Science, Jackson State University, John R. Lynch Street 1400, Jackson, 39217, USA
Natarajan Meghanathan
Wireilla Net Solutions PTY Ltd, Melbourne, Australia
Dhinaharan Nagamalai
Department of Computer Science & Eng., University of Calcutta, Calcutta, 700 073, India
Nabendu Chaki

About this paper

Cite this paper

Ali, M., Sriram Bharat, K., Ranichandra, C. (2013). Processing RDF Using Hadoop. In: Meghanathan, N., Nagamalai, D., Chaki, N. (eds) Advances in Computing and Information Technology. Advances in Intelligent Systems and Computing, vol 177. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31552-7_40

DOI: https://doi.org/10.1007/978-3-642-31552-7_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31551-0
Online ISBN: 978-3-642-31552-7
eBook Packages: Engineering, Engineering (R0)
