Abstract
Linked Data is a collection of RDF data that can grow exponentially and change over time. Detecting changes in RDF data is important for supporting version management in applications that consume Linked Data. Traditional change detection approaches do not scale, which has led researchers to devise algorithms on the MapReduce framework. Most of these works simply use a URI as the Map key. We observed that this is inefficient for RDF data with a large number of distinct URIs, because many Reduce tasks have to be created. Even though the Reduce tasks are scheduled to run simultaneously, too many small Reduce tasks increase the overall running time. In this paper, we propose G-Diff, an efficient MapReduce algorithm for RDF change detection. G-Diff groups triples by URI during the Map phase and sends each group of triples to one particular Reduce task rather than to multiple Reduce tasks. Experiments on real datasets showed that the proposed approach has a shorter running time than previous works.
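The grouping idea described in the abstract can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions, not the G-Diff algorithm itself: the abstract does not say how URIs are assigned to groups, so hashing the subject URI into a fixed number of groups (`NUM_GROUPS`, `group_of`) is a stand-in, and the function names `map_phase`, `shuffle`, `reduce_phase`, and `detect_changes` are hypothetical.

```python
from collections import defaultdict
from zlib import crc32

NUM_GROUPS = 4  # hypothetical number of groups (and therefore Reduce tasks)


def group_of(uri, num_groups=NUM_GROUPS):
    # Assumption: a URI is assigned to a group by a deterministic hash.
    # The paper's actual grouping rule is not given in the abstract.
    return crc32(uri.encode("utf-8")) % num_groups


def map_phase(old_triples, new_triples, num_groups=NUM_GROUPS):
    # Emit (group id, (version tag, triple)) instead of (URI, ...),
    # so the number of distinct keys is bounded by num_groups.
    for tag, triples in (("old", old_triples), ("new", new_triples)):
        for triple in triples:
            subject = triple[0]
            yield group_of(subject, num_groups), (tag, triple)


def shuffle(pairs):
    # Simulates the framework's shuffle: collect values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values):
    # Within one group, compare the two versions as sets of triples.
    old = {triple for tag, triple in values if tag == "old"}
    new = {triple for tag, triple in values if tag == "new"}
    return new - old, old - new  # (added, deleted)


def detect_changes(old_triples, new_triples, num_groups=NUM_GROUPS):
    added, deleted = set(), set()
    grouped = shuffle(map_phase(old_triples, new_triples, num_groups))
    for values in grouped.values():
        group_added, group_deleted = reduce_phase(values)
        added |= group_added
        deleted |= group_deleted
    return added, deleted


if __name__ == "__main__":
    old_version = {("ex:a", "ex:p", "1"), ("ex:b", "ex:p", "2")}
    new_version = {("ex:a", "ex:p", "1"), ("ex:b", "ex:p", "3"), ("ex:c", "ex:p", "4")}
    print(detect_changes(old_version, new_version))
```

Because a triple's group depends only on its subject URI, both versions of the same triple always reach the same simulated Reduce task, so the per-group set differences combine into the overall delta. With one key per URI, the number of keys would instead grow with the number of distinct URIs, which is the overhead the abstract attributes to earlier approaches.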
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Ahn, J., Im, DH., Eom, JH., Zong, N., Kim, HG. (2015). G-Diff: A Grouping Algorithm for RDF Change Detection on MapReduce. In: Supnithi, T., Yamaguchi, T., Pan, J., Wuwongse, V., Buranarach, M. (eds) Semantic Technology. JIST 2014. Lecture Notes in Computer Science, vol 8943. Springer, Cham. https://doi.org/10.1007/978-3-319-15615-6_17
DOI: https://doi.org/10.1007/978-3-319-15615-6_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-15614-9
Online ISBN: 978-3-319-15615-6
eBook Packages: Computer Science (R0)