Abstract
In RDF document matching and integration, the datatype of literal objects is an important signal for determining whether RDF documents are similar. In this paper, we propose a datatype inference process based on four steps: (i) predicate information analysis (i.e., deducing the datatype from an existing range property); (ii) analysis of the object value itself through pattern matching (i.e., recognizing the lexical space of the object); (iii) semantic analysis of the predicate name and its context; and (iv) generalization of numeric and binary datatypes to ensure integration. We evaluated the performance and accuracy of our approach on datasets from DBpedia. The results show that the execution time of the inference process is linear and that its accuracy reaches up to 97.10%.
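The four steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the steps are tried in order as fallbacks, which the abstract does not state, and the pattern table, the keyword hints, the generalization map, and the function name `infer_datatype` are all hypothetical. Binary datatypes are left out of the generalization step for brevity.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical lexical-space patterns for step (ii), most specific first.
# Only "true"/"false" are treated as boolean here; "1" and "0" fall to integer.
LEXICAL_PATTERNS = [
    ("boolean", re.compile(r"true|false")),
    ("integer", re.compile(r"[+-]?\d+")),
    ("decimal", re.compile(r"[+-]?\d*\.\d+")),
    ("date", re.compile(r"\d{4}-\d{2}-\d{2}")),
]

# Hypothetical keyword hints, a crude stand-in for the semantic analysis of step (iii).
NAME_HINTS = {"date": "date", "birthday": "date", "age": "integer", "name": "string"}

# Hypothetical generalization map for step (iv); xsd:integer derives from xsd:decimal.
GENERALIZE = {"integer": "decimal"}


def infer_datatype(predicate, value, ranges):
    """Return an XSD datatype IRI for a literal; `ranges` maps predicate IRI to a datatype name."""
    # Step (i): reuse a declared range for the predicate when one exists.
    datatype = ranges.get(predicate)

    # Step (ii): otherwise match the literal against known lexical spaces.
    if datatype is None:
        for name, pattern in LEXICAL_PATTERNS:
            if pattern.fullmatch(value):
                datatype = name
                break

    # Step (iii): otherwise look for hints in the words of the predicate's local name.
    if datatype is None:
        local_name = re.split(r"[#/]", predicate)[-1]
        words = [w.lower() for w in re.findall(r"[A-Za-z][a-z]*", local_name)]
        datatype = next((NAME_HINTS[w] for w in words if w in NAME_HINTS), "string")

    # Step (iv): generalize so that compatible numeric datatypes compare as equal.
    return XSD + GENERALIZE.get(datatype, datatype)


if __name__ == "__main__":
    print(infer_datatype("http://xmlns.com/foaf/0.1/age", "42", {}))
    print(infer_datatype("http://xmlns.com/foaf/0.1/name", "Alice", {}))
```

Splitting the predicate's local name into words, rather than searching for substrings, keeps a predicate such as `homepage` from matching the `age` hint.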
Notes
- 1.
- 2.
- 3. WordNet is a large lexical database of English (nouns, verbs, adjectives, etc.).
- 4. Information about persons extracted from the English and German Wikipedia, represented using the FOAF vocabulary: http://wiki.dbpedia.org/Downloads2015-10.
Acknowledgments
FINCyT/INNOVATE Peru - N 104-FINCyT-BDE-2014.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Dongo, I., Cardinale, Y., Al-Khalil, F., Chbeir, R. (2017). Semantic Web Datatype Inference: Towards Better RDF Matching. In: Bouguettaya, A., et al. Web Information Systems Engineering – WISE 2017. Lecture Notes in Computer Science, vol 10570. Springer, Cham. https://doi.org/10.1007/978-3-319-68786-5_5
DOI: https://doi.org/10.1007/978-3-319-68786-5_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68785-8
Online ISBN: 978-3-319-68786-5
eBook Packages: Computer Science (R0)