Abstract
The Linked Data initiative has changed how knowledge is shared over the Web. Its primary aim is to improve the interoperability and semantics of published data by following a set of recommendations. Still, many valuable data sources have not migrated to this new data space and continue to publish semi-structured data. New challenges therefore arise in accessing and integrating these two kinds of data sources and their models. This paper identifies some of the major challenges, such as the continuous expansion and dynamism of a heterogeneous, autonomous, yet connected web of data, and addresses them by proposing SemiLD, a mediator-based framework that integrates heterogeneous semi-structured and Linked Data sources on the fly. The approach is implemented in a highly automated keyword search system that retrieves its input from various SPARQL endpoints and web APIs. The evaluation of the system illustrates the high precision, recall and performance of the proposed approach.
Cite this article
Kettouch, M., Luca, C. & Hobbs, M. SemiLD: mediator-based framework for keyword search over semi-structured and linked data. J Intell Inf Syst 52, 311–335 (2019). https://doi.org/10.1007/s10844-018-0536-1
DOI: https://doi.org/10.1007/s10844-018-0536-1