SemiLD: mediator-based framework for keyword search over semi-structured and linked data

Journal of Intelligent Information Systems

Abstract

The Linked Data initiative has fundamentally changed how knowledge is shared over the Web. Its primary aim is to improve the interoperability and semantics of published data through a set of recommendations. Still, many valuable data sources have not migrated to this new data space and continue to publish semi-structured data. New challenges therefore arise in accessing and integrating the two kinds of data sources and their models. This paper identifies some of the major challenges, such as the continuous expansion and dynamism of a heterogeneous, autonomous yet connected web of data, and addresses them by proposing SemiLD, a mediator-based framework that integrates heterogeneous semi-structured and Linked Data sources on the fly. The approach is implemented in a highly automated keyword search system that retrieves its input from various SPARQL endpoints and web APIs. The evaluation of the system shows that the approach achieves high precision, recall and performance.
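
To make the mediator idea concrete, the following is a minimal sketch of the general pattern and not the SemiLD implementation. It assumes two hypothetical wrappers, one standing in for a SPARQL endpoint and one for a JSON web API, both backed by toy in-memory data so that the example runs without network access. All names (`linked_data_wrapper`, `web_api_wrapper`, `mediate`) and all data values are illustrative assumptions, and results are merged by simple case-insensitive label equality rather than by any schema matching technique.

```python
from typing import Callable, Dict, List, Tuple

# Common model assumed for this sketch: (label, properties) pairs.
Result = Tuple[str, Dict[str, str]]
Wrapper = Callable[[str], List[Result]]

# Toy triples standing in for a Linked Data source (hypothetical data).
TOY_TRIPLES = [
    ("ex:film1", "rdfs:label", "Example Film"),
    ("ex:film1", "ex:director", "A. Director"),
    ("ex:film2", "rdfs:label", "Another Title"),
]

# Toy JSON-like records standing in for a web API response (hypothetical data).
TOY_API_RESPONSE = [
    {"Title": "Example Film", "Year": "2001"},
    {"Title": "Unrelated", "Year": "1999"},
]


def linked_data_wrapper(keyword: str) -> List[Result]:
    """Match the keyword against rdfs:label values, then collect each subject's triples."""
    kw = keyword.lower()
    subjects = [s for s, p, o in TOY_TRIPLES if p == "rdfs:label" and kw in o.lower()]
    results = []
    for subj in subjects:
        props = {p: o for s, p, o in TOY_TRIPLES if s == subj}
        label = props.pop("rdfs:label")
        results.append((label, props))
    return results


def web_api_wrapper(keyword: str) -> List[Result]:
    """Match the keyword against the Title field of each semi-structured record."""
    kw = keyword.lower()
    return [
        (item["Title"], {k: v for k, v in item.items() if k != "Title"})
        for item in TOY_API_RESPONSE
        if kw in item["Title"].lower()
    ]


def mediate(keyword: str, wrappers: Dict[str, Wrapper]) -> List[dict]:
    """Send the keyword to every wrapper and merge results that share a label."""
    merged: Dict[str, dict] = {}
    for name, wrapper in wrappers.items():
        for label, props in wrapper(keyword):
            entry = merged.setdefault(
                label.lower(), {"label": label, "sources": [], "properties": {}}
            )
            entry["sources"].append(name)
            for key, value in props.items():
                # Keep the first value seen for a property; later sources do not overwrite it.
                entry["properties"].setdefault(key, value)
    return list(merged.values())


if __name__ == "__main__":
    sources = {"linked_data": linked_data_wrapper, "web_api": web_api_wrapper}
    for hit in mediate("example", sources):
        print(hit)
```

In a real deployment each wrapper would issue a SPARQL query or an HTTP request instead of scanning in-memory lists, and the merge step would need a more careful notion of equivalence than label equality.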

Notes

  1. http://DBpedia.org

  2. http://LinkedMDB.org

  3. http://OMDB.com

  4. http://TMDB.org

Author information

Corresponding author

Correspondence to Mohamed Kettouch.

About this article

Cite this article

Kettouch, M., Luca, C. & Hobbs, M. SemiLD: mediator-based framework for keyword search over semi-structured and linked data. J Intell Inf Syst 52, 311–335 (2019). https://doi.org/10.1007/s10844-018-0536-1

  • DOI: https://doi.org/10.1007/s10844-018-0536-1
