Metadata Reconciliation for Improved Data Binding and Integration

  • Hiba Khalid
  • Esteban Zimanyi
  • Robert Wrembel
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 928)


Data integration has been a persistent concern in Linked Open Data (LOD) research. The data integration problem (DIP) depends on many factors; primarily, the nature and type of the datasets guide the integration process. The demand for open and improved data visualization is increasing every day. Organizations, researchers, and data scientists all require better data integration techniques that can support analytics and prediction. The scientific community has been able to construct meaningful solutions by harnessing metadata, but metadata is powerful only when it is properly managed. Several existing methodologies use metadata to improve system semantics. However, integrating data across heterogeneous resources, for example structured and unstructured data, is still far from reality. Metadata can substantially improve semantic search performance if it is properly reconciled with the available information or with standard data. In this paper, we present a metadata reconciliation strategy for improving data integration and data classification between data sources that satisfy a given standard of similarity. Data similarity can serve as a powerful tool for linked data operations, and reconciliation strategies can effectively improve how data are published and connected on the LOD. We also briefly describe a reconciliation procedure that can semi-automate the interlinking and validation process for publishing linked data as an integrated resource.
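
The abstract does not spell out the reconciliation algorithm. As a rough illustration only, the following sketch reconciles free-text metadata values against a small controlled vocabulary (standing in for "standard data") using a normalized Levenshtein similarity, in the spirit of the fuzzy matching named in the keywords. It is not the authors' method: the vocabulary, the thresholds `accept=0.9` and `review=0.6`, and the names `normalize`, `similarity`, `Match`, and `reconcile` are all hypothetical. Values scoring above the upper threshold are linked automatically, values in the middle band are kept as candidates for manual validation (one possible way to make the process semi-automatic), and the rest stay unlinked.

```python
from dataclasses import dataclass
from typing import List, Optional


def normalize(value: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in value.lower())
    return " ".join(cleaned.split())


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertion, deletion, and substitution each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1 - distance / length of the longer normalized string."""
    a, b = normalize(a), normalize(b)
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


@dataclass
class Match:
    value: str                # original metadata value
    candidate: Optional[str]  # best vocabulary term, or None if unmatched
    score: float              # similarity of the best term
    status: str               # "matched", "review", or "unmatched"


def reconcile(values: List[str], vocabulary: List[str],
              accept: float = 0.9, review: float = 0.6) -> List[Match]:
    """Link each value to its most similar vocabulary term (hypothetical thresholds)."""
    results = []
    for value in values:
        best, best_score = None, 0.0
        for term in vocabulary:
            score = similarity(value, term)
            if score > best_score:
                best, best_score = term, score
        if best_score >= accept:
            status = "matched"      # link automatically
        elif best_score >= review:
            status = "review"       # keep the candidate, but ask a human to validate it
        else:
            status, best = "unmatched", None
        results.append(Match(value, best, round(best_score, 3), status))
    return results


if __name__ == "__main__":
    vocabulary = ["Belgium", "Poland", "Netherlands"]
    for match in reconcile(["belgium.", "Polnad", "Brazil"], vocabulary):
        print(match)
    # statuses: matched (Belgium), review (Poland), unmatched (None)
```

Comparing every value with every vocabulary term is quadratic; for large vocabularies, a blocking step (for example, first grouping values by a normalized key, as OpenRefine's key-collision clustering does) would cut down the number of comparisons.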


Keywords: Metadata, Data reconciliation, Metadata reconciliation, OpenRefine, Data integration, Fuzzy matching, Semantic metadata



This research has been funded by the European Commission through the Erasmus Mundus Joint Doctorate Information Technologies for Business Intelligence-Doctoral College (IT4BI-DC).



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Université Libre de Bruxelles, Brussels, Belgium
  2. Poznan University of Technology, Poznan, Poland
