
Incremental entity resolution process over query results for data integration systems

Journal of Intelligent Information Systems

Abstract

Entity Resolution (ER) in data integration systems is the problem of identifying groups of tuples, from one or multiple data sources, that represent the same real-world entity. It is a crucial stage of data integration processes, which often need to integrate data at query time. The task becomes even more challenging when data sources are dynamic or when a large volume of data must be integrated, and new ER solutions have been proposed to cope with such volumes. One possible approach is to perform the ER process over query results rather than over the whole set of tuples being integrated. In addition, the results of previous ER tasks can be reused to reduce the number of comparisons between pairs of tuples at query time. Similarly, indexing techniques can be employed to help identify equivalent tuples and to further reduce the number of pairwise comparisons. In this context, this work proposes an incremental ER process over query results. Its contributions are the specification, the implementation and the evaluation of this incremental process. Our experiments indicate that incremental ER at query time is more efficient than traditional ER processes.
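The abstract combines three ideas: resolving entities only within a query result, reusing earlier match decisions across queries, and using an index so that only promising pairs are compared. The Python sketch below shows one way these ideas can fit together. It is not the authors' process: the class `IncrementalResolver`, the whitespace tokenizer, the Jaccard similarity with a 0.5 threshold and the token-based blocking index are all hypothetical choices made for illustration. It also assumes that a tuple id always refers to the same content, so cached decisions stay valid; with dynamic sources they would need to be invalidated.

```python
from __future__ import annotations

from itertools import combinations


def tokens(text: str) -> frozenset[str]:
    """Lower-cased whitespace tokens (hypothetical normalization)."""
    return frozenset(text.lower().split())


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity of two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class IncrementalResolver:
    """Hypothetical incremental ER over query results.

    Assumes a tuple id always denotes the same content, so pairwise
    match decisions can be cached and reused by later queries.
    """

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold
        self.decisions: dict[tuple[str, str], bool] = {}  # cached pair outcomes
        self.comparisons = 0  # similarity computations actually performed

    def resolve(self, result: dict[str, str]) -> list[set[str]]:
        """Group the tuples of one query result (id -> text) into entities."""
        # Blocking index built over the query result only: token -> tuple ids.
        index: dict[str, list[str]] = {}
        for tid, text in result.items():
            for tok in tokens(text):
                index.setdefault(tok, []).append(tid)

        # Candidate pairs: only tuples that share at least one token.
        candidates: set[tuple[str, str]] = set()
        for ids in index.values():
            candidates.update(combinations(sorted(ids), 2))

        # Union-find over the ids in this result.
        parent = {tid: tid for tid in result}

        def find(x: str) -> str:
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        for a, b in sorted(candidates):
            if (a, b) not in self.decisions:  # compare only unseen pairs
                self.comparisons += 1
                sim = jaccard(tokens(result[a]), tokens(result[b]))
                self.decisions[(a, b)] = sim >= self.threshold
            if self.decisions[(a, b)]:
                parent[find(a)] = find(b)

        # Collect clusters, ordered by their smallest id for stable output.
        groups: dict[str, set[str]] = {}
        for tid in result:
            groups.setdefault(find(tid), set()).add(tid)
        return sorted(groups.values(), key=min)
```

For example, a first query returning `{"t1": "Abbey Road The Beatles", "t2": "abbey road beatles", "t3": "Kind of Blue Miles Davis"}` triggers a single similarity computation, because `t3` shares no token with the others. A second query returning `t1`, `t2` and a new tuple `t4` reuses the cached `(t1, t2)` decision and computes only the two pairs that involve `t4`.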


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6


Notes

  1. CD: http://hpi.de/naumann/projects/repeatability/datasets/cd-datasets.html

  2. FreeDB: http://freedb.org

  3. Cora: https://hpi.de/de/naumann/projects/data-quality-and-cleansing/dude-duplicate-detection.html#c114715

  4. Febrl: http://sourceforge.net/projects/febrl/


Acknowledgements

The authors thank the Center of Informatics at the Federal University of Pernambuco, Brazil, for providing the infrastructure for the development of this research.

Author information


Corresponding author

Correspondence to Priscilla Kelly Machado Vieira.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Vieira, P.K.M., Lóscio, B.F. & Salgado, A.C. Incremental entity resolution process over query results for data integration systems. J Intell Inf Syst 52, 451–471 (2019). https://doi.org/10.1007/s10844-019-00544-1



  • DOI: https://doi.org/10.1007/s10844-019-00544-1
