Entity Resolution

  • Reference work entry
Encyclopedia of Database Systems

Synonyms

Entity matching; Object deduplication; Record linkage; Reference reconciliation

Definition

Let \( \mathcal {E} \) denote a set of entities in a domain, described using a set of attributes \( {\mathcal {A}} \). Each entity \( E \in {\mathcal {E}} \) is associated with zero, one, or more values for each attribute \( A \in {\mathcal {A}} \). For each entity in \( {\mathcal {E}} \), there can be a set of records \( \mathcal {R} \), provided by one or more sources over the attributes \( {\mathcal {A}} \), where each record provides at most one value for an attribute. We consider atomic values (string, number, date, time, etc.) as attribute values, and allow multiple representations of the same value, as well as erroneous values, in records. Entity resolution takes as input the records provided by the sources and decides which records refer to the same entity; in particular, it computes a partitioning \( {\mathcal {P}} \) of \( {\mathcal {R}} \), such that records in each partition...
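As an illustration of the input and output described above, the following is a minimal sketch, not the method of this entry: records are modelled as dictionaries with at most one atomic value per attribute, pairs of records are linked when a simple string similarity exceeds a threshold, and the partitioning is taken as the connected components of the linked pairs. The functions `similarity` and `resolve`, the threshold of 0.8, and the sample records are all assumptions made for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(r1, r2):
    """Average string similarity over the attributes both records provide.

    Hypothetical scoring function; real systems use richer comparisons.
    """
    shared = [a for a in r1 if a in r2]
    if not shared:
        return 0.0
    scores = [
        SequenceMatcher(None, str(r1[a]).lower(), str(r2[a]).lower()).ratio()
        for a in shared
    ]
    return sum(scores) / len(scores)


def resolve(records, threshold=0.8):
    """Return a partitioning of record indices.

    Pairs scoring at or above the (assumed) threshold are linked, and the
    connected components of the links are computed with union-find.
    """
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if similarity(records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Invented sample records: different representations of the same values.
records = [
    {"name": "Jonathan Smith", "city": "New York"},
    {"name": "Jonathon Smith", "city": "New York"},
    {"name": "Mary Jones", "city": "Boston"},
    {"name": "Mary Jones", "city": "Boston, MA"},
]

print(resolve(records))  # [[0, 1], [2, 3]]
```

Comparing all pairs is quadratic in the number of records, and taking connected components can merge records that are only indirectly similar; both are simplifications chosen to keep the sketch short.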

Author information

Corresponding author

Correspondence to Xin Luna Dong.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

About this entry

Cite this entry

Dong, X.L., Srivastava, D. (2018). Entity Resolution. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_2547
