Record Matching

  • Reference work entry
Encyclopedia of Database Systems

Synonyms

Deduplication in Data Cleaning; Duplicate detection; Entity resolution; Instance identification; Merge-purge; Name matching; Record linkage

Definition

Record matching is the problem of identifying whether two records in a database refer to the same real-world entity. For example, in Fig. 1, the customer record A1 in Table A and the record B1 in Table B probably refer to the same customer and should therefore be matched. (The example in Fig. 1 is adapted from an example in [21].) As Fig. 1 suggests, the same entity can be encoded in different ways in a database; this is fairly common and arises for a variety of reasons, such as differing formatting conventions, abbreviations, and typographic errors. Record matching is often studied in the following setting: given two relations A and B, identify all pairs of matching records, one from each relation. For the two tables in Fig. 1, a reasonable output might be the pairs (A1, B1) and (A2, B2). In some settings of...
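
The two-relation setting described above can be illustrated with a minimal sketch. It is not the method of this entry: it assumes a simple token-based Jaccard similarity with a fixed threshold, compares every pair of records, and uses hypothetical customer records (not those of Fig. 1). The names `tokens`, `jaccard`, `match_records`, and the threshold of 0.6 are illustrative assumptions.

```python
from itertools import product


def tokens(record: dict) -> set:
    """Lowercase all field values, replace punctuation with spaces, and split into a token set."""
    text = " ".join(record.values()).lower()
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text)
    return set(cleaned.split())


def jaccard(x: set, y: set) -> float:
    """Jaccard similarity: size of the intersection divided by size of the union."""
    if not x and not y:
        return 0.0
    return len(x & y) / len(x | y)


def match_records(table_a: dict, table_b: dict, threshold: float = 0.6) -> list:
    """Return all (id_a, id_b) pairs whose token similarity reaches the threshold.

    This compares every pair of records, which is only practical for small tables.
    """
    matches = []
    for (id_a, rec_a), (id_b, rec_b) in product(table_a.items(), table_b.items()):
        if jaccard(tokens(rec_a), tokens(rec_b)) >= threshold:
            matches.append((id_a, id_b))
    return matches


# Hypothetical records, invented for illustration only.
table_a = {
    "A1": {"name": "John Smith", "address": "12 Main St. Seattle"},
    "A2": {"name": "Mary Jones", "address": "4 Oak Ave Portland"},
}
table_b = {
    "B1": {"name": "Smith, John", "address": "12 Main Street, Seattle"},
    "B2": {"name": "Mary Jones", "address": "4 Oak Avenue Portland"},
}

pairs = match_records(table_a, table_b)
print(pairs)  # [('A1', 'B1'), ('A2', 'B2')]
```

In this sketch, differences in word order and punctuation do not affect the score, while each abbreviation ("St." versus "Street") costs one token of overlap. A fixed threshold is a deliberate simplification; the right value depends on the data.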

Recommended Reading

  1. Arasu A, Chaudhuri S, Kaushik R. Transformation-based framework for record matching. In: Proceedings of the 24th International Conference on Data Engineering; 2008. p. 40–9.

  2. Arasu A, Ganti V, Kaushik R. Efficient exact set-similarity joins. In: Proceedings of the 32nd International Conference on Very Large Data Bases; 2006. p. 918–29.

  3. Bilenko M, Mooney RJ. Adaptive duplicate detection using learnable string similarity measures. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2004. p. 39–48.

  4. Chaudhuri S, Chen BC, Ganti V, Kaushik R. Example-driven design of efficient record matching queries. In: Proceedings of the 33rd International Conference on Very Large Data Bases; 2007. p. 327–38.

  5. Chaudhuri S, Ganjam K, Ganti V, Motwani R. Robust and efficient fuzzy match for online data cleaning. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2003. p. 313–24.

  6. Chaudhuri S, Ganti V, Kaushik R. A primitive operator for similarity joins in data cleaning. In: Proceedings of the 22nd International Conference on Data Engineering; 2006.

  7. Cochinwala M, Kurien V, Lalk G, Shasha D. Efficient data reconciliation. Inf Sci. 2001;137(1–4):1–15.

  8. Cohen WW. Data integration using similarity joins and a word-based information representation language. ACM Trans Inf Syst. 2000;18(3):288–321.

  9. Elmagarmid AK, Ipeirotis PG, Verykios VS. Duplicate record detection: a survey. IEEE Trans Knowl Data Eng. 2007;19(1):1–16.

  10. Fellegi IP, Sunter AB. A theory for record linkage. J Am Stat Assoc. 1969;64(328):1183–210.

  11. Hernandez M, Stolfo S. The merge/purge problem for large databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 1995. p. 127–38.

  12. Jaro MA. Unimatch: a record linkage system: user’s manual. Technical Report. Washington, DC: US Bureau of the Census; 1976.

  13. Jaro MA. Advances in record-linkage methodology as applied to matching the 1985 census of Tampa, Florida. J Am Stat Assoc. 1989;84(406):414–20.

  14. Koudas N, Sarawagi S, Srivastava D. Record linkage: similarity measures and algorithms. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2006. p. 802–3.

  15. McCallum A, Nigam K, Ungar LH. Efficient clustering of high-dimensional data sets with application to reference matching. In: Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2000. p. 169–78.

  16. Newcombe HB, Kennedy JM, Axford SJ, James AP. Automatic linkage of vital records. Science. 1959;130(3381):954–9.

  17. Sarawagi S, Bhamidipaty A. Interactive deduplication using active learning. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2002. p. 269–78.

  18. Sarawagi S, Kirpal A. Efficient set joins on similarity predicates. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2004. p. 743–54.

  19. Torra V, Domingo-Ferrer J. Record linkage methods for multidatabase data mining. In: Torra V, editor. Information fusion in data mining. Springer; 2003. p. 101–32.

  20. Winkler W. Improved decision rules in the Fellegi-Sunter model of record linkage. Technical Report. Washington, DC: Statistical Research Division/US Bureau of the Census; 1993.

  21. Winkler W. The state of record linkage and current research problems. Technical Report. Washington, DC: Statistical Research Division/US Bureau of the Census; 1999.

Corresponding author

Correspondence to Arvind Arasu.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

Cite this entry

Arasu, A., Domingo-Ferrer, J. (2018). Record Matching. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_594
