Probabilistic Matching

  • Living reference work entry
Encyclopedia of Big Data

Definition/Introduction

Probabilistic matching contrasts with the simplest data matching technique, deterministic matching. In deterministic matching, two records are said to match if one or more identifiers are identical. Deterministic record linkage is a good option when the data sets share common identifiers and the data are of relatively high quality. Probabilistic matching is a statistical approach to measuring the probability that two records represent the same subject or individual, based on whether they agree or disagree on the various identifiers (Dusetzina et al. 2014).
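
The deterministic rule described above can be illustrated with a minimal Python sketch. The function name, field names, and record values are hypothetical and chosen only for illustration; the sketch also assumes that a missing identifier never counts as agreement, which the entry itself does not specify.

```python
def deterministic_match(rec_a, rec_b, identifiers):
    """Return True when every listed identifier is present and identical.

    Assumption for this sketch: a missing (None) identifier is treated
    as a disagreement rather than as a match.
    """
    return all(
        rec_a.get(key) is not None and rec_a.get(key) == rec_b.get(key)
        for key in identifiers
    )


# Hypothetical records: same person, last name spelled differently.
a = {"ssn": "123-45-6789", "last_name": "Smith", "birth_year": 1980}
b = {"ssn": "123-45-6789", "last_name": "Smyth", "birth_year": 1980}

print(deterministic_match(a, b, ["ssn"]))               # True
print(deterministic_match(a, b, ["ssn", "last_name"]))  # False
```

The second call shows why deterministic rules depend on data quality: a single spelling variation in one required identifier is enough to reject the pair.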

Probabilistic matching calculates composite linkage weights based on likeness scores for identifier values and uses thresholds to classify each record pair as a match, nonmatch, or possible match. The quality of the resulting matches can depend on one’s confidence in the specification of the matching rules (Zhang and Stevens 2012). The approach is designed to work with a wider set of data elements and all available identifiers for matching...
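
One common way to compute composite weights and apply thresholds follows the framework of Fellegi and Sunter (1969), listed under Further Readings. The Python sketch below is a simplified illustration, not the method of any particular system: it uses binary agree/disagree comparisons instead of graded likeness scores, and the m- and u-probabilities, field names, and threshold values are all hypothetical.

```python
import math

# Hypothetical parameters per identifier (illustrative values only):
#   m = P(identifier agrees | the two records are a true match)
#   u = P(identifier agrees | the two records are not a match)
PARAMS = {
    "last_name":  {"m": 0.95, "u": 0.01},
    "birth_year": {"m": 0.98, "u": 0.05},
    "zip_code":   {"m": 0.90, "u": 0.10},
}


def composite_weight(rec_a, rec_b, params=PARAMS):
    """Sum log2 likelihood ratios over all identifiers."""
    total = 0.0
    for field, p in params.items():
        value_a, value_b = rec_a.get(field), rec_b.get(field)
        # Assumption: a missing value is treated as a disagreement.
        if value_a is not None and value_a == value_b:
            total += math.log2(p["m"] / p["u"])              # agreement weight
        else:
            total += math.log2((1 - p["m"]) / (1 - p["u"]))  # disagreement weight
    return total


def classify(weight, lower=0.0, upper=8.0):
    """Apply two hypothetical thresholds to a composite weight."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "nonmatch"
    return "possible match"


a = {"last_name": "Smith", "birth_year": 1980, "zip_code": "21201"}
b = {"last_name": "Smyth", "birth_year": 1980, "zip_code": "21201"}
c = {"last_name": "Jones", "birth_year": 1975, "zip_code": "90210"}

for other in (a, b, c):
    w = composite_weight(a, other)
    print(round(w, 2), classify(w))
```

Pairs that fall between the two thresholds are the "possible match" cases, which are typically set aside for clerical review. In practice the m- and u-probabilities and the thresholds would be estimated from the data or tuned against acceptable error rates, not fixed by hand as they are here.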

Further Readings

  • Dusetzina, S. B., Tyree, S., Meyer, A. M., et al. (2014). Linking data for health services research: A framework and instructional guide. Rockville: Agency for Healthcare Research and Quality (US).

  • Fellegi, I. P., & Sunter, A. B. (1969). A theory for record linkage. Journal of the American Statistical Association, 64, 1183–1210.

  • Schumacher, S. (2007). Probabilistic versus deterministic data matching: Making an accurate decision, information management special reports. Washington, DC: The Office of the National Coordinator for Health Information Technology (ONC).

  • Winkler, W. E. (1999). The state of record linkage and current research problems. Washington, DC: Statistical Research Division, US Census Bureau.

  • Zhang, T., & Stevens, D. W. (2012). Integrated data system person identification: Accuracy requirements and methods. https://ssrn.com/abstract=2512590; https://doi.org/10.2139/ssrn.2512590.

Author information

Corresponding author

Correspondence to Ting Zhang.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this entry

Cite this entry

Zhang, T. (2018). Probabilistic Matching. In: Schintler, L., McNeely, C. (eds) Encyclopedia of Big Data. Springer, Cham. https://doi.org/10.1007/978-3-319-32001-4_501-1

  • DOI: https://doi.org/10.1007/978-3-319-32001-4_501-1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-32001-4

  • Online ISBN: 978-3-319-32001-4

  • eBook Packages: Springer Reference Business and Management; Reference Module Humanities and Social Sciences; Reference Module Business, Economics and Social Sciences
