Probabilistic matching differs from the simplest data matching technique, deterministic matching. For deterministic matching, two records are said to match if one or more identifiers are identical. Deterministic record linkage is a good option when the entities in the data sets have identified common identifiers with a relatively high quality of data. Probabilistic matching is a statistical approach in measuring the probability that two records represent the same subject or individual based on whether they agree or disagree on the various identifiers (Dusetzina et al. 2014).
It calculates linkage composite weights based on likeness scores for identifier values and uses thresholds to determine a match, nonmatch, or possible match. The quality of resulting matches can depend upon one’s confidence in the specification of the matching rules (Zhang and Stevens 2012). It is designed to work using a wider set of data elements and all available identifiers for matching...
- Dusetzina, S. B., Tyree, S., Meyer, A. M., et al. (2014). Linking data for health services research: A framework and instructional guide. Rockville: Agency for Healthcare Research and Quality (US).Google Scholar
- Schumacher, S. (2007). Probabilistic versus deterministic data matching: Making an accurate decision, information management special reports. Washington, DC: The Office of the National Coordinator for Health Information Technology (ONC).Google Scholar
- Winkler, W. E. (1999). The state of record linkage and current research problems. Washington, DC: Statistical Research Division, US Census Bureau.Google Scholar