Probabilistic Entity Resolution
Deduplication; Linkage; Matching
Entity resolution is the task of analyzing a collection of data (e.g., a database or data set) to create entities by merging the data instances that describe the same real-world object. Uncertain entity resolution is a group of resolution methodologies that focus on handling the uncertainties that are either present in the data or generated during the resolution process.
The fundamental component of resolution techniques is the instance, which provides some characteristics of a real-world object. An instance is a tuple with k attributes 〈v1, …, vk〉, where each attribute is one characteristic of the corresponding object. Given a collection of instances, the goal of resolution is to detect the instances that describe the same real-world object and merge them into entities, e.g., to create an entity e that represents instances r1, r2, and r3.
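The definition above can be illustrated with a minimal sketch. It rests on assumptions that are not part of this entry: the function names `match_probability` and `resolve`, the sample data, the threshold, and the scoring rule (the fraction of attributes with equal values, used here as a crude stand-in for a match probability) are all hypothetical. Merging matched pairs transitively is likewise only one possible design choice.

```python
from itertools import combinations


def match_probability(a, b):
    """Hypothetical score: fraction of attribute positions with equal values."""
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / len(a)


def resolve(instances, threshold=0.5):
    """Group instances into entities by merging pairs that score >= threshold."""
    # Union-find over instance indices; each instance starts as its own entity.
    parent = list(range(len(instances)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(instances)), 2):
        if match_probability(instances[i], instances[j]) >= threshold:
            parent[find(j)] = find(i)  # merge the two groups

    # Collect the instances of each group into one entity.
    entities = {}
    for i, instance in enumerate(instances):
        entities.setdefault(find(i), []).append(instance)
    return list(entities.values())


# Illustrative instances with k = 3 attributes (name, city, birth year).
instances = [
    ("J. Smith", "Athens", "1980"),    # r1
    ("John Smith", "Athens", "1980"),  # r2
    ("J. Smith", "Athens", "1981"),    # r3
    ("M. Jones", "Oslo", "1975"),      # r4
]

entities = resolve(instances, threshold=0.6)
for entity in entities:
    print(entity)
```

With this sample data, r1 matches both r2 and r3 on two of three attributes, so the three instances end up in one entity even though r2 and r3 agree on only one attribute. That is a consequence of the transitive merge assumed here; a method that models uncertainty explicitly could instead keep the pairwise scores and defer the merge decision.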
The initial resolution approaches focused on handling the...
- 1. Andritsos P, Fuxman A, Miller R. Clean answers over dirty databases: a probabilistic approach. In: Proceedings of the 22nd International Conference on Data Engineering; 2006.
- 3. Dong XL, Halevy A, Yu C. Data integration with uncertainty. In: Proceedings of the 33rd International Conference on Very Large Data Bases; 2007. p. 687–98.
- 6. Ioannou E, Staworko S. Management of inconsistencies in data integration. In: Data exchange, integration, and streams. 2013. p. 217–25.
- 7. Re C, Dalvi N, Suciu D. Efficient top-k query evaluation on probabilistic data. In: Proceedings of the 23rd International Conference on Data Engineering; 2007. p. 886–95.