Abstract
Constraints are ubiquitous in real-life applications of entity resolution (ER), the problem of determining which records in a database refer to the same entity. However, specifying constraints effectively and using them efficiently in ER tasks remains challenging. In particular, not every constraint is equally effective or robust, so attaching weights that express the confidence in each constraint is a natural choice. In this paper, we study ER in the presence of weighted constraints. We propose a unified framework that interweaves positive and negative constraints into the ER process, and investigate how effectively and efficiently weighted constraints can be used to generate ER clustering results. Our experimental study shows that using weighted constraints can improve ER quality and scalability.
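To make the idea of weighted positive and negative constraints concrete, the following is a minimal illustrative sketch and not the framework proposed in the paper. It assumes each constraint is a triple (record, record, weight) whose sign marks it as positive (match) or negative (non-match) and whose magnitude is the confidence. The function name `resolve`, the greedy merge strategy, and the sample weights are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def resolve(records, constraints):
    """Greedy clustering over signed, weighted pairwise constraints.

    constraints: iterable of (record_a, record_b, weight). A weight > 0 is
    evidence that the two records match, a weight < 0 that they do not.
    This is an assumed toy procedure, not the paper's algorithm.
    """
    # Net score per unordered pair: positive and negative evidence are summed.
    score = defaultdict(float)
    for a, b, w in constraints:
        score[frozenset((a, b))] += w

    # Start with every record in its own cluster.
    clusters = [frozenset([r]) for r in records]

    def link(c1, c2):
        # Total signed evidence between two clusters.
        return sum(score.get(frozenset((x, y)), 0.0) for x in c1 for y in c2)

    while True:
        # Pick the pair of clusters with the strongest net evidence.
        best = max(combinations(clusters, 2), key=lambda p: link(*p), default=None)
        # Stop when no remaining merge has net positive evidence.
        if best is None or link(*best) <= 0:
            return sorted(sorted(c) for c in clusters)
        clusters = [c for c in clusters if c not in best] + [best[0] | best[1]]


records = ["r1", "r2", "r3", "r4"]
constraints = [
    ("r1", "r2", 0.9),   # strong positive constraint
    ("r2", "r3", 0.6),   # weaker positive constraint
    ("r1", "r3", -0.8),  # strong negative constraint
    ("r3", "r4", 0.7),   # positive constraint
]
print(resolve(records, constraints))
```

In this toy setting, the negative constraint between `r1` and `r3` outweighs the weaker positive one between `r2` and `r3`, so the two groups stay apart; lowering its confidence to -0.1 would let all four records merge. The sketch only illustrates how confidences can change a clustering outcome under these assumptions.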
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Shen, Z., Wang, Q. (2014). Entity Resolution with Weighted Constraints. In: Manolopoulos, Y., Trajcevski, G., Kon-Popovska, M. (eds) Advances in Databases and Information Systems. ADBIS 2014. Lecture Notes in Computer Science, vol 8716. Springer, Cham. https://doi.org/10.1007/978-3-319-10933-6_23
DOI: https://doi.org/10.1007/978-3-319-10933-6_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10932-9
Online ISBN: 978-3-319-10933-6
eBook Packages: Computer Science, Computer Science (R0)