Abstract
This paper presents a method for Entity Disambiguation in Information Extraction from different sources in the web. Once entities and relations between them are extracted, it is needed to determine which ones are referring to the same real-world entity. We model the problem as a graph partitioning problem in order to combine the available information more accurately than a pairwise classifier. Moreover, our method handle uncertain information which turns out to be quite helpful. Two algorithms are trained and compared, one probabilistic and the other deterministic. Both are tuned using genetic algorithms to find the best weights for the set of constraints. Experiments show that graph-based modeling yields better results using uncertain information.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Hernandez, M.A., Stolfo, S.J.: The merge/purge problem for large databases. In: SIGMOD 1995: Proceedings of the 1995 ACM SIGMOD international conference on Management of data, pp. 127–138. ACM Press, New York (1995)
Cohen, W., Ravikumar, P., Fienberg, S.: A comparison of string distance metrics for name-matching tasks. In: Proceedings of the IJCAI (2003)
Bilenko, M., Mooney, R.J.: Adaptive duplicate detection using learnable string similarity measures. In: KDD 2003: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 39–48. ACM Press, New York (2003)
Han, H., Giles, L., Li, H.Z.C., Tsioutsiouliklis, K.: Two supervised learning approaches for name disambiguation in author citations. In: Proceedings of the 2004 Joint ACM/IEEE Conference on Digital Libraries, 2004, pp. 296–305 (2004)
McCallum, A., Nigam, K., Ungar, L.H.: Efficient clustering of high-dimensional data sets with application to reference matching. In: KDD 2000: Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 169–178. ACM Press, New York (2000)
Bhattacharya, I., Getoor, L.: Iterative record linkage for cleaning and integration. In: DMKD 2004: Proceedings of the 9th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, Paris, France, pp. 11–18. ACM Press, New York (2004)
Pasula, H., Marthi, B., Milch, B., Russell, S., Shpitser, I.: Identity uncertainty and citation matching. In: Processing (NIPS) (2002)
Doan, A., Lu, Y., Lee, Y., Han, J.: Profile-based object matching for information integration. IEEE Intelligent Systems 18(5), 54–59 (2003)
Shen, W., Li, X., Doan, A.: Constraint-based entity matching. In: Proceedings of AAAI (2005)
Singla, P., Domingos, P.: Entity resolution with markov logic. In: ICDM 2006, pp. 572–582. IEEE Computer Society, Washington (2006)
Culotta, A., McCallum, A.: Joint deduplication of multiple record types in relational data. In: CIKM 2005: Proceedings of the 14th ACM international conference on Information and knowledge management, pp. 257–258. ACM, New York (2005)
Han, H., Zha, H., Giles, C.L.: Name disambiguation in author citations using a k-way spectral clustering method. In: JCDL 2005: Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries, pp. 334–343. ACM, New York (2005)
Bhattacharya, I., Getoor, L.: Collective entity resolution in relational data. ACM Trans. Knowl. Discov. Data 1(1), 5 (2007)
Wang, C., Lu, J., Zhang, G.: A constrained clustering approach to duplicate detection among relational data. In: Advances in Knowledge Discovery and Data Mining, pp. 308–319 (2007)
Chen, Z., Kalashnikov, D.V., Mehrotra, S.: Adaptive graphical approach to entity resolution. In: JCDL 2007: Proceedings of the 7th ACM/IEEE joint conference on Digital libraries, pp. 204–213. ACM, New York (2007)
Sapena, E., Padró, L., Turmo, J.: Alias assigment in information extraction. In: Proceedings of SEPLN-2007, Sevilla, Spain (2007)
Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Longman Publishing Co., Inc., Boston (1989)
Pelillo, M., Abbattista, F., Maffione, A.: An evolutionary approach to training relaxation labeling processes. Pattern Recogn. Lett. 16(10), 1069–1078 (1995)
Rosenfeld, R., Hummel, R.A., Zucker, S.W.: Scene labelling by relaxation operations. IEEE Transactions on Systems, Man and Cybernetics 6(6), 420–433 (1976)
Màrquez, L., Padró, L., Rodríguez, H.: A machine learning approach for pos tagging. Machine Learning Journal 39(1), 59–91 (2000)
Atserias, J.: Towards Robustness in Natural Language Understanding. Ph.D. Thesis, Dept. Lenguajes y Sistemas Informáticos. Euskal Herriko Unibertsitatea. Donosti. Spain (2006)
Comellas, F., Ozon, J.: An ant algorithm for the graph colouring problem. In: ANTS 1998 - From Ant Colonies to Artificial Ants: First international workshop on ant colony optimization, Brussels (1998)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sapena, E., Padró, L., Turmo, J. (2008). A Graph Partitioning Approach to Entity Disambiguation Using Uncertain Information. In: Nordström, B., Ranta, A. (eds) Advances in Natural Language Processing. GoTAL 2008. Lecture Notes in Computer Science(), vol 5221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85287-2_41
Download citation
DOI: https://doi.org/10.1007/978-3-540-85287-2_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85286-5
Online ISBN: 978-3-540-85287-2
eBook Packages: Computer ScienceComputer Science (R0)