A Graph Partitioning Approach to Entity Disambiguation Using Uncertain Information

Sapena, Emili; Padró, Lluís; Turmo, Jordi

doi:10.1007/978-3-540-85287-2_41


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5221)

Included in the following conference series:

International Conference on Natural Language Processing


Abstract

This paper presents a method for Entity Disambiguation in Information Extraction from different sources on the web. Once entities and the relations between them have been extracted, it is necessary to determine which of them refer to the same real-world entity. We model the task as a graph partitioning problem in order to combine the available information more accurately than a pairwise classifier does. Moreover, our method handles uncertain information, which turns out to be quite helpful. Two algorithms are trained and compared, one probabilistic and the other deterministic. Both are tuned using genetic algorithms to find the best weights for the set of constraints. Experiments show that graph-based modeling yields better results when using uncertain information.
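To make the graph view concrete, the following is a minimal illustrative sketch and not the paper's algorithm. It assumes that mentions are graph nodes, that each pair of mentions is scored by a weighted sum of constraints, and that a partition is built greedily by merging the two groups with the largest positive total edge weight between them. The mention strings, the two constraint functions, their weights, and the names `edge_weight` and `partition` are all hypothetical. The weights are hand-picked here, whereas the abstract says they are tuned with genetic algorithms.

```python
from itertools import combinations

# Hypothetical mentions extracted from different sources (illustrative data).
MENTIONS = ["J. Smith", "John Smith", "Jon Smith", "Jane Doe", "J. Doe"]


def same_surname(a, b):
    """Return +1 if the last tokens match, -1 otherwise."""
    return 1.0 if a.split()[-1] == b.split()[-1] else -1.0


def compatible_first_name(a, b):
    """Score first-name agreement; an initial gives only uncertain evidence."""
    fa, fb = a.split()[0].rstrip("."), b.split()[0].rstrip(".")
    if fa == fb:
        return 1.0
    if len(fa) == 1 or len(fb) == 1:
        # One side is an initial: weak support if it matches, else contradiction.
        return 0.5 if fa[0] == fb[0] else -1.0
    return -0.5  # two different full first names


# (constraint, weight) pairs; the weights are assumed, not learned.
CONSTRAINTS = [(same_surname, 2.0), (compatible_first_name, 1.0)]


def edge_weight(a, b, constraints=CONSTRAINTS):
    """Weighted sum of constraint scores for one pair of mentions."""
    return sum(w * f(a, b) for f, w in constraints)


def partition(mentions, constraints=CONSTRAINTS):
    """Greedily merge the two groups joined by the largest positive weight."""
    clusters = [[m] for m in mentions]
    while True:
        best_gain, best_pair = 0.0, None
        for i, j in combinations(range(len(clusters)), 2):
            gain = sum(edge_weight(a, b, constraints)
                       for a in clusters[i] for b in clusters[j])
            if gain > best_gain:
                best_gain, best_pair = gain, (i, j)
        if best_pair is None:
            return clusters  # no merge improves the partition any further
        i, j = best_pair  # i < j, so popping j leaves index i valid
        merged = clusters.pop(j)
        clusters[i].extend(merged)


if __name__ == "__main__":
    for group in partition(MENTIONS):
        print(group)
```

In this toy setting the pair "John Smith" / "Jon Smith" has only weak direct support, but both mentions are strongly linked to "J. Smith", so the partition places all three in one group. This is the kind of evidence combination that a decision taken on each pair in isolation would not exploit.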



Author information

Authors and Affiliations

TALP Research Center, Universitat Politècnica de Catalunya, Barcelona, Spain
Emili Sapena, Lluís Padró & Jordi Turmo


Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Chalmers University of Technology, 41296, Göteborg, Sweden
Bengt Nordström & Aarne Ranta


About this paper

Cite this paper

Sapena, E., Padró, L., Turmo, J. (2008). A Graph Partitioning Approach to Entity Disambiguation Using Uncertain Information. In: Nordström, B., Ranta, A. (eds) Advances in Natural Language Processing. GoTAL 2008. Lecture Notes in Computer Science, vol 5221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85287-2_41


DOI: https://doi.org/10.1007/978-3-540-85287-2_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85286-5
Online ISBN: 978-3-540-85287-2
eBook Packages: Computer Science, Computer Science (R0)
