Extracting Key Phrases to Disambiguate Personal Names on the Web

Bollegala, Danushka; Matsuo, Yutaka; Ishizuka, Mitsuru

doi:10.1007/11671299_24

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3878)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

When you search the web for information about a particular person, a search engine returns many pages, some of which may concern different people who share the same name. How can we disambiguate these namesakes? This paper presents an unsupervised algorithm that produces key phrases for each of the different people with the same name. These key phrases can be used to narrow the search further, leading to more person-specific, unambiguous information. The proposed algorithm does not require any biographical or social information about the person. Although there is some previous work on personal name disambiguation on the web, to our knowledge this is the first attempt to extract key phrases to disambiguate different people with the same name. To evaluate our algorithm, we collected and hand-labeled a dataset of over 1000 web pages retrieved from Google using personal name queries. Our experimental results show an improvement over existing methods for namesake disambiguation.
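
The abstract does not spell out the algorithm, so the following is only a rough sketch of the general idea (group the pages returned for one name, then pick terms that distinguish each group), not the authors' method. The function names, the greedy single-pass clustering, the similarity threshold, and the scoring rule are all assumptions made for illustration, and the sketch ranks single words rather than multi-word key phrases.

```python
import math
import re
from collections import Counter


def tokenize(text, name_tokens):
    # Lowercase alphabetic tokens; drop the ambiguous name and short words.
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if len(w) > 3 and w not in name_tokens]


def cosine(a, b):
    # Cosine similarity between two term-frequency Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def cluster_pages(pages, name, threshold=0.2):
    # Hypothetical greedy clustering: assign each page to the most similar
    # existing cluster, or start a new cluster if none reaches the threshold.
    name_tokens = set(name.lower().split())
    clusters, members = [], []
    for i, page in enumerate(pages):
        vec = Counter(tokenize(page, name_tokens))
        best, best_sim = None, threshold
        for k, centroid in enumerate(clusters):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = k, sim
        if best is None:
            clusters.append(vec)
            members.append([i])
        else:
            clusters[best].update(vec)  # add this page's counts to the cluster
            members[best].append(i)
    return clusters, members


def key_terms(clusters, top_n=3):
    # Rank terms by in-cluster frequency, discounted by how often they occur
    # in the other clusters; ties are broken alphabetically.
    result = []
    for k, own in enumerate(clusters):
        others = Counter()
        for j, other in enumerate(clusters):
            if j != k:
                others.update(other)
        ranked = sorted(own, key=lambda t: (-(own[t] / (1 + others[t])), t))
        result.append(ranked[:top_n])
    return result
```

Given a handful of snippets for a name such as "Jim Clark", `cluster_pages` returns one term-count vector and one list of page indices per presumed namesake, and `key_terms` returns candidate terms that could be appended to the original query to narrow the search to one person.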

Author information

Authors and Affiliations

University of Tokyo: Danushka Bollegala & Mitsuru Ishizuka
AIST: Yutaka Matsuo

Editor information

Editors and Affiliations

National Polytechnic Institute, Center for Computing Research, 07738, Mexico City, México
Alexander Gelbukh

About this paper

Cite this paper

Bollegala, D., Matsuo, Y., Ishizuka, M. (2006). Extracting Key Phrases to Disambiguate Personal Names on the Web. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2006. Lecture Notes in Computer Science, vol 3878. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11671299_24

DOI: https://doi.org/10.1007/11671299_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32205-4
Online ISBN: 978-3-540-32206-1
eBook Packages: Computer Science, Computer Science (R0)
