Extracting Key Phrases to Disambiguate Personal Names on the Web

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3878)

Abstract

When you search the web for information about a particular person, a search engine returns many pages, some of which may be about different people who share the same name. How can we disambiguate these namesakes? This paper presents an unsupervised algorithm that produces key phrases for each of the different people with the same name. These key phrases can be used to narrow the search further, leading to more person-specific, unambiguous information. The proposed algorithm does not require any biographical or social information about the person. Although there is some previous work on personal name disambiguation on the web, to our knowledge this is the first attempt to extract key phrases to disambiguate different persons with the same name. To evaluate our algorithm, we collected and hand-labeled a dataset of over 1000 Web pages retrieved from Google using personal name queries. Our experimental results show an improvement over existing methods for namesake disambiguation.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bollegala, D., Matsuo, Y., Ishizuka, M. (2006). Extracting Key Phrases to Disambiguate Personal Names on the Web. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2006. Lecture Notes in Computer Science, vol 3878. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11671299_24

  • DOI: https://doi.org/10.1007/11671299_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32205-4

  • Online ISBN: 978-3-540-32206-1

  • eBook Packages: Computer Science, Computer Science (R0)
