
Unsupervised Gazette Creation Using Information Distance

  • Sangameshwar Patil
  • Sachin Pawar
  • Girish K. Palshikar
  • Savita Bhat
  • Rajiv Srivastava
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7934)

Abstract

The named entity extraction (NEX) problem consists of automatically constructing a gazette containing instances of each named entity (NE) of interest. NEX is important for domains that lack a corpus with tagged NEs. In this paper, we propose a new unsupervised (bootstrapping) NEX technique based on a new variant of the Multiword Expression Distance (MED) [1] and information distance [2]. We demonstrate the efficacy of our method by comparing it with BASILISK [3] and PMI in the agriculture domain. Our method discovered 8 new diseases that are not found in Wikipedia.
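
For readers unfamiliar with the PMI baseline named above, the following is a minimal sketch of pointwise mutual information estimated from sentence-level co-occurrence counts. It is not the authors' implementation: the function names, the sentence-as-token-set representation, and the toy corpus are all assumptions made for illustration, and the paper's own MED-based measure is not reproduced here.

```python
import math


def pmi(pair_count, x_count, y_count, total):
    """PMI(x, y) = log2(p(x, y) / (p(x) * p(y))), estimated from raw counts."""
    if pair_count == 0:
        # No co-occurrence observed: PMI is negative infinity.
        return float("-inf")
    return math.log2((pair_count * total) / (x_count * y_count))


def cooccurrence_pmi(sentences, seed, candidate):
    """Score a candidate term against a seed term.

    `sentences` is a list of token sets (an illustrative representation);
    two terms co-occur when they appear in the same sentence.
    """
    total = len(sentences)
    seed_n = sum(seed in s for s in sentences)
    cand_n = sum(candidate in s for s in sentences)
    both_n = sum(seed in s and candidate in s for s in sentences)
    return pmi(both_n, seed_n, cand_n, total)


# Hypothetical toy corpus, invented purely for illustration.
toy_sentences = [
    {"wheat", "rust", "blight"},
    {"wheat", "rust"},
    {"rice", "blast"},
    {"rice", "blight"},
]

score = cooccurrence_pmi(toy_sentences, seed="blight", candidate="rust")
```

In a bootstrapping setting, a score of this kind would be computed for many candidate terms against a small seed list, with the highest-scoring candidates added to the gazette in each round. How candidates are generated and ranked in the paper itself is not specified on this page.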

Keywords

Named entity extraction; Information extraction; Unsupervised learning; Information distance; Agriculture

References

  1. Bu, F., Zhu, X., Li, M.: Measuring the non-compositionality of multiword expressions. In: Proc. of the 23rd Conf. on Computational Linguistics, COLING (2010)
  2. Bennett, C., Gacs, P., Li, M., Vitanyi, P., Zurek, W.: Information distance. IEEE Transactions on Information Theory 44(4), 1407–1423 (1998)
  3. Thelen, M., Riloff, E.: A bootstrapping method for learning semantic lexicons using extraction pattern contexts. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP 2002 (2002)

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Sangameshwar Patil (1)
  • Sachin Pawar (1)
  • Girish K. Palshikar (1)
  • Savita Bhat (1)
  • Rajiv Srivastava (1)

  1. Tata Research Development and Design Centre, TCS, Pune, India
