Unsupervised Gazette Creation Using Information Distance
The named entity extraction (NEX) problem consists of automatically constructing a gazette containing instances of each named entity (NE) of interest. NEX is important for domains that lack a corpus with tagged NEs. In this paper, we propose a new unsupervised (bootstrapping) NEX technique based on a new variant of the Multiword Expression Distance (MED) and on information distance. We show the efficacy of our method by comparing it with BASILISK and PMI in the agriculture domain. Our method discovered 8 new diseases that are not found in Wikipedia.
Keywords: Named entity extraction, Information extraction, Unsupervised learning, Information distance, Agriculture
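For readers unfamiliar with the PMI baseline named in the abstract, the following is a minimal sketch of how pointwise mutual information can rank candidate gazette entries against a seed set. It is an illustration under stated assumptions, not the formulation used in the paper: co-occurrence is counted at the sentence level, and the function name `pmi_scores`, its parameters, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def pmi_scores(sentences, seeds):
    """Score candidate words by PMI with a seed set.

    Assumption (not from the paper): a word and the seed set co-occur
    when they appear in the same tokenized sentence.
    """
    seeds = set(seeds)
    n = len(sentences)
    word_count = Counter()   # sentences containing the word
    joint_count = Counter()  # sentences containing the word and any seed
    seed_count = 0           # sentences containing any seed

    for tokens in sentences:
        words = set(tokens)
        has_seed = bool(words & seeds)
        if has_seed:
            seed_count += 1
        for w in words - seeds:
            word_count[w] += 1
            if has_seed:
                joint_count[w] += 1

    scores = {}
    for w, joint in joint_count.items():
        # PMI(w, seeds) = log( P(w, seeds) / (P(w) * P(seeds)) )
        p_joint = joint / n
        p_word = word_count[w] / n
        p_seed = seed_count / n
        scores[w] = math.log(p_joint / (p_word * p_seed))
    return scores


# Toy usage: hypothetical sentences and seed disease names.
corpus = [
    ["rust", "affects", "wheat"],
    ["blight", "affects", "potato"],
    ["wheat", "is", "harvested"],
    ["rust", "and", "smut", "affect", "wheat"],
]
ranked = sorted(pmi_scores(corpus, {"rust", "blight"}).items(),
                key=lambda kv: kv[1], reverse=True)
```

Words that never co-occur with a seed receive no score in this sketch, since their PMI would be negative infinity. A bootstrapping loop would typically add the top-ranked candidates to the seed set and repeat.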
- 1. Bu, F., Zhu, X., Li, M.: Measuring the non-compositionality of multiword expressions. In: Proc. of the 23rd Conf. on Computational Linguistics, COLING (2010)
- 3. Thelen, M., Riloff, E.: A bootstrapping method for learning semantic lexicons using extraction pattern contexts. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP 2002 (2002)