Combining Evidence for Automatic Extraction of Terms
This paper describes a method for extracting two-word domain terms by combining several of their features. The features are computed from three sources: occurrence statistics in a domain-specific text collection, statistics obtained from global search engines, and a domain-specific thesaurus. The approach is evaluated against the terminology of manually created thesauri. We show that using multiple features considerably improves the automatic extraction of domain-specific terms, and we compare the quality of the proposed method in two different domains.
Keywords: term acquisition, thesaurus, Internet search, machine learning
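The idea of combining features from several sources can be illustrated with a minimal sketch. It assumes one numeric score per source (text collection, search engine, thesaurus) and a plain logistic regression as the combiner; the abstract does not name the learning algorithm or the exact features, so the feature values, the training data, and the function names `train`, `predict`, and `rank` are all hypothetical.

```python
import math

# Invented training data, for illustration only (not from the paper).
# Each vector: [collection score, search-engine score, thesaurus score],
# all scaled to [0, 1]. Label 1 = domain term, 0 = not a term.
TRAIN = [
    ([0.9, 0.8, 1.0], 1),
    ([0.7, 0.9, 0.0], 1),
    ([0.8, 0.6, 1.0], 1),
    ([0.2, 0.1, 0.0], 0),
    ([0.4, 0.2, 0.0], 0),
    ([0.1, 0.3, 0.0], 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Probability that the candidate with features x is a term."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def train(data, epochs=2000, lr=0.5):
    """Fit logistic regression weights with batch gradient descent."""
    n = len(data[0][0])
    weights = [0.0] * n
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n
        grad_b = 0.0
        for x, y in data:
            err = predict(weights, bias, x) - y
            for i, v in enumerate(x):
                grad_w[i] += err * v
            grad_b += err
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(data)
    return weights, bias


def rank(candidates, weights, bias):
    """Return candidate phrases ordered from most to least term-like."""
    scored = [(predict(weights, bias, x), phrase)
              for phrase, x in candidates.items()]
    return [phrase for _, phrase in sorted(scored, reverse=True)]


if __name__ == "__main__":
    w, b = train(TRAIN)
    # Hypothetical candidate phrases with invented feature values.
    candidates = {
        "candidate phrase A": [0.85, 0.80, 1.0],
        "candidate phrase B": [0.15, 0.20, 0.0],
    }
    print(rank(candidates, w, b))
```

The point of the sketch is only that a learned combination can rank candidates using all three sources at once, rather than relying on a single statistic.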