A Domain Independent Approach for Extracting Terms from Research Papers

  • Birong Jiang
  • Endong Xun
  • Jianzhong Qi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9093)

Abstract

We study the problem of extracting terms from research papers, which is an important step towards building knowledge graphs in the research domain. Existing terminology extraction approaches are mostly domain dependent: they use domain-specific linguistic rules, supervised machine learning techniques, or a combination of the two to extract terms. Using domain knowledge requires considerable human effort, e.g., manually composing a set of linguistic rules or labeling a large corpus, which limits the applicability of the existing approaches. To overcome this limitation, we propose a new terminology extraction approach that uses no knowledge from any specific domain. In particular, we use the title words and keywords of research papers as seed terms and apply word2vec to identify similar terms in an open-domain corpus as candidate terms, which are then filtered by checking whether they occur in the research papers. We repeat this process with the newly found terms until no new candidate term can be found. We conduct extensive experiments on the proposed approach. The results show that our approach extracts terms effectively while remaining domain independent.
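The iterative seed-expand-filter loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a small hand-made dictionary of vectors stands in for a word2vec model trained on an open-domain corpus, "similar" is assumed to mean cosine similarity above a threshold among the top-n neighbours, and the occurrence filter is assumed to be a whole-token phrase match against the paper text. The names `extract_terms`, `most_similar`, `occurs_in`, `topn`, `threshold`, `TOY_VECTORS` and `TOY_PAPERS` are hypothetical.

```python
import math
import re
from typing import Dict, Iterable, List, Set, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(term: str, vectors: Dict[str, Vector], topn: int) -> List[Tuple[float, str]]:
    """Hypothetical stand-in for a word2vec nearest-neighbour query: top-n (score, term) pairs."""
    if term not in vectors:
        return []
    scored = [(cosine(vectors[term], vec), other)
              for other, vec in vectors.items() if other != term]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # highest score first, ties broken by name
    return scored[:topn]


def _tokens(text: str) -> str:
    """Lowercase text as space-padded alphanumeric tokens, for whole-token matching."""
    return " " + " ".join(re.findall(r"[a-z0-9]+", text.lower())) + " "


def occurs_in(term: str, papers: Iterable[str]) -> bool:
    """Assumed filter: the term must appear as a whole-token phrase in at least one paper."""
    needle = _tokens(term)
    return any(needle in _tokens(text) for text in papers)


def extract_terms(seeds: Iterable[str], vectors: Dict[str, Vector], papers: List[str],
                  topn: int = 3, threshold: float = 0.5) -> Set[str]:
    """Expand the seed terms round by round until a round finds no new term."""
    terms: Set[str] = set(seeds)
    frontier: Set[str] = set(terms)
    while frontier:
        found: Set[str] = set()
        for term in frontier:
            for score, cand in most_similar(term, vectors, topn):
                if score < threshold or cand in terms or cand in found:
                    continue
                if occurs_in(cand, papers):  # keep only candidates used in the papers
                    found.add(cand)
        terms |= found
        frontier = found  # only the newly found terms seed the next round
    return terms


# Toy data (invented for illustration only).
TOY_VECTORS: Dict[str, Vector] = {
    "terminology extraction": [1.0, 0.0, 0.0],
    "term recognition": [0.9, 0.1, 0.0],
    "keyword extraction": [0.8, 0.2, 0.0],
    "automatic indexing": [0.95, 0.0, 0.05],
    "football": [0.0, 0.0, 1.0],
}
TOY_PAPERS = ["We compare term recognition and keyword extraction methods."]

if __name__ == "__main__":
    print(sorted(extract_terms(["terminology extraction"], TOY_VECTORS, TOY_PAPERS)))
    # ['keyword extraction', 'term recognition', 'terminology extraction']
```

In this toy run, "automatic indexing" is the nearest neighbour of the seed but is discarded because it never occurs in the paper text, and the loop stops in the second round because no new term survives the filter. Since each round only re-queries newly found terms and the vocabulary is finite, the loop always terminates. With a real embedding model, a library such as gensim offers `KeyedVectors.most_similar` for the neighbour query.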

Keywords

Terminology extraction, Word2vec, Statistical approach

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Beijing Language and Culture University, Beijing, China
  2. University of Melbourne, Melbourne, Australia