A Domain Independent Approach for Extracting Terms from Research Papers

Jiang, Birong; Xun, Endong; Qi, Jianzhong

doi:10.1007/978-3-319-19548-3_13

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9093)

Included in the following conference series:

Australasian Database Conference

Abstract

We study the problem of extracting terms from research papers, which is an important step towards building knowledge graphs in research domains. Existing terminology extraction approaches are mostly domain dependent. They use domain-specific linguistic rules, supervised machine learning techniques, or a combination of the two to extract the terms. Using domain knowledge requires much human effort, e.g., manually composing a set of linguistic rules or labeling a large corpus, and hence limits the applicability of the existing approaches. To overcome this limitation, we propose a new terminology extraction approach that uses no knowledge from any specific domain. In particular, we use the title words and the keywords in research papers as the seeding terms, and we use word2vec to identify similar terms from an open-domain corpus as the candidate terms, which are then filtered by checking whether they occur in the research papers. We repeat this process using the newly found terms until no new candidate terms can be found. We conduct extensive experiments on the proposed approach. The results show that our approach can extract the terms effectively while being domain independent.
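The iterative procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy two-dimensional vectors stand in for word2vec embeddings trained on an open-domain corpus, and the names `most_similar`, `extract_terms`, `top_n`, `embeddings`, and `paper_words` are hypothetical, chosen only for illustration. The sketch shows seed terms being expanded with their nearest neighbours, candidates being kept only if they occur in the paper, and the loop stopping once a round yields no new term.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(term, embeddings, top_n):
    """Hypothetical stand-in for a word2vec nearest-neighbour query."""
    if term not in embeddings:
        return []
    query = embeddings[term]
    scored = [(cosine(query, vec), other)
              for other, vec in embeddings.items() if other != term]
    scored.sort(reverse=True)
    return [other for _, other in scored[:top_n]]


def extract_terms(seed_terms, embeddings, paper_words, top_n=2):
    """Expand seed terms until a round produces no new term."""
    terms = set(seed_terms)
    frontier = set(seed_terms)
    while frontier:
        # Candidate terms: neighbours of the terms found in the last round.
        candidates = set()
        for term in frontier:
            candidates.update(most_similar(term, embeddings, top_n))
        # Filter: keep only candidates that occur in the research paper.
        new_terms = {c for c in candidates if c in paper_words} - terms
        terms |= new_terms
        frontier = new_terms
    return terms


# Toy data (assumed): in practice the vectors would come from a trained model.
embeddings = {
    "index": (1.0, 0.1),
    "btree": (0.9, 0.2),
    "query": (0.8, 0.4),
    "cache": (0.75, 0.45),
    "join": (0.7, 0.5),
    "apple": (0.1, 0.9),
    "banana": (0.0, 1.0),
}
paper_words = {"index", "btree", "query", "join", "apple"}

found = extract_terms(["index"], embeddings, paper_words)
print(sorted(found))  # ['btree', 'index', 'join', 'query']
```

In this toy run, "cache" is proposed as a neighbour of "query" but is dropped because it does not occur in the paper, while "apple" occurs in the paper but is never proposed because it is not close to any extracted term.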

Birong Jiang: This work was done while Birong was a visiting student at the University of Melbourne.

Author information

Authors and Affiliations

Beijing Language and Culture University, Beijing, China
Birong Jiang & Endong Xun
University of Melbourne, Melbourne, Australia
Jianzhong Qi

Corresponding author

Correspondence to Jianzhong Qi.

Editor information

Editors and Affiliations

University of Queensland, Brisbane, Queensland, Australia
Mohamed A. Sharaf
Monash University, Clayton, Australia
Muhammad Aamir Cheema
The University of Melbourne, Melbourne, Australia
Jianzhong Qi

About this paper

Cite this paper

Jiang, B., Xun, E., Qi, J. (2015). A Domain Independent Approach for Extracting Terms from Research Papers. In: Sharaf, M., Cheema, M., Qi, J. (eds) Databases Theory and Applications. ADC 2015. Lecture Notes in Computer Science, vol 9093. Springer, Cham. https://doi.org/10.1007/978-3-319-19548-3_13

DOI: https://doi.org/10.1007/978-3-319-19548-3_13
Published: 28 May 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19547-6
Online ISBN: 978-3-319-19548-3
eBook Packages: Computer Science, Computer Science (R0)
