A Framework for Keyphrase Extraction from Scientific Journals

Daudaravicius, Vidas

doi:10.1007/978-3-319-53637-8_7

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9792)

Included in the following conference series:

International Workshop on Semantic, Analytics, Visualization

Abstract

We present a framework for keyphrase extraction from scientific journals in diverse research fields. While journal articles are often provided with manually assigned keywords, it is not clear how to automatically extract keywords and measure their significance for a set of journal articles. We compare extracted keyphrases from journals in the fields of astrophysics, mathematics, physics, and computer science. We show that the presented statistics-based framework is able to demonstrate differences among journals, and that the extracted keyphrases can be used to represent journal or conference research topics, dynamics, and specificity.
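
The abstract does not spell out which statistics the framework uses, so the following is only a minimal sketch of one way a phrase's significance for a set of journal articles could be measured. It assumes a TF-IDF-style score (relative frequency within a journal times the log of inverse journal frequency); the toy corpus, the function name `journal_keyphrases`, and the `top_n` parameter are all hypothetical and not taken from the chapter.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each journal maps to a list of candidate phrases.
# Illustrative data only, not taken from the chapter.
journals = {
    "astrophysics": ["black hole", "black hole", "magnetic field", "numerical simulation"],
    "mathematics": ["banach space", "banach space", "lie algebra", "numerical simulation"],
    "computer science": ["neural network", "neural network", "numerical simulation"],
}

def journal_keyphrases(journals, top_n=2):
    """Rank phrases per journal by relative frequency times inverse journal frequency.

    This is an assumed TF-IDF-style measure, not the chapter's own statistic.
    """
    n_journals = len(journals)
    # Number of journals in which each phrase occurs at least once.
    journal_freq = Counter(p for phrases in journals.values() for p in set(phrases))
    ranked = {}
    for name, phrases in journals.items():
        counts = Counter(phrases)
        total = len(phrases)
        scores = {
            p: (c / total) * math.log(n_journals / journal_freq[p])
            for p, c in counts.items()
        }
        # Sort by descending score, then alphabetically for a stable order.
        ranked[name] = sorted(scores, key=lambda p: (-scores[p], p))[:top_n]
    return ranked

result = journal_keyphrases(journals)
for name, phrases in result.items():
    print(name, phrases)
```

Under this assumed score, a phrase that occurs in every journal (here "numerical simulation") gets a score of zero, so journal-specific phrases rise to the top. That is one simple way to make differences among journals visible.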

Notes

1. See demo on-line: http://textmining.lt:8080/tex2txt.htm.
2. In our case, this is mathematical language. Other cases may include a mix of English and French paragraphs in the same article.
3. Function words are words that have little lexical meaning or have ambiguous meaning, but instead serve to express grammatical relationships with other words within a sentence (https://en.wikipedia.org/wiki/Function_word). For instance, "and", "or", "the", and "a" are all function words.
4. http://www.journals.elsevier.com/journal-of-functional-analysis/.
5. http://www.journals.elsevier.com/journal-of-algebra/.
6. http://www.journals.elsevier.com/advances-in-mathematics/.
7. http://www.springer.com/mathematics/journal/10440.
8. http://www.springer.com/astronomy/astrophysics+and+astroparticles/journal/10509.
9. https://en.wikipedia.org/wiki/Black_hole.
10. http://jane.biosemantics.org/.
11. http://helioblast.heliotext.com/.

Author information

Authors and Affiliations

VTeX, Vilnius, Lithuania
Vidas Daudaravicius

Corresponding author

Correspondence to Vidas Daudaravicius.

Editor information

Editors and Affiliations

Oxford e-Research Centre, University of Oxford, Oxford, United Kingdom
Alejandra González-Beltrán
Knowledge Media Institute, The Open University, Milton Keynes, United Kingdom
Francesco Osborne
Dept of Computer Sci & Engineering, University of Bologna, Bologna, Italy
Silvio Peroni

Cite this paper

Daudaravicius, V. (2016). A Framework for Keyphrase Extraction from Scientific Journals. In: González-Beltrán, A., Osborne, F., Peroni, S. (eds) Semantics, Analytics, Visualization. Enhancing Scholarly Data. SAVE-SD 2016. Lecture Notes in Computer Science, vol 9792. Springer, Cham. https://doi.org/10.1007/978-3-319-53637-8_7

DOI: https://doi.org/10.1007/978-3-319-53637-8_7
Published: 10 May 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-53636-1
Online ISBN: 978-3-319-53637-8
eBook Packages: Computer Science, Computer Science (R0)
