A Framework for Keyphrase Extraction from Scientific Journals

  • Conference paper
Semantics, Analytics, Visualization. Enhancing Scholarly Data (SAVE-SD 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9792)

Abstract

We present a framework for keyphrase extraction from scientific journals in diverse research fields. While journal articles are often provided with manually assigned keywords, it is not clear how to automatically extract keywords and measure their significance for a set of journal articles. We compare extracted keyphrases from journals in the fields of astrophysics, mathematics, physics, and computer science. We show that the presented statistics-based framework is able to demonstrate differences among journals, and that the extracted keyphrases can be used to represent journal or conference research topics, dynamics, and specificity.
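
As a rough illustration of what a statistics-based comparison across journals can look like, the sketch below scores candidate phrases with a plain TF-IDF weighting, treating each journal as one document. This is an assumption made for illustration and is not taken from the paper: the function name `journal_keyphrase_scores`, the toy data, and the choice of TF-IDF itself are all hypothetical.

```python
import math
from collections import Counter


def journal_keyphrase_scores(journals):
    """Score phrases per journal with a plain TF-IDF weighting.

    `journals` maps a journal name to a list of candidate phrases
    (assumed to be extracted already); each journal is treated as
    one "document". This is an illustrative stand-in, not the
    paper's actual scoring method.
    """
    n = len(journals)
    counts = {name: Counter(phrases) for name, phrases in journals.items()}

    # Document frequency: in how many journals does each phrase occur?
    df = Counter()
    for c in counts.values():
        df.update(c.keys())

    scores = {}
    for name, c in counts.items():
        total = sum(c.values())
        # Relative frequency in the journal times log inverse journal frequency.
        scores[name] = {
            phrase: (k / total) * math.log(n / df[phrase])
            for phrase, k in c.items()
        }
    return scores


# Toy, made-up data for illustration only.
toy = {
    "astro": ["black hole", "black hole", "data set", "accretion disk"],
    "math": ["banach space", "data set", "banach space", "lie algebra"],
}
scores = journal_keyphrase_scores(toy)
top_astro = max(scores["astro"], key=scores["astro"].get)
```

With this toy input, a phrase shared by every journal (`data set`) gets a score of zero, while phrases specific to one journal rank higher.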

Notes

  1. See demo on-line: http://textmining.lt:8080/tex2txt.htm.

  2. In our case, this is mathematical language. Other cases may include a mix of English and French paragraphs in the same article.

  3. Function words are words that have little lexical meaning or have ambiguous meaning, but instead serve to express grammatical relationships with other words within a sentence (https://en.wikipedia.org/wiki/Function_word). For instance, and, or, the, and a are all function words.

  4. http://www.journals.elsevier.com/journal-of-functional-analysis/.

  5. http://www.journals.elsevier.com/journal-of-algebra/.

  6. http://www.journals.elsevier.com/advances-in-mathematics/.

  7. http://www.springer.com/mathematics/journal/10440.

  8. http://www.springer.com/astronomy/astrophysics+and+astroparticles/journal/10509.

  9. https://en.wikipedia.org/wiki/Black_hole.

  10. http://jane.biosemantics.org/.

  11. http://helioblast.heliotext.com/.

Author information

Corresponding author

Correspondence to Vidas Daudaravicius.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Daudaravicius, V. (2016). A Framework for Keyphrase Extraction from Scientific Journals. In: González-Beltrán, A., Osborne, F., Peroni, S. (eds) Semantics, Analytics, Visualization. Enhancing Scholarly Data. SAVE-SD 2016. Lecture Notes in Computer Science, vol 9792. Springer, Cham. https://doi.org/10.1007/978-3-319-53637-8_7

  • DOI: https://doi.org/10.1007/978-3-319-53637-8_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-53636-1

  • Online ISBN: 978-3-319-53637-8

  • eBook Packages: Computer Science, Computer Science (R0)
