Keyword Extraction from Parallel Abstracts of Scientific Publications

  • Conference paper
Semantic Keyword-Based Search on Structured Data Sources (IKC 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10546)

Abstract

In this paper, we study keyword extraction from parallel abstracts of scientific publications in the Serbian and English languages. The keywords are extracted by the selectivity-based keyword extraction (SBKE) method, which is based on the structural and statistical properties of text represented as a complex network. The constructed parallel corpus of scientific abstracts with annotated keywords allows a better comparison of the method's performance across languages, since it provides a controlled experimental environment and data. The keyword extraction results, measured with the F1 score, are 49.57% for English and 46.73% for Serbian if we disregard keywords that are not present in the abstracts. When we evaluate against the whole keyword set, the F1 scores are 40.08% and 45.71%, respectively. This work shows that SBKE can be easily ported to a new language, domain, and type of text in the sense of its structure. Still, there is a drawback: the method can extract only words that appear in the text.
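
The abstract does not spell out the selectivity measure or the evaluation procedure, so the following is only a minimal sketch under stated assumptions. It assumes that selectivity is a node's strength (sum of link weights) divided by its degree in a word co-occurrence network, uses an undirected network with single-word keywords for brevity, and invents a toy token list and keyword set purely for illustration. The function names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def selectivity_scores(tokens):
    """Score each word by strength / degree in an undirected co-occurrence network.

    Assumption: two words are linked when they are adjacent in the text,
    and the link weight counts how often that happens.
    """
    weights = defaultdict(int)
    for left, right in zip(tokens, tokens[1:]):
        if left != right:  # ignore self-loops
            weights[frozenset((left, right))] += 1

    strength = defaultdict(int)  # sum of weights on a node's links
    degree = defaultdict(int)    # number of distinct neighbours
    for edge, weight in weights.items():
        for node in edge:
            strength[node] += weight
            degree[node] += 1
    return {node: strength[node] / degree[node] for node in strength}


def f1_score(extracted, gold):
    """Harmonic mean of precision and recall over two keyword sets."""
    extracted, gold = set(extracted), set(gold)
    hits = len(extracted & gold)
    if hits == 0:
        return 0.0
    precision = hits / len(extracted)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only.
tokens = "keyword extraction network keyword extraction method".split()
annotated = {"keyword", "extraction", "selectivity"}

scores = selectivity_scores(tokens)
extracted = sorted(scores, key=scores.get, reverse=True)[:2]

# Setting 1: disregard annotated keywords that never occur in the text.
present_only = annotated & set(tokens)
f1_present = f1_score(extracted, present_only)

# Setting 2: evaluate against the whole annotated keyword set.
f1_all = f1_score(extracted, annotated)

print(extracted, f1_present, f1_all)
```

In this toy case the annotated keyword "selectivity" never occurs in the text, so it can never be extracted. Recall, and with it F1, is therefore lower in the second setting, which mirrors the drawback named at the end of the abstract.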

Notes

  1. http://jerteh.rs/biblisha/ListaDokumenata.aspx?JCID=2&lng=en

  2. Bilingual Serbian-English KE dataset is publicly available from http://langnet.uniri.hr/resources.html.

Acknowledgments

The authors would like to acknowledge networking support by the ICT COST Action IC1302 KEYSTONE – Semantic keyword-based search on structured data sources (www.keystone-cost.eu). The authors would also like to thank the University of Rijeka for the support under the LangNet project (13.13.2.2.07).

Author information

Corresponding author

Correspondence to Slobodan Beliga.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Beliga, S., Kitanović, O., Stanković, R., Martinčić-Ipšić, S. (2018). Keyword Extraction from Parallel Abstracts of Scientific Publications. In: Szymański, J., Velegrakis, Y. (eds) Semantic Keyword-Based Search on Structured Data Sources. IKC 2017. Lecture Notes in Computer Science, vol 10546. Springer, Cham. https://doi.org/10.1007/978-3-319-74497-1_5

  • DOI: https://doi.org/10.1007/978-3-319-74497-1_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-74496-4

  • Online ISBN: 978-3-319-74497-1

  • eBook Packages: Computer Science, Computer Science (R0)
