Abstract
In this paper, we study keyword extraction from parallel abstracts of scientific publications in the Serbian and English languages. The keywords are extracted by the selectivity-based keyword extraction (SBKE) method, which is based on the structural and statistical properties of text represented as a complex network. The constructed parallel corpus of scientific abstracts with annotated keywords allows a better comparison of the method's performance across languages, because the experimental environment and data are controlled. Measured with the F1 score, the keyword extraction results are 49.57% for English and 46.73% for Serbian if we disregard keywords that are not present in the abstracts. When we evaluate against the whole keyword set, the F1 scores are 40.08% and 45.71%, respectively. This work shows that SBKE can be easily ported to a new language, domain, and type of text in the sense of its structure. Still, there is a drawback: the method can extract only words that appear in the text.
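The two evaluation settings described above can be illustrated with a minimal sketch. It assumes that selectivity is a node's strength (the sum of its link weights) divided by its degree in an undirected word co-occurrence network, which is a simplification: the full SBKE method involves further steps that are not reproduced here. The whitespace tokenization, the function names, and the top-k cutoff are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def selectivity_ranking(tokens):
    """Rank words by selectivity, assumed here to be strength / degree."""
    # Undirected co-occurrence network: adjacent words share a weighted link.
    weights = defaultdict(int)
    for left, right in zip(tokens, tokens[1:]):
        if left != right:
            weights[frozenset((left, right))] += 1

    strength = defaultdict(int)  # sum of link weights per word
    degree = defaultdict(int)    # number of distinct neighbours per word
    for edge, weight in weights.items():
        for node in edge:
            strength[node] += weight
            degree[node] += 1

    scores = {node: strength[node] / degree[node] for node in degree}
    # Highest selectivity first; ties are broken alphabetically.
    return sorted(scores, key=lambda node: (-scores[node], node))


def f1_score(extracted, gold):
    """Harmonic mean of precision and recall over two keyword sets."""
    extracted, gold = set(extracted), set(gold)
    hits = len(extracted & gold)
    if hits == 0:
        return 0.0
    precision = hits / len(extracted)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


def evaluate(tokens, gold, top_k):
    """Score the top_k candidates under both evaluation settings."""
    extracted = selectivity_ranking(tokens)[:top_k]
    # Setting 1: disregard annotated keywords that never occur in the text.
    present_gold = set(gold) & set(tokens)
    return {
        "present_only": f1_score(extracted, present_gold),
        "whole_set": f1_score(extracted, gold),
    }


if __name__ == "__main__":
    text = "complex network keyword extraction complex network method"
    print(evaluate(text.split(), {"network", "graph"}, top_k=2))
```

In this toy input the annotated keyword "graph" never occurs in the text, so it lowers recall only in the whole-set setting. The same effect explains why the scores against the whole keyword set are lower than the scores restricted to keywords present in the abstracts.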
Notes
The bilingual Serbian-English KE dataset is publicly available from http://langnet.uniri.hr/resources.html.
Acknowledgments
The authors would like to acknowledge networking support by the ICT COST Action IC1302 KEYSTONE – Semantic keyword-based search on structured data sources (www.keystone-cost.eu). The authors would also like to thank the University of Rijeka for the support under the LangNet project (13.13.2.2.07).
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Beliga, S., Kitanović, O., Stanković, R., Martinčić-Ipšić, S. (2018). Keyword Extraction from Parallel Abstracts of Scientific Publications. In: Szymański, J., Velegrakis, Y. (eds) Semantic Keyword-Based Search on Structured Data Sources. IKC 2017. Lecture Notes in Computer Science, vol 10546. Springer, Cham. https://doi.org/10.1007/978-3-319-74497-1_5
DOI: https://doi.org/10.1007/978-3-319-74497-1_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-74496-4
Online ISBN: 978-3-319-74497-1
eBook Packages: Computer Science (R0)