Abstract
Extracting keyphrases from full text is a daunting task because many different concepts and themes are intertwined and term variations are extensive. In this chapter, we propose a novel unsupervised keyphrase extraction system, BioKeySpotter, which incorporates lexical syntactic features to weigh candidate keyphrases. The main contribution of our study is that BioKeySpotter combines Natural Language Processing (NLP), information extraction, and integration techniques to extract keyphrases from full text. The experimental results demonstrate that BioKeySpotter achieves higher accuracy than the supervised learning algorithms it was compared against.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Song, M., Tanapaisankit, P. (2012). BioKeySpotter: An Unsupervised Keyphrase Extraction Technique in the Biomedical Full-Text Collection. In: Holmes, D., Jain, L. (eds) Data Mining: Foundations and Intelligent Paradigms. Intelligent Systems Reference Library, vol 25. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23151-3_3
DOI: https://doi.org/10.1007/978-3-642-23151-3_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23150-6
Online ISBN: 978-3-642-23151-3
eBook Packages: Engineering, Engineering (R0)