Machine Learning for the Identification and Classification of Key Phrases from Clinical Documents in Spanish

  • Mireya Tovar Vidal
  • Emmanuel Santos Rodríguez
  • José A. Reyes-Ortiz
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1069)


Key phrases play an important role because they characterize the content of a text concisely and can even help answer questions about it. For this reason, the extraction and classification of key phrases is a relevant problem in areas such as Information Retrieval and Natural Language Processing. This research proposes a solution for identifying and classifying key phrases, using machine learning algorithms, in electronic documents on health topics written in Spanish. According to the experimental results, the proposed algorithm correctly classifies 94% of key phrases and achieves 72% precision in the identification phase.


Key phrases extraction, Natural Language Processing, Machine learning



This work is supported by the Sectoral Research Fund for Education with the CONACyT project 257357, and partially supported by the VIEP-BUAP project.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Mireya Tovar Vidal (1)
  • Emmanuel Santos Rodríguez (1)
  • José A. Reyes-Ortiz (2)
  1. Faculty of Computer Science, Benemerita Universidad Autonoma de Puebla, Puebla, Mexico
  2. Universidad Autonoma Metropolitana, Mexico City, Mexico
