A Text Mining-Based Approach for Analyzing Information Retrieval in Spanish: Music Data Collection as a Case Study

  • Juan Ramos-González
  • Lucía Martín-Gómez
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 801)


This paper presents a text mining-based search approach for information retrieval in Spanish. A tool has been developed to facilitate and automate analysis and retrieval: it lets the user apply different analyzers when running a query, index and delete documents stored in the system, and evaluate the retrieval process. A dataset of 27 songs is used as a case study. Different queries were run to investigate which approaches best fit the Spanish language and how their suitability depends on the query text.
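The workflow described in the abstract (index documents, delete them, query with interchangeable analyzers, evaluate retrieval) can be illustrated with a small in-memory sketch. This is not the authors' tool: `TinyIndex`, `standard_analyzer`, `light_stem_analyzer`, and `compare` are hypothetical names, the suffix-stripping rule is a toy stand-in for a real Spanish stemmer, and the three sample texts and the relevance judgement are invented for illustration rather than taken from the 27-song dataset.

```python
import re
import unicodedata
from collections import defaultdict


def strip_accents(text):
    """Remove diacritics so 'canción' and 'cancion' match (also folds 'ñ' to 'n')."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def standard_analyzer(text):
    """Lowercase, strip accents, and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", strip_accents(text.lower()))


# Toy suffix list (an assumption for illustration, not a real stemming algorithm).
SUFFIXES = ("es", "as", "os", "e", "a", "o", "s")


def light_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def light_stem_analyzer(text):
    """Standard analysis followed by the toy suffix stripping."""
    return [light_stem(token) for token in standard_analyzer(text)]


class TinyIndex:
    """Minimal inverted index with a pluggable analyzer."""

    def __init__(self, analyzer):
        self.analyzer = analyzer
        self.postings = defaultdict(set)  # term -> set of document ids
        self.docs = {}

    def index(self, doc_id, text):
        self.delete(doc_id)  # re-indexing replaces any previous version
        self.docs[doc_id] = text
        for term in self.analyzer(text):
            self.postings[term].add(doc_id)

    def delete(self, doc_id):
        if self.docs.pop(doc_id, None) is None:
            return
        for ids in self.postings.values():
            ids.discard(doc_id)

    def search(self, query):
        """Return ids of documents containing any analyzed query term."""
        hits = set()
        for term in self.analyzer(query):
            hits |= self.postings.get(term, set())
        return hits


def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    true_pos = len(retrieved & relevant)
    precision = true_pos / len(retrieved) if retrieved else 0.0
    recall = true_pos / len(relevant) if relevant else 0.0
    return precision, recall


# Invented sample data; not from the paper's dataset.
SAMPLE_DOCS = {
    "s1": "Canciones para la noche",
    "s2": "Una canción de amor",
    "s3": "El camino del sol",
}
RELEVANT = {"s1", "s2"}  # hypothetical relevance judgement for the query "canción"


def compare(query="canción"):
    """Run the same query under each analyzer and report (precision, recall)."""
    results = {}
    analyzers = (("standard", standard_analyzer), ("light_stem", light_stem_analyzer))
    for name, analyzer in analyzers:
        idx = TinyIndex(analyzer)
        for doc_id, text in SAMPLE_DOCS.items():
            idx.index(doc_id, text)
        results[name] = precision_recall(idx.search(query), RELEVANT)
    return results
```

Calling `compare()` on this toy data shows the kind of contrast the study looks at: without stemming the query misses the plural form "Canciones", while the suffix-stripping analyzer retrieves both documents. A production setup would more plausibly rely on an established Spanish stemmer and a search engine's built-in analyzers rather than the hand-written rule used here.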


Text mining, Information retrieval, Stemming, Spanish



This work has been supported by project MOVIURBAN: Máquina social para la gestión sostenible de ciudades inteligentes: movilidad urbana, datos abiertos, sensores móviles (SA070U 16). The project is co-financed by Junta de Castilla y León, Consejería de Educación, and FEDER funds. In addition, the research of Juan Ramos González has been co-financed by the European Social Fund and Junta de Castilla y León (Operational Programme 2014-2020 for Castilla y León, BOCYL EDU/602/2016).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. BISITE Digital Innovation Hub, University of Salamanca, Edificio Multiusos I+D+i, Salamanca, Spain
