Information retrieval methodology for aiding scientific database search

  • Samuel Marcos-Pablos
  • Francisco J. García-Peñalvo


During literature reviews, and especially when conducting systematic literature reviews, finding and screening relevant papers may involve managing and processing large amounts of unstructured text data. When the search topic is difficult to establish or has fuzzy limits, researchers need to broaden the scope of the search and, as a consequence, the data retrieved from scientific publications may become large and uncorrelated. However, by suitably analysing these data the researcher may discover new knowledge hidden within the search output, thus exploring the limits of the search and enhancing the review scope. With that aim, this paper presents an iterative methodology that applies text mining and machine learning techniques to a corpus of abstracts downloaded from scientific databases, combining automatic processing algorithms with tools for supervised decision-making in an iterative process based on the researchers’ judgement, so as to adapt, screen and tune the search output. The paper ends by showing a working example that employs a set of scripts developed to implement the different stages of the proposed methodology.
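As a rough illustration of the kind of processing the abstract refers to, the sketch below ranks a handful of abstracts against a query using a TF-IDF vector space model and cosine similarity. It is not the authors' scripts: the function names, the toy abstracts and the smoothed idf formula are all assumptions made for this example, and the supervised classification step (for instance with a support vector machine) is not covered.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphabetic tokens (no stemming, no stop-word removal)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document, plus the idf table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed idf (an assumption of this sketch) so that no weight is zero or negative.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({t: c * idf[t] for t, c in counts.items()})
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_abstracts(query, abstracts):
    """Return (score, index) pairs sorted from most to least similar to the query."""
    vectors, idf = tfidf_vectors(abstracts)
    q_counts = Counter(tokenize(query))
    # Query terms that never occur in the corpus carry no information and are dropped.
    q_vec = {t: c * idf[t] for t, c in q_counts.items() if t in idf}
    scores = [(cosine(q_vec, vec), i) for i, vec in enumerate(vectors)]
    return sorted(scores, reverse=True)


# Toy abstracts, invented for illustration only.
abstracts = [
    "Text mining supports screening in systematic literature reviews.",
    "Support vector machines classify documents by relevance.",
    "Robotic assistants help elderly people at home.",
]
ranked = rank_abstracts("systematic literature review screening", abstracts)
for score, index in ranked:
    print(round(score, 3), abstracts[index])
```

In an iterative setting, the researcher would inspect the top-ranked abstracts, adjust the query or the labelled examples, and rerun the ranking. Because this sketch does no stemming, "review" and "reviews" are treated as different terms.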


Keywords: Information retrieval; Systematic literature review; Text mining; Vector space model; Support vector machine



This work has been partially funded by the Spanish Government Ministry of Economy and Competitiveness through the DEFINES project (Ref. TIN2016-80172-R) and by the Ministry of Education of the Junta de Castilla y León (Spain) through the T-CUIDA project (Ref. SA061P17).

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. GRIAL Research Group, Research Institute for Educational Sciences, University of Salamanca, Salamanca, Spain
