Improving Query Expansion with Stemming Terms: A New Genetic Algorithm Approach

  • Conference paper
Evolutionary Computation in Combinatorial Optimization (EvoCOP 2008)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4972)

Abstract

Nowadays, searching for information on the web or in any kind of document collection has become one of the most frequent activities. However, user queries can be formulated in a way that hinders the retrieval of the requested information. The objective of automatic query transformation is to improve the quality of the retrieved information. This paper describes a new genetic algorithm that changes the set of terms composing a user query without user supervision, complementing an expansion process based on a morphological thesaurus. We apply a stemming process to obtain the stem of each word, for which the thesaurus provides the different forms. The set of candidate query terms is constructed by expanding each term in the original query with its morphologically related terms. The genetic algorithm selects the terms of the final query from the candidate term set. The selection process is based on the retrieval results obtained when searching with different combinations of candidate terms. We have obtained encouraging results, improving performance on a standard set of tests.
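
The pipeline outlined in the abstract (stem each query term, expand it through a morphological thesaurus, then let a genetic algorithm pick the final terms by retrieval quality) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and not taken from the paper: the `toy_stem` suffix stripper, the `THESAURUS` entries, the four-document corpus with its relevance judgments, the F1-based fitness, and the GA operators and parameters (tournament selection, one-point crossover, bit-flip mutation, elitism).

```python
import random

# Hypothetical morphological thesaurus: stem -> known surface forms.
THESAURUS = {
    "retriev": ["retrieve", "retrieval", "retrieving"],
    "quer": ["query", "queries"],
}

# Hypothetical toy corpus and relevance judgments for a single query.
DOCS = {
    0: {"queries", "expand", "terms"},
    1: {"retrieval", "retrieving", "documents"},
    2: {"retrieval", "cooking", "recipes"},
    3: {"weather", "report"},
}
RELEVANT = {0, 1}


def toy_stem(word):
    """Crude suffix stripper standing in for a real stemmer."""
    for suffix in ("ing", "al", "ies", "es", "e", "y", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def candidate_terms(query):
    """Expand each query term with the thesaurus forms sharing its stem."""
    terms = []
    for word in query.lower().split():
        for form in [word] + THESAURUS.get(toy_stem(word), []):
            if form not in terms:
                terms.append(form)
    return terms


def fitness(mask, terms):
    """F1 of the documents matching any selected term (assumed measure)."""
    selected = {t for t, bit in zip(terms, mask) if bit}
    retrieved = {d for d, words in DOCS.items() if words & selected}
    hits = len(retrieved & RELEVANT)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)
    recall = hits / len(RELEVANT)
    return 2 * precision * recall / (precision + recall)


def evolve(terms, original_mask, generations=30, pop_size=12, seed=0):
    """Evolve bit masks over the candidate terms; 1 means 'keep the term'."""
    rng = random.Random(seed)
    n = len(terms)
    # Seed the population with the original query plus random selections.
    population = [list(original_mask)] + [
        [rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=lambda m: fitness(m, terms), reverse=True)
        next_gen = [population[0][:]]  # elitism: the best mask survives
        while len(next_gen) < pop_size:
            # Binary tournament selection for each parent.
            a = max(rng.sample(population, 2), key=lambda m: fitness(m, terms))
            b = max(rng.sample(population, 2), key=lambda m: fitness(m, terms))
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability 0.1 per position.
            child = [bit ^ 1 if rng.random() < 0.1 else bit for bit in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=lambda m: fitness(m, terms))


terms = candidate_terms("query retrieval")
original = [1 if t in ("query", "retrieval") else 0 for t in terms]
best = evolve(terms, original)
print([t for t, bit in zip(terms, best) if bit], fitness(best, terms))
```

Because the original query is placed in the initial population and the best individual is always carried over, the selected query in this sketch can never score below the unexpanded one on the toy corpus. A real evaluation would replace the corpus, relevance judgments, and fitness with those of a retrieval engine and a standard test collection.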

Supported by projects TIN2007-68083-C02-01 and TIN2007-67581-C02-01.

Editor information

Jano van Hemert, Carlos Cotta

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Araujo, L., Pérez-Agüera, J.R. (2008). Improving Query Expansion with Stemming Terms: A New Genetic Algorithm Approach. In: van Hemert, J., Cotta, C. (eds) Evolutionary Computation in Combinatorial Optimization. EvoCOP 2008. Lecture Notes in Computer Science, vol 4972. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78604-7_16

  • DOI: https://doi.org/10.1007/978-3-540-78604-7_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78603-0

  • Online ISBN: 978-3-540-78604-7

  • eBook Packages: Computer Science, Computer Science (R0)
