Abstract
Nowadays, searching for information on the web or in any kind of document collection has become one of the most frequent activities. However, user queries can be formulated in a way that hinders the retrieval of the requested information. The objective of automatic query transformation is to improve the quality of the retrieved information. This paper describes a new genetic algorithm that changes the set of terms composing a user query without user supervision, complementing an expansion process based on a morphological thesaurus. We apply a stemming process to obtain the stem of each word, for which the thesaurus provides the different forms. The set of candidate query terms is constructed by expanding each term in the original query with its morphologically related terms. The genetic algorithm is in charge of selecting the terms of the final query from the candidate term set. The selection process is based on the retrieval results obtained when searching with different combinations of candidate terms. We have obtained encouraging results, improving the performance on a standard set of tests.
Supported by projects TIN2007-68083-C02-01 and TIN2007-67581-C02-01.
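The pipeline outlined in the abstract (stem each query term, expand it through a morphological thesaurus, then let a genetic algorithm pick a subset of the candidates according to retrieval results) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and not the authors' implementation: the toy documents and relevance judgments, the crude `stem` function (a stand-in for a real stemmer), the hand-written `THESAURUS`, Boolean OR retrieval, the F1 fitness, and the GA operators and parameters (bit-mask individuals, binary tournament, one-point crossover, bit-flip mutation, elitism).

```python
import random

# Toy collection and relevance judgments: invented for illustration only.
DOCS = {
    "d1": "genetic algorithms evolve better terms",
    "d2": "the algorithm retrieved relevant documents",
    "d3": "retrieval of documents with expanded queries",
    "d4": "cooking recipes with garlic",
    "d5": "querying a database of recipes",
}
RELEVANT = {"d1", "d2", "d3"}
QUERY = ["algorithm", "retrieval", "query"]

# Hypothetical morphological thesaurus: stem -> known surface forms.
THESAURUS = {
    "algorithm": ["algorithm", "algorithms"],
    "retriev": ["retrieval", "retrieved", "retrieving"],
    "query": ["query", "querying"],
}


def stem(word):
    """Crude suffix stripper standing in for a real stemmer."""
    for suffix in ("ing", "ies", "ed", "al", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def expand(query):
    """Candidate set: each original term plus the forms sharing its stem."""
    candidates = []
    for term in query:
        for form in [term] + THESAURUS.get(stem(term), []):
            if form not in candidates:
                candidates.append(form)
    return candidates


def retrieve(terms):
    """Boolean OR retrieval: a document matches if it contains any term."""
    wanted = set(terms)
    return {d for d, text in DOCS.items() if wanted & set(text.split())}


def f1(terms):
    """Fitness (an assumption): F1 of the retrieved set against RELEVANT."""
    hits = retrieve(terms)
    true_pos = len(hits & RELEVANT)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(hits)
    recall = true_pos / len(RELEVANT)
    return 2 * precision * recall / (precision + recall)


def decode(mask, candidates):
    """Turn a bit mask into the list of selected candidate terms."""
    return [term for bit, term in zip(mask, candidates) if bit]


def evolve(query, generations=30, pop_size=12, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    candidates = expand(query)

    def fitness(mask):
        return f1(decode(mask, candidates))

    # Seed the population with the original query so it is never lost.
    original = tuple(int(term in query) for term in candidates)
    pop = [original] + [
        tuple(rng.randint(0, 1) for _ in candidates) for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        nxt = [max(pop, key=fitness)]  # elitism: keep the current best
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 2), key=fitness)  # binary tournament
            b = max(rng.sample(pop, 2), key=fitness)
            cut = rng.randrange(1, len(candidates))  # one-point crossover
            child = a[:cut] + b[cut:]
            child = tuple(bit ^ (rng.random() < p_mut) for bit in child)
            nxt.append(child)
        pop = nxt
    best = max(pop, key=fitness)
    return decode(best, candidates), fitness(best)


if __name__ == "__main__":
    terms, score = evolve(QUERY)
    print("original query F1:", f1(QUERY))
    print("full expansion F1:", f1(expand(QUERY)))
    print("selected terms:", terms, "F1:", score)
```

On this toy data the original query scores an F1 of 0.8 and the fully expanded query about 0.857, because the variant `querying` pulls in a non-relevant document. This is the kind of case where selecting among the candidates, rather than adding all of them, can help. Because the initial population contains the original query and the best individual is always carried over, the selected query in this sketch never scores below the original one.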
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Araujo, L., Pérez-Agüera, J.R. (2008). Improving Query Expansion with Stemming Terms: A New Genetic Algorithm Approach. In: van Hemert, J., Cotta, C. (eds) Evolutionary Computation in Combinatorial Optimization. EvoCOP 2008. Lecture Notes in Computer Science, vol 4972. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78604-7_16
DOI: https://doi.org/10.1007/978-3-540-78604-7_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78603-0
Online ISBN: 978-3-540-78604-7
eBook Packages: Computer Science, Computer Science (R0)