Genetic Algorithm Based Model for Effective Document Retrieval

  • Hazra Imran
  • Aditi Sharan
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 70)


A central problem of information retrieval is determining the relevance of documents to a user's information need. The choice of similarity measure is crucial to the search effectiveness of a retrieval system, and several measures have been suggested for matching queries against documents. Popular ones include cosine, Jaccard, Dice, and Okapi, each with its own strengths and weaknesses. Consequently, one measure may give better results than another depending on the user's need, the document corpus, and how the corpus is organized and indexed. It is therefore reasonable to combine these measures into a new one, which we call the combined similarity measure. Each individual measure is assigned a weight in the combined similarity measure, and these weights must be optimized in order to optimize the ranking of relevant documents. In this chapter we propose a genetic algorithm based model for learning the weights of the individual components of the combined similarity measure. We consider two types of fitness function for evaluating the goodness of a solution: non-order based and order based. A non-order based fitness function depends on recall and precision values only. However, it has been observed that a better fitness function can be obtained by also taking into account the order in which relevant documents are retrieved, which leads to the idea of order based fitness functions. We evaluated the efficacy of the genetic algorithm with various fitness functions. The experiments were carried out on a TREC data collection, and the results were compared with those of several well-known similarity measures.
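The idea can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the choice of three measures (Okapi is omitted), their vector-space forms, the use of precision at a cutoff as the non-order based fitness and average precision as the order based fitness, and every function name and parameter below are assumptions made for illustration only.

```python
import math
import random

# Queries and documents are dicts mapping term -> weight (an assumed representation).
def _parts(q, d):
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    return dot, sum(v * v for v in q.values()), sum(v * v for v in d.values())

def cosine(q, d):
    dot, sq, sd = _parts(q, d)
    return dot / math.sqrt(sq * sd) if sq and sd else 0.0

def jaccard(q, d):
    dot, sq, sd = _parts(q, d)
    denom = sq + sd - dot
    return dot / denom if denom else 0.0

def dice(q, d):
    dot, sq, sd = _parts(q, d)
    return 2 * dot / (sq + sd) if sq + sd else 0.0

MEASURES = (cosine, jaccard, dice)

def combined_similarity(weights, q, d):
    # Weighted sum of the individual measures; the weights are what the GA learns.
    return sum(w * m(q, d) for w, m in zip(weights, MEASURES))

def rank(weights, q, docs):
    # docs maps doc_id -> term-weight dict; returns doc ids, best match first.
    return sorted(docs, key=lambda i: combined_similarity(weights, q, docs[i]),
                  reverse=True)

def precision_at_k(ranking, relevant, k):
    # Non-order based (assumed form): only counts relevant documents in the top k.
    return sum(1 for i in ranking[:k] if i in relevant) / k

def average_precision(ranking, relevant):
    # Order based (assumed form): relevant documents ranked earlier score higher.
    hits, total = 0, 0.0
    for pos, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / pos
    return total / len(relevant) if relevant else 0.0

def evolve(q, docs, relevant, fitness, pop_size=20, generations=30, seed=0):
    # Hypothetical GA: truncation selection, one-point crossover, Gaussian mutation.
    rng = random.Random(seed)
    pop = [[rng.random() for _ in MEASURES] for _ in range(pop_size)]

    def score(w):
        return fitness(rank(w, q, docs), relevant)

    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]          # keep the fitter half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(MEASURES))
            child = a[:cut] + b[cut:]           # one-point crossover
            i = rng.randrange(len(child))       # mutate one gene, clamp to [0, 1]
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        pop = parents + children
    return max(pop, key=score)
```

With relevant documents `{"a", "d"}`, the rankings `["a", "b", "c", "d"]` and `["a", "d", "b", "c"]` have the same precision at cutoff 4, but the second has a higher average precision. This is the kind of distinction an order based fitness function is meant to capture.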


Document ranking, Genetic algorithms, Similarity measures, Information retrieval, Vector space model


Copyright information

© Springer Science+Business Media B.V. 2011

Authors and Affiliations

  1. Department of Computer Science, New Delhi, India
  2. School of Computer and System Sciences, Jawaharlal Nehru University, New Delhi, India
