Large Population or Many Generations for Genetic Algorithms? Implications in Information Retrieval

  • Dana Vrajitoru
Part of the Studies in Fuzziness and Soft Computing book series (STUDFUZZ, volume 50)

Abstract

Artificial intelligence models can be used to improve the performance of information retrieval (IR) systems, and genetic algorithms (GAs) are one example of such a model. This paper presents an application of GAs as a relevance feedback method aiming to improve document representation and indexing. In this particular form of GA, various document descriptions compete with each other, and a better collection indexing is sought through reproduction, crossover, and mutation operations. Within this paradigm, we search for the optimal balance between two genetic parameters: the population size and the number of generations. We try to discover the optimal parameter choice both through experiments on the CACM and CISI collections and through a theoretical analysis that explains the experimental results. The general conclusion is that larger populations tend to have a better chance of significantly improving retrieval effectiveness.
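The trade-off between population size and number of generations can be illustrated with a minimal sketch. It is not the chapter's method: the bit-vector encoding, the toy fitness function (agreement with a hidden target vector, standing in for retrieval quality), the function name `run_ga`, and all parameter values are assumptions chosen only for illustration. The sketch applies reproduction, one-point crossover, and mutation, and compares two configurations that use the same total number of fitness evaluations.

```python
import random


def run_ga(pop_size, generations, n_bits=32, mutation_rate=0.01, seed=0):
    """Toy GA: evolve bit vectors toward a hidden target vector.

    The encoding, fitness, and defaults are illustrative assumptions,
    not the setup used in the chapter.
    """
    rng = random.Random(seed)
    target = [rng.randint(0, 1) for _ in range(n_bits)]

    def fitness(ind):
        # Count positions that agree with the target (stand-in for retrieval quality).
        return sum(a == b for a, b in zip(ind, target))

    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(fitness(ind) for ind in population)

    for _ in range(generations):
        scores = [fitness(ind) for ind in population]
        # Reproduction: fitness-proportional selection of parents.
        weights = [s + 1 for s in scores]  # +1 avoids an all-zero weight list
        parents = rng.choices(population, weights=weights, k=pop_size)
        next_pop = []
        for i in range(0, pop_size, 2):
            p1, p2 = parents[i], parents[(i + 1) % pop_size]
            cut = rng.randint(1, n_bits - 1)  # one-point crossover position
            for child in (p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]):
                # Mutation: flip each bit with a small probability.
                next_pop.append([b ^ 1 if rng.random() < mutation_rate else b for b in child])
        population = next_pop[:pop_size]
        best = max(best, max(fitness(ind) for ind in population))
    return best


# Same budget of pop_size * generations = 1000 evaluations per configuration.
for pop_size, generations in [(200, 5), (10, 100)]:
    print(pop_size, generations, run_ga(pop_size, generations))
```

Which configuration scores higher on this toy problem depends on the seed and the fitness landscape, so the sketch says nothing about the CACM or CISI results. It only shows how the two parameters share a fixed evaluation budget.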

Keywords

Genetic Algorithm; Information Retrieval; Relevant Document; Average Precision; Relevance Feedback
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Dana Vrajitoru (1, 2)
  1. Computer Science Department, University of Neuchâtel, Neuchâtel, Switzerland
  2. Department of Mathematics, EPFL, Lausanne, Switzerland
