Information Retrieval, Volume 11, Issue 1, pp 1–24

An effective and efficient results merging strategy for multilingual information retrieval in federated search environments

  • Luo Si
  • Jamie Callan
  • Suleyman Cetintas
  • Hao Yuan

Abstract

Multilingual information retrieval is generally understood to mean the retrieval of relevant information in multiple target languages in response to a user query in a single source language. In a multilingual federated search environment, different information sources contain documents in different languages. A general search strategy in such environments is to translate the user query into the language of each information source and run a monolingual search in each source. The individual ranked lists, which are in different languages, must then be merged into a single ranked document list. This is known as the results merging problem for multilingual information retrieval. Previous research has shown that the simple approach of normalizing source-specific document scores is not effective. On the other hand, a more effective merging method downloads all retrieved documents, translates them into the source language, and generates the final ranked list by running a monolingual search in the search client. This method is more effective but incurs large online communication and computation costs. This paper proposes an effective and efficient approach to merging multilingual ranked lists. In particular, for each user query it downloads only a small number of documents from each individual ranked list and calculates comparable document scores for them by using both the query-based translation method and the document-based translation method. Query-specific and source-specific transformation models are then trained for the individual ranked lists from these downloaded documents. These transformation models are used to estimate comparable document scores for all retrieved documents, so that the documents can be sorted into a final ranked list.
This merging approach is efficient because only a subset of the retrieved documents is downloaded and translated online. Furthermore, an extensive set of experiments on Cross-Language Evaluation Forum (CLEF) (http://www.clef-campaign.org/) data demonstrates the effectiveness of the query-specific and source-specific results merging algorithm against other alternatives. The new research in this paper proposes different variants of the query-specific and source-specific results merging algorithm with different transformation models. This paper also provides thorough experimental results and detailed analysis. This work substantially extends the preliminary research in (Si and Callan, in: Peters (ed.) Results of the cross-language evaluation forum-CLEF 2005, 2005).
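
The merging idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a simple linear transformation model fitted by least squares (the abstract only says "transformation models"), it takes the comparable scores of the downloaded documents as a given input rather than computing them from query-based and document-based translation, and the names `fit_linear`, `merge_results`, and `top_k` are hypothetical.

```python
from typing import Dict, List, Tuple


def fit_linear(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y ~ a * x + b (assumed model form)."""
    if not xs or len(xs) != len(ys):
        raise ValueError("need at least one (x, y) pair of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # All sampled source scores are equal: fall back to a constant model.
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov / var_x
    return a, mean_y - a * mean_x


def merge_results(
    ranked_lists: Dict[str, List[Tuple[str, float]]],
    comparable_scores: Dict[str, float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Merge per-source ranked lists for ONE query.

    ranked_lists: source name -> [(doc_id, source_specific_score)], best first.
    comparable_scores: doc_id -> comparable score for the few downloaded
        documents (hypothetical input; assumed to be computed elsewhere).
    top_k: number of top documents per source treated as downloaded.
    """
    merged: List[Tuple[str, float]] = []
    for _source, results in ranked_lists.items():
        # Training pairs come only from the downloaded top documents.
        sample = [
            (score, comparable_scores[doc_id])
            for doc_id, score in results[:top_k]
            if doc_id in comparable_scores
        ]
        # One model per source and per query.
        a, b = fit_linear([s for s, _ in sample], [c for _, c in sample])
        # Estimate comparable scores for every retrieved document.
        merged.extend((doc_id, a * score + b) for doc_id, score in results)
    merged.sort(key=lambda pair: pair[1], reverse=True)
    return merged


if __name__ == "__main__":
    lists = {
        "en": [("e1", 10.0), ("e2", 8.0), ("e3", 6.0), ("e4", 4.0)],
        "fr": [("f1", 0.9), ("f2", 0.7), ("f3", 0.5), ("f4", 0.3)],
    }
    downloaded = {"e1": 0.8, "e2": 0.6, "e3": 0.4, "f1": 0.9, "f2": 0.5, "f3": 0.1}
    for doc_id, score in merge_results(lists, downloaded):
        print(doc_id, round(score, 3))
```

In this sketch only the top `top_k` documents of each list need to be downloaded and translated; every other document receives an estimated score from the fitted model, which is where the efficiency gain claimed in the abstract would come from.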

Keywords

Results merging, Federated search, Multilingual information retrieval

References

  1. Aslam, J. A., & Montague, M. (2001). Models for metasearch. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’01, New Orleans, Louisiana, United States (pp. 276–284). New York, NY: ACM.
  2. Ballesteros, L., & Croft, W. B. (1997). Phrasal translation and query expansion techniques for cross-language information retrieval. In N. J. Belkin, A. D. Narasimhalu, P. Willett, & W. Hersh (Eds.), Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’97, Philadelphia, Pennsylvania, United States, July 27–31, 1997 (pp. 84–91). New York, NY: ACM.
  3. Brown, P. F., Pietra, D., Pietra, D., & Mercer, R. L. (1993). The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19, 263–312.
  4. Callan, J., & Connell, M. (2001). Query-based sampling of text databases. ACM Transactions on Information Systems, 19(2), 97–130.
  5. Callan, J. P., Croft, W. B., & Harding, S. M. (1992). The INQUERY retrieval system. In Proceedings of the Third International Conference on Database and Expert Systems Applications, Valencia, Spain (pp. 78–83). Springer-Verlag.
  6. Chen, A., & Gey, F. C. (2003). Combining query translation and document translation in cross-language retrieval. In C. Peters, J. Gonzalo, M. Braschler, et al. (Eds.), 4th Workshop of the Cross-Language Evaluation Forum, CLEF 2003, Lecture Notes in Computer Science, Trondheim, Norway (pp. 108–121). Springer-Verlag.
  7. Hull, D. A., & Grefenstette, G. (1996). Query across languages: A dictionary-based approach to multilingual information retrieval. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’96, Zurich, Switzerland, August 18–22, 1996 (pp. 49–57). New York, NY: ACM.
  8. Jones, G. J. F., Burke, M., Judge, J., Khasin, A., Lam-Adesina, A. M., & Wagner, J. (2005). Dublin City University at CLEF 2004: Experiments in monolingual, bilingual and multilingual retrieval. In CLEF (pp. 207–220).
  9. Kamps, J., Monz, C., de Rijke, M., & Sigurbjörnsson, B. (2003). The University of Amsterdam at CLEF-2003. In Results of the CLEF 2003 Cross-Language System Evaluation Campaign, Trondheim, Norway (pp. 71–78).
  10. Lee, J. H. (1997). Analyses of multiple evidence combination. In N. J. Belkin, A. D. Narasimhalu, P. Willett, & W. Hersh (Eds.), Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’97, Philadelphia, Pennsylvania, United States, July 27–31, 1997 (pp. 267–276). New York, NY: ACM.
  11. Levow, G. A., Oard, D. W., & Resnik, P. (2004). Dictionary-based cross-language retrieval. Information Processing and Management, 41, 523–547.
  12. Martínez-Santiago, F., Martin, M., & Ureña, A. (2002). SINAI on CLEF 2002: Experiments with merging strategies. In C. Peters (Ed.), Results of the cross-language evaluation forum-CLEF 2002 (pp. 187–196).
  13. Oard, D., & Diekema, A. (1998). Cross-language information retrieval. In M. Williams (Ed.), Annual review of information science (pp. 223–256).
  14. Och, F. J., & Ney, H. (2000). Improved statistical alignment models. In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, Annual Meeting of the ACL, Hong Kong, October 03–06, 2000 (pp. 440–447). Morristown, NJ: Association for Computational Linguistics.
  15. Robertson, S. E., & Walker, S. (1994). Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. In W. B. Croft & C. J. van Rijsbergen (Eds.), Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Dublin, Ireland, July 03–06, 1994 (pp. 232–241). New York, NY: Springer-Verlag New York.
  16. Rogati, M., & Yang, Y. M. (2003). CONTROL: CLEF-2003 with open, transparent resources off-line. Experiments with merging strategies. In C. Peters (Ed.), Results of the cross-language evaluation forum-CLEF.
  17. Savoy, J. (2002). Report on CLEF 2002 experiments: Combining multiple sources of evidence. In C. Peters et al. (Eds.), Advances in cross-language information retrieval, LNCS (Vol. 2785, pp. 66–90). Berlin: Springer-Verlag.
  18. Savoy, J. (2003). Report on CLEF-2003 multilingual tracks. In Proceedings of CLEF 2003, Trondheim, Norway (pp. 7–12).
  19. Si, L., & Callan, J. (2003). A semi-supervised learning method to merge search engine results. ACM Transactions on Information Systems, 24(4), 457–491.
  20. Si, L., & Callan, J. (2005). CLEF2005: Multilingual retrieval by combining multiple multilingual ranked lists. In C. Peters (Ed.), Results of the cross-language evaluation forum-CLEF 2005.
  21. Turtle, H. (1990). Inference networks for document retrieval. Technical Report COINS Report 90-7, Computer and Information Science Department, University of Massachusetts, Amherst.
  22. Xu, J., Weischedel, R., & Nguyen, C. (2001). Evaluating a probabilistic model for cross-lingual information retrieval. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’01, New Orleans, Louisiana, United States (pp. 105–110). New York, NY: ACM.

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • Luo Si (1)
  • Jamie Callan (2)
  • Suleyman Cetintas (1)
  • Hao Yuan (1)
  1. Department of Computer Science, Purdue University, West Lafayette, USA
  2. Language Technology Inst, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
