Joint Ranking for Multilingual Web Search

  • Wei Gao
  • Cheng Niu
  • Ming Zhou
  • Kam-Fai Wong
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5478)


Ranking for multilingual information retrieval (MLIR) is the task of ranking documents in different languages solely by their relevance to the query, regardless of the query's language. Existing approaches focus on combining the relevance scores of different retrieval settings, but do not learn the ranking function directly. We approach Web MLIR ranking within the learning-to-rank (L2R) framework. Besides adapting popular L2R algorithms to MLIR, we create a joint ranking model that exploits the correlations among documents and induces the joint relevance probability for all the documents. With this method, the relevant documents of one language can be leveraged to improve the relevance estimation for documents of other languages. A probabilistic graphical model is trained for the joint relevance estimation. In particular, a hidden layer of nodes is introduced to represent the salient topics among the retrieved documents, and the ranks of the relevant documents and topics are determined collaboratively as the model approaches its thermal equilibrium. Furthermore, the model parameters are trained under two settings: (1) optimizing the accuracy of identifying relevant documents; (2) directly optimizing information retrieval evaluation measures, such as mean average precision. Benchmarks show that our model significantly outperforms the existing approaches for MLIR tasks.
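
For readers unfamiliar with the evaluation measure named in the second training setting, the following is a minimal sketch of how mean average precision (MAP) is commonly computed from binary relevance judgments. It is not the authors' implementation. The function names are illustrative, and the sketch assumes that every relevant document for a query appears somewhere in its ranked list, so the number of hits equals the number of relevant documents.

```python
from typing import Sequence


def average_precision(relevance: Sequence[int]) -> float:
    """Average precision of one ranked list.

    relevance[i] is 1 if the document at rank i + 1 is relevant, else 0.
    Assumption: all relevant documents for the query appear in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(ranked_lists: Sequence[Sequence[int]]) -> float:
    """Mean of the per-query average precision values."""
    if not ranked_lists:
        return 0.0
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)
```

In a multilingual setting, a single ranked list would mix documents of several languages; the measure itself depends only on the positions of the relevant documents, not on their language.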





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Wei Gao (1)
  • Cheng Niu (2)
  • Ming Zhou (2)
  • Kam-Fai Wong (1)
  1. The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China
  2. Microsoft Research Asia, Beijing, China
