Fractional Similarity: Cross-Lingual Feature Selection for Search

  • Conference paper
Advances in Information Retrieval (ECIR 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6611)

Abstract

Training data as well as supplementary data such as usage-based click behavior may abound in one search market (i.e., a particular region, domain, or language) and be much scarcer in another market. Transfer methods attempt to improve performance in these resource-scarce markets by leveraging data across markets. However, differences in feature distributions across markets can change the optimal model. We introduce a method called Fractional Similarity, which uses query-based variance within a market to obtain more reliable estimates of feature deviations across markets. An empirical analysis demonstrates that using this scoring method as a feature selection criterion in cross-lingual transfer improves relevance ranking in the foreign language and compares favorably to a baseline based on KL divergence.
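The abstract names a KL-divergence baseline for judging how far a feature's distribution drifts between markets, but does not say how it is computed. The following is a minimal sketch of one common way to do it, assuming equal-width histogram binning with additive smoothing. The function names (`smoothed_histogram`, `kl_divergence`, `feature_shift`) and the `bins` and `alpha` parameters are hypothetical and are not taken from the paper, and the sketch does not implement Fractional Similarity itself.

```python
import math

def smoothed_histogram(values, lo, hi, bins=10, alpha=1.0):
    """Normalized equal-width histogram with additive smoothing (assumed scheme)."""
    counts = [alpha] * bins          # smoothing keeps every bin non-zero
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp v == hi into the last bin
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def feature_shift(source_values, target_values, bins=10):
    """Score how much one feature's distribution differs between two markets."""
    lo = min(min(source_values), min(target_values))
    hi = max(max(source_values), max(target_values))
    if hi == lo:                     # constant feature: no measurable shift
        return 0.0
    p = smoothed_histogram(source_values, lo, hi, bins)
    q = smoothed_histogram(target_values, lo, hi, bins)
    return kl_divergence(p, q)

# Hypothetical feature values from a resource-rich and a resource-scarce market.
similar = feature_shift([0.0, 0.1, 0.2, 0.1], [0.0, 0.1, 0.2, 0.2])
shifted = feature_shift([0.0, 0.1, 0.2, 0.1], [0.8, 0.9, 1.0, 0.9])
print(similar < shifted)
```

A score like this pools all documents in a market and ignores how a feature varies from query to query, which is the within-market, query-based variance the abstract says Fractional Similarity takes into account.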




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jagarlamudi, J., Bennett, P.N. (2011). Fractional Similarity: Cross-Lingual Feature Selection for Search. In: Clough, P., et al. Advances in Information Retrieval. ECIR 2011. Lecture Notes in Computer Science, vol 6611. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20161-5_23

  • DOI: https://doi.org/10.1007/978-3-642-20161-5_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20160-8

  • Online ISBN: 978-3-642-20161-5

  • eBook Packages: Computer Science, Computer Science (R0)
