A Score Fusion Method Using a Mixture Copula

  • Conference paper
Database and Expert Systems Applications (DEXA 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9828)

Abstract

In this paper, we propose a score fusion method based on a mixture copula, which can model complex dependencies between multiple relevance scores and thereby improve the effectiveness of information retrieval. Combining multiple relevance scores has been shown to be more effective than relying on a single score. The most widely used score fusion methods are linear combination and learning to rank. Linear combination cannot capture non-linear dependencies among scores, and learning to rank produces models that are difficult to interpret. A copula, a statistical framework for modeling dependence, addresses both problems: it captures non-linear dependencies and yields an interpretable model. Although some studies have applied copulas to score fusion and demonstrated their effectiveness, those methods employ a unimodal copula, which makes it difficult to capture complex dependencies. We therefore introduce a new score fusion method that uses a mixture copula to handle complicated dependencies among scores, and we evaluate its accuracy. Experiments on ClueWeb’09, a large-scale document set, show that in some cases our proposed method significantly outperforms linear combination and other existing methods that use a unimodal copula.
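To make the idea of a mixture copula concrete, the following is a minimal sketch, not the authors' implementation. It assumes two relevance scores that have already been mapped to the unit interval (for example by their empirical CDFs) and models their dependence as a convex combination of bivariate Gaussian copulas. The function names, the choice of Gaussian components, and the weights and correlations are illustrative assumptions; the abstract does not specify the copula family or the estimation procedure.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the inverse CDF


def gaussian_copula_density(u, v, rho):
    """Density of a bivariate Gaussian copula at (u, v), with 0 < u, v < 1.

    rho is the correlation parameter, -1 < rho < 1.
    """
    x = _STD_NORMAL.inv_cdf(u)
    y = _STD_NORMAL.inv_cdf(v)
    one_minus_r2 = 1.0 - rho * rho
    exponent = -(rho * rho * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * one_minus_r2)
    return math.exp(exponent) / math.sqrt(one_minus_r2)


def mixture_copula_density(u, v, weights, rhos):
    """Convex combination of Gaussian copula densities (hypothetical helper).

    A weighted sum of copulas with non-negative weights summing to 1 is
    itself a copula, so it can represent several dependence patterns at once.
    """
    if len(weights) != len(rhos):
        raise ValueError("weights and rhos must have the same length")
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * gaussian_copula_density(u, v, r) for w, r in zip(weights, rhos))


# Illustrative values only: one positively and one negatively dependent component.
density = mixture_copula_density(0.9, 0.9, weights=[0.7, 0.3], rhos=[0.6, -0.2])
```

A single Gaussian component can express only one dependence pattern between the two scores, whereas the weighted sum can mix, for instance, a strongly correlated group of documents with a weakly correlated one. How such a density is turned into a final ranking score is left open here, since the abstract does not describe that step.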

T. Komatsuda—This work was conducted while the author was at Tokyo Institute of Technology.

Notes

  1. http://lemurproject.org/clueweb09/

Acknowledgment

This work was supported by JSPS KAKENHI Grant Numbers 15H02701 and 15K20990.

Author information

Corresponding author

Correspondence to Takuya Komatsuda.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Komatsuda, T., Keyaki, A., Miyazaki, J. (2016). A Score Fusion Method Using a Mixture Copula. In: Hartmann, S., Ma, H. (eds) Database and Expert Systems Applications. DEXA 2016. Lecture Notes in Computer Science, vol 9828. Springer, Cham. https://doi.org/10.1007/978-3-319-44406-2_16

  • DOI: https://doi.org/10.1007/978-3-319-44406-2_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-44405-5

  • Online ISBN: 978-3-319-44406-2

  • eBook Packages: Computer Science (R0)
