Abstract
In this paper, we propose a score fusion method that uses a mixture copula to model complex dependencies between multiple relevance scores, with the goal of improving the effectiveness of information retrieval. Combining multiple relevance scores has been shown to be more effective than using a single score. Widely used score fusion methods include linear combination and learning to rank. However, linear combination cannot capture non-linear dependencies between scores, and learning to rank produces models that are difficult to interpret. A copula, a statistical framework for modelling dependence, addresses both problems: it can capture non-linear dependencies and also yields an interpretable model. Although some studies have applied copulas to score fusion and demonstrated their effectiveness, these methods employ a unimodal copula, which makes it difficult to capture complex dependencies. We therefore introduce a new score fusion method that uses a mixture copula to handle the complicated dependencies among scores, and we evaluate its accuracy. Experiments on ClueWeb'09, a large-scale document collection, show that in some cases our proposed method significantly outperforms linear combination and existing methods that use a unimodal copula.
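The central idea above, modelling the joint behavior of several relevance scores with a mixture of copulas rather than a single unimodal copula, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' method. Each score is mapped to (0, 1) by its empirical rank. The two components are a Clayton and a Gumbel copula, which is an assumed choice. The mixture weights and copula parameters are fixed by hand rather than estimated from training data. Finally, a document's fused score is taken to be the mixture copula's CDF at its pseudo-observations. All function names, parameters and data below are hypothetical.

```python
import math


def pseudo_observations(scores):
    """Map raw scores to (0, 1) by rank: u_i = rank_i / (n + 1).

    Ties are broken by input order (a simplification).
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) for theta > 0 (lower-tail dependence)."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def gumbel_cdf(u, v, theta):
    """Gumbel copula C(u, v) for theta >= 1 (upper-tail dependence)."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


def mixture_cdf(u, v, weights, components):
    """Convex combination of copulas; weights must be >= 0 and sum to 1."""
    return sum(w * c(u, v) for w, c in zip(weights, components))


def fuse(scores_a, scores_b, weights, components):
    """Assumed fused score: mixture copula CDF at each document's pseudo-observations."""
    ua = pseudo_observations(scores_a)
    ub = pseudo_observations(scores_b)
    return [mixture_cdf(u, v, weights, components) for u, v in zip(ua, ub)]


# Hypothetical parameters; in practice they would be fitted to training data.
weights = [0.6, 0.4]
components = [
    lambda u, v: clayton_cdf(u, v, 2.0),
    lambda u, v: gumbel_cdf(u, v, 1.5),
]

# Hypothetical scores for four documents from two scoring functions.
content_scores = [12.1, 3.4, 8.7, 0.9]  # e.g. a BM25-style content score
prior_scores = [0.50, 0.05, 0.10, 0.30]  # e.g. a query-independent score

fused = fuse(content_scores, prior_scores, weights, components)
ranking = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
print(ranking)
```

A convex combination of copulas is itself a copula, so the mixture remains a valid joint model of the rank-transformed scores. Because a copula CDF is non-decreasing in each argument, a document that ranks higher on both scores is never placed below one that ranks lower on both. The mixture weights matter mainly for documents that are strong on only one score. In this sketch, the Clayton component emphasizes joint behavior in the lower tail and the Gumbel component emphasizes it in the upper tail.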
T. Komatsuda—This work was conducted while the author was at Tokyo Institute of Technology.
Acknowledgment
This work was supported by JSPS KAKENHI Grant Numbers 15H02701 and 15K20990.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Komatsuda, T., Keyaki, A., Miyazaki, J. (2016). A Score Fusion Method Using a Mixture Copula. In: Hartmann, S., Ma, H. (eds) Database and Expert Systems Applications. DEXA 2016. Lecture Notes in Computer Science, vol 9828. Springer, Cham. https://doi.org/10.1007/978-3-319-44406-2_16
DOI: https://doi.org/10.1007/978-3-319-44406-2_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44405-5
Online ISBN: 978-3-319-44406-2
eBook Packages: Computer Science, Computer Science (R0)