
Rank aggregation using latent-scale distance-based models

Published in Statistics and Computing.

Abstract

Rank aggregation aims at combining the rankings of a set of items assigned by a sample of rankers to generate a consensus ranking. A typical solution is to adopt a distance-based approach that minimizes the sum of the distances to the observed rankings. However, this simple sum may not be appropriate when the quality of the rankers varies, as happens when rankers with different backgrounds examine the items at different cognitive levels. In this paper, we develop a new distance-based model that allows different weights for different rankers. Under this model, the weight associated with a ranker measures his/her cognitive level in ranking the items; these weights are unobserved and exponentially distributed. The maximum likelihood method is used for model estimation. Extensions to the cases of incomplete rankings and mixture modeling are also discussed. Empirical applications demonstrate that the proposed model produces better rank aggregations than those generated by the Borda and unweighted distance-based models.



Notes

  1. A distance \(d(\pi ,\sigma )\) between two rankings \(\pi \) and \(\sigma \) is said to be right invariant if and only if for any ranking \(\tau \), \(d(\pi ,\sigma )=d(\pi \circ \tau ,\sigma \circ \tau )\), where \(\pi \circ \tau (i)=\pi (\tau (i))\).
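The right-invariance property in this note can be checked numerically. A minimal sketch for the Kendall distance, assuming 0-based arrays that map items to ranks (the helper names `kendall` and `compose` are illustrative, not from the paper):

```python
import random
from itertools import combinations

def kendall(pi, sigma):
    """Kendall distance: number of item pairs ranked in opposite order."""
    n = len(pi)
    return sum(1 for i, j in combinations(range(n), 2)
               if (pi[i] - pi[j]) * (sigma[i] - sigma[j]) < 0)

def compose(pi, tau):
    """(pi o tau)(i) = pi(tau(i)), with 0-based positions."""
    return [pi[tau[i]] for i in range(len(pi))]

random.seed(0)
n = 6
pi = random.sample(range(n), n)
sigma = random.sample(range(n), n)
tau = random.sample(range(n), n)
# Right invariance: composing both rankings with tau permutes the set of
# item pairs but leaves the number of discordant pairs unchanged.
assert kendall(pi, sigma) == kendall(compose(pi, tau), compose(sigma, tau))
```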

References

  • Aledo, J.A., Gámez, J.A., Molina, D.: Tackling the rank aggregation problem with evolutionary algorithms. Appl. Math. Comput. 222, 632–644 (2013)


  • Ali, A., Meilă, M.: Experiments with Kemeny ranking: what works when? Math. Soc. Sci. 64(1), 28–40 (2012)


  • Aslam, J.A., Montague, M.: Models for metasearch. In: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 276–284 (2001)

  • Borg, I., Groenen, P.J.: Modern Multidimensional Scaling: Theory and Applications. Springer, Berlin (2005)


  • Busse, L.M., Orbanz, P., Buhmann, J.M.: Cluster analysis of heterogeneous rank data. In: Proceedings of the 24th International Conference on Machine Learning, pp. 113–120 (2007)

  • Ceberio, J., Irurozki, E., Mendiburu, A., Lozano, J.A.: Extending distance-based ranking models in estimation of distribution algorithms. In: 2014 IEEE Congress on Evolutionary Computation (CEC), pp. 2459–2466. IEEE (2014)

  • de Borda, J.: Mémoire sur les élections au scrutin. Mémoires de l’Académie Royale des Sciences Année, pp. 657–664 (1781)

  • DeConde, R.P., Hawley, S., Falcon, S., Clegg, N., Knudsen, B., Etzioni, R.: Combining results of microarray experiments: a rank aggregation approach. Stat. Appl. Genet. Mol. Biol. 5(1), Article 15 (2006)

  • Deng, K., Han, S., Li, K.J., Liu, J.S.: Bayesian aggregation of order-based rank data. J. Am. Stat. Assoc. 109(507), 1023–1039 (2014)


  • Dwork, C., Kumar, R., Naor, M., Sivakumar, D.: Rank aggregation methods for the web. In: Proceedings of the 10th International Conference on World Wide Web, pp. 613–622 (2001)

  • Fligner, M.A., Verducci, J.S.: Distance-based ranking models. J. R. Stat. Soc. B 48(3), 359–369 (1986)


  • Fligner, M.A., Verducci, J.S.: Posterior probabilities for a consensus ordering. Psychometrika 55(1), 53–63 (1990)


  • Irurozki, E., Calvo, B., Lozano, J.A.: Sampling and learning the Mallows model under the Ulam distance. Technical report, Department of Computer Science and Artificial Intelligence, University of the Basque Country (2014)

  • Irurozki, E., Calvo, B., Lozano, J.A.: Sampling and learning the Mallows and generalized Mallows models under the Cayley distance. Methodol. Comput. Appl. Probab. 20(1), 1–35 (2018)


  • Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P., et al.: Optimization by simulated annealing. Science 220(4598), 671–680 (1983)


  • Lee, M.D., Steyvers, M., Miller, B.: A cognitive model for aggregating people’s rankings. PLoS One 9(5), e96431 (2014)


  • Lin, S., Ding, J.: Integration of ranked lists via cross entropy Monte Carlo with applications to mRNA and microRNA studies. Biometrics 65(1), 9–18 (2009)


  • Mallows, C.L.: Non-null ranking models. I. Biometrika 44(1–2), 114–130 (1957)


  • Murphy, T.B., Martin, D.: Mixtures of distance-based models for ranking data. Comput. Stat. Data Anal. 41(3), 645–655 (2003)


  • Naume, B., Zhao, X., Synnestvedt, M., Borgen, E., Russnes, H.G., Lingjærde, O.C., Strømberg, M., Wiedswang, G., Kvalheim, G., Kåresen, R., et al.: Presence of bone marrow micrometastasis is associated with different recurrence risk within molecular subtypes of breast cancer. Mol. Oncol. 1(2), 160–171 (2007)


  • Niederreiter, H.: Quasi-Monte Carlo Methods. Wiley Online Library (2010)

  • Černý, V.: Thermodynamical approach to the traveling salesman problem: an efficient simulation algorithm. J. Optim. Theory Appl. 45(1), 41–51 (1985)



Author information


Correspondence to Hang Xu.

Additional information

The research of Philip L.H. Yu was supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. 17303515).

Appendix: Sampling from a general distance-based model


It is straightforward to simulate ranking data from the Kendall distance-based model because the Kendall distance decomposes into a set of independent random variables \(V_{i}\left( \pi _{s}\circ \pi _{0}^{-1}\right) \) (see Sect. 4.1), each of which is easy to sample. The detailed algorithm can be found in Ceberio et al. (2014). More recently, Irurozki et al. (2014, 2018) developed two methods to sample from models based on the Ulam and Cayley distances, respectively. These simulation methods exploit special properties of their distance measures and may not generalize to other distances. Here we introduce a general Metropolis–Hastings method to sample from any distance-based model.
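For the Kendall case, the decomposition mentioned above yields an exact sampler: draw each \(V_i\) independently with \(P(V_i=v)\propto \exp (-\lambda v)\), decode the resulting inversion code into a permutation, and recenter at \(\pi _0\) using right invariance. A minimal sketch with 0-based permutations (the function name and interface are illustrative, not the authors' code):

```python
import math
import random

def sample_kendall_mallows(n, lam, pi0, rng):
    """Exact draw from f(pi) ∝ exp(-lam * d_K(pi, pi0)), using the
    decomposition of the Kendall distance into independent V_i."""
    # V_i takes values 0, ..., n-1-i with P(V_i = v) ∝ exp(-lam * v)
    code = []
    for i in range(n - 1):
        support = range(n - i)
        weights = [math.exp(-lam * v) for v in support]
        code.append(rng.choices(support, weights=weights)[0])
    code.append(0)
    # Decode the inversion (Lehmer) code: sigma(i) is the code[i]-th
    # smallest remaining value, so d_K(sigma, identity) = sum(code).
    remaining = list(range(n))
    sigma = [remaining.pop(v) for v in code]
    # Right invariance lets us recenter the mode at pi0: pi = sigma ∘ pi0
    return [sigma[p] for p in pi0]
```

Because the \(V_i\) are drawn independently, this produces an exact (not approximate) sample, with no burn-in required.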

The goal is to sample from a target distribution f(x). The method begins at \(s=0\) by selecting \(X^{(0)}=x^{(0)}\), drawn at random from some starting distribution g, with the requirement that \(f\left( x^{(0)}\right) >0\). Given \(X^{(s)}=x^{(s)}\), the algorithm generates \(X^{(s+1)}\) as follows:

  1. Sample a candidate value \(X^{*}\) from a proposal distribution \(g\left( \cdot \mid x^{(s)}\right) \).

  2. Compute the Metropolis–Hastings ratio \(R_{MH}\left( x^{(s)},X^{*}\right) \), where

    $$\begin{aligned} R_{MH}\left( x^{(s)},X^{*}\right) =\frac{f\left( X^{*}\right) g\left( x^{(s)}\mid X^{*}\right) }{f\left( x^{(s)}\right) g\left( X^{*}\mid x^{(s)}\right) }. \end{aligned}$$

  3. Sample a random value for \(X^{(s+1)}\) as follows:

    $$\begin{aligned} X^{(s+1)}={\left\{ \begin{array}{ll} X^{*} &{} \text{ with } \text{ probability }\ \min \left\{ 1,R_{MH}\left( x^{(s)},X^{*}\right) \right\} \\ x^{(s)} &{} \text{ otherwise } \end{array}\right. } \end{aligned}$$

  4. Set \(s:=s+1\) and return to step 1.

Returning to our problem: given \(\pi _{0}\) and \(\lambda \), we need to sample from the distance-based model \(f(\pi )=\frac{\exp \left[ -\lambda d(\pi ,\pi _{0})\right] }{C(\lambda )}\), where d is the chosen distance function. Note that the normalizing constant \(C(\lambda )\) need not be calculated, because the \(C(\lambda )\) terms cancel in \(R_{MH}\left( x^{(s)},X^{*}\right) \). For \(g(\cdot \mid \cdot )\), we introduce two proposal distributions and compare them.

The proposal distribution can be chosen to be symmetric, so that \(g(x^{*}\mid x^{(s)})=g(x^{(s)}\mid x^{*})\). In our case, \(g(x^{*}\mid x^{(s)})\) imposes a small perturbation on the elements of \(x^{(s)}\): given \(x^{(s)}\), we uniformly pick two of its elements and swap their positions to form the proposal \(\pi ^{*}\). Since this swap distribution is symmetric, the Hastings corrections in the Metropolis–Hastings ratio cancel, and the ratio becomes:

$$\begin{aligned} R_{MH}\left( \pi ^{(s)},\pi ^{*}\right) =\frac{\exp \left( -\lambda d\left( \pi ^{*},\pi _{0}\right) \right) }{\exp \left( -\lambda d\left( \pi ^{(s)},\pi _{0}\right) \right) }. \end{aligned}$$
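The swap-proposal sampler above can be sketched in a few lines. A minimal Python illustration for the Kendall distance (the function names and parameters are illustrative, not the authors' code; the chain starts at the modal ranking \(\pi _0\) so that \(f(x^{(0)})>0\)):

```python
import math
import random
from itertools import combinations

def kendall(pi, sigma):
    """Kendall distance: item pairs ordered differently by pi and sigma."""
    return sum(1 for i, j in combinations(range(len(pi)), 2)
               if (pi[i] - pi[j]) * (sigma[i] - sigma[j]) < 0)

def mh_sample(pi0, lam, steps, rng, dist=kendall):
    """Metropolis-Hastings with a symmetric random-swap proposal for
    f(pi) ∝ exp(-lam * d(pi, pi0)); C(lam) cancels in the ratio."""
    x = list(pi0)            # start at the mode, where f > 0
    d_x = dist(x, pi0)
    for _ in range(steps):
        # Propose: swap two uniformly chosen positions (symmetric proposal)
        i, j = rng.sample(range(len(x)), 2)
        y = list(x)
        y[i], y[j] = y[j], y[i]
        d_y = dist(y, pi0)
        # Accept with probability min(1, exp(-lam * (d_y - d_x)))
        if d_y <= d_x or rng.random() < math.exp(-lam * (d_y - d_x)):
            x, d_x = y, d_y
    return x

rng = random.Random(1)
draw = mh_sample([0, 1, 2, 3, 4], lam=1.0, steps=500, rng=rng)
```

Only the distance difference \(d(\pi ^{*},\pi _0)-d(\pi ^{(s)},\pi _0)\) enters the acceptance probability, which is what makes the method applicable to any right-invariant distance without knowing \(C(\lambda )\).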

To sample from the latent-scale distance-based models, we first sample \(\lambda _{k}\) from the exponential distribution \(\text {Exp}(\lambda )\), and then sample \(\pi _{k}\) from the simple distance-based model with parameter \(\lambda _{k}\) using the method above.
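This two-stage step can be sketched end to end; `sample_latent_scale` and its parameters are hypothetical names for illustration, and the inner Metropolis–Hastings sampler is the swap-proposal scheme described above:

```python
import math
import random
from itertools import combinations

def kendall(pi, sigma):
    """Kendall distance: item pairs ordered differently by pi and sigma."""
    return sum(1 for i, j in combinations(range(len(pi)), 2)
               if (pi[i] - pi[j]) * (sigma[i] - sigma[j]) < 0)

def mh_sample(pi0, lam, steps, rng):
    """Swap-proposal Metropolis-Hastings for f(pi) ∝ exp(-lam*d_K(pi, pi0))."""
    x, d_x = list(pi0), 0
    for _ in range(steps):
        i, j = rng.sample(range(len(x)), 2)
        y = list(x)
        y[i], y[j] = y[j], y[i]
        d_y = kendall(y, pi0)
        if d_y <= d_x or rng.random() < math.exp(-lam * (d_y - d_x)):
            x, d_x = y, d_y
    return x

def sample_latent_scale(pi0, lam, n_rankers, steps, rng):
    """For each ranker k: draw lambda_k ~ Exp(lam), then a ranking
    from the simple distance-based model with parameter lambda_k."""
    draws = []
    for _ in range(n_rankers):
        lam_k = rng.expovariate(lam)   # ranker-specific dispersion
        draws.append(mh_sample(pi0, lam_k, steps, rng))
    return draws

rng = random.Random(2)
rankings = sample_latent_scale([0, 1, 2, 3], lam=0.5, n_rankers=5, steps=200, rng=rng)
```

Rankers who draw a large \(\lambda _k\) produce rankings concentrated near \(\pi _0\), while a small \(\lambda _k\) yields near-uniform rankings, mirroring the model's interpretation of \(\lambda _k\) as a ranker's cognitive level.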


About this article


Cite this article

Yu, P.L.H., Xu, H. Rank aggregation using latent-scale distance-based models. Stat Comput 29, 335–349 (2019). https://doi.org/10.1007/s11222-018-9811-9

