Machine Learning, Volume 107, Issue 4, pp 675–702

Robust Plackett–Luce model for k-ary crowdsourced preferences

  • Bo Han
  • Yuangang Pan
  • Ivor W. Tsang


The aggregation of k-ary preferences is an emerging ranking problem that plays an important role in many everyday applications, such as ordinal peer grading and online product recommendation. Meanwhile, crowdsourcing has become a popular way to collect large numbers of k-ary preferences for this ranking problem, thanks to convenient platforms and low costs. However, k-ary preferences from crowdsourced workers are often noisy, which inevitably degrades the performance of traditional aggregation models. To address this challenge, we present a RObust PlAckett–Luce (ROPAL) model. Specifically, to ensure robustness, ROPAL integrates the Plackett–Luce model with a denoising vector. Based on the Kendall-tau distance, this vector corrects k-ary crowdsourced preferences with a certain probability. In addition, we propose an online Bayesian inference method to make ROPAL scalable to large-scale preferences. We conduct comprehensive experiments on simulated and real-world datasets. Empirical results on the “massive synthetic” and “real-world” datasets show that ROPAL with online Bayesian inference achieves substantial improvements in robustness and noisy worker detection over current approaches.
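For readers unfamiliar with the two building blocks named in the abstract, the following minimal sketch shows the standard Plackett–Luce probability of a k-ary ranking and the Kendall-tau distance between two rankings. It is not the ROPAL model itself: the denoising vector and the online Bayesian inference are not reproduced here, and the function names and the `scores` dictionary are illustrative assumptions rather than anything defined in the paper.

```python
import math
from itertools import combinations


def plackett_luce_log_prob(ranking, scores):
    """Log-probability of `ranking` (best item first) under the standard
    Plackett-Luce model. `scores` maps each item to a positive score
    (hypothetical input format, chosen for illustration)."""
    log_prob = 0.0
    tail_sum = 0.0
    # Walk from the last-ranked item upward so that tail_sum is always the
    # total score of the items still available at that position.
    for item in reversed(ranking):
        tail_sum += scores[item]
        log_prob += math.log(scores[item]) - math.log(tail_sum)
    return log_prob


def kendall_tau_distance(ranking_a, ranking_b):
    """Number of item pairs that the two rankings order differently.
    Both rankings must contain the same items."""
    position_b = {item: i for i, item in enumerate(ranking_b)}
    # combinations() yields pairs (x, y) with x ranked before y in ranking_a;
    # the pair is discordant when ranking_b places x after y.
    return sum(
        1 for x, y in combinations(ranking_a, 2)
        if position_b[x] > position_b[y]
    )


if __name__ == "__main__":
    scores = {"a": 3.0, "b": 2.0, "c": 1.0}
    print(math.exp(plackett_luce_log_prob(["a", "b", "c"], scores)))
    print(kendall_tau_distance(["a", "b", "c"], ["c", "b", "a"]))
```

With scores 3, 2 and 1, the ranking (a, b, c) has probability (3/6)·(2/3)·(1/1) = 1/3, and a fully reversed 3-ary ranking is at Kendall-tau distance 3 from the original, the maximum for three items.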


Keywords: Ranking; k-Ary crowdsourced preferences; Robust Plackett–Luce model; Online Bayesian inference



This work was supported in part by the Australian Research Council (ARC) Linkage Project under Grant No. LP150100671. Dr. Ivor W. Tsang is grateful for the support of ARC Future Fellowship FT130100746.



Copyright information

© The Author(s) 2017

Authors and Affiliations

  1. Centre for Artificial Intelligence (CAI), University of Technology Sydney, Sydney, Australia
