Abstract
Rankings and partial rankings are ubiquitous in data analysis, yet relatively little work in the classification community exploits the specific properties of rankings. We review the broader literature that we are aware of and identify a common building block for both prediction and clustering of rankings, which is also valid for partial rankings. This building block is the Kemeny distance, defined as the minimum number of interchanges of two adjacent elements required to transform one (partial) ranking into another. For complete rankings the Kemeny distance is equivalent to Kendall’s τ; for partial rankings it is equivalent to Emond and Mason’s extension of τ. For clustering, we use the flexible class of methods proposed by Ben-Israel and Iyigun (Journal of Classification 25: 5–26, 2008) and define the disparity between a ranking and the center of a cluster as the Kemeny distance. For prediction, we build a prediction tree by recursive partitioning and define the impurity measure of the subgroups formed as the sum of all within-node Kemeny distances. In both cases, the median ranking characterizes the subgroups.
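As an illustration of the two quantities named in the abstract, the following is a minimal Python sketch, not the chapter's own implementation. It assumes the Kemeny-Snell score-matrix form of the distance, in which each pair of items contributes the absolute difference of its preference signs (+1, 0 for a tie, -1). Under this convention a reversed pair counts 2 and a tie against a strict preference counts 1, so on complete rankings the value is twice the number of adjacent interchanges. The function names `kemeny_distance` and `within_node_impurity` are hypothetical, and reading "the sum of all within-node Kemeny distances" as a sum over all pairs of rankings in a node is also an assumption.

```python
from itertools import combinations


def kemeny_distance(r1, r2):
    """Kemeny-Snell distance between two rankings given as rank vectors.

    r1[i] is the rank position of item i (1 = best); equal values denote ties.
    """
    if len(r1) != len(r2):
        raise ValueError("rankings must cover the same items")

    def sign(x):
        return (x > 0) - (x < 0)

    total = 0
    for i, j in combinations(range(len(r1)), 2):
        a = sign(r1[j] - r1[i])  # +1 if item i is preferred to item j in r1
        b = sign(r2[j] - r2[i])  # same preference sign in r2
        total += abs(a - b)      # 0 = agree, 1 = tie vs. strict, 2 = reversed
    return total


def within_node_impurity(rankings):
    """Assumed impurity: sum of Kemeny distances over all pairs in a node."""
    return sum(kemeny_distance(p, q) for p, q in combinations(rankings, 2))
```

For example, `kemeny_distance([1, 2, 3], [1, 2, 2])` compares a complete ranking with a partial ranking in which the last two items are tied; only that one pair differs, and it contributes 1.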
References
Barthélemy, J. P., Guénoche, A., & Hudry, O. (1989). Median linear orders: Heuristics and a branch and bound algorithm. European Journal of Operational Research, 42, 313–325.
Ben-Israel, A., & Iyigun, C. (2008). Probabilistic distance clustering. Journal of Classification, 25, 5–26.
Böckenholt, U. (1992). Thurstonian representation for partial ranking data. British Journal of Mathematical and Statistical Psychology, 45, 31–49.
Böckenholt, U. (2001). Mixed-effects analysis of rank-ordered data. Psychometrika, 66, 45–62.
Bradley, R. A., & Terry, M. E. (1952). Rank analysis of incomplete block designs, I. Biometrika, 39, 324–345.
Brady, H. E. (1989). Factor and ideal point analysis for interpersonally incomparable data. Psychometrika, 54, 181–202.
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Belmont, CA: Wadsworth.
Busing, F. M. T. A. (2009). Some advances in multidimensional unfolding. Doctoral Dissertation, Leiden, The Netherlands: Leiden University.
Busing, F. M. T. A., Groenen, P., & Heiser, W. J. (2005). Avoiding degeneracy in multidimensional unfolding by penalizing on the coefficient of variation. Psychometrika, 70, 71–98.
Busing, F. M. T. A., Heiser, W. J., & Cleaver, G. (2010). Restricted unfolding: Preference analysis with optimal transformations of preferences and attributes. Food Quality and Preference, 21, 82–92.
Cappelli, C., Mola, F., & Siciliano, R. (2002). A statistical approach to growing a reliable honest tree. Computational Statistics and Data Analysis, 38, 285–299.
Carroll, J. D. (1972). Individual differences and multidimensional scaling. In R. N. Shepard et al. (Eds.), Multidimensional scaling, Vol. I theory (pp. 105–155). New York: Seminar Press.
Chan, W., & Bentler, P. M. (1998). Covariance structure analysis of ordinal ipsative data. Psychometrika, 63, 369–399.
Chapman, R. G., & Staelin, R. (1982). Exploiting rank ordered choice set data within the stochastic utility model. Journal of Marketing Research, 19, 288–301.
Cheng, W., Hühn, J., & Hüllermeier, E. (2009). Decision tree and instance-based learning for label ranking. In Proceedings of the 26th international conference on machine learning (pp. 161–168). Montreal, Canada.
Cohen, A., & Mallows, C. L. (1980). Analysis of ranking data (Tech. Rep.). Murray Hill: Bell Telephone Laboratories.
Cook, W. D. (2006). Distance-based and ad hoc consensus models in ordinal preference ranking. European Journal of Operational Research, 172, 369–385.
Coombs, C. H. (1950). Psychological scaling without a unit of measurement. Psychological Review, 57, 145–158.
Coombs, C. H. (1964). A theory of data. New York: Wiley.
Critchlow, D. E., & Fligner, M. A. (1991). Paired comparison, triple comparison, and ranking experiments as generalized linear models, and their implementation on GLIM. Psychometrika, 56, 517–533.
Critchlow, D. E., Fligner, M. A., & Verducci, J. S. (1991). Probability models on rankings. Journal of Mathematical Psychology, 35, 294–318.
Croon, M. A. (1989). Latent class models for the analysis of rankings. In G. De Soete et al. (Eds.), New developments in psychological choice modeling (pp. 99–121). North-Holland, Elsevier.
D’Ambrosio, A. (2007). Tree-based methods for data editing and preference rankings. Doctoral dissertation. Naples, Italy: Department of Mathematics and Statistics.
D’Ambrosio, A., & Heiser, W. J. (2011). Distance-based multivariate trees for rankings. Technical report.
Daniels, H. E. (1950). Rank correlation and population models. Journal of the Royal Statistical Society, Series B, 12, 171–191.
Diaconis, P. (1989). A generalization of spectral analysis with application to ranked data. The Annals of Statistics, 17, 949–979.
Dittrich, R., Katzenbeisser, W., & Reisinger, H. (2000). The analysis of rank ordered preference data based on Bradley-Terry type models. OR-Spektrum, 22, 117–134.
Emond, E. J., & Mason, D. W. (2002). A new rank correlation coefficient with application to the consensus ranking problem. Journal of Multi-Criteria Decision Analysis, 11, 17–28.
Fligner, M. A., & Verducci, J. S. (1986). Distance based ranking models. Journal of the Royal Statistical Society, Series B, 48, 359–369.
Fligner, M. A., & Verducci, J. S. (1988). Multistage ranking models. Journal of the American Statistical Association, 83, 892–901.
Francis, B., Dittrich, R., Hatzinger, R., & Penn, R. (2002). Analysing partial ranks by using smoothed paired comparison methods: An investigation of value orientation in Europe. Applied Statistics, 51, 319–336.
Fürnkranz, J., & Hüllermeier, E. (Eds.). (2010). Preference learning. Heidelberg: Springer.
Gormley, I. C., & Murphy, T. B. (2008a). Exploring voting blocs within the Irish electorate: A mixture modeling approach. Journal of the American Statistical Association, 103, 1014–1027.
Gormley, I. C., & Murphy, T. B. (2008b). A mixture of experts model for rank data with applications in election studies. The Annals of Applied Statistics, 2, 1452–1477.
Guttman, L. (1946). An approach for quantifying paired comparisons and rank order. Annals of Mathematical Statistics, 17, 144–163.
Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning: Data mining, inference, and prediction. New York: Springer.
Heiser, W. J. (2004). Geometric representation of association between categories. Psychometrika, 69, 513–546.
Heiser, W. J., & Busing, F. M. T. A. (2004). Multidimensional scaling and unfolding of symmetric and asymmetric proximity relations. In D. Kaplan (Ed.), The SAGE handbook of quantitative methodology for the social sciences (pp. 25–48). Thousand Oaks: Sage.
Heiser, W. J., & D’Ambrosio, A. (2011). K-Median cluster component analysis. Technical report.
Heiser, W. J., & De Leeuw, J. (1981). Multidimensional mapping of preference data. Mathématiques et Sciences Humaines, 19, 39–96.
Hojo, H. (1997). A marginalization model for the multidimensional unfolding analysis of ranking data. Japanese Psychological Research, 39, 33–42.
Hojo, H. (1998). Multidimensional unfolding analysis of ranking data for groups. Japanese Psychological Research, 40, 166–171.
Iyigun, C., & Ben-Israel, A. (2008). Probabilistic distance clustering adjusted for cluster size. Probability in the Engineering and Informational Sciences, 22, 603–621.
Iyigun, C., & Ben-Israel, A. (2010). Semi-supervised probabilistic distance clustering and the uncertainty of classification. In A. Fink et al. (Eds.), Advances in data analysis, data handling and business intelligence (pp. 3–20). Heidelberg: Springer.
Kamakura, W. A., & Srivastava, R. K. (1986). An ideal-point probabilistic choice model for heterogeneous preferences. Marketing Science, 5, 199–218.
Kamiya, H., & Takemura, A. (1997). On rankings generated by pairwise linear discriminant analysis of m populations. Journal of Multivariate Analysis, 61, 1–28.
Kamiya, H., & Takemura, A. (2005). Characterization of rankings generated by linear discriminant analysis. Journal of Multivariate Analysis, 92, 343–358.
Kamiya, H., Orlik, P., Takemura, A., & Terao, H. (2006). Arrangements and ranking patterns. Annals of Combinatorics, 10, 219–235.
Kamiya, H., Takemura, A., & Terao, H. (2011). Ranking patterns of unfolding models of codimension one. Advances in Applied Mathematics, 47, 379–400.
Kemeny, J. G. (1959). Mathematics without numbers. Daedalus, 88, 577–591.
Kemeny, J. G., & Snell, J. L. (1962). Preference rankings: An axiomatic approach. In J. G. Kemeny & J. L. Snell (Eds.), Mathematical models in the social sciences (pp. 9–23). New York: Blaisdell.
Kendall, M. G. (1938). A new measure of rank correlation. Biometrika, 30, 81–93.
Kendall, M. G. (1948). Rank correlation methods. London: Charles Griffin.
Kruskal, W. (1958). Ordinal measures of association. Journal of the American Statistical Association, 53, 814–861.
Kruskal, J. B., & Carroll, J. D. (1969). Geometrical models and badness-of-fit functions. In P. R. Krishnaiah (Ed.), Multivariate analysis (Vol. 2, pp. 639–671). New York: Academic.
Luce, R. D. (1959). Individual choice behavior. New York: Wiley.
Mallows, C. L. (1957). Non-null ranking models, I. Biometrika, 44, 114–130.
Marden, J. I. (1995). Analyzing and modeling rank data. New York: Chapman & Hall.
Maydeu-Olivares, A. (1999). Thurstonian modeling of ranking data via mean and covariance structure analysis. Psychometrika, 64, 325–340.
Meulman, J. J., Van Der Kooij, A. J., & Heiser, W. J. (2004). Principal components analysis with nonlinear optimal scaling transformations for ordinal and nominal data. In D. Kaplan (Ed.), The SAGE handbook of quantitative methodology for the social sciences (pp. 49–70). Thousand Oaks: Sage.
Mingers, J. (1989). An empirical comparison of pruning methods for decision tree induction. Machine Learning, 4, 227–243.
Morgan, K. O., & Morgan, S. (2010). State rankings 2010: A statistical view of America. Washington, DC: CQ Press.
Murphy, T. B., & Martin, D. (2003). Mixtures of distance-based models for ranking data. Computational Statistics and Data Analysis, 41, 645–655.
Roskam, Ed. E. C. I. (1968). Metric analysis of ordinal data in psychology: Models and numerical methods for metric analysis of conjoint ordinal data in psychology. Doctoral dissertation, Voorschoten, The Netherlands: VAM.
Shepard, R. N. (1987). Toward a universal law of generalization for psychological science. Science, 237, 1317–1323.
Skrondal, A., & Rabe-Hesketh, S. (2003). Multilevel logistic regression for polytomous data and rankings. Psychometrika, 68, 267–287.
Slater, P. (1960). The analysis of personal preferences. British Journal of Statistical Psychology, 13, 119–135.
Thompson, G. L. (1993). Generalized permutation polytopes and exploratory graphical methods for ranked data. The Annals of Statistics, 21, 1401–1430.
Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34, 273–286.
Thurstone, L. L. (1931). Rank order as a psychophysical method. Journal of Experimental Psychology, 14, 187–201.
Tucker, L. R. (1960). Intra-individual and inter-individual multidimensionality. In H. Gulliksen & S. Messick (Eds.), Psychological scaling: Theory and applications (pp. 155–167). New York: Wiley.
Van Blokland-Vogelesang, A. W. (1989). Unfolding and consensus ranking: A prestige ladder for technical occupations. In G. De Soete et al. (Eds.), New developments in psychological choice modeling (pp. 237–258). Amsterdam, The Netherlands: North-Holland.
van Buuren, S., & Heiser, W. J. (1989). Clustering n objects into k groups under optimal scaling of variables. Psychometrika, 54, 699–706.
Van Deun, K. (2005). Degeneracies in multidimensional unfolding. Doctoral dissertation, Leuven, Belgium: Catholic University of Leuven.
Yao, G., & Böckenholt, U. (1999). Bayesian estimation of Thurstonian ranking models based on the Gibbs sampler. British Journal of Mathematical and Statistical Psychology, 52, 79–92.
Zhang, J. (2004). Binary choice, subset choice, random utility, and ranking: A unified perspective using the permutahedron. Journal of Mathematical Psychology, 48, 107–134.
Zinnes, J. L., & Griggs, R. A. (1974). Probabilistic, multidimensional unfolding analysis. Psychometrika, 39, 327–350.
Copyright information
© 2013 Springer International Publishing Switzerland
Cite this paper
Heiser, W.J., D’Ambrosio, A. (2013). Clustering and Prediction of Rankings Within a Kemeny Distance Framework. In: Lausen, B., Van den Poel, D., Ultsch, A. (eds) Algorithms from and for Nature and Life. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-00035-0_2
DOI: https://doi.org/10.1007/978-3-319-00035-0_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-00034-3
Online ISBN: 978-3-319-00035-0
eBook Packages: Mathematics and Statistics (R0)