Abstract
This paper introduces a novel approach to clustering rank data on a set of possibly large cardinality n ∈ ℕ*, relying upon the Fourier representation of functions defined on the symmetric group \(\mathfrak{S}_n\). In the present setup, which covers a wide variety of practical situations, rank data are viewed as distributions on \(\mathfrak{S}_n\). Cluster analysis aims at segmenting data into homogeneous subgroups that are, ideally, very dissimilar from one another in a certain sense. Considering dissimilarity measures/distances between distributions on the noncommutative group \(\mathfrak{S}_n\) in a coordinate-wise manner, for instance by viewing it as embedded in the set \([0,1]^{n!}\), hardly yields interpretable results and raises obvious computational issues. In contrast, evaluating the closeness of groups of permutations in the Fourier domain may be much easier. Indeed, in a wide variety of situations, a few well-chosen Fourier (matrix) coefficients may suffice to approximate efficiently two distributions on \(\mathfrak{S}_n\), as well as their degree of dissimilarity, while describing global properties in an interpretable fashion. Following in the footsteps of recent advances in automatic feature selection in the context of unsupervised learning, we propose to cast the task of clustering rankings as the optimization of a criterion that can be expressed simply in the Fourier domain. The effectiveness of the proposed method is illustrated by numerical experiments based on artificial and real data.
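To make the idea of comparing groups of rankings through a low-order Fourier coefficient concrete, here is a minimal sketch. It is not the criterion optimized in the paper: it only computes the coefficient of an empirical distribution at the permutation representation (the matrix of first-order marginals, whose entry (i, j) is the frequency with which item j receives rank i) and compares two groups by the Frobenius norm of the difference. The function names, the index convention for the permutation matrices, and the toy data with n = 4 are all assumptions made for illustration.

```python
from collections import Counter
from itertools import permutations

import numpy as np


def perm_matrix(sigma):
    """Permutation matrix P with P[sigma[j], j] = 1 (item j gets rank sigma[j])."""
    n = len(sigma)
    P = np.zeros((n, n))
    P[np.asarray(sigma), np.arange(n)] = 1.0
    return P


def first_order_coefficient(rankings):
    """Fourier coefficient of the empirical distribution at the permutation
    representation: the weighted sum of permutation matrices.

    Entry (i, j) is the empirical probability that item j receives rank i.
    """
    counts = Counter(tuple(r) for r in rankings)
    total = sum(counts.values())
    n = len(next(iter(counts)))
    M = np.zeros((n, n))
    for sigma, c in counts.items():
        M += (c / total) * perm_matrix(sigma)
    return M


def first_order_distance(rankings_a, rankings_b):
    """Frobenius distance between the first-order coefficients of two groups
    (a hypothetical dissimilarity chosen for this sketch)."""
    diff = first_order_coefficient(rankings_a) - first_order_coefficient(rankings_b)
    return float(np.linalg.norm(diff))


# Toy data (illustrative only): two groups with nearly opposite preferences.
group_a = [(0, 1, 2, 3), (0, 1, 3, 2), (1, 0, 2, 3)]
group_b = [(3, 2, 1, 0), (3, 2, 0, 1), (2, 3, 1, 0)]
uniform = list(permutations(range(4)))

print(first_order_coefficient(group_a))
print(first_order_distance(group_a, group_b))
```

The n × n matrix above summarizes a distribution that lives on n! points, which is the kind of compression the Fourier viewpoint offers; higher-order coefficients would be needed to capture interactions between items.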
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Clémençon, S., Gaudel, R., Jakubowicz, J. (2011). Clustering Rankings in the Fourier Domain. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2011. Lecture Notes in Computer Science(), vol 6911. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23780-5_32
DOI: https://doi.org/10.1007/978-3-642-23780-5_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23779-9
Online ISBN: 978-3-642-23780-5
eBook Packages: Computer Science, Computer Science (R0)