Abstract
Matrix approximation approaches such as Singular Value Decomposition (SVD) and Non-negative Matrix Tri-Factorization (NMTF) have recently been shown to be useful and effective for tackling the co-clustering problem. In this work, we embed co-clustering in a Bistochastic Matrix Approximation (BMA) framework and derive, from the double k-means objective function, a new formulation of the criterion to optimize. First, we show that double k-means is equivalent to the algebraic problem of BMA under suitable constraints. Second, we propose an iterative process that seeks the optimal simultaneous partitions of the rows and columns of the data; the solution is given by the steady state of a Markov chain process. We develop two iterative algorithms: the first learns the row and column similarity matrices, and the second obtains the simultaneous row and column partitions. Numerical experiments on simulated and real datasets demonstrate the value of our approach, which does not require knowledge of the number of co-clusters.
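The equivalence claimed in the first step can be checked numerically under one natural reading of the "suitable constraints" (an assumption here, since the abstract does not spell them out). If Z and W are the row and column cluster-indicator matrices, the normalized partition matrices P_r = Z (Z^T Z)^{-1} Z^T and P_c = W (W^T W)^{-1} W^T are symmetric, nonnegative and doubly stochastic (bistochastic). With the block means as the optimal summary matrix B, the double k-means criterion ||X - Z B W^T||^2 equals ||X - P_r X P_c||^2. The sketch below uses NumPy; the helper names (partition_matrix, projector, double_kmeans_loss, bma_loss) are hypothetical, and the code only verifies this identity. It is not the authors' algorithm.

```python
import numpy as np


def partition_matrix(labels, k):
    """Hypothetical helper: indicator matrix Z (n x k) with Z[i, labels[i]] = 1."""
    n = len(labels)
    Z = np.zeros((n, k))
    Z[np.arange(n), labels] = 1.0
    return Z


def projector(Z):
    """Normalized partition matrix P = Z (Z^T Z)^{-1} Z^T.

    Entry (i, j) is 1/|cluster| when i and j share a cluster and 0 otherwise,
    so P is symmetric, nonnegative and doubly stochastic.
    Assumes every cluster is non-empty (otherwise the division fails).
    """
    sizes = Z.sum(axis=0)          # cluster sizes, shape (k,)
    return (Z / sizes) @ Z.T       # scale each indicator column by 1/size


def double_kmeans_loss(X, Z, W):
    """Double k-means criterion ||X - Z B W^T||^2 with B set to the block means."""
    B = np.diag(1.0 / Z.sum(axis=0)) @ Z.T @ X @ W @ np.diag(1.0 / W.sum(axis=0))
    return np.linalg.norm(X - Z @ B @ W.T) ** 2


def bma_loss(X, Pr, Pc):
    """Approximation error of X by P_r X P_c (bistochastic form)."""
    return np.linalg.norm(X - Pr @ X @ Pc) ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((6, 4))                      # toy data: 6 rows, 4 columns
    Z = partition_matrix(np.array([0, 0, 1, 1, 2, 2]), 3)
    W = partition_matrix(np.array([0, 1, 0, 1]), 2)
    Pr, Pc = projector(Z), projector(W)
    print(np.allclose(Pr.sum(axis=1), 1), np.allclose(Pr, Pr.T))          # True True
    print(np.isclose(double_kmeans_loss(X, Z, W), bma_loss(X, Pr, Pc)))  # True
```

Under this reading, searching over row and column partitions amounts to searching over bistochastic matrices with this particular block structure, which is what makes a matrix-approximation view of double k-means possible.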
Acknowledgments
This work has been funded by AAP Sorbonne Paris Cité.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Labiod, L., Nadif, M. (2016). Bi-stochastic Matrix Approximation Framework for Data Co-clustering. In: Boström, H., Knobbe, A., Soares, C., Papapetrou, P. (eds) Advances in Intelligent Data Analysis XV. IDA 2016. Lecture Notes in Computer Science, vol. 9897. Springer, Cham. https://doi.org/10.1007/978-3-319-46349-0_24
DOI: https://doi.org/10.1007/978-3-319-46349-0_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46348-3
Online ISBN: 978-3-319-46349-0
eBook Packages: Computer Science, Computer Science (R0)