Abstract
In general, binary matrix factorization (BMF) refers to the problem of finding two low-rank binary matrices whose matrix product differs as little as possible from a given binary matrix. BMF has served as an important tool for dimension reduction of high-dimensional data sets with binary attributes and has been successfully employed in numerous applications. In the existing BMF literature, the matrix product is not required to be binary. We call this unconstrained BMF (UBMF), and we call the variant in which the matrix product is required to be binary constrained BMF (CBMF). In this paper, we first introduce two specific variants of CBMF and discuss their relation to other dimension reduction models such as UBMF. We then propose alternating update procedures for CBMF. In every iteration of the proposed procedure, we solve a specific binary linear programming (BLP) problem to update the involved matrix argument. We explore the relationship between the BLP subproblem and clustering to develop an effective 2-approximation algorithm for CBMF when the underlying matrix has very low rank. The proposed algorithm can also provide a 2-approximation to rank-1 UBMF. We also develop a randomized algorithm for CBMF and estimate the approximation ratio of the solution obtained. Numerical experiments show that the proposed algorithm for UBMF finds better solutions in less CPU time than several other algorithms in the literature, and that the solution obtained from CBMF is very close to that of UBMF.
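The abstract links the very-low-rank case to clustering. As a hedged illustration of that connection (a minimal sketch, not the authors' algorithm), the Python code below treats rank-1 BMF, A ≈ u vᵀ with u and v binary, as a two-center clustering problem. For binary data the squared error ||A − u vᵀ||²_F just counts mismatched entries. Each row of A is assigned either to the center v (u_i = 1) or to the zero vector (u_i = 0), so for a fixed v the best u is chosen row by row. The sketch restricts candidate centers v to the rows of A. Because Hamming distance is a metric, a standard triangle-inequality argument shows that the best row-based choice is within a factor of 2 of the optimum. At rank 1 the product u vᵀ is automatically binary, so the constrained and unconstrained problems coincide here. The names `hamming` and `rank1_bmf_by_rows` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def hamming(x: List[int], y: List[int]) -> int:
    """Number of positions where two 0/1 vectors differ (equals squared Euclidean distance)."""
    return sum(a != b for a, b in zip(x, y))


def rank1_bmf_by_rows(A: Matrix) -> Tuple[List[int], List[int], int]:
    """Illustrative rank-1 BMF heuristic (assumption, not the chapter's method).

    Candidate centers v are the rows of A; the zero vector is the other center.
    For a fixed v, row i gets u_i = 1 iff it is strictly closer to v than to 0.
    Returns (u, v, err) with err = ||A - u v^T||_F^2, i.e. the mismatch count.
    """
    m = len(A)
    n = len(A[0]) if A else 0
    # Baseline: u = 0, every row assigned to the zero center.
    best_u: List[int] = [0] * m
    best_v: List[int] = [0] * n
    best_err = sum(sum(row) for row in A)

    for v in A:
        u: List[int] = []
        err = 0
        for row in A:
            d_v = hamming(row, v)  # cost if row is explained by v
            d_0 = sum(row)         # cost if row is explained by the zero vector
            if d_v < d_0:
                u.append(1)
                err += d_v
            else:
                u.append(0)
                err += d_0
        if err < best_err:
            best_u, best_v, best_err = u, list(v), err
    return best_u, best_v, best_err


if __name__ == "__main__":
    A = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 1, 1]]
    u, v, err = rank1_bmf_by_rows(A)
    print(u, v, err)  # [1, 1, 0, 0] [1, 1, 0, 0] 3
```

Trying every row as a center costs O(m²n) time, which is acceptable for a sketch. A practical method would prune candidates or pair this step with alternating updates such as the ones the abstract describes.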
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Jiang, P., Peng, J., Heath, M., Yang, R. (2014). A Clustering Approach to Constrained Binary Matrix Factorization. In: Chu, W. (eds) Data Mining and Knowledge Discovery for Big Data. Studies in Big Data, vol 1. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40837-3_9
DOI: https://doi.org/10.1007/978-3-642-40837-3_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40836-6
Online ISBN: 978-3-642-40837-3
eBook Packages: Engineering, Engineering (R0)