A Clustering Approach to Constrained Binary Matrix Factorization

Jiang, Peng; Peng, Jiming; Heath, Michael; Yang, Rui

doi:10.1007/978-3-642-40837-3_9

Part of the book series: Studies in Big Data (SBD, volume 1)

Abstract

In general, binary matrix factorization (BMF) refers to the problem of finding two binary matrices of low rank such that the difference between their matrix product and a given binary matrix is minimal. BMF has served as an important tool in dimensionality reduction for high-dimensional data sets with binary attributes and has been successfully employed in numerous applications. In the existing literature on BMF, the matrix product is not required to be binary. We call this problem unconstrained BMF (UBMF) and, correspondingly, call it constrained BMF (CBMF) when the matrix product is required to be binary. In this paper, we first introduce two specific variants of CBMF and discuss their relation to other dimensionality reduction models such as UBMF. We then propose alternating update procedures for CBMF. In every iteration of the proposed procedure, we solve a specific binary linear programming (BLP) problem to update the involved matrix argument. We explore the relationship between the BLP subproblem and clustering to develop an effective 2-approximation algorithm for CBMF when the underlying matrix has very low rank. The proposed algorithm can also provide a 2-approximation to rank-1 UBMF. We also develop a randomized algorithm for CBMF and estimate the approximation ratio of the solution obtained. Numerical experiments show that the proposed algorithm for UBMF finds better solutions in less CPU time than several other algorithms in the literature, and that the solution obtained from CBMF is very close to that of UBMF.
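
To make the alternating update idea concrete, the following is a minimal sketch for the rank-1 case only, where the product of a binary column vector u and a binary row vector w is automatically binary, so UBMF and CBMF coincide. It is not the algorithm proposed in the chapter. The squared-error objective, the function names (update_rows, rank1_bmf, squared_error) and the choice to seed w from a row of the input matrix G are all assumptions made for illustration. With w fixed, each entry of u can be chosen independently: setting u_i = 1 lowers the error of row i exactly when that row contains more than half of the ones in w, so the binary subproblem is solved by inspection.

```python
def update_rows(G, w):
    """With w fixed, choose each u_i in {0, 1} to minimise row i's squared error."""
    ones_in_w = sum(w)
    # u_i = 1 beats u_i = 0 exactly when row i covers more than half the ones of w.
    return [1 if 2 * sum(g * x for g, x in zip(row, w)) > ones_in_w else 0
            for row in G]


def transpose(G):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*G)]


def rank1_bmf(G, w0, max_iter=100):
    """Alternate between updating u and w until neither changes (illustrative only)."""
    Gt = transpose(G)
    u, w = [0] * len(G), list(w0)
    for _ in range(max_iter):
        u_new = update_rows(G, w)        # fix w, update u
        w_new = update_rows(Gt, u_new)   # fix u, update w (same rule on columns)
        if u_new == u and w_new == w:
            break
        u, w = u_new, w_new
    return u, w


def squared_error(G, u, w):
    """Number of entries where G differs from the outer product of u and w."""
    return sum((g - ui * wj) ** 2
               for row, ui in zip(G, u)
               for g, wj in zip(row, w))


G = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]
u, w = rank1_bmf(G, G[0])  # seed w with the first row of G (an arbitrary choice)
print(u, w, squared_error(G, u, w))  # prints [1, 1, 0] [1, 1, 0] 1
```

Each update can only keep or reduce the squared error, so the loop stops at a local optimum that depends on the seed. The approximation guarantees stated in the abstract refer to the chapter's own algorithms, not to this sketch.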

Author information

Authors and Affiliations

Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
Peng Jiang & Michael Heath
Department of ISE, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
Jiming Peng & Rui Yang

Corresponding author

Correspondence to Peng Jiang.

Editor information

Editors and Affiliations

Department of Computer Science, University of California, Los Angeles, USA
Wesley W. Chu

About this chapter

Cite this chapter

Jiang, P., Peng, J., Heath, M., Yang, R. (2014). A Clustering Approach to Constrained Binary Matrix Factorization. In: Chu, W. (eds) Data Mining and Knowledge Discovery for Big Data. Studies in Big Data, vol 1. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40837-3_9

DOI: https://doi.org/10.1007/978-3-642-40837-3_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40836-6
Online ISBN: 978-3-642-40837-3
eBook Packages: Engineering, Engineering (R0)
