A Clustering Approach to Constrained Binary Matrix Factorization

  • Chapter
Data Mining and Knowledge Discovery for Big Data

Part of the book series: Studies in Big Data (SBD, volume 1)

Abstract

Binary matrix factorization (BMF) is the problem of finding two low-rank binary matrices whose product differs as little as possible from a given binary matrix. BMF is an important tool for dimensionality reduction of high-dimensional data sets with binary attributes and has been employed successfully in numerous applications. The existing literature on BMF does not require the matrix product to be binary. We call this setting unconstrained BMF (UBMF), and we call the problem constrained BMF (CBMF) when the matrix product is required to be binary. In this paper, we first introduce two specific variants of CBMF and discuss their relation to other dimensionality reduction models such as UBMF. We then propose alternating update procedures for CBMF. Each iteration of the proposed procedure solves a specific binary linear programming (BLP) problem to update the matrix argument involved. We explore the relationship between the BLP subproblem and clustering to develop an effective 2-approximation algorithm for CBMF when the underlying matrix has very low rank. The proposed algorithm can also provide a 2-approximation to rank-1 UBMF. We also develop a randomized algorithm for CBMF and estimate the approximation ratio of the solution obtained. Numerical experiments show that the proposed algorithm for UBMF finds better solutions in less CPU time than several other algorithms in the literature, and that the solution obtained from CBMF is very close to that of UBMF.
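The distinction between UBMF and CBMF, and the idea of alternating updates, can be illustrated with a small sketch. This is not the chapter's algorithm: the squared Frobenius error, the function names, and the simple rank-1 update rule are all assumptions chosen for illustration, and the sketch carries no approximation guarantee. In the rank-1 case the product of two binary vectors is always binary, so the constrained and unconstrained problems coincide, and with one factor held fixed each entry of the other factor can be chosen exactly by a sign test.

```python
# Illustrative sketch only (hypothetical helper names, pure Python).

def bmf_error(G, U, W):
    """Squared Frobenius error ||G - U W||^2 with ordinary integer arithmetic
    (assumed error measure)."""
    k = len(W)
    return sum(
        (G[i][j] - sum(U[i][r] * W[r][j] for r in range(k))) ** 2
        for i in range(len(G))
        for j in range(len(G[0]))
    )

def product_is_binary(U, W):
    """True if every entry of U W is 0 or 1, i.e. the CBMF constraint holds."""
    k = len(W)
    return all(
        sum(U[i][r] * W[r][j] for r in range(k)) <= 1
        for i in range(len(U))
        for j in range(len(W[0]))
    )

def rank1_alternating(G, u, max_iter=50):
    """Alternate exact updates of w (u fixed) and u (w fixed) for G ~ u w^T.

    With u fixed, setting w[j] = 1 lowers the error exactly when column j of G
    has more ones than zeros on the rows where u[i] = 1; the u-update is
    symmetric. Ties are resolved towards 0.
    """
    m, n = len(G), len(G[0])
    u, w = list(u), [0] * n
    for _ in range(max_iter):
        new_w = [
            1 if sum(2 * G[i][j] - 1 for i in range(m) if u[i]) > 0 else 0
            for j in range(n)
        ]
        new_u = [
            1 if sum(2 * G[i][j] - 1 for j in range(n) if new_w[j]) > 0 else 0
            for i in range(m)
        ]
        if new_u == u and new_w == w:
            break  # fixed point: neither factor changed
        u, w = new_u, new_w
    return u, w

G = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]
u, w = rank1_alternating(G, [1, 1, 1])
error = bmf_error(G, [[x] for x in u], [w])
```

For rank greater than 1 the two problems differ: with `U = [[1, 1]]` and `W = [[1], [1]]` the ordinary product is 2, so `product_is_binary` returns `False` and the pair is admissible for UBMF but not for CBMF.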

Author information

Corresponding author

Correspondence to Peng Jiang.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Jiang, P., Peng, J., Heath, M., Yang, R. (2014). A Clustering Approach to Constrained Binary Matrix Factorization. In: Chu, W. (ed.) Data Mining and Knowledge Discovery for Big Data. Studies in Big Data, vol 1. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40837-3_9

  • DOI: https://doi.org/10.1007/978-3-642-40837-3_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40836-6

  • Online ISBN: 978-3-642-40837-3

  • eBook Packages: Engineering, Engineering (R0)
