Advertisement

Dimensionality Reduction in Boolean Data: Comparison of Four BMF Methods

  • Eduard Bartl
  • Radim Belohlavek
  • Petr OsickaEmail author
  • Hana Řezanková
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7627)

Abstract

We compare four methods for Boolean matrix factorization (BMF). The oldest of these methods is the 8M method implemented in the BMDP statistical software package developed in the 1960s. The three other methods were developed recently. All the methods compute from an input object-attribute matrix I two matrices, namely an object-factor matrix A and a factor-attribute matrix B in such a way that the Boolean matrix product of A and B is approximately equal to I. Such decompositions are utilized directly in Boolean factor analysis or indirectly as a dimensionality reduction method for Boolean data in machine learning. While some comparison of the BMF methods with matrix decomposition methods designed for real valued data exists in the literature, a mutual comparison of the various BMF methods is a severely neglected topic. In this paper, we compare the four methods on real datasets. In particular, we observe the reconstruction ability of the first few computed factors as well as the number of computed factors necessary to fully reconstruct the input matrix, i.e. the approximation to the Boolean rank of I computed by the methods. In addition, we present some general remarks on all the methods being compared.

Keywords

Boolean Matrix Factorization (BMF) Boolean Data Boolean Rank Exact Decomposition House Vote 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. 1.
    Asuncion, A., Newman, D.J.: UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences (2007). http://www.ics.uci.edu/~mlearn/MLRepository.html
  2. 2.
    Belohlavek, R.: Optimal decompositions of matrices with entries from residuated lattices. J. Logic Comput., 7 September 2011. doi: 10.1093/logcom/exr023
  3. 3.
    Belohlavek, R., Vychodil, V.: Factor analysis of incidence data via novel decomposition of matrices. In: Ferré, S., Rudolph, S. (eds.) ICFCA 2009. LNCS (LNAI), vol. 5548, pp. 83–97. Springer, Heidelberg (2009) CrossRefGoogle Scholar
  4. 4.
    Belohlavek, R., Vychodil, V.: Discovery of optimal factors in binary data via a novel method of matrix decomposition. J. Comput. Syst. Sci. 76(1), 3–20 (2010)MathSciNetCrossRefzbMATHGoogle Scholar
  5. 5.
    Berry, M.W., Browne, M., Langville, A.N., Pauca, V.P., Plemmons, R.J.: Algorithms and applications for approximate nonnegative matrix factorization. Comput. Stat. Data Anal. 52, 155–173 (2007)MathSciNetCrossRefzbMATHGoogle Scholar
  6. 6.
    Cudeck, R., MacCallum, R.C. (eds.): Factor Analysis at 100: Historical Developments and Future Directions. Lawrence Erlbaum Associates Inc., Hillsdale (2007)Google Scholar
  7. 7.
    Fortelius, M., et al.: Neogene of the old world database of fossil mammals (NOW) (2003). http://www.helsinki.fi/science/now/
  8. 8.
    Frolov, A.A., Húsek, D., Polyakov, P.A.: Boolean factor analysis by Hopfield-like autoassociative memory. IEEE Trans. Neural Networks 18(3), 698–707 (2007)CrossRefGoogle Scholar
  9. 9.
    Ganter, B., Wille, R.: Formal Concept Analysis. Mathematical Foundations. Springer, Berlin (1999)CrossRefzbMATHGoogle Scholar
  10. 10.
    Geerts, F., Goethals, B., Mielikäinen, T.: Tiling databases. In: Suzuki, E., Arikawa, S. (eds.) DS 2004. LNCS (LNAI), vol. 3245, pp. 278–289. Springer, Heidelberg (2004) CrossRefGoogle Scholar
  11. 11.
    Golub, G.A., Van Loan, C.F.: Matrix Computations, 3rd edn. The Johns Hopkins University Press, Baltimore (1995) zbMATHGoogle Scholar
  12. 12.
    Harman, H.H.: Modern Factor Analysis, 2nd edn. The Univ. Chicago Press, Chicago (1970)zbMATHGoogle Scholar
  13. 13.
    Kim, K.H.: Boolean Matrix Theory and Applications. M. Dekker, New York (1982) zbMATHGoogle Scholar
  14. 14.
    Lee, D., Seung, H.: Learning parts of objects by non-negative matrix factorization. Nature 401, 788–791 (1999)CrossRefGoogle Scholar
  15. 15.
    Leeuw, J.D.: Principal component analysis of binary data. Application to roll-call analysis (2003). http://gifi.stat.ucla.edu
  16. 16.
    Lu, H., Vaidya, J., Atluri, V.: Optimal Boolean matrix decomposition: application to role engineering. In: Proceedings of IEEE ICDE 2008, pp. 297–306 (2008)Google Scholar
  17. 17.
    McDonald, R.P.: Factor Analysis and Related Methods. Lawrence Erlbaum Associates Inc., McHorney (1985)Google Scholar
  18. 18.
    Mickey, M.R., Mundle, P., Engelman, L.: Boolean factor analysis. In: Dixon, W.J. (ed.) BMDP Statistical Software Manual, vol. 2, pp. 849–860. University of California Press, Berkeley (1990). http://www.statistical-solutions-software.com/products-page/bmdp-statistical-software/ Google Scholar
  19. 19.
    Miettinen, P.: Sparse Boolean matrix factorizations. In: Proceedings of 10th IEEE International Conference on Data Minig (ICDM2010), pp. 935–940 (2010)Google Scholar
  20. 20.
    Miettinen, P., Mielikäinen, T., Gionis, A., Das, G., Mannila, H.: The discrete basis problem. IEEE Trans. Knowl. Data Eng. 20(10), 1348–1362 (2008). preliminary version in PKDD 2006, pp. 335–346CrossRefGoogle Scholar
  21. 21.
    Monson, D.S., Pullman, J.N.: A survey of clique and biclique coverings and factorizations of (0,1)-matrices. Bull. ICA 14, 17–86 (1995)MathSciNetzbMATHGoogle Scholar
  22. 22.
    Myllykangas, S., Himberg, J., Böhling, T., Nagy, B., Hollmén, J., Knuutila, S.: DNA copy number amplification profiling of human neoplasms. Oncogene 25(55), 7324–7332 (2006)CrossRefGoogle Scholar
  23. 23.
    Nau, D.S.: Specificity covering: immunological and other applications, computational complexity and other mathematical properties, and a computer program. A.M. Thesis, Technical report CS-1976-7, Computer Sci. Dept., Duke Univ., Durham, N.C. (1976)Google Scholar
  24. 24.
    Nau, D.S., Markowsky, G., Woodbury, M.A., Amos, D.B.: A mathematical analysis of human leukocyte antigen serology. Math. Biosci. 40, 243–270 (1978)MathSciNetCrossRefzbMATHGoogle Scholar
  25. 25.
    Outrata, J.: Boolean factor analysis for data preprocessing in machine learning. In: Proceedins of ICML 2010, Washington, D.C., USA, pp. 899–902 (2010)Google Scholar
  26. 26.
    Orlitsky, S.A.: Semi-parametric exponential family PCA. In: Saul, L.K., et al. (eds.) Advances in Neural Information Processing Systems 17. MIT Press, Cambridge (2005). http://books.nips.cc/papers/files/nips17/NIPS2004_0152.pdf Google Scholar
  27. 27.
    Seppänen, J.K., Bingham, E., Mannila, H.: A simple algorithm for topic identification in 0–1 data. In: Lavrač, N., Gamberger, D., Todorovski, L., Blockeel, H. (eds.) PKDD 2003. LNCS (LNAI), vol. 2838, pp. 423–434. Springer, Heidelberg (2003) CrossRefGoogle Scholar
  28. 28.
    Schein, A., Saul, L., Ungar, L.: A generalized linear model for principal component analysis of binary data. In: Proceedings of International Workshop on Artificial Intelligence and Statistics, pp. 14–21 (2003)Google Scholar
  29. 29.
    Stockmeyer, L.J.: The set basis problem is NP-complete. IBM Research Report RC5431, Yorktown Heights, NY (1975)Google Scholar
  30. 30.
    Tang, F., Tao, H.: Binary principal component analysis. In: Proceedings of British Machine Vision Conference 2006, pp. 377–386 (2006)Google Scholar
  31. 31.
    Tatti, N., Mielikäinen, T., Gionis, A., Mannila, H.: What is the dimension of your binary data? In: The 2006 IEEE Conference on Data Mining (ICDM 2006), pp. 603–612. IEEE Computer Society (2006)Google Scholar
  32. 32.
    Vaidya, J., Atluri, V., Guo, Q.: The role mining problem: finding a minimal descriptive set of roles. In: ACM Symposium on Access Control Models and Technologies, pp. 175–184, June 2007Google Scholar
  33. 33.
    Zivkovic, Z., Verbeek, J.: Transformation invariant component analysis for binary images. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2006), vol. 1, pp. 254–259 (2006)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Eduard Bartl
    • 1
  • Radim Belohlavek
    • 1
  • Petr Osicka
    • 1
    Email author
  • Hana Řezanková
    • 2
  1. 1.Data Analysis and Modeling Laboratory (DAMOL), Department of Computer SciencePalacky UniveristyOlomoucCzech Republic
  2. 2.Department of Statistics and Probability, Faculty of Informatics and StatisticsUniversity of EconomicsPragueCzech Republic

Personalised recommendations