Multimodal Co-clustering Analysis of Big Data Based on Matrix and Tensor Decomposition

  • Hongya Zhao
  • Zhenghong Wei
  • Hong Yan
Chapter

Abstract

In this chapter, we first give an overview of co-clustering based on matrix and tensor decomposition, with which effective signals can be separated from noise. We then propose a systematic framework for co-clustering multimodal data. Based on tensor decomposition, the framework identifies co-clusters as hyperplanar patterns in the vector spaces of the factor matrices. Within this framework, we develop an alternative algorithm that performs tensor decomposition with a full rank constraint on the slice-wise matrices (SFRF). Compared with the commonly used orthogonal or nonnegative constraints, this relaxed condition makes the resolved profiles stable with respect to model dimensionality in multimodal data. The algorithm retains a high convergence rate, and its factorization technique greatly reduces computational complexity. Results on synthetic and experimental data show the favorable performance of the proposed multimodal co-clustering algorithms.
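
The idea of reading co-clusters off the factor matrices of a tensor decomposition can be illustrated with a small sketch. It is not the SFRF algorithm of the chapter: as a simplifying assumption, it takes the leading left singular vector of each mode unfolding (one column of an HOSVD factor matrix) and replaces hyperplane detection with a plain magnitude threshold. The function names, the threshold value, and the synthetic tensor are all hypothetical and chosen only for illustration.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: move `mode` to the front and flatten the other axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def leading_factor(tensor, mode):
    """Leading left singular vector of the mode-n unfolding."""
    u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
    return u[:, 0]

def co_cluster_indices(tensor, threshold=0.5):
    """Per mode, keep indices whose |loading| exceeds threshold * max |loading|.

    This thresholding is a stand-in (an assumption of this sketch) for the
    hyperplane detection step described in the abstract.
    """
    selected = []
    for mode in range(tensor.ndim):
        loading = np.abs(leading_factor(tensor, mode))
        selected.append(np.flatnonzero(loading > threshold * loading.max()))
    return selected

# Synthetic 3-way tensor: weak Gaussian noise plus one planted constant block.
rng = np.random.default_rng(0)
data = 0.01 * rng.standard_normal((8, 7, 6))
planted = ([1, 3, 5], [0, 2], [2, 3, 4])
data[np.ix_(*planted)] += 1.0

found = co_cluster_indices(data)
for mode, idx in enumerate(found):
    print(f"mode {mode}: {idx.tolist()}")
```

With noise this weak, the indices selected in each mode should coincide with the planted block; real multimodal data would generally need more than one factor per mode and a more careful pattern detector.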

Notes

Acknowledgment

This work is supported by Natural Science Funds of Shenzhen Science and Technology Innovation Commission (JCYJ20160527172144272) and Hong Kong Research Grants Council (Projects CityU 11214814 and C1007-15G).

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Industrial Central, Shenzhen Polytechnic, Shenzhen, China
  2. Department of Electronic Engineering, City University of Hong Kong, Kowloon Tong, Hong Kong
  3. Department of Statistics, Shenzhen University, Shenzhen, China