Integrating Heterogeneous Datasets by Using Multimodal Deep Learning

  • Fariba Khoshghalbvash
  • Jean X. Gao
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 517)


The rapid collection of data sources that vary in volume and structure challenges scientists to establish a practical approach to handling heterogeneous data. Multimodal learning and integrated analysis make it possible to extract valuable information from a collection of multiple raw datasets, so data integration can lead to more reliable and robust results. High-throughput sequencing technologies, especially next-generation sequencing, provide multi-platform genomic data such as gene expression, SNP, CNV, DNA methylation, and miRNA expression. In this paper, we present a multimodal deep neural network that exploits the mutual information among three different modalities to classify breast cancer patients into two groups based on their survival rate. Experimental results indicate that our method improves classification accuracy and performs better on imbalanced data than state-of-the-art single-modal methods.
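As a rough illustration of the general idea, the following Python sketch shows one common way to build a multimodal classifier: each modality passes through its own encoder, the encoded features are concatenated, and a sigmoid output assigns one of two classes. This is a minimal sketch under stated assumptions, not the network described in the paper. The abstract does not specify which three modalities are used, the layer sizes, or how mutual information is exploited, so the feature sizes, the random (untrained) weights, and the helper names `dense`, `init_layer`, and `predict` are all hypothetical.

```python
import math
import random

def dense(x, weights, biases):
    """Fully connected layer with ReLU: one output per (weight row, bias) pair."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    """Small random weights; a trained model would learn these instead."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def predict(modalities, encoders, head):
    """Encode each modality separately, concatenate, then score with a sigmoid."""
    fused = []
    for x, (w, b) in zip(modalities, encoders):
        fused.extend(dense(x, w, b))
    head_w, head_b = head
    logit = sum(w * v for w, v in zip(head_w, fused)) + head_b
    return 1.0 / (1.0 + math.exp(-logit))

rng = random.Random(0)
# Hypothetical feature sizes for three modalities (not taken from the paper).
input_sizes = [8, 6, 4]
hidden = 3
encoders = [init_layer(n, hidden, rng) for n in input_sizes]
head = ([rng.uniform(-0.1, 0.1) for _ in range(hidden * len(input_sizes))], 0.0)

# One synthetic "patient": a feature vector per modality.
patient = [[rng.random() for _ in range(n)] for n in input_sizes]
probability = predict(patient, encoders, head)
label = int(probability >= 0.5)  # two groups, thresholded at 0.5
```

Because the weights here are random, the output is only a shape check. In practice the encoders and the output layer would be trained jointly on labelled patient data, and class imbalance would need separate handling, for example through class weights or resampling.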


Keywords: Data integration, Omics, Deep learning



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, USA
