Integrating Heterogeneous Datasets by Using Multimodal Deep Learning
The rapid accumulation of data sources that vary in volume and structure makes it challenging for scientists to establish a practical approach to handling heterogeneous data. Multimodal learning and integrated analysis make it possible to extract valuable information from a collection of simple raw datasets, so data integration can lead to more reliable and robust results. High-throughput sequencing technologies, especially next-generation sequencing, produce multi-platform genomic data such as gene expression, SNP, CNV, DNA methylation, and miRNA expression. In this paper, we present a multimodal deep neural network that exploits the mutual information between three different modalities to classify breast cancer patients into two groups based on their survival rate. Experimental results indicate that our method improves classification accuracy and performs better on imbalanced data than state-of-the-art single-modal methods.
Keywords: Data integration, Omics, Deep learning
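The general idea of a multimodal network for two-class prediction can be illustrated with a minimal sketch. The abstract does not give the architecture, so everything below is an assumption made for illustration: the three modality names, the input and hidden dimensions, the random initialisation, and the fusion by concatenation are all hypothetical, and no training is shown. Each modality gets its own small encoder, the encoded features are concatenated, and a sigmoid output gives the probability of one of the two survival groups.

```python
import math
import random


def dense_relu(x, weights, bias):
    """One fully connected layer with ReLU activation: relu(W x + b)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (hypothetical initialisation)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_model(input_dims, hidden=4, seed=0):
    """Create one encoder per modality plus a single shared output unit."""
    rng = random.Random(seed)
    encoders = {name: init_layer(dim, hidden, rng)
                for name, dim in input_dims.items()}
    head_w = [rng.gauss(0.0, 0.1) for _ in range(hidden * len(input_dims))]
    return {"encoders": encoders, "head_w": head_w, "head_b": 0.0}


def predict_proba(model, sample):
    """Encode each modality separately, concatenate, apply a sigmoid unit."""
    fused = []
    for name, (weights, bias) in model["encoders"].items():
        fused.extend(dense_relu(sample[name], weights, bias))
    logit = sum(w * h for w, h in zip(model["head_w"], fused)) + model["head_b"]
    return 1.0 / (1.0 + math.exp(-logit))


# Assumed modalities and toy feature counts, not taken from the paper.
dims = {"gene_expression": 6, "cnv": 5, "methylation": 3}
model = build_model(dims)
patient = {name: [0.5] * dim for name, dim in dims.items()}
p = predict_proba(model, patient)
label = "group A" if p >= 0.5 else "group B"  # placeholder group names
```

Keeping a separate encoder per modality lets each data type have its own input dimensionality before fusion, which is one common way to combine heterogeneous omics inputs. Other fusion strategies are equally plausible here.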