Disease Prediction Using Metagenomic Data Visualizations Based on Manifold Learning and Convolutional Neural Network

  • Thanh Hai Nguyen
  • Thai-Nghe Nguyen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11814)


Deep learning algorithms have achieved notable results in image classification, speech recognition, and video processing. Visualizing metagenomic data remains challenging because of its complexity and high dimensionality. In this paper, we introduce several approaches based on dimensionality reduction algorithms and data density to visualize features that reflect species abundance. The methods used in this study are unsupervised: they carry out dimensionality reduction and map the data into a 2-dimensional space. Deep learning techniques are then applied to the resulting visualizations to enhance prediction performance for colorectal cancer. Experiments on five metagenome-based colorectal cancer datasets from Chinese, Austrian, American, German, and French cohorts show that the proposed visualizations reveal biomedical signatures and improve prediction performance compared to classical machine learning.
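As a rough illustration of the idea of mapping abundance features into a 2-dimensional space and turning each sample into an image, the following minimal sketch may help. It is not the authors' implementation: it uses a Gaussian random projection as a simple stand-in for the dimensionality reduction step, and the toy abundance matrix, the function names `random_projection_2d` and `to_image`, and the 8x8 image size are all hypothetical choices made for illustration.

```python
import random


def random_projection_2d(features, seed=0):
    """Project each feature vector to 2-D using two Gaussian random directions.

    Assumption: random projection is only a stand-in here for whichever
    dimensionality reduction algorithm is actually used.
    """
    rng = random.Random(seed)
    dim = len(features[0])
    axes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(2)]
    return [
        tuple(sum(a * v for a, v in zip(axis, vec)) for axis in axes)
        for vec in features
    ]


def to_image(coords, abundances, size=8):
    """Rasterize one sample: each species sets the pixel at its 2-D position
    to its abundance (the larger value wins if two species share a pixel)."""
    xs = [c[0] for c in coords]
    ys = [c[1] for c in coords]
    min_x, min_y = min(xs), min(ys)
    span_x = (max(xs) - min_x) or 1.0  # avoid division by zero
    span_y = (max(ys) - min_y) or 1.0
    image = [[0.0] * size for _ in range(size)]
    for (x, y), abundance_value in zip(coords, abundances):
        col = min(size - 1, int((x - min_x) / span_x * size))
        row = min(size - 1, int((y - min_y) / span_y * size))
        image[row][col] = max(image[row][col], abundance_value)
    return image


# Hypothetical toy data: 4 samples (rows) x 5 species (columns).
abundance = [
    [0.50, 0.20, 0.10, 0.15, 0.05],
    [0.10, 0.40, 0.30, 0.10, 0.10],
    [0.25, 0.25, 0.20, 0.20, 0.10],
    [0.05, 0.05, 0.60, 0.20, 0.10],
]

# Describe each species by its abundance across samples, then embed in 2-D.
species_profiles = [list(column) for column in zip(*abundance)]
coords = random_projection_2d(species_profiles)

# One image per sample; such images could then be fed to an image classifier.
images = [to_image(coords, sample) for sample in abundance]
```

In this sketch the species positions are computed once and shared by every sample, so images differ only in pixel intensity. That is one plausible design for making samples comparable, and it is an assumption rather than a detail stated in the abstract.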


Dimensionality reduction algorithms, Manifold learning, Metagenomics, Visualization, Disease prediction, Convolutional neural network



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Can Tho University, Can Tho, Vietnam
