GC\(^2\)NMF: A Novel Matrix Factorization Framework for Gene–Phenotype Association Prediction

  • Original Research Article
Interdisciplinary Sciences: Computational Life Sciences

Abstract

Gene–phenotype association prediction can help reveal the inherited basis of human diseases and facilitate drug development. Gene–phenotype associations arise from complex biological processes and are influenced by various factors, such as relationships among phenotypes and relationships among genes. However, because curated gene–phenotype associations are sparse and the joint effect of multiple factors has not been analyzed in an integrated way, existing methods are limited in prediction accuracy and in their ability to detect potential gene–phenotype associations. In this paper, we propose a novel method, Weighted Graph Constraint and Group Centric Non-negative Matrix Factorization (GC\(^2\)NMF), which inherits the advantages of Non-negative Matrix Factorization (NMF) and exploits a weighted graph constraint learned from the hierarchical structure of phenotype data together with group prior information among genes. Specifically, we first introduce the depth of the parent–child relationship between two adjacent phenotypes in the hierarchical phenotype data as a weighted graph constraint, for a better understanding of phenotypes. Second, we use the intra-group correlation among genes in a gene group as a group constraint, for a better understanding of genes. This information reflects the intuition that genes in the same group probably result in similar phenotypes. The model not only achieves high prediction performance but also learns interpretable representations of genes and phenotypes simultaneously, which facilitates future biological analysis. Experimental results on mouse and human gene–phenotype association datasets demonstrate that GC\(^2\)NMF obtains superior prediction accuracy and good interpretability for biological explanation compared with other state-of-the-art methods.
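To make the idea of a graph-constrained factorization concrete, the following is a minimal sketch of generic graph-regularized NMF with multiplicative updates. It is not the GC\(^2\)NMF algorithm from the paper: the abstract does not give the objective or update rules, and the group constraint on genes is omitted here. The function names, the toy association matrix `X`, and the phenotype graph weights in `W` (standing in for depth-based parent–child weights) are all illustrative assumptions.

```python
import numpy as np

def graph_regularized_nmf(X, W, k=2, lam=0.1, n_iter=200, seed=0, eps=1e-9):
    """Factor X (genes x phenotypes) as U @ V.T with non-negative U, V.

    W is a symmetric, non-negative phenotype-phenotype weight matrix; the
    penalty lam * trace(V.T @ (D - W) @ V) pulls linked phenotypes toward
    similar latent vectors. Generic formulation, assumed for illustration.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_phen = X.shape
    U = rng.random((n_genes, k))   # gene factors
    V = rng.random((n_phen, k))    # phenotype factors
    D = np.diag(W.sum(axis=1))     # degree matrix of the phenotype graph
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * (W @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
    return U, V

def objective(X, W, U, V, lam):
    """Reconstruction error plus the graph Laplacian penalty."""
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.norm(X - U @ V.T) ** 2 + lam * np.trace(V.T @ L @ V)

# Toy data (hypothetical): 4 genes, 3 phenotypes, 1 = known association.
X = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1],
              [0, 1, 1]], dtype=float)
# Hypothetical phenotype graph; larger weight = closer parent-child link.
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.5],
              [0.0, 0.5, 0.0]])

U, V = graph_regularized_nmf(X, W, k=2, lam=0.1)
scores = U @ V.T  # higher score = more plausible gene-phenotype association
```

In this sketch, entries of `scores` that are high where `X` is zero would be read as candidate associations, and the rows of `U` and `V` as latent representations of genes and phenotypes.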

Acknowledgements

This work was supported by the National Natural Science Foundation of China (nos. 61702367, 61300972) and the Research Project of Tianjin Municipal Commission of Education (no. 2017KJ033).

Author information

Corresponding author

Correspondence to Maoqiang Xie.

About this article

Cite this article

Zhang, Y., Liu, J., Liu, X. et al. GC\(^2\)NMF: A Novel Matrix Factorization Framework for Gene–Phenotype Association Prediction. Interdiscip Sci Comput Life Sci 10, 572–582 (2018). https://doi.org/10.1007/s12539-018-0296-1

  • DOI: https://doi.org/10.1007/s12539-018-0296-1
