
Identifying Human Essential Genes by Network Embedding Protein-Protein Interaction Network

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 11490)

Abstract

Essential genes play an indispensable role in cell viability and fertility. Identifying human essential genes not only helps us study the functions of human genes but also provides a way to find potential targets for cancer and other diseases. Recently, with the publication of human essential gene data and the availability of large amounts of biological data, several computational methods have been proposed to predict human essential genes from their DNA sequences or from their topological properties in the protein-protein interaction (PPI) network. However, there is still room to improve prediction accuracy. In this work, we propose a novel supervised method that predicts human essential genes by embedding the protein-protein interaction network. Our method extracts features for the genes in the network by mapping them into a latent feature space that maximally preserves the relationships between genes and their network neighborhoods. These features are then input into an SVM classifier to predict human essential genes. Two human PPI networks are employed to evaluate the effectiveness of our method. The prediction results show that our method not only outperforms the method that uses only genes' sequence information but is also clearly superior to the method that uses genes' centrality properties in the network as input features.
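The abstract does not name the embedding algorithm. The sketch below assumes a node2vec-style approach (Grover and Leskovec, 2016): biased second-order random walks sample each gene's network neighborhood, and the walks would then be treated as "sentences" for a skip-gram (word2vec) model that learns the latent features. Only the walk-sampling step is shown. The function names, the parameter values (`p`, `q`, walk length, walks per node) and the toy gene IDs are hypothetical, not taken from the paper.

```python
import random
from collections import defaultdict

def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) interaction pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-interactions
            adj[u].add(v)
            adj[v].add(u)
    return {node: sorted(nbrs) for node, nbrs in adj.items()}

def biased_walk(adj, start, length, p, q, rng):
    """One node2vec-style second-order random walk of up to `length` nodes.

    p: return parameter (larger p -> less likely to step back to the previous node)
    q: in-out parameter (q < 1 favours moving outward, DFS-like; q > 1 stays local, BFS-like)
    """
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = adj.get(cur, [])
        if not nbrs:
            break  # dead end: isolated node
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))  # first step is uniform
            continue
        prev = walk[-2]
        prev_nbrs = set(adj[prev])
        weights = []
        for x in nbrs:
            if x == prev:
                weights.append(1.0 / p)  # return to the previous node
            elif x in prev_nbrs:
                weights.append(1.0)      # stay at distance 1 from prev
            else:
                weights.append(1.0 / q)  # move further away from prev
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk

def generate_walks(adj, num_walks=10, length=20, p=1.0, q=1.0, seed=0):
    """Start `num_walks` walks from every node; returns a list of node sequences."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)  # visit start nodes in a fresh random order each round
        for node in nodes:
            walks.append(biased_walk(adj, node, length, p, q, rng))
    return walks

if __name__ == "__main__":
    # Hypothetical toy interactions between four genes (not real PPI data).
    edges = [("G1", "G2"), ("G1", "G3"), ("G2", "G3"), ("G3", "G4")]
    adj = build_adjacency(edges)
    walks = generate_walks(adj, num_walks=2, length=5, p=1.0, q=0.5)
    print(len(walks))  # 8: two walks started from each of the four genes
    print(walks[0])
```

With `p = q = 1` every neighbor gets weight 1, so the walk reduces to a uniform (DeepWalk-style) random walk. The original node2vec precomputes alias tables to sample each step in constant time; recomputing the weights at every step, as here, is simpler but slower on a genome-scale PPI network.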
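For the classification step, the abstract says only that the embedding features are input into an SVM classifier; the kernel, solver and hyperparameters are not given. To keep the example free of third-party dependencies, the sketch below swaps in a linear SVM trained with Pegasos (stochastic sub-gradient descent on the L2-regularised hinge loss), labelling essential genes +1 and non-essential genes -1. The function names and the 2-D feature vectors are invented for illustration and stand in for learned gene embeddings; a real pipeline would more likely use an established implementation such as scikit-learn's `SVC`.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM via Pegasos (stochastic sub-gradient descent on the hinge loss).

    X: feature vectors (e.g. gene embeddings); y: labels, +1 = essential, -1 = non-essential.
    A constant 1.0 feature is appended so the bias is learned as the last weight.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step-size schedule
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Gradient step on the regulariser (lam / 2) * ||w||^2 shrinks w ...
            w = [(1.0 - eta * lam) * wi for wi in w]
            # ... and, if the hinge loss max(0, 1 - margin) is active, moves w toward label * x.
            if margin < 1.0:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w

def decision(w, x):
    """Signed score w . [x, 1]; a positive score means 'predicted essential'."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))

def predict(w, x):
    return 1 if decision(w, x) >= 0 else -1

if __name__ == "__main__":
    # Invented, linearly separable 2-D "embeddings" for illustration only.
    X = [(2, 2), (3, 1), (1, 3), (2.5, 2.5), (-2, -2), (-3, -1), (-1, -3), (-2.5, -2.5)]
    y = [1, 1, 1, 1, -1, -1, -1, -1]
    w = train_linear_svm(X, y)
    print([predict(w, x) for x in X])  # reproduces y on this separable toy set
```

Because the bias is folded into `w` through a constant feature, it is regularised together with the weights, a common simplification in Pegasos-style sketches.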



Acknowledgment

This work is supported in part by the National Natural Science Foundation of China under grants No. 31560317, No. 61502214, No. 61472133, No. 61502166, No. 61702122, and No. 81560221, and by the Natural Science Foundation of Yunnan Province of China (No. 2016FB107).

Author information


Corresponding author

Correspondence to Wei Peng.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Dai, W., Chang, Q., Peng, W., Zhong, J., Li, Y. (2019). Identifying Human Essential Genes by Network Embedding Protein-Protein Interaction Network. In: Cai, Z., Skums, P., Li, M. (eds) Bioinformatics Research and Applications. ISBRA 2019. Lecture Notes in Computer Science, vol 11490. Springer, Cham. https://doi.org/10.1007/978-3-030-20242-2_11


  • DOI: https://doi.org/10.1007/978-3-030-20242-2_11


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-20241-5

  • Online ISBN: 978-3-030-20242-2

  • eBook Packages: Computer Science, Computer Science (R0)
