Abstract
Essential genes play an indispensable role in cell viability and fertility. Identifying human essential genes not only helps us study the functions of human genes but also provides a way to find potential targets for cancer and other diseases. Recently, with the publication of human essential gene data and the availability of a large amount of biological data, several computational methods have been proposed to predict human essential genes from the genes’ DNA sequences or their topological properties in the protein-protein interaction (PPI) network. However, there is still room to improve the prediction accuracy. In this work, we propose a novel supervised method that predicts human essential genes by embedding the PPI network. Our method extracts features for the genes in the network by mapping them to a latent feature space that maximally preserves the relationships between the genes and their network neighborhoods. The features are then fed into an SVM classifier to predict human essential genes. Two human PPI networks are employed to evaluate the effectiveness of our method. The prediction results show that our method not only outperforms the method that uses only the genes’ sequence information but is also clearly superior to the method that uses the genes’ centrality properties in the network as input features.
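The neighborhood-preserving embedding described in the abstract can be illustrated with a minimal sketch of its sampling step. The abstract does not name the embedding algorithm, so the node2vec-style biased random walk below is an assumption, as are the toy edge list, the gene labels G1 to G5, and the parameter names p and q. In a full pipeline of this kind, the sampled walks would typically be passed to a skip-gram model to obtain gene vectors, which would then be the input to the SVM classifier; neither of those steps is shown here.

```python
import random


def build_adjacency(edges):
    """Build an undirected adjacency map from (gene_a, gene_b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def biased_walk(adj, start, length, p=1.0, q=1.0, rng=random):
    """Sample one second-order random walk of at most `length` nodes.

    p controls how likely the walk is to step back to the previous node,
    q controls how likely it is to move away from the previous node
    (both are assumed, node2vec-style parameters).
    """
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(adj[cur])
        if not neighbors:
            break
        if len(walk) == 1:
            # First step: no previous node yet, so choose uniformly.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # return to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)      # stay within prev's neighborhood
            else:
                weights.append(1.0 / q)  # move outward
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def sample_walks(adj, walks_per_node=10, length=5, p=1.0, q=1.0, seed=0):
    """Start `walks_per_node` walks from every gene in the network."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = sorted(adj)
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(biased_walk(adj, node, length, p, q, rng))
    return walks


# Hypothetical toy PPI network; the labels are placeholders, not real data.
TOY_EDGES = [("G1", "G2"), ("G1", "G3"), ("G2", "G3"), ("G3", "G4"), ("G4", "G5")]

adj = build_adjacency(TOY_EDGES)
walks = sample_walks(adj, walks_per_node=2, length=4)
```

Each walk is a short sequence of genes that co-occur in a network neighborhood, which is the kind of context an embedding model needs in order to place neighboring genes close together in the latent space.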
Acknowledgment
This work is supported in part by the National Natural Science Foundation of China under grant Nos. 31560317, 61502214, 61472133, 61502166, 61702122 and 81560221, and by the Natural Science Foundation of Yunnan Province of China (No. 2016FB107).
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Dai, W., Chang, Q., Peng, W., Zhong, J., Li, Y. (2019). Identifying Human Essential Genes by Network Embedding Protein-Protein Interaction Network. In: Cai, Z., Skums, P., Li, M. (eds) Bioinformatics Research and Applications. ISBRA 2019. Lecture Notes in Computer Science, vol 11490. Springer, Cham. https://doi.org/10.1007/978-3-030-20242-2_11
DOI: https://doi.org/10.1007/978-3-030-20242-2_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20241-5
Online ISBN: 978-3-030-20242-2
eBook Packages: Computer Science, Computer Science (R0)