A New Computational Approach to Identify Essential Genes in Bacterial Organisms Using Machine Learning

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 798)

Abstract

Essential genes are those an organism requires to grow to a fertile adult; they are pivotal to its survival. This study presents a new machine-learning approach that predicts essential genes by integrating homology, gene-intrinsic, and network-topology features. Fifteen bacterial organisms with characterized essential genes serve as reference species. With Extreme Gradient Boosting (XGBoost), the classification model for Bacillus subtilis 168 achieved an average AUC of 0.9649 under tenfold cross-validation. Applying the same model to the closely related organism Salmonella enterica serovar Typhimurium LT2 gave an AUC of 0.8608. To assess the stability and consistency of the proposed classifier, a second set of target organisms, Escherichia coli MG1655 and Streptococcus sanguinis SK36, and a second classifier based on principal component regression (PCR) were also evaluated. The PCR-based model yielded lower AUC values for both sets of target organisms, which indicates that the feature-integrated XGBoost approach identifies essential genes with better predictive accuracy.
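The abstract reports an average AUC from a tenfold cross-validation test. As a rough illustration of how such a figure can be computed, the following is a minimal sketch in plain Python. It is not the authors' pipeline: the function names, the synthetic data, and the scoring model `fit_mean_difference` are hypothetical stand-ins (a simple class-mean projection is swapped in for XGBoost so the example has no third-party dependencies), and the paper's actual features are not reproduced.

```python
import random
from statistics import mean


def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    # Fraction of (essential, nonessential) pairs ranked correctly; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k=10, seed=0):
    """Split example indices into k folds with a roughly equal class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for rank, i in enumerate(idx):
            folds[rank % k].append(i)
    return folds


def cross_validated_auc(features, labels, fit, k=10, seed=0):
    """Mean held-out AUC over k folds; fit(X, y) must return a scoring function."""
    fold_aucs = []
    for held_out in stratified_folds(labels, k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        score = fit([features[i] for i in train], [labels[i] for i in train])
        fold_aucs.append(auc([labels[i] for i in held_out],
                             [score(features[i]) for i in held_out]))
    return mean(fold_aucs)


def fit_mean_difference(X, y):
    """Hypothetical stand-in model (NOT XGBoost): project each example onto
    the difference between the two class means."""
    dims = len(X[0])
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    w = [mean(x[d] for x in pos) - mean(x[d] for x in neg) for d in range(dims)]
    return lambda x: sum(wd * xd for wd, xd in zip(w, x))


if __name__ == "__main__":
    # Synthetic data only: 30 "essential" and 120 "nonessential" genes,
    # each with two made-up feature values.
    rng = random.Random(1)
    labels = [1] * 30 + [0] * 120
    features = [[rng.gauss(float(y), 1.0), rng.gauss(float(y), 1.0)] for y in labels]
    print("mean tenfold AUC:", cross_validated_auc(features, labels, fit_mean_difference))
```

Any model that returns a per-gene score can be passed as `fit`, so a gradient-boosted classifier could be substituted for the stand-in without changing the evaluation loop. Each fold needs at least one gene of each class, which is why the folds are stratified.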

Author information

Corresponding author

Correspondence to Devasheesh Roy.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Singhal, A., Roy, D., Mittal, S., Dhar, J., Singh, A. (2019). A New Computational Approach to Identify Essential Genes in Bacterial Organisms Using Machine Learning. In: Verma, N., Ghosh, A. (eds) Computational Intelligence: Theories, Applications and Future Directions - Volume I. Advances in Intelligent Systems and Computing, vol 798. Springer, Singapore. https://doi.org/10.1007/978-981-13-1132-1_6
