New Feature Vector for Recognition of Short Microbial Genes

  • Conference paper

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 305))

Abstract

The effectiveness of a classifier depends heavily on the discriminative power of the feature vectors extracted from the dataset. This study presents a novel feature vector aimed at better classification of short protein-coding DNA. The feature vector was evaluated with a straightforward ensemble method: AdaBoost.M1 with a Multilayer Perceptron (MLP) as the base classifier. The proposed model achieves 97.36% accuracy, 97.76% sensitivity and 96.82% specificity. These results suggest that the proposed feature vector is promising and helps increase prediction accuracy.
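For readers less familiar with the three reported measures, the following minimal sketch shows how accuracy, sensitivity and specificity are conventionally derived from a binary confusion matrix (coding = positive, non-coding = negative). The function name `binary_metrics` and the counts in the usage line are hypothetical and chosen only for illustration; they are not taken from the paper, whose confusion matrix is not given in the abstract.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) as fractions in [0, 1].

    tp/fn: positive (coding) examples predicted correctly/incorrectly.
    tn/fp: negative (non-coding) examples predicted correctly/incorrectly.
    """
    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("each class needs at least one example")
    accuracy = (tp + tn) / (positives + negatives)
    sensitivity = tp / positives   # true positive rate
    specificity = tn / negatives   # true negative rate
    return accuracy, sensitivity, specificity


# Hypothetical counts for illustration only (not the paper's data).
acc, sens, spec = binary_metrics(tp=90, fn=10, tn=80, fp=20)
print(f"accuracy={acc:.2%} sensitivity={sens:.2%} specificity={spec:.2%}")
```

When the two classes are of similar size, accuracy falls between sensitivity and specificity, which is consistent with the ordering of the three figures reported above.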

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Goli, B., B.L., A., Joy, C., Nair, A.S. (2012). New Feature Vector for Recognition of Short Microbial Genes. In: Mathew, J., Patra, P., Pradhan, D.K., Kuttyamma, A.J. (eds) Eco-friendly Computing and Communication Systems. ICECCS 2012. Communications in Computer and Information Science, vol 305. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32112-2_27

  • DOI: https://doi.org/10.1007/978-3-642-32112-2_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32111-5

  • Online ISBN: 978-3-642-32112-2

  • eBook Packages: Computer Science, Computer Science (R0)
