
Taxonomical Classification of Closely Related Reads of Genus Bacillus

  • Conference paper
Informatics Engineering and Information Science (ICIEIS 2011)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 254)

Abstract

The genus Bacillus contains spore-forming, Gram-positive or Gram-variable, rod-shaped bacteria. Species of the genus Bacillus have long been recognized as having medical, veterinary, and agricultural importance. In agricultural biotechnology and its applications, assigning short environmental Bacillus DNA fragments to their respective species plays a crucial role in the pipeline of agronomic trait discovery and insect control. Here we constructed a classification model for this challenging task based on consensus decision-making between support vector machines (SVMs) and BLAST hit strategies. We first used both the hexamer signatures of Bacillus genomes and Bacillus species-specific toxin signatures to build the attribute space. We then explored and filtered this otherwise high-dimensional attribute space with a weighted version of principal component analysis (PCA) to reduce computational cost and avoid possible overfitting of the classification model. Extensive experimental results showed that the method performs well in differentiating Bacillus species.
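The hexamer signature mentioned in the abstract can be illustrated with a short sketch. This is not the author's implementation: it assumes that a read's signature is the vector of relative frequencies of all 4^6 = 4,096 possible DNA 6-mers, counted with overlapping windows on a single strand, and that windows containing ambiguous bases such as N are simply skipped. The names `hexamer_signature`, `HEXAMERS` and `INDEX` are hypothetical. The toxin signatures, the weighted PCA step and the SVM/BLAST consensus are not shown.

```python
from itertools import product

ALPHABET = "ACGT"
K = 6  # word length: hexamers

# All 4**6 = 4096 possible hexamers in lexicographic order. Each one is a
# coordinate of the (assumed) signature vector.
HEXAMERS = ["".join(p) for p in product(ALPHABET, repeat=K)]
INDEX = {kmer: i for i, kmer in enumerate(HEXAMERS)}


def hexamer_signature(read: str) -> list[float]:
    """Hypothetical helper: relative hexamer frequencies of one read.

    Assumptions (not from the paper): overlapping windows, a single strand,
    and windows containing non-ACGT characters (e.g. N) are skipped.
    """
    read = read.upper()
    counts = [0] * len(HEXAMERS)
    total = 0
    for start in range(len(read) - K + 1):
        idx = INDEX.get(read[start:start + K])
        if idx is not None:  # None means the window had an ambiguous base
            counts[idx] += 1
            total += 1
    if total == 0:  # read shorter than K, or every window was ambiguous
        return [0.0] * len(HEXAMERS)
    return [c / total for c in counts]


if __name__ == "__main__":
    sig = hexamer_signature("ACGTACGTACGT")
    print(len(sig))              # 4096
    print(sig[INDEX["ACGTAC"]])  # ACGTAC fills 2 of the 7 windows, about 0.2857
```

For example, `hexamer_signature("ACGTACGTACGT")` returns a 4,096-element list that sums to 1, with `ACGTAC` at 2/7 because it occurs in two of the read's seven overlapping windows. A vector of this length for every read is one plausible reason the attribute space is described as high dimensional before the weighted PCA step.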

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, W. (2011). Taxonomical Classification of Closely Related Reads of Genus Bacillus. In: Abd Manaf, A., Sahibuddin, S., Ahmad, R., Mohd Daud, S., El-Qawasmeh, E. (eds) Informatics Engineering and Information Science. ICIEIS 2011. Communications in Computer and Information Science, vol 254. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25483-3_32

  • DOI: https://doi.org/10.1007/978-3-642-25483-3_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25482-6

  • Online ISBN: 978-3-642-25483-3

  • eBook Packages: Computer Science, Computer Science (R0)
