Abstract
The genus Bacillus contains spore-forming, Gram-positive or Gram-variable, rod-shaped bacteria. Species of the genus have long been recognized as having medical, veterinary and agricultural importance. In agricultural biotechnology and its applications, assigning short environmental Bacillus DNA fragments to their species plays a crucial role in the pipeline of agronomic trait discovery and insect control. Here we construct a classification model for this challenging task based on consensus decision-making between support vector machines and BLAST hit strategies. We first use both the hexamer signatures of Bacillus genomes and Bacillus species-specific toxin signatures to build the attribute space. We then explore and filter this otherwise high-dimensional attribute space with a weighted version of principal component analysis, to reduce computational cost and to avoid possible overfitting of the classification model. Our experimental results show that the method performs well at differentiating Bacillus species.
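As an illustration of the hexamer signature mentioned above, the following is a minimal sketch of how a DNA fragment could be mapped to a vector of 4096 hexamer frequencies. It is not the chapter's implementation: the abstract does not say how the signatures are normalized, whether reverse complements are merged, or how ambiguous bases are handled, so the function name `hexamer_signature`, the relative-frequency normalization, and the rule of skipping windows that contain non-ACGT characters are all assumptions made for this example.

```python
from itertools import product

ALPHABET = "ACGT"
K = 6  # hexamers

# Fixed ordering of all 4**6 = 4096 hexamers, so that every fragment
# is mapped to the same attribute positions.
KMERS = ["".join(p) for p in product(ALPHABET, repeat=K)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def hexamer_signature(seq):
    """Return the relative frequency of each of the 4096 hexamers in seq.

    Assumption for this sketch: overlapping windows are counted on the
    given strand only, and windows containing characters other than
    A, C, G, T (for example N) are skipped.
    """
    seq = seq.upper()
    counts = [0] * len(KMERS)
    total = 0
    for i in range(len(seq) - K + 1):
        idx = INDEX.get(seq[i:i + K])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        # Fragment shorter than 6 bases or with no valid window.
        return [0.0] * len(KMERS)
    return [c / total for c in counts]
```

A vector produced this way has one attribute per hexamer, which is why some form of dimensionality reduction, such as the weighted principal component analysis described in the abstract, is a natural next step before training a classifier.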
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, W. (2011). Taxonomical Classification of Closely Related Reads of Genus Bacillus. In: Abd Manaf, A., Sahibuddin, S., Ahmad, R., Mohd Daud, S., El-Qawasmeh, E. (eds) Informatics Engineering and Information Science. ICIEIS 2011. Communications in Computer and Information Science, vol 254. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25483-3_32
DOI: https://doi.org/10.1007/978-3-642-25483-3_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25482-6
Online ISBN: 978-3-642-25483-3
eBook Packages: Computer Science (R0)