A Revamp Approach for Training of HMM to Accelerate Classification of 16S rRNA Gene Sequences

  • Prakash Choudhary
  • M. P. Kurhekar
Chapter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10990)

Abstract

Bioinformatics is growing rapidly, and biological databases now accumulate data far faster than the data can be analyzed. Automatic classification of biological information is therefore one of the critical problems in the field: the enormous volume of new data must be organized and managed so that it remains accessible and useful. The core difficulty in classifying biological information is annotating biological sequences with functional features. Annotating the large and rapidly increasing amount of genomic sequence data requires computational tools for classifying genes in DNA sequences. This paper presents a computational method for classifying highly conserved 16S rRNA gene sequences using Hidden Markov Models (HMMs). We describe the algorithms used to implement the three phases of an HMM (training, decoding, and evaluation) to classify sequences into clusters with known, similar functional properties. In implementing the training phase, we address practical issues such as initial parameter selection for the HMM and computational weaknesses on large data sets. The method achieves a classification accuracy of 91% for Bacillus and 97% for Clostridia.
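The evaluation phase mentioned in the abstract can be illustrated with a minimal sketch, assuming one HMM per class and assigning a sequence to the class whose model gives it the highest likelihood. The forward algorithm is run in log space, one common way to avoid numerical underflow on long sequences. The function names, the class labels `gc_rich` and `at_rich`, and all probabilities are hypothetical toy values chosen for illustration; they are not the authors' implementation or trained parameters.

```python
import math

ALPHABET = "ACGT"  # assumed nucleotide alphabet for this toy example


def log_sum_exp(values):
    """Stable log(sum(exp(v))) for a list of log-probabilities."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(seq, start, trans, emit):
    """Forward algorithm in log space: returns log P(seq | model).

    start[s]    : probability of starting in state s
    trans[r][s] : probability of moving from state r to state s
    emit[s][k]  : probability that state s emits ALPHABET[k]
    All probabilities are assumed to be strictly positive.
    """
    n = len(start)
    obs = [ALPHABET.index(ch) for ch in seq]
    alpha = [math.log(start[s]) + math.log(emit[s][obs[0]]) for s in range(n)]
    for o in obs[1:]:
        alpha = [
            log_sum_exp([alpha[r] + math.log(trans[r][s]) for r in range(n)])
            + math.log(emit[s][o])
            for s in range(n)
        ]
    return log_sum_exp(alpha)


def classify(seq, models):
    """Return the label of the model with the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(seq, *models[name]))


# Hypothetical two-state models (start, transition, emission); toy values only.
models = {
    "gc_rich": (
        [0.5, 0.5],
        [[0.9, 0.1], [0.1, 0.9]],
        [[0.1, 0.4, 0.4, 0.1], [0.2, 0.3, 0.3, 0.2]],
    ),
    "at_rich": (
        [0.5, 0.5],
        [[0.9, 0.1], [0.1, 0.9]],
        [[0.4, 0.1, 0.1, 0.4], [0.3, 0.2, 0.2, 0.3]],
    ),
}

print(classify("GCGCGGCC", models))
print(classify("ATATTAAT", models))
```

In practice the per-class parameters would come from the training phase rather than being written by hand; the sketch covers only scoring and label assignment.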

Keywords

HMM; Parameter estimation; Biological sequence; 16S rRNA; Gene classification; Bioinformatics

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science and Engineering, National Institute of Technology Manipur, Imphal, India
  2. Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology Nagpur, Nagpur, India