Abstract
In the field of biological sequence analysis, there still seem to be strong reservations against applying techniques of statistical pattern recognition such as hidden Markov models (HMMs). These reservations are most probably due to the fact that researchers in the field have a strong desire to be able to explain every detail of a model from a biological viewpoint. Therefore, several publications on HMM-based biological sequence analysis contain sections that explicitly point out that HMMs in principle achieve the same results as the more traditional methods. Usually only the underlying mathematical theory is given as their main advantage, and less frequently the automatic trainability of the required model parameters. In this chapter we will first briefly present the two most important software tools that were developed for the analysis of biological sequences by means of hidden Markov models, namely HMMER and SAM. In the final section of the chapter we will present a system for the classification of proteins which was developed on the basis of ESMERALDA.
Notes
1. The original HMMER 2.1.1 documentation, which today is only partially available via archive.org, says that ‘It’s “hammer”: as in, a more precise tool than a BLAST. :)’.
2. The name “Plan 7” for the structure improved with respect to the older “Plan 9” model architecture was an allusion to the title of a science fiction movie, sometimes referred to as the worst movie ever made. Unfortunately, this humorous remark can no longer be found in the HMMER documentation.
3. The profile HMM structure with match, insert, and delete states is incorrectly referred to as a “linear HMM” in the SAM documentation.
Copyright information
© 2014 Springer-Verlag London
About this chapter
Cite this chapter
Fink, G.A. (2014). Analysis of Biological Sequences. In: Markov Models for Pattern Recognition. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-6308-4_15
DOI: https://doi.org/10.1007/978-1-4471-6308-4_15
Publisher Name: Springer, London
Print ISBN: 978-1-4471-6307-7
Online ISBN: 978-1-4471-6308-4
eBook Packages: Computer Science (R0)